Code Quality Uncensored
Most of the time, we associate the words “good code” with code quality, but what is good? What is quality? In my opinion, using these expressions without digging a little deeper into their meaning is shallow. That’s why, in this article, we will consider high-quality code to be readable, maintainable, scalable, simple, and easy to change and test. Good code, on the other hand, is readable, problem-oriented, and follows coding and programming-language best practices.
Why is it important to know whether the code is high quality?
Well, one of the reasons is that evaluating code for quality makes it possible to identify areas to improve or fix before they cause any harm. This keeps the code base healthy and able to evolve along with the business goals. It also tends to reduce technical debt and improve the overall quality of the software, which translates into developers fighting fewer fires, code that ages well, and increased customer satisfaction.
How do I know whether the code is high quality, and how can I improve it?
Thinking about this question, together with my own experience and feelings about code quality, I came up with the following list of instruments that have been helping me write less complex, more maintainable, and more reliable code.
Quality instruments #
Cyclomatic complexity #
Cyclomatic complexity was introduced by Thomas McCabe in 1976. It uses the paths in a program’s control flow graph to count the different possible ways that program can execute. For example, consider the control flow graph of a very simple, hypothetical program that contains only an if-else statement: a decision node, one node for each branch, and an exit node, connected by four edges.
The complexity of this graph can be calculated by the expression:
CYC = E − N + 2P
Where E represents the total number of edges in the graph, N the number of nodes, and P the number of connected components.
In this case, the number of possible execution flows, or the number of branches, is:
CYC = 4 − 4 + (2 × 1) = 2
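As a concrete illustration, here is a minimal Java sketch (the class, method, and discount rule are hypothetical) whose control flow graph matches the one described above: a single if-else, so four nodes, four edges, and CYC = 2.

```java
// Hypothetical example: one if-else yields a control flow graph with
// 4 nodes and 4 edges, so CYC = 4 - 4 + (2 * 1) = 2.
public final class Discount {

    // One decision point, hence two independent execution paths.
    public static double apply(double price, boolean isMember) {
        if (isMember) {
            return price * 0.9; // path 1: the condition is true
        } else {
            return price;       // path 2: the condition is false
        }
    }
}
```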
So you might be asking yourself: do I need to dig deep into graph theory and cyclomatic complexity to get this measure right? Luckily, there are IDE plugins and other tools that can do this logic for us, and I think they are more than welcome in our daily work.
Despite this metric’s widespread use in the industry as an indicator of general code complexity and quality, researchers have not reached a consensus about its effectiveness (Ajami et al., 2018). In my experience, cyclomatic complexity is a great measurement to:
- define the minimum number of test cases necessary to cover the potential paths in the code;
- highlight code areas that may be difficult to understand or maintain because of their high cyclomatic complexity.
Combining cyclomatic complexity and tests gives us path testing, the topic of our next section. As for high cyclomatic complexity as an indicator of code that is hard to read and maintain, there are a few things we can do to reduce it and write less complex, more maintainable, and more reliable code (a short refactoring sketch follows this list):
- Reduce the number of decision points within a function. This makes the functions problem-specific and easier to understand.
The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that. — Robert C. Martin
- Don’t repeat code, reuse it. This makes the code concise, with fewer areas to change.
DRY is about duplication of knowledge, of intent. It’s about expressing the same thing in two different places, possibly in two totally different ways. - The Pragmatic Programmer: 20th Anniversary Edition, 2nd Edition.
- Refactor and improve the code you are working on. Every time you stumble upon something that took you a while to understand, try to rewrite it so that it takes less time to understand. This will help you and your team maintain the code for years to come.
Always leave the campground cleaner than you found it. - Clean Code
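To make the first tip above concrete, here is a minimal before-and-after refactoring sketch (the validation rules and names are hypothetical): nested decision points are extracted into small, problem-specific functions, and the main method becomes a flat list of guard clauses.

```java
public final class Validation {

    // Before: three checks tangled in nested branches, which raises the
    // cyclomatic complexity of this single method.
    static String validateBefore(String name, int age) {
        if (name != null && !name.isBlank()) {
            if (age >= 18) {
                return "ok";
            } else {
                return "too young";
            }
        } else {
            return "missing name";
        }
    }

    // After: each helper expresses a single rule, readable from its name.
    static boolean hasName(String name) {
        return name != null && !name.isBlank();
    }

    static boolean isAdult(int age) {
        return age >= 18;
    }

    // The main method now reads as a flat list of guard clauses.
    static String validateAfter(String name, int age) {
        if (!hasName(name)) return "missing name";
        if (!isAdult(age)) return "too young";
        return "ok";
    }
}
```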
Path testing and code coverage #
As mentioned before, cyclomatic complexity is a metric intimately related to path testing, because it determines the number of different paths through a program. Knowing that number, tests can be written so that every path is executed at least once; this is known as path testing. For instance, in the previous example of a hypothetical program that only has an if-else statement, there would be 2 test cases:
- a test case where the condition under test is true;
- a test case where the condition under test is false.
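In JUnit, those two path tests might look like the following sketch, assuming the hypothetical Discount.apply method from the earlier example (the test names are mine):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// One test per path: together they execute both branches of Discount.apply.
class DiscountPathTest {

    @Test
    void appliesDiscountWhenConditionIsTrue() {
        assertEquals(90.0, Discount.apply(100.0, true), 0.001);
    }

    @Test
    void keepsPriceWhenConditionIsFalse() {
        assertEquals(100.0, Discount.apply(100.0, false), 0.001);
    }
}
```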
Path testing is one strategy for achieving a certain degree of code coverage. Code coverage is the percentage of code that has been executed when a particular test suite runs.
Oftentimes, the code coverage percentage is tied to a quality gate, where its value is used as a threshold to ensure that the software has a certain quality level and is ready for release. But bear in mind that code coverage is far from being enough to be the single indicator that the code has sufficient quality and is production-ready. This measurement only checks whether a test executes a specific line of code. It doesn’t evaluate how the code behaves for null and blank entries, whether something is never executed, or whether things are executed in a different order. These test scenarios lie under the developer’s responsibility and might be defined by specifications and limited by time and effort. Thus, it is more important to cover potential scenarios than lines of code.
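To see this limitation concretely, consider the hedged sketch below (the class and method names are hypothetical). The single test executes every line of greet, so line coverage reports 100%, yet the behavior for a null entry is never observed:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class Greeter {
    // The single test below executes this line, so line coverage is 100%,
    // yet greet(null) would still throw a NullPointerException that no
    // test ever observes.
    static String greet(String name) {
        return "Hello, " + name.trim() + "!";
    }
}

class GreeterCoverageTest {
    @Test
    void greetsTrimmedName() {
        assertEquals("Hello, Ana!", Greeter.greet("  Ana  "));
    }
}
```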
Specification based tests #
Another way to guide what you should test, besides path testing and code coverage, is specification-based testing. Specification-based tests focus on what is expected from the program or function under test. Suppose our previous example of a hypothetical program with only an if-else statement validated the age of a person logging in to a Brazilian winery website, with the following requirements:
- if the age is less than 18, a message should be displayed informing that the required minimum age was not fulfilled;
- if the age is greater than or equal to 18, the user should be redirected to the winery webpage.
The test cases that could be created based on the requirements are:
- should redirect to winery when age is greater than or equal to 18
- should return message when age is less than 18
In the previous section, the path tests for the same problem were:
- a test case where the condition under test is true;
- a test case where the condition under test is false.
Note how the new tests, based on the specification, bring more information to the table and even test the expected output. When doing specification-based testing, we want to focus primarily on the requirements and on the different possible inputs and outputs of the program. Then we can create test cases covering some common “mistakes”, like the possibility of an entry being null or blank. Also, don’t be attached to writing only unit tests for the application. Sometimes the business rules involve a combination of things. For instance, if you want to delete someone’s account on your winery website, that account needs to exist before it can be deleted. Depending on how you built the account creation and deletion process, it is easier and faster to test this via integration testing.
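As a sketch of how those specification-based cases might look in JUnit (the AgeGate class, its return values, and the null-handling rule are assumptions of mine, not part of the original requirements):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

// Hypothetical implementation of the winery age check.
class AgeGate {
    static String check(Integer age) {
        if (age == null) {
            throw new IllegalArgumentException("age is required");
        }
        return age >= 18 ? "redirect:/winery" : "minimum age not fulfilled";
    }
}

class AgeGateSpecTest {

    @Test
    void shouldRedirectToWineryWhenAgeIsAtLeast18() {
        // 18 is the boundary value taken straight from the requirement.
        assertEquals("redirect:/winery", AgeGate.check(18));
    }

    @Test
    void shouldReturnMessageWhenAgeIsUnder18() {
        assertEquals("minimum age not fulfilled", AgeGate.check(17));
    }

    @Test
    void shouldRejectMissingAge() {
        // A common "mistake" case: the entry is null.
        assertThrows(IllegalArgumentException.class, () -> AgeGate.check(null));
    }
}
```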
This is the magic of specification-based tests: while the previous ones focused only on exercising the decision point, here we generate tests for the right combinations of inputs and outputs. We can then apply path testing and code coverage to validate whether something was left out. If the coverage report shows that one branch of the code was not covered, don’t worry; go back to the specifications and check whether you have a reason to test that case or not. Sometimes we have good reasons, and it’s ok not to create that test. This reinforces the idea that code coverage should not be used as the most important metric to determine whether code can be promoted to the next stage, and also that the more tests we create with the product specifications in mind, the more assertive the code will be. So, when bugs arise, they will come from unknown specifications and uncovered use cases.
Tools #
Besides the practices and measurements mentioned earlier, other methods can be added to help define how much quality the code has. Some of them are engraved in every developer since day one, like coding conventions, guidelines, and standards that determine how code should be written and formatted in a specific language to promote consistency and readability, or clean code lessons. Other methods rely on third-party software that collects data on the application’s behavior. For instance, there are tools that change parts of the code to see how the test suite reacts (mutation testing), or that report data from deployed applications, such as mean time between failures and mean time to recovery, repair, respond, or resolve.
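As a small illustration of the idea behind mutation testing (the mutant below is hand-written; real tools such as PIT generate mutants automatically and check whether any test fails):

```java
// The mutant below is hand-written for illustration; mutation-testing tools
// generate such mutants automatically and run the test suite against them.
public final class MutationDemo {

    static boolean isAdult(int age) {
        return age >= 18; // original code
    }

    static boolean isAdultMutant(int age) {
        return age > 18; // mutant: ">=" flipped to ">"
    }

    public static void main(String[] args) {
        // A boundary test at age == 18 tells the original and the mutant
        // apart ("kills" the mutant); a suite without that test would let
        // this mutant survive, revealing a gap in the tests.
        System.out.println(isAdult(18));       // true
        System.out.println(isAdultMutant(18)); // false
    }
}
```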
Here I highlight tools I have already worked with that helped me and my team keep track of our code quality.
- IntelliJ CodeMetrics plugin: to indicate the cyclomatic complexity of each function;
- IntelliJ built-in coverage report: to explore the coverage of a Java code before implementing JaCoCo;
- JaCoCo: code coverage library for Java;
- SonarLint: to flag programming and style errors, and potential construction problems;
- SonarQube: to highlight code duplication, violations of code guidelines, security breaches, and highly complex code;
- Checkstyle, with the code guidelines XML file, and JaCoCo: to guarantee that all the code produced follows the same standards.
Final thoughts #
In this article, we dug a little deeper into some measurements and practices that can help us understand whether the code we produce is high quality and how we can improve it. By bringing cyclomatic complexity to light, I hope developers can identify areas of the code that may be difficult to understand, test, and maintain, and then prioritize and allocate resources to improve them. For the younger developers out there, I also hope cyclomatic complexity serves as a guide to where you can improve your logic. With code coverage and path testing, I hope you will be able to grow your test suite. But, please, don’t game the metric! Finally, with specification-based tests, I hope writing tests becomes more meaningful. I know that, for senior engineers, gut feeling shows them what should be tested. For younger ones, however, this might take some time, and specification-based tests might help you grow in that field.
In conclusion, to write high-quality code, we can learn and apply cyclomatic complexity and code coverage, and perfect our test strategies on a daily basis. By doing so, we make the code more readable, maintainable, scalable, simple, and easy to change and test. Additionally, we help ourselves identify potential issues before they become major problems, while reducing technical debt. Therefore, developers should take code quality measurement seriously and make it an integral part of their software development process. And don’t worry if you are a beginner; it is ok not to master it all at once. The goal here is to improve the code a little bit every day.
References #
Ajami, S., Woodbridge, Y., & Feitelson, D. G. (2018). Syntax, predicates, idioms — what really affects code complexity? Empirical Software Engineering. doi:10.1007/s10664-018-9628-3
Aniche, M. (2023, May). Why do developers hate code coverage? And why they should not hate it! Effective Software Testing, by Maurício Aniche. https://www.effective-software-testing.com/why-do-developers-hate-code-coverage
Barbosa, M. A., de Lima Neto, F. B., & Marwala, T. (2016). Tolerance to complexity: Measuring capacity of development teams to handle source code complexity. 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). doi:10.1109/smc.2016.7844689
Hemmati, H. (2015). How effective are code coverage criteria? 2015 IEEE International Conference on Software Quality, Reliability and Security, 151-156. doi:10.1109/QRS.2015.30
Sommerville, I. (2011). Software Engineering, 9/E. Pearson Education India.