Sidebar 7 - How can I find out which code needs to be reviewed?


Code Reviews – How can we detect bad code?

We do code reviews a lot more often than we think we do. Of course there are formal code reviews, where we review a piece of code that has been created internally (or externally) to our specification. The purpose of such a review is to ensure that the code meets that specification, fits into the overall design, conforms to the house style and is written in a way that is robust and easy to maintain. Outside of these formal reviews we review code –

  • When we inherit code (perhaps from a colleague, perhaps an old library that we need to change)
  • When we are evaluating a company/product that we are thinking of buying
  • When we are trying to merge code from different sources – either to create a new hybrid product or to create a ‘best of breed’ product

Checking that code meets its specification and conforms to the house style can be automated with tools like JUnit, Sonar and Checkstyle. But we are left with the less tangible qualities of robustness and maintainability. What you are really looking for here is a measure of how happy you and your team are going to be looking after this code.

The problem is that everybody writes code differently, and the same person writes differently at different phases of their career and in different coding cultures. Sometimes that makes the initial engagement frustrating. I’ve often looked at code and inwardly groaned ‘why the ****** did they do that?’ and then, a couple of days later, gone ‘OK – now I see why’. So I now take a more cautious, methodical approach to reviewing code.

To ascertain the quality of code you have to read it – and then you have to understand it, and then you have to figure out whether it is good code or bad code. The less code there is to read the better – you can get code fatigue scanning line after line and it becomes easy to miss even the most obvious mistakes.

I think about books when I think about code. Imagine you have to proofread a book and you want to estimate how long the job is going to take. The kind of factors that you would have to take into consideration include:

  • It’s much easier to proofread a book written for children than it is to proofread a book written for adults.
  • It’s much easier to proofread a book that has short paragraphs whose lines are well separated on the page than a book with forty-line paragraphs with tiny spaces between lines.
  • It’s much easier to proofread a popular novel than a literary novel.
  • It’s much easier to proofread a novel than it is a technical manual.
Looking at these criteria you can see that layout, vocabulary, subject matter and density all contribute to the difficulty of the task. When we review code those same factors influence the difficulty of the task. You can still have errors in a children’s book, but they will be easier to spot than those in a novel, and much easier than those in an academic textbook.

Most of us are used to reading books – I could probably give you two books and you could tell me which would be more difficult to proofread after a quick glance through them. If you are used to reviewing code you could probably tell me which of two (say) Java classes was more prone to error. This discussion is about what factors you would use (consciously or unconsciously) to make that judgement.

When we are reviewing code we can either read all the code or we can filter the code and only look at the classes that we think might be of interest. If we have reviewed the code before then this task is easier - we will probably just look at the code that has changed since our last review. That’s not necessarily a guarantee that we’ll catch any new bugs but reviewing all the code every time is usually not feasible.

So let me suggest how I used to approach this –

  • I didn’t look at interfaces
  • I didn’t look at value objects i.e. classes that consist of getters and setters and are just used to move a value around and perhaps format it for the purposes of display.
  • I didn’t look at simple methods – methods that only have a few statements, that don’t have complex if…else logic and that only have one return point (or are void).
That can eliminate some code from the review process, but I’m still left with a lot. I’ve always wanted to find a way of getting straight to the classes and methods that might have issues. So I looked at building a metric based on the experience I had gained over the years doing code reviews, and I came up with the following rules in addition to those mentioned above:
  • Classes with a lot of methods are more difficult to understand. This is because understanding the class means that we have to retain more information about the class as we review it. This is equivalent to having to review a very large book.
  • Very long methods are more difficult to read and are therefore more likely to have errors. My basic rule is that you should be able to see the entire method without scrolling, in whatever IDE you use. I generally use Eclipse, and if I am doing a code review I would have quite a small scroll window open; I feel 60 lines is the maximum. Sometimes you can’t avoid a long method – if it contains a lengthy switch statement, for example. This is like trying to review a badly written book where the text hasn’t been split into reasonably sized paragraphs.
  • Complex methods are more difficult to read. Cyclomatic complexity is one measure, but it relates complexity to the level of branching in the code. Density of expressions is also a factor – in Java we can measure this by dividing the number of expressions by the number of Java statements. Continuing our book analogy, cyclomatic complexity tells you how complicated the paragraphs are and expression density tells you how complex the sentences are (the short illustration after this list makes the distinction concrete).
  • When you are reading code it is more difficult to do so when there are references to classes outside of the code. In the past this aspect of code has been measured using coupling metrics. This is rather like reviewing an academic book where you have to check the references to other texts. On closer examination I found that there were a wide range of factors relating to class references so I’ll talk about them separately below.
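To make the distinction between the two concrete, here is a rough illustration – the exact statement and expression counts will vary with the tool doing the counting:

    // Illustrative only - the precise counts depend on how your metrics tool counts statements and expressions.
    class DensityVersusComplexity {

        // Branch-heavy but sparse: the if/else chain pushes cyclomatic complexity up
        // ("how complicated the paragraphs are"), yet each statement holds few expressions.
        int classify(int age) {
            if (age < 13) {
                return 0;
            } else if (age < 20) {
                return 1;
            }
            return 2;
        }

        // A single return statement packed with expressions: cyclomatic complexity stays at 1,
        // but expression density ("how complex the sentences are") is high.
        double rootMeanSquare(double a, double b, double c) {
            return Math.sqrt((a * a + b * b + c * c) / 3.0);
        }
    }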

Coupling metrics and external class references

As we said above it is more difficult to read code that has references to other classes. These classes can be:

  • Java library classes – some of those we might know well and some not so well but we know that these classes are, in general, trustworthy. Whether they have been called correctly is another matter but we would hope that their interfaces are as well known to the developers of the code as they are to us, the reviewer. We are less likely to need to go and look at the APIs of these classes as we will certainly know the more common ones e.g. String, Collection classes, mathematics-related classes etc. In the book analogy these could be references to texts that everyone knows – Shakespeare or the Bible for example.
  • Third party libraries – again there are some libraries that we know to be well trusted e.g. log4j and some others that we know less about – in general we will view these as less reliable than the Java libraries and we are more likely to have to consult their APIs as a part of our review process. These are equivalent to references to other texts that we probably don’t know but we would tend to trust them if they appear to have a reputable source e.g. journals published by the ACM. In the design of our code we may have specified the use of these libraries because we are familiar with them.
  • Classes that are part of the system under review. This is equivalent to reviewing references to other parts of the book. At least in this case we have access to the code for these classes and if we are reviewing these classes as well we will know how much we can trust them. For example we might feel happy to ignore them if they are interfaces or value objects. But our access to greater knowledge can help us to discriminate between the classes, for example:
    • A reference to an interface should require us to do less research than a reference to a class as we would hope that our interfaces would be more clearly defined and documented. In terms of maintenance an interface is simpler as we can change the implementation independent of the interface.
    • References to classes vary as well:
      • A reference to a class in the same package (or even in the same class) is going to be easier to find and if we have the source code of the referencing class to hand we are also likely to have the source of the referred class to hand as well. There is a good chance that these classes were written by the same pool of people and/or in the same timeframe so they are more likely to share a common style, making them easier to read. We may also have reviewed this class as part of the current review – meaning that it may be fresh in our mind.
      • A reference to a class in a different package will be likely to be the reverse of the situation of a class in the same package – more likely to follow a different style (as the package may have been written by another team), and potentially the source code is not available (or less easy to find).
      • We should also have sufficient knowledge to tell whether the referenced class implements an interface. If it does implement an interface and we are not using that interface then that should be flagged in some way as it suggests there is a flaw in the interface that forces us to use the class.
    • How we reference the class is important (each kind of reference is shown in the snippet after this list) –
      • Declaration of a variable
      • Instantiation of a variable via its constructor
      • Instantiation of a variable of one class via a call to the method of a variable of another class
      • Passed as a parameter
      • Static method call
      • Instance method call
      • Reference to static variable
      • Reference to instance variable
      • Cast
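The snippet below shows each of these kinds of reference in one place. The class names (OtherClass, Factory) are made up purely for illustration:

    class ReferenceKinds {
        static int counter = OtherClass.limit;        // reference to a static variable
        OtherClass field;                             // declaration of a variable

        void demonstrate(OtherClass passedIn) {       // OtherClass passed as a parameter
            OtherClass direct = new OtherClass();     // instantiation via its constructor
            OtherClass indirect = Factory.create();   // instantiation via a call to a method of another class
            OtherClass.reset();                       // static method call
            direct.update();                          // instance method call
            int size = direct.size;                   // reference to an instance variable
            Object widened = passedIn;
            OtherClass narrowed = (OtherClass) widened;   // cast
        }
    }

    // Made-up collaborators, just to keep the snippet self-contained.
    class OtherClass {
        static int limit = 10;
        int size;
        static void reset() { }
        void update() { }
    }

    class Factory {
        static OtherClass create() { return new OtherClass(); }
    }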
So we can think of a hierarchy here which allows us to rank these factors based on their contribution to the quality of our code. I think it makes sense to eliminate things that are unlikely to contribute negatively to that quality:
  • We don’t include interfaces in our calculations
  • We don’t include instance variable accessor methods in our calculations

At a class level we need to make some kind of judgement on the size of class that we view as acceptable. So we have to ask a few questions:

  1. Should we just include methods that are not accessor methods in our assessment of size?
  2. Should we include instance and static variables (constants) in our assessment of size?
  3. Should we include static code in our assessment of size?
  4. If our class is a value object (i.e. it consists only of instance and static variables and accessors) is there a limit to the number of methods/variables that would suggest that this class is poorly designed?
In answer to these questions I would suggest that:
  1. accessor methods should not be included in our assessment of size, on the basis that they don’t really contribute to the scale of our review task;
  2. the number of instance and static variables will certainly contribute towards the complexity of the class, but they are not the biggest influence. A class could have 4 instance variables and yet carry out some very complex activity – this would be reflected in the complexity of the methods in the class. A class might have 10 instance variables and be a value object with only get and set methods. A class might define a lot of constants which are used either within the class or throughout the system. Since we want to encourage the use of constants rather than hard-coded strings and numbers, we should not include constants in the criteria we use for judging whether a class should be reviewed;
  3. we should treat static code as if it were a method;
  4. a value class with a huge number of instance variables (other than constants) is probably poorly designed.

So I would suggest the following measures for the class that should contribute towards our review coefficient. The phrase in brackets suggests how great that contribution should be (a simple holder for these inputs is sketched after the list):

  • Number of methods / static code blocks that are not accessors or constructors (Moderate)
  • Number of instance / static variables (Low)
  • Number of packages imported (Low)
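These class-level inputs could be collected into a simple holder along the following lines. The names are purely illustrative – they are not part of the JHawk API:

    // Hypothetical holder for the class-level measurements listed above.
    class ClassMeasures {
        boolean isInterface;
        int nonAccessorMethods;           // methods and static code blocks that are not accessors or constructors
        int nonConstantInstanceVariables; // instance and static variables, excluding constants
        int importedPackages;             // listed as a low contributor above; not used in the concrete rules later
    }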
When we look at methods we should look at:
  • How much code is in the method – number of lines of code, number of statements. This equates to how much information we have to absorb to understand the method.
  • How complex is the code in the method – expression density, cyclomatic complexity. So here we are calculating how difficult it might be to understand the method
  • How dependent is the method on outside code – e.g. class references, method calls, external variable references. This relates to the amount of extra research we have to do to understand the method.
  • How much information does the method need – i.e. parameters. This is additional information that comes in to the method – again this is likely to force us to spend some time looking outside the method to understand what this additional information is and why it is required.
  • How much information does the method create – local variables created, instance variables instantiated. This is the new information that the method creates either for use at the class level (i.e. information that is made available for other methods to use) or at the method level (local variables used to carry out functions local to the method or to be passed as parameters to another method).
I would suggest the following measures for methods that should contribute towards our review coefficient. Again the phrase in brackets suggests how great that contribution should be (a holder for these raw inputs is sketched after the list).
  • Length of method (lines of code) (Low)
  • Expression density (High)
  • Cyclomatic complexity (High)
  • Interface reference (Low)
  • Number of parameters passed (High)
  • Number of local variables declared (High)
  • Class reference to Java library (Low)
  • Class reference to third party library (Moderate)
  • Concrete Class reference to class in system – same package (Low)
  • Concrete Class reference to class in system – different package (Moderate)
  • Concrete Class reference to class in system that could be replaced by interface reference (Very High)
  • Reference impact adjustment – declaration of variable, direct instantiation of variable (low)
  • Reference impact adjustment – cast, indirect instantiation (moderate)
  • Reference impact adjustment – static method call, instance method call, static variable reference (low)
  • Reference impact adjustment – instance variable reference (high)
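The raw per-method inputs behind these measures could be gathered into a similar holder, along these lines (again the names are illustrative, not JHawk’s API – any parser or metrics tool could populate them):

    // Hypothetical holder for the raw per-method measurements listed above.
    class MethodMeasures {
        int linesOfCode;
        double expressionDensity;               // expressions divided by Java statements
        int cyclomaticComplexity;
        int interfaceReferences;
        int parameters;
        int localVariables;
        int javaLibraryReferences;
        int thirdPartyReferences;
        int samePackageReferences;              // concrete classes in the same package
        int otherPackageReferences;             // concrete classes in other packages
        int replaceableByInterface;             // concrete references that could have used an interface instead
        int casts;
        int indirectInstantiations;             // variables created via a call to a method of another class
        int externalMethodCalls;
        int externalInstanceVariableReferences; // direct references to instance variables in other classes
    }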
I prefer metrics that, at least at some point, give a definitive answer. I also like metrics that have a fixed range. I’m going to suggest that we make this metric a percentage i.e. on a scale 0 to 100. At zero we are not going to review the code, at 100 we will review the code. Everything else is on a sliding scale in between and it’s up to us to make a decision on the cut-off point that we will use on a particular project.

So taking our rules we should first separate out the rules that would make our decision to review a zero:

  1. The class is an interface
  2. All the methods in the class are either accessors or constructors and there are fewer than 10 instance variables
  3. All the methods in the class have a cyclomatic complexity of 4 or less and there are fewer than 30 instance variables
Next let’s take the rules that would make our decision to review a 100:
  1. Concrete Class reference in a method to a class in the system that could be replaced by an interface reference
  2. Number of instance variables greater than 30
  3. Number of lines of code in one or more methods greater than 100
  4. Cyclomatic complexity of one or more methods greater than 15
  5. The calculations for the individual methods bring us to 100 or more.
So we are left with the following criteria that we feel would have a high impact on the likelihood of a piece of code requiring review:
  • Expression density (greater than 2)
  • Cyclomatic complexity (between 5 and 15)
  • Number of parameters passed (more than 5)
  • Number of local variables declared (More than 5)
  • Direct reference to an instance variable in another class
If we think that these have a high impact we should say that if three of those factors pass acceptable limits then we should definitely review the code. So the maximum value of each of these criteria should be at least 34 – let’s call it 40. And the following we feel would have a moderate impact:
  • Number of methods/static code blocks that are not accessors (divide by 2 up to a maximum of 10)
  • Class references to third party libraries (multiply by 2 up to maximum of 10)
  • Class references to classes not in this package (number up to maximum of 10)
  • Class casts (number up to maximum of 10)
  • Instantiation of variable using a method call to another class (not a constructor) (multiply by 2 up to maximum of 10)
Even if all these breached their levels that wouldn’t give us definite cause to review a class but if there were a number of other violations in other categories we might consider it so let’s give these criteria a maximum value of 10. So now we are left with the low level criteria :
  • Number of lines of code (number of lines more than 5 up to maximum of 5)
  • References to interfaces (number up to maximum of 5)
  • Class reference to Java library (number up to maximum of 5)
  • Class reference to class in same package (number up to maximum of 5)
  • External Method calls (number up to 5)
  • Declaration of variable (we’ve included this above – double check)
  • Instantiation of variable using constructor – I believe there should be no penalty for this
  • Static variable reference – I believe there should be no penalty for this
Again, breaching all of these in a method wouldn’t necessarily cause us to definitely want to review that method – giving each of these a maximum value of 5 would mean that if all of them were breached we would hit 40.

So I created a metric to do this and started reviewing code. Obviously I used our JHawk tool to do this, but I’ve provided you with the criteria I used so you can use whatever tool you like. JHawk allows you to write your own metrics, so I’ve provided the JHawk metric code as an appendix to this. The ReviewMetric has been included in the latest release of JHawk so you don’t need to rewrite it if you have the latest release – however some of the API calls used in the metric are not available in older versions of JHawk. You can use the JHawk code as a template for your own code if you wish to implement the metric in another environment.
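As a rough sketch of how the rules above might be combined into a single 0 to 100 Review Factor for a class, using the hypothetical ClassMeasures and MethodMeasures holders sketched earlier: the thresholds and the 40/10/5 weights come from the discussion above, but the names and the decision to simply sum the per-method contributions are my own assumptions – this is not the JHawk implementation.

    import java.util.List;

    class ReviewFactorSketch {

        static int reviewFactor(ClassMeasures cls, List<MethodMeasures> methods) {
            // Rules that make the decision to review zero.
            if (cls.isInterface) return 0;
            if (cls.nonAccessorMethods == 0 && cls.nonConstantInstanceVariables < 10) return 0;
            boolean allSimple = methods.stream().allMatch(m -> m.cyclomaticComplexity <= 4);
            if (allSimple && cls.nonConstantInstanceVariables < 30) return 0;

            // Rules that make the decision to review 100.
            if (cls.nonConstantInstanceVariables > 30) return 100;

            // Moderate class-level contribution: non-accessor methods, divided by 2, capped at 10.
            int score = Math.min(10, cls.nonAccessorMethods / 2);

            for (MethodMeasures m : methods) {
                if (m.linesOfCode > 100 || m.cyclomaticComplexity > 15 || m.replaceableByInterface > 0) {
                    return 100;
                }
                // High-impact criteria: each capped at 40, so any three breaches take us past 100.
                if (m.expressionDensity > 2) score += 40;
                if (m.cyclomaticComplexity >= 5) score += 40;   // between 5 and 15; anything higher returned above
                if (m.parameters > 5) score += 40;
                if (m.localVariables > 5) score += 40;
                if (m.externalInstanceVariableReferences > 0) score += 40;

                // Moderate-impact criteria: each capped at 10.
                score += Math.min(10, m.thirdPartyReferences * 2);
                score += Math.min(10, m.otherPackageReferences);
                score += Math.min(10, m.casts);
                score += Math.min(10, m.indirectInstantiations * 2);

                // Low-impact criteria: each capped at 5. Plain declarations, constructor
                // instantiations and static variable references carry no separate penalty.
                score += Math.min(5, Math.max(0, m.linesOfCode - 5));
                score += Math.min(5, m.interfaceReferences);
                score += Math.min(5, m.javaLibraryReferences);
                score += Math.min(5, m.samePackageReferences);
                score += Math.min(5, m.externalMethodCalls);
            }
            return Math.min(100, score);
        }
    }

The cut-off point you apply to the resulting percentage is, as noted above, a per-project decision.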

I used the source code of the Eclipse project to do this analysis. I’ve used code from Eclipse 1.0 all the way through to Eclipse 4.21. I was looking for a body of code to measure that allowed me to trace its history through all these editions. I selected the org.eclipse.ui classes – they’ve always been there and although the UI has changed it has generally grown organically rather than suffered cataclysmic change. So I refined my Review Factors code and properties using this body of code. What I was looking for were classes and packages that seemed to have similar attributes and yet ended up on opposite sides of the good/bad code divide. I would then analyse them and see if I could see why.

For example, in one instance I found that two classes with very similar numbers of Java Statements, Maintainability Indices and Average Cyclomatic Complexity lay on opposite sides of the good/bad divide. Looking at the code I realised that the issue was that the offending class contained a large number of constant declarations. My initial inclination had been to treat all instance variables in the same way – i.e. a class that is doing too much probably has too many instance variables, so I set my limit at 30. But it is recommended programming practice to use constants rather than literal expressions (e.g. strings or numbers), so we should not punish this behaviour. A lot of constants that are only used locally could still be a sign of a class that is doing too much, so local constants should be treated differently from publicly available constants. So I came to the conclusion that instance variables that are constants should not impact the Review Factor – I did this by setting the multiplier for constants to 0 and leaving the multiplier for other instance variables at 1. The definition that I used for a constant was a variable marked as public, static and final (using the isConstant() method in the InstanceVariableReferenceInterface interface).
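For reference, under that definition the first field below is treated as a constant and ignored, while the second still counts towards the instance variable limits (illustrative names only):

    class ConstantExample {
        // public, static and final - reported as a constant, so the multiplier of 0 applies
        public static final String DEFAULT_ENCODING = "UTF-8";

        // an ordinary instance variable - this still counts towards the limits described above
        private String lastError;
    }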


For the file containing the JHawk Package Level Review Factor metric source code click here and choose 'Save target As..'


For the file containing the JHawk Class Level Review Factor metric source code click here and choose 'Save target As..'


If you want to find out more about Java metrics you can read our full tutorial here, or you can read the individual sections on System and Package Level Metrics, Class Level Metrics and Method Level Metrics.

 
 
 