I’ve written a lot about metrics as a means to gauge the complexity of a program, including SLOC and the Halstead and McCabe metrics. An often overlooked metric, and one I haven’t touched on before, is one that’s in plain sight: comments.
Comments are added to programs, presumably to explain things that aren’t clear, and sometimes they alone are enough to give you the gist of how complex a program might be. While you could start with a simple count of comment lines, it is better to compute a ratio of comments to statements. There are a couple of ways to do this. The first is to divide the number of comment lines by the total lines of code. The alternative is to divide the number of comment lines by the number of executable statements, which gives you a better idea of how the comments relate to the real “guts” of the program.
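Both ratios can be sketched in a few lines of Python. This is a minimal sketch, not a production counter: it assumes whole-line comments that start with a single prefix (`#` by default), ignores blank lines, and treats every remaining non-blank line as one executable statement. Block comments, trailing comments and multi-line statements are not handled. The names `CommentStats` and `comment_stats` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommentStats:
    comment_lines: int  # lines that consist entirely of a comment
    code_lines: int     # non-blank, non-comment lines (proxy for executable statements)

    @property
    def total_lines(self) -> int:
        # Non-blank lines only; blank lines are ignored in this sketch.
        return self.comment_lines + self.code_lines

    @property
    def comments_per_loc(self) -> float:
        # First variant: comment lines / total lines of code.
        return self.comment_lines / self.total_lines if self.total_lines else 0.0

    @property
    def comments_per_statement(self) -> float:
        # Second variant: comment lines / executable statements.
        return self.comment_lines / self.code_lines if self.code_lines else 0.0


def comment_stats(source: str, prefix: str = "#") -> CommentStats:
    """Count whole-line comments and code lines in `source` (assumed comment syntax)."""
    comments = code = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(prefix):
            comments += 1
        else:
            code += 1
    return CommentStats(comments, code)


# Usage: 2 comment lines and 5 code lines.
sample = """# sum the positive values
total = 0
for x in items:
    # skip negatives
    if x < 0:
        continue
    total += x
"""
stats = comment_stats(sample)
print(f"{stats.comments_per_loc:.2f}", f"{stats.comments_per_statement:.2f}")  # 0.29 0.40
```

Note how the same two comments give a noticeably higher ratio when measured against statements alone, because the comment lines no longer inflate the denominator.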
Relating this metric to actual complexity can be difficult. It is best to think of it as a gauge of what the authors of the program felt they needed to say; they may have added a lot of comments because the code was complex. To check whether that is the case, you could compare this metric against the Halstead and McCabe metrics, which gives you an idea of whether the amount of commenting tracks the complexity. A program with few comments and high complexity could be harder to understand; conversely, a program with plenty of comments and low complexity could be easier to understand.
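One way to line the comment ratio up against complexity is to sort programs into four quadrants. The sketch below assumes you already have a comments-per-statement ratio and a McCabe cyclomatic complexity value from whatever tool you use. The function name `review_flag` and both thresholds (`ratio_floor=0.2`, `complexity_ceiling=10`) are hypothetical placeholders to be tuned to your own portfolio.

```python
def review_flag(comment_ratio: float, cyclomatic: int,
                ratio_floor: float = 0.2, complexity_ceiling: int = 10) -> str:
    """Classify a program by comment ratio vs. McCabe complexity (thresholds assumed)."""
    high_complexity = cyclomatic > complexity_ceiling
    well_commented = comment_ratio >= ratio_floor
    if high_complexity and not well_commented:
        # The quadrant most likely to be hard to understand.
        return "review: complex but sparsely commented"
    if high_complexity:
        return "complex, but the authors explained it"
    if well_commented:
        return "simple and well commented"
    return "simple, few comments"


print(review_flag(0.05, 25))  # review: complex but sparsely commented
```

The first quadrant is the one worth a closer look; the others mostly confirm that the commenting roughly matches the complexity.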
This metric can be used as part of your Quality Gate. You could establish a standard ratio to make sure that programs are well commented relative to their size. Watching the comment count over time can also show you whether comments were added when changes were made. Ultimately, knowing how heavily a program is commented adds another piece of information to help you better understand your programs and your portfolio.
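A Quality Gate check along these lines might compare a program's counts before and after a change. This is a sketch under stated assumptions: the counts are `(comment_lines, statements)` pairs produced by some counter, the standard ratio of `0.15` is a hypothetical value, and "comments were added" is approximated as the comment count going up whenever the statement count goes up. The function name `quality_gate` is also hypothetical.

```python
def quality_gate(before: tuple[int, int], after: tuple[int, int],
                 min_ratio: float = 0.15) -> list[str]:
    """Return a list of gate failures; an empty list means the change passes.

    `before` and `after` are (comment_lines, statements) pairs (assumed format).
    """
    problems = []
    comments_before, statements_before = before
    comments_after, statements_after = after

    # Check 1: the program meets the standard comment ratio.
    ratio = comments_after / statements_after if statements_after else 0.0
    if ratio < min_ratio:
        problems.append(f"comment ratio {ratio:.2f} below standard {min_ratio:.2f}")

    # Check 2: if the code grew, the comment count should have grown too.
    if statements_after > statements_before and comments_after <= comments_before:
        problems.append("code grew but no comments were added")
    return problems


print(quality_gate((10, 50), (10, 60)))  # ['code grew but no comments were added']
print(quality_gate((10, 50), (14, 60)))  # []
```

Keeping the two checks separate lets you see whether a program fails the absolute standard, the "comment your changes" rule, or both.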
Here are some things to keep in mind with this metric:
- it does not know the content of the comments. It only knows that there is a comment in the code;
- this is based on the idea that people naturally comment things that don’t seem clear, so a high ratio would indicate that the code was perceived as complex and the authors accounted for it;
- if your standard is to put a template at the beginning of each program containing a summary and a list of changes, this could skew the results slightly: the ratio would be inflated in smaller programs and diluted in larger ones;
- consider what a large number of comments may mean. Perhaps the code itself is too complex or the naming isn’t clear. What could be done to make it more “self-documenting”?
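To counter the header-template skew, one option is to leave the leading comment block out of the count. The sketch below assumes the template is the contiguous run of whole-line comments at the very top of the file and that comments start with `#`. Both are assumptions, and a program whose first lines are a genuine explanatory comment would lose them too. The name `ratio_excluding_header` is hypothetical.

```python
def ratio_excluding_header(source: str, prefix: str = "#") -> float:
    """Comments-per-statement ratio, ignoring the leading header comment block."""
    lines = [raw.strip() for raw in source.splitlines() if raw.strip()]

    # Skip the contiguous comment lines at the top (assumed to be the template).
    start = 0
    while start < len(lines) and lines[start].startswith(prefix):
        start += 1
    body = lines[start:]

    comments = sum(1 for line in body if line.startswith(prefix))
    statements = len(body) - comments
    return comments / statements if statements else 0.0


src = "# header\n# change log\nx = 1\n# note\ny = 2\n"
print(ratio_excluding_header(src))  # 0.5
```

To see why this matters, take a hypothetical 20-line header: on a 50-statement program it adds 20 / 50 = 0.4 to the ratio, while on a 2,000-statement program it adds only 20 / 2000 = 0.01.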