Attribution: Horia Varlan

In my last post I discussed Source Lines of Code (SLOC), the most commonly used program metric. It is very easy to produce but offers little real insight into the program. For this reason, “Logical” SLOC is often used instead, because it counts the number of executable statements. That adds some meaning, but it still gives no insight into the variables used in the program.

Maurice Halstead introduced a new set of metrics in 1977. Instead of just counting lines, you look at the actual verbs (operators) and variables (operands) used in the program. The calculation begins with counts of the Unique Operators and Unique Operands, that is, all the distinct verbs and all the distinct variables used in the program. This tells you how many different verbs and, perhaps more importantly, how many different variables the program contains. These two numbers are added together to give the Vocabulary. Next, you count the total number of verbs and variables used, so a variable that is referenced multiple times is counted once for every occurrence. This gives you the Total Operators and Total Operands counts, which are added together to produce the Length.

Halstead metrics values:

  • Unique Operators (n1). The unique or distinct number of verbs and elements other than data elements occurring in your program. Operators are syntactic elements such as +, -, <, >.
  • Unique Operands (n2). The unique or distinct number of data elements occurring in your program. Operands consist of literal expressions, constants and variables.
  • Total Operators (N1). The total number of verbs and elements other than data elements occurring in your program. Paired operators such as BEGIN .. END, DO .. UNTIL, FOR .. NEXT are treated as a single operator.
  • Total Operands (N2). The total number of data elements occurring in your program.
  • Vocabulary (n). The number of unique operators and operands in your program, computed as n1+n2. This is an estimate of the size of the program’s vocabulary (the number of things that must be known to understand the program).
  • Length (N). The length of your program, computed as N1+N2.
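To make the definitions concrete, here is a minimal Python sketch of the six values. It assumes the tokens have already been classified into operators and operands, which is the hard, language-specific part in practice. The function name `halstead_counts` and the one-line sample statement are hypothetical and chosen only for illustration.

```python
from collections import Counter


def halstead_counts(operators, operands):
    """Return the six basic Halstead values for pre-classified tokens.

    `operators` and `operands` are lists holding every occurrence of each
    token, in any order. Classifying the tokens is assumed to be done already.
    """
    operator_counts = Counter(operators)
    operand_counts = Counter(operands)

    n1 = len(operator_counts)            # Unique Operators
    n2 = len(operand_counts)             # Unique Operands
    N1 = sum(operator_counts.values())   # Total Operators
    N2 = sum(operand_counts.values())    # Total Operands

    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2,
            "n": n1 + n2,                # Vocabulary
            "N": N1 + N2}                # Length


# Hypothetical statement: total = price * qty + price
operators = ["=", "*", "+"]
operands = ["total", "price", "qty", "price"]

counts = halstead_counts(operators, operands)
print(counts)
```

In this sample, `price` appears twice, so it adds one to the Unique Operands but two to the Total Operands. That is the difference between Vocabulary and Length in miniature.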

These values are the core, but many other metrics are built on them. Before we get to those, let’s fully understand how these can be used to compare programs.

 

Metric                   TRIMAIN  CWXTCOB   PP110  PDA008
Lines of Code                 56      701    2655    3043
Comment Lines                  0      164     813     786
Statements                    15      200     452     685
Unique Operators [n1]         10       14      16      34
Unique Operands [n2]          23      204     563     472
Total Operators [N1]          16      201     471     665
Total Operands [N2]           39      522    1194    1717
Vocabulary                    33      218     579     506
Length                        55      723    1665    2382

These four programs (TRIMAIN, CWXTCOB, PP110, PDA008) progress in size from a very small one to a relatively large one. Straight Lines of Code shows PDA008 as the largest, but let’s keep looking. The next metric is Comment Lines, which lets us see how well documented the programs are. Broadly, the larger the program, the more comments it needs, although PP110 actually has slightly more comment lines than the larger PDA008. To get a real feel for how well a program is documented, look at the ratio of comments to statements. The count of Statements gives us a better feel for size, and here PDA008 is still the largest.
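As a rough illustration of the comment-to-statement ratio, the following Python sketch divides the Comment Lines by the Statements for each of the four programs, using the figures from the table. The helper name `comment_ratio` is hypothetical, and rounding to two decimals is just a presentation choice.

```python
# Comment Lines and Statements for each program, taken from the table.
programs = {
    "TRIMAIN": {"comment_lines": 0, "statements": 15},
    "CWXTCOB": {"comment_lines": 164, "statements": 200},
    "PP110": {"comment_lines": 813, "statements": 452},
    "PDA008": {"comment_lines": 786, "statements": 685},
}


def comment_ratio(comment_lines, statements):
    """Comment lines per statement; higher means more heavily commented."""
    return comment_lines / statements


ratios = {
    name: round(comment_ratio(**figures), 2)
    for name, figures in programs.items()
}

for name, ratio in ratios.items():
    print(f"{name:8} {ratio:.2f} comment lines per statement")

most_commented = max(ratios, key=ratios.get)
print("Most heavily commented:", most_commented)
```

By this measure PP110 carries the most commentary per statement, while TRIMAIN has none at all.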

Now we can look at the Halstead Metrics. The Unique Operators are not much different for the first three programs, but the count is much larger for the last. When we look at Unique Operands, which counts the variables actually used in the program rather than merely defined, we see that PP110 has the most variables to understand. The Unique counts give us a baseline of what is in the program, but not its actual size. To see how often these elements are used, we look at the Total counts. Here we can start to see the complexity grow, but the important metrics, Vocabulary and Length, come next. It is often said that Vocabulary is the number of things that must be known to understand the program. I think of it as the “number of things I have to keep in my head to know what is going on in the program.” This is the base number, built from the Unique counts. For TRIMAIN it is 33 things. I can handle that. CWXTCOB has 218, so it is more challenging, but the next two are the interesting ones. Because PP110 uses more variables, its Vocabulary is 579 against 506 for PDA008, so I need to know more to understand PP110 even though it is the smaller program. This is something that neither LOC nor Statements reveals. Lastly there is Length, where we see the usual progression across the programs.
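The comparison can be reproduced with a short Python sketch that recomputes Vocabulary and Length from the four base counts in the table and then ranks the programs two ways. The `Program` class and its field names are hypothetical and used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    loc: int   # Lines of Code
    n1: int    # Unique Operators
    n2: int    # Unique Operands
    N1: int    # Total Operators
    N2: int    # Total Operands

    @property
    def vocabulary(self):
        return self.n1 + self.n2

    @property
    def length(self):
        return self.N1 + self.N2


# Figures taken from the table.
programs = [
    Program("TRIMAIN", 56, 10, 23, 16, 39),
    Program("CWXTCOB", 701, 14, 204, 201, 522),
    Program("PP110", 2655, 16, 563, 471, 1194),
    Program("PDA008", 3043, 34, 472, 665, 1717),
]

by_loc = sorted(programs, key=lambda p: p.loc, reverse=True)
by_vocabulary = sorted(programs, key=lambda p: p.vocabulary, reverse=True)

print("By Lines of Code:", [p.name for p in by_loc])
print("By Vocabulary:   ", [p.name for p in by_vocabulary])
for p in programs:
    print(f"{p.name:8} vocabulary={p.vocabulary:4} length={p.length:5}")
```

The two rankings disagree at the top: PDA008 leads on Lines of Code, but PP110 leads on Vocabulary.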

So, looking at these metrics, what have we learned? Hopefully that there is more to understanding and comparing programs than just their apparent size. The Halstead Vocabulary and Length provide more reliable measures of what is important: how much you need to understand to work on a program. From this you can start to make decisions about working with a program, such as how hard it will be and how long it may take.

For more information on the Halstead Metrics, see Halstead, Maurice H. (1977). Elements of Software Science. Amsterdam: Elsevier North-Holland, Inc. ISBN 0-444-00205-7.