Code reviews - whatever form they take - are generally seen as a very useful and powerful instrument to guarantee the quality of a program, in terms of adhering to the requirements and in terms of maintainability in all its aspects. But there seem very few specific guidelines published. And by "specific" I mean: what sort of things in a piece of code should one look for? I want to collect such specific guidelines in this document and as a start I have assembled a number of coding practices that caused problems or made me wonder ...
Note: the examples I give below are both from programs other people wrote and from programs I wrote myself - I am hardly immune to bad coding practices.
Another note: while I present this as guidelines for code reviews, you can also use these rules while coding in the first place. And while the code fragments are in Fortran, most of the rules apply equally to other languages.
I try to capture the underlying principle of each example as a positive rule and then give the example. But there is no grouping yet or any priority.
For instance: the message
Error: Invalid number of areasin a program that dealt with geographical areas was rahter puzzling. It turned out, after inspecting the source code that we had exceeded an arbitrary limit in the number of areas it could handle. A much clearer and adequate message would have been:
Error: Too many areasor even better:
Error: Too many areas - increase size of array XHere are a few other examples:
An internal error occurred(in a message box. No further information: it is not even clear if the program stops after clicking the "OK" button)
Invalid line in header: ....(the culprit line is shown, but no reason why it is invalid)
For instance: if the library is used in a graphical user-interface, it would be much better to present a messagebox to indicate the fatal error (and what can be done about it) than simply terminate the entire program. But the form should be left to the program!
For instance, the program that dealt with geograpical areas was a large GUI handling input scenarios via a relational database. Jut about everything in it was dynamically allocated - except the arrays holding the information on the geographical areas.
Another one I have come across: strings of exactly 12 characters, so that a DOS file name would fit perfectly ...
However:
real(kind=kind(1.0d0)) :: value1 = 0.1 real(kind=kind(1.0d0)) :: value2 = 0.1d0will give two slightly different values for these two variables.
Using temporary files for storing intermediate results means that the data structure is spread out over the writing and reading part of these files. So changes in the structure require changes in various locations of the program code in addition to the changes required for the actual processing.
They can be of great use though: for instance to strip comments from an input file, so that reading the data becomes simpler.
If you want to identify the contents of a string - for instance: see which keyword it contains, then avoid code like this:
if ( string(1:5) .eq. 'start' ) thenUse instead:
if ( string .eq. 'start' ) thenBecause the keyword is "start" full stop, there is no need to limit the string to just the five characters of the keyword. It is even error-prone: miscounting is easy and what do you do with "starttime"?
Above all: if you want to use uppercase characters, do so consistently. If you want to use capitalised keywords like "Program" and "End If", suit yourself, but use the casing consistently. This is even more important for variable names: VarName and VARNAME and varname all indicate the same variable (at least in Fortran), but using these casings arbitrarily makes your program look sloppy.
Lack of documentation made me puzzle over the strange results I got with one particular program (one I wrote!). It turned out that the date, one of the command-line arguments, had to be specified as 2008-01-01, not as 1-jan-2008, two forms it would actually accept, "cleverly" converting between them (this was actually done by one single routine that would automatically convert one form in the other).
You should probably not do this: the logic soon gets very intricate and whatever processing was common is flooded by it.
Another reason: the input with the wrong characteristic can lead to wrong results somewhere down the line (see the rule on documenting the input).
change = min( allowed, potential_change )(for instance in the context of numerically solving partial differential equations with source terms - that is my context), be aware of the effect of negative numbers: if the potential_change is negative, there will be no limitation, despite what you think, and the solution becomes unstable anyway.
Negative numbers as arguments for MOD() and MODULO() are even worse: they may give very different results than you expect. (I would have to look up the definition myself!)
For instance, if you need to find the position of a particular substring in a string, use INDEX(). Do not write a DO-loop:
pos = 0 do i = 1,len(string) if ( string(i:i+3) == 'word' ) then pos = i exit endif enddoA fragment like:
pos = index( string, 'word' )expresses the intention at one glance - not to mention that you do not have to worry about the length of the substring you are searching.
The same holds for control constructs like DO-loops, IF-blocks, SELECT-blocks and so on. Do not write DO-loops like:
do TheLoopIndex = StartOfIndex,StopOfIndex ... enddoLoop indices are usually variables with short names.
do l1 = 1,10 a(ll) = 1 enddo(Do you see the error?)
When you force the compiler to check the declaration of all variables, the typo would be spotted rightaway.
real, parameter :: seconds_per_day = 86400.0 real, parameter :: grams_per_kilo = 1000.0 real, parameter :: kg_per_day = grams_per_kilo / seconds_per_day waste_load = waste_load * kg_per_dayInstead of:
waste_load = waste_load * 1000.0/86400.0or even:
waste_load = waste_load * 0.01574074074(For this particular type of computations, you might even consider using a specialised library that deals with physical units)