The Code Analyser LCLint

by David Santo Orcero

Soon after C was born, without function prototyping, it became clear that debugging C code was an excessively complex task. As a result, a very powerful tool, lint, was created to perform a large number of static checks on the code. This interesting program was written by S.C. Johnson in the late '70s, and it was the first static code-validation tool.

With the creation of the ANSI C standard, some lint verifications became superfluous. However, lint continued to be used while other tools were being created. John Guttag and Jim Horning took a step forward by programming LCLint. As a rough approximation, it works the same way as lint, and lint's users will need very little time to understand how to use it. However, with a bit of work, LCLint gives us a more detailed analysis than we got from lint. Since its birth it has received the financial support of ARPA, NSF and DEC ERP.

Today, the LCLint source code is freely available. It is written in C; therefore, anyone with a standard ANSI C compiler can recompile it for their machine. Linux users can download it as a tar archive file or as an RPM package. It is available in both source and executable format. Some distributions, such as SuSE Linux 5.3, currently include it. It can be downloaded from http://linuxdoc.org/.

The original lint has far fewer commands than LCLint, and all can be emulated using LCLint. That is why we can use LCLint to replace lint. Much of the difference between lint and LCLint is that lint's annotations are related to minimizing the number of spurious messages, while LCLint's annotations have more features, only some of which are used for this purpose.

lint Commands

All lint commands are fully supported by LCLint, but we can obtain a more accurate analysis using LCLint annotations. The equivalencies between the most common annotations of lint and a more accurate version of the commands are shown in the sidebar “lint Commands”.

The basic idea of LCLint is to run it on our code before compiling. LCLint will search for many possible bugs in the program and report its findings. However, the number of bug candidates, though larger than with the -Wall option of gcc, is still small, because LCLint analyses the code as written but does not know the semantics the code is meant to have. As a result, we can use LCLint in one of several ways. The first is to use it as a simple static analyser of raw code. Although, in this mode, LCLint recognizes many mistakes without a big investment of time, it is recommended only for debugging code written by other programmers. This mode of analysis is not the best, because LCLint does not know the semantics intended when the code was written, and therefore it cannot discover whether the program does what it is supposed to do.
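
As a rough illustration of this first mode, the sketch below is plain C with no annotations at all. The function name copy_text is hypothetical. If the NULL test were removed, the unchecked use of the result of malloc is the sort of bug candidate a static checker of this family is designed to report; the exact messages depend on the version and the flags in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc fails. */
char *copy_text(const char *src)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(strlen(src) + 1);
    if (dst == NULL) {
        /* Dropping this test would leave a possible null dereference below. */
        return NULL;
    }
    strcpy(dst, src);
    return dst;
}
```

The caller owns the returned storage and has to release it with free; nothing in the unannotated code tells the checker so, which is one reason this mode finds fewer problems than the annotated one.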

When we wish to use it on our own programs, we indicate the semantics of our program through annotations, which give LCLint more information about the program's functions. LCLint uses that information to deduce whether the program does what we intended and not something else. Using LCLint this way allows us to reduce the debugging effort for large amounts of code with a minimal time investment. I will discuss some annotations to demonstrate the use of LCLint. For more information, read the manual.
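
A minimal sketch of what such annotations can look like, assuming the null and out annotations described in the LCLint manual. The functions find_char and count_char are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The null annotation states that the result may be NULL, so callers are
   expected to test it before dereferencing. */
/*@null@*/ const char *find_char(const char *text, char wanted)
{
    assert(text != NULL);
    for (; *text != '\0'; text++) {
        if (*text == wanted) {
            return text;
        }
    }
    return NULL;
}

/* The out annotation states that *count need not be initialised on entry;
   this function is the one that defines it. */
void count_char(const char *text, char wanted, /*@out@*/ int *count)
{
    const char *hit;

    assert(text != NULL && count != NULL);
    *count = 0;
    hit = find_char(text, wanted);
    while (hit != NULL) {            /* the test the null annotation asks for */
        (*count)++;
        hit = find_char(hit + 1, wanted);
    }
}
```

A C compiler sees only comments here, so the program builds exactly as before; the extra information is there for the checker alone.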

A third analysis method is based on the specification of abstract data types. We can specify interfaces, functions, types and predicates using the pseudo-language LCL. LCLint will automatically generate the header files corresponding to such specifications and verify that the code agrees with them. However, although LCL is an algebraic-specification language, LCLint will not prove theorems or properties, verify the correctness or completeness of the algebraic definition, or generate code. It will generate header files and verify that the code corresponds to the algebraic definition.

Levels of Analysis Depth in LCLint

LCLint is a versatile tool because it can do code analysis at any level of abstraction we wish. LCLint has several levels of analysis, corresponding to a greater or smaller quantity of different checking techniques. These levels are:

  • weak, which does minimal monitoring. This level should be used only for C code without annotations. LCLint static analysis could be used as a first pass.

  • standard is the default analysis. It does the same as weak analysis, plus a bit more. It needs a few annotations to be truly useful.

  • checks does all the checks of standard, plus complete monitoring of the function parameters and inconsistencies. It is a good level for making sure that no statically detectable mistakes remain before moving on to other software testing techniques.

  • strict makes absurdly strict checks. It should be used only if there is a very subtle but annoying mistake, or if we are fanatics of strongly typed and strongly structured languages. It includes checks such as whether the global variables specified within a function are usable in that function (usable or not as marked by annotations), and its type checking is quite strict. The programmers of LCLint assure me that they will give a prize to anyone who writes a real program that receives no warning when using LCLint in this mode.

In addition to these generic analysis levels, each individual check has its own option, so we can enable one check and disable another.
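
Individual checks can also be switched inside the source file. The sketch below assumes the control-comment syntax of the LCLint manual (+flag, -flag and =flag inside an annotation comment) and the flag name charint; both are assumptions to check against the manual of the installed version. The function digit_value is hypothetical.

```c
#include <assert.h>

/*@+charint@*/   /* assumed flag: treat char and int as interchangeable from here on */

/* Hypothetical helper: numeric value of a decimal digit character. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/*@=charint@*/   /* restore whatever setting was in force before */
```

Limiting the relaxed setting to the few lines that need it keeps the check active for the rest of the file.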

We can make the best use of LCLint with an incremental approach. First of all, compiling the program with gcc's -Wall option can help us find the biggest errors; for example, truly strange operations with pointers or arguments passed to functions in the wrong way.

After this, we can begin with the weak analysis. If we use a naming convention, which is strongly recommended, this is a good time to use the flags that test whether we used the right names and used variables and macros in the right way. This is the deepest level of analysis we can do without annotations; once we have added annotations, we can move on to the more thorough levels.

The highest level of analysis commonly used is checks. If we obtain only spurious warnings, it is time for dynamic debugging techniques.

The strict level of analysis is good for pieces of code where we cannot find an error that we know exists.

Annotations of LCLint

All LCLint annotations have a common syntax, which is

/*@command@*/

Annotations can be inserted in the code wherever a comment can be; thus, their presence in the code does not affect normal compilation.

Annotations are most often placed next to the construct whose semantics they modify; e.g., in the definition of a type, if the annotation is going to affect that type.
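
For example, an annotation that affects a type can be written inside the typedef itself. This is only a sketch, assuming the abstract, null and only annotations of the LCLint manual; the counter type and its four functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* The abstract annotation sits in the typedef, next to the type it affects. */
typedef /*@abstract@*/ struct {
    int value;
} *counter;

/* null: the result may be NULL. only: the caller must release the result. */
/*@null@*/ /*@only@*/ counter make_counter(void)
{
    counter c = malloc(sizeof *c);

    if (c != NULL) {
        c->value = 0;
    }
    return c;
}

void bump_counter(counter c)
{
    assert(c != NULL);
    c->value++;
}

int read_counter(counter c)
{
    assert(c != NULL);
    return c->value;
}

/* Ownership of c passes to this function, which releases the storage. */
void free_counter(/*@only@*/ counter c)
{
    free(c);
}
```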

LCLint Annotations

LCLint has almost a hundred commands. Some of the most useful ones are shown in the sidebar “LCLint Annotations”. We are not activating or deactivating types of checks with these commands, but enriching the code so that LCLint has information on the semantics of the program and is able to do its analysis more accurately.

Naming Conventions

The use of naming conventions is a programming technique which has many users, but also many detractors. LCLint does not force you to use naming conventions, but it contains support for some of them. Supported naming conventions are Slovak, Czech and Czechoslovakian.

Slovak-Naming Convention

The rule of the Slovak-naming convention is that identifiers are constructed with the scheme abstracttypeVarname. The boundary between the abstract type name and the identifier name is marked by writing the first character of the identifier name in upper case. The annotations of LCLint related to the Slovak-naming convention are shown in the sidebar. Remember that a type's name must never contain a capital letter when using the Slovak-naming convention.
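
A small sketch of names that follow this scheme. The fraction type and its operations are hypothetical; the typedef carries the abstract annotation because the convention is about abstract types.

```c
#include <assert.h>

/* Type name all in lower case, as the convention requires. */
typedef /*@abstract@*/ struct {
    int num;
    int den;
} fraction;

/* "fraction" + "Make": the capital M marks where the type name ends. */
fraction fractionMake(int num, int den)
{
    fraction f;

    assert(den != 0);
    f.num = num;
    f.den = den;
    return f;
}

fraction fractionAdd(fraction a, fraction b)
{
    return fractionMake(a.num * b.den + b.num * a.den, a.den * b.den);
}

/* Compares by cross-multiplication, so 1/2 and 2/4 count as equal. */
int fractionEquals(fraction a, fraction b)
{
    return a.num * b.den == b.num * a.den;
}
```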

Czech-Naming Convention

The rule of the Czech-naming convention is that identifiers are constructed with the scheme abstracttype_varname. The abstract type name and the identifier name are separated by an underscore character. The modifiers related to the Czech-naming convention are shown in the sidebar. Remember that a type's name must never contain an underscore character when using the Czech-naming convention.
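
The same idea in the Czech style, again as a sketch with a hypothetical type. Here range is a half-open interval of integers, and every operation on it starts with the type name followed by an underscore.

```c
#include <assert.h>

/* Type name without any underscore, as the convention requires. */
typedef /*@abstract@*/ struct {
    int lo;   /* first value inside the range */
    int hi;   /* first value past the end of the range */
} range;

/* "range" + "_" + "make": the underscore marks where the type name ends. */
range range_make(int lo, int hi)
{
    range r;

    assert(lo <= hi);
    r.lo = lo;
    r.hi = hi;
    return r;
}

int range_length(range r)
{
    return r.hi - r.lo;
}

int range_contains(range r, int v)
{
    return v >= r.lo && v < r.hi;
}
```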

Czechoslovakian-Naming Convention

The Czechoslovakian-naming convention is the same as using the Czech- and Slovak-naming conventions at the same time. That is why both Czech and Slovak identifiers are valid in the Czechoslovakian-naming convention. The modifiers related to the Czechoslovakian-naming convention are shown in the sidebar. Remember that a type's name must never contain an underscore character or a capital letter when using the Czechoslovakian-naming convention.

David Santo Orcero (irbis@df.ibilce.unesp.br) is working on his Dr. Sc. degree in molecular biophysics at IBILCE (Brazil) (www.biocristalografia.df.ibilce.unesp.br/irbis). He received a grant for a FAPESP research project, “Parallel computational methods for molecular biophysics”. He does scientific research on three-dimensional structures of haemoglobins and snake toxins using a Linux cluster.