Guide to writing portable and efficient code in C



Introduction

This guide focuses on the main issues involved in writing professional code: code that is efficient in both space and time and that can be ported to different systems with minimal changes. Portability is often given little attention in the early stages of a project. The code may work on the developers' own system architecture, but once it goes into the field, bugs begin to raise their ugly heads. Time to market may be shorter this way, but the overall time the product needs to reach maturity increases. The best approach for any developer is to avoid bugs in the first place.

One of the easiest ways to write portable software is to write it in a language that is itself portable. However, every project has different requirements, so the main coding language is chosen after a thorough evaluation of aspects such as the nature and type of the project, its development period, its application and the developers' ability. This guide focuses on portability issues in modules written in C. If used properly, C can be portable, but in practice developers are troubled by many portability problems caused by non-portable libraries and header files, among other reasons, many of which are discussed in this article.

The introduction briefly explains the topic of the article, which consists of three main sections. The first section covers common aspects that enhance (or, if misused, reduce) the portability of any generic code. The next section gives tips and suggestions at the design level, with specific examples to explain the idea. The last section is the conclusion, which is followed by references and web resources.


Intended audience

This guide is intended as a reference for software developers working on small to large projects that use C as their base coding language. It will also be useful to anyone aspiring to be a professional developer in systems programming, and to those working on compilers and user-level libraries. Further, it can be a starting point for those moving from college-level projects to live industrial projects.

Prerequisites

• C language
• Programming Concepts
• Data Structures
• Basic Understanding of Compiler



1. Problems and suggestions

This section will focus on developing an approach to write efficient, portable and bug free code.

1. Follow a systematic approach at design time

The main intent of any programmer should be to write fewer lines of code (LOC) with fewer bugs rather than more LOC with more potential bugs. It is not feasible to continually re-engineer old modules to suit present needs. The design stage is the most critical in any product's life cycle because it decides the whole architecture of the system (usually after several iterations, as in iterative process models [8]). If a feature-level bug remains in the design, the project may come to a halt later, and the root cause will be poor design decisions. Designing a module right the first time therefore not only saves time but also reduces time to market.

So it is better to devote more time to the design stage, because one will rarely get the chance to re-engineer the module.


2. Coding approach

Another issue of prime concern is the approach to developing a module. There are basically two approaches to coding any module. The first is to work feverishly on the whole module and compile only at the end. The second is to divide the module carefully into smaller tasks, then code the module one piece at a time, moving on to the next piece only when the developer is certain that the piece just written works. In this process the developer not only codes the module but also completes its unit testing and generates test cases for it. This makes life easier during system and integration testing.

The Author believes that coding a module a piece at a time is much easier than coding the entire module in one go. Although the task may look bigger once the developer subdivides a module, most of the time the module is completed on time and with fewer bugs.


3. Following a proper coding convention

This area looks minor, but if it is neglected it can create many problems later, at the integration and testing stages. A good programmer should always adopt a standard coding convention and stick to it consistently. Such a convention may already have been developed in house to suit the requirements of the application, in which case the developer must follow it. For example, there should be a convention for declaring the local variables used in the code. If x is to be a 16-bit integer, then instead of int x it is better to declare it with a type whose size is explicit, such as the C99 fixed-width types [11]:

#include <stdint.h>
int16_t x;

This way of declaring an integer is useful (for example in embedded programs such as hardware libraries) when the size of 'int' matters, because the size of a plain int differs between compilers and targets and the memory available in the system is limited. By following a systematic approach, the code will be not only efficient but also well documented and readable. Handing it over to a new developer, for example when the original developer leaves in the middle of the project, becomes quick and easy.
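As a minimal sketch of such a convention, assuming a C99 compiler that provides the exact-width types in <stdint.h>, a project-wide header might define its own size-explicit names. The names U8, U16, S16, U32 and S32 and the two helper functions are illustrative only, not part of any standard.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical project-wide type names; the widths come from <stdint.h>. */
typedef uint8_t  U8;
typedef uint16_t U16;
typedef int16_t  S16;
typedef uint32_t U32;
typedef int32_t  S32;

/* Call once at startup to confirm the size assumptions on this target. */
static void check_type_sizes(void)
{
    assert(sizeof(U8) == 1);
    assert(sizeof(S16) * CHAR_BIT == 16);
    assert(sizeof(S32) * CHAR_BIT == 32);
}

/* Wrap-around of an unsigned 16-bit sequence number is well defined. */
static U16 next_sequence(U16 seq)
{
    return (U16)(seq + 1u);
}
```

With these definitions a 16-bit counter is declared as `S16 x;` on every target, and `next_sequence(65535u)` wraps to 0 regardless of the native size of int.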


4. Module framework

The basic framework of all types of modules is common to a good extent. This implies that the initial approach to writing any module is the same and the specifics follow later. For example, consider a protocol layer working in any protocol stack. The generic structure of the main module of any protocol will look like this:

/* Set of important include files */
#include ...
#include ...

int main(void)
{
    /* ... */
    /* initialization of state machines */
    /* ... */
    Init_Protocol();
    /* ... */

    for (;;)    /* infinite polling loop */
    {
        /* Receive message from the queue, FIFO or buffer used
           for communication with other layers */
        /* ... */

        /* Read the message, identify the sender layer and switch on it */
        switch (sender_layer_id)
        {
        case A:     /* say the layers are named A, B, etc. */
            switch (message_type)   /* taken from the received message */
            {
            case MESSAGE_TYPE1:
                message_type1_handler();
                break;
            case MESSAGE_TYPE2:
                message_type2_handler();
                break;
            case MESSAGE_TYPE3:
                message_type3_handler();
                break;
            /* ... */

            default:
                /* illegal message obtained; handle error */
                break;
            }
            break;

        case B:
            /* ... same procedure ... */
            break;
        }
    }   /* end for */

    /* ... */
    return 0;
}   /* end main */

All protocols implemented using a message-based approach follow this basic structure and add the specifics of the particular layer. In a nutshell, every module has some expected parameters, some return value in the case of functions, specific entry and exit points, initialization and destruction sequences, and so on. One of the best ways to start writing a module is to start with an empty function, then define its interface as per the specification (this includes parameter definitions, return value etc.), then define its APIENTRY and APIEXIT functions (empty). When all the details are known and the specific framework is complete, all the APIENTRY functions are stubbed out and the code is completed subsequently. The APIs are then defined in a separate wrapper file (explained later). This approach makes the module more generic.
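The dispatch step of this framework can be sketched in compilable form as follows. This is only an illustration under stated assumptions: the layer identifiers, message types, handler names and the array standing in for the message queue are all hypothetical, and the handlers merely count calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer and message identifiers, for illustration only. */
enum layer_id { LAYER_A, LAYER_B };
enum msg_type { MSG_TYPE1, MSG_TYPE2 };

struct message {
    enum layer_id sender;
    enum msg_type type;
};

/* Counters stand in for the work of the real handlers. */
static int handled_type1, handled_type2, illegal_messages;

static void msg_type1_handler(void) { handled_type1++; }
static void msg_type2_handler(void) { handled_type2++; }
static void handle_error(void)      { illegal_messages++; }

/* Dispatch one message: first on the sender layer, then on message type. */
static void dispatch(const struct message *msg)
{
    assert(msg != NULL);
    switch (msg->sender) {
    case LAYER_A:
        switch (msg->type) {
        case MSG_TYPE1: msg_type1_handler(); break;
        case MSG_TYPE2: msg_type2_handler(); break;
        default:        handle_error();      break;
        }
        break;              /* do not fall through into the next layer */
    case LAYER_B:
        /* same procedure for layer B, omitted here */
        break;
    default:
        handle_error();
        break;
    }
}

/* Process a batch of messages; a real layer would loop on its queue. */
static void process_all(const struct message *queue, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dispatch(&queue[i]);
}
```

Keeping the dispatch in its own function, separate from the receive loop, lets the handlers be stubbed out and unit tested before the real queue exists.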


5. Use of local, global and static variables

Variables in C can be classified by their lifetime and scope. Local (automatic) variables live only in the block in which they are defined. They are not initialized automatically, so they hold a "garbage value" until assigned, and within their block they take priority over global variables of the same name. Global variables are initialized to 0 before the program starts and live for as long as the program remains in memory; they are visible throughout the file in which they are defined and, through an extern declaration, in other files as well. Static variables are also initialized to zero on startup and live for the whole run, but a static variable declared at file level is visible only in the file in which it is defined, and one declared inside a function is visible only in that function. It is better to prefer local variables over global ones, as too many global variables make the code complex and hard to debug. Local variables can be declared in each block and named according to their purpose.
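The differences in lifetime, scope and initialization can be seen in a short sketch. The variable and function names below are made up for the illustration.

```c
#include <assert.h>

int call_count_total;            /* global: zero-initialized, reachable from other files via extern */
static int file_private_count;   /* file-level static: zero-initialized, visible in this file only */

/* A static local keeps its value between calls but is visible only here. */
static int next_id(void)
{
    static int id;      /* initialized to 0 once, before the program starts */
    id++;
    call_count_total++;
    file_private_count++;
    return id;
}

/* An automatic local must be initialized explicitly before it is read. */
static int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;      /* without "= 0" the value would be indeterminate */
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Two successive calls to `next_id()` return 1 and then 2, because `id` survives between calls, while `total` in `sum_to()` is created afresh on every call.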


6. Nesting levels

Many times a developer uses deeply nested conditionals in a module to implement a series of tests. This is typically a nested 'if else' structure, which in many cases goes 7 or 8 levels deep. When nesting gets too deep, the code becomes harder to read and understand. There are two basic solutions to this problem.

• Unroll the tests.
The first solution is to create a Boolean variable that maintains the current success or failure status and to retest it before each step.
• Call another function.
The second solution is to package the innermost tests into another function and to call that function instead of performing the tests directly.
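Both solutions can be combined, as in the following sketch. The frame structure, its fields and the limits checked are assumptions chosen only to have something to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical received frame, for illustration only. */
struct frame {
    size_t        length;
    unsigned char version;
    unsigned char checksum_ok;
};

/* Second solution: the innermost tests packaged in their own function. */
static bool header_is_valid(const struct frame *f)
{
    assert(f != NULL);
    return f->version == 1 && f->checksum_ok;
}

/* First solution: a status flag that is retested instead of nesting deeper. */
static bool frame_is_acceptable(const struct frame *f)
{
    bool ok = (f != NULL);

    if (ok)
        ok = (f->length > 0);
    if (ok)
        ok = (f->length <= 1500);
    if (ok)
        ok = header_is_valid(f);

    return ok;
}
```

Every test sits at the same indentation level, so adding an eighth check costs one more `if (ok)` rather than one more level of nesting.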


7. Size of functions and their Entry/Exit point

The primary reason to keep functions small is that it helps to manage a programming problem better. If a developer revisits the code a year later to modify it, he will have forgotten many small details and will have to spend a good amount of time figuring them out. It certainly helps if functions are small.

As a general rule, every developer should try to keep functions simple and manageable by restricting their length to one page. Most functions are smaller than a page, and some run to a page or two. A function spanning five pages only adds to confusion.

“As a general rule, try to keep functions under one page”

If a function becomes very long, it is better to revisit it and break it into smaller functions, each carrying out a definite and well-defined part of the original task. A function is nothing but a method that transitions an object from one valid state to another valid state. Another important source of bugs and pitfalls is the flow sequence. A function with one entry point and one exit point is easier to understand than a function with multiple exit points. A single exit also helps eliminate buggy code, because a return in the middle of a function implies an algorithm that does not have a straightforward flow of control; ideally the algorithm should be redesigned so that there is only one exit point. In a sense, a return in the middle of a function is just like a goto statement. Also, with several returns, particularly in error checking, developers tend to forget to do the proper cleanup before returning after handling the error. However, sometimes it becomes necessary to have multiple returns in the code. For example:

int func(void)
{
    /* ... */
    if (!cond1)
    {
        /* error detected */
        cleanup();
        return 0;
    }
    /* ... */
    return 1;
}


8. Functions versus Macros

The developer can sometimes safely replace a subroutine with a macro. A macro is a label that stands for a block of instructions that is used more than once but coded only once (which can be useful in reentrant code). It differs from a subroutine in that the preprocessor inserts the code at each place the macro is used rather than generating a jump to it. It works by text substitution and is usually faster than a subroutine because no call or stack operation is involved, but it takes up more memory because the code is duplicated at every use. Because the substitution is textual, macro arguments are not type checked, and an argument with side effects may be evaluated more than once.
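The following sketch contrasts a function-like macro with an equivalent C99 inline function. All names are hypothetical; next_value() exists only to count how often the argument expression is evaluated.

```c
#include <assert.h>

/* Text substitution: the argument is pasted in wherever x appears. */
#define SQUARE_MACRO(x) ((x) * (x))

/* A function evaluates its argument exactly once. */
static inline int square_func(int x)
{
    return x * x;
}

/* Counts its own calls, to expose repeated evaluation of an argument. */
static int calls;
static int next_value(void)
{
    calls++;
    return 3;
}

/* Demonstrates the difference; intended to be called from a unit test. */
static void compare_macro_and_function(void)
{
    calls = 0;
    int m = SQUARE_MACRO(next_value());   /* expands to two calls */
    assert(m == 9 && calls == 2);

    calls = 0;
    int f = square_func(next_value());    /* one call */
    assert(f == 9 && calls == 1);
}
```

The macro is safe only when its argument has no side effects; where the compiler supports inlining, the function form usually gives the same speed without that restriction.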


9. Use of goto statements

The Author goes with the majority opinion that goto statements should be avoided, because functions with goto statements are hard to maintain. In some cases, however, such as error handling and particularly in device driver routines, goto statements can simplify the code and help in debugging the module during the development stage, although there are techniques to avoid the goto construct altogether [10].
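Where goto is accepted for error handling, a common pattern is a single cleanup label at the end of the function, so that every error path releases the same resources. The sketch below is an assumption about how such a routine might look; load_prefix and its parameters are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical routine: reads up to len bytes of a file into a new,
   NUL-terminated buffer. Returns 0 on success and -1 on failure.
   Every path leaves through the "out" label. */
static int load_prefix(const char *path, size_t len, char **out_buf)
{
    int status = -1;
    FILE *fp = NULL;
    char *buf = NULL;
    size_t got;

    assert(path != NULL && out_buf != NULL);
    *out_buf = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        goto out;

    buf = malloc(len + 1);
    if (buf == NULL)
        goto out;

    got = fread(buf, 1, len, fp);
    if (ferror(fp))
        goto out;

    buf[got] = '\0';
    *out_buf = buf;
    buf = NULL;          /* ownership passes to the caller */
    status = 0;

out:
    free(buf);           /* free(NULL) is a no-op */
    if (fp != NULL)
        fclose(fp);
    return status;
}
```

The function still has a single exit point, which keeps the cleanup in one place; the caller frees the returned buffer with free().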


10. Adding error control and recovery in the code

In hardware design, for circuits such as FPGAs and PLLs, DFT (design for test) [6] technology is widely used these days. Along similar lines, most software products have automated test suites and test beds to test and understand their functionality. In a nutshell, the debug build and the retail build of a product are fast merging into one. The Author feels that any released application should be able to switch into debug mode at run time. For example, most development tools for embedded systems, such as Codewarrior and Powertap, are based on this strategy.

One way of having a single release of the product is to maintain a global Boolean variable called debug_status that is either FALSE or TRUE. The debugging code can then be placed within an if statement that checks debug_status. This is usually done for debugging code that adds a lot of execution overhead. Debug code that adds little overhead can simply be included unconditionally, without guarding it with debug_status.

The obvious advantage of doing this extra work is that if a customer runs into a severe problem with the product, the manufacturer can instruct the customer how to run the product in debug mode, and in this way possibly find the problem and report it for further action.
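The debug_status approach described above can be sketched as follows. Only the name debug_status comes from the text; the lookup routine, the sortedness check and the counter are assumptions made for the illustration, and C99 bool is used in place of TRUE/FALSE macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Global switch named in the text; false in normal operation. */
bool debug_status = false;

/* Hypothetical counter so the effect of the switch can be observed. */
static int debug_checks_run;

/* Expensive consistency check, executed only in debug mode. */
static bool table_is_sorted(const int *table, int count)
{
    for (int i = 1; i < count; i++)
        if (table[i - 1] > table[i])
            return false;
    return true;
}

/* Looks up key with a linear scan; returns its index or -1. */
static int lookup(const int *table, int count, int key)
{
    assert(table != NULL);
    if (debug_status) {
        debug_checks_run++;
        if (!table_is_sorted(table, count))
            fprintf(stderr, "debug: lookup table is not sorted\n");
    }
    for (int i = 0; i < count; i++)
        if (table[i] == key)
            return i;
    return -1;
}
```

In a shipped product the flag could be set from a command-line option or a configuration entry, so the same binary serves both normal and debug runs.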


11. Dependence on compiler for all optimization

Code optimization is an important issue on which developers hold different opinions. When the optimization is done matters less than how it is done. The Author feels that developers should optimize the code of a module while writing it, which reduces the effort needed in later stages. Further, developers should not write code that depends on the compiler's behavior. For example, some compilers insert padding bytes into structures when the target architecture reads efficiently only from even addresses. For such boundary alignment it matters, from the program's point of view, exactly where in the structure the padding is inserted. For instance, consider this structure:

typedef struct Packet {
    int   u16;
    char  x;
    float y;
} PACKET;

Assuming an int of 16 bits, a char of 8 bits and a float of 32 bits, the members of this structure occupy 56 bits, or 7 bytes (octets). The compiler may add a pad byte to bring the size to 8 bytes. There are two possibilities: either the compiler aligns each individual field at an even address (in which case it adds the pad byte after char x), or it adds the pad byte at the end. In either case the layout, and hence the interpretation, of the structure can change once the code is ported to a different system. Some compilers provide an extension, such as a _PackedType attribute, to pack the structure into minimal space.
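How a particular compiler has laid out such a structure can be inspected with sizeof and offsetof, as in this sketch. It uses int16_t from <stdint.h> for the 16-bit member so that the sizes match the assumption in the text; the helper function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Packet {
    int16_t u16;   /* 16 bits, as assumed in the text */
    char    x;     /* 8 bits on the targets considered here */
    float   y;     /* 32 bits on most targets */
} PACKET;

/* Sum of the member sizes, i.e. the size with no padding at all. */
static size_t packet_payload_size(void)
{
    return sizeof(int16_t) + sizeof(char) + sizeof(float);
}

/* Total padding the compiler inserted anywhere in the structure. */
static size_t packet_padding(void)
{
    assert(sizeof(PACKET) >= packet_payload_size());
    return sizeof(PACKET) - packet_payload_size();
}

/* Padding inserted between x and y specifically. */
static size_t padding_before_y(void)
{
    return offsetof(PACKET, y) - (offsetof(PACKET, x) + sizeof(char));
}
```

On many desktop compilers this reports one byte of padding, placed before y, but the point of the exercise is that the answer belongs to the compiler and target, not to the source code.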

Further, in C the behavior is undefined, unspecified or implementation-defined in the following cases:

• Floating point representation.
• The order in which the function designator and the arguments in a function call are evaluated.
• Conversion of a pointer to a type other than an integral or pointer type.
• The order of evaluation of the preprocessor operators # (stringizing) and ## (concatenation) during macro substitution.

In cases like structure padding, the best solution is to pad the structure manually so that it has one single interpretation on all systems. More generally, the best way out is to write programs that do not depend on the representation of their data.

An optimizing compiler can make a program run faster, but it is good design that makes a program both fast and efficient. A little more time spent on a programming problem generally results in a better design, which can make the program run significantly faster.
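One way to avoid depending on the in-memory representation is to convert a structure to an explicitly defined byte sequence, field by field, whenever it leaves the program. The record layout and the big-endian wire format below are assumptions chosen for the sketch, and integer fields are used to keep floating point representation out of the picture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 16-bit id, 8-bit flag, 32-bit value,
   all big-endian, 7 bytes in total with no padding. */
enum { WIRE_SIZE = 7 };

struct record {
    uint16_t id;
    uint8_t  flag;
    uint32_t value;
};

/* Writes the record into out[] independently of padding and byte order. */
static void record_pack(const struct record *r, uint8_t out[WIRE_SIZE])
{
    assert(r != NULL && out != NULL);
    out[0] = (uint8_t)(r->id >> 8);
    out[1] = (uint8_t)(r->id & 0xFFu);
    out[2] = r->flag;
    out[3] = (uint8_t)(r->value >> 24);
    out[4] = (uint8_t)((r->value >> 16) & 0xFFu);
    out[5] = (uint8_t)((r->value >> 8) & 0xFFu);
    out[6] = (uint8_t)(r->value & 0xFFu);
}

/* Rebuilds the record from the same seven bytes. */
static void record_unpack(const uint8_t in[WIRE_SIZE], struct record *r)
{
    assert(in != NULL && r != NULL);
    r->id = (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    r->flag = in[2];
    r->value = ((uint32_t)in[3] << 24) | ((uint32_t)in[4] << 16)
             | ((uint32_t)in[5] << 8)  |  (uint32_t)in[6];
}
```

Because only shifts and masks on unsigned values are used, the seven bytes are the same on every conforming compiler, whatever padding or byte order it uses for struct record in memory.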


12. Use of wrapper libraries

Wrapper libraries are a good way to bundle already-implemented functionality behind a portable interface. Even if the implementation of the wrapper library needs conditional compilation to select among the system-dependent libraries beneath it, the programs that use the wrapper library can at least be written without conditional compilation. The application programs need not be aware of any system-specific details around the library.

For example, suppose one is writing a module on the Linux platform that calls malloc() to allocate a small piece of memory. The same call might not be available, or might not behave the same way, on another operating system or embedded target, which may have a different OEA and different APIs. Hence it is advisable to define such system-dependent calls as user-defined macros and to call these macros instead of the real calls. The macros can be declared along with the other definitions, while their actual definitions live in a separate file, where the mapping code is written to make the module compile and run in different environments. This is the procedure for making wrapper libraries.
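A minimal sketch of such a mapping is shown below. The macro names PORT_MALLOC and PORT_FREE, the build flag USE_CUSTOM_ALLOCATOR and the custom_alloc/custom_free functions are all hypothetical; only malloc() and free() are real library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mapping section (would normally live in its own header): the only
   place that names the real allocator. A port changes this mapping only. */
#if defined(USE_CUSTOM_ALLOCATOR)
  /* Assumed to be supplied by the target's own library. */
  void *custom_alloc(size_t size);
  void  custom_free(void *ptr);
  #define PORT_MALLOC(size) custom_alloc(size)
  #define PORT_FREE(ptr)    custom_free(ptr)
#else
  #define PORT_MALLOC(size) malloc(size)
  #define PORT_FREE(ptr)    free(ptr)
#endif

/* Application code calls only the wrapper macros. */
static int *make_counter_array(size_t count)
{
    assert(count > 0);
    int *arr = PORT_MALLOC(count * sizeof *arr);
    if (arr != NULL)
        for (size_t i = 0; i < count; i++)
            arr[i] = 0;
    return arr;
}
```

The application code contains no conditional compilation at all; moving to a target with its own allocator means defining USE_CUSTOM_ALLOCATOR and supplying the two functions.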


13. Documentation Tools

Every programmer should document the work he has done and is currently doing, so that others who need to know about that work can easily read up on it. However, it is a fact that programmers hate writing documentation as much as they love writing code. How many comments are added to the code also varies from programmer to programmer. Further, documentation that is written once is often not updated regularly, so it is more than likely out of date because it has not been maintained to reflect code changes.

If all programmers follow a common documentation style throughout the project, it is possible to write a program that scans all source files and produces documentation. The Author personally uses markers in comment blocks to assist in parsing the comments. For example, module comment blocks can be something like this: /*Module entry point # */, APIENTRY function comment blocks can begin with /*API xxxx entry point #*/ and LOCAL function comment blocks can begin with /*Local function xxxxx entry point #*/.
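A scanner for these markers can be very small. The sketch below is one possible approach, not the Author's actual tool: it treats any line that opens a comment and contains the text "entry point #" as a marker, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line opens a comment carrying the "entry point #" marker. */
static int is_doc_marker(const char *line)
{
    assert(line != NULL);
    const char *open = strstr(line, "/*");
    return open != NULL && strstr(open, "entry point #") != NULL;
}

/* Copies every marker line of an already opened source file to out and
   returns how many were found. Lines longer than the buffer are read
   in pieces, which is acceptable for a sketch. */
static int extract_markers(FILE *src, FILE *out)
{
    char line[512];
    int found = 0;

    while (fgets(line, sizeof line, src) != NULL) {
        if (is_doc_marker(line)) {
            fputs(line, out);
            found++;
        }
    }
    return found;
}
```

Run over every source file in the project, `extract_markers()` yields a list of module, API and local function entry points that always matches the current code, provided the comment style is followed consistently.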


14. Having a source-code control system and maintaining revision logs

“Always review changes before checking source code back in.”

A source-code control system is a must for any project that has multiple modules with several people working simultaneously on more than one module. Even if no two developers work on the same file at the same time, they will still be changing definitions in common headers. Without source control this leads to problems of synchronization and to duplicated or lost work. The Author likes such systems because they give access to the source as it existed all the way back to day one, which is also essential for tracking down problems in released software. In the Unix environment, RCS and CVS are good tools for version and source control. Among their many advantages, they help maintain an accurate log of what changes were made to a module and document the reasons for the changes. This way the entire revision history is available at any time and is maintained by the source-code control system.



2. Guidelines for writing an efficient and portable code

This section covers common rules and suggestions that make any C module more efficient in terms of space and speed, as well as more portable. These aspects become more important in embedded systems, where applications need to be ported to different architectures with various memory and timing constraints. This part of the article focuses on seven main topics: generic declarations, naming conventions, arrays, pointers, complex data types, preprocessors and miscellaneous left-overs.


1. Generic declarations


2. Naming conventions

Apart from standard naming convention rules [5], it is better to adopt an in house naming convention and stick to it for the whole project. The specifics will vary with the kind and nature of project but here are some important generic conventions, which are followed worldwide.


3. Arrays



4. Pointers


5. Complex data types
 
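	#include <string.h>   /* memcpy */

	/* S32, U8 and U32 are assumed to be the project's fixed-size
	   integer typedefs. */
	struct {
	      S32 x;
	      U8 y;
	    } a;

	struct {
	      U32 a1;
	      U8 b1;
	    } b;

	/* No space between the macro name and '(': otherwise ASSIGN would be
	   an object-like macro. memcpy is the standard library copy routine. */
	#define ASSIGN(a, b) (memcpy(&(a), &(b), sizeof(a)))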


6. Preprocessors



7. Miscellaneous




3. Conclusion

The C language is so successful because it is flexible for both system and application programmers. It is also flexible for the compiler writer, because many key issues are left to the compiler writer to specify. This was done so that each implementation of C could take advantage of how a particular machine architecture works. For example: What is the sign of the remainder upon integer division? How many bytes are there in an int, long or short? Are members of a structure padded to an alignment boundary? Does a zero-length file actually exist? What is the ordering of bytes within an int, long or short?

Further, data written to files or sockets in binary, native form on one system may not be readable when the code is ported to a different system, and sometimes not even on another system of the same configuration. Most compilers provide a chapter or two in their documentation on how they have implemented these and many more implementation-defined behaviors. It is always a good approach to read up on the details of the systems on which the code is intended to run.

In a nutshell, the main intent of the Author is to introduce the idea of writing efficient and portable code, particularly in C, right from the start of a module. The article addresses some key areas that have a large effect on the portability and efficiency of any program and suggests ways to improve both.



4. References

[1] C Programming FAQs: Frequently Asked Questions by Steve Summit, Addison-Wesley Pub Co, 2nd edition.
[2] The C Programming Language (2nd Edition) by Brian W. Kernighan and Dennis M. Ritchie, Prentice Hall PTR (March 22, 1988).
[3] http://www.google.com
[4] http://www.cuj.com
[5] http://www.splint.org/manual/html
[6] http://www.mentor.com/dft/dft-tech.pdf
[7] http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&group=comp.programming
[8] Software Engineering by Roger S. Pressman, McGraw Hill Text; 3rd edition (September 1991).
[9] http://www.duckware.com/bugfreec
[10] Linux Device Drivers, (2nd Edition) by Alessandro Rubini, Jonathan Corbet, O'Reilly & Associates; 2nd edition (June 2001), (chapter 1, 2).
[11] C99 standard

