Pointers, References and Values by Mike Crawford Continued...

Conclusion: Pay Attention to Performance

Understanding the tradeoffs of each kind of data representation can
dramatically improve the performance of your C++ programs.

The decisions about how to store and represent the data in your C++ programs can be complicated. I hope I have given you a straightforward way to choose among the various options, one that will serve for most of the code you write.

These guidelines are just that - guidelines - and are not meant to be followed slavishly. What is more important is for you to understand the reasoning behind them so you can make appropriate decisions about your own code when the need arises.

In particular, it is not always obvious how a decision will affect a program's performance, and code that is efficient on one microprocessor is often less efficient on another. For example, the PowerPC has 32 general-purpose registers, so code that encourages the compiler to keep values in registers (by using lots of small local variables) can run efficiently there. The same code may be slow when ported to the 32-bit Intel x86 architecture, which has only eight general-purpose registers: the generated machine code may spend much of its time moving data back and forth between the stack and the few registers available.
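To make the register discussion concrete, here is a sketch of the "many small locals" style (an illustration, not code from the original article): an array sum written with one accumulator and with four independent accumulators. On a register-rich CPU all four accumulators can stay in registers; on 32-bit x86, where each 64-bit accumulator needs two 32-bit registers, the compiler may have to spill some of them to the stack. Modern optimizers may also vectorize or restructure either loop, so which version wins on a given machine is something to measure, not assume.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: every addition depends on the result of the previous one.
std::int64_t sumOneAccumulator(const std::vector<std::int32_t>& v)
{
    std::int64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Four small local accumulators. With enough registers they can all be kept
// in registers; with too few, some must be spilled to the stack.
std::int64_t sumFourAccumulators(const std::vector<std::int32_t>& v)
{
    std::int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; ++i)   // leftover elements when n is not a multiple of 4
        a0 += v[i];
    return (a0 + a1) + (a2 + a3);
}
```

Both functions return the same result; only the pattern of register use differs, which is exactly what a profiler or the generated assembly will show you on each target.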

For this reason, it is important to use a profiler to test performance-critical code. If you're designing a library or code that may be taken cross-platform, it is worth your while to profile it on different processor architectures and with different compilers, and to work to understand the results.

One way to understand the results is to compile your C++ into assembly source and examine the resulting code. Some people think I'm pretty odd for suggesting this, but it is one of the reasons I recommend that everyone learn at least one kind of assembly code (even - maybe especially - Java programmers).
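As a minimal example of the kind of code worth reading as assembly, the file below (its name, `inspect.cpp`, and its functions are hypothetical) passes a `std::string` by value and by `const` reference. With GCC or Clang, `g++ -O2 -S inspect.cpp` writes the assembly to `inspect.s`. In the by-value caller you can usually expect to find the string being copied (a call to the copy constructor, or inlined allocation and copying code), while the by-reference caller simply passes an address. Exactly what appears depends on the compiler, its version and its flags.

```cpp
// inspect.cpp: a small translation unit meant to be read as assembly.
// With GCC or Clang, for example:  g++ -O2 -S inspect.cpp   (writes inspect.s)
#include <cstddef>
#include <string>

// Takes its argument by value: the caller must construct a copy first.
std::size_t lengthByValue(std::string s)
{
    return s.size();
}

// Takes its argument by const reference: only an address is passed.
std::size_t lengthByReference(const std::string& s)
{
    return s.size();
}

// Compare the code generated for these two callers in inspect.s.
std::size_t callByValue(const std::string& s)     { return lengthByValue(s); }
std::size_t callByReference(const std::string& s) { return lengthByReference(s); }
```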

I have done a great deal of very successful performance tuning, and this is one of the most important ways I have done it. You would likely be surprised at what you find in your assembly output. It would do you a world of good to try this even once, provided you work to understand the reasons behind what you see.

Another helpful method is to set breakpoints at the beginning and end of a function, then use your debugger to trace (or repeatedly single-step) through the machine instructions, saving each executed instruction to a log file. The trace should step into every subroutine, all the way down to the end of the call chain and back out again.

When you are done, open the log in a text editor and move all the instructions belonging to each subroutine to the place where that subroutine is first encountered. If a loop executes ten times, you will have ten copies of its code; put them all together. Then count the instructions for each subroutine and optimize the subroutines with the greatest counts: not the longest subroutines, but the ones responsible for executing the most instructions once repetitions are taken into account.

It is easier to count instructions if you print out your edited log file and then print a paper "counting ruler" in the same font, with one number per line (1, 2, 3 and so on), to lay alongside it.

Perl wizards might write a script to parse the raw debugger log automatically. If I did this often, I know I would.
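For those who would rather not count by hand, here is a sketch of such a script in C++. It assumes a hypothetical log format with one executed instruction per line, prefixed by the name of the function it belongs to followed by a colon and a space (for example `computeTotal: lwz r3,0(r4)`). Real debugger logs look different, so the parsing would need adapting. Because the log records every executed instruction, the counts already include loop repetitions, which is the measure that matters for deciding what to optimize.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Function name and number of executed instructions, most-executed first.
using Tally = std::vector<std::pair<std::string, std::size_t>>;

// Assumed (hypothetical) log format, one executed instruction per line:
//     functionName: instruction text
// The separator is ": " so that C++ names such as Foo::bar still parse.
// Lines without the separator are skipped.
Tally countInstructionsPerFunction(std::istream& log)
{
    std::map<std::string, std::size_t> counts;
    std::string line;
    while (std::getline(log, line)) {
        const std::size_t sep = line.find(": ");
        if (sep == std::string::npos || sep == 0)
            continue;
        ++counts[line.substr(0, sep)];
    }

    // Sort by count, highest first; ties keep the map's alphabetical order.
    Tally result(counts.begin(), counts.end());
    std::stable_sort(result.begin(), result.end(),
                     [](const std::pair<std::string, std::size_t>& a,
                        const std::pair<std::string, std::size_t>& b) {
                         return a.second > b.second;
                     });
    return result;
}

// Convenience overload for a log that has already been read into memory.
Tally countInstructionsInText(const std::string& text)
{
    std::istringstream log(text);
    return countInstructionsPerFunction(log);
}
```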

Many people have argued with me that examining the assembly produced from C++ source is not a valid way to tune code, because the result depends on the particular architecture, compiler or even compiler settings. In some cases they are right (encouraging the use of register variables is one), but in general they are not. One of the biggest performance traps in C++ is the construction and destruction of unneeded temporary objects, and examining your assembly really hammers this home. It will reveal many other performance problems too, most of them platform-independent.
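Here is a sketch of the temporary-object problem, using a hypothetical `Money` type instrumented to count its constructions. The expression `a + b + c + d` builds a new object for every `+`, while accumulating with `+=` constructs only the result (plus, at most, one copy on return if the compiler does not apply the named return value optimization). For a type this small the temporaries are cheap, but for a type that allocates memory, such as a string or a matrix, each one costs an allocation and a deallocation, and this is the pattern that stands out in assembly output.

```cpp
// Hypothetical value type that counts how often it is constructed,
// so that temporaries become visible. The counter is for illustration only.
struct Money {
    static int constructions;
    long cents;

    explicit Money(long c = 0) : cents(c) { ++constructions; }
    Money(const Money& other) : cents(other.cents) { ++constructions; }
    Money& operator=(const Money&) = default;

    Money& operator+=(const Money& rhs) { cents += rhs.cents; return *this; }
};

int Money::constructions = 0;

// Binary + returns by value, so each use constructs a new Money.
Money operator+(const Money& a, const Money& b) { return Money(a.cents + b.cents); }

// Each + in this expression creates another object.
Money totalWithTemporaries(const Money& a, const Money& b, const Money& c, const Money& d)
{
    return a + b + c + d;
}

// Accumulates in place: only the result is constructed.
Money totalInPlace(const Money& a, const Money& b, const Money& c, const Money& d)
{
    Money sum(a);
    sum += b;
    sum += c;
    sum += d;
    return sum;
}
```

Resetting `Money::constructions` to zero before each call and reading it afterwards shows the difference directly, without depending on any particular compiler's assembly.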

The vast majority of your code can be written in a straightforward way using simple conventions. Use a profiler to identify the critical regions of your code, and then use it to measure the results of rewriting that code with different conventions.

If you are not getting what you need from your profiler, consider writing your own timer library using the performance registers your microprocessor may provide. For example, some models of the PowerPC have a register that counts the number of clock cycles since system startup, divided by four. You can use it to count the cycles a particular subroutine takes to execute, and then experiment with implementing that subroutine in different ways.
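Processor timing registers are reached through platform-specific means, so the sketch below swaps in something portable: `std::chrono::steady_clock` from the C++ standard library. It is not a cycle counter, and its resolution is implementation-defined, so the sketch times many repetitions of the code under test and reports an average. The function name `averageNanoseconds` is hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Runs f() `repetitions` times and returns the average time per call in
// nanoseconds. steady_clock is monotonic, but on many systems its resolution
// is much coarser than one clock cycle, hence the repetitions.
template <typename Function>
std::int64_t averageNanoseconds(Function&& f, int repetitions)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < repetitions; ++i)
        f();
    const Clock::time_point stop = Clock::now();

    const auto total =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return repetitions > 0 ? static_cast<std::int64_t>(total / repetitions) : 0;
}
```

To compare two implementations, time each with the same repetition count on the same machine, and repeat each measurement several times, since interrupts and cache state add noise.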

"Premature optimization is the root of all evil in programming" is widely quoted by those who believe it. It is usually credited to Donald Knuth, who used the phrase in a 1974 paper, but Knuth himself later called it "Hoare's dictum", giving the credit to C. A. R. Hoare.

However, I do not feel that this is strictly true. Like thread safety and exception safety, the best performance must be designed in from the beginning, and it must be maintained through conscious effort by every working engineer throughout the software development process.

This is not to say you should sit around minutely tweaking code before you have anything working (here I would agree with Hoare). Rather, you should plan the overall design of your product to use efficient algorithms and to minimize memory consumption, virtual memory paging, cache usage and overall code size.
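As one illustration of a design decision that affects memory and cache consumption, and one in keeping with this article's subject, compare a collection of small objects stored by value with the same objects stored through pointers. The sketch below (with a hypothetical `Point` type) computes the same sum both ways. The by-value vector keeps the objects in one contiguous block, while the pointer version adds a pointer and a separate heap allocation per object and may scatter the objects through memory, which usually means more cache misses. As always, a profiler should have the final word.

```cpp
#include <memory>
#include <vector>

struct Point { double x, y; };   // hypothetical small value type

// Objects stored by value sit next to each other in one block of memory,
// so this loop walks memory sequentially.
double sumX(const std::vector<Point>& points)
{
    double total = 0.0;
    for (const Point& p : points)
        total += p.x;
    return total;
}

// Objects stored through pointers are allocated separately; this loop must
// follow one pointer per element, and the objects may be far apart.
double sumX(const std::vector<std::unique_ptr<Point>>& points)
{
    double total = 0.0;
    for (const auto& p : points)
        total += p->x;
    return total;
}
```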

Programmers in their day-to-day work should follow guidelines such as those I have laid out in this paper, because the overall running time and memory use of your programs will ultimately be determined by the sum of all your code. The fate of a battle is decided by those slogging it out in the trenches, not by the generals. It will not be economical to revise vast portions of your codebase for efficiency after it is all implemented. You can do it, and maybe you will have to, but it will come at great expense.

Failure to pay attention to performance throughout the development process leads to the slow, bloated and buggy software that has become the norm in the consumer market today. Despite the valiant efforts of hardware engineers to uphold Moore's Law, lazy and ill-informed programmers (or perhaps good ones constrained by uncaring management) work even harder to reverse that progress. As a result, ordinary office and home users must replace their hardware every three years just to keep performing basic productivity tasks with currently available programs.

Earlier I promised to write more about inline functions, and I have not done so. I have decided to write a Programming Tip devoted just to inline functions so that I can take the time to discuss them in detail. The important rule to remember is to use inline functions, but to use them judiciously. Many programs do not use them at all, and some of the programs I have seen that do use them use them to such excess that overall performance suffers. I will discuss the reasons for this in the article I have planned.
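Pending that article, here is a brief sketch of the distinction, using a hypothetical `Account` class. A trivial accessor is a natural candidate for inlining, because a call can cost more than its one-line body. A longer routine is usually better defined out of line, so that its body is not duplicated at every call site and the program does not grow. Keep in mind that, as far as inlining goes, `inline` is only a hint: compilers decide for themselves what to inline.

```cpp
// Hypothetical class for illustration.
class Account {
public:
    explicit Account(long balanceInCents) : balance_(balanceInCents) {}

    // Defined inside the class body, so it is implicitly inline. The body is
    // a single load, about as small as the call it replaces.
    long balance() const { return balance_; }

    // Declared here and defined out of line below. It stands in for a longer
    // routine whose body would be copied to every call site if inlined.
    long applyInterest(long basisPoints);

private:
    long balance_;
};

// Out-of-line definition (normally in a .cpp file): one copy in the program.
long Account::applyInterest(long basisPoints)
{
    // Interest in whole cents, truncated toward zero; 100 basis points = 1%.
    const long interest = balance_ * basisPoints / 10000;
    balance_ += interest;
    return balance_;
}
```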

Above all, what is most important is to understand the reasons behind the code you write. Many inexperienced programmers, and even some old pros who are set in their ways, do everything just one way because it is what they are accustomed to. You can get working code written that way, but it will not achieve the best possible results.


Copyright © 2000, 2001, 2002, 2005 Mike Crawford. All Rights Reserved.
