Pointers, References and Values by Michael D. Crawford Continued...

How to Store and Initialize Member Variables

Member variables must be initialized in the constructor's initialization list. Smart pointer members minimize dependencies while allowing exception safety.

Minimize Dependencies by Storing Members as Pointers

What do you do if the member variable's header file is big and complex, or you have a lot of member variables and don't want to slow down compilation and encourage dependencies? The simple answer is to store the member variables as pointers, and to allocate them with new in your class' constructor. (In certain special cases you can have references as member variables instead.) You will also need to be sure to delete them in your class' destructor. Here is a first shot at it:

// User.h
class PointerMember;
class RefParam;

class User{
   public:
      User( const RefParam &inParam );
      virtual ~User();

   private:
      PointerMember *mPointerMember;
};

// User.cpp
#include "User.h"
#include "PointerMember.h"   // assumed header name; the full definition is needed only here
User::User( const RefParam &inParam )
   : mPointerMember( new PointerMember( inParam ) )
{
	return;
}

User::~User()
{
   delete mPointerMember;
   return;
}

Then whenever you use your member, you use mPointerMember->Something where you would have otherwise used mValMember.Something. Search and replace functions in text editors and IDEs can make it easy to switch the type of storage used for a member.

The Initialization List

Note that it is terribly important that you initialize pointer members (actually any member) of your objects in the constructor's initialization list. For C++ neophytes, that is the weird looking code like this that falls just before the opening brace in the example above:

  : mPointerMember( new PointerMember( inParam ) )

If you don't always need to have a pointer member in existence during the lifetime of your object, you may choose to initialize it to null (note that it is always safe to delete a null pointer - delete's implementation checks the pointer value before passing it to the heap manager). If the pointer is going to need to be allocated before the constructor is done, always do it in the initialization list, not in the body of the constructor as the following does:

User::User( const RefParam &inParam )
{
   mPointerMember = new PointerMember( inParam );		// DON'T DO THIS
   return;
}
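
For the other case mentioned above, a pointer member that need not exist for the object's whole lifetime, a minimal sketch might look like the following. LazyUser, Helper and GetHelper are hypothetical names invented for this illustration. The point is only that the pointer is still set in the initialization list, to a null pointer, so the delete in the destructor is safe whether or not the allocation ever happened.

```cpp
#include <cassert>

// Hypothetical helper class; it counts live instances so leaks are visible.
class Helper
{
    public:
        Helper()  { ++sLiveCount; }
        ~Helper() { --sLiveCount; }

        static int sLiveCount;
};

int Helper::sLiveCount = 0;

class LazyUser
{
    public:
        LazyUser();
        ~LazyUser();

        Helper &GetHelper();        // allocates the Helper on first use

    private:
        LazyUser( const LazyUser & );              // not copyable, because
        LazyUser &operator=( const LazyUser & );   // the pointer is owned

        Helper *mHelper;
};

LazyUser::LazyUser()
    : mHelper( 0 )              // null, but still set in the list
{
}

LazyUser::~LazyUser()
{
    delete mHelper;             // safe even if mHelper is still null
}

Helper &LazyUser::GetHelper()
{
    if ( mHelper == 0 )
        mHelper = new Helper;

    return *mHelper;
}

// Returns the number of Helpers still alive after two LazyUsers are
// destroyed: one that never allocated and one that did.
int LiveHelpersAfterDemo()
{
    {
        LazyUser neverUsed;     // its destructor deletes a null pointer
        LazyUser used;
        used.GetHelper();
        assert( Helper::sLiveCount == 1 );
    }
    return Helper::sLiveCount;
}
```

The copy constructor and assignment operator are declared private and left undefined because two copies of a LazyUser would otherwise delete the same Helper.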

I've worked on huge C++ projects where few if any member variables were initialized in initialization lists - and those projects were riddled with bugs. On one of them, where the source code took up 70 megabytes on disk, I did nothing but debug the entire time I worked at the company. While I fixed bushels of bugs, thousands remain. Failure to properly initialize member variables wasn't the only problem in that code but it did contribute to the quality issues.

In general the body of a constructor should only be used to carry out operations on member variables or the whole object once the whole thing has been initialized. Basically, reserve it for code that could not possibly fit in the initialization list.

Since learning about the proper use of initialization lists, I've been writing new constructors and reworking old ones so that the bodies of them are almost always empty or contain only a few lines of code, while all the real work is done in the lists. Sometimes some extra strategy is required to get this to work but the labor required pays off handsomely in the end.

Note that the initialization list is the only place that reference and const members can be initialized at all. If you fail to initialize a const member of class type in your initializer list, it will be initialized by its default constructor and you will not be able to change it elsewhere, not even in the body of the constructor. If you fail to initialize a reference, or a const member of a built-in type, this way your code won't compile at all. The following code gets a fatal error from g++:

class HasRefMember
{
	public:
		HasRefMember( int &inIntToAlias );

	private:
		int	&mSomebodyElsesInt;
};

HasRefMember::HasRefMember( int &inIntToAlias )  // No initialization list!
{ // refinit.cpp: In method `HasRefMember::HasRefMember(int &)':
  // refinit.cpp:11: uninitialized reference member `HasRefMember::mSomebodyElsesInt'

        mSomebodyElsesInt = inIntToAlias;  // The compiler doesn't even get this far
}
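
For contrast, here is a minimal sketch of a version that does compile, with both a reference member and a const member of a built-in type set in the initialization list. HasRefAndConst, Bump and the limit parameter are names made up for this example.

```cpp
#include <cassert>

class HasRefAndConst
{
    public:
        HasRefAndConst( int &inIntToAlias, int inLimit );

        void Bump();
        int  Limit() const { return mLimit; }

    private:
        int       &mSomebodyElsesInt;   // must be bound in the list
        const int mLimit;               // must be given its value in the list
};

HasRefAndConst::HasRefAndConst( int &inIntToAlias, int inLimit )
    : mSomebodyElsesInt( inIntToAlias ),
      mLimit( inLimit )
{
}

// Increments the aliased int, but never past the limit.
void HasRefAndConst::Bump()
{
    if ( mSomebodyElsesInt < mLimit )
        ++mSomebodyElsesInt;
}

int BumpThreeTimes()
{
    int counter = 0;
    HasRefAndConst bumper( counter, 2 );
    assert( bumper.Limit() == 2 );

    bumper.Bump();
    bumper.Bump();
    bumper.Bump();      // no effect: the limit of 2 has been reached

    return counter;     // the caller's int was changed through the reference
}
```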

Another note about the initialization list - place each member variable in the initialization list in the same order as they fall in the class declaration. The fact is, the C++ compiler will always initialize your members in the same order as they fall in the declaration; having the list's order match the order the compiler actually uses will avoid confusion. Further, if you understand that members are initialized in a particular order, you can arrange the order to be able to use an earlier member as a parameter to the constructor of a later member, like this:

#include <cmath>   // for sqrt

class Example{
   public:
      Example( double inVal );

   private:

      double mSqrt;
      double m2Sqrt;
};

Example::Example( double inVal )
   : mSqrt( sqrt( inVal ) ),
     m2Sqrt( mSqrt * 2 )
{
   return;
}

If we were to change the order of the member declarations but leave the initialization list as it is given here, m2Sqrt would be initialized first, from whatever garbage was left in mSqrt's memory (an undefined value), and only then would mSqrt be initialized to the square root of inVal.

Thus if you should change the order of member variables in any class' declaration, check the constructor's initialization list and update it too, and periodically inspect all the constructors in your program to make sure they have the order right. Some compilers are nice enough to warn if the order is incorrect; g++, for example, does so when -Wall is given.
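
The rule that declaration order wins can be observed directly. In the sketch below, Tracer, OrderDemo and gOrder are hypothetical names; each Tracer appends a letter to a log as it is constructed, and the initialization list is deliberately written backwards. A compiler that checks for this will warn about the constructor, which is expected here.

```cpp
#include <cassert>
#include <string>

// Records the order in which Tracer objects are constructed.
static std::string gOrder;

class Tracer
{
    public:
        explicit Tracer( char inName ) { gOrder += inName; }
};

class OrderDemo
{
    public:
        OrderDemo();

    private:
        Tracer mFirst;      // declared first, so constructed first
        Tracer mSecond;
};

// The list is written in the wrong order on purpose.
OrderDemo::OrderDemo()
    : mSecond( 'B' ),
      mFirst( 'A' )
{
}

std::string ConstructionOrder()
{
    gOrder.clear();
    OrderDemo demo;
    assert( gOrder.size() == 2 );
    return gOrder;
}
```

ConstructionOrder returns "AB" even though the list names mSecond first, which is why a list that disagrees with the declaration is misleading to read.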

There are some limitations to the syntax allowed in an initialization list that may require some work to get around.

Each of the items in a list is a call to a member variable's constructor. Some types do not appear to have the constructors you are calling but you can initialize them with a value or reference to an object of their type - you are calling the copy constructor, and the compiler will provide a default copy constructor if you do not write one yourself (even if the default does not provide the correct behaviour). Construction of built-in types is just assignment.

The parameters to the constructors may only be a comma-separated list of zero or more expressions. You cannot provide statements, basic blocks, or calls to void functions. You can call functions that return a result of the right type (such as a value to pass into the copy constructor). Importantly, you cannot provide loops or if statements. If you need these, you will have to place them in subroutines that are called from the initializer.

Don't call non-static member functions from the initialization list. Your object hasn't been fully constructed yet, and if you or, in the future, some maintenance programmer should refer to a member that hasn't been initialized yet, undefined behaviour will result. If you need to write a subroutine to calculate a parameter to a member constructor, declare it static and explicitly pass it any parameters it may need - but only pass parameters that have been constructed at the point of the call, and don't pass this.
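
A minimal sketch of that advice follows, assuming a class that needs a loop and an if statement to compute one of its members. Averager and ComputeMean are hypothetical names. Because ComputeMean is static it has no this pointer, so it cannot reach a member that has not been constructed yet; everything it needs is passed in explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Averager
{
    public:
        explicit Averager( const std::vector<double> &inSamples );

        double Mean() const { return mMean; }

    private:
        // Static: it cannot touch any member, constructed or not.
        static double ComputeMean( const std::vector<double> &inSamples );

        const double mMean;
};

Averager::Averager( const std::vector<double> &inSamples )
    : mMean( ComputeMean( inSamples ) )
{
}

// The loop and the if statement live here because the list cannot hold them.
double Averager::ComputeMean( const std::vector<double> &inSamples )
{
    if ( inSamples.empty() )
        return 0.0;

    double sum = 0.0;
    for ( std::size_t i = 0; i < inSamples.size(); ++i )
        sum += inSamples[ i ];

    return sum / inSamples.size();
}

double MeanOfOneTwoThree()
{
    std::vector<double> samples;
    samples.push_back( 1.0 );
    samples.push_back( 2.0 );
    samples.push_back( 3.0 );
    assert( samples.size() == 3 );

    Averager averager( samples );
    return averager.Mean();
}
```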

Note that it is permissible to call member functions in a base class, because the base object has been fully constructed, and you can call functions defined in other classes, as long as you do not pass this as a parameter - again, because the object has not been fully constructed yet.

I will cautiously suggest that it is permissible to pass this from the initialization list if the function that is receiving it expects a pointer to your base class. This is because the base class object has been fully constructed by the time the derived class constructor is called. This is only permissible if the pointer is truly used only as a pointer to the base - that it is not downcast by the recipient, and that you respect encapsulation - being a derived object should not modify the characteristics of the base part as it stands on its own.

Sometimes you may not want to call a subroutine from the initializer. Maybe you just do not want to write a whole subroutine that you will only call from one place. Maybe you do not want to have to declare its prototype in the class header file and force lengthy recompilation of all the sources that depend on that header. Maybe the constructor will be called very frequently and you want to avoid the overhead of the subroutine call (but consider using an inline function for this case). Maybe the return value of the subroutine would be a very large object and it would incur a lot of runtime overhead to construct one, copy it, and destroy the original.

This is a handy place to know about the conditional expression operator ?:. It is a compact form of if statement that resolves to a value, so it is an expression that can be used in an initializer list. Many people do not like to use it because they feel it is obscure, and I admit it can be if the expressions are complex. But it is just the thing for an initializer list parameter when you would otherwise write a subroutine just to hold an if statement:

#include <string>

class InitExample
{
	public:
		InitExample( std::string const &inFileName, bool inWritable );

	private:
		int	mFileDescriptor;
};

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <exception>

InitExample::InitExample( std::string const &inFileName, bool inWritable )
	: mFileDescriptor( open( inFileName.c_str(),
				inWritable ? O_RDWR : O_RDONLY ) )
{
	if ( mFileDescriptor < 0 )
		throw std::exception();

	return;
}

Finally, sometimes initialization lists get very lengthy and hard to read. That is a good point to consider whether your class has too many members. Perhaps it would be better to find groups of members that naturally work together and place them into separate classes, and to implement your original class through composition.

auto_ptr: the Exception-Safe Smart Pointer

There are a couple of problems with using pointers for member variables even when you properly initialize them. One is a tendency to have memory leaks. There are a couple of steps you can take.

One is to exhaustively inspect your code and make sure that all member variables are properly initialized, and then open your .cpp files to the destructor in one editor window and your .h files in another and systematically ensure you delete each pointer that your object actually owns.

That is, sometimes your object holds a pointer member variable not because it possesses it but because it was passed in by some other class for its use, and it would be an error to delete it - you need to consider carefully what your policy on object ownership is and be consistent about it.

Carefully inspecting your header files and destructors in a systematic way will get rid of most of your memory leaks. But a problem remains, and having to manually delete your pointers is tedious and error prone - and there's no clear way to designate the distinction between a pointer you own and a pointer you use that is owned by somebody else.

There is another problem (ready?) - resource leaks caused by exceptions. A lot of C++ programmers would rather just avoid dealing with exceptions and I know I have in the past. Handling exceptions properly in application code is a difficult issue and must be approached strategically. Handling them properly in library code so that it is possible for your library's users to even attempt to be exception safe is an incredibly difficult problem, especially when templates are involved, because you don't have any control over whether a parameter to the template might throw an exception.

I have read that the problem of making old code exception safe is similar to the problem of making single-threaded code thread-safe. Neither thread-safety nor exception-safety is a feature you can just add to old code; both must be engineered in from the beginning. It is much easier to write totally new code for each case, but even so learning to do them well is a process that takes a long time. Transforming old code is a vast undertaking. Witness, for example, the lengthy process the Linux kernel has gone through to handle symmetric multiprocessing, and then once it was working, the effort required for the kernel to scale well to systems with many processors.

I'm just beginning to get a handle on exceptions in my own code as I write the first draft of this (October 2000). If your code does not handle exceptions at all, it is likely that it will crash from time to time and your users will lose data. If you handle them moderately well but not perfectly then it is likely that you will leak memory and over a long period of time your program will slow down (perhaps the whole machine will) and die a long slow death, maybe taking the machine down with it.

Why is it that a shrink-wrap GUI application from a consumer software vendor will crash in a few hours or days of use, while we are capable - and have been for decades - of sending a space probe to the outer planets whose computer stays running the whole time, even to survive and to stay operational as it leaves the solar system to go into interstellar space? One reason is careful attention to detail, handling error conditions - and in C++, this means handling exceptions.

Let me restate this more bluntly. If you write in C++ and you do not deal with exceptions, your code is seriously buggy. Even if you never throw an exception on purpose, exceptions are the only way to fail out of a constructor, and lots of library code throws exceptions. The lack of any way to report an error from a constructor was a primary reason I gave up on C++ after I wrote that test tool at Apple in 1990. Others thought so too, so exceptions were added to the language later.

If an exception is thrown in the body of some C++ code, all of the objects that are stored by value which have been constructed between the time of the try and the catch clause that eventually catches the exception will be properly destroyed. Pointer values and reference values will not be:

void AClass::AFunction()
{
	AnotherClass anObject;

	anObject.MemberFunction();

	PointerThing *ptr = new PointerThing;

	YetAnotherClass anotherObject;

	if ( anotherObject.ShouldIThrow() ){
		throw std::exception();
	}

	StillAnotherClass stillAnother;

	stillAnother.GetDownGetFunky();

	delete ptr;
}

In the example above, if YetAnotherClass::ShouldIThrow returns true, then an exception will be thrown. anotherObject and anObject will be destroyed. stillAnother will not be destroyed because it has not come into scope yet.

This code has a memory leak. ptr will not be deleted during an exception. There is really no way C++ exceptions could know to delete pointers or destroy references because pointers and references are both aliases; they are additional names for objects, not the objects themselves, and there can be more than one of them referring to the same object.

In the above code, the auto_ptr template will take care of the problem. This is because you create an auto_ptr as a whole object (instantiate it on the stack, by value, or place it as a "whole object" member variable in a class), and store a pointer in it. auto_ptr implements the operators usually used by pointers like "*", "->" and so on. You don't delete an auto_ptr object; you simply let it go out of scope, so in the above we would use:

	auto_ptr<PointerThing> ptr( new PointerThing );

The allocated memory will be deleted either when the function exits, or importantly, if an exception is thrown. We do not explicitly delete ptr.
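
The effect can be checked with a small stand-in for auto_ptr, written out by hand so the sketch does not depend on what a particular library ships. Holder is a hypothetical name and is reduced to the one part that matters here, a destructor that deletes the pointer; PointerThing is given an instance counter, also an assumption of this example, so that a leak would show up as a nonzero count.

```cpp
#include <cassert>
#include <stdexcept>

// A hand-written stand-in for auto_ptr: the destructor deletes the pointer.
template <class T>
class Holder
{
    public:
        explicit Holder( T *inPtr ) : mPtr( inPtr ) {}
        ~Holder() { delete mPtr; }

        T &operator*() const  { return *mPtr; }
        T *operator->() const { return mPtr; }

    private:
        Holder( const Holder & );               // copying is not supported
        Holder &operator=( const Holder & );

        T *mPtr;
};

// Counts live instances so that a leak shows up as a nonzero count.
class PointerThing
{
    public:
        PointerThing()  { ++sLiveCount; }
        ~PointerThing() { --sLiveCount; }
        void Use() {}

        static int sLiveCount;
};

int PointerThing::sLiveCount = 0;

void AFunction( bool inShouldThrow )
{
    Holder<PointerThing> ptr( new PointerThing );
    assert( PointerThing::sLiveCount == 1 );

    ptr->Use();

    if ( inShouldThrow )
        throw std::runtime_error( "thrown on purpose" );

    // no delete here: ptr's destructor does it on either exit path
}

// Returns the number of PointerThings still alive after an exception.
int LiveCountAfterThrow()
{
    try {
        AFunction( true );
    }
    catch ( const std::runtime_error & ) {
        // the stack has been unwound by the time we get here
    }
    return PointerThing::sLiveCount;
}
```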

Why is this so important to member variables? One of the reasons is exception safety, as emphasized by Scott Meyers. The other reason is designating ownership. Because auto_ptr's own the memory they point to, you don't use an auto_ptr for memory you don't want to delete yourself. If you are holding a pointer to some other guy's memory, then just hold a naked pointer to it. Otherwise, always use a smart pointer such as auto_ptr. (There are common cases where auto_ptr's will not suffice, as we discuss on the next page.)

The reason this is so important to exception safety when it comes to member variables is that C++ will only destroy completely constructed objects when an exception is thrown. If an exception is thrown while processing the initializer list of a constructor, then that object will not be destroyed. This is because some of the member variables will be in an undefined state and we couldn't count on the destructor behaving correctly - it could even cause a crash, for example if an uninitialized pointer were deleted.

If the object is a derived class, the base class will be destroyed. Member variables that have been completely constructed will be destroyed, and that is why using auto_ptr's for member variables is so important. Suppose we have the following class and an exception should be thrown:

// BabyBear.h
#include "Bear.h"
#include <memory>		// auto_ptr is declared in <memory>
class Papa;
class Mama;
class Goldilocks;

class BabyBear: public Bear{
   public:
      BabyBear( Goldilocks *inGirl );
      virtual ~BabyBear();

   private:

      Goldilocks         *mGirl;      // BabyBear does not own a Goldilocks
      std::auto_ptr< Papa >   mPapa;
      std::auto_ptr< Mama >   mMama;
};

// BabyBear.cpp
#include "BabyBear.h"
#include "Papa.h"
#include "Mama.h"
#include "Goldilocks.h"

BabyBear::BabyBear( Goldilocks *inGirl )
   : Bear( "brown" ),      // construct the base class first
     mGirl( inGirl ),
     mPapa( new Papa ),
     mMama( new Mama )     // suppose an exception is thrown here
{
	return;
}

BabyBear::~BabyBear()
{
   return;   // note that the destructor is empty
}

First, note that the destructor is empty. Along with striving towards empty constructor bodies, it is helpful to strive towards empty destructors. That's not to say the destructor doesn't do anything - it's just that what it does is automatic - the C++ compiler actually puts a lot of stuff in the executable machine code of the destructor, we just don't see it in the C++ source. Using auto_ptr saves on your typing and helps to remove errors from your code.

In the above code, if a BabyBear is completely constructed and then destroyed, the auto_ptr template will call delete on the pointers held by mMama and on mPapa and will do nothing on mGirl.

The interesting thing is what will happen if an exception is thrown partway through BabyBear's initialization list. Suppose we succeeded in allocating a Papa and storing its pointer in the mPapa auto_ptr, but then an exception was thrown in Mama's constructor?

The code at hand is not responsible for what Mama's constructor does so we should not consider it. Let us hope Mama's programmer did the right thing, or return to it on another day. The problem at hand is BabyBear.

Nothing will happen to mMama because it hasn't been constructed yet.

Nothing will happen to mGirl because it is a pointer. Exceptions don't delete pointers. An alternative way to look at this is to understand that pointers, as data objects themselves, don't have destructors - the objects they point to have the destructors.

mPapa will be destroyed, because it was successfully constructed. auto_ptr will call delete on the pointer to the Papa.

BabyBear::~BabyBear will not be called, because the object was not completely constructed.

The base class Bear will be destructed, because the base class was successfully constructed.

So as long as Mama's constructor doesn't leak in exceptions, and as long as Bear doesn't leak in normal use, then the above code will have no memory leaks.
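
The sequence just described can be traced in a reduced version of the example. This is only a sketch under several assumptions: Holder is a hand-written stand-in for auto_ptr, the classes log to a hypothetical global string gLog, Goldilocks is left out, and Mama's constructor is made to throw unconditionally.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

static std::string gLog;    // records what was constructed and destroyed

// Hand-written stand-in for auto_ptr (hypothetical name).
template <class T>
class Holder
{
    public:
        explicit Holder( T *inPtr ) : mPtr( inPtr ) {}
        ~Holder() { delete mPtr; }

    private:
        Holder( const Holder & );
        Holder &operator=( const Holder & );

        T *mPtr;
};

class Bear
{
    public:
        Bear()          { gLog += "+Bear "; }
        virtual ~Bear() { gLog += "-Bear "; }
};

class Papa
{
    public:
        Papa()  { gLog += "+Papa "; }
        ~Papa() { gLog += "-Papa "; }
};

class Mama
{
    public:
        Mama() { throw std::runtime_error( "Mama failed" ); }
};

class BabyBear: public Bear
{
    public:
        BabyBear();
        virtual ~BabyBear() { gLog += "-BabyBear "; }

    private:
        Holder< Papa > mPapa;
        Holder< Mama > mMama;
};

BabyBear::BabyBear()
    : Bear(),
      mPapa( new Papa ),
      mMama( new Mama )     // throws; mPapa and Bear are already built
{
}

std::string LogOfFailedConstruction()
{
    gLog.clear();
    try {
        BabyBear baby;
    }
    catch ( const std::runtime_error & ) {
        // construction failed; nothing to clean up by hand
    }
    assert( gLog.find( "-BabyBear" ) == std::string::npos );
    return gLog;
}
```

The log comes out as "+Bear +Papa -Papa -Bear ": the Papa is deleted through its holder, the base is destroyed, and BabyBear's own destructor never runs.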

This is the first step towards exception safety. It comes at a very small cost, learning how to use auto_ptr. This might be a problem, either because your development environment doesn't support templates (or doesn't support templates correctly) or because it doesn't come with an implementation of the auto_ptr template.

There's not a lot you can do about failure to support templates besides getting a new development environment, but I think the C++ ISO Standard has been around long enough now that any compiler you're likely to encounter will support templates well enough to do auto_ptr correctly.

If you don't have an implementation of auto_ptr, one is given in Scott Meyers' book. You just have to type it in - it's not very long.

Please note that it is an error to use auto_ptr with a pointer to an array. This is because auto_ptr calls delete on its pointer in auto_ptr::~auto_ptr() - for arrays, one is required to call delete []. Lots of programmers still don't practice this correctly, and this was not done in the first version of C++.

There are two important reasons why delete [] rather than delete is required for arrays. One is that the array member's destructor must be called individually on each member of the array; the way C++ knows to do this is because of which version of delete you used.
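
The per-element destructor calls are easy to observe when new [] is correctly paired with delete []. In this sketch Element and its two counters are hypothetical; the mismatched case is not shown because its behaviour is not something a program can rely on.

```cpp
#include <cassert>

class Element
{
    public:
        Element()  { ++sConstructed; }
        ~Element() { ++sDestroyed; }

        static int sConstructed;
        static int sDestroyed;
};

int Element::sConstructed = 0;
int Element::sDestroyed   = 0;

// Returns how many destructors ran for an array of inCount elements.
int DestructorsRunByArrayDelete( int inCount )
{
    Element::sConstructed = 0;
    Element::sDestroyed   = 0;

    Element *array = new Element[ inCount ];    // one constructor per element
    assert( Element::sConstructed == inCount );

    delete [] array;                            // one destructor per element

    return Element::sDestroyed;
}
```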

If you call delete instead of delete [] on an array whose member has a nontrivial destructor, you will leak resources. It may work for primitive types like character arrays - for example, old-fashioned C strings allocated with new[], but your code may crash, or may work when compiled with your current development system or on your current OS but may crash when ported to another.

This is because the number of array elements has to be stored somewhere. One perfectly legitimate way to do it would be to store it in a longword just before the array, then subtract 4 from the address passed to delete and take the count from there. The address that is actually passed to the underlying heap manager to free will be the address you passed to delete [] minus 4 (or I suppose 8 on a 64-bit architecture).

But if you call delete rather than delete [] this subtraction will not be done, and the address freed will not be one that was previously allocated. To increase performance, memory managers almost never check pointers for validity before freeing them, so a crash will occur.

But not always - this is not how every development system manages its C++ memory, and does not seem to be the case with the one I usually use, Metrowerks Codewarrior. With Codewarrior, it actually does seem to work to use delete on arrays of primitive types or types with trivial destructors. This must be because the number of elements in an array is stored somewhere else, perhaps in a map with the array address as the key.

I do know that such crashes do occur with other systems though, as others confirmed it after I brought the question of array auto_ptr's up in some online forums.

One feature I'd like to see in any development system, and one that I plan to add to the debugging heap manager in the application framework I currently use, is to validate the correct use of delete and delete [] consistently with new and new [] and call assert if they are used inconsistently.

How is one to deal with the lack of array auto_ptr's? I'm told that the C++ Standards Committee did not call for them, and some experts have advised me not to use them, because one should use the vector template from the Standard Template Library instead.
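
As a rough illustration of that advice, a vector can stand in for a new[]-allocated buffer even when a C-style function wants a raw pointer. FillBuffer is a hypothetical function invented for this sketch; the address of the vector's first element is passed to it because a vector's storage is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Fills a caller-supplied buffer, the way many C-style APIs do.
void FillBuffer( char *outBuffer, std::size_t inSize )
{
    assert( outBuffer != 0 );
    std::memset( outBuffer, 'x', inSize );
}

std::size_t CountFilled( std::size_t inSize )
{
    std::vector<char> buffer( inSize );     // replaces new char[ inSize ]

    if ( !buffer.empty() )
        FillBuffer( &buffer[ 0 ], buffer.size() );

    std::size_t count = 0;
    for ( std::size_t i = 0; i < buffer.size(); ++i )
        if ( buffer[ i ] == 'x' )
            ++count;

    return count;   // the vector frees its memory here, or during an exception
}
```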

This is the preferred solution, but does not always solve the problem. Sometimes one is working with old code, perhaps where one does not have the source, or with code that does not use the STL because it is intended to be so portable as to be able to run on compilers that don't provide conforming implementations of it. One such library is the Xerces-C library (actually written in C++), a validating XML parser from the Apache Software Foundation that provides both the Document Object Model and SAX event interfaces. I'm using Xerces in a product I've been working on for a while.

I came across the problem with array auto_ptr's when dealing with the memory returned by Xerces' transcode() functions, which convert ASCII text (or rather, "local codepage text") into Unicode and back. XML APIs strictly use Unicode internally, and XML is a standard for Unicode documents, but it is possible to create documents in any character set, not just Unicode. One also needs to translate to and from the local code page to handle text in a GUI program that is using an XML library, if the GUI implementation is not Unicode aware - as most aren't yet.

My problem was that Xerces would allocate C style strings with one or two byte characters using new[] and pass them back to the user program - my program - with the user being responsible for deletion. After several attempts at dealing with these strings cleanly, I hit on using array auto_ptr's to manage them. Imagine my disappointment when I found out that, although my code worked, it was not correct.

There are several solutions to this problem. One is provided by Xerces itself. That is to write a custom class which provides auto_ptr-like functionality but with a different name. Xerces has the Janitor and the ArrayJanitor classes for these, and I suppose if I'd looked inside of Xerces more carefully the question would have never arisen. Xerces does use templates, it's just that it uses simpler ones than are typical for the STL, and the developers will take pains to verify the templates work on all the platforms Xerces is supported on.
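
To give an idea of what such a class involves, here is a minimal sketch of an array-owning holder. ArrayHolder is a hypothetical name and this is not the Xerces ArrayJanitor interface, only the general shape of the idea; MakeGreeting is likewise invented to stand in for any call that hands back a new[]-allocated string the caller must delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

template <class T>
class ArrayHolder
{
    public:
        explicit ArrayHolder( T *inArray ) : mArray( inArray ) {}
        ~ArrayHolder() { delete [] mArray; }    // the array form, unlike auto_ptr

        T &operator[]( std::size_t inIndex ) const { return mArray[ inIndex ]; }
        T *Get() const { return mArray; }

    private:
        ArrayHolder( const ArrayHolder & );             // not copyable
        ArrayHolder &operator=( const ArrayHolder & );

        T *mArray;
};

// Stands in for a library call that returns a new[]-allocated C string
// which the caller is responsible for deleting.
char *MakeGreeting()
{
    const char *text = "hello";
    char *copy = new char[ std::strlen( text ) + 1 ];
    std::strcpy( copy, text );
    return copy;
}

std::size_t GreetingLength()
{
    ArrayHolder<char> greeting( MakeGreeting() );

    assert( greeting[ 0 ] == 'h' );

    return std::strlen( greeting.Get() );   // delete [] runs after this
}
```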

Another solution was available to me if I'd read the auto_ptr template in my development system more carefully. Metrowerks Codewarrior has a non-standard implementation of auto_ptr, one with two template parameters. The second parameter defaults to give a behavior like the normal single-object auto_ptr. But a class that is provided just for this purpose can be provided as the second parameter to make a custom auto_ptr handle arrays correctly.

It is an elegant solution but one I ultimately chose not to use because I specialize in cross-platform code. Although my current product runs only on Mac and Windows, and I have Codewarrior to compile for both platforms (I recommend it highly BTW - it's a great development system), I want any code I write to be ultimately portable to any compiler. Coding to an implementation of auto_ptr that would not exist on another platform would be a problem for me - it would not be a problem to use something with a distinct name but something with a standard name but nonstandard behaviour I could not use.

My choice was to write conversion utilities that would capture the arrays passed back by Xerces and convert them into standard C++ STL strings (for ASCII) or Xerces DOMStrings (for Unicode strings). These types release their memory when they go out of scope. In this special case I think this was the right choice for me. I discuss the issue of auto_ptr's of arrays and my solution further in another programming tip, On Refactoring C++ Code.
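
A conversion utility of this kind might be sketched as follows for the single-byte case. TakeCString and FakeTranscode are hypothetical names; FakeTranscode merely imitates a library function that returns a new[]-allocated string and is not the Xerces call. A small local guard deletes the array even if constructing the std::string throws.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Takes ownership of a C string allocated with new[] and returns a
// std::string copy of it. A null pointer yields an empty string.
std::string TakeCString( char *inOwned )
{
    struct Guard
    {
        char *mArray;
        explicit Guard( char *inArray ) : mArray( inArray ) {}
        ~Guard() { delete [] mArray; }      // runs on return or on a throw
    } guard( inOwned );

    return inOwned ? std::string( inOwned ) : std::string();
}

// Imitates a library function whose result the caller must delete [].
char *FakeTranscode( const char *inText )
{
    char *copy = new char[ std::strlen( inText ) + 1 ];
    std::strcpy( copy, inText );
    return copy;
}

void CheckRoundTrip()
{
    std::string text = TakeCString( FakeTranscode( "local text" ) );
    assert( text == "local text" );
}
```

Once the text is inside a std::string, no caller ever sees the raw array again, so the question of delete versus delete [] is confined to one function.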

While I think the C++ experts are right that using the vector template is a better choice than using arrays, this is not always possible. One of the design principles of C++ has always been that it was to be a practical language for the working programmer and it was to support working with legacy code. This philosophy is emphasized in Bjarne Stroustrup's The Design and Evolution of C++. I feel the ISO C++ Standard Committee fell short here.

One certainly can write one's own array manager, as the Xerces folks have done, but I would prefer it if there were only one name for this kind of thing. One could either call it auto_array or add the traits parameter to auto_ptr as Metrowerks has - but it needs to be added to the ISO standard.

Copyright © 2000, 2001, 2002, 2005 Michael D. Crawford. All Rights Reserved.
