Preprocessor – The #define Directive

This is the most complicated of the preprocessor directives (the C99 standard devotes a little over 7 pages to it, and the C++98 standard about 4 pages).

Apart from variadic macros (described below), the behaviour of the #define directive is essentially the same in C and C++.

C99 added variadic (variable-argument) macros, and the upcoming C++0x standard adds them as well.

Purpose

It is used to define a macro.

A macro is an identifier (or label) followed by text.

When the preprocessor encounters the macro name in the program, it replaces the name with the text that follows it in the definition (the replacement text).

There are few restrictions on what may appear in the replacement text. The main one is that it must all be on the same logical line as the #define directive (you can end a line with the line continuation character \ to spread your macro over several physical lines).

A macro may be a simple label or a label with a parameter list. Macros may also use the token pasting (##) and stringizing (#) operators to modify their replacement text.

Format

#define MACRO_NAME(optional_parameter_list) replacement_text

All preprocessor directives begin with the # symbol, which must be the first non-whitespace character on the line.

Some early compilers flagged an error if # was not the first character on the line.

Spaces or tabs are permitted between the # and define, but escape sequences, other symbols, and macros are not. The preprocessor discards the whitespace and treats the # and define as a single directive.

The following are valid uses:

#define my_macro
# define my_macro
#   define     my_macro

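The following are invalid uses (the first line merely defines empty_macro for the first invalid example; a macro is never expanded in place of a directive name):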

#define empty_macro
// #empty_macro is not a valid preprocessor directive
# empty_macro define my_macro
// #\ is not a valid preprocessor directive
# \t define my_macro
// #" is not a valid preprocessor directive
# "" define my_macro

Macros with no Parameters

The simplest macros are those which only have a name and some replacement text:

#define my_macro
#define my_other_macro "This is some replacement text."

The first #define creates a macro named my_macro that has no replacement text. This is valid: the replacement text is simply empty (no tokens at all, which is not the same as the string literal "").

The second #define creates a macro named my_other_macro whose replacement text is "This is some replacement text." (including the quotes).

The replacement text begins at the first non-whitespace character following the macro name and ends at the last non-whitespace character on the line (remember, you can spread your macro over multiple lines by ending each line except the last with the \ line continuation character).

#define my_macro This is some replacement text
                 ^                           ^
The replacement text begins and ends at the ^ characters.

Every time the preprocessor encounters the macro name in the code, it replaces the name with the replacement text.

The preprocessor does not replace macro names that appear inside string literals (or character constants).

#define my_macro "a replacement string"
// this will print a replacement string without quotes
printf(my_macro);
// this will print my_macro because my_macro is inside a string
printf("my_macro");

Macros with Parameters

Macros can also take parameters. The text passed as arguments replaces the corresponding parameter names in the replacement text.

When declaring a macro that takes parameters, it is important that the opening parenthesis ( immediately follows the macro name. If there is a space between the macro name and ( then everything following the macro name is considered to be the replacement text.

This is a macro without parameters:

#define max (a, b) a > b ? a : b
           ^
The space marks the end of the macro name,
everything following is replacement text.

#define square(a) a*a 

The macro name is square.

The macro parameter is a.

The replacement text is a*a.

Whatever text you pass as the argument for a replaces every a in the replacement text.

The resulting text then replaces the whole macro invocation, square(...), in your code.

After preprocessing, the following source code:

int number = 5;
int c = square(number);

ends up looking like this (this is what the C/C++ compiler sees after preprocessing):

int number = 5;
int c = number * number;

which is what we would expect: every occurrence of a has been replaced by the argument we passed (number).

The preprocessor only performs text replacement, so we can also write the following:

int c = square(12 + 3);

This gets preprocessed into:

int c = 12 + 3 * 12 + 3;

The compiler will parse it as follows (following operator precedence):

int c = 12 + (3 * 12) + 3;

which does not give us the answer we want (51 instead of 225). Remember, the preprocessor replaces the parameter with the text passed to it; here each occurrence of a is replaced with 12 + 3. This is an example of an unintended result of using a macro. The way to fix this is to ensure each substituted argument is evaluated in its entirety, by adding parentheses around every use of the parameter in the replacement text:

#define square(a) ((a) * (a)) 

Parentheses were also placed around the replacement text to ensure the entire replacement text is evaluated as a whole.

Even with parentheses, it is still possible to get unintended side effects:

int number = 5;
int c = square(number++);

When expanded, this becomes:

int c = ((number++) * (number++));

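The side effect is that number is modified twice, not once as would be assumed from reading square(number++). Worse, the two modifications are not separated by a sequence point, so the behaviour of this expression is undefined: the compiler is not even required to increment number exactly twice.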

Pretty much any text can be passed as a macro argument, even other macros. The difficult characters are the comma (,), since a top-level comma separates arguments, and an unmatched closing parenthesis ()), since it marks the end of the argument list. Both can be passed without problem inside a string literal, and commas are also fine inside a matched pair of inner parentheses, as in square(f(1, 2)).

Macros as Parameters

When a macro is passed as an argument, it is fully expanded before it replaces the parameter in the replacement text. If the parameter is an operand of the stringizing (#) or token pasting (##) operator, however, the argument is used as written, without being expanded.

#define invalid_name "invalid name"
#define error "ERROR - "
#define error_msg(a) error a
.
.
.
printf(error_msg(invalid_name));

In the printf statement, the macro error_msg is passed the macro invalid_name as its argument. The preprocessor first replaces invalid_name with its replacement text "invalid name". It then processes the replacement text error a by (1) replacing parameter a with the already expanded "invalid name" and (2) rescanning the result and replacing the macro error with its replacement text. Once expanded, the compiler sees the following:

printf("ERROR - " "invalid name");

Since the two string literals are adjacent, they are concatenated into one:

printf("ERROR - invalid name");

Passing macros as macro parameters allows you to pass characters you normally wouldn’t be able to.

my_macro uses a simple printf call as its replacement text. For the statement to compile, the argument for parameter x must be a comma (,). Trying to pass a comma directly fails, because the preprocessor then sees three arguments to my_macro (which was defined to accept only two):

#define my_macro(x, y) printf("Number passed to macro was %d" x y)
.
.
.
my_macro(,,10);

We can get around this by defining a macro for the comma:

#define comma ,
#define my_macro(x, y) printf("Number passed to macro was %d" x y)
.
.
.
my_macro(comma, 10);

Variadic Macros

In C99 and in the upcoming C++0x versions of the languages, macros can also be defined with a variable number of parameters:

#define my_macro(...)
#define my_other_macro(a, b, ...)

The ellipsis (...) specifies that a variable number of parameters are permitted in place of the ellipsis. The ellipsis must be the last parameter in the macro. (If it is the only parameter in the macro, then it counts as being the last parameter.)

In the replacement text, the parameters taking the place of the ellipsis – including the commas – are referenced using the identifier __VA_ARGS__. Before being assigned to __VA_ARGS__, the preprocessor performs all macro replacements on the arguments:

#define g1 "gyre"
#define g2 "gimble"
#define my_macro(...) printf(__VA_ARGS__)
.
.
.
my_macro("the slithy toves did %s and %s", g1, g2);

my_macro was defined as taking a variable number of arguments. These arguments were then passed to printf. Before passing the variable arguments – referenced as __VA_ARGS__ – to printf, the preprocessor replaces the macros g1 and g2 with their respective replacement texts. This results in the following expansion:

printf("the slithy toves did %s and %s", "gyre", "gimble");

Using ... tells the preprocessor that everything following the preceding comma , up to the closing parenthesis ) is just one argument called __VA_ARGS__. This is one way you can pass commas as arguments without using the comma macro trick described earlier. If you want to pass an unmatched closing parenthesis, you still have to use a macro (or put it inside a string literal).

#define my_macro(a, b, ...)
                 ^  ^  ^
                 |  |  |
                 |  |  | everything up to the closing ) replaces
                 |  |  | __VA_ARGS__ in the replacement text
                 |  |
                 |  | what is passed here replaces b in the
                 |  | replacement text
                 |
                 | what is passed here replaces a in the
                 | replacement text

The only difference between the parameter ... and the parameters a and b is that ... can include commas. For a and b, the preprocessor assumes it has reached the end of the argument as soon as it encounters a comma (one that is not inside inner parentheses).

I don’t know why they didn’t just call ... __VA_ARGS__ instead (this is not valid syntax):

#define my_macro(a, b, __VA_ARGS__)

because it would have been more obvious what was going on. The only thing special about ... is that the preprocessor does not split the arguments at the commas (,) for you: you are responsible for parsing the arguments out of __VA_ARGS__ (or just passing it to something that knows how to parse arguments).

Macro Re-processing

After the # and ## operations have been performed and the arguments have been expanded (if they contained macros) and inserted into the replacement text, the replacement text is scanned again for macros to replace. However, the name of a macro that is already being replaced is NOT replaced again during this rescan, including inside any nested replacements.

#define comma ,
#define reparse(x, y) x##y
.
.
.
printf("This is a %s" reparse(com, ma) "string");

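When reparse(com, ma) is processed, ## pastes com and ma together into the macro name comma, and the rescan then replaces comma with the symbol ,. The statement therefore becomes printf("This is a %s" , "string");, which prints This is a string.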

The following:

#define comma reparse(com, ma)
#define reparse(x, y) x##y
reparse(com, ma);

will result in reparse(com, ma) being processed into the macro name comma. The macro comma is then replaced by reparse(com, ma). At this point, the preprocessor stops: this reparse appears inside a replacement nested within the original replacement of reparse, and a macro name is not replaced again while it is still being replaced. The final text is reparse(com, ma);.

Macro Redefinition

All macro names exist in the same namespace, so a name can have only one definition at a time.

If the preprocessor encounters a second definition of an existing macro name, it checks whether the new definition is identical to the original one; if it is not, the program is ill-formed (a constraint violation that the compiler must diagnose).

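Leading and trailing whitespace is not considered significant (comments count as whitespace), and any run of whitespace between two tokens counts the same as a single space. The following are considered identical: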

#define my_macro(a) a*a
#define my_macro( a ) a*a
#define my_macro(a) /* comments are */ a*a /* whitespace */

Different parameter names are considered significant, even when the macros would behave the same. The following are all different from one another and make the program ill-formed if they appear in the same program:

#define my_macro(a) a*a
#define my_macro(b) a*a
#define my_macro(b) b*b