---
title: "The Dangers of C Macros"
date: 2018-09-11
tags: tech code c
categories: tech
---

C macros are extremely powerful when used correctly, but they can also cause a
lot of unnecessary headaches if you are not aware of their limitations. It is
easy to view macros as just a fast shorthand for simple functions, but there
are important differences which need to be addressed.

This post outlines some real-life examples of macros I have come across or
attempted to use. Some are very beneficial, improving life for everyone;
others are terrible and impossible to debug; and some are just plain stupid.

## intro

In C, macros are handled by the preprocessor, a program run before the
compiler which transforms the C source files so they are ready to be compiled.
This includes basic things such as removing comments or pasting in the contents
of other files with `#include`. The preprocessor also provides a crude, yet
powerful, way to create named constants with `#define`. For example, the
following makes the C preprocessor replace every occurrence of the token `PI`
with the number `3.14159`.

```c
#define PI 3.14159
```

`#define` can also accept arguments, allowing for macros which act like basic
functions.

```c
#define RADTODEG(X) ((X) * 57.29578)
```

The preceding macro replaces every `RADTODEG(PI/2)` with `((3.14159/2) *
57.29578)`, converting *π/2* radians to about 90 degrees.

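As a small sketch of why `RADTODEG` wraps both `X` and the whole expression in
parentheses, compare it with a variant that leaves them out. `RADTODEG_UNSAFE`
and the three helper functions are hypothetical names made up for this
illustration; they are not from any real codebase.

```c
#define PI 3.14159
#define RADTODEG(X) ((X) * 57.29578)

/* Hypothetical variant without parentheses around X, for comparison only. */
#define RADTODEG_UNSAFE(X) (X * 57.29578)

/* Expands to ((3.14159/2) * 57.29578), roughly 90 degrees. */
double quarter_turn(void)
{
	return RADTODEG(PI/2);
}

/* Expands to ((1.0 + 1.0) * 57.29578): two radians in degrees. */
double two_radians_safe(void)
{
	return RADTODEG(1.0 + 1.0);
}

/* Expands to (1.0 + 1.0 * 57.29578): only the second 1.0 is multiplied. */
double two_radians_unsafe(void)
{
	return RADTODEG_UNSAFE(1.0 + 1.0);
}
```

The unsafe variant yields about 58.3 instead of about 114.6, because the
argument text is pasted in as-is and `*` binds tighter than `+`.
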
## the good

```c
#define MAX(A, B)         ((A) > (B) ? (A) : (B))
#define MIN(A, B)         ((A) < (B) ? (A) : (B))
#define BETWEEN(X, A, B)  ((A) <= (X) && (X) <= (B))
#define LEN(X)            (sizeof(X) / sizeof((X)[0]))
```

Above is a list of four macros which I have in pretty much every project I
work on, just because they are so useful. The first one, `MAX`, returns the
larger of the two given numbers. This is a nice shorthand, making the code much
easier to read by hiding the ternary operator away. Its companion is of course
`MIN`, which does exactly what you think it does.

Next, I often find myself needing `BETWEEN`, which returns whether or not the
given value `X` lies between `A` and `B`, inclusive. One example of this is to
figure out if a given character is a lowercase letter: `BETWEEN(c, 'a', 'z')`.
Finally, `LEN` returns the number of elements in an array, fairly basic and
much needed.

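Here is a rough sketch of how these four macros might be used together. The
`samples` array and the functions `largest_sample`, `clamp`, and `is_lower` are
hypothetical, invented only to exercise the macros.

```c
#include <stddef.h>

#define MAX(A, B)         ((A) > (B) ? (A) : (B))
#define MIN(A, B)         ((A) < (B) ? (A) : (B))
#define BETWEEN(X, A, B)  ((A) <= (X) && (X) <= (B))
#define LEN(X)            (sizeof(X) / sizeof((X)[0]))

/* Hypothetical data for the example. */
static const int samples[] = { 3, 9, 4, 7 };

/* Largest element of samples, found with MAX and LEN. */
int largest_sample(void)
{
	int best = samples[0];

	for (size_t i = 1; i < LEN(samples); i++)
		best = MAX(best, samples[i]);
	return best;
}

/* Clamp v to the range [lo, hi] using MIN and MAX. */
int clamp(int v, int lo, int hi)
{
	return MAX(lo, MIN(v, hi));
}

/* Nonzero if c is a lowercase ASCII letter. */
int is_lower(char c)
{
	return BETWEEN(c, 'a', 'z');
}
```

One caveat worth keeping in mind: `LEN` only works on a real array. Once an
array has been passed to a function it is a pointer, and `LEN` would divide the
size of the pointer by the size of one element instead.
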
## the bad

Here is a seemingly innocent macro I wrote to check if a character is valid for
a specific application:

```c
#define ISVALID(C) (BETWEEN(C, 'a', 'z') || strchr("_-", C))
```

The macro should return 1 if the passed character, `C`, is a lowercase letter,
an underscore, or a hyphen. At first, it might seem like this macro works
perfectly fine, and it does for the most part; however, in certain cases, there
are undesirable side effects which are hard to figure out. For example, I
wanted to use this macro, which had been working well so far, to cut a string
off at its first invalid character. Simple enough, right?

```c
for (char *s = str; *s && ISVALID(*s++); len++)
	/* do nothing in here */ ;
str[len] = '\0';
```

This should move the terminating null character to just after the last valid
character at the start of the string, but in this usage it doesn't work
correctly. With the example string `"test-string! removed"` you would expect
`"test-string"`; instead you get `"te"`, which is much shorter than it ought to
be.

In order to know why this happens you have to understand what the C
preprocessor is doing under the hood. The preprocessor replaces every instance
of `ISVALID` with the defined expression, in this case `(BETWEEN(C, 'a', 'z') ||
strchr("_-", C))`. If the macro takes arguments, every occurrence of a
parameter within that expression is then replaced with the text of the argument
supplied, so the for loop becomes:

```c
for (char *s = str; *s && (('a' <= *s++ && *s++ <= 'z') || strchr("_-", *s++)); len++) ;
```

It should be clear now why this is producing weird results: the increment now
appears three times. When a function is called, each argument is evaluated once
before being supplied to the body, but the preprocessor doesn't understand the
expression; it just blindly copies and pastes it into every occurrence of the
parameter. Because `&&` and `||` short-circuit, two or three of those
increments run on every test, so the pointer advances further than wanted.

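To make the repeated evaluation visible, the following sketch runs a single
`ISVALID(*s++)` test and measures how far the pointer moved. The helper
`consumed_by_one_test` is a hypothetical name introduced just for this
demonstration.

```c
#include <stddef.h>
#include <string.h>

#define BETWEEN(X, A, B)  ((A) <= (X) && (X) <= (B))
#define ISVALID(C) (BETWEEN(C, 'a', 'z') || strchr("_-", C))

/*
 * Hypothetical helper: performs one ISVALID(*s++) test and reports how many
 * characters the pointer moved past. A function call would always move one.
 * The caller must pass a string of at least three characters, or two if the
 * second character is a lowercase letter or the first is below 'a'.
 */
size_t consumed_by_one_test(const char *str)
{
	const char *s = str;

	(void)ISVALID(*s++);
	return (size_t)(s - str);
}
```

For `"te"` the pointer advances by two, so the loop effectively tests pairs of
characters. In `"test-string! removed"` the third test compares `'-'` against
the letter range, then checks the following `'s'` against `"_-"`, which fails
and ends the loop with `len` at 2.
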
This subtle, but critical, distinction between macros and functions can cause
hard-to-find bugs like this one when you ignore their differences.

To solve this error I ended up just replacing this short macro with a function,
which in this case demonstrates some of the limitations of macros. Sometimes it
is just easier to use a function.

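A function-based replacement might look roughly like the sketch below. The
names `isvalid` and `truncate_at_invalid` are assumptions for illustration, and
the explicit check for `'\0'` is an addition: `strchr` also matches the string
terminator, so without it a null character would count as valid.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if c is a lowercase letter, an underscore, or a hyphen. */
static int isvalid(int c)
{
	return ('a' <= c && c <= 'z') ||
	       (c != '\0' && strchr("_-", c) != NULL);
}

/*
 * Cut str off at its first invalid character and return the new length.
 * The argument *s++ is now evaluated exactly once per call.
 */
size_t truncate_at_invalid(char *str)
{
	size_t len = 0;

	for (char *s = str; *s && isvalid(*s++); len++)
		/* do nothing in here */ ;
	str[len] = '\0';
	return len;
}
```
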
Another example I have come across is a macro used in a codebase to report and
keep count of any errors encountered. The initial version of this macro is
shown below.

```c
#define report(M, ...)                                                        \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;
```

This works fine in many cases, but problems arise when you start to use it in
more situations. One use case which does not work as intended is calling it in
an `if` statement.

```c
if (val != A_NUM)
	report("error: variable 'val' is [%d] not A_NUM", val);
```

In C the curly braces around the body of a conditional can be omitted if the
body is a single statement. Most of the time this works fine and makes the code
look cleaner, but this example complicates things. While the macro call may
look like a single statement, after the preprocessor expands it there are two
separate statements: the `fprintf` call and the `errors++` statement. The `if`
statement only encompasses the `fprintf`, so the program always increments
`errors` by one, even if `val` is the desired value and there is no issue.

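The pitfall can be reproduced with a small sketch. The macro is the unbraced
version from above, renamed `report_naive` here; `A_NUM`'s value and the
function `check_naive` are assumptions made up for the demonstration. Note that
`##__VA_ARGS__` is a GNU extension, so this assumes a compiler such as GCC or
Clang.

```c
#include <stdio.h>

#define A_NUM 42  /* hypothetical expected value */

static int errors = 0;

/* The initial, unbraced version of the macro, renamed for this sketch. */
#define report_naive(M, ...)                                                  \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;

/* Returns how many errors were counted while checking val. */
int check_naive(int val)
{
	errors = 0;
	if (val != A_NUM)
		report_naive("error: variable 'val' is [%d] not A_NUM", val);
	return errors;
}
```

`check_naive(A_NUM)` returns 1 even though nothing is printed, because the
`errors++` sits outside the `if`. Recent GCC releases can flag this pattern
with `-Wmultistatement-macros`, which `-Wall` enables.
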
At first, this seems easy enough to fix: once you realize that you are calling
a multi-statement macro, not a function, you just add some curly braces to your
macro.

```c
#define report(M, ...) {                                                      \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
}
```

This does indeed solve this particular problem, but it also introduces another.
Later on, I wanted to add an `else` to the `if` statement, but the compiler
spat out a syntax error complaining that there is no `if` for the `else`.
After much examination, I realized that the semicolon after the macro call is
not needed and is getting in the way of the `else`. When expanded, this code:

```c
if (str == NULL)
	report("error: variable str is NULL");
else
	do_something(str);
```

Becomes:

```c
if (str == NULL) {
	fprintf(stderr, "%s:%d: " "error: variable str is NULL" "\n", __FILE__, __LINE__);
	errors++;
};
else
	do_something(str);
```

Now it is clear that this semicolon is separating the `if` and `else`
statements. You could just remove the semicolon since it's not actually
needed, but then it looks like your code is missing a semicolon, and every time
you use this macro you have to remember that you can't follow it with one. This
is less than ideal, so instead, you can extend these curly braces to become a
do-while loop.

```c
#define report(M, ...) do {                                                   \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
} while(0)
```

Since it is a do-while loop its body always runs at least once, and because
the condition is `0`, it never repeats. A do-while statement also needs a
semicolon at the end, which allows us to put one after the macro call, giving
the programmer the expected results. The whole do-while loop counts as a single
statement, so the shortened `if` notation without braces can be used.

In this example, macros are still a very viable option, once you are aware of
their limitations.

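Putting the final version to work, a minimal sketch of the `if`/`else` case
could look like this. The `handled` counter, the body of `do_something`, and
the `process` wrapper are hypothetical stand-ins, and `##__VA_ARGS__` again
assumes a compiler with the GNU extension, such as GCC or Clang.

```c
#include <stdio.h>

static int errors = 0;
static int handled = 0;

#define report(M, ...) do {                                                   \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
} while(0)

/* Stand-in for real work; it only counts how often it was called. */
static void do_something(const char *str)
{
	(void)str;
	handled++;
}

/* The macro call now behaves like one statement, so the else attaches. */
void process(const char *str)
{
	if (str == NULL)
		report("error: variable str is NULL");
	else
		do_something(str);
}
```

Calling `process(NULL)` bumps `errors` once and leaves `handled` alone, while
`process("ok")` does the opposite.
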
## the ugly

The next portion is for serious macro abuse. I found one such example while
stumbling through `tcsh`'s source code.

```c
#define DO_STRBUF(STRBUF, CHAR, STRLEN)				\
								\
struct STRBUF *							\
STRBUF##_alloc(void)						\
{								\
    return xcalloc(1, sizeof(struct STRBUF));			\
}								\
								\
void								\
STRBUF##_free(void *xbuf)					\
{								\
    STRBUF##_cleanup(xbuf);					\
    xfree(xbuf);						\
}								\
								\
const struct STRBUF STRBUF##_init /* = STRBUF##_INIT; */

DO_STRBUF(strbuf, char, strlen);
DO_STRBUF(Strbuf, Char, Strlen);
```

`tcsh`'s `tc.str.c` defines an 80-line macro (a small portion is displayed
above, the whole mess is [here][1]) in order to duplicate a family of functions
so that they work with its `Char` type as well as normal `char`. The macro,
`DO_STRBUF`, takes three arguments: a struct name `STRBUF`, a type `CHAR`, and
a function `STRLEN`. `tcsh`'s old code base is designed to work on many legacy
and outdated systems, so it needs to support the various types of `char`, such
as `wchar_t`, `wint_t`, `short`, etc. The overly complex definition of `Char`
can be seen [here][2]. For some reason, the authors thought it best to include
two copies of these boilerplate functions, instead of unifying them as one set,
which would greatly improve the entire code base's simplicity and readability.

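For readers unfamiliar with the `##` operator used above, here is a tiny sketch
of the same token-pasting technique. It is not `tcsh` code: `DEFINE_STRLEN`,
`narrow_len`, and `wide_len` are hypothetical names, and the generated function
is a plain length counter chosen only because it is short.

```c
#include <stddef.h>

/*
 * Hypothetical miniature of the pattern: NAME##_len pastes the NAME argument
 * onto "_len", so each use of the macro defines a differently named function
 * for a different character type.
 */
#define DEFINE_STRLEN(NAME, CHAR)	\
size_t					\
NAME##_len(const CHAR *s)		\
{					\
	size_t n = 0;			\
					\
	while (s[n] != 0)		\
		n++;			\
	return n;			\
}

DEFINE_STRLEN(narrow, char)     /* defines narrow_len(const char *) */
DEFINE_STRLEN(wide, wchar_t)    /* defines wide_len(const wchar_t *) */
```

Even at this size the cost shows: the function names `narrow_len` and
`wide_len` never appear literally in the source, so searching for them turns
up nothing, and a compiler error inside the body points at the one-line macro
invocation.
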
## conclusion

If you are aware of macros' limitations then they can become a powerful
tool to quickly write clean and effective code. You always have to be careful
when using them, though: use your judgement to determine when their advantages
over normal functions turn into problems and headaches instead of time savers.

[1]: https://github.com/tcsh-org/tcsh/blob/master/tc.str.c#L628-L710
[2]: https://github.com/tcsh-org/tcsh/blob/master/sh.h#L94-L124