---
title: "The Dangers of C Macros"
date: 2018-09-11
tags: tech code c
categories: tech
---

C macros, while extremely powerful when used correctly, can also cause a lot of
unnecessary headaches if you are not aware of their limitations. It is easy to
view macros as just a fast shorthand for simple functions, but there are
important differences which need to be addressed.

This post outlines some real-life examples of macros I have come across or
attempted to use. Some are very beneficial, improving life for everyone; others
are terrible and impossible to debug; and some are just plain stupid.

## intro

In C, macros are handled by the preprocessor, a program run before the compiler
which transforms the C source files so they are ready to be compiled. This
includes basic things such as removing comments or pulling in the contents of
other files with `#include`. The preprocessor also handles a crude, yet
powerful, form of constant definition with `#define`. For example, the
following makes the C preprocessor replace every occurrence of `PI` with the
number `3.14159`.

```c
#define PI 3.14159
```

This is also extended to accept arguments, allowing for macros which act as
basic functions.

```c
#define RADTODEG(X) ((X) * 57.29578)
```

The preceding macro replaces every `RADTODEG(PI/2)` with
`((3.14159/2) * 57.29578)`, converting *π/2* radians to about 90 degrees.

## the good

```c
#define MAX(A, B)        ((A) > (B) ? (A) : (B))
#define MIN(A, B)        ((A) < (B) ? (A) : (B))
#define BETWEEN(X, A, B) ((A) <= (X) && (X) <= (B))
#define LEN(X)           (sizeof(X) / sizeof((X)[0]))
```

Above is a list of four macros which I have in pretty much every project I am
working on, just because they are so useful. The first one, `MAX`, returns the
larger of the two given numbers.
This is a nice shorthand, making the code much easier to read by hiding the
ternary operator away. Its companion is of course `MIN`, which does exactly
what you think it does.

Next, I often find myself needing `BETWEEN`, which returns whether or not the
given value `X` lies between `A` and `B`, inclusive. One example of this is
checking whether a given character is a lowercase letter:
`BETWEEN(c, 'a', 'z')`. Finally, `LEN` returns the number of elements in an
array (an actual array, not a pointer to one), fairly basic and much needed.

## the bad

Here is a seemingly innocent macro I wrote to check if a character is valid for
a specific application:

```c
#define ISVALID(C) (BETWEEN(C, 'a', 'z') || strchr("_-", C))
```

The macro should return 1 if the passed character, `C`, is a lowercase letter,
an underscore, or a hyphen. At first, it might seem like this macro works
perfectly fine, and it does for the most part; however, in certain cases, there
are undesirable side effects which are hard to figure out. For example, I
wanted to use this macro, which had been working well so far, to cut a string
off at its first invalid character. Simple enough, right?

```c
for (char *s = str; *s && ISVALID(*s++); len++)
	/* do nothing in here */ ;
str[len] = '\0';
```

This should move the terminating null character to just after the last valid
character of the string, but in this usage it doesn't work correctly. If you
use the example string `"test-string! removed"` you would expect
`"test-string"`; instead you get `"te"`, which is much shorter than it ought to
be.

In order to know why this happens you have to understand what the C
preprocessor is doing under the hood. For every instance of `ISVALID`, the
preprocessor substitutes the defined expression, in this case
`(BETWEEN(C, 'a', 'z') || strchr("_-", C))`.
Because the macro takes an argument, every occurrence of the parameter `C` in
that expression is then replaced with the argument text, so the for loop
becomes:

```c
for (char *s = str; *s && (('a' <= *s++ && *s++ <= 'z') || strchr("_-", *s++)); len++) ;
```

It should be clear now why this is producing weird results: the increment
appears three times in the expansion, and at least two of them run on every
pass through the loop. When a function is called, each argument is evaluated
once before being supplied to the body, but for macros, the preprocessor
doesn't understand the expression; it just blindly copies and pastes it into
every occurrence, causing the pointer to be incremented more times than wanted.

This subtle, but critical, distinction between macros and functions can cause
hard-to-find bugs like this one when you refuse to acknowledge their
differences.

To solve this error I ended up just replacing this short macro with a function,
which in this case demonstrates some of the limitations of macros. Sometimes it
is just easier to use a function.

Another example I have come across is a macro used in a codebase to report and
keep count of any errors encountered. The initial version of this macro is
shown below.

```c
#define report(M, ...) \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;
```

This works fine in many cases, but problems arise when you start to use it more
often in different situations. One use case which no longer works as intended
is calling it in an `if` statement.

```c
if (val != A_NUM)
	report("error: variable 'val' is [%d] not A_NUM", val);
```

In C the curly braces around the body of a conditional can be omitted if the
body consists of a single statement. Most of the time this works fine and
makes the code look cleaner, but this example complicates things.
While the macro may look like a single statement, once the preprocessor expands
it there are two separate statements, the `fprintf` call and the `errors++`
increment. The `if` statement only covers the `fprintf`, so the program always
increments `errors` by one, even if `val` is the desired value and there is no
issue.

At first, this seems easy enough to fix: once you realize that you are calling
a multi-statement macro, not a function, you just add some curly braces to your
macro.

```c
#define report(M, ...) { \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++; \
}
```

This does indeed solve this particular problem, but it also introduces some
others. Later on, I wanted to add an `else` to an `if` statement, but the
compiler spat out a syntax error complaining that there is no `if` for the
`else`. After much examination, I realized that the semicolon after the macro
is actually not needed and is getting in the way of the `else`. When expanded,
this code:

```c
if (str == NULL)
	report("error: variable str is NULL");
else
	do_something(str);
```

Becomes:

```c
if (str == NULL) {
	fprintf(stderr, "%s:%d: " "error: variable str is NULL" "\n", __FILE__, __LINE__);
	errors++;
};
else
	do_something(str);
```

Now it is clear that this semicolon is separating the `if` and `else`
statements. You could just remove this semicolon since it's not actually
needed, but now it looks like your code is missing a semicolon, and every time
you use this macro you have to remember that you can't use a semicolon. This is
less than ideal, so instead, you can extend these curly braces to become a
do-while loop.

```c
#define report(M, ...) do { \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++; \
} while(0)
```

Since it is a do-while loop, its body always runs once, and because the
condition is `0`, it never repeats. A do-while loop also needs a semicolon at
the end, which lets us write one after the macro call, giving the programmer
the expected results. The do-while loop also counts as a single statement, so
the shortened `if` statement notation can be used.

In this example, macros are still a very viable option, once you are aware of
their limitations.

## the ugly

The next portion is for serious macro abuses. One such example I found while
stumbling through `tcsh`'s source code.

```c
#define DO_STRBUF(STRBUF, CHAR, STRLEN) \
 \
struct STRBUF * \
STRBUF##_alloc(void) \
{ \
    return xcalloc(1, sizeof(struct STRBUF)); \
} \
 \
void \
STRBUF##_free(void *xbuf) \
{ \
    STRBUF##_cleanup(xbuf); \
    xfree(xbuf); \
} \
 \
const struct STRBUF STRBUF##_init /* = STRBUF##_INIT; */

DO_STRBUF(strbuf, char, strlen);
DO_STRBUF(Strbuf, Char, Strlen);
```

`tcsh`'s `tc.str.c` defines an 80-line macro (a small portion is displayed
above; the whole mess is [here][1]) in order to duplicate a family of functions
so they work with their `Char` type as well as normal `char`. The macro,
`DO_STRBUF`, takes three arguments: a struct name `STRBUF`, a type `CHAR`, and
a function `STRLEN`. `tcsh`'s old code base is designed to work on many legacy
and outdated systems, so it needs to support the various types of `char`, such
as `wchar_t`, `wint_t`, `short`, etc. The overly complex assignment of `Char`
can be seen [here][2].
For some reason, the authors thought it best to include two sets of these
boilerplate functions, instead of unifying them as one set, which would greatly
improve the entire code base's simplicity and readability.

## conclusion

If you are aware of macros' limitations then they can become a powerful tool to
quickly write clean and effective code. You always have to be careful when
using them, though: use your judgement to determine when their advantages over
normal functions turn into problems and headaches instead of time savers.

[1]: https://github.com/tcsh-org/tcsh/blob/master/tc.str.c#L628-L710
[2]: https://github.com/tcsh-org/tcsh/blob/master/sh.h#L94-L124
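
As a postscript to the `ISVALID` story: the function that replaced the macro is
not shown in this post, so here is a minimal sketch of what such a replacement
could look like. The names `isvalid` and `strip_invalid` are hypothetical,
picked only for this illustration, and the explicit `'\0'` check is an
assumption about the desired behaviour (`strchr` would otherwise match the
string terminator).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function version of the ISVALID macro. The argument is
 * evaluated exactly once at the call site, so isvalid(*s++) is safe. */
int
isvalid(int c)
{
	/* strchr() also matches the terminating '\0', so rule it out first */
	return c != '\0' && (('a' <= c && c <= 'z') || strchr("_-", c) != NULL);
}

/* Hypothetical helper: cut str off at its first invalid character. */
void
strip_invalid(char *str)
{
	size_t len = 0;

	for (char *s = str; isvalid(*s++); len++)
		/* do nothing in here */ ;
	str[len] = '\0';
}
```

With this version, calling `strip_invalid` on a buffer holding
`"test-string! removed"` leaves `"test-string"`, because the pointer now
advances exactly once per character checked.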