some of my neat stuff
git clone git://edryd.org/edryd.org
Log | Files | Refs | LICENSE

commit e185ac967a679129e3e0ce9dcc5224c0156fd95b
parent 63746dda92b67c9d80c86332a0a89ce7065a0ee6
Author: Ed van Bruggen <edvb@uw.edu>
Date:   Tue, 11 Sep 2018 22:07:51 -0700

Add dangers of c macros post

_posts/2018-09-11-macro-dangers.md | 248+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 248 insertions(+), 0 deletions(-)

diff --git a/_posts/2018-09-11-macro-dangers.md b/_posts/2018-09-11-macro-dangers.md @@ -0,0 +1,248 @@ +--- +title: "the dangers of C macros" +tags: tech code c +categories: tech +--- + +C macros, while being extremely powerful when used correctly, can also be the +cause for a lot of unnecessary headaches if you are not aware of their +limitations. It is easy to view macros as just a fast shorthand for making +simple functions, but there are very important differences which need to be +addressed. + +This post outlines some real life examples of macros I have come across or +attempted to use, some are very beneficial, improving life for everyone, +others are terrible and impossible to debug, and then some just are plain +stupid. + +## intro + +In C, macros are delegated to the preprocessor, a program run before the +compiler which changes the source C files so they are ready to be compiled. +This includes basic things such as removing comments or adding the contents of +others files with `#include`. The preprocessor also handles a crude, yet +powerful, form of constant variable creation with `#define`. For example, the +following makes the C preprocessor replaces every occurrence of `PI` with the +number `3.14159`. + +```c +#define PI 3.14159 +``` + +This is also extended to accept arguments, allowing for macros which act as +basic functions. + +```c +#define RADTODEG(X) ((X) * 57.29578) +``` + +The preceding macro replaces every `RADTODEG(PI/2)`, with `((3.14159/2) * +57.29578)`, converting *π* over 2 radians to about 90 degrees. + +## the good + +```c +#define MAX(A, B) ((A) > (B) ? (A) : (B)) +#define MIN(A, B) ((A) < (B) ? (A) : (B)) +#define BETWEEN(X, A, B) ((A) <= (X) && (X) <= (B)) +#define LEN(X) (sizeof(X) / sizeof((X)[0])) +``` + +Above is a list of four macros which I have in pretty much every project I am +working on, just because they are so useful. The first one, `MAX` returns the +larger of the two given numbers. This is a nice shorthand, making the code much +easier to read by hiding the ternary operator away. In companion with it is of +course `MIN`, which does exactly what you think it does. + +Next, I often find my self needing `BETWEEN`, which returns whether or not the +given character `X` is inside `A` and `B`. One example of this is to figure out +if a given character is a lower case letter: `BETWEEN(c, 'a', 'z')`. Finally, +`LEN` returns the length of an array, fairly basic and well needed. + +## the bad + +Here is a seemly innocent macro I wrote to check if a character is valid for a +specific application: + +```c +#define ISVALID(C) (BETWEEN(C, 'a', 'z') || strchr("_-", C)) +``` + +The macro should return 1 if the passed character, `C`, is a lowercase letter, +an underscore, or a hyphen. At first, it might seem like this macro works +perfectly fine, and it does for the most part; however, in certain cases, there +are undesirable side effects which are hard to figure out. For example, I +wanted to use this macro, which had been working well so far, to strip the +characters at the end of a string that are not valid. Simple enough, right? + +```c +for (char *s = str; *s && ISVALID(*s++); len++) + /* do nothing in here */ ; +str[len] = '\0'; +``` + +This should move the terminating null character to where the last valid +character of the string is, but in this current usage, it doesn't seem to +work correctly. If you use the example string `"test-string! removed"` +you would expect `"test-string"`, instead you get `"te"`, which is much +shorter than it ought to be. + +In order to know why this happens you have to understand what the C +preprocessor is doing under the hood. For every instance of `ISVALID`, C +replaces it with the defined expression, in this case `(BETWEEN(C, 'a', 'z') || +strchr("_-", C))`. If you specified arguments, which is the case for macros, +the variable is then replaced with every occurrence within the given +expression, so the for loop gets replaced with: + +```c +for (char *s = str; *s && (('a' <= *s++ && *s++ <= 'z') || strchr("_-", *s++)); len++) ; +``` + +It should be clear now why this is producing weird results, the increment is +duplicated three times. When a function is run, each argument is evaluated +before being supplied to the body, but for macros, the preprocessor doesn't +understand the expression, it just blindly copies and pastes it to every +occurrence, causing the character to be incremented more times than wanted. + +This subtle, but critical, distinction between macros and functions can cause +these hard to find bugs when you refuse to acknowledge their differences. + +To solve this error I ended up just replacing this short macro with a function, +which in this case demonstrates some of the limitations of macros. Sometimes it +is just easier to use a function. + +Another example I have come across is a macro used in a codebase to report and +keep count of any errors encountered. The initial version of this macro is +shown. + +```c +#define report(M, ...) \ + fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \ + errors++; +``` + +This works fine for many causes, but problems arise when you start to use it +more often in different situations. One of these use cases which no longer +works as intended is when you try to call it in an `if` statement. + +``` +if (val != A_NUM) + report("error: variable 'val' is [%d] not A_NUM", val); +``` + +In C the curly braces around a conditional statement can be omitted if the +statement only contains a single line. Most of the time this works fine and +makes the code look cleaner, but this example complicates things. While the +macro may look like a single line, when the preprocessor modifies it is now two +separate lines, the `fprintf` function and the `errors++` statement. The `if` +statement only encompasses the `fprintf`, so the program always increments +`errors` by one, even if `val` is the desired value and there is no issue. + +At first, this seems easy enough to fix, once you realize that you are calling +a multi-lined macro, not a function, you just add some curly braces to your +macro. + +```c +#define report(M, ...) { \ + fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \ + errors++; \ +} +``` + +This does indeed solve this particular problem, but it also introduces some +others. Later on, I wanted to add an `else` to the `if` statement, but the +compiler spat out a syntax error complaining that the there is no `if` for the +`else`. After much examination, I realized that the semicolon after the macro +is actually not needed and is getting in the way of the `else`. When expanded +this code: + +``` +if (str == NULL) + report("error: variable str is NULL"); +else + do_something(str); +``` + +Becomes: + +``` +if (str == NULL) { + fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); + errors++; +}; +else + do_something(str); +``` + +Now it is clear that this semicolon is separating the `if` and `else` +statements. You could just remove this semicolon since it's not actually +needed, but now it looks like your code is missing a semicolon, and every time +you use this macro you have to remember that you can't use a semicolon. This is +less than ideal, so instead, you can extend these curly braces to become a +do-while loop. + +```c +#define report(M, ...) do { \ + fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \ + errors++; \ +} while(0) +``` + +Since it is a do-while loop it is always evaluated at least once, but because +the condition is `0`, it never repeats. A while loop also needs a semicolon at +the end, this allows us to include one after the macro, giving the programmer +the expected results. The do-while loop also only counts as one line, so the +shorten `if` statement notation can be used. + +In this example, macros are still a very viable option, once you are aware of +their limitations. + +## the ugly + +The next portion is for serious macro abuses, one such example I found +stumbling through `tcsh`'s source code. + +```c +#define DO_STRBUF(STRBUF, CHAR, STRLEN) \ + \ +struct STRBUF * \ +STRBUF##_alloc(void) \ +{ \ + return xcalloc(1, sizeof(struct STRBUF)); \ +} \ + \ +void \ +STRBUF##_free(void *xbuf) \ +{ \ + STRBUF##_cleanup(xbuf); \ + xfree(xbuf); \ +} \ + \ +const struct STRBUF STRBUF##_init /* = STRBUF##_INIT; */ + +DO_STRBUF(strbuf, char, strlen); +DO_STRBUF(Strbuf, Char, Strlen); +``` + +`tcsh`'s `tc.str.c` defines an 80 line long macro (small portion displayed +above, the whole mess is [here][1]) in order to duplicate a family of functions +to work with their `Char` variable type as well as normal `char`. The macro is +defined as `DO_STRBUF` which takes 3 arguments, a struct `STRBUF`, a type +`CHAR`, and a function `STRLEN`. `tcsh`'s old code base is designed to work on +many legacy and outdated systems, so it needs to support the various types of +`char`, such as `wchar_t`, `wint_t`, `short`, etc. The overly complex +assignment of `Char` can be seen [here][2]. For some reason, the authors +thought it best to include two types of these boilerplate functions, instead of +unifying them as one set, which would greatly improve the entire code base's +simplicity and readability. + +## conclusion + +If you are aware of macros' limitations then they can become a powerful +tool to quickly write clean and effective code. You always have to be careful +though when utilizing them, use your judgement to determine when their +advantages over normal functions become problems and headaches instead of fast +time savers. + +[1]: https://github.com/tcsh-org/tcsh/blob/master/tc.str.c#L628-L710 +[2]: https://github.com/tcsh-org/tcsh/blob/master/sh.h#L94-L124