xxd.md (6329B)
1 --- 2 title: "Read Files at Compile Time in C" 3 date: 2021-01-30 4 tags: tech c make 5 categories: tech 6 --- 7 8 Any [beginners C tutorial][1] will teach the basics of the C standard library's 9 file reading functions (eg `fopen`). The limitations of these interfaces is that they are designed 10 for run time reading, meaning that the file will be opened and read when the program is executed. 11 For most cases this is exactly what is wanted, often the file is only given at 12 run time by the user. However, there might be situations where instead it needs to read files 13 at compile time, similar to having a statically linked library. For example, I use this in my 14 [hobby programming language][2] to include the language's standard library written in the language 15 itself. This is done at compile time so it is statically bundled with the binary and requires no 16 external text file dependencies. 17 18 [1]: https://www.tutorialspoint.com/cprogramming/c_file_io.htm 19 [2]: https://edryd.org/projects/tisp 20 21 To achieve this a command line tool called xxd can be used. xxd is tool to create hex dumps of any 22 file given to it, and might already be on your Linux distro as it is included with 23 [vim][3]. 24 25 [3]: https://www.vim.org 26 27 Hex files are hexadecimal (base 16) representations of each byte of a file. Instead of printing 28 each letter as the character it represents, it displays the binary encoding as a base 16 number. 29 This allows reading of special and invisible characters such as newlines, tabs, and 30 spaces. The hex file encoding can also be easily imported into a C program with xxd. 31 32 To create hex files with xxd, simply give it the file name and it will print a hex dump to stdout. 33 34 `file.txt`: 35 ``` 36 Tomorrow, and tomorrow, and tomorrow, 37 Creeps in this petty pace from day to day, 38 To the last syllable of recorded time 39 ``` 40 41 `xxd file.txt` output: 42 ``` 43 $ xxd file.txt 44 00000000: 546f 6d6f 7272 6f77 2c20 616e 6420 746f Tomorrow, and to 45 00000010: 6d6f 7272 6f77 2c20 616e 6420 746f 6d6f morrow, and tomo 46 00000020: 7272 6f77 2c0a 4372 6565 7073 2069 6e20 rrow,.Creeps in 47 00000030: 7468 6973 2070 6574 7479 2070 6163 6520 this petty pace 48 00000040: 6672 6f6d 2064 6179 2074 6f20 6461 792c from day to day, 49 00000050: 0a54 6f20 7468 6520 6c61 7374 2073 796c .To the last syl 50 00000060: 6c61 626c 6520 6f66 2072 6563 6f72 6465 lable of recorde 51 00000070: 6420 7469 6d65 0a d time. 52 ``` 53 54 This default output is useful for humans to read, but not easy to get into a C program. Luckily 55 xxd includes an option to format the hex dump in a syntax C can parse. If supplied the flag `-i` 56 xxd outputs the hex bytes formatted as an array declaration in C syntax, where each element is 57 a byte of the file in hexadecimal. 58 59 ```c 60 $ xxd -i file.txt 61 unsigned char file_txt[] = { 62 0x54, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77, 0x2c, 0x20, 0x61, 0x6e, 63 0x64, 0x20, 0x74, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77, 0x2c, 0x20, 64 0x61, 0x6e, 0x64, 0x20, 0x74, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77, 65 0x2c, 0x0a, 0x43, 0x72, 0x65, 0x65, 0x70, 0x73, 0x20, 0x69, 0x6e, 0x20, 66 0x74, 0x68, 0x69, 0x73, 0x20, 0x70, 0x65, 0x74, 0x74, 0x79, 0x20, 0x70, 67 0x61, 0x63, 0x65, 0x20, 0x66, 0x72, 0x6f, 0x6d, 0x20, 0x64, 0x61, 0x79, 68 0x20, 0x74, 0x6f, 0x20, 0x64, 0x61, 0x79, 0x2c, 0x0a, 0x54, 0x6f, 0x20, 69 0x74, 0x68, 0x65, 0x20, 0x6c, 0x61, 0x73, 0x74, 0x20, 0x73, 0x79, 0x6c, 70 0x6c, 0x61, 0x62, 0x6c, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x72, 0x65, 0x63, 71 0x6f, 0x72, 0x64, 0x65, 0x64, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x0a 72 }; 73 unsigned int file_txt_len = 119; 74 ``` 75 76 This creates an array containing the contents of the file. It also defines another variable as 77 the number of characters in the file, which is the length of the array. 78 79 Since xxd takes care of all the formatting, to import it in C simply save it 80 to a file `xxd -i file.txt > file.h` and include it `#include "file.h"` in the necessary C file. 81 You now have access to both variables, and therefore the contents of the file, without the need 82 for `fopen`. Because the generated header file is now being read instead of the file directly the 83 compiled binary does not need the original file to exist. 84 85 If all that is needed is to loop over each character, simply iterate over the array until the 86 length is reached. However, in C character arrays can also be treated as strings (since strings 87 are just pointers, and arrays are specially allocated pointers). The only thing that is missing is 88 the null terminator required for standard strings. To insert this zero byte use the following to 89 append null to the end of the array: 90 91 ``` 92 xxd -i file.txt | tac | sed "3s/$/, 0x00/" | tac > file.h 93 ``` 94 95 Now the variable `file_txt` can be used as a regular string in the C code, reading the contents 96 of the file. 97 98 If multiple files need to be read, they can be included all at once as a single file by being 99 concatenated `cat file1.txt file2.txt | xxd -i -`. However, this will not produce the correct 100 output. When xxd reads from stdin it does not know the file name so does not know what to call the 101 variables. Instead it only outputs the formatted contents of the array, letting you set the 102 variables' names yourself. 103 104 This can be done by a simple make recipe which can be added to a project's `Makefile`: 105 106 ```make 107 file.h: file1.txt file2.txt 108 @echo xxd $@ 109 @echo "unsigned char fileh[] = { " > $@ 110 @cat $^ | xxd -i - >> $@ 111 @echo ", 0x00};" >> $@ 112 @echo "unsigned int fileh_len = `wc -c $^ | awk 'END{print $$1}'`;" >> $@ 113 ``` 114 115 Here `file.h` is the produced header, and `file1.txt`, `file2.txt` are the files to be read. 116 Simply add `file.h` as a dependency to the C file which includes it to have the recipe run when 117 needed. For a real world use of this method see [my programming language][4]. 118 119 [4]: https://github.com/edvb/tisp/blob/master/Makefile#L27 120 121 Using xxd is a simple but effective way to convert any file to a C header, allowing it to be 122 included when compiled and removing the need for the file path to exist at run 123 time. While only basic ASCII text file examples were covered, this method also works for any 124 file (eg unicode text, images, PDFs, etc) since they all store data as a binary file which can be 125 converted to hexadecimal. However, extra steps would need to be taken when reading these files, 126 for example in unicode each letter is not necessary a single byte or single element of the array.