---
title: "Read Files at Compile Time in C"
date: 2021-01-30
tags: tech c make
categories: tech
---

Any [beginner's C tutorial][1] will teach the basics of the C standard library's
file reading functions (e.g. `fopen`). The limitation of these interfaces is that they are designed
for run time reading, meaning that the file is opened and read when the program is executed.
For most cases this is exactly what is wanted, since often the file is only given at
run time by the user. However, there are situations where a program instead needs to read files
at compile time, similar to statically linking a library. For example, I use this in my
[hobby programming language][2] to include the language's standard library, which is written in the
language itself. This is done at compile time so it is statically bundled with the binary and requires no
external text file dependencies.

[1]: https://www.tutorialspoint.com/cprogramming/c_file_io.htm
[2]: https://edryd.org/projects/tisp

To achieve this, a command line tool called xxd can be used. xxd is a tool that creates hex dumps of any
file given to it, and it might already be on your Linux distro as it is included with
[vim][3].

[3]: https://www.vim.org

Hex dumps are hexadecimal (base 16) representations of each byte of a file. Instead of printing
each byte as the character it represents, a hex dump displays its binary encoding as a base 16 number.
This makes special and invisible characters such as newlines, tabs, and
spaces visible. With xxd, the hex encoding can also be easily imported into a C program.

To create a hex dump with xxd, simply give it the file name and it will print the dump to stdout.

`file.txt`:
```
Tomorrow, and tomorrow, and tomorrow,
Creeps in this petty pace from day to day,
To the last syllable of recorded time
```

`xxd file.txt` output:
```
$ xxd file.txt
00000000: 546f 6d6f 7272 6f77 2c20 616e 6420 746f  Tomorrow, and to
00000010: 6d6f 7272 6f77 2c20 616e 6420 746f 6d6f  morrow, and tomo
00000020: 7272 6f77 2c0a 4372 6565 7073 2069 6e20  rrow,.Creeps in
00000030: 7468 6973 2070 6574 7479 2070 6163 6520  this petty pace
00000040: 6672 6f6d 2064 6179 2074 6f20 6461 792c  from day to day,
00000050: 0a54 6f20 7468 6520 6c61 7374 2073 796c  .To the last syl
00000060: 6c61 626c 6520 6f66 2072 6563 6f72 6465  lable of recorde
00000070: 6420 7469 6d65 0a                        d time.
```

This default output is easy for humans to read, but not easy to get into a C program. Luckily,
xxd includes an option to format the hex dump in a syntax C can parse. When given the `-i` flag,
xxd outputs the bytes formatted as an array declaration in C syntax, where each element is
one byte of the file in hexadecimal.

```c
$ xxd -i file.txt
unsigned char file_txt[] = {
  0x54, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77, 0x2c, 0x20, 0x61, 0x6e,
  0x64, 0x20, 0x74, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77, 0x2c, 0x20,
  0x61, 0x6e, 0x64, 0x20, 0x74, 0x6f, 0x6d, 0x6f, 0x72, 0x72, 0x6f, 0x77,
  0x2c, 0x0a, 0x43, 0x72, 0x65, 0x65, 0x70, 0x73, 0x20, 0x69, 0x6e, 0x20,
  0x74, 0x68, 0x69, 0x73, 0x20, 0x70, 0x65, 0x74, 0x74, 0x79, 0x20, 0x70,
  0x61, 0x63, 0x65, 0x20, 0x66, 0x72, 0x6f, 0x6d, 0x20, 0x64, 0x61, 0x79,
  0x20, 0x74, 0x6f, 0x20, 0x64, 0x61, 0x79, 0x2c, 0x0a, 0x54, 0x6f, 0x20,
  0x74, 0x68, 0x65, 0x20, 0x6c, 0x61, 0x73, 0x74, 0x20, 0x73, 0x79, 0x6c,
  0x6c, 0x61, 0x62, 0x6c, 0x65, 0x20, 0x6f, 0x66, 0x20, 0x72, 0x65, 0x63,
  0x6f, 0x72, 0x64, 0x65, 0x64, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x0a
};
unsigned int file_txt_len = 119;
```

This creates an array containing the contents of the file. It also defines a second variable holding
the number of bytes in the file, which is the length of the array.

Since xxd takes care of all the formatting, importing the file into C only requires saving the output
to a header (`xxd -i file.txt > file.h`) and including it (`#include "file.h"`) in the necessary C file.
You now have access to both variables, and therefore the contents of the file, without the need
for `fopen`. Because the data is compiled into the program through the generated header, the
compiled binary does not need the original file to exist at run time. Note that the header defines
the variables rather than just declaring them, so it should be included from only one C file.

If all that is needed is to loop over each byte, simply iterate over the array until the
length is reached. However, in C character arrays can also be treated as strings, since a string
is just a null-terminated array of characters, and an array decays to a pointer to its first element
when passed to a function. The only thing that is missing is
the null terminator required for standard strings. To insert this zero byte, use the following to
append `0x00` to the end of the array:

```
xxd -i file.txt | tac | sed "3s/$/, 0x00/" | tac > file.h
```

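Now the variable `file_txt` can be used as a regular string in the C code, giving access to the contents
of the file. Note that `file_txt_len` still holds the size of the original file and does not count the
added terminator. Since xxd declares the array as `unsigned char`, the standard string functions need
a cast to `char *`.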

If multiple files need to be read, they can be included all at once as a single array by
concatenating them: `cat file1.txt file2.txt | xxd -i -`. However, this alone does not produce a complete
declaration. When xxd reads from stdin it does not know the file name, so it does not know what to call the
variables. Instead it only outputs the formatted contents of the array, letting you set the
variables' names yourself.

This can be done with a simple make recipe, which can be added to a project's `Makefile`:

```make
file.h: file1.txt file2.txt
	@echo xxd $@
	@echo "unsigned char fileh[] = { " > $@
	@cat $^ | xxd -i - >> $@
	@echo ", 0x00};" >> $@
	@echo "unsigned int fileh_len = `wc -c $^ | awk 'END{print $$1}'`;" >> $@
```

Here `file.h` is the produced header, and `file1.txt` and `file2.txt` are the files to be read. The
recipe also appends a null terminator, which `fileh_len` does not count.
Simply add `file.h` as a prerequisite of the target built from the C file which includes it to have the
recipe run when needed. For a real world use of this method see [my programming language][4].

[4]: https://github.com/edvb/tisp/blob/master/Makefile#L27

Using xxd is a simple but effective way to convert any file to a C header, allowing it to be
included when compiled and removing the need for the file to exist at run
time. While only basic ASCII text file examples were covered, this method also works for any
file (e.g. Unicode text, images, PDFs, etc.) since they all store data as bytes which can be
converted to hexadecimal. However, extra steps would need to be taken when reading these files.
For example, in Unicode each letter is not necessarily a single byte or a single element of the array.