Monday, 15 October 2012

The hexdump Utility: A Tutorial

Introduction

The hexdump utility displays file contents in a user-specified format. This tutorial will take a look at some of its features to help you control the format of the output produced by the program. A more complete description of the command is available on its man page—which, of course, you can read with the following command:

man hexdump

Example Input File

If you want to follow along with the examples in this tutorial, then I suggest you run the following command to create the “mpoppins” input file:

echo 'supercalifragilisticexpialidocious' > mpoppins

The tutorial will attempt to explain enough about the hexdump utility to allow you to create the following output from this input file:

00000000    73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69    supercalifragili
00000010    73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f    sticexpialidocio
00000020    75 73 0a                                            us.

Default Output

The simplest invocation of the hexdump utility displays a hexadecimal dump of the input file:

hexdump mpoppins
0000000 7573 6570 6372 6c61 6669 6172 6967 696c
0000010 7473 6369 7865 6970 6c61 6469 636f 6f69
0000020 7375 000a
0000023

Each output line displays the input offset in hexadecimal, followed by eight chunks of data. Each chunk shows two input bytes, processed as a 16-bit integer number, and is output with four hexadecimal digits.

Note that, since the x86 family of CPUs stores integers in little-endian byte order, integers will appear “byte-reversed”. For instance, since the first byte of the example file is hexadecimal ‘73’ and its second byte is hexadecimal ‘75’, the integer number formed by these two bytes will be hexadecimal ‘7573’, not ‘7375’.
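A quick way to see the byte swap in isolation is to feed the program just two known bytes. The sketch below assumes a little-endian host, as the paragraph above does; the variable name and the sed call that trims the padding from the partial line are illustrative additions:

```shell
# 'A' is 0x41 and 'B' is 0x42. The default format reads the pair as one
# 16-bit integer in host byte order, so the digits appear swapped on a
# little-endian machine.
first_line=$(printf 'AB' | hexdump | head -n 1 | sed 's/ *$//')
echo "$first_line"    # little-endian host: 0000000 4241
```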

Also note that, even though the input file contains an odd number of bytes, the last chunk of input data is padded with a null byte. The last chunk is therefore processed as if the final input byte (hexadecimal ‘0A’) were followed by an additional null byte, so that it, too, can be displayed as a two-byte hexadecimal integer.

Finally, the last output line displays the input offset after the input file is completely processed (i.e., the input file size), also in hexadecimal. Therefore, even though the output shows an even number of bytes, the number on the last output line reveals that the input file size is actually odd.
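To confirm that the final offset really is the file size, you can convert it to decimal and compare it with the byte count reported by wc. This is only a small sketch: the variable names are made up for illustration, and it relies on the shell's printf accepting an argument with a ‘0x’ prefix:

```shell
echo 'supercalifragilisticexpialidocious' > mpoppins

# The last line of the default output is the final offset in hexadecimal.
size_hex=$(hexdump mpoppins | tail -n 1)

# Convert it to decimal and compare with the byte count from wc.
size_dec=$(printf '%d' "0x$size_hex")
size_wc=$(wc -c < mpoppins)
echo "$size_hex (hex) = $size_dec (decimal); wc reports $size_wc bytes"
```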

The ‘-x’ Option

If you specify the ‘-x’ option to the hexdump utility, then the output will be equivalent to the default, even though the spacing will be different:

hexdump -x mpoppins
0000000    7573    6570    6372    6c61    6669    6172    6967    696c
0000010    7473    6369    7865    6970    6c61    6469    636f    6f69
0000020    7375    000a
0000023

The ‘-C’ Option

The ‘-C’ option produces what the program calls “canonical hex + ASCII” output:

hexdump -C mpoppins
00000000  73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69  |supercalifragili|
00000010  73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f  |sticexpialidocio|
00000020  75 73 0a                                          |us.|
00000023

This time, each line displays sixteen bytes, one by one, in hexadecimal, followed by the same sixteen bytes in ASCII (where non-printable characters are replaced with a single “.”); the sequence of ASCII characters is enclosed in vertical bars (i.e., “|”).

Using a Format File

The output format that the hexdump utility produces is specified through a set of format strings. Certain formats are built into the program and can be activated with a simple command-line option, e.g., the ‘-x’ or ‘-C’ options demonstrated above.

The program is not limited to its built-in formats, however: it also allows you to specify a custom format. You can store a custom format specification in a format file, which you can then pass to the program using the ‘-f’ command-line option.

The format file consists of one or more text lines, where each line defines one format string.

The simplest format string consists of a single format, which must be enclosed in double quotes. For example, a simple hexadecimal format, with a single space separating successive output chunks, can be specified as follows:

"%x "

To store this format into a format file, you can run, e.g., the following command:

echo '"%x "' > fmt-01

You can then pass this format file to the hexdump command:

hexdump -f fmt-01 mpoppins
65707573 6c616372 61726669 696c6967 63697473 69707865 64696c61 6f69636f a7375

As you can see, the “%x” format produces hexadecimal output, with four input bytes per chunk. Again, each chunk is processed as an integer number, which (on an x86-family computer) causes the bytes to appear in reversed order. Also, as the last chunk demonstrates, leading zeroes are suppressed.

If you want to prevent this zero suppression, then you can specify the desired field width, with a leading zero, on the format—e.g., for eight-character fixed-width output:

echo '"%08x "' > fmt-02
hexdump -f fmt-02 mpoppins
65707573 6c616372 61726669 696c6967 63697473 69707865 64696c61 6f69636f 000a7375

If you omit the leading zero on the field width, then the leading zeroes will be replaced with spaces:

echo '"%8x "' > fmt-03
hexdump -f fmt-03 mpoppins
65707573 6c616372 61726669 696c6967 63697473 69707865 64696c61 6f69636f    a7375

Note that the field width sets only the minimum number of output characters for the field. If a value doesn’t fit, then the output width will be increased as needed.

In addition to the field width, the format specification may include a precision, which, if present, must be preceded by a period (“.”), and identifies the minimum number of digits that must be produced:

echo '"%08.8x "' > fmt-04
hexdump -f fmt-04 mpoppins
65707573 6c616372 61726669 696c6967 63697473 69707865 64696c61 6f69636f 000a7375

Even though the precision may seem redundant (particularly if, as in the above example, it is set equal to the field width), it appears to be common practice to include it nonetheless, and this convention will also be used in the remainder of this tutorial.

By the way, the precision does make a difference if it is less than the field width—for example:

echo '"%08.7x "' > fmt-05
hexdump -f fmt-05 mpoppins
65707573 6c616372 61726669 696c6967 63697473 69707865 64696c61 6f69636f  00a7375

This time, all values are formatted into eight-character output fields, but—as the last value shows—the number of actual digits produced may be reduced to seven, if possible.
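Because hexdump's conversions follow printf conventions, the shell's own printf can serve as a quick scratchpad for trying out a field width or precision. The sketch below assumes that your shell's printf pads the same way hexdump does; the value a7375 is the last four-byte chunk from the examples above:

```shell
# The last four-byte chunk of the example file, as an integer.
v=0xa7375

printf '[%x]\n'     "$v"    # [a7375]     no padding
printf '[%08x]\n'   "$v"    # [000a7375]  zero-padded to width 8
printf '[%8x]\n'    "$v"    # [   a7375]  space-padded to width 8
printf '[%08.8x]\n' "$v"    # [000a7375]  at least 8 digits
printf '[%08.7x]\n' "$v"    # [ 00a7375]  at least 7 digits, width 8
```

The brackets are there only to make the leading spaces visible.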

You can change the number of input bytes per chunk. In the case of the hexadecimal output format, four input bytes per chunk are assumed by default, but you can request either one or two input bytes per chunk instead. This byte count should be preceded by a forward slash (i.e., “/”) and followed by the format to which you want to apply it—e.g., for a hexadecimal format with a byte count of 2:

echo '/2 "%04.4x "' > fmt-06
hexdump -f fmt-06 mpoppins
7573 6570 6372 6c61 6669 6172 6967 696c 7473 6369 7865 6970 6c61 6469 636f 6f69 7375 000a

Similarly, with a byte count of 1:

echo '/1 "%02.2x "' > fmt-07
hexdump -f fmt-07 mpoppins
73 75 70 65 72 63 61 6c 69 66 72 61 67 69 6c 69 73 74 69 63 65 78 70 69 61 6c 69 64 6f 63 69 6f 75 73 0a

Clearly, displaying all of the output on one long line is not particularly elegant. You may, for example, prefer to display eight chunks of two input bytes (for a total of sixteen bytes) per line. This number of output chunks is called the iteration count, and should precede the forward slash—e.g.:

echo '8/2 "%04.4x "' > fmt-08
hexdump -f fmt-08 mpoppins
7573 6570 6372 6c61 6669 6172 6967 696c7473 6369 7865 6970 6c61 6469 636f 6f697375 000a

Note that all of the output is still displayed on one line, since you haven’t requested any newlines in your format just yet. You may, however, be surprised that after every eighth chunk, there appears to be a space missing—e.g., the eighth and ninth chunks run into one another and are displayed as ‘696c7473’. This is not a bug—it’s a feature: If the iteration count is greater than 1, then there will be no trailing spaces produced following the last iteration.

To specify a newline in your format, you should use the “\n” escape sequence; you can simply append it, enclosed in double quotes, to your format string—i.e.:

echo '8/2 "%04.4x " "\n"' > fmt-09
hexdump -f fmt-09 mpoppins
7573 6570 6372 6c61 6669 6172 6967 696c
7473 6369 7865 6970 6c61 6469 636f 6f69
7375 000a

The iteration count in this example is 8, and the byte count is 2. As a consequence, the format immediately following will process an input chunk of two bytes at a time, and will be applied eight times in a row. The format string will, therefore, work with blocks of sixteen input bytes in total.

Each two-byte chunk will be displayed as four zero-padded hexadecimal digits, followed by a space. However, since the iteration count is greater than 1, the space following the last iteration will be suppressed.

Finally, after eight iterations are executed, a newline will be output.

The format string will be applied to each sixteen-byte input block in succession, until the end of the input file is reached.
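One caveat about creating the format file: the echo commands in this tutorial assume a shell, such as bash in its default configuration, whose echo passes the backslash in “\n” through unchanged. Some shells (dash, for example) expand it into a real newline, which corrupts the format file. A more portable way to write the same file, sketched here, is printf with a ‘%s’ conversion:

```shell
# '%s\n' copies the argument verbatim, so the two characters \ and n
# reach the format file intact in any POSIX shell.
printf '%s\n' '8/2 "%04.4x " "\n"' > fmt-09
cat fmt-09    # 8/2 "%04.4x " "\n"
```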

Note that the last block has only four input bytes left to process—the final three bytes from the input file, plus one padding null byte. In such a case, the remaining iterations will output spaces instead of actual values obtained from the input file. In other words, spaces will be appended to the last line of output, as shown below—where each ‘•’-sign represents a space that is output in place of a digit from a missing input value:

7573 6570 6372 6c61 6669 6172 6967 696c
7473 6369 7865 6970 6c61 6469 636f 6f69
7375 000a •••• •••• •••• •••• •••• ••••
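The padding is invisible on a terminal, but you can make it visible by bracketing the last output line. This sketch reuses the fmt-09 format from above (written with printf here) and assumes a little-endian host; the variable names are illustrative, and the exact number of padding spaces is left for you to inspect rather than stated:

```shell
echo 'supercalifragilisticexpialidocious' > mpoppins
printf '%s\n' '8/2 "%04.4x " "\n"' > fmt-09

# Capture the last output line, padding included.
last=$(hexdump -f fmt-09 mpoppins | tail -n 1)
printf '[%s]\n' "$last"

# Strip the trailing spaces when the padding is unwanted.
trimmed=$(printf '%s\n' "$last" | sed 's/ *$//')
printf '[%s]\n' "$trimmed"    # little-endian host: [7375 000a]
```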

Displaying the Current Input Offset

To display the current input offset in hexadecimal, the hexdump utility supports the special “%_ax” format specification. Just like the “%x” format, it supports a field width and a precision.

Thus you can, for instance, expand the previous example to include the current input offset, with seven hexadecimal digits, at the start of each line, like this:

echo '"%07.7_ax " 8/2 "%04.4x " "\n"' > fmt-10
hexdump -f fmt-10 mpoppins
0000000 7573 6570 6372 6c61 6669 6172 6967 696c
0000010 7473 6369 7865 6970 6c61 6469 636f 6f69
0000020 7375 000a

Note that, except for the final output line that displays the total number of bytes processed, this result is identical to the output that the program produces by default.

Displaying the Final Input Offset at the End of the Output

The hexdump utility also supports a “%_Ax” format specification (with an uppercase ‘A’), which will display the input offset only after the input file is completely processed. You can simply append it to your existing format string, like so:

echo '"%07.7_ax " 8/2 "%04.4x " "\n" "%07.7_Ax\n"' > fmt-11
hexdump -f fmt-11 mpoppins
0000000 7573 6570 6372 6c61 6669 6172 6967 696c
0000010 7473 6369 7865 6970 6c61 6469 636f 6f69
0000020 7375 000a
0000023

The result is now completely identical to the default output from the program.

Multiple Format Strings

All of the examples in this tutorial so far specified a single format string. If you want to use multiple format strings instead, then just enter each of them on a separate line in the format file.

It is, for example, common practice to create a separate format string for the “%_Ax” format specification—e.g.:

echo '"%07.7_ax " 8/2 "%04.4x " "\n"
"%07.7_Ax\n"' > fmt-12
hexdump -f fmt-12 mpoppins
0000000 7573 6570 6372 6c61 6669 6172 6967 696c
0000010 7473 6369 7865 6970 6c61 6469 636f 6f69
0000020 7375 000a
0000023

If multiple format strings are given, then they will be applied sequentially to each input block.

In fact, if you really want to, you can enter each format specification on its own, separate format string—as follows:

echo '"%07.7_ax "
8/2 "%04.4x "
"\n"
"%07.7_Ax\n"' > fmt-13
hexdump -f fmt-13 mpoppins
0000000 7573 6570 6372 6c61 6669 6172 6967 696c
0000010 7473 6369 7865 6970 6c61 6469 636f 6f69
0000020 7375 000a
0000023

In this example, it doesn’t really matter whether or not you split up the format into multiple format strings—the output is not affected. As later examples in this tutorial will show, however, this holds true only in the simplest of cases.

Displaying Bytes in Hexadecimal and as Printable ASCII

By now, it should be obvious how you can display bytes, one by one, in hexadecimal, with sixteen bytes per line:

echo '16/1 "%02.2x " "\n"' > fmt-14
hexdump -f fmt-14 mpoppins
73 75 70 65 72 63 61 6c 69 66 72 61 67 69 6c 69
73 74 69 63 65 78 70 69 61 6c 69 64 6f 63 69 6f
75 73 0a

If you want to output an extra space between the eighth and ninth bytes on each line, then you will have to specify two format units with an iteration count of 8, with two spaces in between—like so:

echo '8/1 "%02.2x " "  " 8/1 "%02.2x " "\n"' > fmt-15
hexdump -f fmt-15 mpoppins
73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69
73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f
75 73 0a

Keep in mind that, if the iteration count on a format unit is greater than 1, then the space following the last iteration will be suppressed. Hence, the literal string following the first format unit needs two spaces instead of just one.

Note that this format string still processes the input data in 16-byte blocks, which will be output in two 8-byte halves.

Also, it is important to enter the format specifications on a single format string; if you split the format, then the result will be different, and likely not what you want:

echo '8/1 "%02.2x " "  "
8/1 "%02.2x " "\n"' > fmt-16
hexdump -f fmt-16 mpoppins
73 75 70 65 72 63 61 6c  73 75 70 65 72 63 61 6c
69 66 72 61 67 69 6c 69  69 66 72 61 67 69 6c 69
73 74 69 63 65 78 70 69  73 74 69 63 65 78 70 69
61 6c 69 64 6f 63 69 6f  61 6c 69 64 6f 63 69 6f
75 73 0a                 75 73 0a

This time, both format strings consume eight input bytes. As a result, the input data will be processed in eight-byte blocks, which will be passed to both of the format strings sequentially; in other words, each eight-byte input block will be formatted twice.

Consequently, if you want to display the input data in two formats—e.g., both in hexadecimal and as printable ASCII characters—then you will have to specify two format strings: one for the hexadecimal output, and another one for the printable ASCII output. Given that the format specification for printable ASCII output is “%_p”, you can produce the desired result as follows:

echo '8/1 "%02.2x " "  " 8/1 "%02.2x " "    "
16/1 "%_p" "\n"' > fmt-17
hexdump -f fmt-17 mpoppins
73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69    supercalifragili
73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f    sticexpialidocio
75 73 0a                                            us.

Both format strings in this example will process the input data in 16-byte blocks. The first format string will split the block into two 8-byte halves, and display the byte values in hexadecimal; the second format string will display the block as 16 printable ASCII characters.

Finally, you may want to output the current input offset at the start of each line:

echo '"%08.8_ax    " 8/1 "%02.2x " "  " 8/1 "%02.2x " "    "
16/1 "%_p" "\n"' > fmt-18
hexdump -f fmt-18 mpoppins
00000000    73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69    supercalifragili
00000010    73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f    sticexpialidocio
00000020    75 73 0a                                            us.

You can probably guess by now what happens if you accidentally type the entire format on a single format string—as follows:

echo '"%08.8_ax    " 8/1 "%02.2x " "  " 8/1 "%02.2x " "    " 16/1 "%_p" "\n"' > fmt-19
hexdump -f fmt-19 mpoppins
00000000    73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69    sticexpialidocio
00000020    75 73 0a

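This time, the input data will be processed in 32-byte blocks. The first 16 bytes of each block will be displayed in hexadecimal, while the last 16 bytes will be output as printable ASCII characters. The second block contains only the final three bytes of the file, which all fall within the hexadecimal half, so the last output line shows no ASCII characters at all.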

Typing the Format Strings on the Command Line

Instead of saving the format strings to a file, you may prefer to enter them directly on the command line when you run the hexdump program. In such a case, you should omit the ‘-f’ option, and replace it with one or more ‘-e’ options.

The ‘-e’ option requires a format string as a parameter, and one such option must be specified for each format string—e.g.:

hexdump -e '"%08.8_ax    " 8/1 "%02.2x " "  " 8/1 "%02.2x " "    "' -e '16/1 "%_p" "\n"' mpoppins
00000000    73 75 70 65 72 63 61 6c  69 66 72 61 67 69 6c 69    supercalifragili
00000010    73 74 69 63 65 78 70 69  61 6c 69 64 6f 63 69 6f    sticexpialidocio
00000020    75 73 0a                                            us.
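The ‘-e’ option is also convenient for one-off conversions in scripts. As a small sketch (the input string and the variable name are arbitrary), the following prints each input byte as two hexadecimal digits with no separators; the ‘-v’ option, described in the closing notes, keeps runs of identical bytes from being collapsed:

```shell
# One byte per chunk, two zero-padded hex digits, nothing in between.
hex=$(printf 'hex' | hexdump -v -e '1/1 "%02x"')
echo "$hex"    # 686578
```

Since each chunk is a single byte, the result does not depend on the byte order of the host.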

Closing Notes

  • By default, the hexdump utility will collapse successive identical output lines, which will be indicated with an output line that contains just a single asterisk.

    To illustrate this feature, you can create a data file that contains 512 null bytes, and pass it to the program as the input file:

    dd if=/dev/zero of=nulls count=1 bs=512
    hexdump nulls
    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    *
    0000200

    If you want to prevent this behaviour, and want to display all input data normally, then you can use the ‘-v’ option:

    hexdump -v nulls

  • The default output from the program is produced using the following format strings:

    "%07.7_Ax\n"
    "%07.7_ax " 8/2 "%04x " "\n"

  • The ‘-x’ option, which is equivalent to the default except for the spacing, uses the following format strings:

    "%07.7_Ax\n"
    "%07.7_ax " 8/2 "   %04x " "\n"

  • The ‘-C’ option, on the other hand, uses the following format strings:

    "%08.8_Ax\n"
    "%08.8_ax  " 8/1 "%02x " "  " 8/1 "%02x "
    "  |" 16/1 "%_p" "|\n"

  • Even though this tutorial should give you a pretty good idea about the operation of the hexdump utility, the program supports quite a few additional features. As noted above, more information is available on its man page.
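
Two of the notes above are easy to check for yourself. The sketch below first counts output lines with and without ‘-v’, then feeds the default format strings listed above back to the program through a format file and compares the result with the built-in default. The file names fmt-default, default.out and custom.out are arbitrary choices for this illustration:

```shell
# 512 null bytes: 32 identical 16-byte lines.
dd if=/dev/zero of=nulls count=1 bs=512 2>/dev/null

collapsed=$(hexdump nulls | wc -l)      # data line, asterisk, final offset
expanded=$(hexdump -v nulls | wc -l)    # 32 data lines plus final offset
echo "collapsed: $collapsed lines, expanded: $expanded lines"

# The default format strings, written out explicitly, one per line.
echo 'supercalifragilisticexpialidocious' > mpoppins
printf '%s\n' '"%07.7_Ax\n"' '"%07.7_ax " 8/2 "%04x " "\n"' > fmt-default

hexdump mpoppins > default.out
hexdump -f fmt-default mpoppins > custom.out
if cmp -s default.out custom.out; then
    echo 'custom format matches the default output'
else
    echo 'custom format differs from the default output'
fi
```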

2 comments:

  1. thanks a lot, I wonder how to extract file names from a FAT directory (8+3 characters MS-DOS) using hexdump. This means you have given (fixed) offsets and within a fixed length of possible valid ascii characters surrounded by binary data that does not matter.

  2. Not sure what exactly you are trying to do, but couldn't you use the 'strings' utility to print the strings of printable characters from the directory?
