ELFs and magic in linux

Motivation

Lately I’ve been interested in running arbitrary executables on AWS Lambda, which often means using an existing static executable of a program or building a statically linked one. However, a dynamically linked executable that runs on Amazon’s Linux also ought to work on Lambda.

What’s interesting is that static linking and dynamic linking have both in the past been considered harmful. But we’re not interested in that.

All this is a discussion for a different day—as a precursor, in this post we’ll just play with magic.

Inside ELFs

Let’s use df as an example. Using file to run magic tests we can learn a few things about the executable

$ file $(which df)
/bin/df: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=0b3a8835cb66adac6ff3015859e2d7cccf805bbb, stripped

We can use the tool xxd to do a hexdump of the header

$ xxd -l 64 $(which df)
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 1036 4000 0000 0000  ..>......6@.....
00000020: 4000 0000 0000 0000 3877 0100 0000 0000  @.......8w......
00000030: 0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....

(The -l flag is used to limit the output length since the ELF header is 64 bytes for 64-bit executables.)

Now what do these values mean? We can have a look at elf.h from the linux kernel—the 64-bit header’s described as

typedef struct elf64_hdr {
  unsigned char    e_ident[EI_NIDENT];    /* ELF "magic number" */
  Elf64_Half e_type;
  Elf64_Half e_machine;
  Elf64_Word e_version;
  Elf64_Addr e_entry;        /* Entry point virtual address */
  Elf64_Off e_phoff;        /* Program header table file offset */
  Elf64_Off e_shoff;        /* Section header table file offset */
  Elf64_Word e_flags;
  Elf64_Half e_ehsize;
  Elf64_Half e_phentsize;
  Elf64_Half e_phnum;
  Elf64_Half e_shentsize;
  Elf64_Half e_shnum;
  Elf64_Half e_shstrndx;
} Elf64_Ehdr;

the manpage for elf has a good description of each—one example

e_machine: This member specifies the required architecture for an individual file

and in our case, it’s 0x3e for x86-64. The ELF specs on linuxbase.org are particularly useful for understanding what’s what.
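
As a rough sketch of how those fields line up with the bytes in the hexdump, the header can be unpacked in Python with the standard struct module. The format string is my own transcription of elf64_hdr and assumes a little-endian 64-bit file; parse_elf64_header, ELF64_HEADER, FIELDS and DF_HEADER are names made up for this example, and the bytes are df's header from the xxd output above.

```python
import struct

# First 64 bytes of df, one string per line of the xxd output above.
DF_HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e00010000001036400000000000"
    "40000000000000003877010000000000"
    "0000000040003800090040001d001c00"
)

# Transcription of elf64_hdr: "<" means little-endian with no padding,
# 16s = e_ident, H = Elf64_Half, I = Elf64_Word, Q = Elf64_Addr / Elf64_Off.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)

def parse_elf64_header(data):
    """Return the ELF64 header fields of `data` as a dict (illustrative helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: wrong magic bytes")
    values = ELF64_HEADER.unpack(data[:ELF64_HEADER.size])
    return dict(zip(FIELDS, values))

header = parse_elf64_header(DF_HEADER)
print(hex(header["e_machine"]))  # 0x3e
print(hex(header["e_entry"]))    # 0x403610
```

The same function could be pointed at a real file by reading its first 64 bytes in binary mode; a 32-bit or big-endian ELF would need a different format string.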

What’s interesting is that the ABI value in the magic output of df above shows SYSV, which corresponds to 0x00 for e_ident[EI_OSABI], even though there is a Linux ABI value (0x03). Linuxbase.org explains why this is the case

If the object file does not use any extensions, it is recommended that this byte be set to 0

We’ll play with setting different values next.

Crafting a Header

Let’s hack together an ELF header. A really useful tool for displaying information about ELF files is readelf. For example running readelf -a $(which df) will give a wealth of information. Of course, it only works on ELF files

$ cat /dev/urandom | base64 | head -c 64 > textfile && readelf -a textfile
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

However we can rip df’s header and construct an ELF file,

$ xxd -p -l 64 -c 64 $(which df) # Output the first 64 bytes
7f454c4602010100000000000000000002003e00010000001036400000000000400000000000000038770100000000000000000040003800090040001d001c00

$ perl -e 'print pack "H*", "7f454c4602010100000000000000000002003e00010000001036400000000000400000000000000038770100000000000000000040003800090040001d001c00"' > newfile # Create a file with the same header


$ file newfile
newfile: ELF 64-bit LSB executable, x86-64, version 1 (SYSV)

$ xxd newfile
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 1036 4000 0000 0000  ..>......6@.....
00000020: 4000 0000 0000 0000 3877 0100 0000 0000  @.......8w......
00000030: 0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....

Now readelf should work (with some errors since we only copied the file header)

$ readelf -a newfile
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x403610
  Start of program headers:          64 (bytes into file)
  Start of section headers:          96056 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28
readelf: Error: Reading 0x740 bytes extends past end of file for section headers
readelf: Error: Section headers are not available!
readelf: Error: Reading 0x1f8 bytes extends past end of file for program headers
readelf: Error: Reading 0x1f8 bytes extends past end of file for program headers
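
The byte counts in those errors follow from the header fields: each table is its number of entries times its entry size. A quick check in Python, with the values typed in from the readelf output above (the variable names are only for illustration):

```python
# Values copied from the readelf output above.
file_size = 64  # newfile holds only the ELF header
e_phoff, e_phentsize, e_phnum = 64, 56, 9
e_shoff, e_shentsize, e_shnum = 96056, 64, 29

program_headers_size = e_phentsize * e_phnum
section_headers_size = e_shentsize * e_shnum

print(hex(program_headers_size))  # 0x1f8
print(hex(section_headers_size))  # 0x740

# Both tables would end beyond the 64 bytes we actually wrote.
print(e_phoff + program_headers_size > file_size)  # True
print(e_shoff + section_headers_size > file_size)  # True
```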

we can also play around with how the data is stacked

$ echo "hello, world" >> newfile

$ file newfile # Now file returns an error
newfile: ERROR: ELF 64-bit LSB executable, x86-64, version 1 (SYSV) error reading

$ cat newfile
ELF>6@@8w@8     @hello, world

$ cat newfile | cut -b 65- # Strip out the 64-byte header
hello, world
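
The same split can be sketched in Python by slicing at the header size. This assumes the file is exactly a 64-byte ELF64 header followed by a payload, as newfile is here; split_header and fake_file are hypothetical names, and the stand-in bytes are not df's real header.

```python
ELF64_HEADER_SIZE = 64  # e_ehsize for 64-bit ELF files

def split_header(data):
    """Split raw file bytes into (header, everything after the header)."""
    return data[:ELF64_HEADER_SIZE], data[ELF64_HEADER_SIZE:]

# Stand-in for newfile: a 64-byte header followed by the appended text.
# Only the magic bytes matter for this demonstration, so the rest is zeros.
fake_file = b"\x7fELF" + bytes(60) + b"hello, world\n"

header, payload = split_header(fake_file)
print(len(header))       # 64
print(payload.decode())  # hello, world
```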

As discussed earlier, we can set e_ident[EI_OSABI] to 0x03 to make it show Linux

$ perl -e 'print pack "H*", "7f454c4602010103000000000000000002003e00010000001036400000000000400000000000000038770100000000000000000040003800090040001d001c00"' | file -
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux)

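Though as we learned from the spec, it’s not recommended. We can also loop over a range of values to see what outputs we get from file. Note that seq counts in decimal while the digits land in a hex string, so the loop below tries 0x00 to 0x09 and then 0x10 and 0x11, skipping 0x0a to 0x0f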

$ for i in $(seq --format="%02.f" 0 11); do echo "print pack \"H*\", \"7f454c46020101${i}000000000000000002003e00010000001036400000000000400000000000000038770100000000000000000040003800090040001d001c00\"" | perl | file - ; done
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (HP-UX)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (NetBSD)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Hurd)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (86Open)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (Solaris)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (Monterey)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (IRIX)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD)
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1
/dev/stdin: ELF 64-bit LSB executable, x86-64, version 1
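
A sketch of the same experiment in Python, patching the byte numerically so that every value from 0x00 to 0x11 is covered. It only builds the headers and does not call file; with_osabi, EI_OSABI (as a Python constant) and variants are names invented for this example.

```python
EI_OSABI = 7  # index of the OS/ABI byte inside e_ident

# df's 64-byte header, one string per line of the xxd output above.
DF_HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e00010000001036400000000000"
    "40000000000000003877010000000000"
    "0000000040003800090040001d001c00"
)

def with_osabi(header, osabi):
    """Return a copy of `header` with e_ident[EI_OSABI] set to `osabi`."""
    patched = bytearray(header)
    patched[EI_OSABI] = osabi
    return bytes(patched)

# range() counts numerically, so 0x0a to 0x0f are included as well.
variants = [with_osabi(DF_HEADER, value) for value in range(0x12)]

for variant in variants:
    print(variant[:8].hex())  # the first 8 bytes; the last one is the ABI byte
```

Each variant could then be written to a file, or piped to file - in the same way as the perl output above.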

Next we’ll go into more detail on how this all works.

Magic Files

The file tool depends on .mgc binaries to work. Usually located in /usr/local/share/misc/magic.mgc, magic files are databases of tests, typically containing so-called “magic patterns”.

You can quickly (or lazily) locate where the .mgc file is with strace

$ strace file $(which df) 2>&1 | grep magic.mgc
stat("/home/ubuntu/.magic.mgc", 0x7ffc9771ead0) = -1 ENOENT (No such file or directory)
open("/etc/magic.mgc", O_RDONLY)        = -1 ENOENT (No such file or directory)
open("/usr/share/misc/magic.mgc", O_RDONLY) = 3

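This tells us two things: file looks for a magic file in the home directory and in /etc before falling back to the system-wide one, and on this machine the compiled system magic file is /usr/share/misc/magic.mgc.

The manpage for file confirms this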

The information identifying these files is read from /etc/magic and the compiled magic file /usr/share/misc/magic.mgc, or the files in the directory /usr/share/misc/magic if the compiled file does not exist.  In addition, if $HOME/.magic.mgc or $HOME/.magic exists, it will be used in preference to the system magic files.

Interestingly, /usr/share/misc/magic.mgc is a symbolic link

$ ls -l /usr/share/misc/magic.mgc
lrwxrwxrwx 1 root root 17 Nov 20  2015 /usr/share/misc/magic.mgc -> ../file/magic.mgc

(If you want a list of the patterns used for matching, sorted by descending strength, run file -l)

To understand how these magic files work, we can look at a section of the magic file for ELFs

0	string		\177ELF		ELF
!:strength *2
>4	byte		0		invalid class
>4	byte		1		32-bit
>4	byte		2		64-bit
>5	byte		0		invalid byte order
>5	byte		1		LSB
>>0	use		elf-le
>5	byte		2		MSB
>>0	use		\^elf-le
>7	byte		0		(SYSV)
>7	byte		1		(HP-UX)
>7	byte		2		(NetBSD)
>7	byte		3		(GNU/Linux)
>7	byte		4		(GNU/Hurd)
>7	byte		5		(86Open)
>7	byte		6		(Solaris)
>7	byte		7		(Monterey)
>7	byte		8		(IRIX)
>7	byte		9		(FreeBSD)
>7	byte		10		(Tru64)
>7	byte		11		(Novell Modesto)
>7	byte		12		(OpenBSD)
>7	byte		13		(OpenVMS)
>7	byte		14		(HP NonStop Kernel)
>7	byte		15		(AROS Research Operating System)
>7	byte		16		(FenixOS)
>7	byte		17		(Nuxi CloudABI)
>7	byte		97		(ARM)
>7	byte		255		(embedded)

(The !:strength *2 line is a multiplier on the computed magic strength)

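Each line has four fields: the offset to read from, the type of data to read, the value to test against, and the message to print on a match. A leading > marks a test that is only tried when the test one level above it has matched.

Let’s break down the ELF header from what we built before, using the ELF magic file as reference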

7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
\177 (octal 177) == "7f" (0)
45 4c 46 ==  "ELF" (0)
02 == "64-bit" (>4)
01 == "LSB, i.e. little-endian" (>5)
01 == "e_ident[EI_VERSION], 1 for the original ELF version" (offset 6, no test in the excerpt above)
00 == "ABI value — SYSV in this case" (>7)
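
To make the matching concrete, here is a toy Python version of the excerpt above. It handles only the top-level string test and a handful of the single-byte tests, ignores use and strength, and is not how libmagic is actually implemented; describe and BYTE_TESTS are names invented for this sketch.

```python
# (offset, expected byte, message) triples taken from the magic excerpt above.
BYTE_TESTS = [
    (4, 0, "invalid class"), (4, 1, "32-bit"), (4, 2, "64-bit"),
    (5, 0, "invalid byte order"), (5, 1, "LSB"), (5, 2, "MSB"),
    (7, 0, "(SYSV)"), (7, 1, "(HP-UX)"), (7, 2, "(NetBSD)"),
    (7, 3, "(GNU/Linux)"), (7, 9, "(FreeBSD)"),
]

def describe(data):
    """Apply the top-level test, then each continuation test, in order."""
    # 0  string  \177ELF  ELF
    if not data.startswith(b"\x7fELF"):
        return None
    parts = ["ELF"]
    for offset, expected, message in BYTE_TESTS:
        # ">N byte V message": compare the single byte at offset N with V.
        if offset < len(data) and data[offset] == expected:
            parts.append(message)
    return " ".join(parts)

e_ident = bytes.fromhex("7f454c46020101000000000000000000")
print(describe(e_ident))         # ELF 64-bit LSB (SYSV)
print(describe(b"#!/bin/sh\n"))  # None
```

The real output of file has more in it ("executable, x86-64, version 1") because the use elf-le line pulls in further tests that this sketch leaves out.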

As a bonus, you can see how shells and interpreters are matched with file by having a look at their magic files. For example with bash and python for shebangs

0	string/wt	#!\ /bin/bash	Bourne-Again shell script text executable
!:mime text/x-shellscript

0	search/1/w	#!\ /usr/bin/python	Python script text executable
!:strength + 15
!:mime text/x-python

# from module.submodule import func1, func2
0	regex		\^from[\040\t\f\r\n]+([A-Za-z0-9_]|\\.)+[\040\t\f\r\n]+import.*$	Python script text executable
!:strength + 15
!:mime text/x-python

Here search/N means: search for the string given in the next field, within N bytes of the offset. For MIME types, I’m going to refer to the manpage since it explains them succinctly

A MIME type is given on a separate line, which must be the next non-blank or comment line after the magic line that identifies the file type, and has the following format:

           !:mime  MIMETYPE
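
The shebang rules above can be approximated in a few lines of Python. This is a loose sketch based on my reading of the magic manpage: the range in search/N is modelled as N start positions counted from the offset, the w flag as making each blank in the pattern optional, and the string/wt rule for bash is treated the same way as search for simplicity. search_w, identify and RULES are hypothetical names, not part of file or libmagic.

```python
import re

def search_w(data, pattern, offset=0, positions=1):
    """Approximate `search/N/w`: try `pattern` at `positions` consecutive
    start positions from `offset`, treating each blank as optional."""
    # Escape the literal parts; each blank may match zero or more blanks.
    pieces = [re.escape(piece) for piece in pattern.split(b" ")]
    regex = re.compile(rb"[ \t]*".join(pieces))
    for start in range(offset, offset + positions):
        if regex.match(data, start):
            return True
    return False

# (pattern, message, MIME type) taken from the two shebang rules above.
RULES = [
    (b"#! /bin/bash", "Bourne-Again shell script text executable",
     "text/x-shellscript"),
    (b"#! /usr/bin/python", "Python script text executable",
     "text/x-python"),
]

def identify(data):
    """Return (message, mime) for the first matching rule, else None."""
    for pattern, message, mime in RULES:
        if search_w(data, pattern):
            return message, mime
    return None

print(identify(b"#!/usr/bin/python\nprint('hi')\n"))
# ('Python script text executable', 'text/x-python')
print(identify(b"echo hi\n"))  # None
```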

Custom magic

Knowing what we know now, let’s make our own trivial magic file. Say I want to use the fish shell and want my fish scripts to have their own shebang that gets recognised by file.

Let’s make a fish script

$ printf '#!/usr/bin/fish\necho "hello world"\n' > script.fish
$ cat script.fish
#!/usr/bin/fish
echo "hello world"

Creating a magic file

$ printf '0   search/1/w     #!\ /usr/bin/fish     fish shell script text executable\n!:mime  text/x-fishscript\n' > $HOME/.magic.mgc

$ cat $HOME/.magic.mgc
0   search/1/w     #!\ /usr/bin/fish     fish shell script text executable
!:mime  text/x-fishscript

then running file

$ file script.fish
script.fish: fish shell script, ASCII text executable

checking the mime

$ file script.fish --mime
script.fish: text/x-fishscript; charset=us-ascii

Pretty neat!

That’s it for now. If you want to make something more complicated, IBM’s Knowledge Center has some easy-to-read information on magic files (and of course there’s always the manpage for magic).

Written May 2018.
