Chapter:C libraries

From Juneday education
Revision as of 10:20, 23 May 2018 by Henrik Sandklef (Talk | contribs) (Introduction)


Introduction

Let's say you're writing code that manages books. You use the code in a server and would like to use some of the same code in a text client and, why not, in a GUI. So we would like to share code between programs. We could do this in a few different ways:

  1. compile the same source code files and use the object files
  2. create an archive and let the programs link against it
  3. create a shared library and let the programs link against it

Compile the same source code files

The first option is OK, or at least kind of OK, if all our code (server, client and GUI) is located in the same place. But what happens if we decide to split the project into smaller sub-projects? The shared code base would no longer be as easy to manage. Wouldn't it be easier if the sub-project writing the shared code could produce a deliverable that could itself be shared? In fact, it can: it can produce (create) an archive or a shared library.

Create an archive

An archive is a collection of object files (compiled C files). So instead of delivering a set of object files as they are, tarred or zipped, we can use a format that is suited to, and actually written for, this purpose: an archive. It is a collection of object files that can be linked the same way as the individual object files could. The functions (and variables) you would like to use are linked into the program when the program is being created (linked).

Create a shared library

A shared library is also a collection of object files. The functions (and variables) you would like to use are linked into the program when the program is being executed. This kind of linking is called dynamic linking, and the library loaded dynamically is called a dynamic library. The same library can be loaded (at run-time) into many programs, which might save disk space. If the team responsible for the shared code delivers a new version that keeps the same interface, all the other teams have to do is replace the shared library with the new one. No compilation or linking is needed (apart from run-time linking, of course).

Videos

C libraries (Full playlist) | C - libraries 1/2 | C - libraries 2/2 | Libraries (PDF)

Library basics

Let's assume that we want to create a library from the files book.c and author.c. They both have corresponding header files, book.h and author.h. We create a simple test program (test-book.c) that creates one book and prints it out. Yeah, in true academic style.
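To make the example a bit more concrete, here is a minimal sketch of what the shared book code could look like, collapsed into a single file. The type names author and book, their fields and the function book_to_string are assumptions made for illustration only; the real book.h and author.h in the repository may well look different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for author.h: who wrote the book. */
typedef struct {
  const char *name;
  const char *email;
} author;

/* Hypothetical stand-in for book.h: a title plus its author. */
typedef struct {
  const char *title;
  author writer;
} book;

/* Formats a book as "Title (Author, email)" into buf.
 * Returns the length the full text needs, exactly like snprintf does. */
int book_to_string(const book *b, char *buf, size_t size)
{
  assert(b != NULL && buf != NULL);
  return snprintf(buf, size, "%s (%s, %s)",
                  b->title, b->writer.name, b->writer.email);
}
```

A test program in the spirit of test-book.c would fill in one book, call book_to_string with a buffer and print that buffer.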

You can find the source code here: github.com/progund/programming-with-c/tree/master/libraries/book-example. In this directory you'll find a bash script called build.sh. Invoke it to build as in the following section:

$ ./build.sh
Using no libraries
---------------------------
gcc -Ilib lib/*.c test/test-book.c -o test-prog
Executing program:
Pendulum (Umberto Eco, umberto@thegreatone.com)

Using archive
---------------------------
cd lib && gcc -c *.c
ar rcu lib/libbook.a lib/book.o lib/author.o
gcc -Ilib test/test-book.c -Llib -lbook -o test-prog
Executing program:
./test-prog: error while loading shared libraries: libbook.so: cannot open shared object file: No such file or directory

Using shared library
---------------------------
rm lib/lib*.a
cd lib && gcc -fPIC -c *.c
gcc -shared  -o lib/libbook.so lib/book.o lib/author.o
gcc -Ilib test/test-book.c -Llib -lbook -o test-prog
Executing program:
Pendulum (Umberto Eco, umberto@thegreatone.com)

Below we will go through what happened when invoking the script.

Without libraries

Without libraries you would compile as follows:

$ rm lib/*.o
$ gcc -Ilib lib/*.c test/test-book.c -o test-prog

Explanation (line by line):

  1. remove all object files in lib
  2. compile all C files in lib together with test/test-book.c and call the program test-prog

Create and use a static archive

$ rm lib/*.o
$ (cd lib && gcc -c *.c)
$ ar rcu lib/libbook.a lib/book.o lib/author.o
$ gcc -Ilib test/test-book.c -Llib -lbook -o test-prog

Explanation (line by line):

  1. remove all object files in lib
  2. change to directory lib and compile all C files there (the parentheses run this in a subshell, so the following commands still run from the original directory)
  3. create an archive, lib/libbook.a, from these object files
  4. compile test/test-book.c, link it with the archive just created, and call the program test-prog

Create and use a shared library

$ rm lib/*.o lib/*.a
$ (cd lib && gcc -fPIC -c *.c)
$ gcc -shared  -o lib/libbook.so lib/book.o lib/author.o
$ gcc -Ilib test/test-book.c -Llib -lbook -o test-prog

Explanation (line by line):

  1. remove all object files and archives in lib
  2. change to directory lib and compile all C files there, with the flag -fPIC to produce so-called position-independent code (the parentheses run this in a subshell, so the following commands still run from the original directory)
  3. create a shared library, lib/libbook.so, from these object files
  4. compile test/test-book.c, link it with the shared library just created, and call the program test-prog

Using multiple files to create a program

Let's leave the book code above and look at another example. This time we will look at some code which can be used to store names. As we go along using this code and a small test program we will look into:

  • compiling all C files at once (not using libraries)
  • creating an archive and using it
  • creating a shared library and using it
  • examining the content of compiled files (be it object files, archives, shared libraries or binaries)
  • examining the library dependencies of a program
  • checking the size of compiled files (be it object files, archives, shared libraries or binaries)

Preparations

Set some compiler flags to make the command listing slightly easier to read:

export CFLAGS="-pedantic -Wconversion -Wall -Werror -Wextra -Wstrict-prototypes"

Having set this variable, we can compile like this:

gcc ${CFLAGS} ......

This way we don't have to write all the compiler flags every time we invoke gcc on this page.

Introducing the name library

We've written a small C file (name.c) with functions that can be used to store names dynamically. The names are stored in a struct called name_list:

typedef struct name_list_
{
  char** names;
  unsigned int size;
} name_list;

Together with this struct we have written a couple of functions.


/**
 * @brief Returns a pointer to a name_list struct (dynamically allocated).
 * @return A pointer to the allocated memory. NULL is returned if memory allocation fails
 */
name_list* name_list_new(void);

/**
 * @brief Adds name to the name_list struct. The string is copied into memory allocated using the alloc family.
 * @param list the list to add name to
 * @param name the name to add 
 * @return NAME_OK (0) on success. 
 */
int name_list_add(name_list *list, char *name);

/**
 * @brief Removes the first occurrence of name in the list. The string
 * is freed and the names afterwards are copied down. The size of the list is also adjusted (with -1) on success.
 * @param list the list to remove name from
 * @param name to remove
 * @return NAME_OK (0) on success.
 */
int name_list_remove(name_list *list, char *name);

/**
 * @brief Removes the name at a given position in the list. The string
 * is freed and the names afterwards are copied down. The size of the list is also adjusted (with -1) on success.
 * @param list the list to remove name from
 * @param pos the position of the name to remove
 * @return NAME_OK (0) on success.
 */
int name_list_remove_at(name_list *list, unsigned int pos);

/**
 * @brief Returns the position of the name in the list
 * @param list the list to get the position of name from
 * @param name the name to search for
 * @return NAME_OK (0) on success.
 */
int name_list_positon(name_list *list, char *name);

/**
 * @brief Prints (to stream) the entire list of names.
 * @param list the list to print
 * @param stream where to print
 * @return NAME_OK (0) on success.
 */
int name_list_print(name_list *list, FILE* stream);

/**
 * @brief Frees all the memory used by the list
 * @param list the list to free
 * @return NAME_OK (0) on success.
 */
int name_list_free(name_list *list);

/**
 * @brief Returns the number of names stored in the list
 * @param list the list to get the size of
 * @return the number of names in the list
 */
int name_list_size(name_list *list);

/**
 * @brief Returns the name at position in the list
 * @param list the list to get the name (at position) from
 * @param pos the position to get the name
 * @return the name at the given position
 */
char* name_list_at_position(name_list *list, unsigned int pos);


The source code can be found on GitHub: github.com/progund/programming-with-c/libraries/src
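As a rough illustration of the idea, and explicitly not the code from the repository, three of these functions could be sketched as follows. The NAME_ERROR value is a hypothetical addition (only NAME_OK is documented above), and the copying is done with malloc and memcpy as one possible choice from the alloc family.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_OK 0
#define NAME_ERROR 1  /* hypothetical: only NAME_OK is documented above */

typedef struct name_list_
{
  char** names;
  unsigned int size;
} name_list;

/* Allocates an empty list; calloc zeroes both names and size. */
name_list* name_list_new(void)
{
  return calloc(1, sizeof(name_list));
}

/* Grows the array by one slot and stores a private copy of name. */
int name_list_add(name_list *list, char *name)
{
  if (list == NULL || name == NULL) return NAME_ERROR;
  assert(list->size == 0 || list->names != NULL);  /* list invariant */

  size_t len = strlen(name) + 1;                   /* include the '\0' */
  char *copy = malloc(len);
  if (copy == NULL) return NAME_ERROR;
  memcpy(copy, name, len);

  char **grown = realloc(list->names, (list->size + 1u) * sizeof *grown);
  if (grown == NULL) { free(copy); return NAME_ERROR; }

  grown[list->size] = copy;
  list->names = grown;
  list->size++;
  return NAME_OK;
}

/* Frees every stored string, then the array, then the list itself. */
int name_list_free(name_list *list)
{
  if (list == NULL) return NAME_ERROR;
  for (unsigned int i = 0; i < list->size; i++) free(list->names[i]);
  free(list->names);
  free(list);
  return NAME_OK;
}
```

Because name_list_add stores its own copy, the caller is free to change or release the original string afterwards.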

Without archive

Compile name.c

$ gcc $CFLAGS -c  name.c -o name.o

Examine name.o

$ nm name.o
                 U calloc
                 U fprintf
                 U free
                 U _GLOBAL_OFFSET_TABLE_
0000000000000045 T name_list_add
00000000000001d7 T name_list_at_position
0000000000000490 T name_list_free
0000000000000000 T name_list_new
0000000000000375 T name_list_positon
00000000000003fd T name_list_print
000000000000017c T name_list_remove
0000000000000212 T name_list_remove_at
0000000000000530 T name_list_size
                 U realloc
0000000000000004 C RET_VAL
                 U stderr
                 U strdup
                 U strlen
                 U strncmp

This might need a bit of explanation. The lines marked U list symbols that are used but not defined (undefined) in this object file. For example, the line U strlen means that strlen is used, but not defined, in name.o. Now check the lines marked T. T means that the symbol is in the text (code) section, so this is what we're looking for. Let's use grep to filter a bit.

$ nm name.o | grep " [Tt] name"
0000000000000045 T name_list_add
00000000000001d7 T name_list_at_position
0000000000000490 T name_list_free
0000000000000000 T name_list_new
0000000000000375 T name_list_positon
00000000000003fd T name_list_print
000000000000017c T name_list_remove
0000000000000212 T name_list_remove_at
0000000000000530 T name_list_size

This means that the functions above are defined in the file name.o.
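To connect the letters in the nm output to source code, here is a small made-up file (it is not part of the name library, and the names call_count, clamp and clamped_length are hypothetical). Compiled without optimisation, as in the commands on this page, each declaration would typically show up in nm with the symbol type mentioned in its comment.

```c
#include <assert.h>
#include <string.h>

/* A global without an initialiser. nm typically reports it as C (common)
 * or B (zero-initialised data), depending on the compiler's defaults. */
int call_count;

/* static gives internal linkage: nm shows a lowercase t (local text symbol). */
static size_t clamp(size_t n, size_t max)
{
  return n > max ? max : n;
}

/* Defined here with external linkage: nm shows an uppercase T. */
size_t clamped_length(const char *s, size_t max)
{
  assert(s != NULL);
  call_count++;
  /* strlen is declared in <string.h> but defined in the C library, so nm
   * lists it as U (undefined) in the object file. */
  return clamp(strlen(s), max);
}
```

This is also why the grep pattern uses [Tt]: it matches both the global (T) and the file-local (t) functions.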

Use the object file / link the program

$ gcc $CFLAGS -I. test/test-name.c name.o -o test-prog

Execute the program

Run the test program

$ ./test-prog 
realloc 0x14ce260, new size: 1
Assign 0x14ce280, ..
.....

Examine the program

$ nm test-prog 
                 U __assert_fail@@GLIBC_2.2.5
0000000000602064 B __bss_start
                 U calloc@@GLIBC_2.2.5
... snip
00000000004007a7 T main
0000000000400fd1 T name_list_add
0000000000401163 T name_list_at_position
000000000040141c T name_list_free
.... snip

Note: we have cut quite a lot from the printout.

The complete listing from the nm command above:

                 U __assert_fail@@GLIBC_2.2.5
0000000000602064 B __bss_start
                 U calloc@@GLIBC_2.2.5
0000000000602064 b completed.6973
0000000000602060 D __data_start
0000000000602060 W data_start
0000000000400700 t deregister_tm_clones
0000000000400770 t __do_global_dtors_aux
0000000000601e08 t __do_global_dtors_aux_fini_array_entry
0000000000401568 R __dso_handle
0000000000601e10 d _DYNAMIC
0000000000602064 D _edata
0000000000602070 B _end
0000000000401554 T _fini
                 U fprintf@@GLIBC_2.2.5
00000000004007a0 t frame_dummy
0000000000601e00 t __frame_dummy_init_array_entry
0000000000401a94 r __FRAME_END__
                 U free@@GLIBC_2.2.5
0000000000602000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
0000000000401804 r __GNU_EH_FRAME_HDR
0000000000400610 T _init
0000000000601e08 t __init_array_end
0000000000601e00 t __init_array_start
0000000000401560 R _IO_stdin_used
0000000000401550 T __libc_csu_fini
00000000004014e0 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
00000000004007a7 T main
0000000000400fd1 T name_list_add
0000000000401163 T name_list_at_position
000000000040141c T name_list_free
0000000000400f8c T name_list_new
0000000000401301 T name_list_positon
0000000000401389 T name_list_print
0000000000401108 T name_list_remove
000000000040119e T name_list_remove_at
00000000004014bc T name_list_size
00000000004017a4 r __PRETTY_FUNCTION__.2478
                 U realloc@@GLIBC_2.2.5
0000000000400730 t register_tm_clones
0000000000602068 B RET_VAL
00000000004006d0 T _start
                 U stderr@@GLIBC_2.2.5
                 U stdout@@GLIBC_2.2.5
                 U strcmp@@GLIBC_2.2.5
                 U strdup@@GLIBC_2.2.5
                 U strlen@@GLIBC_2.2.5
                 U strncmp@@GLIBC_2.2.5
0000000000602068 D __TMC_END__

Let's use grep to filter a bit.

$ nm test-prog | grep " [Tt] name"
0000000000400fd1 T name_list_add
0000000000401163 T name_list_at_position
000000000040141c T name_list_free
0000000000400f8c T name_list_new
0000000000401301 T name_list_positon
0000000000401389 T name_list_print
0000000000401108 T name_list_remove
000000000040119e T name_list_remove_at
00000000004014bc T name_list_size

We can see that the program (test-prog) contains the functions we wrote in the files name.c and test/test-name.c. This should hopefully not come as a surprise to you.

Size matters

Let's look at the size of the compiled c file (name.c) and the program

$ du -sb name.o test-prog 
4624	name.o
13256	test-prog

Archive

Compile name.c and create an archive

Compile name.c and create an archive

$ gcc $CFLAGS -c -I. -I.. name.c -o name.o
$ ar rcu  libname.a name.o

Examine the archive

Get some information about the archive

$ file libname.a 
libname.a: current ar archive

Ah, it is an archive. Great. Wouldn't it be great if we could see what functions are defined in the archive? Well, we can, using nm. Let's use grep to filter a bit.

$ nm libname.a | grep " [tT] " 
0000000000000045 T name_list_add
00000000000001d7 T name_list_at_position
0000000000000490 T name_list_free
0000000000000000 T name_list_new
0000000000000375 T name_list_positon
00000000000003fd T name_list_print
000000000000017c T name_list_remove
0000000000000212 T name_list_remove_at
0000000000000530 T name_list_size

So the functions name_list_add, name_list_at_position, name_list_free, name_list_new, name_list_positon, name_list_print, name_list_remove, name_list_remove_at, name_list_size are defined in the archive.

Note: If you're interested, we used the command:
nm libname.a | grep " [tT] " | awk '{ printf "<code>%s</code>, ", $3}'; echo
to get the list above printed in mediawiki format.

Use the archive / link the program

Compile test/test-name.c and create a binary using the archive

$ gcc $CFLAGS -I. test/test-name.c -L. -lname -o test-prog

Execute the program (linked with the archive)

Run the test program

$ ./test-prog 
realloc 0x14ce260, new size: 1
Assign 0x14ce280, ..
.....


Examine the program (linked with the archive)

Let's look at what functions are defined in the program

$ nm test-prog  | grep " [Tt] "
0000000000400700 t deregister_tm_clones
0000000000400770 t __do_global_dtors_aux
0000000000601e08 t __do_global_dtors_aux_fini_array_entry
0000000000401554 T _fini
00000000004007a0 t frame_dummy
0000000000601e00 t __frame_dummy_init_array_entry
0000000000400610 T _init
0000000000601e08 t __init_array_end
0000000000601e00 t __init_array_start
0000000000401550 T __libc_csu_fini
00000000004014e0 T __libc_csu_init
00000000004007a7 T main
0000000000400fd1 T name_list_add
0000000000401163 T name_list_at_position
000000000040141c T name_list_free
0000000000400f8c T name_list_new
0000000000401301 T name_list_positon
0000000000401389 T name_list_print
0000000000401108 T name_list_remove
000000000040119e T name_list_remove_at
00000000004014bc T name_list_size
0000000000400730 t register_tm_clones
00000000004006d0 T _start

Lots of internal things, so let's focus on our "name" things.

$ nm test-prog  | grep " [Tt] name"
0000000000400fd1 T name_list_add
0000000000401163 T name_list_at_position
000000000040141c T name_list_free
0000000000400f8c T name_list_new
0000000000401301 T name_list_positon
0000000000401389 T name_list_print
0000000000401108 T name_list_remove
000000000040119e T name_list_remove_at
00000000004014bc T name_list_size

This means that the functions above are now defined in the program. They have been included (statically) in the program via the archive (libname.a), which was built from name.o.

$ file test-prog 
test-prog: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=01b9d852e2d40905432a1e1d30e40bbd3c91c6cf, not stripped

So, the test program is actually using some shared libraries. Let's check whether it is using our library as a dynamic library or as an archive. Time to get some extra information about the test program:

$ ldd test-prog 
	linux-vdso.so.1 (0x00007fff5177d000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa14abf3000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa14afd8000)

So, we can see that test-prog is not using the library name dynamically.
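ldd inspects a program from the outside. As a complement, a program on GNU/Linux can list the objects loaded into its own process with dl_iterate_phdr from link.h. The sketch below is only an illustration under that assumption: dl_iterate_phdr is a glibc extension rather than standard C, and the names print_object and list_loaded_objects are made up for this example.

```c
#define _GNU_SOURCE  /* dl_iterate_phdr() is a GNU extension */
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object; data points to an int counter. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
  (void)size;  /* unused */
  assert(data != NULL);
  int *count = data;

  /* An empty name usually denotes the main program itself. */
  const char *name = info->dlpi_name;
  if (name == NULL || name[0] == '\0') name = "(main program)";

  printf("%s\n", name);
  (*count)++;
  return 0;  /* returning zero tells dl_iterate_phdr to continue */
}

/* Prints every object loaded into this process and returns how many. */
int list_loaded_objects(void)
{
  int count = 0;
  dl_iterate_phdr(print_object, &count);
  return count;
}
```

Called from a program linked against an archive, the printed list would not be expected to contain that library, since its code is part of the program itself.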

Size matters

Let's look at the size of the compiled c file (name.c), the archive and the program

$ du -sb name.o libname.a test-prog 
4624	name.o
4956	libname.a
13256	test-prog

The object file (name.o) and the archive (libname.a) have more or less the same size. A good guess is that the archive contains some additional information. The program has size 13256. Let's compare this to when we used the object file directly (previous section). Then the size was 13256, too. Oh what a coincidence (co-inky-dinky) :).

Shared library

Compile name.c and create a shared library

Compile name.c and create a shared library:

$ gcc -c -fPIC ${CFLAGS} -I. -I.. name.c -o name.o
$ gcc -shared -o libname.so name.o

Examine the shared library

Get some information about the shared library

$ file libname.so 
libname.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=c4d19ab6e8c97cc5623bc112524c1ec2e11a0380, not stripped

The words shared object tell us that this is a shared library.

Let's analyse it the same way we did with the archive:

$ nm libname.so | grep " [tT] name" 
0000000000000a3f T name_list_add
0000000000000bd1 T name_list_at_position
0000000000000e8a T name_list_free
00000000000009fa T name_list_new
0000000000000d6f T name_list_positon
0000000000000df7 T name_list_print
0000000000000b76 T name_list_remove
0000000000000c0c T name_list_remove_at
0000000000000f2a T name_list_size

Use the shared library / link the program

$ gcc -fPIC ${CFLAGS} -I. -I.. test/test-name.c -L. -lname -o test-prog

Let's see what libraries this program depends on:

$ ldd test-prog 
	linux-vdso.so.1 (0x00007ffc37357000)
	libname.so => not found
	libc.so.6 => /lib64/libc.so.6 (0x00007f83ba538000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f83ba91d000)

The program uses four dynamic libraries. This may vary depending on your operating system. The important thing to note here is that libname.so is listed. Why "not found"? You'll find out in the next section.

If we were to examine a program linked with the archive the same way we'd get something like this:

$ ldd test-prog 
	linux-vdso.so.1 (0x00007ffdb1fbe000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f5da05a3000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5da0988000)

We can see that libname.so is not listed.

Execute the program (linked with the shared library)

Let's execute our program.

$ ./test-prog 
./test-prog: error while loading shared libraries: libname.so: cannot open shared object file: No such file or directory

Uh oh. This is reminiscent of when bash cannot find the program to start (Command not found). This time bash can find the program test-prog, but the shared library cannot be found. So, just as bash searches for programs in the directories listed in the PATH variable, we need to give the dynamic linker (a tool responsible for loading dynamic/shared libraries at run-time) a list of directories to search for libraries. We do this by using the LD_LIBRARY_PATH variable.

$ export LD_LIBRARY_PATH=.
$ ./test-prog

Et voila. It works :)

This could be done as a one-liner:

$ LD_LIBRARY_PATH=. ./test-prog

Note: the command above sets the variable only for that single execution of the program; it is not set in the shell afterwards.
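Like PATH, LD_LIBRARY_PATH is a colon-separated list of directories. The following sketch shows how such a list can be split into its entries. It is a simplification written for this page (the function names are made up, and empty entries are simply skipped), not a description of how the dynamic linker itself parses the variable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints each colon-separated directory in path_list on its own line and
 * returns the number of entries. Simplification: empty entries are skipped. */
int print_search_dirs(const char *path_list, FILE *out)
{
  assert(out != NULL);
  if (path_list == NULL) return 0;  /* variable not set */

  int count = 0;
  const char *start = path_list;
  while (*start != '\0') {
    size_t len = strcspn(start, ":");  /* length up to next ':' or end */
    if (len > 0) {
      fprintf(out, "%.*s\n", (int)len, start);
      count++;
    }
    start += len;
    if (*start == ':') start++;        /* step over the separator */
  }
  return count;
}

/* Convenience wrapper: inspects the variable set with export above. */
int print_ld_library_path(FILE *out)
{
  return print_search_dirs(getenv("LD_LIBRARY_PATH"), out);
}
```

After export LD_LIBRARY_PATH=. a call to print_ld_library_path(stdout) would print a single line containing a dot.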

Examine the program (linked with the shared library)

Let's look at what functions are defined in the program

$ nm test-prog  | grep name_
                 U name_list_add
                 U name_list_at_position
                 U name_list_free
                 U name_list_new
                 U name_list_print
                 U name_list_remove
                 U name_list_remove_at
                 U name_list_size

Now the functions in the file name.c are not defined in the program. They are used and need to be defined elsewhere (in the shared library).

Size matters

Let's look at the size of the compiled c file (name.c), the shared library and the program

$ du -sb name.o libname.so test-prog
4624	name.o
12736	libname.so
12840	test-prog

The object file (name.o) and the shared library (libname.so) have different sizes; in fact the shared library is roughly 3 times as big. The program has size 12840. Let's compare this to when using the object file directly and when using an archive (previous sections). Then the size, in both cases, was 13256. Since the shared library and the program together take 12736 + 12840 = 25576 bytes, using a shared library with only one program using it does not give us any advantage when it comes to saving disk space ;)
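So when does sharing start to pay off? The small calculation below uses the sizes from the du listings on this page and assumes, purely for the sake of argument, that every additional program is the same size as test-prog. The function names are made up for this example.

```c
#include <assert.h>

/* Sizes in bytes, taken from the du listings on this page. */
#define STATIC_PROG_SIZE 13256L  /* test-prog linked with name.o or libname.a */
#define SHARED_PROG_SIZE 12840L  /* test-prog linked with libname.so */
#define SHARED_LIB_SIZE  12736L  /* libname.so itself */

/* Disk usage of n programs that each contain their own copy of the code. */
long static_total(long n)
{
  assert(n >= 0);
  return n * STATIC_PROG_SIZE;
}

/* Disk usage of n programs that share a single libname.so. */
long shared_total(long n)
{
  assert(n >= 0);
  return SHARED_LIB_SIZE + n * SHARED_PROG_SIZE;
}

/* Smallest number of programs for which sharing uses less disk space. */
long break_even(void)
{
  long n = 1;
  while (shared_total(n) >= static_total(n)) n++;
  return n;
}
```

With these numbers each program saves 13256 - 12840 = 416 bytes, so the 12736 bytes spent on the library are recovered once 31 programs use it (410776 bytes shared versus 410936 bytes static). A library this small is of course an extreme case.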

Links

Library (wikipedia)

Static_library (wikipedia)

Source code: https://github.com/progund/programming-with-c/tree/master/libraries (Our Github)

TODO: (Possibly)

Other useful tools:

  • readelf (-h | main -d | ...)
  • ldd -d test-prog (report missing objects)
  • ldd -r test-prog (report missing objects and functions)
  • ltrace?
  • strace
  • truss? Just kidding