Sunday, March 2, 2014

main() - how a program starts (linux, C/C++)

Whenever you execute a program (either from a shell, or by replacing the image of a forked child process using one of the exec() family of functions), the system call execve(2) is eventually invoked.
The prototype of this system call is int execve(const char *filename, char *const argv[], char *const envp[]);

Among other things (such as verifying execute permission on filename and handling setuid), this system call works out the value of argc from argv and copies argc, argv and envp onto the new user stack, alongside setting up the process's .data and .text. On entry, argc is at (%rsp), argv[0] at LP_SIZE(%rsp), and the terminating argv[argc] = NULL at (LP_SIZE*(argc+1))(%rsp); similarly, envp[0] is at (LP_SIZE*(argc+2))(%rsp), and so on up to a terminating NULL. Here LP_SIZE is the size of a long or a pointer in bytes.

NOTE: both argv and envp are NULL-terminated arrays.

The arguments to the function int main(int argc, char *argv[], char *envp[]); are usually implementation-defined and specified by the platform's ABI. C99 neither blesses nor forbids the envp argument to main.

On Linux, the executable of a program is created according to the ELF specification. Typically, some glibc startup functions are called before main to make sure that argc is initialized from the stack (these functions typically include _start, __libc_csu_init, __libc_start_main, etc.).

You can inspect the binaries you produce on Linux using binutils (notably the readelf and objdump utilities, among others).

The following example is a simple C program that prints its command line arguments and environment variables.

Assuming the above program is saved as argc.c, it can be compiled with the command
gcc -o argc argc.c
This should give you an executable file argc (an executable ELF object).

Now you can disassemble the executable .sections of argc using the objdump utility, which comes with the binutils package.
objdump -d argc > argc.objdump
This writes the following (or an equivalent, depending on the architecture you're working on) to the file argc.objdump.

The disassembled file is divided into various .sections, as specified by the platform-specific ABI. On a Linux platform, our argc program has the following .sections containing executable instructions.

.init process initialization code.
.plt procedure linkage table.
.text program text, i.e. the executable instructions of the program.
.fini finalization code of the process.

All the .sections of an ELF object can be listed with
objdump --section-headers argc

As can be seen in the disassembly of argc's executable .sections, the _start code works out argc, then passes the address of main, argc, argv, init, fini and rtld_fini (in registers or on the stack, depending on the architecture's calling convention) and calls __libc_start_main.

__libc_start_main takes the following arguments:
1. address of main function,
2. argc,
3. argv,
4. init,
5. fini,
6. rtld_fini, and
7. stack_end.

__libc_start_main is responsible for finally calling main() with the appropriate arguments. A lot goes on inside __libc_start_main; please read glibc's code for more details.

Monday, February 10, 2014

Java launcher debug

If you ever want to set up your development environment so that the java utilities print some debug information, you can use _JAVA_LAUNCHER_DEBUG. Note that this environment variable’s value is not relevant, as long as it's set to something. So, if you want to disable debugging, you have to unset it. Following is an example with javac and java.

Compile a java program

Note the details below, starting with the launcher state variables (e.g. the full java version), the arguments to java, the config values read from jvm.cfg, the path of libjvm.so, and the JavaVM arguments. These are very handy when debugging a build.

Run the java program

Similarly, we can see all of the above when invoking a java program.

Java program used above