Debugging a SIGSEGV Backtrace

Before begin my post I need introduce you the term of backtrace, a backtrace is a series of the last function calls in your program (view $man backtrace), with a backtrace you can access the call stack, so, in other words how did you get to that point in your program.

Today working on a code I got a SIGSEGV and the obviously subsecuently crash. After checking the log I found this backlog (which was made using backtrace()):

[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1233: SIGSEGV(11), puntero 0xc0 desde 0x7f288f7c79aa
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (0) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (1) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (2) /usr/sbin/mwconstructor[0x40a2c0]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (3) /lib64/libpthread.so.0(+0x7df5)[0x7f289173edf5]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (4) /lib64/libc.so.6(clone+0x6d)[0x7f288834c1ad]

Viewing this log you know where the problem was … but… Which line is it?

The simplest way to debug(if you dont know this trick) is run gdb and try to reproduce the bug, but not every time its a great decision.

But, wait! what you want to do then?

We will search directly inside the .o of the library for the problematic line… Lets begin:

Use nm to find the function’s start position on .o file

nm src/twsmedia_widget.o | less

You will find something like this line

000000000000cc9a T twsmedia_widget_alarm_pool_draw

0xcc9a is the start line of twsmedia_widget_alarm_pool_draw function

Then, add 0x1680 offset (twsmedia_widget_alarm_pool_draw+0x1680)) to that pos, in this case resulting in 0xe31a

For last, call addr2line, to search for the specific line in the .text section of the object

$ addr2line -j .text -e src/twsmedia_widget.o 0x000000000000e31a
 src/twsmedia_widget.c:3139

And Problem solved! Now we have the problematic line:  src/twsmedia_widget.c:3139

Its important to highlight that this method won’t work in some scenarios like having static functions (because you won’t have the function names in the backtrace).

That’s all for now, see you later!

 

Codec vs Format: Parte 1

Comúnmente, cuando hablamos de formatos de video, se suele confundir el concepto de codec con el de formato .

La diferencia es simple: El codec hace referencia al tipo de algoritmo que se utilizó para comprimir vídeo (o audio, subtitulos), mientras que la palabra formato suele referirse a la combinación de transporte (o encapsulamiento) que se utilizó para almacenar audio y video sumado a los codecs que se utilizaron para comprimirlos.

Formato = Transporte + codecs

Ejemplo: XDCAM es un formato que utiliza el transporte MXF, el codec de video MPEG2Video y el audio en PCM

No hay convenciones para todas las combinaciones posibles de manera que por lo general el formato hace referencia al transporte y comunmente se manifiesta en la extensión del archivo.

Por ejemplo: MP4. Cuando se habla de formato MP4 suele asociarse a que el transporte es MP4, que el codec de video es H264 y el codec de audio es AAC, pero los codecs podrían ser otros también.

Ejemplos de transportes:
MP4, AVI, MOV, MKV, MPEG-TS, OGG, WMV

Ejemplos de códecs:
H264, MPEG4, WMV, MPEG2-VIDEO, AAC, AC3

No todos los transportes pueden contener todos los codecs, y puede suceder que un reproductor reconozca el transporte pero no algunos de los códecs.

Imagínense al transporte justamente como un medio de transporte de carga (avion, tren, autobus) y al codec como los tipos de cargamento que podrían ir en su interior. Por los general son independientes unos de otros, salvo algunas excepciones que determinados transportes son exclusivos para determinados codecs como por ejemplo el FLV

AVI y MP4 son transportes.

A mi entender AVI podría ser un tranporte antiguo, limitado e inseguro como este tren:

Resultado de imagen para tren

y MP4 podría ser algo así:

Resultado de imagen para tren

 

¿WMV es un codec o un formato?

WMV es el nombre de un formato pero también es el nombre del codec ambos creado por Microsoft. Como formato solo puede contener en su interior codec de video WMV y codec de audio WMA. El codec WMV de video a su vez puede estar contenido dentro de un formato AVI

Cuando alguien dice que tiene un archivo de formato MP4, en realidad no se sabe que codec de video va a tener en su interior, hay una cantidad de codec soportados como MPEG2VIDEO y H264, pero en realidad sólo está haciendo referencia al tranporte.

Es muy importante saber entonces que cuando hablamos de formato generalmente hablamos de transporte, y que dentro del mismo existen videos y/o audios comprimidos con algún codec.

Showing what the threads are doing, trying not to interrupt the process

Humans are curious, perhaps that makes us humans[1], and you might be curious about what your program is doing.
Sometimes, the program has several threads running and sometimes you can’t completely stop it neither kill it to see what it is doing.

A situation like that could be when there is a service running on a client and it has some problem, you suspect a thread (one of several) is locked or waiting for something that will never happen, but all other threads look like they are running fine; so you don’t want to interrupt neither kill the process for now.

Don't stop me now
Don’t stop me now

What I do, is to use gdb and write a file with the commands I want to run and ending the file with the ‘q’ command (quit), making gdb quit so the process can continue its execution. I usually write a file called ‘commands’ with this:

thread apply all bt
q

That will execute ‘bt’ (backtrace) to all threads and then ‘q’ (quit) gdb after executing backtrace. Printing the backtrace for all threads shows me (more or less) what the threads are executing.
Using the ‘commands’ file I run gdb like this:
gdb -p <pid> -x commands > /tmp/threads

Being <pid> the process pid.
Notice I redirect the output to a file, that is because unless I redirect to a file, gdb will stop the output when the console is full and wait for me to press ‘return’; which will make the process stop for a while, something I don’t want to happen.

After seeing the threads, I can write another ‘commands’ files with another instructions to gdb, like printing some variable.

[1]:
[…]the ability to ask questions is probably the central cognitive element that distinguishes human and animal cognitive abilities[…] (https://en.wikipedia.org/wiki/Great_ape_language#Limitations_of_ape_language)

Build shared libraries with invalid instructions

Some background

We have to decode many videos in multiple encodings(mpeg2video, h264, etc). The decoding proccess consumes too much CPU, and sometimes in one server we decode up to 20 Full HD videos, to make this possible we aid the CPU with either Nvidia CUDA or Intel QSV.

We distribute our software in rpm packages, which are built in our R&D department and deployed to our clients, usually this means that the hardware specifications are not the same.

These days we dealt with an interesting problem, we wanted to unify our decoding library and make it’s compilation hardware independant. CUDA didn’t pose a problem because we install the libraries in all of our systems whether they have a video card or not. However, Intel QSV support proved to be a little bit more difficult… If Intel QSV is not supported by the processor, the application raises a SIGILL (Illegal Instruction), which means that the processor tried to execute an invalid instruction, and stops the execution.

We wanted to keep one package for the library that could be built in all of our processors (whether they support Intel QSV or not). Up to now, if we wanted Intel QSV, we had to compile the library in the specific processor that supported it, and we had a bunch of conditions in our Makefiles to detect the availabity of QSV and build the library with it. Since the decoding library changes very often, new rpm packages are uploaded almost daily by us, so if the CPU doesn’t support QSV the package is built without it, and when is installed in a CPU that does support it, it doesn’t use it D:

Challenge accepted
Wait.. what is the challenge?

We want to run an application that may have invalid instructions in some of the libraries it includes, and decide during execution time whether to call them or not…

The solution

So what did we do? First of all, we took away the Intel QSV decoding part, and made a new library with it, like a wrapper. You might be wondering what does this change, because our main decoding library would link the QSV library and it should throw a SIGILL same as before.

What I didn’t mention is how do we link this new library, we are not linking it as a shared object when compiling, we dynamically load it during execution. How do we do it? Using ldfcn(), this allow us to read object files, such as a shared object, and link it during execution. By linking in this way, we can check the processor during execution and link the QSV library only if supported, however we have to manually link every function we have in the library to function pointers in the main decoding library context using dlsym(). To sum up:

  1. Divide the main decoding library and the QSV part
  2. Build the latter as a shared object
  3. Check in the main library when is QSV supported
  4. Dynamically load the .so
spongebob
The solution
A few considerations

We need the headers of the QSV library because they provide the definitions of the structures, so the rpm package will be installed always, whether we have QSV support or not, this way we can include the header and compile the main decoding library without errors.

It is important not to link QSV when compiling, if we do so the application will compile but when we try to run it, a SIGILL will show up (when not supported) before the main() is even executed. This happens because the library has a constructor that is called when the library is loaded.

If the unsupported library has too many API functions, we will have to link each and every one of them manually, this can be tedious .__. And take care not to load the library when not supported because you will have a SIGILL immediately