Debugging a SIGSEGV Backtrace Part 2

Today I found an easy way to debug a SIGSEGV backtrace using gdb.

The problem was this backtrace:
[17450 XX 23:31:34 (+18)][10237] <TWS_ERROR> {sigsafe} src/common.c@1303: [bt]: (5) /usr/lib64/twsch/libtwsch.so(ch_thread+0x1fd)[0x7f64a2467669]

The procedure:

1 – First of all we need to open an instance of the process with gdb and run it only up to the beginning of the main function, so the process dynamically loads its libraries:

$ gdb ./your_buggy_app

(gdb) b main

(gdb) run

2 – Use the function name and the offset like this:

(gdb) info line *ch_thread+0x1fd

The result:

Line 426 of "src//ch.c" starts at address 0x7fffee99a65d <ch_thread+497> and ends at 0x7fffee99a68b <ch_thread+543>.

Clearer than that? Impossible.


 

13 Reasons Why… Vim

Hello, do you have a moment to talk about Vim?
Today we bring to you THE ultimate text editor: Vim

Before Vim our life was sad, really sad. Then we found it, and our life was sad and complicated :(. But then we learnt how to use it, and now we are ridiculously happy!! (except for the designer, who uses Windows and can't use Vim like us, so he is still sad)
So, why would you want to use it? Let's see:

1 – If you don't have money to buy a mouse, don't worry, you don't need one!
2 – It doesn't consume 130% of your RAM like other editors.
3 – It's stable. Vim just… never gonna give you up, never gonna let you down… really, no crashing.
4 – You will find Vi (Vim's father) included in just about any Unix-derived system.
5 – Ideal when you are ssh-ing into a Linux server or the like, where you don't even need an X server.
6 – It works fine even over a poor ssh or network connection.
7 – It's customizable and extensible; you can personalize it as you see fit.
8 – It's free!
9 – Wherever you go, you can easily take your Vim configuration with you.
10 – Very nice documentation, you just need to use ":help".
11 – Vim is really powerful and incredibly fast once you get past the initial learning curve (which is really steep, don't give up halfway!).
12 – It has regex!
13 – Vim is awesome! (yes, that's a reason).

 

 

Join the Vim side of the Force …

Saving compilation time with make -j

When compiling a library, a program or the Linux kernel itself, the make command by default uses a single logical processor (thread) of the system.
To speed things up we can use the -j parameter and indicate how many logical processors we want to use.

Example:

make -j 8

How do you know how many logical processors (threads) are available?

cat /proc/cpuinfo | grep processor | wc -l
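
As an alternative (assuming GNU coreutils is available, which is the case on most distributions), nproc reports the same number and can be passed straight to make:

make -j"$(nproc)"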

In the following video I will show you how to compile the kernel on a Dell server with two Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz processors.

 

We would also like to take this opportunity to invite you to the 2017 edition of NAB SHOW in Las Vegas.

You can register at the following link: http://www.3way.com.ar/nab2017.php

Using GCC Intrinsics (MMX, SSEx, AVX) to look for max value in array

To begin with, you shouldn't start writing new code focusing on performance; functionality should be the key factor, while leaving room for future improvements. But well… after you have done your job and everything is working as it should, you might need to tweak your code a little to increase its performance.

First functionality, then efficiency
The problem

We want to look for the maximum value in an array composed of int16_t mono audio samples. The maximum value of the array will be the peak value during the audio interval being analysed; this peak is known as the sample peak and should not be confused with the real peak of the audio, the true peak (there is a very good explanation of the differences here).

A basic(?) solution

Ok, we have to look for the maximum value in an array while focusing on functionality; this is actually quite simple…

int i;
int16_t max = buff[0];

for (i = 1; i < size; i++) {
    if (max < buff[i]) {
        max = buff[i];
    }
}

 The intrinsics

These are a series of functions which expose many MMX, SSE and AVX instructions; they are mapped directly to C functions and are further optimized by gcc. Most of the instructions use vector operations, and you can work with 128, 256 or 512 bit vectors depending on the architecture and the compiler. There is a very detailed guide here and you can see a full list of the functions here.

You will need to include the headers for the functions you want to call, or just include x86intrin.h, which pulls in all the available ones. Then, you will need to add the appropriate flag to the gcc compile line; in this specific case I'm using -mavx2. If you want to check which instruction sets are supported, you can use the following command to list the corresponding macros that gcc defines:

$ gcc -mavx2 -dM -E - < /dev/null | egrep "SSE|AVX"
#define __AVX__ 1
#define __AVX2__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

The code

What I'm going to do is compare two 128-bit vectors; one of them holds the running maximum (initialized to 0, so if all the samples are negative the result will be wrong) and the other is the input buffer. This is done using _mm_max_epi16(), which compares 8 values (int16_t) at a time. This is one of the reasons why the intrinsics increase performance.

After going through the whole input buffer, I will have the maximum value somewhere in the maxval vector, but I won't know its position. For the sake of the example I am doing two redundant things here. First, using _mm_shufflelo_epi16()/_mm_shufflehi_epi16() and _mm_max_epi16(), I compare values inside the vector and rotate them so the whole vector ends up holding the maximum; since these are 16-bit values I can't shuffle the whole vector at once, so the shuffle is done separately on the high and the low halves.

Finally, I store the resulting vector in an int16_t array with _mm_store_si128() and look for the maximum inside it (I could have done this right away, but I wanted to show the shuffles, which would be more useful if the samples were not 16 bit and the shuffles were not partial).

#include <x86intrin.h>

int16_t find_max(int16_t* buff, int size)
{
    /* Assumes buff is 16-byte aligned and size (number of samples) is a multiple of 8 */
    int16_t maxmax[8] __attribute__((aligned(16)));
    int i;
    int16_t max = buff[0];

    __m128i *f8 = (__m128i*)buff;
    __m128i maxval = _mm_setzero_si128();   /* initialized to 0: wrong result if every sample is negative */
    __m128i maxval2 = _mm_setzero_si128();
    for (i = 0; i < size / 8; i++) {        /* 8 int16_t samples per 128-bit vector */
        maxval = _mm_max_epi16(maxval, f8[i]);
    }
    maxval2 = maxval;
    /* redundant shuffle/store steps, kept only to show the partial (high/low) shuffles */
    for (i = 0; i < 3; i++) {
        maxval = _mm_max_epi16(maxval, _mm_shufflehi_epi16(maxval, 0x3));
        _mm_store_si128((__m128i *)maxmax, maxval);
        maxval2 = _mm_max_epi16(maxval2, _mm_shufflelo_epi16(maxval2, 0x3));
        _mm_store_si128((__m128i *)maxmax, maxval2);
    }
    _mm_store_si128((__m128i *)maxmax, maxval);
    for (i = 0; i < 8; i++)
        if (max < maxmax[i])
            max = maxmax[i];
    return max;
}
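
If you want to try this yourself, a minimal benchmark driver along these lines should be enough (this is only a sketch, not the exact harness behind the numbers below; the array size, the timing helper and the use of rand() are illustrative assumptions):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int16_t find_max(int16_t *buff, int size);   /* the SSE version shown above */

/* naive scalar version, for comparison */
static int16_t find_max_scalar(const int16_t *buff, int size)
{
    int16_t max = buff[0];
    for (int i = 1; i < size; i++)
        if (max < buff[i])
            max = buff[i];
    return max;
}

static long elapsed_us(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_nsec - a.tv_nsec) / 1000L;
}

int main(void)
{
    const int size = 1000000;
    /* 16-byte aligned buffer so the vector loads in find_max are valid */
    int16_t *buff = aligned_alloc(16, size * sizeof(int16_t));
    for (int i = 0; i < size; i++)
        buff[i] = (int16_t)(rand() & 0x7fff);   /* pseudo-random, non-negative samples */

    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int16_t m1 = find_max_scalar(buff, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("scalar:     max=%d  %ld us\n", m1, elapsed_us(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int16_t m2 = find_max(buff, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("intrinsics: max=%d  %ld us\n", m2, elapsed_us(t0, t1));

    free(buff);
    return 0;
}

Compile with something like gcc -O2 -mavx2 bench.c find_max.c -o bench (the file names are, again, just an example) and compare the two timings.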
Some numbers

I'm going to compare the following cases, looking for the maximum value in pseudo-random arrays of 10000 and 1000000 samples:

  • Using an intuitive for loop like the one shown before
  • The same loop, but with compiler optimizations
  • Using SSE instructions via intrinsics (sample code), with and without optimizations

This table shows the total delay in us (microseconds) that the different functions take.

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Elements    Mode                Avg. Delay [us]
10000       Default                  650
10000       -O2                      250
10000       Intrinsics               200
10000       Intrinsics w/-O2          30
1000000     Default                31000
1000000     -O2                    13000
1000000     Intrinsics              8500
1000000     Intrinsics w/-O2        3500

As you can see, the code gets much faster with the Intrinsics, and even faster with the optimizations.

Afterword
  • You will see an improvement in most cases, and it gets better as the arrays get larger
  • It gets even better with optimizations (-O2)
  • There is still a difference with smaller arrays (the 10000-element ones)
  • The key is to look for repetitive operations over large arrays
  • You may lose some portability, as some instructions may not be available on every processor
  • There are transition penalties when switching between AVX and SSE, so this should be considered when mixing both

Hope you guys liked the post; please feel free to ask any questions, and I will answer them if I can.


Debugging a SIGSEGV Backtrace

Before beginning this post I need to introduce the term backtrace: a backtrace is the series of the most recent function calls in your program (see man backtrace). With a backtrace you can inspect the call stack; in other words, it tells you how you got to that point in your program.

Today, while working on some code, I got a SIGSEGV and the subsequent crash. After checking the log I found this backtrace (which was generated using backtrace()):

[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1233: SIGSEGV(11), puntero 0xc0 desde 0x7f288f7c79aa
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (0) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (1) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (2) /usr/sbin/mwconstructor[0x40a2c0]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (3) /lib64/libpthread.so.0(+0x7df5)[0x7f289173edf5]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (4) /lib64/libc.so.6(clone+0x6d)[0x7f288834c1ad]

Looking at this log you know roughly where the problem was… but which line is it?

The simplest way to debug (if you don't know this trick) is to run gdb and try to reproduce the bug, but that is not always the best option.

But wait! What do you do then?

We will search directly inside the library's .o file for the problematic line… Let's begin:

Use nm to find the function's start position in the .o file:

nm src/twsmedia_widget.o | less

You will find something like this line

000000000000cc9a T twsmedia_widget_alarm_pool_draw

0xcc9a is the start address of the twsmedia_widget_alarm_pool_draw function.

Then, add the 0x1680 offset (from twsmedia_widget_alarm_pool_draw+0x1680) to that address, which in this case gives 0xe31a.

Finally, call addr2line to look up that address in the .text section of the object:

$ addr2line -j .text -e src/twsmedia_widget.o 0x000000000000e31a
 src/twsmedia_widget.c:3139

Problem solved! Now we have the problematic line: src/twsmedia_widget.c:3139
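
By the way, if you don't feel like doing the hex addition by hand, the shell can do it for you (same object file and offset as above; this assumes a bash-like shell, which handles the hex arithmetic):

$ addr2line -j .text -e src/twsmedia_widget.o $(printf '0x%x' $((0xcc9a + 0x1680)))
 src/twsmedia_widget.c:3139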

It's important to highlight that this method won't work in some scenarios, for example with static functions (because you won't have the function names in the backtrace).

That’s all for now, see you later!

 

Build shared libraries with invalid instructions

Some background

We have to decode many videos in multiple encodings (mpeg2video, h264, etc.). The decoding process consumes a lot of CPU, and sometimes a single server decodes up to 20 Full HD videos; to make this possible we help the CPU with either Nvidia CUDA or Intel QSV.

We distribute our software in rpm packages, which are built in our R&D department and deployed to our clients, which usually means that the hardware specifications are not the same.

These days we dealt with an interesting problem: we wanted to unify our decoding library and make its compilation hardware independent. CUDA didn't pose a problem because we install its libraries on all of our systems, whether they have a video card or not. However, Intel QSV support proved to be a little more difficult… If Intel QSV is not supported by the processor, the application raises a SIGILL (Illegal Instruction), which means the processor tried to execute an invalid instruction, and execution stops.

We wanted a single package for the library that could be built on any of our processors (whether they support Intel QSV or not). Until now, if we wanted Intel QSV we had to compile the library on a specific processor that supported it, and we had a bunch of conditions in our Makefiles to detect the availability of QSV and build the library with it. Since the decoding library changes very often, new rpm packages are uploaded almost daily, so if the build CPU doesn't support QSV the package is built without it, and when it is installed on a CPU that does support it, it doesn't use it D:

Challenge accepted
Wait.. what is the challenge?

We want to run an application that may have invalid instructions in some of the libraries it includes, and decide at run time whether to call them or not…

The solution

So what did we do? First of all, we took the Intel QSV decoding part out and made a new library with it, like a wrapper. You might be wondering what this changes, because our main decoding library would still link the QSV library and should throw a SIGILL just as before.

What I didn't mention is how we link this new library: we are not linking it as a shared object at compile time, we dynamically load it during execution. How do we do it? Using dlopen() (from dlfcn.h), which allows us to open a shared object and link it during execution. By linking this way, we can check the processor at run time and load the QSV library only if it is supported; however, we have to manually bind every function we need from the library to function pointers in the main decoding library using dlsym(). To sum up (a minimal sketch follows the list):

  1. Divide the main decoding library and the QSV part
  2. Build the latter as a shared object
  3. Check in the main library whether QSV is supported
  4. Dynamically load the .so
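
To make this a bit more concrete, here is a minimal sketch of the loading logic (the library name libqsvwrap.so, the symbol qsvwrap_init and the capability check are hypothetical placeholders for illustration, not our actual API):

#include <dlfcn.h>
#include <stdio.h>

typedef int (*qsv_init_fn)(void);
static qsv_init_fn qsv_init;   /* filled in only if the wrapper gets loaded */

/* Placeholder: the real check queries the CPU/driver for QSV support */
static int cpu_supports_qsv(void)
{
    return 0;
}

static int load_qsv_wrapper(void)
{
    if (!cpu_supports_qsv())
        return -1;                          /* never touch the wrapper: no SIGILL */

    void *handle = dlopen("libqsvwrap.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    qsv_init = (qsv_init_fn)dlsym(handle, "qsvwrap_init");
    if (!qsv_init) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    return 0;                               /* from here on we can call qsv_init() safely */
}

Link the caller with -ldl; since the wrapper is only dlopen()ed after the capability check passes, a processor without QSV never executes the wrapper's constructor or any of its code.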
A few considerations

We need the headers of the QSV library because they provide the definitions of the structures, so that rpm package is always installed, whether we have QSV support or not; this way we can include the header and compile the main decoding library without errors.

It is important not to link against QSV at compile time; if we do, the application will compile, but when we try to run it a SIGILL will show up (when QSV is not supported) before main() is even executed. This happens because the library has a constructor that is called when the library is loaded.

If the wrapped library has many API functions, we will have to bind each and every one of them manually, which can be tedious .__. And take care not to load the library when it is not supported, because you will get a SIGILL immediately.

tag tag tag!

The problem

Lots of branches mean a lot of clutter. To solve this we can tag and archive them.

To keep the git repo tidy we usually recommend not leaving open branches that have already been merged into the main branch.


How to tag a branch and then close it correctly in the local and remote repos:

1 - Create the tag (locally)
     git tag archive/<branchname> <branchname>
2 - push the new tag to remote
     git push --tags
3 - delete branch locally
     git branch -d <branchname>
4 - delete branch on remote repo
     git push origin --delete <branch_name>
5 - go to master branch (or other branch)
     git checkout master

How to recover an archived branch

1 - go to tag
     git checkout archive/<branchname>
2 - recreate branch
     git checkout -b new_branch_name
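
And if at some point you want to see everything you have archived so far, you can list the tags under the archive/ prefix:

     git tag -l "archive/*"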

This is all for now! matta ne!

Hello world!

This is 3 Way Solutions' official blog, focused on technology and geeky stuff. Here you will get to know us, the Research and Development department: our perspectives, challenges, issues, and how we tackle them (or not). Most of our posts will be about TV (both analog and digital), Linux, programming, HPC, etc., and how we merge all of this to create our products.

3 Way Solutions is a company that sells products and solutions (mostly hardware), but the essence of these products is the software, so the R&D department is composed mainly of programmers. All of our systems are based on GNU/Linux, and some of our main programming languages are C, C++, Perl and Bash scripting, among others. Our products are intended mainly for broadcast, cable, professional video and government, focused on TV recording, content detection, media monitoring, content repurposing, compliance, and QoS and QoE monitoring.

All of our blog entries will be authored by our developers and we will try to keep them in English (we are based in Argentina, so our first language is Spanish; sorry in advance for our English n.nb). Without giving away our top secrets, of course, we will share tips, tricks, sample scripts and apps, different approaches, and issues we face, hoping to enrich the community and looking for fellow developers' perspectives and opinions. Thank you for reading, and we hope you like and participate in our posts.

To infinity and beyond!