Reverse ssh tunnel Part 2: Going through Windows

Preface

We wrote a post about reverse SSH tunneling, originally in Spanish (we might redo it in English… or not), so this is part 2.

The main idea of Part 1 was to set up a reverse SSH tunnel to remotely access a server in a private network when the network staff won’t let us reach it directly.

The issue

Let’s say we have a remote server we want to access (over HTTP or SSH), but it sits in a local network with no Internet access. We will call it Server_1, and its local IP is 172.20.1.1. Then we have our own PC, called Local_PC, which does have an Internet connection, with public IP 200.100.1.1. We can’t reach Server_1 directly from Local_PC, since Server_1 is not reachable from the Internet.

Tunneling through anything

When we need to access the server, the client in the remote network gives us a TeamViewer connection through a laptop running Windows. What can we do to reach Server_1 directly from our Local_PC running our beloved Linux?

Principle of least effort
The solution

Putty

Yes, we can use PuTTY on the laptop to set up a reverse SSH tunnel that exposes Server_1 on Local_PC, but how?

Open PuTTY on the laptop and go to Connection -> SSH -> Tunnels. In Source port enter <remote_port>, the port that will be opened on Local_PC (200.100.1.1); be sure not to use a port that is already taken. In Destination enter 172.20.1.1:22 for SSH, 172.20.1.1:80 for HTTP, or whatever port you need. Switch the option from Local to Remote, since we are doing a reverse tunnel, and click Add. Finally, go back to the Session section, connect to Local_PC via SSH, and voilà!

Putty <3 (Note the “Remote” checkbox)

You will be able to access Server_1 from Local_PC simply by running:

ssh user@localhost -p <remote_port>

You can add more tunnels to a single connection, so you can tunnel an SSH connection and an HTTP one at the same time.

13 Reasons Why… Vim

Hello, do you have a moment to talk about Vim?
Today we bring to you THE ultimate text editor: Vim

Before Vim our life was sad, really sad. Then we found it, and our life was sad and complicated :(. But then we learnt how to use it, and now we are ridiculously happy!! (Except for the designer: he uses Windows and can’t use Vim like us, so he is still sad.)
So, why would you want to use it? Let’s see:

1 – If you don’t have money to buy a mouse, don’t worry, you don’t need one!
2 – It doesn’t consume 130% of RAM like other editors.
3 – It’s stable. Vim is just… never gonna give you up, never gonna let you down. Really, no crashing.
4 – You will find Vi (Vim’s father) included in just about any Unix-derived system.
5 – Ideal when you are ssh-ing into a Linux server or something similar, where you don’t even need X.
6 – Works fine even over a poor SSH or network connection.
7 – It’s customizable and extensible; you can personalize it as you see fit.
8 – It’s free!
9 – Wherever you go, you can easily take your Vim configuration with you.
10 – Very nice documentation; you just need to use “:help”.
11 – Vim is really powerful and incredibly fast once you get past the initial learning curve (which is really steep, don’t give up halfway!).
12 – It has regex!
13 – Vim is awesome! (Yes, that’s a reason.)

 

 

Join the Vim side of the Force …

htop: uptime (!)

While doing a routine check on a client’s machines, I noticed something that had never caught my attention until now…

When opening our beloved and user-friendly htop

A typical htop with more than 40 execution threads; the image had to be cropped a bit

In the image we can see that, next to the not-so-small figure of 322 days of uptime, there is a (!). It seems it was always there, but I never paid attention to it. Digging a bit into what it means, I ended up in the htop source code, where we can see:

static void UptimeMeter_updateValues(Meter* this, char* buffer, int len) {
   int totalseconds = Platform_getUptime();
   if (totalseconds == -1) {
      snprintf(buffer, len, "(unknown)");
      return;
   }
   int seconds = totalseconds % 60;
   int minutes = (totalseconds/60) % 60;
   int hours = (totalseconds/3600) % 24;
   int days = (totalseconds/86400);
   this->values[0] = days;
   if (days > this->total) {
      this->total = days;
   }
   char daysbuf[15];
   if (days > 100) {
      snprintf(daysbuf, sizeof(daysbuf), "%d days(!), ", days);
   } else if (days > 1) {
      snprintf(daysbuf, sizeof(daysbuf), "%d days, ", days);
   } else if (days == 1) {
      snprintf(daysbuf, sizeof(daysbuf), "1 day, ");
   } else {
      daysbuf[0] = '\0';
   }
   snprintf(buffer, len, "%s%02d:%02d:%02d", daysbuf, hours, minutes, seconds);
}

And there we realize it was just htop congratulating us(?) for having passed the milestone of 100 days of uptime 😛

Saving compilation time with make -j

When compiling a library, a program or the Linux kernel itself, the make command by default runs a single job at a time, so it uses only one logical processor (thread) of the system.
To speed things up we can use the -j option and indicate how many logical processors we want to use.

Example:

make -j 8

How do I know how many logical processors or threads I have available?

grep -c ^processor /proc/cpuinfo
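
If a program or build helper needs that number itself, a minimal C sketch could ask the system through sysconf(). The helper name online_cpus is made up for illustration, and _SC_NPROCESSORS_ONLN is a widely available extension (glibc, the BSDs, macOS) rather than something every platform guarantees, so treat it as an assumption about the build machine:

```c
#include <unistd.h>

/* Hypothetical helper: number of logical processors currently online,
   or 1 if the system cannot report it. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The returned value could then be used as the argument for make -j.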

In the following video I will show you how to compile the kernel on a Dell server with two Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz processors.

We take this opportunity to invite you to take part in the 2017 edition of NAB SHOW in Las Vegas.

You can register at the following link: http://www.3way.com.ar/nab2017.php

Remote access through a reverse SSH tunnel when there is no access

How many times have you wanted SSH access to a server inside a company’s private network, and couldn’t get the network admin to set up a NAT to the SSH port (22) of that server?

Sometimes, because of security policies or configuration problems, they can’t forward a public port to port 22 of the server we want to reach.

There is a solution; don’t be afraid of falling into the hands of an unlicensed TeamViewer.

If Muhammad won’t go to the mountain, the mountain will come to Muhammad.
It is possible to set up a reverse SSH connection (or SSH tunnel).

How?

We only need to access the server once (on installation day, or by asking someone to run a single line), have an SSH server running on the PC we want to connect from, and know that PC’s public IP.
Knowing this, we only need to run one SSH command on the server:

ssh -N -f -R {destination_port}:localhost:22 {our_public_ip}

E.g.: ssh -N -f -R 22022:localhost:22 200.142.168.151

This command connects via SSH to our PC (it asks for a login on our PC) and leaves a tunnel bound to localhost on our PC, so that later we can connect to the server with this simple command:

E.g.: ssh -p 22022 root@localhost

This way we have a persistent tunnel to our PC and can connect without anyone having to forward port 22 on the router.

* The server must have its outbound Internet access configured correctly.
* An SSH server must be running on the public IP we use to reach the Internet.
* The remote command could be configured to run at startup with an SSH key, so that the connection is re-established automatically if the server reboots.

When clipping a video, mind the GOP and hope the I-frame is an IDR-frame

A picture is worth a thousand words, but if it is a B-frame it may be worth two hundred words.

Mind the GOP

One of our products, ViDeus Auditor, lets you clip and join videos, showing a preview before doing the actual clipping. To do that, we have to understand how an encoded video is composed. We usually work with H264.

When encoding each video picture you can get an I-frame, a P-frame or a B-frame.

  • The I-frame is the easy one: all the information for decoding the picture is within the I-frame.
  • The P-frame needs previously decoded pictures in order to be decoded, so it uses information from old pictures.
  • The B-frame needs decoded pictures from the past and from the future, so it uses information from old and future pictures.

For example, you can get something like this:

I B B P B B P B B P

The I-frame can be decoded right away; the second frame (a B-frame) needs information from previous frames (the I-frame, for example) and from future frames (like the P-frame).

As B-frames may need information from later frames, the stream is rearranged for decoding so that, by the time a B-frame is decoded, everything it needs is already there. So the frames are usually transmitted like this:

I P B B P B B P B B

As a result, the rearranged frames have a decoding time-stamp (DTS) that is earlier than their presentation time-stamp (PTS).
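
The reordering above can be sketched in a few lines of C. This is a simplified model, not what a real encoder does: it assumes every B-frame only needs the nearest I- or P-frame on each side, and the function name display_to_decode_order is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turns a display-order string of frame types
   ('I', 'P', 'B') into decode order. `decode` must have room for
   strlen(display) + 1 bytes. */
void display_to_decode_order(const char *display, char *decode)
{
    size_t n = strlen(display);
    size_t out = 0;
    size_t pending_b = 0;   /* B-frames waiting for their future reference */

    for (size_t i = 0; i < n; i++) {
        if (display[i] == 'B') {
            pending_b++;                 /* cannot be decoded yet */
            continue;
        }
        decode[out++] = display[i];      /* the I- or P-frame goes first... */
        while (pending_b > 0) {          /* ...then the B-frames that needed it */
            decode[out++] = 'B';
            pending_b--;
        }
    }
    while (pending_b > 0) {              /* trailing B-frames with no later anchor */
        decode[out++] = 'B';
        pending_b--;
    }
    decode[out] = '\0';
}
```

Feeding it the display order "IBBPBBPBBP" yields "IPBBPBBPBB", the transmission order shown above.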

That series of frames can be a GOP (Group of Pictures). A video is composed of a series of GOPs, each starting with an I-frame. This would be three GOPs:

I B B P B B P B B P I B B P B B P B B P I B B P B B P B B P

As the I-frame doesn’t need any other information for decoding, that’s a good point for fast-clipping a video because all the information for decoding is within it; clipping a video in the middle of a GOP (when it’s not an I-frame) will most likely result in a corrupt output for a while until a new full GOP is decoded.

IDR-frame

But clipping a video at a GOP start will not always result in a clean output.

The I-frame at the beginning will certainly be decoded fine; it doesn’t need anything special. However, the following B-frames and P-frames will probably need previous frames in order to be decoded correctly. Sometimes those frames are within the same GOP, which is useful, but sometimes they are outside it, which is bad for clipping: it means they reference pictures that come before the I-frame where we cut the video, resulting in a corrupt output.

When frames from a GOP reference frames from another GOP, it’s called an Open GOP. If not, it is called a Closed GOP.

Hopefully, the video was encoded with IDR-frames. Those are a special case of I-frame: apart from being an I-frame, the IDR-frame ensures that the following frames will not reference any frame before the IDR.
In a GOP the IDR-frame replaces the I-frame. All IDR-frames are I-frames, but not all I-frames are IDR-frames.

So, if an IDR is found that’s a good place for clipping, because that frame will be decoded without any other information and all the following frames will not require information from before the IDR-frame.
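
As a rough illustration of how such a cut point could be located, here is a minimal C sketch that scans an H264 elementary stream for the first IDR slice. It assumes the Annex B byte-stream layout (NAL units separated by 00 00 01 start codes); files in MP4 or MOV store NAL units with length prefixes instead and would need different parsing. The function name find_first_idr is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NAL_TYPE_IDR_SLICE 5   /* nal_unit_type of an IDR slice in H264 */

/* Hypothetical helper: returns the offset of the NAL header byte of the
   first IDR slice in an Annex B byte stream, or -1 if none is found. */
long find_first_idr(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        /* Look for the 00 00 01 start code (the 4-byte form 00 00 00 01
           ends with the same three bytes). */
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01) {
            uint8_t nal_unit_type = buf[i + 3] & 0x1F;   /* low 5 bits */
            if (nal_unit_type == NAL_TYPE_IDR_SLICE)
                return (long)(i + 3);
            i += 2;   /* skip the rest of the start code */
        }
    }
    return -1;
}
```

A clipper built on this idea would start copying from the start code that precedes the returned offset; a real tool also has to carry over the parameter sets the decoder needs, which this sketch leaves out.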

Next time you want to clip a video, mind the GOP and find an IDR-frame.

Using GCC Intrinsics (MMX, SSEx, AVX) to look for max value in array

To begin with, you shouldn’t start new code by focusing on performance; functionality should be the key factor, while leaving room for future improvements. But well… once the job is done and everything works as it should, you might need to tweak your code a little to increase its performance.

First functionality then efficiency
The problem

We want to find the maximum value in an array of int16_t mono audio samples. That maximum is the peak value during the audio interval being analysed. This peak is known as the sample peak and should not be confused with the real peak of the audio, which is the true peak (there is a very good explanation about the differences here).

A basic(?) solution

Ok, we have to look for the maximum value in an array and we focus on functionality, this is quite simple actually…

int16_t max = buff[0];
for (i = 1; i < size; i++) {
    if (max < buff[i]) {
        max = buff[i];
    }
}

The intrinsics

These are a series of functions that expose many MMX, SSE and AVX instructions. They map directly from C functions to instructions and are further optimized by gcc. Most of the instructions use vector operations, and you can work with 128, 256 or 512 bit vectors depending on the architecture and the compiler. There is a very detailed guide here and you can see a full list of the functions here.

You will need to include the headers for the functions you want to call, or just include x86intrin.h, which pulls in all the available ones. Then you will need to add the appropriate flag to the gcc command line; in this specific case I’m using -mavx2. To check which instruction sets are enabled, you can use the following command, which lists the corresponding macros that gcc defines.

$ gcc -mavx2 -dM -E - < /dev/null | egrep "SSE|AVX"
#define __AVX__ 1
#define __AVX2__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

The code

What I’m going to do is compare two 128-bit vectors: one holds the running maximum (initialized to 0, so if all the values are negative the result will be wrong) and the other is loaded from the input buffer. This is done with _mm_max_epi16(), which compares 8 values (int16_t) at a time. This is one of the reasons why the intrinsics increase performance.

After going through the whole input buffer, the maximum is somewhere in the maxval vector, but I don’t know in which position. For the sake of the example I am doing two redundant things here. First, using _mm_shufflelo_epi16()/_mm_shufflehi_epi16() together with _mm_max_epi16(), I compare values inside the vector and rotate them so the maximum spreads across the positions. With 16-bit values there is no single shuffle over the whole vector, so it is done on the high half and the low half separately.

Finally I store the resulting vector in an int16_t array with _mm_store_si128() and look for the maximum inside it (I could have done this before, but I wanted to show the shuffle, which might be useful if the samples were not 16 bit and the shuffles were not partial).

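#include <stdint.h>
#include <x86intrin.h>

int16_t find_max(const int16_t *buff, int size)
{
    /* _mm_store_si128() needs a 16-byte aligned destination */
    int16_t maxmax[8] __attribute__((aligned(16)));
    int i;
    int16_t max = buff[0];
    int nvec = size / 8;    /* a 128-bit vector holds 8 int16_t samples */

    const __m128i *f8 = (const __m128i *)buff;
    /* Starting from zero keeps the example simple, but the result is
       wrong (0) when every sample is negative. */
    __m128i maxval = _mm_setzero_si128();

    /* Element-wise maximum, 8 samples per iteration. The unaligned load
       is used because buff is not guaranteed to be 16-byte aligned. */
    for (i = 0; i < nvec; i++) {
        maxval = _mm_max_epi16(maxval, _mm_loadu_si128(&f8[i]));
    }

    /* Rotate each 64-bit half by one 16-bit position (0x39) and keep the
       maximum; after 3 rounds every position holds the maximum of its half. */
    for (i = 0; i < 3; i++) {
        maxval = _mm_max_epi16(maxval, _mm_shufflehi_epi16(maxval, 0x39));
        maxval = _mm_max_epi16(maxval, _mm_shufflelo_epi16(maxval, 0x39));
    }

    _mm_store_si128((__m128i *)maxmax, maxval);
    for (i = 0; i < 8; i++)
        if (max < maxmax[i])
            max = maxmax[i];

    /* Samples left over when size is not a multiple of 8 */
    for (i = nvec * 8; i < size; i++)
        if (max < buff[i])
            max = buff[i];
    return max;
}

Some numbers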

I’m going to compare 4 different cases, finding the maximum value in (pseudo) random arrays of 10000 and 1000000 samples.

  • Using an intuitive for loop like the one shown before
  • The same loop, but with compiler optimizations
  • Using SSE instructions via intrinsics (sample code)
  • The same intrinsics code with compiler optimizations

This table shows the average time, in microseconds (us), that each variant takes.

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Elements   Mode               Avg. Delay [us]
10000      Default            650
10000      -O2                250
10000      Intrinsics         200
10000      Intrinsics w/-O2   30
1000000    Default            31000
1000000    -O2                13000
1000000    Intrinsics         8500
1000000    Intrinsics w/-O2   3500

As you can see, the code gets much faster with the Intrinsics, and even faster with the optimizations.

Afterword

  • You will see an improvement in most cases, and consider that it gets better when the arrays are larger
  • It gets even better with optimizations (-O2)
  • There is still a difference with smaller arrays (10000 element ones)
  • The key factor is to look for repetitive operations in large arrays
  • You may lose some portability, as some functions may not be available in every microprocessor
  • There are some transition penalties when switching between AVX and SSE, so when mixing both this should be considered

Hope you guys liked the post, please feel free to ask any questions and if I can/know, I will answer you.


Debugging a SIGSEGV Backtrace

Before I begin, I need to introduce the term backtrace: a backtrace is the series of the most recent function calls in your program (see man backtrace). With a backtrace you can inspect the call stack; in other words, how you got to that point in your program.

Today, while working on some code, I got a SIGSEGV and the obvious subsequent crash. After checking the log I found this backtrace (which was produced using backtrace()):

[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1233: SIGSEGV(11), puntero 0xc0 desde 0x7f288f7c79aa
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (0) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (1) /usr/lib64/twsmedia/libtwsmedia.so(twsmedia_widget_alarm_pool_draw+0x1680)[0x7f288f7c79aa]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (2) /usr/sbin/mwconstructor[0x40a2c0]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (3) /lib64/libpthread.so.0(+0x7df5)[0x7f289173edf5]
[17101 XX 12:17:05 (+6)][23417] {sigsafe} src/common.c@1252: [bt]: (4) /lib64/libc.so.6(clone+0x6d)[0x7f288834c1ad]

Looking at this log you know which function had the problem… but which line is it?

The simplest way to debug (if you don’t know this trick) is to run gdb and try to reproduce the bug, but that is not always a good option.

But wait! What can we do then?

We will search for the problematic line directly inside the library’s .o file. Let’s begin:

Use nm to find the function’s start address in the .o file

nm src/twsmedia_widget.o | less

You will find a line like this:

000000000000cc9a T twsmedia_widget_alarm_pool_draw

0xcc9a is the start address of the twsmedia_widget_alarm_pool_draw function within the object file.

Then add the 0x1680 offset (from twsmedia_widget_alarm_pool_draw+0x1680) to that address, which in this case gives 0xcc9a + 0x1680 = 0xe31a.

Finally, call addr2line to look up that address in the .text section of the object:

$ addr2line -j .text -e src/twsmedia_widget.o 0x000000000000e31a
 src/twsmedia_widget.c:3139

And problem solved! Now we have the problematic line: src/twsmedia_widget.c:3139

It’s important to highlight that this method won’t work in some scenarios, such as static functions (because their names won’t appear in the backtrace).
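
The manual steps (read the symbol and offset from the frame, add the offset to the value reported by nm) could also be automated. The following C sketch is only an illustration under the assumption that frames look like the ones in the log above, "path(symbol+0xOFFSET)[0xADDRESS]"; the helper names parse_bt_frame and object_address are made up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extracts "symbol" and the hex offset from a frame
   printed as "path(symbol+0xOFFSET)[0xADDRESS]". Returns 0 on success,
   -1 if the frame carries no symbol name. */
int parse_bt_frame(const char *frame, char *symbol, size_t symbol_len,
                   unsigned long *offset)
{
    const char *open = strchr(frame, '(');
    const char *plus = open ? strchr(open, '+') : NULL;
    const char *close = plus ? strchr(plus, ')') : NULL;

    if (!close || plus == open + 1)      /* no "(...)" part or empty symbol */
        return -1;

    size_t name_len = (size_t)(plus - open - 1);
    if (name_len >= symbol_len)
        return -1;
    memcpy(symbol, open + 1, name_len);
    symbol[name_len] = '\0';

    *offset = strtoul(plus + 1, NULL, 16);   /* base 16 accepts the "0x" prefix */
    return 0;
}

/* Address to hand to addr2line: the symbol's value from nm plus the offset. */
unsigned long object_address(unsigned long nm_value, unsigned long offset)
{
    return nm_value + offset;
}
```

For frame (0) of the log this gives the symbol twsmedia_widget_alarm_pool_draw and offset 0x1680, and object_address(0xcc9a, 0x1680) is 0xe31a, the value passed to addr2line above. Frames without a symbol name, like (2) and (3), are rejected.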

That’s all for now, see you later!

 

Codec vs Format: Part 1

When talking about video formats, the concept of codec is commonly confused with that of format.

The difference is simple: codec refers to the type of algorithm used to compress video (or audio, subtitles), while the word format usually refers to the combination of the transport (or container) used to store audio and video plus the codecs used to compress them.

Format = Transport + codecs

Example: XDCAM is a format that uses the MXF transport, the MPEG2Video video codec and PCM audio

There are no conventions for every possible combination, so in general the format refers to the transport and commonly shows up in the file extension.

For example: MP4. When people talk about the MP4 format, it is usually assumed that the transport is MP4, the video codec is H264 and the audio codec is AAC, but the codecs could be others as well.
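
One way to picture "Format = Transport + codecs" is as a small data structure. This C sketch is purely illustrative: the struct media_format and the helpers same_transport and same_codecs are invented for the example and do not come from any library.

```c
#include <string.h>

/* A "format" modeled as a transport (container) plus its codecs. */
struct media_format {
    const char *transport;    /* e.g. "MXF", "MP4" */
    const char *video_codec;  /* e.g. "MPEG2Video", "H264" */
    const char *audio_codec;  /* e.g. "PCM", "AAC" */
};

/* Hypothetical helper: 1 if both files use the same transport. */
int same_transport(const struct media_format *a, const struct media_format *b)
{
    return strcmp(a->transport, b->transport) == 0;
}

/* Hypothetical helper: 1 if both files use the same video and audio codecs. */
int same_codecs(const struct media_format *a, const struct media_format *b)
{
    return strcmp(a->video_codec, b->video_codec) == 0 &&
           strcmp(a->audio_codec, b->audio_codec) == 0;
}
```

Two files described as {"MP4", "H264", "AAC"} and {"MP4", "MPEG2Video", "AAC"} share the transport but not the codecs, which is why calling both of them just "MP4" says little about whether a player can decode them.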

Examples of transports:
MP4, AVI, MOV, MKV, MPEG-TS, OGG, WMV

Examples of codecs:
H264, MPEG4, WMV, MPEG2-VIDEO, AAC, AC3

Not every transport can contain every codec, and it may happen that a player recognizes the transport but not some of the codecs.

Think of the transport precisely as a means of cargo transport (plane, train, bus) and of the codec as the kinds of cargo that could travel inside. In general they are independent of each other, with some exceptions where certain transports are exclusive to certain codecs, such as FLV.

AVI and MP4 are transports.

As I see it, AVI could be an old, limited and unsafe transport like this train:

[Image: train]

and MP4 could be something like this:

[Image: train]

 

Is WMV a codec or a format?

WMV is the name of a format but also the name of a codec, both created by Microsoft. As a format it can only contain the WMV video codec and the WMA audio codec. The WMV video codec, in turn, can be contained inside an AVI format.

When someone says they have an MP4 file, you don’t actually know which video codec is inside. A number of codecs are supported, such as MPEG2VIDEO and H264, but the person is really only referring to the transport.

So it is very important to know that when we talk about format we are generally talking about the transport, and that inside it there are videos and/or audios compressed with some codec.

Showing what the threads are doing, trying not to interrupt the process

Humans are curious, perhaps that is what makes us human[1], and you might be curious about what your program is doing.
Sometimes the program has several threads running, and sometimes you can neither stop it completely nor kill it to see what it is doing.

One such situation is a service running at a client’s site that has some problem: you suspect that one of its several threads is locked or waiting for something that will never happen, but all the other threads look like they are running fine, so you don’t want to interrupt or kill the process for now.

Don’t stop me now

What I do is use gdb with a file containing the commands I want to run, ending with the ‘q’ (quit) command, which makes gdb quit so the process can continue its execution. I usually write a file called ‘commands’ with this:

thread apply all bt
q

That runs ‘bt’ (backtrace) on every thread and then ‘q’ quits gdb. Printing the backtrace of all threads shows me (more or less) what each thread is executing.
Using the ‘commands’ file I run gdb like this:

gdb -p <pid> -x commands > /tmp/threads

where <pid> is the process ID.
Notice that I redirect the output to a file. Otherwise gdb pauses the output when the console is full and waits for me to press ‘return’, which keeps the process stopped for a while, something I don’t want to happen.

After looking at the threads, I can write another ‘commands’ file with other instructions for gdb, like printing some variable.

[1]:
[…]the ability to ask questions is probably the central cognitive element that distinguishes human and animal cognitive abilities[…] (https://en.wikipedia.org/wiki/Great_ape_language#Limitations_of_ape_language)