When clipping a video, mind the GOP and hope the I-frame is an IDR-frame

A picture is worth a thousand words, but if it is a B-frame it may be worth two hundred words.

Mind the GOP

One of our products, ViDeus Auditor, lets you clip and join videos, showing a preview before doing the actual clipping. To do that, we have to understand how an encoded video is structured. We usually work with H264.

When a video is encoded, each picture becomes an I-frame, a P-frame or a B-frame.

  • The I-frame is the easy one: all the information needed to decode the picture is within the I-frame itself.
  • The P-frame needs previously decoded pictures in order to be decoded, so it uses information from earlier pictures.
  • The B-frame needs decoded pictures from both the past and the future, so it uses information from earlier and later pictures.

For example, in display order you can get something like this:

I B B P B B P B B P

The I-frame can be decoded on its own. The second frame (a B-frame) then needs information from previous frames (the I-frame, for example) and from future frames (such as the P-frame).

Because B-frames may need information from frames that come after them, the stream is rearranged for decoding so that, by the time a B-frame is decoded, everything it needs has already arrived. So the frames are usually transmitted like this:

I P B B P B B P B B

As a result, the frames that were moved forward are decoded before they are displayed, so their decoding time-stamp (DTS) is lower than their presentation time-stamp (PTS).
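The reordering can be sketched in a few lines of Python. This is a simplified model, not how any particular encoder or muxer does it: it assumes each B-frame depends only on the nearest I- or P-frame on either side and that every frame lasts one tick. The helper names (`to_decode_order`, `timestamps`) are made up for illustration.

```python
def to_decode_order(display_types):
    """Reorder frame types from display order to decode order.

    Simplifying assumption: a B-frame only needs the nearest I/P frame
    before it and the nearest I/P frame after it.
    Returns a list of (display_index, frame_type) in decode order.
    """
    decode, pending_b = [], []
    for idx, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append((idx, kind))   # wait for the next I/P frame
        else:
            decode.append((idx, kind))      # send the reference frame first
            decode.extend(pending_b)        # then the B-frames that needed it
            pending_b = []
    decode.extend(pending_b)                # trailing B-frames, if any
    return decode


def timestamps(decode):
    """Assign (frame_type, dts, pts) with one tick per frame.

    PTS is the display index. DTS follows decode order, shifted back by
    the largest reordering delay so that no frame has DTS > PTS.
    """
    delay = max(pos - idx for pos, (idx, _) in enumerate(decode))
    return [(kind, pos - delay, idx) for pos, (idx, kind) in enumerate(decode)]


display = "I B B P B B P B B P".split()
decode = to_decode_order(display)
print(" ".join(kind for _, kind in decode))   # I P B B P B B P B B
for kind, dts, pts in timestamps(decode):
    print(kind, "DTS", dts, "PTS", pts)
```

In this toy model the P-frames end up with a DTS well below their PTS, while the B-frames are decoded and shown at the same tick.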

That series of frames can be a GOP, a Group of Pictures. A video is composed of a series of GOPs, each one starting with an I-frame. This would be three GOPs:

I B B P B B P B B P I B B P B B P B B P I B B P B B P B B P

Since the I-frame doesn’t need any other information to be decoded, it is a good point for fast-clipping a video. Clipping in the middle of a GOP (at a frame that is not an I-frame) will most likely produce corrupt output for a while, until a new full GOP is decoded.
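As a rough illustration, moving a requested cut point back to the start of its GOP could look like this. It is only a sketch working on a list of frame types in display order; `gop_starts` and `snap_to_gop_start` are hypothetical helpers, not part of any product or library.

```python
def gop_starts(frame_types):
    """Indices of the I-frames, i.e. where each GOP begins."""
    return [i for i, kind in enumerate(frame_types) if kind == "I"]


def snap_to_gop_start(frame_types, requested):
    """Move a requested cut point back to the I-frame that opens its GOP."""
    starts = [i for i in gop_starts(frame_types) if i <= requested]
    if not starts:
        raise ValueError("no I-frame at or before the requested frame")
    return starts[-1]


gop = "I B B P B B P B B P".split()
video = gop * 3                      # three GOPs of ten frames each
print(gop_starts(video))             # [0, 10, 20]
print(snap_to_gop_start(video, 13))  # 10
```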

IDR-frame

However, clipping a video at the start of a GOP will not always result in clean output.

The I-frame at the beginning will certainly decode fine, since it doesn’t need anything else. However, the B-frames and P-frames that follow will probably need other frames in order to be decoded correctly. Sometimes those frames are within the same GOP, which is useful. But sometimes they are outside the GOP, which is bad for clipping: it means they reference pictures that come before the I-frame where we cut the video, resulting in corrupt output.

When frames from one GOP reference frames from another GOP, it is called an Open GOP. Otherwise, it is called a Closed GOP.
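The distinction can be expressed as a small check. The sketch below assumes we already know which frames each frame references. A real decoder gets that from the bitstream, so the `Frame` class and its `refs` list are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    kind: str                                 # "I", "P" or "B"
    refs: list = field(default_factory=list)  # indices of the frames it references


def gop_is_closed(frames, start, end):
    """True if every reference made inside frames[start:end] stays inside it."""
    return all(start <= ref < end
               for frame in frames[start:end]
               for ref in frame.refs)


frames = [
    Frame("I"),              # 0: first GOP starts here
    Frame("B", [0, 2]),      # 1
    Frame("P", [0]),         # 2
    Frame("I"),              # 3: second GOP starts here
    Frame("B", [2, 5]),      # 4: references frame 2, before the I-frame
    Frame("P", [3]),         # 5
]
print(gop_is_closed(frames, 0, 3))  # True
print(gop_is_closed(frames, 3, 6))  # False
```

Cutting at frame 3 would leave frame 4 without one of its references, which is exactly the Open GOP problem.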

Hopefully, the video was encoded with IDR-frames. These are a special case of I-frame: apart from being an I-frame, the IDR-frame guarantees that the frames following it will not reference any frame before the IDR.
In such a GOP the IDR-frame replaces the I-frame. All IDR-frames are I-frames, but not all I-frames are IDR-frames.

So if you find an IDR-frame, that is a good place for clipping: that frame can be decoded without any other information, and none of the following frames will require information from before it.
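One way to look for IDR-frames in a raw H264 stream is to scan its NAL units: in H264, a NAL unit whose type is 5 carries a slice of an IDR picture. The sketch below assumes an Annex B byte stream, where NAL units are separated by start codes. Video stored inside MP4 normally uses length prefixes instead, so this would not work there as-is. The function names are made up for illustration.

```python
IDR_SLICE = 5      # nal_unit_type of a coded slice of an IDR picture
START_CODE = b"\x00\x00\x01"


def nal_units(data):
    """Yield (offset, nal_unit_type) for each NAL unit in an Annex B stream."""
    pos = data.find(START_CODE)
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]            # first byte after the start code
        yield pos, header & 0x1F          # low 5 bits hold nal_unit_type
        pos = data.find(START_CODE, pos + 3)


def idr_offsets(data):
    """Byte offsets of the start codes that introduce IDR slices."""
    return [off for off, kind in nal_units(data) if kind == IDR_SLICE]


# Tiny hand-made stream: SPS (7), PPS (8), IDR slice (5), non-IDR slice (1).
sample = (b"\x00\x00\x00\x01\x67\xaa" b"\x00\x00\x00\x01\x68\xbb"
          b"\x00\x00\x01\x65\xcc" b"\x00\x00\x01\x41\xdd")
print([kind for _, kind in nal_units(sample)])  # [7, 8, 5, 1]
print(idr_offsets(sample))                      # [12]
```

A picture can be split into several slices, so several consecutive type-5 NAL units may belong to the same IDR-frame.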

Next time you want to clip a video, mind the GOP and find an IDR-frame.

Codec vs Format: Part 1

When we talk about video formats, the concept of codec is commonly confused with that of format.

The difference is simple: the codec refers to the type of algorithm used to compress the video (or audio, or subtitles), while the word format usually refers to the combination of the transport (or encapsulation, often called the container) used to store the audio and video plus the codecs used to compress them.

Format = Transport + codecs

Example: XDCAM is a format that uses the MXF transport, the MPEG2Video video codec and PCM audio.

There are no conventions for every possible combination, so in general the format refers to the transport, and it usually shows up in the file extension.

For example: MP4. When people talk about the MP4 format, it is usually assumed that the transport is MP4, the video codec is H264 and the audio codec is AAC, but the codecs could be others as well.
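The idea "Format = Transport + codecs" can be written down as a small data structure. This is only an illustrative sketch: `MediaFormat` and `transport_from_name` are hypothetical names, and the only combinations used are XDCAM and the typical MP4 pairing.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MediaFormat:
    transport: str      # the container, e.g. "MXF" or "MP4"
    video_codec: str
    audio_codec: str


# XDCAM and the combination usually assumed for MP4.
XDCAM = MediaFormat("MXF", "MPEG2Video", "PCM")
TYPICAL_MP4 = MediaFormat("MP4", "H264", "AAC")


def transport_from_name(filename):
    """Guess the transport from the extension; says nothing about codecs."""
    return Path(filename).suffix.lstrip(".").upper()


print(transport_from_name("clip.mp4"))  # MP4
print(XDCAM)
```

Note that the extension only gives the first field. The two codec fields have to be read from the file itself.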

Examples of transports:
MP4, AVI, MOV, MKV, MPEG-TS, OGG, WMV

Examples of codecs:
H264, MPEG4, WMV, MPEG2-VIDEO, AAC, AC3

Not every transport can contain every codec, and it can happen that a player recognizes the transport but not some of the codecs.

Think of the transport precisely as a means of cargo transport (plane, train, bus) and of the codec as the kinds of cargo that could travel inside it. In general they are independent of one another, with a few exceptions where certain transports are exclusive to certain codecs, such as FLV.

AVI and MP4 are transports.

As I see it, AVI would be an old, limited and unsafe transport, like this train:

[Image: train]

and MP4 would be something like this:

[Image: train]

Is WMV a codec or a format?

WMV is the name of a format but also the name of a codec, both created by Microsoft. As a format, it can only contain the WMV video codec and the WMA audio codec. The WMV video codec, in turn, can be contained inside an AVI format.

When someone says they have a file in MP4 format, you don’t actually know which video codec it contains. A number of codecs are supported, such as MPEG2VIDEO and H264, but the person is really only referring to the transport.

So it is very important to know that when we talk about format we are generally talking about the transport, and that inside it there are video and/or audio streams compressed with some codec.
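The fact that not every transport accepts every codec can be modelled as a lookup table. The sketch below is deliberately incomplete: it only lists the combinations mentioned in this post, so it should not be read as a full compatibility chart. `ALLOWED` and `unsupported_codecs` are names invented for the example.

```python
# Which codecs each transport may carry. Deliberately incomplete: it only
# lists combinations mentioned in this post.
ALLOWED = {
    "MP4": {"H264", "MPEG2VIDEO", "AAC"},
    "WMV": {"WMV", "WMA"},
    "AVI": {"WMV"},
    "MXF": {"MPEG2VIDEO", "PCM"},
}


def unsupported_codecs(transport, codecs):
    """Return the codecs that the table does not list for this transport."""
    return sorted(set(codecs) - ALLOWED.get(transport, set()))


print(unsupported_codecs("WMV", ["WMV", "WMA"]))   # []
print(unsupported_codecs("WMV", ["H264", "AAC"]))  # ['AAC', 'H264']
```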

Hello world!

This is the official blog of 3 Way Solutions, focused on technology issues and geeky stuff. Here you will get to know us, the Research and Development department: our perspectives, challenges and issues, and how we tackle them (or not). Most of our posts will be about TV (both analog and digital), Linux, programming, HPC, etc., and how we merge all of this to create our products.

3 Way Solutions is a company that sells products and solutions (mostly hardware), but the essence of these products is the software, so the R&D department is composed mainly of programmers. All of our systems are based on GNU/Linux, and our main programming languages include C, C++, Perl and Bash, among others. Our products are intended mainly for broadcast, cable, professional video and government, and focus on TV recording, content detection, media monitoring, content repurposing, compliance, and QoS and QoE monitoring.

All of our blog entries will be authored by our developers, and we will try to keep them in English (we are based in Argentina, so our first language is Spanish; sorry in advance for our English n.nb). Without giving away our top secrets, of course, we will share tips, tricks, sample scripts and apps, different approaches, and issues we face, hoping to enrich the community and looking for fellow developers’ perspectives and opinions. Thank you for reading, and we hope you like our posts and take part in them.

To the infinity and beyond