vlc/doc/developer/decoders.xml

<chapter> <title> How to write a decoder </title>

  <sect1> <title> What is precisely a decoder in the VLC scheme ? </title>

    <para>
The decoder does the mathematical part of the process of playing a
stream. It is separated from the demultiplexers (in the input module),
which manage packets to rebuild a continuous elementary stream, and from
the output thread, which takes samples reconstituted by the decoder
and plays them. Basically, a decoder has no interaction with devices,
it is purely algorithmic.
    </para>

    <para>
In the next section we will describe how the decoder retrieves the
stream from the input. The output API (how to say "this sample is
decoded and can be played at xx") will be talked about in the next
chapters.
    </para>

  </sect1>

  <sect1> <title> Decoder configuration </title>

    <para>
The input thread spawns the appropriate decoder modules from <filename>
src/input/input_dec.c</filename>. The <function>Dec_CreateThread</function>
function selects the more accurate decoder module. Each decoder module
looks at decoder_config.i_type and returns a score [ see the modules
section ]. It then launches <function> module.pf_run()</function>,
with a <type>decoder_config_t</type>, described in <filename>
include/input_ext-dec.h</filename>.
    </para>

    <para>
The generic <type>decoder_config_t</type> structure, gives the decoder
the ES ID and type, and pointers to a <type> stream_control_t </type>
structure (gives information on the play status), a <type> decoder_fifo_t
</type> and <parameter> pf_init_bit_stream</parameter>, which will be
described in the next two sections.
    </para>

  </sect1>

  <sect1> <title> Packet structures </title>

    <para>
The input module provides an advanced API for delivering stream data
to the decoders. First let's have a look at the packet structures.
They are defined in <filename> include/input_ext-dec.h</filename>.
    </para>

    <para>
<type>data_packet_t</type> contains a pointer to the physical location
of data. Decoders should only start to read them at <parameter>
p_payload_start </parameter> until <parameter> p_payload_end</parameter>.
Thereafter, it will switch to the next packet, <parameter> p_next
</parameter> if it is not <constant>NULL</constant>. If the
<parameter> b_discard_payload
</parameter> flag is up, the content of the packet is messed up and it
should be discarded.
    </para>

    <para>
<type>data_packet_t</type> are contained into <type>pes_packet_t</type>.
<type>pes_packet_t</type> features a chained list
(<parameter>p_first</parameter>) of <type>data_packet_t
</type> representing (in the MPEG paradigm) a complete PES packet. For
PS streams, a <type> pes_packet_t </type> usually only contains one
<type>data_packet_t</type>. In TS streams though, one PES can be split
among dozens of TS packets. A PES packet has PTS dates (see your
MPEG specification for more information) and the current pace of reading
that should be applied for interpolating dates (<parameter>i_rate</parameter>).
<parameter> b_data_alignment </parameter> (if available in the system
layer) indicates if the packet is a random access point, and <parameter>
b_discontinuity </parameter> tells whether previous packets have been
dropped.
    </para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="ps.png" format="PNG" scalefit="1" scale="95" />
      </imageobject>
      <imageobject>
        <imagedata fileref="ps.gif" format="GIF" />
      </imageobject>
      <textobject>
        <phrase> A PES packet in a Program Stream </phrase>
      </textobject>
      <caption>
        <para> In a Program Stream, a PES packet features only one
        data packet, whose buffer contains the PS header, the PES
        header, and the data payload.
        </para>
      </caption>
    </mediaobject>

    <mediaobject>
      <imageobject>
        <imagedata fileref="ts.png" format="PNG" scalefit="1" scale="95" />
      </imageobject>
      <imageobject>
        <imagedata fileref="ts.gif" format="GIF" />
      </imageobject>
      <textobject>
        <phrase> A PES packet in a Transport Stream </phrase>
      </textobject>
      <caption>
        <para> In a Transport Stream, a PES packet can feature an
        unlimited number of data packets (three on the figure)
        whose buffers contains the PS header, the PES
        header, and the data payload.
        </para>
      </caption>
    </mediaobject>

    <para>
The structure shared by both the input and the decoder is <type>
decoder_fifo_t</type>. It features a rotative FIFO of PES packets to
be decoded. The input provides macros to manipulate it : <function>
DECODER_FIFO_ISEMPTY, DECODER_FIFO_ISFULL, DECODER_FIFO_START,
DECODER_FIFO_INCSTART, DECODER_FIFO_END, DECODER_FIFO_INCEND</function>.
Please remember to take <parameter>p_decoder_fifo-&gt;data_lock
</parameter> before any operation on the FIFO.
    </para>

    <para>
The next packet to be decoded is DECODER_FIFO_START( *p_decoder_fifo ).
When it is finished, you need to call <function>
p_decoder_fifo-&gt;pf_delete_pes( p_decoder_fifo-&gt;p_packets_mgt,
DECODER_FIFO_START( *p_decoder_fifo ) ) </function> and then
<function> DECODER_FIFO_INCSTART( *p_decoder_fifo )</function> to
return the PES to the <link linkend="input_buff">buffer manager</link>.
    </para>

    <para>
If the FIFO is empty (<function>DECODER_FIFO_ISEMPTY</function>), you
can block until a new packet is received with a cond signal :
<function> vlc_cond_wait( &amp;p_fifo-&gt;data_wait,
&amp;p_fifo-&gt;data_lock )</function>. You have to hold the lock before
entering this function. If the file is over or the user quits,
<parameter>p_fifo-&gt;b_die</parameter> will be set to 1. It indicates
that you must free all your data structures and call <function>
vlc_thread_exit() </function> as soon as possible.
    </para>

  </sect1>

  <sect1> <title> The bit stream (input module) </title>

    <para>
This classical way of reading packets is not convenient, though, since
the elementary stream can be split up arbitrarily. The input module
provides primitives which make reading a bit stream much easier.
Whether you use it or not is at your option, though if you use it you
shouldn't access the packet buffer any longer.
    </para>

    <para>
The bit stream allows you to just call <function> GetBits()</function>,
and this functions will transparently read the packet buffers, change
data packets and pes packets when necessary, without any intervention
from you. So it is much more convenient for you to read a continuous
Elementary Stream, you don't have to deal with packet boundaries
and the FIFO, the bit stream will do it for you.
    </para>

    <para>
The central idea is to introduce a buffer of 32 bits [normally
<type> WORD_TYPE</type>, but 64-bit version doesn't work yet], <type>
bit_fifo_t</type>. It contains the word buffer and the number of
significant bits (higher part). The input module provides five
inline functions to manage it :
    </para>

    <itemizedlist>
      <listitem> <para> <type> u32 </type> <function> GetBits </function>
      <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
      </parameter> :
      Returns the next <parameter> i_bits </parameter> bits from the
      bit buffer. If there are not enough bits, it fetches the following
      word from the <type>decoder_fifo_t</type>. This function is only
      guaranteed to work with up to 24 bits. For the moment it works until
      31 bits, but it is a side effect. We were obliged to write a different
      function, <function>GetBits32</function>, for 32-bit reading,
      because of the &lt;&lt; operator.
      </para> </listitem>

      <listitem> <para> <function> RemoveBits </function> <parameter>
      ( bit_stream_t * p_bit_stream, unsigned int i_bits ) </parameter> :
      The same as <function> GetBits()</function>, except that the bits
      aren't returned (we spare a few CPU cycles). It has the same
      limitations, and we also wrote <function> RemoveBits32</function>.
      </para> </listitem>

      <listitem> <para> <type> u32 </type> <function> ShowBits </function>
      <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
      </parameter> :
      The same as <function> GetBits()</function>, except that the bits
      don't get flushed after reading, so that you need to call
      <function> RemoveBits() </function> by hand afterwards. Beware,
      this function won't work above 24 bits, except if you're aligned
      on a byte boundary (see next function).
      </para> </listitem>

      <listitem> <para> <function> RealignBits </function> <parameter>
      ( bit_stream_t * p_bit_stream ) </parameter> :
      Drops the n higher bits (n &lt; 8), so that the first bit of
      the buffer be aligned an a byte boundary. It is useful when
      looking for an aligned startcode (MPEG for instance).
      </para> </listitem>

      <listitem> <para> <function> GetChunk </function> <parameter>
      ( bit_stream_t * p_bit_stream, byte_t * p_buffer, size_t i_buf_len )
      </parameter> :
      It is an analog of <function> memcpy()</function>, but taking
      a bit stream as first argument. <parameter> p_buffer </parameter>
      must be allocated and at least <parameter> i_buf_len </parameter>
      long. It is useful to copy data you want to keep track of.
      </para> </listitem>
    </itemizedlist>

    <para>
All these functions recreate a continuous elementary stream paradigm.
When the bit buffer is empty, they take the following word in the
current packet. When the packet is empty, it switches to the next
<type>data_packet_t</type>, or if unapplicable to the next <type>
pes_packet_t</type> (see <function>
p_bit_stream-&gt;pf_next_data_packet</function>). All this is
completely transparent.
    </para>

    <note> <title> Packet changes and alignment issues </title>
    <para>
      We have to study the conjunction of two problems. First, a
      <type> data_packet_t </type> can have an even number of bytes,
      for instance 177, so the last word will be truncated. Second,
      many CPU (sparc, alpha...) can only read words aligned on a
      word boundary (that is, 32 bits for a 32-bit word). So packet
      changes are a lot more complicated than you can imagine, because
      we have to read truncated words and get aligned.
    </para>

    <para>
      For instance <function> GetBits() </function> will call
      <function> UnalignedGetBits() </function> from <filename>
      src/input/input_ext-dec.c</filename>. Basically it will
      read byte after byte until the stream gets realigned. <function>
      UnalignedShowBits() </function> is a bit more complicated
      and may require a temporary packet
      (<parameter>p_bit_stream-&gt;showbits_data</parameter>).
    </para> </note>

    <para>
To use the bit stream, you have to call <parameter>
p_decoder_config-&gt;pf_init_bit_stream( bit_stream_t * p_bit_stream,
decoder_fifo_t * p_fifo )</parameter> to set up all variables. You will
probably need to regularly fetch specific information from the packet,
for instance the PTS. If <parameter> p_bit_stream-&gt;pf_bit_stream_callback
</parameter> is not <constant> NULL</constant>, it will be called
on a packet change. See <filename> src/video_parser/video_parser.c
</filename> for an example. The second argument
indicates whether it is just a new <type>data_packet_t</type> or
also a new <type>pes_packet_t</type>. You can store your own structure in
<parameter> p_bit_stream-&gt;p_callback_arg</parameter>.
    </para>

    <warning> <para>
      When you call <function>pf_init_bit_stream</function>, the
      <function>pf_bitstream_callback</function> is not defined yet,
      but it jumps to the first packet, though. You will probably
      want to call your bitstream callback by hand just after
      <function> pf_init_bit_stream</function>.
    </para> </warning>

  </sect1>

  <sect1> <title> Built-in decoders </title>

    <para>
VLC already features an MPEG layer 1 and 2 audio decoder, an MPEG MP@ML
video decoder, an AC3 decoder (borrowed from LiViD), a DVD SPU decoder,
and an LPCM decoder. You can write your own decoder, just mimic the
video parser.
    </para>

    <note> <title> Limitations in the current design </title>
    <para>
To add a new decoder, you'll still have to add the stream type as there's
still a a hard-wired piece of code in <filename> src/input/input_programs.c
</filename>.
    </para> </note>

    <para>
The MPEG audio decoder is native, but doesn't support layer 3 decoding
[too much trouble], the AC3 decoder is a port from Aaron
Holtzman's libac3 (the original libac3 isn't reentrant), and the
SPU decoder is native. You may want to have a look at <function>
BitstreamCallback </function> in the AC3 decoder. In that case we have
to jump the first 3 bytes of a PES packet, which are not part of the
elementary stream. The video decoder is a bit special and will
be described in the following section.
    </para>

    </sect1>

    <sect1> <title> The MPEG video decoder </title>

      <para>
VLC media player provides an MPEG-1, and an MPEG-2 Main Profile @
Main Level decoder. It has been natively written for VLC, and is quite
mature. Its status is a bit special, since it is splitted between two
logicial entities : video parser and video decoder.
The initial goal is to separate bit stream parsing functions from
highly parallelizable mathematical algorithms. In theory, there can be
one video parser thread (and only one, otherwise we would have race
conditions reading the bit stream), along with a pool of video decoder
threads, which do IDCT and motion compensation on several blocks
at once.
      </para>

      <para>
It doesn't (and won't) support MPEG-4 or DivX decoding. It is not an
encoder. It should support the whole MPEG-2 MP@ML specification, though
some features are still left untested, like Differential Motion Vectors.
Please bear in mind before complaining that the input elementary stream
must be valid (for instance this is not the case when you directly read
a DVD multi-angle .vob file).
      </para>

      <para>
The most interesting file is <filename> vpar_synchro.c</filename>, it is
really worth the shot. It explains the whole frame dropping algorithm.
In a nutshell, if the machine is powerful enough, we decoder all IPBs,
otherwise we decode all IPs and Bs if we have enough time (this is
based on on-the-fly decoding time statistics). Another interesting file
is <filename>vpar_blocks.c</filename>, which describes all block
(including coefficients and motion vectors) parsing algorithms. Look
at the bottom of the file, we indeed generate one optimized function
for every common picture type, and one slow generic function. There
are also several levels of optimization (which makes compilation slower
but certain types of files faster decoded) called <constant>
VPAR_OPTIM_LEVEL</constant>, level 0 means no optimization, level 1
means optimizations for MPEG-1 and MPEG-2 frame pictures, level 2
means optimizations for MPEG-1 and MPEG-2 field and frame pictures.
      </para>

      <sect2> <title> Motion compensation plug-ins </title>

        <para>
Motion compensation (i.e. copy of regions from a reference picture) is
very platform-dependant (for instance with MMX or AltiVec versions), so
we moved it to the <filename> plugins/motion </filename> directory. It
is more convenient for the video decoder, and resulting plug-ins may
be used by other video decoders (MPEG-4 ?). A motion plugin must
define 6 functions, coming straight from the specification :
<function> vdec_MotionFieldField420, vdec_MotionField16x8420,
vdec_MotionFieldDMV420, vdec_MotionFrameFrame420, vdec_MotionFrameField420,
vdec_MotionFrameDMV420</function>. The equivalent 4:2:2 and 4:4:4
functions are unused, since these formats are forbidden in MP@ML (it
would only take longer compilation time).
        </para>

        <para>
Look at the C version of the algorithms if you want more information.
Note also that the DMV algorithm is untested and is probably buggy.
        </para>

      </sect2>

      <sect2> <title> IDCT plug-ins </title>

        <para>
Just like motion compensation, IDCT is platform-specific. So we moved it
to <filename> plugins/idct</filename>. This module does the IDCT
calculation, and copies the data to the final picture. You need to define
seven methods :
        </para>

        <itemizedlist>
          <listitem> <para> <function> vdec_IDCT </function> <parameter>
          ( decoder_config_t * p_config, dctelem_t * p_block, int )
          </parameter> :
          Does the complete 2-D IDCT. 64 coefficients are in <parameter>
          p_block</parameter>.
          </para> </listitem>

          <listitem> <para> <function> vdec_SparseIDCT </function>
          <parameter> ( vdec_thread_t * p_vdec, dctelem_t * p_block,
          int i_sparse_pos ) </parameter> :
          Does an IDCT on a block with only one non-NULL coefficient
          (designated by <parameter> i_sparse_pos</parameter>). You can
          use the function defined in <filename> plugins/idct/idct_common.c
          </filename> which precalculates these 64 matrices at
          initialization time.
          </para> </listitem>

          <listitem> <para> <function> vdec_InitIDCT </function>
          <parameter> ( vdec_thread_t * p_vdec ) </parameter> :
          Does the initialization stuff needed by <function>
          vdec_SparseIDCT</function>.
          </para> </listitem>

          <listitem> <para> <function> vdec_NormScan </function>
          <parameter> ( u8 ppi_scan[2][64] ) </parameter> :
          Normally, this function does nothing. For minor optimizations,
          some IDCT (MMX) need to invert certain coefficients in the
          MPEG scan matrices (see ISO/IEC 13818-2).
          </para> </listitem>

          <listitem> <para> <function> vdec_InitDecode </function>
          <parameter> ( struct vdec_thread_s * p_vdec ) </parameter> :
          Initializes the IDCT and optional crop tables.
          </para> </listitem>

          <listitem> <para> <function> vdec_DecodeMacroblockC </function>
          <parameter> ( struct vdec_thread_s *p_vdec,
          struct macroblock_s * p_mb ); </parameter> :
          Decodes an entire macroblock and copies its data to the final
          picture, including chromatic information.
          </para> </listitem>

          <listitem> <para> <function> vdec_DecodeMacroblockBW </function>
          <parameter> ( struct vdec_thread_s *p_vdec,
          struct macroblock_s * p_mb ); </parameter> :
          Decodes an entire macroblock and copies its data to the final
          picture, except chromatic information (used in grayscale mode).
          </para> </listitem>
        </itemizedlist>

        <para>
Currently we have implemented optimized versions for : MMX, MMXEXT, and
AltiVec [doesn't work]. We have two plain C versions, the normal
(supposedly optimized) Berkeley version (<filename>idct.c</filename>),
and the simple 1-D separation IDCT from the ISO reference decoder
(<filename>idctclassic.c</filename>).
        </para>

      </sect2>

      <sect2> <title> Symmetrical Multiprocessing </title>

        <para>
The MPEG video decoder of VLC can take advantage of several processors if
necessary. The idea is to launch a pool of decoders, which will do
IDCT/motion compensation on several macroblocks at once.
        </para>

        <para>
The functions managing the pool are in <filename>
src/video_decoder/vpar_pool.c</filename>. Its use on non-SMP machines is
not recommanded, since it is actually slower than the monothread version.
Even on SMP machines sometimes...
        </para>

      </sect2>

  </sect1>

</chapter>