Dr. Dobb's is part of the Informa Tech Division of Informa PLC




H.264 and Video Compression


Stewart Taylor is a software architect at Intel Corporation and was the lead designer of the Intel IPP functions. He is also author of Optimizing Applications for Multi-Core Processors, from which this article is adapted. Copyright (c) 2007 Intel Corporation. All rights reserved.

The two families of video codec names, H.26x and MPEG-x, overlap. MPEG-2 video is named H.262 in the H.26x scheme. Likewise, another popular codec, H.264, is part of MPEG-4 (Part 10), where it is known as MPEG-4 Advanced Video Coding (AVC). Its intent, like that of all of MPEG-4, was to produce video compression of acceptable quality at a very low bit rate -- around half that of its predecessors MPEG-2 and H.263.

Like its predecessors in the H.26x video codec family, H.264 has two encoding modes for individual video frames -- intra and inter. In intra mode, a frame of video is encoded as a stand-alone image without reference to other images in the sequence. In inter mode, previous and possibly future frames are used to predict the pixel values. Figure 1 shows the high-level blocks involved in intra-frame encoding and decoding of H.264. Figure 2 shows the encoding and decoding process for inter frames.

Figure 1: Intra-Mode Encoding and Decoding in H.264

Whether in inter or intra frames, blocks in H.264 can be expressed relative to other, previously coded blocks. In inter frames, the search for such a reference is called "motion estimation," and the reference blocks lie in other frames, which may precede or follow the current frame. This is the source of considerable compression. As with other video compression techniques, it exploits the fact that there is considerably less entropy in the difference between similar blocks than in the absolute values of the blocks. This is particularly true when the reference can be a block constructed at an arbitrary offset in another frame, rather than only the block at the same position.
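The entropy argument can be checked on a toy example. The sketch below is only an illustration: the one-dimensional "blocks" and the helper name entropy_bits are hypothetical, and empirical Shannon entropy stands in for the cost of H.264's actual entropy coder.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Empirical Shannon entropy of a sequence, in bits per sample."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical reference block: a smooth ramp of 16 pixel values.
reference = [100 + 2 * i for i in range(16)]

# Hypothetical current block: the same ramp, one level brighter,
# with a single pixel that changed by a little more.
current = [p + 1 for p in reference]
current[5] += 3

# The residual is what an inter-coded block would actually carry.
residual = [c - r for c, r in zip(current, reference)]

print("entropy of current block:", entropy_bits(current))
print("entropy of residual:     ", entropy_bits(residual))
```

Every value in the current block is distinct, so it costs a full 4 bits per sample in this model, while the residual is almost entirely one repeated value and costs a small fraction of that.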

Figure 2: Inter-Mode Encoding and Decoding in H.264

H.264 has very flexible support for motion estimation. The encoder can choose among multiple reference pictures (up to 16 frames, or 32 fields in interlaced coding), and is allowed to refer to blocks that have to be constructed by interpolation.

The encoder is responsible for determining a reference image, block, and motion vector. The block is generally chosen by searching among the possibilities, starting with the most likely options. The encoder then calculates and encodes the difference between the chosen reference block and the new data.
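One simple way to picture this search is exhaustive block matching at whole-pixel offsets. The sketch below is an assumption-laden illustration, not how any particular encoder works: the cost metric (sum of absolute differences), the function names sad and full_search, and the synthetic frames are all hypothetical, and only a single reference frame is searched. Candidates are visited shortest vector first, as a stand-in for "most likely options first."

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )

def full_search(current, reference, x, y, size, search_range):
    """Return (best_mv, best_cost) for the block at (x, y) in `current`."""
    height, width = len(reference), len(reference[0])
    # Visit short vectors first, so ties go to the smallest motion.
    candidates = sorted(
        ((dx, dy)
         for dy in range(-search_range, search_range + 1)
         for dx in range(-search_range, search_range + 1)),
        key=lambda mv: abs(mv[0]) + abs(mv[1]),
    )
    best_mv, best_cost = None, None
    for dx, dy in candidates:
        rx, ry = x + dx, y + dy
        if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
            continue  # candidate block falls outside the reference frame
        cost = sad(current, x, y, reference, rx, ry, size)
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic frames: the current frame shows the reference content
# displaced by 2 pixels horizontally and 1 pixel vertically.
reference = [[3 * x + 11 * y for x in range(16)] for y in range(16)]
current = [[3 * (x + 2) + 11 * (y + 1) for x in range(16)] for y in range(16)]

best_mv, best_cost = full_search(current, reference, x=4, y=4, size=4, search_range=3)

# The difference the encoder would go on to transform and encode.
dx, dy = best_mv
residual = [
    [current[4 + j][4 + i] - reference[4 + dy + j][4 + dx + i] for i in range(4)]
    for j in range(4)
]
print(best_mv, best_cost)
```

Practical encoders usually avoid a full search and weigh the bit cost of the vector along with the distortion; the exhaustive version is shown only because it is the easiest to follow.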

On the decoding end, after decoding the reference blocks, the decoder adds the reference data to the decoded difference data. The blocks and frames are likely to be decoded out of temporal order, since frames can be encoded relative to forward-looking blocks and frames.

H.264 encoding supports sub-pixel resolution for motion vectors, meaning that the reference block is actually calculated by interpolating inside a block of real pixels. The motion vectors for luma blocks are expressed at quarter-pixel resolution, and for chroma blocks the resolution can be as fine as an eighth of a pixel.
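A motion-vector component stored in quarter-pixel units can be split into a whole-pixel offset and a fractional position. The sketch below shows one way to do that; the function names are hypothetical, and the chroma helper assumes 4:2:0 sampling, where the chroma planes have half the luma resolution and the same numeric vector is therefore read in eighth-pixel units.

```python
def split_quarter_pel(mv_component):
    """Split a luma vector component (quarter-pixel units) into
    (whole-pixel offset, fractional position 0..3)."""
    # Python's >> floors toward negative infinity, so the fractional
    # part is always non-negative, even for negative vectors.
    return mv_component >> 2, mv_component & 3

def split_eighth_pel(mv_component):
    """Same split for a chroma component read in eighth-pixel units
    (assumes 4:2:0 chroma sampling)."""
    return mv_component >> 3, mv_component & 7

print(split_quarter_pel(9))    # (2, 1): two whole pixels plus one quarter
print(split_quarter_pel(-5))   # (-2, 3): -2 + 3/4 = -1.25 pixels
print(split_eighth_pel(9))     # (1, 1): one chroma pixel plus one eighth
```

A fractional part of zero means the reference block can be copied directly; any other value means it has to be interpolated.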

This sub-pixel resolution increases the algorithmic and computational complexity significantly. The decoding portion, which requires performing sub-pixel motion compensation only once per block, takes about 10 to 20 percent of the time spent in the decoding pipeline. The bulk of this time is spent interpolating values between pixels to generate the sub-pixel-offset reference blocks. The cost of performing sub-pixel estimation in the encoder varies with the encoding algorithm, but may require performing motion compensation more than once per block.

The interpolation algorithm to generate offset reference blocks is defined differently for luma and chroma blocks. For luma, interpolation is performed in two steps, half-pixel and then quarter-pixel interpolation. The half-pixel values are created by filtering with this kernel horizontally and vertically:

[1 -5 20 20 -5 1]/32

Quarter-pixel interpolation is then performed by linearly averaging adjacent integer- and half-pixel values.
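The two luma steps can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: it works on a single row of 8-bit samples, replicates edge samples, and rounds by adding 16 before dividing by 32; the function names are hypothetical. The two-dimensional case, where a half-pixel sample lies between four real pixels, needs extra care with intermediate precision and is not covered here.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # the six-tap kernel; the taps sum to 32

def clip255(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_pel(row, x):
    """Half-pixel sample midway between row[x] and row[x + 1]."""
    acc = 0
    for k, tap in enumerate(TAPS):
        # The kernel covers samples x-2 .. x+3; replicate at the edges.
        idx = min(max(x - 2 + k, 0), len(row) - 1)
        acc += tap * row[idx]
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding

def quarter_pel(a, b):
    """Quarter-pixel sample: rounded average of two neighboring samples."""
    return (a + b + 1) >> 1

# A hard edge shows both the interpolation and why clipping is needed.
row = [0, 0, 0, 0, 255, 255, 255, 255]

h = half_pel(row, 3)         # midway across the edge
q = quarter_pel(row[3], h)   # a quarter of the way across

print(h, q)                  # 128 64
print(half_pel(row, 2))      # 0: the negative taps undershoot, then clip
print(half_pel(row, 4))      # 255: the filter overshoots, then clips
```

The negative taps are what make the filter sharper than plain averaging, at the price of overshoot near edges, which the clamp absorbs.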

Motion compensation for chroma blocks uses bilinear interpolation with quarter-pixel or eighth-pixel accuracy, depending on the chroma format. Each sub-pixel position is a linear combination of the neighboring pixels.
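For the eighth-pixel case, the linear combination can be sketched as a weighted sum of the four surrounding pixels. The function name chroma_sample, the plane layout (a list of rows), and the rounding (add 32, then divide by 64) are assumptions made for this illustration.

```python
def chroma_sample(plane, x, y, dx, dy):
    """Bilinear sample at (x + dx/8, y + dy/8), with dx and dy in 0..7."""
    a = plane[y][x]          # top-left neighbor
    b = plane[y][x + 1]      # top-right neighbor
    c = plane[y + 1][x]      # bottom-left neighbor
    d = plane[y + 1][x + 1]  # bottom-right neighbor
    # The four weights always sum to 64.
    total = ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
             + (8 - dx) * dy * c + dx * dy * d)
    return (total + 32) >> 6  # divide by 64 with rounding

plane = [
    [0, 64],
    [128, 192],
]

print(chroma_sample(plane, 0, 0, 0, 0))  # 0: zero fraction returns pixel a
print(chroma_sample(plane, 0, 0, 4, 0))  # 32: halfway between 0 and 64
print(chroma_sample(plane, 0, 0, 4, 4))  # 96: center of the four pixels
```

Because each weight depends only on the fractional offset, the four weights can be computed once per block and reused for every pixel in it.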

Figure 3 illustrates which pixels are thus used for both interpolation approaches.

Figure 3: Sub-pixel Interpolation for Motion Compensation in H.264

After interpolating to generate the reference block, the algorithm adds that reference block to the decoded difference information to get the reconstructed block. The encoder executes this step to get reconstructed reference frames, and the decoder executes this step to get the output frames.
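The reconstruction step itself is a per-pixel addition. A minimal sketch, assuming 8-bit samples that are clamped to the range 0 to 255 and blocks stored as lists of rows (the function name and the sample values are hypothetical):

```python
def reconstruct(prediction, residual):
    """Add the decoded difference to the reference block, clamping to 8 bits."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]

prediction = [[100, 250], [3, 128]]   # interpolated reference block
residual = [[5, 10], [-7, 0]]         # decoded difference data

block = reconstruct(prediction, residual)
print(block)  # [[105, 255], [0, 128]]
```

Running the same function in both the encoder and the decoder is what keeps their reference frames identical, so prediction errors do not accumulate from frame to frame.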

