Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Multi-core MPEG-4 Video Encode Partitioning




MPEG-4 Implementation on the CT3600
Now let's see how these procedures are mapped onto the CT3600 architecture.

Motion estimation partitioning
The memory requirements of the search area impose some natural restrictions on the way ME can be partitioned on the CT3600 architecture. For example, processors working in parallel on the ME need to be assigned neighboring MBs so that they share the same search area and use the memory most efficiently. Neighboring groups of MBs in the same row need to be processed one at a time, from left to right whenever possible, to take advantage of the motion predictor vectors already computed for the closest MB. In Cradle's implementation, each row of MBs is divided into a handful of these groups, the exact number depending on the image resolution. The MB rows also need to be processed one row at a time so that the search area can be updated progressively using a circular buffer, making only non-overlapping accesses to DRAM to minimize DRAM utilization. These restrictions lead to a data partitioning approach for the ME procedure in which a group of processors able to share the same local memory efficiently (processors belonging to the same Quad, in the case of the CT3616) work in parallel on contiguous groups of MBs belonging to the same row.
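As a rough illustration of this data partitioning, the sketch below splits one MB row into contiguous groups, ordered left to right, and deals the MBs of a group to the DSPs of one Quad. The function names, the group size, and the round-robin policy inside a group are assumptions made for illustration; the article does not say how Cradle distributes MBs within a group.

```python
def partition_row(mbs_per_row: int, group_size: int) -> list[range]:
    """Split one macroblock row into contiguous groups, ordered left to right."""
    if mbs_per_row <= 0 or group_size <= 0:
        raise ValueError("mbs_per_row and group_size must be positive")
    return [
        range(start, min(start + group_size, mbs_per_row))
        for start in range(0, mbs_per_row, group_size)
    ]


def assign_group(group: range, num_dsps: int) -> dict[int, list[int]]:
    """Deal the MBs of one group to DSPs round-robin (an assumed policy)."""
    assignment: dict[int, list[int]] = {dsp: [] for dsp in range(num_dsps)}
    for i, mb in enumerate(group):
        assignment[i % num_dsps].append(mb)
    return assignment


if __name__ == "__main__":
    # Hypothetical numbers: a 45-MB-wide row, groups of 12 MBs, 4 DSPs.
    for group in partition_row(45, 12):
        # Groups are visited strictly in order, so predictors from the
        # group to the left are available before the next group starts.
        print(list(group), assign_group(group, 4))
```

Because the groups are visited in order, every DSP in a group can rely on the motion vectors produced for the group to its left, and all DSPs in the group read from the same search area.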

The amount of local memory that needs to be allocated for ME depends on the search area. For example, if the search range is set to 64 pixels horizontally and 32 pixels vertically, then the search area needs to contain five rows of macroblocks (80 pixel lines: 32 lines above and 32 lines below the 16-line macroblock row). The frame can be split into as many vertical partitions as required, based on the amount of local memory available. This comes at the cost of increased bandwidth, since the overlapping search areas of neighboring vertical partitions need to be reloaded for every partition. In the current implementation, the number of vertical partitions is chosen dynamically during initialization, given the amount of local memory ME can use.
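The arithmetic behind this sizing can be sketched as follows. The sketch assumes 8-bit luma samples only, a search range that extends on both sides of the macroblock, a search window clipped to the frame width, and a selection rule that takes the smallest partition count that fits. All function names and the memory budgets are hypothetical; the article does not give the actual selection rule.

```python
import math

MB_SIZE = 16  # macroblock edge in pixels


def search_area_mb_rows(v_range: int) -> int:
    """MB rows needed to cover v_range lines above and below one MB row."""
    lines = 2 * v_range + MB_SIZE
    return math.ceil(lines / MB_SIZE)


def search_area_bytes(frame_width: int, h_range: int, v_range: int,
                      partitions: int) -> int:
    """Bytes of 8-bit luma held in local memory for one vertical partition."""
    mbs_per_row = math.ceil(frame_width / MB_SIZE)
    strip_width = math.ceil(mbs_per_row / partitions) * MB_SIZE
    # The window overlaps its neighbors by h_range pixels on each side.
    window_width = min(frame_width, strip_width + 2 * h_range)
    return window_width * search_area_mb_rows(v_range) * MB_SIZE


def choose_partitions(frame_width: int, h_range: int, v_range: int,
                      local_bytes: int) -> int:
    """Smallest number of vertical partitions whose search area fits."""
    mbs_per_row = math.ceil(frame_width / MB_SIZE)
    for partitions in range(1, mbs_per_row + 1):
        if search_area_bytes(frame_width, h_range, v_range, partitions) <= local_bytes:
            return partitions
    raise ValueError("search area does not fit even with one MB per partition")


if __name__ == "__main__":
    print(search_area_mb_rows(32))                    # 5 MB rows, 80 lines
    print(choose_partitions(720, 64, 32, 64 * 1024))  # whole row fits
    print(choose_partitions(720, 64, 32, 40_000))     # needs a split
```

Under these assumptions, a 720-pixel-wide frame needs 57,600 bytes for an unsplit search area, while splitting it in two brings each window down to 39,680 bytes at the price of reloading the 128-pixel overlap.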

The RISC controller for ME ensures that the search area and the current macroblock row are ready in local memory before the DSPs start working on a new macroblock row. The RISC processor uses the DMA controller to load both the search area and the current macroblock row in the background.
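One way to picture the progressive search-area update is a ring of MB-row slots in which each new MB row overwrites the slot of the row that just left the window. The class below is a minimal model under that assumption; the `loads` list stands in for DMA transfers, and the class and method names are hypothetical, not part of Cradle's software. Background prefetching by the RISC processor is not modeled.

```python
class SearchAreaRing:
    """Ring of MB-row slots holding the search area for the current MB row."""

    def __init__(self, slots: int, total_rows: int) -> None:
        if slots < 1 or slots % 2 == 0:
            raise ValueError("slots must be a positive odd number")
        self.slots = slots
        self.total_rows = total_rows
        self.resident = [None] * slots  # frame MB-row index held by each slot
        self.loads = []                 # (frame_row, slot) pairs, one per "DMA"

    def advance_to(self, current_row: int) -> list[int]:
        """Make the window around current_row resident; return rows loaded."""
        half = self.slots // 2
        first = max(0, current_row - half)
        last = min(self.total_rows - 1, current_row + half)
        loaded = []
        for row in range(first, last + 1):
            slot = row % self.slots
            if self.resident[slot] != row:
                # Overwrite the slot of the row that fell out of the window.
                self.resident[slot] = row
                self.loads.append((row, slot))
                loaded.append(row)
        return loaded


if __name__ == "__main__":
    ring = SearchAreaRing(slots=5, total_rows=30)
    for mb_row in range(30):
        print(mb_row, ring.advance_to(mb_row))
```

In this model every MB row of the reference frame is fetched exactly once per frame, which is the non-overlapping DRAM access pattern the partitioning aims for.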

The motion vector data, the macroblock type, the motion compensated data, and other relevant information need to be communicated to the TE. This is accomplished by having each DSP store this information in the SDRAM using DMA transfers whenever appropriate.

Texture encoding
The TE works on data packets produced and stored by the ME in SDRAM. As previously explained, TE can be applied to multiple slices simultaneously since the local memory requirements for the TE are smaller.

The TE processing block itself is divided into two tasks following a functional partitioning approach. These tasks are the Pixel Processor Task (PPT) and the Entropy Coding Task (ECT). The PPT consists of computing DCT, Quantization and Inverse Quantization, Inverse DCT, and reconstruction. The PPT also stores the quantized DCT coefficients and other relevant information in SDRAM in what is called an ECT packet. The ECT processes the ECT packet and produces an MPEG-4 compliant bitstream. Each processor can be assigned either a PPT, which operates on an entire row of MBs, or an ECT, which operates on a slice. In Cradle's implementation, a D1 frame is divided into four of these slices, while a CIF frame maps directly onto one slice. A PPT is pushed into the MTS PPT queue for every row of macroblocks in the frame, and an ECT is pushed into the MTS ECT queue for every slice of the frame. Several ECTs and PPTs can be processed in parallel, provided that the relevant ECT packets are available.
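The task counts implied by this scheme can be sketched as below. The sketch assumes that a slice is a band of whole MB rows, that MB rows are split across slices as evenly as possible, and that D1 means 720x480 and CIF means 352x288. The `Task` type and `build_tasks` function are hypothetical and are not MTS primitives.

```python
from dataclasses import dataclass

MB_SIZE = 16  # macroblock edge in pixels


@dataclass(frozen=True)
class Task:
    kind: str       # "PPT" or "ECT"
    index: int      # MB row for a PPT, slice number for an ECT
    mb_rows: range  # MB rows the task covers


def build_tasks(frame_height: int, num_slices: int) -> tuple[list[Task], list[Task]]:
    """Return (ppt_queue, ect_queue): one PPT per MB row, one ECT per slice."""
    mb_rows = -(-frame_height // MB_SIZE)  # ceiling division
    if not 1 <= num_slices <= mb_rows:
        raise ValueError("num_slices must be between 1 and the MB row count")
    ppt_queue = [Task("PPT", r, range(r, r + 1)) for r in range(mb_rows)]
    ect_queue = []
    for s in range(num_slices):
        # Assumed: slices are bands of whole MB rows, split as evenly as possible.
        start = s * mb_rows // num_slices
        stop = (s + 1) * mb_rows // num_slices
        ect_queue.append(Task("ECT", s, range(start, stop)))
    return ppt_queue, ect_queue


if __name__ == "__main__":
    for name, height, slices in (("D1", 480, 4), ("CIF", 288, 1)):
        ppts, ects = build_tasks(height, slices)
        print(name, len(ppts), "PPTs,", len(ects), "ECTs")
```

An ECT for a slice can only run once the PPTs for all of its MB rows have written their ECT packets, which is why the `mb_rows` range is kept on each task.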

These conditions allow for a dynamic partitioning approach in which any DSP can pick up a new ECT or PPT as soon as it has completed the previous task it was assigned. This approach ensures proper load balancing, since the large number of tasks keeps all DSPs busy continuously, thus taking full advantage of the processing resources available. Dynamic partitioning is implemented efficiently on the CT3600 by leveraging the Multi-Tasking Scheduler (MTS) tool that is part of Cradle's Software Development Kit. MTS provides software developers with a set of primitives for the RISC and DSP processors. The RISC processors act as controllers by defining, allocating, and controlling the order of execution of DSP tasks. Under the MTS framework, each DSP runs a new task whenever it has completed the previous task it was assigned. An important feature of MTS is its low-overhead task switching mechanism, which makes it practical to work on relatively small tasks without incurring noticeable task switching costs.
