Design

Multi-core MPEG-4 Video Encode Partitioning

By Laurent Bonetto, Ram Natarajan, and Dr. R K Singh
Cradle Technologies, October 06, 2006

Partitioning a video-encoding algorithm onto a multi-core architecture can draw on a variety of techniques, including data partitioning and pipelining. Cradle Technologies explains them and shows how to implement the MPEG-4 Baseline Profile on its multi-core CT3600 processor family.

MPEG-4 Implementation on the CT3600
Now let's see how these procedures are mapped onto the CT3600 architecture.

Motion estimation partitioning
The memory requirements of the search area impose some natural restrictions on the way ME is partitioned on the CT3600 architecture. For example, processors working in parallel on ME need to be assigned neighboring MBs so that they share the same search area and use memory most efficiently. Neighboring groups of MBs in the same row need to be processed one at a time, from left to right whenever possible, to take advantage of the motion predictor vectors already computed for the closest MB. In Cradle's implementation, each row of MBs is divided into a handful of these groups, the exact number depending on the image resolution. The MB rows also need to be processed one row at a time so that the search area can be updated progressively using a circular buffer, with only non-overlapping accesses to DRAM to minimize DRAM utilization. These restrictions lead to a data partitioning approach for the ME procedure in which a group of processors able to share the same local memory efficiently (processors belonging to the same Quad in the case of the CT3616) works in parallel on contiguous groups of MBs belonging to the same row.
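The processing order described above can be sketched in a few lines of Python. This is an illustration only, not Cradle's code: the function names, the `groups_per_row` parameter, and the round-robin assignment of MBs to DSPs inside a group are all assumptions made for the example.

```python
def split_row_into_groups(mbs_per_row, groups_per_row):
    """Split one MB row into contiguous groups, listed left to right."""
    base, extra = divmod(mbs_per_row, groups_per_row)
    groups, start = [], 0
    for g in range(groups_per_row):
        size = base + (1 if g < extra else 0)  # spread the remainder over the first groups
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def assign_group_to_dsps(group, num_dsps):
    """Hypothetical policy: deal the MBs of one group to the DSPs round-robin."""
    return {dsp: group[dsp::num_dsps] for dsp in range(num_dsps)}


def me_order(mb_rows, mbs_per_row, groups_per_row, num_dsps):
    """Yield (row, {dsp: [mb columns]}) one row at a time, groups left to right."""
    for row in range(mb_rows):
        for group in split_row_into_groups(mbs_per_row, groups_per_row):
            yield row, assign_group_to_dsps(group, num_dsps)
```

For a CIF frame (22 MBs per row, 18 rows), `me_order(18, 22, 4, 4)` walks each row as four groups shared by four DSPs. Groups in a row are visited strictly in order, so the predictors from the previous group are available when the next one starts.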

The amount of local memory that needs to be allocated for ME depends on the search area. For example, if the search range is set to 64 pixels horizontally and 32 pixels vertically, the search area needs to contain five rows of macroblocks (80 pixel lines: the 16 lines of the current row plus 32 lines above and 32 below). The frame can be split into as many vertical partitions as required to fit the amount of local memory available. This comes at the cost of increased bandwidth, since the overlapping search areas of adjacent partitions need to be reloaded for every partition. In the current implementation, the number of vertical partitions is chosen dynamically during initialization, given the amount of local memory ME can use.
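The sizing arithmetic can be made concrete with a small Python sketch. It is a rough model under stated assumptions, not the actual initialization code: it counts one byte per luma pixel, ignores clipping at the frame edges (so it overestimates slightly), and the function names and the way a partition count is chosen are hypothetical.

```python
MB = 16  # macroblock size in pixels


def search_area_mb_rows(vertical_range):
    """MB rows needed: the current row plus enough rows above and below."""
    rows_each_side = -(-vertical_range // MB)  # ceiling division
    return 2 * rows_each_side + 1


def search_area_bytes(partition_width, horizontal_range, vertical_range):
    """Upper bound on search-area size for one partition (8-bit luma assumed)."""
    width = partition_width + 2 * horizontal_range
    lines = search_area_mb_rows(vertical_range) * MB
    return width * lines


def choose_vertical_partitions(frame_width, horizontal_range, vertical_range,
                               local_mem_bytes):
    """Smallest number of vertical partitions whose search area fits in memory."""
    mbs_per_row = frame_width // MB
    for n in range(1, mbs_per_row + 1):
        part_width = -(-mbs_per_row // n) * MB  # widest partition, in pixels
        if search_area_bytes(part_width, horizontal_range,
                             vertical_range) <= local_mem_bytes:
            return n
    raise ValueError("search area does not fit even with one MB per partition")
```

With a vertical range of 32, `search_area_mb_rows` gives the five rows (80 lines) quoted above. The model also shows the bandwidth trade-off: every extra partition adds another `2 * horizontal_range` columns that must be loaded again.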

The RISC controller for ME ensures that the search area and the current macroblock row are ready in local memory before the DSPs start working on a new macroblock row. The RISC processor uses the DMA controller to load both the search area and the current macroblock row in the background.
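The progressive update of the search area can be pictured as a ring of MB-row slots. The Python sketch below is a simplified model, not the RISC controller itself: the class name is hypothetical, the DMA transfer is replaced by appending to a log, and the background (asynchronous) aspect of the load is not modeled.

```python
class SearchAreaRing:
    """Circular buffer holding the reference MB rows around the current row."""

    def __init__(self, rows_in_window, total_rows):
        self.slots = [None] * rows_in_window   # which reference row each slot holds
        self.half = rows_in_window // 2
        self.total_rows = total_rows
        self.loads = []                        # stand-in for DMA transfers from DRAM

    def _load(self, ref_row):
        # Overwrite the slot that held the oldest row, which is no longer needed.
        self.slots[ref_row % len(self.slots)] = ref_row
        self.loads.append(ref_row)

    def advance_to(self, cur_row):
        """Make the window around cur_row resident, loading only missing rows."""
        lo = max(0, cur_row - self.half)
        hi = min(self.total_rows - 1, cur_row + self.half)
        for r in range(lo, hi + 1):
            if self.slots[r % len(self.slots)] != r:
                self._load(r)
        return list(range(lo, hi + 1))
```

Stepping through a frame one MB row at a time, each reference row is fetched exactly once, which is the non-overlapping DRAM access pattern the row-by-row order is meant to achieve.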

The motion vector data, the macroblock type, the motion compensated data, and other relevant information need to be communicated to the TE. This is accomplished by having each DSP store this information in the SDRAM using DMA transfers whenever appropriate.

Texture encoding
The TE works on data packets produced by the ME and stored in SDRAM. As previously explained, TE can be applied to multiple slices simultaneously, since its local memory requirements are smaller than those of ME.

The TE processing block itself is divided into two tasks following a functional partitioning approach. These tasks are the Pixel Processor Task (PPT) and the Entropy Coding Task (ECT). The PPT consists of computing the DCT, Quantization and Inverse Quantization, the Inverse DCT, and reconstruction. The PPT also stores the quantized DCT coefficients and other relevant information in SDRAM in what is called an ECT packet. The ECT processes the ECT packet and produces an MPEG-4 compliant bitstream. Each processor can be assigned either a PPT, which operates on an entire row of MBs, or an ECT, which operates on a slice. In Cradle's implementation, a D1 frame is divided into four of these slices, while a CIF frame maps directly onto one slice. A PPT is pushed into the MTS PPT queue for every row of macroblocks in the frame, and an ECT is pushed into the MTS ECT queue for every slice of the frame. Several ECTs and PPTs can be processed in parallel, provided that the relevant ECT packets are available.
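The task structure for one frame can be sketched as follows. This Python model is an assumption-laden illustration: the `Task` tuple and function names are invented for the example, slices are assumed to be contiguous bands of MB rows of nearly equal size, and the real MTS queues are not represented.

```python
from collections import namedtuple

# kind is "PPT" or "ECT"; rows lists the MB rows the task covers.
Task = namedtuple("Task", "kind index rows")


def build_frame_tasks(frame_height, num_slices):
    """Return one PPT per MB row and one ECT per slice for a frame."""
    mb_rows = frame_height // 16
    ppts = [Task("PPT", r, (r,)) for r in range(mb_rows)]

    base, extra = divmod(mb_rows, num_slices)
    ects, start = [], 0
    for s in range(num_slices):
        size = base + (1 if s < extra else 0)  # assumed near-equal slice heights
        ects.append(Task("ECT", s, tuple(range(start, start + size))))
        start += size
    return ppts, ects


def ect_ready(ect, finished_rows):
    """An ECT can start once the ECT packets for all rows of its slice exist."""
    return set(ect.rows) <= set(finished_rows)
```

For a 720x480 D1 frame, `build_frame_tasks(480, 4)` yields 30 PPTs and 4 ECTs; for a 352x288 CIF frame, `build_frame_tasks(288, 1)` yields 18 PPTs and a single ECT.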

These conditions allow for a dynamic partitioning approach in which any DSP can pick up a new ECT or PPT as soon as it has completed its previous task. This approach ensures proper load balancing: the large number of tasks keeps all DSPs continuously busy, taking full advantage of the processing resources available. Dynamic partitioning is implemented efficiently on the CT3600 by leveraging the Multi-Tasking Scheduler (MTS) tool that is part of Cradle's Software Development Kit. MTS provides software developers with a set of primitives for the RISC and DSP processors. The RISC processors act as controllers by defining, allocating, and controlling the order of execution of DSP tasks. Under the MTS framework, each DSP runs one of these tasks whenever it has completed the task it was previously assigned. An important feature of MTS is its low-overhead task switching mechanism, which makes it practical to work on relatively small tasks without incurring noticeable task switching overheads.
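The load-balancing benefit of dynamic partitioning can be illustrated with a small simulation. The Python sketch below does not use or imitate the MTS API; it is a generic greedy model in which each task from a queue goes to whichever DSP becomes free first, compared against a fixed up-front split. Task costs, function names, and the zero task-switch overhead are assumptions.

```python
import heapq


def simulate_dynamic(task_costs, num_dsps):
    """Each task is taken by the DSP that frees up first; returns (makespan, busy)."""
    free_at = [(0, dsp) for dsp in range(num_dsps)]  # (time DSP becomes free, id)
    heapq.heapify(free_at)
    busy = [0] * num_dsps
    for cost in task_costs:
        t, dsp = heapq.heappop(free_at)
        busy[dsp] += cost
        heapq.heappush(free_at, (t + cost, dsp))
    return max(t for t, _ in free_at), busy


def simulate_static(task_costs, num_dsps):
    """Tasks are split into equal-count contiguous chunks fixed in advance."""
    busy = [0] * num_dsps
    chunk = -(-len(task_costs) // num_dsps)  # ceiling division
    for i, cost in enumerate(task_costs):
        busy[i // chunk] += cost
    return max(busy), busy
```

With uneven costs such as `[4, 4, 4, 4, 1, 1, 1, 1]` on two DSPs, the static split leaves one DSP with most of the work, while the dynamic policy finishes both at the same time. The finer the tasks, the closer the dynamic policy gets to an even split, which is why a cheap task switch matters.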

