Efficient Multi-Core Partitioning
Efficient partitioning of complex algorithms such as video encoders requires a combination of the two partitioning techniques described above, plus the ability to assign tasks to processors at run time rather than compile time whenever appropriate. To overcome the limitations of data partitioning, the granularity of the blocks processed by individual processors needs to be smaller than a slice, which introduces data dependencies that must be dealt with. This granularity level may be a single macroblock (MB) or a small group of MBs. Bringing data partitioning down to this finer granularity and combining it with data pipelining creates a large pool of individual tasks that can be allocated to processors at run time, and this task pool is the key to efficient use of multi-core architecture resources.
This approach raises many challenges. How do you define tasks to minimize data dependencies? How do you decide the order in which tasks are processed, so that a new task is always available when a processor frees up, even though the processing requirements of some tasks may vary drastically with the data being processed? How do you keep the task-switching overhead -- the time between a processor completing one task and starting the next -- small? How do you make the partitioning scalable, so that a variable number of processors can be assigned to one algorithm depending on the other algorithms running in parallel and their respective processing requirements? And how do you ensure that each processor has enough fast memory to process its tasks efficiently, given that some tasks have much higher memory requirements than others and that the amount of fast memory is limited and shared across many processors?
The answers to many of these questions depend on the application being targeted, the multi-core architecture being used, and the software libraries and tools the processor vendor provides for developing and debugging code on that architecture. In this article we focus on how Cradle implemented the MPEG-4 encoder on the CT3600 chip, and in doing so offer answers to many of these questions. We start with a brief overview of the Cradle CT3600 architecture and the structure of video encoders such as MPEG-4, then discuss in detail how the MPEG-4 encoder was partitioned on the CT3600 architecture.
The CT3600 MDSP family
The Cradle CT3600 family of Multi-core DSP (MDSP) processors is a family of heterogeneous multi-core chips, accompanied by an easy-to-use multi-core programming system that provides development, debug and profiling capabilities. A single platform can be reprogrammed to support any or all of a vendor's multi-channel, multi-application products.
The Cradle CT3600 architecture has up to 8 RISC processors and 16 DSPs. It is a shared-data-memory architecture in which each element has its own instruction memory and 32-bit-wide register files. Cradle defines a group of 4 RISC processors as a Quad. Associated with each Quad are 8 DSPs, 128 KB of shared data memory, and nine 8-bit programmable I/O ports, each embedding a CPLD and state machine (Figure 1).
Figure 1: CT3616 architecture block diagram
Global resources include a PCI Bus interface and DDR-SDRAM controller with multiple DMA channels, Global Semaphores and bus-performance monitors.
Co-designed with the processor architecture is the Cradle SDK, a Software Development Kit that includes a multi-core simulator and debugger. All 24 processors and all I/Os can either be simulated or accessed directly in hardware through a JTAG or PCI interface.
Next: MPEG-4 Encoder Structure