Oct02: The Cg Programming Language

John is a senior technology architect at Sony Online Entertainment. He can be contacted at [email protected].


Processors are evolving at a pace that leaves most programmers gasping for breath trying to keep up. Just by plugging a modern 3D graphics accelerator into your AGP slot, for instance, you add special-purpose coprocessors dedicated to performing advanced 3D graphics computations at a couple of trillion instructions per second. And it's not just one coprocessor—there are many, including those dedicated to performing 3D geometry calculations or manipulating individual pixels on the screen.

Furthermore, product generation cycles for graphics hardware are now only six months long, and with each new generation we see a two-fold increase in performance. Graphics processor performance is increasing at three times the rate of microprocessors. To cope with these hardware developments, APIs are changing nearly as fast, with major new releases as often as once a year.

So how do software developers keep up with all this? How do we create software for dedicated coprocessors that operate on esoteric data types and contain instructions that have no correlation to any type of programming we've done in the past? How do we keep these coprocessors humming along in parallel without becoming stalled because applications aren't able to feed data and resources as fast as they are gobbled up?

To date, we've written assembly-language routines targeted to the instruction set of each processor. This works, but is tedious and time consuming. Furthermore, the hardware is sometimes so bleeding edge you can't get assemblers for it, much less a debugger or other basic development tools. Consequently, few of us have come close to taking advantage of the available processing power.

What we need is a way to create a hardware abstraction layer so that we can simply recompile code and it will automatically target new platforms. Developers have been doing this for decades with mainstream programming languages. Many modern languages are so hardware independent that they compile to virtual machine code; for instance, you don't need any specific hardware to run the same Java object code on any machine. And, in general, you need only recompile your C or C++ application to run it natively on any operating system or processor.

With this in mind, the engineers at Nvidia developed the Cg programming language. Cg, short for "C for graphics," is a high-level C-like language designed to provide the features we take for granted on microprocessors and extend them to graphics coprocessors. You can get the complete Cg toolkit from Nvidia at http://developer.nvidia.com/. This includes the open-source compiler, run-time libraries, examples, and documentation. (There are also a number of developer sites devoted to sharing Cg shader techniques and resources; see http://cgshaders.org/, for instance.)

Although Cg is C-like, it is most definitely not C. Graphics processors operate on SIMD (Single Instruction, Multiple Data) data types. When you perform a math operation on the contents of two registers on a SIMD processor, you usually perform four operations simultaneously in a single instruction cycle. Registers are typically 128 bits long. Since the basic unit of operation is 128 bits representing four distinct values, you can see why modifications need to be made to the C language to interact with this type of data.
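To make the SIMD model concrete, here is a minimal scalar sketch in plain C++ (not actual Cg or hardware intrinsics) of a four-component type in which each operator below would, on real graphics hardware, map to a single instruction covering all four lanes of a 128-bit register:

```cpp
#include <cassert>

// Hypothetical scalar model of a 4-lane SIMD register: four 32-bit
// floats packed into one 128-bit value.
struct float4 {
    float x, y, z, w;
};

// Componentwise add: four adds in software, one instruction on SIMD hardware.
inline float4 operator+(const float4 &a, const float4 &b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Componentwise multiply, likewise a single SIMD instruction.
inline float4 operator*(const float4 &a, const float4 &b) {
    return {a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w};
}

// Four multiplies and three adds in this scalar model, but conceptually
// one dot-product instruction on a SIMD graphics processor.
inline float dot4(const float4 &a, const float4 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

The point of the sketch is only the data shape: the hardware's basic operand is the whole four-float package, which is why Cg builds vector types into the language rather than leaving them to libraries.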

Another issue Cg addresses is that program fragments—not complete programs—usually run on coprocessors. Consequently, some kind of late binding must occur between the application software and program fragment. For lack of a better analogy, every vertex shader and pixel shader program you run is like a mini DLL that has to be dynamically linked to your app at run time, including passing data and register values at extremely high speed. The source code I provide here presents a virtual base class called CgBinding that provides a framework to dynamically bind any Cg program to your application.

Cg Programs

Before PC-based 3D graphics cards were widely available, all geometry processing and rasterization was performed on the host CPU. Eventually, Microsoft introduced an API for 3D graphics with its DirectX 3.0 and, with the release of DirectX 5.0, provided fairly robust support for integrating 3D graphics hardware with application software. At that time, 3D graphics hardware handled only rasterization, and geometry processing was still performed on the host CPU. Since rasterizing 3D graphics is CPU intensive, offloading it was a real benefit for graphics developers.

DirectX 7.0 added support for hardware-accelerated geometry processing—the technology identified most closely with the Nvidia GeForce1 graphics card. The goal in this case was to offload both geometry and pixel processing to the graphics card, achieving as much parallelism as possible to free up the host CPU so it could perform other operations.

The development of hardware-accelerated geometry processing was well received and raised the quality of 3D graphics on PCs. However, the geometry processing provided at that time (called the "fixed-function pipeline") was inflexible, thereby homogenizing the look-and-feel of graphics applications.

With the release of DirectX 8.0, a new era of graphics programming opened up. DirectX 8.0 provides a hardware-independent abstraction layer for both geometry and pixel programming. This specification includes a hardware-independent assembly language developed in cooperation with Microsoft and a consortium of 3D graphics hardware vendors (http://graphics3d.com/). The results are vertex shaders and pixel shaders.

With vertex shaders, you can go beyond the limitations of the fixed-function pipeline and effectively reprogram graphics processors to perform any kind of magic you can imagine. Likewise, pixel shaders let you create short programs that operate on individual pixels on the screen. Of course, taking advantage of this technology requires developing assembly-language routines, then tying them into an application. For instance, a vertex shader in assembly code is hardcoded to perform one very specific operation in one very specific way. Doing anything other than what it was originally designed to do usually requires rewriting the entire assembly-language routine.

Taking all of this into account—not to mention all the new hardware on the horizon—makes it clear that a better development tool is needed. This is where Cg comes into play.

Cg is a language specification. Nvidia engineers created it because they knew that if developers didn't have better tools, they would not be able to take full advantage of both current and future hardware. Any vendor can create a Cg compiler or debugger. A Cg compiler simply produces vertex shader and pixel shader assembly and object code. It is not tied to any specific set of libraries, API, language, or hardware. You can run shaders produced by Cg under Direct3D or OpenGL and you can load them in any programming language. That said, Nvidia does provide a set of libraries that let you embed the compiler into applications, making it possible to recompile and debug shaders interactively.

Microsoft maintains the vertex shader and pixel shader assembly-language specification for DirectX in consultation with the development community and all graphics hardware vendors. No one is required to use Cg, just as no one is required to use a C compiler. If you want to develop your next application in pure assembly code, nobody is stopping you (though your product might take longer to ship).

Vertex Shaders

A vertex shader accepts input data that usually represents a single vertex in a piece of geometry and transforms it into homogeneous clip-space coordinates to be handed off to the rasterization hardware. The most common operation is to take a position that represents a 3D object-space coordinate and transform it through an object-view-projection matrix. The end result is a single homogeneous clip-space coordinate that can be scan-converted by the next stage of the hardware pipeline. Example 1(a), Cg code that performs this operation, takes the input object-space position (specified by I.position), multiplies it by the 4×4 object-view-projection matrix, and stores the result into the output position O.HPOS, to be handed off to the rasterization hardware. (The object-view-projection matrix is the 4×4 matrix that transforms an object into its position and orientation in the world, which is then multiplied by the view matrix that represents the position and orientation of the camera and, finally, multiplied by the projection matrix that defines the viewing frustum.)
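As a rough illustration in plain C++ rather than Cg (all names here are mine, not the toolkit's), the core of this operation amounts to four dot products between the input position and the rows of the combined object-view-projection matrix:

```cpp
#include <cassert>

struct float4 { float x, y, z, w; };

// 4x4 matrix stored as four row vectors, roughly as a vertex shader
// sees it loaded into four consecutive constant registers.
struct float4x4 { float4 row[4]; };

inline float dot4(const float4 &a, const float4 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Transform an object-space position into homogeneous clip space:
// one dot product per output component, the same four-instruction
// pattern the compiled shader uses.
inline float4 transform(const float4x4 &m, const float4 &p) {
    return {dot4(m.row[0], p), dot4(m.row[1], p),
            dot4(m.row[2], p), dot4(m.row[3], p)};
}
```

With an identity matrix the position passes through unchanged, which is a handy sanity check when debugging a transform chain.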

Example 1(b) shows the assembly-language instructions produced from Example 1(a). It performs four dot-product operations, one for each component of the output position, combining the input vertex with the matrix that has been loaded into constant registers c3 through c6. It stores the intermediate results into register r0 and, at the end, moves them to the output homogeneous position register. If nothing else, the convenience of not having to keep track of which elements are in which registers—not to mention being able to deal with named variables—is a huge improvement.

While Example 1 is about the simplest useful vertex shader program you could write, vertex shaders get much more complex. Fortunately, you can rely on library routines that handle these common calculations with Cg.

In addition to outputting homogeneous clip-space coordinates, a vertex shader usually performs lighting calculations to produce the output color, and also manages texture coordinates and other iterated values to be used in the pixel processing stage.

Pixel Shaders

Pixel shaders accept the iterated values of the vertex program and describe the color of a single pixel sent to the frame buffer. During rasterization, the three vertices of a triangle are processed by the vertex shader to produce homogeneous clip-space coordinates that are then iterated and scan converted so that each individual pixel on the screen can be processed with the correct iterated values. This includes iterated color components as well as texture coordinates.

Pixel shader programs are currently small, typically no more than eight instruction cycles. The most common operations are texture fetches and blend operations. In the future, however, pixel shader programs could be thousands of instructions long. Obviously, you would not expect to achieve interactive frame rates this year with such complex pixel shaders (unless you drastically shrink your display window); however, it might become possible to produce film-quality renderings at hundreds of times the speed currently available. Once general-purpose programs of significant size can be run in hardware on a per pixel basis, the visual realism achieved will be amazing.

Example 2 is a basic pixel shader program written in Cg. The first line samples a 2D texture and stores the color into shadow. The next line samples a 2D texture and stores the result in spotlight. Next, the iterated color component calculated by the vertex shader program is loaded into the variable called lighting. The final step is to modulate the two textures with the lighting component and store the result into the output.
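The modulation step described above is just a componentwise multiply. Here is a scalar sketch in C++ (hypothetical names; a real Cg shader would use texture sampling and the built-in vector types):

```cpp
#include <cassert>

// A color as four floats in [0,1]: red, green, blue, alpha.
struct Color { float r, g, b, a; };

// Componentwise multiply: the "modulate" blend a pixel shader applies.
inline Color modulate(const Color &x, const Color &y) {
    return {x.r * y.r, x.g * y.g, x.b * y.b, x.a * y.a};
}

// Final pixel color: two texture samples (shadow and spotlight)
// modulated with the iterated lighting color from the vertex shader.
inline Color shadePixel(const Color &shadow, const Color &spotlight,
                        const Color &lighting) {
    return modulate(modulate(shadow, spotlight), lighting);
}
```

Modulation darkens: any component at zero in one input zeroes the result, which is exactly the behavior you want from a shadow or spotlight mask.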

Running Cg Programs

Because vertex and pixel shader programs are only fragments, you need an API to load them and send them to the hardware. Both OpenGL and DirectX have APIs to perform these operations. The Cg programming toolkit comes with a set of libraries that let you easily load Cg programs, compile them on the fly, hand them off to either OpenGL or Direct3D, and dynamically bind to them.

One of the most challenging parts of integrating Cg programs with applications is run-time binding. While, as a language, Cg lets you name your input variables anything you want, there is a catch—every single named input to your Cg program implies a snippet of application code to run-time bind to that exact variable name.

Consequently, there are compelling reasons to standardize the process, even though it is convenient to be able to name your input variables anything you want. For instance, in reviewing a number of sample Cg programs, I found seven different name variations for the input variable world-view-projection matrix. This doesn't seem like a problem until you realize that you need a piece of run-time binding code to dynamically link that variable name to your application. This means that if you want to use a Cg program someone else wrote, you have to figure out how to remap all of their named variables to your application framework.

To simplify this process, the virtual base class CgBinding that I present here provides a standardized framework to bind applications to Cg programs. It operates through two sets of enumerations and four virtual methods. (The CgBinding source code is available electronically; see "Resource Center," page 5.) CgBinding requires DirectX 8.1 and the Cg programming toolkit to compile. The first enumerated type is called VertexBindingType, which is designed to represent every possible application bind point between a Cg vertex shader and any application. The second enumerated type is PixelBindingType, which does the same thing for Cg pixel shader programs. These are placed into two separate enumerations simply for clarity, since the run-time data needed for each type of shader is so distinct.

Two methods are provided in the base class CgBinding to translate an ASCII string into an enumerated type—AsciiToVertexBindingType and AsciiToPixelBindingType. I have reviewed all of the sample Cg programs released to date and added an ASCII alias for every possible type. For example, if you refer to your world-view-projection matrix by any one of the seven names I have encountered so far, then it automatically gets remapped to the logical binding operation VBT_WORLDVIEWPROJ.

Whenever you want to create a new alias, you simply add it to one of these routines. If you want to add a new operation, you add it first to the enumeration and then to the translation routine. The benefit of using this virtual base class is that you can easily locate all of your named bindings for all of your shaders in one clear and distinct location.
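A minimal sketch of the alias translation follows; the alias strings and the enum subset here are hypothetical, while the real AsciiToVertexBindingType covers every alias found in the released sample programs:

```cpp
#include <cassert>
#include <string>

// Small subset of the logical vertex bind points the article describes.
enum VertexBindingType {
    VBT_WORLDVIEWPROJ,  // combined object-view-projection matrix
    VBT_UNKNOWN
};

// Map the many names different Cg authors use for the same input to one
// logical binding operation. Adding a new alias is one more comparison.
inline VertexBindingType AsciiToVertexBindingType(const std::string &name) {
    if (name == "WorldViewProj" || name == "worldViewProj" ||
        name == "ModelViewProj" || name == "wvp")
        return VBT_WORLDVIEWPROJ;
    return VBT_UNKNOWN;
}
```

Centralizing the table this way means adopting a third-party shader is a matter of adding one alias, not rewriting binding code.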

The application that inherits this base class must provide four virtual methods: ValidateVertexBindingType, ValidatePixelBindingType, GetVertexBindingType, and GetPixelBindingType. The Validate methods ask the application whether it knows how to handle a particular operation; the Get methods provide the current data value for that item. Example 3 shows all the calls necessary for your application to support a Cg program that requires only a world-view-projection matrix. Example 4 presents the virtual methods that perform the run-time binding.
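The shape of that interface can be sketched as follows. This is a hypothetical simplification in plain C++: the real CgBinding also handles pixel bindings and DirectX data types, and the names below other than the method names are mine:

```cpp
#include <cassert>

enum VertexBindingType { VBT_WORLDVIEWPROJ, VBT_UNKNOWN };

// Sketch of the virtual base: the binder first asks the app whether it
// can supply a bind point, then fetches the current value when needed.
class CgBindingSketch {
public:
    virtual ~CgBindingSketch() {}
    virtual bool ValidateVertexBindingType(VertexBindingType t) = 0;
    virtual const float *GetVertexBindingType(VertexBindingType t) = 0;
};

// An application that can supply only a world-view-projection matrix.
class MyApp : public CgBindingSketch {
public:
    bool ValidateVertexBindingType(VertexBindingType t) override {
        return t == VBT_WORLDVIEWPROJ;
    }
    const float *GetVertexBindingType(VertexBindingType t) override {
        return t == VBT_WORLDVIEWPROJ ? wvp : nullptr;
    }
private:
    // Identity matrix standing in for the app's live transform state.
    float wvp[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
};

// The binding handshake: validate first, then fetch.
inline bool canBind(CgBindingSketch &b, VertexBindingType t) {
    return b.ValidateVertexBindingType(t) && b.GetVertexBindingType(t) != nullptr;
}

// Exercise the handshake against the sample application.
inline bool demoBind() {
    MyApp app;
    return canBind(app, VBT_WORLDVIEWPROJ) && !canBind(app, VBT_UNKNOWN);
}
```

Because the framework only ever talks to the virtual interface, the same shader loads against any application that answers the Validate/Get handshake.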

DDJ

