Aug00: Inside Xbox Graphics

Michael is a programmer at Microsoft where he's recently been working on Xbox graphics. He can be contacted at [email protected].


I don't know exactly when it was that I realized I liked programming and computers more than most people did, but I know precisely when my wife figured it out. In 1980, I was working at a small, cash-starved consulting company that had a contract to put together energy-analysis software. Unfortunately, we couldn't afford mainframe time to develop the program, so I bought a CP/M machine, set it up in my apartment, and started coding. It worked great, and it was a blast having a computer that was all mine, every cycle, right down to the metal.

After a few months, a deadline came up, and my friend Mike Koved and I worked late into the night on a simulation. After many hours, we started printing the results -- only to find that the printer was horribly loud in the late-night stillness. Afraid we would wake the neighbors, we unzipped a sleeping bag and hunched over the printer with it draped over our backs, creating a little hut to muffle the printer.

It didn't work very well, of course, and it wasn't long before my wife woke up, looked at the buzzing mound of down-filled nylon in the living room, glanced at the clock, which read 4 am, said, "You're nuts," and went back to sleep. After that, she never had any doubt where my future lay; a year later, when I suggested spending half of our life savings on an IBM PC so I could write a game for it, she never batted an eye.

The interesting thing to me is that this story is kind of like buggy-whip obsolescence on fast forward. When I tell it now, 20 years later, half the time people don't understand why the printer was a problem. You see, the offending printer was a dot-matrix, and now there's a whole generation that's never known anything but laser and inkjet printers -- so they've never had a printer that made any noise. Things change fast in the computer industry, to say the least.

Xbox Graphics

That story is as good a way as any to introduce Xbox graphics, a project I've been working on for the past half-year at Microsoft. Forget about 20 years -- just four years ago, when Quake shipped, the state of the art for consumer 3D was a software rasterizer running on a Pentium Pro. In 2001, Microsoft will ship Xbox, a game console featuring an Intel PIII/733 with 128-KB cache, 64 MB of RAM, a DVD, hard disk, custom sound processor with 64 3D-audio channels, and an as-yet-unnamed NVIDIA graphics processing unit (GPU) with capabilities literally several orders of magnitude better than Quake offered five years earlier. Xbox will have hardware transformation, clipping, lighting, curved surfaces, programmable pixel and vertex shaders, full-screen antialiasing, and the ability to pump out an astonishing number of triangles.

I'll look at each of these in turn, but raw triangle throughput is the foundation on which everything else rests, so I'll start there. For context, Quake on a Pentium Pro pumped out maybe 100K triangles/second (tris/sec.) and 10M pixels/second (pix/sec.) at best. Shortly thereafter, the first hardware accelerators upped the ante to roughly 1M tris/sec. and 100M pix/sec.

Today, both triangles and pixels per second are hard to capture with a single number. Triangle rate varies with many factors, such as number of textures and type of primitive. Fill rate is even harder to nail down, due to z and occlusion techniques that save varying amounts of work under different circumstances. Matters get even more complicated with antialiasing, which is evaluated in terms of how many samples would have to be drawn with box-filtered supersampling to produce a visually equivalent result, a measure that's difficult to quantify. Finally, Xbox is not yet a shipping product, and the final graphics clock speed is still being nailed down, although it'll definitely be between 250 and 300 MHz.

Having said all that, the Xbox GPU will be able, even at 250 MHz, to handle up to 125 million Gouraud-shaded, two-texture triangles per second, complete with transformation, clipping, and perspective projection. With one infinite hardware light added, the rate will be at least 62.5 Mtris/sec.; with eight local lights, at least 8 Mtris/sec.
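To put those peak rates on a per-frame footing, here is a small back-of-the-envelope sketch. It assumes the 60 Hz frame rate used elsewhere in this article and the 250 MHz clock quoted above; the function and dictionary names are purely illustrative and not part of any Xbox API.

```python
# Per-frame and per-clock triangle budgets implied by the quoted peak rates.
# Assumptions: 60 Hz frame rate, 250 MHz GPU clock (the low end of the range).
FRAME_RATE_HZ = 60
GPU_CLOCK_HZ = 250_000_000

PEAK_RATES_TRIS_PER_SEC = {
    "no lights, two textures": 125_000_000,
    "one infinite light": 62_500_000,
    "eight local lights": 8_000_000,
}

def tris_per_frame(tris_per_sec, frame_rate_hz=FRAME_RATE_HZ):
    """Triangles available in one frame at the given frame rate."""
    return tris_per_sec / frame_rate_hz

def tris_per_clock(tris_per_sec, clock_hz=GPU_CLOCK_HZ):
    """Triangles completed per GPU clock at the given rate."""
    return tris_per_sec / clock_hz

for label, rate in PEAK_RATES_TRIS_PER_SEC.items():
    print(f"{label}: {tris_per_frame(rate):,.0f} tris/frame, "
          f"{tris_per_clock(rate):.3f} tris/clock")
```

Under these assumptions the unlit peak works out to roughly two million triangles per frame, or one triangle every two clocks.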

The GPU's raw fill rate at 250 MHz will be 1 Gpix/sec. To provide some context for that number, at a resolution of 640×480, it is sufficient to draw every pixel on the screen more than 50 times per frame, at a frame rate of 60 Hz. (TV is the normal Xbox output device, although HDTV and VGA monitors are supported. TV is interlaced, but Xbox will normally render noninterlaced, then filter the interlaced fields from the full frame, so I'll use full 640×480 TV-frame resolution in this article.) There's also occlusion-detection circuitry that can increase fill rate by up to 4X; the effect varies depending on whether pixels are occluded when they're drawn, but tends to be greatest exactly when it's needed most -- when there's a lot of overdraw. Finally, antialiased drawing can produce still higher equivalent fill rates. Those numbers are of interest mainly for marketing purposes; from a developer's perspective, the important stat is that the GPU will support antialiased TV rendering at a full 60 Hz, even with 10-50X overdraw of every pixel on the screen.
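The overdraw claim is simple arithmetic, sketched below using only the figures given above (1 Gpix/sec., 640×480, 60 Hz). The function name is illustrative.

```python
def overdraw_budget(fill_rate_pix_per_sec, width, height, frame_rate_hz):
    """How many times every screen pixel can be drawn per frame."""
    screen_pixels_per_sec = width * height * frame_rate_hz
    return fill_rate_pix_per_sec / screen_pixels_per_sec

# Raw fill rate at 250 MHz, full TV-frame resolution, 60 Hz.
budget = overdraw_budget(1_000_000_000, 640, 480, 60)
print(f"{budget:.1f}x overdraw per frame")
```

The result is a little over 54, which is where "more than 50 times per frame" comes from. It does not account for the occlusion circuitry or antialiasing.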

To put that into easily visualized terms, Quake on a Pentium Pro could draw about three lit player models per frame at 640×480, 60 Hz. Xbox should be able to draw roughly 3000 of the same models per frame, even with a second texture, pixel shading, and antialiasing thrown in for good measure.

Lies, Damn Lies, and Graphics Chip Specs

At this point, you might reasonably be wondering how much of the theoretical performance of the GPU Xbox games will actually be able to achieve. There are several factors that can potentially keep the GPU from running at full speed, but a surprisingly large fraction of theoretical performance really will be available.

Memory size and performance, CPU performance, and bus bandwidth are all potential bottlenecks; see Figure 1. It's certainly true that it will be a challenge to fit databases for scenes with more than a million visible triangles into Xbox's 64 MB of memory; however, texture and vertex compression and reuse, as well as curved surfaces, will help considerably. It's equally true that the CPU can't support anything like full GPU performance, particularly because it has only 1 GBps front-side-bus (FSB) bandwidth to memory. At roughly 26 bytes/tri., that's only enough bandwidth for 38 Mtris/sec. at best. However, in the highest-performance applications, most of the triangle data won't be coming from the CPU; it will consist of static meshes pulled directly from memory by the GPU. Don't be confused by the term "static" -- these meshes are processed according to the current transform, complete with skinning, so they can readily support animation of jointed models. It is with this sort of rendering that Xbox will really shine: Think of the droid army in The Phantom Menace, with thousands of entities, each drawn from the same set of jointed elements, but with different shading and positioning of those elements.
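The CPU-side ceiling follows directly from the two figures quoted above. A minimal sketch, with illustrative names, assuming the whole front-side bus is spent on triangle data (which real code would never manage):

```python
FSB_BYTES_PER_SEC = 1_000_000_000   # 1 GBps front-side bus
BYTES_PER_TRIANGLE = 26             # rough figure quoted in the text
GPU_PEAK_TRIS_PER_SEC = 125_000_000

def max_cpu_fed_tris_per_sec(bus_bytes_per_sec, bytes_per_tri):
    """Upper bound on triangles/sec if every bus byte carried triangle data."""
    return bus_bytes_per_sec / bytes_per_tri

cpu_bound = max_cpu_fed_tris_per_sec(FSB_BYTES_PER_SEC, BYTES_PER_TRIANGLE)
fraction_of_peak = cpu_bound / GPU_PEAK_TRIS_PER_SEC
print(f"{cpu_bound / 1e6:.1f} Mtris/sec, {fraction_of_peak:.0%} of GPU peak")
```

Even in this best case, CPU-generated triangles could reach only about a third of the GPU's peak rate, which is why static meshes pulled directly from memory matter so much.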

That leaves one potential bottleneck, and it's a big one -- memory bandwidth. Xbox has a unified memory architecture (UMA), whereby the GPU and CPU share a single memory space, with memory control provided by the GPU. This is in marked contrast to the separate memories used for high-performance 3D in PCs. UMA has a significant advantage in that it allows the CPU, DVD and disk controllers, and GPU to access common data without copying; for example, models and textures can be streamed off the DVD into memory and used directly by the GPU. However, the history of UMA is spotty; witness IBM's PCjr UMA, which stopped the CPU virtually dead in its tracks by allotting two out of every three memory cycles to graphics. Not surprisingly, this is the aspect of Xbox that has aroused the greatest degree of public skepticism, so the accompanying text box entitled "Xbox Memory Bandwidth" discusses Xbox's memory bandwidth in high-end scenarios. The short version of the bandwidth story is that while there are scenarios in which Xbox could run out of bandwidth, there should be more than enough for most cases, particularly those that leverage the GPU's programmable pipeline. Under virtually any set of assumptions, Xbox has adequate memory bandwidth to handle 50 Mtris/sec. in real-world use, and usually plenty to hit the pipeline limits of the chip.

Xbox also offers support for 125 million particles -- single-color, front-facing squares -- per second. The particle control structure is considerably smaller than that of a triangle, letting the CPU generate many more particles per frame than would be possible with triangles. This makes particles well suited for procedural effects such as smoke, fire, and flowing and splashing liquid.

The bottom line is that Xbox is a well-balanced graphics system that will generally be capable of approaching the specs being claimed for it. "Capable" is not the same as "easy" -- this level of performance will require exemplary programming, taking full advantage of the fact that Xbox is a fixed platform to which code can be carefully tuned. Microsoft will encourage this by providing plenty of sample code and documentation showing how to push the hardware to its limits, via a version of DirectX 8 optimized and extended to support every feature of Xbox. For those who prefer OpenGL, NVIDIA will provide a fully Xbox-enhanced version.

Lots of Triangles -- And Good-Looking Ones, Too

When it comes to triangles, quantity matters a lot, but quality matters as well. Xbox offers a number of quality-enhancing features, such as perspective-correct interpolation of all attributes, w-buffering, and shaders. The quality feature that will surely have the broadest and most visible impact, however, is full-screen antialiasing, which addresses the most glaring rendering artifact of the current generation of graphics hardware: crawling jaggies along the edges of objects.

When enabled, Xbox's antialiasing will automatically be performed for the entire scene, writing either two or four samples per pixel (whichever the programmer chooses), with no game programming changes required. The impact is greater than suggested by the relatively modest increase in sample rate; the set of samples per pixel and the filtering that converts the samples to pixels, taken together, produce results substantially better than two- or four-sample box filtering.

There are three costs to using antialiasing: larger pixel and z buffers; the need for a filtering blt from the pixel buffer to the frame buffer; and a performance cost because more pixel and z samples need to be handled per pixel. As noted in "Xbox Memory Bandwidth," however, the cost of the pixel buffer can be cut in half by using 16-bpp pixels, with no appreciable loss in quality, so for two-sample antialiasing, the memory cost is only 1.2 MB; for four-sample antialiasing, it's 4.8 MB. The filtering blt takes less than 4 percent of total memory bandwidth at most. As for rendering performance, the pipeline can actually process antialiased pixels just as fast as nonantialiased. The only cost from the additional samples is memory bandwidth, and, as "Xbox Memory Bandwidth" shows, pixel and z accesses are not the largest consumers of bandwidth. The upshot is that there will be only a modest slowdown associated with antialiasing, especially the two-sample version.
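The memory figures can be reproduced with a short calculation. This is a sketch under assumptions that are not stated in the text: a 640×480 buffer, 4 bytes per z sample, and a non-antialiased baseline of 32-bpp pixels plus z. These were chosen because they roughly reproduce the quoted costs, not because they are confirmed details of the hardware.

```python
WIDTH, HEIGHT = 640, 480
Z_BYTES = 4  # assumption: 32-bit z per sample

def buffer_bytes(samples_per_pixel, color_bytes, z_bytes=Z_BYTES):
    """Combined size of the pixel and z buffers, in bytes."""
    return WIDTH * HEIGHT * samples_per_pixel * (color_bytes + z_bytes)

# Assumed baseline: no antialiasing, 32-bpp pixels.
baseline = buffer_bytes(1, color_bytes=4)
# Antialiased cases use 16-bpp pixels, as suggested in the text.
two_sample = buffer_bytes(2, color_bytes=2)
four_sample = buffer_bytes(4, color_bytes=2)

extra_two = two_sample - baseline
extra_four = four_sample - baseline
print(f"two-sample extra:  {extra_two / 1e6:.2f} MB")
print(f"four-sample extra: {extra_four / 1e6:.2f} MB")
```

Under these assumptions the two-sample overhead is about 1.2 MB and the four-sample overhead is four times that, in line with the figures above.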

Not Your Father's Pipeline

The final major feature of the GPU is a new kind of graphics pipeline, featuring far more programmability than the pipelines we're used to. There's no way I can discuss it in detail here, but I'll provide an overview, and point you to sources for more information.

Figure 2 shows the six major components of the pipeline. Of these, only the alpha blender and rasterizer match the current generation of graphics hardware. The remaining four components constitute a significant break with the past, mirroring the radical changes in DirectX 8 from DX 7.

The GPU is DX 8 compliant (with some extra features tossed in), so the DX 8 specification is an excellent reference for the chip, especially the new components. As I write this, DX 8 is in beta and can only be obtained under nondisclosure agreement, but it should become widely available sometime soon; check out http://msdn.microsoft.com/directx/. Also, the GPU has the same pipeline features as NVIDIA's upcoming next-generation PC chip, so information about that chip (at http://www.nvidia.com/developer/) will be applicable to Xbox as well. There's already considerable information on the aforementioned web site about GeForce and GeForce 2 GTS, which contain a subset of the Xbox GPU's pixel and vertex shaders.

Starting with the first new part of the pipeline, the surface engine is pretty much what you'd expect. It works with the CPU to tessellate high-order surfaces such as Catmull-Clark, Bezier, Loop, and uniform B-splines, at a maximum rate of at least 50 Mtris/sec.

The vertex shader is a programmable SIMD engine with storage for 192 quadwords of data and 128 app-downloadable instructions, which specify the program to be executed on each vertex. The vertex shader can do standard transformation, clipping, and lighting, but that's just the start. Some other applications include vertex blending, morphing, animation, skinning, elevation fog, mesh warping (procedural waves, for example), procedural texture-coordinate generation (such as reflection maps), and arbitrary lighting techniques.
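To give a feel for what a per-vertex program does, here is a minimal sketch of two-bone vertex blending written in plain Python rather than in actual shader instructions. It treats each quadword as a four-component vector (an assumption here), stores each bone as three such rows, and uses hypothetical names throughout; it is not the DX 8 shader syntax.

```python
def dot4(a, b):
    """Four-component dot product, the workhorse of a transform."""
    return sum(x * y for x, y in zip(a, b))

def transform(rows, position):
    """Apply a 3x4 matrix (three four-component rows) to (x, y, z, 1)."""
    return tuple(dot4(row, position) for row in rows)

def blend_vertex(position, weight, bone0, bone1):
    """Blend the results of two bone transforms by a per-vertex weight."""
    p0 = transform(bone0, position)
    p1 = transform(bone1, position)
    return tuple(weight * a + (1.0 - weight) * b for a, b in zip(p0, p1))

# Constants: one bone leaves the vertex alone, the other shifts it 2 units in x.
IDENTITY = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0))
SHIFT_X_BY_2 = ((1, 0, 0, 2), (0, 1, 0, 0), (0, 0, 1, 0))

blended = blend_vertex((1, 1, 1, 1), 0.5, IDENTITY, SHIFT_X_BY_2)
print(blended)
```

The same small program runs on every vertex, with the matrices held as constants and only the position and weight varying. That is what lets a static mesh in memory be animated without the CPU touching the vertices.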

The texture unit supports up to four textures per pixel. Two textures can be handled in a single clock; three or four textures take two clocks. 3D textures and cube maps are also supported.

The radical feature of the texture unit is that the output of one texture can serve as a coordinate for another texture, effectively allowing a texture to be used as a one-, two-, or three-dimensional lookup table. Cube maps can be used to perform additional computations as part of the lookup; for example, a texture can be used in conjunction with a cube map to normalize vectors to unit length, which is very handy for lighting via dot products.
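The idea of one texture feeding coordinates to another can be sketched in a few lines. The example below is a toy model only: it uses nearest-texel sampling on small Python lists, and the texture contents and function names are made up for illustration.

```python
def sample_2d(texture, u, v):
    """Nearest-texel lookup into a 2D texture, with u and v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def sample_1d(table, coord):
    """Nearest-texel lookup into a 1D texture, with coord in [0, 1]."""
    index = min(int(coord * len(table)), len(table) - 1)
    return table[index]

# First texture: a per-texel scalar. Second texture: a 1D lookup table
# mapping that scalar to an RGB color.
INTENSITY = [[0.1, 0.4],
             [0.7, 1.0]]
RAMP = [(0, 0, 64), (128, 128, 0), (255, 255, 255)]

def dependent_lookup(u, v):
    """Use the first texture's output as the coordinate into the second."""
    coord = sample_2d(INTENSITY, u, v)
    return sample_1d(RAMP, coord)

print(dependent_lookup(0.25, 0.25), dependent_lookup(0.75, 0.75))
```

Swapping the ramp for a different table changes the function applied per pixel without changing any other part of the pipeline, which is the sense in which a texture acts as a lookup table.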

You can think of the pixel shader as a compact but powerful programmable pixel processor that runs a nine-instruction program on each pixel. It consists of eight register combiners cascaded together, with each taking inputs from up to four textures (which themselves may have been cascaded together as noted earlier), and also from constants, interpolated values, and scratch registers. I'd love to show you a figure that sums up the pixel shader, but unfortunately it's just too complicated for that; however, there are already several relevant papers, with a good set of figures, on http://www.nvidia.com/developer/, and more to come. Many of the registers are writable by the combiners, so combiners can send outputs to later combiners through the registers, which thereby represent the state of each pixel as it passes through the pixel shader. Each combiner takes up to four RGB inputs, and can perform two three-element multiplies or dot products, add the two products, and output up to three values: the overall result and the results of the two multiplies. (A limited form of conditional output is also possible.) There's a parallel, separately configurable set of eight alpha combiners. Which combiners are active, what they do, and where their data comes from and goes to is fully programmable. The pipeline runs at full speed with up to two active combiners, takes two cycles with up to four combiners, three cycles with up to six combiners, and four cycles above that.
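A single RGB combiner stage, as described above, can be modeled roughly as follows. This is a simplified sketch, not the hardware's actual behavior: it omits input mappings, clamping, and conditional output, and it assumes a dot product is replicated across all three channels. All names are illustrative.

```python
from math import ceil

def multiply(a, b):
    """Three-element componentwise multiply."""
    return tuple(x * y for x, y in zip(a, b))

def dot(a, b):
    """Three-element dot product, replicated across RGB (an assumption)."""
    s = sum(x * y for x, y in zip(a, b))
    return (s, s, s)

def rgb_combiner(a, b, c, d, ab_op=multiply, cd_op=multiply):
    """Return (ab + cd, ab, cd): the overall result and the two products."""
    ab = ab_op(a, b)
    cd = cd_op(c, d)
    total = tuple(x + y for x, y in zip(ab, cd))
    return total, ab, cd

def combiner_cycles(active_combiners):
    """Cycles per pixel for 1 to 8 active combiners, per the text."""
    if not 1 <= active_combiners <= 8:
        raise ValueError("expected between 1 and 8 active combiners")
    return ceil(active_combiners / 2)

# Two cascaded stages: N.L in the first, scaled by a base color in the second.
ZERO = (0.0, 0.0, 0.0)
normal, light = (0.0, 0.0, 1.0), (0.0, 0.6, 0.8)
base_color = (1.0, 0.5, 0.25)

_, n_dot_l, _ = rgb_combiner(normal, light, ZERO, ZERO, ab_op=dot)
lit_color, _, _ = rgb_combiner(n_dot_l, base_color, ZERO, ZERO)
print(lit_color, combiner_cycles(2))
```

Passing a value from the first stage to the second stands in for the scratch registers that carry per-pixel state between combiners. With two active stages, this example stays within the full-speed case.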

Lastly, the final combiner performs a similarly programmable operation on the output of the last active combiner stage, and sends the result to the alpha blender. However, this stage has additional components for specular and fog.

If this sounds weird, well, it is. It's a far cry from the fixed-function pipelines we're used to, and it's not immediately obvious how all this programmability can be applied. For starters, though, note that the programmable pipeline supports all the capabilities of DX 7, so it's a superset of the standard pipeline. It's also capable of supporting stuff like diffuse and reflective bump mapping, shadow maps, Fresnel effects, Phong shading, z-correct bump mapping, and even some forms of BRDF shading, which makes it possible to do sophisticated materials such as velvet, wood, and aluminum. Take a look at the GeForce samples on NVIDIA's web site; you'll be astonished at what even a small subset of the Xbox GPU's pipeline can do. NVIDIA suggests that you stop thinking of parts of the pipeline as "color interpolators," "bump mappers," and the like, and start thinking of the pipeline as a set of resources that can be combined to enable a whole new set of capabilities. As for exactly what those might be, no one really knows, and that's the most exciting part of all. For several years now, fixed-function hardware has restricted developers' flexibility and creativity in 3D rendering, but suddenly there's a whole new world to explore, and a new set of tricks to figure out.

Looking Ahead

The best thing about Xbox is that it won't change. Ever. Judging by other consoles, Xbox should have a four- or five-year run, and wonderful as the profusion of constantly evolving PC technology is, I love the idea of being able to spend years working with a high-powered, fixed platform -- figuring out how to apply all that programmability, understanding the performance quirks, getting things right.

It's a long way from hunching over a dot-matrix printer. Still, there's a key similarity. Twenty years ago, I loved the feeling of having all that computer power at my command, and the part that really rocked was figuring out how to use it -- all of which goes double for Xbox.

DDJ

