Xbox 360 GPU Details
Jason Cross
Clean Start
At a brief press conference during E3 2005 titled "The Xbox 360 Visual Experience," ATI took the opportunity to dish out a few more details about the graphics chip developed for Microsoft's new console. The company claimed it took about two years to design the chip, and that they started with "a clean sheet of paper," devoting themselves to creating the ideal console graphics chip without any considerations for turning this into a PC architecture or restricting themselves to a particular API.
The basic architecture looks like this:
[Figure: basic architecture of the Xbox 360 GPU]
It's actually kind of hard to describe exactly what is going on within the Xbox 360 GPU. In fact, it's confusing enough that even ATI had a hard time with the nomenclature when answering our questions.
As shader instructions and data enter the GPU, the sequencer takes vertex or pixel data (which has already gone through a few set-up steps, such as the Vertex Grouper or Scan Converter) and prepares it to be fed to a large array of Arithmetic Logic Units (ALUs). The sequencer's goal is to keep all 48 ALUs busy 100% of the time, as efficiently as possible. It handles up to 64 threads at a time and supports the features expected of a modern sequencer, such as predicated branching. With 64 threads in flight, a "stalled" thread doesn't leave the ALUs sitting idle: if one thread has to wait for data, another thread can be processed and sent to the ALUs. ATI calls these "Perfectly Efficient" shaders.
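To see why a pool of threads keeps the ALUs fed, here is a toy model in Python. Everything in it is an assumption for illustration, not ATI's design: each hypothetical thread issues a fixed number of ALU instructions, the last of which is a texture fetch that parks the thread for a fixed latency, and the sequencer simply issues from the first thread that isn't waiting.

```python
def alu_utilization(num_threads: int, cycles: int = 130,
                    ops_per_fetch: int = 4, fetch_latency: int = 10) -> float:
    """Fraction of clocks in which the sequencer issued an instruction.

    Toy model with hypothetical parameters (not ATI's): each thread repeats
    `ops_per_fetch` instructions, the last being a texture fetch that keeps
    the thread waiting until `fetch_latency` clocks after it issued.
    """
    ready_at = [0] * num_threads  # clock at which each thread may issue again
    issued = [0] * num_threads    # instructions issued so far, per thread
    busy = 0
    for clock in range(cycles):
        # Hypothetical policy: issue from the first thread not waiting on data.
        for t in range(num_threads):
            if ready_at[t] <= clock:
                issued[t] += 1
                busy += 1
                if issued[t] % ops_per_fetch == 0:
                    ready_at[t] = clock + fetch_latency  # fetch: park thread
                break
    return busy / cycles


print(f"1 thread:   {alu_utilization(1):.2f}")
print(f"64 threads: {alu_utilization(64):.2f}")
```

With one thread, the ALUs sit idle for most of every fetch. Because at most one instruction issues per clock in this model, no more than `fetch_latency - 1` threads can be waiting at once, so ten or more threads keep the array busy on every cycle.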
The ATI-designed GPU also acts as the main memory arbiter for the Xbox 360's multicore CPU. It connects to the three-core CPU over a 22GB/sec bus, and to the SiS southbridge and I/O controller via a 2-lane PCI Express link. Besides the 10MB of embedded DRAM (EDRAM), it has a 128-bit bus to 512MB of GDDR3 running at 700MHz (a 1.4GHz effective data rate), for a total bandwidth of 22.4 GB/sec. Thanks to the extremely fast "smart" EDRAM, ATI claimed "we have bandwidth to spare."
The 48 ALUs are divided into three SIMD groups of 16. At the final stage of the shader pipeline, each of the 16 ALUs in a group can write out two samples to the 10MB of EDRAM, so the chip can write a maximum of 32 samples per clock. At 500MHz, that works out to a peak fill rate of 16 gigasamples per second. Each ALU can perform 5 floating-point shader operations per clock, so the peak computational power of the shader units is 240 floating-point shader ops per cycle, or 120 billion shader ops per second at 500MHz.
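The peak figures above follow directly from the quoted numbers; this short Python check multiplies them out. These are theoretical peaks, not sustained throughput.

```python
# Figures ATI quoted at E3 2005.
CLOCK_HZ = 500_000_000         # 500MHz core clock
ALUS = 48                      # three SIMD groups of 16
WRITING_ALUS = 16              # ALUs writing samples to EDRAM each clock
SAMPLES_PER_WRITING_ALU = 2
OPS_PER_ALU = 5                # floating-point shader ops per ALU per clock

samples_per_clock = WRITING_ALUS * SAMPLES_PER_WRITING_ALU  # 32
fill_rate = samples_per_clock * CLOCK_HZ                    # samples/second
ops_per_clock = ALUS * OPS_PER_ALU                          # 240
ops_per_second = ops_per_clock * CLOCK_HZ                   # shader ops/second

print(f"peak fill rate: {fill_rate / 1e9:.0f} gigasamples/sec")
print(f"peak shader rate: {ops_per_second / 1e9:.0f} billion ops/sec")
```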
All 48 ALUs can operate on either pixel or vertex data. All 48 have to be doing the same kind of work (pixel or vertex operations) in any given clock cycle, but this can alternate from clock to clock: one cycle all 48 ALUs can be crunching vertex data, and the next they can all be doing pixel ops, but they cannot be split within the same clock cycle.
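ATI didn't say how the sequencer decides which kind of work to run on a given clock, so the policy in this Python sketch (serve whichever queue is longer) is purely hypothetical. It only illustrates the constraint described above: each clock, the whole 48-ALU array works on one type, and the type can flip on the very next clock.

```python
ALUS = 48


def schedule(vertex_work: int, pixel_work: int) -> list[str]:
    """Return the work type chosen for the whole ALU array on each clock.

    Hypothetical policy: serve whichever queue is longer (ties go to vertex
    work, an arbitrary choice). All ALUs run the same type in a given clock.
    """
    timeline = []
    while vertex_work > 0 or pixel_work > 0:
        if vertex_work >= pixel_work:
            vertex_work -= min(ALUS, vertex_work)
            timeline.append("vertex")
        else:
            pixel_work -= min(ALUS, pixel_work)
            timeline.append("pixel")
    return timeline


print(schedule(vertex_work=96, pixel_work=200))
```

When fewer than 48 items of the chosen type remain, the leftover ALUs go unused for that clock, since in this model they cannot pick up the other type until the next cycle.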
The 10MB of EDRAM is actually on a separate die, at least initially; as future process technologies become available, it could end up on the same piece of silicon as the GPU. Still, the EDRAM resides on the same package and has a wide bus running at 2GHz to deliver 256GB/sec of bandwidth. That's a true 256GB/sec, not one of those fuzzy counting methods where the 256GB is "effective" bandwidth that accounts for all kinds of compression. The GPU writes the back buffer, Z buffer, and stencil buffer to the EDRAM. When the frame is finally ready to be drawn to the screen, the EDRAM transfers the back buffer to the 512MB of GDDR3 for scan-out. The EDRAM does not store any textures; the full 10MB gets pretty much filled up at 1280x720 HD resolution, including Z, stencil, and anti-aliasing sub-pixel samples.
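A quick way to sanity-check the EDRAM figures is to size the buffers in Python. The per-sample layout here (32-bit color plus 32-bit combined depth/stencil) is an assumption, not something ATI specified, and "10MB" is read as 10 MiB.

```python
EDRAM_BYTES = 10 * 2**20      # "10MB", read here as 10 MiB
WIDTH, HEIGHT = 1280, 720
# Assumption: 4 bytes of color + 4 bytes of combined Z/stencil per sample.
BYTES_PER_SAMPLE = 4 + 4


def framebuffer_bytes(samples_per_pixel: int) -> int:
    """Size of the back buffer plus Z/stencil buffer at 1280x720."""
    return WIDTH * HEIGHT * samples_per_pixel * BYTES_PER_SAMPLE


for aa in (1, 2, 4):
    size = framebuffer_bytes(aa)
    verdict = "fits in" if size <= EDRAM_BYTES else "exceeds"
    print(f"{aa}x samples: {size / 2**20:.1f} MiB, {verdict} 10 MiB")

# Quoted EDRAM figures (decimal units): 256GB/sec over a 2GHz bus.
bytes_per_clock = 256_000_000_000 // 2_000_000_000   # 128
print(f"{bytes_per_clock} bytes/clock, i.e. a {bytes_per_clock * 8}-bit path")
```

Under these assumptions, a 1280x720 frame without anti-aliasing fits with room to spare (about 7 MiB), while 2x or 4x multisampling pushes the back and Z/stencil buffers past 10 MiB, so an anti-aliased 720p frame would need to be processed in more than one portion. The last line simply divides the quoted 256GB/sec by the 2GHz clock: 128 bytes per clock, equivalent to a 1024-bit-wide path.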
There's even a little magic that happens at that stage. The EDRAM has built-in logic to perform Z compares, alpha blending, and resolving anti-aliasing samples into pixels. Normally those operations happen on the GPU, where they not only require valuable silicon real estate and on-chip caches but also eat into memory bandwidth as data goes back and forth between the GPU and the main graphics RAM. ATI's solution of building that logic into the EDRAM, where the back, Z, and stencil buffers live, eliminates a lot of data transfer and saves time and silicon space on the GPU die itself. Because of these bandwidth savings and the absolutely massive bandwidth to EDRAM, the Xbox 360 should be able to perform frame buffer effects like motion blur, depth of field, or lens flare with incredible speed.
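The three operations the EDRAM logic handles can be sketched in software. This Python version is only an illustration: the `Sample` layout, the less-than depth test, the source-over alpha blend, and the box-filter resolve (averaging a pixel's samples) are common textbook choices assumed here, not ATI's documented implementation.

```python
from dataclasses import dataclass

Color = tuple[float, float, float]


@dataclass
class Sample:
    """One stored sub-pixel sample: color plus depth (stencil omitted)."""
    color: Color
    depth: float


def write_fragment(samples: list[Sample], i: int, color: Color,
                   alpha: float, depth: float) -> None:
    """Z compare, then alpha blend a fragment into stored sample i."""
    dst = samples[i]
    if depth >= dst.depth:        # assumed "less than" test: hidden, discard
        return
    # Assumed source-over blend: alpha * src + (1 - alpha) * dst, per channel.
    blended = tuple(alpha * s + (1 - alpha) * d
                    for s, d in zip(color, dst.color))
    samples[i] = Sample(blended, depth)


def resolve(samples: list[Sample]) -> Color:
    """Average (box-filter) a pixel's AA samples into one output color."""
    n = len(samples)
    return tuple(sum(s.color[c] for s in samples) / n for c in range(3))


# One pixel with 4 samples, cleared to black at the far plane (depth 1.0).
pixel = [Sample((0.0, 0.0, 0.0), 1.0) for _ in range(4)]
for i in (0, 1):  # an opaque red edge covers two of the four samples
    write_fragment(pixel, i, (1.0, 0.0, 0.0), alpha=1.0, depth=0.5)
write_fragment(pixel, 0, (0.0, 1.0, 0.0), alpha=1.0, depth=0.9)  # behind: rejected
print(resolve(pixel))  # (0.5, 0.0, 0.0)
```

Doing these steps next to the stored samples means only the final resolved color has to leave the buffer, which is the bandwidth argument made above.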
How big is the chip? ATI and Microsoft won't give out numbers, but told us "it's smaller than you think." Even running at 500MHz, it draws less than 35 watts of power when running full-bore, and that includes the EDRAM. ATI has built pretty aggressive power management features into the chip, which can perform clock gating at both a "macro" and a "micro" level, turning off large blocks of the chip or smaller logical units. It's even passively cooled, at least in the sense that there is no fan directly on the GPU: only a passive heat sink is attached, and the fans at the back of the Xbox 360 draw air over it and out of the box.
Readers may recall that during our Xbox 360 hardware interview, it was revealed that the system would initially be available only with composite, S-Video, component, and VGA output. With no digital output, we wondered whether the ATI chip even had a digital interface. It turns out that it does, and providing DVI or HDMI output is simply a matter of Microsoft deciding to make the dongle and support a market for those connections.
We asked about the manufacturability of the chip, too. 10MB of EDRAM is relatively easy to manufacture, even with a bit of logic in it, but new GPUs are always a bit tricky, especially with the move to a new manufacturing process. Microsoft intends to launch the Xbox 360 in North America, Japan, and Europe within weeks of each other before Christmas this year, so they're going to need to make a whole lot of chips in a hurry. We were assured that the GPU is actually fairly easy to produce in mass quantities. Not only is it "smaller than you think," but there is some redundancy in the ALU array: if some of the ALUs fail testing, they can be disabled and the part is still a working graphics chip running at "full" power. Of course, even if 100% of the chip works, the redundant units will be disabled so that all Xbox 360s have exactly the same capabilities. ATI told us there is also some redundancy in the on-chip caches. Ultimately, they think it will be much easier to make the entire package, GPU and EDRAM, than most high-end PC graphics cards.
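To get a feel for why a few spare ALUs help yields, here is a simple binomial model in Python. ATI didn't say how many spare units the chip has or what defect rates look like, so the spare counts and the 1% per-ALU defect probability are hypothetical, and defects are assumed to be independent.

```python
from math import comb


def array_yield(needed: int, spares: int, p_defect: float) -> float:
    """Probability that at least `needed` of `needed + spares` ALUs work.

    Simple binomial model: each ALU is independently defective with
    probability `p_defect` (hypothetical; ATI gave no defect data).
    """
    total = needed + spares
    # Sum the cases with 0..spares defective ALUs; each is still a good chip.
    return sum(comb(total, bad) * p_defect**bad * (1 - p_defect)**(total - bad)
               for bad in range(spares + 1))


for spares in (0, 1, 2):  # hypothetical spare counts
    print(f"48 ALUs + {spares} spare(s): {array_yield(48, spares, 0.01):.1%} yield")
```

Real redundancy may be organized in larger blocks than single ALUs (the array is built from SIMD groups), which this per-ALU model ignores.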
Given ATI's claim that they designed the chip from a blank slate with no regard for PC graphics or any existing API, we wondered whether the Xbox 360 uses DirectX 9, or some variant of it, as its graphics API. We were told that it doesn't. The API of course uses DX9 and OpenGL as a starting point, but it is specifically designed to match the capabilities of the graphics chip and maximize its flexibility. We were assured that the API is not an early version of the next DirectX for PCs.
This is considerably different from the approach used in the first Xbox, where the overall architecture of all the components more closely resembled a PC. The Xbox 360 is another beast altogether. But as the PlayStation 3 progresses, with its nVidia-developed graphics core, we'll no doubt see yet another variation on the seemingly never-ending battle of words and technology between ATI and nVidia.
Copyright © 2005 Ziff Davis Media Inc. All Rights Reserved. Originally appearing in ExtremeTech.