Radeon R520
ATI's "R520" core (codenamed Fudo) is the foundation for the Radeon X1000 line of DirectX 9.0c 3D accelerator video cards. It is ATI's first major architectural overhaul since the "R300" core and is highly optimized for Shader Model 3.0. The Radeon X1000 series using the core was introduced on October 5, 2005. ATI released the successor to the R500 series, the R600 series, on May 14, 2007.
ATI refers to the R520 core architecture as an "Ultra Threaded Dispatch Processor". The name reflects ATI's plan to boost the efficiency of the core instead of relying on a brute-force increase in the number of processing units. A central pixel shader "dispatch unit" breaks shaders down into threads (batches) of 16 pixels (4x4) and can track and distribute up to 128 threads per pixel "quad" (4 pipelines each). When one of the shader quads becomes idle, because it has completed a task or is waiting for data, the dispatch engine assigns the quad another task in the meantime; in theory, the overall result is greater utilization of the shader units. To support so many threads per quad, ATI built a very large general-purpose register array that is capable of multiple concurrent reads and writes and has a high-bandwidth connection to each shader array. This array provides the temporary storage needed to keep the pipelines fed by having work available as often as possible. On chips such as RV530 and R580, where the number of shader units per pipeline triples, pixel shading efficiency drops off slightly, because these chips still have the same threading resources as the less endowed RV515 and R520.
The next major change to the core is its memory bus. R420 and R300 had nearly identical memory controller designs, the former being a bug-fixed release designed for higher clock speeds. R520, however, differs with its central controller (arbiter), which connects to the "memory clients". Around the chip run two 256-bit ring buses, clocked at the same speed as the DRAM chips but running in opposite directions to reduce latency. Along these ring buses are four "stop" points where data exits the ring and goes into or out of the memory chips. There is also a fifth, significantly less complex stop, designed for the PCI Express interface and video input. This design makes memory accesses far quicker through lower latency, both by virtue of the smaller distance the signals need to travel through the GPU and by increasing the number of banks per DRAM. In short, the chip can spread memory requests out to the RAM chips faster and more directly. ATI claims a 40% improvement in efficiency over older designs. Again, the smaller cores such as RV515 and RV530 receive cutbacks due to their smaller, less costly designs; RV530, for example, has two internal 128-bit buses instead. This generation supports all contemporary memory types, including GDDR4. In addition to the ring bus, each memory channel now has a granularity of 32 bits, which improves memory efficiency when performing small memory requests.
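Two of the ideas above, opposite-direction rings and finer channel granularity, can be illustrated with some simple arithmetic. This is a minimal sketch under stated assumptions, not a model of the real R520 controller: the function names are hypothetical, the five stops are assumed to be equally spaced with every pair of stops equally likely to exchange data, and each memory access is assumed to move `channel_bits * burst_length` bits with a hypothetical burst length of 4.

```python
def ring_hops(src: int, dst: int, stops: int, bidirectional: bool) -> int:
    """Number of stop-to-stop hops from src to dst on a ring of `stops` stops."""
    clockwise = (dst - src) % stops
    if not bidirectional:
        return clockwise                      # only one direction available
    return min(clockwise, stops - clockwise)  # use whichever ring is shorter


def average_hops(stops: int, bidirectional: bool) -> float:
    """Mean hop count over all ordered pairs of distinct stops."""
    pairs = [(s, d) for s in range(stops) for d in range(stops) if s != d]
    return sum(ring_hops(s, d, stops, bidirectional) for s, d in pairs) / len(pairs)


def small_request_efficiency(request_bytes: int, channel_bits: int,
                             burst_length: int = 4) -> float:
    """Fraction of transferred bytes that the request actually needed.

    Assumes (hypothetically) that every access moves channel_bits * burst_length
    bits, so a request smaller than one access still pays for the whole access.
    """
    access_bytes = channel_bits * burst_length // 8
    accesses = -(-request_bytes // access_bytes)  # ceiling division
    return request_bytes / (accesses * access_bytes)


if __name__ == "__main__":
    # Five stops: four memory stops plus the PCI Express/video stop.
    print("one-way ring:", average_hops(5, bidirectional=False))  # 2.5
    print("two rings   :", average_hops(5, bidirectional=True))   # 1.5
    # A 16-byte request on a 32-bit channel versus a hypothetical 64-bit one.
    print("32-bit channel:", small_request_efficiency(16, 32))    # 1.0
    print("64-bit channel:", small_request_efficiency(16, 64))    # 0.5
```

Under these assumptions, letting traffic take whichever ring is shorter cuts the average trip from 2.5 hops to 1.5, and a 16-byte request wastes nothing on a 32-bit channel but throws away half of each access on a 64-bit one.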
The vertex shader engines in ATI's older products already had the FP32 precision that SM3.0 requires. Changes necessary for SM3.0 included longer instruction lengths, dynamic flow control instructions (branches, loops and subroutines), and a larger temporary register space. The pixel shader engines are quite similar in computational layout to their R420 counterparts, although they were heavily optimized and tweaked to reach high clock speeds on the 90 nm process. ATI had been working for years on a high-performance shader compiler in the driver for its older hardware, so staying with a similar, compatible basic design offered obvious cost and time savings.
At the end of the pipeline, the texture addressing processors are now decoupled from the pixel shader, so any unused texturing units can be dynamically allocated to pixels that need more texture layers. Other improvements include support for 4096x4096 textures and a better compression ratio for ATI's 3Dc normal map compression in certain specific situations.
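A rough way to see why decoupling helps is to count texture-lookup cycles for a group of four pixels. The sketch below is hypothetical: the function names, the one-lookup-per-unit-per-cycle rule, and the assumption that a pixel's layers are independent lookups that different units can serve in the same cycle are simplifications, not a description of R520's real texture pipeline.

```python
def cycles_coupled(layers_per_pixel: list[int]) -> int:
    """Each pipeline owns one texture unit, so the busiest pixel sets the pace."""
    return max(layers_per_pixel, default=0)


def cycles_decoupled(layers_per_pixel: list[int], texture_units: int) -> int:
    """Pooled texture units: each cycle, every free unit takes a pending lookup."""
    if texture_units < 1:
        raise ValueError("need at least one texture unit")
    pending = list(layers_per_pixel)
    cycles = 0
    while any(pending):
        for _ in range(texture_units):
            # Give the next free unit to the pixel with the most layers left.
            i = max(range(len(pending)), key=pending.__getitem__)
            if pending[i] == 0:
                break  # nothing left to fetch this cycle
            pending[i] -= 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    layers = [1, 1, 1, 5]  # three simple pixels and one multi-textured pixel
    print("coupled  :", cycles_coupled(layers))       # 5
    print("decoupled:", cycles_decoupled(layers, 4))  # 2
```

In the coupled case three units sit idle while the fourth works through five layers; pooling them lets the idle units absorb the extra layers, which is the effect the paragraph describes.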
The R5xx family introduced a more advanced onboard motion-video engine. Like Radeon cards since the R100, the R5xx can offload almost the entire MPEG-1/2 video pipeline. The R5xx can also assist in Microsoft WMV9/VC-1 and MPEG H.264/AVC decoding through a combination of the 3D pipeline's shader units and the motion-video engine. Benchmarks show only a modest decrease in CPU utilization for VC-1 and H.264 playback.
As is typical for an ATI video card release, a selection of real-time 3D demonstration programs was released at launch. ATI continued developing its "digital superstar", Ruby, with a new demo named The Assassin. The demo showcased a highly complex environment with high dynamic range (HDR) lighting and dynamic soft shadows. Ruby's latest nemesis, Cyn, was composed of 120,000 polygons.
The cards support dual-link DVI output and HDCP. However, HDCP requires an external ROM to be installed, which was not available on early models of the cards. The RV515, RV530 and RV535 cores provide one single-link and one dual-link DVI output; the R520, RV560, RV570, R580 and R580+ cores provide two dual-link DVI outputs.