Comparison of Various GPUs
Presented by Group No. 1
Bashir Rahmani | 11811314
Mohit Mawal | 11811101
Sanket Bendrey | 11810295
Sandesh Mankar | 11810326
Hasibullah Nuristani | 11811315
What Is a GPU?
With the emergence of extreme-scale computing, modern graphics processing units (GPUs) have been widely used to build powerful supercomputers and data centers. With a large number of processing cores and a high-performance memory subsystem, modern GPUs are perfect candidates for high-performance computing (HPC).
What Does a GPU Do?
The graphics processing unit, or GPU, has become one of the most important
types of computing technology, both for personal and business computing.
Designed for parallel processing, the GPU is used in a wide range of
applications, including graphics and video rendering. Although they’re best
known for their capabilities in gaming, GPUs are becoming more popular for use
in creative production and artificial intelligence (AI).
GPUs were originally designed to accelerate the rendering of 3D graphics.
Over time, they became more flexible and programmable, enhancing their
capabilities. This allowed graphics programmers to create more interesting
visual effects and realistic scenes with advanced lighting and shadowing
techniques. Other developers also began to tap the power of GPUs to
dramatically accelerate additional workloads in high performance computing
(HPC), deep learning, and more.
Types of GPUs
Broadly, there are two types of graphics processing units:
(1) Integrated Graphics Processing Unit
The majority of GPUs on the market are actually integrated graphics. So, what are integrated graphics, and how do they work in your computer? A CPU that comes with a fully integrated GPU on the same package allows for thinner and lighter systems, reduced power consumption, and lower system costs.
Intel® Graphics Technology, which includes Intel® Iris® Plus and Intel® Iris® Xe graphics, is at the forefront of integrated graphics technology.
With Intel® Graphics, users can experience immersive graphics in systems that
run cooler and deliver long battery life.
(2) Discrete Graphics Processing Unit
Many computing
applications can run well with integrated GPUs. However, for more
resource-intensive applications with extensive performance demands, a discrete
GPU (sometimes called a dedicated graphics card) is better suited to the job.
These GPUs add processing power at the cost of additional energy consumption and heat creation. Discrete GPUs generally require dedicated cooling for maximum performance.
We concentrate on two recently released GPUs, an Nvidia GeForce GTX 580 (Fermi) and an ATI Radeon HD 5870 (Cypress), and compare their performance and power-consumption features. By running a set of representative general-purpose GPU (GPGPU) programs, we demonstrate the key design differences between the two platforms and illustrate their impact on performance.
The first architectural difference between the target GPUs is that ATI GPUs adopt very long instruction word (VLIW) processors, which carry out multiple operations in a single VLIW instruction to gain an extra level of parallelism on top of their single-instruction multiple-data (SIMD) engines. Typically, in an n-way VLIW processor, up to n independent instructions can be assigned to the slots and executed simultaneously. Obviously, if all n slots can be filled with valid instructions, the VLIW architecture executes n operations per VLIW instruction. However, this does not always happen, because the compiler may fail to find sufficient independent instructions.
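To make the slot-filling idea concrete, here is a minimal sketch of greedily packing a linear instruction stream into n-way VLIW bundles (all names are illustrative; this is not ATI's actual compiler algorithm): an instruction may join the current bundle only if the bundle has a free slot and the instruction does not depend on a result produced inside that same bundle.

```python
# Greedy packing of a linear instruction stream into n-way VLIW bundles.
# Each instruction is (dest_register, source_registers).

def pack_vliw(instructions, n):
    bundles = []
    current, written = [], set()
    for dest, srcs in instructions:
        # Close the bundle if it is full, or if this instruction reads or
        # overwrites a register written by an instruction already in it.
        if len(current) == n or any(s in written for s in srcs) or dest in written:
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

# Four instructions; r2 depends on r0, so it cannot share a bundle with it.
prog = [("r0", ["a"]), ("r1", ["b"]), ("r2", ["r0"]), ("r3", ["c"])]
print(pack_vliw(prog, n=4))  # two bundles: the slots are not fully filled
```

Even with 4 slots available, the dependence on r0 forces a second bundle, which is exactly why real VLIW slots often go unused.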
The second major difference between the two GPUs lies in the memory subsystem. Inherited from graphics applications, both GPUs have separate global memories located off-chip for the global, private (referred to as local on Nvidia GPUs), texture, and constant data. They also have fast on-chip local memory (called shared memory by Nvidia and local data share by ATI) and caches for the texture and constant data. Nvidia's Fermi introduces new L1 and L2 caches for caching both global and local data, which are not available on the Radeon HD 5870. On the GTX 580, the L1 cache and shared memory can be configured in two different size combinations. The L1 cache can also be disabled by setting a compiler flag. All off-chip memory accesses go through the L2 on the GTX 580. Given the additional L1 and L2 caches for global and local data, we investigate and compare the performance of the memory systems of the target GPUs.
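As a small sanity check on the configurable split mentioned above: on Fermi, each SM has 64 KB of on-chip SRAM that can be carved up as 48 KB shared memory + 16 KB L1, or 16 KB shared memory + 48 KB L1 (in CUDA, a kernel can express its preference via cudaFuncSetCacheConfig). A quick sketch of the two combinations:

```python
# The two per-SM on-chip configurations available on Fermi (sizes in KB).
FERMI_SM_SRAM_KB = 64

configs = {
    "prefer_shared": {"shared": 48, "l1": 16},
    "prefer_l1":     {"shared": 16, "l1": 48},
}

for name, c in configs.items():
    # Both splits must account for the full 64 KB of per-SM SRAM.
    assert c["shared"] + c["l1"] == FERMI_SM_SRAM_KB
    print(name, c)
```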
Thirdly, power consumption and energy efficiency stand as first-order concerns in high-performance computing. Due to the large number of transistors integrated on chip, a modern GPU is likely to consume more power than a typical CPU. The resultant high power consumption tends to generate substantial heat and increase the cost of system cooling, thus mitigating the benefits gained from the performance boost. Both Nvidia and ATI are well aware of this issue and have introduced effective techniques to trim the power budgets of their products. For instance, the PowerPlay technology implemented on ATI Radeon graphics cards significantly drops the GPU's idle power. Similarly, Nvidia uses the PowerMizer technique to reduce the power consumption of its mobile GPUs. In this study, we measure and compare the energy efficiencies of these two GPUs for further assessment.
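Energy efficiency is typically reported as work per unit of energy, e.g. sustained performance per watt. A back-of-the-envelope sketch of the metric (the power and throughput numbers below are illustrative placeholders, not measurements from either card):

```python
# Energy = average power * runtime; efficiency = sustained throughput / power.
# All numbers below are illustrative, not measured values.

def energy_joules(avg_power_watts, runtime_seconds):
    return avg_power_watts * runtime_seconds

def efficiency_gflops_per_watt(gflops_sustained, avg_power_watts):
    return gflops_sustained / avg_power_watts

runtime = 10.0                     # seconds
power_a, power_b = 244.0, 188.0    # hypothetical average power draws (W)
perf_a, perf_b = 500.0, 420.0      # hypothetical sustained GFLOPS

print(energy_joules(power_a, runtime))              # 2440.0 J consumed
print(efficiency_gflops_per_watt(perf_a, power_a))  # GFLOPS per watt, card A
print(efficiency_gflops_per_watt(perf_b, power_b))  # GFLOPS per watt, card B
```

Note that the card with higher raw performance is not necessarily the more energy-efficient one, which is exactly why the study reports both metrics.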
Table: System Information
Fermi Architecture
Fermi is the latest generation of CUDA-capable GPU architecture introduced by Nvidia. Derived from prior families such as G80 and GT200, the Fermi architecture has been improved to satisfy the requirements of large-scale computing problems. The GeForce GTX 580 used in this study is a Fermi-generation GPU.
The major component
of this device is an array of streaming multi-processors (SMs),
each of which contains 32 Streaming Processors (SPs, or CUDA cores).
There are 16 SMs on the chip with a total of 512 cores integrated in the GPU.
Within a CUDA core, there exist a fully pipelined integer ALU and a floating-point unit (FPU). In addition, each SM includes four special function units (SFUs), which are capable of executing transcendental operations such as sine, cosine, and square root.
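The 512-core figure follows directly from the SM organization described above; a one-line check:

```python
# GTX 580 (Fermi): 16 streaming multiprocessors, 32 CUDA cores each.
sms, cores_per_sm = 16, 32
total_cores = sms * cores_per_sm
print(total_cores)  # 512
```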
Cypress Architecture
Cypress is the codename of the ATI Radeon HD 5800 series GPU. In general, it is composed of 20 compute units (CUs), which are also referred to as single-instruction multiple-data (SIMD) computation engines, plus the underlying memory hierarchy. Inside a SIMD engine, there are 16 thread processors (TPs) and a 32KB local data share. Basically, a SIMD engine is similar to a streaming multiprocessor (SM) on an Nvidia GPU, while the local data share is equivalent to the shared memory on an SM. Note that on the Radeon HD 5870 GPU, there is an 8KB L1 cache on each SIMD engine and a 512KB L2 cache shared among all compute units. However, these components function differently from the caches on the Fermi GPU in that they are mainly used to cache image objects. In this study, we use the terms HD 5870, Cypress GPU, and ATI GPU interchangeably.
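The same arithmetic works for Cypress: 20 SIMD engines of 16 thread processors each, where each thread processor on this generation is a 5-way VLIW unit (the n-way VLIW design discussed earlier), yields the HD 5870's advertised 1600 stream processors:

```python
# Radeon HD 5870 (Cypress): 20 SIMD engines, 16 thread processors each,
# 5 VLIW ALU slots per thread processor.
simd_engines, tps_per_simd, vliw_width = 20, 16, 5
stream_processors = simd_engines * tps_per_simd * vliw_width
print(stream_processors)  # 1600
```

Comparing raw ALU counts across the two vendors is therefore misleading on its own: a Cypress "stream processor" is one VLIW slot, whereas a Fermi CUDA core is an independently scheduled unit.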
Conclusion:
To conclude, discrete graphics cards are standalone graphics processors connected to the motherboard via a PCIe slot. They provide cutting-edge real-time rendering technology and a plethora of other features, including streaming 4K and 8K video and games, as well as VR.
While integrated graphics have improved markedly in recent years, they are still better suited to light daily use. Discrete graphics cards have the power to handle complex graphical tasks smoothly, making them a better option for gaming, video editing, and game development.
References:
https://www.techtarget.com/searchvirtualdesktop/definition/GPU-graphics-processing-unit
https://en.wikipedia.org/wiki/Graphics_processing_unit