Nvidia takes the L as they admit RTX flopped on sales expectations!

#51  Edited By m3dude1
Member since 2007 • 2334 Posts

@ronvalencia: I wouldn't use those numbers as anything other than a theoretical best case. Also, I meant register pressure, not cache. I had a brain fart. The reduced register pressure is probably the biggest benefit of RPM (Rapid Packed Math) on AMD cards. Register bottlenecks are huge on AMD.

#52  Edited By ronvalencia
Member since 2008 • 29612 Posts

@m3dude1 said:

@ronvalencia: I wouldn't use those numbers as anything other than a theoretical best case. Also, I meant register pressure, not cache. I had a brain fart. The reduced register pressure is probably the biggest benefit of RPM (Rapid Packed Math) on AMD cards. Register bottlenecks are huge on AMD.

AMD GCN CU register storage vs. stream processor (ALU) ratios

64 KB / (32 bits / 8) = 16,384 32-bit registers per SIMD of 16 ALU stream processors, i.e. 1,024 32-bit registers per ALU stream processor.

For the entire CU, that is 65,536 32-bit registers per 64 ALU stream processors.

Each CU has 256 KB of register storage.

256 KB / 64 ALUs = 4 KB per ALU
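
The GCN arithmetic above can be checked with a few lines of Python. This is only a sketch of the division being done in the post; the helper name `register_count` is made up for illustration, and KB is treated as 1,024 bytes.

```python
KIB = 1024  # bytes per KB (binary), as assumed throughout this sketch


def register_count(storage_bytes: int, register_bits: int = 32) -> int:
    """Number of registers of the given width that fit in storage_bytes."""
    return storage_bytes // (register_bits // 8)


# One GCN SIMD: 64 KB of register storage shared by 16 ALUs (figures from the post).
simd_regs = register_count(64 * KIB)
regs_per_alu = simd_regs // 16

# Whole CU: 256 KB of register storage shared by 64 ALUs.
cu_regs = register_count(256 * KIB)
kib_per_alu = 256 * KIB / 64 / KIB

print(simd_regs, regs_per_alu, cu_regs, kib_per_alu)
# prints: 16384 1024 65536 4.0
```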

----------

For NVIDIA Pascal GP100 SM unit

(32,768 registers x 4 bytes) x 2 = 256 KB of register storage per SM

256 KB / 64 FP32 CUDA cores = 4 KB per CUDA core

AMD GCN and Pascal GP100 have a similar register storage to ALU/CUDA core ratio.

-----

For NVIDIA Pascal GP104 SM unit

(16384 x 4 bytes) x 4 = 256 KB register storage per SM unit.

256 KB / 128 CUDA cores = 2 KB per CUDA core

Compared with AMD GCN, register storage stress is higher on GP104, hence NVIDIA's aggressive GameWorks initiative, with ready-made shader code optimized for mainstream Pascal register storage usage patterns.

----------------

NVIDIA Turing TU102 SM, https://hexus.net/tech/reviews/graphics/122045-nvidia-turing-architecture-examined-and-explained/

For FP32, the Turing SM has a similar register storage to FP32 CUDA core ratio as a GCN CU! The Turing SM is effectively NVIDIA's GCN with superior raster hardware.

Each Turing SM processing block has 16,384 registers x 4 bytes (32-bit datatype) = 64 KB associated with 16 CUDA cores, i.e. 4 KB per CUDA core.

Each AMD GCN CU SIMD has 64 KB associated with 16 stream processors, i.e. 4 KB per stream processor.
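
To put the four ratios side by side, here is a rough Python sketch. It only uses the register sizes and core counts quoted in this post; the `Unit` class and its field names are hypothetical helpers, not anything from AMD or NVIDIA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One compute unit (or a slice of one) with its register file and FP32 lanes."""
    name: str
    register_bytes: int
    fp32_lanes: int

    @property
    def kib_per_lane(self) -> float:
        # Register storage available per FP32 ALU / CUDA core, in KB.
        return self.register_bytes / self.fp32_lanes / 1024


# Figures as quoted in the post above.
UNITS = [
    Unit("AMD GCN CU", 256 * 1024, 64),
    Unit("Pascal GP100 SM", 32768 * 4 * 2, 64),
    Unit("Pascal GP104 SM", 16384 * 4 * 4, 128),
    Unit("Turing TU102 SM (per 16-core block)", 16384 * 4, 16),
]

for u in UNITS:
    print(f"{u.name:36s} {u.kib_per_lane:.0f} KB per FP32 lane")
```

Only GP104 comes out at 2 KB per lane; the other three land on 4 KB.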

PS: Buying NVIDIA Turing is like buying AMD GCN with extra Tensor cores, RT (accelerated BVH search) units, and superior raster hardware, e.g. superior delta color compression.

Rapid Packed Math also benefits NVIDIA's Turing!

#53 m3dude1
Member since 2007 • 2334 Posts

@ronvalencia: It's not that simple. Many devs have discussed over the years that register pressure is much more of a problem on GCN than on NVIDIA when it comes to getting high levels of utilization. Lots of them are on Beyond3D.

#54  Edited By ronvalencia
Member since 2008 • 29612 Posts

@m3dude1 said:

@ronvalencia: It's not that simple. Many devs have discussed over the years that register pressure is much more of a problem on GCN than on NVIDIA when it comes to getting high levels of utilization. Lots of them are on Beyond3D.

In a similar price segment, GCN has a higher stream processor count to keep populated than its NVIDIA counterpart.

Since Maxwell, part of NVIDIA's GPU performance comes from higher clock speeds rather than from being a very wide data processor.

The narrower Vega 56 with 12 TFLOPS at 1710 MHz beats the wider Vega 64 with 13 TFLOPS at 1590 MHz.
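
For reference, the TFLOPS figures can be roughly reproduced from stream processor count and clock. The sketch below assumes the usual peak-rate formula (2 FP32 operations per stream processor per clock, i.e. one fused multiply-add) and the public stream processor counts of 3,584 for Vega 56 and 4,096 for Vega 64; neither is stated in this thread, and the function name is made up. It gives peak numbers only and says nothing about measured performance.

```python
def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 rate, assuming one FMA (2 ops) per lane per clock."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12


# Stream processor counts are assumptions taken from public spec sheets.
vega56 = peak_fp32_tflops(3584, 1710)
vega64 = peak_fp32_tflops(4096, 1590)

print(f"Vega 56 @ 1710 MHz: {vega56:.1f} TFLOPS")
print(f"Vega 64 @ 1590 MHz: {vega64:.1f} TFLOPS")
```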

Wave count is the problem with register storage: the more registers each wavefront needs, the fewer wavefronts fit in the register file at once.
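
A simplified way to see the link between register use and wave count on GCN, as a Python sketch. The assumptions here are not from this thread: 64 work-items per wavefront, a cap of 10 wavefronts per SIMD, and no allocation granularity (real hardware rounds VGPR allocations up, which this ignores). The 64 KB per SIMD figure is the one used earlier in the thread, and `waves_per_simd` is a hypothetical helper name.

```python
def waves_per_simd(
    vgprs_per_thread: int,
    simd_register_bytes: int = 64 * 1024,  # 64 KB register file per SIMD
    wave_size: int = 64,                   # assumed work-items per wavefront
    max_waves: int = 10,                   # assumed hardware cap per SIMD
) -> int:
    """Rough upper bound on resident wavefronts per SIMD for a given VGPR count."""
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # 32-bit registers available to each work-item slot if one wave owned the file.
    vgpr_budget = simd_register_bytes // (wave_size * 4)
    return min(max_waves, vgpr_budget // vgprs_per_thread)


for vgprs in (24, 32, 64, 128):
    print(f"{vgprs:3d} VGPRs per thread -> {waves_per_simd(vgprs)} waves per SIMD")
```

Under these assumptions a shader has to stay at or under about 25 VGPRs per thread to keep all 10 waves resident, and the wave count halves each time register use doubles past that.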