@ronvalencia: I wouldn't use those numbers as anything other than a theoretical best case. Also, I meant register pressure, not cache. I had a brain fart. The reduced register pressure is probably the biggest benefit of RPM on AMD cards. Register bottlenecks are huge on AMD.
AMD GCN CU register storage vs stream processor (ALU) ratio
Per SIMD: 64 KB / 4 bytes (32 bits / 8) = 16,384 32-bit registers per 16 ALU stream processors, i.e. 1,024 32-bit registers per stream processor.
For the entire CU: 65,536 32-bit registers per 64 ALU stream processors.
Each CU has 256 KB of register storage.
256 KB / 64 ALUs = 4 KB per ALU
----------
For NVIDIA Pascal GP100 SM unit
(32,768 registers x 4 bytes) x 2 = 256 KB register storage per SM
256 KB / 64 32-bit CUDA cores = 4 KB per CUDA core
The register storage to ALU/CUDA core ratio is similar between AMD GCN and Pascal GP100.
-----
For NVIDIA Pascal GP104 SM unit
(16384 x 4 bytes) x 4 = 256 KB register storage per SM unit.
256 KB / 128 CUDA cores = 2 KB per CUDA core
Compared with AMD GCN, register storage pressure is higher on GP104, hence NVIDIA's aggressive GameWorks initiative, with ready-made shader code optimized for mainstream Pascal register storage usage patterns.
For FP32, the Turing SM has a register storage to FP32 CUDA core ratio similar to a GCN CU! Turing SM is effectively NVIDIA's GCN with superior raster GPU hardware.
NVIDIA Turing SM: 16,384 registers x 4 bytes (32-bit datatype) = 64 KB associated with 16 CUDA cores, i.e. 4 KB per CUDA core
AMD GCN CU: 64 KB associated with 16 stream processors, i.e. 4 KB per stream processor
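For anyone who wants to check the ratios above, here is a rough Python sketch that redoes the arithmetic with the register counts and core counts quoted in this thread. The dictionary layout and the helper name `kib_per_alu` are only for illustration, and the figures are taken as posted rather than verified against vendor documentation.

```python
KIB = 1024  # bytes per KB as used in the figures above

# name: (register storage in bytes, FP32 ALUs / CUDA cores sharing it)
# All figures are the ones quoted in this thread, not independently verified.
units = {
    "AMD GCN SIMD":        (64 * KIB, 16),        # 64 KB per 16 stream processors
    "NVIDIA GP100 SM":     (32768 * 4 * 2, 64),   # (32768 regs x 4 bytes) x 2
    "NVIDIA GP104 SM":     (16384 * 4 * 4, 128),  # (16384 regs x 4 bytes) x 4
    "NVIDIA Turing block": (16384 * 4, 16),       # 16384 regs x 4 bytes per 16 cores
}


def kib_per_alu(register_bytes, alus):
    """Register storage per ALU, in KB."""
    return register_bytes / alus / KIB


ratios = {name: kib_per_alu(*spec) for name, spec in units.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:g} KB of registers per ALU")
```

With these inputs only GP104 comes out at half the per-core register storage of the others, which is the point being made above.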
PS: Buying NVIDIA Turing is like buying AMD GCN with extra Tensor cores, RT (accelerated BVH search) units and superior raster hardware, e.g. superior delta color compression.
@ronvalencia: It's not that simple. Many devs have discussed over the years that register pressure is much more of a problem on GCN than on NVIDIA when it comes to getting high levels of utilization. Lots of them on Beyond3D.
In a similar price segment, a GCN card has a higher stream processor count to populate than its NVIDIA counterpart.
Since Maxwell, part of NVIDIA's GPU performance has come from higher clock speeds rather than from being a very wide data processor.
A narrower Vega 56 with 12 TFLOPS at 1710 MHz beats a wider Vega 64 with 13 TFLOPS at 1590 MHz.
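The TFLOPS figures in that comparison can be reproduced with a quick sketch. The stream processor counts (3584 for Vega 56, 4096 for Vega 64) are not stated in the post; they are assumed here from the cards' public specifications, and the 2 FLOPs per clock factor assumes one fused multiply-add per stream processor per cycle. The function name `fp32_tflops` is made up for illustration.

```python
def fp32_tflops(stream_processors, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each stream processor retires one fused multiply-add
    (counted as 2 FLOPs) per clock.
    """
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12


# Stream processor counts are assumptions (public specs), clocks are from the post.
vega56 = fp32_tflops(3584, 1710)  # narrower chip, higher clock
vega64 = fp32_tflops(4096, 1590)  # wider chip, lower clock

print(f"Vega 56 @ 1710 MHz: {vega56:.2f} TFLOPS")
print(f"Vega 64 @ 1590 MHz: {vega64:.2f} TFLOPS")
```

These are theoretical peaks only; the point above is that the card with the lower peak number can still come out ahead in practice.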