ampere – latest and greatest

volta – datacenter card, first card to introduce tensor cores, allows Nvidia to perform very fast low precision matrix multiplications

turing – consumer brother of Volta server, fast integer computations for very fast quantized inference

ampere vs volta – ampere introduced new datatypes for matrix multiplications – Bfloat16 datatype used for long time on TPUs 16 bits of memory, doesn’t suffer from fp16 (small dynamic range, prone to underflow, overflow, lots of numerical tricks have to be applied to avoid this). Fewer mantissa bits, as regular mantissa bits as regular fp32, so if you don’t over/underflow with fp32, Bfloat16 will provide stable characteristics. code runs faster even if you don’t use any new datatypes.

PyTorch/Tensorflow – these deep learning frameworks abstract away CUDA, which is a C-like programming language, so that we can focus on higher level ideas in deep learning. The beauty of these frameworks is we’re able to utilize the GPU without having to understand the hardware.