Is video memory DRAM or SRAM?
Oftentimes even moderate-size models, such as DeepMind's AlphaFold2 (which requires about 20 GB of memory), can't fit into the video RAM of an accelerator (a TPUv3 core, for example, has 16 GB) and have to re-calculate activations during the backward pass, essentially sacrificing FLOPS for RAM.
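For context, here's a minimal sketch of that FLOPS-for-RAM trade-off in JAX (the framework AlphaFold2 is built on). `jax.checkpoint` (a.k.a. `jax.remat`) is a real JAX transform; the `block` function and layer count are made up purely for illustration:

```python
import jax
import jax.numpy as jnp

def block(x):
    # Toy layer; normally its intermediate activations would be
    # kept in memory for use during the backward pass.
    return jnp.tanh(x @ jnp.ones((x.shape[-1], x.shape[-1])) / x.shape[-1])

# jax.checkpoint drops the intermediates on the forward pass and
# recomputes them during the backward pass: extra FLOPS, less RAM.
checkpointed_block = jax.checkpoint(block)

def loss(x):
    for _ in range(4):  # stack a few layers
        x = checkpointed_block(x)
    return jnp.sum(x ** 2)

grads = jax.grad(loss)(jnp.ones((8, 128)))
```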
This leads to my question about video RAM. Nowadays regular RAM is the cheapest part of a server: I have worked with machines with terabytes of DRAM and basically ignored the cost of RAM in the bottom line, compared to the costs of CPUs, NVMe drives, HDDs, and high-end network interfaces. And yet your typical deep-learning video card has no more than a few dozen GB of video memory. Why?
Is GPU video memory the fast and expensive SRAM, built from flip-flops, or the slower and dirt-cheap capacitor-based DRAM?
Tags: hardware, neural-network