scuttleblurb

Nvidia (NVDA): Part 1 of 2

Jul 12, 2021
At a high level, a computer runs a program by compiling human-decipherable instructions into machine-readable sequences of commands (threads) that are carried out by processing units (cores) on a CPU.  So when you open a web browser, one thread might tell a processor to display images, another to download a file.  The speed with which those commands are carried out (instructions per second) is a function of clock speed (a clock cycle is the time that elapses between 2 electronic pulses in a CPU; a 4GHz processor runs through 4bn clock cycles per second) and the number of instructions executed per clock cycle.  You can boost performance by raising clock speed, but doing so puts more voltage on the transistors, which increases power demands exponentially (power consumption rises faster than clock speed). 
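
As a back-of-the-envelope illustration (my numbers, not the author's), throughput is just clock speed multiplied by instructions per clock cycle:

```python
# Toy throughput calculation with assumed, illustrative numbers.
clock_speed_hz = 4e9          # a 4GHz processor: 4bn clock cycles per second
instructions_per_cycle = 4    # assume the core executes 4 instructions per cycle

instructions_per_second = clock_speed_hz * instructions_per_cycle
print(f"{instructions_per_second:.1e} instructions per second")  # 1.6e+10
```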

So then you might try running several tasks at the same time.  Rather than execute the first thread through to completion, a core, while waiting for the resources needed to execute the next task in that thread, will work on the second thread and vice versa, switching back and forth so rapidly that it appears to be executing both threads simultaneously.  Through this hyper-threading process, one physical CPU core acts like two “logical cores” handling different threads.  To parallelize things further, you can add more cores.  A dual-core processor can execute twice as many instructions at the same time as a single-core one; a quad-core processor, 4x.  Four hyper-threaded cores can execute 8 threads concurrently.  But this too has its limits, as adding cores requires more cache memory and entails a non-linear draw on power.  Plus you’ll be spreading a fixed number of transistors over more cores, which degrades the performance of workloads that don’t require lots of parallelism.
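
To make the interleaving concrete, here's a minimal sketch (my own example, and a loose analogy: this is OS-level thread scheduling in Python rather than the hardware hyper-threading described above) in which two tasks share time on the processor whenever one of them is waiting:

```python
import threading
import time

def download_file():
    for _ in range(3):
        time.sleep(0.1)                   # while this task waits on "I/O"...
        print("download thread: got a chunk")

def display_images():
    for _ in range(3):
        time.sleep(0.1)                   # ...the other task gets to run
        print("display thread: rendered an image")

# Start both tasks; the scheduler switches between them so quickly that
# their output interleaves as though they were running simultaneously.
threads = [threading.Thread(target=download_file),
           threading.Thread(target=display_images)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```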

But use cases geared to parallel processing – those with operations that are independent of each other – are better handled by a Graphics Processing Unit (GPU).  Unlike CPUs, general-purpose processors that execute instructions sequentially, GPUs are specialized processors that do so in parallel.  CPUs are latency-oriented, meaning they minimize the time required to complete a unit of work, namely through large cores with high clock speeds.  GPUs, by contrast, are throughput-oriented, meaning they maximize work per unit of time, through thousands of smaller cores, each with a slower clock speed than a bigger CPU core, processing lots of instructions at once across most of the die’s surface area[1]. 
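
A loose illustration of the two mindsets (my own sketch; NumPy here still runs on the CPU, so it only stands in for the idea of applying one operation across many independent elements at once rather than walking through them one by one):

```python
import time
import numpy as np

a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

# Latency-oriented stand-in: handle one element at a time, in order.
start = time.perf_counter()
slow = [a[i] * b[i] for i in range(len(a))]
print(f"element-by-element loop: {time.perf_counter() - start:.2f}s")

# Throughput-oriented stand-in: one data-parallel operation over the whole array.
start = time.perf_counter()
fast = a * b
print(f"single vectorized op:    {time.perf_counter() - start:.4f}s")
```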

The parallel operations of linear algebra are especially useful for rendering 3D graphics in games.  To create 3D objects in a scene, millions of little polygons are stitched together in a mesh.  The coordinates of the polygons’ vertices are transformed through matrix multiplication to move, shrink, enlarge, or rotate the object, and are accompanied by values representing color, shading, texture, translucency, etc. that update as those shapes move or rotate.  The math behind these processes is so computationally intense that before Nvidia’s GeForce 3 GPU was introduced in 2001, it could only be performed offline on server farms.
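
As a toy example of the vertex transformations described above (mine, not Nvidia's), rotating an object amounts to multiplying each vertex's coordinates by a rotation matrix, and every vertex can be transformed independently of the others:

```python
import numpy as np

# Three vertices of a small triangle, one (x, y, z) coordinate per column.
vertices = np.array([[1.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0],
                     [0.0, 0.0, 1.0]]).T

# Rotation matrix for a 90-degree turn around the z-axis.
theta = np.pi / 2
rotate_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])

# One matrix multiplication moves every vertex at once; because each column
# is independent, a GPU can do the same thing to millions of vertices in parallel.
rotated = rotate_z @ vertices
print(np.round(rotated, 3))
```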

Until recently, a game’s 3D object models were smooshed down to 2D pixels and, in a process called rasterization, the shading in each of those pixels was graduated according to how light moved around in a scene[2].  Rasterization is being replaced by a more advanced and math-heavy technique called ray tracing – first featured in the GeForce RTX series based on Nvidia’s Turing architecture in 2018 and now being adopted by the entire gaming industry – where algorithms dynamically simulate the path of a light ray from the camera to the objects it hits to accurately represent reflections and shadows, giving rise to stunningly realistic computer-generated images, like this:

[Image: ray-traced render. Source: Nvidia]
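
At the heart of ray tracing is a geometric question: does a ray fired from the camera hit an object, and where?  Below is a bare-bones sketch of that test for a single sphere (my own toy code, nothing like the actual RTX pipeline, which traces and shades vast numbers of rays in parallel on dedicated hardware):

```python
import numpy as np

def ray_hits_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit on the sphere, or None."""
    # Solve |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    oc = origin - center
    a = np.dot(direction, direction)
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - radius**2
    discriminant = b**2 - 4*a*c
    if discriminant < 0:
        return None                              # ray misses the sphere entirely
    t = (-b - np.sqrt(discriminant)) / (2*a)     # nearer of the two intersection points
    return t if t > 0 else None

# A camera at the origin firing one ray per "pixel" toward a sphere 5 units away.
camera = np.array([0.0, 0.0, 0.0])
sphere_center = np.array([0.0, 0.0, -5.0])
for x in (-0.15, 0.0, 0.4):
    ray_direction = np.array([x, 0.0, -1.0])
    ray_direction /= np.linalg.norm(ray_direction)    # unit-length direction
    hit = ray_hits_sphere(camera, ray_direction, sphere_center, radius=1.0)
    if hit is None:
        print(f"ray through x={x:+.2f}: miss")
    else:
        print(f"ray through x={x:+.2f}: hit at distance {hit:.2f}")
```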

For most of its history, Nvidia was known as a gaming graphics company.  From 1999, when Nvidia launched its first GPU, to around 2005, most of Nvidia’s revenue came from GPUs sold under the GeForce brand, valued by gamers seeking better visuals and greater speed.  To a lesser degree, Nvidia sold GPUs into professional workstations, used by OEMs to design vehicles and planes and by film studios to create animations[3].  It also had smaller business units that sold GPUs into game consoles (Nvidia’s GPU was in the first Xbox) and consumer devices, and that supplied PC chipsets to OEMs.

Nvidia’s now-defunct chipset business is worth spending some time on as it provides context for the company’s non-gaming consumer efforts. 

In the early 2000s, Nvidia entered into cross-licensing agreements with Intel and AMD to develop chipsets (systems that transfer data from the CPU to a graphics chip, a networking chip, memory controllers, and other supporting components)[4].  As part of these agreements, Nvidia – which was mostly known as a vendor of standalone discrete/dedicated graphics (that is, a standalone GPU or a GPU embedded in a graphics card) for high-end PCs and laptops, a market it split ~50/50 with ATI – supplied graphics and media processors to chipsets anchored by Intel and AMD CPUs, creating integrated graphics processors (IGPs) for the low end of the PC market[5].  Nvidia could do this without cannibalizing GeForce because while integrated graphics were improving with every release cycle, they still could not come anywhere close to rivaling the performance of discrete GPUs.  This was true in 2010, when Nvidia declared that 40% of the top game titles were “unplayable” (< 30 frames per second) on Intel’s Sandy Bridge integrated CPU/GPU architecture, and it remains true today. 
