AMD Talks AI Capabilities of RDNA 3 GPUs & XDNA NPU: Radeon RX 7900 XT Up To 8X Faster Than Ryzen 7 8700G

AMD has shared some interesting data regarding the capabilities of its RDNA 3 GPU & XDNA NPU hardware in consumer-centric AI workloads.

AMD's RDNA 3 GPUs & XDNA NPU Provide A Robust Suite of Consumer-Centric AI Capabilities On PC Platforms

There's no doubt that AMD has been ahead of the curve in bringing AI capabilities to a wider PC audience through the implementation of its XDNA NPU on Ryzen APUs. The first NPU launched back in 2023 with the Phoenix "Ryzen 7040" APUs and was recently updated with the Hawk Point "Ryzen 8040" series. Besides the NPU, AMD's RDNA 3 GPU architecture also incorporates a large number of dedicated AI cores that can handle these workloads, and the company is trying to solidify its momentum with its ROCm software suite.

During the latest "Meet The Experts" webinar, AMD discussed how its Radeon graphics lineup, such as the RDNA 3 series, provides gamers, creators, and developers with a range of optimized AI workloads.

Starting with the AMD RDNA 3 graphics architecture, the latest GPUs, featured in Radeon RX 7000 graphics cards and Ryzen 7000/8000 APUs, deliver an over 2x gen-over-gen AI performance uplift.

These GPU products offer up to 192 AI accelerators optimized for FP16 workloads, are supported by multiple ML frameworks such as Microsoft DirectML, Nod.AI Shark, and ROCm, and feature large pools of dedicated VRAM (up to 48 GB), which is essential for handling large data sets, along with higher bandwidth boosted by Infinity Cache technology.

According to AMD, the majority of AI use cases on the PC platform involve LLMs and diffusion models, which depend mainly on the FP16 compute and memory capabilities of the hardware they run on. Certain models, such as SDXL (diffusion), are compute-bound and require around 4-16 GB of memory, while Llama2-13B and Mixtral 8x7B are memory-bound and can use up to 23 GB of memory.
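The memory figures above follow from a simple rule of thumb: an FP16 weight occupies 2 bytes, so weight-only memory scales at roughly 2 GB per billion parameters. The sketch below illustrates that arithmetic; the helper name and parameter counts are illustrative assumptions, not AMD's methodology, and real usage differs once quantization, activations, and context caches are factored in (which is why tools like LM Studio can fit large models into less memory than the raw FP16 estimate).

```python
# Rule-of-thumb estimate (illustrative assumption, not AMD's methodology):
# FP16 stores each weight in 2 bytes, so weight-only memory in GB is
# roughly (parameters in billions) x 2.
def fp16_weight_footprint_gb(params_billion: float) -> float:
    """Estimate weight-only memory in GB for a model held at FP16."""
    bytes_per_param = 2  # FP16 = 16 bits = 2 bytes
    return params_billion * bytes_per_param

# Hypothetical parameter counts for illustration:
for name, params_b in [("Llama2-13B", 13.0), ("7B-class LLM", 7.0)]:
    print(f"{name}: ~{fp16_weight_footprint_gb(params_b):.0f} GB of FP16 weights")
```

This is why quantization matters on consumer GPUs: dropping from FP16 to INT8 or 4-bit formats roughly halves or quarters the weight footprint, pulling a 13B-class model within reach of a 16 GB or 20 GB card.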

As mentioned earlier, AMD has a wide range of hardware that features dedicated AI acceleration. Even the company's Radeon RX 7600 XT, a $329 US graphics card, has 16 GB of VRAM and offers a 3.6x performance boost over the Ryzen 7 8700G in LM Studio, while the RX 7900 XT is up to 8x faster than the 8700G.

LM Studio Performance (Higher is Better):

AMUSE Diffusion (Lower is Better):

AMD also makes a small comparison against NVIDIA's GeForce RTX lineup, which the green team calls the "Premium AI PC" platform. Both lineups offer similar support, but AMD shows how its 16 GB GPUs come in at a lower price point of $329 US (7600 XT), whereas NVIDIA's most entry-level 16 GB GPU starts at around $500 US (RTX 4060 Ti 16 GB). The company also has a high-end stack that scales up to 48 GB of memory. AMD has also previously shown strong performance against Intel's Core Ultra in AI at a better value.

Moving forward, AMD talks about how ROCm 6.0 has been progressing and how the open-source stack has gained support for consumer-tier hardware such as the Radeon RX 7900 XTX, 7900 XT, 7900 GRE, PRO W7900, and PRO W7800. ROCm 6.0 supports both PyTorch and ONNX Runtime models and algorithms on Ubuntu 22.04.3 (Linux) and improves interoperability by adding INT8 support for more complex models.
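For readers curious what the PyTorch-on-ROCm path looks like in practice, a minimal setup sketch follows. It assumes ROCm 6.0 and a supported GPU (such as the RX 7900 XTX) are already installed on Ubuntu 22.04; the exact wheel index tag varies by PyTorch release, so treat the URL suffix as an assumption.

```shell
# Sketch: installing a ROCm build of PyTorch on Ubuntu 22.04.
# Assumes ROCm 6.0 and a supported Radeon GPU are already set up;
# the "rocm6.0" index tag depends on the PyTorch release you target.
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.0

# ROCm builds reuse PyTorch's CUDA device API, so the usual check applies:
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"
```

Reusing the `torch.cuda` namespace is a deliberate ROCm design choice: most existing PyTorch code runs on Radeon hardware without source changes.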

The company is also working to make ROCm even more open by offering developers a broader range of software stacks and hardware documentation.

AMD and its ROCm suite are competing against the dominant NVIDIA CUDA and TensorRT stack, while Intel is also gaining ground with its own oneAPI AI stack. These are the three forces to watch when it comes to AI workloads on the PC platform, so expect plenty of innovation and optimization for existing and next-gen hardware in the future.