NVIDIA · AI GPU Review

RTX 4080 Super

Name: RTX 4080 Super
Brand: NVIDIA
Price: 999 USD

Our complete AI-workload review of the RTX 4080 Super: VRAM analysis, Flux & ComfyUI throughput, local LLM performance, power efficiency, and workstation fit.

VRAM: 16 GB
Flux 1024: 2.1 it/s
Llama 3 70B Q4: 19 tok/s
TDP: 320 W

Overview

The RTX 4080 Super scores 72/100 on our composite AI workload index, a balanced measure of Flux generation throughput, SDXL inference, and quantized local LLM performance. With 16GB of VRAM and a 320W TDP, it targets mainstream AI creators working within a budget.

Pros & Cons

Pros

16GB VRAM handles most Flux & SDXL workflows
2.1 it/s on Flux.1 dev FP16
Excellent CUDA ecosystem support
Strong resale value

Cons

320W TDP requires a high-wattage PSU
Limited for 70B+ local LLMs: the weights exceed 16GB even at Q4, so they depend on CPU offload
Limited stock at MSRP

Performance benchmarks

Workload		Result
Flux.1 dev FP16 · 1024² · 25 steps		2.1 it/s
SDXL Base · 1024² · 20 steps		10.8 it/s
Llama 3 70B · Q4_K_M · 2k ctx		19 tok/s
Hunyuan Video · 720p · 5s		0.63 it/s
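
To relate these throughput figures to the 320W TDP, here is a rough energy-per-result sketch. It assumes the card draws its full TDP for the whole run, which is closer to a ceiling than a measurement, and it counts sampling steps only. The helper names are illustrative and not part of any benchmark tool.

```python
# Rough energy estimate from the benchmark table.
# Assumption: the GPU draws its full 320 W TDP for the entire run.

TDP_WATTS = 320.0


def energy_wh(seconds: float, watts: float = TDP_WATTS) -> float:
    """Energy in watt-hours for a run of the given length."""
    return watts * seconds / 3600.0


def joules_per_token(tokens_per_second: float, watts: float = TDP_WATTS) -> float:
    """Energy in joules spent per generated token."""
    return watts / tokens_per_second


flux_wh = energy_wh(25 / 2.1)    # 25 steps at 2.1 it/s
sdxl_wh = energy_wh(20 / 10.8)   # 20 steps at 10.8 it/s
llm_j = joules_per_token(19)     # 19 tok/s

print(f"Flux image: {flux_wh:.2f} Wh")      # Flux image: 1.06 Wh
print(f"SDXL image: {sdxl_wh:.2f} Wh")      # SDXL image: 0.16 Wh
print(f"Llama 3 70B: {llm_j:.1f} J/token")  # Llama 3 70B: 16.8 J/token
```

Measured board power under each workload would give a better number; this only bounds the cost using the figures already in the table.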

VRAM analysis

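With 16GB of VRAM, the RTX 4080 Super can comfortably run Flux.1 dev FP8 with text encoders offloaded, SDXL with multiple LoRAs, and Llama 3 8B at 8-bit quantization. Llama 3 8B at FP16 needs about 16GB for weights alone, and 34B-class models at Q4 need around 20GB, so both rely on partial CPU offload.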
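
A quick way to sanity-check what fits is a weight-only estimate: parameter count times bits per weight, divided by 8. The sketch below is illustrative only. The 4.8 bits-per-weight figure for Q4-style quantization and the 2 GB headroom are assumptions, and real usage adds KV cache and activations on top. By this estimate, 8B at FP16 and 34B at Q4 land at or above the 16GB limit.

```python
# Weight-only VRAM estimate: params (billions) x bits per weight / 8 = GB.
# Assumptions: ~4.8 bits/weight for Q4-style quants, 2 GB reserved headroom.

VRAM_GB = 16.0
HEADROOM_GB = 2.0  # assumed reserve for KV cache, activations, and the desktop


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits(params_billion: float, bits_per_weight: float) -> bool:
    """True if the weights plus the assumed headroom fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + HEADROOM_GB <= VRAM_GB


models = [
    ("8B at FP16", 8, 16),
    ("8B at 8-bit", 8, 8),
    ("34B at Q4 (~4.8 bpw)", 34, 4.8),
    ("70B at Q4 (~4.8 bpw)", 70, 4.8),
]

for name, params, bits in models:
    verdict = "fits" if fits(params, bits) else "needs offload"
    print(f"{name}: {weight_gb(params, bits):.1f} GB -> {verdict}")
```

Anything marked "needs offload" can still run with layers split between GPU and system RAM, at a throughput cost.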

Flux performance

At 2.1 it/s, a standard Flux.1 dev 25-step generation completes in ~11.9 seconds. For batch workflows, expect total time to scale roughly linearly with the number of images until VRAM limits are reached.
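
The arithmetic behind that figure is steps divided by iterations per second. A minimal sketch follows, reading "linear scaling" as total time growing in proportion to image count. It covers sampling only, so model loading, text encoding, and VAE decode are not included. The function name is illustrative.

```python
def generation_seconds(steps: int, its_per_second: float, images: int = 1) -> float:
    """Sampling time in seconds, assuming time grows linearly with image count."""
    return steps * images / its_per_second


single = generation_seconds(25, 2.1)
batch = generation_seconds(25, 2.1, images=8)

print(f"1 image:  {single:.1f} s")  # 1 image:  11.9 s
print(f"8 images: {batch:.1f} s")   # 8 images: 95.2 s
```

At this rate, an eight-image batch is roughly a minute and a half of sampling.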

ComfyUI performance

SDXL throughput of 10.8 it/s makes the RTX 4080 Super ideal for iterative ComfyUI workflows. Block-swap and tiled VAE are rarely needed.