GGUF Memory Estimator

Estimate VRAM requirements for any GGUF model from HuggingFace

Model Input

HuggingFace path or URL

Select GGUF file

Multimodal projector (mmproj)

Context length

Batch size

KV Cache Quantization

K cache

V cache

Full-size sliding-window attention (SWA) cache (--swa-full)

Hardware (for tokens/sec estimates)

GPU preset

Available VRAM (GiB, optional)

GPU FP16 TFLOPS

GPU memory bandwidth (GB/s)

CPU preset (optional, for spill)

Available RAM (GiB, optional)

CPU FP16 TFLOPS

RAM bandwidth (GB/s)

GPU layers (auto if empty)
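When "GPU layers" is left empty, the tool picks a value automatically; the page does not say how. The sketch below shows one plausible rule, assuming every layer costs the same number of bytes (weights plus its KV cache) and that a fixed amount of VRAM is reserved for other buffers: offload as many whole layers as fit and spill the rest to CPU RAM. The function name auto_gpu_layers and the reserved_bytes parameter are hypothetical.

```python
from typing import Optional


def auto_gpu_layers(n_layers: int, layer_bytes: int,
                    vram_gib: Optional[float], reserved_bytes: int = 0) -> int:
    """Hypothetical auto-offload rule: put as many whole layers on the GPU
    as fit in the VRAM budget; the remaining layers spill to CPU RAM."""
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    if vram_gib is None:
        return n_layers  # no VRAM limit given: assume full offload
    # GiB -> bytes, minus VRAM assumed to be reserved for non-layer buffers
    budget = vram_gib * 2**30 - reserved_bytes
    if budget <= 0:
        return 0
    return min(n_layers, int(budget // layer_bytes))


# 32 layers of ~216 MiB each (weights + KV), 4 GiB VRAM, 512 MiB reserved
n_gpu = auto_gpu_layers(32, 216 * 2**20, 4, reserved_bytes=512 * 2**20)
print(n_gpu, "layers on GPU,", 32 - n_gpu, "spilled to CPU")
# prints: 16 layers on GPU, 16 spilled to CPU
```

Real models rarely have perfectly uniform layers (embedding and output tensors differ), so a per-layer sum would be more accurate than a single layer_bytes value.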

Performance calibration (advanced)

Bandwidth utilization

Decode compute utilization

Prefill MFU (model FLOPs utilization)

Weight read ratio

GPU count (tensor parallelism)

Interconnect bandwidth (GB/s)

Flash attention

Speculative decoding
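The calibration fields suggest a bandwidth-bound model for decode and a compute-bound model for prefill. The sketch below works under those assumptions: decode time per token is the bytes streamed from each memory pool divided by its effective bandwidth (peak times bandwidth utilization), and compute-bound rates use the common approximation of about 2 FLOPs per parameter per token. Treating "weight read ratio" as the fraction of weight bytes read per token (for example, only the active experts of an MoE model) is an assumption, as are all function names; tensor parallelism, interconnect traffic, flash attention, and speculative decoding are ignored.

```python
from typing import Optional


def decode_tokens_per_sec(w_gpu_bytes: float, w_cpu_bytes: float, kv_bytes: float,
                          gpu_bw_gbs: float, cpu_bw_gbs: Optional[float] = None,
                          bw_util: float = 0.7, weight_read_ratio: float = 1.0) -> float:
    """Memory-bound decode estimate: each generated token streams the weights
    (scaled by the assumed read ratio) once from wherever they live, plus the
    KV cache, which is assumed to sit on the GPU."""
    eff_gpu = gpu_bw_gbs * 1e9 * bw_util  # achievable bytes/s on the GPU
    seconds = (w_gpu_bytes * weight_read_ratio + kv_bytes) / eff_gpu
    if w_cpu_bytes > 0:
        if not cpu_bw_gbs:
            raise ValueError("spilled weights need a RAM bandwidth")
        # Spilled layers run after the GPU layers, so their time adds up.
        seconds += w_cpu_bytes * weight_read_ratio / (cpu_bw_gbs * 1e9 * bw_util)
    return 1.0 / seconds


def compute_tokens_per_sec(n_params: float, tflops: float, util: float) -> float:
    """Compute-bound rate using the rough rule of ~2 FLOPs per parameter per token."""
    return tflops * 1e12 * util / (2.0 * n_params)


# 8 GB of weights fully on a 1000 GB/s GPU at 80% utilization -> ~100 tok/s
mem_bound = decode_tokens_per_sec(8e9, 0, 0, gpu_bw_gbs=1000, bw_util=0.8)
# Decode is limited by whichever bound is lower (here, memory).
decode = min(mem_bound, compute_tokens_per_sec(7e9, 100, util=0.3))
# Prefill: 7B params, 100 TFLOPS, 50% MFU -> ~3571 tok/s
prefill = compute_tokens_per_sec(7e9, 100, util=0.5)
print(f"decode ~{decode:.0f} tok/s, prefill ~{prefill:.0f} tok/s")
```

Because system RAM bandwidth is often far below GPU memory bandwidth, even a few spilled layers can dominate the per-token time in this model.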

Enter a HuggingFace model path and click Resolve to get started.

Model Info

Model Weights

Total weight size

Quantization	Tensors	Elements	Size
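The weights table groups tensors by quantization type. GGUF tensors are stored in fixed-size blocks, so a tensor's size is its element count divided by the block's element count, times the block's byte size. The sketch below uses the block layouts of common ggml types as we understand them (check ggml's type table for others); the function names are hypothetical and the tensor list, though it follows GGUF naming, is illustrative rather than read from a real file.

```python
from collections import defaultdict

# (elements per block, bytes per block) for common ggml tensor types.
GGML_BLOCK = {
    "F32": (1, 4), "F16": (1, 2), "BF16": (1, 2),
    "Q8_0": (32, 34), "Q4_0": (32, 18), "Q4_1": (32, 20),
    "Q5_0": (32, 22), "Q5_1": (32, 24),
    "Q2_K": (256, 84), "Q3_K": (256, 110), "Q4_K": (256, 144),
    "Q5_K": (256, 176), "Q6_K": (256, 210),
}


def tensor_bytes(ggml_type: str, n_elements: int) -> int:
    """Size of one tensor: whole blocks times bytes per block."""
    block_elems, block_bytes = GGML_BLOCK[ggml_type]
    if n_elements % block_elems:
        raise ValueError(f"{n_elements} elements is not a multiple of {block_elems}")
    return n_elements // block_elems * block_bytes


def weight_table(tensors):
    """Group (name, type, n_elements) records into per-type rows:
    type -> [tensor count, element count, bytes]."""
    rows = defaultdict(lambda: [0, 0, 0])
    for _name, ggml_type, n in tensors:
        row = rows[ggml_type]
        row[0] += 1
        row[1] += n
        row[2] += tensor_bytes(ggml_type, n)
    return dict(rows)


tensors = [  # illustrative shapes only
    ("token_embd.weight", "Q4_K", 4096 * 32000),
    ("blk.0.attn_q.weight", "Q4_K", 4096 * 4096),
    ("blk.0.ffn_down.weight", "Q6_K", 14336 * 4096),
    ("blk.0.attn_norm.weight", "F32", 4096),
]
table = weight_table(tensors)
for qtype, (count, elems, size) in table.items():
    print(f"{qtype}\t{count}\t{elems}\t{size / 2**20:.1f} MiB")
total = sum(row[2] for row in table.values())
print(f"total\t{total / 2**20:.1f} MiB")
```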

KV Cache

K cache (F16)

V cache (F16)

KV layers

KV heads (GQA)
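The K and V caches each hold one row of n_kv_heads × head_dim elements per token, per KV layer. Grouped-query attention (fewer KV heads than query heads) therefore shrinks the cache proportionally, and a quantized cache type shrinks each row by its block ratio. The sketch below assumes every KV layer caches the full context length, as with --swa-full; sliding-window layers without that option would need less. The names are hypothetical and the model shape is illustrative.

```python
# (elements per block, bytes per block) for KV cache types
KV_BLOCK = {
    "f32": (1, 4), "f16": (1, 2), "bf16": (1, 2),
    "q8_0": (32, 34), "q4_0": (32, 18), "q4_1": (32, 20),
    "q5_0": (32, 22), "q5_1": (32, 24),
}


def cache_bytes(n_kv_layers: int, n_ctx: int, n_kv_heads: int,
                head_dim: int, cache_type: str = "f16") -> int:
    """Bytes for ONE of the K or V caches, assuming every KV layer stores
    the full context."""
    block_elems, block_bytes = KV_BLOCK[cache_type]
    row = n_kv_heads * head_dim  # elements per token per layer
    if row % block_elems:
        raise ValueError("row size must be a multiple of the block size")
    return n_kv_layers * n_ctx * (row // block_elems) * block_bytes


# Llama-3-8B-like shape: 32 KV layers, 8 KV heads (GQA), head_dim 128, 8K context
k = cache_bytes(32, 8192, 8, 128, "f16")
v = cache_bytes(32, 8192, 8, 128, "q8_0")
print(k / 2**20, v / 2**20)  # 512.0 272.0 (MiB)
```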

Activations

Activation memory (FP32)

Theoretical Memory (Full Offload)

VRAM required

Weights

KV cache

Activations

RAM required

Total system memory

System Fit Check

VRAM
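Under full offload, the VRAM requirement is the sum of weights, KV cache, and activations, which the fit check then compares against the available VRAM. The sketch below works under that assumption. Its "fits with CPU spill" branch is a hypothetical rule that only checks whether VRAM plus RAM covers the total, and the function name is illustrative.

```python
from typing import Optional

GIB = 2**30


def fit_check(weights: int, kv: int, activations: int,
              vram_gib: Optional[float] = None, ram_gib: Optional[float] = None) -> str:
    """Compare the full-offload VRAM requirement with the available memory."""
    vram_needed = weights + kv + activations  # full offload: everything on the GPU
    if vram_gib is None:
        return f"needs {vram_needed / GIB:.2f} GiB VRAM (no limit given)"
    headroom = vram_gib * GIB - vram_needed
    if headroom >= 0:
        return f"fits in VRAM with {headroom / GIB:.2f} GiB to spare"
    # Hypothetical spill rule: the shortfall must fit in system RAM.
    if ram_gib is not None and vram_needed <= (vram_gib + ram_gib) * GIB:
        return f"short {-headroom / GIB:.2f} GiB of VRAM; fits with CPU spill"
    return f"short {-headroom / GIB:.2f} GiB of VRAM; does not fit"


w, kv, act = int(4.5 * GIB), GIB, GIB // 2  # 6 GiB in total
print(fit_check(w, kv, act, vram_gib=8))              # fits in VRAM with 2.00 GiB to spare
print(fit_check(w, kv, act, vram_gib=4, ram_gib=16))  # short 2.00 GiB of VRAM; fits with CPU spill
```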