Local AI Capacity Calculator

NOTE: I am moving servers soon, so this tool may be briefly unavailable in late May or early June.

This tool is especially useful for planning agentic setups, which often need multiple separate context sessions and may exceed the configured concurrency, spill into system RAM for added context, or require queuing for expanded concurrency. It is not possible to account for every configuration and setup nuance, so choose the closest match to your planned configuration to get the most accurate estimates.
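As a rough illustration of the three outcomes mentioned above (sessions that fit in VRAM, sessions that spill into system RAM, and sessions that must queue), here is a minimal Python sketch. It is not the calculator's actual logic: the function name `plan_sessions`, its parameters, and the simple "fill VRAM first, then RAM, then queue" policy are all assumptions made for illustration.

```python
def plan_sessions(target_sessions: int,
                  kv_gb_per_session: float,
                  free_vram_gb: float,
                  free_ram_gb: float) -> dict:
    """Split target sessions into VRAM-resident, RAM-spilled, and queued.

    Hypothetical policy: place as many sessions as fit in free VRAM,
    then as many as fit in free system RAM, and queue the rest.
    """
    if kv_gb_per_session <= 0:
        raise ValueError("kv_gb_per_session must be positive")

    # Sessions whose context (KV cache) fits entirely in free VRAM.
    in_vram = min(target_sessions, int(free_vram_gb // kv_gb_per_session))
    remaining = target_sessions - in_vram

    # Sessions whose context spills into free system RAM (slower).
    in_ram = min(remaining, int(free_ram_gb // kv_gb_per_session))

    # Anything left has to wait for a slot.
    queued = remaining - in_ram
    return {"vram": in_vram, "ram": in_ram, "queued": queued}


# Example with made-up numbers: 12 agent sessions at 1.5 GB of context each,
# 10 GB of free VRAM and 6 GB of free system RAM.
plan = plan_sessions(12, 1.5, 10.0, 6.0)
print(plan)
```

With these made-up inputs, six sessions fit in VRAM, four spill into system RAM, and two are queued.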

Hardware Type

GPU
Unified - Apple
Unified - NVIDIA
Unified - AMD

Model

Model Family
Model
Weight Quantization
KV Cache Quantization
Context Length
Token Batch Size
Inference Engine
Target Concurrent Sessions

Leave blank or 0 to use the automatically calculated maximum concurrency.
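To show how the model fields above could feed into a maximum concurrency figure, here is a hedged Python sketch. It uses the standard sizing formulas (weights are parameter count times bits per weight; the KV cache holds one K and one V tensor per layer per token), but the function names, the fixed `overhead_bytes` allowance, and the example model shape are assumptions, and the calculator's own method may differ, for instance by accounting for the inference engine or token batch size.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights at a given quantization."""
    return params_billion * 1e9 * bits_per_weight / 8


def kv_cache_bytes_per_session(layers: int, kv_heads: int, head_dim: int,
                               context_len: int, kv_bits: float) -> float:
    """KV cache for one session at full context length.

    The factor 2 covers the separate K and V tensors in each layer.
    """
    return 2 * layers * kv_heads * head_dim * context_len * kv_bits / 8


def max_concurrent_sessions(vram_bytes: float, weights: float,
                            kv_per_session: float,
                            overhead_bytes: float = 0.0) -> int:
    """Sessions that fit in VRAM after weights and a fixed overhead."""
    free = vram_bytes - weights - overhead_bytes
    if free <= 0:
        return 0
    return int(free // kv_per_session)


GIB = 2 ** 30

# Hypothetical 8B-parameter model: 32 layers, 8 KV heads, head dim 128,
# 4-bit weights, 16-bit KV cache, 8192-token context, one 24 GiB GPU.
weights = weight_bytes(8, 4)
kv = kv_cache_bytes_per_session(32, 8, 128, 8192, 16)
sessions = max_concurrent_sessions(24 * GIB, weights, kv,
                                   overhead_bytes=2 * GIB)
print(kv / GIB, sessions)
```

In this example each session needs exactly 1 GiB of KV cache, and 18 sessions fit alongside the weights and the assumed 2 GiB overhead. Halving the KV cache precision or the context length roughly doubles that count.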

Hardware

GPU Vendor
GPU Architecture

GPU / System
GPU Count

CPU Vendor

CPU Family

CPU Architecture

CPU

CPU Count

RAM Profile
Total System RAM
System RAM Already In Use (GB)
VRAM Already In Use (GB)

PCIe Topology

Define the PCIe generation and lane width for each GPU slot.

GPU 1 PCIe Slot: PCIe Generation, Lane Width
GPU 2 PCIe Slot: PCIe Generation, Lane Width
GPU 3 PCIe Slot: PCIe Generation, Lane Width
GPU 4 PCIe Slot: PCIe Generation, Lane Width
GPU 5 PCIe Slot: PCIe Generation, Lane Width
GPU 6 PCIe Slot: PCIe Generation, Lane Width
GPU 7 PCIe Slot: PCIe Generation, Lane Width
GPU 8 PCIe Slot: PCIe Generation, Lane Width
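The generation and lane width of a slot determine its theoretical bandwidth, which matters when context spills into system RAM or GPUs exchange data over the bus. The Python sketch below computes the per-direction figure from the published per-lane signalling rates and line encodings for PCIe 1.0 through 5.0. The names `PCIE_GEN`, `pcie_bandwidth_gb_s`, and `transfer_seconds` are illustrative, and real-world throughput is lower because of protocol overhead, so treat the results as upper bounds rather than as what the calculator reports.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency by generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe slot."""
    if generation not in PCIE_GEN:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError(f"unsupported lane width: x{lanes}")
    gt_per_s, efficiency = PCIE_GEN[generation]
    # GT/s times encoding efficiency gives usable Gbit/s per lane.
    return gt_per_s * efficiency / 8 * lanes


def transfer_seconds(size_gb: float, generation: int, lanes: int) -> float:
    """Best-case time to move size_gb across the slot in one direction."""
    return size_gb / pcie_bandwidth_gb_s(generation, lanes)


gen4_x16 = pcie_bandwidth_gb_s(4, 16)   # about 31.5 GB/s
gen3_x8 = pcie_bandwidth_gb_s(3, 8)     # about 7.9 GB/s
print(round(gen4_x16, 2), round(gen3_x8, 2))
```

A GPU in a Gen 3 x8 slot has roughly a quarter of the host bandwidth of one in a Gen 4 x16 slot, which is why the per-slot settings can change the estimate noticeably for multi-GPU or RAM-spill configurations.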

Concurrent Session Storage

Use Concurrent Session Storage

Disclaimer: These calculations are not exact, but they should give reasonable estimates for capacity planning. Planned enhancements: GGUF on/off, and calculation support for Speculative Decoding (Draft Models), MTP, TurboQuant, RotorQuant, and Tensor vs Pipeline Parallelism. If you have suggestions or notice sizing inaccuracies or missing models/GPUs, send feedback and include whatever details you think will be needed.
