DataSculpting.AI Tool:

Local AI Capacity Calculator
NOTE: I am moving servers soon, so this tool may be briefly unavailable in late May or early June. The tool is especially useful for planning agentic setups, which often need multiple separate context sessions and may exceed the concurrency setting, spill into system RAM for extra context, or require queuing when more sessions are requested than can run at once. It's not possible to account for every configuration and setup nuance, so choose the options closest to your planned configuration to get the most accurate estimates.
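To show why multiple context sessions, concurrency limits, RAM spill and queuing interact, here is a minimal sketch of the usual KV-cache arithmetic. It is not this calculator's own method. It assumes a standard transformer KV cache, one key and one value tensor per layer, so the cache size is 2 × layers × KV heads × head dim × bytes per element × context tokens. It ignores activation memory, framework overhead and fragmentation. The names (`ModelShape`, `plan_sessions`) and the example numbers (32 layers, 8 KV heads, head dim 128, FP16 cache, 5 GiB of weights on a 16 GiB GPU) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Attention dimensions that drive KV-cache size (hypothetical example values)."""
    num_layers: int
    num_kv_heads: int          # KV heads (fewer than query heads under GQA)
    head_dim: int
    kv_bytes_per_elem: float   # 2 for an FP16/BF16 KV cache, ~1 for 8-bit


@dataclass
class SessionPlan:
    per_session_bytes: float   # KV cache for one session at the given context length
    active: int                # sessions served at once (capped by the concurrency setting)
    queued: int                # sessions that must wait for a free slot
    spill_bytes: float         # KV cache that does not fit in free VRAM (e.g. offloaded to system RAM)


def kv_cache_bytes(model: ModelShape, context_tokens: int) -> float:
    # Factor 2: one key tensor and one value tensor per layer.
    return (2 * model.num_layers * model.num_kv_heads * model.head_dim
            * model.kv_bytes_per_elem * context_tokens)


def plan_sessions(model: ModelShape, weights_bytes: float, vram_bytes: float,
                  context_tokens: int, sessions: int, max_concurrency: int) -> SessionPlan:
    per_session = kv_cache_bytes(model, context_tokens)
    active = min(sessions, max_concurrency)        # beyond this, requests queue
    queued = sessions - active
    free_vram = max(0, vram_bytes - weights_bytes)  # VRAM left after loading weights
    spill = max(0, active * per_session - free_vram)
    return SessionPlan(per_session, active, queued, spill)


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, kv_bytes_per_elem=2)
    for ctx in (8192, 32768):
        p = plan_sessions(shape, weights_bytes=5 * GIB, vram_bytes=16 * GIB,
                          context_tokens=ctx, sessions=6, max_concurrency=4)
        print(f"ctx={ctx}: {p.per_session_bytes / GIB:.1f} GiB/session, "
              f"active={p.active}, queued={p.queued}, spill={p.spill_bytes / GIB:.1f} GiB")
```

With these illustrative numbers, an 8K context needs 1 GiB of KV cache per session, so four concurrent sessions fit in the 11 GiB left after the weights while two more queue. At 32K context each session needs 4 GiB, and the same four sessions overrun free VRAM by 5 GiB, which must either spill to system RAM or be avoided by lowering concurrency.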
Disclaimer: Calculations will not be exact, but they should offer reasonable estimates for capacity planning. Planned enhancements: GGUF on/off, and calculation support for Speculative Decoding (Draft Models), MTP, TurboQuant, RotorQuant, and Tensor vs Pipeline Parallelism. If you have suggestions, notice sizing inaccuracies, or find missing models/GPUs, send feedback with whatever details you think will be needed.