vLLM · FinOps Profile
vLLM FinOps
FOCUS-aligned FinOps profile for vLLM. The software itself is free and open source. Costs come from GPU compute and storage on the platform you choose (cloud spot or on-demand GPUs, on-prem hardware, or a managed-vLLM provider). Reduce cost via continuous batching (default in vLLM), prefix caching, paged-attention sizing, model quantization (AWQ/GPTQ/INT8/FP8), and routing across instance sizes.
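As a rough illustration of how some of these levers surface when launching a server, the sketch below assembles a `vllm serve` command line. The helper name `build_vllm_serve_args`, the model identifier, and the default values are hypothetical. The flag names reflect the vLLM CLI as commonly documented and may differ between versions, so check `vllm serve --help` for your installed release.

```python
from typing import List, Optional


def build_vllm_serve_args(
    model: str,
    quantization: Optional[str] = None,
    enable_prefix_caching: bool = True,
    gpu_memory_utilization: float = 0.90,
    max_num_seqs: int = 256,
) -> List[str]:
    """Assemble an argv list for `vllm serve` (illustrative defaults only)."""
    args = ["vllm", "serve", model]
    if quantization:
        # e.g. "awq", "gptq", or "fp8"; the checkpoint must match the method.
        args += ["--quantization", quantization]
    if enable_prefix_caching:
        # Reuse KV-cache blocks across requests that share a prompt prefix.
        args.append("--enable-prefix-caching")
    # Fraction of GPU memory vLLM may claim; this bounds the paged KV cache.
    args += ["--gpu-memory-utilization", f"{gpu_memory_utilization:.2f}"]
    # Upper bound on sequences batched together in one scheduler step.
    args += ["--max-num-seqs", str(max_num_seqs)]
    return args


if __name__ == "__main__":
    # "my-org/my-model-awq" is a placeholder, not a real checkpoint.
    print(" ".join(build_vllm_serve_args("my-org/my-model-awq", quantization="awq")))
```

Building the command as a list keeps it easy to pass to a process launcher or a deployment template without shell-quoting issues.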
vLLM FinOps is the FinOps profile for vLLM on the APIs.io network, aligned with the FinOps Foundation Framework.
It defines 2 billable meters, priced in USD. The billing cycle is N/A for the project itself and hourly for infrastructure, and the pricing category is Free OSS / Compute.
The profile maps 7 FOCUS columns for cost-allocation reporting.
Tagged areas include LLM, Inference, Open Source, GPU, and OpenAI Compatible.
Category: LLM Inference
Pricing: Free OSS / Compute
Billing: N/A (project) / Hourly (infra)
FOCUS v1.3
Tags: LLM, Inference, Open Source, GPU, OpenAI Compatible, Self-Hosted, FinOps, Cost Management, FOCUS
Framework Alignment
Charge Categories
Usage
FOCUS Columns
BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: N/A
ProviderName: vLLM Project
PublisherName: vLLM Project
ServiceCategory: LLM Inference
ServiceName: vLLM
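To show how the seven column values in this section might be stamped onto cost records, here is a minimal sketch. The `FocusRow` dataclass and the `focus_record` helper are hypothetical names. `BilledCost` is a standard FOCUS column but is not part of this profile's mapping, so its use here is an assumption for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FocusRow:
    """Static FOCUS column values taken from the profile above."""
    BillingCurrency: str = "USD"
    ChargeCategory: str = "Usage"
    InvoiceIssuerName: str = "N/A"
    ProviderName: str = "vLLM Project"
    PublisherName: str = "vLLM Project"
    ServiceCategory: str = "LLM Inference"
    ServiceName: str = "vLLM"


def focus_record(billed_cost: float) -> dict:
    """Return one cost-allocation row: the static columns plus a cost.

    BilledCost is an assumed extra column, not mapped by the profile.
    """
    row = asdict(FocusRow())
    row["BilledCost"] = round(billed_cost, 2)
    return row


if __name__ == "__main__":
    print(focus_record(12.5))
```

In a real pipeline the cost would come from the infrastructure invoice (cloud bill or amortized hardware), since the vLLM project itself issues no invoice.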
Meters
gpu_hours: GPU hours consumed by vLLM serving processes.
tokens_served: Output tokens generated (operational meter for $/token tracking).
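The two meters combine into a unit cost. The sketch below derives dollars per million output tokens from `gpu_hours` and `tokens_served`. The function name and the hourly GPU rate are hypothetical: the profile does not publish a rate, so the real figure has to come from your own infrastructure bill.

```python
def cost_per_million_tokens(
    gpu_hours: float,
    hourly_rate_usd: float,
    tokens_served: int,
) -> float:
    """USD per one million output tokens for a given serving window."""
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    if gpu_hours < 0 or hourly_rate_usd < 0:
        raise ValueError("gpu_hours and hourly_rate_usd must be non-negative")
    total_cost_usd = gpu_hours * hourly_rate_usd
    return total_cost_usd / tokens_served * 1_000_000


if __name__ == "__main__":
    # Hypothetical day: one GPU for 24 hours at an assumed $2.00 per hour,
    # serving 48 million output tokens.
    unit_cost = cost_per_million_tokens(24, 2.00, 48_000_000)
    print(f"{unit_cost:.2f}")  # prints 1.00
```

Tracking this ratio over time shows whether batching, caching, or quantization changes actually lower the cost of each token served, independent of traffic volume.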