vLLM · FinOps Profile


FOCUS-aligned FinOps profile for vLLM. The software itself is free; costs come from GPU compute and storage on the platform you run it on (cloud spot or on-demand GPUs, on-prem hardware, or a managed-vLLM provider). The main cost levers are continuous batching (on by default in vLLM), prefix caching, paged-attention (KV-cache) sizing, model quantization (AWQ, GPTQ, INT8, FP8), and routing requests across instance sizes.
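Quantization and paged-attention sizing both come down to GPU memory. Smaller weights and a right-sized KV cache let a model fit on fewer or cheaper GPUs. The following is a minimal sketch of that arithmetic. It assumes decimal gigabytes and ignores activations and runtime overhead. The model shape (32 layers, 32 KV heads, head dimension 128, roughly Llama-2-7B) is used purely for illustration, and the function names are hypothetical. Roughly speaking, vLLM gives the memory left after loading the weights (up to its `gpu_memory_utilization` fraction) to the paged KV cache.

```python
# Rough GPU-memory estimates for vLLM capacity planning (weights + KV cache only).
# Assumptions: 1 GB = 1e9 bytes; activations, CUDA graphs, and framework
# overhead are ignored. Function names are hypothetical.


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights at a given precision."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes one token occupies in the KV cache: a K and a V vector
    per layer per KV head (bytes_per_elem=2 means an FP16 cache)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(kv_budget_gb: float, per_token_bytes: int) -> int:
    """How many tokens fit in the memory set aside for the KV cache."""
    return int(kv_budget_gb * 1e9 // per_token_bytes)


if __name__ == "__main__":
    # A 7B-parameter model at 16-bit versus 8-bit (INT8/FP8) and
    # 4-bit (commonly AWQ/GPTQ) weights.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(7, bits):.1f} GB")

    # Illustrative Llama-2-7B-like shape with an FP16 KV cache.
    per_token = kv_cache_bytes_per_token(32, 32, 128)
    print(f"KV cache per token: {per_token} bytes")
    print(f"Tokens in a 10 GB KV budget: {max_cached_tokens(10, per_token)}")
```

Halving the bits per weight halves the weight footprint. Memory saved there can hold more cached tokens, so more concurrent requests fit in a batch on the same GPU.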

vLLM FinOps is the FinOps profile for vLLM on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines two meters (one billable, one operational), reported in USD. The billing cycle is N/A for the open-source project itself and hourly for the infrastructure it runs on, and the pricing category is Free OSS / Compute.

The profile maps 7 FOCUS columns for cost-allocation reporting.

Tagged areas include LLM, Inference, Open Source, GPU, and OpenAI Compatible.

Category: LLM Inference
Pricing: Free OSS / Compute
Billing: N/A (project) / Hourly (infra)
FOCUS version: v1.3
Tags: LLM, Inference, Open Source, GPU, OpenAI Compatible, Self-Hosted, FinOps, Cost Management, FOCUS

Framework Alignment

Framework: FinOps Foundation Framework
Data Spec: FOCUS v1.3

Charge Categories

Usage

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: N/A
ProviderName: vLLM Project
PublisherName: vLLM Project
ServiceCategory: LLM Inference
ServiceName: vLLM
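Because the software is free, any cost row for vLLM carries these static column values alongside a cost taken from the GPU platform's rate. The following is a minimal sketch of building one FOCUS-style row. The helper `focus_row` is hypothetical. The extra columns (`ChargePeriodStart`, `ChargePeriodEnd`, `ConsumedQuantity`, `ConsumedUnit`, `BilledCost`) are FOCUS columns that this profile does not map. Their values here, including the `"Hours"` unit string and the hourly rate, are assumptions.

```python
# Static values from this profile's FOCUS column mapping (7 columns).
VLLM_FOCUS_COLUMNS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "N/A",
    "ProviderName": "vLLM Project",
    "PublisherName": "vLLM Project",
    "ServiceCategory": "LLM Inference",
    "ServiceName": "vLLM",
}


def focus_row(gpu_hours: float, hourly_rate_usd: float,
              period_start: str, period_end: str) -> dict:
    """Build one FOCUS-style cost row for a block of vLLM GPU usage.

    Hypothetical helper: the usage and cost columns added here are not part
    of this profile's mapping, and hourly_rate_usd is whatever your GPU
    platform charges (spot, on-demand, or amortized on-prem).
    """
    row = dict(VLLM_FOCUS_COLUMNS)  # copy so the shared mapping is not mutated
    row.update({
        "ChargePeriodStart": period_start,
        "ChargePeriodEnd": period_end,
        "ConsumedQuantity": gpu_hours,
        "ConsumedUnit": "Hours",  # assumed unit string
        "BilledCost": round(gpu_hours * hourly_rate_usd, 2),
    })
    return row


row = focus_row(24, 2.5, "2024-06-01T00:00:00Z", "2024-06-02T00:00:00Z")
print(row["ServiceName"], row["BilledCost"])
```

Keeping the static mapping in one dictionary and copying it per row keeps every record consistent with the profile while the usage-dependent columns vary.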

Meters

gpu_hours (unit: hour): GPU hours consumed by vLLM serving processes.
tokens_served (unit: token): Output tokens generated; an operational meter for tracking $/token.
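The two meters combine into the unit-cost figure that `tokens_served` is meant to track: GPU spend (gpu_hours × hourly rate) divided by output tokens. The following is a minimal sketch. The function names are hypothetical, and the $2.00/hour rate is an illustrative number, not a quoted price.

```python
def gpu_cost_usd(gpu_hours: float, hourly_rate_usd: float) -> float:
    """Spend implied by the gpu_hours meter at a given GPU price."""
    return gpu_hours * hourly_rate_usd


def cost_per_million_tokens(gpu_hours: float, hourly_rate_usd: float,
                            tokens_served: int) -> float:
    """Unit cost: GPU spend divided by output tokens, scaled to 1M tokens."""
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    return gpu_cost_usd(gpu_hours, hourly_rate_usd) * 1_000_000 / tokens_served


# Example: 10 GPU-hours at an assumed $2.00/hour that produced 40M output tokens.
print(cost_per_million_tokens(10, 2.0, 40_000_000))  # 0.5
```

Every lever listed above shows up in this ratio. Batching, caching, and quantization raise tokens per GPU-hour, while spot capacity and right-sized instances lower the hourly rate.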

Sources