vLLM · FinOps Profile

Vllm Finops

Name: Vllm Finops
Creator: vLLM
Keywords: LLM, Inference, Open Source, GPU, OpenAI Compatible, Self-Hosted, FinOps, Cost Management, FOCUS

FOCUS-aligned FinOps profile for vLLM. The software itself is free; the cost is the GPU compute and storage on your chosen platform (cloud spot or on-demand GPUs, on-prem hardware, or a managed-vLLM provider). Reduce that cost through continuous batching (default in vLLM), prefix caching, paged-attention sizing, model quantization (AWQ/GPTQ/INT8/FP8), and routing across instance sizes.

Vllm Finops is the FinOps profile for vLLM on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 2 billable meters, billed in USD. The project itself has no billing cycle (N/A), while the underlying infrastructure is billed hourly. The pricing category is Free OSS / Compute.

The profile maps 7 FOCUS columns for cost-allocation reporting.

Tagged areas include LLM, Inference, Open Source, GPU, and OpenAI Compatible.

Category: LLM Inference
Pricing: Free OSS / Compute
Billing: N/A (project) / Hourly (infra)
Data Spec: FOCUS v1.3


Framework Alignment

Framework: FinOps Foundation Framework
Data Spec: FOCUS v1.3

Charge Categories

Usage

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: N/A
ProviderName: vLLM Project
PublisherName: vLLM Project
ServiceCategory: LLM Inference
ServiceName: vLLM
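
As a rough illustration of how the seven column values above could be stamped onto a cost record, the Python sketch below builds one FOCUS-style row for a charge period. The `focus_row` helper and the hourly rate passed to it are hypothetical and not part of the profile. `BilledCost`, `ConsumedQuantity`, `ConsumedUnit`, `ChargePeriodStart`, and `ChargePeriodEnd` are FOCUS column names added here on the assumption that cost is derived from the gpu_hours meter.

```python
# Static column values taken from the profile's FOCUS Columns section.
STATIC_COLUMNS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "N/A",
    "ProviderName": "vLLM Project",
    "PublisherName": "vLLM Project",
    "ServiceCategory": "LLM Inference",
    "ServiceName": "vLLM",
}


def focus_row(gpu_hours, hourly_rate_usd, period_start, period_end):
    """Build one FOCUS-style row (hypothetical helper, not part of the profile).

    hourly_rate_usd is an assumption: the price of one GPU hour on whatever
    platform actually hosts vLLM.
    """
    row = dict(STATIC_COLUMNS)
    row.update(
        {
            "ChargePeriodStart": period_start,
            "ChargePeriodEnd": period_end,
            "ConsumedQuantity": gpu_hours,
            "ConsumedUnit": "Hours",
            "BilledCost": round(gpu_hours * hourly_rate_usd, 2),
        }
    )
    return row


if __name__ == "__main__":
    # Example: 3 GPU hours at an assumed 2.50 USD per hour.
    print(focus_row(3.0, 2.5, "2025-01-01T00:00:00Z", "2025-01-01T03:00:00Z"))
```

Keeping the static values in one dictionary means every row carries the same provider and service identity, and only the period, quantity, and cost vary per record.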

Meters

gpu_hours

Unit: hour

GPU hours consumed by vLLM serving processes.

tokens_served

Unit: token

Output tokens generated (operational meter for $/token tracking).
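
The two meters combine into the $/token figure mentioned above. The following Python sketch shows one way to do that arithmetic. The function name and the hourly GPU rate are assumptions for illustration; the profile itself does not specify a rate, since it depends on the platform running vLLM.

```python
def cost_per_million_tokens(gpu_hours, hourly_rate_usd, tokens_served):
    """Return USD per one million output tokens (hypothetical helper).

    gpu_hours and tokens_served correspond to the profile's two meters;
    hourly_rate_usd is an assumed platform price for one GPU hour.
    """
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    total_cost_usd = gpu_hours * hourly_rate_usd
    return total_cost_usd / tokens_served * 1_000_000


if __name__ == "__main__":
    # Example: 10 GPU hours at an assumed 2.00 USD/hour serving 40M tokens.
    print(cost_per_million_tokens(10, 2.0, 40_000_000))
```

Tracking this value over time shows whether changes such as quantization or prefix caching actually lower the unit cost, since they raise tokens_served for the same gpu_hours.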
