Ollama · FinOps Profile

Ollama FinOps

FOCUS-aligned FinOps for Ollama: local inference is free and self-hosted; Ollama Cloud is a tiered subscription priced on GPU utilization rather than tokens, with concurrency caps per tier.

Ollama FinOps is the FinOps profile for Ollama on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 4 billable meters, billed in USD on a monthly or annual cycle. The pricing category is a tiered subscription for Ollama Cloud, while self-hosted use is free.

The profile maps 9 FOCUS columns for cost-allocation reporting.

Tagged areas include Artificial Intelligence, Large Language Models, Models, FinOps, and Cost Management.

Category: AI Infrastructure
Pricing: Subscription (Tiered) + Self-Hosted Free
Billing: Monthly or Annual
FOCUS v1.3
Tags: Artificial Intelligence, Large Language Models, Models, FinOps, Cost Management, FOCUS

Framework Alignment

Framework
Data Spec

Charge Categories

Purchase, Usage, Tax, Credit

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Purchase
InvoiceIssuerName: Ollama Inc.
PricingCategory: Subscription
PricingUnit: month
ProviderName: Ollama
PublisherName: Ollama Inc.
ServiceCategory: AI Infrastructure
ServiceName: Ollama
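As a rough illustration of how the column mapping above might feed a cost-allocation report, the sketch below stamps the nine static values onto a subscription charge and serializes it as CSV. The helper names (`subscription_row`, `to_csv`) are hypothetical, the amount passed in is a placeholder rather than a real Ollama price, and `BilledCost`, `ChargePeriodStart`, and `ChargePeriodEnd` are standard FOCUS columns that this profile does not itself map.

```python
import csv
import io

# Static column values copied from the profile's FOCUS mapping.
FOCUS_STATIC = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Purchase",
    "InvoiceIssuerName": "Ollama Inc.",
    "PricingCategory": "Subscription",
    "PricingUnit": "month",
    "ProviderName": "Ollama",
    "PublisherName": "Ollama Inc.",
    "ServiceCategory": "AI Infrastructure",
    "ServiceName": "Ollama",
}


def subscription_row(billed_cost, period_start, period_end):
    """Build one FOCUS-style row for a subscription charge.

    billed_cost is a caller-supplied placeholder amount (assumption);
    the period bounds are ISO 8601 strings.
    """
    row = dict(FOCUS_STATIC)
    row["BilledCost"] = f"{billed_cost:.2f}"
    row["ChargePeriodStart"] = period_start
    row["ChargePeriodEnd"] = period_end
    return row


def to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # 10.00 is a made-up figure for illustration only.
    row = subscription_row(10.0, "2025-01-01T00:00:00Z", "2025-02-01T00:00:00Z")
    print(to_csv([row]))
```

Keeping the static values in one dictionary means each exported row carries identical provider and pricing metadata, and only the cost and period fields vary per charge.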

Meters

cloud_subscription
Unit: month
Per-account Ollama Cloud subscription (Free, Pro, Max)

cloud_gpu_time
Unit: gpu-second
GPU time consumed on Ollama Cloud (the actual consumption meter)

cloud_concurrent_models
Unit: model
Peak concurrent cloud model count (used to size tier)

local_inference_requests
Unit: request
Local Ollama requests (no charge from Ollama; cost is hardware/power)
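To make the two consumption meters concrete, here is a minimal sketch that derives `cloud_gpu_time` and `cloud_concurrent_models` from a list of usage sessions. The `Session` record and its fields are hypothetical, not an Ollama export format, and the sketch assumes each session occupies exactly one GPU and represents one loaded model, so GPU-seconds equal session duration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical usage record: one cloud model loaded from start to end."""
    model: str
    start: float  # seconds since some reference point
    end: float


def total_gpu_seconds(sessions):
    """Sum session durations (assumes one GPU per session)."""
    return sum(s.end - s.start for s in sessions)


def peak_concurrent_models(sessions):
    """Return the maximum number of sessions open at the same time."""
    events = []
    for s in sessions:
        events.append((s.start, 1))   # a model becomes active
        events.append((s.end, -1))    # a model is released
    # At equal timestamps, -1 sorts before +1, so a session ending at t
    # does not overlap with one starting at t.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    usage = [
        Session("model-a", 0, 10),
        Session("model-b", 5, 15),
        Session("model-c", 10, 20),
    ]
    print(total_gpu_seconds(usage))       # 30
    print(peak_concurrent_models(usage))  # 2
```

The peak figure is the one to compare against a tier's concurrency cap, while the GPU-second total tracks actual consumption.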

Sources