Groq · FinOps Profile

Groq Finops

FinOps view of GroqCloud spend. Groq pricing is usage-based: per-token rates, set per model, for chat, vision, and reasoning; per-million-character rates for text-to-speech (TTS); per-hour rates for speech-to-text (STT) transcription; and per-call or per-hour rates for tools. The Batch API carries a 50% discount, and Prompt Caching gives 50% off cached input tokens.
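As a rough illustration of how the per-token rates and the two 50% discounts combine, here is a minimal sketch. The function name, its parameters, and the rates passed to it are hypothetical placeholders, not published Groq prices. The profile does not say whether the Batch and Prompt Caching discounts stack, so the sketch refuses to combine them rather than guess.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)
CACHED_INPUT_MULTIPLIER = Decimal("0.5")  # 50% off cached input tokens (from the profile)
BATCH_MULTIPLIER = Decimal("0.5")         # 50% Batch discount (from the profile)


def estimate_token_cost(input_tokens, cached_input_tokens, output_tokens,
                        input_rate_per_m, output_rate_per_m, batch=False):
    """Estimate USD cost for one model.

    cached_input_tokens is assumed to be the portion of input_tokens
    that was served from the prompt cache. Rates are USD per 1M tokens.
    """
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    if batch and cached_input_tokens:
        # Assumption: discount stacking is unspecified, so it is not modeled.
        raise ValueError("batch + cached pricing is not modeled in this sketch")

    input_rate = Decimal(str(input_rate_per_m))
    output_rate = Decimal(str(output_rate_per_m))

    cost = (
        Decimal(uncached) * input_rate
        + Decimal(cached_input_tokens) * input_rate * CACHED_INPUT_MULTIPLIER
        + Decimal(output_tokens) * output_rate
    ) / TOKENS_PER_MILLION

    if batch:
        cost *= BATCH_MULTIPLIER
    return cost


# Placeholder rates: $0.50 per 1M input tokens, $1.00 per 1M output tokens.
on_demand = estimate_token_cost(1_000_000, 400_000, 500_000, "0.50", "1.00")
batched = estimate_token_cost(1_000_000, 0, 500_000, "0.50", "1.00", batch=True)
```

Decimal is used instead of float so that small per-token amounts do not accumulate rounding error when summed over a billing cycle.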

Groq Finops is the FinOps profile for Groq on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 9 billable meters, billed in USD on a monthly cycle, under the usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Inference, LPU, and Low Latency.

Category: AI and Machine Learning · Pricing: Usage-Based · Billing: Monthly · FOCUS v1.3
Tags: AI, LLM, Inference, LPU, Low Latency, FinOps, Cost Management, FOCUS

Framework Alignment

Framework
Data Spec

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Groq
PricingCategory: Usage-Based
ProviderName: Groq
PublisherName: Groq
ServiceCategory: AI and Machine Learning
ServiceName: GroqCloud
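The eight column values above are constant for every Groq charge, so a cost-allocation export can stamp them onto each row. The following is a minimal sketch of that idea. The helper names are hypothetical, and the two per-charge columns (BilledCost and ChargeDescription) are FOCUS columns added here for illustration; they are not part of this profile's mapping.

```python
import csv
import io

# Static FOCUS column values taken from the profile above.
GROQ_FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Groq",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Groq",
    "PublisherName": "Groq",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "GroqCloud",
}


def to_focus_row(billed_cost, charge_description):
    """Build one report row: profile defaults plus per-charge fields."""
    row = dict(GROQ_FOCUS_DEFAULTS)  # copy so the defaults stay untouched
    row["BilledCost"] = f"{billed_cost:.6f}"        # illustrative extra column
    row["ChargeDescription"] = charge_description   # illustrative extra column
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    to_focus_row(0.9, "input and output tokens"),
    to_focus_row(0.45, "tool compute hours"),
]
report = rows_to_csv(rows)
```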

Meters

input_tokens
Unit: tokens
Tokens sent in chat / vision / reasoning requests, billed per 1M tokens per model.
cached_input_tokens
Unit: tokens
Cached-input tokens billed at 50% of the standard input rate.
output_tokens
Unit: tokens
Tokens generated, billed per 1M tokens per model.
tts_characters
Unit: characters
TTS characters synthesized, billed per 1M characters per voice/model.
stt_audio_hours
Unit: hours
Audio hours transcribed, billed per hour per Whisper variant.
tool_invocations
Unit: invocations
Tool calls (web search, Wolfram) priced per 1,000 invocations.
tool_compute_hours
Unit: hours
Tool compute hours (e.g., Code Execution at $0.18/hr).
batch_tokens
Unit: tokens
Tokens consumed via the Batch API at a 50% discount.
flex_tokens
Unit: tokens
Tokens consumed via the Flex Processing tier, billed at a discount in exchange for relaxed latency.
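The meters above differ in the quantity that one rate covers: per 1M tokens or characters, per 1,000 invocations, or per hour. A small registry makes that explicit. This is a sketch under stated assumptions: the `Meter` class, the `billing_block` field, and the `charge` helper are hypothetical names, and the 1M-token block for batch_tokens and flex_tokens is assumed by analogy with the other token meters. The only rate taken from the profile is Code Execution at $0.18/hr; the tool-invocation rate in the usage lines is a placeholder.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Meter:
    name: str
    unit: str
    billing_block: int  # how many units a single quoted rate covers


METERS = {m.name: m for m in [
    Meter("input_tokens", "tokens", 1_000_000),
    Meter("cached_input_tokens", "tokens", 1_000_000),
    Meter("output_tokens", "tokens", 1_000_000),
    Meter("tts_characters", "characters", 1_000_000),
    Meter("stt_audio_hours", "hours", 1),
    Meter("tool_invocations", "invocations", 1_000),
    Meter("tool_compute_hours", "hours", 1),
    Meter("batch_tokens", "tokens", 1_000_000),  # block size assumed
    Meter("flex_tokens", "tokens", 1_000_000),   # block size assumed
]}


def charge(meter_name, quantity, rate_per_block):
    """Return the USD charge for `quantity` units at `rate_per_block`."""
    meter = METERS[meter_name]
    return Decimal(str(quantity)) * Decimal(str(rate_per_block)) / meter.billing_block


# 2.5 hours of Code Execution at the $0.18/hr rate quoted in the profile.
code_exec = charge("tool_compute_hours", 2.5, "0.18")
# 2,500 tool calls at a placeholder rate of $5 per 1,000 invocations.
tool_calls = charge("tool_invocations", 2500, "5")
```

Keeping the block size on the meter, rather than folding it into the rate, lets rates be entered exactly as a price sheet quotes them.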

Sources