Groq · FinOps Profile
Groq Finops
FinOps view of GroqCloud spend. Groq bills on a usage basis: per-token rates by model for chat / vision / reasoning, per-million-character rates for TTS, per-hour rates for STT transcription, and per-call or per-hour rates for tools. The Batch API carries a 50% discount, and Prompt Caching gives 50% off cached input tokens.
Groq Finops is the FinOps profile for Groq on the APIs.io network, aligned with the FinOps Foundation Framework.
It defines 9 billable meters, billed in USD on a monthly cycle under the usage-based pricing category.
The profile maps 8 FOCUS columns for cost-allocation reporting.
Tagged areas include AI, LLM, Inference, LPU, and Low Latency.
Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3
AI, LLM, Inference, LPU, Low Latency, FinOps, Cost Management, FOCUS
Framework Alignment
Charge Categories
Usage, Purchase, Adjustment
FOCUS Columns
BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Groq
PricingCategory: Usage-Based
ProviderName: Groq
PublisherName: Groq
ServiceCategory: AI and Machine Learning
ServiceName: GroqCloud
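As a rough illustration of how the eight column values above could be stamped onto a cost-allocation record, here is a minimal Python sketch. The helper name `to_focus_row`, the `x_MeterName` custom column, and the sample cost are assumptions made for illustration and are not part of the profile. `BilledCost` is a standard FOCUS column but is not one of the eight mapped here.

```python
from decimal import Decimal

# Static FOCUS column values exactly as listed in this profile.
GROQ_FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Groq",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Groq",
    "PublisherName": "Groq",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "GroqCloud",
}


def to_focus_row(meter_name: str, billed_cost: Decimal) -> dict:
    """Hypothetical helper: merge the static columns with per-charge values."""
    row = dict(GROQ_FOCUS_DEFAULTS)  # copy so the defaults stay untouched
    row["BilledCost"] = str(billed_cost)  # standard FOCUS column, not mapped by this profile
    row["x_MeterName"] = meter_name  # assumed custom column name (x_ prefix)
    return row


row = to_focus_row("input_tokens", Decimal("1.25"))
print(row["ServiceName"], row["x_MeterName"], row["BilledCost"])
```

Copying the defaults per row keeps the profile-level values in one place, so a change to the mapping only has to be made once.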
Meters
input_tokens
Tokens sent in chat / vision / reasoning requests, billed per 1M tokens per model.
cached_input_tokens
Cached-input tokens billed at 50% of the standard input rate.
output_tokens
Tokens generated, billed per 1M tokens per model.
tts_characters
TTS characters synthesized, billed per 1M characters per voice/model.
stt_audio_hours
Audio hours transcribed, billed per hour per Whisper variant.
tool_invocations
Tool calls (web search, Wolfram) priced per 1,000 invocations.
tool_compute_hours
Tool compute hours (e.g., Code Execution at $0.18/hr).
batch_tokens
Tokens consumed via the Batch API at a 50% discount.
flex_tokens
Tokens consumed via the Flex Processing tier at a relaxed-latency discount.
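The meter definitions above translate into simple arithmetic. The Python sketch below estimates a charge from a few of the meters. The per-1M-token rates (`INPUT_RATE`, `OUTPUT_RATE`) are hypothetical placeholders, not Groq prices. Only the 50% cached-input rate, the 50% Batch discount, and the $0.18/hr Code Execution example come from this profile. The sketch also assumes `input_tokens` and `cached_input_tokens` are counted separately, which the profile does not state. The Flex discount is left out because no percentage is given.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Hypothetical per-1M-token rates; real rates vary per model.
INPUT_RATE = Decimal("0.50")
OUTPUT_RATE = Decimal("1.00")

# Figures stated in this profile.
CACHED_DISCOUNT = Decimal("0.50")          # cached input billed at 50% of input rate
BATCH_DISCOUNT = Decimal("0.50")           # Batch API discount
CODE_EXEC_RATE_PER_HOUR = Decimal("0.18")  # Code Execution example rate


def token_cost(tokens: int, rate_per_million: Decimal,
               discount: Decimal = Decimal("0")) -> Decimal:
    """Cost of a token meter billed per 1M tokens, with an optional discount."""
    return Decimal(tokens) / MILLION * rate_per_million * (Decimal(1) - discount)


def estimate(input_tokens: int, cached_input_tokens: int,
             output_tokens: int, tool_compute_hours: Decimal) -> Decimal:
    """Sum a subset of meters; assumes cached tokens are not also in input_tokens."""
    return (
        token_cost(input_tokens, INPUT_RATE)
        + token_cost(cached_input_tokens, INPUT_RATE, CACHED_DISCOUNT)
        + token_cost(output_tokens, OUTPUT_RATE)
        + tool_compute_hours * CODE_EXEC_RATE_PER_HOUR
    )


total = estimate(2_000_000, 1_000_000, 500_000, Decimal("2"))
print(f"Estimated charge: {total:.2f} USD")
```

`Decimal` is used instead of floats so that per-meter amounts sum without binary rounding drift, which matters when reconciling against an invoice.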