Groq · FinOps Profile

Groq Finops

Name: Groq Finops
Creator: Groq
Keywords: AI, LLM, Inference, LPU, Low Latency, FinOps, Cost Management, FOCUS

FinOps view of GroqCloud spend. Groq bills on usage: per-token rates per model for chat, vision, and reasoning; per-million-character rates for TTS; per-hour rates for STT transcription; and per-call or per-hour rates for tools. The Batch API carries a 50% discount, and Prompt Caching gives 50% off cached input tokens.

Groq Finops is the FinOps profile for Groq on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 9 billable meters, billed in USD on a monthly cycle under a usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Inference, LPU, and Low Latency.

Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3

Framework Alignment

Framework: FinOps Foundation Framework

Data Spec: FOCUS v1.3

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD

ChargeCategory: Usage

InvoiceIssuerName: Groq

PricingCategory: Usage-Based

ProviderName: Groq

PublisherName: Groq

ServiceCategory: AI and Machine Learning

ServiceName: GroqCloud
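As an illustration of how the eight mapped columns might be applied, the sketch below stamps them onto a single charge record. The helper name `to_focus_row` and the custom `x_Meter` column are hypothetical and not part of the profile; `BilledCost` is a standard FOCUS column, and the `x_` prefix follows the FOCUS convention for custom columns.

```python
# Static FOCUS column values copied from the profile's FOCUS Columns section.
GROQ_FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Groq",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Groq",
    "PublisherName": "Groq",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "GroqCloud",
}


def to_focus_row(billed_cost, meter):
    """Build one FOCUS-style row (hypothetical helper, not a Groq API)."""
    row = dict(GROQ_FOCUS_DEFAULTS)  # copy so the defaults stay untouched
    row["BilledCost"] = billed_cost
    row["x_Meter"] = meter  # custom column recording the profile meter name
    return row


row = to_focus_row(1.25, "input_tokens")
print(row["ServiceName"], row["x_Meter"], row["BilledCost"])
```

Keeping the static values in one dictionary means a cost-allocation export only has to supply the per-charge fields.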

Meters

input_tokens

Unit: tokens

Tokens sent in chat / vision / reasoning requests, billed per 1M tokens per model.

cached_input_tokens

Unit: tokens

Cached-input tokens billed at 50% of the standard input rate.

output_tokens

Unit: tokens

Tokens generated, billed per 1M tokens per model.
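The three token meters above could be combined into one request-level estimate roughly as follows. This is a minimal sketch: the rates in the usage line are placeholders rather than Groq prices, and it assumes `input_tokens` counts only uncached tokens, with cached tokens reported separately.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)  # rates are quoted per 1M tokens
CACHED_INPUT_MULTIPLIER = Decimal("0.5")      # cached input billed at 50% (per the profile)


def chat_cost(input_tokens, cached_input_tokens, output_tokens,
              input_rate, output_rate):
    """Estimate the USD cost of chat usage.

    input_rate and output_rate are per-1M-token prices for one model.
    Assumes input_tokens excludes the cached tokens.
    """
    uncached = Decimal(input_tokens) / TOKENS_PER_PRICING_UNIT * input_rate
    cached = (Decimal(cached_input_tokens) / TOKENS_PER_PRICING_UNIT
              * input_rate * CACHED_INPUT_MULTIPLIER)
    output = Decimal(output_tokens) / TOKENS_PER_PRICING_UNIT * output_rate
    return uncached + cached + output


# Placeholder rates: $0.50 per 1M input tokens, $1.00 per 1M output tokens.
print(chat_cost(1_000_000, 1_000_000, 500_000, Decimal("0.50"), Decimal("1.00")))
```

`Decimal` is used instead of floats so that small per-token amounts do not accumulate rounding error across a billing cycle.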

tts_characters

Unit: characters

TTS characters synthesized, billed per 1M characters per voice/model.

stt_audio_hours

Unit: hours

Audio hours transcribed, billed per hour per Whisper variant.
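The two audio meters use different units, so a cost estimate has to normalize each one. The sketch below assumes audio duration arrives in seconds and ignores any minimum-duration or rounding rules the provider may apply; the rates in the usage lines are placeholders, not Groq prices.

```python
from decimal import Decimal

CHARS_PER_PRICING_UNIT = Decimal(1_000_000)  # TTS rates are quoted per 1M characters
SECONDS_PER_HOUR = Decimal(3600)             # STT rates are quoted per audio hour


def tts_cost(characters, rate_per_million_chars):
    """USD cost of synthesizing `characters` characters."""
    return Decimal(characters) / CHARS_PER_PRICING_UNIT * rate_per_million_chars


def stt_cost(audio_seconds, rate_per_hour):
    """USD cost of transcribing `audio_seconds` seconds of audio."""
    return Decimal(audio_seconds) / SECONDS_PER_HOUR * rate_per_hour


# Placeholder rates: $20 per 1M characters, $0.10 per audio hour.
print(tts_cost(250_000, Decimal("20")))
print(stt_cost(1800, Decimal("0.10")))
```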

tool_invocations

Unit: invocations

Tool calls (web search, Wolfram) priced per 1,000 invocations.

tool_compute_hours

Unit: hours

Tool compute hours (e.g., Code Execution at $0.18/hr).
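A rough sketch of the two tool meters follows. The $0.18/hr Code Execution figure comes from the profile; the per-1,000-invocation rate in the usage line is a placeholder, and both function names are hypothetical.

```python
from decimal import Decimal

INVOCATIONS_PER_PRICING_UNIT = Decimal(1000)   # tool calls are priced per 1,000
CODE_EXECUTION_RATE_PER_HOUR = Decimal("0.18")  # example rate quoted in the profile


def tool_invocation_cost(invocations, rate_per_thousand):
    """USD cost of `invocations` tool calls at a per-1,000 rate."""
    return Decimal(invocations) / INVOCATIONS_PER_PRICING_UNIT * rate_per_thousand


def code_execution_cost(compute_hours, rate_per_hour=CODE_EXECUTION_RATE_PER_HOUR):
    """USD cost of tool compute time billed by the hour."""
    return Decimal(compute_hours) * rate_per_hour


# Placeholder rate: $5 per 1,000 invocations.
print(tool_invocation_cost(2_500, Decimal("5")))
print(code_execution_cost(10))
```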

batch_tokens

Unit: tokens

Tokens consumed via the Batch API at a 50% discount.
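The Batch discount can be expressed as a multiplier on the standard per-1M-token rate. The sketch below assumes the 50% applies uniformly to whatever standard rate is passed in; the rate in the usage line is a placeholder.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)
BATCH_MULTIPLIER = Decimal("0.5")  # 50% Batch discount stated in the profile


def batch_cost(batch_tokens, standard_rate_per_million):
    """USD cost of Batch API tokens given the standard (undiscounted) rate."""
    standard = Decimal(batch_tokens) / TOKENS_PER_PRICING_UNIT * standard_rate_per_million
    return standard * BATCH_MULTIPLIER


# Placeholder standard rate: $0.50 per 1M tokens.
print(batch_cost(2_000_000, Decimal("0.50")))
```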

flex_tokens

Unit: tokens

Tokens consumed via the Flex Processing tier, billed at a discount in exchange for relaxed latency.
