Fireworks AI · FinOps Profile
Fireworks AI FinOps
FinOps view of Fireworks AI spend. Billing is postpaid and usage-based, in USD: serverless inference is priced per token by model, embeddings by model size, fine-tuning per 1M training tokens, and on-demand deployments per GPU-hour. Batch and cached input tokens are discounted by 50%.
Fireworks AI FinOps is the FinOps profile for Fireworks AI on the APIs.io network, aligned with the FinOps Foundation Framework.
It defines 10 billable meters, billed in USD on a monthly cycle under usage-based pricing.
The profile maps 8 FOCUS columns for cost-allocation reporting.
Tagged areas include AI, LLM, Inference, Multimodal, and Fine-tuning.
Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3
Tags: AI, LLM, Inference, Multimodal, Fine-tuning, GPU, FinOps, Cost Management, FOCUS
Framework Alignment
Charge Categories
Usage, Purchase, Adjustment
FOCUS Columns
BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Fireworks AI
PricingCategory: Usage-Based
ProviderName: Fireworks AI
PublisherName: Fireworks AI
ServiceCategory: AI and Machine Learning
ServiceName: Fireworks AI
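As a rough illustration of how these eight column values might be used for cost-allocation reporting, the sketch below stamps them onto a raw usage record. The input record shape (`meter`, `quantity`, `billed_cost`) and the `to_focus_row` helper are hypothetical, not part of the profile. `BilledCost` and `ConsumedQuantity` are standard FOCUS columns that this profile does not map itself, and `x_Meter` follows the FOCUS convention of prefixing custom columns with `x_`.

```python
# Static FOCUS column values copied from the profile's column mapping.
FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Fireworks AI",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Fireworks AI",
    "PublisherName": "Fireworks AI",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Fireworks AI",
}


def to_focus_row(usage_record: dict) -> dict:
    """Merge one raw usage record with the profile's static FOCUS values.

    The keys read from `usage_record` are assumptions made for this sketch.
    """
    row = dict(FOCUS_DEFAULTS)  # copy so the defaults are never mutated
    row["x_Meter"] = usage_record["meter"]  # custom column, hence the x_ prefix
    row["ConsumedQuantity"] = usage_record["quantity"]
    row["BilledCost"] = usage_record["billed_cost"]
    return row


example = to_focus_row(
    {"meter": "input_tokens", "quantity": 2_000_000, "billed_cost": "0.40"}
)
```

Keeping the static values in one dictionary means a change to the profile only needs to be made in one place.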
Meters
input_tokens: Tokens sent in chat / vision requests, billed per 1M tokens per model.
cached_input_tokens: Cached input tokens, billed at 50% of the standard input rate.
output_tokens: Tokens generated by the model, billed per 1M tokens per model.
embedding_tokens: Tokens processed for embeddings, billed per 1M tokens per embedding-model size class.
rerank_documents: Documents reranked, billed per model.
image_generations: Image generation requests, billed per model and resolution.
audio_seconds: Audio seconds for STT/TTS workloads, billed per model.
batch_tokens: Tokens consumed via the Batch API, billed at a 50% discount.
fine_tuning_tokens: Training tokens for SFT (LoRA / full) jobs, billed per base-model size class.
gpu_seconds: On-demand dedicated GPU runtime, billed per GPU class (H100, H200, B200, B300).
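The meter arithmetic can be sketched as follows. This is a minimal example, not Fireworks AI's billing logic: the rates are made-up placeholders, the function names are hypothetical, and it assumes GPU rates are quoted per GPU-hour while usage is metered in seconds. Only the 50% factor for cached input and Batch tokens and the per-1M-token unit come from the profile.

```python
from decimal import Decimal

# From the profile: cached input and Batch tokens are billed at 50%.
DISCOUNT_FACTOR = Decimal("0.5")
TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)  # token rates are quoted per 1M tokens
SECONDS_PER_HOUR = Decimal(3600)


def token_cost(tokens: int, rate_per_1m: Decimal, discounted: bool = False) -> Decimal:
    """Cost of a token meter; set discounted=True for cached input or Batch tokens."""
    cost = Decimal(tokens) / TOKENS_PER_PRICING_UNIT * rate_per_1m
    return cost * DISCOUNT_FACTOR if discounted else cost


def gpu_cost(gpu_seconds: int, rate_per_gpu_hour: Decimal) -> Decimal:
    """Cost of dedicated GPU runtime, converting metered seconds to hours."""
    return Decimal(gpu_seconds) / SECONDS_PER_HOUR * rate_per_gpu_hour


# Placeholder rates for illustration only; real rates vary by model and GPU class.
input_rate = Decimal("0.20")    # USD per 1M input tokens (hypothetical)
gpu_hour_rate = Decimal("4.00")  # USD per GPU-hour (hypothetical)

standard = token_cost(2_000_000, input_rate)                 # 2 x 0.20 = 0.40
cached = token_cost(2_000_000, input_rate, discounted=True)  # half of that = 0.20
half_hour_gpu = gpu_cost(1800, gpu_hour_rate)                # 0.5 h x 4.00 = 2.00
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding error when many small charges are summed.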