Fireworks AI · FinOps Profile

Fireworks Ai Finops

Name: Fireworks Ai Finops
Creator: Fireworks AI
Keywords: AI, LLM, Inference, Multimodal, Fine-tuning, GPU, FinOps, Cost Management, FOCUS

FinOps view of Fireworks AI spend. Billing is postpaid and usage-based, in USD: serverless inference is priced per token by model, embeddings by model size, fine-tuning per 1M training tokens, and on-demand deployments per GPU-hour. Batch and cached input tokens receive a 50% discount.

Fireworks Ai Finops is the FinOps profile for Fireworks AI on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 10 billable meters, billed in USD on a monthly cycle, under the usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Inference, Multimodal, and Fine-tuning.

Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
Data Spec: FOCUS v1.3

Framework Alignment

Framework: FinOps Foundation Framework
Data Spec: FOCUS v1.3

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Fireworks AI
PricingCategory: Usage-Based
ProviderName: Fireworks AI
PublisherName: Fireworks AI
ServiceCategory: AI and Machine Learning
ServiceName: Fireworks AI
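
The eight column mappings above are constant for every Fireworks AI charge, so a cost-allocation pipeline could stamp them onto each usage record. The following is a minimal sketch, not part of the profile: the function name `to_focus_row` and the custom column `x_Meter` are hypothetical, and `BilledCost`, `ConsumedQuantity`, and `ConsumedUnit` are FOCUS columns that this profile does not itself map.

```python
# Static FOCUS column values, copied from the profile's column mappings.
FIREWORKS_FOCUS_DEFAULTS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Fireworks AI",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Fireworks AI",
    "PublisherName": "Fireworks AI",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Fireworks AI",
}


def to_focus_row(meter: str, quantity: float, unit: str, billed_cost: float) -> dict:
    """Combine the static mappings with per-charge values (hypothetical helper)."""
    row = dict(FIREWORKS_FOCUS_DEFAULTS)  # copy so the defaults stay untouched
    row.update(
        {
            "x_Meter": meter,              # assumption: custom column for the meter name
            "ConsumedQuantity": quantity,  # assumption: not mapped by this profile
            "ConsumedUnit": unit,          # assumption: not mapped by this profile
            "BilledCost": billed_cost,     # assumption: not mapped by this profile
        }
    )
    return row


example_row = to_focus_row("input_tokens", 1_200_000, "tokens", 0.24)
```

Copying the defaults into each row keeps the mapping in one place, so a change to the profile only needs to be made once.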

Meters

input_tokens

Unit: tokens

Tokens sent in chat / vision requests, billed per 1M tokens per model.

cached_input_tokens

Unit: tokens

Cached input tokens, billed at 50% of the standard input rate.

output_tokens

Unit: tokens

Tokens generated by the model, billed per 1M tokens per model.
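
The three chat token meters above can be combined into a single request cost. This is a rough sketch under stated assumptions: the per-1M rates in the example are made-up placeholders rather than Fireworks AI prices, the function name is hypothetical, and `input_tokens` is assumed to count only uncached tokens because the profile lists cached input as its own meter.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)  # rates are quoted per 1M tokens
CACHED_INPUT_MULTIPLIER = Decimal("0.5")      # profile: 50% of the standard input rate


def chat_cost(
    input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    input_rate_per_1m: Decimal,
    output_rate_per_1m: Decimal,
) -> Decimal:
    """Estimate the USD cost of chat usage from the three token meters."""
    uncached = Decimal(input_tokens) / TOKENS_PER_PRICING_UNIT * input_rate_per_1m
    cached = (
        Decimal(cached_input_tokens)
        / TOKENS_PER_PRICING_UNIT
        * input_rate_per_1m
        * CACHED_INPUT_MULTIPLIER
    )
    generated = Decimal(output_tokens) / TOKENS_PER_PRICING_UNIT * output_rate_per_1m
    return uncached + cached + generated


# Placeholder rates for illustration only (not actual Fireworks AI prices).
example_cost = chat_cost(
    input_tokens=800_000,
    cached_input_tokens=200_000,
    output_tokens=100_000,
    input_rate_per_1m=Decimal("0.20"),
    output_rate_per_1m=Decimal("0.80"),
)
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error when summed over many requests.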

embedding_tokens

Unit: tokens

Tokens processed for embeddings, billed per 1M tokens by embedding-model size class.

rerank_documents

Unit: documents

Documents reranked, per model.

image_generations

Unit: images

Image generation requests per model and resolution.

audio_seconds

Unit: seconds

Audio seconds for STT/TTS workloads per model.

batch_tokens

Unit: tokens

Tokens consumed via the Batch API, billed at a 50% discount.
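
To gauge what the Batch discount is worth, the same token volume can be costed with and without it. The sketch below assumes the 50% discount applies to the standard per-1M-token rate for the model in question; the rate in the example is a placeholder and the function names are hypothetical.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)  # rates are quoted per 1M tokens
BATCH_MULTIPLIER = Decimal("0.5")             # profile: 50% discount on Batch tokens


def batch_cost(batch_tokens: int, standard_rate_per_1m: Decimal) -> Decimal:
    """USD cost of tokens sent through the Batch API at the discounted rate."""
    units = Decimal(batch_tokens) / TOKENS_PER_PRICING_UNIT
    return units * standard_rate_per_1m * BATCH_MULTIPLIER


def batch_savings(batch_tokens: int, standard_rate_per_1m: Decimal) -> Decimal:
    """Difference between the standard cost and the Batch cost for the same tokens."""
    units = Decimal(batch_tokens) / TOKENS_PER_PRICING_UNIT
    return units * standard_rate_per_1m - batch_cost(batch_tokens, standard_rate_per_1m)


# Placeholder rate for illustration only (not an actual Fireworks AI price).
example_batch = batch_cost(10_000_000, Decimal("0.90"))
example_saved = batch_savings(10_000_000, Decimal("0.90"))
```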

fine_tuning_tokens

Unit: tokens

Training tokens for supervised fine-tuning (SFT) jobs, LoRA or full, billed per 1M tokens by base-model size class.

gpu_seconds

Unit: seconds

On-demand dedicated GPU runtime, metered in seconds and priced per GPU-hour by GPU class (H100, H200, B200, B300).
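
The profile summary quotes on-demand deployments per GPU-hour, while the `gpu_seconds` meter counts seconds, so a conversion step is needed when costing this meter. A minimal sketch follows; the hourly rate is a placeholder rather than a Fireworks AI price, the function name is hypothetical, and `gpu_seconds` is assumed to already be summed across all GPUs in the deployment.

```python
from decimal import Decimal

SECONDS_PER_HOUR = Decimal(3600)


def gpu_cost(gpu_seconds: int, rate_per_gpu_hour: Decimal) -> Decimal:
    """Convert metered GPU-seconds to GPU-hours and apply an hourly rate."""
    gpu_hours = Decimal(gpu_seconds) / SECONDS_PER_HOUR
    return gpu_hours * rate_per_gpu_hour


# Placeholder rate for illustration only: 5,400 GPU-seconds is 1.5 GPU-hours.
example_gpu_cost = gpu_cost(5_400, Decimal("4.00"))
```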
