Together AI · FinOps Profile

Together AI FinOps

A FinOps view of Together AI spend. Together AI bills on a usage basis: per-token rates for serverless inference (chat, embeddings, rerank, vision), per-asset rates for image and video generation, per-1M-character rates for audio, per-token rates for fine-tuning, and hourly rates for dedicated endpoints and GPU clusters (with discounts for reserved capacity).

Together AI FinOps is the FinOps profile for Together AI on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 11 billable meters, billed in USD on a monthly cycle, under the usage-based pricing category.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Inference, Open Source, and Fine-tuning.

Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3
Tags: AI, LLM, Inference, Open Source, Fine-tuning, GPU, FinOps, Cost Management, FOCUS

Framework Alignment

Framework
Data Spec

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Together AI
PricingCategory: Usage-Based
ProviderName: Together AI
PublisherName: Together AI
ServiceCategory: AI and Machine Learning
ServiceName: Together AI Cloud
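
The eight column values above are the same for every charge, so a cost-allocation export could stamp them onto each row. The following is a minimal sketch of that idea, not the profile's own tooling. The function name `to_focus_row` is hypothetical, and `BilledCost`, `ConsumedQuantity`, and `ConsumedUnit` are assumed here as per-charge columns; they are not among the eight columns this profile maps.

```python
from decimal import Decimal

# Static column values copied from the profile's FOCUS mapping.
STATIC_FOCUS_COLUMNS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Together AI",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Together AI",
    "PublisherName": "Together AI",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Together AI Cloud",
}


def to_focus_row(unit: str, quantity: Decimal, billed_cost: Decimal) -> dict:
    """Return one report row: static profile columns plus per-charge fields.

    The three per-charge keys are assumptions for illustration; the
    profile itself only maps the eight static columns above.
    """
    row = dict(STATIC_FOCUS_COLUMNS)  # copy so the shared mapping is not mutated
    row["ConsumedUnit"] = unit
    row["ConsumedQuantity"] = str(quantity)
    row["BilledCost"] = str(billed_cost)
    return row


if __name__ == "__main__":
    # Example charge with made-up quantity and cost.
    print(to_focus_row("tokens", Decimal("2500000"), Decimal("0.50")))
```

Keeping the static values in one mapping means a change to the profile (for example a renamed ServiceName) needs to be made in only one place.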

Meters

input_tokens
Unit: tokens
Tokens sent in chat / completion requests, billed per 1M tokens per model.
output_tokens
Unit: tokens
Tokens generated, billed per 1M tokens per model.
embedding_tokens
Unit: tokens
Tokens processed for embeddings, billed per 1M tokens per model.
rerank_documents
Unit: documents
Documents reranked, billed per 1K or per 1M units depending on model.
image_generations
Unit: images
Image generation requests, billed per image per model/quality tier.
video_generations
Unit: videos
Video generation requests, billed per video per model/quality tier.
audio_characters
Unit: characters
TTS / STT character counts, billed per 1M characters per model.
fine_tuning_tokens
Unit: tokens
Training tokens for fine-tuning jobs, billed per 1M tokens per base-model size class.
dedicated_endpoint_hours
Unit: hours
Wall-clock hours of dedicated inference endpoint runtime per GPU class.
gpu_cluster_hours
Unit: hours
Wall-clock hours of bare-metal GPU cluster runtime per GPU class and reservation tier.
batch_tokens
Unit: tokens
Tokens consumed via the Batch API at up to 50% discount versus serverless.
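
To make the billing units above concrete, here is a rough sketch of how a metered quantity might be converted to a cost. Every rate in `RATES` is hypothetical and invented for illustration; the profile does not list actual prices, which vary by model, quality tier, and GPU class. The function name `meter_cost` and the `discount` parameter are likewise assumptions. The 0.5 discount in the example reflects only the "up to 50%" ceiling stated for batch_tokens, not a guaranteed rate.

```python
from decimal import Decimal

# Hypothetical rates in USD, invented for illustration only.
# Each entry is (rate, units covered by that rate).
RATES = {
    "input_tokens": (Decimal("0.20"), 1_000_000),     # per 1M tokens
    "output_tokens": (Decimal("0.60"), 1_000_000),    # per 1M tokens
    "audio_characters": (Decimal("15.00"), 1_000_000),  # per 1M characters
    "image_generations": (Decimal("0.03"), 1),        # per image
    "dedicated_endpoint_hours": (Decimal("3.00"), 1),  # per wall-clock hour
}


def meter_cost(meter: str, quantity, discount: Decimal = Decimal("0")) -> Decimal:
    """Cost of `quantity` units on `meter`, with an optional fractional discount.

    `quantity` should be an int or a numeric string so the Decimal
    conversion stays exact.
    """
    if not Decimal("0") <= discount <= Decimal("1"):
        raise ValueError("discount must be a fraction between 0 and 1")
    rate, per_units = RATES[meter]
    return Decimal(quantity) / per_units * rate * (1 - discount)


if __name__ == "__main__":
    # 2.5M serverless input tokens at the hypothetical rate.
    print(meter_cost("input_tokens", 2_500_000))
    # The same volume through batch, assuming the maximum 50% discount applies.
    print(meter_cost("input_tokens", 2_500_000, discount=Decimal("0.5")))
    # 10 hours of a dedicated endpoint.
    print(meter_cost("dedicated_endpoint_hours", 10))
```

Using `Decimal` rather than floats avoids rounding drift when many small per-token charges are summed into a monthly total.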
