Together AI · FinOps Profile
Together Ai Finops
FinOps view of Together AI spend. Together AI bills on a usage basis: per-token rates for serverless inference (chat, embeddings, rerank, vision, audio), per-asset rates for image and video generation, per-1M-character rates for audio, per-token rates for fine-tuning training, and hourly rates for dedicated endpoints and GPU clusters (with reserved discounts).
Together Ai Finops is the FinOps profile for Together AI on the APIs.io network, aligned with the FinOps Foundation Framework.
It defines 11 billable meters, billed in USD on a monthly cycle, under the usage-based pricing category.
The profile maps 8 FOCUS columns for cost-allocation reporting.
Tagged areas include AI, LLM, Inference, Open Source, and Fine-tuning.
Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3
Tags: AI, LLM, Inference, Open Source, Fine-tuning, GPU, FinOps, Cost Management, FOCUS
Framework Alignment
Charge Categories
Usage, Purchase, Adjustment
FOCUS Columns
BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Together AI
PricingCategory: Usage-Based
ProviderName: Together AI
PublisherName: Together AI
ServiceCategory: AI and Machine Learning
ServiceName: Together AI Cloud
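As a rough illustration of how these static column values might be attached to individual charges for cost-allocation reporting, the Python sketch below merges them into a FOCUS-style row. The `focus_row` helper and the `ChargeDescription` and `BilledCost` output keys are assumptions added for this example; only the eight columns listed above come from the profile.

```python
from decimal import Decimal

# Static FOCUS column values taken from the profile above.
PROFILE_COLUMNS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Together AI",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Together AI",
    "PublisherName": "Together AI",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Together AI Cloud",
}


def focus_row(meter: str, billed_cost: Decimal) -> dict:
    """Merge the profile's static columns with per-charge fields.

    `meter` and `billed_cost` are hypothetical inputs. The keys
    "ChargeDescription" and "BilledCost" are illustrative additions,
    not columns mapped by this profile.
    """
    row = dict(PROFILE_COLUMNS)  # copy so the shared mapping is not mutated
    row["ChargeDescription"] = meter
    row["BilledCost"] = str(billed_cost)
    return row


row = focus_row("input_tokens", Decimal("12.50"))
print(row["ServiceName"], row["BillingCurrency"], row["BilledCost"])
```

Keeping the static values in one mapping means every exported row carries the same provider and service identifiers, so downstream reports can group Together AI charges alongside other vendors.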
Meters
input_tokens
Tokens sent in chat / completion requests, billed per 1M tokens per model.
output_tokens
Tokens generated, billed per 1M tokens per model.
embedding_tokens
Tokens processed for embeddings, billed per 1M tokens per model.
rerank_documents
Documents reranked, billed per 1K or per 1M units depending on model.
image_generations
Image generation requests, billed per image per model/quality tier.
video_generations
Video generation requests, billed per video per model/quality tier.
audio_characters
TTS / STT character counts, billed per 1M characters per model.
fine_tuning_tokens
Training tokens for fine-tuning jobs, billed per 1M tokens per base-model size class.
dedicated_endpoint_hours
Wall-clock hours of dedicated inference endpoint runtime per GPU class.
gpu_cluster_hours
Wall-clock hours of bare-metal GPU cluster runtime per GPU class and reservation tier.
batch_tokens
Tokens consumed via the Batch API, billed at a discount of up to 50% versus serverless rates.
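The billing units above (per 1M tokens, per image, per hour) translate into simple arithmetic. The Python sketch below estimates a charge as quantity divided by unit size, times rate, with an optional discount. Every rate in `RATES` is a hypothetical placeholder, since the profile gives no prices, and the `meter_cost` helper is an assumption for illustration. The 50% batch discount is the maximum stated above, not a guaranteed figure.

```python
from decimal import Decimal

# Hypothetical USD rates, for illustration only: (rate, unit size).
# Real rates vary by model, quality tier, and GPU class.
RATES = {
    "input_tokens": (Decimal("0.20"), 1_000_000),      # per 1M tokens
    "output_tokens": (Decimal("0.60"), 1_000_000),     # per 1M tokens
    "image_generations": (Decimal("0.01"), 1),         # per image
    "dedicated_endpoint_hours": (Decimal("3.00"), 1),  # per wall-clock hour
}


def meter_cost(meter: str, quantity: int,
               discount: Decimal = Decimal("0")) -> Decimal:
    """Return quantity / unit_size * rate * (1 - discount)."""
    rate, unit_size = RATES[meter]
    return Decimal(quantity) / unit_size * rate * (1 - discount)


# Hypothetical monthly usage per meter.
usage = {
    "input_tokens": 5_000_000,
    "output_tokens": 1_000_000,
    "image_generations": 200,
    "dedicated_endpoint_hours": 10,
}
total = sum((meter_cost(m, q) for m, q in usage.items()), Decimal("0"))

# Batch usage priced at the serverless input rate with the maximum
# 50% discount (assumed here; the actual discount may be lower).
batch = meter_cost("input_tokens", 2_000_000, discount=Decimal("0.5"))

print("serverless + dedicated:", total)
print("batch:", batch)
```

`Decimal` is used instead of floats so that currency amounts do not pick up binary rounding error when many small charges are summed.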