Together AI · FinOps Profile

Together Ai Finops

Name: Together Ai Finops
Creator: Together AI
Keywords: AI, LLM, Inference, Open Source, Fine-tuning, GPU, FinOps, Cost Management, FOCUS

FinOps view of Together AI spend. Together AI bills on a usage basis: per-token rates for serverless inference (chat, embeddings, vision), per-document rates for rerank, per-asset rates for image and video generation, per-1M-character rates for audio, per-token rates for fine-tuning training, and hourly rates for dedicated endpoints and GPU clusters (with discounts for reserved capacity).

Together Ai Finops is the FinOps profile for Together AI on the APIs.io network, aligned with the FinOps Foundation Framework.

It defines 11 billable meters, priced in USD under a usage-based pricing category and billed on a monthly cycle.

The profile maps 8 FOCUS columns for cost-allocation reporting.

Tagged areas include AI, LLM, Inference, Open Source, and Fine-tuning.

Category: AI and Machine Learning
Pricing: Usage-Based
Billing: Monthly
FOCUS v1.3


Framework Alignment

Framework: FinOps Foundation Framework
Data Spec: FOCUS v1.3

Charge Categories

Usage, Purchase, Adjustment

FOCUS Columns

BillingCurrency: USD
ChargeCategory: Usage
InvoiceIssuerName: Together AI
PricingCategory: Usage-Based
ProviderName: Together AI
PublisherName: Together AI
ServiceCategory: AI and Machine Learning
ServiceName: Together AI Cloud
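
The eight mapped columns are constants, so one way to apply them is to stamp them onto every exported usage record. The following is a minimal Python sketch of that idea, not part of the profile itself. The function name `to_focus_row` and the fields of the sample usage record (`meter`, `quantity`) are hypothetical and chosen only for illustration.

```python
# Static FOCUS column values, copied from the profile's FOCUS Columns mapping.
FOCUS_STATIC_COLUMNS = {
    "BillingCurrency": "USD",
    "ChargeCategory": "Usage",
    "InvoiceIssuerName": "Together AI",
    "PricingCategory": "Usage-Based",
    "ProviderName": "Together AI",
    "PublisherName": "Together AI",
    "ServiceCategory": "AI and Machine Learning",
    "ServiceName": "Together AI Cloud",
}


def to_focus_row(usage_record: dict) -> dict:
    """Merge the static FOCUS columns into one usage record.

    The keys of `usage_record` are hypothetical. Static values are applied
    last, so a record cannot overwrite the profile's mapping.
    """
    return {**usage_record, **FOCUS_STATIC_COLUMNS}


# Hypothetical usage record; field names and values are illustrative only.
row = to_focus_row({"meter": "input_tokens", "quantity": 1_200_000})
```

Applying the static values last is a deliberate choice in this sketch: it keeps the cost-allocation columns uniform even if an upstream export happens to carry a field with the same name.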

Meters

input_tokens

Unit: tokens

Tokens sent in chat / completion requests, billed per 1M tokens per model.

output_tokens

Unit: tokens

Tokens generated, billed per 1M tokens per model.

embedding_tokens

Unit: tokens

Tokens processed for embeddings, billed per 1M tokens per model.

rerank_documents

Unit: documents

Documents reranked, billed per 1K or per 1M units depending on model.

image_generations

Unit: images

Image generation requests, billed per image per model/quality tier.

video_generations

Unit: videos

Video generation requests, billed per video per model/quality tier.

audio_characters

Unit: characters

TTS / STT character counts, billed per 1M characters per model.

fine_tuning_tokens

Unit: tokens

Training tokens for fine-tuning jobs, billed per 1M tokens per base-model size class.

dedicated_endpoint_hours

Unit: hours

Wall-clock hours of dedicated inference endpoint runtime per GPU class.

gpu_cluster_hours

Unit: hours

Wall-clock hours of bare-metal GPU cluster runtime per GPU class and reservation tier.

batch_tokens

Unit: tokens

Tokens consumed via the Batch API, billed at a discount of up to 50% versus serverless rates.
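
The meter descriptions above reduce to simple arithmetic: quantity times rate, divided by the number of units one quoted rate covers. The Python sketch below illustrates that under stated assumptions. All rates are made-up placeholders, not Together AI prices, and the names `UNITS_PER_RATE`, `meter_cost`, and `batch_cost` are hypothetical. `rerank_documents` and `batch_tokens` are left out of the table because the profile does not fix a single divisor for them.

```python
from decimal import Decimal

# Number of units covered by one quoted rate, per the meter descriptions above.
UNITS_PER_RATE = {
    "input_tokens": 1_000_000,
    "output_tokens": 1_000_000,
    "embedding_tokens": 1_000_000,
    "audio_characters": 1_000_000,
    "fine_tuning_tokens": 1_000_000,
    "image_generations": 1,
    "video_generations": 1,
    "dedicated_endpoint_hours": 1,
    "gpu_cluster_hours": 1,
}


def meter_cost(meter: str, quantity: Decimal, rate: Decimal) -> Decimal:
    """Return the USD cost of `quantity` units at `rate` USD per billing block."""
    return quantity * rate / UNITS_PER_RATE[meter]


def batch_cost(tokens: Decimal, serverless_rate: Decimal, discount: Decimal) -> Decimal:
    """Apply a Batch API discount to a serverless per-1M-token rate.

    The profile only states "up to 50%", so the discount is an explicit
    input and is rejected if it falls outside the range 0 to 0.5.
    """
    if not Decimal("0") <= discount <= Decimal("0.5"):
        raise ValueError("discount must be between 0 and 0.5")
    return meter_cost("input_tokens", tokens, serverless_rate) * (1 - discount)


# Placeholder rates for illustration only (assumptions, not published prices).
chat_cost = meter_cost("input_tokens", Decimal(2_500_000), Decimal("0.20"))
endpoint_cost = meter_cost("dedicated_endpoint_hours", Decimal("36.5"), Decimal("3.00"))
discounted = batch_cost(Decimal(2_500_000), Decimal("0.20"), Decimal("0.5"))
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding drift when many charge rows are summed.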
