Optimizing LLM Costs with TOON: Practical Token Reduction Strategies

📅 December 10, 2025 ⏱️ 4 min read 👁️ 120 views 🏷️ Data Formats

LLM API bills have a habit of growing quietly until someone finally looks at the invoice. I ran into this problem while building an internal analytics tool that pushed large product and user datasets into an LLM for classification and summarization. The logic was solid, the prompts were fine, but the token usage was out of control. That is where TOON made a measurable difference.

Understanding LLM Pricing in Practice

Most LLM providers charge based on input and output tokens. On paper, the numbers look manageable. In real systems, they compound fast once traffic grows or batch sizes increase.

Model               Input Cost per 1M Tokens   Output Cost per 1M Tokens
GPT-4o              $2.50                      $10.00
GPT-4 Turbo         $10.00                     $30.00
Claude 3 Opus       $15.00                     $75.00
Claude 3.5 Sonnet   $3.00                      $15.00
Gemini 1.5 Pro      $3.50                      $10.50

One mistake I made early was assuming small per-request costs would stay small. When the same prompt ran hundreds of times per day, the monthly bill told a different story.
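
For a sense of scale: at GPT-4 Turbo's $10 per 1M input tokens, a single 45,000-token payload costs $0.45. At, say, 300 runs per day, that compounds to about $4,050 per month before a single output token is billed.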

Real World Cost Savings Example

A real case involved an ecommerce catalog with pricing, availability, and category data. The system sent this data to an LLM to generate recommendations and summaries. Initially, everything was sent as formatted JSON.

import tiktoken

def calculate_cost(text, model="gpt-4-turbo"):
    # Count tokens with the model's actual tokenizer, then price them
    enc = tiktoken.encoding_for_model(model)
    tokens = len(enc.encode(text))
    cost_per_token = 10 / 1_000_000  # GPT-4 Turbo input pricing: $10 per 1M tokens
    return tokens, tokens * cost_per_token

# Observed averages from production data
# JSON payload ~45,000 tokens
# TOON payload ~18,000 tokens

The first issue I faced was invalid JSON sneaking into token calculations due to trailing commas and broken arrays. Running the payloads through https://jsonformatterspro.com helped catch those errors before measuring anything.
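
A minimal pre-flight check using only Python's standard library is enough to catch those malformed payloads before they skew any measurements (a sketch; adapt the error handling to your pipeline):

import json

def validate_payload(raw_text):
    # Trailing commas and broken arrays raise JSONDecodeError here,
    # before any token counting happens
    try:
        json.loads(raw_text)
        return True
    except json.JSONDecodeError as e:
        print(f"Invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
        return False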

Once corrected, the numbers were consistent. Switching to TOON reduced token usage by roughly 60 percent for the same dataset.
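
To see where the reduction comes from, compare the same three records in both formats. The records here are invented for illustration; the TOON side follows its tabular-array layout, where field names appear once in a header instead of repeating per record.

JSON:

{"products": [
  {"id": 1, "name": "USB-C Cable", "price": 9.99, "stock": 152},
  {"id": 2, "name": "Wireless Mouse", "price": 24.50, "stock": 87},
  {"id": 3, "name": "Laptop Stand", "price": 39.00, "stock": 14}
]}

TOON (same data):

products[3]{id,name,price,stock}:
  1,USB-C Cable,9.99,152
  2,Wireless Mouse,24.50,87
  3,Laptop Stand,39.00,14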

Monthly Cost Projection from a Live System

This projection mirrors a real deployment that processed product data multiple times per day.

def project_monthly_savings(daily_requests, records_per_request):
    json_tokens_per_record = 45
    toon_tokens_per_record = 18

    monthly_requests = daily_requests * 30
    total_records = monthly_requests * records_per_request

    json_total_tokens = total_records * json_tokens_per_record
    toon_total_tokens = total_records * toon_tokens_per_record

    cost_per_million = 10
    json_cost = (json_total_tokens / 1_000_000) * cost_per_million
    toon_cost = (toon_total_tokens / 1_000_000) * cost_per_million

    return json_cost, toon_cost, json_cost - toon_cost

json_cost, toon_cost, savings = project_monthly_savings(100, 500)
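# With these inputs: json_cost = $675.00, toon_cost = $270.00, savings = $405.00 per month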

Before optimization, the monthly spend was high enough to trigger internal budget reviews. After conversion, the savings were obvious and recurring.

Common Problems I Faced During Implementation

Problem 1: Inconsistent Data Shapes

TOON performs best with uniform arrays. Early datasets contained optional fields that broke the tabular layout. The fix was simple but important: normalize records before conversion and fill missing fields explicitly, as sketched below.
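
The normalization itself can be a few lines. Here is a minimal Python sketch that unions the field names and fills gaps with None (choose whatever sentinel your prompts treat as missing):

def normalize_records(records):
    # Union of all field names across records, in a stable order
    all_fields = sorted({key for record in records for key in record})
    # Every record gets every field, so the tabular TOON layout stays uniform
    return [{field: record.get(field) for field in all_fields} for record in records]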

Problem 2: Token Miscounting

I initially relied on rough character-based estimates. These were close but not accurate enough for billing decisions. Switching to a proper tokenizer eliminated the guesswork.
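
The difference is easy to demonstrate with the same tiktoken setup used earlier (a sketch; the rough figure is the common length-divided-by-four rule of thumb):

import tiktoken

def compare_token_counts(text, model="gpt-4-turbo"):
    enc = tiktoken.encoding_for_model(model)
    exact = len(enc.encode(text))  # what the API actually bills
    rough = len(text) / 4          # character-based rule of thumb
    error = abs(exact - rough) / exact
    print(f"exact: {exact}, rough: {rough:.0f}, error: {error:.1%}")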

Problem 3: Prompt Confusion

The LLM sometimes misunderstood TOON input when no explanation was provided. Adding a short format note in the prompt resolved this immediately.

Implementation Pattern Used in Production

This pattern evolved after several iterations and failures. It now runs reliably in production.

class LLMDataPipeline {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async query(data, question, options = {}) {
    const optimize = options.optimizeTokens !== false;

    const formatted = optimize
      ? this.jsonToToon(data)
      : JSON.stringify(data, null, 2);

    const prompt = this.buildPrompt(formatted, question, optimize);
    const tokens = this.estimateTokens(prompt);

    console.log('Input tokens:', tokens);

    return await this.callModel(prompt);
  }

  buildPrompt(data, question, isToon) {
    // The format hint that fixed the prompt-confusion problem above
    const note = isToon
      ? 'Data uses TOON format with tabular arrays'
      : 'Data uses standard JSON format';

    return note + '\n\nDATA:\n' + data + '\n\nQUESTION:\n' + question;
  }

  estimateTokens(text) {
    // Rough character-based estimate for logging only;
    // billing decisions use a real tokenizer
    return Math.ceil(text.length / 4);
  }

  jsonToToon(data) {
    // Minimal tabular conversion; assumes a uniform array of flat records
    // (normalized as described above) and wraps it under an 'items' key
    const fields = Object.keys(data[0]);
    const rows = data.map(r => fields.map(f => r[f]).join(','));
    return 'items[' + data.length + ']{' + fields.join(',') + '}:\n  ' + rows.join('\n  ');
  }

  async callModel(prompt) {
    // Provider-specific API call (OpenAI, Anthropic, etc.) goes here
    throw new Error('callModel must be implemented for your provider');
  }
}

Best Practices Learned the Hard Way

  1. Convert only after validating JSON structure
  2. Use TOON mainly for large, uniform datasets
  3. Add format hints to prompts
  4. Track token usage before and after optimization
  5. Cache converted payloads whenever possible (see the sketch after this list)
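
For point 5, a content-hash cache is usually all that is needed. A minimal in-memory sketch, where toon_encode stands in for whatever converter you use:

import hashlib
import json

_toon_cache = {}

def cached_toon(data, toon_encode):
    # Key on canonical JSON so identical payloads hit the cache
    key = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    if key not in _toon_cache:
        _toon_cache[key] = toon_encode(data)
    return _toon_cache[key]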

ROI Calculation from a Production Scenario

This calculation reflects a real reporting dashboard used by non technical teams to understand savings.

function calculateROI(params) {
  const daily = params.dailyRequests;
  const records = params.avgRecordsPerRequest;
  const cost = params.modelCostPerMillionTokens;

  // Per-record token averages measured earlier in production
  const jsonTokens = 45;
  const toonTokens = 18;

  const monthly = daily * 30;
  const jsonCost = (monthly * records * jsonTokens / 1000000) * cost;
  const toonCost = (monthly * records * toonTokens / 1000000) * cost;

  return {
    monthlySavings: jsonCost - toonCost,
    yearlySavings: (jsonCost - toonCost) * 12
  };
}
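
Plugging in the earlier production numbers (100 daily requests, 500 records per request, $10 per 1M input tokens) yields about $405 in monthly savings, or roughly $4,860 a year.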

Seeing yearly savings broken out made adoption an easy decision for stakeholders who did not care about formats but cared deeply about cost.

🔧 Try Our Free TOON Converter

Convert your JSON to TOON format instantly and see your token savings in real-time!

⚡ Open TOON Converter

Conclusion

TOON is not just a theoretical optimization. It solves a real cost problem that shows up once LLM systems move beyond prototypes. JSON remains essential for APIs and storage, but when it comes to feeding structured data into models, TOON consistently delivers measurable savings with minimal tradeoffs.

🏷️ Tags:
llm optimization cost reduction token optimization openai claude gpt-4 api costs
