06 VĀNI · MĀRGA

Intelligent LLM Routing. 80% Less Cost.

मार्गा“The Path” — Sanskrit

A small, fast classifier in front of your LLM stack. Every request hits the advisor first — trivial tasks go to tiny models, complex reasoning goes to frontier models. Same quality, fraction of the cost.

MĀRGA intelligent LLM routing visualization — requests flowing through the advisor to optimal model tiers
The Advisor Pattern

How MĀRGA Routes Requests

Like a hospital triage nurse — you don't send every patient to the head surgeon. MĀRGA classifies in under 10ms, then routes to the optimal model.

MĀRGA routing architecture — advisor classifies requests across 3 axes then routes to the optimal model tier
Request Flow
Incoming Request → MĀRGA Advisor (8ms) → Routing Table → Optimal Model
Tier 0: Trivial
→ Local 4B model
~500ms · $0.00
Tier 1: Simple
→ 7B–24B model
~2s · $0.001
Tier 2: Moderate
→ 70B or mid-tier API
~5s · $0.01
Tier 3: Complex
→ Frontier model
~15s · $0.05

8ms Classification

The advisor runs on a tiny 4B model. Classification takes 5–8ms on modest hardware — imperceptible latency for massive cost savings.

🎯

3-Axis Routing

Every request is classified on task type, complexity, and latency requirements. Three dimensions are enough — more adds noise.

💰

60–80% Cost Reduction

Most agent workloads don't need frontier models. MĀRGA routes 60–80% of requests to cheaper tiers without sacrificing output quality.

🔄

Self-Improving

Routing decisions feed back into the advisor. Over time, MĀRGA learns your workload patterns and gets more accurate at classification.

🏗️

Model Agnostic

Works with any LLM provider — OpenAI, Anthropic, Google, local Ollama models. Swap models without changing application code.

📊

Distributed Caching

Redis-powered distributed cache layer with intelligent prompt deduplication. Sub-millisecond cache lookups, cross-node consistency, and automatic TTL management — at 1M+ requests/day, the difference between 20% and 60% cache hits is the difference between profit and loss.

8ms
Classification Latency
80%
Cost Reduction
64K
Requests / Second
99.8%
Distributed Cache Hit Rate
0
Code Changes Required
Performance Validated

From $12K/mo to $2.4K/mo — Without Changing a Line of Code

A real-world stress test on commodity hardware demonstrates that MĀRGA handles production workloads with sub-millisecond overhead and extreme throughput.

The Problem

Every request → GPT-4

An AI-powered DevOps platform was sending all requests — from simple “what's the status?” queries to complex incident analysis — to a single frontier model. At 50K requests/day, costs were $12,000/month and climbing 40% month-over-month.

The Solution

MĀRGA as a drop-in proxy

Changed one line: base_url = "marga.avyay.ai/v1". The advisor classified each request in 8ms, routing 72% to cheaper tiers. Zero application code changes. Zero quality degradation on the tasks that matter.

The Result

80% cost reduction

Monthly LLM costs dropped from $12K → $2.4K. P95 response latency improved by 35% (cheaper models respond faster). The self-improving advisor now routes with 94% accuracy after 30 days of learning.

Stress Test Results — Apple M2 Max, 32GB RAM

ConcurrencyThroughputp50p95p99Errors
10064,532 req/s1.3ms3.2ms4.2ms0
1,00059,345 req/s15.2ms22.0ms26.1ms0
5,00047,329 req/s87.2ms98.2ms102.9ms0
10,00049,419 req/s66.1ms125.0ms0

Zero errors at 10,000 concurrent connections. Rate limiter accuracy: 100% (120/120 exact). Cache hit rate: 99.8%.

MĀRGA cost optimization — before and after comparison showing 80% reduction
One Line to Integrate

OpenAI-Compatible API. Change the URL, Keep Your Code.

MĀRGA speaks the OpenAI Chat Completions protocol. Any SDK or tool that works with OpenAI works with MĀRGA — just swap the base URL.

from openai import OpenAI

# Drop-in replacement — just change the base URL
client = OpenAI(
    base_url="https://marga.avyay.ai/v1",
    api_key="your-marga-key"
)

response = client.chat.completions.create(
    model="auto",   # MĀRGA picks the optimal model
    messages=[
        {"role": "user", "content": "Summarize this incident report..."}
    ]
)
print(response.choices[0].message.content)
1
Get API Key
Sign up and generate your MĀRGA API key from the dashboard.
2
Change Base URL
Point your OpenAI SDK to marga.avyay.ai/v1 — one line change.
3
Set model: "auto"
MĀRGA picks the best model. Or pin a tier with model: "tier-1".
Simple Pricing

Pay for What You Route

Start free, scale when ready. No hidden fees, no per-seat pricing. You only pay for routing — model costs are passed through at cost.

Free
$0
Forever — no credit card required
  • 1,000 routed requests / day
  • 2 model tiers (Tier 0 + Tier 1)
  • OpenAI-compatible API
  • Basic analytics dashboard
  • Community support
Get Started Free
ProPOPULAR
$79/mo
Billed annually ($948/yr)
  • 50,000 routed requests / day
  • All 4 model tiers
  • Self-improving advisor
  • Advanced analytics + cost tracking
  • Custom routing rules
  • Webhook notifications
  • Priority email support
Start 14-Day Free Trial
Enterprise
Custom
Volume pricing for high-scale teams
  • Unlimited routed requests
  • All 4 model tiers + custom tiers
  • Self-host or managed deployment
  • SSO / SAML authentication
  • Custom model integrations
  • SLA guarantees (99.99%)
  • Dedicated support + Slack channel
  • On-prem deployment option
Contact Sales

All plans include model cost pass-through at provider rates — no markup. MĀRGA only charges for routing intelligence, not the underlying model usage.

📖
API Documentation
Complete OpenAI-compatible API reference with examples in Python, Node.js, Go, and curl.
🧪
API Playground
Try MĀRGA live — send requests, see responses, and generate curl commands interactively.
⚙️
Self-Host Guide
Run MĀRGA on your own infrastructure. Docker, Kubernetes, GCP Cloud Run, and bare-metal deployment guides.
Protected by:🛡️ SOC 2🇪🇺 GDPR🔒 CCPA🔐 AES-256Learn more →
Alpha Access · मार्गा

Get Early Access to MĀRGA

Join our alpha program. Limited spots — we'll review applications and send API keys to approved users.

No spam. We'll only email you about your alpha access.