
Hydra (AIDirector)
Intelligent AI API Gateway with Automatic Failover
Self-initiated core platform used across products. Technical architecture shared; source code is proprietary.
The Challenge
I needed one reliable AI gateway that could serve multiple products (Zeqah and ISEvolutions workloads) without each app managing provider instability independently.
The Approach
I designed a 3-step architecture: Client SDK → Vercel (token validation) → Cloudflare Worker (AI generation on the free tier where possible) → Vercel (cache result). Fallback chains automatically switch between Gemini and OpenRouter when one fails.
System Architecture
A 3-step request flow designed to minimize costs and maximize reliability: (1) Client SDK sends request to Vercel Edge Function for token validation. (2) Validated requests are forwarded to a Cloudflare Worker (free tier) for actual AI generation via Gemini, with automatic failover to OpenRouter if Gemini fails. The gateway explicitly routes heavy Document Intelligence, PDF, and image extraction (OCR) tasks to optimal vision models. (3) Results are cached in Redis via a Vercel function before being returned to the client. The SDK is published on npm for easy integration. HMAC request signing prevents API key theft between the Vercel and Cloudflare layers. PPP (Purchasing Power Parity) pricing is calculated server-side based on the user's country, making AI accessible in 50+ markets.
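The Gemini → OpenRouter fallback chain in step (2) can be sketched as below. This is a hypothetical illustration of the pattern, not the real Worker code — the provider call shapes and error handling are assumptions:

```typescript
// Hypothetical sketch of the provider fallback chain. Provider names come
// from the architecture above; the function shapes are illustrative.
type ProviderCall = (prompt: string) => Promise<string>;

export async function generateWithFailover(
  prompt: string,
  providers: Array<{ name: string; call: ProviderCall }>
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      // The first provider that succeeds wins; later ones are never called.
      return await provider.call(prompt);
    } catch (err) {
      // Record the failure and fall through to the next provider.
      errors.push(`${provider.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

The ordering encodes priority: Gemini first (cheapest/primary), OpenRouter as the fallback head of the Hydra.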
System architecture overview
Built For
Developers and startups building AI-powered products who need a reliable, affordable AI API without managing provider relationships directly. Particularly developers in emerging markets where AI API costs can be prohibitive - PPP pricing makes the platform accessible to builders in 50+ countries.
Design Decisions
Why the 3-step architecture instead of a simple proxy?
Cloudflare Workers have a generous free tier and bill on CPU time rather than wall-clock time, so a Worker that spends seconds waiting on an upstream AI API costs almost nothing. By doing the heavy AI API calls there, we eliminated server costs for generation entirely. Vercel handles auth and caching (which it's better at), while the Worker handles the actual AI calls (long-running, I/O-bound requests that would be expensive to hold open on Vercel).
Why HMAC authentication?
API keys can be stolen from client-side code. HMAC request signing ensures that even if someone inspects network requests, they can't forge new ones - every request is cryptographically signed with a timestamp to prevent replay attacks.
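A minimal sketch of the client-side half of this scheme, assuming an HMAC-SHA256 over the request contents plus a timestamp — the payload layout and field names here are illustrative, not the SDK's real wire format:

```typescript
import { createHmac } from "node:crypto";

// Hypothetical client-side signer. The payload layout (method, path,
// timestamp, body) is an illustrative assumption; the real format is
// part of the proprietary SDK.
export function signRequest(
  secretKey: string,
  method: string,
  path: string,
  body: string,
  timestamp: number = Date.now()
): { timestamp: number; signature: string } {
  // Binding the signature to the request contents AND a timestamp means a
  // captured request cannot be replayed outside the tolerance window.
  const payload = `${method}\n${path}\n${timestamp}\n${body}`;
  const signature = createHmac("sha256", secretKey)
    .update(payload)
    .digest("hex");
  return { timestamp, signature };
}
```

The server recomputes the same HMAC and compares it in constant time, as in the verification excerpt further down the page.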
Why PPP (Purchasing Power Parity) pricing?
A developer in Nigeria shouldn't pay the same as one in San Francisco. PPP pricing adjusts costs based on the user's country, making the platform accessible globally while maintaining revenue targets.
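The server-side lookup reduces to a country-to-multiplier table. The multiplier values below are made-up examples to show the shape of the calculation — the real table and revenue model are proprietary:

```typescript
// Illustrative PPP lookup. Multiplier values are invented examples only.
const PPP_MULTIPLIERS: Record<string, number> = {
  US: 1.0,
  NG: 0.35, // e.g. Nigeria pays 35% of the US base price
  IN: 0.3,
  BR: 0.5,
};

export function localPrice(baseUsd: number, countryCode: string): number {
  // Unknown countries fall back to the full base price.
  const multiplier = PPP_MULTIPLIERS[countryCode] ?? 1.0;
  // Round to cents.
  return Math.round(baseUsd * multiplier * 100) / 100;
}
```

Because the country is resolved server-side (e.g. from the request's geolocation), clients can't spoof a cheaper region by editing client code.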
Code Preview
Curated excerpts from the actual production codebase — demonstrating architecture, patterns, and engineering quality.
💡 Cryptographic request signing system that prevents API key theft and replay attacks. The server never sees the full secret key — it verifies signatures using the stored hash. Timing-safe comparison prevents side-channel attacks.
export function verifyHmacSignature(
  secretKeyHash: string,
  signature: string,
  config: HmacSignatureConfig,
  toleranceMs: number = 5 * 60 * 1000 // 5 minutes default
): { valid: boolean; error?: string } {
  // Check timestamp freshness → prevents replay attacks
  const now = Date.now();
  const timeDiff = Math.abs(now - config.timestamp);

  if (timeDiff > toleranceMs) {
    return {
      valid: false,
      error: `Request timestamp expired. Diff: ${timeDiff}ms`,
    };
  }
  // … excerpt truncated: the signature is then recomputed and checked
  // with a timing-safe comparison against the stored key hash.

The Team
Founder & Architect (me)
System design, SDK, failover architecture, caching, and operations
Product Integrators
Integrated Hydra into Zeqah and ISEvolutions product workflows
Tech Stack
Outcomes & Impact
- ~50% cache hit rate reduces AI API calls (and costs) by half
- Automatic failover keeps the service available during single-provider outages
- Client SDK published on npm for easy integration
- PPP pricing enables developers in 50+ countries to use the platform affordably
- Robust multi-modal pipeline supporting high-throughput document intelligence and OCR parsing tasks
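The cache-aside pattern behind the ~50% hit rate can be sketched as follows. A Map stands in for Redis here, and the key scheme is an illustrative assumption, not the production one:

```typescript
import { createHash } from "node:crypto";

// Cache-aside sketch of the Redis caching layer described above.
// An in-memory Map stands in for Redis; the key scheme is illustrative.
const cache = new Map<string, string>();

function cacheKey(model: string, prompt: string): string {
  // Hash the prompt so arbitrarily long inputs yield fixed-size keys.
  return `hydra:${model}:${createHash("sha256").update(prompt).digest("hex")}`;
}

export async function cachedGenerate(
  model: string,
  prompt: string,
  generate: (p: string) => Promise<string>
): Promise<{ result: string; hit: boolean }> {
  const key = cacheKey(model, prompt);
  const cached = cache.get(key);
  if (cached !== undefined) return { result: cached, hit: true }; // hit: no provider call
  const result = await generate(prompt); // miss: call the (failover) provider chain
  cache.set(key, result);
  return { result, hit: false };
}
```

Every cache hit is a provider request that never happens, which is where the cost halving comes from.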
💬 Behind the Scenes
“The name "Hydra" isn't just cool - it's the architecture. Cut off one head (provider goes down), and another takes its place. We tested this by intentionally breaking providers during load tests. The failover was seamless.”