
The Day ChatGPT Went Dark: Anatomy of a Layer 7 DDoS

SEV: MEDIUM
Nov 2023
STATUS: RESOLVED
ID: LOG-0104

Incident Report

"Internal Server Error" - The New Normal?

If you were trying to code with Copilot or chat with GPT-4 in November 2023, you likely stared at a spinning wheel or a generic "Capacity Reached" error. OpenAI confirmed they were dealing with "abnormal traffic patterns reflective of a DDoS attack."

[Image of DDoS attack types diagram]

This wasn't your average script kiddie attack.
Traditional volumetric attacks (Layer 3/4) flood the pipes with garbage UDP packets. This was a Layer 7 (Application Layer) attack. The difference is crucial:

  • Volumetric: Tries to clog the network cable.
  • Layer 7: Tries to exhaust the server's CPU/RAM.
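The bullet points above can be made concrete. From the network's perspective, a Layer 7 attack request is just a small, well-formed HTTP POST — there is nothing for a Layer 3/4 filter to flag. A sketch (the endpoint and payload are hypothetical):

```javascript
// A Layer 7 attack request is byte-for-byte valid HTTP.
// (api.example.com and the payload are illustrative, not OpenAI's API.)
const body = JSON.stringify({ prompt: "Write a novel about dragons", max_tokens: 4000 });
const rawRequest = [
  "POST /v1/completions HTTP/1.1",
  "Host: api.example.com",
  "Content-Type: application/json",
  `Content-Length: ${Buffer.byteLength(body)}`,
  "",
  body,
].join("\r\n");

// A few hundred bytes on the wire -- an ordinary TCP connection
// carrying ordinary HTTP, as far as a volumetric filter can tell.
console.log(`${Buffer.byteLength(rawRequest)} bytes of perfectly valid HTTP`);
```

The only way to tell this apart from a real customer is at the application layer, which is exactly why these attacks slip past network-level defenses.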

The "Asymmetric" Cost of AI

The attackers (a group claiming to be Anonymous Sudan) exploited a weakness specific to LLM serving: Asymmetry.

  1. Cheap to Request: Sending a request like POST /completions { prompt: "Write a novel..." } costs the attacker fractions of a cent and minimal bandwidth.
  2. Expensive to Process: The server must spin up massive H100 GPU clusters, load gigabytes of weights into VRAM, and spend seconds generating tokens.

This creates a resource imbalance. You don't need a massive botnet to take down an AI service; you just need enough requests to fill the inference queue.
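To make the imbalance concrete, here is a back-of-the-envelope sketch. Every number is an illustrative assumption, not a measured value from OpenAI:

```javascript
// Illustrative cost model of the request/response asymmetry.
// All figures are assumptions for the sake of the sketch.
const attackerCostPerRequest = {
  bandwidthBytes: 500, // a small JSON POST
  cpuMillis: 1,        // building and sending the request
};
const serverCostPerRequest = {
  gpuSeconds: 10,      // generating thousands of tokens
  vramGigabytes: 80,   // weights resident on an H100-class GPU
};

// ~1 ms of attacker CPU triggers ~10 s of server GPU time:
const amplification =
  (serverCostPerRequest.gpuSeconds * 1000) / attackerCostPerRequest.cpuMillis;
console.log(`Amplification factor: ~${amplification}x`);
```

Under these (made-up) numbers, each request is amplified by four orders of magnitude — which is why a modest botnet suffices where a volumetric attack would need terabits of bandwidth.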

Technical Mitigation: The "Proof of Work" Defense

How do you stop this? Rate limiting based on IP alone isn't enough, because attackers rotate IPs. One common mitigation is to make clients pay up front with a Client-Side Proof of Work (PoW) challenge.

Before the API accepts your request, it forces your browser/client to solve a math puzzle.

// Conceptual Example of a Challenge-Response.
// sha256 is an assumed helper returning a hex digest string;
// one way to implement it in the browser is the Web Crypto API:
async function sha256(text) {
  const data = new TextEncoder().encode(text);
  const buf = await crypto.subtle.digest("SHA-256", data);
  return [...new Uint8Array(buf)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function solveChallenge(challengeString) {
  let nonce = 0;
  while (true) {
    const hash = await sha256(challengeString + nonce);
    if (hash.startsWith("0000")) {
      return nonce; // Found the "Golden Nonce"
    }
    nonce++;
  }
}

Why this works: It shifts the cost back to the attacker. If their botnet has to spend 100% CPU to solve puzzles for every request, the attack becomes too expensive to sustain.
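The server side of the scheme is what makes the economics flip: verifying a submitted nonce costs a single hash, no matter how many hashes the client burned finding it. A minimal Node.js sketch (the helper and `difficulty` parameter are assumptions for illustration, not any provider's actual implementation):

```javascript
import { createHash } from "node:crypto";

// Assumed helper mirroring the client's sha256(text) -> hex string.
const sha256 = async (text) =>
  createHash("sha256").update(text).digest("hex");

// One hash to verify, thousands (on average) to solve.
// difficulty = required number of leading zero hex digits.
async function verifyChallenge(challengeString, nonce, difficulty = 4) {
  const hash = await sha256(challengeString + nonce);
  return hash.startsWith("0".repeat(difficulty));
}
```

Raising `difficulty` by one hex digit multiplies the client's expected work by 16 while the server's verification cost stays constant — the knob a defender turns up when traffic looks hostile.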

Lessons for API Developers

This outage taught us that 429 Too Many Requests is not a failure; it's a survival mechanism. If you are building GenAI wrappers, you need aggressive caching and circuit breakers. Never assume the upstream provider is infinite.
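A circuit breaker for an upstream GenAI call can be sketched in a few dozen lines. The thresholds below are illustrative assumptions; production code would add a half-open probe state and retry jitter:

```javascript
// Minimal circuit-breaker sketch for calls to an upstream AI provider.
// failureThreshold and cooldownMs are illustrative, not recommended values.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30_000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  get isOpen() {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown elapsed: allow traffic again
      this.failures = 0;
      return false;
    }
    return true;
  }

  async call(fn) {
    if (this.isOpen) throw new Error("circuit open: upstream assumed down");
    try {
      const result = await fn();
      this.failures = 0; // any success resets the count
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err; // surface the 429/503 instead of hammering upstream
    }
  }
}
```

When the provider starts returning 429s, the breaker trips and your wrapper fails fast from cache (or degrades gracefully) instead of piling more load onto an already-drowning inference queue.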

SYSTEM NOTES

This log entry has been verified and archived. Access restricted to authorized personnel only.