Defending Expensive APIs: Rate Limiting Strategies for LLMs
The Cost of a Request
In the old days, a request cost the server microseconds. In the age of LLMs (Large Language Models), a single API call can tie up a GPU for 10 seconds. This makes Application Layer (Layer 7) DDoS attacks devastatingly effective and cheap to launch.
If you use a simple per-IP "10 requests per minute" limit, attackers will simply rotate through 10,000 IP addresses. We need a smarter defense.
Strategy 1: The Token Bucket with Redis & Lua
To handle high throughput correctly, we shouldn't do the check-then-increment in application code: it costs extra round trips, and concurrent requests can race past the limit between the check and the write. Instead, we push the logic into Redis as a Lua script, which Redis executes atomically. The script below shows the simplest variant, a fixed-window counter; a full token bucket follows the same pattern, storing a token count and last-refill timestamp in a hash.
```lua
-- redis_limiter.lua
-- KEYS[1] = per-client key (e.g. "rl:{api_key}")
-- ARGV[1] = max requests allowed per window
-- ARGV[2] = window length in seconds
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call("INCR", key)
if current == 1 then
  -- First request in this window: start the TTL clock
  redis.call("EXPIRE", key, window)
end

if current > limit then
  return 0 -- Rejected
else
  return 1 -- Accepted
end
```
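Calling the script from application code is then a single round trip. A minimal sketch using redis-py, which registers the script once and invokes it by SHA (the file name, key prefix, and 60-per-minute limit are assumptions for illustration):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Register the Lua script once; redis-py caches the SHA and calls EVALSHA.
with open("redis_limiter.lua") as f:
    limiter = r.register_script(f.read())

def allow_request(api_key: str, limit: int = 60, window_s: int = 60) -> bool:
    """Return True while this caller is still inside its fixed window."""
    result = limiter(keys=[f"rl:{api_key}"], args=[limit, window_s])
    return result == 1

if not allow_request("user-123"):
    print("429 Too Many Requests")
```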
Strategy 2: JA3 Fingerprinting
IP addresses are cheap. TLS fingerprints are expensive to change. JA3 fingerprints the SSL/TLS Client Hello: the TLS version, cipher suites, extension order, elliptic curves, and point formats a client offers. Even if a bot rotates its IP, these handshake parameters usually stay the same, because they are baked into its TLS library.
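Concretely, JA3 joins five Client Hello fields (TLS version, ciphers, extensions, elliptic curves, point formats) into a comma-separated string and MD5-hashes it. A toy sketch with made-up field values:

```python
import hashlib

# JA3 input format: TLSVersion,Ciphers,Extensions,EllipticCurves,PointFormats
# The values below are illustrative, not captured from a real client.
ja3_string = "771,4865-4866-4867-49195,0-23-65281-10-11-35,29-23-24,0"
ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()
print(ja3_hash)  # stable identifier for this TLS stack, even across rotating IPs
```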
By implementing JA3 filtering at your ingress (e.g., using Cloudflare Workers or Nginx Plus), you can block a specific type of bot client globally, regardless of which IP it comes from.
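If your ingress computes the JA3 hash and forwards it to the application (the X-JA3-Hash header below is an assumption; Cloudflare and the various Nginx JA3 modules expose it under their own names), the application-side check is just a set lookup:

```python
# Known-bad TLS fingerprints; the hashes below are placeholders, not real ones.
BLOCKED_JA3 = {
    "d41d8cd98f00b204e9800998ecf8427e",
    "5f4dcc3b5aa765d61d8327deb882cf99",
}

def is_blocked(headers: dict[str, str]) -> bool:
    """Reject requests whose ingress-computed JA3 hash is on the blocklist."""
    ja3 = headers.get("X-JA3-Hash", "").lower()
    return ja3 in BLOCKED_JA3
```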
Summary
Protecting GenAI APIs requires a multi-layered approach:
- Network: absorb volumetric floods at the edge (Cloudflare, AWS Shield).
- Identity: require authenticated tokens (Auth0, Cognito) so every request maps to an accountable caller.
- Compute: smart rate limiting based on token count, not just request count, so one expensive prompt can't hide behind one cheap-looking request (see the sketch below).
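A minimal sketch of token-weighted limiting, assuming a crude four-characters-per-token estimate and a per-minute token budget kept in Redis (the key prefix, budget, and estimator are illustrative, not a prescribed implementation):

```python
import redis

r = redis.Redis()

TOKEN_BUDGET_PER_MIN = 20_000  # illustrative per-caller budget; tune per pricing tier

def estimate_tokens(prompt: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer for accuracy.
    return max(1, len(prompt) // 4)

def consume_budget(api_key: str, prompt: str) -> bool:
    """Charge this request against the caller's per-minute token budget."""
    cost = estimate_tokens(prompt)
    key = f"tokbudget:{api_key}"
    used = r.incrby(key, cost)   # atomically add this request's estimated cost
    if used == cost:
        r.expire(key, 60)        # first charge in the window starts the TTL
    return used <= TOKEN_BUDGET_PER_MIN
```

For strict atomicity under heavy concurrency, the same INCRBY/EXPIRE pair can be folded into the Lua script from Strategy 1.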