2026-03-17 | Arjoonn S | Sethu Vishal S
How we run an API serving 200k requests/day across multiple nodes for ~$40/month: no load balancer, no central coordinator, just cooperative clients, local read replicas, and SSH-tunneled writes.
Three VPS machines at $12/month each. Two app nodes, one PostgreSQL primary. No central load balancer. Clients decide which node to talk to. About 220k API requests/day, with spiky peaks of 50 req/sec in short bursts, 600 req/min over a minute, and 20k req/hr over an hour. Graceful read-only degradation when the primary goes down. Total cost: ~$40/month including S3 backups.
We didn't design this upfront. Each stage was forced by a concrete failure over a period of a few years.
Django + SQLite on one VPS. Deploys were git pull && systemctl restart gunicorn. Worked fine up to a few thousand requests/day.
Multiple gunicorn workers writing simultaneously produced "database is locked" errors under traffic bursts. WAL mode helped but didn't fix it fully. We migrated to PostgreSQL on the same machine.
Past ~80k requests/day, gunicorn workers and PostgreSQL were competing for memory and CPU on a small machine. Traffic is spiky: average throughput looks modest, but peaks hit ~50 req/sec, which is enough to cause worker pileups during slow queries. We split app and DB onto separate machines.
One app server means every restart is downtime, and a hardware failure means ~20-40 minutes offline while a replacement provisions on the cloud. At 200k requests/day that's a real cost. We added a second web node.
The DB was still on one machine. When that went down, everything went down. So we added a managed DB cluster. To manage costs we cut the API tier back to a single machine. This worked well.
With a beefy API server and a Linode managed PostgreSQL cluster (3-node HA), the cluster alone cost ~$180/month. Adding the API node pushed the bill past $200/month, which is prohibitive for our scale. We could not simply drop the cluster and switch to a self-hosted primary on a budget VPS; that would reintroduce the SPOF problem.
During this period we also went through a datacenter fire and multiple AWS downtime events, all of which took down our service. That pushed us toward multi-cloud, so that a single provider failing does not take the service down.
That's where we are now. The rest of this post explains the current architecture in detail.
Each web node (api1, api2, ...) is an independent VPS, potentially across different cloud providers. There is no load balancer in front of them. Clients know about each node by name and choose which one to call themselves.
Each node runs Caddy, Django, a local PostgreSQL read replica, Redis, and a Mole SSH tunnel to the DB node.
On the DB node we run PostgreSQL primary alongside a hardened SSH tunnel sidecar container that shares the same Docker network as the primary. The DB host itself does not accept SSH connections; all tunnels terminate at the sidecar, which forwards into the private Docker network.
Reads are served from the local replica. Writes go through the Mole tunnel, terminate at the sidecar on the DB node, and are forwarded directly to the primary. The primary is never exposed to the public internet.
This is the most important design decision, and the most unusual one. There is no central load balancer or DNS round robin deciding which node a client hits. Each downstream application ships with knowledge of all named API endpoints (api1.example.com, api2.example.com, ...) and picks one itself based on:
- Rate limiting: if a node returns 429 Too Many Requests, the client backs off from that node for a while and prefers the others.

Our client-side algorithm: jittered round robin weighted by latency-derived health scores stored in Redis. We track per-node moving latency and recent error/429 rates; selection adds jitter to avoid thundering herds and decays weights over time so nodes recover smoothly.
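To make the selection idea concrete, here is a minimal sketch, not the SDK's actual code. It makes several simplifying assumptions: health state lives in process memory rather than Redis, selection is a jittered weighted random choice rather than a strict round robin, and the class name `NodePicker` plus the `alpha`, `cooldown`, and `jitter` parameters are hypothetical.

```python
import random
import time


class NodePicker:
    """Pick an API node by latency-derived weight, with jitter and cooldowns."""

    def __init__(self, nodes, alpha=0.2, cooldown=30.0, jitter=0.25):
        self.alpha = alpha        # smoothing factor for the moving latency
        self.cooldown = cooldown  # seconds to avoid a node after a 429 or 5xx
        self.jitter = jitter      # +/- fraction of random noise on each weight
        self.latency = {n: 0.1 for n in nodes}  # optimistic starting latency (s)
        self.blocked_until = {n: 0.0 for n in nodes}

    def record(self, node, latency_s, status, now=None):
        """Feed back the outcome of one request to `node`."""
        now = time.monotonic() if now is None else now
        # Exponentially weighted moving average of observed latency.
        self.latency[node] += self.alpha * (latency_s - self.latency[node])
        if status == 429 or status >= 500:
            self.blocked_until[node] = now + self.cooldown

    def pick(self, now=None):
        """Return one node name; faster nodes are chosen more often."""
        now = time.monotonic() if now is None else now
        healthy = [n for n in self.latency if self.blocked_until[n] <= now]
        # If every node is cooling down, fall back to the full list.
        candidates = healthy or list(self.latency)
        weights = [
            random.uniform(1 - self.jitter, 1 + self.jitter)
            / max(self.latency[n], 1e-3)  # guard against division by zero
            for n in candidates
        ]
        return random.choices(candidates, weights=weights, k=1)[0]
```

A caller would wrap each request: `node = picker.pick()`, send the request, then `picker.record(node, elapsed, response_status)`. Because a cooldown simply expires, a node that recovers drifts back into rotation without any coordination between clients.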
Other algorithms we could choose depending on workload at some point in the future:
This makes failures self-healing from the client's perspective. If api1 and api2 go down, clients automatically route to api3 through api10 with no DNS change required. There's no TTL to wait out, no health check propagation delay, no coordinator that can become a single point of failure.
A common question here: isn't the client SDK itself a coordinator? Yes, in a sense. The routing logic lives in the SDK rather than on a central server. The difference is that the downstream clients are all systems we control and deploy, so we can update the SDK on our own schedule. More importantly, the coordination cost is distributed and each client handles routing for its own traffic slice rather than one box handling all of it. We've also had a central HAProxy go down before, which caused a full outage. This approach removes that failure mode entirely.
Another question: how do clients discover new nodes? They don't, automatically. We keep nodes pre-provisioned and named (api1 through api10). Under normal conditions only a few are active. When we need more capacity, we bring up one of the spare numbered nodes. Clients already have that hostname baked in and will start routing to it once it responds. Emergency scaling is vertical (upgrading existing active machines), which doesn't require any client change at all.
Mole is a lightweight CLI tool that establishes and maintains a persistent SSH tunnel. We use it to forward the PostgreSQL port from each web node to the primary DB node. Each web node's Django app sees the primary as localhost:5433, with no VPN, no firewall rules, and no cloud provider VLAN.
Isolation details: the SSH server we connect to is a separate hardened container on the DB host, on the same Docker network as pg_main. The host and the Postgres container do not accept SSH themselves; all tunnels terminate at the sidecar and forward into the private Docker network. This allows strict network policy while keeping the primary unexposed.
We considered alternatives:
SSH is simple, auditable, and universally understood. Mole runs as a systemd service with Restart=always. If the tunnel drops, it reconnects within seconds. A dropped tunnel means that node can't replicate or write until it reconnects; reads from the local replica continue uninterrupted.
PostgreSQL streaming replication uses a persistent connection through the tunnel. When the tunnel bounces, the replica detects the lost connection and automatically re-establishes the WAL stream once Mole reconnects. No manual intervention needed.
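For spotting a tunnel that stays down, a small probe of the local tunnel port is one option. The sketch below is an assumption about how such a check could look, not our monitoring code; the function name `tunnel_is_up` is hypothetical, and the default port is the 5433 forward described above.

```python
import socket


def tunnel_is_up(host="127.0.0.1", port=5433, timeout=2.0):
    """Return True if something accepts TCP connections on the tunnel port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only proves the local end of the forward is listening. A stricter check would open a real PostgreSQL connection through the tunnel and run a trivial query such as SELECT 1.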
Our API traffic is overwhelmingly read heavy. Django's DB router sends all reads to the local replica and all writes to the primary (via the Mole tunnel). The configuration looks like:
```python
DATABASES = {
    'default': {  # primary - writes only (via Mole tunnel to the DB node)
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'localhost',
        'PORT': '5433',  # Mole tunnel
        # ...
    },
    'replica': {  # local replica - reads
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'localhost',
        'PORT': '5432',
        # ...
    },
}
```
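The router that goes with this configuration can be very small. The following is a sketch using Django's standard database router hooks; the class name `PrimaryReplicaRouter` and the module path in `DATABASE_ROUTERS` are hypothetical, and our real router may differ in detail.

```python
class PrimaryReplicaRouter:
    """Route reads to the local replica and writes to the primary."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes run only against the primary; the replica
        # receives them through streaming replication.
        return db == "default"


# settings.py (the module path is a placeholder)
DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]
```

Restricting `allow_migrate` to the `default` alias matters here: a streaming replica is read-only, so running migrations against it would fail.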
If the primary goes down, reads keep working from the local replica. Only write endpoints start returning errors. This is graceful degradation: most of our API surface stays available without any special failover logic. The degradation is a natural consequence of the routing, not an engineered mode we switch into.
One sharp edge: read-your-own-writes. If a client writes something and immediately reads it back, replication lag means it might not see the write yet. Lag can reach up to 2 seconds on multi continent connections between replica and primary. Any endpoint where this matters must explicitly use using('default') to force a read from the primary. Easy to forget; worth having a test or lint rule for.
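Besides calling using('default') on individual queries, one way to make the intent harder to forget is to pin reads to the primary for a whole block of code. The sketch below is an alternative we are illustrating, not what runs in production; `read_from_primary` and `PinnableRouter` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# False by default, so reads go to the replica unless explicitly pinned.
_force_primary = contextvars.ContextVar("force_primary", default=False)


@contextmanager
def read_from_primary():
    """Within this block, the router sends reads to the primary."""
    token = _force_primary.set(True)
    try:
        yield
    finally:
        _force_primary.reset(token)


class PinnableRouter:
    """Replica for reads unless the current context is pinned to the primary."""

    def db_for_read(self, model, **hints):
        return "default" if _force_primary.get() else "replica"

    def db_for_write(self, model, **hints):
        return "default"
```

A write-then-read endpoint would wrap its read in `with read_from_primary(): ...`. Querysets are lazy, so the query should be evaluated inside the block; a queryset built inside and evaluated later may still be routed to the replica.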
We also use django-knox for auth tokens. Knox stores tokens in the DB, which gets replicated to all nodes so a user who authenticates against api1 is eventually valid on api2 as well, with no shared session store needed. The replication lag window is the same as above.
Each node rate limits independently using its local Redis. This is intentional: different nodes can serve different traffic volumes due to differences in cloud providers / machine specs etc (we are pinning the cost per machine, not the machine spec), and our workload is almost entirely read only, so there's no correctness issue with clients getting more aggregate throughput by using multiple nodes.
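As an illustration of node-local limiting, here is a fixed-window counter sketch. It is an assumption about the general shape, not our limiter: a dict stands in for the node's local Redis, and the class name, `limit`, and `window_s` are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Per-node fixed-window counter; a dict stands in for local Redis."""

    def __init__(self, limit, window_s=60):
        self.limit = limit        # max requests per client per window
        self.window_s = window_s  # window length in seconds
        self.counts = {}

    def allow(self, client_id, now=None):
        """Count one request and report whether it is within the limit."""
        now = time.time() if now is None else now
        window = int(now // self.window_s)
        key = (client_id, window)
        # With Redis this would be an INCR on the key plus an EXPIRE so
        # old windows clean themselves up; the dict here never shrinks.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Because each node keeps its own counters, a client spreading requests over two nodes gets roughly twice the per-node limit in aggregate, which is the trade-off described above.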
Abuse prevention works differently. We use a secret User-Agent allowlist, backed by simple, slow-rotating secrets, so we can gate "unknown" traffic quickly. This is the first of several layers we apply to prevent abuse; unrecognized traffic hits a low catch-all rate limit.
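A sketch of how the allowlist gate could be wired to rate limits follows. Everything specific in it is an assumption for illustration: the function names, the idea of accepting both the current and the previous secret during rotation, and the two limit values.

```python
import hmac


def is_known_client(user_agent, allowed_agents):
    """True if the User-Agent equals any currently accepted secret value."""
    ua = user_agent.encode()
    # Compare against every entry in constant time so response timing
    # does not hint at which entry (if any) matched.
    matches = [hmac.compare_digest(ua, a.encode()) for a in allowed_agents]
    return any(matches)


def rate_limit_for(user_agent, allowed_agents, known_limit=600, unknown_limit=30):
    """Return the requests-per-minute budget for this caller."""
    if is_known_client(user_agent, allowed_agents):
        return known_limit
    return unknown_limit
```

Keeping the previous secret in `allowed_agents` for one rotation period lets clients that have not yet picked up the new value keep working.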
| Component | Provider | Spec | Rough Cost / month | Notes |
|---|---|---|---|---|
| api1 | Linode | 1 vCPU, 2 GB RAM | $12 | Caddy, Django, PG replica, Redis |
| api2 | E2E Networks | 2 vCPU, 6 GB RAM | $12 | Identical setup, different provider |
| DB primary | Contabo | 8 vCPU, 24 GB RAM | $12 | PG primary; SSH tunnel sidecar; not public facing |
| S3 backups | | | <$5 | Hourly pg_dump |
| Total | | | ~$40 | Adding a node costs $12/month |
The specs look mismatched on paper: a 1-core app node and a 24 GB DB node, all for the same price. That's just how budget VPS pricing works across providers. Contabo's ~$12 tier is unusually generous on resources; Linode's is not. The DB node is intentionally bigger: PostgreSQL benefits from having room to keep working sets in memory, and a beefy enough primary means it won't become the bottleneck for a long time.
The obvious question is whether $40 is justified for 220k requests/day. Purely on traffic, no: this load is manageable on a single $10 VPS. Anyone who says otherwise is either running expensive queries or has under-specced hardware. The cost is not buying throughput. It's buying the ability to deploy without downtime, keep reads alive when the primary dies, lose a full node or an entire provider without an incident, and sleep through most failure scenarios. On managed infrastructure, equivalent properties cost $200-400/month: RDS Multi-AZ alone costs $100+ before any app redundancy.
- Tunnel drops: Restart=always handles most cases within seconds, but we alert on tunnel down events.
- Timeouts: there is no connect_timeout or statement_timeout set on the primary connection beyond Django's defaults, so a persistently slow tunnel would eventually cause gunicorn worker pileups. We haven't hit this, but it's worth adding explicit timeouts as a precaution.
- Backups: hourly pg_dump to S3. Our data changes slowly so this RPO is acceptable for now, but continuous WAL archiving with something like wal-g is the right long-term answer.
- Next step: explicit connect_timeout and statement_timeout on the primary connection would bound the blast radius of a degraded tunnel.

The system handles 220k requests/day on $40/month, has survived node failures, deploys, and a full provider outage without manual intervention, and the only thing that stops writes is the primary DB going down. We're comfortable with that for now. It's a reasonable, frugal design for our constraints; niche-appropriate rather than generally superior to managed HA stacks.
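For reference, the explicit timeouts mentioned above could be set roughly like this in Django settings. This is a sketch assuming the standard PostgreSQL backend, which passes OPTIONS through to the driver; the values of 5 seconds and 5000 ms are placeholders we have not tuned, and the remaining connection settings are left out.

```python
DATABASES = {
    "default": {  # primary, reached through the Mole tunnel
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "localhost",
        "PORT": "5433",
        "OPTIONS": {
            # libpq connection parameter, in seconds: give up quickly
            # if the tunnel is not accepting connections.
            "connect_timeout": 5,
            # Session-level setting, in milliseconds: cancel statements
            # that hang on a slow or degraded tunnel.
            "options": "-c statement_timeout=5000",
        },
    },
}
```

With both set, a worker stuck on a degraded tunnel fails fast with an error instead of holding a gunicorn slot indefinitely.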