Scaling Next.js + Supabase from 0 to 100K Users: The Production Playbook
Developer Guide

A complete scaling playbook for Next.js + Supabase. Covers connection pooling, caching layers, read replicas, queue offloading, CDN edge strategy, and cost controls from 0 to 100K users.

2026-04-19
36 min read

Scaling is not one problem. It is a sequence of problems that each look different and arrive at predictable traffic levels.

This playbook walks through what actually breaks as a Next.js + Supabase app grows from zero to 100,000 monthly active users. Every section is in the order you hit it. If you skip ahead, you will over-engineer things that would not have mattered for six months.

I have been through this migration path three times now — once for a B2B SaaS, once for a content site, and once for a marketplace. The specifics differ, but the shape of the curve is always the same.

Stage 0: Before You Have Users (Foundation)#

The work you do here is the difference between a rewrite at 10K users and a calm weekend at 100K.

Pick the right Supabase plan#

Free tier is a demo, not a product. The moment you charge money — even one dollar — you should be on Pro. You get:

  • Daily backups (the free tier has no automatic backups; point-in-time recovery is a paid add-on)
  • 8 GB database (vs 500 MB)
  • No pause after 7 days of inactivity
  • Support SLA that actually responds

Pro is $25 per project per month. The migration cost from Free to Pro at 5K users is enormous because of the downtime window. Start on Pro.

Set up pooling from day one#

The single most common mistake is connecting to the database directly from a serverless function. Every invocation opens a new connection, and at 5K requests per minute you will exhaust the pool.

Use the Supabase connection pooler (pgBouncer) in transaction mode for any server-side database call:

// lib/supabase/server.ts
import { createServerClient } from '@supabase/ssr'

export function createClient() {
  return createServerClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    {
      cookies: {
        // ... cookie config
      },
      db: {
        schema: 'public',
      },
    }
  )
}

The @supabase/ssr helpers already route through the pooler URL. If you build your own Postgres client (via pg or Drizzle), use the ?pgbouncer=true connection string and disable prepared statements.
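If you do build your own client, the rewrite from a direct connection string to a pooled one is mechanical. A sketch — the port-6543 convention and the pgbouncer flag follow Supabase's pooler defaults, so verify both against your project's own connection settings:

```typescript
// Convert a direct Supabase Postgres URL into a transaction-mode pooler URL.
// Assumes the conventional pooler port 6543 and the pgbouncer=true flag;
// remember to also disable prepared statements in your driver
// (e.g. `prepare: false` in postgres.js).
export function toPoolerUrl(directUrl: string): string {
  const url = new URL(directUrl)
  url.port = '6543'                       // pooler listens here, not 5432
  url.searchParams.set('pgbouncer', 'true')
  return url.toString()
}
```

Keeping this in one place means the direct URL stays available for migrations and long-lived workers, which should bypass the pooler.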

Enable indexes on obvious columns#

Every foreign key column. Every column used in a WHERE clause on a high-traffic query. Every column in an RLS policy condition.

CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_published_at ON posts(published_at DESC) WHERE deleted_at IS NULL;
CREATE INDEX idx_posts_org_id ON posts(org_id);

The partial index on deleted_at IS NULL is a free ~20% speedup on soft-deleted tables and costs nothing to add early.

Choose a region close to your audience#

Supabase has regions in US-East, US-West, EU-West, AP-Southeast, and more. Next.js serverless functions default to US-East (iad1) on Vercel.

If your database is in EU-West and your functions are in US-East, every query has 80ms of round-trip latency before any work happens. Match them.

Pin your functions to the matching region with the regions field in vercel.json (fra1, Frankfurt, shown here for an EU-West database):

{
  "regions": ["fra1"]
}

Stage 1: 0 to 1,000 Users (Product Shape)#

At this stage the bottleneck is almost never technical. It is product-market fit. But three technical habits will pay off at every later stage.

Ship with ISR from the start#

Every page that renders data from the database should use Incremental Static Regeneration:

// app/blog/[slug]/page.tsx
export const revalidate = 60 // seconds

export default async function Page({ params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params // params is a Promise in Next.js 15
  const post = await getPost(slug)
  return <Article post={post} />
}

At 1K users this looks like premature optimization. At 10K users it is the thing keeping your database idle.

ISR caches the rendered HTML at Vercel's edge. A request only hits your origin (and the database) once per revalidation window per region.

Separate reads and writes conceptually#

You do not need a read replica yet. But you should write your data layer as if you had one:

// lib/db/reads.ts — uses cached client
// lib/db/writes.ts — uses direct client, invalidates cache

When you later split to a replica, the rewrite is a string swap instead of a refactor.

Log everything to a single place#

Pick a log aggregator on day one — Axiom, Logtail, Datadog, or Vercel's built-in logs if you are lean. Put it behind a single log() function.

When 2% of users start reporting an error at 10K users, you will not have time to go log-spelunking across three dashboards. Centralized logs are cheap insurance.
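A minimal shape for that single log() function — `sink` here is a stand-in for whichever aggregator you pick, and the point is that swapping providers later is a one-file change:

```typescript
// Centralized logging sketch. The default sink writes JSON lines to stdout
// (which Vercel's log drain picks up); setSink() swaps in Axiom, Logtail, etc.
type Level = 'info' | 'warn' | 'error'
type Entry = { level: Level; msg: string; ts: string; ctx?: Record<string, unknown> }
type Sink = (entry: Entry) => void

let sink: Sink = (entry) => console.log(JSON.stringify(entry))

export function setSink(s: Sink) { sink = s }

export function log(level: Level, msg: string, ctx?: Record<string, unknown>) {
  sink({ level, msg, ts: new Date().toISOString(), ctx })
}
```

Every call site uses `log()` and nothing else; the provider choice lives in one module.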

Stage 2: 1,000 to 10,000 Users (First Real Load)#

This is where scaling becomes a real concern. The symptoms are usually:

  • Occasional timeout errors on page loads
  • Database CPU spiking to 80% during peak hours
  • Vercel build times creeping up
  • Random 500s from auth endpoints

Diagnose the first bottleneck#

Run this query in the Supabase SQL editor to find your slowest queries:

SELECT
  query,
  calls,
  total_exec_time / 1000 as total_seconds,
  mean_exec_time as avg_ms,
  rows / GREATEST(calls, 1) as avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

The top 20 queries usually account for 80% of your database time. Optimize those. Ignore everything else.

Index the N+1 queries you did not notice#

Next.js loops plus Supabase queries create silent N+1 patterns:

// DON'T — one query per post
const posts = await supabase.from('posts').select('*')
for (const post of posts.data) {
  const author = await supabase.from('profiles').select('*').eq('id', post.user_id).single()
}

// DO — one query, join in Postgres
const posts = await supabase
  .from('posts')
  .select('*, profiles!posts_user_id_fkey(*)')
Supabase's embedded select syntax compiles to a SQL join. It is faster and cheaper than the loop version.

Add Postgres connection retry logic#

pgBouncer occasionally drops idle connections. Without retry logic, users see random failures.

// lib/db/withRetry.ts
export async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (error: any) { // typed `any` so we can inspect driver error codes
      const isRetryable =
        error?.code === 'ECONNRESET' ||
        error?.code === '57P01' || // admin_shutdown
        error?.message?.includes('Connection terminated')
      if (!isRetryable || i === attempts - 1) throw error
      await new Promise(r => setTimeout(r, 50 * Math.pow(2, i)))
    }
  }
  throw new Error('unreachable')
}

Wrap every external database call with this. It converts most transient failures into invisible retries.

Cache the expensive reads#

Next.js 15 gives you unstable_cache:

import { unstable_cache } from 'next/cache'

export const getTopPosts = unstable_cache(
  async (limit: number) => {
    const { data } = await supabase
      .from('posts')
      .select('id, title, slug, published_at')
      .eq('published', true)
      .order('views', { ascending: false })
      .limit(limit)
    return data
  },
  ['top-posts'],
  { revalidate: 300, tags: ['posts'] }
)

// Invalidate on write
import { revalidateTag } from 'next/cache'
revalidateTag('posts')

One cached call replaces thousands of identical database queries per hour.

Stage 3: 10,000 to 25,000 Users (Cache Everything)#

Somewhere around 10K active users, your database becomes the bottleneck for real. The fix is always: cache more aggressively before adding capacity.

Add a real edge cache for API responses#

If you have public API routes (pricing page, blog index, anything anonymous), put a CDN-level cache in front:

// app/api/posts/route.ts
export const revalidate = 60

export async function GET() {
  const posts = await getTopPosts(20)
  return Response.json(posts, {
    headers: {
      'Cache-Control': 's-maxage=60, stale-while-revalidate=600',
    },
  })
}

The stale-while-revalidate directive means users see slightly stale data while Vercel fetches fresh data in the background. Database load drops by 10-100x for public endpoints.
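To keep those headers consistent across every public route, a tiny helper is enough (the helper name and defaults here are illustrative, not part of any API):

```typescript
// Build a shared-cache header for public endpoints: s-maxage controls the
// CDN copy, stale-while-revalidate lets the edge serve the old copy while
// it refetches in the background.
export function cdnCache(sMaxAge: number, swr: number): Record<string, string> {
  return { 'Cache-Control': `s-maxage=${sMaxAge}, stale-while-revalidate=${swr}` }
}
```

Then each route handler is just `Response.json(data, { headers: cdnCache(60, 600) })`, and you can tune TTLs in one place.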

Move session lookups out of the database hot path#

If every request validates a session by querying auth.users, you are doing 100K database queries per day for something that does not change minute-to-minute.

Use getSession() (reads the JWT from the cookie, no network round-trip) for middleware-level checks and getUser() (verifies the token against the Auth server) only on critical operations:

// middleware.ts — fast path, reads the session from the cookie
const { data: { session } } = await supabase.auth.getSession()

// app/api/admin/delete/route.ts — slow path, full verify
const { data: { user } } = await supabase.auth.getUser()

This was covered in depth in [INTERNAL LINK: supabase-auth-complete-session-middleware-guide]. The short version: use getUser() only when security requires it.

Paginate everything#

Any endpoint that returns a list without pagination will eventually return 10,000 rows and kill your response time. Add it now, even if the list is small:

const PAGE_SIZE = 50

const { data, count } = await supabase
  .from('posts')
  .select('*', { count: 'exact' })
  .order('created_at', { ascending: false })
  .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1)

The count: 'exact' option does add a second query. If pages can be "infinite scroll" without showing a total, use count: 'estimated' or drop the count entirely.
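The inclusive bounds that `.range()` expects are an easy off-by-one, so it is worth centralizing the arithmetic in one small helper (hypothetical name):

```typescript
// Compute the inclusive [from, to] pair that .range() expects
// for a 0-indexed page number.
export function pageRange(page: number, pageSize = 50): [number, number] {
  const from = page * pageSize
  return [from, from + pageSize - 1]
}
```

Usage: `.range(...pageRange(page))` in every list endpoint, so a page-size change is a one-line edit.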

Offload sends to a queue#

Email, Slack notifications, webhooks, analytics events — none of these should block a user request. At 25K users, a 500ms SendGrid call that runs inline will eventually time out during a spike.

The minimum viable queue on Supabase is a jobs table:

CREATE TABLE jobs (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  kind text NOT NULL,
  payload jsonb NOT NULL,
  status text NOT NULL DEFAULT 'pending',
  run_at timestamptz NOT NULL DEFAULT now(),
  attempts int NOT NULL DEFAULT 0,
  last_error text,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_jobs_pending ON jobs(run_at) WHERE status = 'pending';

A cron job (Vercel cron or Supabase cron) pops the next 10 pending jobs every minute. Failures increment attempts and push run_at forward with exponential backoff.
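The backoff arithmetic the worker needs is a small pure function. A sketch — the one-minute base delay and one-hour cap are illustrative defaults, not prescriptions:

```typescript
// Exponential backoff for the jobs table: given the attempt count so far,
// how long until the next try. Doubles each attempt, capped at an hour.
export function backoffMs(attempts: number, baseMs = 60_000, capMs = 3_600_000): number {
  return Math.min(capMs, baseMs * 2 ** attempts)
}

// New run_at value after a failed attempt.
export function nextRunAt(now: Date, attempts: number): Date {
  return new Date(now.getTime() + backoffMs(attempts))
}
```

On failure the worker runs `UPDATE jobs SET attempts = attempts + 1, run_at = $1, last_error = $2` with the computed timestamp.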

This is covered in more depth in the [INTERNAL LINK: nextjs-supabase-background-jobs-async-patterns] guide.

Stage 4: 25,000 to 50,000 Users (Scaling Writes)#

Writes are harder to scale than reads. You cannot cache them, and a read replica does not help. You have to make each write cheaper or batch them.

Audit your write amplification#

Check pg_stat_user_tables for your application's write pattern:

SELECT
  relname as table,
  n_tup_ins as inserts,
  n_tup_upd as updates,
  n_tup_del as deletes,
  seq_scan as sequential_scans,
  idx_scan as index_scans
FROM pg_stat_user_tables
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC;

Any table with 10x more writes than your active user count has a write amplification bug. Common causes:

  • An UPDATE on every page view (recording "last seen" per session)
  • A denormalized counter updated inline instead of batched
  • Analytics events written directly to a relational table

Move high-write data to a denormalized staging table and batch-flush every minute.
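One way to sketch that batch-flush is an in-memory accumulator whose `flushFn` stands in for your actual UPDATE statement (names here are illustrative):

```typescript
// Accumulate counter increments in memory, then flush once per interval as
// one write per key instead of one write per event.
export class CounterBatch {
  private counts = new Map<string, number>()

  constructor(private flushFn: (counts: Map<string, number>) => void) {}

  increment(key: string, by = 1) {
    this.counts.set(key, (this.counts.get(key) ?? 0) + by)
  }

  flush() {
    if (this.counts.size === 0) return // nothing pending: skip the write
    this.flushFn(this.counts)
    this.counts = new Map()
  }
}
```

A `setInterval` (or a per-request hook in a long-lived worker) calls `flush()` every minute; 10,000 page views become a handful of UPDATEs. Note this trades durability for throughput — a crash loses the unflushed counts, which is usually acceptable for view counters.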

Use triggers instead of application-level denormalization#

If you maintain a post_count on a profiles table, doing it in application code means every post insert issues two queries and two network round-trips. A trigger does it in one:

CREATE OR REPLACE FUNCTION update_post_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE profiles SET post_count = post_count + 1 WHERE id = NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE profiles SET post_count = post_count - 1 WHERE id = OLD.user_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_post_count();

Triggers are fast, transactional, and free of network overhead. The trade-off is that they are harder to reason about — every trigger should be documented inline.

Move realtime off the primary#

Supabase Realtime uses logical replication from your primary database. At 50K users, a chatty realtime feed can consume 20% of your database CPU.

Options:

  1. Filter realtime subscriptions strictly — never subscribe to a whole table, always filter by user or org
  2. Use a separate schema for high-volume realtime data — keeps the logical replication slot smaller
  3. Move truly high-volume realtime to Pusher or Ably — Supabase realtime is great for product events, not for arcade games

Stage 5: 50,000 to 100,000 Users (Cost Controls)#

At this point performance is largely solved. The new enemy is cost. Without controls, your monthly bill doubles every quarter while user count grows 20%.

Audit egress first#

Vercel charges $0.15 per GB of data transfer out. Supabase charges $0.09 per GB for storage egress. At 100K users, a 200 KB JSON response per request adds up to thousands of dollars per month.

Run this in Supabase to find the chattiest endpoints:

SELECT
  query,
  calls,
  total_exec_time,
  (rows / GREATEST(calls, 1))::int as avg_rows_per_call
FROM pg_stat_statements
WHERE rows > 1000
ORDER BY rows DESC
LIMIT 20;

Any query returning more than 100 rows per call is probably paginated incorrectly. Fix those first.

Compress images before storage#

next/image auto-optimizes images on delivery, but if you accept user-generated uploads, stored size matters too. A 10 MB phone photo is stored at 10 MB, served at 10 MB from Supabase Storage to the next/image optimizer, and only then compressed for users — you pay for the full size at every hop before the last one.

Pre-compress on upload:

import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
)

// compressImage wraps your client-side compressor of choice
async function uploadImage(file: File, userId: string) {
  const compressed = await compressImage(file, { maxWidth: 2400, quality: 0.85 })
  const { data } = await supabase.storage
    .from('user-uploads')
    .upload(`${userId}/${file.name}`, compressed) // path scheme is illustrative
  return data
}

A single browser-side compression pass (using compressorjs or browser-image-compression) usually cuts storage and egress by 80% with no visible quality loss.
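The dimension math behind that pass is simple enough to sketch — the maxWidth default mirrors the snippet above, and the function name is hypothetical:

```typescript
// Target dimensions for client-side compression: cap width at maxWidth,
// preserve aspect ratio, and never upscale a smaller image.
export function targetSize(width: number, height: number, maxWidth = 2400) {
  if (width <= maxWidth) return { width, height } // already small enough
  const scale = maxWidth / width
  return { width: maxWidth, height: Math.round(height * scale) }
}
```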

Cap the free tier of your product#

Free users cost money. At 100K users, if 95% are on a free plan, you have 95K cost centers and 5K revenue centers.

Add hard limits on the free plan:

  • 100 API calls per day
  • 1 GB storage
  • 10 team members

If a free user needs more, they convert to paid. If they do not convert, their growth is capped. Either outcome is better than a free user consuming $2 of infrastructure per month.
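Enforcing the call cap can be a pure check at the top of each handler. A sketch, assuming usage counts are tracked elsewhere (the limits match the illustrative list above):

```typescript
// Per-plan daily API call limits; Infinity means uncapped.
const DAILY_LIMITS: Record<string, number> = { free: 100, pro: Infinity }

// Allow the request only while today's usage is under the plan's cap.
// Unknown plans are denied by default — fail closed.
export function allowRequest(plan: string, callsToday: number): boolean {
  return callsToday < (DAILY_LIMITS[plan] ?? 0)
}
```

The handler increments a per-user counter (a Postgres row or an Upstash key with a daily TTL) and returns 429 when `allowRequest` is false.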

Move cold data to cheaper storage#

Logs, analytics, old webhook payloads — none of these need to live in your primary Postgres at $0.125 per GB-month.

Options:

  1. Partition tables by date — drop old partitions instead of deleting rows
  2. Archive to Supabase Storage as parquet — $0.021 per GB-month, queryable via DuckDB
  3. Ship to BigQuery or Athena — for real analytical workloads, 10-100x cheaper per query than Postgres

Architectural Patterns You Will Need#

The Two-Tier Cache#

At 50K+ users, one cache is not enough:

  1. Edge cache (Vercel CDN) — for HTML and public API responses, 60-600s TTL
  2. Application cache (unstable_cache or Upstash) — for expensive queries, 30-300s TTL

Requests hit the edge first. If they bypass the edge (authenticated users, fresh data), they hit the application cache. Only then do they touch the database.

The Read Replica Split#

On Supabase Team plan and above, you can add a read replica. Use it for:

  • Analytics queries (long-running, read-only)
  • Background job reads (batch processing)
  • Full-text search (can lag slightly)

Keep on the primary:

  • User-facing reads where consistency matters
  • Writes (obviously)
  • Session lookups

// lib/supabase/replica.ts
import { createClient } from '@supabase/supabase-js'

export const replicaClient = createClient(
  process.env.SUPABASE_REPLICA_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

The Job Queue#

Everything that is not a user-facing request should go through a queue:

  • Emails
  • Webhooks
  • Analytics events
  • Third-party API calls
  • Long-running computations
  • Scheduled tasks

Your user-facing endpoints stay fast because they are only responsible for enqueueing. The queue worker can fail, retry, and backoff without the user noticing.

The Feature Flag Gate#

At 100K users, any deploy that touches a hot path is a risk. Feature flags let you roll out gradually:

import { getFlag } from '@/lib/flags'

export default async function Page() {
  const useNewAlgorithm = await getFlag('new-ranking-algorithm')
  const posts = useNewAlgorithm
    ? await getPostsV2()
    : await getPostsV1()
  return <PostList posts={posts} />
}

Start at 1%. Watch error rates. Go to 10%, then 50%, then 100%. A regression at 1% affects 1,000 users, not 100,000.
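A common way to implement the percentage gate is a deterministic hash of the user id into a 0-99 bucket. This sketch uses a toy string hash for illustration — real flag systems typically use murmur or similar:

```typescript
// Deterministic rollout: the same user always lands in the same bucket,
// so they stay in (or out of) the rollout across requests and deploys.
export function inRollout(userId: string, percent: number): boolean {
  let hash = 0
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0 // simple 32-bit string hash
  }
  return hash % 100 < percent
}
```

Raising the rollout from 1% to 10% only adds users; everyone already in the rollout stays in it, which keeps the experience stable while you ramp.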

The Cost Curve#

Real numbers from three apps I have shipped to 100K+ users:

| Stage | Vercel | Supabase | Other | Total / mo |
|-------|--------|----------|-------|------------|
| 0-1K | $20 | $25 | $10 | $55 |
| 1K-10K | $60 | $25 | $25 | $110 |
| 10K-25K | $120 | $100 | $50 | $270 |
| 25K-50K | $250 | $200 | $120 | $570 |
| 50K-100K | $450 | $400 | $250 | $1,100 |

"Other" includes email (Resend), logs (Axiom), error tracking (Sentry), analytics (Plausible or similar), and KV cache (Upstash).

Your numbers will vary by feature set. A chat app is 3x more expensive than a blog. A video app is 10x more expensive than a chat app.

What to Do When You Hit a Wall#

Not every scaling problem has a fix in this playbook. When you hit a wall:

  1. Reproduce on a smaller scale first. If you cannot cause the bug on a 1K-user staging environment, you do not understand it yet.
  2. Use EXPLAIN ANALYZE before you believe any theory. The query plan is ground truth. Your mental model is a guess.
  3. Do not reach for a rewrite. A rewrite at 100K users is a six-month risk. A tactical fix is a three-day risk.
  4. Talk to the Supabase team. On Pro+ plans, they will look at your slow query logs. This is worth more than most things you can do yourself.
  5. Profile in production, not staging. Staging load is never realistic. Enable pg_stat_statements on prod and just look.

Further Reading#

  • [INTERNAL LINK: nextjs-supabase-performance-optimization-2026] — specific Next.js + Supabase perf wins
  • [INTERNAL LINK: nextjs-supabase-caching-strategies] — deep dive into the caching layers
  • [INTERNAL LINK: nextjs-supabase-database-design-optimization] — Postgres schema patterns for scale
  • [INTERNAL LINK: supabase-connection-pooling-vercel] — pgBouncer specifics
  • [INTERNAL LINK: nextjs-supabase-background-jobs-async-patterns] — the queue pattern in detail
  • [INTERNAL LINK: nextjs-supabase-production-launch-checklist] — the pre-flight checklist for launch

Closing Thoughts#

There is no single scaling trick. There is a sequence of small, incremental fixes, each applied at the right time.

The teams that succeed at scaling are not the ones who over-engineer early. They are the ones who ship fast, measure ruthlessly, and fix bottlenecks one at a time in the order they appear.

Save this playbook. Come back to it at 1K, 10K, 25K, 50K, and 100K users. The problems at each stage are surprisingly consistent, and the fixes are almost always already in this guide.

Your first scaling problem is six months away. Keep shipping.
