Scaling Next.js + Supabase from 0 to 100K Users: The Production Playbook
Developer Guide

A complete scaling playbook for Next.js + Supabase. Covers connection pooling, caching layers, read replicas, queue offloading, CDN edge strategy, and cost controls from 0 to 100K users.

2026-04-19
36 min read

Scaling is not one problem. It is a sequence of problems that each look different and arrive at predictable traffic levels.

This playbook walks through what actually breaks as a Next.js + Supabase app grows from zero to 100,000 monthly active users. Every section is in the order you hit it. If you skip ahead, you will over-engineer things that would not have mattered for six months.

I have been through this migration path three times now — once for a B2B SaaS, once for a content site, and once for a marketplace. The specifics differ, but the shape of the curve is always the same.

Stage 0: Before You Have Users (Foundation)#

The work you do here is the difference between a rewrite at 10K users and a calm weekend at 100K.

Pick the right Supabase plan#

Free tier is a demo, not a product. The moment you charge money — even one dollar — you should be on Pro. You get:

  • Daily backups (the free tier has no automatic backups; point-in-time recovery is a paid add-on)
  • 8 GB database (vs 500 MB)
  • No pause after 7 days of inactivity
  • Support SLA that actually responds

Pro is $25 per project per month. The migration cost from Free to Pro at 5K users is enormous because of the downtime window. Start on Pro.

Set up pooling from day one#

The single most common mistake is connecting to the database directly from a serverless function. Every invocation opens a new connection, and at 5K requests per minute you will exhaust the pool.

Use the Supabase connection pooler (pgBouncer) in transaction mode for any server-side database call:

// lib/supabase/server.ts
import { createServerClient } from '@supabase/ssr'

export function createClient() {
  return createServerClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    {
      cookies: {
        // ... cookie config
      },
      db: {
        schema: 'public',
      },
    }
  )
}

The @supabase/ssr helpers already route through the pooler URL. If you build your own Postgres client (via pg or Drizzle), use the ?pgbouncer=true connection string and disable prepared statements.
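If you do build your own client, the rewrite from a direct connection string to a pooled one is mechanical. A sketch — the port-6543 convention and the pgbouncer flag follow Supabase's pooler defaults, so verify both against your project's own connection settings:

```typescript
// Convert a direct Supabase Postgres URL into a transaction-mode pooler URL.
// Assumes the conventional pooler port 6543 and the pgbouncer=true flag;
// remember to also disable prepared statements in your driver
// (e.g. `prepare: false` in postgres.js).
export function toPoolerUrl(directUrl: string): string {
  const url = new URL(directUrl)
  url.port = '6543'                       // pooler listens here, not 5432
  url.searchParams.set('pgbouncer', 'true')
  return url.toString()
}
```

Keeping this in one place means the direct URL stays available for migrations and long-lived workers, which should bypass the pooler.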

Enable indexes on obvious columns#

Every foreign key column. Every column used in a WHERE clause on a high-traffic query. Every column in an RLS policy condition.

CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_published_at ON posts(published_at DESC) WHERE deleted_at IS NULL;
CREATE INDEX idx_posts_org_id ON posts(org_id);

The partial index on deleted_at IS NULL is a free ~20% speedup on soft-deleted tables and costs nothing to add early.

Choose a region close to your audience#

Supabase has regions in US-East, US-West, EU-West, AP-Southeast, and more. Next.js serverless functions default to US-East (iad1) on Vercel.

If your database is in EU-West and your functions are in US-East, every query has 80ms of round-trip latency before any work happens. Match them.

Pin your functions to the matching region with the regions field in vercel.json (fra1, Frankfurt, shown here for an EU-West database):

{
  "regions": ["fra1"]
}

Stage 1: 0 to 1,000 Users (Product Shape)#

At this stage the bottleneck is almost never technical. It is product-market fit. But three technical habits will pay off at every later stage.

Ship with ISR from the start#

Every page that renders data from the database should use Incremental Static Regeneration:

// app/blog/[slug]/page.tsx
export const revalidate = 60 // seconds

export default async function Page({ params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params // params is a Promise in Next.js 15
  const post = await getPost(slug)
  return <Article post={post} />
}

At 1K users this looks like premature optimization. At 10K users it is the thing keeping your database idle.

ISR caches the rendered HTML at Vercel's edge. A request only hits your origin (and the database) once per revalidation window per region.

Separate reads and writes conceptually#

You do not need a read replica yet. But you should write your data layer as if you had one:

// lib/db/reads.ts — uses cached client
// lib/db/writes.ts — uses direct client, invalidates cache

When you later split to a replica, the rewrite is a string swap instead of a refactor.

Log everything to a single place#

Pick a log aggregator on day one — Axiom, Logtail, Datadog, or Vercel's built-in logs if you are lean. Put it behind a single log() function.

When 2% of users start reporting an error at 10K users, you will not have time to go log-spelunking across three dashboards. Centralized logs are cheap insurance.
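A minimal shape for that single log() function — `sink` here is a stand-in for whichever aggregator you pick, and the point is that swapping providers later is a one-file change:

```typescript
// Centralized logging sketch. The default sink writes JSON lines to stdout
// (which Vercel's log drain picks up); setSink() swaps in Axiom, Logtail, etc.
type Level = 'info' | 'warn' | 'error'
type Entry = { level: Level; msg: string; ts: string; ctx?: Record<string, unknown> }
type Sink = (entry: Entry) => void

let sink: Sink = (entry) => console.log(JSON.stringify(entry))

export function setSink(s: Sink) { sink = s }

export function log(level: Level, msg: string, ctx?: Record<string, unknown>) {
  sink({ level, msg, ts: new Date().toISOString(), ctx })
}
```

Every call site uses `log()` and nothing else; the provider choice lives in one module.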

Stage 2: 1,000 to 10,000 Users (First Real Load)#

This is where scaling becomes a real concern. The symptoms are usually:

  • Occasional timeout errors on page loads
  • Database CPU spiking to 80% during peak hours
  • Vercel build times creeping up
  • Random 500s from auth endpoints

Diagnose the first bottleneck#

Run this query in the Supabase SQL editor to find your slowest queries:

SELECT
  query,
  calls,
  total_exec_time / 1000 as total_seconds,
  mean_exec_time as avg_ms,
  rows / GREATEST(calls, 1) as avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

The top 20 queries usually account for 80% of your database time. Optimize those. Ignore everything else.

Index the N+1 queries you did not notice#

Next.js loops plus Supabase queries create silent N+1 patterns:

// DON'T — one query per post
const posts = await supabase.from('posts').select('*')
for (const post of posts.data) {
  const author = await supabase.from('profiles').select('*').eq('id', post.user_id).single()
}

// DO — one query, join in Postgres
const posts = await supabase
  .from('posts')
  .select('*, profiles!posts_user_id_fkey(*)')
Supabase's embedded select syntax compiles to a SQL join. It is faster and cheaper than the loop version.

Add Postgres connection retry logic#

pgBouncer occasionally drops idle connections. Without retry logic, users see random failures.

// lib/db/withRetry.ts
export async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (error: any) { // typed `any` so we can inspect driver error codes
      const isRetryable =
        error?.code === 'ECONNRESET' ||
        error?.code === '57P01' || // admin_shutdown
        error?.message?.includes('Connection terminated')
      if (!isRetryable || i === attempts - 1) throw error
      await new Promise(r => setTimeout(r, 50 * Math.pow(2, i)))
    }
  }
  throw new Error('unreachable')
}

Wrap every external database call with this. It converts most transient failures into invisible retries.

Cache the expensive reads#

Next.js 15 gives you unstable_cache:

import { unstable_cache } from 'next/cache'

export const getTopPosts = unstable_cache(
  async (limit: number) => {
    const { data } = await supabase
      .from('posts')
      .select('id, title, slug, published_at')
      .eq('published', true)
      .order('views', { ascending: false })
      .limit(limit)
    return data
  },
  ['top-posts'],
  { revalidate: 300, tags: ['posts'] }
)

// Invalidate on write
import { revalidateTag } from 'next/cache'
revalidateTag('posts')

One cached call replaces thousands of identical database queries per hour.

Stage 3: 10,000 to 25,000 Users (Cache Everything)#

Somewhere around 10K active users, your database becomes the bottleneck for real. The fix is always: cache more aggressively before adding capacity.

Add a real edge cache for API responses#

If you have public API routes (pricing page, blog index, anything anonymous), put a CDN-level cache in front:

// app/api/posts/route.ts
export const revalidate = 60

export async function GET() {
  const posts = await getTopPosts(20)
  return Response.json(posts, {
    headers: {
      'Cache-Control': 's-maxage=60, stale-while-revalidate=600',
    },
  })
}

The stale-while-revalidate directive means users see slightly stale data while Vercel fetches fresh data in the background. Database load drops by 10-100x for public endpoints.
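To keep those headers consistent across every public route, a tiny helper is enough (the helper name and defaults here are illustrative, not part of any API):

```typescript
// Build a shared-cache header for public endpoints: s-maxage controls the
// CDN copy, stale-while-revalidate lets the edge serve the old copy while
// it refetches in the background.
export function cdnCache(sMaxAge: number, swr: number): Record<string, string> {
  return { 'Cache-Control': `s-maxage=${sMaxAge}, stale-while-revalidate=${swr}` }
}
```

Then each route handler is just `Response.json(data, { headers: cdnCache(60, 600) })`, and you can tune TTLs in one place.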

Move session lookups out of the database hot path#

If every request validates a session by querying auth.users, you are doing 100K database queries per day for something that does not change minute-to-minute.

Use getSession() (reads the JWT from the cookie, no network round-trip) for middleware-level checks and getUser() (verifies the token against the Auth server) only on critical operations:

// middleware.ts — fast path, reads the session from the cookie
const { data: { session } } = await supabase.auth.getSession()

// app/api/admin/delete/route.ts — slow path, full verify
const { data: { user } } = await supabase.auth.getUser()

This was covered in depth in [INTERNAL LINK: supabase-auth-complete-session-middleware-guide]. The short version: use getUser() only when security requires it.

Paginate everything#

Any endpoint that returns a list without pagination will eventually return 10,000 rows and kill your response time. Add it now, even if the list is small:

const PAGE_SIZE = 50

const { data, count } = await supabase
  .from('posts')
  .select('*', { count: 'exact' })
  .order('created_at', { ascending: false })
  .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1)

The count: 'exact' option does add a second query. If pages can be "infinite scroll" without showing a total, use count: 'estimated' or drop the count entirely.
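The inclusive bounds that `.range()` expects are an easy off-by-one, so it is worth centralizing the arithmetic in one small helper (hypothetical name):

```typescript
// Compute the inclusive [from, to] pair that .range() expects
// for a 0-indexed page number.
export function pageRange(page: number, pageSize = 50): [number, number] {
  const from = page * pageSize
  return [from, from + pageSize - 1]
}
```

Usage: `.range(...pageRange(page))` in every list endpoint, so a page-size change is a one-line edit.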

Offload sends to a queue#

Email, Slack notifications, webhooks, analytics events — none of these should block a user request. At 25K users, a 500ms SendGrid call that runs inline will eventually time out during a spike.

The minimum viable queue on Supabase is a jobs table:

CREATE TABLE jobs (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  kind text NOT NULL,
  payload jsonb NOT NULL,
  status text NOT NULL DEFAULT 'pending',
  run_at timestamptz NOT NULL DEFAULT now(),
  attempts int NOT NULL DEFAULT 0,
  last_error text,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_jobs_pending ON jobs(run_at) WHERE status = 'pending';

A cron job (Vercel cron or Supabase cron) pops the next 10 pending jobs every minute. Failures increment attempts and push run_at forward with exponential backoff.
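The backoff arithmetic the worker needs is a small pure function. A sketch — the one-minute base delay and one-hour cap are illustrative defaults, not prescriptions:

```typescript
// Exponential backoff for the jobs table: given the attempt count so far,
// how long until the next try. Doubles each attempt, capped at an hour.
export function backoffMs(attempts: number, baseMs = 60_000, capMs = 3_600_000): number {
  return Math.min(capMs, baseMs * 2 ** attempts)
}

// New run_at value after a failed attempt.
export function nextRunAt(now: Date, attempts: number): Date {
  return new Date(now.getTime() + backoffMs(attempts))
}
```

On failure the worker runs `UPDATE jobs SET attempts = attempts + 1, run_at = $1, last_error = $2` with the computed timestamp.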

This is covered in more depth in the [INTERNAL LINK: nextjs-supabase-background-jobs-async-patterns] guide.

Stage 4: 25,000 to 50,000 Users (Scaling Writes)#

Writes are harder to scale than reads. You cannot cache them, and a read replica does not help. You have to make each write cheaper or batch them.

Audit your write amplification#

Check pg_stat_user_tables for your application's write pattern:

SELECT
  relname as table,
  n_tup_ins as inserts,
  n_tup_upd as updates,
  n_tup_del as deletes,
  seq_scan as sequential_scans,
  idx_scan as index_scans
FROM pg_stat_user_tables
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC;

Any table with 10x more writes than your active user count has a write amplification bug. Common causes:

  • An UPDATE on every page view (recording "last seen" per session)
  • A denormalized counter updated inline instead of batched
  • Analytics events written directly to a relational table

Move high-write data to a denormalized staging table and batch-flush every minute.
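One way to sketch that batch-flush is an in-memory accumulator whose `flushFn` stands in for your actual UPDATE statement (names here are illustrative):

```typescript
// Accumulate counter increments in memory, then flush once per interval as
// one write per key instead of one write per event.
export class CounterBatch {
  private counts = new Map<string, number>()

  constructor(private flushFn: (counts: Map<string, number>) => void) {}

  increment(key: string, by = 1) {
    this.counts.set(key, (this.counts.get(key) ?? 0) + by)
  }

  flush() {
    if (this.counts.size === 0) return // nothing pending: skip the write
    this.flushFn(this.counts)
    this.counts = new Map()
  }
}
```

A `setInterval` (or a per-request hook in a long-lived worker) calls `flush()` every minute; 10,000 page views become a handful of UPDATEs. Note this trades durability for throughput — a crash loses the unflushed counts, which is usually acceptable for view counters.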

Use triggers instead of application-level denormalization#

If you maintain a post_count on a profiles table, doing it in application code means every post insert issues two queries and two network round-trips. A trigger does it in one:

CREATE OR REPLACE FUNCTION update_post_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE profiles SET post_count = post_count + 1 WHERE id = NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE profiles SET post_count = post_count - 1 WHERE id = OLD.user_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_post_count();

Triggers are fast, transactional, and free of network overhead. The trade-off is that they are harder to reason about — every trigger should be documented inline.

Move realtime off the primary#

Supabase Realtime uses logical replication from your primary database. At 50K users, a chatty realtime feed can consume 20% of your database CPU.

Options:

  1. Filter realtime subscriptions strictly — never subscribe to a whole table, always filter by user or org
  2. Use a separate schema for high-volume realtime data — keeps the logical replication slot smaller
  3. Move truly high-volume realtime to Pusher or Ably — Supabase realtime is great for product events, not for arcade games

Stage 5: 50,000 to 100,000 Users (Cost Controls)#

At this point performance is largely solved. The new enemy is cost. Without controls, your monthly bill doubles every quarter while user count grows 20%.

Audit egress first#

Vercel charges $0.15 per GB of data transfer out. Supabase charges $0.09 per GB for storage egress. At 100K users, a 200 KB JSON response per request adds up to thousands of dollars per month.

Run this in Supabase to find the chattiest endpoints:

SELECT
  query,
  calls,
  total_exec_time,
  (rows / GREATEST(calls, 1))::int as avg_rows_per_call
FROM pg_stat_statements
WHERE rows > 1000
ORDER BY rows DESC
LIMIT 20;

Any query returning more than 100 rows per call is probably paginated incorrectly. Fix those first.

Compress images before storage#

next/image auto-optimizes images on delivery, but if you accept user-generated uploads, stored size matters too. A 10 MB phone photo is stored at 10 MB, served at 10 MB from Supabase Storage to the next/image optimizer, and only then compressed for users — you pay for the full size at every hop before the last one.

Pre-compress on upload:

import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
)

// compressImage wraps your client-side compressor of choice
async function uploadImage(file: File, userId: string) {
  const compressed = await compressImage(file, { maxWidth: 2400, quality: 0.85 })
  const { data } = await supabase.storage
    .from('user-uploads')
    .upload(`${userId}/${file.name}`, compressed) // path scheme is illustrative
  return data
}

A single browser-side compression pass (using compressorjs or browser-image-compression) usually cuts storage and egress by 80% with no visible quality loss.
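The dimension math behind that pass is simple enough to sketch — the maxWidth default mirrors the snippet above, and the function name is hypothetical:

```typescript
// Target dimensions for client-side compression: cap width at maxWidth,
// preserve aspect ratio, and never upscale a smaller image.
export function targetSize(width: number, height: number, maxWidth = 2400) {
  if (width <= maxWidth) return { width, height } // already small enough
  const scale = maxWidth / width
  return { width: maxWidth, height: Math.round(height * scale) }
}
```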

Cap the free tier of your product#

Free users cost money. At 100K users, if 95% are on a free plan, you have 95K cost centers and 5K revenue centers.

Add hard limits on the free plan:

  • 100 API calls per day
  • 1 GB storage
  • 10 team members

If a free user needs more, they convert to paid. If they do not convert, their growth is capped. Either outcome is better than a free user consuming $2 of infrastructure per month.
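Enforcing the call cap can be a pure check at the top of each handler. A sketch, assuming usage counts are tracked elsewhere (the limits match the illustrative list above):

```typescript
// Per-plan daily API call limits; Infinity means uncapped.
const DAILY_LIMITS: Record<string, number> = { free: 100, pro: Infinity }

// Allow the request only while today's usage is under the plan's cap.
// Unknown plans are denied by default — fail closed.
export function allowRequest(plan: string, callsToday: number): boolean {
  return callsToday < (DAILY_LIMITS[plan] ?? 0)
}
```

The handler increments a per-user counter (a Postgres row or an Upstash key with a daily TTL) and returns 429 when `allowRequest` is false.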

Move cold data to cheaper storage#

Logs, analytics, old webhook payloads — none of these need to live in your primary Postgres at $0.125 per GB-month.

Options:

  1. Partition tables by date — drop old partitions instead of deleting rows
  2. Archive to Supabase Storage as parquet — $0.021 per GB-month, queryable via DuckDB
  3. Ship to BigQuery or Athena — for real analytical workloads, 10-100x cheaper per query than Postgres

Architectural Patterns You Will Need#

The Two-Tier Cache#

At 50K+ users, one cache is not enough:

  1. Edge cache (Vercel CDN) — for HTML and public API responses, 60-600s TTL
  2. Application cache (unstable_cache or Upstash) — for expensive queries, 30-300s TTL

Requests hit the edge first. If they bypass the edge (authenticated users, fresh data), they hit the application cache. Only then do they touch the database.

The Read Replica Split#

On Supabase Team plan and above, you can add a read replica. Use it for:

  • Analytics queries (long-running, read-only)
  • Background job reads (batch processing)
  • Full-text search (can lag slightly)

Keep on the primary:

  • User-facing reads where consistency matters
  • Writes (obviously)
  • Session lookups

// lib/supabase/replica.ts
import { createClient } from '@supabase/supabase-js'

export const replicaClient = createClient(
  process.env.SUPABASE_REPLICA_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

The Job Queue#

Everything that is not a user-facing request should go through a queue:

  • Emails
  • Webhooks
  • Analytics events
  • Third-party API calls
  • Long-running computations
  • Scheduled tasks

Your user-facing endpoints stay fast because they are only responsible for enqueueing. The queue worker can fail, retry, and backoff without the user noticing.

The Feature Flag Gate#

At 100K users, any deploy that touches a hot path is a risk. Feature flags let you roll out gradually:

import { getFlag } from '@/lib/flags'

export default async function Page() {
  const useNewAlgorithm = await getFlag('new-ranking-algorithm')
  const posts = useNewAlgorithm
    ? await getPostsV2()
    : await getPostsV1()
  return <PostList posts={posts} />
}

Start at 1%. Watch error rates. Go to 10%, then 50%, then 100%. A regression at 1% affects 1,000 users, not 100,000.
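A common way to implement the percentage gate is a deterministic hash of the user id into a 0-99 bucket. This sketch uses a toy string hash for illustration — real flag systems typically use murmur or similar:

```typescript
// Deterministic rollout: the same user always lands in the same bucket,
// so they stay in (or out of) the rollout across requests and deploys.
export function inRollout(userId: string, percent: number): boolean {
  let hash = 0
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0 // simple 32-bit string hash
  }
  return hash % 100 < percent
}
```

Raising the rollout from 1% to 10% only adds users; everyone already in the rollout stays in it, which keeps the experience stable while you ramp.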

The Cost Curve#

Real numbers from three apps I have shipped to 100K+ users:

| Stage | Vercel | Supabase | Other | Total / mo |
|-------|--------|----------|-------|------------|
| 0-1K | $20 | $25 | $10 | $55 |
| 1K-10K | $60 | $25 | $25 | $110 |
| 10K-25K | $120 | $100 | $50 | $270 |
| 25K-50K | $250 | $200 | $120 | $570 |
| 50K-100K | $450 | $400 | $250 | $1,100 |

"Other" includes email (Resend), logs (Axiom), error tracking (Sentry), analytics (Plausible or similar), and KV cache (Upstash).

Your numbers will vary by feature set. A chat app is 3x more expensive than a blog. A video app is 10x more expensive than a chat app.

What to Do When You Hit a Wall#

Not every scaling problem has a fix in this playbook. When you hit a wall:

  1. Reproduce on a smaller scale first. If you cannot cause the bug on a 1K-user staging environment, you do not understand it yet.
  2. Use EXPLAIN ANALYZE before you believe any theory. The query plan is ground truth. Your mental model is a guess.
  3. Do not reach for a rewrite. A rewrite at 100K users is a six-month risk. A tactical fix is a three-day risk.
  4. Talk to the Supabase team. On Pro+ plans, they will look at your slow query logs. This is worth more than most things you can do yourself.
  5. Profile in production, not staging. Staging load is never realistic. Enable pg_stat_statements on prod and just look.

Further Reading#

  • [INTERNAL LINK: nextjs-supabase-performance-optimization-2026] — specific Next.js + Supabase perf wins
  • [INTERNAL LINK: nextjs-supabase-caching-strategies] — deep dive into the caching layers
  • [INTERNAL LINK: nextjs-supabase-database-design-optimization] — Postgres schema patterns for scale
  • [INTERNAL LINK: supabase-connection-pooling-vercel] — pgBouncer specifics
  • [INTERNAL LINK: nextjs-supabase-background-jobs-async-patterns] — the queue pattern in detail
  • [INTERNAL LINK: nextjs-supabase-production-launch-checklist] — the pre-flight checklist for launch

Closing Thoughts#

There is no single scaling trick. There is a sequence of small, incremental fixes, each applied at the right time.

The teams that succeed at scaling are not the ones who over-engineer early. They are the ones who ship fast, measure ruthlessly, and fix bottlenecks one at a time in the order they appear.

Save this playbook. Come back to it at 1K, 10K, 25K, 50K, and 100K users. The problems at each stage are surprisingly consistent, and the fixes are almost always already in this guide.

Your first scaling problem is six months away. Keep shipping.
