Caching Strategies for Fast Web Apps
Caching is the highest-leverage performance work in most web apps. A clear map of the layers, the patterns, and the invalidation traps.
The fastest query is the one you never run
Caching strategy is the highest-leverage performance work available to most web applications, because the cheapest request is the one your database and your origin server never see. But caching done carelessly serves stale data, leaks one user's information to another, or creates bugs that only appear under load. The skill is knowing what to cache, where, and — hardest of all — when to throw it away.
Think in layers. A request passes through several places where a cached answer can short-circuit the work, and each layer has different rules.
The layers, from browser to database
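- Browser cache. Controlled by HTTP headers. Cache-Control tells the browser how long it may reuse an asset without asking the server. ETag makes the asking cheap when it does happen: the browser revalidates, and an unchanged asset comes back as a 304 with no body. Versioned filenames (a content hash in the name) let you cache static assets for a year and bust them by changing the URL.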
- CDN / edge cache. A shared cache near the user. Excellent for static assets and public, non-personalised pages. The win is both speed and offloading traffic from your origin entirely.
- Application cache. An in-memory store like Redis holding computed results, rendered fragments, or session data. This is where most dynamic-content caching lives.
- Database cache. Materialised views and the database's own buffer cache. Useful for expensive aggregations that change infrequently.
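For the browser layer, the header choice can be sketched roughly as follows. This is a minimal illustration, not a production policy: the `cache_headers` function and the `app.3f2a9c1b.js` naming scheme are assumptions made for the example, and the ETag is simply a truncated hash of the body.

```python
import hashlib
import re

# Hypothetical naming scheme: app.3f2a9c1b.js (8+ hex chars before the extension).
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.[a-z0-9]+$")

ONE_YEAR = 31536000  # seconds


def cache_headers(path: str, body: bytes) -> dict[str, str]:
    """Pick caching headers for a static file (illustrative policy only)."""
    if HASHED_NAME.search(path):
        # The URL changes whenever the content does, so the browser may
        # reuse this response for a year without contacting the server.
        return {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}
    # Unversioned file: force revalidation on each use, and send an ETag so
    # an unchanged file can be answered with a cheap 304 Not Modified.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"Cache-Control": "no-cache", "ETag": etag}
```

Note that `no-cache` does not mean "do not store": the browser may keep the file but must check with the server before reusing it.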
Patterns worth knowing by name
A few caching patterns cover the overwhelming majority of cases:
- Cache-aside (lazy loading). The application checks the cache; on a miss it reads the source, stores the result, and returns it. Simple, robust, and the default choice. The cost is a slightly slower first request after expiry.
- Write-through. Writes go to the cache and the database together, so the cache stays current as long as every write takes that path. Good when reads vastly outnumber writes and freshness matters.
- Time-based expiry (TTL). Let entries expire after a fixed window. The right TTL is a business decision: a product price might tolerate seconds, a help article might tolerate a day.
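The three patterns above fit in a few lines each. The sketch below uses a small in-process `TTLCache` class as a stand-in for a store such as Redis; the class, `get_or_load`, and `write_through` are hypothetical names chosen for illustration, and a plain dict plays the part of the database.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for a store such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_or_load(cache: TTLCache, key: str, load: Callable[[], Any], ttl: float) -> Any:
    """Cache-aside: check the cache; on a miss read the source and store it."""
    value = cache.get(key)
    if value is None:  # simplification: a stored None is treated as a miss
        value = load()
        cache.set(key, value, ttl)
    return value


def write_through(cache: TTLCache, db: dict, key: str, value: Any, ttl: float) -> None:
    """Write-through: update the source of truth and the cache together."""
    db[key] = value
    cache.set(key, value, ttl)
```

The injectable `clock` is there only so expiry can be exercised without sleeping; a real deployment would rely on the store's own TTL handling.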
Invalidation is the hard part
There is an old joke that the two hard problems in computing are naming things, cache invalidation, and off-by-one errors. The joke endures because invalidation really is where caching bites back.
The core tension: cache too aggressively and users see stale data; cache too timidly and you lose the benefit. Practical defences we rely on:
- Key your cache by everything that changes the answer. If a response depends on the user, the locale, and a feature flag, all three belong in the cache key. The most dangerous caching bug is serving a personalised page from a shared cache — one user seeing another's data.
- Prefer short TTLs over manual invalidation when you can tolerate slight staleness. A 60-second TTL is far simpler to reason about than a web of explicit cache-clear calls scattered through your write paths.
- Invalidate on write for data that must be fresh. When an order status changes, clear that order's cache entry in the same operation, not on a timer.
- Never cache anything user-specific at the CDN edge unless the cache key isolates it by user. This rule has no exceptions.
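Two of these defences are easy to show in code: building a key from every input that changes the answer, and clearing an entry in the same operation as the write. This is a sketch under stated assumptions: `cache_key` and `update_order_status` are hypothetical helpers, and plain dicts stand in for both the database and the cache.

```python
def cache_key(resource: str, /, **varies_on: object) -> str:
    """Build a key that includes every input that changes the answer."""
    # Sort by name so the same inputs always produce the same key,
    # whatever order the caller passes them in.
    parts = [f"{name}={varies_on[name]!r}" for name in sorted(varies_on)]
    return f"{resource}:" + "|".join(parts)


def update_order_status(db: dict, cache: dict, order_id: int, status: str) -> None:
    """Invalidate on write: change the record and drop its cache entry together."""
    db[order_id] = status
    cache.pop(f"order:{order_id}", None)  # the next read repopulates the entry
```

For example, `cache_key("dashboard", user_id=42, locale="en-IN", new_ui=True)` and the same call with `user_id=43` yield different keys, so one user's page can never be returned for another.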
What not to cache
Caching is not free and not always right. Skip it for data that changes on nearly every read, for cheap queries where the cache lookup costs as much as the query, and for anything where a stale answer causes real harm — a payment balance, an inventory count at checkout, an authorisation decision. Reaching for a cache to paper over a slow query is often the wrong move; sometimes the right fix is an index, not a cache.
Measure before you cache and after. A cache that does not improve a metric you care about is just added complexity and a new way to serve wrong data.
The stampede that takes down your origin
One failure mode deserves its own warning because it tends to strike at the worst moment. When a popular cached entry expires, every in-flight request misses at once and they all rush the database together to recompute the same value. Under high traffic this thundering herd can overwhelm the very origin the cache was protecting, and it happens precisely when you are busiest. Three defences help, and they combine well:
- Serve stale while you revalidate. Return the slightly expired value and refresh it in the background, so users never wait on the recompute and the origin sees one request, not thousands.
- Let one request rebuild, not all of them. A short lock around the recompute means a single caller refreshes the entry while the rest briefly serve stale or wait, instead of stampeding in parallel.
- Spread expiry with jitter. Add a small random offset to TTLs so a batch of entries cached together does not all expire on the same second.
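The lock and the jitter can be sketched together in a single process. The `SingleFlightCache` class and `jittered_ttl` helper are hypothetical names for this example, and in this variant the other callers wait for the rebuild instead of serving stale. Across several application servers the per-key lock would have to live in a shared store, which this sketch does not attempt.

```python
import random
import threading
import time
from typing import Any, Callable


def jittered_ttl(base: float, spread: float = 0.1) -> float:
    """Shift a TTL by up to +/- spread (a fraction of base)."""
    return base * (1 + random.uniform(-spread, spread))


class SingleFlightCache:
    """In-process cache where only one caller rebuilds an expired key."""

    def __init__(self) -> None:
        self._data: dict = {}   # key -> (expires_at, value)
        self._locks: dict = {}  # key -> lock guarding that key's rebuild
        self._guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str):
        entry = self._data.get(key)
        if entry is not None and time.monotonic() < entry[0]:
            return entry
        return None

    def get(self, key: str, load: Callable[[], Any], ttl: float) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        with self._lock_for(key):
            # Re-check: another caller may have rebuilt while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[1]
            value = load()
            self._data[key] = (time.monotonic() + jittered_ttl(ttl), value)
            return value
```

The re-check inside the lock is what turns many concurrent misses into one recompute: callers that queued behind the first one find a fresh entry and return without touching the origin.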
A sane default stack
For a typical web app we start simple: long-lived browser caching on versioned static assets, a CDN in front of public pages and assets, and Redis with cache-aside plus modest TTLs for expensive dynamic results. That covers most of the gain with little risk, and you add write-through, stale-while-revalidate, or explicit invalidation only where a specific requirement demands it.
A few operational notes keep that stack healthy. Give Redis a memory limit and an eviction policy so it sheds the least-useful keys instead of falling over when full. Treat the cache as disposable: if it vanishes, the application must still work, only slower, because a cache you cannot lose has quietly become a database without the durability guarantees. And put the cache hit rate on a dashboard — a ratio that drifts downward is the early sign that your keys or TTLs have stopped matching how the app is actually used.
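The hit rate itself is just hits divided by total lookups. Redis reports `keyspace_hits` and `keyspace_misses` in its `INFO stats` output, so in practice the ratio can be read from the server; the small counter below, with a hypothetical `HitRate` class, only shows the arithmetic for an application-level cache.

```python
from dataclasses import dataclass


@dataclass
class HitRate:
    """Running hit/miss counter for one cache."""

    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def ratio(self) -> float:
        total = self.hits + self.misses
        # Report 0.0 before any lookups rather than dividing by zero.
        return self.hits / total if total else 0.0
```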
How BSH can help
BSH Technologies builds fast, scalable web applications for clients in India and worldwide, and caching is one of the first levers we reach for when an app needs to handle more load without a bigger bill. We will map your request path, place caches where they earn their keep, and get the invalidation right so speed never comes at the cost of correctness. Get in touch when performance matters.