Serverless: When It Fits and When It Doesn't
A senior look at where serverless functions genuinely win, where they quietly cost you, and the workloads to keep on long-running servers.
Serverless is a billing model, not a magic wand
Serverless computing changes one thing fundamentally: you pay for execution time, not for idle capacity. That single shift decides almost everything about whether it fits your workload. When traffic is spiky and unpredictable, you win big. When it is steady and high, you often pay a premium for the privilege of not managing servers you could have run cheaply anyway.
We deploy serverless on GCP Cloud Functions, Cloud Run, and AWS Lambda for clients every month, and the pattern is consistent: the technology is excellent, but the decision is economic and operational, not technical.
Where it genuinely wins
There are workloads where serverless is close to the obvious right answer:
- Event-driven glue. Resize an image on upload, send a webhook, process a queue message. These run rarely, finish fast, and scale to zero between events.
- Spiky or seasonal traffic. A registration portal that sees ten requests most of the year and ten thousand on launch day. Paying per request beats provisioning for a peak you hit twice.
- Internal tools and cron jobs. Scheduled reports, nightly syncs, admin endpoints. Nobody wants a server sitting idle 23 hours a day for a job that runs once.
- Early-stage products. When you do not yet know your traffic shape, scale-to-zero keeps the bill near nothing while you find product-market fit.
Where it quietly costs you
The failure modes are rarely loud. They show up as a slow creep in latency, cost, or debugging time:
- Cold starts on the critical path. A function that has scaled to zero pays a startup penalty on the next request: often hundreds of milliseconds, sometimes seconds for heavy runtimes. For a background job, nobody notices. For a user-facing API, the penalty lands squarely in 95th-percentile latency, and that is a real problem.
- Sustained high throughput. Past a certain steady request volume, a right-sized container or VM running continuously is simply cheaper per request than per-invocation billing. We have moved clients off functions and onto Cloud Run or a small autoscaling group precisely when their traffic stopped being spiky.
- Long-running and stateful work. Functions have execution time limits and no durable local state. Video transcoding, long ETL, WebSocket sessions, and anything holding a connection open fight the model rather than fit it.
- Database connection storms. A thousand concurrent function instances each opening a Postgres connection will exhaust the database's connection limit fast. You end up adding a connection proxy, which is more infrastructure to operate: the very thing serverless promised to remove.
A decision checklist we actually use
Before recommending serverless to a client, we walk through a short set of questions. If most answers point the same way, the decision makes itself:
- Is traffic spiky, low-average, or genuinely unpredictable? Serverless leans favourable.
- Does each unit of work finish in seconds, not minutes? Favourable.
- Is the work stateless between invocations? Favourable.
- Is steady throughput high and predictable? Lean toward containers.
- Do you need sub-100ms tail latency with no cold-start tolerance? Lean toward always-warm containers, or pay for provisioned concurrency.
The honest answer for most real systems is a mix: serverless for the spiky edges and background jobs, long-running services for the hot path. Purity is not a virtue here.
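The checklist can be written down as a small tally. This is only a sketch: the `Workload` fields and the thresholds (four or more of five answers favouring serverless, one or fewer favouring it for containers) are assumptions chosen for illustration, not a rule taken from real engagements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    spiky_or_unpredictable: bool
    finishes_in_seconds: bool
    stateless: bool
    steady_high_throughput: bool
    needs_sub_100ms_tail: bool


def recommend(w: Workload) -> str:
    """Count answers that favour serverless; a split result means a mix."""
    votes = [
        w.spiky_or_unpredictable,
        w.finishes_in_seconds,
        w.stateless,
        not w.steady_high_throughput,  # steady high load leans to containers
        not w.needs_sub_100ms_tail,    # strict tail latency leans to containers
    ]
    serverless_votes = sum(votes)
    if serverless_votes >= 4:  # assumed threshold for "most answers"
        return "serverless"
    if serverless_votes <= 1:
        return "containers"
    return "mix"
```

A workload that is stateless and short-lived but also carries steady high throughput and a strict latency budget comes out as "mix", which matches how most real systems end up.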
The operational reality nobody mentions
Serverless removes server management but adds its own operational surface. Observability is harder when execution is ephemeral: you need structured logging and distributed tracing from day one, because you cannot SSH into a function to see what happened. Local development needs emulation. Cold starts need monitoring as a first-class metric. And vendor primitives differ enough that a Lambda function is not a drop-in replacement for a Cloud Function. None of this is disqualifying. It just means serverless is a trade, not a free lunch, and you should go in knowing which costs you are accepting.
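As a rough sketch of what structured logging with a cold-start metric can look like, the example below emits one JSON object per log line using only the Python standard library. The field names (`trace_id`, `cold_start`, `duration_ms`), the logger name, and the `handler` signature are assumptions for illustration; a real deployment would follow whatever schema its log backend expects.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
            "cold_start": getattr(record, "cold_start", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(payload)


def build_logger(stream=sys.stdout):
    logger = logging.getLogger("fn")
    logger.handlers.clear()
    stream_handler = logging.StreamHandler(stream)
    stream_handler.setFormatter(JsonFormatter())
    logger.addHandler(stream_handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


_cold = True  # True only for the first invocation on a fresh instance


def handler(event, logger):
    """Hypothetical entry point that logs one structured line per call."""
    global _cold
    started = time.perf_counter()
    was_cold, _cold = _cold, False
    # ... real work would go here ...
    logger.info(
        "handled",
        extra={
            "trace_id": event.get("trace_id"),
            "cold_start": was_cold,
            "duration_ms": round((time.perf_counter() - started) * 1000, 3),
        },
    )
    return was_cold
```

Because every line carries `cold_start` and `duration_ms`, the cold-start rate and its latency cost can be charted from logs alone, without access to the instance.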
Cost modelling that survives contact with reality
The pricing pitch (pay only for what you use) is true and also incomplete. Per-invocation billing combines request count, allocated memory, and execution duration, and the interactions surprise people. Doubling a function's memory to cut cold starts also doubles the per-millisecond rate, so unless execution time falls to match, a change made for latency quietly raises cost. Functions that fan out to many downstream calls bill for the whole wall-clock wait, including time spent idle on a slow database. We have seen the bill for a tidy-looking function balloon simply because the work inside it spent most of its time waiting on something else.
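The interaction between memory, duration, and request count is easier to see as arithmetic. The sketch below assumes a typical per-request plus per-GB-second pricing shape; the two default prices are illustrative placeholders, not a quote from any provider's current price list, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(requests, avg_duration_ms, memory_mb,
                 price_per_million_requests=0.20,
                 price_per_gb_second=0.0000166667):
    """Estimate a month's bill in dollars. Prices are placeholders."""
    # Billed compute is wall-clock duration times allocated memory,
    # whether the function is computing or waiting on a dependency.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_fee = requests / 1_000_000 * price_per_million_requests
    return request_fee + gb_seconds * price_per_gb_second


base = monthly_cost(10_000_000, avg_duration_ms=200, memory_mb=512)
more_memory = monthly_cost(10_000_000, avg_duration_ms=200, memory_mb=1024)
slow_dependency = monthly_cost(10_000_000, avg_duration_ms=800, memory_mb=512)
```

With duration held constant, `more_memory` has exactly twice the compute charge of `base`, and `slow_dependency` quadruples it without the function doing any additional work.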
The discipline that keeps this honest is modelling cost at your realistic request volume before you commit, then watching the actual bill against that model for the first month. Two numbers decide most architectures: requests per month and average duration. Plug in pessimistic values, compare against the equivalent always-on container, and let the spreadsheet — not the trend — make the call. When the volume is genuinely spiky, serverless usually wins by a wide margin. When it is not, the gap narrows or reverses, and you want to know that before launch, not after the invoice arrives.
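A minimal version of that spreadsheet fits in a few lines. The sketch below finds the monthly request volume at which per-invocation billing matches a flat container bill. All prices, the 30-dollar container figure used in the usage note, and the function names are assumptions for illustration; substitute your provider's real rates and your own pessimistic duration.

```python
def per_request_cost(avg_duration_ms, memory_mb,
                     price_per_million_requests=0.20,
                     price_per_gb_second=0.0000166667):
    """Dollar cost of one invocation. Prices are placeholders."""
    compute = (avg_duration_ms / 1000) * (memory_mb / 1024) * price_per_gb_second
    return price_per_million_requests / 1_000_000 + compute


def break_even_requests(container_monthly, avg_duration_ms, memory_mb):
    """Requests per month at which both options cost the same."""
    return container_monthly / per_request_cost(avg_duration_ms, memory_mb)


def cheaper_option(requests, container_monthly, avg_duration_ms, memory_mb):
    """Compare the serverless bill against a flat always-on container bill."""
    serverless = requests * per_request_cost(avg_duration_ms, memory_mb)
    return "serverless" if serverless < container_monthly else "container"
```

For example, `break_even_requests(30.0, 200, 512)` gives the volume above which a hypothetical 30-dollar-a-month container wins. Note that this ignores the operational cost of running the container, so treat the result as a floor for the comparison rather than the whole answer.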
How BSH can help
From our base in Thrissur, Kerala, we build cloud systems on both GCP and AWS for clients worldwide, and we have no incentive to push you toward a buzzword. If you are weighing serverless against containers, we will map your actual traffic shape, model the cost both ways, and architect the mix that fits — then build and operate it. Talk to BSH Technologies when you want the trade-off made on evidence, not hype.