I’m building a WhatsApp productivity bot.
It tracks screen time, sends hourly nudges, asks you to log what you did, then generates a monthly AI “growth report” using an LLM.
Simple idea. But I know the LLM + messaging combo can get expensive and messy fast.
I’m trying to think like someone who actually wants this to survive at scale, not just ship a cute MVP.
Main concerns:
- Concurrency. What happens when 5k users reply at the same time?
- Inference. Do you queue everything? Async workers? Batch LLM calls?
- Cost. Are you summarizing daily to compress memory so you’re not passing huge context every month?
- WhatsApp rate limits. What breaks first?
- Multi-user isolation. How do you avoid context bleeding?
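To make the cost and isolation points concrete, here's a minimal sketch of the "summarize daily" idea: roll each user's raw logs into one short digest, keyed per user per day so contexts can't bleed. `summarize_with_llm` is a stand-in stub, not a real API — swap in your actual LLM call.

```python
def summarize_with_llm(prompt: str) -> str:
    # Stand-in for a real LLM call. The payoff: a monthly report reads
    # ~30 one-line digests instead of thousands of raw log messages.
    return "summary: " + prompt.splitlines()[1][:40]

def compress_day(store: dict, user_id: str, day: str) -> str:
    key = (user_id, day)           # per-user, per-day key -> no context bleed
    raw = store.get(key, [])
    prompt = "Summarize in 3 bullets:\n" + "\n".join(raw)
    summary = summarize_with_llm(prompt)
    store[key] = [summary]         # raw logs dropped, only the digest stays
    return summary
```

Run this as a nightly job and the monthly report prompt stays roughly constant-size per user no matter how chatty they are.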
Rough flow in my head:
Webhook → queue → worker → DB → LLM if needed → respond.
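That flow can be sketched with stdlib primitives (a real deployment would use Celery/SQS/Redis, but the shape is the same). `handle_message` is a placeholder for the DB write + optional LLM call + WhatsApp reply; the key property is that the webhook only enqueues and acks, so a burst of 5k replies queues up instead of fanning out into 5k concurrent LLM calls.

```python
import queue
import threading

inbox = queue.Queue()
results = []

def handle_message(msg: dict) -> None:
    # Placeholder: DB write -> LLM if needed -> send reply via WhatsApp API.
    results.append(f"processed {msg['user_id']}: {msg['text']}")

def worker() -> None:
    while True:
        msg = inbox.get()
        if msg is None:            # shutdown sentinel
            break
        handle_message(msg)
        inbox.task_done()

def webhook(msg: dict) -> None:
    # Enqueue and return 200 immediately; never do LLM work in the webhook.
    inbox.put(msg)

threading.Thread(target=worker, daemon=True).start()
```

Backpressure then lives in one place: cap the queue depth and the worker pool size, and the LLM concurrency is bounded by construction.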
For people who’ve actually scaled LLM bots:
What killed you first? Infra? Token bills? Latency?
Tell me what I’m underestimating.
submitted by /u/Dizzy-Watercress-744