Building a WhatsApp AI productivity bot. How do you actually scale this without going broke?

I’m building a WhatsApp productivity bot.

It tracks screen time, sends hourly nudges, asks you to log what you did, then generates a monthly AI “growth report” using an LLM.

Simple idea. But I know the LLM + messaging combo can get expensive and messy fast.

I’m trying to think like someone who actually wants this to survive at scale, not just ship a cute MVP.

Main concerns:

  • Concurrency. What happens when 5k users reply at the same time?
  • Inference. Do you queue everything? Async workers? Batch LLM calls?
  • Cost. Are you summarizing daily to compress memory so you’re not passing huge context every month?
  • WhatsApp rate limits. What breaks first?
  • Multi-user isolation. How do you avoid context bleeding?
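To make the cost + isolation concerns concrete: one common pattern is to key everything by user and roll each day's raw logs into a one-line summary, so the monthly report prompt sees ~30 short lines instead of every raw message. A minimal sketch, assuming in-memory stores and a stubbed summarizer (in a real system these would be DB tables and an LLM call; all names here are hypothetical):

```python
from collections import defaultdict

# Hypothetical in-memory stores; production would use DB tables keyed by user_id.
raw_logs = defaultdict(list)         # user_id -> raw activity entries for today
daily_summaries = defaultdict(list)  # user_id -> one compact line per day

def log_activity(user_id: str, entry: str) -> None:
    """Append a raw entry for one user. Keying by user_id is what
    prevents context bleeding between users."""
    raw_logs[user_id].append(entry)

def summarize_day(user_id: str, day: str) -> str:
    """Collapse the day's raw entries into one short line.
    A real version would call the LLM here; this stub just truncates."""
    entries = raw_logs.pop(user_id, [])
    summary = f"{day}: {len(entries)} entries; " + "; ".join(e[:30] for e in entries[:3])
    daily_summaries[user_id].append(summary)
    return summary

def monthly_context(user_id: str) -> str:
    """The monthly report prompt is built from summaries only,
    so token cost stays roughly flat per user per month."""
    return "\n".join(daily_summaries[user_id])
```

The key trade-off: you pay a small daily summarization cost to avoid a huge monthly context bill, and raw logs can be dropped (or archived cold) once summarized.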

Rough flow in my head:
Webhook → queue → worker → DB → LLM if needed → respond.
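That flow can be sketched with asyncio in a few lines: the webhook handler only enqueues (so WhatsApp gets a fast 200 and retries don't pile up), and a small worker pool drains the queue, hitting the DB/LLM as needed. A toy sketch, with the DB write and LLM call stubbed out:

```python
import asyncio

async def handle_webhook(queue: asyncio.Queue, user_id: str, text: str) -> None:
    # Webhook handler: enqueue and return immediately.
    # Never do LLM work inside the webhook request.
    await queue.put((user_id, text))

async def worker(queue: asyncio.Queue, replies: list) -> None:
    # Worker: drain the queue; DB write + optional LLM call would go here.
    while True:
        item = await queue.get()
        if item is None:  # shutdown sentinel
            queue.task_done()
            break
        user_id, text = item
        replies.append((user_id, f"ack:{user_id}"))  # placeholder reply
        queue.task_done()

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    replies: list = []
    workers = [asyncio.create_task(worker(queue, replies)) for _ in range(3)]
    # Simulate a burst of 10 users replying at once.
    for i in range(10):
        await handle_webhook(queue, f"user{i}", "logged an hour of deep work")
    for _ in workers:          # one sentinel per worker
        await queue.put(None)
    await asyncio.gather(*workers)
    return replies

replies = asyncio.run(main())
```

With this shape, scaling the 5k-replies-at-once case is mostly a matter of worker count and queue backpressure, not webhook capacity.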

For people who’ve actually scaled LLM bots:
What killed you first? Infra? Token bills? Latency?

Tell me what I’m underestimating.

submitted by /u/Dizzy-Watercress-744