I’m building a WhatsApp productivity bot.
It tracks screen time, sends hourly nudges, asks you to log what you did, then generates a monthly AI “growth report” using an LLM.
Simple idea. But I know the LLM + messaging combo can get expensive and messy fast.
I’m trying to think like someone who actually wants this to survive at scale, not just ship a cute MVP.
Main concerns:
- Concurrency. What happens when 5k users reply at the same time?
- Inference. Do you queue everything? Async workers? Batch LLM calls?
- Cost. Are you summarizing daily to compress memory so you’re not passing huge context every month?
- WhatsApp rate limits. What breaks first?
- Multi-user isolation. How do you avoid context bleeding?
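To make the cost and isolation points concrete, here's a minimal sketch of the "summarize daily" idea: roll each user's raw logs into one short digest, keyed per user per day so contexts can't bleed. `summarize_with_llm` is a stand-in stub, not a real API — swap in your actual LLM call.

```python
def summarize_with_llm(prompt: str) -> str:
    # Stand-in for a real LLM call. The payoff: a monthly report reads
    # ~30 one-line digests instead of thousands of raw log messages.
    return "summary: " + prompt.splitlines()[1][:40]

def compress_day(store: dict, user_id: str, day: str) -> str:
    key = (user_id, day)           # per-user, per-day key -> no context bleed
    raw = store.get(key, [])
    prompt = "Summarize in 3 bullets:\n" + "\n".join(raw)
    summary = summarize_with_llm(prompt)
    store[key] = [summary]         # raw logs dropped, only the digest stays
    return summary
```

Run this as a nightly job and the monthly report prompt stays roughly constant-size per user no matter how chatty they are.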
Rough flow in my head:
Webhook → queue → worker → DB → LLM if needed → respond.
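That flow can be sketched with stdlib primitives (a real deployment would use Celery/SQS/Redis, but the shape is the same). `handle_message` is a placeholder for the DB write + optional LLM call + WhatsApp reply; the key property is that the webhook only enqueues and acks, so a burst of 5k replies queues up instead of fanning out into 5k concurrent LLM calls.

```python
import queue
import threading

inbox = queue.Queue()
results = []

def handle_message(msg: dict) -> None:
    # Placeholder: DB write -> LLM if needed -> send reply via WhatsApp API.
    results.append(f"processed {msg['user_id']}: {msg['text']}")

def worker() -> None:
    while True:
        msg = inbox.get()
        if msg is None:            # shutdown sentinel
            break
        handle_message(msg)
        inbox.task_done()

def webhook(msg: dict) -> None:
    # Enqueue and return 200 immediately; never do LLM work in the webhook.
    inbox.put(msg)

threading.Thread(target=worker, daemon=True).start()
```

Backpressure then lives in one place: cap the queue depth and the worker pool size, and the LLM concurrency is bounded by construction.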
For people who’ve actually scaled LLM bots:
What killed you first? Infra? Token bills? Latency?
Tell me what I’m underestimating.
submitted by /u/Dizzy-Watercress-744