

How do large AI chatbots/companions manage LLM costs at scale?

Written by Franz Malten Buemann

I’ve been looking at multiple repos for memory, intent detection, and classification, and most of them rely heavily on LLM API calls. Based on rough calculations, self-hosting a 10B-parameter LLM for 10k users making ~50 calls/day each would cost around $90k/month (~$9/user/month). Clearly, that’s not practical at scale.
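To make the arithmetic behind these figures explicit, here is a small sketch that takes the post's own rough numbers (10k users, ~50 calls per user per day, ~$90k/month) and derives call volume and unit costs. The 30-day month and the function name `cost_breakdown` are assumptions for illustration, not part of the original estimate.

```python
def cost_breakdown(
    users: int,
    calls_per_user_per_day: int,
    monthly_cost_usd: float,
    days_per_month: int = 30,  # assumption: a 30-day month
) -> dict[str, float]:
    """Derive call volume and unit costs from a fixed monthly infrastructure bill."""
    calls_per_day = users * calls_per_user_per_day
    calls_per_month = calls_per_day * days_per_month
    return {
        "calls_per_day": calls_per_day,
        "calls_per_month": calls_per_month,
        "cost_per_user": monthly_cost_usd / users,            # USD per user per month
        "cost_per_call": monthly_cost_usd / calls_per_month,  # USD per LLM call
    }


# The post's rough figures.
b = cost_breakdown(users=10_000, calls_per_user_per_day=50, monthly_cost_usd=90_000)
print(f"calls/day:       {b['calls_per_day']:,}")    # 500,000
print(f"calls/month:     {b['calls_per_month']:,}")  # 15,000,000
print(f"cost/user/month: ${b['cost_per_user']:.2f}")  # $9.00
print(f"cost/call:       ${b['cost_per_call']:.4f}")  # $0.0060
```

Framed per call (about $6 per 1,000 calls under these assumptions), the estimate is easier to compare against per-request API pricing or against how many calls a caching layer could avoid.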

There are AI apps with 1M+ users and thousands of daily active users. How are they managing AI infrastructure costs and staying profitable? Are there caching strategies beyond prompt or query caching that I’m missing?
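For reference, the "query caching" mentioned above can be as simple as an exact-match cache in front of the model: normalize the prompt, hash it, and reuse a stored response while it is still fresh. The sketch below assumes responses can safely be reused for identical normalized prompts; `QueryCache`, `get_or_call`, and `fake_llm` are hypothetical names, and `fake_llm` merely stands in for a real LLM API call.

```python
import hashlib
import time
from typing import Callable


class QueryCache:
    """Hypothetical exact-match response cache with a time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}  # key -> (stored_at, response)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, llm_call: Callable[[str], str]) -> str:
        key = self._key(prompt)
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1  # fresh cached response: no LLM call needed
            return entry[1]
        self.misses += 1  # missing or expired: call the model and store the result
        response = llm_call(prompt)
        self._store[key] = (now, response)
        return response


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call; records each invocation."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = QueryCache(ttl_seconds=60)
cache.get_or_call("What is RAG?", fake_llm)
cache.get_or_call("  what is   rag? ", fake_llm)  # same normalized key -> cache hit
print(len(calls), cache.hits, cache.misses)  # 1 1 1
```

The savings depend entirely on how often users send the same normalized prompt; every hit is one model call that does not count toward the monthly bill.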

Would love to hear insights from anyone with experience handling high-volume LLM workloads.

submitted by /u/rohansarkar


