I am not qualified to make an AI Humanoid Gf, So I make an AI Waifu Gf instead. It’s kind of a digital humanoid, just living in a virtual world instead of physical.

Written by

in

Meet Aiko β€” My AI Anime Waifu

Aiko is an AI chatbot with persistent memory, vision, voice input/output, web-search via simple tool calling, and 3D avatar animation β€” all running on models under 4B parameters. Technically, she could run on a home PC.

I submitted Aiko as my entry for the Hugging Face Build Small Hackathon, and after approval, got some free credits from their sponsors. So right now, all models run via llama.cpp servers on the Modal platform.

The source code is free and fully open source β€” do whatever you want with it. Note: the codebase is currently wired to Hugging Face Spaces (Gradio) and my specific Modal endpoints, so some modifications are needed to port it to a local machine and run on a normal OS.

Why I built her

None of the AI companions or chatbots on the market really suited my taste. What I really wanted was a good TTS that could speak both English and Japanese with a beautiful anime girl voice β€” something I could use to help learn Japanese.

For the memory architecture, I recycled the system I originally built for my AI autonomous robot project (submitted to NVIDIA earlier this year). Figured it’s better to put it to good use than let it sit there doing nothing.

Stack

  • 🧠 Brain: Ministral-3 3B Instruct (GGUF) via llama.cpp
  • πŸ”Š Voice: MioTTS 2.6B, custom-cloned voice (may try FishTTS S2 Pro β€” it supports emotional voices including laughing, angry, surprised, shouting, and even moaning… hehe)
  • πŸŽ™οΈ Ears: faster-whisper large-v3-turbo for speech-to-text (may try Nemotron 3.5 streaming model β€” supposedly faster with streaming support)
  • πŸ‘οΈ Eyes: MiniCPM-V 4.6 (1.3B) for image/video inference via webcam or uploaded images/videos
  • πŸ’Ύ Memory: SQLite-vec vector DB with hybrid semantic + keyword search, using an Ebbinghaus decay curve so frequently-recalled memories persist and unused ones fade after a grace period
  • πŸ” Web Search: Self-hosted SearXNG instance on Modal, called via LLM tool/function calling
  • πŸ”§ Tools: Weather, time/timezone, currency conversion, crypto prices, and anime lookups β€” all via LLM-driven function calling with regex-based intent detection as fallback β€” May try for more complicate tasks in the future
  • 🎭 Avatar: VRoid 3D model rendered via Three.js/three-vrm (using the default model for now β€” may build my own once I figure out 3D modeling and animation)

submitted by /u/Oppa-AI
[link] [comments]