Help with PDF-Based Chatbot and hallucination issues

Hello community,

I’m working on a project aimed at creating a chatbot that uses PDF files as its database. The goal is for the user to be able to ask questions and for the API to provide exact responses extracted from the PDFs. Currently, I’m using the OpenAI API.

My main issue is the model’s hallucinations. I have read in some forums that the OpenAI API tends to hallucinate when it is given many files. I’m looking for solutions to this, and I’d also like to hear from anyone with experience about the best approach to take. I’m not sure whether this is specific to the OpenAI API or a common problem with AI in general. For additional context, I have 50 PDF files, each between 50 and 100 pages long.

Here are my questions:

How can I reduce or eliminate hallucinations when working with multiple PDFs?
What techniques or approaches (e.g., ML or DL) are recommended to improve the chatbot’s accuracy? Specifically, I want it to extract information directly from the PDFs without generating false data.
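
To make the second question concrete, here is a rough sketch of the behavior I’m after: return a passage from the PDFs verbatim along with its source, and say nothing when no passage matches well enough. This is only an illustration under assumptions, not what I currently run. It assumes the PDF text has already been extracted into (source, text) pairs, and the names find_passage, tokenize, and min_overlap, as well as the sample passages, are hypothetical. The word-overlap scoring is a simple stand-in for whatever retrieval method is actually recommended.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_passage(question, passages, min_overlap=0.5):
    """Return the stored passage that best matches the question, verbatim.

    passages: list of (source, text) tuples (hypothetical pre-extracted PDF text).
    Returns (source, text), or None when no passage shares at least
    min_overlap of the question's words, so the bot can answer
    "not found" instead of guessing.
    """
    q_words = tokenize(question)
    if not q_words:
        return None
    best, best_score = None, 0.0
    for source, text in passages:
        # Fraction of the question's words that also appear in the passage.
        score = len(q_words & tokenize(text)) / len(q_words)
        if score > best_score:
            best, best_score = (source, text), score
    return best if best_score >= min_overlap else None


# Made-up sample data standing in for text extracted from the PDFs.
passages = [
    ("manual.pdf p.3", "The warranty period is 24 months from the date of purchase."),
    ("manual.pdf p.7", "Reset the device by holding the power button for 10 seconds."),
]

print(find_passage("What is the warranty period?", passages))
print(find_passage("Who founded the company?", passages))
```

The important part for me is the None case: when nothing in the PDFs matches, I would rather the chatbot refuse than invent an answer.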

Thank you very much in advance

submitted by /u/dze02