Hello community,
I’m working on a project aimed at creating a chatbot that uses PDF files as its database. The goal is for the user to be able to ask questions and for the API to provide exact responses extracted from the PDFs. Currently, I’m using the OpenAI API.
My main issue is the model’s hallucinations. I have read in some forums that the OpenAI API tends to hallucinate when it is given many files. I’m looking for solutions, and for others’ experience of which approach works best. I’m not sure whether this is specific to the OpenAI API or a common problem with AI in general. For additional context, I have 50 PDF files, each between 50 and 100 pages long.
Here are my questions:
- How can I reduce or eliminate hallucinations when working with multiple PDFs?
- What techniques or approaches (e.g. ML or DL) are recommended to improve the chatbot’s accuracy? Specifically, I want it to extract information directly from the PDFs without generating false data caused by hallucinations.
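To make the second question concrete, here is a rough sketch of the kind of setup I am asking about: pick the few pages most relevant to the question, send only those to the model, and refuse when nothing matches. This is only an illustration under my own assumptions, not a tested solution. The page text is assumed to be already extracted from the PDFs, the names (`retrieve`, `build_prompt`, `pages`) and the sample data are made up, and scoring is plain word overlap as a stand-in for whatever a real system would use. The call to the OpenAI API is left out.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def score(question, passage):
    # Word-overlap score: how many question words also occur in the passage.
    q = Counter(tokenize(question))
    p = Counter(tokenize(passage))
    return sum(min(q[w], p[w]) for w in q)

def retrieve(question, pages, top_k=2, min_score=3):
    # Keep the top_k best-scoring pages, dropping any below min_score.
    ranked = sorted(pages, key=lambda pg: score(question, pg["text"]), reverse=True)
    return [pg for pg in ranked[:top_k] if score(question, pg["text"]) >= min_score]

def build_prompt(question, hits):
    # Return None when nothing relevant was found, so the caller can
    # answer "not found" instead of letting the model guess.
    if not hits:
        return None
    context = "\n".join(f'[{h["file"]} p.{h["page"]}] {h["text"]}' for h in hits)
    return (
        "Answer using only the excerpts below and cite the file and page. "
        "If the answer is not in them, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical sample data standing in for text extracted from the PDFs.
pages = [
    {"file": "manual.pdf", "page": 3, "text": "The warranty period is 24 months from purchase."},
    {"file": "manual.pdf", "page": 7, "text": "Clean the filter every two weeks."},
]

hits = retrieve("How long is the warranty period?", pages)
prompt = build_prompt("How long is the warranty period?", hits)
```

The idea is that the model only ever sees a few cited excerpts instead of all 50 files, and an off-topic question produces no prompt at all. Does this direction make sense, or is there a better way?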
Thank you very much in advance
submitted by /u/dze02