I am developing a chatbot for Telegram, in which I scrape content from websites using LangChain's WebBaseLoader.
The problem is that the extracted data is too rough (e.g., one content title gets merged with another), and sometimes an entire page is not useful or the content is in a non-English language.
I need only the actual content, in as clean a format as possible.
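To make the goal concrete, here is a minimal sketch of the kind of cleanup I mean. It uses only Python's standard library `html.parser` rather than LangChain, and every name in it (`BlockTextExtractor`, `html_to_lines`, `looks_english`, `clean_page`), the tag lists, and the 0.9 threshold are hypothetical choices for illustration. The language check is a crude assumption: it only drops lines written mostly in non-Latin scripts, so it would not tell English apart from, say, French.

```python
from html.parser import HTMLParser

# Tags that start or end a separate block of text (assumed list, extend as needed).
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li",
              "div", "section", "article", "br", "tr"}
# Tags whose content is never readable page text.
SKIP_TAGS = {"script", "style", "noscript"}


class BlockTextExtractor(HTMLParser):
    """Collects page text one block per line so adjacent titles do not merge."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished, whitespace-normalized blocks
        self.current = []    # text fragments of the block being built
        self.skip_depth = 0  # greater than 0 while inside a SKIP_TAGS element

    def flush(self):
        # Join the fragments, collapse runs of whitespace, keep non-empty blocks.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_lines(html):
    """Return the visible text of an HTML string as a list of block lines."""
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text that no block tag closed
    return parser.lines


def looks_english(line, min_ascii_ratio=0.9):
    """Crude heuristic: most alphabetic characters are ASCII letters."""
    letters = [ch for ch in line if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def clean_page(html):
    """Split a page into blocks and keep only the ones that pass the heuristic."""
    return [line for line in html_to_lines(html) if looks_english(line)]
```

In this sketch, `clean_page` would be called on the raw HTML of a page before the text goes to the chatbot. Flushing at every block-level tag is what keeps two neighbouring headings on separate lines instead of gluing them together.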
Is there a better way to scrape content from websites?
I found that some APIs provide better content scraping, but I'm a student, so I can't invest in them, and their free tiers are not enough for my purpose either.
Thank you, everybody, in advance ❤️
submitted by /u/ExpressBalance2601