It doesn't work very well with this PDF. #3

Open
datermine opened this issue Jan 23, 2024 · 4 comments

Comments

@datermine

Here's an example of a PDF it doesn't work well with at all:
https://nysirestakes.com/backend/News/news_upload/2023_Breeders_Award_12123_1706.pdf

Sample prompt: What are the headers of the table?

@Nutlope
Owner

Nutlope commented Jan 24, 2024

I appreciate you reporting this! Yeah, I don't think it handles tables well, to be honest, since I pass the whole PDF in as plain text. A feature to implement might be detecting tables and embedding them in a specific format.
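
As a rough illustration of "embedding tables in a specific format", one option is to serialize each detected table as a Markdown pipe table before chunking, so the header row and column boundaries survive as text. This is only a sketch: it assumes the table has already been detected and split into a header and rows by some other step, and `table_to_markdown` is a hypothetical helper, not something in this project.

```python
from typing import Sequence


def table_to_markdown(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Serialize an already-extracted table as a Markdown pipe table."""

    def fmt(cells: Sequence[str]) -> str:
        # Escape pipes so cell content cannot break the column layout.
        cleaned = (str(c).replace("|", "\\|").strip() for c in cells)
        return "| " + " | ".join(cleaned) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        # Pad short rows and truncate long ones to the header width.
        padded = list(row)[: len(header)] + [""] * (len(header) - len(row))
        lines.append(fmt(padded))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up placeholder values, not taken from the PDF in this issue.
    print(table_to_markdown(["Name", "Amount"], [["Row one", "10"], ["Row two"]]))
```

With the header kept as its own line in the chunk, a prompt like "What are the headers of the table?" has something concrete to retrieve.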

@ajaxbo360

Yeah, it's a feature to implement; detecting tables would be nice to have.

@rudro12356

I also tried this with a research paper and it didn't work well. The PDF had tables, charts, and text. The model seemed to be hallucinating.

@hynra

hynra commented Feb 28, 2024

How about using a custom document loader like Unstructured? Unstructured is also available in LangChain.
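
To make the idea concrete, here is a minimal sketch of how the output of such a loader might be turned into chunks for embedding. It does not call Unstructured or LangChain. It assumes the loader yields element dictionaries with a `type`, a `text`, and, for tables, an HTML rendering under `metadata["text_as_html"]`; whether that field is populated depends on how the loader is configured, so treat the shape as an assumption. `elements_to_chunks` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def elements_to_chunks(elements: List[Dict[str, Any]]) -> List[str]:
    """Turn parsed document elements into text chunks, keeping table structure."""
    chunks: List[str] = []
    for el in elements:
        if el.get("type") == "Table":
            # Prefer the structured HTML rendering when present (assumed field),
            # otherwise fall back to the flattened table text.
            html = (el.get("metadata") or {}).get("text_as_html")
            chunk = html or el.get("text", "")
        else:
            chunk = el.get("text", "")
        chunk = chunk.strip()
        if chunk:
            # Skip elements that carry no text at all.
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # Made-up elements for illustration only.
    sample = [
        {"type": "Title", "text": "Report"},
        {"type": "Table", "text": "A B 1 2",
         "metadata": {"text_as_html": "<table><tr><td>A</td><td>B</td></tr></table>"}},
    ]
    for c in elements_to_chunks(sample):
        print(c)
```

Keeping each table as its own chunk, separate from the surrounding prose, is a design choice made here for illustration; it avoids mixing cell values into neighboring paragraphs.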
