Option for disabling document text method #25

Open
m0dd0 opened this issue Mar 21, 2024 · 1 comment

Comments

m0dd0 commented Mar 21, 2024

First, thanks for this very helpful library!

For many of the papers I read, your algorithm works fine and finds the correct DOI.
But as you already mention in the README, for some papers the document_text method returns a wrong DOI because the DOIs of other papers appear first.
Unfortunately, this is very often the case for papers from certain conferences I read a lot, as they contain arXiv IDs in the references and do not contain their own DOI anywhere else in the text. At the same time, when I comment out the document_text method, I get pretty good results with the fourth method.
I am wondering if one of the following features might help to reduce these types of errors:

  • only using the first pages to look for the DOI in the text
  • having an option to disable certain steps in the search process
  • being able to customize the order of the search methods

Do you think one of these options (or something else) is something the library would benefit from and that can be implemented with reasonable effort? If so, I can see if I find the time to turn my current "comment-out workaround" into a mergeable feature.
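
For illustration, the second and third options could be sketched roughly as below. This is not the library's actual code: `run_finders`, the `order` and `disabled` parameters, and the `web_search` stand-in are hypothetical names, and only `document_text` is a method name taken from this thread. The stand-in finders return made-up placeholder values.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A finder takes a PDF path and returns an identifier string,
# or None if it found nothing. (Hypothetical interface.)
Finder = Callable[[str], Optional[str]]


def run_finders(
    path: str,
    finders: Dict[str, Finder],
    order: Optional[Sequence[str]] = None,
    disabled: Sequence[str] = (),
) -> Tuple[Optional[str], Optional[str]]:
    """Try each enabled finder in turn and return (identifier, method_name)."""
    # Default order is the insertion order of the finders dict.
    names = list(order) if order is not None else list(finders)
    unknown = [n for n in [*names, *disabled] if n not in finders]
    if unknown:
        raise ValueError(f"unknown method(s): {unknown}")
    for name in names:
        if name in disabled:
            continue  # skipped on request
        result = finders[name](path)
        if result:
            return result, name  # first hit wins
    return None, None


# Hypothetical stand-ins for the real search methods.
def document_text(path: str) -> Optional[str]:
    # Pretends to pick up an identifier from the reference list.
    return "arXiv:0000.00000"


def web_search(path: str) -> Optional[str]:
    # Pretends to find the paper's own DOI.
    return "10.1000/example"


FINDERS: Dict[str, Finder] = {
    "document_text": document_text,
    "web_search": web_search,
}

# Disable one step:
print(run_finders("paper.pdf", FINDERS, disabled=["document_text"]))
# Or change the order instead:
print(run_finders("paper.pdf", FINDERS, order=["web_search", "document_text"]))
```

Keeping the methods in a name-to-function mapping means disabling and reordering both reduce to choosing which names to iterate over, so the individual search methods would not need to change.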

@MicheleCotrufo
Owner

All options you suggested are possible.
I would avoid the first option, because it would be tricky to come up with the "right number of pages to look into". In many journals, the DOI is at the end of the paper.
Options 2 and 3 are relatively easy, although they would make the command line more verbose. You are more than welcome to do a PR with your code!
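
As a rough idea of how much options 2 and 3 would add to the command line, here is a minimal `argparse` sketch. The flag names `--disable-method` and `--method-order`, the program name, and every entry in `METHODS` other than `document_text` are assumptions made for illustration, not existing options of the tool.

```python
import argparse
from typing import List

# Hypothetical method names; only "document_text" appears in this thread.
METHODS = ["document_text", "web_search"]


def build_parser(methods: List[str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("path", help="PDF file to analyse")
    # Repeatable flag: each use adds one method to the skip list.
    parser.add_argument(
        "--disable-method",
        action="append",
        default=[],
        choices=methods,
        metavar="NAME",
        help="skip this search method (may be given several times)",
    )
    # One flag taking the full order as a space-separated list.
    parser.add_argument(
        "--method-order",
        nargs="+",
        choices=methods,
        default=None,
        metavar="NAME",
        help="try the search methods in this order",
    )
    return parser


args = build_parser(METHODS).parse_args(
    ["paper.pdf", "--disable-method", "document_text"]
)
print(args.disable_method, args.method_order)
```

Both flags are optional, so the default invocation stays as short as it is today; the extra verbosity only shows up for users who opt in.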
