Open Library Publication Date Mismatches #7
…atches Added Open Library Publication Date Mismatch Files
I saw a small complication, though it might be ignorable. The "first publication-date" in the OL JSON file does not match the publish date shown on the actual website. Ex: "Q100202821","OL8704115W","1960-01-01T00:00:00Z" has February 1981 in the JSON, but the publish date on https://openlibrary.org/works/OL8704115W/Cuba_para_principiantes?mode=all is 1970. This might cause some confusion.
Anyway, there are a couple of other things to consider as well. It might need some tweaking, but it's definitely heading in the right direction, and there's a lot of good work here!
...on/1_mismatch_generations/Open Library Mismatches/OL-Publication-Date-Mismatch-Generation.py
# In[31]:

mismatch_dataframe.to_csv('openlibrary_publication_date_mismatches.csv', index=False)
Probably just a minor thing, but a few items might not be mismatches. Ex: lines 3, 23, 37, 324, etc. The years in both sets are correct, but the Open Library value only has the year while the Wikidata value has the month and year. Consider comparing only years when OL only has the year value in the JSON file.
I'm seeing this on line 23 at least, and it's a good point. If Open Library only has the year, then we shouldn't be triggering a mismatch against a value that has the year and the month :)
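For reference, here is a minimal sketch of the year-only comparison suggested above. It is not the code from this PR: the helper names, the list of Open Library date formats, and the use of Wikidata's time precision codes (9 = year, 10 = month, 11 = day) as an extra input are all assumptions made for illustration. The idea is to compare the two dates only down to the coarsest precision that both sides actually provide.

```python
import re
from datetime import datetime

# Assumed Open Library date formats (free-text strings), paired with how many
# date parts each one carries: 1 = year, 2 = year+month, 3 = year+month+day.
_OL_FORMATS = (
    ("%Y", 1),
    ("%B %Y", 2),
    ("%b %Y", 2),
    ("%B %d, %Y", 3),
    ("%b %d, %Y", 3),
    ("%Y-%m-%d", 3),
)


def parse_ol_date(text):
    """Return (year,), (year, month) or (year, month, day), or None if unparsed."""
    for fmt, n_parts in _OL_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return (parsed.year, parsed.month, parsed.day)[:n_parts]
    return None


def parse_wikidata_date(iso):
    """Parse a timestamp such as '1960-01-01T00:00:00Z' into (year, month, day)."""
    match = re.match(r"\+?(\d{4})-(\d{2})-(\d{2})", iso)
    return tuple(int(group) for group in match.groups()) if match else None


def is_mismatch(ol_text, wd_iso, wd_precision=11):
    """Compare only the date parts that both sources provide.

    Returns True/False, or None when either value cannot be parsed.
    """
    ol = parse_ol_date(ol_text)
    wd = parse_wikidata_date(wd_iso)
    if ol is None or wd is None:
        return None
    wd_parts = max(1, min(3, wd_precision - 8))  # 9 -> 1, 10 -> 2, 11 -> 3
    shared = min(len(ol), wd_parts)
    return ol[:shared] != wd[:shared]
```

With this, a year-only Open Library value such as "1981" would not be flagged against a Wikidata value of February 1981, while a genuinely different year still would be. Values that cannot be parsed return None so they can be reviewed separately instead of being silently counted either way.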
Just verified something. Maggie is comparing the first publication date for an Open Library work to "publication date" on a Wikidata work. "publication date" when used on a Wikidata work means the date a work was first published, according to https://www.wikidata.org/wiki/Wikidata:WikiProject_Books, so we are good with this upload.
Some quick things on this:
I'm deriving the publication date mismatches from Open Library's API. I'm not sure why the external date value doesn't appear on the Open Library page, but it is what the API returns when I make a request. In this case, it looks like there are also editions of this work published prior to This seems to be the case for many books. For Q5142283, the Open Library page shows It seems that the attribute 'first_publish_date' may not be reliable in all cases.
Thanks for the further explanation on this, @mgaoann! Are we able to merge some of these into a common ID for a specific work? If that could work, then I think there should be a lot of potential here :)
@andrewtavis-wmde Could you clarify a bit about what you mean by merging into a common ID for a specific work?
Hey @mgaoann 👋 Sorry for the late reply. Let's talk about this in the meeting later, but generally what I mean here is: can we find a common ID for all editions of an individual book and then use that to derive the earliest publication date? So can we link the
Hey @andrewtavis-wmde Sorry I wasn't able to be at the meeting. When I originally wrote the code, I made API requests only to get the works, but I did some digging, and I believe there's a way to get all of the editions associated with a work. Here's the result. It seems that the Either way, since I can see the publication dates of all the editions associated with the work, I may be able to find the original publication date by comparing them and taking the earliest. Is this what you were referring to?
No stress, @mgaoann! Hope all's well :) And the process you suggested makes total sense and is what I was thinking about :) Let's find the original publication date and compare that value 😊 Looking forward to the results, and let me know if there's anything I can do to support!
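A rough sketch of the approach agreed on above, deriving the original publication date as the earliest date across a work's editions. This is not the PR's implementation: the function name is hypothetical, the edition records are assumed to be dicts with a free-text `publish_date` field, and the sample values are illustrative only. It works at year granularity and skips any edition whose date has no recognizable four-digit year.

```python
import re

# Matches a plausible four-digit year (1000-2099) anywhere in a free-text date.
_YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def earliest_edition_year(editions):
    """Return the smallest year found in the editions' publish dates, or None.

    `editions` is assumed to be an iterable of dicts that may carry a
    free-text "publish_date" string (for example "1970" or "February 1981").
    """
    years = []
    for edition in editions:
        match = _YEAR.search(edition.get("publish_date") or "")
        if match:
            years.append(int(match.group(1)))
    return min(years) if years else None


# Illustrative edition records, not real API output.
sample_editions = [
    {"publish_date": "February 1981"},
    {"publish_date": "1970"},
    {"publish_date": "n.d."},
    {},
]
print(earliest_edition_year(sample_editions))
```

Comparing at the year level keeps the result robust to the mixed date formats across editions; the derived year could then be compared against the Wikidata publication date in place of 'first_publish_date'.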
Updating
…hes/openlibrary_publication_date_mismatches.csv
…hes/OL-Publication-Date-Mismatch-Generation.ipynb
…hes/OL-Publication-Date-Mismatch-Generation.py
First set of mismatches - Open Library works (P648) for publication date (P577)