Adding the Zenon-database as source #9

Open
nmueller18 opened this issue Apr 16, 2021 · 1 comment

Comments

@nmueller18

I would like to see the Zenon-Database added to the possible sources. Taking especially the pubmed- and CrossRef-importers as template, this should not be too difficult. Each Zenon-entry is identified by a unique identifier, and this entry is accessible via a BibTeX-entry.
I have modified the files citation_api_import/index.js and citation_api_import/templates.js accordingly and added a new file, citation_api_import/zenon.js. However, at the moment I am struggling with how to parse the records. An example of the output looks like this:

<div>
    <a href="/Record/001219271" class="title getFull" data-view="full">
        Die Nutzung baltischen Feuersteins an der Schwelle zur Bronzezeit, Krise oder Konjunktur der Feuersteinverarbeitung?
    </a>
</div>

<div>
    von
    <a href="  /Author/Home?author=Rassmann%2C+Knut.">Rassmann, Knut.</a>
    <br/>
    Veröffentlicht in
    <a href="/Record/000644412">
        Bericht der Römisch-Germanischen Kommission, 81 (2000)
    </a>
    <br/>
    2000.
    <br/>
    Umfang/Format:  5-36 : Abb. Taf.
    <br/>
</div>

Because it is possible that more than one Zenon id is included in a reference (if this is part of another referenced item), the querySelector needs to cater for this possibility. Would something like that work:
const zenonid = el.querySelector('input[<a href="/Record/" class="]').value?
Then the rest of the record needs to be parsed to get the three components Author, Title and Published. This should be possible as there are lots of <br/>s and <a>s. But I do not know how to modify the code snippet
const descriptionParts = el.innerHTML.split('<br>\n')[1].split(/ <b>\(|\)<\/b>\. /g). Why, for example, is the string split twice?

@johanneswilm
Member

I would like to see the Zenon-Database added to the possible sources. Taking especially the pubmed- and CrossRef-importers as template, this should not be too difficult. Each Zenon-entry is identified by a unique identifier, and this entry is accessible via a BibTeX-entry.

I agree, this should not be too difficult to achieve.

[...]

Because it is possible that more than one Zenon id is included in a reference (if this is part of another referenced item), the querySelector needs to cater for this possibility. Would something like that work:
const zenonid = el.querySelector('input[<a href="/Record/" class="]').value?

The querySelector needs to receive a valid CSS selector. Briefly looking at the source code here, it looks like there are three links in any record:

<a href="/Record/000644412">...</a>
<a href=" /Author/Home?author=Rassmann%2C+Knut.">...</a>
<a href="/Record/001219271" class="title getFull" data-view="full">...</a>

It is the last link we want, right? In that case it is simple, because it can be distinguished by its class attribute like this:

const zenonLink = el.querySelector('a.getFull')

or, even better, to get the id from the link (kept as a string, because parseInt would drop the leading zeros of an id like 001219271):

const zenonid = el.querySelector('a.getFull').getAttribute('href').split('/').pop()
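The id extraction can also be written as a small standalone function, which is easier to test outside the browser. This is only a sketch: `zenonIdFromHref` is a hypothetical helper name, and it assumes that Zenon ids are purely numeric, zero-padded strings as in the sample above. It returns the id as a string, since converting it to a number would lose the leading zeros.

```javascript
// Hypothetical helper (not part of the existing importers):
// pull the record id out of a Zenon link href.
// Assumption: ids consist of digits only, as in the sample record.
// Returns the id as a string, or null if the href is not a record link.
function zenonIdFromHref(href) {
    // Hrefs in the sample markup can carry stray leading spaces, hence trim().
    const match = /^\/Record\/(\d+)$/.exec(href.trim())
    return match ? match[1] : null
}

console.log(zenonIdFromHref('/Record/001219271'))
console.log(zenonIdFromHref('  /Author/Home?author=Rassmann%2C+Knut.'))
```

In the importer this would presumably be called as `zenonIdFromHref(el.querySelector('a.getFull').getAttribute('href'))`.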

Then the rest of the record needs to be parsed to get the three components Author, Title and Published. This should be possible as there are lots of <br/>s and <a>s. But I do not know how to modify the code snippet
const descriptionParts = el.innerHTML.split('<br>\n')[1].split(/ <b>\(|\)<\/b>\. /g). Why, for example, is the string split twice?
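To make the two splits concrete, here is a minimal sketch that runs the same expression on a made-up string. The markup below is invented purely so that the expression applies to it; it is not the actual output of any of the existing sources. The first split cuts the HTML at the line breaks and keeps the second chunk, and the second split cuts that chunk at the two delimiters ` <b>(` and `)</b>. `.

```javascript
// Invented markup, shaped only so that the original expression applies to it.
const innerHTML =
    '<a href="#">Some title</a><br>\n' +
    'Some Journal <b>(2000)</b>. 81: 5-36<br>\n'

// First split: cut at the line breaks and keep the second chunk (index 1).
const descriptionLine = innerHTML.split('<br>\n')[1]

// Second split: cut that chunk wherever " <b>(" or ")</b>. " occurs.
const descriptionParts = descriptionLine.split(/ <b>\(|\)<\/b>\. /g)

console.log(descriptionLine)
console.log(descriptionParts)
```

With this input, `descriptionParts` holds three pieces: the text before the bold part, the text inside the parentheses, and the text after it.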

This simply has to do with the structure of the HTML used by one of the other citation database sites. The text wrangling will be very specific to each site (and will need to be updated whenever the website changes). In this case, I am guessing we need to fetch the authors from the links leading to author pages. Those links have no special class, so instead we need to filter all links included in the entry, for example like this:

const authors = Array.from(el.querySelectorAll('a')).filter(a => a.getAttribute('href').includes('?author=')).map(a => a.innerText)

which will return:

["Rassmann, Knut."]

If we also want the period gone at the end, we could modify it like this:

Array.from(el.querySelectorAll('a')).filter(a => a.getAttribute('href').includes('?author=')).map(a => a.innerText.replace(/\.$/g,''))

which returns:

["Rassmann, Knut"]
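Putting the pieces together, the links of one record could be sorted into fields in a single pass. This is a sketch under assumptions, not a finished importer: `parseZenonLinks` and the `{href, text, isTitle}` shape are hypothetical names, and it assumes that any record link without the `getFull` class points to the publication the item appeared in ("Veröffentlicht in"). It works on plain objects so that it can be tried without a browser.

```javascript
// Hypothetical helper: sort the links of one Zenon result into fields.
// `links` is a list of {href, text, isTitle} objects. In the browser it
// could be built like this (assumption, untested against the live site):
//   Array.from(el.querySelectorAll('a')).map(a => ({
//       href: a.getAttribute('href') || '',
//       text: a.textContent,
//       isTitle: a.classList.contains('getFull')
//   }))
function parseZenonLinks(links) {
    // Collapse the heavy whitespace of the markup into single spaces.
    const clean = text => text.replace(/\s+/g, ' ').trim()
    const record = {id: null, title: '', authors: [], publishedIn: ''}
    for (const link of links) {
        const href = link.href.trim()
        if (href.includes('?author=')) {
            // Author link: drop the trailing period.
            record.authors.push(clean(link.text).replace(/\.$/, ''))
        } else if (href.startsWith('/Record/')) {
            if (link.isTitle) {
                // The title link carries the id of the record itself.
                record.id = href.split('/').pop()
                record.title = clean(link.text)
            } else {
                // Assumption: any other record link is the containing publication.
                record.publishedIn = clean(link.text)
            }
        }
    }
    return record
}

// The three links from the sample record above.
const sampleLinks = [
    {
        href: '/Record/001219271',
        text: '\n   Die Nutzung baltischen Feuersteins an der Schwelle zur Bronzezeit, Krise oder Konjunktur der Feuersteinverarbeitung?   ',
        isTitle: true
    },
    {href: '  /Author/Home?author=Rassmann%2C+Knut.', text: 'Rassmann, Knut.', isTitle: false},
    {href: '/Record/000644412', text: '\n   Bericht der Römisch-Germanischen Kommission, 81 (2000)   ', isTitle: false}
]

const sampleRecord = parseZenonLinks(sampleLinks)
console.log(sampleRecord)
```

The year and the "Umfang/Format" line are plain text between the `<br/>` tags rather than links, so they would still need separate handling.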
