Need to be able to get the tracklist section from album page #32

Open
foderking opened this issue Aug 8, 2021 · 4 comments
Comments

@foderking

I'm writing an app to get album information. Right now I'm using a hack: I first use a regex to get the "tracklist" section, then parse that.

It would be great to be able to parse the tracklist easily, especially for double albums where there are two or more "{{tracklist...}}" sections.

@dijs
Owner

dijs commented Aug 9, 2021

That sounds like a cool feature! Could you give me a few wikipedia page examples please?

@foderking
Author

https://en.wikipedia.org/wiki/Scorpion_(Drake_album)
https://en.wikipedia.org/wiki/The_Best_in_the_World_Pack
https://en.wikipedia.org/wiki/Positions_(album)

Generally, any album page will do.
Parsing the wikitext source ignores the "tracklist" section, which is why I have to use a regex first to extract only that section and then parse it.
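The regex step can also be done by counting template braces instead of relying on a closing `}}` at the start of a line. That keeps each `{{Track listing}}` block separate on double albums and is not thrown off by nested templates such as `{{small|...}}` inside a track entry. A minimal sketch, assuming the template is named "Track listing" or "Tracklist" (any case, with or without a space); `extractTracklists` is a hypothetical helper, not part of the parser library.

```javascript
// Hypothetical helper: pull every {{Track listing ...}} template out of raw
// wikitext by counting {{ / }} pairs, so nested templates such as
// {{small|...}} inside a track entry do not end the match early.
function extractTracklists(wikitext) {
  const blocks = [];
  // Assumed template names: "Track listing", "Tracklist", "Track_listing", any case.
  const opener = /\{\{\s*track[\s_]*list(?:ing)?\s*(?=[|}\n])/gi;
  let match;
  while ((match = opener.exec(wikitext)) !== null) {
    let depth = 0;
    let i = match.index;
    for (; i < wikitext.length - 1; i++) {
      if (wikitext.startsWith("{{", i)) {
        depth++;
        i++; // skip the second brace of the pair
      } else if (wikitext.startsWith("}}", i)) {
        depth--;
        i++; // skip the second brace of the pair
        if (depth === 0) break; // found the template's own closing "}}"
      }
    }
    if (depth !== 0) break; // unbalanced braces: stop and return what was found so far
    const end = i + 1; // index just past the closing "}}"
    blocks.push(wikitext.slice(match.index, end));
    opener.lastIndex = end; // resume scanning after this template
  }
  return blocks;
}

// Usage: two tracklists on one page yield two separate blocks.
const page = [
  "{{Track listing",
  "| headline = Side A",
  "| title1 = Track One",
  "| note1 = {{small|intro}}",
  "}}",
  "{{Track listing",
  "| headline = Side B",
  "| title1 = Track Two",
  "}}",
].join("\n");
console.log(extractTracklists(page).length); // 2
```

Each returned string can then be handed to whatever parses the template parameters, one block per disc or side.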

@dijs
Owner

dijs commented Aug 11, 2021

So, this is an interesting and difficult problem. First of all, track listings are never in an infobox. This parser has stretched itself to parse other things (albeit not very well) outside of infoboxes, but I do not think it was wise to do that.

That being said, I may try to refactor my data types into common components that can be used to parse infoboxes, page sections, or even entire page sources.

It's a complex problem, like many that come up in wiki-text parsing.

By the way, how did the parsed version of the album look when you did it manually? If it was nice, I may just hack that together for now.

@foderking
Author

foderking commented Aug 30, 2021

I did a regex match for the tracklist section: /{{track.*list.*?^}}/gmsi
This also captures pages with two or more tracklist sections.
I then parse each section independently with the infobox parser. It works pretty well, although the producer info ends up in the "extra credits" of the parsed object.
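The producer info lands in "extra credits" because Wikipedia's `{{Track listing}}` template puts per-track credits beyond writer and length into numbered `extraN` parameters, with `extra_column` naming that column (often "Producer(s)" on album pages). A minimal sketch of reading one block directly, assuming one `| name = value` parameter per line and consecutively numbered tracks; `parseTracklist` is a hypothetical helper, not the library's API.

```javascript
// Hypothetical helper: turn one {{Track listing}} block into track objects.
// Assumes one "| name = value" parameter per line; values spanning several
// lines or templates written on a single line are not handled.
function parseTracklist(block) {
  const params = {};
  for (const line of block.split("\n")) {
    const m = line.match(/^\s*\|\s*([\w ]+?)\s*=\s*(.*?)\s*$/);
    if (m) params[m[1].toLowerCase()] = m[2];
  }
  // extra_column labels the extraN fields, e.g. "Producer(s)".
  const extraLabel = params.extra_column || "extra";
  const tracks = [];
  // Stops at the first missing titleN, so numbering is assumed to be consecutive.
  for (let n = 1; `title${n}` in params; n++) {
    tracks.push({
      number: n,
      title: params[`title${n}`],
      writer: params[`writer${n}`] ?? null,
      [extraLabel]: params[`extra${n}`] ?? null,
      length: params[`length${n}`] ?? null,
    });
  }
  return { headline: params.headline ?? null, tracks };
}

// Usage with a made-up block:
const side = parseTracklist([
  "{{Track listing",
  "| extra_column = Producer(s)",
  "| title1 = Track One",
  "| extra1 = Some Producer",
  "}}",
].join("\n"));
console.log(side.tracks[0]["Producer(s)"]); // Some Producer
```

Keying the credit by the `extra_column` label keeps producers under a readable name instead of a generic "extra" field.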

Here's the link to the repository:
https://github.com/foderking/WhoProduced/blob/main/src/App.js

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants