Can anyone help me with the following? #172

Open
KaMyKaSii opened this issue Mar 5, 2022 · 2 comments

@KaMyKaSii
I'm no HTML expert; I just want to get a string from a site to use in a shell script. What command can I use on this page to get the string "2022-02-10 23:09:03"? Any help is appreciated. Thanks.

@rjp
rjp commented Apr 1, 2022

It looks like pup won't let you access the wire:initial-data attribute directly (which seems like a bug to me; I will probably create an issue later), but you can work around that with the json{} output and jq (or another JSON processor):

cat 45480728909.html | \
pup -p 'div[wire:initial-data] json{}' | \
jq -r '.[]|."wire:initial-data"|fromjson|.serverMemo.data.stream.stream_created_at|select(.)' | \
sort -u

Since 2022-02-10 23:09:03 appears only in the wire:initial-data attribute of various divs, we match those divs and print them as JSON (using the -p flag to convert the entities). jq then does the heavy lifting: 1) getting that attribute, 2) converting it to a real object with fromjson, 3) finding the stream_created_at key (which is the only one that matches the given date), and 4) removing the nulls from the list. Finally, sort -u condenses the result to a unique list (which in this case is just the one date).

(If you don't have sort, you can do the uniquification in jq: jq -r '[.[]|."wire:initial-data"|fromjson|.serverMemo.data.stream.stream_created_at|select(.)]|unique|.[]')
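To see what the jq filter does without the original page, here is a minimal sketch that feeds it a hand-written sample. The sample is an assumption: it only imitates the shape that the filter above expects from pup's json{} output (an array of objects whose "wire:initial-data" value is a JSON string), and the "tag" keys and the null stream entry are made up for illustration.

```shell
# Hypothetical stand-in for the output of: pup -p 'div[wire:initial-data] json{}'
# Each object carries the attribute as a JSON-encoded *string*, not an object.
sample='[
  {"tag":"div","wire:initial-data":"{\"serverMemo\":{\"data\":{\"stream\":{\"stream_created_at\":\"2022-02-10 23:09:03\"}}}}"},
  {"tag":"div","wire:initial-data":"{\"serverMemo\":{\"data\":{\"stream\":null}}}"},
  {"tag":"div","wire:initial-data":"{\"serverMemo\":{\"data\":{\"stream\":{\"stream_created_at\":\"2022-02-10 23:09:03\"}}}}"}
]'

# Same filter as in the comment above:
#   ."wire:initial-data"  picks the attribute string
#   fromjson              parses that string into an object
#   select(.)             drops null (and false) results
result=$(printf '%s\n' "$sample" |
  jq -r '.[] | ."wire:initial-data" | fromjson | .serverMemo.data.stream.stream_created_at | select(.)' |
  sort -u)

printf '%s\n' "$result"
```

The two matching divs yield the same date twice and the div without a stream yields null, so select(.) and sort -u together leave a single line.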

@rjp
rjp commented Apr 2, 2022

If PR #175 gets pulled in, you can change the pup part to pup -p 'div[wire:initial-data] attr{wire:initial-data}', which retrieves the attribute data directly and simplifies the jq part that follows.

cat 45480728909.html | \
pup -p 'div[wire:initial-data] attr{wire:initial-data}' | \
jq -sr '.[]|.serverMemo.data.stream.stream_created_at|select(.)' | \
sort -u
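The jq half of this variant can be tried on its own with a made-up sample. The sketch below assumes that the attr{} form prints one JSON object per matching div, one per line; that is inferred from the jq -s usage above rather than checked against the PR, and the sample lines are hypothetical.

```shell
# Hypothetical stand-in for the output of:
#   pup -p 'div[wire:initial-data] attr{wire:initial-data}'
# assumed here to be one JSON object per line.
sample='{"serverMemo":{"data":{"stream":{"stream_created_at":"2022-02-10 23:09:03"}}}}
{"serverMemo":{"data":{"stream":null}}}
{"serverMemo":{"data":{"stream":{"stream_created_at":"2022-02-10 23:09:03"}}}}'

# -s slurps all input objects into one array, so .[] iterates over them;
# no fromjson is needed because the input is already parsed as JSON.
dates=$(printf '%s\n' "$sample" |
  jq -sr '.[] | .serverMemo.data.stream.stream_created_at | select(.)' |
  sort -u)

printf '%s\n' "$dates"
```

jq also processes a stream of objects one at a time, so dropping both -s and the leading .[] should give the same result.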
