Ability to execute different commands on multiple selectors #169

Open
tathastu871 opened this issue Dec 29, 2021 · 4 comments

Comments

@tathastu871

E.g.
curl 'example.com' | pup 'a, td'
But what if I need to reformat the output of each selector separately, using Linux commands like sed, awk, tr, etc.?

curl example.com | pup 'a (command to execute on "a" tags), td (command to execute on "td" tags)'
This would work like tee (coreutils) and pee (moreutils).

pup would extract each selector's matches and pipe them to the command defined for that selector, then write the formatted output of all selectors to stdout.

@tathastu871
Author

Please reply.
This is badly needed when a user wants to parse two selectors and format the output of each one separately.

It prevents the crawling of same site two times for two selectors

@gromgit

gromgit commented Dec 29, 2021

It prevents the crawling of same site two times for two selectors

Have you considered just dumping the output of curl to a temp file, and running pup off that instead?

curl -o /tmp/example.html example.com
pup 'a (command to execute)' < /tmp/example.html
pup 'td (command to execute)' < /tmp/example.html

Much easier than waiting for a feature that will most likely never come, given that this project has outstanding PRs going back four years: https://github.com/ericchiang/pup/pulls
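
A minimal sketch of the temp-file approach, assuming the goal is an ordinary `pup SELECTOR | command` pipeline per selector. The helper name `per_selector` and the `PUP` override variable are illustrative assumptions, not part of pup; `text{}` is pup's display function for printing text content.

```shell
#!/usr/bin/env bash
# Sketch only: fetch the page once, then run one pipeline per selector.
# per_selector is a hypothetical helper; PUP lets you swap in another binary.

# Usage: per_selector FILE SELECTOR FORMATTER [ARGS...]
# Runs:  pup SELECTOR < FILE | FORMATTER ARGS...
per_selector() {
  local file=$1 selector=$2
  shift 2
  "${PUP:-pup}" "$selector" < "$file" | "$@"
}

# Example use (not run here, needs network access and pup installed):
#   tmp=$(mktemp)
#   trap 'rm -f "$tmp"' EXIT
#   curl -s example.com > "$tmp"
#   per_selector "$tmp" 'a text{}'  tr 'a-z' 'A-Z'
#   per_selector "$tmp" 'td text{}' sed 's/^/td: /'
```

Because the two pipelines run one after the other, the output order is predictable, and the site is still fetched only once.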

@amitbha

amitbha commented Jan 23, 2022

I highly recommend xidel, which supports XPath selectors.

@tathastu871
Author

It prevents the crawling of same site two times for two selectors

Have you considered just dumping the output of curl to a temp file, and running pup off that instead?

curl -o /tmp/example.html example.com
pup 'a (command to execute)' < /tmp/example.html
pup 'td (command to execute)' < /tmp/example.html

Much easier than waiting for a feature that will most likely never come, given that this project has outstanding PRs going back four years: https://github.com/ericchiang/pup/pulls

curl site | tee >(pup a | command) >(pup td | command)
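
The one-liner above relies on process substitution, so it needs bash, zsh, or ksh rather than plain sh. Two details are worth sketching: tee also copies the raw HTML to its own stdout unless that copy is discarded, and the two branches run concurrently, so their output order is not guaranteed. The helper name `fan_out`, the `PUP` override variable, and the `tr`/`sed` formatters below are placeholders, not anything pup provides.

```shell
#!/usr/bin/env bash
# Sketch only: one download, two selector pipelines, via tee and process substitution.
# fan_out is a hypothetical helper; PUP lets you swap in another binary.

# Reads HTML on stdin and writes both branches' formatted output to stdout.
fan_out() {
  # "> /dev/null" drops tee's own copy of the raw HTML.
  # "| cat" keeps fan_out from returning before both branches have finished.
  tee >("${PUP:-pup}" 'a text{}'  | tr 'a-z' 'A-Z') \
      >("${PUP:-pup}" 'td text{}' | sed 's/^/td: /') \
      > /dev/null | cat
}

# Example use (not run here, needs network access and pup installed):
#   curl -s example.com | fan_out
```

If the relative order of the two outputs matters, writing each branch to its own file, or falling back to the temp-file approach, is the safer choice.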
