{polite} package and web etiquette #511

Open
wibeasley opened this issue Apr 4, 2023 · 8 comments

@wibeasley
Collaborator

I like the "Scraping ethics & legalities" section of R for Data Science (2e).

Before we get started discussing the code you’ll need to perform web scraping, we need to talk about whether it’s legal and ethical for you to do so....

I think many R users (e.g., students, statisticians, and data scientists) are less familiar with web etiquette and conventions than web developers and most people who scrape the web. It would be nice if our web scraping section pointed readers to this information, as well as to the polite package.

The three pillars of a polite session are seeking permission, taking slowly and never asking twice.

The package builds on awesome toolkits for defining and managing http sessions (httr and rvest), declaring the user agent string and investigating site policies (robotstxt), and utilizing rate-limiting and response caching (ratelimitr and memoise).
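To make the three pillars concrete, here is a minimal, language-neutral sketch using only Python's standard library. It is not the polite package's API (in R, polite handles all of this for you); it only illustrates the same ideas under stated assumptions. The class name `PoliteSession`, the `fetch` callable, the default user agent, and the default delay are all hypothetical. Permission is checked with `urllib.robotparser`, requests are spaced out by a minimum delay (raised to the site's `Crawl-delay` if that is longer), and responses are cached in memory so the same URL is never requested twice.

```python
import time
import urllib.robotparser


class PoliteSession:
    """Hypothetical helper illustrating the three 'polite' pillars.

    NOT the R polite package's API: a stdlib-only sketch of the same ideas.
    """

    def __init__(self, robots_lines, fetch, user_agent="example-bot", delay=5.0):
        self.user_agent = user_agent  # hypothetical default user agent string
        self.delay = float(delay)     # minimum seconds between real requests
        self._fetch = fetch           # callable(url) -> str, supplied by the caller
        self._cache = {}              # url -> response body
        self._last_request = None     # monotonic time of the last real request
        self._robots = urllib.robotparser.RobotFileParser()
        self._robots.parse(robots_lines)
        # If robots.txt asks for a longer Crawl-delay, honour it.
        crawl_delay = self._robots.crawl_delay(user_agent)
        if crawl_delay is not None:
            self.delay = max(self.delay, float(crawl_delay))

    def get(self, url):
        # Pillar 1: seek permission from robots.txt before fetching.
        if not self._robots.can_fetch(self.user_agent, url):
            raise PermissionError(f"robots.txt disallows {url} for {self.user_agent}")
        # Pillar 3: never ask twice; serve repeated requests from the cache.
        if url in self._cache:
            return self._cache[url]
        # Pillar 2: take it slowly; wait until the minimum delay has passed.
        if self._last_request is not None:
            wait = self.delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        body = self._fetch(url)
        self._last_request = time.monotonic()
        self._cache[url] = body
        return body
```

In real use, `fetch` might wrap `urllib.request.urlopen` and the robots.txt lines would be downloaded from the site; injecting both keeps the sketch runnable offline. The cache here lives only in memory, whereas polite uses memoise for the same purpose.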

@pachadotdev, do you have any thoughts? This isn't conventional material for a CRAN Task View. I'm thinking of a few sentences and links: nothing preachy, just pointing readers to these resources if they want to educate themselves.

@wibeasley wibeasley self-assigned this Apr 4, 2023
@pachadotdev
Collaborator

@wibeasley, this would be extremely positive.
In my own case, I have to scrape a lot of data, so I can write a section after April 21.

@pachadotdev
Collaborator

@wibeasley, I have a draft from a workshop I attended. I will put it in a separate branch.

@pachadotdev
Collaborator

https://github.com/cran-task-views/WebTechnologies/tree/511

@wibeasley
Collaborator Author

@pachadotdev, I like it. I think it will be helpful to some audiences.

Are you writing it in a separate file, and later combining it into the Task View when you're satisfied?

I converted it to semantic line breaks, which I've found helpful for maintaining files that many people touch. I also made a few changes that I hope you like. Reject anything you think doesn't improve clarity.

@pachadotdev
Collaborator

Thanks! Yes, I put that in a separate file.

@wibeasley
Collaborator Author

Will it stay in a separate file, or be integrated into the Task View?

If it stays in a separate file, I think the Task View should link to the page you wrote.

@pachadotdev
Collaborator

The idea is to include it in the README once it's ready.

@zeileis
Contributor

zeileis commented Jul 15, 2023

Thanks for putting this together, I think this is very useful!

However, this should be in the task view, not in the README. The README is just in the GitHub repository and the main page that readers will consult is the task view itself, typically on a CRAN mirror. So please put it into the task view itself when you think it is ready.
