nika999/Simple-Web-mining.-Beginner-level

Simple-Web-mining.-Beginner-level.

Here we study how easy it is to extract data from websites, at a beginner level.

First step

Here we will extract data from https://pudding.cool/: the title, author, and description of each article.

We use pandas for data manipulation, urllib3 to open URLs, and the Beautiful Soup package to extract data from HTML files.
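The combination of the three libraries can be sketched roughly as follows. This is a minimal illustration and not the repository's actual script: the `div.story`, `h3`, `.author`, and `p` selectors and the sample HTML are hypothetical placeholders, and the real markup of the site would have to be inspected in the browser first.

```python
import pandas as pd
import urllib3
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.data.decode("utf-8")


def parse_articles(html):
    """Collect title, author and description from each article block.

    The tag and class names used here are hypothetical placeholders.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.story"):
        rows.append({
            "title": card.select_one("h3").get_text(strip=True),
            "author": card.select_one(".author").get_text(strip=True),
            "description": card.select_one("p").get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["title", "author", "description"])


# Tiny made-up page so the sketch runs without network access.
SAMPLE_HTML = """
<div class="story"><h3>First piece</h3><span class="author">A. Writer</span><p>Short summary.</p></div>
<div class="story"><h3>Second piece</h3><span class="author">B. Writer</span><p>Another summary.</p></div>
"""

articles = parse_articles(SAMPLE_HTML)
print(articles)
```

For a live run, `parse_articles(fetch_html("https://pudding.cool/"))` would be the entry point once the selectors match the real page.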

The result of the first step is this table:

image

Second step

Here we will extract data from https://www.work.ua/jobs-kyiv-data+analyst/: the job title and hiring company of each listing. To get all the results for this query, we have to download data from several pages:

image
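Walking through several result pages could look something like the sketch below. It assumes the site paginates with a `page` query parameter and that the number of pages is chosen by hand; both are assumptions to check against the real site. The `fake_fetch` and `fake_parse` helpers are hypothetical stand-ins that keep the example runnable offline.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.work.ua/jobs-kyiv-data+analyst/"


def page_urls(base_url, page_count):
    """Build one URL per results page, assuming a `page` query parameter."""
    return [f"{base_url}?{urlencode({'page': page})}"
            for page in range(1, page_count + 1)]


def scrape_pages(urls, fetch, parse):
    """Fetch each page in turn and merge the parsed rows into one list."""
    rows = []
    for url in urls:
        rows.extend(parse(fetch(url)))
    return rows


# Stand-ins so the sketch runs offline: a fake fetcher and a trivial parser.
def fake_fetch(url):
    return f"listing from {url}"


def fake_parse(html):
    return [{"job_title": "Data Analyst", "company": "Example LLC", "source": html}]


urls = page_urls(BASE_URL, 3)
rows = scrape_pages(urls, fake_fetch, fake_parse)
print(urls)
print(len(rows))
```

Passing `fetch` and `parse` in as arguments keeps the paging loop separate from the download and parsing code, so a real urllib3 fetcher and a Beautiful Soup parser can replace the stand-ins without changing the loop.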

The result of the second step is a CSV file:

image
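Saving the collected rows as CSV can be done with pandas. The snippet below is a small sketch with made-up rows; the column names and the `jobs.csv` filename are hypothetical. It writes to an in-memory buffer so that it runs without touching the disk.

```python
import io

import pandas as pd

# Made-up rows standing in for the scraped listings.
jobs = pd.DataFrame([
    {"job_title": "Data Analyst", "company": "Example LLC"},
    {"job_title": "Junior Data Analyst", "company": "Sample Corp"},
])

# index=False leaves out the DataFrame's row numbers.
buffer = io.StringIO()
jobs.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Writing to disk works the same way; "jobs.csv" is a hypothetical filename.
# jobs.to_csv("jobs.csv", index=False, encoding="utf-8")
```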

Third step

What if we want to gather information from inside the articles themselves?

To do this, we need to collect the links to these articles and then follow each one to gather the information inside.

Study the third step to learn how to do it.

image
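The two-stage idea, collecting links first and then visiting each one, might be sketched as follows. The `a.article-link` selector, the `h1` and `time` tags, the `example.com` addresses, and the `FAKE_SITE` pages are all hypothetical; the real selectors depend on the site being scraped.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_links(listing_html, base_url):
    """Return absolute article URLs found on a listing page, without repeats."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for anchor in soup.select("a.article-link[href]"):
        url = urljoin(base_url, anchor["href"])
        if url not in links:
            links.append(url)
    return links


def parse_inner(article_html):
    """Pull the title and publication time out of a single article page."""
    soup = BeautifulSoup(article_html, "html.parser")
    return {
        "title": soup.select_one("h1").get_text(strip=True),
        "time": soup.select_one("time").get_text(strip=True),
    }


def scrape_inner(listing_html, base_url, fetch):
    """Follow every collected link and parse the page behind it."""
    return [parse_inner(fetch(url)) for url in collect_links(listing_html, base_url)]


# Made-up listing and article pages so the sketch runs offline.
LISTING = """
<a class="article-link" href="/posts/1">One</a>
<a class="article-link" href="/posts/2">Two</a>
<a class="article-link" href="/posts/1">One again</a>
"""
FAKE_SITE = {
    "https://example.com/posts/1": "<h1>Post one</h1><time>2020-01-01</time>",
    "https://example.com/posts/2": "<h1>Post two</h1><time>2020-02-01</time>",
}

results = scrape_inner(LISTING, "https://example.com/", FAKE_SITE.get)
print(results)
```

Here `FAKE_SITE.get` plays the role of the downloader; in a live run it would be replaced by a function that fetches each URL over the network.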

As a result, we have the titles and times from inside the articles:

image
