Automated script that navigates the World Wide Web methodically to run searches on specific sites

luizmellodev/ScrapingNews.v3

Scraping News v3

This project consists of code developed with Selenium, a framework for automating web applications through browsers.

The code starts from the search page of each news site, applies a date-range filter, and fills the search field with the keywords. As a result, news relevant to the keywords within that period is listed. For each news item, the algorithm collects the title, description, date, and full URL of the article. After collecting all this information, the algorithm visits each stored URL and collects the news content.
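The two-phase flow described above (first collect the listing metadata, then visit each stored URL for the full text) can be sketched in plain Python. This is a minimal sketch, not the repository's code: the `NewsItem` fields mirror the four values listed above, while `search_results` and `fetch_content` are hypothetical callables standing in for the Selenium steps, so the orchestration can be read and tested without a browser. The extra date check is an assumed defensive step, not something the original describes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional


@dataclass
class NewsItem:
    """One search result; field names are illustrative, not taken from the repo."""
    title: str
    description: str
    published: date
    url: str
    content: Optional[str] = None  # filled in during the second phase


def scrape_news(
    keywords: str,
    start: date,
    end: date,
    search_results: Callable[[str, date, date], Iterable[NewsItem]],
    fetch_content: Callable[[str], str],
) -> List[NewsItem]:
    """Phase 1: collect metadata from the filtered search page.
    Phase 2: visit each stored URL and attach the article body."""
    # Phase 1: the site's own date filter should already restrict results;
    # items outside [start, end] are dropped defensively (an assumption).
    items = [
        item
        for item in search_results(keywords, start, end)
        if start <= item.published <= end
    ]
    # Phase 2: only after every URL is stored, open each one for its content.
    for item in items:
        item.content = fetch_content(item.url)
    return items
```

Passing the search and fetch steps in as functions keeps the browser-specific code separate from the collection logic, so each news site can supply its own implementation of those two steps.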

  • Selenium is a portable framework for testing web applications. It provides a playback tool for authoring functional tests without the need to learn a test scripting language.
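With Selenium, each value collected from a results page would be read from page elements located through the WebDriver. A minimal sketch, assuming hypothetical CSS selectors (`article`, `h2`, `p`, `time`, `a`) that would need to be adapted to each news site's markup. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, so the function accepts a real WebDriver but can also be exercised with stand-in objects.

```python
from typing import Dict, List

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical selectors; every news site uses different markup.
SELECTORS = {
    "card": "article",
    "title": "h2",
    "description": "p",
    "date": "time",
    "link": "a",
}


def extract_results(driver, selectors: Dict[str, str] = SELECTORS) -> List[Dict[str, str]]:
    """Read title, description, date and URL from every result card on the
    current page. `driver` is a Selenium WebDriver (or any object exposing
    find_elements(by, value) with elements that support find_element,
    .text and get_attribute)."""
    results = []
    for card in driver.find_elements(CSS, selectors["card"]):
        results.append({
            "title": card.find_element(CSS, selectors["title"]).text,
            "description": card.find_element(CSS, selectors["description"]).text,
            # Assumes the site stores a machine-readable date in a datetime attribute.
            "date": card.find_element(CSS, selectors["date"]).get_attribute("datetime"),
            "url": card.find_element(CSS, selectors["link"]).get_attribute("href"),
        })
    return results
```

With Selenium installed, it would be called as `extract_results(driver)` after `driver.get(search_url)` has loaded a results page.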

*Disclaimer: The user is solely responsible for any misuse of this library/software. This code was developed for academic projects, and its data collection was approved by the target sites.

Current development status

At the time of writing, all methods are still under development.

Installation

The repo is structured as a package, so it can be installed with pip from the GitHub clone URL. From the command line, type:

pip install git+https://github.com/luizeduardomr/ScrapingNews.v3

To upgrade the package if you have already installed it:

pip install git+https://github.com/luizeduardomr/ScrapingNews.v3 --upgrade

Please note that you should also install the Google Chrome browser to get the best results with this software.
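Selenium controls Chrome through ChromeDriver. A minimal sketch of creating the browser session, assuming Selenium 4.6 or later (recent versions can locate a matching driver automatically) and a recent Chrome that supports `--headless=new`. The window size and the headless default are illustrative choices, not settings taken from this repository.

```python
from typing import List


def chrome_arguments(headless: bool = True, window_size: str = "1920,1080") -> List[str]:
    """Command-line switches passed to Chrome (the chosen switches are assumptions)."""
    args = [f"--window-size={window_size}"]
    if headless:
        # Newer headless mode: runs Chrome without opening a visible window.
        args.append("--headless=new")
    return args


def make_chrome_driver(headless: bool = True):
    """Create a Chrome WebDriver. Selenium is imported lazily so this module
    can be imported (and chrome_arguments tested) without Selenium installed."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the switches in a separate function makes it easy to turn headless mode off (`make_chrome_driver(headless=False)`) when debugging a site's page layout.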

About the repository

This repository is intended to help developers understand the scraping process. Its purpose is NOT to distribute the code for malicious use or to release the software for general use.
