A program to scrape Indeed job listings to rank given skills in order of most needed

Mandawi/Skillstract

Skillstract

A program that scrapes Indeed job listings and ranks a given list of skills in order of most needed. In addition, we are working on using machine learning clustering algorithms to group the data into categories (e.g. communication skills, technical skills). We are also learning SQL so that we can store the data in a database, which will be more convenient than saving it locally in text files.
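As a rough illustration of the ranking idea, here is a minimal sketch, not the project's actual code. It assumes the listing texts have already been scraped into a list of strings, and it treats "most needed" as "mentioned in the most listings". The function name `rank_skills` and the sample data are hypothetical.

```python
import re
from collections import Counter


def rank_skills(listings, skills):
    """Rank skills by the number of listings that mention them.

    Hypothetical helper: `listings` is a list of job description strings,
    `skills` is a list of skill names to look for (case-insensitive).
    """
    # Start every skill at zero so unmentioned skills still appear in the result.
    counts = Counter({skill: 0 for skill in skills})
    for text in listings:
        lowered = text.lower()
        for skill in skills:
            # Match the skill as a whole token so "Java" does not match "JavaScript".
            pattern = r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)"
            if re.search(pattern, lowered):
                counts[skill] += 1
    # most_common() sorts by count, highest first.
    return counts.most_common()


if __name__ == "__main__":
    sample_listings = [
        "We need Python and SQL experience.",
        "Strong communication and Python skills.",
        "Java developer",
    ]
    for skill, count in rank_skills(sample_listings, ["SQL", "Python", "Java", "C++"]):
        print(f"{skill}: {count}")
```

Each listing counts at most once per skill, so a single posting that repeats a skill many times does not skew the ranking.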

Check out the Roman branch for the latest, unstable version.

Performance

Scraping currently takes about 1 minute per 10 job listings.

Further explanation of the approach


This is an example of what the first page of job listings for software
engineering in MA looks like: https://www.indeed.com/jobs?q=software+engineer&l=MA&sort=date
Now, if we look at a single job: https://www.indeed.com/jobs?q=software+engineer&l=MA&sort=date&vjk=3916106ade6d80b3
Note that this is the same URL as before, with only vjk=3916106ade6d80b3, the unique job ID, added to it.
Overall, this means we can replace the text after q= to get results for a different job (with spaces converted to +)
and replace the text after l= with a different state abbreviation.
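The URL pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's code. The function name `build_search_url` is hypothetical, and the sketch assumes the q, l, sort, and vjk parameters behave as in the example URLs.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(job_title, state, job_id=None):
    """Build a search URL for a job title and a state abbreviation.

    Hypothetical helper: if `job_id` is given, it is appended as the vjk
    parameter, which points at a single listing.
    """
    params = {"q": job_title, "l": state, "sort": "date"}
    if job_id is not None:
        params["vjk"] = job_id
    # urlencode() converts spaces to "+", matching the URLs above.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("software engineer", "MA"))
    print(build_search_url("software engineer", "MA", job_id="3916106ade6d80b3"))
```

Using `urlencode` instead of manual string replacement also escapes other special characters in a job title, such as `&` or `#`.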

Requirements


Have pip ready: https://stackoverflow.com/questions/4750806/how-do-i-install-pip-on-windows?rq=1
Note: you may already have pip. To check, open cmd, type python, and then run import pip; if pip is installed, you will get no errors.
Have selenium ready: https://pypi.org/project/selenium/
Use 'pip install selenium' (without quotes) in cmd.

Nice to have the following to get visual results


Have easygui ready: https://pypi.org/project/easygui/
Use 'pip install easygui' (without quotes) in cmd.
Have matplotlib ready (this is quite a heavy install): https://pypi.org/project/matplotlib/
Use 'pip install matplotlib' (without quotes) in cmd.
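To check quickly which of the required and optional packages are already installed, something like the following sketch may help. It is not part of the project. The helper name `missing_packages` is hypothetical, and it only checks whether each package can be found for import, without importing it.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    required = ["pip", "selenium"]
    optional = ["easygui", "matplotlib"]  # only needed for visual results

    for label, names in (("required", required), ("optional", optional)):
        missing = missing_packages(names)
        if missing:
            print(f"Missing {label} packages: {', '.join(missing)}")
            print(f"  install with: pip install {' '.join(missing)}")
        else:
            print(f"All {label} packages are installed.")
```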

