Task:

https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection

Problem: Create a REST API that allows you to explore the data set above. You can use any one of the .sgm files in the data set, and you may import the data into a data store if you want to. You are expected to use Java or Python, REST, and any other technology of your choice.

Expected APIs:

API to list content
API to search content
API to get a specific content item by id or any other identifier

Please share the code in a git repo and be ready to demonstrate and discuss it on a call.

Prerequisites:

install pyenv: https://github.com/pyenv/pyenv-installer

install pipenv: sudo -H pip install -U pipenv

set the Python version for the repo:

pyenv install 3.7.1

pyenv local 3.7.1

How to work with repo:

Install dependencies: make install

Run server: make start

Test functionality: make verify

Test health: curl http://localhost:8000/reuters/health

TBD:

list articles according to time .....
increase readability of the code
make tests debuggable in VS Code as well (PYTHONPATH)
add regex search for fulltext
add a pre-commit hook
create better API documentation (Swagger)

How to start:

Import h2o.postman_collection into Postman and play around with the REST API.

APIs:

localhost:8000/reuters/articles/<int:newid>
returns the detail view of an article, including its body, for display to readers.
localhost:8000/reuters/articles?
returns a list of articles. Use the query string to filter the articles, e.g. http://localhost:8000/reuters/articles?metadata.topics=YES&places=usa
The following keys can be used as filters:

metadata.newid
metadata.oldid
metadata.cgisplit
metadata.lewissplit
metadata.topics
places
people
orgs
exchanges
companies
topics
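
The filter keys listed above can be combined into a query string programmatically. The following is a minimal client-side sketch, assuming the server runs at the localhost:8000 address used in this README; the helper name build_articles_url is hypothetical and not part of the repo.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this README; adjust if the server runs elsewhere.
BASE_URL = "http://localhost:8000/reuters"


def build_articles_url(filters=None):
    """Return the list endpoint URL, with optional filters as a query string.

    `filters` maps filter keys (e.g. "metadata.topics", "places") to values.
    Hypothetical helper for illustration only.
    """
    url = BASE_URL + "/articles"
    if filters:
        # urlencode leaves dots unescaped, so "metadata.topics" stays readable.
        url += "?" + urlencode(filters)
    return url


if __name__ == "__main__":
    print(build_articles_url({"metadata.topics": "YES", "places": "usa"}))
```

The resulting URL can be passed to curl or any HTTP client once the server has been started with make start.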

http://localhost:8000/reuters/search?fulltext.body=businessmen
returns the fulltext search results. You can search the following fields: fulltext.title, fulltext.dataline, fulltext.body
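
To illustrate what a fulltext query over one of these fields could look like, here is a rough sketch over in-memory records. It assumes a case-insensitive substring match and a nested dict layout with a "fulltext" section; the actual matching rules and storage used by the server may differ, and the function name and sample records are made up for illustration.

```python
# Field names as listed in this README.
SEARCHABLE_FIELDS = ("fulltext.title", "fulltext.dataline", "fulltext.body")


def fulltext_search(articles, field, term):
    """Return the articles whose `field` contains `term` (case-insensitive).

    Assumption: each article is a nested dict such as
    {"fulltext": {"title": "...", "body": "..."}}.
    """
    if field not in SEARCHABLE_FIELDS:
        raise ValueError("unsupported search field: " + field)
    section, key = field.split(".", 1)
    needle = term.lower()
    return [
        article
        for article in articles
        # Missing sections or fields are treated as empty text.
        if needle in (article.get(section, {}).get(key) or "").lower()
    ]


if __name__ == "__main__":
    # Made-up sample records, not taken from the data set.
    sample = [
        {"fulltext": {"title": "First", "body": "Local businessmen met today."}},
        {"fulltext": {"title": "Second", "body": "Weather report."}},
    ]
    print(fulltext_search(sample, "fulltext.body", "businessmen"))
```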

Notes:

Trying out Node-like tooling for Python.
The SGML data does not have unique keys, which is why I am using dot-based selectors; IMO this is the fastest approach.
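
As an illustration of the dot-based selector idea, the sketch below resolves a key such as metadata.topics against a nested dict and uses it to filter records. This is an assumption about how such selectors can work, not the repo's actual implementation; the function names and the sample record are hypothetical.

```python
def select(record, path):
    """Resolve a dot-separated path (e.g. "metadata.topics") in nested dicts.

    Returns None when any segment of the path is missing.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(record, filters):
    """Return True if the record satisfies every path/value pair in `filters`.

    List-valued fields (e.g. places) match when they contain the wanted value;
    scalar fields are compared as strings.
    """
    for path, wanted in filters.items():
        value = select(record, path)
        if value is None:
            return False
        if isinstance(value, list):
            if wanted not in value:
                return False
        elif str(value) != wanted:
            return False
    return True


if __name__ == "__main__":
    # Made-up record shaped like the filter keys in this README.
    article = {"metadata": {"newid": "1", "topics": "YES"}, "places": ["usa"]}
    print(select(article, "metadata.topics"))
    print(matches(article, {"metadata.topics": "YES", "places": "usa"}))
```

Because the path is split at request time, new fields become filterable without per-field code, which is one reason this approach is quick to implement.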
