ipeluffo/faust-hashtags-counter

Hashtags counter with Faust

Sample Faust project to process tweets in real-time and count hashtags.

What are we building?

1. Custom Faust CLI command

A custom Faust CLI command filters a stream of tweets using a comma-separated list of words to track.

More information about the Twitter API track filter: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters.html.

To do this, the command uses peony-twitter to consume the Twitter stream.

Finally, the command will create one event for each hashtag found in tweets returned by the Twitter stream.
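The "one event per hashtag" step can be sketched in plain Python. This is an illustration, not the project's actual code: the `hashtag_events` function and the `{"hashtag": ...}` event shape are hypothetical, and lowercasing is an assumption made so that differently cased hashtags count together. It relies only on the standard Twitter v1.1 payload layout, where hashtags appear under `entities.hashtags[].text`.

```python
from typing import Any, Dict, Iterator


def hashtag_events(tweet: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield one event per hashtag found in a tweet payload.

    Hypothetical helper: the event shape is an assumption for illustration.
    """
    # Twitter v1.1 payloads list hashtags under entities.hashtags[].text.
    entities = tweet.get("entities") or {}
    for tag in entities.get("hashtags") or []:
        text = tag.get("text")
        if text:
            # Assumption: normalise case so "#Kafka" and "#kafka" count together.
            yield {"hashtag": text.lower()}


tweet = {
    "text": "Streaming with #Faust and #Kafka",
    "entities": {"hashtags": [{"text": "Faust"}, {"text": "Kafka"}]},
}
events = list(hashtag_events(tweet))
```

In the real command, each yielded event would be sent to the Kafka topic that the agent consumes.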

2. Faust agent to process events

The agent will process all events and store the hashtags counters in a tumbling window table.
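To make the tumbling-window idea concrete, here is a plain-Python sketch of the semantics: time is split into fixed, non-overlapping windows, and each event increments the counter of the window its timestamp falls into. This is not Faust's table API (in Faust the equivalent is a table with a tumbling window applied); the `TumblingCounter` class and the 60-second window size are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

WINDOW_SECONDS = 60.0  # hypothetical window size


def window_start(timestamp: float, size: float = WINDOW_SECONDS) -> float:
    """Return the start of the fixed-size window containing the timestamp."""
    return timestamp - (timestamp % size)


class TumblingCounter:
    """Count hashtags per non-overlapping (tumbling) time window."""

    def __init__(self, size: float = WINDOW_SECONDS) -> None:
        self.size = size
        # Key: (hashtag, window start); value: occurrences in that window.
        self.counts: DefaultDict[Tuple[str, float], int] = defaultdict(int)

    def add(self, hashtag: str, timestamp: float) -> None:
        self.counts[(hashtag, window_start(timestamp, self.size))] += 1

    def count(self, hashtag: str, timestamp: float) -> int:
        # .get avoids creating empty entries for windows never written to.
        return self.counts.get((hashtag, window_start(timestamp, self.size)), 0)


counter = TumblingCounter()
counter.add("faust", 5.0)   # falls in window [0, 60)
counter.add("faust", 59.0)  # same window
counter.add("faust", 61.0)  # next window, [60, 120)
```

Because windows never overlap, every event is counted exactly once, and a count query always refers to a single window.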

3. Faust views

This project exposes two Faust views:

  1. Get all hashtags
  2. Get hashtag count

Requirements

Setup Twitter secrets

  1. Make a copy of the .env.example file and name it .env
  2. Set the values from your Twitter developer account

Install dependencies

pip install -U pipenv
pipenv sync 

To install dependencies for development:

pipenv sync --dev

How to run the project

Set all environment variables

Check all env vars defined in the .env file and set the corresponding values.

Note:

  • The Kafka connection string is a list of brokers, each with its port, separated by semicolons.
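As an illustration of that format, the sketch below splits a semicolon-separated broker list into host and port pairs. The `parse_brokers` helper, the optional `kafka://` scheme handling, and the example hosts are assumptions; the project itself passes the string to Faust, which does its own parsing.

```python
from typing import List, Tuple


def parse_brokers(connection: str, default_port: int = 9092) -> List[Tuple[str, int]]:
    """Split a semicolon-separated broker list into (host, port) pairs.

    Hypothetical helper, shown only to illustrate the expected format.
    """
    brokers = []
    for entry in connection.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        # Drop an optional scheme such as "kafka://".
        if "://" in entry:
            entry = entry.split("://", 1)[1]
        host, _, port = entry.partition(":")
        brokers.append((host, int(port) if port else default_port))
    return brokers


brokers = parse_brokers("kafka://localhost:9092;kafka://localhost:9093")
```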

Have a Kafka instance running

You can use your own cluster or one of the Docker Compose files provided in the docker folder.

Running using Docker Compose

From the docker folder:

docker-compose up

This starts Zookeeper and Kafka on their default ports, 2181 and 9092 respectively.

To persist the data, run the following from the docker folder:

docker-compose -f docker-compose-with-storage.yml up

Stopping containers

Stop the running containers by pressing CTRL+C, or run the following from another terminal:

docker-compose stop

Run CLI command

From the project's folder:

pipenv run faust -A commands.commands -l info hashtags_events_generator --track word1,hashtag1,word2

Get command help

pipenv run faust -A commands.commands -l info hashtags_events_generator --help

Run Faust worker

In a different terminal:

pipenv run faust -A src.app worker -l info

Important:

  1. The worker exposes the views on the default web port, 6066
  2. The worker stores its data in Faust's default data directory

Views

List of hashtags

This view is exposed at: http://localhost:6066/hashtags

Hashtag's count

This view is exposed at: http://localhost:6066/{hashtag}/count
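A small client for the two views might look like the sketch below. The URL paths and port come from this README; the helper names are hypothetical, and the assumption that the views return JSON is not confirmed here, so adjust `fetch_json` if the responses use another format.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:6066"  # default worker port, per this README


def hashtags_url(base_url: str = BASE_URL) -> str:
    """URL of the view listing all hashtags."""
    return f"{base_url}/hashtags"


def count_url(hashtag: str, base_url: str = BASE_URL) -> str:
    """URL of the count view for one hashtag (a leading '#' is dropped)."""
    return f"{base_url}/{quote(hashtag.lstrip('#'), safe='')}/count"


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """Fetch a view and decode it, assuming (unverified) a JSON response."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the worker running:
#   fetch_json(hashtags_url())
#   fetch_json(count_url("#faust"))
```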
