Merge pull request #121 from creekorful/develop
Release 0.10.0
creekorful committed Jan 6, 2021
2 parents 6c4ecc1 + 829afcb commit 4075dfc
Showing 42 changed files with 989 additions and 1,878 deletions.
50 changes: 4 additions & 46 deletions README.md
````diff
@@ -30,50 +30,16 @@ and wait for all containers to start.
 
 # How to initiate crawling
 
-Since the API is exposed on localhost:15005, one can use it to start crawling:
+One can use the RabbitMQ dashhboard available at localhost:15003, and publish a new JSON object in the **crawlingQueue**.
 
-using trandoshanctl executable:
+The object should look like this:
 
-```sh
-$ trandoshanctl --api-token <token> schedule https://www.facebookcorewwwi.onion
-```
-
-or using the docker image:
-
-```sh
-$ docker run creekorful/trandoshanctl --api-token <token> --api-uri <uri> schedule https://www.facebookcorewwwi.onion
-```
-
-(you'll need to specify the api uri if you use the docker container)
-
-this will schedule given URL for crawling.
-
-## Example token
-
-Here's a working API token that you can use with trandoshanctl if you haven't changed the API signing key:
-
-```
-eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRyYW5kb3NoYW5jdGwiLCJyaWdodHMiOnsiUE9TVCI6WyIvdjEvdXJscyJdLCJHRVQiOlsiL3YxL3Jlc291cmNlcyJdfX0.jGA8WODYKtKy7ZijngoV8C3iWi1eTvMitA8Z1Is2GUg
-```
-
-This token is the representation of the following payload:
-
-```
+```json
 {
-"username": "trandoshanctl",
-"rights": {
-"POST": [
-"/v1/urls"
-],
-"GET": [
-"/v1/resources"
-]
-}
+"url": "https://facebookcorewwwi.onion"
 }
 ```
 
-you may create your own tokens with the rights needed. In the future a CLI tool will allow token generation easily.
-
 ## How to speed up crawling
 
 If one want to speed up the crawling, he can scale the instance of crawling component in order to increase performances.
@@ -87,14 +53,6 @@ this will set the number of crawler instance to 5.
 
 # How to view results
 
-## Using trandoshanctl
-
-```sh
-$ trandoshanctl search <term>
-```
-
-## Using kibana
-
 You can use the Kibana dashboard available at http://localhost:15004. You will need to create an index pattern named '
 resources', and when it asks for the time field, choose 'time'.
 
````
24 changes: 0 additions & 24 deletions build/docker/Dockerfile.tdsh-archiver

This file was deleted.

24 changes: 0 additions & 24 deletions build/docker/Dockerfile.trandoshanctl

This file was deleted.

14 changes: 0 additions & 14 deletions cmd/tdsh-archiver/tdsh-archiver.go

This file was deleted.

13 changes: 0 additions & 13 deletions cmd/trandoshanctl/trandoshanctl.go

This file was deleted.

29 changes: 17 additions & 12 deletions deployments/docker/docker-compose.yml
@@ -45,46 +45,47 @@ services:
--log-level debug
--hub-uri amqp://guest:guest@rabbitmq:5672
--config-api-uri http://configapi:8080
--redis-uri redis:6379
restart: always
depends_on:
- rabbitmq
archiver:
image: creekorful/tdsh-archiver:latest
indexer-local:
image: creekorful/tdsh-indexer:latest
command: >
--log-level debug
--hub-uri amqp://guest:guest@rabbitmq:5672
--storage-dir /archive
--config-api-uri http://configapi:8080
--index-driver local
--index-dest /archive
restart: always
volumes:
- archiverdata:/archive
depends_on:
- rabbitmq
indexer:
- configapi
indexer-es:
image: creekorful/tdsh-indexer:latest
command: >
--log-level debug
--hub-uri amqp://guest:guest@rabbitmq:5672
--elasticsearch-uri http://elasticsearch:9200
--signing-key K==M5RsU_DQa4_XSbkX?L27s^xWmde25
--config-api-uri http://configapi:8080
--redis-uri redis:6379
--index-driver elastic
--index-dest http://elasticsearch:9200
restart: always
depends_on:
- rabbitmq
- elasticsearch
- configapi
- redis
ports:
- 15005:8080
configapi:
image: creekorful/tdsh-configapi:latest
command: >
--log-level debug
--hub-uri amqp://guest:guest@rabbitmq:5672
--redis-uri redis:6379
--default-value forbidden-hostnames="[{\"hostname\": \"facebookcorewwwi.onion\"}, {\"hostname\": \"nytimes3xbfgragh.onion\"}]"
--default-value forbidden-hostnames="[]"
--default-value allowed-mime-types="[{\"content-type\":\"text/\",\"extensions\":[\"html\",\"php\",\"aspx\", \"htm\"]}]"
--default-value refresh-delay="{\"delay\": -1}"
--default-value blacklist-threshold="{\"threshold\": 5}"
restart: always
depends_on:
- rabbitmq
@@ -97,10 +98,14 @@ services:
--log-level debug
--hub-uri amqp://guest:guest@rabbitmq:5672
--config-api-uri http://configapi:8080
--redis-uri redis:6379
--tor-uri torproxy:9050
restart: always
depends_on:
- rabbitmq
- configapi
- redis
- torproxy

volumes:
esdata:
@@ -110,4 +115,4 @@ volumes:
archiverdata:
driver: local
redisdata:
driver: local
driver: local
Binary file modified docs/architecture.png
99 changes: 0 additions & 99 deletions internal/archiver/archiver.go

This file was deleted.

55 changes: 0 additions & 55 deletions internal/archiver/archiver_test.go

This file was deleted.

11 changes: 0 additions & 11 deletions internal/archiver/storage/storage.go

This file was deleted.
