
Bulk index speed is slow #290

Open
amirEBD opened this issue Jul 27, 2022 · 5 comments
amirEBD commented Jul 27, 2022

Hi everyone at @Sonic,
I was looking for alternatives to Elasticsearch because of resource-usage issues on our ES cluster, and Sonic looked like a very useful alternative. So I decided to run some benchmarks on it and see how it responds. The problem I now face is that bulk index insertion is much slower than in other search engines like Elasticsearch/Zinc.
I'm using the go-sonic client to bulk-insert some data into Sonic, and it took about 1-2 hours to index the data below! Should I switch the client to NodeJS, for example?

Data size: about 50 MB
Doc count: 2M strings, indexed as a text field
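For context, the throughput these numbers imply (an illustrative back-of-the-envelope calculation, not from the original report):

```javascript
// Rough throughput implied by the reported figures: 2M documents in 1-2 hours.
const totalDocs = 2_000_000;
const elapsedSeconds = 1.5 * 3600; // midpoint of the reported 1-2 hour range
const docsPerSecond = totalDocs / elapsedSeconds;
console.log(Math.round(docsPerSecond)); // 370 documents per second
```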

Note:

  • Used the same config for Sonic as shown on the GitHub page.
  • Also ran Sonic in Docker.

Thanks for any help indeed :)

PS: As a comparison, I tested Elasticsearch with about 5 GB of data, which indexed in 1 hour.

@valeriansaliou
Owner

You could try the NodeJS client: https://github.com/valeriansaliou/node-sonic-channel, which is official and whose performance I've measured to be rather good, yes.

@amirEBD
Author

amirEBD commented Jul 27, 2022

Thanks for your suggestion,
I tried the NodeJS client, but it didn't meet my expectations either!

  • There is no bulk API to send enormous amounts of data into it.
  • Sending pushes with bigger strings (e.g. a string with a length of 10 words) causes a disconnection on the Sonic side.

Is there any way to put lots of data into Sonic for testing purposes? There is no example code in the node client's GitHub repo, just one ingest.js that sends a single push to the server.

@valeriansaliou
Owner

The NodeJS library will split the text data into sub-command chunks, so that definitely works. Though you should maybe pre-split your data before pushing.

Sonic was built for chat message indexing and email indexing at first, which is why everything is centered around small chunks of data.

In other words, it is intended that inserting 1M messages results in 1M+ commands (a bit more considering some messages are larger than the max chunk size, but the NodeJS library handles splitting for you, based on the server dynamically-provided buffer size).
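The chunking described above can be sketched as follows (a hypothetical helper, not the library's actual code; the real node-sonic-channel client sizes chunks from the buffer size the server reports dynamically):

```javascript
// Split a long text into chunks no larger than maxBytes, breaking on
// whitespace so words are never cut in half. Illustrative only: the real
// chunk size comes from the Sonic server's dynamically-provided buffer size.
function splitIntoChunks(text, maxBytes) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? current + " " + word : word;
    if (Buffer.byteLength(candidate, "utf8") > maxBytes && current) {
      chunks.push(current); // flush the chunk before it overflows
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Example: a ~2 KB text split into chunks of at most 200 bytes.
const text = Array.from({ length: 300 }, (_, i) => "word" + i).join(" ");
const chunks = splitIntoChunks(text, 200);
console.log(chunks.every(c => Buffer.byteLength(c, "utf8") <= 200)); // true
```

Each resulting chunk would then be sent as its own PUSH command against the same object, which is why 1M messages turn into 1M+ commands.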

@valeriansaliou
Owner

valeriansaliou commented Jul 27, 2022

In order to maximize speed, note that you should split the work between multiple NodeJS instances, each running the ingestion on its own split of your data. Let's say you have 4 cores on the server running the ingestion script; then you'd split your data in 4 and run 4 NodeJS instances to push that data to Sonic. This is because each ingestion channel is effectively a synchronous command channel, blocking for a few microseconds on each PUSH command.

On the Sonic server end (on another server), to maximize ingestion speed, you should also ensure you have as many CPUs as there are data-producer NodeJS instances (as Sonic spawns 1 thread per Sonic Channel opened over TCP by clients), plus some spare CPUs for the RocksDB internal threads to do their work.

And also adjust your config.cfg accordingly to max out your Sonic server resources.

That way you can max out your importer server + Sonic server capacity. Also make sure everything is running on fast SSDs.
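The parallel-ingestion setup described above can be sketched like this (a hypothetical outline only; `shardDocuments` and the worker loop are illustrative names, not part of any official client):

```javascript
// Partition documents round-robin into one shard per worker process.
// With 4 CPU cores on the importer server you would run 4 worker
// processes, each pushing its own shard over a dedicated Sonic ingest
// channel (the server spawns 1 thread per open channel).
function shardDocuments(docs, workers) {
  const shards = Array.from({ length: workers }, () => []);
  docs.forEach((doc, i) => shards[i % workers].push(doc));
  return shards;
}

// Example: 10 documents split across 4 workers.
const docs = Array.from({ length: 10 }, (_, i) => ({ id: "doc:" + i, text: "..." }));
const shards = shardDocuments(docs, 4);
console.log(shards.map(s => s.length)); // [ 3, 3, 2, 2 ]

// Each worker would then loop over its shard, awaiting every push so the
// synchronous command channel is never overrun, e.g. (pseudocode):
//   for (const doc of shard) await ingest.push(collection, bucket, doc.id, doc.text);
```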

@amirEBD
Author

amirEBD commented Jul 30, 2022

Thanks for your detailed explanation. I tested go-sonic with simple pushes, which showed better performance.
