Improve CTranslate2 wrapping in translation_server #2001

Open
francoishernandez opened this issue Jan 27, 2021 · 5 comments

Comments

@francoishernandez
Member

https://forum.opennmt.net/t/ctranslate2-on-opennmt-py-server/4175/8

@guillaumekln
Contributor

guillaumekln commented Mar 16, 2021

After reviewing the code, here's what could be improved:

  • Make the following translator parameters configurable:
    • inter_threads
    • intra_threads
    • compute_type
  • Allow parallel translations as supported by CTranslate2: I tried to enable that, but even though the waitress module is multi-threaded and accepts concurrent requests, it seems the requests are then processed sequentially.
  • Revise the unloading mechanism so that it does not assume the model is running on the GPU.
  • Maybe clean up the initial dummy translation: the first translation has a higher latency on GPU, but this was improved in recent versions (I think it's around 200 ms now).
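
For the first point, here is a minimal sketch of how the three parameters could be read from the server configuration and forwarded to the translator. The `CT2Options` class, the `ct2_translator_args` config key, and the default values are hypothetical and only for illustration; the keyword names themselves match arguments accepted by `ctranslate2.Translator`.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class CT2Options:
    """Hypothetical container; field names mirror ctranslate2.Translator arguments."""

    device: str = "cpu"
    inter_threads: int = 1        # illustrative default, not necessarily the library's
    intra_threads: int = 4        # illustrative default, not necessarily the library's
    compute_type: str = "default"


def options_from_config(conf: dict) -> CT2Options:
    """Read translator options from a model config dict.

    "ct2_translator_args" is an assumed key name, not an existing server option.
    """
    raw = conf.get("ct2_translator_args", {})
    known = {f.name for f in fields(CT2Options)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unsupported CTranslate2 options: {sorted(unknown)}")
    return CT2Options(**raw)


def build_translator(model_path: str, opts: CT2Options):
    """Create the translator with the configured options."""
    # Imported lazily so the option parsing above works without ctranslate2 installed.
    import ctranslate2

    return ctranslate2.Translator(model_path, **asdict(opts))
```

A config entry such as `{"ct2_translator_args": {"inter_threads": 2, "compute_type": "int8"}}` would then override only those two values and keep the rest at their defaults.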

@guillaumekln
Contributor

> I tried to enable that but even though the waitress module is multi-threaded and accepts concurrent requests, it seems the requests are then processed sequentially

I did not realize that the translation method is inside a critical section. Note this is not needed for CTranslate2: the translation and model loading/unloading are fully thread safe. So removing the critical section for CTranslate2 can improve the scalability of the server for CPU translations with inter_threads > 1 and multi-GPU translations.
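
To illustrate the idea, here is a rough sketch in which the critical section is kept only for backends that are not thread safe. `ServerModel` and `FakeTranslator` are hypothetical stand-ins, not the actual translation_server classes, and the fake backend just sleeps to simulate work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import nullcontext


class FakeTranslator:
    """Stand-in backend that records how many calls overlap in time."""

    def __init__(self, delay=0.1):
        self._delay = delay
        self._stats_lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def translate_batch(self, batch):
        with self._stats_lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(self._delay)  # simulated translation work
        with self._stats_lock:
            self._active -= 1
        return [list(reversed(tokens)) for tokens in batch]


class ServerModel:
    """Hypothetical wrapper: serializes calls only when the backend needs it."""

    def __init__(self, translator, thread_safe):
        self._translator = translator
        # A no-op context manager replaces the lock for thread-safe backends.
        self._guard = nullcontext() if thread_safe else threading.Lock()

    def run(self, batch):
        with self._guard:
            return self._translator.translate_batch(batch)


def peak_concurrency(thread_safe, n_requests=4):
    """Send n_requests concurrently and return the peak number of overlapping calls."""
    translator = FakeTranslator()
    model = ServerModel(translator, thread_safe)
    batches = [[["a", "b"]]] * n_requests
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        results = list(pool.map(model.run, batches))
    assert all(result == [["b", "a"]] for result in results)
    return translator.peak
```

With the lock in place, `peak_concurrency(thread_safe=False)` can never exceed one call at a time, whereas the lock-free path lets the worker threads overlap, which is what a CTranslate2 translator with inter_threads > 1 would need in order to be used fully.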

@vince62s
Member

@francoishernandez @pltrdy do you recall why #1108 was introduced?
Thread memory leaks?

@pltrdy
Contributor

pltrdy commented Feb 21, 2023

I think that in the translation server, loading/unloading and even running a model were not thread safe. I don't know anything about CTranslate2 though, so I can't tell how they differ.

@souleymanefall176

Good evening. I have an issue. When I run the command (ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --output_dir ende_ctranslate2 --quantization int8), I get this error (ModuleNotFoundError: No module named 'onmt.inputters.text_dataset').

5 participants