
Roadmap after v0.3 #64

Open
2 of 15 tasks
gavento opened this issue Jul 2, 2018 · 0 comments
Labels
enhancement New feature or request meta
gavento commented Jul 2, 2018

This issue tracks the project's direction after v0.3 and replaces #26. It lists our mid- and long-term goals along with their [priority], (assignee), and any sub-tasks.

Any help is welcome, and mentoring is available for most tasks!

Remaining enhancements from v0.3

Will be updated after prioritization discussion.

Client-side protocols

Replace capnp RPC and the current monitoring dashboard HTTP API with a common protocol.
Part of #11 (more discussion there) but specific to the public API.

Improve the dashboard with more information and post-mortem analysis

Fix current bugs

Custom tasks (subworkers) in more languages

  • Python subworker as a library [low] (run standalone scripts as opposed to defining them in the client only)

Easier deployment in the cloud

Packaging for easier deployment

Multiple options, priorities may vary. (@spirali)

  • AppImage/Snap packages [low] (we already have static binaries)
  • Deb/other distro packages [low]

Improve Python API

Pythonize the client API.

Improve testing infrastructure

More real-world code examples

Lower priority, and best based on real use-cases. Ideas: NumPy subtasks, C++/Rust subworkers.

Enhancements to revisit in the (not so distant) future

  • Integration with some popular libraries
    • Apache Arrow content-type
      • Basic type and loading is implemented. We could add more operations (filter, split, merge, ...)
    • XGBoost tasks, etc ...
    • Why not now: Not clear what would be the demand
  • Worker configuration files (needed for common resources such as CPUs, special resources such as GPUs, different subworker locations and configurations, ...)
    • Partially done
    • Why not now: Needs to be thought through (esp. w.r.t. resources), not needed now
  • Separate session construction and running (save/load session)
    • Why not now: Not clear what the use-cases would be; not difficult once the API has stabilized
  • Clients in other languages: Rust, C++, Java, ...
    • Why not now: Not clear what would be the demand. Easier after the protocol/Python API stabilization.
  • Scale the scheduler, benchmarks
    • There is a benchmark in utils/bench/simple_task_scaling.py. The results as of 0.2 are here.
    • Why not now: While eventually crucial, the scheduler is sufficient when there are <1000 tasks to be scheduled at once.
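
To make the worker configuration item above more concrete, here is a minimal sketch of what loading such a file could involve. Everything in it is an assumption made for illustration: the JSON layout, the key names (`resources`, `cpus`, `gpus`, `subworkers`, `command`), the subworker commands, and the `WorkerConfig` / `parse_worker_config` names are hypothetical and do not describe an existing or planned format.

```python
import json
from dataclasses import dataclass, field

# Hypothetical config layout, invented for this sketch only.
EXAMPLE_CONFIG = """
{
  "resources": {"cpus": 8, "gpus": 1},
  "subworkers": {
    "py": {"command": ["python3", "-m", "my_subworker"]},
    "cpp": {"command": ["/opt/subworkers/cpp_tasks"]}
  }
}
"""


@dataclass
class WorkerConfig:
    """Parsed worker settings: common (CPU) and special (GPU) resources,
    plus the command used to start each kind of subworker."""
    cpus: int = 1
    gpus: int = 0
    subworkers: dict = field(default_factory=dict)


def parse_worker_config(text: str) -> WorkerConfig:
    """Parse and validate the hypothetical JSON worker config."""
    raw = json.loads(text)
    resources = raw.get("resources", {})
    cpus = int(resources.get("cpus", 1))
    gpus = int(resources.get("gpus", 0))
    if cpus < 1:
        raise ValueError("a worker needs at least one CPU")
    if gpus < 0:
        raise ValueError("the GPU count cannot be negative")

    subworkers = {}
    for name, spec in raw.get("subworkers", {}).items():
        command = spec.get("command")
        # Each subworker must name the command line that launches it.
        if (not isinstance(command, list) or not command
                or not all(isinstance(part, str) for part in command)):
            raise ValueError(f"subworker {name!r} needs a non-empty command list")
        subworkers[name] = list(command)

    return WorkerConfig(cpus=cpus, gpus=gpus, subworkers=subworkers)


if __name__ == "__main__":
    print(parse_worker_config(EXAMPLE_CONFIG))
```

Keeping resources and subworker definitions in separate sections is one possible answer to the open question about resources; the actual design still needs the discussion mentioned above.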
@gavento gavento added the meta label Jul 2, 2018
@gavento gavento added this to the v0.4 milestone Jul 2, 2018
@gavento gavento added the enhancement New feature or request label Jul 2, 2018