Breaking changes for watchexec lib #601

Merged: 128 commits into main from lib/3.0.0 on Nov 25, 2023

Conversation

@passcod (Member) commented Jun 4, 2023

  • Multi-process watchexec (as in, multiple processes can be managed by a single Watchexec instance) — library (Multi-Process Watchexec #595)
  • Polish documentation
  • Upgrade Notify to 6
  • Update nix, gix
  • Add PID1 support
  • Change the Command type to include a new Isolation enum, for now only ProcessGroup, per command. (Later to have Pty and perhaps even Cgroup)
  • Remove deprecated items
  • Simplify config to one struct, not two
  • Remove async handlers
  • Remove handler trait
  • Make Config live instead of the reconfigure() method
  • Make Action::quit() work
  • Custom prespawn hook per supervised process
  • Replace postspawn with generalised outcome hook
  • Bump Tokio to 1.32 to get support for raw_args on Windows
  • Make it explicit in the docs that the on_action handler should be cheap because it blocks the core event loop.
  • Fix the whole thing about how actions are processed after the handler
    • Encapsulate outcome queue and worker inside supervisor instead of the other way around
    • Remove from supervisor set in Outcome?
    • Write tests for the supervisor
    • Finish implementing new supervisor
    • Hook up new supervisor to action worker
    • Revise action handler for new supervisor
  • Add supervisor ID to ProcessCompletion?
    • This is about providing a tie-back to which process it was that completed, relative to the supervisor that spawned it, now that there can be multiple in flight.
    • Cancelled: code can await supervisor jobs directly, and filter code doesn't need to care.
  • Move files around to be more thematic after refactors (outcome, action handler, etc)
    • The big refactor was mostly in-place and a bunch of filenames/modules don't really vibe with what they contain anymore.
  • Tidy the argument situation for the spawn method(s)
    • The spawn* methods in the API have a lot of arguments, and clippy is complaining. It could be that the solution is to do nothing, as anything other than arguments is even more unwieldy, but a little investigation into the available options is warranted.
    • Cancelled: superseded by the splitting of the supervisor crate.
  • Fix the library examples
  • Optionally-async action handler
  • Adapt the CLI
    • The watchexec library public API has changed quite a bit, so this is about making the CLI part (watchexec and/or cargo-watch) work (again) (and inform the API in return if anything doesn't fit).
  • Always grab all I/O streams and control how they're hooked up between commands and the main process (scoped out)

Draft release/upgrading notes

Final changelog here: https://github.com/watchexec/watchexec/blob/main/crates/lib/CHANGELOG.md

General

  • The crate is now oriented around Watchexec as the core experience, rather than providing a kitchen sink of components from which you could build your own; this helps the cohesion of the whole and simplifies many patterns.
  • Deprecated items (mostly leftover from splitting out the watchexec_events and watchexec_signals crates) are removed.
  • Watchexec can now supervise multiple commands at once. See Action below, the Action docs, and the Supervisor docs for more.
  • Because of this new feature, the old one, where multiple commands could be set under a single supervisor, is removed.
  • Watchexec's supervisor was split up into its own crate, watchexec-supervisor.
  • Running as PID1 (e.g. in Docker) is now fully handled, with support from the pid1 crate.
  • Tokio requirement is now 1.33.
  • Notify was upgraded to 6.0.
  • Nix was upgraded to 0.27.

Watchexec

  • Watchexec::new() now takes the on_action handler. As this is the most important handler to define, and Watchexec is not functional without one, this enforces providing it first.
  • Watchexec::with_config() lets one provide a config upfront, otherwise the default values are used.
  • Watchexec::default() is mostly used to avoid boilerplate in doc comment examples, and panics on initialisation errors.
  • Watchexec::reconfigure() is removed. Use the public config field instead to access the "live" Arc<Config> (see Config below); a short construction sketch follows this list.
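
For illustration, here's a minimal sketch of these construction paths, assuming a Tokio context; the on_action and throttle setter names are inferred from the Config section below, so treat them as assumptions:

use std::time::Duration;
use watchexec::{Config, Watchexec};

// handler-first construction: new() requires the on_action handler upfront
let wx = Watchexec::new(|mut action| {
    // react to events here
    action
})?;

// or provide a full config upfront with with_config()
let config = Config::default();
config.on_action(|mut action| action);
let wx = Watchexec::with_config(config)?;

// live reconfiguration through the public field, replacing reconfigure()
wx.config.throttle(Duration::from_millis(100));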

Config

  • InitConfig and RuntimeConfig have been unified into a single Config struct.
  • Instead of module-specific WorkingData structures, all of the config is now flat in the same Config. That makes it easier to work with as all that's needed is to pass an Arc<Config> around, but it does mean the event sources are no longer independent.
  • Instead of using tokio::sync::watch for some values, HandlerLock for handlers, and so on, everything is now a new Changeable type, specialised to ChangeableFn for closures and ChangeableFilterer for the Filterer.
  • There's now a signal_change() method which must be called after changes to the config; this is taken care of when using the methods on Config. This is required for the few places in Watchexec which need active reconfiguration rather than reading config values just-in-time.
  • The above means that instead of using Watchexec::reconfigure() and keeping a clone of the config around, an Arc<Config> is now "live" and changes applied to it affect the Watchexec instance directly (see the sketch at the end of this section).
  • command / commands are removed from config. Instead use the Action handler API for creating new supervised commands.
  • command_grouped is removed from config. That's now an option set on Command.
  • action_throttle is renamed to throttle and now defaults to 50ms, which is the default in Watchexec CLI.
  • keyboard_emit_eof is renamed to keyboard_events.
  • pre_spawn_handler is removed. Use Job#set_spawn_hook instead.
  • post_spawn_handler is removed. Use Job#run instead.
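
As a sketch of the live-config flow (the setter names are assumed from the option names above):

let config = wx.config.clone(); // Arc<Config> is cheap to clone and stays live
tokio::spawn(async move {
    // setters on Config take care of calling signal_change() internally
    config.throttle(Duration::from_millis(200));
    config.keyboard_events(true);
});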

Command

The structure has been reworked to be simpler and more extensible. Instead of a Command enum, there's now a Command struct, which holds a single Program and behaviour-altering options. Shell has also been redone, with less special-casing.

If you had:

Command::Exec {
  prog: "date".into(),
  args: vec!["+%s".into()],
}

You should now write:

Command {
  program: Program::Exec {
    prog: "date".into(),
    args: vec!["+%s".into()],
  },
  options: Default::default(),
}
  • New Program::Shell field args: Vec<String> lets you pass (trailing) arguments to the shell invocation:
    Program::Shell {
      shell: Shell::new("sh"),
      command: "ls".into(),
      args: vec!["--".into(), "movies".into()],
    }
    is equivalent to:
    sh -c "ls" -- movies
    
  • The old args field of Command::Shell is now the options field of Shell.
  • Shell has a new field program_option: Option<Cow<OsStr>>, which is the syntax of the option used to provide the command: i.e. for most shells it's -c, and for CMD.EXE it's /C. This makes it fully customisable (including its absence!) if you want to use weird shells, or non-shell programs as shells (see the sketch after this list).
  • The special-cased Shell::Powershell is removed.
  • On Windows, arguments are specified with raw_arg instead of arg to avoid quoting issues.
  • Command can no longer take a list of programs. That was always quite a hack; now that multiple supervised commands are possible, that's how multiple programs should be handled.
  • The top-level Watchexec command_grouped option is now Command-level, so you can start both grouped and non-grouped programs.
  • There's a new reset_sigmask option to control whether commands should have their signal masks reset on Unix. By default the signal mask is inherited.
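
As a sketch of the fully-customisable Shell, here's what a CMD.EXE-as-shell command could look like; the name of the shell's program field (prog here) is an assumption, the other fields are as described above:

use std::{borrow::Cow, ffi::OsStr};

Command {
    program: Program::Shell {
        shell: Shell {
            prog: "cmd.exe".into(),
            options: Vec::new(),
            // the option that introduces the command; None omits it entirely
            program_option: Some(Cow::Borrowed(OsStr::new("/C"))),
        },
        command: "dir".into(),
        args: Vec::new(),
    },
    options: Default::default(),
}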

Errors

  • RuntimeError::NoCommands, RuntimeError::Handler, RuntimeError::HandlerLockHeld, and CriticalError::MissingHandler are removed as the relevant types/structures don't exist anymore.
  • RuntimeError::CommandShellEmptyCommand and RuntimeError::CommandShellEmptyShell are removed; you can construct a Shell with an empty shell program, or a Program::Shell with an empty command. These will at best do nothing, but they won't error early through Watchexec.
  • Watchexec will now panic if locks are poisoned; we can't recover from that.
  • The filesystem watcher's "too many files", "too many handles", and other initialisation errors are no longer RuntimeErrors: they are now CriticalErrors. Treating these as runtime, nominally recoverable errors instead of end-the-world failures was one of the most common pitfalls of using the library; though recovery is technically possible, it's better approached in other ways.
  • The on_error handler is now sync only and no longer returns a Result; as such there's no longer the weird logic of "if the on_error handler errors, it will call itself on the error once, then crash".
  • If you were doing async work in on_error, you should instead use non-async calls (like try_send() for Tokio channels); a sketch follows this list. The error handler is expected to return as fast as possible, and not do blocking work if it can at all avoid it; this was always the case but is now documented more explicitly.
  • Error diagnostic codes are removed.
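
For instance, here's a sketch of forwarding errors out of the sync on_error handler without blocking, using a Tokio channel (the on_error setter name is assumed from the handler name; the channel size is arbitrary):

use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::channel(64);
config.on_error(move |err| {
    // try_send never blocks; if the channel is full the error is dropped
    let _ = tx.try_send(err);
});

tokio::spawn(async move {
    while let Some(err) = rx.recv().await {
        eprintln!("watchexec error: {err}");
    }
});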

Action

The process supervision system is entirely reworked. Instead of "applying Outcomes", there's now a Job type which is a single supervised command, provided by the separate watchexec-supervisor crate. The Action handler itself can only create new jobs and list existing ones, and interaction with commands is done through the Job type.

The controls available on Job are now modeled on "real" supervisors like systemd, and are both more and less powerful than the old Outcome system. This can be seen clearly in how a "restart" is specified. Previously, this was an Outcome combinator:

Outcome::if_running(
    Outcome::both(Outcome::stop(), Outcome::start()),
    Outcome::start(),
)

Now, it's a discrete method:

job.restart();

Previously, a graceful stop was a mess:

Outcome::if_running(
    Outcome::both(
        Outcome::both(
            Outcome::signal(Signal::Terminate),
            Outcome::wait_timeout(Duration::from_secs(30)),
        ),
        Outcome::both(Outcome::stop(), Outcome::start()),
    ),
    Outcome::DoNothing,
)

Now, it's again a discrete method:

job.stop_with_signal(Signal::Terminate, Duration::from_secs(30));

The stop() and start() methods also do nothing if the process is already stopped or started, respectively, so you don't need to check the status of the job before calling them. The try_restart() method is available to do a restart only if the job is running, with the try_restart_with_signal() variant for graceful restarts.

Further, all of these methods are non-blocking sync (and take &self), but they return a Ticket, a future which resolves when the control has been processed. That can be dropped if you don't care about it without affecting the job, or used to perform more advanced flow control. The special to_wait() method returns a detached, cloneable, "wait()" future, which will resolve when the process exits, without needing to hold on to the Job or a reference at all.
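
A small sketch of these control-flow options, from within an on_action handler:

// fire-and-forget: the returned Ticket can simply be dropped
job.restart();

// or await acknowledgement that the control has been processed
let ticket = job.stop_with_signal(Signal::Terminate, Duration::from_secs(30));
tokio::spawn(async move {
    ticket.await; // the graceful stop has now fully run
});

// detached wait: resolves when the process exits, without holding the Job
let finished = job.to_wait();
tokio::spawn(async move {
    finished.await;
});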

Here's a simplified example which starts a job, waits for it to end, then (re)starts another job if it exited successfully:

let build_id = Id::default();
let run_id = Id::default();
Watchexec::new(|mut action| {
    // omitted: signal handling, quit on ctrl-c, etc

    let build = action.get_or_create_job(build_id, Command {
        program: Program::Exec {
            prog: "cargo".into(),
            args: vec!["build".into()],
        },
        options: Default::default(),
    });

    let run = action.get_or_create_job(run_id, Command {
        program: Program::Exec {
            prog: "cargo".into(),
            args: vec!["run".into()],
        },
        // assumption: the options struct is the supervisor's SpawnOptions
        options: SpawnOptions {
            grouped: true,
            ..Default::default()
        },
    });

    build.restart();
    tokio::spawn(async move {
        build.to_wait().await;
        build.run(|context| {
            if let CommandState::Finished { status: ProcessEnd::Success, .. } = context.current {
                run.restart();
            }
        }).await;
    });

    action
});

@emilHof (Contributor) commented Jun 6, 2023

hey @passcod, this looks awesome!

there are a couple of questions that i was curious about! some of which i believe you posed in the past as well, if memory serves me!

  1. is there going to be a mechanism for guaranteeing ordering of execution or for granting the user some way to ensure a certain (maybe partial) order of execution?
  2. would there be a way to get notified of a Command exiting/finishing, or would users handle this themselves (through the PostSpawn handler for example)?

the use of the Isolation enum to prepare the crate for the future addition of Cgroup is really clever by the way! :D

@passcod (Member, Author) commented Jun 6, 2023

postspawn is run immediately after spawning, so that's not how, but the supervisor issues a ProcessCompletion event when a process ends, which is how you get notified (in the action handler)

@passcod (Member, Author) commented Jun 6, 2023

is there going to be a mechanism for guaranteeing ordering of execution

no. that's an application concern exterior to watchexec as a library

@emilHof (Contributor) commented Jun 6, 2023

postspawn is run immediately after spawning

aha, okie! as in the name yea.. that would make sense haha!

would it be this Event, or a different one?

let event = Event {
    tags: vec![
        Tag::Source(Source::Internal),
        Tag::ProcessCompletion(status.map(Into::into)),
    ],
    metadata: Default::default(),
};

also, would there be a way to identify which Process/Command was the one that ended, from the issued ProcessCompletion event?

@emilHof (Contributor) commented Jun 6, 2023

no. that's an application concern exterior to watchexec as a library

oh okie ! :)

@passcod (Member, Author) commented Jun 7, 2023

would there be a way to identify which Process/Command was the one that ended,

not yet, but I'll add supervisor id to it I think

@emilHof (Contributor) commented Jun 7, 2023

I'll add supervisor id to it I think

i like that idea, yess! :D

@passcod (Member, Author) commented Jul 2, 2023

(I haven't forgotten about this, I've just been moving house and dayjob got intense at the same time.)

@emilHof (Contributor) commented Jul 3, 2023

ohh, @passcod hope you're doing and managing well with all of this going on! certainly take all the time you need! can't have our number one maintainer burn out ! :))

@passcod (Member, Author) commented Aug 12, 2023

Made a bunch of progress and also expanded the scope quite a bit; going for a big simplification of the interface while still keeping all (or more!) of the features.

The three big changes so far are:

  • InitConfig and RuntimeConfig are no more, there's just Config.
  • Handlers are always sync and don't return errors, which simplifies a lot.
  • The Command structure is completely turned around and drops a lot of the windows-specific handling while being more extensible for the future.

And I'm working on making reconfiguration more natural, like just being able to change the config directly, not keep a clone of Config around to pass it to reconfigure().

I moved around the multi-process stuff a bit, so that now it's:

  • create just creates the supervisor entry kinda thing but doesn't start the command
  • delete just drops the supervisor etc (and kills the command)
  • apply works as before.

That means that it's no longer needed to pass EventSet to anything but apply(), which simplifies a bunch of types too.

@emilHof (Contributor) commented Aug 14, 2023

heyy @passcod, these changes look awesome, wow!

Handlers are always sync and don't return errors, which simplifies a lot.

ohh, wow, this change is amazing! you are so right, it does seem to make things a lot easier!

  • create just creates the supervisor entry kinda thing but doesn't start the command
  • delete just drops the supervisor etc (and kills the command)
  • apply works as before.

oooo, these will be very interesting to dive into, and to see how to take advantage of when using watchexec/lib! really nice, especially how create is just there to get the Supervisor up and running and apply does the actual controlling, if this is the correct understanding?

one quick question, if that is alright? before, when calling apply for some Supervisor, would it just overwrite the previous Outcome, whereas now all apply calls will have their Outcomes applied to the Supervisor in order? not certain this is the right understanding!

@passcod (Member, Author) commented Aug 14, 2023

your understanding is correct! note that the latter is not implemented yet, but that's the plan. essentially it will build up an Outcome::both, nothing more complicated than that. the real interesting thing is with quit() or remove(): those will wait until all outcomes, or the supervisor's outcomes respectively, are done before running, so a graceful quit can be done like

for supervisors {
  signal(sigterm), timeout(30s), kill()
  remove()
}

quit()

oh and also the order will be kept only per supervisor, all supervisors will be affected in parallel

so in the pseudocode above, all processes would get gracefully stopped and then removed at the same time, not one after the other, and once they're all done watchexec would quit

@emilHof (Contributor) commented Aug 15, 2023

essentially it will build up an Outcome::both though, nothing more complicated than that

you maybe are underselling how clever this is! the simplicity of implementing it this way shouldn't be understated, i think.. :) super nice @passcod! :))

quit() or remove(): those will wait until all outcomes or the supervisor's respectively

ooo, wow! if the previous impl was already so clever and then this as well hihi! wow, yess, you are right, that seems to offer so many cool ways of handling certain event chains, and whole processes!

is it a correct interpretation, that remove() acts on one Supervisor's level and quit() on the whole watchexec's, or is there maybe some nuance that is being missed by me in this regard (very likely hehe) ?

@passcod force-pushed the lib/3.0.0 branch 4 times, most recently from b887c70 to 4181c99 on November 25, 2023
@passcod enabled auto-merge (squash) on November 25, 2023
@passcod merged commit a13bc42 into main on November 25, 2023 (8 checks passed)
@passcod deleted the lib/3.0.0 branch on November 25, 2023