[WIP] Allow to respawn node with different set of arguments #190

s0me0ne-unkn0wn · 2024-03-17T14:02:28Z

I made these changes while researching a quite specific use case. Not all of them may be useful for the general public, but I believe some of them may.

Everything is implemented for the native provider only at the moment. I currently lack the expertise to implement it for Kubernetes.

It also introduces changes in publicly exposed interfaces, so that should be considered a breaking change.

An approximate list of what's been done and why:

Got rid of nodes_by_name hashmap in the orchestrator::Network. It didn't make sense from the POV of performance (as a zombienet is unlikely to be of size exceeding a couple of hundreds of nodes), but cloning structures around was effectively sealing the network at the moment it was spawned, not allowing to change it dynamically;
Fixed a bug which resulted in double-adding nodes to network::network::Relaychain and network::network::Parachain when a new validator or collator is added;
Introduced a call allowing to kill a node without restarting it;
Introduced a call allowing to re-spawn a previously killed node, altering its spec between the kill and the re-spawn;
Implemented means of dynamically replacing the node inside a network.

Thoughts and ideas:

Narrowing down the use case to changing node command line arguments during the restart is a decision dictated by my specific research. Probably, it may be generalized to like "change everything you want while the node is dead", but the scope of the allowed changes has to be discussed;
The overall approach is somewhat brittle and not that foolproof. It would be better to have something like pub async fn restart_with(&mut self, mutate: impl FnMut(&mut NetworkNode)) that would kill the node, run the closure to alter its configuration, and then re-spawn the node, but that gives less leverage to the developer. This needs to be discussed as well.

I'll add a usage example a bit later. Feel free to ask questions, provide suggestions, add your commits, etc.

s0me0ne-unkn0wn · 2024-03-17T14:47:17Z

Usage example: https://github.com/s0me0ne-unkn0wn/peerid-change-test/blob/master/src/main.rs

pepoviola · 2024-03-17T23:57:49Z

Hi @s0me0ne-unkn0wn, thanks for your feedback and contribution!!

Related to individual changes:

Got rid of nodes_by_name hashmap in the orchestrator::Network. It didn't make sense from the POV of performance (as a zombienet is unlikely to be of size exceeding a couple of hundreds of nodes), but cloning structures around was effectively sealing the network at the moment it was spawned, not allowing to change it dynamically;

Fixed a bug which resulted in double-adding nodes to network::network::Relaychain and network::network::Parachain when a new validator or collator is added;

This looks great!

Introduced a call allowing to kill a node without restarting it;

Did you think make sense to expose this method? Looks like will be always using followed by a re-spawn call since only kill the node will leave the node in the network but the process/pod inaccessible (maybe we can add a state flag). Maybe we can call it internally in the respawn method.

Introduced a call allowing to re-spawn a previously killed node, altering its spec between the kill and the re-spawn;

I think this is a great, we made some hack in v1 to allow to restart the node with a modified cmd and this is a more clean approach. We can check which other fields make sense to allow to modify in the spec to cover other use-cases. Also, one thing to notice is that key/peer_id is pretty much bounded to the node name by default and if we allow to change those we are breaking that pattern. I think that will not cause issues in most of the cases but we should document so users know that naming/key/peer_id can be changed.

Implemented means of dynamically replacing the node inside a network.

This looks great!!

Thoughts and ideas:

Narrowing down the use case to changing node command line arguments during the restart is a decision dictated by my specific research. Probably, it may be generalized to like "change everything you want while the node is dead", but the scope of the allowed changes has to be discussed;

Sounds good, command, subcommand, args and env are other good candidates to evaluate.

The overall approach is somewhat brittle and not that foolproof. It would be better to have something like pub async fn restart_with(&mut self, mutate: impl FnMut(&mut NetworkNode)) that would kill the node, run the closure to alter its configuration, and then re-spawn the node, but that gives less leverage to the developer. This needs to be discussed as well.

Yes, restart_with sounds good but as you said we should think what is the better ux for the developers.

Again, thanks for your feedback and contribution!! If you have time I will ping you to talks about this and implement the needed changes to support this feature also in kubernetes.

Thx!

--
Side note, I think the wait_for_metric fn from your example could be also a good candidate to implement and very useful for developers.

s0me0ne-unkn0wn · 2024-03-18T08:51:46Z

If you have time I will ping you to talks about this

Sure! It's a great tool we're actively using so I'm interested in making it better :)

@s0me0ne-unkn0wn

based on #190 cc: @s0me0ne-unkn0wn

s0me0ne-unkn0wn added 4 commits March 16, 2024 14:00

Generalize metric assertions

fb4e1e9

Initial implementation

d76d649

cargo fmt

43994db

Merge remote-tracking branch 'origin/main' into s0me0ne/respawn-node

265cdeb

pepoviola requested review from l0r1s and pepoviola March 17, 2024 22:45

pepoviola mentioned this pull request Mar 21, 2024

remove hashmap nodes_by_name #193

Merged

pepoviola added a commit that referenced this pull request Mar 21, 2024

remove hashmap nodes_by_name (#193)

3d36d46

based on #190 cc: @s0me0ne-unkn0wn

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Allow to respawn node with different set of arguments #190

[WIP] Allow to respawn node with different set of arguments #190

s0me0ne-unkn0wn commented Mar 17, 2024 •

edited

s0me0ne-unkn0wn commented Mar 17, 2024

pepoviola commented Mar 17, 2024 •

edited by s0me0ne-unkn0wn

s0me0ne-unkn0wn commented Mar 18, 2024

[WIP] Allow to respawn node with different set of arguments #190

Are you sure you want to change the base?

[WIP] Allow to respawn node with different set of arguments #190

Conversation

s0me0ne-unkn0wn commented Mar 17, 2024 • edited

s0me0ne-unkn0wn commented Mar 17, 2024

pepoviola commented Mar 17, 2024 • edited by s0me0ne-unkn0wn

s0me0ne-unkn0wn commented Mar 18, 2024

s0me0ne-unkn0wn commented Mar 17, 2024 •

edited

pepoviola commented Mar 17, 2024 •

edited by s0me0ne-unkn0wn