-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add layers
function to rustworkx-core
#1194
base: main
Are you sure you want to change the base?
Conversation
- Modify the `layers` python interface to use the `rustworkx-core` equivalent.
Pull Request Test Coverage Report for Build 9384925392Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, just some quick high level comments from a first skim over the code.
The other things are can you add a couple of other rust space tests so we have some more dedicated coverage of the rust api (more than just the doctest) and also a release note?
- Move `layers` to `dag_algo.rs`. - Add check for cycles, if a cycle is found throw an error. - Refactor `LayersIndexError` to `LayersError`. - Move `LayersError` to `err.rs`. - Other small tweaks and fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some quick comments from a first skim through the PR.
releasenotes/notes/add-layers-rustworkx-core-1caaccb1ca292ca8.yaml
Outdated
Show resolved
Hide resolved
releasenotes/notes/add-layers-rustworkx-core-1caaccb1ca292ca8.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, it's a good start. I think in this case we should make layers()
a true iterator because it doesn't need to do a complete traversal to compute the final output. We can and should return this layer by layer. The reason we couldn't do that in python is we can't put references in a pyclass for the iteration, but in a Rust API we don't have that limitation. I left an inline code suggestion to start you down that path.
The other thing is I'm thinking we may not need to return a Result<>
for this function, the only error condition we returned before was input checking we needed for Python interoperability, but doesn't really feel right in a rust API. You added a cycle check in this case which we could error for, but I'd also be fine just panicking in that case because this function doesn't work if there are cycles. Or alternatively just stop iterating when we reach a cycle. In either case we could just document that behavior and avoid the Result
and I think either would probably be acceptable because this function is really only for DAGs so you shouldn't be passing a graph with cycles.
rustworkx-core/src/dag_algo.rs
Outdated
if let Some(node) = &cur_layer.first() { | ||
let check_cycle = find_cycle(&graph, Some(**node)); | ||
if !check_cycle.is_empty() { | ||
return Err(LayersError(Some( | ||
"The provided graph has a cycle".to_string(), | ||
))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new right? I don't think the old code had a cycle check like this. This probably adds more overhead than needed since we've got to iterate over all the nodes a second time. Could we maintain a HashSet
of encountered nodes we can add a lookup on that node id at the right point and error if we've cycled.
This being said with my below suggestion for using an iterator this might be ok overhead so we don't have to deal with returning a fallible iterator which IMO aren't the most ergonomic to work with, but might be what's called for in this case. I'd also be fine if we just called panic!()
if we encounter a cycle while iteration as long as we document that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I had the error in the first place was to communicate to Python about what errors were encountered. I could change it to panic, and then just have Python make the cycle checks beforehand. I would still have to implement a check on the rust side that involves less overhead. I will think about the logic for that.
rustworkx-core/src/dag_algo.rs
Outdated
output_indices.push(cur_layer.to_owned()); | ||
|
||
// Iterate until there are no more | ||
while !cur_layer.is_empty() { | ||
for node in &cur_layer { | ||
let children = graph.neighbors_directed(*node, petgraph::Direction::Outgoing); | ||
let mut used_indices: HashSet<G::NodeId> = HashSet::new(); | ||
for succ in children { | ||
// Skip duplicate successors | ||
if used_indices.contains(&succ) { | ||
continue; | ||
} | ||
used_indices.insert(succ); | ||
let mut multiplicity: usize = 0; | ||
let raw_edges: G::EdgesDirected = | ||
graph.edges_directed(*node, petgraph::Direction::Outgoing); | ||
for edge in raw_edges { | ||
if edge.target() == succ { | ||
multiplicity += 1; | ||
} | ||
} | ||
predecessor_count | ||
.entry(succ) | ||
.and_modify(|e| *e -= multiplicity) | ||
.or_insert( | ||
// Get the number of incoming edges to the successor | ||
graph | ||
.edges_directed(succ, petgraph::Direction::Incoming) | ||
.count() | ||
- multiplicity, | ||
); | ||
if *predecessor_count.get(&succ).unwrap() == 0 { | ||
next_layer.push(succ); | ||
predecessor_count.remove(&succ); | ||
} | ||
} | ||
} | ||
if !next_layer.is_empty() { | ||
output_indices.push(next_layer.to_owned()); | ||
} | ||
cur_layer = next_layer; | ||
next_layer = Vec::new(); | ||
} | ||
Ok(output_indices) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iterator comment I referred to in the original issue can actually be done here. It requires a bit of a restructuring of the code, you have to move the tracking variables into a struct and implement the Iterator
trait for that struct. This ends up looking something like:
output_indices.push(cur_layer.to_owned()); | |
// Iterate until there are no more | |
while !cur_layer.is_empty() { | |
for node in &cur_layer { | |
let children = graph.neighbors_directed(*node, petgraph::Direction::Outgoing); | |
let mut used_indices: HashSet<G::NodeId> = HashSet::new(); | |
for succ in children { | |
// Skip duplicate successors | |
if used_indices.contains(&succ) { | |
continue; | |
} | |
used_indices.insert(succ); | |
let mut multiplicity: usize = 0; | |
let raw_edges: G::EdgesDirected = | |
graph.edges_directed(*node, petgraph::Direction::Outgoing); | |
for edge in raw_edges { | |
if edge.target() == succ { | |
multiplicity += 1; | |
} | |
} | |
predecessor_count | |
.entry(succ) | |
.and_modify(|e| *e -= multiplicity) | |
.or_insert( | |
// Get the number of incoming edges to the successor | |
graph | |
.edges_directed(succ, petgraph::Direction::Incoming) | |
.count() | |
- multiplicity, | |
); | |
if *predecessor_count.get(&succ).unwrap() == 0 { | |
next_layer.push(succ); | |
predecessor_count.remove(&succ); | |
} | |
} | |
} | |
if !next_layer.is_empty() { | |
output_indices.push(next_layer.to_owned()); | |
} | |
cur_layer = next_layer; | |
next_layer = Vec::new(); | |
} | |
Ok(output_indices) | |
} | |
Ok(LayersIterator { | |
graph, | |
cur_layer, | |
output_indices, | |
next_layer, | |
predecessor_count, | |
node_set, | |
first_iter: false, | |
}) | |
} | |
struct LayersIterator<G, N> { | |
graph: G, | |
cur_layer: Vec<N>, | |
output_indices: Vec<Vec<N>>, | |
next_layer: Vec<N>, | |
predecessor_count: HashMap<N, usize>, | |
node_set: HashSet<N>, | |
first_iter: bool, | |
} | |
impl<G, N> Iterator for LayersIterator<G, N> | |
where | |
G: NodeIndexable // Used in from_index and to_index. | |
+ NodeCount // Used in find_cycle | |
+ EdgeCount // Used in find_cycle | |
+ Visitable // Used in find_cycle | |
+ IntoNodeIdentifiers<NodeId = N> // Used for .node_identifiers | |
+ IntoNeighborsDirected // Used for .neighbors_directed | |
+ IntoEdgesDirected // Used for .edged_directed | |
+ GraphBase<NodeId = N>, | |
N: Copy + Eq + Hash, | |
{ | |
type Item = Vec<N>; | |
fn next(&mut self) -> Option<Self::Item> { | |
if !self.first_iter { | |
self.first_iter = true; | |
Some(self.cur_layer.clone()) | |
} else if self.cur_layer.is_empty() { | |
None | |
} else { | |
while !self.cur_layer.is_empty() { | |
for node in &self.cur_layer { | |
let children = self | |
.graph | |
.neighbors_directed(*node, petgraph::Direction::Outgoing); | |
let mut used_indices: HashSet<N> = HashSet::new(); | |
for succ in children { | |
// Skip duplicate successors | |
if used_indices.contains(&succ) { | |
continue; | |
} | |
used_indices.insert(succ); | |
let mut multiplicity: usize = 0; | |
let raw_edges: G::EdgesDirected = self | |
.graph | |
.edges_directed(*node, petgraph::Direction::Outgoing); | |
for edge in raw_edges { | |
if edge.target() == succ { | |
multiplicity += 1; | |
} | |
} | |
self.predecessor_count | |
.entry(succ) | |
.and_modify(|e| *e -= multiplicity) | |
.or_insert( | |
// Get the number of incoming edges to the successor | |
self.graph | |
.edges_directed(succ, petgraph::Direction::Incoming) | |
.count() | |
- multiplicity, | |
); | |
if *self.predecessor_count.get(&succ).unwrap() == 0 { | |
self.next_layer.push(succ); | |
self.predecessor_count.remove(&succ); | |
} | |
} | |
} | |
if !self.next_layer.is_empty() { | |
return Some(self.next_layer.to_owned()); | |
} | |
std::mem::swap(&mut self.cur_layer, &mut self.next_layer); | |
self.next_layer.clear(); | |
} | |
None | |
} | |
} | |
} |
Then you return the return type to
impl Iterator<Item=Vec<G::NodeId>>
I think having the true iterator here would make sense because we can stop early if someone hits the target, or we can operate on the data in place as we iterate which I think would be more powerful than collecting into a Vec.
Note, I haven't tested the functionality of this suggestion, just that it compiles locally. But there are some warnings so I probably messed up something in the migration. This is more the basic structure to shoot for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, I'll have something like this ready soon!
rustworkx-core/src/dag_algo.rs
Outdated
for layer_node in &cur_layer { | ||
if !node_set.contains(layer_node) { | ||
return Err(LayersError(Some(format!( | ||
"An index input in 'first_layer' {} is not a valid node index in the graph", | ||
graph.to_index(*layer_node) | ||
)))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this in the rust API? I'm wondering if we should just treat this with a panic because it's a straight programming error in rust, not an expected input. We need this python side so we avoid a panic and return an IndexError
but in rust this condition is the same as doing:
let test = vec![0, 1];
println!("{}", test[42]);
so guarding against it feels unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it to Python only then
- Use panic exceptions for specific cases. - Other tweaks and fixes.
Attempts to resolve #1167
Summary
These commits bring the
layers
function down torustworkx-core
so that it can be used within the rust interface. This function was only available through a python interface previously.Changes
rustworkx.rustworkx-core.layers
layers()
: This function returns a list of subgraphs whose nodes are disjointed.NodeId
instances. To get the weights/data stored in these nodes, the user can iterate through the graph accessing the weights based on the instance ofNodeId
.rustworkx.dag_algo
layers()
: This function is similar to the one inrustworkx-core
, but allows the user to choose between getting node indices and node weights/data.rustworkx-core
version, but its behavior is identical to what it was previously.Future directions:
To return anAdded!Iterator
instance fromrustworkx-core
, instead of aVec<Vec<G::NodeId>>
to prevent from unnecessary allocations.