Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add layers function to rustworkx-core #1194

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

raynelfss
Copy link
Contributor

@raynelfss raynelfss commented May 17, 2024

Attempts to resolve #1167

Summary

These commits bring the layers function down to rustworkx-core so that it can be used within the rust interface. This function was only available through a python interface previously.

Changes

  • rustworkx.rustworkx-core.layers layers(): This function returns a list of subgraphs whose nodes are disjointed.
    • The main difference between this and its python counterpart is that this function only returns a list of NodeId instances. To get the weights/data stored in these nodes, the user can iterate through the graph accessing the weights based on the instance of NodeId.
  • rustworkx.dag_algo layers(): This function is similar to the one in rustworkx-core, but allows the user to choose between getting node indices and node weights/data.
    • The logic has been updated to use the rustworkx-core version, but its behavior is identical to what it was previously.

Future directions:

  • To return an Iterator instance from rustworkx-core, instead of a Vec<Vec<G::NodeId>> to prevent from unnecessary allocations. Added!
  • Open to suggestions.

- Modify the `layers` python interface to use the `rustworkx-core` equivalent.
@coveralls
Copy link

coveralls commented May 17, 2024

Pull Request Test Coverage Report for Build 9384925392

Details

  • 91 of 103 (88.35%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.06%) to 95.774%

Changes Missing Coverage Covered Lines Changed/Added Lines %
rustworkx-core/src/dag_algo.rs 65 69 94.2%
rustworkx-core/src/err.rs 0 8 0.0%
Totals Coverage Status
Change from base Build 9341056741: -0.06%
Covered Lines: 17362
Relevant Lines: 18128

💛 - Coveralls

@raynelfss raynelfss marked this pull request as ready for review May 17, 2024 18:48
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, just some quick high level comments from a first skim over the code.

The other things are can you add a couple of other rust space tests so we have some more dedicated coverage of the rust api (more than just the doctest) and also a release note?

rustworkx-core/src/layers.rs Outdated Show resolved Hide resolved
rustworkx-core/src/layers.rs Outdated Show resolved Hide resolved
rustworkx-core/src/layers.rs Outdated Show resolved Hide resolved
mtreinish and others added 7 commits May 24, 2024 19:48
- Move `layers` to `dag_algo.rs`.
- Add check for cycles, if a cycle is found throw an error.
- Refactor `LayersIndexError` to `LayersError`.
- Move `LayersError` to `err.rs`.
- Other small tweaks and fixes.
@mtreinish mtreinish added the rustworkx-core Issues tracking adding functionality to rustworkx-core label Jun 2, 2024
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some quick comments from a first skim through the PR.

rustworkx-core/src/dag_algo.rs Outdated Show resolved Hide resolved
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, it's a good start. I think in this case we should make layers() a true iterator because it doesn't need to do a complete traversal to compute the final output. We can and should return this layer by layer. The reason we couldn't do that in python is we can't put references in a pyclass for the iteration, but in a Rust API we don't have that limitation. I left an inline code suggestion to start you down that path.

The other thing is I'm thinking we may not need to return a Result<> for this function, the only error condition we returned before was input checking we needed for Python interoperability, but doesn't really feel right in a rust API. You added a cycle check in this case which we could error for, but I'd also be fine just panicking in that case because this function doesn't work if there are cycles. Or alternatively just stop iterating when we reach a cycle. In either case we could just document that behavior and avoid the Result and I think either would probably be acceptable because this function is really only for DAGs so you shouldn't be passing a graph with cycles.

Comment on lines 370 to 377
if let Some(node) = &cur_layer.first() {
let check_cycle = find_cycle(&graph, Some(**node));
if !check_cycle.is_empty() {
return Err(LayersError(Some(
"The provided graph has a cycle".to_string(),
)));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new right? I don't think the old code had a cycle check like this. This probably adds more overhead than needed since we've got to iterate over all the nodes a second time. Could we maintain a HashSet of encountered nodes we can add a lookup on that node id at the right point and error if we've cycled.

This being said with my below suggestion for using an iterator this might be ok overhead so we don't have to deal with returning a fallible iterator which IMO aren't the most ergonomic to work with, but might be what's called for in this case. I'd also be fine if we just called panic!() if we encounter a cycle while iteration as long as we document that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I had the error in the first place was to communicate to Python about what errors were encountered. I could change it to panic, and then just have Python make the cycle checks beforehand. I would still have to implement a check on the rust side that involves less overhead. I will think about the logic for that.

Comment on lines 386 to 430
output_indices.push(cur_layer.to_owned());

// Iterate until there are no more
while !cur_layer.is_empty() {
for node in &cur_layer {
let children = graph.neighbors_directed(*node, petgraph::Direction::Outgoing);
let mut used_indices: HashSet<G::NodeId> = HashSet::new();
for succ in children {
// Skip duplicate successors
if used_indices.contains(&succ) {
continue;
}
used_indices.insert(succ);
let mut multiplicity: usize = 0;
let raw_edges: G::EdgesDirected =
graph.edges_directed(*node, petgraph::Direction::Outgoing);
for edge in raw_edges {
if edge.target() == succ {
multiplicity += 1;
}
}
predecessor_count
.entry(succ)
.and_modify(|e| *e -= multiplicity)
.or_insert(
// Get the number of incoming edges to the successor
graph
.edges_directed(succ, petgraph::Direction::Incoming)
.count()
- multiplicity,
);
if *predecessor_count.get(&succ).unwrap() == 0 {
next_layer.push(succ);
predecessor_count.remove(&succ);
}
}
}
if !next_layer.is_empty() {
output_indices.push(next_layer.to_owned());
}
cur_layer = next_layer;
next_layer = Vec::new();
}
Ok(output_indices)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator comment I referred to in the original issue can actually be done here. It requires a bit of a restructuring of the code, you have to move the tracking variables into a struct and implement the Iterator trait for that struct. This ends up looking something like:

Suggested change
output_indices.push(cur_layer.to_owned());
// Iterate until there are no more
while !cur_layer.is_empty() {
for node in &cur_layer {
let children = graph.neighbors_directed(*node, petgraph::Direction::Outgoing);
let mut used_indices: HashSet<G::NodeId> = HashSet::new();
for succ in children {
// Skip duplicate successors
if used_indices.contains(&succ) {
continue;
}
used_indices.insert(succ);
let mut multiplicity: usize = 0;
let raw_edges: G::EdgesDirected =
graph.edges_directed(*node, petgraph::Direction::Outgoing);
for edge in raw_edges {
if edge.target() == succ {
multiplicity += 1;
}
}
predecessor_count
.entry(succ)
.and_modify(|e| *e -= multiplicity)
.or_insert(
// Get the number of incoming edges to the successor
graph
.edges_directed(succ, petgraph::Direction::Incoming)
.count()
- multiplicity,
);
if *predecessor_count.get(&succ).unwrap() == 0 {
next_layer.push(succ);
predecessor_count.remove(&succ);
}
}
}
if !next_layer.is_empty() {
output_indices.push(next_layer.to_owned());
}
cur_layer = next_layer;
next_layer = Vec::new();
}
Ok(output_indices)
}
Ok(LayersIterator {
graph,
cur_layer,
output_indices,
next_layer,
predecessor_count,
node_set,
first_iter: false,
})
}
struct LayersIterator<G, N> {
graph: G,
cur_layer: Vec<N>,
output_indices: Vec<Vec<N>>,
next_layer: Vec<N>,
predecessor_count: HashMap<N, usize>,
node_set: HashSet<N>,
first_iter: bool,
}
impl<G, N> Iterator for LayersIterator<G, N>
where
G: NodeIndexable // Used in from_index and to_index.
+ NodeCount // Used in find_cycle
+ EdgeCount // Used in find_cycle
+ Visitable // Used in find_cycle
+ IntoNodeIdentifiers<NodeId = N> // Used for .node_identifiers
+ IntoNeighborsDirected // Used for .neighbors_directed
+ IntoEdgesDirected // Used for .edged_directed
+ GraphBase<NodeId = N>,
N: Copy + Eq + Hash,
{
type Item = Vec<N>;
fn next(&mut self) -> Option<Self::Item> {
if !self.first_iter {
self.first_iter = true;
Some(self.cur_layer.clone())
} else if self.cur_layer.is_empty() {
None
} else {
while !self.cur_layer.is_empty() {
for node in &self.cur_layer {
let children = self
.graph
.neighbors_directed(*node, petgraph::Direction::Outgoing);
let mut used_indices: HashSet<N> = HashSet::new();
for succ in children {
// Skip duplicate successors
if used_indices.contains(&succ) {
continue;
}
used_indices.insert(succ);
let mut multiplicity: usize = 0;
let raw_edges: G::EdgesDirected = self
.graph
.edges_directed(*node, petgraph::Direction::Outgoing);
for edge in raw_edges {
if edge.target() == succ {
multiplicity += 1;
}
}
self.predecessor_count
.entry(succ)
.and_modify(|e| *e -= multiplicity)
.or_insert(
// Get the number of incoming edges to the successor
self.graph
.edges_directed(succ, petgraph::Direction::Incoming)
.count()
- multiplicity,
);
if *self.predecessor_count.get(&succ).unwrap() == 0 {
self.next_layer.push(succ);
self.predecessor_count.remove(&succ);
}
}
}
if !self.next_layer.is_empty() {
return Some(self.next_layer.to_owned());
}
std::mem::swap(&mut self.cur_layer, &mut self.next_layer);
self.next_layer.clear();
}
None
}
}
}

Then you return the return type to

impl Iterator<Item=Vec<G::NodeId>>

I think having the true iterator here would make sense because we can stop early if someone hits the target, or we can operate on the data in place as we iterate which I think would be more powerful than collecting into a Vec.

Note, I haven't tested the functionality of this suggestion, just that it compiles locally. But there are some warnings so I probably messed up something in the migration. This is more the basic structure to shoot for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, I'll have something like this ready soon!

Comment on lines 378 to 385
for layer_node in &cur_layer {
if !node_set.contains(layer_node) {
return Err(LayersError(Some(format!(
"An index input in 'first_layer' {} is not a valid node index in the graph",
graph.to_index(*layer_node)
))));
}
}
Copy link
Member

@mtreinish mtreinish Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in the rust API? I'm wondering if we should just treat this with a panic because it's a straight programming error in rust, not an expected input. We need this python side so we avoid a panic and return an IndexError but in rust this condition is the same as doing:

let test = vec![0, 1];
println!("{}", test[42]);

so guarding against it feels unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it to Python only then

- Use panic exceptions for specific cases.
- Other tweaks and fixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rustworkx-core Issues tracking adding functionality to rustworkx-core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move layers functionality to rustworkx-core
4 participants