Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to /expansion #4728

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

michaelhrabanek
Copy link
Contributor

@michaelhrabanek michaelhrabanek commented May 12, 2024

This PR addresses the issue described in #4601, implementing the solution proposed by @nilsnolde. Currently, the expansion process is overly verbose because it logs edges multiple times—for each stage of expansion and when an edge is reached from another branch. This PR keeps track of every affected edge during expansion only once, with all properties associated with it (necessary for the final response assembly). Properties such as cost and duration for already tracked edges are updated only at stages that are higher or same (to keep track of latest ones). This also means we keeping only latest prev_edge_it so we lose information about whether an edge was reached from other branches. I'm not sure how significant this issue might be. Thoughts?

This is still a draft, and gurka tests are missing, as I would appreciate some feedback before proceeding further.

Tasklist

@michaelhrabanek michaelhrabanek marked this pull request as draft May 12, 2024 13:40
@kevinkreiser
Copy link
Member

i left a comment on the issue: #4601 (comment)

@kevinkreiser
Copy link
Member

just to be super clear i dont have any problem with us updating this. but id rather it be something you opt into with a flag. maybe dedupe=true or something. and when that flag is on the callback will check if the edge is already in the output and if so only update if the info should override the edge properties (lower cost, final state etc). otherwise (dedupe=false, the default) the code should always add a new feature as it does today so that we can still make sense of the timeseries information.

@michaelhrabanek
Copy link
Contributor Author

Ok, I've added dedupe option, which defaults to false and keeps the original behaviour.

@michaelhrabanek michaelhrabanek marked this pull request as ready for review May 30, 2024 20:19
@@ -12,10 +12,11 @@ The expansion service wraps the `route`, `isochrone` and `sources_to_targets` se

Since this service wraps other services, the request format mostly follows the ones of the [route](../turn-by-turn/api-reference.md#inputs-of-a-route), [isochrone](../isochrone/api-reference.md#inputs-of-the-isochrone-service) and [matrix](../matrix/api-reference.md#inputs-of-the-matrix-service). Additionally, it accepts the following parameters:

| Parameter | Description |
| Parameter | Description |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably dont need the whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I've overlooked this, it's fixed now

| `skip_opposites` (optional) | If set to `true` the output won't contain an edge's opposing edge. Opposing edges can be thought of as both directions of one road segment. Of the two, we discard the directional edge with higher cost and keep the one with less cost. Default false. |
| `action` (required) | The service whose expansion should be tracked. Currently one of `route`, `isochrone` or `sources_to_targets`. |
| `skip_opposites` (optional) | If set to `true` the output won't contain an edge's opposing edge. Opposing edges can be thought of as both directions of one road segment. Of the two, we discard the directional edge with higher cost and keep the one with less cost. Default false. |
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default false. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you very much for documenting!

Suggested change
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default false. |
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default `false`. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it in skip_opposites too, for consistency

@@ -18,9 +18,10 @@ message Expansion {
repeated uint32 costs = 1 [packed=true];
repeated uint32 durations = 2 [packed=true];
repeated uint32 distances = 3 [packed=true];
repeated EdgeStatus edge_status = 4;
repeated EdgeStatus edge_status = 4 [packed=true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is technically a breaking change but its a beta service so oh well i guess

Comment on lines +24 to +26
repeated uint32 edge_status_flags = 7 [packed=true];

repeated Geometry geometries = 7;
repeated Geometry geometries = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving things around in proto is very much a breaking change again it doesnt matter because we call this service beta but we should try not to do this when we dont need to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right of course, but since we're breaking it anyway, I've moved it around ... well just for aesthetic reasons tbh.

Comment on lines 136 to 138
// unfortunately we have to call this before checking if we can skip
// else the tile could change underneath us when we get the opposing
auto shape = tile->edgeinfo(edge).shape();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're saying that because of the call to GetOpposingEdgeId below? if so the simple solution would be to, inside of the skip_opps if block to add auto opp_tile = tile; and use opp_tile in the call to GetOpposingEdgeId. if i misunderstood this comment let me know!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is original comment from @nilsnolde actually, I haven't touch it, it just got marked by diff as I've clean up some unused code around those lines

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean would be good to move the grabbing of the shape below the if though in case that method bails before using it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've moved it and removed the original comment

Comment on lines +79 to +90
// check if status is higher or same – as we will keep track of the latest one
static bool is_latest_status(uint8_t current, uint8_t candidate) {
uint8_t first_significant = current | candidate;
// connected(LSb) is the highest status, so check from right
for (uint8_t i = 0; i < Expansion_EdgeStatus_EdgeStatus_ARRAYSIZE; i++) {
uint8_t mask = 1 << i;
if (!(first_significant & mask))
continue;
return candidate & mask;
}
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not obvious to me why we need the complication of remembering all the statuses an edge has seen using a mask. it is true that in bi/multidirectional search an edge can be reached and settled multiple times (once for each search direction) but we dont need to know the prior statuses we only need to know the most important (lowest value) one its ever seen. tell me what im missing here. cant this just be:

Suggested change
// check if status is higher or same – as we will keep track of the latest one
static bool is_latest_status(uint8_t current, uint8_t candidate) {
uint8_t first_significant = current | candidate;
// connected(LSb) is the highest status, so check from right
for (uint8_t i = 0; i < Expansion_EdgeStatus_EdgeStatus_ARRAYSIZE; i++) {
uint8_t mask = 1 << i;
if (!(first_significant & mask))
continue;
return candidate & mask;
}
return false;
}
// check if status is higher or same – as we will keep track of the latest one
static bool is_latest_status(uint8_t current, uint8_t candidate) {
return candidate < current;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yes, that would simplify a lot of things. I implemented it this way because it was the proposed solution in the original issue (#4601). To be honest, I assumed it had some use case I wasn't aware of but would be useful anyway. @nilsnolde, could you please shed some insight on this?

}
stage |= edge_state.at(edgeid).stages_mask;
}
edge_state[edgeid] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i very much appreciate that we dont fill up this large map when not deduping, thank you for that!

Comment on lines +130 to +134
if (tile == nullptr) {
LOG_ERROR("thor_worker_t::expansion error, tile no longer available" +
std::to_string(edgeid.Tile_Base()));
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this was here before but it seems crazy pedantic. the graph reader can fail to get a tile for sure but not one its already seen (ie we know is in the tileset). the worst case is that its a cache miss and it has to pull it from disk or from network but if the algorithm is passing you this graph id for an edge its seen it will have to be able to get the edge. imho we can delete this check, if this happens someone is deleting tiles from the data storage while the application is running which is not worth protecting against

Comment on lines +146 to +160
const auto* edge = tile->directededge(edgeid);
auto shape = tile->edgeinfo(edge).shape();

if (!edge->forward())
std::reverse(shape.begin(), shape.end());
Polyline2<PointLL>::Generalize(shape, gen_factor, {}, false);
if (dedupe) {
uint8_t stage = 1 << status;
if (edge_state.contains(edgeid)) {
auto edge_stages = edge_state.at(edgeid).stages_mask;
// Keep only properties of last/highest status of edge, but update what stages the edge
// has seen
if (!expansion_properties_t::is_latest_status(edge_stages, stage)) {
edge_state.at(edgeid).stages_mask |= stage;
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we refactor this section just a little bit to add an early exit before the shape stuff? something like this (pseudo code, i suggest renaming the stage member to status if the thing i said about the mask makes sense):

// if we've seen it before we can update if better and bail either way
auto maybe_state = edge_state.find(edgeid); // works for dedupe == false, since its empty
if (maybe_state != edge_state.cend()) {
  if (maybe_state->status > status)
    edge_state[edgeid].status = status;
  return;
}

// get the shape ready because we're going to use it
const auto* edge = tile->directededge(edgeid);
auto shape = tile->edgeinfo(edge).shape();
if (!edge->forward())
  std::reverse(shape.begin(), shape.end());
Polyline2<PointLL>::Generalize(shape, gen_factor, {}, false);

// either we track it for serialization later or
if (dedupe) {
  edge_state[edgeid] = expansion_properties_t(prev_edgeid, status, duration, distance, shape, cost);
} // we simply serialize it now
else {
  writeExpansionProgress(expansion, edgeid, prev_edgeid, shape, exp_props, status, status, duration, distance, cost, false);
}

@@ -146,6 +199,15 @@ std::string thor_worker_t::expansion(Api& request) {
// anyway
}

// assemble the properties from latest/highest stages it went through
if (dedupe) {
for (const auto& e : edge_state) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its too bad this doesnt maintain insertion order, would be a kind of convenient side effect

Comment on lines +67 to +71
static const std::vector<std::string> status_captions{{"status_connected"},
{"status_settled"},
{"status_reached"}};
for (uint8_t j = 0; j < status_captions.size(); j++)
writer(status_captions[j], (bool)(expansion.edge_status_flags(i) & 1 << j));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think this is unneccesary the best status that is reached implies any others below it were reached

Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had a few suggested changes but for the most part it looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants