recover_power is very slow #4982

Closed
oharboe opened this issue Apr 22, 2024 · 4 comments
Labels: grt (Global Routing), rsz (Resizer)

oharboe (Collaborator) commented Apr 22, 2024

Description

Tarball created with: make DESIGN_CONFIG=designs/asap7/jpeg_lvt/config.mk global_route_issue

Untar: https://drive.google.com/file/d/1Aax8fTJl9KyA0rl7YNv8mwu5yG4kaBeo/view?usp=sharing

Run:

./run-me-jpeg_lvt-asap7-base.sh
$ ./run-me-jpeg_lvt-asap7-base.sh 
OpenROAD v2.0-13348-gd423155d6
[deleted]
[gets here quickly, within a few minutes, and then is stuck for about 1-2 hours]
Downsizing/switching to higher Vt  for non critical gates for power recovery
Percent of paths optimized 100
tns 0.00
wns 0.00
Group                  Internal  Switching    Leakage      Total
                          Power      Power      Power      Power (Watts)
----------------------------------------------------------------
Sequential             1.34e-02   2.66e-03   6.57e-06   1.61e-02  20.9%
Combinational          2.25e-02   3.23e-02   1.04e-05   5.48e-02  71.2%
Clock                  3.64e-03   2.43e-03   1.39e-07   6.07e-03   7.9%
Macro                  0.00e+00   0.00e+00   0.00e+00   0.00e+00   0.0%
Pad                    0.00e+00   0.00e+00   0.00e+00   0.00e+00   0.0%
----------------------------------------------------------------
Total                  3.96e-02   3.74e-02   1.71e-05   7.70e-02 100.0%
                          51.4%      48.6%       0.0%

Compared to the rest of the flow, global route is surprisingly slow:

Log                            Elapsed seconds Peak Memory/KB
1_1_yosys                                  243         971412
2_1_floorplan                               27         510688
2_2_floorplan_io                             4         363812
2_3_floorplan_tdms                           4         362360
2_4_floorplan_macro                          4         367364
2_5_floorplan_tapcell                        4         325804
2_6_floorplan_pdn                            6         381020
3_1_place_gp_skip_io                        38         461672
3_2_place_iop                                4         372996
3_3_place_gp                               613        1048224
3_4_place_resized                           94         602152
3_5_place_dp                                78         657092
4_1_cts                                    152         747132
5_1_grt                                   6137        1298604
5_2_fillcell                                 5         494576
5_3_route                                  543       11470100
6_1_merge                                   15         815860
6_report                                   198        2751824
Total                                     8169       11470100

Suggested Solution

Hopefully this is just a good example of a pathological slowdown, and making it faster should be possible.

Additional Context

No response

maliberty added the rsz Resizer label Apr 22, 2024
eder-matheus self-assigned this Apr 22, 2024
maliberty added the grt Global Routing label Apr 28, 2024
kbieganski (Contributor) commented:

94% of the time is spent in:

void RecoverPower::recoverPower(const float recover_power_percent)
{
  init();
  constexpr int digits = 3;
  resize_count_ = 0;
  resizer_->buffer_moved_into_core_ = false;
  // Sort failing endpoints by slack.
  VertexSet* endpoints = sta_->endpoints();
  VertexSeq ends_with_slack;
  for (Vertex* end : *endpoints) {
    const Slack end_slack = sta_->vertexSlack(end, max_);
    if (end_slack > setup_slack_margin_
        && end_slack < setup_slack_max_margin_) {
      ends_with_slack.push_back(end);
    }
  }
  sort(ends_with_slack, [=](Vertex* end1, Vertex* end2) {
    return sta_->vertexSlack(end1, max_) > sta_->vertexSlack(end2, max_);
  });
  debugPrint(logger_,
             RSZ,
             "recover_power",
             1,
             "Candidate paths {}/{} {}%",
             ends_with_slack.size(),
             endpoints->size(),
             int(ends_with_slack.size() / double(endpoints->size()) * 100));
  int max_end_count = ends_with_slack.size() * recover_power_percent;
  // As long as we are here fix at least one path
  max_end_count = std::max(max_end_count, 1);
  resizer_->incrementalParasiticsBegin();
  resizer_->updateParasitics();
  sta_->findRequireds();
  Slack worst_slack_before;
  Vertex* worst_vertex;
  sta_->worstSlack(max_, worst_slack_before, worst_vertex);
  int end_index = 0;
  int failed_move_threshold = 0;
  for (Vertex* end : ends_with_slack) {
    const Slack end_slack_before = sta_->vertexSlack(end, max_);
    Slack worst_slack_after;
    //=====================================================================
    // Just a counter to know when to break out
    end_index++;
    debugPrint(logger_,
               RSZ,
               "recover_power",
               2,
               "Doing {} /{}",
               end_index,
               max_end_count);
    if (end_index > max_end_count) {
      break;
    }
    //=====================================================================
    resizer_->journalBegin();
    PathRef end_path = sta_->vertexWorstSlackPath(end, max_);
    const bool changed = recoverPower(end_path, end_slack_before);
    if (changed) {
      resizer_->updateParasitics(true);
      sta_->findRequireds();
      const Slack end_slack_after = sta_->vertexSlack(end, max_);
      sta_->worstSlack(max_, worst_slack_after, worst_vertex);
      const float worst_slack_percent = fabs(
          (worst_slack_before - worst_slack_after) / worst_slack_before * 100);
      const bool better
          = (worst_slack_percent < 0.0001
             || (worst_slack_before > 0
                 && worst_slack_after / worst_slack_before > 0.5));
      debugPrint(logger_,
                 RSZ,
                 "recover_power",
                 2,
                 "slack = {} worst_slack = {} better = {}",
                 delayAsString(end_slack_after, sta_, digits),
                 delayAsString(worst_slack_after, sta_, digits),
                 better ? "save" : "");
      if (better) {
        failed_move_threshold = 0;
        resizer_->journalBegin();
        debugPrint(logger_,
                   RSZ,
                   "recover_power",
                   2,
                   "{}/{} Resize for power Slack change {} -> {}",
                   end_index,
                   ends_with_slack.size(),
                   worst_slack_before,
                   worst_slack_after);
      } else {
        // Undo the change here.
        ++failed_move_threshold;
        if (failed_move_threshold > failed_move_threshold_limit_) {
          logger_->info(RSZ,
                        142,
                        "{} successive tries yielded negative slack. Ending "
                        "power recovery",
                        failed_move_threshold_limit_);
          break;
        }
        int resize_count = 100;
        int inserted_buffer_count = 100;
        int cloned_gate_count = 100;
        resizer_->journalRestore(
            resize_count, inserted_buffer_count, cloned_gate_count);
        resizer_->updateParasitics();
        sta_->findRequireds();
        debugPrint(logger_,
                   RSZ,
                   "recover_power",
                   2,
                   "{}/{} Undo resize for power Slack change {} -> {}",
                   end_index,
                   ends_with_slack.size(),
                   worst_slack_before,
                   worst_slack_after);
      }
      if (resizer_->overMaxArea()) {
        break;
      }
    }
  }
  resizer_->incrementalParasiticsEnd();
  // TODO: Add the appropriate metric here
  // logger_->metric("design__instance__count__setup_buffer",
  //                 inserted_buffer_count_);
  if (resize_count_ > 0) {
    logger_->info(RSZ, 141, "Resized {} instances.", resize_count_);
  }
  if (resizer_->overMaxArea()) {
    logger_->error(RSZ, 125, "max utilization reached.");
  }
}

It's doing ~4500 iterations of this loop:

for (Vertex* end : ends_with_slack) {

Each iteration takes about 1 s, so it adds up: roughly 4500 s, or about 75 minutes.

Three parameters determine the iteration count. One of them, RECOVER_POWER, can be customized via Tcl:

int max_end_count = ends_with_slack.size() * recover_power_percent;

This parameter is set to 0 by default for most designs in the flow scripts, such as Ibex and BlackParrot, so this step is skipped entirely there. For jpeg_lvt, it is set to 100%.
(sorry if this is obvious to you)

The other two are setup_slack_margin_ and setup_slack_max_margin_:

if (end_slack > setup_slack_margin_
    && end_slack < setup_slack_max_margin_) {
  ends_with_slack.push_back(end);
}

They're defined here:

// Paths with slack more than this would be considered for power recovery
static constexpr float setup_slack_margin_ = 1e-11;
// For paths with no timing the max margin is INT_MAX. We need to filter those
// out (using 1e-4)
static constexpr float setup_slack_max_margin_ = 1e-4;

They are constants, so users cannot tweak them. They were chosen in commit 2611d81; I'm not sure whether a rationale was provided.

Anyway, this doesn't seem like a pathological case, unless these constants are incorrect. The only options I see are to tweak them or the RECOVER_POWER parameter, or to speed up sta::Search::findRequireds() (called by rsz::RecoverPower::recoverPower), which takes up 88% of the run time.

maliberty (Member) commented:

RECOVER_POWER is an opt-in flow variable. For prototyping, I don't think it is helpful unless you are studying power. We will address the performance, but I'm not sure it should matter to you.

maliberty assigned kbieganski and unassigned eder-matheus May 8, 2024
maliberty changed the title from "Surprisingly slow global route compared to rest of flow" to "recover_power is very slow" May 8, 2024
kbieganski (Contributor) commented:

@oharboe Can you try this again with the current master? Is the performance acceptable now?

oharboe (Collaborator, Author) commented May 29, 2024

@kbieganski We don't actually use recover_power currently, but thanks for fixing this!

oharboe closed this as completed May 29, 2024