
Check global place for pathological slowdowns, takes 2x longer than detailed route #4974

Closed
oharboe opened this issue Apr 18, 2024 · 40 comments
Labels: gpl Global Placement

@oharboe
Collaborator

oharboe commented Apr 18, 2024

Description

Global placement takes 22000 seconds to complete. Is there any low-hanging fruit here?

Test case (tar file created with make global_place_issue): https://drive.google.com/file/d/1KHxaEr9AFHwCJbsMKQFg5nv3Ndb2-bgw/view?usp=sharing

Suggested Solution

Have a look to see if there's anything easy that can be done here?

Additional Context

No response

@gudeh gudeh self-assigned this Apr 18, 2024
@gudeh
Contributor

gudeh commented Apr 18, 2024

Hi @oharboe, I was unable to run the ./run-me script from your generated issue tar:

./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-13318-g19635e967
Features included (+) or not (-): +Charts +GPU +GUI +MPL2 +PAR +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
Error: read_liberty.tcl, 25 cannot read file /home/oyvind/.cache/bazel/_bazel_oyvind/7e6ad621f3f951c3ee6f5b179289b54e/execroot/_main/bazel-out/k8-fastbuild/bin/results/asap7/l2_tlb_ram_0_512x46/base/l2_tlb_ram_0_512x46.lib.
openroad>

@maliberty
Member

It's a packaging error: it should be ./home, not /home, in var*sh.

@gudeh
Contributor

gudeh commented Apr 18, 2024

Furthermore, looking at the logs, I see unusual behavior from GPL: it stopped with an overflow of 0.781807, whereas it usually stops at 0.10 or less. Routability mode was also not activated. I understand this can happen when the density is too high.

@maliberty
Member

How many iterations?

@oharboe
Collaborator Author

oharboe commented Apr 18, 2024

> It's a packaging error: it should be ./home, not /home, in var*sh.

It was created with make global_place_issue, so that script needs a tweak...

@gudeh
Contributor

gudeh commented Apr 18, 2024

It stopped on iteration 280. Here are the last ones:

[NesterovSolve] Iter: 240 overflow: 0.869161 HPWL: 12149035473
[NesterovSolve] Iter: 250 overflow: 0.849265 HPWL: 13253550712
[NesterovSolve] Iter: 260 overflow: 0.827114 HPWL: 14365264704
[NesterovSolve] Iter: 270 overflow: 0.805211 HPWL: 15373704557
[NesterovSolve] Iter: 280 overflow: 0.781807 HPWL: 16242167774
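To put these numbers in perspective, here is a small Python sketch (not part of OpenROAD) that parses the NesterovSolve progress lines above and linearly extrapolates when overflow would reach the ~0.10 level at which GPL usually stops. The 0.10 target and the linear model are simplifying assumptions; the solver does not converge linearly, so this is only a rough sanity check.

```python
import re

# The five progress lines quoted above, copied verbatim from the GPL log.
LOG = """\
[NesterovSolve] Iter: 240 overflow: 0.869161 HPWL: 12149035473
[NesterovSolve] Iter: 250 overflow: 0.849265 HPWL: 13253550712
[NesterovSolve] Iter: 260 overflow: 0.827114 HPWL: 14365264704
[NesterovSolve] Iter: 270 overflow: 0.805211 HPWL: 15373704557
[NesterovSolve] Iter: 280 overflow: 0.781807 HPWL: 16242167774
"""

PATTERN = re.compile(
    r"\[NesterovSolve\] Iter:\s*(\d+) overflow:\s*([\d.]+) HPWL:\s*(\d+)"
)


def parse_nesterov(log):
    """Return (iteration, overflow, hpwl) tuples for each NesterovSolve progress line."""
    return [(int(i), float(o), int(h)) for i, o, h in PATTERN.findall(log)]


def iterations_to_target(samples, target=0.10):
    """Linearly extrapolate the iteration at which overflow would reach `target`.

    Uses the average overflow drop per iteration between the first and last
    sample. Returns None if overflow is not decreasing.
    """
    (i0, o0, _), (i1, o1, _) = samples[0], samples[-1]
    rate = (o0 - o1) / (i1 - i0)  # overflow reduction per iteration
    if rate <= 0:
        return None
    return i1 + (o1 - target) / rate


samples = parse_nesterov(LOG)
print(samples[-1])  # (iteration, overflow, HPWL) of the last sample
print(round(iterations_to_target(samples)))  # rough iteration estimate
```

For these five samples the average drop is about 0.0022 overflow per iteration, which would put 0.10 somewhere around iteration 590 if the trend simply continued.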

@maliberty
Member

In my run I see

[NesterovSolve] Iter: 1370 overflow: 0.235532 HPWL: 19759299769
[NesterovSolve] Iter: 1380 overflow: 0.209551 HPWL: 19286141404
[INFO GPL-0075] Routability numCall: 8 inflationIterCnt: 3 bloatIterCnt: 1

and it is still running. I wonder if the log provided was from an incomplete run.

@maliberty
Member

I suspect the extra time comes from these messages:

[NesterovSolve] Revert back to snapshot coordi

@gudeh gudeh added the gpl Global Placement label Apr 19, 2024
@gudeh
Contributor

gudeh commented Apr 19, 2024

I ran the test case and it also reached iteration 1380 after several hours. It may be stuck there, but it is still running.

I also ran it without routability mode, and GPL finished at iteration 500. So @oharboe, you could consider turning off routability mode in GPL, or increasing the target RC parameter.

We recently lowered the default target routing congestion for GPL routability mode (from 1.25 to 1.00). With the new default this design activates routability mode, which it previously did not. However, this change significantly extends the completion time.

I am curious about the implications for DRT runtime in both scenarios, since routability mode does reduce routing congestion from 1.15 to 1.05, at least up to iteration 1380.

@oharboe
Collaborator Author

oharboe commented Apr 19, 2024

@gudeh Silly question: what parameters exactly should I adjust in ORFS?

@gudeh
Contributor

gudeh commented Apr 19, 2024

To turn off routability mode for all designs, you can comment out lines 32 to 37 in /flow/scripts/global_place.tcl. To turn it off only for this design, add export GPL_ROUTABILITY_DRIVEN = 0 to the design's config.mk file.

@oharboe
Collaborator Author

oharboe commented Apr 19, 2024

What about the "RC" parameter, what is that?

@gudeh
Contributor

gudeh commented Apr 19, 2024

If routability mode is activated, the global placer tries to reduce routing congestion during placement by inflating cells. The target RC is the routing congestion it attempts to reach during this process. Each time it fails to reach the target RC, it reverts to a snapshot (the [NesterovSolve] Revert back to snapshot coordi message) and tries again. It keeps retrying as long as it sees the final RC decreasing.

In your case, I would suggest changing the target RC to 1.10, since it quickly reaches 1.09 right after iteration 580:
[INFO GPL-0074] FinalRC: 1.096847
This way you improve routability for DRT without paying too much extra runtime in GPL. You can do so by adding this to your config.mk:
export GPL_TARGET_RC = 1.10
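The retry policy described above can be modeled in a few lines to show why raising the target RC shortens the run. This is a hypothetical Python sketch for illustration only: it is not the gpl implementation, the names are made up, and the RC sequence is invented (loosely based on the 1.15, 1.096847 and 1.05 values mentioned in this thread).

```python
from dataclasses import dataclass


@dataclass
class LoopResult:
    calls: int      # routability calls made
    best_rc: float  # lowest RC seen (the reached one if the target was met)
    reason: str     # why the loop stopped


def routability_loop(rc_per_call, target_rc):
    """Hypothetical model of the retry policy described above (not gpl's code).

    rc_per_call stands in for "inflate cells, re-place, estimate congestion":
    it lists the final RC each successive call would report.
    """
    best = float("inf")
    calls = 0
    for rc in rc_per_call:
        calls += 1
        if rc <= target_rc:
            return LoopResult(calls, rc, "target reached")
        if rc >= best:
            return LoopResult(calls, best, "RC stopped decreasing")
        best = rc  # improved: revert to the snapshot and try again
    return LoopResult(calls, best, "out of calls")


# Made-up RC sequence, loosely based on values quoted in this thread.
rcs = [1.15, 1.096847, 1.08, 1.06, 1.05, 1.05]
print(routability_loop(rcs, target_rc=1.00))  # default target: many calls
print(routability_loop(rcs, target_rc=1.10))  # suggested target: stops early
```

With a target of 1.00 the model keeps retrying until the RC stops improving (six calls here), while a target of 1.10 is met on the second call. In the real flow each call also involves re-placement and a congestion estimate, which is where the runtime goes.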

@oharboe
Collaborator Author

oharboe commented Apr 19, 2024

@gudeh Will try. Perhaps this GitHub issue can be put to bed if the progress messages are improved to include the advice you gave above?

The user experience is then that routing takes a long time, the user looks at the logs where some advice on adjusted settings to rein in runtimes is found...

@gudeh
Contributor

gudeh commented Apr 19, 2024

Sorry @oharboe, I did not understand what you mean with:

> The user experience is then that routing takes a long time, the user looks at the logs where some advice on adjusted settings to rein in runtimes is found...

@oharboe
Collaborator Author

oharboe commented Apr 19, 2024

@maliberty @gudeh The problem for the user is that global routing runs "forever" here. The fix is to adjust the global routing parameters. So, if I understand correctly, what would "solve" this issue is an improvement to the user experience, not a change to global routing as such.

I think that the user experience could be improved to the point that this issue is "fixed" if the progress messages in global route included advice on how to adjust global routing parameters.

@gudeh
Contributor

gudeh commented Apr 19, 2024

The target RC is a GPL parameter actually. With a higher target RC value, the GPL should call the global router less frequently.

Furthermore, we are in the process of replacing the congestion estimate used in GPL routability mode: instead of running FastRoute, it will use RUDY, which is much faster.
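For readers unfamiliar with RUDY (Rectangular Uniform wire DensitY): instead of running a global router, it estimates congestion directly from net bounding boxes, which is why it is so much cheaper. Below is a minimal textbook-style Python sketch of the idea, assuming a uniform grid over the die and ignoring layers, blockages and routing capacity. It is not OpenROAD's RUDY implementation; the Net class and rudy_map function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Net:
    """Axis-aligned bounding box of a net's pins."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def rudy_map(nets, die_w, die_h, nx, ny):
    """Textbook RUDY congestion estimate on an nx-by-ny grid over the die.

    Each net's wirelength estimate (w + h) is spread uniformly over its
    bounding box, giving a wire density of (w + h) / (w * h). Every tile the
    box overlaps receives that density weighted by the overlapped fraction of
    the tile. Zero-width or zero-height boxes are skipped in this simplified
    version. Returns grid[row][col] of average wire density per tile.
    """
    tile_w, tile_h = die_w / nx, die_h / ny
    grid = [[0.0] * nx for _ in range(ny)]
    for n in nets:
        w, h = n.xmax - n.xmin, n.ymax - n.ymin
        if w <= 0 or h <= 0:
            continue
        density = (w + h) / (w * h)
        for row in range(ny):
            oy = min((row + 1) * tile_h, n.ymax) - max(row * tile_h, n.ymin)
            if oy <= 0:
                continue
            for col in range(nx):
                ox = min((col + 1) * tile_w, n.xmax) - max(col * tile_w, n.xmin)
                if ox > 0:
                    grid[row][col] += density * ox * oy / (tile_w * tile_h)
    return grid


# Example: a 10 x 10 die split into 2 x 2 tiles with two nets.
nets = [Net(0, 0, 10, 10), Net(0, 0, 5, 5)]
for row in rudy_map(nets, 10, 10, 2, 2):
    print(row)
```

The cost is one pass over each net's bounding box, with no path search, which is the main reason an estimate like this is much faster than calling a global router on every routability iteration.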

@gudeh
Contributor

gudeh commented Apr 19, 2024

Either way, I can try to improve the log messages during GPL, if that is your suggestion.
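A message along those lines could be driven by logic like the following. This is a hypothetical Python sketch of the decision only: the function name, the three-call threshold and the wording are illustrative assumptions, not an existing OpenROAD message. It reuses the GPL_TARGET_RC and GPL_ROUTABILITY_DRIVEN config.mk variables discussed in this thread.

```python
import math


def routability_hint(num_call, final_rc, target_rc, calls_before_hint=3):
    """Hypothetical advisory message for a GPL routability loop that keeps retrying.

    Returns None while the target RC is met or only a few calls have happened;
    otherwise returns a suggestion naming the ORFS config.mk variables
    GPL_TARGET_RC and GPL_ROUTABILITY_DRIVEN.
    """
    if final_rc <= target_rc or num_call < calls_before_hint:
        return None
    # Suggest a target at or just above the RC actually reached (2 decimals).
    suggested = math.ceil(round(final_rc * 100, 6)) / 100
    return (
        f"Routability call {num_call}: final RC {final_rc:.4f} is still above the "
        f"target {target_rc:.2f}. If global placement runtime is a concern, consider "
        f"export GPL_TARGET_RC = {suggested:.2f} or export GPL_ROUTABILITY_DRIVEN = 0 "
        "in config.mk."
    )


print(routability_hint(8, 1.096847, 1.00))
```

Keeping the hint silent for the first few calls avoids noise on designs where routability mode converges quickly.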

@maliberty
Member

@gudeh do you know where the congestion is that routability mode isn't able to resolve? It seems we are stuck in a loop that becomes mostly futile after the first few iterations.

@oharboe
Collaborator Author

oharboe commented Apr 19, 2024

> Either way, I can try to improve the log messages during GPL, if that is your suggestion.

That's my idea and understanding. I'd like to hear what others think...

@oharboe
Collaborator Author

oharboe commented Apr 20, 2024

@maliberty @gudeh Do you need any further input from me? It looks like I would just be in the way and create long turnaround times if you try to instruct me to run experiments, I misunderstand and then you try to interpret my slightly off experiments... It is probably easier and faster for you to run your own experiments with options?

@maliberty
Member

@gudeh I see on both this design and ariane/gf12 thin bands of congestion right after the routability iteration, e.g.:
[image: congestion map]

They seem solvable as the surrounding area is not congested. I'm wondering if

  grouter_->setOverflowIterations(0);

may be too conservative. Perhaps you can experiment with it.

@gudeh
Contributor

gudeh commented Apr 22, 2024

@oharboe could you please send the config.mk file? I could not find it in the make issue tar.

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

> @oharboe could you please send the config.mk file? I could not find it in the make issue tar.

This is from bazel-orfs, so it is a bit of a work in progress. Could you make do with vars-*.sh file that is in the .tar.gz file for now?

@gudeh
Contributor

gudeh commented Apr 22, 2024

I wanted the file so I could continue the flow and check the DRT runtime. @maliberty is there any way to do that without the config.mk? Or should I just build a config.mk manually?

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

With GPL_ROUTABILITY_DRIVEN=0, the running times for megaboom are:

Log                     Elapsed (s)  Percent complete
1_1_yosys                      3965                 8
1_1_yosys_hier_report          3615                15
2_1_floorplan                   132                15
2_2_floorplan_io                 12                15
2_4_floorplan_macro             676                16
2_5_floorplan_tapcell           546                17
2_6_floorplan_pdn               320                17
3_1_place_gp_skip_io            711                18
3_2_place_iop                    27                18
3_3_place_gp                   6143                32
3_4_place_resized               583                32
3_5_place_dp                   1177                33
4_1_cts                         449                34
5_1_grt                        3249                41
5_2_fillcell                     77                41
5_3_route                     17573                71
6_1_merge                       383                72
6_report                       6416                85
generate_abstract               863                86
Total                         46917               100
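To see which stages dominate, the per-stage share of the total can be computed from the elapsed-seconds column; the stage times add up exactly to the 46917 s total. A short Python sketch, using only the numbers from the table above:

```python
# Elapsed seconds per stage, copied from the table above (GPL_ROUTABILITY_DRIVEN=0).
RUNTIMES = {
    "1_1_yosys": 3965,
    "1_1_yosys_hier_report": 3615,
    "2_1_floorplan": 132,
    "2_2_floorplan_io": 12,
    "2_4_floorplan_macro": 676,
    "2_5_floorplan_tapcell": 546,
    "2_6_floorplan_pdn": 320,
    "3_1_place_gp_skip_io": 711,
    "3_2_place_iop": 27,
    "3_3_place_gp": 6143,
    "3_4_place_resized": 583,
    "3_5_place_dp": 1177,
    "4_1_cts": 449,
    "5_1_grt": 3249,
    "5_2_fillcell": 77,
    "5_3_route": 17573,
    "6_1_merge": 383,
    "6_report": 6416,
    "generate_abstract": 863,
}

total = sum(RUNTIMES.values())  # matches the table's Total row
shares = sorted(
    ((stage, 100 * secs / total) for stage, secs in RUNTIMES.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(f"total: {total} s")
for stage, pct in shares[:3]:
    print(f"{stage:<14} {RUNTIMES[stage]:>6} s  {pct:4.1f}% of total")
ratio = RUNTIMES["5_3_route"] / RUNTIMES["3_3_place_gp"]
print(f"detailed route / global place: {ratio:.2f}x")
```

With routability mode off, the three largest stages are detailed routing (about 37% of the total), reporting (about 14%) and global placement (about 13%), and detailed routing takes roughly 2.9 times as long as global placement.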

@maliberty
Member

@oharboe I think the pdn stripe is too close to the macro. Could you modify your pdn strategy to include -halo {2.0 2.0 2.0 2.0} on the define_pdn_grid for the macros and re-run?

I'll look at making this change to asap7 in general.

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

> @oharboe I think the pdn stripe is too close to the macro. Could you modify your pdn strategy to include -halo {2.0 2.0 2.0 2.0} on the define_pdn_grid for the macros and re-run?
>
> I'll look at making this change to asap7 in general.

Is this a case where global routing or PDN could add an actionable warning/error/progress message, or is it trivial to see after 30 years of ASIC experience? :-)

I'll give it a go:

$ git diff
diff --git a/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl b/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
index 2a95094e..4f91331e 100644
--- a/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
+++ b/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
@@ -30,6 +30,6 @@ add_pdn_connect -grid {top} -layers {M5 M6}
 # Element grid
 ####################################
 # The halo around the macro prevents pdn from blocking pin access
-define_pdn_grid -macro -cells $::env(MACROS) -halo "0.25 0.25 0.25 0.25" -voltage_domains {CORE} -name ElementGrid
+define_pdn_grid -macro -cells $::env(MACROS) -halo "2.0 2.0 2.0 2.0" -voltage_domains {CORE} -name ElementGrid

@maliberty
Member

Once you look at the congestion map, you can see that all the congestion is right around these stripes. I'm not sure there is a simple way to detect that and turn it into a message.

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

> Once you look at the congestion map, you can see that all the congestion is right around these stripes. I'm not sure there is a simple way to detect that and turn it into a message.

Maybe a more general progress message that this isn't converging in a normal amount of time?

I'm haggling here for what is practical and makes sense in terms of actionable feedback that helps to educate the user...

@maliberty
Member

I'm not sure what action to suggest. I didn't know until I dug into it more.

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

@maliberty Please try to run this; it should have the updated halo for PDN. I only redid the top level (BoomTile), not the macros. https://drive.google.com/file/d/1Ri9YtRqJnGa2zVGIYe1b5KTOQ6Ru7TA_/view?usp=sharing

@oharboe
Collaborator Author

oharboe commented Apr 22, 2024

@maliberty Ran out of memory after a while on my laptop:

[NesterovSolve] Iter: 430 overflow: 0.234661 HPWL: 21465138028
[NesterovSolve] Iter: 440 overflow: 0.212438 HPWL: 20700377471
[INFO GPL-0100] worst slack 5.28e-10
[INFO GPL-0103] Weighted 177853 nets.
[INFO GPL-0075] Routability numCall: 1 inflationIterCnt: 1 bloatIterCnt: 0
./run-me-BoomTile-asap7-base.sh: line 7: 2076448 Killed                  openroad -no_init ${SCRIPTS_DIR}/global_place.tcl
oyvind@small-cigar:~/megaboom/bar$ echo $?
137

137 is the exit status a shell reports when a process is killed by SIGKILL (128 + 9); here openroad was killed because the machine ran out of memory.
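The 137 follows the shell convention of reporting 128 + N when a child is terminated by signal N. A small Python sketch that decodes such statuses (the helper name is illustrative). Note that SIGKILL alone does not prove an out-of-memory condition; on Linux the kernel log (dmesg) records when the OOM killer terminated a process.

```python
import signal


def describe_exit_status(status):
    """Explain a shell exit status such as the 137 reported above.

    By shell convention, a status above 128 means the child was terminated by
    signal (status - 128). A program can also exit with such a value itself,
    so this is a heuristic.
    """
    if status > 128:
        sig = status - 128
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = f"signal {sig}"  # signal number unknown on this platform
        return f"terminated by {name}"
    return f"exited with status {status}"


print(describe_exit_status(137))
```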

@gudeh
Contributor

gudeh commented Apr 23, 2024

@oharboe running the files you sent last, GPL ran to convergence, although it still took a lot of iterations (2660):
[image not preserved]

@maliberty
Member

Your test case packaging still has issues. In var*sh full paths are used:

export OBJECTS_DIR="/home/oyvind/.cache/bazel/_bazel_oyvind/7e6ad621f3f951c3ee6f5b179289b54e/execroot/_main/bazel-out/k8-fastbuild/bin/objects/asap7/BoomTile/base"
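A minimal Python sketch of the rewrite described earlier in the thread (./home instead of /home in var*sh). It assumes the test-case tarball stores files under their original absolute path minus the leading '/', so /home/... unpacks to ./home/...; it is a hypothetical illustration, not the actual fix in bazel-orfs or ORFS.

```python
import re

# export NAME="/abs/path" or export NAME=/abs/path (optional opening quote).
EXPORT_ABS = re.compile(r'^(export\s+\w+="?)/(.*)$', re.MULTILINE)


def relativize_exports(text):
    """Rewrite absolute paths in `export` lines of a vars-*.sh file to ./-relative ones.

    Hypothetical sketch: only the first path in each export line is rewritten,
    and non-export lines are left untouched.
    """
    return EXPORT_ABS.sub(r"\1./\2", text)


print(relativize_exports(
    'export OBJECTS_DIR="/home/oyvind/.cache/bazel/_bazel_oyvind/'
    '7e6ad621f3f951c3ee6f5b179289b54e/execroot/_main/bazel-out/'
    'k8-fastbuild/bin/objects/asap7/BoomTile/base"'
))
```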

@oharboe
Collaborator Author

oharboe commented Apr 23, 2024

@maliberty I am aware. I am investigating a fix to bazel-orfs or ORFS. Stay tuned.

@gudeh
Contributor

gudeh commented Jun 12, 2024

Hi @oharboe, we merged RUDY for routability mode today! You should no longer find yourself stuck on "Routability numCall" messages!

@oharboe
Collaborator Author

oharboe commented Jun 12, 2024

Fantastic! We are also using RUDY for fast turnaround heatmaps. Very nice feature!!!

@gudeh
Contributor

gudeh commented Jun 12, 2024

That's great! Do you think we can close this issue? I see you have opened other issues about GPL messages as well; I will try to modify those messages a little so they are clearer.

@oharboe
Collaborator Author

oharboe commented Jun 12, 2024

Yes, I think we can close it. I will open a new issue if I observe anything that merits follow-up work.

@oharboe oharboe closed this as completed Jun 12, 2024
3 participants