Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ubuntu 22.04 for python tbb. hopefully will solve the hang problem. #1719

Merged

Conversation

talregev
Copy link
Contributor

Change the os for python + tbb to be ubuntu 22.04.
It may solve the hang problem.
The hang problem, is a know problem as we can see in this issue:
actions/runner#1326
actions/runner#2454

I am not sure what cause the hang problem. Maybe gtsam code, maybe the machine itself, or maybe python version.

But I want to try change the os. Better then to stay on this state. We can always revert this PR.
I didn't do much tests (I run the CI on my end) or I don't have any prove, but I think the ubuntu 22.04 machine are stronger. Also maybe the os image github action is using for this os is better.

Eventually the ci moving forward with the os version, and delete old version and get new one.
This can be tested only on python tbb to see his behavior for long run.

@varunagrawal
Copy link
Collaborator

Any reason you're not first testing this on your fork's CI to verify correctness?

@varunagrawal varunagrawal merged commit f724f30 into borglab:develop Feb 9, 2024
31 checks passed
@talregev
Copy link
Contributor Author

Any reason you're not first testing this on your fork's CI to verify correctness?

I first test it on my ci before I did a PR to here.
As I write on the first msg.

Thank you for your merge!

@talregev talregev deleted the TalR/python_tbb_ubuntu_22_04 branch February 14, 2024 06:27
varunagrawal added a commit that referenced this pull request Feb 26, 2024
…_22_04"

This reverts commit f724f30, reversing
changes made to 448132a.
@varunagrawal
Copy link
Collaborator

I verified this change in another PR on my local CI and I can confirm that this fixes the CI issue for the Python TBB. Thanks @talregev!

@talregev
Copy link
Contributor Author

talregev commented Mar 10, 2024

@varunagrawal You already revert this commit.

Please consider again the changes here also:
#1714

@varunagrawal
Copy link
Collaborator

varunagrawal commented Mar 10, 2024

Yes, because I wanted to verify the charges one by one. The changes to the swap don't have any effect, so #1714 isn't needed.

I'll be adding the necessary changes in a subsequent PR.

@talregev
Copy link
Contributor Author

talregev commented Mar 10, 2024

The changes to the swap don't have any effect

That correct, and less code is not needed that it better. That mean that the ci don't need memory to compile tbb as it use to be.
This is the point of that change. Remove unneeded changes.

@varunagrawal
Copy link
Collaborator

We need the swap space for the wrapper to compile otherwise the process will sometimes get killed due to being out of memory. We don't add changes unless they are needed. I understand you don't have context for this.

@talregev
Copy link
Contributor Author

We need the swap space for the wrapper to compile otherwise the process will sometimes get killed due to being out of memory. We don't add changes unless they are needed. I understand you don't have context for this.

Can you share a link to recent ci fail due to out of memory error?

@varunagrawal
Copy link
Collaborator

This was from a few years ago. Sorry but I don't have the time to devote to this issue anymore.

@borglab borglab deleted a comment from talregev Mar 10, 2024
@varunagrawal
Copy link
Collaborator

This was from a few years ago. Sorry but I don't have the time to devote to this issue anymore.

Exactly my point. The issue with out memory is not relevant anymore.

I don't think you're referring to the same thing. The CI would kill the wrapper compilation due to lack of RAM, and ever since we increased the swap space we haven't had that issue. Until I see an analysis showing that the extra swap is no longer needed, I don't agree with removing it and potentially reintroducing the compilation memory issue.

The OOM issue this PR fixed is specific to the wrapper+TBB. This OOM issue seems to stem from running the tests. so at run time, not compile time.

Also, I think I accidentally deleted your last comment. I was on my phone when responding and might have swiped incorrectly. I have quoted your comment for posterity.

@talregev
Copy link
Contributor Author

talregev commented Mar 15, 2024

The CI would kill the wrapper compilation due to lack of RAM

This is the exact issue I am talking about. It happen on compilation wrapper + python in windows and not happen any more because microsoft upgrade the vms memory. The same apply for linux vms.
You can spot that I was report about this on this PR:
#1413 (comment)

@talregev
Copy link
Contributor Author

Until I see an analysis showing that the extra swap is no longer needed

You already see the CI run for 2-3 weeks (until you decide to remove this commit) with compile wrapper + tbb and not fail on lack of memory.

@varunagrawal
Copy link
Collaborator

The CI would kill the wrapper compilation due to lack of RAM

This is the exact issue I am talking about. It happen on compilation wrapper + python in windows and not happen any more because microsoft upgrade the vms memory. The same apply for linux vms.
You can spot that I was report about this on this PR:
#1413 (comment)

I'll need to see some proof of this, such as an upgrade notice from GitHub or Microsoft. Your comment is conjecture at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants