Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix torch elastic test SimpleElasticAgentTest.test_restart_workers br… #126002

Closed
wants to merge 1 commit into from

Conversation

cdzhan
Copy link
Contributor

@cdzhan cdzhan commented May 11, 2024

Failure Info:

(pt) betterman@bjys1009:/projs/framework/betterman/code/pytorch_new/test/distributed/elastic/agent/server/test$ pytest api_test.py -k test_restart_workers
=============================================================================================================================================== test session starts ================================================================================================================================================
platform linux -- Python 3.10.8, pytest-8.1.1, pluggy-1.4.0
rootdir: /projs/framework/betterman/code/pytorch_new
configfile: pytest.ini
plugins: hypothesis-6.15.0, rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1
collecting 1 item                                                                                                                                                                                                                                                                                                  /
projs/framework/betterman/code/pytorch_new/test/distributed/elastic/agent/server/test/api_test.py:123: PytestCollectionWarning: cannot collect test class 'TestAgent' because it has a __init__ constructor (from: test/distributed/elastic/agent/server/test/api_test.py)
  class TestAgent(SimpleElasticAgent):
collected 29 items / 28 deselected / 1 selected
Running 1 items in this shard

api_test.py F                                                                                                                                                                                                                                                                                                [100%]

===================================================================================================================================================== FAILURES =====================================================================================================================================================
___________________________________________________________________________________________________________________________________ SimpleElasticAgentTest.test_restart_workers ____________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/betterman/code/pytorch_new/test/distributed/elastic/agent/server/test/api_test.py", line 368, in test_restart_workers
    agent._restart_workers(worker_group)
  File "/projs/framework/betterman/code/pytorch_new/torch/distributed/elastic/metrics/api.py", line 123, in wrapper
    result = f(*args, **kwargs)
  File "/projs/framework/betterman/code/pytorch_new/torch/distributed/elastic/agent/server/api.py", line 728, in _restart_workers
    self._stop_workers(worker_group, is_restart=True)
TypeError: TestAgent._stop_workers() got an unexpected keyword argument 'is_restart'
============================================================================================================================================= short test summary info ==============================================================================================================================================
FAILED [0.0054s] api_test.py::SimpleElasticAgentTest::test_restart_workers - TypeError: TestAgent._stop_workers() got an unexpected keyword argument 'is_restart'
========================================================================================================================================= 1 failed, 28 deselected in 7.37s =========================================================================================================================================

Caused by #124819 .

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k

Copy link

pytorch-bot bot commented May 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126002

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c0b1fe1 with merge base b24ad7e (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category labels May 11, 2024
@ezyang
Copy link
Contributor

ezyang commented May 14, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 14, 2024
@ezyang ezyang added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module and removed ciflow/trunk Trigger trunk jobs on your pull request labels May 14, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@cdzhan cdzhan deleted the cdzhan-patch-5 branch May 16, 2024 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged oncall: distributed Add this issue/PR to distributed oncall triage queue open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants