Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated #1888

Merged

Conversation

li-boxuan
Copy link
Collaborator

@li-boxuan li-boxuan commented May 18, 2024

This PR reorganizes integration test CI to reduce the number of GitHub Checks. It leverages regenerate.sh to avoid inconsistency between regenerate.sh and run-integration-tests.yml. This PR also adds MacOS platform to integration testing, but only runs it against ssh sandbox.

Motivation of adding tests for MacOS: a month ago, OpenDevin broke on MacOS once: #887. Having integration tests could prevent this from happening again. Note: MacOS integration test suite is optional (non-blocking) due to its slowness.

This PR also marks SWE-Agent as deprecated and removed it from integration testing. They can still be used normally, but eventually we might remove them if we don't see active development and/or usage. Similar for PlannerAgent and MonologueAgent, but at the moment we only remove some tests and retain a basic test to ensure their functionalities don't break.

Finally, this PR also removes excessive logging during the test run.

@li-boxuan li-boxuan added the test Test cases and testing framework changes label May 18, 2024
@enyst
Copy link
Collaborator

enyst commented May 18, 2024

UPDATE: I cannot get it work on 3.12. Switching back to 3.11.

Aww. May I ask, what was the issue on 3.12, was it some chroma dependency?

@rbren
Copy link
Collaborator

rbren commented May 18, 2024

Man this is gonna blow out the (already quite large) list of CI checks. I wonder if there's a way to tell github not to show every test in the matrix...

@li-boxuan
Copy link
Collaborator Author

Man this is gonna blow out the (already quite large) list of CI checks. I wonder if there's a way to tell github not to show every test in the matrix...

This PR itself only adds one CI check, but I can merge a few existing CI checks into a single one. Otherwise, as the number of agents grows, the CI list would grow longer and longer.

@li-boxuan
Copy link
Collaborator Author

Aww. May I ask, what was the issue on 3.12, was it some chroma dependency?

@enyst Yep, it seems so: https://github.com/OpenDevin/OpenDevin/actions/runs/9141617963/job/25136227953#step:5:245 do you have any idea on that?

@li-boxuan
Copy link
Collaborator Author

Okay now we only have 4 CI checks for integration testing:

  • linux + ssh sandbox
  • linux + local sandbox
  • linux + exec sandbox
  • mac + ssh sandbox

@li-boxuan li-boxuan changed the title Add MacOS to integration tests CI: rewrite run-integration-tests.yml and add MacOS tests May 18, 2024
@li-boxuan li-boxuan changed the title CI: rewrite run-integration-tests.yml and add MacOS tests CI: reorganize run-integration-tests.yml and add MacOS tests May 18, 2024
@li-boxuan
Copy link
Collaborator Author

li-boxuan commented May 18, 2024

Sad :( somehow codecov is not able to combine all the reports anymore... checking...

UPDATE: fixed

@li-boxuan li-boxuan marked this pull request as ready for review May 19, 2024 04:27
@li-boxuan li-boxuan requested a review from rbren May 19, 2024 22:57
@li-boxuan li-boxuan changed the title CI: reorganize run-integration-tests.yml and add MacOS tests Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated May 21, 2024
@li-boxuan li-boxuan force-pushed the test/add-macos-to-integration-tests branch from d297cd7 to 3c8600e Compare May 21, 2024 02:52
@li-boxuan
Copy link
Collaborator Author

li-boxuan commented May 23, 2024

Test coverage dropped by 1.22% because test coverage for SWE-agent was higher than average project coverage. Now we don't count that package into coverage thus the drop.

@li-boxuan li-boxuan merged commit acb430e into OpenDevin:main May 23, 2024
20 checks passed
@li-boxuan li-boxuan deleted the test/add-macos-to-integration-tests branch May 23, 2024 03:40
super-dainiu pushed a commit to super-dainiu/OpenDevin that referenced this pull request May 23, 2024
…ew agents as deprecated (OpenDevin#1888)

* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make mac integration tests as optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3b.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Test cases and testing framework changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants