-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated #1888
Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated #1888
Conversation
Aww. May I ask, what was the issue on 3.12, was it some chroma dependency? |
Man this is gonna blow out the (already quite large) list of CI checks. I wonder if there's a way to tell github not to show every test in the matrix... |
This PR itself only adds one CI check, but I can merge a few existing CI checks into a single one. Otherwise, as the number of agents grows, the CI list would grow longer and longer. |
@enyst Yep, it seems so: https://github.com/OpenDevin/OpenDevin/actions/runs/9141617963/job/25136227953#step:5:245 do you have any idea on that? |
Okay now we only have 4 CI checks for integration testing:
|
Sad :( somehow codecov is not able to combine all the reports anymore... checking... UPDATE: fixed |
d297cd7
to
3c8600e
Compare
This reverts commit 040cb3b.
Test coverage dropped by 1.22% because test coverage for SWE-agent was higher than average project coverage. Now we don't count that package into coverage thus the drop. |
…ew agents as deprecated (OpenDevin#1888) * Add MacOS to integration tests * Switch back to python 3.11 * Install Docker for macos pipeline * regenerate.sh: Use environmental variable for sandbox type * Pack different agents' tests into a single check * Fix CodeAct tests * Reduce file match and extensive debug logs * Add TEST_IN_CI mode that reports codecov * Small fix: don't quit if reusing old responses failed * Merge codecov results * Fix typos * Remove coverage merge step - codecov automatically does that * Make mac integration tests as optional - too slow * Fix codecov args * Add comments in yaml * Include sandbox type in codecov report name * Fix codecov report merge * Revert renaming of test_matrix_success * Remove SWEAgent and PlannerAgent from tests * Mark planner agent and SWE agent as deprecated * CodeCov: Ignore planner and sweagent * Revert "Remove SWEAgent and PlannerAgent from tests" This reverts commit 040cb3b. * Remove all tests for SWE Agent * Only keep basic tests for MonologueAgent and PlannerAgent * Mark SWE Agent as deprecated, and ignore code coverage for it --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
This PR reorganizes integration test CI to reduce the number of GitHub Checks. It leverages
regenerate.sh
to avoid inconsistency betweenregenerate.sh
andrun-integration-tests.yml
. This PR also adds MacOS platform to integration testing, but only runs it against ssh sandbox.Motivation of adding tests for MacOS: a month ago, OpenDevin broke on MacOS once: #887. Having integration tests could prevent this from happening again. Note: MacOS integration test suite is optional (non-blocking) due to its slowness.
This PR also marks SWE-Agent as deprecated and removed it from integration testing. They can still be used normally, but eventually we might remove them if we don't see active development and/or usage. Similar for PlannerAgent and MonologueAgent, but at the moment we only remove some tests and retain a basic test to ensure their functionalities don't break.
Finally, this PR also removes excessive logging during the test run.