Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an id property to all test cases #698

Open
Julian opened this issue Oct 31, 2023 · 4 comments
Open

Add an id property to all test cases #698

Julian opened this issue Oct 31, 2023 · 4 comments
Assignees
Labels
enhancement An enhancement to the tooling or structure of the suite (as opposed to a new test). good first issue An issue that is a good candidate for new contributors

Comments

@Julian
Copy link
Member

Julian commented Oct 31, 2023

Problem: Tests in the suite are occasionally moved around in the file, or their descriptions are clarified, or they are modified because they have a bug. Given that the intended use of the test suite is that downstream users -- (all implementations, Bowtie, other research use cases) -- be able to run the tests, they do not have a way to "track" a test over time. Specifically, say an implementation has a known issue with a specific test for $dynamicRef -- a test case called $dynamicRef flubs the bub when you bla bla. An implementer who wishes to mark this test as a TODO in their test suite has no reliable way to do so over the long term, because it's name, position within the file, or even file itself may change here upstream for various reasons.

We should therefore introduce a stable ID which we guarantee will never change for a test, even if its description changes, or if the test is moved elsewhere within the file or suite.

A "slug" is likely the simplest choice structure-wise, in that it's derivable from the existing test descriptions. UUIDs or globally incrementing integers are other options (see below for considerations).

This addition should also include a sanity check ensuring that the IDs have the properties we wish of them (i.e. uniqueness, see below).

For context, we had previously added this exact concept to the test case schema in a PR #53 but did not ever add it to tests

Considerations:

  • The IDs should likely be version-wide unique -- i.e. not simply within one file, though also not necessarily across the whole test suite. This is to ensure we can move tests between files if we ever recategorize them (c.f. e.g. several misclassified tests using URNs in $id #630 which we ultimately haven't done, but we have in other cases) This would mean a true "global" ID for a test would be the composite ID (version, ID).
  • We should spend a moment considering whether IDs should be historically unique as well -- i.e. if we delete a test for whatever reason, not reuse the ID we gave it.
  • If a test was incorrect, we should likely change it's ID, given that anyone referencing it has possibly skipped it because they indeed had the correct behavior -- but this may need a bit more thought.
  • We may need to be more precise in the future in cases where we modify a test in deciding whether the new test is the "same" and therefore should keep its ID or "new" in which case it should get a new one. In general we should err heavily towards never modifying tests in any non-trivial way, so this hopefully will never come up "in practice" but it undoubtedly has once or twice previously.
  • We need to consider what to do about tests themselves (rather than test cases) -- one would suspect we could rely on the ordering of the test array as IDs for tests, but again, this means we are not able to reliably reorder tests within a test case without disrupting downstream or forcing reliance on their test names. It seems initially then like we should include IDs for tests as well.
  • The test case schema will need updating to document and allow this new property. We should initially make the property optional until all tests in the suite have it, at which point we can then require the property.
  • We should probably write the above considerations down in the README when merging a PR implementing this.
  • Please do not send a giant PR which does this for all files all at once. Send small PRs, perhaps one file at a time, or even less, to help reviewers be able to review the PR reasonably.
@Julian Julian added enhancement An enhancement to the tooling or structure of the suite (as opposed to a new test). good first issue An issue that is a good candidate for new contributors labels Oct 31, 2023
@mwadams
Copy link
Contributor

mwadams commented Nov 15, 2023

If you'd like to assign me to this, I'll take it on.

I propose the following in response to this:

  1. Use UUIDs for the IDs, and make them globally unique, rather than just unique within the version of the spec.
  2. Using Array index for the tests themselves is extremely brittle - people keep arbitrarily adding tests in the middle of the lists, and it is difficult to enforce. I propose using a UUID for each test within the test suite.
  3. I will write a tool to update the IDs for everything and land a complete update all in one go
  4. If a test changes in any way, I propose that its ID should be changed, and we should document that in the README
  5. Note that with respect to (4) a test that relies on the metaschema should not be considered to be changed if the metaschema changes.

@Relequestual
Copy link
Member

I originally presumed we would have some sort of human readable ID. UUIDs would be simpler in some respects.
I guess it depends on the intended use case.

If someone has a list of tests they skip, having human readable names which partially suggest the sort of test in question, it could be helpful. However, this wouldn't be the best if we agree that IDs should change if the test changes (which I think is a good idea).

In that case, I wonder if a simpler approach is to just MD5 the test object... although I suspect because of JSON and object ordering, that might not end up the same depending on various things.

All that to say, are we convinced UUIDs as IDs would be the best here?

@mwadams
Copy link
Contributor

mwadams commented Nov 15, 2023

I did consider an MD5 hash of the test object (and the schema in the case of the test case) but that makes it rather harder to generate while developing the tests, and feels like a barrier to entry.

@Julian
Copy link
Member Author

Julian commented Nov 15, 2023

I don't really have any strong preference on the structure of the IDs -- I think I mentioned I think slugs are better because of the extra benefit (of being user readable and usable for other reasons in isolation) but if I'm not doing the work I definitely don't care.

The only thing I do care about is:

I will write a tool to update the IDs for everything and land a complete update all in one go

I don't personally want to (ever) review large automated PRs (and have said elsewhere that as far as I have seen in "real life" it's simply impossible for any human to do.) -- meaning I'm happy to review the script you write to do this, or else for someone else who thinks it's possible to do the review -- though experience in this repo itself has reaffirmed as far as I'm concerned that this is the case :) -- the last two times I said this (no large automated PRs) and others reviewed it, the PRs indeed introduced subtle test bugs that were fixed over a few months afterwards (not that it took months to fix, it took months to notice, which is often the problem).

So yeah, happy to review the script, or else to say "find any other test suite contributor willing to review the PR".

Otherwise I'm fine with UUIDs if you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement to the tooling or structure of the suite (as opposed to a new test). good first issue An issue that is a good candidate for new contributors
Projects
None yet
Development

No branches or pull requests

3 participants