Generalize schemas #547

Merged
merged 134 commits into from Nov 13, 2023

daanboer

This pull request does a lot more than was originally planned. The core improvement is the switch to zod for parsing and validation, used in combination with zod-to-json-schema to generate the JSON schemas. The schema itself has also seen major changes, most of which aim to generalize it to support data types other than LUT cross sections. In short, new LTP process data types (potentials, rate coefficients, differential cross sections, etc.) and new ways of presenting them (function parameter set, constant, expression, etc.) can now be defined without major infrastructural changes.

  • The new zod schemas allow for additional, custom validation logic through refine calls. This functionality is used to ensure that referenced state and reference ids are actually present in the provided dictionaries. Fixes #47 (Add validation for cross section set reference and state labels).
  • The species schemas are polished and simplified. A new Unspecified class of states is introduced that removes the confusion around the definition of unspecified electronic states (e.g. $\mathrm{Ar}^*$). Furthermore, singular and unspecified state descriptors no longer have to be supplied as an array, which makes their description and intent clearer. Finally, unspecified state descriptors are now simple strings instead of objects that store the identifier in the somewhat arbitrary e, v, and J properties (for electronic, vibrational, and rotational states respectively). Fixes #57 (Add a separate Unspecified class of states) and #58 (Enhance state schema). Closes #26 (Centralize list of states).
  • The @lxcat/database package has been reworked. All functionality is now encapsulated in an LXCatDatabase class. Previously, each query function retrieved the global database singleton to run its query. This made the code very hard to test, as it was impossible to define multiple databases without spinning up a new instance of ArangoDB. This is now solved, and the tests run roughly 40x faster.
  • The @lxcat/schema and @lxcat/database packages now have a well-defined external API through the use of the exports field in their package.json files. Fixes #108 (Draft: No dist in import).
  • Cleanup of the @lxcat/database package: some unused or deprecated code has been removed.
  • The user-facing @lxcat/app pages are updated to support the new schema structure.
  • Started development on a new data set edit form for authors. A utility is included that is able to dynamically generate form components based on a given JSON schema, which is used to generate the species forms.
  • A new RouteBuilder class is introduced that provides strong TypeScript support and integrates well with the "new" Next.js app router. Additionally, a compatible zod middleware is provided for parsing user-provided data.
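
The reference check described in the first bullet can be illustrated with a small, library-free sketch. It is not the code from this pull request: the document shape, the `findDanglingKeys` helper, and the `Issue` type are all hypothetical, and the real schemas are considerably richer. The function returns the kind of issue list that a zod refinement callback could report for keys that are missing from the dictionaries.

```typescript
// Hypothetical, heavily simplified document shape (not the real LXCat schema).
interface ProcessEntry {
  reactants: string[];  // keys into `states`
  products: string[];   // keys into `states`
  references: string[]; // keys into `references`
}

interface MixtureDocument {
  states: Record<string, unknown>;
  references: Record<string, unknown>;
  processes: ProcessEntry[];
}

// Hypothetical issue record: where the problem is and what it is.
interface Issue {
  path: (string | number)[];
  message: string;
}

// Own-property check, so that inherited names such as "toString" do not count.
function hasKey(dict: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(dict, key);
}

// Collect one issue for every state or reference key that is used in a
// process but missing from the corresponding dictionary.
function findDanglingKeys(doc: MixtureDocument): Issue[] {
  const issues: Issue[] = [];

  doc.processes.forEach((process, index) => {
    for (const field of ["reactants", "products"] as const) {
      process[field].forEach((key, position) => {
        if (!hasKey(doc.states, key)) {
          issues.push({
            path: ["processes", index, field, position],
            message: `Unknown state key "${key}".`,
          });
        }
      });
    }

    process.references.forEach((key, position) => {
      if (!hasKey(doc.references, key)) {
        issues.push({
          path: ["processes", index, "references", position],
          message: `Unknown reference key "${key}".`,
        });
      }
    });
  });

  return issues;
}
```

Reporting a path with every issue lets a caller point at the exact offending entry instead of rejecting the document as a whole.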

Important to note is that merging the pull request in its current state will leave the @lxcat/app package in a state where it does not compile. This is a deliberate choice, as (clearly) this pull request did not stick to its original scope and is already way too large for comfort. The idea is to complete the remaining tasks in dedicated feature branches.

It is not necessary to use XOR as AnyAtom and AnyMolecule
are already discriminated unions.
This results in cleaner generated schemas, as `@internal` types
are not explicitly included in the schema `definitions`, but are
instead evaluated inline (which is beneficial as their
generated names are often nondescript and verbose).
The `@discriminator` annotation will generate more performant
`if-then-else` schemas for discriminated union types, instead
of the default `anyOf` schemas.
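
The difference between the two schema shapes can be sketched as follows. This is an illustration only: the variant names, the `type` discriminator, and the helper functions are hypothetical, and the exact output of the schema generator may be laid out differently. The point is that a validator following the `if`/`then`/`else` form only evaluates the branch whose discriminator matches, whereas `anyOf` may try every variant.

```typescript
// A JSON Schema is either a boolean or an object with arbitrary keywords.
type JsonSchema = boolean | { [keyword: string]: unknown };

// Hypothetical variants of a union discriminated by a `type` property.
const exampleVariants: Record<string, JsonSchema> = {
  atom: {
    type: "object",
    properties: { type: { const: "atom" }, symbol: { type: "string" } },
    required: ["type", "symbol"],
  },
  molecule: {
    type: "object",
    properties: { type: { const: "molecule" }, formula: { type: "string" } },
    required: ["type", "formula"],
  },
};

// Default form: the validator may have to try every variant in turn.
function asAnyOf(variants: Record<string, JsonSchema>): JsonSchema {
  return { anyOf: Object.values(variants) };
}

// Discriminated form: a chain of if/then/else tests on the discriminator.
// Built from the last variant backwards, so each `else` holds the rest of the
// chain and the final `else` is `false` (no variant matched).
function asIfThenElse(
  discriminator: string,
  variants: Record<string, JsonSchema>,
): JsonSchema {
  return Object.entries(variants).reduceRight<JsonSchema>(
    (rest, [tag, schema]) => ({
      if: {
        properties: { [discriminator]: { const: tag } },
        required: [discriminator],
      },
      then: schema,
      else: rest,
    }),
    false,
  );
}
```

A side benefit of the discriminated form is that validation errors refer to the one variant that was selected, instead of listing failures for every member of the union.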
In preparation for rate coefficients.
Annotate `InputDocument` with `@internal`.
Add the `AnySpecies` type.
Add `AnyParticle` to `AnySpecies`.
This function adds a state with all of its required parent states, and
the corresponding relations to the database.
As the database type.
Move AnySpecies to its own module.
This is already required to support one of the state descriptions
in the test set, e.g. CO2{X, {0,n,0|n,0,0}}.
Allows users to pick existing species from the database.
Also adds the corresponding api endpoints:
 - /api/species
 - /api/species/children
Command is now: `pnpm json:set`.
Add cli module that prints the generated `LTPMixture` schema.
Add `dom` lib dependency for printing.
Remove `ts-json-schema-generator` dependency.
Remove old schema types.
Compound species should have at least two entries.
Fix wrong label for unspecified entries in compound species.
The process `info` property is now always an array.
Made code a lot more testable.
Fix broken database tests.
Tests now run in ~3s instead of ~120.
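
The testability gain comes from passing the database handle into a class instead of fetching a global singleton inside each query function. The sketch below shows the pattern with hypothetical names (`SpeciesStore`, `InMemorySpeciesStore`, `ExampleDatabase`); it does not reproduce the `LXCatDatabase` API or the ArangoDB driver, and it is kept synchronous for brevity, whereas real database calls would be asynchronous.

```typescript
// Hypothetical minimal storage interface; a real implementation would wrap a
// database connection instead.
interface SpeciesStore {
  get(id: string): string | undefined;
  set(id: string, label: string): void;
  size(): number;
}

// In-memory stand-in that tests can create cheaply, one per test case.
class InMemorySpeciesStore implements SpeciesStore {
  private readonly entries = new Map<string, string>();

  get(id: string): string | undefined {
    return this.entries.get(id);
  }

  set(id: string, label: string): void {
    this.entries.set(id, label);
  }

  size(): number {
    return this.entries.size;
  }
}

// All queries go through the injected store, so two instances never share
// state unless they are handed the same store.
class ExampleDatabase {
  private readonly store: SpeciesStore;

  constructor(store: SpeciesStore) {
    this.store = store;
  }

  addSpecies(id: string, label: string): void {
    if (this.store.get(id) !== undefined) {
      throw new Error(`Species "${id}" already exists.`);
    }
    this.store.set(id, label);
  }

  countSpecies(): number {
    return this.store.size();
  }
}
```

Because each instance owns its store, tests can build as many isolated databases as they need without starting a separate server process.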
@daanboer daanboer marked this pull request as ready for review November 13, 2023 15:17
@daanboer daanboer merged commit 4eccd39 into main Nov 13, 2023
5 of 6 checks passed
@daanboer daanboer deleted the enhance-schema branch November 13, 2023 15:25
@daanboer daanboer mentioned this pull request Nov 17, 2023