Generalize schemas #547

daanboer · 2023-11-13T15:16:05Z

This pull request does a lot more than was originally planned. The core improvement is the switch to zod for parsing and validation, used in combination with zod-to-json-schema to generate the JSON schemas. The schema itself has also seen major changes, most of which are aimed at generalizing it to support data types other than LUT cross sections. In short, these changes make it straightforward to define new LTP process data types (potentials, rate coefficients, differential cross sections, etc.) and new ways of representing them (function parameter set, constant, expression, etc.), without major infrastructural changes.

- The new zod schemas allow for additional, custom validation logic through `refine` calls. This is used to ensure that referenced state and reference ids are actually present in the provided dictionaries. Fixes #47 (Add validation for cross section set reference and state labels).
- The species schemas have been polished and simplified. A new `Unspecified` class of states removes the confusion around the definition of unspecified electronic states (e.g. $\mathrm{Ar}^*$). Furthermore, singular and unspecified state descriptors no longer have to be supplied as an array, which makes their description and intent clearer. Finally, unspecified state descriptors are now simple strings instead of objects that store the identifier in the somewhat arbitrary `e`, `v`, and `J` properties (for electronic, vibrational, and rotational states, respectively). Fixes #57 (Add a separate Unspecified class of states) and #58 (Enhance state schema). Closes #26 (Centralize list of states).
- The @lxcat/database package has been reworked. All functionality is now encapsulated in an `LXCatDatabase` class. Previously, each query function retrieved the global database singleton to run its query. This made the code very hard to test, as it was impossible to define multiple databases without spinning up a new instance of ArangoDB. This has now been solved, and test speedups of ~40x have been observed.
- The @lxcat/schema and @lxcat/database packages now have a well-defined external API through the `exports` field in their package.json files. Fixes #108 (Draft: No dist in import).
- The @lxcat/database package has been cleaned up; some unused or deprecated code has been removed.
- The user-facing @lxcat/app pages have been updated to support the new schema structure.
- Development has started on a new data set edit form for authors. A utility is included that dynamically generates form components from a given JSON schema; it is used to generate the species forms.
- A new `RouteBuilder` class is introduced that provides strong TypeScript support and integrates well with the "new" Next.js app router. Additionally, a compatible zod middleware is provided for parsing user-provided data.
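
The reference check described above (referenced state and reference ids must exist in the provided dictionaries) can be sketched without any library. The document shape and the names `MixtureSketch` and `findMissingKeys` are assumptions made for illustration; the real schema is richer and the actual check lives in the zod schemas.

```typescript
// Hypothetical, heavily simplified document shape (not the real LXCat schema).
interface MixtureSketch {
  states: Record<string, unknown>;
  references: Record<string, unknown>;
  processes: Array<{
    reaction: { lhs: Array<{ state: string }>; rhs: Array<{ state: string }> };
    reference: string[];
  }>;
}

interface MissingKey {
  kind: "state" | "reference";
  key: string;
  processIndex: number;
}

// Own-property lookup, so that keys such as "toString" are not found by accident.
function hasKey(dict: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(dict, key);
}

// Collect every state or reference key that a process uses but that is
// absent from the corresponding dictionary.
function findMissingKeys(doc: MixtureSketch): MissingKey[] {
  const missing: MissingKey[] = [];
  doc.processes.forEach((process, processIndex) => {
    const entries = [...process.reaction.lhs, ...process.reaction.rhs];
    for (const entry of entries) {
      if (!hasKey(doc.states, entry.state)) {
        missing.push({ kind: "state", key: entry.state, processIndex });
      }
    }
    for (const key of process.reference) {
      if (!hasKey(doc.references, key)) {
        missing.push({ kind: "reference", key, processIndex });
      }
    }
  });
  return missing;
}
```

In a zod schema, a function like this could plausibly be called from a `superRefine` callback, reporting each missing key through `ctx.addIssue`; whether the pull request does it this way is not stated here.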

Important to note is that merging the pull request in its current state will leave the @lxcat/app package in a state where it does not compile. This is a deliberate choice, as (clearly) this pull request did not stick to its original scope and is already way too large for comfort. The idea is to complete the remaining tasks in dedicated feature branches.

daanboer added 30 commits July 22, 2023 00:01

Eliminate the usage of the XOR type

31f8b56

It is not necessary to use XOR as AnyAtom and AnyMolecule are already discriminated unions.
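
A minimal sketch of why an XOR helper is redundant for a discriminated union: the literal `type` tag alone lets the compiler tell the members apart. The types below are hypothetical stand-ins, not the actual `AnyAtom` and `AnyMolecule` definitions.

```typescript
// Hypothetical, simplified stand-ins for the real species types.
interface AtomSketch {
  type: "Atom";
  composition: string;
}

interface MoleculeSketch {
  type: "Molecule";
  composition: string;
  vibrational?: string;
}

// The shared literal `type` property makes this a discriminated union,
// so no XOR helper is needed to keep the members mutually exclusive.
type SpeciesSketch = AtomSketch | MoleculeSketch;

// Type guard in the spirit of a `stateIsAtom` helper (signature assumed).
function isAtomSketch(species: SpeciesSketch): species is AtomSketch {
  return species.type === "Atom";
}

function describeSpecies(species: SpeciesSketch): string {
  switch (species.type) {
    case "Atom":
      return `atom ${species.composition}`;
    case "Molecule":
      // `vibrational` is only accessible because the union was narrowed.
      return `molecule ${species.composition} (v: ${species.vibrational ?? "none"})`;
  }
}
```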

Apply @internal to internal types

5a32063

This results in cleaner generated schemas, as `@internal` types are not explicitly included in the schema `definitions`, but are instead evaluated inline (which is beneficial as their generated names are often nondescript and verbose).

Annotate species union types with @discriminator type

44c3909

The `@discriminator` annotation will generate more performant `if-then-else` schemas for discriminated union types, instead of the default `anyOf` schemas.
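
The performance argument can be illustrated with a toy model. This is a conceptual sketch only, not how any particular JSON Schema validator is implemented: with `anyOf`, branches are tried one by one, whereas dispatching on the discriminator property selects a single branch up front. All names are hypothetical.

```typescript
type Value = Record<string, unknown>;

interface Branch {
  tag: string;
  check: (value: Value) => boolean;
}

interface Outcome {
  valid: boolean;
  checksRun: number;
}

// anyOf-style validation: try every branch until one accepts the value.
function validateAnyOf(branches: Branch[], value: Value): Outcome {
  let checksRun = 0;
  for (const branch of branches) {
    checksRun += 1;
    if (branch.check(value)) return { valid: true, checksRun };
  }
  return { valid: false, checksRun };
}

// if-then-else-style validation: pick the branch by its discriminator first.
function validateByDiscriminator(branches: Branch[], key: string, value: Value): Outcome {
  const branch = branches.find((b) => b.tag === value[key]);
  if (branch === undefined) return { valid: false, checksRun: 0 };
  return { valid: branch.check(value), checksRun: 1 };
}

const speciesBranches: Branch[] = [
  { tag: "Atom", check: (v) => v["type"] === "Atom" && typeof v["composition"] === "string" },
  { tag: "Molecule", check: (v) => v["type"] === "Molecule" && typeof v["composition"] === "string" },
];
```

For a value tagged `"Molecule"`, the first function runs two branch checks and the second runs one. Dispatching on the tag also tends to give more focused error messages, since only the selected branch can fail.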

Add Constant storage

1c811d5

In preparation for rate coefficients.

Add as cast to fix type error

7e47c15

Split the set header into a separate type

134285c

Annotate `InputDocument` with `@internal`.

Rename `data` to `value` in LUT storage type

4a46728

Add Expression storage type

6e8d93c

Switch to more simple generators for species types

8b1d931

Partially implements #58

Annotate more types with @internal

e7d15e1

Add the `AnySpecies` type.

Add stateIsAtom type guard function

c4c59b0

Add `AnyParticle` to `AnySpecies`.

Reimplement insert_state_tree to support the new schema changes

4ca1932

This function adds a state with all of its required parent states, and the corresponding relations to the database.
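
The idea of inserting a state together with its parent chain can be sketched with an in-memory store. This is an assumption-laden illustration: the real `insert_state_tree` targets ArangoDB collections, and the class, method, and label format below are hypothetical.

```typescript
interface StateNode {
  key: string;
  label: string;
}

interface Edge {
  parent: string;
  child: string;
}

// Hypothetical in-memory stand-in for the state and relation collections.
class InMemoryStateStore {
  readonly nodes = new Map<string, StateNode>();
  readonly edges: Edge[] = [];

  // Return the node stored under `label`, creating it when absent.
  private getOrCreate(label: string): { node: StateNode; created: boolean } {
    const existing = this.nodes.get(label);
    if (existing !== undefined) return { node: existing, created: false };
    const node: StateNode = { key: String(this.nodes.size + 1), label };
    this.nodes.set(label, node);
    return { node, created: true };
  }

  // `path` lists the labels from the top-level species down to the target state.
  // Missing ancestors are created on the way, and each newly created node is
  // linked to its parent. Returns the key of the last (most specific) state.
  insertStateTree(path: string[]): string {
    if (path.length === 0) throw new Error("path must contain at least one state");
    let parentKey: string | undefined;
    for (const label of path) {
      const { node, created } = this.getOrCreate(label);
      if (created && parentKey !== undefined) {
        this.edges.push({ parent: parentKey, child: node.key });
      }
      parentKey = node.key;
    }
    return parentKey as string;
  }
}
```

Inserting two sibling states that share ancestors reuses the existing parents, so only the new leaf and one new edge are added on the second call.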

Update State database collection type

9244ef2

Introduce KeyedSpecies

92ac370

`KeyedSpecies` serves as the database type. Move `AnySpecies` to its own module.

Update CrossSection schema

2a73df8

Update State schema

9b24869

Update state tests for database package

eb23f1a

Allow unspecified state descriptors as part of compound state

db1ef4f

This is already required to support one of the state descriptions in the test set, e.g. CO2{X, {0,n,0|n,0,0}}.
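
A rough sketch of how a descriptor that may be specified, unspecified, or compound could be modelled. The shapes are assumptions based on the pull request description (unspecified descriptors are plain strings, compounds are arrays); the real types differ in detail.

```typescript
// Hypothetical specified vibrational descriptor.
interface VibrationalSketch {
  v: number;
}

// A single entry is either fully specified or an unspecified string.
type EntrySketch = VibrationalSketch | string;

// A descriptor is a single entry or a compound (array) of entries.
type DescriptorSketch = EntrySketch | EntrySketch[];

function classifyDescriptor(
  descriptor: DescriptorSketch,
): "specified" | "unspecified" | "compound" {
  if (typeof descriptor === "string") return "unspecified";
  if (Array.isArray(descriptor)) return "compound";
  return "specified";
}

// Modelled on the vibrational part of CO2{X, {0,n,0|n,0,0}}:
// a compound whose entries are both unspecified strings.
const co2Vibrational: DescriptorSketch = ["0,n,0", "n,0,0"];
```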

Update @lxcat/schema test data

93a9be2

Regenerate database state schema

ebe4fb0

Update @lxcat/schema tests to comply with new schema version

cdfb7fc

Add zod dependency to @lxcat/schema

178ae3c

Add preliminary migrate script for schema

1476e08

Move SimpleParticle and AnyParticle to dedicated module

180fcb5

Add restriction on State generic parameter

666b553

Also add `@discriminator` tags to `State` types.

Add missing immer dep for @lxcat/database

aff62f7

Use KeyedSpecies

18dfe7c

Update state query generic constraints

d4c90da

Remove unused LutTable component

43ad835

Cleanup minor (type) errors in EditForm component

62232f5

daanboer added 22 commits November 7, 2023 10:16

Update flake lock file

b035a45

Add array component for species form generation

2133764

Fix type errors involving document self referencing

bb90176

Allow for unspecified entries in rovibrational compounds

0b1dcc2

Add species picker in set edit form

8d6623f

Allows users to pick existing species from the database. Also adds the corresponding API endpoints:
- /api/species
- /api/species/children

Update schema generation command

cc4df7d

Command is now: `pnpm json:set`.
- Add cli module that prints the generated `LTPMixture` schema.
- Add `dom` lib dependency for printing.
- Remove `ts-json-schema-generator` dependency.
- Remove old schema types.

Secure endpoints using middleware

6bd1b0c

Add species API tests

d869f45

Fix /scat-css/[id] endpoint

8c870f0

Fix many database tests

f661739

Fix and restrict atom and molecule schemas

01a453f

Compound species should have at least two entries. Fix wrong label for unspecified entries in compound species.
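
The "at least two entries" rule can be expressed both in the type system and at runtime. The names below are hypothetical; with zod, an array schema with `.min(2)` would be one way to express the same constraint, though the schema actually used in this commit is not shown here.

```typescript
// A tuple type with two required elements followed by any number of extras.
type AtLeastTwo<T> = [T, T, ...T[]];

// Runtime guard that narrows a plain array to the tuple type above.
function hasAtLeastTwo<T>(entries: T[]): entries is AtLeastTwo<T> {
  return entries.length >= 2;
}

// Reject single-entry "compounds"; those should be plain descriptors instead.
function assertCompound<T>(entries: T[]): AtLeastTwo<T> {
  if (!hasAtLeastTwo(entries)) {
    throw new Error(`compound needs at least two entries, got ${entries.length}`);
  }
  return entries;
}
```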

Fix reaction test

3188d95

Partially fix CS write tests

ded7009

Fix more database tests

d107714

The process `info` property is now always an array.
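
For documents written against the previous schema, a migration step might wrap a single `info` object in an array. This sketch assumes the old schema allowed either a single object or an array, which the commit message implies but does not state; all type names are hypothetical.

```typescript
// Hypothetical minimal info entry.
interface InfoSketch {
  type: string;
}

// Assumed legacy shape: `info` may be a single object or an array.
interface LegacyProcessSketch {
  info: InfoSketch | InfoSketch[];
}

// New shape: `info` is always an array.
interface ProcessSketch {
  info: InfoSketch[];
}

function normalizeInfo(process: LegacyProcessSketch): ProcessSketch {
  const info = Array.isArray(process.info) ? process.info : [process.info];
  return { ...process, info };
}
```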

Overhaul database package

64d2f61

Made the code a lot more testable. Fix broken database tests. Tests now run in ~3 s instead of ~120 s.
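
The testability gain comes from handing the database connection to a class instead of reaching for a global singleton. The sketch below shows that pattern in isolation; `QueryRunner`, `SketchDatabase`, `FakeRunner`, and the query string are hypothetical and do not reflect the actual `LXCatDatabase` API.

```typescript
// Minimal interface for whatever executes a query (assumed, not the real driver API).
interface QueryRunner {
  query(text: string): Promise<unknown[]>;
}

// The database wrapper receives its runner instead of importing a global one.
class SketchDatabase {
  private readonly runner: QueryRunner;

  constructor(runner: QueryRunner) {
    this.runner = runner;
  }

  async listSpeciesNames(): Promise<string[]> {
    const rows = await this.runner.query("FOR s IN State RETURN { name: s.name }");
    return rows.map((row) => (row as { name: string }).name);
  }
}

// Test double: returns canned rows and records the queries it received.
class FakeRunner implements QueryRunner {
  readonly seenQueries: string[] = [];
  private readonly rows: unknown[];

  constructor(rows: unknown[]) {
    this.rows = rows;
  }

  async query(text: string): Promise<unknown[]> {
    this.seenQueries.push(text);
    return this.rows;
  }
}
```

Because each `SketchDatabase` owns its runner, several independent instances can coexist in one test process, which is the property the singleton-based design lacked.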

Fix database package build

fd55010

Fix annotate script

7d03127

Fix reuse compliance

7ca9813

Tidy database package API

2ba59d7

Fix schema regression test

4ea4b00

Merge remote-tracking branch 'origin/main' into enhance-schema

7434ae1

Remove flake files from version control

bed694f

daanboer marked this pull request as ready for review November 13, 2023 15:17

Fix @lxcat/schema and @lxcat/app REUSE compliance

54899fc

daanboer merged commit 4eccd39 into main Nov 13, 2023
5 of 6 checks passed

daanboer deleted the enhance-schema branch November 13, 2023 15:25

daanboer mentioned this pull request Nov 17, 2023

Enhance state schema #58

Closed

This was referenced Nov 29, 2023

Accurately treat species chemical composition #587

Open

Draft: Implement functionality for potential data #63

Open

Rework set edit form #589

Open
