Enhance state schema #58

Closed
daanboer opened this issue Mar 23, 2023 · 1 comment

daanboer commented Mar 23, 2023

TLDR

These changes introduce three main benefits:

  1. They make it much clearer at first sight which type of state (singular, compound, or unspecified) you are dealing with.
  2. The resulting state definitions are more concise.
  3. The arbitrary e, v, and J properties of unspecified states are eliminated.

Explanation

The current state schema roughly distinguishes between three different types of states: singular, compound, and unspecified. To allow the schema to express each of these types, it is set up in a generic way. For example, the electronic, vibrational, and rotational subcomponents are always presented as arrays, even when this is not necessary. I will present three examples from the test set to illustrate this.

Singular: $\mathrm{N2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0\left(J=24\right)\right)\right)$
```json
{
  "particle": "N2",
  "charge": 0,
  "type": "HomonuclearDiatom",
  "electronic": [
    {
      "e": "X",
      "Lambda": 0,
      "S": 0,
      "parity": "g",
      "reflection": "+",
      "vibrational": [
        {
          "v": 0,
          "rotational": [{ "J": 24 }]
        }
      ]
    }
  ]
}
```
Compound: $\mathrm{CO2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0,0,0|0,5,0|2,1,0|1,3,0|0,2,1|1,0,1\right)\right)$
```json
{
  "particle": "CO2",
  "charge": 0,
  "type": "LinearTriatomInversionCenter",
  "electronic": [
    {
      "e": "X",
      "Lambda": 0,
      "S": 0,
      "parity": "g",
      "reflection": "+",
      "vibrational": [
        { "v": [0, 0, 0] },
        { "v": [0, 5, 0] },
        { "v": [2, 1, 0] },
        { "v": [1, 3, 0] },
        { "v": [0, 2, 1] },
        { "v": [1, 0, 1] }
      ]
    }
  ]
}
```
Unspecified: $\mathrm{Ar}\left(*\right)$
```json
{
  "particle": "Ar",
  "charge": 0,
  "type": "AtomLS",
  "electronic": [
    {
      "e": "*"
    }
  ]
}
```

To clarify, I am talking about the electronic, vibrational, and rotational properties. Their values are always presented as an array, whereas this only really makes sense for compound states. For singular states, the square brackets can be omitted so that the only entry in the array is assigned directly to the relevant property. This is demonstrated in the following adjusted example:

Singular: $\mathrm{N2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0\left(J=24\right)\right)\right)$
```json
{
  "particle": "N2",
  "charge": 0,
  "type": "HomonuclearDiatom",
  "electronic": {
    "e": "X",
    "Lambda": 0,
    "S": 0,
    "parity": "g",
    "reflection": "+",
    "vibrational": {
      "v": 0,
      "rotational": { "J": 24 }
    }
  }
}
```
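
The step from the current layout to the adjusted one can be sketched as a small conversion function. This is only an illustration of the idea, not the project's migration code: the names `Json`, `LEVELS`, and `collapseSingletons` are hypothetical, and the sketch assumes that only the `electronic`, `vibrational`, and `rotational` keys hold level arrays.

```typescript
// Hypothetical helper: names and types are illustrative, not taken from the LXCat code base.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// The three levels whose values are always arrays in the current schema.
const LEVELS = ["electronic", "vibrational", "rotational"];

// Recursively replace single-entry level arrays by their only entry.
// Level arrays with two or more entries (compound states) stay arrays,
// and other arrays (such as a triatomic "v": [0, 0, 0]) are left as they are.
function collapseSingletons(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(collapseSingletons);
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const result: { [key: string]: Json } = {};
  for (const key of Object.keys(value)) {
    const child = value[key];
    if (LEVELS.indexOf(key) !== -1 && Array.isArray(child) && child.length === 1) {
      result[key] = collapseSingletons(child[0]);
    } else {
      result[key] = collapseSingletons(child);
    }
  }
  return result;
}

// Usage: the singular N2 state in the current (array-based) layout.
const oldN2: Json = {
  particle: "N2",
  charge: 0,
  type: "HomonuclearDiatom",
  electronic: [
    {
      e: "X",
      Lambda: 0,
      S: 0,
      parity: "g",
      reflection: "+",
      vibrational: [{ v: 0, rotational: [{ J: 24 }] }],
    },
  ],
};
console.log(JSON.stringify(collapseSingletons(oldN2), null, 2));
```

Because only single-entry arrays are collapsed, a compound state keeps its array at the level where it actually has several children.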

For the compound example, the electronic array can be collapsed, but the vibrational property is still represented by an array, as this state is a vibrational compound. One big question is whether we even want to allow multi-level compounds (that is, states that provide multiple children on multiple levels, e.g. electronic and vibrational). I currently see little advantage in allowing them, and supporting them can imply more work for surrounding tools (e.g. how would these states be stored and linked in the database?). Perhaps there are some edge cases that would require such states?

Compound: $\mathrm{CO2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0,0,0|0,5,0|2,1,0|1,3,0|0,2,1|1,0,1\right)\right)$
```json
{
  "particle": "CO2",
  "charge": 0,
  "type": "LinearTriatomInversionCenter",
  "electronic": {
    "e": "X",
    "Lambda": 0,
    "S": 0,
    "parity": "g",
    "reflection": "+",
    "vibrational": [
      { "v": [0, 0, 0] },
      { "v": [0, 5, 0] },
      { "v": [2, 1, 0] },
      { "v": [1, 3, 0] },
      { "v": [0, 2, 1] },
      { "v": [1, 0, 1] }
    ]
  }
}
```
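
If multi-level compounds were to be disallowed, a tool would need some way to detect them. The following is a rough sketch of such a check for states in the proposed layout. The names `compoundLevels`, `isObject`, and `ORDER` are hypothetical, and the sketch assumes that a level is compound exactly when its value is an array.

```typescript
// Hypothetical sketch: lists the levels at which a state (in the proposed
// layout) provides an array of children, i.e. the levels that are compound.
type Level = "electronic" | "vibrational" | "rotational";
const ORDER: Level[] = ["electronic", "vibrational", "rotational"];

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function compoundLevels(state: Record<string, unknown>): Level[] {
  const found: Level[] = [];
  // Nodes whose children are inspected next; starts with the state itself.
  let nodes: Record<string, unknown>[] = [state];
  for (const level of ORDER) {
    const next: Record<string, unknown>[] = [];
    let isCompound = false;
    for (const node of nodes) {
      const child = node[level];
      if (Array.isArray(child)) {
        isCompound = true;
        for (const entry of child) {
          if (isObject(entry)) next.push(entry);
        }
      } else if (isObject(child)) {
        next.push(child);
      }
      // Strings (unspecified descriptors) and missing levels end the descent.
    }
    if (isCompound) found.push(level);
    nodes = next;
  }
  return found;
}

// Usage: the CO2 state above is compound on the vibrational level only.
const co2 = {
  particle: "CO2",
  charge: 0,
  type: "LinearTriatomInversionCenter",
  electronic: {
    e: "X",
    vibrational: [{ v: [0, 0, 0] }, { v: [0, 5, 0] }],
  },
};
const levels = compoundLevels(co2);
console.log(levels, levels.length > 1 ? "multi-level" : "single-level");
```

A state would count as a multi-level compound when the returned list has more than one entry.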

The unspecified states can be simplified even further. Currently, an unspecified component is an object with a single property that stores the string definition. The key of this property differs per level: e for electronic, v for vibrational, and J for rotational. This was originally done so that, when the state definition is parsed, the summary and latex properties can be added to this object. However, these properties are superfluous for unspecified components, as they are always equal to the original string identifier. Therefore, the added indirection is unnecessary, and the string identifier can be assigned directly to the relevant property. This results in a much more concise state definition:

Unspecified: $\mathrm{Ar}\left(*\right)$
```json
{
  "particle": "Ar",
  "charge": 0,
  "type": "Unspecified",
  "electronic": "*"
}
```

where I use the Unspecified type as proposed in #57.
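
With these changes, the kind of a descriptor follows directly from the shape of its value: a string is unspecified, an object is singular, and an array is compound. A minimal sketch of that rule is given below; the names `DescriptorKind` and `descriptorKind` are hypothetical and not part of the schema package.

```typescript
// Hypothetical sketch of the "recognizable at first sight" rule in the proposal.
type DescriptorKind = "unspecified" | "singular" | "compound";

// A descriptor is the value of an `electronic`, `vibrational`, or `rotational` property.
function descriptorKind(descriptor: string | object | object[]): DescriptorKind {
  if (typeof descriptor === "string") {
    return "unspecified"; // e.g. "electronic": "*"
  }
  if (Array.isArray(descriptor)) {
    return "compound"; // e.g. "vibrational": [{ "v": [0, 0, 0] }, { "v": [0, 5, 0] }]
  }
  return "singular"; // e.g. "rotational": { "J": 24 }
}

console.log(descriptorKind("*"));
console.log(descriptorKind({ J: 24 }));
console.log(descriptorKind([{ v: [0, 0, 0] }, { v: [0, 5, 0] }]));
```

Under the current schema the same decision would require inspecting the array length and the contents of the e, v, or J property.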

daanboer added a commit that referenced this issue Jul 4, 2023
daanboer added a commit that referenced this issue Jul 16, 2023
daanboer added a commit that referenced this issue Jul 19, 2023
daanboer added a commit that referenced this issue Jul 21, 2023
daanboer added a commit that referenced this issue Nov 13, 2023
This pull request does a lot more than was originally planned. The core improvements include the switch to zod for parsing and validating, which is used in combination with zod-to-json-schema to generate the JSON schemas. The schema itself has also seen major changes. Most of these changes are aimed at generalizing the schemas to support other data types apart from LUT cross sections. In short, these changes allow for the effortless definition of new LTP process data types (potentials, rate coefficients, diff. cross sections, etc.), and for the ways in which they are presented (function parameter set, constant, expression, etc.), without the need for major infrastructural changes.

The new zod schemas allow for additional, custom validation logic through refine calls. This functionality is used to ensure that referenced state and reference ids are actually present in the provided dictionaries. Fixes #47.
The species schemas are polished and simplified. A new Unspecified class of states is introduced that removes the confusion around the definition of unspecified electronic states. Furthermore, singular and unspecified state descriptors no longer have to be supplied as an array, making their description and intention clearer. Finally, unspecified state descriptors are now simple strings instead of objects that store the identifier in the somewhat arbitrary e, v, and J properties (for electronic, vibrational, and rotational states respectively). Fixes #57, #58. Closes #26.
The @lxcat/database package has been reworked. All functionality is now encapsulated in an LXCatDatabase class. Previously, each encapsulated query call retrieved the global database singleton to run its query. This made the code very hard to test, as it was impossible to define multiple databases without spinning up a new instance of ArangoDB. This has now been solved, and test speedups of ~40x have been observed.
The @lxcat/schema and @lxcat/database packages now have a well-defined external API through the use of the exports field in their package.json files. Fixes #108.
Cleanup of the @lxcat/database package, some unused or deprecated code has been removed.
The user-facing @lxcat/app pages are updated to support the new schema structure.
Started development on a new data set edit form for authors. A utility is included that is able to dynamically generate form components based on a given JSON schema, which is used to generate the species forms.
A new RouteBuilder class is introduced that provides excellent TypeScript support and integrates well with the "new" Nextjs app router. Additionally, a compatible zod middleware is provided that allows for easy, elegant parsing of user-provided data.
Important to note is that merging the pull request in its current state will leave the @lxcat/app package in a state where it does not compile. This is a deliberate choice, as (clearly) this pull request did not stick to its original scope and is already way too large for comfort. The idea is to complete the remaining tasks in dedicated feature branches.
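
The key check mentioned above (referenced state and reference ids must be present in the provided dictionaries) can be sketched without any dependencies. In the pull request this logic lives in zod `refine` calls; the plain function below only illustrates the check itself. The document shape and the names `SketchDocument`, `hasKey`, and `findMissingKeys` are assumptions made for this example, not the actual LXCat types.

```typescript
// Hypothetical, simplified document shape. The field names below are
// assumptions for illustration; the real LXCat types are richer.
interface SketchDocument {
  states: Record<string, unknown>;
  references: Record<string, unknown>;
  processes: { states: string[]; references: string[] }[];
}

// True if `key` is an own property of `dictionary`.
function hasKey(dictionary: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(dictionary, key);
}

// Collect one message per id that is used by a process but not defined.
function findMissingKeys(doc: SketchDocument): string[] {
  const issues: string[] = [];
  doc.processes.forEach((process, index) => {
    for (const key of process.states) {
      if (!hasKey(doc.states, key)) {
        issues.push(`processes[${index}]: unknown state "${key}"`);
      }
    }
    for (const key of process.references) {
      if (!hasKey(doc.references, key)) {
        issues.push(`processes[${index}]: unknown reference "${key}"`);
      }
    }
  });
  return issues;
}

// Usage: "Ar*" is referenced by the process but missing from `states`.
const missing = findMissingKeys({
  states: { Ar: {} },
  references: { ref1: {} },
  processes: [{ states: ["Ar", "Ar*"], references: ["ref1"] }],
});
console.log(missing);
```

A predicate such as `(doc) => findMissingKeys(doc).length === 0` is the kind of function that could be handed to a `refine` call, so that the check runs as part of parsing.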

Squash commit log:

* Eliminate the usage of the XOR type

It is not necessary to use XOR as AnyAtom and AnyMolecule
are already discriminated unions.

* Apply `@internal` to internal types

This results in cleaner generated schemas, as `@internal` types
are not explicitly included in the schema `definitions`, but are
instead evaluated inline (which is beneficial as their
generated names are often nondescript and verbose).

* Annotate species union types with `@discriminator type`

The `@discriminator` annotation will generate more performant
`if-then-else` schemas for discriminated union types, instead
of the default `anyOf` schemas.

* Add `Constant` storage

In preparation for rate coefficients.

* Add `as` cast to fix type error

* Split the set header into a separate type

Annotate `InputDocument` with `@internal`.

* Rename `data` to `value` in `LUT` storage type

* Add `Expression` storage type

* Switch to more simple generators for species types

Partially implements #58

* Annotate more types with `@internal`

Add the `AnySpecies` type.

* Add `stateIsAtom` type guard function

Add `AnyParticle` to `AnySpecies`.

* Reimplement `insert_state_tree` to support the new schema changes

This function adds a state with all of its required parent states, and
the corresponding relations to the database.

* Update `State` database collection type

* Introduce `KeyedSpecies`

As the database type.
Move AnySpecies to its own module.

* Update CrossSection schema

* Update `State` schema

* Update state tests for database package

* Allow unspecified state descriptors as part of compound state

This is already required to support one of the state descriptions
in the test set, e.g. CO2{X, {0,n,0|n,0,0}}.

* Update `@lxcat/schema` test data

* Regenerate database state schema

* Update `@lxcat/schema` tests to comply with new schema version

* Add `zod` dependency to `@lxcat/schema`

* Add preliminary migrate script for schema

* Move `SimpleParticle` and `AnyParticle` to dedicated module

* Add restriction on `State` generic parameter

and add `@discriminator` tags to `State` types.

* Add missing `immer` dep for `@lxcat/database`

* Use `KeyedSpecies`

* Update state query generic constraints

* Remove unused `LutTable` component

* Cleanup minor (type) errors in `EditForm` component

* Added `Unspecified` class of states

This class of states accepts a string identifier as their
`electronic` value. This is useful for e.g. `He*` type
states.

* Fix build errors in `EditForm` component

FIXME: The edit form is in a broken state and needs to be
completely revamped.

* Fix build errors in `Chart`

* Update `State` collection schema

* Remove redundant example code

* Fix query generators for new schema

Remove use of `immer` in insert state procedure.

* Remove migrate file from version control

* Start new edit set page using app router

* Add preliminary `zod` version of schema

* Add `General` tab to new edit form

* Annotate new files

* Fix `LS1` and `J1L2` schema definitions

* Add script to generate `AnySpecies` schema from `zod` definitions

* Add `complete` flag to `SetHeader` definition

* Make species schemas more strict where appropriate

* Initial pass at automatic form generation from species schemas

Includes form factory functions that use the schema generated from
`AnySpecies` to build an input form for each of the state types.

The method is currently working fairly well, apart from one annoying error
concerning the `Select` component that functions as a switch between `Single`,
`Compound` and `Unspecified` state descriptors. The error is as follows:
*Warning: A component is changing a controlled input to be uncontrolled.
This is likely caused by the value changing from a defined to undefined,
which should not happen. Decide between using a controlled or uncontrolled
input element for the lifetime of the component.* The result is that the
selected option is not rendered, but the functionality of the component
is fine.

Proper handling of arrays should still be implemented.

* Add keys in `allOf` component and default value in `anyOf` component

* Equalize schema and app `zod` versions

* Add descriptions for `Singular`, `Compound`, and `Unspecified` options in schema

* Update new `EditForm` component

Use a native `select` component to specify the state component type.
Add new query to grab a data set by its id.
Add `NotFound` component.
Infer `LTPDocument` type.
Rename `scat-css-new` to `set`.

* Update copyright statement

* Add `drop_non_user` script

Drops all DB tables except for the user table.

* Fix warning in `Dialog`

Concerning lifetime of `ref.current`.

* Fix zod types for `CSLData` and `Reaction`

* Update zod schema

Add `parameters` and `threshold` properties for cross sections.
Allow unspecified level descriptors in compound layers.

* Update Zod types

Use method to generate generic typescript types from generic
zod schemas.
Use output types instead of input types.

* Start using Zod types

* `package.json` formatting

* Rework species parsing

Each zod component is now transformed to supply `summary` and
`latex` functions that respectively serialize the object to a
summarized short form and a latex form.

* Add `serialize` function to `State`

This function takes a `State` and generates its `StateSummary`.

* Add first state serializer tests

* Fix `@lxcat/schema` vitest configuration

* Add more state tests

* Add regression test for LTPMixture schema

* Start deprecation of old schema

* Fix further `@lxcat/schema` references

* Update JSON schema creation

* Fix build errors

* Fix state charge serialization

* Fix inspect page

For new schema.

* Add note about selection page crash

* Fix data selection page

* Fix serializer tests

* Continue fixing database tests

* Fix database state queries

Fix more database tests.

* Testing external serializer functions

Supplied through `Component` type.

* Use separate function to serialize `ShellEntry`

* Rename `LSTermImpl` -> `LSTermUncoupled`

* Add LS1 coupled component

* Push `SimpleParticle` down the tree

Add `makeComponent` function to help in definition of
state level components.

* Add helper types for new serialization strategy

These helper types allow for the simple construction of
both `serializable` and `non-serializable` versions of
components and atoms.
The atomic types have already been reworked to use these
helper types.

* Use `makeComponent` for molecular components

* Split molecular types into serializable and non-serializable

* `@lxcat/schema`: Emit ES6 with ESM imports

* Remove `State` types

`AnySpecies` is now used instead.
Start preparing `index` files for removal of `dist`
in external import statements.

* Use zod refine to check validity of `state` and `reference` keys

in LTPDocument.
Add corresponding tests.

* Perform key checks on `LTPMixture`

Add tests.

* Move `SetHeader` and `SelfReference` to dedicated modules.

* Encapsulate exports and use `nodenext` module resolution

Use `exports` field in `package.json`.

* Fix database build

Use new `@lxcat/schema` imports and types.
Fix `tsconfig` to work with new module resolution strategy.

* Update `module` setting in `schema` and `database` `tsconfig`

* Add additional fields to `CSLNameVariable`

These are not present in the standard CSL schemas, but may
be supplied by `citation-js`.

* Fix `@lxcat/app` build errors

* Update database cli setup script

* Fix data select/inspect/compute routes

These pages now all use the new schema.
Database cli scripts have also been fixed.

* Update state interface in edit form

Use an accordion with latex state descriptions
in the control.
Add `+` button that adds an empty species.
Add `Add from database` button that should allow
the user to pick an already existing species from
the database (TODO).

* Create `electronic` property upon type switch

For new state objects.

* Fix `byIdJSON` function

* Add new `RouteBuilder` class

This class can be used to build routes (as the name suggests). It utilizes the builder
pattern to build routes for the Nextjs app router using middlewares and route handlers,
while maintaining type safety.

* Add `async` api to `RouteBuilder`

* Add `hasSessionOrAPIToken` and `hasDeveloperOrDownloadRole` middlewares

* Remove `dom` lib from `schema` tsconfig

* Pin `zod` version to `3.21.3`

Previous version `3.22.4` caused OOM errors on transpilation.

* Add simple species routes

Provide generally useful API calls.
Will be used for edit form `pick from database` option.

* Pin `zod` version in `app`

* Add modal to pick species from database in set edit form

* Update flake lock file

* Add array component for species form generation

* Fix type errors involving document self referencing

* Allow for unspecified entries in rovibrational compounds

* Add species picker in set edit form

Allows users to pick existing species from the database.
Also adds the corresponding api endpoints:
 - /api/species
 - /api/species/children

* Update schema generation command

Command is now: `pnpm json:set`.
Add cli module that prints the generated `LTPMixture` schema.
Add `dom` lib dependency for printing.
Remove `ts-json-schema-generator` dependency.
Remove old schema types.

* Secure endpoints using middleware

* Add species API tests

* Fix `/scat-css/[id]` endpoint

* Fix many database tests

* Fix and restrict atom and molecule schemas

Compound species should have at least two entries.
Fix wrong label for unspecified entries in compound species.

* Fix reaction test

* Partially fix CS write tests

* Fix more database tests

The process `info` property is now always an array.

* Overhaul database package

Made code a lot more testable.
Fix broken database tests.
Tests now run in ~3s instead of ~120s.

* Fix database package build

* Fix annotate script

* Fix reuse compliance

* Tidy database package API

* Fix schema regression test

* Remove flake files from version control

* Fix `@lxcat/schema` and `@lxcat/app` REUSE compliance
@daanboer

Closed by #547
