Enhance state schema #58

daanboer · 2023-03-23T16:17:07Z

TLDR

These changes introduce three main benefits:

They make it much clearer at first sight which type of state (singular, compound, unspecified) you are dealing with.
The resulting state definitions are more concise.
The arbitrary e, v, and J properties of unspecified states are eliminated.

Explanation

The current state schema roughly distinguishes between three types of states: singular, compound, and unspecified. To let the schema express each of these types, it is set up in a generic way. For example, electronic, vibrational, and rotational subcomponents are always presented as an array, even though this is not always necessary. I will present three examples from the test set to illustrate this.

Singular: $\mathrm{N2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0\left(J=24\right)\right)\right)$

{
  "particle": "N2",
  "charge": 0,
  "type": "HomonuclearDiatom",
  "electronic": [
    {
      "e": "X",
      "Lambda": 0,
      "S": 0,
      "parity": "g",
      "reflection": "+",
      "vibrational": [
        {
          "v": 0,
          "rotational": [{ "J": 24 }]
        }
      ]
    }
  ]
}

Compound: $\mathrm{CO2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0,0,0|0,5,0|2,1,0|1,3,0|0,2,1|1,0,1\right)\right)$

{
  "particle": "CO2",
  "charge": 0,
  "type": "LinearTriatomInversionCenter",
  "electronic": [
    {
      "e": "X",
      "Lambda": 0,
      "S": 0,
      "parity": "g",
      "reflection": "+",
      "vibrational": [
        { "v": [0, 0, 0] },
        { "v": [0, 5, 0] },
        { "v": [2, 1, 0] },
        { "v": [1, 3, 0] },
        { "v": [0, 2, 1] },
        { "v": [1, 0, 1] }
      ]
    }
  ]
}

Unspecified: $\mathrm{Ar}\left(*\right)$

{
  "particle": "Ar",
  "charge": 0,
  "type": "AtomLS",
  "electronic": [
    {
      "e": "*"
    }
  ]
}

To clarify, I am talking about the electronic, vibrational, and rotational properties. Their values are always presented as an array, whereas this only really makes sense for compound states. For singular states the square brackets can be omitted, so that the only entry in the array is assigned directly to the relevant property. This is demonstrated in the following adjusted example:

Singular: $\mathrm{N2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0\left(J=24\right)\right)\right)$

{
  "particle": "N2",
  "charge": 0,
  "type": "HomonuclearDiatom",
  "electronic": {
    "e": "X",
    "Lambda": 0,
    "S": 0,
    "parity": "g",
    "reflection": "+",
    "vibrational": {
      "v": 0,
      "rotational": { "J": 24 }
    }
  }
}

For the compound example, the electronic array can be collapsed, but the vibrational property is still represented by an array, as this state is a vibrational compound. One big question is whether we even want to allow multi-level compounds (that is, states that provide multiple children on multiple levels, e.g. electronic and vibrational). I currently see little advantage in allowing them, and supporting them can mean more work for surrounding tools (e.g. how would these states be stored and linked in the database?). Although perhaps there are some edge cases that would require such states?

Compound: $\mathrm{CO2}\left(\mathrm{X}^{1}\Sigma_\mathrm{g}^+\left(v=0,0,0|0,5,0|2,1,0|1,3,0|0,2,1|1,0,1\right)\right)$

{
  "particle": "CO2",
  "charge": 0,
  "type": "LinearTriatomInversionCenter",
  "electronic": {
    "e": "X",
    "Lambda": 0,
    "S": 0,
    "parity": "g",
    "reflection": "+",
    "vibrational": [
      { "v": [0, 0, 0] },
      { "v": [0, 5, 0] },
      { "v": [2, 1, 0] },
      { "v": [1, 3, 0] },
      { "v": [0, 2, 1] },
      { "v": [1, 0, 1] }
    ]
  }
}
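The collapse from the current array form to the proposed form can be sketched as a small TypeScript helper. This is a minimal sketch and not the project's migration code: `OneOrMany` and `collapse` are hypothetical names, and it assumes that an array with exactly one entry always denotes a singular component, while longer arrays denote a compound.

```typescript
// Hypothetical helper types and functions; not taken from @lxcat/schema.
type OneOrMany<T> = T | T[];

// Collapse a one-entry array to its only entry.
// Arrays of any other length are returned unchanged (compound components).
function collapse<T>(entries: T[]): OneOrMany<T> {
  const [first, ...rest] = entries;
  return first !== undefined && rest.length === 0 ? first : entries;
}

// Vibrational part of the singular N2 example in its current (array) form.
const oldVibrational = [{ v: 0, rotational: [{ J: 24 }] }];

// Collapse the rotational level first, then the vibrational level.
const newVibrational = collapse(
  oldVibrational.map((vib) => ({ ...vib, rotational: collapse(vib.rotational) })),
);

console.log(JSON.stringify(newVibrational));
// → {"v":0,"rotational":{"J":24}}
```

Applied to the six-entry vibrational array of the CO2 example, `collapse` returns the array as-is, which matches the compound form shown above.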

The unspecified states can be simplified even further. Currently, an unspecified component is an object with a single property that stores the string definition. The key of this property differs per level: e for electronic, v for vibrational, and J for rotational. This was originally done so that the summary and latex properties could be added to this object when the state definition is parsed. However, these properties are superfluous for unspecified components, as they are always equal to the original string identifier. Therefore, the added indirection is unnecessary, and the string identifier can be assigned directly to the relevant property. This results in a much more concise state definition:

Unspecified: $\mathrm{Ar}\left(*\right)$

{
  "particle": "Ar",
  "charge": 0,
  "type": "Unspecified",
  "electronic": "*"
}

where I use the Unspecified type as proposed in #57.
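With the proposed layout, the three kinds of component can be told apart from the JSON shape alone. The following is a minimal sketch of that idea, not code from the schema package: `ComponentKind` and `classify` are hypothetical names, and it assumes that a string is always an unspecified component, an array a compound, and a plain object a singular component.

```typescript
// Hypothetical names; a sketch of shape-based classification only.
type ComponentKind = "singular" | "compound" | "unspecified";

function classify(component: unknown): ComponentKind {
  // Unspecified components are plain string identifiers, e.g. "*".
  if (typeof component === "string") return "unspecified";
  // Compound components list their children in an array.
  if (Array.isArray(component)) return "compound";
  // Singular components are a single object.
  if (typeof component === "object" && component !== null) return "singular";
  throw new TypeError("Value is not a state component");
}

console.log(classify("*")); // → unspecified
console.log(classify({ v: 0, rotational: { J: 24 } })); // → singular
console.log(classify([{ v: [0, 0, 0] }, { v: [0, 5, 0] }])); // → compound
```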


Partially implements #58


This pull request does a lot more than was originally planned. The core improvements include the switch to zod for parsing and validating, which is used in combination with zod-to-json-schema to generate the JSON schemas. The schema itself has also seen major changes. Most of these changes are aimed at generalizing the schemas to support other data types apart from LUT cross sections. In short, these changes allow for the effortless definition of new LTP process data types (potentials, rate coefficients, diff. cross sections, etc.), and for the ways in which they are presented (function parameter set, constant, expression, etc.), without the need for major infrastructural changes.

The new zod schemas allow for additional, custom validation logic through refine calls. This functionality is used to ensure that referenced state and reference ids are actually present in the provided dictionaries. Fixes #47.

The species schemas are polished and simplified. A new Unspecified class of states is introduced that removes the confusion around the definition of unspecified electronic states (e.g. ). Furthermore, singular and unspecified state descriptors no longer have to be supplied as an array, making their description and intention clearer. Finally, unspecified state descriptors are now simple strings instead of objects that store the identifier in the somewhat arbitrary e, v, and J properties (for electronic, vibrational, and rotational states respectively). Fixes #57, #58. Closes #26.

The @lxcat/database package has been reworked. All functionality is now encapsulated in an LXCatDatabase class. Previously, each encapsulated query call retrieved the global database singleton to run its query. This made the code very hard to test, as it was impossible to define multiple databases without spinning up a new instance of ArangoDB. This has now been solved, and test speedups of ~40x have been observed.

The @lxcat/schema and @lxcat/database packages now have a well-defined external API through the use of the exports field in their package.json files. Fixes #108.

Cleanup of the @lxcat/database package: some unused or deprecated code has been removed.

The user-facing @lxcat/app pages are updated to support the new schema structure.

Started development on a new data set edit form for authors. A utility is included that is able to dynamically generate form components based on a given JSON schema, which is used to generate the species forms.

A new RouteBuilder class is introduced that provides excellent TypeScript support and integrates well with the "new" Nextjs app router. Additionally, a compatible zod middleware is provided that allows for easy, elegant parsing of user-provided data.

Important to note is that merging the pull request in its current state will leave the @lxcat/app package in a state where it does not compile. This is a deliberate choice, as (clearly) this pull request did not stick to its original scope and is already way too large for comfort. The idea is to complete the remaining tasks in dedicated feature branches.

Squash commit log:

* Eliminate the usage of the XOR type
  It is not necessary to use XOR as AnyAtom and AnyMolecule are already discriminated unions.
* Apply `@internal` to internal types
  This results in cleaner generated schemas, as `@internal` types are not explicitly included in the schema `definitions`, but are instead evaluated inline (which is beneficial as their generated names are often nondescript and verbose).
* Annotate species union types with `@discriminator type`
  The `@discriminator` annotation will generate more performant `if-then-else` schemas for discriminated union types, instead of the default `anyOf` schemas.
* Add `Constant` storage
  In preparation for rate coefficients.
* Add `as` cast to fix type error
* Split the set header into a separate type
  Annotate `InputDocument` with `@internal`.
* Rename `data` to `value` in `LUT` storage type
* Add `Expression` storage type
* Switch to more simple generators for species types
  Partially implements #58
* Annotate more types with `@internal`
  Add the `AnySpecies` type.
* Add `stateIsAtom` type guard function
  Add `AnyParticle` to `AnySpecies`.
* Reimplement `insert_state_tree` to support the new schema changes
  This function adds a state with all of its required parent states, and the corresponding relations to the database.
* Update `State` database collection type
* Introduce `KeyedSpecies`
  As the database type. Move AnySpecies to its own module.
* Update CrossSection schema
* Update `State` schema
* Update state tests for database package
* Allow unspecified state descriptors as part of compound state
  This is already required to support one of the state descriptions in the test set, e.g. CO2{X, {0,n,0|n,0,0}}.
* Update `@lxcat/schema` test data
* Regenerate database state schema
* Update `@lxcat/schema` tests to comply with new schema version
* Add `zod` dependency to `@lxcat/schema`
* Add preliminary migrate script for schema
* Move `SimpleParticle` and `AnyParticle` to dedicated module
* Add restriction on `State` generic parameter and add `@discriminator` tags to `State` types.
* Add missing `immer` dep for `@lxcat/database`
* Use `KeyedSpecies`
* Update state query generic constraints
* Remove unused `LutTable` component
* Cleanup minor (type) errors in `EditForm` component
* Added `Unspecified` class of states
  This class of states accepts a string identifier as their `electronic` value. This is useful for e.g. `He*` type states.
* Fix build errors in `EditForm` component
  FIXME: The edit form is in a broken state and needs to be completely revamped.
* Fix build errors in `Chart`
* Update `State` collection schema
* Remove redundant example code
* Fix query generators for new schema
  Remove use of `immer` in insert state procedure.
* Remove migrate file from version control
* Start new edit set page using app router
* Add preliminary `zod` version of schema
* Add `General` tab to new edit form
* Annotate new files
* Fix `LS1` and `J1L2` schema definitions
* Add script to generate `AnySpecies` schema from `zod` definitions
* Add `complete` flag to `SetHeader` definition
* Make species schemas more strict where appropriate
* Initial pass at automatic form generation from species schemas
  Includes form factory functions that use the schema generated from `AnySpecies` to build an input form for each of the state types. The method is currently working fairly well, apart from one annoying error concerning the `Select` component that functions as a switch between `Single`, `Compound` and `Unspecified` state descriptors. The error is as follows: *Warning: A component is changing a controlled input to be uncontrolled. This is likely caused by the value changing from a defined to undefined, which should not happen. Decide between using a controlled or uncontrolled input element for the lifetime of the component.* The result is that the selected option is not rendered, but the functionality of the component is fine. Proper handling of arrays should still be implemented.
* Add keys in `allOf` component and default value in `anyOf` component
* Equalize schema and app `zod` versions
* Add descriptions for `Singular`, `Compound`, and `Unspecified` options in schema
* Update new `EditForm` component
  Use a native `select` component to specify the state component type. Add new query to grab a data set by its id. Add `NotFound` component. Infer `LTPDocument` type. Rename `scat-css-new` to `set`.
* Update copyright statement
* Add `drop_non_user` script
  Drops all DB tables except for the user table.
* Fix warning in `Dialog`
  Concerning lifetime of `ref.current`.
* Fix zod types for `CSLData` and `Reaction`
* Update zod schema
  Add `parameters` and `threshold` properties for cross sections. Allow unspecified level descriptors in compound layers.
* Update Zod types
  Use method to generate generic typescript types from generic zod schemas. Use output types instead of input types.
* Start using Zod types
* `package.json` formatting
* Rework species parsing
  Each zod component is now transformed to supply `summary` and `latex` functions that respectively serialize the object to a summarized short form and a latex form.
* Add `serialize` function to `State`
  This function takes a `State` and generates its `StateSummary`.
* Add first state serializer tests
* Fix `@lxcat/schema` vitest configuration
* Add more state tests
* Add regression test for LTPMixture schema
* Start deprecation of old schema
* Fix further `@lxcat/schema` references
* Update JSON schema creation
* Fix build errors
* Fix state charge serialization
* Fix inspect page
  For new schema.
* Add note about selection page crash
* Fix data selection page
* Fix serializer tests
* Continue fixing database tests
* Fix database state queries
  Fix more database tests.
* Testing external serializer functions
  Supplied through `Component` type.
* Use separate function to serialize `ShellEntry`
* Rename `LSTermImpl` -> `LSTermUncoupled`
* Add LS1 coupled component
* Push `SimpleParticle` down the tree
  Add `makeComponent` function to help in definition of state level components.
* Add helper types for new serialization strategy
  These helper types allow for the simple construction of both `serializable` and `non-serializable` versions of components and atoms. The atomic types have already been reworked to use these helper types.
* Use `makeComponent` for molecular components
* Split molecular types into serializable and non-serializable
* `@lxcat/schema`: Emit ES6 with ESM imports
* Remove `State` types
  `AnySpecies` is now used instead. Start preparing `index` files for removal of `dist` in external import statements.
* Use zod refine to check validity of `state` and `reference` keys in LTPDocument.
  Add corresponding tests.
* Perform key checks on `LTPMixture`
  Add tests.
* Move `SetHeader` and `SelfReference` to dedicated modules.
* Encapsulate exports and use `nodenext` module resolution
  Use `exports` field in `package.json`.
* Fix database build
  Use new `@lxcat/schema` imports and types. Fix `tsconfig` to work with new module resolution strategy.
* Update `module` setting in `schema` and `database` `tsconfig`
* Add additional fields to `CSLNameVariable`
  These are not present in the standard CSL schemas, but may be supplied by `citation-js`.
* Fix `@lxcat/app` build errors
* Update database cli setup script
* Fix data select/inspect/compute routes
  These pages now all use the new schema. Database cli scripts have also been fixed.
* Update state interface in edit form
  Use an accordion with latex state descriptions in the control. Add `+` button that adds an empty species. Add `Add from database` button that should allow the user to pick an already existing species from the database (TODO).
* Create `electronic` property upon type switch
  For new state objects.
* Fix `byIdJSON` function
* Add new `RouteBuilder` class
  This class can be used to build routes (as the name suggests). It utilizes the builder pattern to build routes for the Nextjs app router using middlewares and route handlers, while maintaining type safety.
* Add `async` api to `RouteBuilder`
* Add `hasSessionOrAPIToken` and `hasDeveloperOrDownloadRole` middlewares
* Remove `dom` lib from `schema` tsconfig
* Pin `zod` version to `3.21.3`
  Previous version `3.22.4` caused OOM errors on transpilation.
* Add simple species routes
  Provide generally useful API calls. Will be used for edit form `pick from database` option.
* Pin `zod` version in `app`
* Add model to pick species from database in set edit form
* Update flake lock file
* Add array component for species form generation
* Fix type errors involving document self referencing
* Allow for unspecified entries in rovibrational compounds
* Add species picker in set edit form
  Allows users to pick existing species from the database. Also adds the corresponding api endpoints:
  - /api/species
  - /api/species/children
* Update schema generation command
  Command is now: `pnpm json:set`. Add cli module that prints the generated `LTPMixture` schema. Add `dom` lib dependency for printing. Remove `ts-json-schema-generator` dependency. Remove old schema types.
* Secure endpoints using middleware
* Add species API tests
* Fix `/scat-css/[id]` endpoint
* Fix many database tests
* Fix and restrict atom and molecule schemas
  Compound species should have at least two entries. Fix wrong label for unspecified entries in compound species.
* Fix reaction test
* Partially fix CS write tests
* Fix more database tests
  The process `info` property is now always an array.
* Overhaul database package
  Made code a lot more testable. Fix broken database tests. Tests now run in ~3s instead of ~120.
* Fix database package build
* Fix annotate script
* Fix reuse compliance
* Tidy database package API
* Fix schema regression test
* Remove flake files from version control
* Fix `@lxcat/schema` and `@lxcat/app` REUSE compliance
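
The pull request text above mentions a check that referenced state and reference ids are present in the provided dictionaries. The logic of such a check can be sketched in plain TypeScript. This is only an illustration under stated assumptions: it does not use zod, the `SketchDocument` shape with its `states`, `references`, and `processes` fields is a simplified, hypothetical stand-in for the real document type, and `findMissingKeys` is a made-up name.

```typescript
// Hypothetical, heavily simplified document shape (assumption, not the real schema).
interface SketchDocument {
  states: Record<string, unknown>;
  references: Record<string, unknown>;
  processes: Array<{ states: string[]; references: string[] }>;
}

// True if `key` is an own property of the dictionary (ignores inherited keys).
function hasKey(dictionary: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(dictionary, key);
}

// Collect a message for every id that a process uses but no dictionary defines.
function findMissingKeys(doc: SketchDocument): string[] {
  const issues: string[] = [];
  doc.processes.forEach((process, index) => {
    for (const state of process.states) {
      if (!hasKey(doc.states, state)) {
        issues.push(`processes[${index}]: unknown state id "${state}"`);
      }
    }
    for (const reference of process.references) {
      if (!hasKey(doc.references, reference)) {
        issues.push(`processes[${index}]: unknown reference id "${reference}"`);
      }
    }
  });
  return issues;
}

const issues = findMissingKeys({
  states: { Ar: {} },
  references: { ref1: {} },
  processes: [{ states: ["Ar", "Ar*"], references: ["ref1", "ref2"] }],
});
console.log(issues.length); // → 2
```

In a schema library with custom validation hooks, a function like this would typically run after the structural checks pass, so that every reported issue points at a well-formed but dangling id.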

daanboer · 2023-11-17T07:40:41Z

Closed by #547

daanboer added a commit that referenced this issue Jul 4, 2023

Switch to more simple generators for species types

c89d005

Partially implements #58

daanboer added a commit that referenced this issue Jul 16, 2023

Switch to more simple generators for species types

ea8d8ff

Partially implements #58

daanboer added a commit that referenced this issue Jul 19, 2023

Switch to more simple generators for species types

c16d6c5

Partially implements #58

daanboer added a commit that referenced this issue Jul 21, 2023

Switch to more simple generators for species types

8b1d931

Partially implements #58

daanboer mentioned this issue Nov 13, 2023

Generalize schemas #547

Merged

daanboer closed this as completed Nov 17, 2023
