WIP: Dataset Sidecar / Config prototype #1376

Draft: wants to merge 89 commits into base: master


matbryan52
Member

Adds a prototype configuration file format / parser / validator for LiberTEM objects. Responds to the needs in #768 (and others referenced in that issue).

The functionality is split into two (loosely coupled) parts:

  1. tree.py which handles parsing configuration files/strings (in TOML, JSON etc) into a tree structure, and implements some features like searching for keys in the tree and extracting sub-trees.
  2. models.py which implements a set of Pydantic models for input data validation / casting (https://docs.pydantic.dev/).
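
As a rough illustration of the first part, parsing a config string into a nested structure, searching it for a key and extracting a sub-tree could look like the sketch below. This is not the API of tree.py: `find_key`, `subtree` and the example config contents are hypothetical names chosen for this sketch, and JSON is used only because it needs nothing beyond the standard library.

```python
import json
from typing import Any, Iterator, Tuple


def find_key(tree: Any, key: str, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) for every occurrence of `key` in nested dicts/lists."""
    if isinstance(tree, dict):
        for name, value in tree.items():
            if name == key:
                yield path + (name,), value
            # Keep descending, the key may also appear deeper in the tree
            yield from find_key(value, key, path + (name,))
    elif isinstance(tree, list):
        for index, value in enumerate(tree):
            yield from find_key(value, key, path + (index,))


def subtree(tree: Any, path: Tuple) -> Any:
    """Follow `path` (a tuple of dict keys / list indices) and return that node."""
    node = tree
    for step in path:
        node = node[step]
    return node


# Hypothetical config contents, for illustration only
config_text = """
{
  "my_dataset": {
    "config_type": "dataset",
    "ds_format": "raw",
    "path": "./data.raw",
    "nav_shape": [16, 16]
  },
  "other": {"path": "./other.bin"}
}
"""

tree = json.loads(config_text)
matches = list(find_key(tree, "path"))
dataset = subtree(tree, ("my_dataset",))
```

Returning the path alongside each match means a consumer can tell which section of the file a value came from, which is useful when the same key appears in several sub-trees.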

There is a document describing the features in more detail in schema_description.md.

The advantage of Pydantic is that complex input validation and default values are easy to express: it uses Python type annotations to coerce and validate input data, and extra or more complex validators can be added as classmethods on the model. I originally tried to do this with jsonschema, but the schema definitions and custom validation functions accumulated a lot of complexity. Another nice feature is that the validated data ends up in a TypedDict-like structure with easy dot-access for consumers, which can also provide type hints to an IDE under certain circumstances.
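
A minimal sketch of these points, assuming Pydantic is installed: `SketchDatasetConfig` and `is_valid` are hypothetical names that are not part of this PR. The `validator` decorator is the Pydantic v1-style API; Pydantic v2 still accepts it but deprecates it in favour of `field_validator`.

```python
from typing import List, Optional

from pydantic import BaseModel, PositiveInt, ValidationError, validator


class SketchDatasetConfig(BaseModel):
    # Defaults are declared next to the type annotation
    ds_format: Optional[str] = 'auto'
    # Each item must be a positive integer; compatible inputs are coerced
    nav_shape: List[PositiveInt]
    sync_offset: int = 0

    # An extra validator, written as a classmethod-style function on the model
    @validator('nav_shape')
    def nav_shape_not_empty(cls, value):
        if len(value) == 0:
            raise ValueError('nav_shape needs at least one dimension')
        return value


def is_valid(**kwargs) -> bool:
    """Return True if the keyword arguments pass validation."""
    try:
        SketchDatasetConfig(**kwargs)
    except ValidationError:
        return False
    return True


# The string '16' is coerced to the integer 16 during validation
cfg = SketchDatasetConfig(nav_shape=['16', 16])
print(cfg.nav_shape)   # dot-access on the validated data
print(cfg.ds_format)   # default value filled in
```

With this model, `is_valid(nav_shape=[0, 4])` and `is_valid(nav_shape=[])` are both rejected: the first by the `PositiveInt` annotation, the second by the extra validator.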

The data models are sub-classable to make more specific schemas, for example:

import pathlib
from typing import Literal, Optional, Union

from pydantic import PositiveInt, conlist

# WithRootModel, FileConfig and DType are defined elsewhere in this PR


class StandardDatasetConfig(WithRootModel):
    """
    Config for 'standard' LiberTEM dataset arguments
    """
    config_type: Optional[Literal['dataset']] = 'dataset'
    # ds_format could be an Optional[ds_enum] of valid dataset types to restrict possible values
    ds_format: Optional[str] = 'auto'
    path: Union[FileConfig, pathlib.Path, str]
    # conlist and PositiveInt automatically enforce constraints on the input data if present
    nav_shape: Optional[conlist(PositiveInt, min_items=1)] = None
    sig_shape: Optional[conlist(PositiveInt, min_items=1)] = None
    sync_offset: Optional[int] = 0


class RawDataSetConfig(StandardDatasetConfig):
    ds_format: Optional[Literal['raw']] = 'raw'
    nav_shape: conlist(PositiveInt, min_items=1)
    sig_shape: conlist(PositiveInt, min_items=1)
    dtype: DType

to express the fact that RawDataSet requires nav_shape and sig_shape, and additionally requires the dtype argument.
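
The effect of this kind of subclassing can be checked with a self-contained sketch. It assumes Pydantic is installed and uses simplified stand-in models (`BaseSketchConfig`, `RawSketchConfig`) rather than the classes from models.py, with plain `str` in place of `FileConfig` and `DType`.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, PositiveInt, ValidationError


class BaseSketchConfig(BaseModel):
    ds_format: Optional[str] = 'auto'
    path: str
    # Optional in the general case
    nav_shape: Optional[List[PositiveInt]] = None


class RawSketchConfig(BaseSketchConfig):
    # Narrow the allowed format and change its default
    ds_format: Optional[Literal['raw']] = 'raw'
    # Re-declaring the field without a default makes it required
    nav_shape: List[PositiveInt]
    # Extra required field that only the raw format needs
    dtype: str


# The base model accepts a config without nav_shape
base_cfg = BaseSketchConfig(path='data.bin')

# The subclass rejects the same input because nav_shape and dtype are missing
try:
    RawSketchConfig(path='data.raw')
    raw_rejected = False
except ValidationError:
    raw_rejected = True

raw_cfg = RawSketchConfig(path='data.raw', nav_shape=[16, 16], dtype='float32')
```

The subclass only states what differs from the base model, so the shared fields and their defaults stay in one place.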

None of the functionality is integrated with any of the LiberTEM classes yet, as we would have to modify some signatures (e.g. make RawDataSet accept a single path argument without dtype, nav_shape, sig_shape etc., so that these can be filled from the config information). An example of how this could be done is in example.py. Eventually we could use the data model itself to handle most input argument validation for a given class, even when not using a config file.
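
One possible shape for such a signature change is sketched below. It is not the content of example.py: `resolve_raw_args` is a hypothetical helper, and the rule that explicitly passed arguments take precedence over config values is an assumption made for this sketch.

```python
from typing import Any, Dict, Optional, Sequence


def resolve_raw_args(
    path: str,
    config: Dict[str, Any],
    dtype: Optional[str] = None,
    nav_shape: Optional[Sequence[int]] = None,
    sig_shape: Optional[Sequence[int]] = None,
) -> Dict[str, Any]:
    """Fill any argument the caller left as None from the config mapping."""
    explicit = {'dtype': dtype, 'nav_shape': nav_shape, 'sig_shape': sig_shape}
    resolved: Dict[str, Any] = {'path': path}
    for name, value in explicit.items():
        if value is None:
            # Fall back to the sidecar / config information
            value = config.get(name)
        if value is None:
            raise ValueError(f'{name} was not passed and is not in the config')
        resolved[name] = value
    return resolved


# Hypothetical config contents, e.g. as loaded from a sidecar file
sidecar = {'dtype': 'float32', 'nav_shape': [16, 16], 'sig_shape': [128, 128]}

# Only the path is given, everything else comes from the config
from_config = resolve_raw_args('data.raw', sidecar)

# An explicit argument overrides the config value
overridden = resolve_raw_args('data.raw', sidecar, dtype='uint16')
```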

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed
  • No import of GPL code from MIT code

@matbryan52 matbryan52 linked an issue Jan 11, 2023 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Parameter sidecar files