Feature for Daffodil DFDL Data querying #2835

Open
mbeckerle opened this issue Oct 11, 2023 · 2 comments
Labels
doc-impacting (PRs that affect the documentation), enhancement (PRs that add a new functionality to Drill), new-format (New Format Plugin)

Comments

@mbeckerle
Contributor

Use DFDL language to describe data, and then enable Drill to query that data immediately by way of Apache Daffodil's DFDL implementation.

(Creating this so I have a ticket number to cite in commits/PRs)

cgivre added the enhancement, new-format, and doc-impacting labels Oct 11, 2023
@cgivre
Contributor

cgivre commented Oct 11, 2023

Very excited to see this! DFDL + Drill is really a great combo.

mbeckerle added a commit to mbeckerle/drill that referenced this issue Oct 11, 2023
POMs changed.
New format-daffodil module created (by copying and renaming the
format-xml module).

Maven seems to be happy dependency-wise, but we won't know until we
try to compile actual Java code using the Daffodil APIs.

apache#2835
mbeckerle added a commit to mbeckerle/drill that referenced this issue Oct 11, 2023
mbeckerle added a commit to mbeckerle/drill that referenced this issue Dec 22, 2023
Uses Daffodil 3.7.0-SNAPSHOT, which has the metadata support
we're using.

New format-daffodil module created

Still uses absolute paths for the schemaFileURI, which is
cheating: it wouldn't work in a true distributed Drill
environment.
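
To illustrate why an absolute schemaFileURI is fragile, here is a minimal sketch using only java.net.URI. The base locations, file names, and the resolveSchema helper are hypothetical, and this is not how the plugin resolves schemas. It only shows that a relative reference can follow each node's own base location, while an absolute file URI is tied to one machine's layout.

```java
import java.net.URI;

// Illustrative only: the base locations and file names below are hypothetical.
class SchemaUriSketch {

    // Resolve a schema reference against a base location.
    // An absolute reference is returned as-is; a relative one is
    // interpreted relative to the base.
    static URI resolveSchema(URI base, String schemaRef) {
        return base.resolve(schemaRef);
    }

    public static void main(String[] args) {
        URI nodeA = URI.create("file:///opt/drill-a/schemas/");
        URI nodeB = URI.create("file:///srv/drill-b/schemas/");

        // A relative reference follows each node's own base location.
        System.out.println(resolveSchema(nodeA, "example.dfdl.xsd").getPath());
        System.out.println(resolveSchema(nodeB, "example.dfdl.xsd").getPath());

        // An absolute reference ignores the base, so every node looks in the
        // same place, which may exist only on the machine it was written for.
        System.out.println(
            resolveSchema(nodeB, "file:///home/dev/example.dfdl.xsd").getPath());
    }
}
```

Whatever mechanism Drill ends up using to distribute schemas, the same distinction would apply to include/import references inside the schema files themselves.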

We have yet to work out how Drill can provide access to
DFDL schemas in XML form so that their include/import
references can be resolved.

The input data stream is, however, being accessed in the
proper Drill manner. Gunzip happened automatically. Nice.

Note: Fix boxed Boolean vs. boolean problem. Don't use
boxed primitives in Format config objects.
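
The commit does not spell out what the Boolean vs. boolean problem was. One plausible hazard, shown here as an assumption, is that a boxed field can be null when a property is absent, so auto-unboxing throws at use time. The class and field names below are hypothetical and are not the plugin's actual config classes.

```java
import java.util.Objects;

// Hypothetical config with a boxed field: null is a legal value.
class BoxedConfig {
    final Boolean validationMode;

    BoxedConfig(Boolean validationMode) {
        this.validationMode = validationMode;
    }
}

// Hypothetical config with a primitive field: always true or false.
class PrimitiveConfig {
    final boolean validationMode;

    PrimitiveConfig(boolean validationMode) {
        this.validationMode = validationMode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrimitiveConfig)) return false;
        return validationMode == ((PrimitiveConfig) o).validationMode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(validationMode);
    }
}

class BoxedVsPrimitiveSketch {

    // Returns true if reading the boxed field as a primitive throws.
    static boolean unboxFails(BoxedConfig c) {
        try {
            boolean unused = c.validationMode; // auto-unboxing happens here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxFails(new BoxedConfig(null)));
        System.out.println(unboxFails(new BoxedConfig(Boolean.TRUE)));
        System.out.println(new PrimitiveConfig(true).equals(new PrimitiveConfig(true)));
    }
}
```

With a primitive field there is no third "unset" state to handle, which keeps equality and hashing of the config object straightforward.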

Tests show this works for data as complex as nested
repeating sub-records.

These DFDL types are supported:

- int
- long
- short
- byte
- boolean
- double
- float (does not work; see bug DAFFODIL-2367)
- hexBinary
- string
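
As a rough sketch of what a metadata bridge has to decide for each of the types listed above, the lookup below pairs each DFDL simple type name with a Drill minor type name. The pairing is an assumption based on matching widths and kinds, the names are plain strings rather than Drill's own enum, and the class is hypothetical. The mapping in the actual plugin may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table; not the plugin's actual code.
class DfdlToDrillTypeSketch {

    // DFDL simple type name -> assumed Drill minor type name.
    static final Map<String, String> DFDL_TO_DRILL = new LinkedHashMap<>();

    static {
        DFDL_TO_DRILL.put("int", "INT");
        DFDL_TO_DRILL.put("long", "BIGINT");
        DFDL_TO_DRILL.put("short", "SMALLINT");
        DFDL_TO_DRILL.put("byte", "TINYINT");
        DFDL_TO_DRILL.put("boolean", "BIT");
        DFDL_TO_DRILL.put("double", "FLOAT8");
        // Listed in the commit as not working yet (DAFFODIL-2367).
        DFDL_TO_DRILL.put("float", "FLOAT4");
        DFDL_TO_DRILL.put("hexBinary", "VARBINARY");
        DFDL_TO_DRILL.put("string", "VARCHAR");
    }

    // Look up the assumed Drill type, rejecting anything not in the table.
    static String drillTypeFor(String dfdlType) {
        String drillType = DFDL_TO_DRILL.get(dfdlType);
        if (drillType == null) {
            throw new IllegalArgumentException("Unsupported DFDL type: " + dfdlType);
        }
        return drillType;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : DFDL_TO_DRILL.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Failing loudly on an unlisted type makes gaps in type coverage visible at schema-conversion time instead of at query time.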

apache#2835
mbeckerle added a commit to mbeckerle/drill that referenced this issue Dec 22, 2023
@mbeckerle
Contributor Author

PR updated: #2836

  • Metadata bridge working.
  • Data bridge working.
  • Simple types still to be done: unsigned integers, decimal, date/time/dateTime.

Not done: distribution of schemas (or compiled schemas) across Drill's computation fabric.

Still needed: testing of nillability, and real queries against realistic DFDL schemas.

mbeckerle added a commit to mbeckerle/drill that referenced this issue Apr 27, 2024
Requires Daffodil version 3.7.0 or higher.

New format-daffodil module created

Still uses absolute paths for the schemaFileURI, which is
cheating: it wouldn't work in a true distributed Drill
environment.

We have yet to work out how Drill can provide access to
DFDL schemas in XML form so that their include/import
references can be resolved.

The input data stream is, however, being accessed in the
proper Drill manner. Gunzip happened automatically. Nice.

Note: Fix boxed Boolean vs. boolean problem. Don't use
boxed primitives in Format config objects.

Tests show this works for data as complex as nested
repeating sub-records.

These DFDL types are supported:

- int
- long
- short
- byte
- boolean
- double
- float (does not work; see bug DAFFODIL-2367)
- hexBinary
- string

apache#2835