Beam SQL Extension raised an error when the input row contained iterable fields #31117

yyfhust · 2024-04-26T14:12:31Z


Related to: #31118

While using Beam SQL in our pipeline, we hit an exception. When the input row contains fields of iterable type, the query fails regardless of whether those fields appear in the SQL filter condition. The cause is that the Beam SQL extension builds the output row schema from the input schema, and it does not currently support iterable types.

Consider the following example:

Given an input row with the schema:

field1: String
field2: Integer
field3: Array<String>
field4: ITERABLE

And any Beam SQL condition, such as:
field2 > 1
or even
1 = 1

The pipeline always fails with the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Unable to get ITERABLE
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$InputGetterImpl.getBeamField(BeamCalcRel.java:603)

(This is my first contribution to Beam; kindly advise on how to test this, lol.)


github-actions · 2024-04-26T15:06:11Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

Abacn · 2024-05-08T19:20:25Z

Thanks for the fix. While this sounds like a valid fix, would you mind sharing the full stack trace? My understanding is that there are two issues here:

(1) Beam SQL does not filter out unused fields

(2) Iterable isn't supported by Beam SQL

A full stack trace would help in investigating (1) and a possible optimization.

The Iterable field type, introduced in #10003, was meant to be different from ARRAY. However, the fix here treats it the same as ARRAY. This may have performance implications and may not work for large iterables. Maybe add a comment or a TODO here.

Until it's optimized for Iterable, one can just write

case ARRAY:
case ITERABLE:
    return ....

so there is no need to duplicate the line.

Also, there are switch (fieldType.getTypeName()) branches in several places in BeamCalcRel; could they all be fixed for consistency?
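
For readers following along, the shared-case suggestion above can be sketched in isolation. This is a minimal, self-contained illustration and not the actual BeamCalcRel code: the TypeName enum, the readField helper, and the copy-into-a-list handling are all hypothetical stand-ins chosen for the example. It only shows how ARRAY and ITERABLE can share one switch branch, with a TODO marking the concern about large iterables.

```java
import java.util.ArrayList;
import java.util.List;

final class IterableFallThroughSketch {

    // Hypothetical stand-in for a schema type enum; NOT Beam's Schema.TypeName.
    enum TypeName { STRING, INT32, ARRAY, ITERABLE, MAP }

    // Hypothetical getter: returns a value suitable for downstream use,
    // or throws for types this sketch does not handle.
    static Object readField(TypeName typeName, Object value) {
        switch (typeName) {
            case STRING:
            case INT32:
                return value;
            case ARRAY:
            case ITERABLE: {
                // Both labels share this branch, so the handling is written once.
                // TODO: ITERABLE is treated like ARRAY here. Copying materializes
                // every element, which may be costly or fail for large iterables.
                List<Object> copy = new ArrayList<>();
                for (Object element : (Iterable<?>) value) {
                    copy.add(element);
                }
                return copy;
            }
            default:
                // Mirrors the wording of the error in the report for unsupported types.
                throw new UnsupportedOperationException("Unable to get " + typeName);
        }
    }
}
```

In this sketch, calling readField with TypeName.ITERABLE and any java.lang.Iterable returns a List holding the same elements, while an unhandled type such as MAP still throws UnsupportedOperationException.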

add iterable support

fa1bc94

github-actions bot added java extensions sql labels Apr 26, 2024
