Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate timestamps when creating windowed or timestamped values #16749

Closed
wants to merge 1 commit into from

Conversation

kennknowles
Copy link
Member

In debugging a problem with out-of-bounds timestamps, some bad timestamps are caught when a DoFn outputs, when it would be better to catch it early as possible.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@kennknowles
Copy link
Member Author

R: @laraschmidt

@kennknowles
Copy link
Member Author

Run Java PreCommit

@kennknowles
Copy link
Member Author

green!

@y1chi
Copy link
Contributor

y1chi commented Feb 11, 2022

R: @laraschmidt

@laraschmidt
Copy link
Contributor

This is going to break some internal tests as is because of the bug we found on the portable runner. We may have to wait for that to be fixed and rolled out first.

@kennknowles
Copy link
Member Author

On the other hand, we know from those tests that the portable runner is broken, so would sickbaying the tests work?

@laraschmidt
Copy link
Contributor

That issue is only affecting a very small subset of jobs because of where the check is located (e.g. it requires allowed skew be infinite). I'm pretty sure this would affect all jobs with impulse.

@kennknowles
Copy link
Member Author

run dataflow validatesrunner

@kennknowles
Copy link
Member Author

Run Java Dataflow V2 ValidatesRunner

@kennknowles
Copy link
Member Author

Run Java Dataflow V2 ValidatesRunner Streaming

@aaltay
Copy link
Member

aaltay commented Feb 25, 2022

What is the next step on this PR?

@laraschmidt
Copy link
Contributor

Needs to wait for an internal dataflow fix to roll out, otherwise it will break Dataflow. :)

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 26, 2022
@kennknowles
Copy link
Member Author

I'm thinking this might be ready now.

@kennknowles
Copy link
Member Author

Run Dataflow ValidatesRunner

@kennknowles
Copy link
Member Author

known pulsar & fnapi flakes

@kennknowles
Copy link
Member Author

Run Java PreCommit

@kennknowles
Copy link
Member Author

Run Java Dataflow V2 ValidatesRunner

@kennknowles
Copy link
Member Author

Run Java Dataflow V2 ValidatesRunner Streaming

@kennknowles
Copy link
Member Author

Yup, looks good now. PTAL.

@kennknowles
Copy link
Member Author

Run Java Dataflow V2 ValidatesRunner Streaming

@kennknowles
Copy link
Member Author

@dpcollins-google

@aaltay
Copy link
Member

aaltay commented May 12, 2022

@dpcollins-google - could you please review this?

@aaltay
Copy link
Member

aaltay commented May 27, 2022

@dpcollins-google - ping?

@dpcollins-google
Copy link
Contributor

dpcollins-google commented May 27, 2022

This seems to be a different issue than the one I'm running into. In particular, its validating that the timestamp is in bounds [MIN, MAX] while my issue is that impulse is not returning MIN. LGTM though

@kennknowles
Copy link
Member Author

Taking a look at whether runner v2 is still producing invalid timestamps.

@kennknowles
Copy link
Member Author

Seeing Provided timestamp -290308-12-21T19:59:05.224Z must be within bounds [-290308-12-21T19:59:05.225Z, 294247-01-10T04:00:54.775Z]. Unless I am reading it wrong, that timestamp is within bounds, so the check added here is buggy.

@aaltay
Copy link
Member

aaltay commented Jun 9, 2022

Last ping from me on this PR :)

What is the next step on this PR? Should this be closed?

@kennknowles
Copy link
Member Author

No need to ping. Just my limited bandwidth. I will get to it.

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 20, 2023
@github-actions
Copy link
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jun 27, 2023
@kennknowles kennknowles reopened this Jun 28, 2023
@github-actions github-actions bot removed the stale label Jun 29, 2023
@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 29, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 6, 2023
@kennknowles kennknowles deleted the timestampedvalue branch January 4, 2024 20:35
@hpvd
Copy link

hpvd commented Apr 24, 2024

added this to [Parent issue] Support for Apache Pulsar #31078

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants