euphoria-local: Allowing persist of intermediate dataset #255

Open
t-novak opened this issue Jan 24, 2018 · 6 comments

t-novak commented Jan 24, 2018

I tried to persist an intermediate dataset directly, but the sink ends up empty:

Dataset<T> data = ... // non-empty
data.persist(sink);
Dataset<T> mapped = FlatMap.of(data)...

Persisting indirectly, on the output of the following operator, seems to work as expected:

FlatMap.of(data)...output()
  .persist(sink);

Is it possible to either fix this or at least throw an exception?
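
To make the "throw an exception" option concrete, here is a minimal sketch of a fail-fast check. It is a toy model, not Euphoria code: `FailFastSketch`, `Node`, `consumedBy` and `validate` are hypothetical names invented for this illustration, and the rule it enforces (a dataset may not be both persisted and consumed) is only an assumption about what such a check might reject.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model for illustration only; none of these types are Euphoria classes. */
class FailFastSketch {

  /** A dataset node that remembers whether it has a sink and who consumes it. */
  static final class Node {
    final String name;
    final List<Node> consumers = new ArrayList<>();
    boolean hasSink;

    Node(String name) { this.name = name; }

    /** Registers a downstream operator reading from this node and returns its output. */
    Node consumedBy(String operatorName) {
      Node out = new Node(operatorName);
      consumers.add(out);
      return out;
    }

    /** Marks this node as persisted to some sink. */
    void persist() { hasSink = true; }
  }

  /** Walks the graph and rejects any node that is both persisted and consumed. */
  static void validate(Node node) {
    if (node.hasSink && !node.consumers.isEmpty()) {
      throw new IllegalStateException(
          "Dataset '" + node.name + "' has a sink and is also consumed by another operator");
    }
    for (Node consumer : node.consumers) validate(consumer);
  }

  public static void main(String[] args) {
    Node data = new Node("data");
    data.persist();             // direct persist of the intermediate dataset
    data.consumedBy("FlatMap"); // ... which is then consumed as well
    try {
      validate(data);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A check like this would turn the silently empty sink into an explicit error at flow-construction time, which is the weaker of the two outcomes asked for above.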

t-novak added the bug label Jan 24, 2018

t-novak commented Jan 24, 2018

public void persistTest() throws InterruptedException, ExecutionException {


je-ik commented Jan 24, 2018

Thanks for the report. Does the provided test fail?


je-ik commented Jan 24, 2018

If so, could you please create a branch with this failing test?


t-novak commented Jan 25, 2018

Yes, the test fails.

Branch: https://github.com/seznam/euphoria/tree/tnovak/persist-test


dmvk commented Jan 25, 2018

Thanks, we'll look into this ;)


je-ik commented Jan 25, 2018

There is a fundamental flaw in the translation of a Flow into a runnable DAG. It will be fixed by #256; until then I suggest a (slightly suboptimal and highly ugly) workaround:

  • the problem is that no Dataset can both have an output sink and be consumed by another operator
  • there is no problem with a single Dataset being consumed by multiple operators
  • therefore, the workaround is to add a (dummy) identity mapping operation between the intermediate dataset and the output:
Dataset<T> data = ... // non-empty
MapElements.of(data).using(e -> e).output().persist(sink);
Dataset<T> mapped = FlatMap.of(data)...

We will focus on the correct solution (#256), but until then this seems to be the only way out.
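
The reasoning behind the workaround can be sketched with a small self-contained toy model. This is not Euphoria code: `WorkaroundSketch`, `Node`, `map` and `run` are hypothetical names, and the executor's rule (a sink is written only for nodes that nobody consumes) is an assumption chosen to mimic the restriction described above, not the actual translation logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy model for illustration only; none of these types are Euphoria classes. */
class WorkaroundSketch {

  static final class Node<T> {
    final List<T> elements;
    final List<Node<?>> consumers = new ArrayList<>();
    List<T> sink; // stays null unless persist() is called

    Node(List<T> elements) { this.elements = elements; }

    /** Adds a downstream operator that consumes this node. */
    <R> Node<R> map(Function<T, R> fn) {
      List<R> out = new ArrayList<>();
      for (T e : elements) out.add(fn.apply(e));
      Node<R> node = new Node<>(out);
      consumers.add(node);
      return node;
    }

    void persist(List<T> target) { sink = target; }
  }

  /** Assumed behaviour: a sink is written only when nobody consumes the node. */
  static <T> void run(Node<T> node) {
    if (node.sink != null && node.consumers.isEmpty()) {
      node.sink.addAll(node.elements);
    }
    for (Node<?> consumer : node.consumers) run(consumer);
  }

  /** Sink attached to a node that is also consumed: nothing gets written. */
  static List<Integer> persistDirectly() {
    List<Integer> sink = new ArrayList<>();
    Node<Integer> data = new Node<>(Arrays.asList(1, 2, 3));
    data.persist(sink);
    data.map(x -> x * 10);
    run(data);
    return sink;
  }

  /** Sink attached to an identity-mapped copy: the copy has no consumers. */
  static List<Integer> persistViaIdentityMap() {
    List<Integer> sink = new ArrayList<>();
    Node<Integer> data = new Node<>(Arrays.asList(1, 2, 3));
    data.map(e -> e).persist(sink); // dummy mapping carries the sink
    data.map(x -> x * 10);          // the real downstream operator
    run(data);
    return sink;
  }

  public static void main(String[] args) {
    System.out.println("direct: " + persistDirectly());                 // direct: []
    System.out.println("via identity map: " + persistViaIdentityMap()); // via identity map: [1, 2, 3]
  }
}
```

In this model `data` ends up with two consumers, which is allowed, while the sink sits on a node that nothing else reads from. The cost is one extra pass over the elements, which is why the workaround is described as suboptimal.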
