Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

euphoria-core: native operator support #42

Open
xitep opened this issue Feb 24, 2017 · 1 comment
Open

euphoria-core: native operator support #42

xitep opened this issue Feb 24, 2017 · 1 comment

Comments

@xitep
Copy link
Contributor

xitep commented Feb 24, 2017

At some point in time we'll hit the point at which users will want to use an executor specific operation, which is not available through the euphoria API. We'd like to support such scenarios by providing a dedicated operator Native which will allow users to embedded technology native operations directly into a euphoria flow. The rough idea is:

Dataset<...> euphoriaDataset =
    Native.of(input-euphoria-dataset)
    .using((input-native-execution-dataset-eg-flink) -> {
        /* allow users to use the native input dataset and require them to return another */
    })
    .output();

One motivation is to all experimentation with the new operations of the native execution engine, another to avoidthe need for euphoria specific wrappers around existing engine specific libraries, e.g. MLlib.

The above suggested example has clearly a lot of problems which we'll need to work through and either accept the implications or work out corresponding solutions. One natural implication of the Native operator will be the non-portability to different execution engines.

@xitep
Copy link
Contributor Author

xitep commented Feb 24, 2017

Alternatively, another suggestion has been made by @je-ik :

Client code as part of a flow:

Dataset<T> input = ...;
  Dataset<Y> transformed = Transform
      .named("mytransform")
      .of(input).return(Y.class).output();

Further register a handle for "mytransform" as part of the engine dependent executor set-up:

 SparkExecutor executor = new SparkExecutor(conf);
  executor.execute(flow)
      .withTranslation("mytransform", MyTransformationTranslator::new);
  executor.waitForCompletion();

where MyTransformationTranslator might look like this:

 public static class MyTransformationTranslator
      implements SparkOperatorTranslator<T, Y> {

    @Override
    JavaRDD<Y> translate(Operator<?, ?> input, SparkExecutorContext context) {
      JavaRDD<T> inputRDD = context.getRDD(
          Iterables.getOnlyElement(input.listInputs()));
      JavaRDD<Y> result = inputRDD.map().filter(). .... ;
      return result;
    }
  }

The big benefit here is, that the flow still stays independent of the specific executor engine! The fact, that we end up with a bit more boilerplate than with the initially suggested Native operator doesn't hurt, since supporting native operations should be merely possible, not necessarily easy :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant