[Feature Request]: Build beam.MLTransform #26640

Closed
1 of 15 tasks
AnandInguva opened this issue May 10, 2023 · 3 comments

Comments

@AnandInguva
Contributor

What would you like to happen?

We aim to create an easy-to-use PTransform in Apache Beam, called MLTransform, for carrying out common machine learning transforms on large datasets. MLTransform is designed to be framework-agnostic: its primary goal is to give users an intuitive interface for common data processing transformations, without writing complex code or dealing with the underlying libraries directly. The first framework we will use is TensorFlow Transform (TFT), which has many production-hardened transformations already written using Apache Beam primitives.

Design doc: https://docs.google.com/document/d/1rQkSm_8tseLqDQaLohtlCGqt5pvMaP0XIpPi5UD0LCQ/edit#
dev list discussion: https://lists.apache.org/thread/d4thp7xs1y0jm5m9v5xzshln9fwvsm7s
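
To make the idea above concrete, here is a minimal plain-Python sketch of what a chainable, framework-agnostic interface could look like. Every name in it (`MLTransformSketch`, `with_transform`, `ScaleTo01Sketch`, `VocabularySketch`, `demo`) is hypothetical and chosen for illustration only; this is not the Beam API and not the design in the linked doc. It assumes each transform needs a full pass over the data to learn its statistics (the "analyze" step) before it can rewrite rows (the "apply" step), which is roughly how min-max scaling and vocabulary lookup behave.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class ScaleTo01Sketch:
    """Hypothetical column transform: min-max scale a numeric column to [0, 1]."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.lo = 0.0
        self.hi = 0.0

    def analyze(self, rows: List[Row]) -> None:
        # Full pass over the data to learn the column's min and max.
        values = [row[self.column] for row in rows]
        self.lo, self.hi = min(values), max(values)

    def apply(self, row: Row) -> Row:
        span = self.hi - self.lo
        # A constant column has zero span; map it to 0.0 instead of dividing by zero.
        scaled = (row[self.column] - self.lo) / span if span else 0.0
        return {**row, self.column: scaled}


class VocabularySketch:
    """Hypothetical column transform: map each distinct string to an integer id."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.vocab: Dict[str, int] = {}

    def analyze(self, rows: List[Row]) -> None:
        # Sort the distinct tokens so the assigned ids are deterministic.
        tokens = sorted({row[self.column] for row in rows})
        self.vocab = {token: index for index, token in enumerate(tokens)}

    def apply(self, row: Row) -> Row:
        return {**row, self.column: self.vocab[row[self.column]]}


class MLTransformSketch:
    """Hypothetical wrapper: users chain transforms, the wrapper runs them in order."""

    def __init__(self) -> None:
        self._transforms: List[Any] = []

    def with_transform(self, transform: Any) -> "MLTransformSketch":
        self._transforms.append(transform)
        return self  # returning self allows chaining

    def run(self, rows: List[Row]) -> List[Row]:
        for transform in self._transforms:
            transform.analyze(rows)  # learn statistics from the whole dataset
            rows = [transform.apply(row) for row in rows]  # then rewrite each row
        return rows


def demo() -> List[Row]:
    rows = [
        {"price": 10.0, "color": "red"},
        {"price": 20.0, "color": "blue"},
        {"price": 30.0, "color": "red"},
    ]
    return (
        MLTransformSketch()
        .with_transform(ScaleTo01Sketch("price"))
        .with_transform(VocabularySketch("color"))
        .run(rows)
    )
```

In this sketch the user only names a transform and a column; the analyze/apply split stays inside the wrapper. In a real Beam implementation that work would run as distributed pipeline steps rather than over an in-memory list.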

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@AnandInguva
Contributor Author

AnandInguva commented Jun 30, 2023

v0 PR: #26795

@AnandInguva
Contributor Author

AnandInguva commented Jun 30, 2023

Work to do:

  1. Add support for pa.RecordBatch - ref: MLTransform #26795 (comment)

@AnandInguva AnandInguva mentioned this issue Jun 30, 2023
3 tasks
@AnandInguva AnandInguva pinned this issue Jun 30, 2023
@AnandInguva AnandInguva unpinned this issue Jun 30, 2023
@brucearctor
Contributor

@AnandInguva should this issue be closed?

@github-actions github-actions bot added this to the 2.57.0 Release milestone Apr 29, 2024