feat(pyspark): manage streaming queries #9157

Open
1 task done
chloeh13q opened this issue May 8, 2024 · 1 comment
Labels
feature Features or general enhancements pyspark The Apache PySpark backend streaming Issue related to streaming APIs or backends

Comments

@chloeh13q
Contributor

Is your feature request related to a problem?

No

What is the motivation behind your request?

Better support for streaming functionality in Ibis.

Describe the solution you'd like

PySpark provides methods to manage streaming queries through its StreamingQuery class: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.streaming.StreamingQuery.html

When I submit streaming workloads via Ibis, Ibis manages the compilation and submission of the query. Because streaming queries do not return and instead run continuously in the background, I sometimes need to check on a query's status or stop it. Right now I cannot do this directly in Python code because Ibis manages the query submission.

I think we can expose a wrapper class that lets users interact with the streaming query from Ibis code, for a smoother streaming experience. I'm not sure whether this is within the scope of Ibis, but right now it is hard to manipulate the underlying query because Ibis does not return it. Supporting this would require Ibis to return the underlying PySpark StreamingQuery object.

What version of ibis are you running?

main

What backend(s) are you using, if any?

pyspark

Code of Conduct

  • I agree to follow this project's Code of Conduct
chloeh13q added the feature label on May 8, 2024
gforsyth added the pyspark and streaming labels on May 10, 2024
@gforsyth
Member

As an initial implementation, returning the StreamingQuery object seems like a reasonable goal, and then we can explore further conveniences on top of that as we and users see fit.

Projects
Status: backlog
Development

No branches or pull requests

2 participants