
Add evalFlatten methods for stream of effects #2851

Open · bplommer wants to merge 2 commits into base: main
Conversation

bplommer (Contributor)

No description provided.

@bplommer bplommer marked this pull request as ready for review March 17, 2022 12:45

/** Evaluates all inner effects concurrently, emitting the results in order.
*/
def parEvalFlattenUnbounded(implicit F: Concurrent[F]) = self.parEvalMapUnbounded(identity)
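For context, a minimal sketch of how the proposed syntax would be used, assuming the `StreamFOps` implicit class added later in this diff is in scope:

```scala
import cats.effect.IO
import fs2.Stream

// A stream whose elements are themselves effects.
val effects: Stream[IO, IO[Int]] = Stream(IO(1), IO(2), IO(3)).covary[IO]

// Proposed: start all inner effects concurrently, emit results in order.
val results: Stream[IO, Int] = effects.parEvalFlattenUnbounded
```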
@diesalbla (Contributor), Mar 22, 2022

So, just to clarify something: compiling this stream would immediately start a background action that launches all the actions in the stream. The internals of the auxiliary parEvalMapAction function use a queue and a semaphore to control how many items from the source are running, but if the concurrency is unbounded then those do not limit progress. Also, nothing allows the user to control, based on the outputs of the resulting stream, whether any actions from the source are delayed... Is that a desirable mode of operation?

Note that parJoinUnbounded relies on a single-chunk channel (a funnel) to stop streams from advancing before the consumer has pulled the latest chunk. So the semantics of that "unbounded" are to launch all source streams at once, but have them contend for that channel. This "unbounded", on the other hand, would pull all the F[O] actions from the stream and start all of them. Furthermore, since the consumer has no way to control the progress of the source, this stream offers no means of backpressure. If the consumer is too slow, this operation will fill that unbounded buffer until it exhausts memory...
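The failure mode described above can be sketched as follows (a hypothetical illustration, not code from this PR):

```scala
import cats.effect.IO
import fs2.Stream
import scala.concurrent.duration._

val source: Stream[IO, IO[Int]] =
  Stream.range(0, 1000000).map(i => IO.pure(i)).covary[IO]

// With no concurrency bound, every inner IO is started and its result
// buffered as fast as the source can be pulled; a consumer that sleeps
// between elements cannot slow the source down, so the buffer grows
// without limit.
val slowConsumer: Stream[IO, Int] =
  source.parEvalMapUnbounded(identity).evalTap(_ => IO.sleep(1.second))
```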

@nikiforo (Contributor), Mar 23, 2022

I think this method is as unsafe as the other *Unbounded methods. All concurrent methods (parJoin, parEvalMap and others) are effectively push-based, and the only thing that guards users from launching too many computations is a semaphore. In the *Unbounded case the semaphore isn't operative, so they are all equally unsafe in that regard.

However, I see you make a distinction between parJoinUnbounded and parEvalMapUnbounded in the way they block on a Channel. I think this behavior isn't intended as a protection and shouldn't be relied on.

Personally, when I use par*Unbounded methods I assume that the stream won't be parallel enough to overwhelm the consumer, and I choose not to bother restricting the level of parallelism. Otherwise, I would have to widen interfaces (functions or constructors) with a parallelism configuration, because when you really want to restrict parallelism, you should make that restriction configurable.
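Widening an interface with a parallelism configuration could look like this (a hypothetical sketch; the names are illustrative):

```scala
import cats.effect.IO
import fs2.Stream

// Instead of hard-coding unbounded concurrency, thread the limit
// through the constructor so callers can tune it.
final class ResultProcessor(maxConcurrent: Int) {
  def run(jobs: Stream[IO, IO[Unit]]): Stream[IO, Unit] =
    jobs.parEvalMap(maxConcurrent)(identity)
}
```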

Contributor

Personally, when I use par*Unbounded methods I assume that the stream won't be parallel enough to overwhelm the consumer and choose not to bother restricting the level of parallelism

There are two mixed concerns here: parallelism, and buffering vs backpressuring.

In parJoinUnbounded you take an outer stream of sources of data, and it launches all sources to pull and push into the queue in parallel. That outer stream may never end and may keep incorporating new sources, so parJoinUnbounded does not limit parallelism. What it does restrict, by means of that output queue, is how many items are pulled from each source into memory.

The problem with this new combinator is that it would prefetch and load all data from its source into local memory, without any feedback from the consumer to stop it. Thus, a short slowdown in the consumer would cause this combinator to accrue a lot of data, and thus crash the program. A basic reliability guideline is to avoid infinite buffers.

However, I see you make a distinction between parJoinUnbounded and parEvalMapUnbounded in how they block on Channel. I don't think this behavior is intended as a protection, nor should it be relied on.

Given that back-pressuring and laziness is an essential part of fs2 streams, building that check and balance into the pipeline seems to me a crucial part of that combinator.

As an aside, there are some fs2 combinators that do fetch a lot of data, like prefetchAll, but those are legacy. Also, some choices of parameters in other combinators can cause trouble, but those cases cannot be avoided.

bplommer (Contributor, Author)

Sorry for the radio silence on this!

Yes, the use case I have in mind for this is when backpressure is provided upstream. The case I have in mind (which occurs, for example, in fs2-kafka) is where you have an operation that returns F[F[Result]]: the outer F performs a backpressured enqueue and then returns the inner F, which is a handle for waiting on the result, so there is no need for backpressure on the inner F.
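The fs2-kafka-style shape described above might look like this (a hypothetical sketch; `produce` and `Ack` are illustrative stand-ins, not real fs2-kafka API):

```scala
import cats.effect.IO
import fs2.Stream

sealed trait Ack

// Outer IO: backpressured enqueue; inner IO: handle to await the result.
def produce(record: String): IO[IO[Ack]] = ???

def acks(records: Stream[IO, String]): Stream[IO, Ack] =
  records
    .evalMap(produce)               // backpressure happens here
    .parEvalMapUnbounded(identity)  // inner effects need no extra bound
```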

This should definitely have documentation and an example though, so I'm going to remove parEvalFlattenUnbounded from this PR and maybe put it in a separate one later.

@bplommer bplommer requested a review from diesalbla March 27, 2022 13:55
@@ -4117,6 +4114,21 @@ object Stream extends StreamLowPriority {
parJoin(Int.MaxValue)
}

/** Provides syntax for a stream of `F` effects. */
implicit class StreamFOps[F[_], O](private val self: Stream[F, F[O]]) {
Contributor

For what it's worth, I've regretted using this shape of stream every time. You can run yourself out of memory really quickly if those inner F[A] are nontrivial and you have a lot of them.

I'm uncomfortable with encoding this in the library in a way that makes it easier to use, because I personally think this shape should be discouraged, especially when the implementation here is pretty trivial.

And as a library user, I think it's a lot clearer for the code reader to see an inline evalMap(identity) rather than yet another method they need to learn as part of the API.

So I'm a polite 👎 on the PR for those reasons
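For comparison, the inline spelling of sequential flattening is a one-liner (a sketch of the alternative mentioned above, requiring no new syntax):

```scala
import cats.effect.IO
import fs2.Stream

val effects: Stream[IO, IO[Int]] = Stream(IO(1), IO(2), IO(3)).covary[IO]

// Evaluate each inner effect in sequence, emitting its result.
val flattened: Stream[IO, Int] = effects.evalMap(identity)
```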

Member

Good to know, @Daenyth. I'd never have thought it could blow up memory. Would a suspend help resolve that (at least to reduce the overhead of the non-trivial ones)?

Contributor

Probably not, because the issue is the memory used by holding a large Chunk[IO[A]].

It's also possible my codebase was just doing something really silly, and that's why that structure cost so much memory.
