-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[scio-core](feature) Sample SCollection with max weight #5352
Conversation
20fc05d
to
6baf89c
Compare
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
val observer = new ElementByteSizeObserver { | ||
override def reportElementSize(elementByteSize: Long): Unit = size += elementByteSize | ||
} | ||
bCoder.registerByteSizeObserver(e, observer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is better than CoderUtils.encodeToByteArray(beamCoder, v).length
because it uses a new CountingOutputStream(ByteStreams.nullOutputStream()))
behind the scene.
The code can be factorized with batchByteSized
moving out of draft state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
5b00a85
to
e528d01
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5352 +/- ##
==========================================
+ Coverage 61.22% 61.30% +0.07%
==========================================
Files 302 303 +1
Lines 10846 10876 +30
Branches 761 780 +19
==========================================
+ Hits 6641 6667 +26
- Misses 4205 4209 +4 ☔ View full report in Codecov by Sentry. |
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
val observer = new ElementByteSizeObserver { | ||
override def reportElementSize(elementByteSize: Long): Unit = size += elementByteSize | ||
} | ||
bCoder.registerByteSizeObserver(e, observer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala
Outdated
Show resolved
Hide resolved
f493592
to
7479d50
Compare
7479d50
to
b3e6780
Compare
def sample(sampleSize: Int): SCollection[Iterable[T]] = this.transform { | ||
_.pApply(Sample.fixedSizeGlobally(sampleSize)).map(_.asScala) | ||
} | ||
// TODO move to implicit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to resolve this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be binary breaking. it is a reminder for later. maybe worth an issue with milestone
Create a sampling API with weighted limit and and byte size limit