Skip to content

Beam 2.52.0 release

Compare
Choose a tag to compare
@damccorm damccorm released this 17 Nov 18:45
· 1403 commits to master since this release

We are happy to present the new 2.52.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.52.0, check out the detailed release notes.

Highlights

  • Previously deprecated Avro-dependent code (Beam Release 2.46.0) has been finally removed from Java SDK "core" package.
    Please, use beam-sdks-java-extensions-avro instead. This will allow to easily update Avro version in user code without
    potential breaking changes in Beam "core" since the Beam Avro extension already supports the latest Avro versions and
    should handle this. (#25252).
  • Publishing Java 21 SDK container images now supported as part of Apache Beam release process. (#28120)
    • Direct Runner and Dataflow Runner support running pipelines on Java21 (experimental until tests fully setup). For other runners (Flink, Spark, Samza, etc) support status depend on runner projects.

New Features / Improvements

  • Add UseDataStreamForBatch pipeline option to the Flink runner. When it is set to true, Flink runner will run batch
    jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
    using the DataSet API.
  • upload_graph as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK (PR#28621.
  • state amd side input cache has been enabled to a default of 100 MB. Use --max_cache_memory_usage_mb=X to provide cache size for the user state API and side inputs. (Python) (#28770).
  • Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the README.

Breaking Changes

  • org.apache.beam.sdk.io.CountingSource.CounterMark uses custom CounterMarkCoder as a default coder since all Avro-dependent
    classes finally moved to extensions/avro. In case if it's still required to use AvroCoder for CounterMark, then,
    as a workaround, a copy of "old" CountingSource class should be placed into a project code and used directly
    (#25252).
  • Renamed host to firestoreHost in FirestoreOptions to avoid potential conflict of command line arguments (Java) (#29201).

Bugfixes

  • Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's BigtableIO.BigtableSource when you have more cores than bytes to read (Java) #28793.
  • watch_file_pattern arg of the RunInference arg had no effect prior to 2.52.0. To use the behavior of arg watch_file_pattern prior to 2.52.0, follow the documentation at https://beam.apache.org/documentation/ml/side-input-updates/ and use WatchFilePattern PTransform as a SideInput. (#28948)
  • MLTransform doesn't output artifacts such as min, max and quantiles. Instead, MLTransform will add a feature to output these artifacts as human readable format - #29017. For now, to use the artifacts such as min and max that were produced by the eariler MLTransform, use read_artifact_location of MLTransform, which reads artifacts that were produced earlier in a different MLTransform (#29016)
  • Fixed a memory leak, which affected some long-running Python pipelines: #28246.

Security Fixes

List of Contributors

According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!

Ahmed Abualsaud
Ahmet Altay
Aleksandr Dudko
Alexey Romanenko
Anand Inguva
Andrei Gurau
Andrey Devyatkin
BjornPrime
Bruno Volpato
Bulat
Chamikara Jayalath
Damon
Danny McCormick
Devansh Modi
Dominik Dębowczyk
Ferran Fernández Garrido
Hai Joey Tran
Israel Herraiz
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Jeffrey Kinard
Jiangjie Qin
Jing
Joar Wandborg
Johanna Öjeling
Julien Tournay
Kanishk Karanawat
Kenneth Knowles
Kerry Donny-Clark
Luís Bianchin
Minbo Bae
Pranav Bhandari
Rebecca Szper
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
RyuSA
Shunping Huang
Steven van Rossum
Svetak Sundhar
Tony Tang
Vitaly Terentyev
Vivek Sumanth
Vlado Djerek
Yi Hu
aku019
brucearctor
caneff
damccorm
ddebowczyk92
dependabot[bot]
dpcollins-google
edman124
gabry.wu
illoise
johnjcasey
jonathan-lemos
kennknowles
liferoad
magicgoody
martin trieu
nancyxu123
pablo rodriguez defino
tvalentyn