Skip to content

Releases: apache/beam

Beam 2.47.0 release

10 May 17:58
Compare
Choose a tag to compare

We are happy to present the new 2.47.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.47.0, check out the detailed release notes.

Highlights

  • Apache Beam adds Python 3.11 support (#23848).

I/Os

  • BigQuery Storage Write API is now available in Python SDK via cross-language (#21961).
  • Added HbaseIO support for writing RowMutations (ordered by rowkey) to Hbase (Java) (#25830).
  • Added fileio transforms MatchFiles, MatchAll and ReadMatches (Go) (#25779).
  • Add integration test for JmsIO + fix issue with multiple connections (Java) (#25887).

New Features / Improvements

  • The Flink runner now supports Flink 1.16.x (#25046).
  • Schema'd PTransforms can now be directly applied to Beam dataframes just like PCollections.
    (Note that when doing multiple operations, it may be more efficient to explicitly chain the operations
    like df | (Transform1 | Transform2 | ...) to avoid excessive conversions.)
  • The Go SDK adds new transforms periodic.Impulse and periodic.Sequence that extends support
    for slowly updating side input patterns. (#23106)
  • Python SDK now supports protobuf <4.23.0 (#24599)
  • Several Google client libraries in Python SDK dependency chain were updated to latest available major versions. (#24599)

Breaking Changes

  • If a main session fails to load, the pipeline will now fail at worker startup. (#25401).
  • Python pipeline options will now ignore unparsed command line flags prefixed with a single dash. (#25943).
  • The SmallestPerKey combiner now requires keyword-only arguments for specifying optional parameters, such as key and reverse. (#25888).

Deprecations

  • Cloud Debugger support and its pipeline options are deprecated and will be removed in the next Beam version,
    in response to the Google Cloud Debugger service turning down.
    (Java) (#25959).

Bugfixes

  • BigQuery sink in STORAGE_WRITE_API mode in batch pipelines might result in data consistency issues during the handling of other unrelated transient errors for Beam SDKs 2.35.0 - 2.46.0 (inclusive). For more details see: #26521

List of Contributors

According to git shortlog, the following people contributed to the 2.47.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexey Romanenko

Amir Fayazi

Amrane Ait Zeouay

Anand Inguva

Andrew Pilloud

Andrey Kot

Bjorn Pedersen

Bruno Volpato

Buqian Zheng

Chamikara Jayalath

ChangyuLi28

Damon

Danny McCormick

Dmitry Repin

George Ma

Jack Dingilian

Jack McCluskey

Jasper Van den Bossche

Jeremy Edwards

Jiangjie (Becket) Qin

Johanna Öjeling

Juta Staes

Kenneth Knowles

Kyle Weaver

Mattie Fu

Moritz Mack

Nick Li

Oleh Borysevych

Pablo Estrada

Rebecca Szper

Reuven Lax

Reza Rokni

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Saadat Su

Saifuddin53

Sam Rohde

Shubham Krishna

Svetak Sundhar

Theodore Ni

Thomas Gaddy

Timur Sultanov

Udi Meiri

Valentyn Tymofieiev

Xinyu Liu

Yanan Hao

Yi Hu

Yuvi Panda

andres-vv

bochap

dannikay

darshan-sj

dependabot[bot]

harrisonlimh

hnnsgstfssn

jrmccluskey

liferoad

tvalentyn

xianhualiu

zhangskz

Beam 2.46.0 release

17 Mar 14:57
Compare
Choose a tag to compare

We are happy to present the new 2.46.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.46.0, check out the detailed release notes.

Highlights

  • Java SDK containers migrated to Eclipse Temurin
    as a base. This change migrates away from the deprecated OpenJDK
    container. Eclipse Temurin is currently based upon Ubuntu 22.04 while the OpenJDK
    container was based upon Debian 11.
  • RunInference PTransform will accept model paths as SideInputs in Python SDK. (#24042)
  • RunInference supports ONNX runtime in Python SDK (#22972)
  • Tensorflow Model Handler for RunInference in Python SDK (#25366)
  • Java SDK modules migrated to use :sdks:java:extensions:avro (#24748)

I/Os

  • Added in JmsIO a retry policy for failed publications (Java) (#24971).
  • Support for LZMA compression/decompression of text files added to the Python SDK (#25316)
  • Added ReadFrom/WriteTo Csv/Json as top-level transforms to the Python SDK.

New Features / Improvements

  • Add UDF metrics support for Samza portable mode.
  • Option for SparkRunner to avoid the need of SDF output to fit in memory (#23852).
    This helps e.g. with ParquetIO reads. Turn the feature on by adding experiment use_bounded_concurrent_output_for_sdf.
  • Add WatchFilePattern transform, which can be used as a side input to the RunInference PTransfrom to watch for model updates using a file pattern. (#24042)
  • Add support for loading TorchScript models with PytorchModelHandler. The TorchScript model path can be
    passed to PytorchModelHandler using torch_script_model_path=<path_to_model>. (#25321)
  • The Go SDK now requires Go 1.19 to build. (#25545)
  • The Go SDK now has an initial native Go implementation of a portable Beam Runner called Prism. (#24789)

Breaking Changes

  • The deprecated SparkRunner for Spark 2 (see 2.41.0) was removed (#25263).
  • Python's BatchElements performs more aggressive batching in some cases,
    capping at 10 second rather than 1 second batches by default and excluding
    fixed cost in this computation to better handle cases where the fixed cost
    is larger than a single second. To get the old behavior, one can pass
    target_batch_duration_secs_including_fixed_cost=1 to BatchElements.

Deprecations

  • Avro related classes are deprecated in module beam-sdks-java-core and will be eventually removed. Please, migrate to a new module beam-sdks-java-extensions-avro instead by importing the classes from org.apache.beam.sdk.extensions.avro package.
    For the sake of migration simplicity, the relative package path and the whole class hierarchy of Avro related classes in new module is preserved the same as it was before.
    For example, import org.apache.beam.sdk.extensions.avro.coders.AvroCoder class instead oforg.apache.beam.sdk.coders.AvroCoder. (#24749).

List of Contributors

According to git shortlog, the following people contributed to the 2.46.0 release. Thank you to all contributors!

Ahmet Altay

Alan Zhang

Alexey Romanenko

Amrane Ait Zeouay

Anand Inguva

Andrew Pilloud

Brian Hulette

Bruno Volpato

Byron Ellis

Chamikara Jayalath

Damon

Danny McCormick

Darkhan Nausharipov

David Katz

Dmitry Repin

Doug Judd

Egbert van der Wal

Elizaveta Lomteva

Evan Galpin

Herman Mak

Jack McCluskey

Jan Lukavský

Johanna Öjeling

John Casey

Jozef Vilcek

Junhao Liu

Juta Staes

Katie Liu

Kiley Sok

Liam Miller-Cushon

Luke Cwik

Moritz Mack

Ning Kang

Oleh Borysevych

Pablo E

Pablo Estrada

Reuven Lax

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Ruslan Altynnikov

Ryan Zhang

Sam Rohde

Sam Whittle

Sam sam

Sergei Lilichenko

Shivam

Shubham Krishna

Theodore Ni

Timur Sultanov

Tony Tang

Vachan

Veronica Wasson

Vincent Devillers

Vitaly Terentyev

William Ross Morrow

Xinyu Liu

Yi Hu

ZhengLin Li

Ziqi Ma

ahmedabu98

alexeyinkin

aliftadvantage

bullet03

dannikay

darshan-sj

dependabot[bot]

johnjcasey

kamrankoupayi

kileys

liferoad

nancyxu123

nickuncaged1201

pablo rodriguez defino

tvalentyn

xqhu

Beam 2.45.0 release

15 Feb 21:47
Compare
Choose a tag to compare

We are happy to present the new 2.45.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.45.0, check out the detailed release notes.

I/Os

  • MongoDB IO connector added (Go) (#24575).

New Features / Improvements

  • RunInference Wrapper with Sklearn Model Handler support added in Go SDK (#24497).
  • Adding override of allowed TLS algorithms (Java), now maintaining the disabled/legacy algorithms
    present in 2.43.0 (up to 1.8.0_342, 11.0.16, 17.0.2 for respective Java versions). This is accompanied
    by an explicit re-enabling of TLSv1 and TLSv1.1 for Java 8 and Java 11.
  • Add UDF metrics support for Samza portable mode.

Breaking Changes

  • Portable Java pipelines, Go pipelines, Python streaming pipelines, and portable Python batch
    pipelines on Dataflow are required to use Runner V2. The disable_runner_v2,
    disable_runner_v2_until_2023, disable_prime_runner_v2 experiments will raise an error during
    pipeline construction. You can no longer specify the Dataflow worker jar override. Note that
    non-portable Java jobs and non-portable Python batch jobs are not impacted. (#24515).

Bugfixes

  • Avoids Cassandra syntax error when user-defined query has no where clause in it (Java) (#24829).
  • Fixed JDBC connection failures (Java) during handshake due to deprecated TLSv1(.1) protocol for the JDK. (#24623)
  • Fixed Python BigQuery Batch Load write may truncate valid data when deposition sets to WRITE_TRUNCATE and incoming data is large (Python) (#24623).
  • Fixed Kafka watermark issue with sparse data on many partitions (#24205)

List of Contributors

According to git shortlog, the following people contributed to the 2.45.0 release. Thank you to all contributors!

AdalbertMemSQL

Ahmed Abualsaud

Ahmet Altay

Alexey Romanenko

Anand Inguva

Andrea Nardelli

Andrei Gurau

Andrew Pilloud

Benjamin Gonzalez

BjornPrime

Brian Hulette

Bulat

Byron Ellis

Chamikara Jayalath

Charles Rothrock

Damon

Daniela Martín

Danny McCormick

Darkhan Nausharipov

Dejan Spasic

Diego Gomez

Dmitry Repin

Doug Judd

Elias Segundo Antonio

Evan Galpin

Evgeny Antyshev

Fernando Morales

Jack McCluskey

Johanna Öjeling

John Casey

Junhao Liu

Kanishk Karanawat

Kenneth Knowles

Kiley Sok

Liam Miller-Cushon

Lucas Marques

Luke Cwik

MakarkinSAkvelon

Marco Robles

Mark Zitnik

Melanie

Moritz Mack

Ning Kang

Oleh Borysevych

Pablo Estrada

Philippe Moussalli

Piyush Sagar

Rebecca Szper

Reuven Lax

Rick Viscomi

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Sam Whittle

Sergei Lilichenko

Seung Jin An

Shane Hansen

Sho Nakatani

Shunya Ueta

Siddharth Agrawal

Timur Sultanov

Veronica Wasson

Vitaly Terentyev

Xinbin Huang

Xinyu Liu

Xinyue Zhang

Yi Hu

ZhengLin Li

alexeyinkin

andoni-guzman

andthezhang

bullet03

camphillips22

gabihodoroaga

harrisonlimh

pablo rodriguez defino

ruslan-ikhsan

tvalentyn

yyy1000

zhengbuqian

v2.44.0

17 Jan 20:02
Compare
Choose a tag to compare

We are happy to present the new 2.44.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.44.0, check out the detailed release notes.

I/Os

  • Support for Bigtable sink (Write and WriteBatch) added (Go) (#23324).
  • S3 implementation of the Beam filesystem (Go) (#23991).
  • Support for SingleStoreDB source and sink added (Java) (#22617).
  • Added support for DefaultAzureCredential authentication in Azure Filesystem (Python) (#24210).
  • Added new CdapIO for CDAP Batch and Streaming Source/Sinks (Java) (#24961).
  • Added new SparkReceiverIO for Spark Receivers 2.4.* (Java) (#24960).

New Features / Improvements

  • Beam now provides a portable "runner" that can render pipeline graphs with
    graphviz. See python -m apache_beam.runners.render --help for more details.
  • Local packages can now be used as dependencies in the requirements.txt file, rather
    than requiring them to be passed separately via the --extra_package option
    (Python) (#23684).
  • Pipeline Resource Hints now supported via --resource_hints flag (Go) (#23990).
  • Make Python SDK containers reusable on portable runners by installing dependencies to temporary venvs (BEAM-12792).
  • RunInference model handlers now support the specification of a custom inference function in Python (#22572)
  • Support for map_windows urn added to Go SDK (#24307).

Breaking Changes

  • ParquetIO.withSplit was removed since splittable reading has been the default behavior since 2.35.0. The effect of
    this change is to drop support for non-splittable reading (Java)(#23832).
  • beam-sdks-java-extensions-google-cloud-platform-core is no longer a
    dependency of the Java SDK Harness. Some users of a portable runner (such as Dataflow Runner v2)
    may have an undeclared dependency on this package (for example using GCS with
    TextIO) and will now need to declare the dependency.
  • beam-sdks-java-core is no longer a dependency of the Java SDK Harness. Users of a portable
    runner (such as Dataflow Runner v2) will need to provide this package and its dependencies.
  • Slices now use the Beam Iterable Coder. This enables cross language use, but breaks pipeline updates
    if a Slice type is used as a PCollection element or State API element. (Go)#24339

Bugfixes

  • Fixed JmsIO acknowledgment issue (Java) (#20814)
  • Fixed Beam SQL CalciteUtils (Java) and Cross-language JdbcIO (Python) did not support JDBC CHAR/VARCHAR, BINARY/VARBINARY logical types (#23747, #23526).
  • Ensure iterated and emitted types are used with the generic register package are registered with the type and schema registries.(Go) (#23889)

List of Contributors

According to git shortlog, the following people contributed to the 2.44.0 release. Thank you to all contributors!

Ahmed Abualsaud
Ahmet Altay
Alex Merose
Alexey Inkin
Alexey Romanenko
Anand Inguva
Andrei Gurau
Andrej Galad
Andrew Pilloud
Ayush Sharma
Benjamin Gonzalez
Bjorn Pedersen
Brian Hulette
Bruno Volpato
Bulat Safiullin
Chamikara Jayalath
Chris Gavin
Damon Douglas
Danielle Syse
Danny McCormick
Darkhan Nausharipov
David Cavazos
Dmitry Repin
Doug Judd
Elias Segundo Antonio
Evan Galpin
Evgeny Antyshev
Heejong Lee
Henrik Heggelund-Berg
Israel Herraiz
Jack McCluskey
Jan Lukavsk\u00fd
Janek Bevendorff
Johanna \u00d6jeling
John J. Casey
Jozef Vilcek
Kanishk Karanawat
Kenneth Knowles
Kiley Sok
Laksh
Liam Miller-Cushon
Luke Cwik
MakarkinSAkvelon
Minbo Bae
Moritz Mack
Nancy Xu
Ning Kang
Nivaldo Tokuda
Oleh Borysevych
Pablo Estrada
Philippe Moussalli
Pranav Bhandari
Rebecca Szper
Reuven Lax
Rick Smit
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Thompson
Sam Whittle
Sanil Jain
Scott Strong
Shubham Krishna
Steven van Rossum
Svetak Sundhar
Thiago Nunes
Tianyang Hu
Trevor Gevers
Valentyn Tymofieiev
Vitaly Terentyev
Vladislav Chunikhin
Xinyu Liu
Yi Hu
Yichi Zhang

AdalbertMemSQL
agvdndor
andremissaglia
arne-alex
bullet03
camphillips22
capthiron
creste
fab-jul
illoise
kn1kn1
nancyxu123
peridotml
shinannegans
smeet07

Beam 2.43.0 release

18 Nov 00:58
v2.43.0
Compare
Choose a tag to compare

We are happy to present the new 2.43.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.43.0, check out the detailed release notes.

Highlights

  • Python 3.10 support in Apache Beam (#21458).
  • An initial implementation of a runner that allows us to run Beam pipelines on Dask. Try it out and give us feedback! (Python) (#18962).

I/Os

  • Decreased TextSource CPU utilization by 2.3x (Java) (#23193).
  • Fixed bug when using SpannerIO with RuntimeValueProvider options (Java) (#22146).
  • Fixed issue for unicode rendering on WriteToBigQuery (#10785)
  • Remove obsolete variants of BigQuery Read and Write, always using Beam-native variant
    (#23564 and #23559).
  • Bumped google-cloud-spanner dependency version to 3.x for Python SDK (#21198).

New Features / Improvements

  • Dataframe wrapper added in Go SDK via Cross-Language (with automatic expansion service). (Go) (#23384).
  • Name all Java threads to aid in debugging (#23049).
  • An initial implementation of a runner that allows us to run Beam pipelines on Dask. (Python) (#18962).
  • Allow configuring GCP OAuth scopes via pipeline options. This unblocks usages of Beam IOs that require additional scopes.
    For example, this feature makes it possible to access Google Drive backed tables in BigQuery (#23290).
  • An example for using Python RunInference from Java (#23290).

Breaking Changes

  • CoGroupByKey transform in Python SDK has changed the output typehint. The typehint component representing grouped values changed from List to Iterable,
    which more accurately reflects the nature of the arbitrarily large output collection. #21556 Beam users may see an error on transforms downstream from CoGroupByKey. Users must change methods expecting a List to expect an Iterable going forward. See document for information and fixes.
  • The PortableRunner for Spark assumes Spark 3 as default Spark major version unless configured otherwise using --spark_version.
    Spark 2 support is deprecated and will be removed soon (#23728).

Bugfixes

  • Fixed Python cross-language JDBC IO Connector cannot read or write rows containing Numeric/Decimal type values (#19817).

List of Contributors

According to git shortlog, the following people contributed to the 2.43.0 release. Thank you to all contributors!

Ahmed Abualsaud
AlexZMLyu
Alexey Romanenko
Anand Inguva
Andrew Pilloud
Andy Ye
Arnout Engelen
Benjamin Gonzalez
Bharath Kumarasubramanian
BjornPrime
Brian Hulette
Bruno Volpato
Chamikara Jayalath
Colin Versteeg
Damon
Daniel Smilkov
Daniela Martín
Danny McCormick
Darkhan Nausharipov
David Huntsperger
Denis Pyshev
Dmitry Repin
Evan Galpin
Evgeny Antyshev
Fernando Morales
Geddy05
Harshit Mehrotra
Iñigo San Jose Visiers
Ismaël Mejía
Israel Herraiz
Jan Lukavský
Juta Staes
Kanishk Karanawat
Kenneth Knowles
KevinGG
Kiley Sok
Liam Miller-Cushon
Luke Cwik
Mc
Melissa Pashniak
Moritz Mack
Ning Kang
Pablo Estrada
Philippe Moussalli
Pranav Bhandari
Rebecca Szper
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Thompson
Ryohei Nagao
Sam Rohde
Sam Whittle
Sanil Jain
Seunghwan Hong
Shane Hansen
Shubham Krishna
Shunsuke Otani
Steve Niemitz
Steven van Rossum
Svetak Sundhar
Thiago Nunes
Toran Sahu
Veronica Wasson
Vitaly Terentyev
Vladislav Chunikhin
Xinyu Liu
Yi Hu
Yixiao Shen
alexeyinkin
arne-alex
azhurkevich
bulat safiullin
bullet03
coldWater
dpcollins-google
egalpin
johnjcasey
liferoad
rvballada
shaojwu
tvalentyn

What's Changed

Read more

Beam 2.42.0 release

17 Oct 16:22
Compare
Choose a tag to compare

We are happy to present the new 2.42.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.42.0, check out the detailed release notes.

Highlights

  • Added support for stateful DoFns to the Go SDK.

New Features / Improvements

  • Added support for Zstd compression to the Python SDK.
  • Added support for Google Cloud Profiler to the Go SDK.
  • Added support for stateful DoFns to the Go SDK.

Breaking Changes

  • The Go SDK's Row Coder now uses a different single-precision float encoding for float32 types to match Java's behavior (#22629).

Bugfixes

  • Fixed Python cross-language JDBC IO Connector cannot read or write rows containing Timestamp type values 19817.

Known Issues

  • Go SDK doesn't yet support Slowly Changing Side Input pattern (#23106)
  • See a full list of open issues that affect this version.

What's Changed

Read more

Beam 2.41.0 release

23 Aug 19:07
Compare
Choose a tag to compare

We are happy to present the new 2.41.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.41.0, check out the detailed release notes.

I/Os

  • Projection Pushdown optimizer is now on by default for streaming, matching the behavior of batch pipelines since 2.38.0. If you encounter a bug with the optimizer, please file an issue and disable the optimizer using pipeline option --experiments=disable_projection_pushdown.

New Features / Improvements

  • Previously available in Java sdk, Python sdk now also supports logging level overrides per module. (#18222).

Breaking Changes

  • Projection Pushdown optimizer may break Dataflow upgrade compatibility for optimized pipelines when it removes unused fields. If you need to upgrade and encounter a compatibility issue, disable the optimizer using pipeline option --experiments=disable_projection_pushdown.

Deprecations

  • Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 or soon after (Spark runner) (#22094).
  • The modules amazon-web-services and
    kinesis for AWS Java SDK v1 are deprecated
    in favor of amazon-web-services2
    and will be eventually removed after a few Beam releases (Java) (#21249).

Bugfixes

  • Fixed a condition where retrying queries would yield an incorrect cursor in the Java SDK Firestore Connector (#22089).
  • Fixed plumbing allowed lateness in Go SDK. It was ignoring the user set value earlier and always used to set to 0. (#22474).

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.41.0 release. Thank you to all contributors!

Ahmed Abualsaud
Ahmet Altay
akashorabek
Alexey Inkin
Alexey Romanenko
Anand Inguva
andoni-guzman
Andrew Pilloud
Andrey
Andy Ye
Balázs Németh
Benjamin Gonzalez
BjornPrime
Brian Hulette
bulat safiullin
bullet03
Byron Ellis
Chamikara Jayalath
Damon Douglas
Daniel Oliveira
Daniel Thevessen
Danny McCormick
David Huntsperger
Dheeraj Gharde
Etienne Chauchot
Evan Galpin
Fernando Morales
Heejong Lee
Jack McCluskey
johnjcasey
Kenneth Knowles
Ke Wu
Kiley Sok
Liam Miller-Cushon
Lucas Nogueira
Luke Cwik
MakarkinSAkvelon
Manu Zhang
Minbo Bae
Moritz Mack
Naireen Hussain
Ning Kang
Oleh Borysevych
Pablo Estrada
pablo rodriguez defino
Pranav Bhandari
Rebecca Szper
Red Daly
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Thompson
Sam Whittle
Steven Niemitz
Valentyn Tymofieiev
Vincent Marquez
Vitaly Terentyev
Vlad
Vladislav Chunikhin
Yichi Zhang
Yi Hu
yirutang
Yixiao Shen
Yu Feng

v2.40.0

23 Jun 16:01
Compare
Choose a tag to compare

We are happy to present the new 2.40.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this
release.

For more information on changes in 2.40.0 check out the detailed release notes.

Highlights

  • Added RunInference API, a framework agnostic transform for inference. With this release, PyTorch and Scikit-learn are supported by the transform.
    See also example at apache_beam/examples/inference/pytorch_image_classification.py

I/Os

  • Upgraded to Hive 3.1.3 for HCatalogIO. Users can still provide their own version of Hive. (Java) (Issue-19554).

New Features / Improvements

  • Go SDK users can now use generic registration functions to optimize their DoFn execution. (BEAM-14347)
  • Go SDK users may now write self-checkpointing Splittable DoFns to read from streaming sources. (BEAM-11104)
  • Go SDK textio Reads have been moved to Splittable DoFns exclusively. (BEAM-14489)
  • Pipeline drain support added for Go SDK has now been tested. (BEAM-11106)
  • Go SDK users can now see heap usage, sideinput cache stats, and active process bundle stats in Worker Status. (BEAM-13829)
  • The serialization (pickling) library for Python is dill==0.3.1.1 (BEAM-11167)

Breaking Changes

  • The Go Sdk now requires a minimum version of 1.18 in order to support generics (BEAM-14347).
  • synthetic.SourceConfig field types have changed to int64 from int for better compatibility with Flink's use of Logical types in Schemas (Go) (BEAM-14173)
  • Default coder updated to compress sources used with BoundedSourceAsSDFWrapperFn and UnboundedSourceAsSDFWrapper.

Bugfixes

  • Fixed X (Java/Python) (BEAM-X).
  • Fixed Java expansion service to allow specific files to stage (BEAM-14160).
  • Fixed Elasticsearch connection when using both ssl and username/password (Java) (BEAM-14000)

Detailed list of PRs

Read more

Beam 2.39.0 release

25 May 18:47
Compare
Choose a tag to compare

We are happy to present the new 2.39.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this
release.

For more information on changes in 2.39.0 check out the detailed release notes.

I/Os

  • JmsIO gains the ability to map any kind of input to any subclass of javax.jms.Message (Java) (BEAM-16308).
  • JmsIO introduces the ability to write to dynamic topics (Java) (BEAM-16308).
    • A topicNameMapper must be set to extract the topic name from the input value.
    • A valueMapper must be set to convert the input value to JMS message.
  • Reduce number of threads spawned by BigqueryIO StreamingInserts (
    BEAM-14283).
  • Implemented Apache PulsarIO (BEAM-8218).

New Features / Improvements

  • Support for flink scala 2.12, because most of the libraries support version 2.12 onwards. (beam-14386)
  • 'Manage Clusters' JupyterLab extension added for users to configure usage of Dataproc clusters managed by Interactive Beam (Python) (BEAM-14130).
  • Pipeline drain support added for Go SDK (BEAM-11106). Note: this feature is not yet fully validated and should be treated as experimental in this release.
  • DataFrame.unstack(), DataFrame.pivot() and Series.unstack()
    implemented for DataFrame API (BEAM-13948, BEAM-13966).
  • Support for impersonation credentials added to dataflow runner in the Java and Python SDK (BEAM-14014).
  • Implemented Jupyterlab extension for managing Dataproc clusters (BEAM-14130).
  • ExternalPythonTransform API added for easily invoking Python transforms from
    Java (BEAM-14143).
  • Added Add support for Elasticsearch 8.x (BEAM-14003).
  • Shard aware Kinesis record aggregation (AWS Sdk v2), (BEAM-14104).
  • Upgrade to ZetaSQL 2022.04.1 (BEAM-14348).
  • Fixed ReadFromBigQuery cannot be used with the interactive runner (BEAM-14112).

Breaking Changes

  • Unused functions ShallowCloneParDoPayload(), ShallowCloneSideInput(), and ShallowCloneFunctionSpec() have been removed from the Go SDK's pipelinex package (BEAM-13739).
  • JmsIO requires an explicit valueMapper to be set (BEAM-16308). You can use the TextMessageMapper to convert String inputs to JMS TestMessages:
  JmsIO.<String>write()
        .withConnectionFactory(jmsConnectionFactory)
        .withValueMapper(new TextMessageMapper());
  • Coders in Python are expected to inherit from Coder. (BEAM-14351).
  • New abstract method metadata() added to io.filesystem.FileSystem in the
    Python SDK. (BEAM-14314)

Deprecations

Bugfixes

  • Fixed Java Spanner IO NPE when ProjectID not specified in template executions (Java) (BEAM-14405).
  • Fixed potential NPE in BigQueryServicesImpl.getErrorInfo (Java) (BEAM-14133).

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.39.0 release. Thank you to all contributors!

Ahmed Abualsaud,
Ahmet Altay,
Aizhamal Nurmamat kyzy,
Alexander Zhuravlev,
Alexey Romanenko,
Anand Inguva,
Andrei Gurau,
Andrew Pilloud,
Andy Ye,
Arun Pandian,
Arwin Tio,
Aydar Farrakhov,
Aydar Zainutdinov,
AydarZaynutdinov,
Balázs Németh,
Benjamin Gonzalez,
Brian Hulette,
Buqian Zheng,
Chamikara Jayalath,
Chun Yang,
Daniel Oliveira,
Daniela Martín,
Danny McCormick,
David Huntsperger,
Deepak Nagaraj,
Denise Case,
Esun Kim,
Etienne Chauchot,
Evan Galpin,
Hector Miuler Malpica Gallegos,
Heejong Lee,
Hengfeng Li,
Ilango Rajagopal,
Ilion Beyst,
Israel Herraiz,
Jack McCluskey,
Kamil Bregula,
Kamil Breguła,
Ke Wu,
Kenneth Knowles,
KevinGG,
Kiley,
Kiley Sok,
Kyle Weaver,
Liam Miller-Cushon,
Luke Cwik,
Marco Robles,
Matt Casters,
Michael Li,
MiguelAnzoWizeline,
Milan Patel,
Minbo Bae,
Moritz Mack,
Nick Caballero,
Niel Markwick,
Ning Kang,
Oskar Firlej,
Pablo Estrada,
Pavel Avilov,
Reuven Lax,
Reza Rokni,
Ritesh Ghorse,
Robert Bradshaw,
Robert Burke,
Ryan Thompson,
Sam Whittle,
Steven Niemitz,
Thiago Nunes,
Tomo Suzuki,
Valentyn Tymofieiev,
Victor,
Yi Hu,
Yichi Zhang,
Yiru Tang,
ahmedabu98,
andoni-guzman,
brachipa,
bulat safiullin,
bullet03,
dannymartinm,
daria.malkova,
dpcollins-google,
egalpin,
emily,
fbeevikm,
johnjcasey,
kileys,
msbukal@google.com,
nguyennk92,
pablo rodriguez defino,
rszper,
rvballada,
sachinag,
tvalentyn,
vachan-shetty,
yirutang

Beam 2.38.0 release

20 Apr 23:03
Compare
Choose a tag to compare

We are happy to present the new 2.38.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.38.0 check out the detailed release notes.

I/Os

  • Introduce projection pushdown optimizer to the Java SDK (BEAM-12976). The optimizer currently only works on the BigQuery Storage API, but more I/Os will be added in future releases. If you encounter a bug with the optimizer, please file a JIRA and disable the optimizer using pipeline option --experiments=disable_projection_pushdown.
  • A new IO for Neo4j graph databases was added. (BEAM-1857) It has the ability to update nodes and relationships using UNWIND statements and to read data using cypher statements with parameters.
  • amazon-web-services2 has reached feature parity and is finally recommended over the earlier amazon-web-services and kinesis modules (Java). These will be deprecated in one of the next releases (BEAM-13174).

New Features / Improvements

  • Pipeline dependencies supplied through --requirements_file will now be staged to the runner using binary distributions (wheels) of the PyPI packages for linux_x86_64 platform (BEAM-4032). To restore the behavior to use source distributions, set pipeline option --requirements_cache_only_sources. To skip staging the packages at submission time, set pipeline option --requirements_cache=skip (Python).
  • The Flink runner now supports Flink 1.14.x (BEAM-13106).
  • Interactive Beam now supports remotely executing Flink pipelines on Dataproc (Python) (BEAM-14071).

Breaking Changes

  • (Python) Previously DoFn.infer_output_types was expected to return Iterable[element_type] where element_type is the PCollection elemnt type. It is now expected to return element_type. Take care if you have overriden infer_output_type in a DoFn (this is not common). See BEAM-13860.
  • (amazon-web-services2) The types of awsRegion / endpoint in AwsOptions changed from String to Region / URI (BEAM-13563).

Deprecations

  • Beam 2.38.0 will be the last minor release to support Flink 1.11.
  • (amazon-web-services2) Client providers (withXYZClientProvider()) as well as IO specific RetryConfigurations are deprecated, instead use withClientConfiguration() or AwsOptions to configure AWS IOs / clients.
    Custom implementations of client providers shall be replaced with a respective ClientBuilderFactory and configured through AwsOptions (BEAM-13563).

Bugfixes

  • Fix S3 copy for large objects (Java) (BEAM-14011)
  • Fix quadratic behavior of pipeline canonicalization (Go) (BEAM-14128)
    • This caused unnecessarily long pre-processing times before job submission for large complex pipelines.
  • Fix pyarrow version parsing (Python)(BEAM-14235)

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.38.0 release. Thank you to all contributors!

abhijeet-lele
Ahmet Altay
akustov
Alexander
Alexander Zhuravlev
Alexey Romanenko
AlikRodriguez
Anand Inguva
andoni-guzman
andreukus
Andy Ye
Ankur Goenka
ansh0l
Artur Khanin
Aydar Farrakhov
Aydar Zainutdinov
Benjamin Gonzalez
Brian Hulette
brucearctor
bulat safiullin
bullet03
Carl Mastrangelo
Chamikara Jayalath
Chun Yang
Daniela Martín
Daniel Oliveira
Danny McCormick
daria.malkova
David Cavazos
David Huntsperger
dmitryor
Dmytro Sadovnychyi
dpcollins-google
egalpin
Elias Segundo Antonio
emily
Etienne Chauchot
Hengfeng Li
Ismaël Mejía
Israel Herraiz
Jack McCluskey
Jakub Kukul
Janek Bevendorff
Jeff Klukas
Johan Sternby
Kamil Breguła
Kenneth Knowles
Ke Wu
Kiley
Kyle Weaver
laraschmidt
Lara Schmidt
LE QUELLEC Olivier
Luka Kalinovcic
Luke Cwik
Marcin Kuthan
masahitojp
Masato Nakamura
Matt Casters
Melissa Pashniak
Michael Li
Miguel Hernandez
Moritz Mack
mosche
nancyxu123
Nathan J Mehl
Niel Markwick
Ning Kang
Pablo Estrada
paul-tlh
Pavel Avilov
Rahul Iyer
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Skraba
Ryan Thompson
Sam Whittle
Seth Vargo
sp029619
Steven Niemitz
Thiago Nunes
Udi Meiri
Valentyn Tymofieiev
Victor
vitaly.terentyev
Yichi Zhang
Yi Hu
yirutang
Zachary Houfek
Zoe