Releases: apache/beam

Beam 2.37.0 release

04 Mar 19:21

We are happy to present the new 2.37.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.37.0, check out the detailed release notes.

Highlights

  • Java 17 support for Dataflow (BEAM-12240).
    • Users of Dataflow Runner V2 may see issues with the state cache due to inaccurate object sizes (BEAM-13695).
    • ZetaSQL is currently unsupported (issue).
  • Python 3.9 support in Apache Beam (BEAM-12000).
    • Dataflow support for Python 3.9 is expected to be available with 2.37.0,
      but may not be fully available yet when the release is announced (BEAM-13864).
    • Users of Dataflow Runner V2 can run Python 3.9 pipelines with 2.37.0 release right away.

I/Os

  • Go SDK now has wrappers for the following Cross Language Transforms from Java, along with automatic expansion service startup for each.

New Features / Improvements

  • DataFrame API now supports pandas 1.4.x (BEAM-13605).
  • Go SDK DoFns can now observe trigger panes directly (BEAM-13757).

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.37.0 release. Thank you to all contributors!

Aizhamal Nurmamat kyzy
Alexander
Alexander Chermenin
Alexandr Zhuravlev
Alexey Romanenko
Anand Inguva
andoni-guzman
andreukus
Andy Ye
Artur Khanin
Aydar Farrakhov
Aydar Zainutdinov
AydarZaynutdinov
Benjamin Gonzalez
Brian Hulette
Chamikara Jayalath
Daniel Oliveira
Danny McCormick
daria-malkova
daria.malkova
darshan-sj
David Huntsperger
dprieto91
emily
Etienne Chauchot
Fernando Morales
Heejong Lee
Ismaël Mejía
Jack McCluskey
Jan Lukavský
johnjcasey
Kamil Breguła
kellen
Kenneth Knowles
kileys
Kyle Weaver
Luke Cwik
Marcin Kuthan
Marco Robles
Matt Rudary
Miguel Hernandez
Milena Bukal
Moritz Mack
Mostafa Aghajani
Ning Kang
Pablo Estrada
Pavel Avilov
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Sam Whittle
Sandy Chapman
Sergey Kalinin
Thiago Nunes
thorbjorn444
Tim Robertson
Tomo Suzuki
Valentyn Tymofieiev
Victor
Victor Chen
Vitaly Ivanov
Yichi Zhang

Beam 2.36.0 release

08 Feb 00:08

We are happy to present the new 2.36.0 release of Apache Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.36.0, check out the detailed release notes.

I/Os

  • Support for stopReadTime on KafkaIO SDF (Java) (BEAM-13171).

New Features / Improvements

  • Added support for cloudpickle as a pickling library for Python SDK (BEAM-8123). To use cloudpickle, set pipeline option: --pickler_lib=cloudpickle
  • Added option to specify triggering frequency when streaming to BigQuery (Python) (BEAM-12865).
  • Added option to enable caching uploaded artifacts across job runs for Python Dataflow jobs (BEAM-13459). To enable, set pipeline option: --enable_artifact_caching. This will be enabled by default in a future release.
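
The two Python pipeline options named in this list are plain command-line flags. A minimal sketch of assembling them, assuming a hypothetical helper named build_pipeline_args (only the two flag strings come from the notes above):

```python
def build_pipeline_args(use_cloudpickle=False, cache_artifacts=False):
    """Return command-line style pipeline options as a list of strings.

    build_pipeline_args is a hypothetical helper for illustration; only
    the two flag strings are taken from the 2.36.0 release notes.
    """
    args = []
    if use_cloudpickle:
        # Switch the Python SDK pickling library to cloudpickle (BEAM-8123).
        args.append("--pickler_lib=cloudpickle")
    if cache_artifacts:
        # Reuse uploaded artifacts across Dataflow job runs (BEAM-13459).
        args.append("--enable_artifact_caching")
    return args


if __name__ == "__main__":
    print(build_pipeline_args(use_cloudpickle=True, cache_artifacts=True))
```

A list like this would typically be handed to the Python SDK's PipelineOptions, or the flags can be appended directly to the command line that launches the pipeline.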

Breaking Changes

  • Updated jedis from 3.x to 4.x in Java RedisIO. If you are using RedisIO and also use jedis directly, please refer to this page to update it (BEAM-12092).
  • The datatype of timestamp fields in SqsMessage for AWS IOs for SDK v2 was changed from String to long, and the visibility of all fields was changed from package private to public (BEAM-13638).
  • Properly check output timestamps on elements output from DoFns, timers, and onWindowExpiration in Java (BEAM-12931).
  • Fixed a bug with DeferredDataFrame.xs when used with a non-tuple key
    (BEAM-13421).

Known Issues

  • Users may encounter an unexpected java.lang.ArithmeticException when outputting a timestamp
    for an element further than allowedSkew from an allowed DoFn skew set to a value greater than
    Integer.MAX_VALUE.
  • See a full list of open issues that affect this version.

List of Contributors

According to git shortlog, the following people contributed to the 2.36.0 release. Thank you to all contributors!

Ada Wong
Ahmet Altay
Alexander
Alexander Dahl
Alexandr Zhuravlev
Alexey Romanenko
AlikRodriguez
Anand Inguva
Andrew Pilloud
Andy Ye
Arkadiusz Gasiński
Artur Khanin
Arun Pandian
Aydar Farrakhov
Aydar Zainutdinov
AydarZaynutdinov
Benjamin Gonzalez
Brian Hulette
Chamikara Jayalath
Daniel Collins
Daniel Oliveira
Daniel Thevessen
Daniela Martín
David Hinkes
David Huntsperger
Emily Ye
Etienne Chauchot
Evan Galpin
Heejong Lee
Ilya
Ilya Kozyrev
In-Ho Yi
Jack McCluskey
Janek Bevendorff
Jarek Potiuk
Ke Wu
KevinGG
Kyle Hersey
Kyle Weaver
Luís Bianchin
Luke Cwik
Masato Nakamura
Matthias Baetens
Mehdi Drissi
Melissa Pashniak
Michel Davit
Miguel Hernandez
MiguelAnzoWizeline
Milena Bukal
Moritz Mack
Mostafa Aghajani
Nathan J Mehl
Niel Markwick
Ning Kang
Pablo Estrada
Pavel Avilov
Quentin Sommer
Reuben van Ammers
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Thompson
Sam Whittle
Sayat
Sergei Lebedev
Sergey Kalinin
Steve Niemitz
Talat Uyarer
Thiago Nunes
Tianyang Hu
Tim Robertson
Valentyn Tymofieiev
Vitaly Ivanov
Yichi Zhang
Yiru Tang
Yu Feng
Yu ISHIKAWA
Zachary Houfek
blais
daria-malkova
daria.malkova
darshan-sj
dpcollins-google
emily
ewianda
johnjcasey
kileys
lam206
laraschmidt
mosche
msbukal@google.com
tvalentyn

Beam 2.35.0 release

30 Dec 02:00

We are happy to present the new 2.35.0 release of Apache Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.35.0, check out the detailed release notes.

Highlights

  • MultiMap side inputs are now supported by the Go SDK (BEAM-3293).
  • Side inputs are supported within Splittable DoFns for Dataflow Runner V1 and Dataflow Runner V2. (BEAM-12522).
  • Upgraded the Log4j version used in test suites (Apache Beam testing environment only, not for end user consumption) to 2.17.0 (BEAM-13434).
    Note that Apache Beam versions do not depend on the Log4j 2 dependency (log4j-core) impacted by CVE-2021-44228.
    However, we urge users to update direct and indirect dependencies (if any) on Log4j 2 to the latest version by updating their build configuration and redeploying impacted pipelines.

I/Os

  • We changed the data type for ranges in JdbcIO.readWithPartitions from int to long (BEAM-13149).
    This is a relatively minor breaking change, which we're implementing to improve the usability of the transform without increasing cruft.
    This transform is relatively new, so we may implement other breaking changes in the future to improve its usability.
  • Side inputs are supported within Splittable DoFns for Dataflow Runner V1 and Dataflow Runner V2. (BEAM-12522).

New Features / Improvements

  • Added custom delimiters to Python TextIO reads (BEAM-12730).
  • Added escapechar parameter to Python TextIO reads (BEAM-13189).
  • Splittable reading is enabled by default while reading data with ParquetIO (BEAM-12070).
  • DoFn Execution Time metrics added to Go (BEAM-13001).
  • Cross-bundle side input caching is now available in the Go SDK for runners that support the feature by setting the EnableSideInputCache hook (BEAM-11097).
  • Upgraded the GCP Libraries BOM version to 24.0.0 and associated dependencies (BEAM-11205). For Google Cloud client library versions set by this BOM,
    see this table.
  • Removed avro-python3 dependency in AvroIO. Fastavro has already been our Avro library of choice on Python 3. The Boolean use_fastavro is left for API compatibility, but will have no effect (BEAM-13016).
  • MultiMap side inputs are now supported by the Go SDK (BEAM-3293).
  • Remote packages can now be downloaded from locations supported by apache_beam.io.filesystems. The files will be downloaded on Stager and uploaded to the staging location. For more information, see BEAM-11275.

Breaking Changes

  • A new URN convention was adopted for cross-language transforms and existing URNs were updated. This may break advanced use cases, for example, if a custom expansion service is used to connect different Beam Java and Python versions (BEAM-12047).
  • The upgrade to Calcite 1.28.0 introduces a breaking change in the SUBSTRING function in SqlTransform, when used with the Calcite dialect (BEAM-13099, CALCITE-4427).
  • ListShards (with DescribeStreamSummary) is used instead of DescribeStream to list shards in Kinesis streams (AWS SDK v2). Due to this change, as mentioned in AWS documentation, for fine-grained IAM policies it is required to update them to allow calls to ListShards and DescribeStreamSummary APIs. For more information, see Controlling Access to Amazon Kinesis Data Streams (BEAM-13233).

Deprecations

  • Non-splittable reading is deprecated while reading data with ParquetIO (BEAM-12070).

Bugfixes

  • Properly map main input windows to side input windows by default (Go)
    (BEAM-11087).
  • Fixed data loss when writing to DynamoDB without setting deduplication key names (Java)
    (BEAM-13009).
  • Go SDK Examples now have types and functions registered. (Go) (BEAM-5378)

Known Issues

  • Users of beam-sdks-java-io-hcatalog (and beam-sdks-java-extensions-sql-hcatalog) must take care to override the transitive log4j dependency when they add a hive dependency (BEAM-13499).

List of Contributors

According to git shortlog, the following people contributed to the 2.35.0 release. Thank you to all contributors!

Ahmet Altay
Alexandr Zhuravlev
Alexey Romanenko
AlikRodriguez
Anand Inguva
Andrew Pilloud
Ankur Goenka
Anthony Sottile
Artur Khanin
Aydar Farrakhov
Aydar Zainutdinov
Benjamin Gonzalez
brachipa
Brian Hulette
Calvin Leung
Chamikara Jayalath
Chris Gray
Damon Douglas
Daniel Collins
Daniel Oliveira
daria.malkova
darshan-sj
David Huntsperger
Dmitrii Kuzin
dpcollins-google
dprieto
egalpin
Etienne Chauchot
Eugene Nikolaiev
Fernando Morales
Hector Lagos
Heejong Lee
Ilya Kozyrev
Iñigo San Jose Visiers
Jack McCluskey
Jiayang Wu
jrhy
Kenneth Knowles
KevinGG
kileys
klmilam
Kyle Weaver
Luís Bianchin
Luke Cwik
Melissa Pashniak
Michael Luckey
Miguel Hernandez
Milena Bukal
Minbo Bae
minherz
Moritz Mack
mosche
Natalie
Ning Kang
Pablo Estrada
Pavel Avilov
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Rogan Morrow
Ruslan Altynnikov
Sam Whittle
Sergey Kalinin
Slava Chernyak
Svetak Sundhar
Tianyang Hu
Tim Robertson
Tomo Suzuki
tuorhador
Udi Meiri
vachan-shetty
Valentyn Tymofieiev
Yichi Zhang
zhoufek

Beam 2.34.0 release

11 Nov 20:03

We are happy to present the new 2.34.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.34.0, check out the detailed release notes.

Highlights

  • The Beam Java API for Calcite SqlTransform is no longer experimental (BEAM-12680).
  • Python's ParDo (Map, FlatMap, etc.) transforms now support a with_exception_handling option for easily ignoring bad records and implementing the dead letter pattern.
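
The dead letter pattern mentioned in the last highlight can be pictured without Beam at all. The following is a plain-Python sketch of the idea, not Beam's implementation; map_with_dead_letters is a hypothetical helper name, and the choice to store the failing record together with a text form of the exception is an assumption made for illustration.

```python
def map_with_dead_letters(fn, records):
    """Apply fn to each record, routing failures to a separate list.

    Returns (good, bad): good holds the results of successful calls,
    bad holds (record, error_text) pairs for records where fn raised.
    """
    good, bad = [], []
    for record in records:
        try:
            good.append(fn(record))
        except Exception as exc:  # broad on purpose: any failure is a dead letter
            bad.append((record, repr(exc)))
    return good, bad


# Parsing integers: the malformed record is set aside instead of failing the run.
parsed, dead_letters = map_with_dead_letters(int, ["1", "2", "oops", "4"])

if __name__ == "__main__":
    print(parsed)
    print(dead_letters)
```

The point of the pattern is that one bad record no longer stops processing: the main output carries on, and the failures remain available for inspection or replay.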

I/Os

  • ReadFromBigQuery and ReadAllFromBigQuery now run queries with BATCH priority by default. The query_priority parameter is introduced to the same transforms to allow configuring the query priority (Python) (BEAM-12913).
  • [EXPERIMENTAL] Support for BigQuery Storage Read API added to ReadFromBigQuery. The newly introduced method parameter can be set as DIRECT_READ to use the Storage Read API. The default is EXPORT which invokes a BigQuery export request. (Python) (BEAM-10917).
  • [EXPERIMENTAL] Added use_native_datetime parameter to ReadFromBigQuery to configure the return type of DATETIME fields when using ReadFromBigQuery. This parameter can only be used when method = DIRECT_READ (Python) (BEAM-10917).
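
The three parameters described in this list interact: the defaults are EXPORT and BATCH, and use_native_datetime is only valid with DIRECT_READ. A small sketch that encodes those rules, assuming a hypothetical helper read_from_bigquery_kwargs that only builds a dictionary and does not call Beam; passing the values as plain strings is also an assumption.

```python
def read_from_bigquery_kwargs(method="EXPORT",
                              query_priority="BATCH",
                              use_native_datetime=False):
    """Build keyword arguments mirroring the rules stated in the notes.

    Hypothetical helper: it checks the one documented constraint and
    returns a dict; it does not import or invoke Apache Beam.
    """
    if use_native_datetime and method != "DIRECT_READ":
        raise ValueError(
            "use_native_datetime can only be used when method is DIRECT_READ")
    return {
        "method": method,
        "query_priority": query_priority,
        "use_native_datetime": use_native_datetime,
    }


if __name__ == "__main__":
    print(read_from_bigquery_kwargs(method="DIRECT_READ", use_native_datetime=True))
```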

New Features / Improvements

  • Upgrade to Calcite 1.26.0 (BEAM-9379).
  • Added a new dataframe extra to the Python SDK that tracks pandas versions
    we've verified compatibility with. We now recommend installing Beam with pip install apache-beam[dataframe] when you intend to use the DataFrame API
    (BEAM-12906).
  • Added an example of deploying a Python Apache Beam job with a Spark cluster.

Breaking Changes

  • SQL Rows are no longer flattened (BEAM-5505).
  • [Go SDK] beam.TryCrossLanguage's signature now matches beam.CrossLanguage. Like other Try functions it returns an error instead of panicking. (BEAM-9918).
  • BEAM-12925 was fixed. It used to silently pass incorrect null data read from JdbcIO. Pipelines affected by this will now start throwing failures instead of silently passing incorrect data.

Bugfixes

  • Fixed error while writing multiple DeferredFrames to csv (Python) (BEAM-12701).
  • Fixed error when importing the DataFrame API with pandas 1.0.x installed (BEAM-12945).
  • Fixed top.SmallestPerKey implementation in the Go SDK (BEAM-12946).

List of Contributors

According to git shortlog, the following people contributed to the 2.34.0 release. Thank you to all contributors!

Ahmet Altay,
Aizhamal Nurmamat kyzy,
Alex Amato,
Alexander Chermenin,
Alexey Romanenko,
AlikRodriguez,
Andrew Pilloud,
Andy Xu,
Ankur Goenka,
Aydar Farrakhov,
Aydar Zainutdinov,
Aydar Zaynutdinov,
AydarZaynutdinov,
Benjamin Gonzalez,
BenWhitehead,
Brachi Packter,
Brian Hulette,
Bu Sun Kim,
Chamikara Jayalath,
Chris Gray,
Chuck Yang,
Chun Yang,
Claire McGinty,
comet,
Daniel Collins,
Daniel Oliveira,
Daniel Thevessen,
daria.malkova,
David Cavazos,
David Huntsperger,
Dmytro Kozhevin,
dpcollins-google,
Eduardo Sánchez López,
Elias Djurfeldt,
emily,
Emily Ye,
Enis Sert,
Etienne Chauchot,
Fernando Morales,
Heejong Lee,
Ihor Indyk,
Ismaël Mejía,
Israel Herraiz,
Jack McCluskey,
Jonathan Hourany,
Judah Rand,
Kenneth Knowles,
KevinGG,
Ke Wu,
kileys,
Kyle Weaver,
Luke Cwik,
masahitojp,
MiguelAnzoWizeline,
Minbo Bae,
Niels Basjes,
Ning Kang,
Pablo Estrada,
pareshsarafmdb,
Paul Féraud,
Piotr Szczepanik,
Reuven Lax,
Ritesh Ghorse,
R. Miles McCain,
Robert Bradshaw,
Robert Burke,
Rogan Morrow,
Ruwan Lambrichts,
rvballada,
Ryan Thompson,
Sam Rohde,
Sam Whittle,
Ștefan Istrate,
Steve Niemitz,
Thomas Li Fredriksen,
Tomo Suzuki,
tvalentyn,
Udi Meiri,
Vachan,
Valentyn Tymofieiev,
Vincent Marquez,
WinsonT,
Yichi Zhang,
Yifan Mai,
Yilei "Dolee" Yang,
zhoufek

Beam 2.33.0 release

12 Oct 16:53

We are happy to present the new 2.33.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.33.0, check out the detailed release notes.

Highlights

  • Go SDK is no longer experimental, and is officially part of the Beam release process.
    • Matching Go SDK containers are published on release.
    • Batch usage is well supported, and tested on Flink, Spark, and the Python Portable Runner.
      • SDK Tests are also run against Google Cloud Dataflow, but this doesn't indicate reciprocal support.
    • The SDK supports Splittable DoFns, Cross Language transforms, and most Beam Model basics.
    • Go Modules are now used for dependency management.
      • This is a breaking change, see Breaking Changes for resolution.
      • Easier path to contribute to the Go SDK, no need to set up a GOPATH.
      • Minimum Go version is now Go v1.16.
    • See the announcement blogpost for full information once published.

New Features / Improvements

  • Projection pushdown in SchemaIO (BEAM-12609).
  • Upgrade Flink runner to Flink versions 1.13.2, 1.12.5 and 1.11.4 (BEAM-10955).

Breaking Changes

  • Since release 2.30.0, "The AvroCoder changes for BEAM-2303 [changed] the reader/writer from the Avro ReflectDatum* classes to the SpecificDatum* classes" (Java). This default behavior change has been reverted in this release. Use the useReflectApi setting to control it (BEAM-12628).

Deprecations

  • Python GBK will stop supporting unbounded PCollections that have global windowing and a default trigger in Beam 2.34. This can be overridden with --allow_unsafe_triggers. (BEAM-9487).
  • Python GBK will start requiring safe triggers or the --allow_unsafe_triggers flag starting with Beam 2.34. (BEAM-9487).

Bugfixes

  • UnsupportedOperationException when reading from BigQuery tables and converting
    TableRows to Beam Rows (Java)
    (BEAM-12479).
  • SDFBoundedSourceReader behaves much slower compared with the original behavior
    of BoundedSource (Python)
    (BEAM-12781).
  • ORDER BY column not in SELECT crashes (ZetaSQL)
    (BEAM-12759).

Known Issues

  • Spark 2.x users will need to update Spark's Jackson runtime dependencies (spark.jackson.version) to at least version 2.9.2, due to Beam updating its dependencies.
  • See a full list of open issues that affect this version.
  • Go SDK jobs may produce "Failed to deduce Step from MonitoringInfo" messages following successful job execution. The messages are benign and don't indicate job failure. These are due to not yet handling PCollection metrics.

List of Contributors

According to git shortlog, the following people contributed to the 2.33.0 release. Thank you to all contributors!

Ahmet Altay,
Alex Amato,
Alexey Romanenko,
Andreas Bergmeier,
Andres Rodriguez,
Andrew Pilloud,
Andy Xu,
Ankur Goenka,
anthonyqzhu,
Benjamin Gonzalez,
Bhupinder Sindhwani,
Chamikara Jayalath,
Claire McGinty,
Daniel Mateus Pires,
Daniel Oliveira,
David Huntsperger,
Dylan Hercher,
emily,
Emily Ye,
Etienne Chauchot,
Eugene Nikolaiev,
Heejong Lee,
iindyk,
Iñigo San Jose Visiers,
Ismaël Mejía,
Jack McCluskey,
Jan Lukavský,
Jeff Ruane,
Jeremy Lewi,
KevinGG,
Ke Wu,
Kyle Weaver,
lostluck,
Luke Cwik,
Marwan Tammam,
masahitojp,
Mehdi Drissi,
Minbo Bae,
Ning Kang,
Pablo Estrada,
Pascal Gillet,
Pawas Chhokra,
Reuven Lax,
Ritesh Ghorse,
Robert Bradshaw,
Robert Burke,
Rodrigo Benenson,
Ryan Thompson,
Saksham Gupta,
Sam Rohde,
Sam Whittle,
Sayat,
Sayat Satybaldiyev,
Siyuan Chen,
Slava Chernyak,
Steve Niemitz,
Steven Niemitz,
tvalentyn,
Tyson Hamilton,
Udi Meiri,
vachan-shetty,
Venkatramani Rajgopal,
Yichi Zhang,
zhoufek

Beam 2.32.0 release

26 Aug 20:48

We are happy to present the new 2.32.0 release of Apache Beam. This release includes both improvements and new functionality.

See the download page for this release.

For more information on changes in 2.32.0, check out the detailed release notes.

Highlights

I/Os

  • Added ability to use JdbcIO.Write.withResults without statement and preparedStatementSetter. (BEAM-12511)
  • Added ability to register URI schemes to use the S3 protocol via FileIO. (BEAM-12435).
  • Respect number of shards set in SnowflakeWrite batch mode. (BEAM-12715)
  • Java SDK: Update Google Cloud Healthcare IO connectors from using v1beta1 to using the GA version.

New Features / Improvements

  • Add support to convert Beam Schema to Avro Schema for JDBC LogicalTypes:
    VARCHAR, NVARCHAR, LONGVARCHAR, LONGNVARCHAR, DATE, TIME
    (Java)(BEAM-12385).
  • Reading from JDBC source by partitions (Java) (BEAM-12456).
  • PubsubIO can now write to a dead-letter topic after a parsing error (Java)(BEAM-12474).
  • New append-only option for Elasticsearch sink (Java) (BEAM-12601).

Breaking Changes

Deprecations

Known Issues

  • Fixed race condition in RabbitMqIO causing duplicate acks (Java) (BEAM-6516).

List of Contributors

According to git shortlog, the following people contributed to the 2.32.0 release. Thank you to all contributors!

Ahmet Altay, Ajo Thomas, Alex Amato, Alexey Romanenko, Alex Koay, allenpradeep, Anant Damle, Andrew Pilloud, Ankur Goenka, Ashwin Ramaswami, Benjamin Gonzalez, BenWhitehead, Blake Williams, Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Daniel Oliveira, Daniel Thevessen, daria-malkova, David Cavazos, David Huntsperger, dennisylyung, Dennis Yung, dmkozh, egalpin, emily, Esun Kim, Gabriel Melo de Paula, Harch Vardhan, Heejong Lee, heidimhurst, hoshimura, Iñigo San Jose Visiers, Ismaël Mejía, Jack McCluskey, Jan Lukavský, Justin King, Kenneth Knowles, KevinGG, Ke Wu, kileys, Kyle Weaver, Luke Cwik, Maksym Skorupskyi, masahitojp, Matthew Ouyang, Matthias Baetens, Matt Rudary, MiguelAnzoWizeline, Miguel Hernandez, Nikita Petunin, Ning Ding, Ning Kang, odidev, Pablo Estrada, Pascal Gillet, rafal.ochyra, raphael.sanamyan, Reuven Lax, Robert Bradshaw, Robert Burke, roger-mike, Ryan McDowell, Sam Rohde, Sam Whittle, Siyuan Chen, Teng Qiu, Tianzi Cai, Tobias Hermann, Tomo Suzuki, tvalentyn, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vitaly Terentyev, Yichi Zhang, Yifan Mai, yoshiki.obata, Yu Feng, YuqiHuai, yzhang559, Zachary Houfek, zhoufek

Beam 2.31.0 release

08 Jul 17:30

We are happy to present the new 2.31.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.31.0, check out the detailed release notes.

Highlights

I/Os

  • Fixed bug in ReadFromBigQuery when a RuntimeValueProvider is used as value of table argument (Python) (BEAM-12514).

New Features / Improvements

  • CREATE FUNCTION DDL statement added to Calcite SQL syntax. JAR and AGGREGATE are now reserved keywords. (BEAM-12339).
  • Flink 1.13 is now supported by the Flink runner (BEAM-12277).
  • DatastoreIO: Write and delete operations now follow automatic gradual ramp-up,
    in line with best practices (Java/Python) (BEAM-12260, BEAM-12272).
  • Python TriggerFn has a new may_lose_data method to signal potential data loss. Default behavior assumes safe (necessary for backwards compatibility). See Deprecations for potential impact of overriding this. (BEAM-9487).
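
The notes do not spell out the ramp-up schedule used by DatastoreIO. As a rough illustration only, the sketch below assumes the commonly cited "500/50/5" guideline (start at 500 operations per second, grow by 50% every 5 minutes) and simplifies it to a step function. The constants and the rampup_budget name are assumptions, not values taken from Beam's source.

```python
BASE_OPS_PER_SECOND = 500   # assumed starting budget
GROWTH_FACTOR = 1.5         # assumed +50% per interval
INTERVAL_SECONDS = 5 * 60   # assumed ramp-up interval


def rampup_budget(elapsed_seconds):
    """Return the allowed operations per second after elapsed_seconds.

    Simplified step function: the budget grows by GROWTH_FACTOR once per
    completed interval. Hypothetical helper for illustration.
    """
    completed_intervals = int(elapsed_seconds // INTERVAL_SECONDS)
    return int(BASE_OPS_PER_SECOND * GROWTH_FACTOR ** completed_intervals)


if __name__ == "__main__":
    for minutes in (0, 5, 10, 15):
        print(minutes, rampup_budget(minutes * 60))
```

A throttle built on such a budget would delay writes whenever the number of operations in the current second exceeds the returned value.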

Breaking Changes

  • Python Row objects are now sensitive to field order. So Row(x=3, y=4) is no
    longer considered equal to Row(y=4, x=3) (BEAM-11929).
  • Kafka Beam SQL tables now ascribe meaning to the LOCATION field; previously
    it was ignored if provided.
  • TopCombineFn disallow compare as its argument (Python) (BEAM-7372).
  • Drop support for Flink 1.10 (BEAM-12281).
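
The Row change in this list means equality now depends on field order as well as field values. The sketch below uses a hypothetical stand-in class, OrderedRow, to show that behavior in plain Python; it is not Beam's Row and makes no claim about how Row is implemented.

```python
class OrderedRow:
    """Stand-in record whose equality is sensitive to field order."""

    def __init__(self, **fields):
        # Keyword arguments keep their call order, so the tuple of
        # (name, value) pairs records the order the fields were given in.
        self._fields = tuple(fields.items())

    def __eq__(self, other):
        if not isinstance(other, OrderedRow):
            return NotImplemented
        return self._fields == other._fields

    def __hash__(self):
        return hash(self._fields)

    def __repr__(self):
        inner = ", ".join(f"{name}={value!r}" for name, value in self._fields)
        return f"OrderedRow({inner})"


same_order = OrderedRow(x=3, y=4) == OrderedRow(x=3, y=4)
swapped_order = OrderedRow(x=3, y=4) == OrderedRow(y=4, x=3)

if __name__ == "__main__":
    print(same_order, swapped_order)
```

Code that built rows with the same fields in different orders and relied on them comparing equal needs to pick one order consistently.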

Deprecations

  • Python GBK will stop supporting unbounded PCollections that have global windowing and a default trigger in Beam 2.33. This can be overridden with --allow_unsafe_triggers. (BEAM-9487).
  • Python GBK will start requiring safe triggers or the --allow_unsafe_triggers flag starting with Beam 2.33. (BEAM-9487).

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.31.0 release. Thank you to all contributors!

Ahmet Altay, ajo thomas, Alan Myrvold, Alex Amato, Alexey Romanenko,
AlikRodriguez, Anant Damle, Andrew Pilloud, Benjamin Gonzalez, Boyuan Zhang,
Brian Hulette, Chamikara Jayalath, Daniel Oliveira, David Cavazos,
David Huntsperger, David Moravek, Dmytro Kozhevin, dpcollins-google, Emily Ye,
Ernesto Valentino, Evan Galpin, Fernando Morales, Heejong Lee, Ismaël Mejía,
Jan Lukavský, Josias Rico, jrynd, Kenneth Knowles, Ke Wu, kileys, Kyle Weaver,
masahitojp, Matthias Baetens, Maximilian Michels, Milena Bukal,
Nathan J. Mehl, Pablo Estrada, Peter Sobot, Reuven Lax, Robert Bradshaw,
Robert Burke, roger-mike, Sam Rohde, Sam Whittle, Stephan Hoyer, Tom Underhill,
tvalentyn, Uday Singh, Udi Meiri, Vitaly Terentyev, Xinyu Liu, Yichi Zhang,
Yifan Mai, yoshiki.obata, zhoufek

Beam 2.30.0 release

09 Jun 21:48

We are happy to present the new 2.30.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.30.0, check out the detailed release notes.

Highlights

  • Legacy Read transform (non-SDF based Read) is used by default for non-FnAPI open source runners. Use the use_sdf_read experimental flag to re-enable SDF based Read transforms (BEAM-10670)
  • Upgraded vendored gRPC dependency to 1.36.0 (BEAM-11227)

I/Os

  • Fixed the issue that WriteToBigQuery with batch file loads does not respect schema update options when there are multiple load jobs (BEAM-11277)
  • Fixed the issue that the job didn't properly retry since BigQuery sink swallows HttpErrors when performing streaming inserts (BEAM-12362)

New Features / Improvements

  • Added capability to declare resource hints in Java and Python SDKs (BEAM-2085)
  • Added Spanner IO Performance tests for read and write in Python SDK (BEAM-10029)
  • Added support for accessing GCP PubSub Message ordering keys, message IDs and message publish timestamp in Python SDK (BEAM-7819)
  • DataFrame API: Added support for collecting DataFrame objects in interactive Beam (BEAM-11855)
  • DataFrame API: Added apache_beam.examples.dataframe module (BEAM-12024)
  • Upgraded the GCP Libraries BOM version to 20.0.0 (BEAM-11205). For Google Cloud client library versions set by this BOM, see this table
  • Added sdkContainerImage flag to (eventually) replace workerHarnessContainerImage (BEAM-12212)
  • Added support for Dataflow update when schemas are used (BEAM-12198)
  • Fixed the issue that ZipFiles.zipDirectory leaks native JVM memory (BEAM-12220)
  • Fixed the issue that Reshuffle.withNumBuckets creates (N*2)-1 buckets (BEAM-12361)
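
The notes do not say why Reshuffle.withNumBuckets produced (N*2)-1 buckets. One arithmetic that yields exactly that count is taking a truncating remainder of a signed hash, which returns values from -(N-1) to N-1. The sketch below demonstrates only that arithmetic; treating it as the cause of BEAM-12361 is an assumption, and the function names are hypothetical.

```python
import math


def truncating_remainder(value, n):
    """Remainder that keeps the sign of the dividend (as Java's % does)."""
    return int(math.fmod(value, n))


def distinct_buckets(n, hashes):
    """Collect the bucket ids produced by a truncating remainder."""
    return {truncating_remainder(h, n) for h in hashes}


N = 4
signed_hashes = range(-100, 101)

# Truncating remainder: ids run from -(N-1) to N-1, i.e. 2N-1 buckets.
truncated = distinct_buckets(N, signed_hashes)

# Floor modulo (Python's %): ids run from 0 to N-1, i.e. N buckets.
floored = {h % N for h in signed_hashes}

if __name__ == "__main__":
    print(sorted(truncated), len(truncated))
    print(sorted(floored), len(floored))
```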

Breaking Changes

  • Drop support for Flink 1.8 and 1.9 (BEAM-11948)
  • MongoDbIO: Read.withFilter() and Read.withProjection() were removed; they had been deprecated since Beam 2.12.0 (BEAM-12217)
  • RedisIO.readAll() was removed; it had been deprecated since Beam 2.13.0. Please use RedisIO.readKeyPatterns() for the equivalent functionality (BEAM-12214)
  • MqttIO.create() with the clientId constructor was removed; it had been deprecated since Beam 2.13.0 (BEAM-12216)

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.30.0 release. Thank you to all contributors!

Ahmet Altay, Alex Amato, Alexey Romanenko, Anant Damle, Andreas Bergmeier, Andrew Pilloud, Ankur Goenka,
Anup D, Artur Khanin, Benjamin Gonzalez, Bipin Upadhyaya, Boyuan Zhang, Brian Hulette, Bulat Shakirzyanov,
Chamikara Jayalath, Chun Yang, Daniel Kulp, Daniel Oliveira, David Cavazos, Elliotte Rusty Harold, Emily Ye,
Eric Roshan-Eisner, Evan Galpin, Fabien Caylus, Fernando Morales, Heejong Lee, Iñigo San Jose Visiers,
Isidro Martínez, Ismaël Mejía, Ke Wu, Kenneth Knowles, KevinGG, Kyle Weaver, Ludovic Post, MATTHEW Ouyang (LCL),
Mackenzie Clark, Masato Nakamura, Matthias Baetens, Max, Nicholas Azar, Ning Kang, Pablo Estrada, Patrick McCaffrey,
Quentin Sommer, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle, Shoaib Zafar,
Siyuan Chen, Sruthi Sree Kumar, Steve Niemitz, Sylvain Veyrié, Tomo Suzuki, Udi Meiri, Valentyn Tymofieiev,
Vitaly Terentyev, Wenbing, Xinyu Liu, Yichi Zhang, Yifan Mai, Yueyang Qiu, Yunqing Zhou, ajo thomas, brucearctor,
dmkozh, dpcollins-google, emily, jordan-moore, kileys, lostluck, masahitojp, roger-mike, sychen, tvalentyn,
vachan-shetty, yoshiki.obata

Beam 2.29.0 Release

10 Feb 06:49

NOTE: This version was originally released on 2021-04-29 and added to GitHub releases late.

We are happy to present the new 2.29.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.29.0, check out the detailed release notes.

Highlights

  • Spark Classic and Portable runners officially support Spark 3 (BEAM-7093).
  • Official Java 11 support for most runners (Dataflow, Flink, Spark) (BEAM-2530).
  • DataFrame API now supports GroupBy.apply (BEAM-11628).

I/Os

  • Added support for S3 filesystem on AWS SDK V2 (Java) (BEAM-7637)
  • GCP BigQuery sink (file loads) uses runner determined sharding for unbounded data (BEAM-11772)
  • KafkaIO now recognizes the partition property in writing records (BEAM-11806)
  • Support for Hadoop configuration on ParquetIO (BEAM-11913)

New Features / Improvements

Breaking Changes

  • Deterministic coding enforced for GroupByKey and Stateful DoFns. Previously non-deterministic coding was allowed, resulting in keys not properly being grouped in some cases. (BEAM-11719)
    To restore the old behavior, one can register FakeDeterministicFastPrimitivesCoder with
    beam.coders.registry.register_fallback_coder(beam.coders.coders.FakeDeterministicFastPrimitivesCoder())
    or use the allow_non_deterministic_key_coders pipeline option.
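
To see why non-deterministic key coding can split a group, consider two keys that compare equal but encode to different bytes. The sketch below uses only the Python standard library, with pickle standing in for a non-deterministic encoding and sorted-key JSON for a deterministic one; group_by_encoded_key is a hypothetical helper and none of this is Beam code.

```python
import json
import pickle
from collections import defaultdict


def group_by_encoded_key(pairs, encode):
    """Group values by the encoded bytes of their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[encode(key)].append(value)
    return groups


# Equal dictionaries built with different insertion orders.
key_a = {"user": "ann", "region": "eu"}
key_b = {"region": "eu", "user": "ann"}
pairs = [(key_a, 1), (key_b, 2)]

# pickle writes dict items in insertion order, so the bytes differ.
by_pickle = group_by_encoded_key(pairs, pickle.dumps)

# Sorting the keys first gives one canonical byte string per logical key.
by_sorted_json = group_by_encoded_key(
    pairs, lambda k: json.dumps(k, sort_keys=True).encode("utf-8"))

if __name__ == "__main__":
    print(len(by_pickle), len(by_sorted_json))
```

With the order-dependent encoding the two values land in separate groups even though their keys are equal, which is the kind of mis-grouping the enforcement is meant to prevent.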

Deprecations

  • Support for Flink 1.8 and 1.9 will be removed in the next release (2.30.0) (BEAM-11948).

Known Issues

List of Contributors

According to git shortlog, the following people contributed to the 2.29.0 release. Thank you to all contributors!

Ahmet Altay, Alan Myrvold, Alex Amato, Alexander Chermenin, Alexey Romanenko,
Allen Pradeep Xavier, Amy Wu, Anant Damle, Andreas Bergmeier, Andrei Balici,
Andrew Pilloud, Andy Xu, Ankur Goenka, Bashir Sadjad, Benjamin Gonzalez, Boyuan
Zhang, Brian Hulette, Chamikara Jayalath, Chinmoy Mandayam, Chuck Yang,
dandy10, Daniel Collins, Daniel Oliveira, David Cavazos, David Huntsperger,
David Moravek, Dmytro Kozhevin, Emily Ye, Esun Kim, Evgeniy Belousov, Filip
Popić, Fokko Driesprong, Gris Cuevas, Heejong Lee, Ihor Indyk, Ismaël Mejía,
Jakub-Sadowski, Jan Lukavský, John Edmonds, Juan Sandoval, 谷口恵輔, Kenneth
Jung, Kenneth Knowles, KevinGG, Kiley Sok, Kyle Weaver, MabelYC, Mackenzie
Clark, Masato Nakamura, Milena Bukal, Miltos, Minbo Bae, Miraç Vuslat Başaran,
mynameborat, Nahian-Al Hasan, Nam Bui, Niel Markwick, Niels Basjes, Ning Kang,
Nir Gazit, Pablo Estrada, Ramazan Yapparov, Raphael Sanamyan, Reuven Lax, Rion
Williams, Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle,
Shehzaad Nakhoda, Shehzaad Nakhoda, Siyuan Chen, Sonam Ramchand, Steve Niemitz,
sychen, Sylvain Veyrié, Tim Robertson, Tobias Kaymak, Tomasz Szerszeń, Tomasz
Szerszeń, Tomo Suzuki, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Yichi
Zhang, Yifan Mai, Yixing Zhang, Yoshiki Obata

Beam 2.28.0 release

22 Feb 21:19

We are happy to present the new 2.28.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.28.0, check out the
detailed release notes.

Highlights

I/Os

  • SpannerIO supports using BigDecimal for Numeric fields (BEAM-11643)
  • Add Beam schema support to ParquetIO (BEAM-11526)
  • Support ParquetTable Writer (BEAM-8202)
  • GCP BigQuery sink (streaming inserts) uses runner determined sharding (BEAM-11408)
  • PubSub support types: TIMESTAMP, DATE, TIME, DATETIME (BEAM-11533)

New Features / Improvements

  • ParquetIO: added methods readGenericRecords and readFilesGenericRecords, which can read files with an unknown schema. See PR-13554 and BEAM-11460.
  • Added support for thrift in KafkaTableProvider (BEAM-11482)
  • Added support for HadoopFormatIO to skip key/value clone (BEAM-11457)
  • Support Conversion to GenericRecords in Convert.to transform (BEAM-11571).
  • Support writes for Parquet Tables in Beam SQL (BEAM-8202).
  • Support reading Parquet files with unknown schema (BEAM-11460)
  • Support user configurable Hadoop Configuration flags for ParquetIO (BEAM-11527)
  • Expose commit_offset_in_finalize and timestamp_policy to ReadFromKafka (BEAM-11677)
  • Fixed S3 options not being provided to the boto3 client when using FlinkRunner and the Beam worker pool container (BEAM-11799)
  • HDFS not deduplicating identical configuration paths (BEAM-11329)
  • Hash Functions in BeamSQL (BEAM-10074)
  • Create ApproximateDistinct using HLL Impl (BEAM-10324)
  • Add Beam schema support to ParquetIO (BEAM-11526)
  • Add a Deque Encoder (BEAM-11538)
  • Hash functions in ZetaSQL (BEAM-11624)
  • Refactor ParquetTableProvider ()
  • Add JVM properties to JavaJobServer (BEAM-8344)
  • Single source of truth for supported Flink versions ()
  • Use metric for Python BigQuery streaming insert API latency logging (BEAM-11018)
  • Use metric for Java BigQuery streaming insert API latency logging (BEAM-11032)
  • Upgrade Flink runner to Flink versions 1.12.1 and 1.11.3 (BEAM-11697)
  • Upgrade Beam base image to use Tensorflow 2.4.1 (BEAM-11762)
  • Create Beam GCP BOM (BEAM-11665)

Breaking Changes

  • The Java artifacts "beam-sdks-java-io-kinesis", "beam-sdks-java-io-google-cloud-platform", and
    "beam-sdks-java-extensions-sql-zetasql" declare Guava 30.1-jre dependency (It was 25.1-jre in Beam 2.27.0).
    This new Guava version may introduce dependency conflicts if your project or dependencies rely
    on removed APIs. If affected, make sure to select an appropriate Guava version via dependencyManagement in Maven or
    force in Gradle.

List of Contributors

According to git shortlog, the following people contributed to the 2.28.0 release. Thank you to all contributors!

Ahmet Altay, Alex Amato, Alexey Romanenko, Allen Pradeep Xavier, Anant Damle, Artur Khanin,
Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Chris Roth, Costi Ciudatu, Damon Douglas,
Daniel Collins, Daniel Oliveira, David Cavazos, David Huntsperger, Elliotte Rusty Harold,
Emily Ye, Etienne Chauchot, Etta Rapp, Evan Palmer, Eyal, Filip Krakowski, Fokko Driesprong,
Heejong Lee, Ismaël Mejía, janeliulwq, Jan Lukavský, John Edmonds, Jozef Vilcek, Kenneth Knowles,
Ke Wu, kileys, Kyle Weaver, MabelYC, masahitojp, Masato Nakamura, Milena Bukal, Miraç Vuslat Başaran,
Nelson Osacky, Niel Markwick, Ning Kang, omarismail94, Pablo Estrada, Piotr Szuberski,
ramazan-yapparov, Reuven Lax, Reza Rokni, rHermes, Robert Bradshaw, Robert Burke, Robert Gruener,
Romster, Rui Wang, Sam Whittle, shehzaadn-vd, Siyuan Chen, Sonam Ramchand, Tobiasz Kędzierski,
Tomo Suzuki, tszerszen, tvalentyn, Tyson Hamilton, Udi Meiri, Xinbin Huang, Yichi Zhang,
Yifan Mai, yoshiki.obata, Yueyang Qiu, Yusaku Matsuki