Releases: apache/beam

Beam 2.27.0 release

08 Jan 20:56

We are happy to present the new 2.27.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.27.0, check out the
detailed release notes.

Highlights

  • Java 11 Containers are now published with all Beam releases.
  • There is a new transform ReadAllFromBigQuery that can receive multiple requests to read data from BigQuery at pipeline runtime. See PR 13170, and BEAM-9650.

I/Os

  • ReadFromMongoDB can now be used with MongoDB Atlas (Python) (BEAM-11266).
  • ReadFromMongoDB/WriteToMongoDB will mask the password in display_data (Python) (BEAM-11444).
  • There is a new transform ReadAllFromBigQuery that can receive multiple requests to read data from BigQuery at pipeline runtime. See PR 13170, and BEAM-9650.

New Features / Improvements

  • Beam modules that depend on Hadoop are now tested for compatibility with Hadoop 3 (BEAM-8569). (Hive/HCatalog pending)
  • Publishing Java 11 SDK container images is now supported as part of the Apache Beam release process. (BEAM-8106)
  • Added Cloud Bigtable Provider extension to Beam SQL (BEAM-11173, BEAM-11373)
  • Added a schema provider for thrift data (BEAM-11338)
  • Added combiner packing pipeline optimization to Dataflow runner. (BEAM-10641)

Breaking Changes

  • The hbase-shaded-client dependency of HBaseIO should now be provided by users (BEAM-9278).
  • The --region flag in amazon-web-services2 was replaced by --awsRegion (BEAM-11331).

List of Contributors

According to git shortlog, the following people contributed to the 2.27.0 release. Thank you to all contributors!

Ahmet Altay, Alan Myrvold, Alex Amato, Alexey Romanenko, Aliraza Nagamia, Allen Pradeep Xavier,
Andrew Pilloud, andreyKaparulin, Ashwin Ramaswami, Boyuan Zhang, Brent Worden, Brian Hulette,
Carlos Marin, Chamikara Jayalath, Costi Ciudatu, Damon Douglas, Daniel Collins,
Daniel Oliveira, David Huntsperger, David Lu, David Moravek, David Wrede,
dennis, Dennis Yung, dpcollins-google, Emily Ye, emkornfield,
Esun Kim, Etienne Chauchot, Eugene Nikolaiev, Frank Zhao, Haizhou Zhao,
Hector Acosta, Heejong Lee, Ilya, Iñigo San Jose Visiers, InigoSJ,
Ismaël Mejía, janeliulwq, Jan Lukavský, Kamil Wasilewski, Kenneth Jung,
Kenneth Knowles, Ke Wu, kileys, Kyle Weaver, lostluck,
Matt Casters, Maximilian Michels, Michal Walenia, Mike Dewar, nehsyc,
Nelson Osacky, Niels Basjes, Ning Kang, Pablo Estrada, palmere-google,
Pawel Pasterz, Piotr Szuberski, purbanow, Reuven Lax, rHermes,
Robert Bradshaw, Robert Burke, Rui Wang, Sam Rohde, Sam Whittle,
Siyuan Chen, Tim Robertson, Tobiasz Kędzierski, tszerszen,
Valentyn Tymofieiev, Tyson Hamilton, Udi Meiri, vachan-shetty, Xinyu Liu,
Yichi Zhang, Yifan Mai, yoshiki.obata, Yueyang Qiu

Beam 2.26.0 release

11 Dec 21:24

We are happy to present the new 2.26.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.26.0, check out the
detailed release notes.

Highlights

  • Splittable DoFn is now the default for executing the Read transform for Java-based runners (Spark with bounded pipelines), in addition to the runners covered in the 2.25.0 release (Direct, Flink, Jet, Samza, Twister2). The expected output of the Read transform is unchanged. Users can opt out using --experiments=use_deprecated_read. The Apache Beam community is looking for feedback on this change, as it plans to make the change permanent with no opt-out. If you run into an issue that requires the opt-out, please send an e-mail to user@beam.apache.org referencing BEAM-10670 in the subject line and explaining why you needed to opt out. (Java) (BEAM-10670)

I/Os

  • Java BigQuery streaming inserts now have timeouts enabled by default. Pass --HTTPWriteTimeout=0 to revert to the old behavior. (BEAM-6103)

New Features / Improvements

  • Added support for avro payload format in Beam SQL Kafka Table (BEAM-10885)
  • Added support for json payload format in Beam SQL Kafka Table (BEAM-10893)
  • Added support for protobuf payload format in Beam SQL Kafka Table (BEAM-10892)
  • Added support for avro payload format in Beam SQL Pubsub Table (BEAM-5504)
  • Added option to disable unnecessary copying between operators in Flink Runner (Java) (BEAM-11146)
  • Added CombineFn.setup and CombineFn.teardown to Python SDK. These methods let you initialize the CombineFn's state before any of the other methods of the CombineFn is executed and clean that state up later on. If you are using Dataflow, you need to enable Dataflow Runner V2 by passing --experiments=use_runner_v2 before using this feature. (BEAM-3736)

Breaking Changes

  • BigQuery's DATETIME type now maps to Beam logical type org.apache.beam.sdk.schemas.logicaltypes.SqlTypes.DATETIME
  • Pandas 1.x is now required for dataframe operations.

List of Contributors

According to git shortlog, the following people contributed to the 2.26.0 release. Thank you to all contributors!

Abhishek Yadav, AbhiY98, Ahmet Altay, Alan Myrvold, Alex Amato, Alexey Romanenko,
Andrew Pilloud, Ankur Goenka, Boyuan Zhang, Brian Hulette, Chad Dombrova,
Chamikara Jayalath, Curtis "Fjord" Hawthorne, Damon Douglas, dandy10, Daniel Oliveira,
David Cavazos, dennis, Derrick Qin, dpcollins-google, Dylan Hercher, emily, Esun Kim,
Gleb Kanterov, Heejong Lee, Ismaël Mejía, Jan Lukavský, Jean-Baptiste Onofré, Jing,
Jozef Vilcek, Justin White, Kamil Wasilewski, Kenneth Knowles, kileys, Kyle Weaver,
lostluck, Luke Cwik, Mark, Maximilian Michels, Milan Cermak, Mohammad Hossein Sekhavat,
Nelson Osacky, Neville Li, Ning Kang, pabloem, Pablo Estrada, pawelpasterz,
Pawel Pasterz, Piotr Szuberski, PoojaChandak, purbanow, rarokni, Ravi Magham,
Reuben van Ammers, Reuven Lax, Reza Rokni, Robert Bradshaw, Robert Burke,
Romain Manni-Bucau, Rui Wang, rworley-monster, Sam Rohde, Sam Whittle, shollyman,
Simone Primarosa, Siyuan Chen, Steve Niemitz, Steven van Rossum, sychen, Teodor Spæren,
Tim Clemons, Tim Robertson, Tobiasz Kędzierski, tszerszen, Tudor Marian, tvalentyn,
Tyson Hamilton, Udi Meiri, Vasu Gupta, xasm83, Yichi Zhang, yichuan66, Yifan Mai,
yoshiki.obata, Yueyang Qiu, yukihira1992

Beam 2.25.0 release

06 Nov 18:05

We are happy to present the new 2.25.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.25.0, check out the
detailed release notes.

Highlights

  • Splittable DoFn is now the default for executing the Read transform for Java-based runners (Direct, Flink, Jet, Samza, Twister2). The expected output of the Read transform is unchanged. Users can opt out using --experiments=use_deprecated_read. The Apache Beam community is looking for feedback on this change, as it plans to make the change permanent with no opt-out. If you run into an issue that requires the opt-out, please send an e-mail to user@beam.apache.org referencing BEAM-10670 in the subject line and explaining why you needed to opt out. (Java) (BEAM-10670)

I/Os

  • Added cross-language support to Java's KinesisIO, now available in the Python module apache_beam.io.kinesis (BEAM-10138, BEAM-10137).
  • Update Snowflake JDBC dependency for SnowflakeIO (BEAM-10864)
  • Added cross-language support to Java's SnowflakeIO.Write, now available in the Python module apache_beam.io.snowflake (BEAM-9898).
  • Added delete function to Java's ElasticsearchIO#Write. Now, Java's ElasticsearchIO can be used to selectively delete documents using withIsDeleteFn function (BEAM-5757).
  • Java SDK: Added new IO connector for InfluxDB - InfluxDbIO (BEAM-2546).

New Features / Improvements

  • Added support for repeatable fields in the JSON decoder for ReadFromBigQuery (Python) (BEAM-10524).
  • Added an opt-in, performance-driven runtime type checking system for the Python SDK (BEAM-10549).
    More details will be in an upcoming blog post.
  • Added support for Python 3 type annotations on PTransforms using typed PCollections (BEAM-10258).
    More details will be in an upcoming blog post.
  • Improved the Interactive Beam API: recording a streaming job now starts a long-running background recording job, and running ib.show() or ib.collect() samples from that recording (BEAM-10603).
  • In Interactive Beam, ib.show() and ib.collect() now take "n" and "duration" parameters, which limit the read to at most "n" elements and at most "duration" seconds of data from the recording (BEAM-10603).
  • Initial preview of Dataframes support.
    See also example at apache_beam/examples/wordcount_dataframe.py
  • Fixed support for type hints on @ptransform_fn decorators in the Python SDK.
    (BEAM-4091)
    This has not been enabled by default to preserve backwards compatibility; use the
    --type_check_additional=ptransform_fn flag to enable it. It may be enabled by
    default in future versions of Beam.

Breaking Changes

  • Python 2 and Python 3.5 support dropped (BEAM-10644, BEAM-9372).
  • Pandas 1.x is now allowed. Older versions of Pandas may still be used, but may not be as well tested.

Deprecations

  • The Python transform ReadFromSnowflake has been moved from apache_beam.io.external.snowflake to apache_beam.io.snowflake. The previous path will be removed in a future version.

Known Issues

  • Dataflow streaming timers are once again not strictly time ordered when set earlier mid-bundle, because the fix for BEAM-8543 introduced more severe bugs and has been rolled back.
  • The default compressor change breaks Dataflow Python streaming job update compatibility. Please use Python SDK version <= 2.23.0 or > 2.25.0 if job update is critical. (BEAM-11113)

List of Contributors

According to git shortlog, the following people contributed to the 2.25.0 release. Thank you to all contributors!

Ahmet Altay, Alan Myrvold, Aldair Coronel Ruiz, Alexey Romanenko, Andrew Pilloud, Ankur Goenka,
Ayoub ENNASSIRI, Bipin Upadhyaya, Boyuan Zhang, Brian Hulette, Brian Michalski, Chad Dombrova,
Chamikara Jayalath, Damon Douglas, Daniel Oliveira, David Cavazos, David Janicek, Doug Roeper, Eric
Roshan-Eisner, Etta Rapp, Eugene Kirpichov, Filipe Regadas, Heejong Lee, Ihor Indyk, Irvi Firqotul
Aini, Ismaël Mejía, Jan Lukavský, Jayendra, Jiadai Xia, Jithin Sukumar, Jozsef Bartok, Kamil
Gałuszka, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Jung, Kenneth Knowles, Kevin Puthusseri, Kevin
Sijo Puthusseri, KevinGG, Kyle Weaver, Leiyi Zhang, Lourens Naudé, Luke Cwik, Matthew Ouyang,
Maximilian Michels, Michal Walenia, Milan Cermak, Monica Song, Nelson Osacky, Neville Li, Ning Kang,
Pablo Estrada, Piotr Szuberski, Qihang, Rehman, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang,
Saavan Nanavati, Sam Bourne, Sam Rohde, Sam Whittle, Sergiy Kolesnikov, Sindy Li, Siyuan Chen, Steve
Niemitz, Terry Xian, Thomas Weise, Tobiasz Kędzierski, Truc Le, Tyson Hamilton, Udi Meiri, Valentyn
Tymofieiev, Yichi Zhang, Yifan Mai, Yueyang Qiu, annaqin418, danielxjd, dennis, dp, fuyuwei,
lostluck, nehsyc, odeshpande, odidev, pulasthi, purbanow, rworley-monster, sclukas77, terryxian78,
tvalentyn, yoshiki.obata

Beam 2.24.0 release

19 Sep 03:39

We are happy to present the new 2.24.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.24.0, check out the
detailed release notes.

Highlights

  • Apache Beam 2.24.0 is the last release with Python 2 and Python 3.5
    support.

I/Os

  • New overloads for BigtableIO.Read.withKeyRange() and BigtableIO.Read.withRowFilter()
    methods that take ValueProvider as a parameter (Java) (BEAM-10283).
  • The WriteToBigQuery transform (Python) in Dataflow Batch no longer relies on BigQuerySink by default. It relies on
    a new, fully-featured transform based on file loads into BigQuery. To revert the behavior to the old implementation,
    you may use --experiments=use_legacy_bq_sink.
  • Add cross-language support to Java's JdbcIO, now available in the Python module apache_beam.io.jdbc (BEAM-10135, BEAM-10136).
  • Add support of AWS SDK v2 for KinesisIO.Read (Java) (BEAM-9702).
  • Add streaming support to SnowflakeIO in Java SDK (BEAM-9896)
  • Support reading and writing to Google Healthcare DICOM APIs in Python SDK (BEAM-10601)
  • Add dispositions for SnowflakeIO.write (BEAM-10343)
  • Add cross-language support to SnowflakeIO.Read, now available in the Python module apache_beam.io.external.snowflake (BEAM-9897).

New Features / Improvements

  • Shared library for simplifying management of large shared objects added to Python SDK. Example use case is sharing a large TF model object across threads (BEAM-10417).
  • Fixed Dataflow streaming timers not being strictly time ordered when set earlier mid-bundle (BEAM-8543).
  • OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner (BEAM-9839)
  • Key should be available in @ontimer methods (Spark Runner) (BEAM-9850)

Breaking Changes

  • WriteToBigQuery transforms now require a GCS location, provided through either
    custom_gcs_temp_location in the constructor of WriteToBigQuery or the fallback option
    --temp_location. Alternatively, pass method="STREAMING_INSERTS" to WriteToBigQuery (BEAM-6928).
  • Python SDK now understands typing.FrozenSet type hints, which are not interchangeable with typing.Set. You may need to update your pipelines if type checking fails. (BEAM-10197)
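To see why typing.FrozenSet and typing.Set hints are not interchangeable, the following plain-Python sketch (standard library only, no Beam involved) compares the two hints and the runtime types they describe. The function unique_words is a hypothetical example, not part of any pipeline in these notes.

```python
from typing import FrozenSet, Set, get_type_hints


def unique_words(line: str) -> FrozenSet[str]:
    """Hypothetical helper that returns an immutable set of words."""
    return frozenset(line.split())


# Read the declared return hint back from the function's annotations.
declared = get_type_hints(unique_words)["return"]
result = unique_words("to be or not to be")

# The two hints are different types, and a frozenset is not a set.
hints_differ = declared != Set[str]
is_frozenset = isinstance(result, frozenset)
is_set = isinstance(result, set)
```

A function annotated as returning Set[str] while actually producing a frozenset (or the reverse) may therefore need its hint updated once type checking distinguishes the two.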

List of Contributors

According to git shortlog, the following people contributed to the 2.24.0 release. Thank you to all contributors!

adesormi, Ahmet Altay, Alex Amato, Alexey Romanenko, Andrew Pilloud, Ashwin Ramaswami, Borzoo,
Boyuan Zhang, Brian Hulette, Brian M, Bu Sun Kim, Chamikara Jayalath, Colm O hEigeartaigh,
Corvin Deboeser, Damian Gadomski, Damon Douglas, Daniel Oliveira, Dariusz Aniszewski,
davidak09, David Cavazos, David Moravek, David Yan, dhodun, Doug Roeper, Emil Hessman, Emily Ye,
Etienne Chauchot, Etta Rapp, Eugene Kirpichov, fuyuwei, Gleb Kanterov,
Harrison Green, Heejong Lee, Henry Suryawirawan, InigoSJ, Ismaël Mejía, Israel Herraiz,
Jacob Ferriero, Jan Lukavský, Jayendra, jfarr, jhnmora000, Jiadai Xia, JIahao wu, Jie Fan,
Jiyong Jung, Julius Almeida, Kamil Gałuszka, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Knowles,
Kevin Puthusseri, Kyle Weaver, Łukasz Gajowy, Luke Cwik, Mark-Zeng, Maximilian Michels,
Michal Walenia, Niel Markwick, Ning Kang, Pablo Estrada, pawel.urbanowicz, Piotr Szuberski,
Rafi Kamal, rarokni, Rehman Murad Ali, Reuben van Ammers, Reuven Lax, Ricardo Bordon,
Robert Bradshaw, Robert Burke, Robin Qiu, Rui Wang, Saavan Nanavati, sabhyankar, Sam Rohde,
Scott Lukas, Siddhartha Thota, Simone Primarosa, Sławomir Andrian,
Steve Niemitz, Tobiasz Kędzierski, Tomo Suzuki, Tyson Hamilton, Udi Meiri,
Valentyn Tymofieiev, viktorjonsson, Xinyu Liu, Yichi Zhang, Yixing Zhang, yoshiki.obata,
Yueyang Qiu, zijiesong

Beam 2.23.0 release

29 Jul 22:24
v2.23.0

We are happy to present the new 2.23.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.23.0, check out the
detailed release notes.

I/Os

  • Support for reading from Snowflake added (Java) (BEAM-9722).
  • Support for writing to Splunk added (Java) (BEAM-8596).
  • Support for assume role added (Java) (BEAM-10335).
  • A new transform to read from BigQuery has been added: apache_beam.io.gcp.bigquery.ReadFromBigQuery. This transform
    is experimental. It reads data from BigQuery by exporting data to Avro files, and reading those files. It also supports
    reading data by exporting to JSON files. This has small differences in behavior for Time and Date-related fields. See
    Pydoc for more information.
  • Add dispositions for SnowflakeIO.write (BEAM-10343)

New Features / Improvements

  • Update Snowflake JDBC dependency and add application=beam to connection URL (BEAM-10383).

Breaking Changes

  • RowJson.RowJsonDeserializer, JsonToRow, and PubsubJsonTableProvider now accept "implicit
    nulls" by default when deserializing JSON (Java) (BEAM-10220).
    Previously nulls could only be represented with explicit null values, as in
    {"foo": "bar", "baz": null}, whereas an implicit null like {"foo": "bar"} would raise an
    exception. Now both JSON strings will yield the same result by default. This behavior can be
    overridden with RowJson.RowJsonDeserializer#withNullBehavior.
  • Fixed a bug in GroupIntoBatches experimental transform in Python to actually group batches by key.
    This changes the output type for this transform (BEAM-6696).
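The "implicit nulls" change above concerns Java classes, but the semantics can be sketched in a few lines of plain Python. The helper row_from_json and its accept_implicit_nulls parameter are hypothetical names used only for illustration; this is not how RowJson is implemented.

```python
import json


def row_from_json(text, field_names, accept_implicit_nulls=True):
    """Build a row dict for the given schema fields from a JSON string."""
    obj = json.loads(text)
    row = {}
    for name in field_names:
        if name in obj:
            # Present fields, including explicit nulls, are copied as-is.
            row[name] = obj[name]
        elif accept_implicit_nulls:
            # New default: a missing field is treated as null.
            row[name] = None
        else:
            # Old behavior: a missing field is an error.
            raise ValueError(f"missing field {name!r}")
    return row


fields = ["foo", "baz"]
explicit = row_from_json('{"foo": "bar", "baz": null}', fields)
implicit = row_from_json('{"foo": "bar"}', fields)
```

With the default setting both inputs produce the same row; passing accept_implicit_nulls=False to the sketch reproduces the earlier behavior of raising on the second input.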

Deprecations

  • Remove Gearpump runner. (BEAM-9999)
  • Remove Apex runner. (BEAM-9999)
  • RedisIO.readAll() is deprecated and will be removed in 2 versions; users must use RedisIO.readKeyPatterns() as a replacement (BEAM-9747).

List of Contributors

According to git shortlog, the following people contributed to the 2.23.0 release. Thank you to all contributors!

Aaron, Abhishek Yadav, Ahmet Altay, aiyangar, Aizhamal Nurmamat kyzy, Ajo Thomas, Akshay-Iyangar, Alan Pryor, Alex Amato, Alexey Romanenko, Allen Pradeep Xavier, Andrew Crites, Andrew Pilloud, Ankur Goenka, Anna Qin, Ashwin Ramaswami, bntnam, Borzoo Esmailloo, Boyuan Zhang, Brian Hulette, Brian Michalski, brucearctor, Chamikara Jayalath, chi-chi weng, Chuck Yang, Chun Yang, Colm O hEigeartaigh, Corvin Deboeser, Craig Chambers, Damian Gadomski, Damon Douglas, Daniel Oliveira, Dariusz Aniszewski, darshanj, darshan jani, David Cavazos, David Moravek, David Yan, Esun Kim, Etienne Chauchot, Filipe Regadas, fuyuwei, Graeme Morgan, Hannah-Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, InigoSJ, Ismaël Mejía, Israel Herraiz, Jacob Ferriero, Jan Lukavský, Jie Fan, John Mora, Jozef Vilcek, Julien Phalip, Justine Koa, Kamil Gabryjelski, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Jung, Kenneth Knowles, kevingg, Kevin Sijo Puthusseri, kshivvy, Kyle Weaver, Kyoungha Min, Kyungwon Jo, Luke Cwik, Mark Liu, Mark-Zeng, Matthias Baetens, Maximilian Michels, Michal Walenia, Mikhail Gryzykhin, Nam Bui, Nathan Fisher, Niel Markwick, Ning Kang, Omar Ismail, Pablo Estrada, paul fisher, Pawel Pasterz, perkss, Piotr Szuberski, pulasthi, purbanow, Rahul Patwari, Rajat Mittal, Rehman, Rehman Murad Ali, Reuben van Ammers, Reuven Lax, Reza Rokni, Rion Williams, Robert Bradshaw, Robert Burke, Rui Wang, Ruoyun Huang, sabhyankar, Sam Rohde, Sam Whittle, sclukas77, Sebastian Graca, Shoaib Zafar, Sruthi Sree Kumar, Stephen O'Kennedy, Steve Koonce, Steve Niemitz, Steven van Rossum, Ted Romer, Tesio, Thinh Ha, Thomas Weise, Tobias Kaymak, tobiaslieber-cognitedata, Tobiasz Kędzierski, Tomo Suzuki, Tudor Marian, tvs, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, xuelianhan, Yichi Zhang, Yifan Zou, Yixing Zhang, yoshiki.obata, Yueyang Qiu, Yu Feng, Yuwei Fu, Zhuo Peng, ZijieSong946.

Beam 2.22.0 release

08 Jun 21:16

We are happy to present the new 2.22.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release. For more information on changes in 2.22.0, check out the detailed release notes.

I/Os

  • Basic Kafka read/write support for DataflowRunner (Python) (BEAM-8019).
  • Sources and sinks for Google Healthcare APIs (Java) (BEAM-9468).

New Features / Improvements

  • The --workerCacheMB flag is now supported in Dataflow streaming pipelines (BEAM-9964).
  • --direct_num_workers=0 is now supported for the FnApi runner. It sets the number of threads/subprocesses to the number of cores of the machine executing the pipeline (BEAM-9443).
  • Python SDK now has experimental support for SqlTransform (BEAM-8603).
  • Add OnWindowExpiration method to Stateful DoFn (BEAM-1589).
  • Added PTransforms for Google Cloud DLP (Data Loss Prevention) services integration (BEAM-9723):
    • Inspection of data,
    • Deidentification of data,
    • Reidentification of data.
  • Add a more complete I/O support matrix in the documentation site (BEAM-9916).
  • Upgrade Sphinx to 3.0.3 for building PyDoc.
  • Added a PTransform for image annotation using Google Cloud AI image processing service
    (BEAM-9646)

Breaking Changes

  • The Python SDK now requires --job_endpoint to be set when using --runner=PortableRunner (BEAM-9860). Users seeking the old default behavior should set --runner=FlinkRunner instead.

List of Contributors

According to git shortlog, the following people contributed to the 2.22.0 release. Thank you to all contributors!

Ahmet Altay, aiyangar, Ajo Thomas, Akshay-Iyangar, Alan Pryor, Alexey Romanenko, Allen Pradeep Xavier, amaliujia, Andrew Pilloud, Ankur Goenka, Ashwin Ramaswami, bntnam, Borzoo Esmailloo, Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Colm O hEigeartaigh, Craig Chambers, Damon Douglas, Daniel Oliveira, David Cavazos, David Moravek, Esun Kim, Etienne Chauchot, Filipe Regadas, Graeme Morgan, Hannah Jiang, Hannah-Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, Ismaël Mejía, Israel Herraiz, Jacob Ferriero, Jan Lukavský, John Mora, Kamil Wasilewski, Kenneth Jung, Kenneth Knowles, kevingg, Kyle Weaver, Kyoungha Min, Kyungwon Jo, Luke Cwik, Mark Liu, Matthias Baetens, Maximilian Michels, Michal Walenia, Mikhail Gryzykhin, Nam Bui, Niel Markwick, Ning Kang, Omar Ismail, omarismail94, Pablo Estrada, paul fisher, pawelpasterz, Pawel Pasterz, Piotr Szuberski, Rahul Patwari, rarokni, Rehman, Rehman Murad Ali, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang, Ruoyun Huang, Sam Rohde, Sam Whittle, Sebastian Graca, Shoaib Zafar, Sruthi Sree Kumar, Stephen O'Kennedy, Steve Koonce, Steve Niemitz, Steven van Rossum, Tesio, Thomas Weise, tobiaslieber-cognitedata, Tomo Suzuki, Tudor Marian, tvalentyn, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, xuelianhan, Yichi Zhang, Yifan Zou, yoshiki.obata, Yueyang Qiu, Zhuo Peng

Beam 2.21.0 release

01 Jun 19:17
e859735

We are happy to present the new 2.21.0 release of Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.21.0, check out the
detailed release notes.

I/Os

  • Python: Deprecated module apache_beam.io.gcp.datastore.v1 has been removed
    as the client it uses is out of date and does not support Python 3
    (BEAM-9529).
    Please migrate your code to use
    apache_beam.io.gcp.datastore.v1new.
    See the updated
    datastore_wordcount
    for example usage.
  • Python SDK: Added integration tests and updated batch write functionality for Google Cloud Spanner transform (BEAM-8949).

New Features / Improvements

  • Python SDK will now use Python 3 type annotations as pipeline type hints.
    (#10717)

    If you suspect that this feature is causing your pipeline to fail, calling
    apache_beam.typehints.disable_type_annotations() before pipeline creation
    will disable it completely, and decorating specific functions (such as
    process()) with @apache_beam.typehints.no_annotations will disable it
    for that function.

    More details will be in
    Ensuring Python Type Safety
    and an upcoming
    blog post.

  • Java SDK: Introducing the concept of options in Beam Schemas. These options add extra
    context to fields and schemas. This replaces the current Beam metadata, which is present
    in a FieldType only; options are available in fields and row schemas. Schema options are
    fully typed and can contain complex rows. Remark: Schema aware is still experimental.
    (BEAM-9035)

  • Java SDK: The protobuf extension is fully schema aware and also includes protobuf option
    conversion to beam schema options. Remark: Schema aware is still experimental.
    (BEAM-9044)

  • Added ability to write to BigQuery via Avro file loads (Python) (BEAM-8841)

    By default, file loads will be done using JSON, but it is possible to
    specify the temp_file_format parameter to perform file exports with AVRO.
    AVRO-based file loads work by exporting Python types into Avro types, so
    to switch to Avro-based loads, you will need to change your data types
    from JSON-compatible types (string-type dates and timestamps, long numeric
    values as strings) into Python native types that are written to Avro
    (Python's date, datetime types, decimal, etc). For more information
    see https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions.

  • Added integration of Java SDK with Google Cloud AI VideoIntelligence service
    (BEAM-9147)

  • Added integration of Java SDK with Google Cloud AI natural language processing API
    (BEAM-9634)

  • The docker-pull-licenses tag was introduced. Licenses/notices of third party dependencies will be added to the docker images when docker-pull-licenses is set.
    The files are added to /opt/apache/beam/third_party_licenses/.
    By default, no licenses/notices are added to the docker images. (BEAM-9136)
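For the Avro file loads item above, switching formats means handing the sink native Python values rather than JSON-style strings. Below is a minimal sketch of such a conversion using only the standard library. The helper to_native_types and the field names (event_date, created_at, amount) are hypothetical and not part of Beam; a real pipeline would adapt them to its own table schema.

```python
import datetime
import decimal


def to_native_types(row):
    """Convert JSON-compatible string values into Python native types."""
    return {
        # ISO date string -> datetime.date
        "event_date": datetime.date.fromisoformat(row["event_date"]),
        # ISO timestamp string -> datetime.datetime
        "created_at": datetime.datetime.fromisoformat(row["created_at"]),
        # Long numeric value carried as a string -> decimal.Decimal
        "amount": decimal.Decimal(row["amount"]),
    }


json_style_row = {
    "event_date": "2020-05-19",
    "created_at": "2020-05-19T22:21:00",
    "amount": "12345678901234567890.5",
}
native_row = to_native_types(json_style_row)
```

A conversion along these lines would usually run in a Map step placed before the write, so that the rows reaching the sink already carry native types.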

Breaking Changes

  • Dataflow runner now requires the --region option to be set, unless a default value is set in the environment (BEAM-9199). See here for more details.
  • HBaseIO.ReadAll now requires a PCollection of HBaseIO.Read objects instead of HBaseQuery objects (BEAM-9279).
  • ProcessContext.updateWatermark has been removed in favor of using a WatermarkEstimator (BEAM-9430).
  • Coder inference for PCollection of Row objects has been disabled (BEAM-9569).
  • Go SDK docker images are no longer released until further notice.

Deprecations

  • Java SDK: Beam Schema FieldType.getMetadata is now deprecated and is replaced by the Beam
    Schema Options. It will be removed in version 2.23.0. (BEAM-9704)
  • The --zone option in the Dataflow runner is now deprecated. Please use --worker_zone instead. (BEAM-9716)

List of Contributors

According to git shortlog, the following people contributed to the 2.21.0 release. Thank you to all contributors!

Aaron Meihm, Adrian Eka, Ahmet Altay, AldairCoronel, Alex Van Boxel, Alexey Romanenko, Andrew Crites, Andrew Pilloud, Ankur Goenka, Badrul (Taki) Chowdhury, Bartok Jozsef, Boyuan Zhang, Brian Hulette, brucearctor, bumblebee-coming, Chad Dombrova, Chamikara Jayalath, Chie Hayashida, Chris Gorgolewski, Chuck Yang, Colm O hEigeartaigh, Curtis "Fjord" Hawthorne, Daniel Mills, Daniel Oliveira, David Yan, Elias Djurfeldt, Emiliano Capoccia, Etienne Chauchot, Fernando Diaz, Filipe Regadas, Gleb Kanterov, Hai Lu, Hannah Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, Hk-tang, Ismaël Mejía, Jacoby, Jan Lukavský, Jeroen Van Goey, jfarr, Jozef Vilcek, Kai Jiang, Kamil Wasilewski, Kenneth Knowles, KevinGG, Kyle Weaver, Kyoungha Min, Luke Cwik, Maximilian Michels, Michal Walenia, Ning Kang, Pablo Estrada, paul fisher, Piotr Szuberski, Reuven Lax, Robert Bradshaw, Robert Burke, Rose Nguyen, Rui Wang, Sam Rohde, Sam Whittle, Spoorti Kundargi, Steve Koonce, sunjincheng121, Ted Yun, Tesio, Thomas Weise, Tomo Suzuki, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, Yichi Zhang, yoshiki.obata, Yueyang Qiu

v2.21.0-RC1

19 May 22:21
e859735
Pre-release
2.21.0 release candidate #1.

v2.16.0

08 Feb 03:50
de30361
Apache Beam 2.16.0 release

v2.15.0

22 Aug 22:10
7931ec0

Apache Beam 2.15.0 Release