Release Notes - Beam - Version 2.32.0

Sub-task

  • [BEAM-9921] - Add Go SDK tests to cross-language Flink ValidatesRunner test suite
  • [BEAM-9922] - Add Go SDK tests to cross-language Spark ValidatesRunner test suite
  • [BEAM-12094] - Support Spark 3 in spark_runner.py
  • [BEAM-12832] - Add Go SDK to "Using Cross-language Transforms" in programming guide
  • [BEAM-12834] - Improve user-facing documentation for basic Go SDK cross-language transforms.

Bug

  • [BEAM-11059] - Cannot use generic class as UDAF implementation.
  • [BEAM-11514] - Passing a PCollection without a schema to to_dataframe produces a non-obvious error
  • [BEAM-11534] - transform op doesn't work with pandas 1.2.0
  • [BEAM-11907] - SqsIO checkpoint causes throttling
  • [BEAM-12144] - Dataflow streaming worker stuck and unable to get work from Streaming Engine
  • [BEAM-12305] - Add pack_combiners to FlinkRunner and SparkRunner
  • [BEAM-12385] - AvroUtils exception when converting JDBC Row to GenericRecord
  • [BEAM-12399] - Godoc (pkg.go.dev) doesn't host documentation due to "license restrictions"
  • [BEAM-12422] - Vendored gRPC 1.36.0 is using a log4j version with security issues
  • [BEAM-12442] - RenameFields disregards renames in nested fields with DataflowRunner
  • [BEAM-12444] - GroupBy.apply on a series grouped by a callable fails
  • [BEAM-12459] - Watch does not properly advance the watermark by default
  • [BEAM-12460] - Simple AvroUtils.toGenericRecord
  • [BEAM-12471] - NumberFormatException in BeamTableUtils#autoCastField
  • [BEAM-12473] - ClassCastException when using registerUdaf in Calcite SQL
  • [BEAM-12497] - BigQueryIO should return successfully inserted rows
  • [BEAM-12514] - BigQueryIO - ReadFromBigQuery can not get table reference from RuntimeValueProvider
  • [BEAM-12516] - StreamingDataflowWorker.ShardedKey.toString throws exception if key is less than 100 bytes
  • [BEAM-12524] - Discard BundleProcessor instead of re-using it on bundle processing failure
  • [BEAM-12528] - Don't reuse failed plans.
  • [BEAM-12531] - ib.show does not handle deferred dataframe instances
  • [BEAM-12532] - BigQueryUtils fromBeamField omits ":00" from times with 0 seconds
  • [BEAM-12541] - fillna(value=DataFrame/Series) does not correctly fill values
  • [BEAM-12546] - Improve BlockingQueue performance in QueueingBeamFnDataClient
  • [BEAM-12547] - Fix epoll shading in vendored gRPC
  • [BEAM-12602] - Cache reader in getProgress call of UnboundedSourceAsSDFRestrictionTracker
  • [BEAM-12625] - beam_PostCommit_Java_ValidatesRunner_Spark failing because testTwoTimersSettingEachOtherWithCreateAsInputUnbounded
  • [BEAM-12648] - apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT failing with A Cloud KMS key cannot be used with Streaming Engine enabled
  • [BEAM-12649] - beam_PostCommit_Java_ValidatesRunner_Spark_PR failing
  • [BEAM-12656] - go-licenses won't build
  • [BEAM-12661] - GetData Windmill RPC calls Become Stuck
  • [BEAM-12676] - StreamingWordCountIT is failing for Python PreCommit
  • [BEAM-12678] - beam_PreCommit_GoPortable_Phrase failing to start the local job server
  • [BEAM-12723] - beam_PreCommit_Java_Examples_Dataflow_Phrase failing due to key negotiation error
  • [BEAM-12780] - StreamingDataflowWorker should limit local retries
  • [BEAM-12831] - Add Go SDK XLang transform user documentation

New Feature

  • [BEAM-8376] - Add FirestoreIO connector to Java SDK
  • [BEAM-9496] - Add a Dataframe API for Python
  • [BEAM-12076] - Update Python cross-language Kafka source to read metadata
  • [BEAM-12380] - Go SDK Kafka IO Transform implemented via XLang
  • [BEAM-12456] - Querying table in parallel in JdbcIO

Improvement

  • [BEAM-10721] - Implement pandas datetime methods
  • [BEAM-11359] - Clean up temporary dataset after ReadAllFromBQ executes
  • [BEAM-11811] - Don't allow numWorkers > maxNumWorkers.
  • [BEAM-12024] - Add well-documented DataFrame example pipelines
  • [BEAM-12074] - Add API Documentation for the DataFrame API
  • [BEAM-12107] - Run DataFrame example pipelines continuously on Dataflow
  • [BEAM-12225] - Replace AWS API used to list shards from DescribeStream to ListShards
  • [BEAM-12260] - Java - Backport FirestoreIO connector's ramp-up to DatastoreIO connector
  • [BEAM-12272] - Python - Backport FirestoreIO connector's ramp-up to DatastoreIO connector
  • [BEAM-12435] - Generalize S3FileSystem
  • [BEAM-12465] - Fix handling of nested typing.Generic type hints in Py3.7+
  • [BEAM-12511] - JdbcIO.Write.withResults work without statement
  • [BEAM-12529] - to_pcollection does not support timestamp types
  • [BEAM-12533] - DeferredSeries and DeferredDataFrame should have a useful repr
  • [BEAM-12538] - Allow ExpansionService to accept PipelineOptions
  • [BEAM-12589] - testTwoTimersSettingEachOtherWithCreateAsInputUnbounded unsupported on Dataflow runner
  • [BEAM-12590] - Automatically upgrade Dataflow Python pipelines that use cross-language transforms to Runner v2
  • [BEAM-12594] - Google-cloud-profiler package dependency makes building custom container more complicated
  • [BEAM-12597] - Use AppendingTransformer for reference.conf in archetype poms
  • [BEAM-12611] - Populate instruction id in Java SDK harness log entries
  • [BEAM-12643] - Make Hooks more testable

Test

  • [BEAM-12470] - ReshuffleTest.testAssignShardFn is flaky due to randomness
  • [BEAM-12583] - beam_PreCommit_Java_Phrase fails

Task

  • [BEAM-11951] - Add documentation page highlighting differences from standard pandas
