Release Notes - Beam - Version 2.22.0

Sub-task

  • [BEAM-2903] - Java SDK support for portable progress reporting
  • [BEAM-3836] - Java SDK harness should understand a BundleSplitRequest and respond with a BundleSplit before bundle finishes
  • [BEAM-4682] - Integrate support for timers using the portability APIs into Dataflow
  • [BEAM-8742] - Add stateful processing to ParDo load test
  • [BEAM-8871] - Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.ByteKeyRangeTracker
  • [BEAM-8872] - Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
  • [BEAM-9383] - Staging Dataflow artifacts from environment
  • [BEAM-9416] - BIP-1: Convert Avro metadata to Schema options
  • [BEAM-9634] - [Java] PTransform that integrates Cloud Natural Language functionality
  • [BEAM-9985] - Java 11 ByteBuddyUtils ConvertValueForSetter.convertArray IllegalStateException

Bug

  • [BEAM-1819] - Key should be available in @OnTimer methods (Java)
  • [BEAM-6860] - WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
  • [BEAM-7885] - DoFn.setup() doesn't run for streaming jobs on DirectRunner
  • [BEAM-8944] - Python SDK harness performance degradation with UnboundedThreadPoolExecutor
  • [BEAM-9216] - Unable to run job server
  • [BEAM-9439] - KinesisReader does not report correct backlog statistics
  • [BEAM-9505] - SpannerIO spurious error message with empty bundles
  • [BEAM-9513] - NullPointerException in convertRexNodeFromResolvedExprWithRefScan
  • [BEAM-9521] - NullPointerException in convertRexNodeFromResolvedExpr
  • [BEAM-9522] - BeamJoinRel.extractJoinRexNode RexLiteral cannot be cast to RexCall
  • [BEAM-9657] - ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef
  • [BEAM-9658] - ExpressionConverter.retrieveRexNode IndexOutOfBoundsException
  • [BEAM-9659] - ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
  • [BEAM-9661] - LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
  • [BEAM-9662] - BeamSortRule NullPointerException limit parameter
  • [BEAM-9663] - BeamSortRule NullPointerException offset parameter
  • [BEAM-9664] - ArrayScanToJoinConverter ResolvedSubqueryExpr cast to ResolvedColumnRef
  • [BEAM-9674] - "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
  • [BEAM-9739] - SpannerIO - Retry on Aborted Exception during schema change
  • [BEAM-9743] - TFRecordCodec does not attempt to fully read/write
  • [BEAM-9758] - [very low priority] Asterisks in nexmark should be escaped
  • [BEAM-9767] - test_streaming_wordcount flaky timeouts
  • [BEAM-9771] - Colab links in example notebooks don't work
  • [BEAM-9791] - Precommit for dataflow runner v2
  • [BEAM-9807] - PostRelease_NightlySnapshot failing due to missing "region" flag
  • [BEAM-9808] - Test script fails when there are spaces in PATH
  • [BEAM-9819] - Extend acceptable httplib2 version range.
  • [BEAM-9821] - SpannerIO does not include all batching parameters in DisplayData.
  • [BEAM-9822] - SpannerIO: Reduce memory usage - especially when streaming
  • [BEAM-9824] - Multiple reshuffles are ignored in some cases on Flink batch runner.
  • [BEAM-9831] - HL7v2IO Improvements
  • [BEAM-9835] - test_multimap_multiside_input failing on Spark Python
  • [BEAM-9836] - Exclude Spark runner from UsesKeyInParDo tests
  • [BEAM-9841] - PortableRunner does not support wait_until_finish(duration=...)
  • [BEAM-9845] - Stage dependencies over the expansion service.
  • [BEAM-9846] - Remove references to native Java BQ source and sink
  • [BEAM-9860] - Make job_endpoint required for PortableRunner
  • [BEAM-9875] - PR11585 breaks python2PostCommit cross-lang suite
  • [BEAM-9887] - Throw IllegalArgumentException when building a Row with logical types from invalid input
  • [BEAM-9888] - @RequiresTimeSortedInput might feed data out of order
  • [BEAM-9940] - Dataflow runner not setting timer family specs for TimerDeclaration annotation
  • [BEAM-9941] - Add a test to prevent a regression in Dataflow when using a Flatten with different input/output coder followed by a GBK
  • [BEAM-9971] - beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
  • [BEAM-9989] - Python Schemas: error encoding from generated user types with an "id" field
  • [BEAM-10015] - output timestamp not properly propagated through the Dataflow runner
  • [BEAM-10022] - [Python] Error with `WriteToParquet` with empty buffer
  • [BEAM-10050] - VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
  • [BEAM-10057] - Failure when getting watermark "getWatermark is never meant to be invoked."
  • [BEAM-10058] - VideoIntelligenceMlTestIT.test_label_detection_with_video_context is flaky
  • [BEAM-10077] - using filename + hash instead of UUID for staging name
  • [BEAM-10121] - Python RowCoder doesn't support nested structs
  • [BEAM-10122] - Python RowCoder throws NotImplementedError in DataflowRunner
  • [BEAM-10164] - Flink: Memory efficient combine implementation for batch runner

New Feature

  • [BEAM-71] - Watermark library
  • [BEAM-2822] - Add support for progress reporting in fn API
  • [BEAM-3788] - Implement a Kafka IO for Python SDK
  • [BEAM-5602] - Dataflow runner should support reporting progress for bounded SplittableDoFn
  • [BEAM-5604] - Dataflow runner should support splitting bounded bundles for SplittableDoFn
  • [BEAM-5605] - Support Portable SplittableDoFn for batch
  • [BEAM-6327] - Don't attempt to fuse subtransforms of primitive/known transforms.
  • [BEAM-6729] - Detect and use PEP-3107 style type annotations for type hints.
  • [BEAM-6887] - Streaming Spanner Writer transform
  • [BEAM-8019] - Support cross-language transforms for DataflowRunner
  • [BEAM-9339] - Declare capabilities in SDK environments
  • [BEAM-9463] - Upgrade ZetaSQL to 2020.03.1
  • [BEAM-9468] - Add Google Cloud Healthcare API IO Connectors
  • [BEAM-9600] - Implement GetJobMetrics in Flink uber jar job server
  • [BEAM-9603] - Support Dynamic Timer in Java SDK over FnApi
  • [BEAM-9641] - Support ZetaSQL DATE functions in BeamSQL

Improvement

  • [BEAM-2925] - Fn API user timer support
  • [BEAM-4245] - Add support for passing protobuf to SubProcess example library
  • [BEAM-8031] - Add Snippets for Patterns website
  • [BEAM-8048] - Support TIMESTAMP Sub function/operator
  • [BEAM-8050] - Remove "$" from auto-generated field names
  • [BEAM-8060] - Support DATE type
  • [BEAM-8061] - Support TIME type
  • [BEAM-8070] - Support empty array literal
  • [BEAM-8074] - Update error message when reading from table with unsupported data types
  • [BEAM-8542] - Add async write to AWS SNS IO & remove retry logic
  • [BEAM-8603] - Add Python SqlTransform MVP
  • [BEAM-8888] - BeamSQL does not support LogicalType
  • [BEAM-8889] - Make GcsUtil use GoogleCloudStorage
  • [BEAM-9443] - support direct_num_workers=0
  • [BEAM-9622] - Support for consuming tagged PCollections in Python SqlTransform
  • [BEAM-9699] - Add ability to use ZetaSQL in Python SqlTransform
  • [BEAM-9720] - Add custom AWS Http Client Configuration capability for AWS client 1.0/2.0
  • [BEAM-9768] - Add a gradle command for running the Python Unified Local Runner.
  • [BEAM-9795] - Support custom avro DatumWriters when writing to BigQuery
  • [BEAM-9802] - Provide a way to customize automatically started services.
  • [BEAM-9820] - Upgrade Flink 1.9.x to 1.9.3
  • [BEAM-9826] - Update TikaIO to use Tika version 1.24.1
  • [BEAM-9840] - Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO
  • [BEAM-9848] - Pass caller pipeline options to expansion service
  • [BEAM-9856] - HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
  • [BEAM-9884] - Add local option for configuring SqlTransform planner
  • [BEAM-9885] - Use artifact staging for SqlTransform rather than staging jar experiment
  • [BEAM-9900] - Remove the need for shutdownSourcesOnFinalWatermark flag
  • [BEAM-9931] - Support custom Avro DatumReaders in AvroIO
  • [BEAM-9964] - Pass the workerCacheMb setting through to the WindmillStateCache constructor
  • [BEAM-10037] - BeamSqlExample.java fails to build when running the ./gradlew command
  • [BEAM-10052] - Check hashes and avoid duplicates when uploading artifacts in the Python Dataflow runner
  • [BEAM-10078] - uniquify Dataflow specific jars when staging
  • [BEAM-10106] - Script the deployment of artifacts to pypi

Test

  • [BEAM-8949] - Add Spanner IO Integration Test for Python
  • [BEAM-9832] - KeyError: 'No such coder: ' in fn_runner_test
  • [BEAM-9907] - apache_beam.transforms.external_test.ExternalTransformTest.test_nested flaky
