Release Notes - Beam - Version 2.9.0

Sub-task

  • [BEAM-2918] - Flink support for portable user state
  • [BEAM-4681] - Integrate support for timers using the portability APIs into Flink
  • [BEAM-5624] - Avro IO does not work out-of-the-box with the avro-python3 package on Python 3; several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')
  • [BEAM-5636] - Java support for custom dataflow worker jar
  • [BEAM-5699] - Migrate Go integration test to use a staged dataflow worker jar
  • [BEAM-5720] - Default coder breaks with large ints on Python 3
  • [BEAM-5730] - Migrate Java test to use a staged worker jar
  • [BEAM-5743] - Move "works in progress" out of getting started guide
  • [BEAM-5790] - Euphoria: Remove Dataset abstraction
  • [BEAM-5800] - Support WordCountIT using fn-api worker
  • [BEAM-5801] - Support postcommit ITs using fn-api worker
  • [BEAM-5802] - Support ValidatesRunner using fn-api worker
  • [BEAM-5983] - Create Combine load test for Java SDK
  • [BEAM-6007] - Create ClassLoadingStrategy in a Java 11 compatible way (see the sketch after this list)
  • [BEAM-6054] - Refactor translator providers
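
BEAM-6007 concerns defining ByteBuddy-generated classes without Unsafe-based injection, which Java 11 restricts. The sketch below is not Beam's actual implementation; it only illustrates the general ByteBuddy 1.9+ pattern of defining a generated class through a MethodHandles.Lookup. The class and method names here are hypothetical.

    package example;

    import java.lang.invoke.MethodHandles;
    import net.bytebuddy.ByteBuddy;
    import net.bytebuddy.dynamic.loading.ClassLoadingStrategy;
    import net.bytebuddy.implementation.FixedValue;
    import net.bytebuddy.matcher.ElementMatchers;

    public class LookupStrategyExample {
      public static void main(String[] args) throws Exception {
        // Define generated classes through a MethodHandles.Lookup instead of
        // Unsafe-based class loader injection, which Java 11 restricts.
        ClassLoadingStrategy<ClassLoader> strategy =
            ClassLoadingStrategy.UsingLookup.of(MethodHandles.lookup());

        Class<?> generated =
            new ByteBuddy()
                .subclass(Object.class)
                // A lookup can only define classes in its own package, so the
                // generated type is named alongside this example class.
                .name("example.GeneratedGreeting")
                .method(ElementMatchers.named("toString"))
                .intercept(FixedValue.value("defined via MethodHandles.Lookup"))
                .make()
                .load(LookupStrategyExample.class.getClassLoader(), strategy)
                .getLoaded();

        System.out.println(generated.getDeclaredConstructor().newInstance());
      }
    }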

Bug

  • [BEAM-3516] - SpannerWriteGroupFn does not respect mutation limits
  • [BEAM-4796] - SpannerIO waits for all input before writing
  • [BEAM-5045] - JdbcIO connection issue with MS SQL
  • [BEAM-5439] - StringUtf8Coder is slower than expected
  • [BEAM-5490] - Beam should not irreversibly modify inspect.getargspec
  • [BEAM-5496] - MqttIO fails to deserialize checkpoint
  • [BEAM-5675] - RowCoder#verifyDeterministic isn't consistent with Beam coders
  • [BEAM-5706] - PubSub dependency upgrade causes internal issues for Dataflow
  • [BEAM-5714] - RedisIO emits an "EXEC without MULTI" error
  • [BEAM-5725] - ElasticsearchIO RetryConfiguration response parse failure
  • [BEAM-5759] - ConcurrentModificationException on JmsIO checkpoint finalization
  • [BEAM-5780] - fn-api-worker and legacy-worker should point to different directories
  • [BEAM-5782] - BigQuery TableRows not cloneable when using Dataflow
  • [BEAM-5797] - SDK workers are not always killed when Flink pipeline finishes
  • [BEAM-5813] - ElasticSearchIOTest.testSplit is flaky
  • [BEAM-5848] - Flink runner synthetic source with forward stream causes job failure
  • [BEAM-5853] - PR 6752 causes the WindowedWordCount fn-api worker pipeline to get stuck
  • [BEAM-5857] - Unnecessary submission of a new job in Combine.globally
  • [BEAM-5887] - packageTests and shadowTestJar write the same file
  • [BEAM-5909] - QueryablePipeline fails to lookup UserState with Timers
  • [BEAM-5912] - The Python dependency check report shows the same release dates for different versions of libraries
  • [BEAM-5915] - Python postcommit test_read_metrics times out
  • [BEAM-5919] - Build breaks when including Multi-release Jars
  • [BEAM-5921] - [SQL] Support Joda types for UDF arguments
  • [BEAM-5930] - Java SDK harness fails to access state during timer callbacks
  • [BEAM-5938] - [Flake] beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle running out of quota
  • [BEAM-5961] - No precommit coverage for Nexmark postcommit main entry point
  • [BEAM-5971] - Checkpointing fails for UnboundedSourceWrapper with no local Readers
  • [BEAM-5974] - Migrate ByteKeyRangeTracker to use tryClaim(ByteKey.EMPTY) as end of range claim instead of markDone
  • [BEAM-5987] - Cache side inputs on Spark runner for performance
  • [BEAM-5999] - Proto error when running test_pardo_timers of Python PortableValidatesRunner
  • [BEAM-6010] - Deprecate KafkaIO withTimestampFn() (see the sketch after this list)
  • [BEAM-6011] - Enable Phrase triggering in Nexmark jobs
  • [BEAM-6018] - Memory leak in GCSUtil.java executeBatches
  • [BEAM-6048] - pylint-27 failed but was not caught in Pre/PostCommit
  • [BEAM-6062] - Spark runner does not show Beam metrics in web UI
  • [BEAM-6068] - WordCount example fails to read the GCS Shakespeare text file
  • [BEAM-6076] - Nexmark flakiness: NPE intermittently thrown from the BigQuery client library
  • [BEAM-6082] - [SQL] Nexmark queries 5 and 7 time out
  • [BEAM-6085] - PR 6683 needs to be temporarily reverted before 2.9.0 branch cut
  • [BEAM-6102] - Dataflow cannot deserialize DoFns - incompatible serialVersionUID (JDK or code version mismatch)
  • [BEAM-6111] - org.apache.beam.runners.flink.PortableTimersExecutionTest is very flaky: 40 of 50 runs failed
  • [BEAM-6115] - SyntheticSource bundle size parameter is sometimes cast to an invalid type
  • [BEAM-6116] - Pushed back watermark never emitted in portable Flink runner
  • [BEAM-6145] - PR 7051 causes user pipelines to break due to a new version of google-cloud-core
  • [BEAM-6152] - PR 7107 needs to be cherry-picked into release branch
  • [BEAM-6162] - Python PipelineOptions raises "ambiguous option" error due to argparse behavior
  • [BEAM-6173] - Build Dataflow containers for Apache Beam 2.9.0 release
  • [BEAM-6182] - Use of conscrypt SSL results in stuck workflows in Dataflow
  • [BEAM-6474] - Cannot reference fields by name when using SqlTransform (must use "EXPR$N" instead)
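
For BEAM-6010, the sketch below shows one way a pipeline can assign Kafka record timestamps without the deprecated withTimestampFn(), by using KafkaIO's timestamp-policy support instead. It is a minimal illustration, not the canonical migration: the broker address, topic, and class name are placeholders, and the right policy depends on the pipeline.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.joda.time.Duration;

    public class KafkaTimestampPolicyExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<KV<Long, String>> records =
            p.apply(
                KafkaIO.<Long, String>read()
                    .withBootstrapServers("broker:9092")   // placeholder broker address
                    .withTopic("events")                   // placeholder topic
                    .withKeyDeserializer(LongDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    // Instead of the deprecated withTimestampFn(), derive event times
                    // from the Kafka record create time, allowing up to 1 minute of delay.
                    .withCreateTime(Duration.standardMinutes(1))
                    .withoutMetadata());

        p.run().waitUntilFinish();
      }
    }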

New Feature

  • [BEAM-867] - Support set/delete of timers by ID in Flink runner
  • [BEAM-1240] - Create RabbitMqIO (see the sketch after this list)
  • [BEAM-2687] - Python SDK support for Stateful Processing
  • [BEAM-3900] - Introduce Euphoria Java 8 DSL
  • [BEAM-4431] - Add "Edit this Page" button to website
  • [BEAM-4465] - Enable Flink runner to run IOITs
  • [BEAM-4594] - Implement Beam Python User State and Timer API
  • [BEAM-5964] - Add ClickHouseIO.Write
  • [BEAM-6261] - Dataflow runner does not refresh updated side inputs
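
A minimal sketch of reading from the new RabbitMqIO (BEAM-1240). The AMQP URI and queue name are placeholders and the surrounding class is hypothetical; consult the RabbitMqIO javadoc for the full set of configuration options.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.rabbitmq.RabbitMqIO;
    import org.apache.beam.sdk.io.rabbitmq.RabbitMqMessage;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class RabbitMqReadExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Read messages from a RabbitMQ queue; the URI and queue name are placeholders.
        PCollection<RabbitMqMessage> messages =
            p.apply(
                RabbitMqIO.read()
                    .withUri("amqp://guest:guest@localhost:5672")
                    .withQueue("QUEUE"));

        p.run().waitUntilFinish();
      }
    }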

Improvement

  • [BEAM-2660] - Set PubsubIO batch size using builder
  • [BEAM-5036] - Optimize FileBasedSink's WriteOperation.moveToOutput()
  • [BEAM-5272] - Randomize the reduced splits in BigtableIO so that multiple workers do not hit the same tablet server
  • [BEAM-5299] - Define the max global window as a shared value in protos, like URN enums
  • [BEAM-5326] - SDK support for custom dataflow worker jar
  • [BEAM-5366] - Vendor gRPC and Protobuf separately from beam-model-* Java packages
  • [BEAM-5410] - Fail SpannerIO early for unsupported streaming mode
  • [BEAM-5445] - Update SpannerIO to support unbounded writes
  • [BEAM-5446] - SplittableDoFn: Remove runner time execution information from public API surface
  • [BEAM-5456] - Update google-api-client libraries to 1.25
  • [BEAM-5649] - Remove deprecated primitive CREATE_VIEW transform from Java SDK translation to pipeline proto
  • [BEAM-5702] - Avoid reshuffle for zero- and one-element Creates
  • [BEAM-5705] - PRs 6557 and 6589 break internal Dataflow tests
  • [BEAM-5707] - Add a portable Flink streaming synthetic source for testing
  • [BEAM-5708] - Support caching of SDKHarness environments in Flink
  • [BEAM-5724] - Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage
  • [BEAM-5760] - Portable Flink support for maxBundleSize/maxBundleMillis
  • [BEAM-5778] - Add Metrics API integration with BigQuery for SyntheticSources load tests in the Python SDK
  • [BEAM-5792] - Use Impulse + Reshuffle for Create on Fn API
  • [BEAM-5891] - Update byte-buddy to 1.9.3
  • [BEAM-5917] - Update Flink Runner to 1.5.5
  • [BEAM-5933] - PCollectionViews$SimplePCollectionView.hashCode allocates memory
  • [BEAM-5996] - Nexmark postCommits are failing for Dataflow
  • [BEAM-6013] - Reduce verbose logging within SerializableCoder
  • [BEAM-6023] - Remove Create.Values translation from Spark runner
  • [BEAM-6034] - Add option to set the Flink auto watermarking interval
  • [BEAM-6037] - Base Spark runner pipeline translation on URNs
  • [BEAM-6078] - CassandraIO.read() should use builder's keyspace by default
  • [BEAM-6099] - RedisIO support for the PFADD operation (see the sketch after this list)
  • [BEAM-6123] - Switch to using java.nio.file.Files
  • [BEAM-6254] - Add a "Learning Resources" page
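
For BEAM-6099, the sketch below shows how a pipeline might feed elements into a Redis HyperLogLog via the PFADD write method. The endpoint and key names are placeholders and the surrounding class is hypothetical; it is an illustration of the write-method option, not a complete application.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.redis.RedisIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.KV;

    public class RedisPfAddExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Each KV is (HyperLogLog key, element to add); PFADD updates the
        // approximate distinct-count structure stored under that key.
        p.apply(Create.of(
                KV.of("unique_visitors", "user-1"),
                KV.of("unique_visitors", "user-2"),
                KV.of("unique_visitors", "user-1")))
            .apply(RedisIO.write()
                .withEndpoint("localhost", 6379)          // placeholder Redis endpoint
                .withMethod(RedisIO.Write.Method.PFADD));

        p.run().waitUntilFinish();
      }
    }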

Test

  • [BEAM-5781] - Enable ValidatesRunner tests for portable Java

Task

  • [BEAM-4097] - Python SDK should set the environment in the job submission protos
  • [BEAM-4524] - We should not be using MD5 to validate artifact integrity (see the sketch after this list)
  • [BEAM-5946] - Upgrade google-apitools to 0.5.24
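
BEAM-4524 is about moving artifact integrity checks off MD5. As a generic illustration only, and not Beam's artifact staging code, the sketch below hashes a file with SHA-256 using the standard JDK MessageDigest API; the class name is hypothetical.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class ArtifactDigestExample {
      public static void main(String[] args) throws Exception {
        // Hash the staged artifact with SHA-256 rather than MD5.
        byte[] contents = Files.readAllBytes(Paths.get(args[0]));
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(contents);

        // Print the digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
          hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
      }
    }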
