Release Notes - Beam - Version 2.24.0

Sub-task

  • [BEAM-8319] - Errorprone 0.0.13 fails during JDK11 build
  • [BEAM-9702] - Update Java KinesisIO.Read to support AWS SDK v2
  • [BEAM-10080] - Java Core Tests failing [Java 11]
  • [BEAM-10081] - GCP Core Tests failing [Java 11]
  • [BEAM-10084] - XML IO java tests failing [Java 11]
  • [BEAM-10085] - Direct Runner Tests failing [Java 11]
  • [BEAM-10086] - GCP IO tests failing [Java 11]
  • [BEAM-10136] - Add cross-language wrapper for Java's JdbcIO Write
  • [BEAM-10502] - Eliminate nullability errors from :sdks:java:extensions:zetasketch
  • [BEAM-10568] - Spotbugs failure in JDK11: :sdks:java:core:spotbugsMain (due to spotbugs limitation)

Bug

  • [BEAM-4440] - When filesToStage is empty, the DataflowRunner should fail.
  • [BEAM-7014] - Flake in gcsio.py / filesystemio.py - NotImplementedError: offset: 0, whence: 0
  • [BEAM-8454] - Failure in org.apache.beam.fn.harness.FnHarnessTest.testLaunchFnHarnessAndTeardownCleanly
  • [BEAM-8727] - Beam Dependency Update Request: software.amazon.awssdk
  • [BEAM-9629] - JdbcIO seems to run out of connections in the connection pool and freezes pipeline
  • [BEAM-9712] - setting default timezone doesn't work
  • [BEAM-9792] - BigQuery insertAll ignores retry policy for all errors throwing IOException
  • [BEAM-9968] - beam_PreCommit_Java_Cron org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testSplit Failure
  • [BEAM-9975] - PortableRunnerTest flake "ParseError: Unexpected type for Value message."
  • [BEAM-9976] - FlinkSavepointTest timeout flake
  • [BEAM-10007] - PortableRunner doesn't handle ValueProvider instances when converting pipeline options
  • [BEAM-10243] - Incorrect checkState condition in withFieldValues in Row.java
  • [BEAM-10248] - Beam does not set correct region for BigQuery when requesting load job status
  • [BEAM-10274] - Python SDK can't parse type=json.loads pipeline options at execution time
  • [BEAM-10294] - Beam metrics are unreadable in Spark history server
  • [BEAM-10308] - Component id assignment is not consistent across PipelineContext instances
  • [BEAM-10387] - Add expansion_server keyword parameter to SqlTransform
  • [BEAM-10400] - DirectRunner: race condition in watermark update
  • [BEAM-10414] - FlinkStreamingImpulseSource fails in from_runner_api
  • [BEAM-10462] - org.apache.beam.sdk.transforms corrupt data when a value is Double.NaN
  • [BEAM-10470] - NullPointerException in DirectRunner waitUntilFinish
  • [BEAM-10482] - Schema-aware CoGroup transform doesn't work for any schema-aware POJO PCollection
  • [BEAM-10510] - NPE when closing UnboundedSourceWrapper
  • [BEAM-10517] - DirectRunner multi step combine has questionable null / Map treatment
  • [BEAM-10558] - Flushing of buffered elements during checkpoint can stall
  • [BEAM-10622] - Prefix Gradle paths with a colon for user-facing output
  • [BEAM-10631] - Performance of Schema#indexOf is broken
  • [BEAM-10647] - BigQueryIO BigQueryWrapper.get_query_location can end up in permission issue
  • [BEAM-10651] - Beam ZetaSQL does not handle NULL arrays properly
  • [BEAM-10676] - Timers use the input timestamp as the timer output timestamp which prevents watermark progress
  • [BEAM-10684] - Jdbc cross-language broken after recent merge
  • [BEAM-10697] - Python Precommit failing due to type check failure.
  • [BEAM-10698] - SDFs broken for Dataflow runner v2 due to timestamps being out of bound
  • [BEAM-10702] - Embedded job endpoint artifact service unzips PIP files, making them non-installable

New Feature

  • [BEAM-9178] - Support ZetaSQL TIMESTAMP functions in BeamSQL
  • [BEAM-9896] - Add streaming for SnowflakeIO.Write to Java SDK
  • [BEAM-9897] - Add cross-language support to SnowflakeIO.Read
  • [BEAM-10239] - Support ZetaSQL NUMERIC type in BeamSQL
  • [BEAM-10240] - Support ZetaSQL DATETIME functions in BeamSQL
  • [BEAM-10343] - Add dispositions for SnowflakeIO.write
  • [BEAM-10385] - Integrate SQL expansion into Flink job server
  • [BEAM-10490] - Support read/write ZetaSQL DATE/TIME types from/to BigQuery
  • [BEAM-10551] - Implement Navigation Functions FIRST_VALUE and LAST_VALUE
  • [BEAM-10581] - Implement Numbering functions
  • [BEAM-10601] - DICOM API Beam IO connector

Improvement

  • [BEAM-601] - Enable Kinesis integration tests
  • [BEAM-5414] - grpcio-tools 1.15.0 proto generation breaks compatibility with latest protobuf 3.6.1
  • [BEAM-6928] - Make Python SDK custom Sink the default Sink for BigQuery
  • [BEAM-8057] - Support NAN, INF, and -INF
  • [BEAM-8244] - Split Flink test_external_transforms into multiple tests
  • [BEAM-8648] - Euphoria: Deprecate OutputHints from public API
  • [BEAM-9182] - Support NULL parameter in BeamZetaSqlCalcRel
  • [BEAM-9839] - OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner
  • [BEAM-9932] - Add documentation describing cross-language test pipelines
  • [BEAM-9996] - Flink should not shadow FnApiRunnerTest.test_metrics
  • [BEAM-10010] - Test Python SqlTransform on fn_api_runner
  • [BEAM-10257] - Add option defaults for Spark Python tests
  • [BEAM-10335] - Add STS Assume role credentials provider to AwsModule
  • [BEAM-10336] - Move PubsubJsonTableProvider logic to core Beam
  • [BEAM-10337] - Make PubsubSchemaIOProvider more generic
  • [BEAM-10383] - Update Snowflake JDBC dependency
  • [BEAM-10392] - :sdks:java:io:rabbitmq:test gets stuck regularly
  • [BEAM-10395] - Dataflow runner should deduplicate files to stage by destination
  • [BEAM-10407] - Move Avro and Parquet provider logic out of SQL
  • [BEAM-10408] - Create general Provider and Table classes for IOs in Beam SQL
  • [BEAM-10420] - PerWindowInvoker to handle window observing SplittableDoFns
  • [BEAM-10431] - Change Graphite Metrics sink message format to include missing metric step.
  • [BEAM-10432] - Missing Dataflow pipeline options yields vague error message
  • [BEAM-10433] - :sdks:java:io:jdbc:test fails due to issues connecting to derby server
  • [BEAM-10455] - Provide an example of how to memoize the JdbcIO DataSourceProviderFn
  • [BEAM-10468] - Revamp SchemaCapableIOProvider to SchemaIOProvider
  • [BEAM-10486] - OffsetRestrictionTracker returning invalid split on completed restriction
  • [BEAM-10491] - Simplify PeriodicSequence generator to use OffsetRanges with whole numbers
  • [BEAM-10494] - PubsubSchemaCapableIOProvider Config inner class
  • [BEAM-10533] - Remove watermark hold from RequiresTimeSortedInput
  • [BEAM-10543] - Kafka cross-language configuration lacks a few parameters
  • [BEAM-10546] - Remove util.timeout
  • [BEAM-10559] - Python SqlTransform examples
  • [BEAM-10571] - Use schemas for external protocol spec
  • [BEAM-10611] - Use ZetaSQL Value.createDatetimeValue(LocalDateTime) function to avoid extra conversions
  • [BEAM-10618] - subprocess_server.py fails to find a free port in IPv6 only environment
  • [BEAM-10629] - Create KnownBuilderInstances function for ExternalTransformRegistrar
  • [BEAM-10648] - Unused BigQuery queryTempDataset value
  • [BEAM-10653] - Modularize SQL test cases

Test

  • [BEAM-7523] - Add a server for KafkaTable Integration Test

Wish

  • [BEAM-10598] - Bump upper end of cloud BigQuery dependencies for Python to the latest version, 1.26.1

Task

  • [BEAM-9953] - Beam ZetaSQL supports multiple statements in a query
  • [BEAM-10207] - Beam ZetaSQL supports pure SQL user-defined scalar functions
  • [BEAM-10224] - Test using a DATE field in an aggregation
  • [BEAM-10306] - Add latency measurement to Python benchmarks
  • [BEAM-10371] - :sdks:python:dependencyUpdates should use Python 3
  • [BEAM-10668] - Replace toLowerCase().equals() with equalsIgnoreCase
