Release Notes - Beam - Version 2.30.0 - HTML format

Sub-task

  • [BEAM-10943] - Support SqlTransform.registerUdf in ZetaSQL dialect
  • [BEAM-12123] - Proactively reject unsupported types in Java UDFs.
  • [BEAM-12198] - Support Dataflow update when schemas are used

Bug

  • [BEAM-9582] - Beam Dependency Update Request: grpcio-tools
  • [BEAM-10854] - PeriodicImpulse default arguments are not valid values
  • [BEAM-11227] - Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216
  • [BEAM-11277] - WriteToBigQuery with batch file loads does not respect schema update options when there are multiple load jobs
  • [BEAM-11620] - PeriodicImpulse uses unspecified / operator on Durations
  • [BEAM-11883] - Unsupported operand type with PeriodicImpulse in Python
  • [BEAM-12011] - WindowFn.getOutputTime breaks joins, adds needless complexity
  • [BEAM-12059] - BigQueryUtils doesn't process DATETIME field that includes a T
  • [BEAM-12079] - Deterministic coding enforcement causes _StreamToBigQuery/CommitInsertIds/GroupByKey to fail
  • [BEAM-12081] - Fix AwsOptions Jackson (de)serialization of integer values
  • [BEAM-12088] - Make file staging uniform among Spark Runners
  • [BEAM-12112] - beam_PostCommit_Java_Nexmark_* failures (Dataflow/Direct/Flink)
  • [BEAM-12118] - QueuingBeamFnDataClient adds polling latency to completing bundle processing
  • [BEAM-12137] - incremental --> patch
  • [BEAM-12142] - Reduce overhead of MetricsEnvironment
  • [BEAM-12160] - Please fix errorprone, checkstyle and lint warnings for tpcds module
  • [BEAM-12166] - Beam Sql - Combine Accumulator return Map fails with class cast exception
  • [BEAM-12180] - UnbatchPandas (and to_pcollection) do not set type hint for unbatched DataFrames
  • [BEAM-12191] - python DataflowRunner upload_graph feature doesn't reduce template file size
  • [BEAM-12204] - Portable Java caches entirety of iterable side inputs.
  • [BEAM-12207] - Python PostCommits failing portableWordCountSparkRunnerBatch
  • [BEAM-12220] - ZipFiles.zipDirectory leaks native JVM memory
  • [BEAM-12222] - Dataflow side input translation "Unknown producer for value"
  • [BEAM-12229] - WindmillStateCache has a 0% hit rate in 2.29
  • [BEAM-12238] - StateBackedIterable is not Serializable
  • [BEAM-12242] - An incorrect model pipeline is generated when PubSub is followed by a x-lang transform for streaming
  • [BEAM-12243] - TPC-DS: use SQL "substring()" instead of "substr()"
  • [BEAM-12264] - Failure in beam_PostCommit_Java_ValidatesRunner_Twister2
  • [BEAM-12282] - Failure in ':vendor:grpc-1_26_0:validateVendoring'
  • [BEAM-12285] - Failure in beam_PostCommit_SQL
  • [BEAM-12290] - TestPubsub.assertThatSubscriptionEventuallyCreated timeout does not work
  • [BEAM-12294] - BeamFnStatusClient managed channel not properly shutdown.
  • [BEAM-12296] - Fix bug in :sdks:go:test:ulrValidatesRunner
  • [BEAM-12321] - Failure in test_run_packable_combine_per_key and test_run_packable_combine_globally
  • [BEAM-12326] - Resource hints not respected after transform substitution.
  • [BEAM-12337] - Replace invalid UW container name for Java SDK
  • [BEAM-12338] - Some BQ DisplayData are missing for the Python FnAPI path
  • [BEAM-12361] - Reshuffle.withNumBuckets creates (N*2)-1 buckets
  • [BEAM-12362] - BigQuery sink swallows HttpErrors when performing streaming inserts preventing retries
  • [BEAM-12390] - ClickHouse test fails when the test resource lacks read permission for others
  • [BEAM-12416] - Python Kafka transforms are failing due to "No Runner was specified"
  • [BEAM-12440] - Wrong version string for legacy Dataflow Java container
  • [BEAM-12553] - Dataframe API & GCP PubSub changes for Python SDK not released with 2.30.0

New Feature

  • [BEAM-11949] - Remove SDK-side dataflow runnerv2 blockers.
  • [BEAM-12045] - Add cloud profiler agent to Java SDK harness container
  • [BEAM-12194] - Support user-defined aggregate functions in ZetaSQL.
  • [BEAM-12219] - Support SUBSTRING function.
  • [BEAM-12273] - Twister2 runner support non-multimap materialization

Improvement

  • [BEAM-2303] - Add SpecificData to AvroCoder
  • [BEAM-4106] - Merge staging file options between runners
  • [BEAM-5537] - Beam Dependency Update Request: google-cloud-bigquery
  • [BEAM-10180] - Upgrade httplib2 to > 0.18.0 to resolve CVE-2020-11078
  • [BEAM-11055] - Update log4j to version 2.14.1
  • [BEAM-11712] - Run TPC-DS with BeamSQL and Spark runner
  • [BEAM-11742] - ParquetSink fails for nullable fields
  • [BEAM-11855] - ib.collect should accept DeferredDataFrame instances
  • [BEAM-11948] - Drop support for Flink 1.8 and 1.9
  • [BEAM-12012] - ElasticsearchIO - Add API key & bearer token authentication
  • [BEAM-12016] - Implement add_suffix, add_prefix for DataFrame and Series
  • [BEAM-12017] - Implement combine, combine_first for DataFrame and Series
  • [BEAM-12069] - mock should be a test dependency
  • [BEAM-12091] - Make file staging uniform among runners
  • [BEAM-12102] - Catch and rethrow Calcite CannotPlanException.
  • [BEAM-12114] - Eliminate beam_fn_api from KafkaIO expansion
  • [BEAM-12145] - Normalize transform IDs generated by SDKs
  • [BEAM-12148] - Align Spark runner jackson dependency version with Beam's
  • [BEAM-12151] - Bump Parquet dependency version to 1.12.0
  • [BEAM-12172] - Upgrade gradle to version 6.8.3
  • [BEAM-12173] - Avoid intermediate conversion to seconds in BigQueryServicesImpl
  • [BEAM-12192] - WatchKafkaTopicPartitionDoFn should respect given topic from KafkaIO
  • [BEAM-12193] - WatchKafkaTopicPartitionDoFn reports user counter to indicate which TopicPartition has been emitted to downstream
  • [BEAM-12197] - TPC-DS: Fix SQL-queries syntax
  • [BEAM-12226] - JdbcIO default retry strategy should retry on PostgreSQL deadlock
  • [BEAM-12278] - Current python library depends on urllib2 < 0.18.0 (which has a severe vulnerability)

Task

  • [BEAM-8611] - Move TextSourceTest into TextIOReadTest
  • [BEAM-12210] - Performance issue in AvroUtils.checkTypeName
  • [BEAM-12213] - Dataflow should always create v1b3 Steps in runner_v1 flavor
  • [BEAM-12214] - Remove RedisIO.readAll() as it was deprecated since Beam 2.13.0
  • [BEAM-12216] - Remove MqttIO.create() with clientId constructor as it was deprecated since Beam 2.13.0
  • [BEAM-12217] - MongoDbIO: remove Read.withFilter() and Read.withProjection()
  • [BEAM-12247] - Reduce memory allocations in InMemoryTimerInternals
  • [BEAM-12248] - Reduce ArrayList allocation in Row/RowUtils
