Release Notes - Beam - Version 2.25.0

Sub-task

  • [BEAM-8106] - Publish Java 11 SDK Harness docker image
  • [BEAM-9372] - Drop support for Python 3.5
  • [BEAM-10873] - Stronger testing of dataframes partitioning declarations.

Bug

  • [BEAM-9399] - Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream
  • [BEAM-9979] - Fix race condition where the read index may be reported from the last executed bundle
  • [BEAM-10292] - DefaultFilenamePolicy.ParamsCoder loses information whether Params's resource ID is file/directory
  • [BEAM-10524] - Default decoder for ReadFromBigQuery does not support repeatable fields
  • [BEAM-10532] - BigQueryUtils.fromTableSchema breaks when field in TableSchema has a NUMERIC data type
  • [BEAM-10586] - Sunset Python 2 in Dataflow runner.
  • [BEAM-10624] - Tests fail on Windows - pyarrow always stores int as 32 bit
  • [BEAM-10673] - DynamoDBIO.RetryConfiguration in AWS v2 is not correctly exposed
  • [BEAM-10691] - FlinkRunner: pipeline slows down due to expensive output timestamp queue
  • [BEAM-10694] - HCatalogIO.Read transform does not survive multiple serialization round trips
  • [BEAM-10760] - Cleanup timers lead to unbounded state accumulation in global window
  • [BEAM-10762] - Beam Python on Flink fails when no artifacts staged
  • [BEAM-10769] - Fix Avro IO documentation: when fastavro is used, do not pass schema parsed by avro-python3.
  • [BEAM-10773] - worker_harness_container_image flag has no effect
  • [BEAM-10783] - ZetaSQL failed on WITH queries because wrong ref column index calculated in JoinScanWithRefConverter
  • [BEAM-10790] - tar.gz artifacts written into fat jar are no longer gzip files
  • [BEAM-10808] - StreamingDataflowWorker streaming rpcs do not always observe stream failures, until timeout
  • [BEAM-10816] - Not possible to implement own RateLimitPolicy
  • [BEAM-10829] - Support Kafka Headers in KafkaWriter
  • [BEAM-10833] - Error inferring type of tuple output
  • [BEAM-10847] - NPE in BeamUnnestRel
  • [BEAM-10868] - XVR flake: :release:go-licenses:py:dockerRun container is already in use
  • [BEAM-10915] - AVG(INT64) suggests using DOUBLE, which isn't a supported type.
  • [BEAM-10941] - Use standard sharding conventions for fileio writes.
  • [BEAM-10972] - :sdks:python:container:py37:docker failed to execute
  • [BEAM-10973] - Samza ValidatesRunner tests failed: ParDoTest$MultipleInputsAndOutputTests
  • [BEAM-10975] - Test failure: apache_beam.examples.wordcount_it_test.WordCountIT.test_wordcount_it_with_prebuilt_sdk_container
  • [BEAM-10978] - Bug in inference of map types.
  • [BEAM-10986] - :build task doesn't build shaded jar with shadow >5.0.0
  • [BEAM-10991] - Timers don't release watermark holds in dataflow on 2.24
  • [BEAM-10997] - Make sure SDF unbounded wrapper closes reader properly
  • [BEAM-11034] - State garbage collection timers set by Dataflow SimpleParDoFn pile up for the GlobalWindow
  • [BEAM-12500] - Dataflow SocketException (SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery
  • [BEAM-13097] - Don't assume 1 jar on classpath in portable jar creator

New Feature

  • [BEAM-2546] - Add InfluxDbIO
  • [BEAM-5757] - Elasticsearch IO: provide delete function
  • [BEAM-9898] - Add cross-language support to SnowflakeIO.Write
  • [BEAM-10597] - Propagate BigQuery streaming insert throttled time to Dataflow worker
  • [BEAM-10645] - Warn when user requests a non-parallel pandas operation.
  • [BEAM-10844] - Add a way to prebuild python sdk container with dependencies
  • [BEAM-10895] - Support UNNEST of a (possibly nested) array field of a struct column

Improvement

  • [BEAM-4379] - Make ParquetIO Read splittable
  • [BEAM-4833] - Add support for users specifying a requirements.txt for their Python portable container
  • [BEAM-5715] - Depending on grpc-all pulls in more dependencies than necessary
  • [BEAM-9850] - Key should be available in @OnTimer methods (Spark Runner)
  • [BEAM-10009] - Support for date times in Python schemas
  • [BEAM-10049] - Add licenses to Go SDK containers
  • [BEAM-10258] - Support type hint annotations on PTransform's expand()
  • [BEAM-10523] - Add support for custom DatumWriters to AvroIO.Write
  • [BEAM-10654] - Implement ExternalSchemaIOTransformRegistrar, implement for jdbc
  • [BEAM-10669] - Add support for Dataflow Templates
  • [BEAM-10770] - Remove DataflowPortabilityApiUnsupported annotation
  • [BEAM-10814] - DataframeTransform: when input is element-wise produce element-wise output
  • [BEAM-10849] - Test that default Dataflow region option is set.
  • [BEAM-10864] - Update Snowflake JDBC dependency
  • [BEAM-10869] - Pubsub native write should be supported over fnapi
  • [BEAM-10870] - Add raw private key param to snowflake cross-language python wrapper
  • [BEAM-10950] - Override Dataflow-native implementation of BQSource with a Beam source
  • [BEAM-11068] - beam-sdks-java-bom.pom cannot be signed after upgrade to Gradle 6

Test

  • [BEAM-9899] - Add integration tests for cross-language SnowflakeIO Read and Write
  • [BEAM-10570] - Consider making a logical type for generic Python types in Rows.
  • [BEAM-10685] - Add SnowflakeIO Streaming integration test
  • [BEAM-11028] - NullPointerException when running Flink Nexmark tests on Streaming after switch to SDF based translation

Wish

  • [BEAM-8862] - Dependency on io.grpc-all allows test-only dependencies on runtimeClasspath
  • [BEAM-10612] - Add support for Flink 1.11.0

Task

  • [BEAM-8371] - Sunset Beam Python 2 support in new releases in 2020.
  • [BEAM-9456] - Upgrade to gradle 6.6.1
  • [BEAM-10463] - Twister2 Beam Runner Documentation
  • [BEAM-10830] - Twister2 quickstarts and the runner maven archetypes
