Release Notes - Beam - Version 2.28.0

Bug

  • [BEAM-8829] - PubsubJsonTableProvider throws error if input schema does not have event_timestamp
  • [BEAM-10108] - publish_docker_images.sh has out of date Flink versions
  • [BEAM-11038] - runShadow error: property 'mainClass' is final
  • [BEAM-11272] - Combiner Label Constructor Arg Not Passed Through To PTransform
  • [BEAM-11327] - Replace Charset.defaultCharset() with StandardCharsets.UTF_8
  • [BEAM-11329] - HDFS not deduplicating identical configuration paths.
  • [BEAM-11383] - Normalize build timestamp in sdk.properties to avoid re-running tasks
  • [BEAM-11512] - sdk.properties is often (always?) stale.
  • [BEAM-11517] - BigQuery FILE_LOADS in streaming not tested on Dataflow
  • [BEAM-11530] - Annotated setter parameters handled incorrectly in schema creation
  • [BEAM-11532] - df.merge with identically-named `on` columns produces duplicate output columns
  • [BEAM-11539] - FhirIOSearchIT.testFhirIOSearch[R4] in Java PostCommit is flaky
  • [BEAM-11579] - beam_PostCommit_Python36 tests failing
  • [BEAM-11581] - Java SDK Harness emitted execution metrics are all zero
  • [BEAM-11586] - Container build always pulls licenses.
  • [BEAM-11604] - Prevent panic when running streaming_wordcap
  • [BEAM-11614] - sql_taxi example fails on HEAD due to "transform unexpectedly with no env id"
  • [BEAM-11622] - Flaky test: org.apache.beam.sdk.io.gcp.healthcare.FhirIOSearchIT.testFhirIOSearch[R4]
  • [BEAM-11637] - BitAnd unsafely mutates an instance field.
  • [BEAM-11643] - SpannerIO does not support using BigDecimal for Numeric fields
  • [BEAM-11644] - translations.pack_combiners optimizer causes breaking change to metrics API
  • [BEAM-11679] - PubsubIO read of full PubsubMessage with attributes or messageId has encoding issue with Dataflow Runner v2
  • [BEAM-11689] - 401 on org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde
  • [BEAM-11695] - Remove translations.pack_combiners optimizer from defaults
  • [BEAM-11715] - Combiner packing creates an incorrect proto
  • [BEAM-11716] - Ensure combiner packing maintains internal CombinePerKeys structure
  • [BEAM-11718] - Actually run pre-optimize phases
  • [BEAM-11732] - flink-clients dependency must be provided by user.
  • [BEAM-11790] - Error when trying to read from S3 with Python SDK and external runners
  • [BEAM-11794] - ":sdks:java:extensions:sql:udf-test-provider:checkstyleMain" is failing for Beam release Gradle build
  • [BEAM-11799] - S3 options are not provided to boto3 client when using FlinkRunner and Beam worker pool container
  • [BEAM-11813] - beam-sdks-java-google-cloud-platform-bom files end up in two Nexus repositories
  • [BEAM-13166] - Versions after `2.28.0` fail to infer grouping decoders after a date is selected from a data structure

New Feature

  • [BEAM-6653] - Implement Lullz logging in the Beam Java SDK
  • [BEAM-9602] - Support Dynamic Timer in Python SDK over FnApi
  • [BEAM-10074] - Hash Functions in BeamSQL
  • [BEAM-10324] - Create ApproximateDistinct using HLL Impl
  • [BEAM-10473] - RowJson should support DATETIME
  • [BEAM-11460] - Support reading Parquet files with unknown schema
  • [BEAM-11482] - Thrift support for KafkaTableProvider
  • [BEAM-11526] - Add Beam schema support to ParquetIO
  • [BEAM-11538] - Add a Deque Encoder
  • [BEAM-11624] - Hash functions in ZetaSQL
  • [BEAM-11665] - Create Beam GCP BOM

Improvement

  • [BEAM-8202] - Support ParquetTable Writer
  • [BEAM-8344] - Refactor ParquetTableProvider
  • [BEAM-9179] - Refactor Beam ZetaSQL type translation code
  • [BEAM-9426] - Add JVM properties to JavaJobServer
  • [BEAM-9541] - Single source of truth for supported Flink versions
  • [BEAM-9637] - Update --runner option help
  • [BEAM-11018] - Use metric for Python BigQuery streaming insert API latency logging
  • [BEAM-11032] - Use metric for Java BigQuery streaming insert API latency logging
  • [BEAM-11408] - GCP BigQuery sink (streaming inserts) uses runner determined sharding
  • [BEAM-11410] - Example of E2E test for Kafka environment
  • [BEAM-11411] - Example of E2E test for Pub/Sub environment
  • [BEAM-11457] - Enable skip key-value clone for HadoopFormatIO
  • [BEAM-11523] - Bump Gradle to 6.7.1
  • [BEAM-11527] - Support user configurable Hadoop Configuration flags for ParquetIO
  • [BEAM-11533] - PubSub support types: TIMESTAMP, DATE, TIME, DATETIME
  • [BEAM-11542] - Support projecting groupbys
  • [BEAM-11571] - Support Conversion to GenericRecords in Convert.to transform
  • [BEAM-11572] - Use MatcherAssert#assertThat instead of deprecated Assert#assertThat
  • [BEAM-11584] - Upgrade JUnit to version 4.13.1
  • [BEAM-11593] - Move SparkStructuredStreamingRunnerRegistrar to its own package
  • [BEAM-11594] - Upgrade Gradle to version 6.8
  • [BEAM-11677] - Expose commit_offset_in_finalize and timestamp_policy to ReadFromKafka
  • [BEAM-11678] - beam_PerformanceTests_Kafka_IO is broken by incorrect docker image cleanup
  • [BEAM-11697] - Upgrade Flink runner to Flink versions 1.12.1 and 1.11.3
  • [BEAM-11762] - Upgrade Beam base image to use TensorFlow 2.4.1

Task

  • [BEAM-11475] - Upgrade everything (Python, tests, website, etc.) to Flink 1.12.
