Release Notes - Beam - Version 2.20.0

Sub-task

  • [BEAM-4461] - Create a library of useful transforms that use schemas
  • [BEAM-7274] - Protobuf Beam Schema support
  • [BEAM-7333] - Select needs the ability to rename fields
  • [BEAM-8616] - ParquetIO should have Hadoop dependencies as provided
  • [BEAM-8625] - Implement servlet in Dataflow runner for sdk status query endpoint
  • [BEAM-8626] - Implement status api handler in python sdk harness
  • [BEAM-8676] - Beam Dependency Update Request: com.google.api:gax-grpc
  • [BEAM-8685] - Beam Dependency Update Request: com.google.auth:google-auth-library-oauth2-http
  • [BEAM-8691] - Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
  • [BEAM-8695] - Beam Dependency Update Request: com.google.http-client:google-http-client
  • [BEAM-9146] - [Python] PTransform that integrates Video Intelligence functionality
  • [BEAM-9205] - Regression in validates runner tests configuration in spark module
  • [BEAM-9229] - Adding dependency information to Environment proto
  • [BEAM-9247] - [Python] PTransform that integrates Cloud Vision functionality
  • [BEAM-9248] - [Python] PTransform that integrates Cloud Natural Language functionality
  • [BEAM-9258] - [Python] PTransform that connects to Cloud DLP deidentification service
  • [BEAM-9262] - Update ApiServiceDescriptor to have open ended authentication method
  • [BEAM-9353] - ByteBuddy Schema code does not properly handle null values
  • [BEAM-9442] - Schema Select does not properly handle nested nullable fields

Bug

  • [BEAM-4409] - NoSuchMethodException reading from JmsIO
  • [BEAM-5086] - Beam Dependency Update Request: org.apache.kudu
  • [BEAM-6566] - SqlTransform does not work for beam version above 2.6.0 if RowCoder explicitly chosen
  • [BEAM-7427] - JmsCheckpointMark cannot be correctly encoded
  • [BEAM-8374] - PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
  • [BEAM-8490] - Python typehints: properly resolve empty dict type
  • [BEAM-8492] - Python typehints: don't try to strip_iterable from None
  • [BEAM-8525] - trivial_inference: slice of Const[List[T]] returns T instead of List[T]
  • [BEAM-8532] - Beam Python trigger driver sets incorrect timestamp for output windows.
  • [BEAM-8590] - Python typehints: native types: consider bare container types as containing Any
  • [BEAM-8629] - WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.
  • [BEAM-8739] - Consistently use with Pipeline(...) syntax
  • [BEAM-8965] - WriteToBigQuery failed in BundleBasedDirectRunner
  • [BEAM-9003] - test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
  • [BEAM-9057] - Make sure restriction_tracker.deferred_remainder is never called more than once for one <element, restriction>
  • [BEAM-9113] - Protobuf NanosType<T> serialization issues
  • [BEAM-9116] - Limit the number of past invocations stored in the job service
  • [BEAM-9124] - Upgrade linkage checker 1.1.2 to avoid Maven HTTPS problem
  • [BEAM-9126] - TestStreamTest.testDiscardingMode fails for Dataflow since Dataflow TestStream only updates watermarks at 1s resolution.
  • [BEAM-9132] - State request handler is removed prematurely when closing ActiveBundle
  • [BEAM-9143] - Add withOutputParallelization to RedisIO.Read/ReadAll
  • [BEAM-9161] - Inconsistent view of current ActiveBundle from main and bundle timer thread
  • [BEAM-9204] - HBase SDF @SplitRestriction does not take the range input into account to restrict splits
  • [BEAM-9215] - FileBasedSink may suppress exceptions during close
  • [BEAM-9218] - Template staging broken on Beam 2.18.0
  • [BEAM-9225] - Flink uber jar job server hangs
  • [BEAM-9227] - Perform bounded source computations on the worker.
  • [BEAM-9228] - _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
  • [BEAM-9240] - Check for Nullability in typesEqual() method of FieldType class
  • [BEAM-9241] - Fix inconsistent nullability mapping for Protobuf to Schema
  • [BEAM-9242] - Processing Stuck messages are reported by Dataflow as errors
  • [BEAM-9252] - Problem shading Beam pipeline with Beam 2.20.0-SNAPSHOT
  • [BEAM-9253] - SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
  • [BEAM-9265] - @RequiresTimeSortedInput does not respect allowedLateness
  • [BEAM-9277] - Beam 2.19 raises exception in IPython notebook
  • [BEAM-9288] - Conscrypt shaded dependency
  • [BEAM-9290] - runner_harness_container_image experiment is not honored in python released sdks.
  • [BEAM-9304] - beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner
  • [BEAM-9311] - ZetaSQL Named Parameters are lower case, don't treat as case-sensitive
  • [BEAM-9313] - beam_PostRelease_NightlySnapshot failure due to ClassNotFoundException: org.apache.beam.model.pipeline.v1.StandardWindowFns$SessionsPayload$Enum
  • [BEAM-9317] - PostCommit PVR failures
  • [BEAM-9333] - DataCatalogPipelineOptions is not registered
  • [BEAM-9345] - "Multiple environments cannot be created in detached mode"
  • [BEAM-9357] - Bump upper end of Google Bigquery dependencies for python
  • [BEAM-9394] - DynamicMessage handling of empty map violates schema nullability
  • [BEAM-9400] - Upgrade Linkage Checker to 1.1.4
  • [BEAM-9405] - Python PostCommit is flaky: 'PortableRunner' object has no attribute 'create_job_service'
  • [BEAM-9413] - [beam_PostCommit_Py_ValCont] build failed
  • [BEAM-9417] - Unable to read from BigQuery and file system in the same pipeline
  • [BEAM-9423] - Re-Add the stop button to the Flink web interface for pipelines
  • [BEAM-9452] - PipelineResources algorithm is not working on Windows
  • [BEAM-9465] - Reshuffle should trigger repeatedly
  • [BEAM-9475] - New style metrics in portability throw error
  • [BEAM-9478] - Update samza runner page to reflect new changes
  • [BEAM-9485] - Dataflow silently drops non-implemented transforms in FnAPI mode.
  • [BEAM-9503] - SyntaxError in process worker startup
  • [BEAM-9548] - Bad error handling with errors from TestStreamService when using Interactive Beam
  • [BEAM-9557] - Error setting processing time timers near end-of-window
  • [BEAM-9566] - Performance regression of FlinkRunner stream mode due to watermark holds update
  • [BEAM-9573] - Watermark hold for timer output timestamp is not computed correctly
  • [BEAM-9601] - Interactive test_streaming_wordcount failing
  • [BEAM-12500] - Dataflow SocketException (SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery

New Feature

  • [BEAM-3453] - Allow usage of public Google PubSub topics in Python DirectRunner
  • [BEAM-6857] - Support dynamic timers
  • [BEAM-7810] - Allow ValueProvider arguments to ReadFromDatastore
  • [BEAM-8550] - @RequiresTimeSortedInput DoFn annotation
  • [BEAM-8561] - Add ThriftIO to Support IO for Thrift Files
  • [BEAM-8564] - Add LZO compression and decompression support
  • [BEAM-8614] - Expose SDK harness status to Runner through FnApi
  • [BEAM-9072] - [SQL] Add support for Datastore source
  • [BEAM-9149] - Support ZetaSQL positional parameters
  • [BEAM-9184] - Add ToSet() combiner, similar to ToList() and ToDict()
  • [BEAM-9220] - Add use_runner_v2 argument for dataflow
  • [BEAM-9305] - Support ValueProvider for BigQuerySource query string
  • [BEAM-9340] - Properly populate pipeline proto requirements.
  • [BEAM-9344] - Enable bundle finalization in Java SDK

Improvement

  • [BEAM-1833] - Restructure Python pipeline construction to better follow the Runner API
  • [BEAM-6120] - Support retrieval of large gbk iterables over the state API.
  • [BEAM-7310] - Confluent Schema Registry support in KafkaIO
  • [BEAM-8042] - Parsing of aggregate query fails
  • [BEAM-8271] - StateGetRequest/Response continuation_token should be string
  • [BEAM-8298] - Implement state caching for side inputs for Python
  • [BEAM-8335] - Add streaming support to Interactive Beam
  • [BEAM-8399] - Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
  • [BEAM-8537] - Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
  • [BEAM-9022] - Publish spark job server container images in release process
  • [BEAM-9030] - Bump grpc to 1.26.0
  • [BEAM-9059] - Migrate PTransformTranslation to use string constants
  • [BEAM-9140] - Update to ZetaSQL 2020.01.1
  • [BEAM-9160] - Update AWS SDK to support Kubernetes Pod Level Identity
  • [BEAM-9162] - Upgrade Jackson to version 2.10.2
  • [BEAM-9169] - Extra character introduced during Calcite unparsing
  • [BEAM-9175] - Introduce an autoformatting tool to Python SDK
  • [BEAM-9176] - Update dataflow Java container
  • [BEAM-9203] - Programmatically determine if SQL exception is user error, unsupported, or bug
  • [BEAM-9230] - Enable CrossLanguageValidateRunner test for Spark runner
  • [BEAM-9231] - Annotate missing classes in beam-sdks-java-core as Experimental/Internal
  • [BEAM-9236] - Mark missing Schema based classes and methods as Experimental
  • [BEAM-9264] - Upgrade Spark to version 2.4.5
  • [BEAM-9268] - SpannerIO: Better documentation and warning about creating tables in the pipeline
  • [BEAM-9273] - Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
  • [BEAM-9276] - python: create a class to encapsulate the work required to submit a pipeline to a job service
  • [BEAM-9280] - Update commons-compress to version 1.20
  • [BEAM-9281] - Update commons-csv to version 1.8
  • [BEAM-9292] - Provide an ability to specify additional maven repositories for published POMs
  • [BEAM-9296] - Add typing annotation to python SDF
  • [BEAM-9315] - HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
  • [BEAM-9326] - JsonToRow transform should not use bounded Wildcards for its input
  • [BEAM-9329] - Support request of schemas by version on KafkaIO + Confluent Schema Registry
  • [BEAM-9343] - Upgrade ZetaSQL to 2020.02.1
  • [BEAM-9349] - Upgrade to joda time 2.10.5 to get updated TZDB
  • [BEAM-9352] - Ensure consistent usage of jackson version brought in by transitive dependencies
  • [BEAM-9359] - Use DataCatalog client libraries rather than gRPC stubs
  • [BEAM-9364] - Refactor KafkaIO to use DeserializerProviders
  • [BEAM-9436] - Improve performance of GBK

Test

  • [BEAM-8141] - Add an integration test suite for cross-language transforms for Spark runner
  • [BEAM-9347] - Remove default image for Unified Worker

Task

  • [BEAM-4150] - Standardize use of PCollection coder proto attribute
  • [BEAM-6628] - Update GCP dependencies to a recent version
  • [BEAM-7518] - Protobuf Schema: Introduce logical type for Timestamp, Duration and other
  • [BEAM-8437] - Consider using native doubles in standard_coders_test.yaml
  • [BEAM-9037] - Instant and duration as logical type
  • [BEAM-9063] - Migrate docker images to apache namespace.
  • [BEAM-9084] - Cleaning up SDK docker image tagging
  • [BEAM-9121] - Bump vendored calcite to 1.21.0
  • [BEAM-9263] - Bump python sdk fnapi version to enable status reporting
  • [BEAM-9301] - Add a script to check Java linkage issues (beam-linkage-check.sh)
  • [BEAM-9310] - Make SpannerAccessor in Java package-private to reduce API surface
  • [BEAM-9472] - Remove excessive logging in python fn_api_runner
  • [BEAM-9560] - Beam Release Process breaks by "Project '1.10' not found in project ':runners:flink'"
