Release Notes - Beam - Version 2.11.0

Sub-task

  • [BEAM-4904] - Beam Dependency Update Request: de.flapdoodle.embed:de.flapdoodle.embed.mongo 2.2.0
  • [BEAM-5322] - Finish Python 3 porting for typehints module
  • [BEAM-5617] - Side inputs don't work on Python 3
  • [BEAM-5622] - Several tests fail on Python 3 with: Runtime type violation detected
  • [BEAM-5731] - Disable compare parameter in Top.Of() combiner when executing in Python 3.
  • [BEAM-5776] - Using methods in map is broken on Python 3
  • [BEAM-5879] - TFRecordio not Py3 compatible
  • [BEAM-5953] - Enable WordCount example on DataflowRunner on Python 3
  • [BEAM-6135] - Revert dill pip install from github commit
  • [BEAM-6154] - Gcsio batch delete broken in Python 3
  • [BEAM-6207] - Extend "Data insertion pipeline" with KafkaIO
  • [BEAM-6290] - Make the schema for BQ tables storing metric results more generic (Java)
  • [BEAM-6291] - Make the schema for BQ tables storing metric results more generic (Python)
  • [BEAM-6454] - TypeError in DataflowRunner: dict_values does not support indexing
  • [BEAM-6532] - BigQuery IO does not work in Python 3
  • [BEAM-6567] - Extend "Data insertion pipeline" with KinesisIO
  • [BEAM-6572] - Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline.
  • [BEAM-6616] - Stager should stage Python 3 wheels for Beam SDK once they are released.
  • [BEAM-6617] - Release Python 3 wheels with first Beam SDK release that supports Python 3.
  • [BEAM-6665] - SDK source tarball is different when created on Python 2 and Python 3
  • [BEAM-6705] - ConcurrentModificationException in ParDoSchemaTest
  • [BEAM-6709] - Typehinting depends on typing changes in Python 3.5.3

Bug

  • [BEAM-3435] - Python SDK examples should use beam.io.WriteToBigQuery transform rather than the BigQuerySink to interact with BQ.
  • [BEAM-3667] - Failure in MongoDbIOTest.testReadWithCustomConnectionOptions
  • [BEAM-4030] - Add CombineFn.compact, similar to Java
  • [BEAM-4142] - HadoopResourceIdTest has had a masked failure
  • [BEAM-4184] - S3ResourceIdTest has had a masked failure
  • [BEAM-4520] - No messages delivered after a while with PubsubIO
  • [BEAM-4620] - UnboundedReadFromBoundedSource.split() should always call split()
  • [BEAM-5392] - GroupByKey on Spark: All values for a single key need to fit in-memory at once
  • [BEAM-5442] - PortableRunner swallows custom options for Runner
  • [BEAM-5816] - Flink runner starts new bundles while disposing operator
  • [BEAM-5959] - Add Cloud KMS support to GCS copies
  • [BEAM-6237] - ULR ValidatesRunner tests not deleting artifacts.
  • [BEAM-6318] - beam-sdks-python:setupVirtualenv sometimes fails due to a pip flake "No matching distribution found"
  • [BEAM-6334] - Change status of LoadTests after failure on Dataflow
  • [BEAM-6359] - Support for GEOGRAPHY datatype in BQIO for Java SDK
  • [BEAM-6361] - Fix user-metric prefix detection for portable Flink metrics
  • [BEAM-6424] - RabbitMqIO: NullPointerException raised on getWatermark() first call
  • [BEAM-6489] - Python precommits are failing due to a pip error: "no such option: --process-dependency-links"
  • [BEAM-6491] - FileIOTest.testMatchWatchForNewFiles flaky in Java presubmit
  • [BEAM-6494] - We do not sync pipeline with server at bundle end, only during teardown
  • [BEAM-6497] - ContainerLaunchException in java precommit
  • [BEAM-6510] - [beam_PostCommit_Java_Nexmark_Flink] Consistently timing out
  • [BEAM-6518] - Substring-matching bug in Python metrics filtering
  • [BEAM-6583] - Audit Python 3 version support and refine compatibility spec.
  • [BEAM-6589] - SDKs::Java::Extensions::Kryo is incorrectly shaded
  • [BEAM-6601] - Fix wrong command to run Hadoop InputFormat integration test in Javadoc
  • [BEAM-6604] - SpannerIO: unspecified key column value causes NPE
  • [BEAM-6607] - SchemaCoder cannot encode row with null value in array
  • [BEAM-6608] - Flink Runner prepares to-be-staged file too late
  • [BEAM-6632] - Unable to run Query with TUMBLE interval >= 25 DAY
  • [BEAM-6634] - Fix "Run Portable_Python PreCommit" for gradle 5
  • [BEAM-6635] - Fix java.lang.OutOfMemoryError: Java heap space on gradle 5
  • [BEAM-6638] - Python ExternalTransform output mismatched
  • [BEAM-6640] - Reshuffle not translated to Flink rebalance
  • [BEAM-6650] - FlinkRunner fails to checkpoint elements emitted during finishBundle
  • [BEAM-6678] - FlinkRunner does not checkpoint partition view of watermark holds
  • [BEAM-6679] - Fix javadoc @ literals
  • [BEAM-6720] - Binary incompatibility introduced to MapElements between 2.9.0 and 2.10.0
  • [BEAM-6736] - Upgrade gcsio dependency to 1.9.15
  • [BEAM-6806] - org.apache.beam.runners not importing in 2.10 & 2.11
  • [BEAM-7054] - PortableRunner on Flink cluster crashes
  • [BEAM-7152] - public class TestPipeline fails with java.lang.IllegalStateException
  • [BEAM-7279] - Huge memory leak when using logging with invalid params on py27
  • [BEAM-7403] - BigQueryIO.Write does not autoscale correctly (idle workers)

New Feature

  • [BEAM-1318] - PipelineOptions should warn if there are unused options
  • [BEAM-6365] - Add ZStandard compression support for Java SDK
  • [BEAM-6392] - Add support for new BigQuery streaming read API to BigQueryIO
  • [BEAM-6488] - Portable Flink runner support for running cross-language transforms
  • [BEAM-6587] - Let StringUtf8 be a well-known coder.
  • [BEAM-6697] - ParquetIO performance test is failing on GCS filesystem

Improvement

  • [BEAM-1251] - Python 3 Support
  • [BEAM-4783] - Add bundleSize parameter to control splitting of Spark sources (useful for Dynamic Allocation)
  • [BEAM-5396] - Flink portable runner savepoint / upgrade support
  • [BEAM-5910] - FileSystems should retrieve lastModified time
  • [BEAM-6019] - Portable streaming flink does not preserve original error message
  • [BEAM-6285] - Add parameters for offsetConsumer in KafkaIO.read()
  • [BEAM-6302] - Allow setting compression codec in ParquetIO write
  • [BEAM-6305] - Upgrade CassandraIO to use Cassandra java driver 3.6.0
  • [BEAM-6386] - Add named variant of PTransform::compose
  • [BEAM-6403] - Improve checkstyle rules on javadoc comments
  • [BEAM-6520] - Deprecate MongoDb `withKeepAlive` because it is deprecated in the Mongo driver
  • [BEAM-6533] - UnboundedSourceWrapper source log output should not be zero-based
  • [BEAM-6540] - Autoscaling should be aware of Streaming RPC Quota
  • [BEAM-6571] - Flag for streaming engine
  • [BEAM-6609] - Default tempLocation in FlinkPipelineOptions to default tmp directory
  • [BEAM-6630] - Upgrade Gradle to 5.2
  • [BEAM-6631] - Update commons-compress to version 1.18
  • [BEAM-6658] - Add kms_key to BigQuery transforms, pass to Dataflow
  • [BEAM-6663] - Add the ability to transform to/from json SerializablePipelineOptions
  • [BEAM-6664] - Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow

Test

  • [BEAM-6405] - Improve PortableValidatesRunner test reliability on Jenkins
  • [BEAM-6469] - Python Flink ValidatesRunner tests fail due to missing module
  • [BEAM-6473] - Python Flink ValidatesRunner test_flattened_side_input fails

Task

  • [BEAM-6271] - Initial support for portable API in Samza runner
  • [BEAM-6615] - OutputReceiver is not an annotation
