Release Notes - Beam - Version 2.14.0 - HTML format

Sub-task

  • [BEAM-3204] - Coders should only have a FunctionSpec, not an SdkFunctionSpec
  • [BEAM-5995] - Create Smoke and GBK Python Load Test Jenkins Job
  • [BEAM-6199] - Remove uses of, and finally definitions of, old combine URNs.
  • [BEAM-6429] - apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input fails in Python 3.6
  • [BEAM-6620] - Do not relocate guava
  • [BEAM-6623] - Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
  • [BEAM-6673] - BigQueryIO.Read should automatically produce schemas
  • [BEAM-6674] - The JdbcIO source should produce schemas
  • [BEAM-6769] - BigQuery IO does not support bytes in Python 3
  • [BEAM-6888] - Enable SpotBugs in JdbcIO
  • [BEAM-6936] - Add a Jenkins job running Java examples on Java 11 Dataflow
  • [BEAM-6959] - Run Go SDK Post Commit tests against the Flink Runner.
  • [BEAM-6985] - TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
  • [BEAM-6987] - TypeHints Py3 Error: Typehints NativeTypesTest fails on Python 3.7+
  • [BEAM-7277] - Add PostCommit suite for Python 3.7
  • [BEAM-7407] - Create a Wordcount-on-Flink Python 3 test suite.
  • [BEAM-7454] - Add Python 3.6, 3.7 as supported qualifiers to setup.py.

Bug

  • [BEAM-2611] - Better document and validate arguments of WindowInto
  • [BEAM-2943] - Non-existing fileToStage results in ClassNotFoundException
  • [BEAM-3934] - BoundedReader should be closed in JavaReadViaImpulse#ReadFromBoundedSourceFn
  • [BEAM-4288] - SplittableDoFn: splitAtFraction() API for Python
  • [BEAM-5650] - Timeout exceptions while reading a lot of files from a bounded source like S3 with Flink runner
  • [BEAM-5709] - Tests in BeamFnControlServiceTest are flaky.
  • [BEAM-6813] - Issues with state + timers in java Direct Runner (state cell is null)
  • [BEAM-6952] - concatenated compressed files bug with python sdk
  • [BEAM-6955] - Support Dataflow --sdk_location with modified version number
  • [BEAM-7073] - AvroUtils converting generic record to Beam Row causes class cast exception
  • [BEAM-7135] - Spark executable stage: Job bundle factory is not being closed
  • [BEAM-7144] - Job re-scale fails on Flink >= 1.6 with certain values of maxParallelism
  • [BEAM-7176] - Spark validatesPortableRunner test OOM
  • [BEAM-7194] - [SQL] EXCEPT DISTINCT behavior when right set contains a value is incorrect
  • [BEAM-7267] - install python3 components in release_verify_script.sh
  • [BEAM-7269] - Remove StateSpec from hashCode of SimpleStateTag
  • [BEAM-7282] - Spark portable runner doesn't support `pre_optimize=all`
  • [BEAM-7341] - Portable Spark: testGlobalCombineWithDefaultsAndTriggers fails
  • [BEAM-7349] - Invalid scope of "kafka_clients" dependency for KafkaIO
  • [BEAM-7351] - Failure in Python streaming wordcount test: unexpected messages received on output topic.
  • [BEAM-7357] - Kinesis IO.write throws LimitExceededException
  • [BEAM-7366] - Spotless target does not include project source set
  • [BEAM-7371] - Dependency checks failed because of incorrect virtualenv startup
  • [BEAM-7385] - Portable Spark: testHotKeyCombiningWithAccumulationMode fails
  • [BEAM-7405] - Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - docker-credential-gcloud not installed
  • [BEAM-7406] - Dataflow worker does not include logging backend
  • [BEAM-7412] - portable spark: thread/memory leak in local mode
  • [BEAM-7413] - Huge amount of tasks per stage in Spark runner after upgrade to Beam 2.12.0
  • [BEAM-7421] - Flink/Spark non-portable runners don't override Reshuffle
  • [BEAM-7422] - Update java container when running flink compatibility matrix
  • [BEAM-7424] - Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
  • [BEAM-7442] - Bounded Reads for Flink Runner fails with OOM
  • [BEAM-7446] - Reading from a Cloud Pub/Sub topic inside Python's DirectRunner results in a "too many open files" error
  • [BEAM-7467] - Gearpump Quickstart fails, java.lang.NoClassDefFoundError: com/gs/collections/api/block/procedure/Procedure
  • [BEAM-7479] - portable runner test flake: net bind issue
  • [BEAM-7487] - Flink Runner throws exception when cancel is called on FlinkPipelineResult
  • [BEAM-7493] - beam-sdks-testing-nexmark produces corrupt pom.xml
  • [BEAM-7510] - test_write_to_different_file_types is flaky
  • [BEAM-7511] - KafkaTable Initialization
  • [BEAM-7530] - Reading None value type BYTES from bigquery: AttributeError
  • [BEAM-7533] - CoderRegistry resolves Float as SerializableCoder
  • [BEAM-7536] - Fix BigQuery dataset name in collecting Load Tests metrics
  • [BEAM-7541] - IOIT tests are failing on dataflow due to shading being turned off
  • [BEAM-7542] - java.lang.ClassCastException when writing BYTES to BigQuery
  • [BEAM-7548] - test_approximate_unique_global_by_error is flaky
  • [BEAM-7551] - ImpulseSourceFunction is not checkpointed
  • [BEAM-7561] - HdfsFileSystem is unable to match a directory
  • [BEAM-7606] - Fix JDBC time conversion tests
  • [BEAM-7616] - urlopen calls could get stuck without a timeout
  • [BEAM-7649] - Match Python 3 warning messages in setup.py and __init__.py
  • [BEAM-7689] - Temporary directory for WriteOperation may not be unique in FileBasedSink
  • [BEAM-7736] - Retried work messes up the SDKHarness work assignment
  • [BEAM-7793] - BagState drops Rows when triggered by timer
  • [BEAM-7800] - Make Dataflow container dependencies consistent with setup.py requirements for 2.14.0

New Feature

  • [BEAM-562] - DoFn Reuse: Add new DoFn setup and teardown to python SDK
  • [BEAM-563] - DoFn Reuse: Update DirectRunner to support setup and teardown
  • [BEAM-2857] - Create FileIO in Python
  • [BEAM-5148] - Implement MongoDB IO for Python SDK
  • [BEAM-6693] - ApproximateUnique transform for Python SDK
  • [BEAM-6695] - Latest transform for Python SDK
  • [BEAM-6872] - Add hook for user-defined JVM initialization in workers
  • [BEAM-6880] - Deprecate Java Portable Reference Runner
  • [BEAM-7019] - Reify transform for Python SDK
  • [BEAM-7021] - ToString transform for Python SDK
  • [BEAM-7023] - WithKeys transform for Python SDK
  • [BEAM-7043] - Add DynamoDBIO
  • [BEAM-7044] - Spark portable runner: support user state
  • [BEAM-7221] - Spark portable runner: support timers
  • [BEAM-7305] - Add first version of Hazelcast Jet Runner
  • [BEAM-7342] - Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK)
  • [BEAM-7364] - Possible signed overflow for WindowedValue.__hash__
  • [BEAM-7443] - BoundedSource->SDF needs a wrapper in Python SDK
  • [BEAM-7450] - Support unbounded reads with HCatalogIO
  • [BEAM-7492] - Add Spark runner to Go SDK
  • [BEAM-7513] - [SQL] Row Estimation for BigQueryTable
  • [BEAM-7718] - PubsubIO to use gRPC API instead of JSON REST API

Improvement

  • [BEAM-5664] - A canceled pipeline should not return a done status in the jobserver.
  • [BEAM-5865] - Auto sharding of streaming sinks in FlinkRunner
  • [BEAM-6777] - SDK Harness Resilience
  • [BEAM-6983] - Python 3.6 Support
  • [BEAM-6984] - Python 3.7 Support
  • [BEAM-7082] - Support Unbounded Reads mode in HCatalogIO
  • [BEAM-7114] - Move dot pipeline graph renderer to runners-core-construction-java
  • [BEAM-7130] - convertAvroFieldStrict as public static function could handle more types of value for logical type timestamp-millis
  • [BEAM-7141] - Expose kv and window parameters for on_timer
  • [BEAM-7175] - Add Spark Portable ValidatesRunner Batch postcommit test
  • [BEAM-7240] - Kinesis IO Watermark Computation Improvements
  • [BEAM-7263] - Deprecate set/getClientConfiguration in JdbcIO
  • [BEAM-7265] - Update Spark runner to use spark version 2.4.3
  • [BEAM-7268] - Make external sorter Hadoop free
  • [BEAM-7283] - Have javadoc offline link dependency versions bound to versions within BeamModulePlugin.groovy
  • [BEAM-7286] - RedisIO support for INCRBY/DECRBY operations
  • [BEAM-7312] - SchemaProvider can't be used with dynamic types
  • [BEAM-7331] - Missing util function for late pane in java PAssert
  • [BEAM-7348] - Option to expire SDK worker environments
  • [BEAM-7359] - Fix static analysis issues for HadoopFormatIO
  • [BEAM-7360] - Fix static analysis issues for HCatalogIO
  • [BEAM-7388] - Reify PTransform for Python SDK
  • [BEAM-7397] - Avoid String.format in state namespace construction
  • [BEAM-7417] - Fix incorrect command on flink runner docs
  • [BEAM-7426] - FieldSpecifierNotationLexer should support underscore as field character
  • [BEAM-7436] - Use Beam coder for Flink's encoded key instead of falling back to Kryo
  • [BEAM-7448] - Remove redundant windowing information and unused accumulators on Spark runner
  • [BEAM-7465] - Upgrade Jackson to version 2.9.9
  • [BEAM-7470] - Clean up Data Plane, rely only on instruction id and transform id
  • [BEAM-7475] - Add Python stateful processing example in blog
  • [BEAM-7507] - StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8
  • [BEAM-7512] - Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py
  • [BEAM-7529] - Add Sums.ofFloats and Sums.ofDoubles
  • [BEAM-7543] - ReduceByKey.combineBy must accept BinaryFunction<V, V, V>
  • [BEAM-7603] - Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
  • [BEAM-7665] - Support TypeDefinition options in beam.Combine()
  • [BEAM-7701] - Update Python dependencies page for 2.14.0
  • [BEAM-7715] - External / Cross language (via expansion) transforms / public API should be marked @Experimental
  • [BEAM-7737] - Microbenchmark script does not work consistently
  • [BEAM-7782] - Skip DoFn params test in Python 2 on Windows
  • [BEAM-7922] - Storage API reads in BigQueryTableProvider

Test

  • [BEAM-7362] - Add a performance test for BigQueryIO write
  • [BEAM-7402] - Add a performance test for BigQueryIO read
  • [BEAM-7441] - Artifact staging filesystem errors swallowed by NullPointerException
  • [BEAM-7491] - Add Go SDK post commit test on the Spark runner

Wish

  • [BEAM-7457] - Install post build script plugin on Jenkins

Task

  • [BEAM-4782] - Enforce KV coders for MultiMap side inputs
