Release Notes - Beam - Version 2.10.0 - HTML format

Sub-task

  • [BEAM-3608] - Vendor Guava
  • [BEAM-3657] - Port JoinExamplesTest off DoFnTester
  • [BEAM-3659] - Port UserScoreTest off DoFnTester
  • [BEAM-3661] - Port TriggerExampleTest off DoFnTester
  • [BEAM-3912] - Add batching support for HadoopOutputFormatIO
  • [BEAM-4454] - Provide automatic schema registration for AVROs
  • [BEAM-5309] - Add streaming support for HadoopOutputFormatIO
  • [BEAM-5788] - wordcount_fnapi_it failed on TestDataflowRunner because of JSON string decoding error
  • [BEAM-5920] - Add additional OWNERS
  • [BEAM-5993] - Create SideInput Load test
  • [BEAM-6056] - Migrate gRPC to use vendoring library and vendoring format
  • [BEAM-6096] - illegal signature attribute when compiling with JDK 11
  • [BEAM-6174] - Remove unnecesarry Kryo dependecny from euphoria
  • [BEAM-6188] - Create unbounded synthetic source
  • [BEAM-6200] - Deprecate HadoopInputFormatIO
  • [BEAM-6201] - Provide a "Data insertion pipeline" for Python suites
  • [BEAM-6293] - Create automatic schema registration for AutoValue classes.
  • [BEAM-6300] - Generated row object for POJOs, Avros, and JavaBeans should work if the wrapped class is package private
  • [BEAM-6452] - Nested collection types cause NullPointerException when converting to a POJO
  • [BEAM-6568] - Update website documentation regarding using new HadoopFormatIO

Bug

  • [BEAM-1628] - Flink runner: logic around --flinkMaster is error-prone
  • [BEAM-4110] - LocalResourceIdTest has had a masked failure
  • [BEAM-4143] - GcsResourceIdTest has had a masked failure
  • [BEAM-5146] - GetterBasedSchemaProvider might create inconsistent views of the same schema
  • [BEAM-5386] - Flink Runner gets progressively stuck when Pubsub subscription is nearly empty
  • [BEAM-5462] - get rid of <pipeline>.options deprecation warnings in tests
  • [BEAM-5514] - BigQueryIO doesn't handle quotaExceeded errors properly
  • [BEAM-5662] - [beam_PostCommit_Website_Publish] [:testWebsite] External link http://wiki.apache.org/incubator/BeamProposal failed: got a time out
  • [BEAM-5674] - DataflowRunner.Deduplicate/WithKeys cannot infer Coder for K when running with experiment "enable_custom_pubsub_source"
  • [BEAM-5683] - [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
  • [BEAM-5710] - Compatibility issues with netty 4.1.28 + tcnative 2.0.12 in beam 2.7.0 on dataflow
  • [BEAM-5734] - RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
  • [BEAM-5829] - SQL should probably not support GROUP BY or set operations on floating point numbers
  • [BEAM-5866] - RowCoder doesn't implement structuralValue
  • [BEAM-5911] - Python Dataflow IT test fails even though job succeeded
  • [BEAM-5925] - Test flake in ElasticsearchIOTest.testWriteFullAddressing
  • [BEAM-5936] - [beam_PreCommit_Java_Cron] Flake due to flink.PortableStateExecutionTest
  • [BEAM-5978] - portableWordCount gradle task not working
  • [BEAM-6002] - Nexmark tests timing out on all runners (crash loop due to metrics?)
  • [BEAM-6033] - normalize httplib2.Http initialization and usage
  • [BEAM-6058] - Support flink config directory for flink runner.
  • [BEAM-6074] - Artifact staging proto changes are causing python post commit to fail
  • [BEAM-6106] - test_bigquery_tornadoes_it fails due to a hash mismatch
  • [BEAM-6110] - Beam SQL join moves element timestamps to EOW sometimes, not always
  • [BEAM-6111] - org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey 40/50 runs failed
  • [BEAM-6146] - Portable Flink End to end precommit test
  • [BEAM-6164] - GCS OpenRead and OpenWrite do not make use of the passed in context
  • [BEAM-6170] - NexmarkLauncher stall warning causes benchmark failure
  • [BEAM-6176] - Flink Runner's master url does not support IPv6 addresses
  • [BEAM-6179] - Batch size estimation failing
  • [BEAM-6213] - Beam LocalFilesystem does not match glob patterns in windows
  • [BEAM-6227] - FlinkRunner errors if GroupByKey contains null values (streaming mode only)
  • [BEAM-6229] - BigQuery returns value error while retrieving load test metrics
  • [BEAM-6244] - updateProducerProperties is missing in KafkaIO.Write
  • [BEAM-6249] - Vendored gRPC doesn't seem to work with dataflow
  • [BEAM-6262] - KinesisIO - ShardReadersPool - Unexpected exception occurred
  • [BEAM-6270] - XmlIO documentation example should return Pcollection of Record Class not String
  • [BEAM-6276] - Performance regression caused by extra calls to TypeDescriptor.getRawType
  • [BEAM-6287] - pyarrow is not supported on Windows Python 2
  • [BEAM-6311] - org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom flakey
  • [BEAM-6312] - org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle flakey
  • [BEAM-6319] - BigQueryToTableIT.testNewTypesQueryWithoutReshuffleWithCustom flaky
  • [BEAM-6329] - State for portable timers can interfere with user state
  • [BEAM-6343] - NeedsRunner tests not running in precommit
  • [BEAM-6346] - Portable Flink Job hangs if the sdk harness fails to start
  • [BEAM-6352] - Watch PTransform is broken
  • [BEAM-6353] - TFRecordIOTest is failing
  • [BEAM-6354] - Hanging SplittableDoFnTest#testLateData
  • [BEAM-6357] - Datastore Python client: retry on ABORTED rpc responses
  • [BEAM-6364] - Warn on unsupported portable Flink metric types, instead of throwing
  • [BEAM-6367] - Beam SQL JDBC driver breaks DriverManager#getConnection for other JDBC sources
  • [BEAM-6391] - small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)
  • [BEAM-6396] - PipelineTest:test_memory_usage fails with new urllib3 on some platforms
  • [BEAM-6407] - regression: FileIO.writeDynamic() with side inputs fails in DirectRunner
  • [BEAM-6440] - FlinkTimerInternals memory leak
  • [BEAM-6460] - Jackson Cache may hold on to Classloader after pipeline restart
  • [BEAM-6545] - NPE when decoding null base 64 strings
  • [BEAM-6558] - Beam SQL transitive dependencies appear incomplete / broken
  • [BEAM-6579] - BigQueryIO improperly handles triggering when temp tables are used
  • [BEAM-6582] - Upgrade gcsio dependency to 1.9.13
  • [BEAM-6608] - Flink Runner prepares to-be-staged file too late
  • [BEAM-6648] - Metrics not shown in Flink 1.7.1
  • [BEAM-6765] - Beam 2.10.0 for Python requires pyarrow 0.11.1, which is not installable in Google Cloud DataFlow

New Feature

  • [BEAM-3346] - Add projections to MongoDb IO
  • [BEAM-4388] - Support optimized logical plan
  • [BEAM-4444] - Parquet IO for Python SDK
  • [BEAM-5112] - Investigate if Calcite can generate functions that we need
  • [BEAM-5419] - Build multiple versions of the Flink Runner against different Flink versions
  • [BEAM-5798] - Add support to write to multiple topics with KafkaIO
  • [BEAM-5817] - Nexmark test of joining stream to files
  • [BEAM-6109] - Beam SQL rejects join of unbounded to bounded, though a side input join does succeed
  • [BEAM-6133] - [SQL] Add support for TableMacro UDF
  • [BEAM-6163] - Support Process environment on Mac
  • [BEAM-6191] - Redundant error messages for failures in Dataflow runner
  • [BEAM-6294] - Use Flink's redistribute for reshuffle.

Improvement

  • [BEAM-2281] - call SqlFunctions in operator implementation
  • [BEAM-2873] - Detect number of shards for file sink in Flink Streaming Runner
  • [BEAM-2889] - Flink runs portable pipelines
  • [BEAM-5218] - Add the portable test name to the pipeline name for easy debugging
  • [BEAM-5267] - Update Flink Runner to Flink 1.6.x
  • [BEAM-5310] - Add support of HadoopOutputFormatIO
  • [BEAM-5396] - Flink portable runner savepoint / upgrade support
  • [BEAM-5807] - Add Schema support for AVRO
  • [BEAM-6021] - Serialize more core classes when using kryo's BeamSparkRunnerRegistrator
  • [BEAM-6030] - Remove Metrics Sinks specific methods from PipelineOptions
  • [BEAM-6053] - Add option to disable caching in Spark
  • [BEAM-6063] - KafkaIO: add writing support with ProducerRecord
  • [BEAM-6077] - Make UnboundedSource state rescale friendly
  • [BEAM-6079] - Add ability for CassandraIO to delete data
  • [BEAM-6098] - Support lookup join on right or left
  • [BEAM-6134] - MongoDbIO add support for projection
  • [BEAM-6150] - Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
  • [BEAM-6151] - MongoDbIO add support mongodb server with self signed ssl
  • [BEAM-6155] - Migrate the Go SDK to the modern GCS library
  • [BEAM-6165] - Send metrics to Flink in portable Flink runner
  • [BEAM-6167] - Create a Class to read content of a file keeping track of the file path (python)
  • [BEAM-6172] - Flink metrics are not generated in standard format
  • [BEAM-6183] - BigQuery insertAll API request rate is not properly controlled
  • [BEAM-6186] - Cleanup FnApiRunner optimization phases.
  • [BEAM-6187] - Drop Scala suffix of FlinkRunner artifacts
  • [BEAM-6209] - Remove Http Metrics Sink specific methods from PipelineOptions
  • [BEAM-6212] - MongoDbIO add ordered option (inserts documents even if errors)
  • [BEAM-6234] - [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
  • [BEAM-6235] - Upgrade AutoValue to version 1.6.3
  • [BEAM-6248] - Add Flink 1.7.x build target to Flink Runner
  • [BEAM-6306] - Upgrade Jackson to version 2.9.8
  • [BEAM-6332] - Avoid unnecessary serialization steps when executing combine transform
  • [BEAM-6348] - Add ValueProvider support for Statement in JdbcIO.write()
  • [BEAM-6350] - Reuse same PCollectionView when created in translators
  • [BEAM-6378] - Updating Tika
  • [BEAM-6397] - Add KeyValue attribute to SyntheticDataPubSubPublisher
  • [BEAM-6425] - Replace SSLContext.getInstance("SSL")
  • [BEAM-6552] - Upgrade bigdataoss_gcsio dependency
  • [BEAM-6609] - Default tempLocation in FlinkPipelineOptions to default tmp directory
  • [BEAM-7399] - Publishing a blog post about looping timers.

Test

  • [BEAM-5948] - Add a ParquetIO integration test
  • [BEAM-6009] - PortableValidatesRunner runs only in batch mode
  • [BEAM-6032] - Move PortableValidatesRunner configuration out of BeamModulePlugin
  • [BEAM-6245] - Add test for FlinkTransformOverrides
  • [BEAM-6263] - FlinkJobServerDriver test setup is error-prone
  • [BEAM-6283] - Convert PortableStateTimerTest and PortableExecutionTest to using PAssert
  • [BEAM-6326] - Fix test failures in streaming mode of PortableValidatesRunner
  • [BEAM-6469] - Python Flink ValidatesRunner tests fail due to missing module

Wish

  • [BEAM-3828] - Investigate performance of SQL expression evaluation

Task

  • [BEAM-6143] - Upgrade MongoDbIO to use mongo client 3.9.1
  • [BEAM-6159] - Dataflow portable runner harness should use ExecutableStage to process bundle
  • [BEAM-6204] - Delete Duplicated SocketAddressFactory class in Dataflow worker
  • [BEAM-6225] - Setup Jenkins VR job for new bundle processing code

Edit/Copy Release Notes

The text area below allows the project release notes to be edited and copied to another document.