Release Notes - Beam - Version 2.4.0

Sub-task

  • [BEAM-1618] - Add Gauge metric type to Python SDK
  • [BEAM-2926] - Java SDK support for portable side input
  • [BEAM-2929] - Dataflow support for portable side input
  • [BEAM-3074] - Propagate pipeline protos through Dataflow API from Python
  • [BEAM-3126] - Portable flattens in Python SDK Harness
  • [BEAM-3159] - DoFnTester should be deprecated in favor of TestPipeline
  • [BEAM-3254] - Add IntelliJ hints to projects containing code generation tasks
  • [BEAM-3553] - Support Python 3 in metrics
  • [BEAM-3554] - Support Python 3 in internal
  • [BEAM-3555] - Support Python 3 in the typehints module
  • [BEAM-3556] - Support Python 3 in the utils module
  • [BEAM-3601] - Switch to Java 8 futures
  • [BEAM-3626] - Support remapping the main input window to side input window inside the Java SDK harness
  • [BEAM-3629] - Make the windowing strategy available within the PCollectionView/CollectionToSingleton that is sent to Dataflow
  • [BEAM-3631] - Have Dataflow map main input windows to side input windows
  • [BEAM-3662] - Port MongoDbIOTest off DoFnTester

Bug

  • [BEAM-410] - ApproximateQuantiles$QuantileBuffer defines compareTo but not equals
  • [BEAM-591] - Better handling of watermark in KafkaIO
  • [BEAM-2140] - Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
  • [BEAM-2815] - Python DirectRunner is unusable with input files in the 100-250MB range
  • [BEAM-3153] - Allow streaming processing time domain timers in Beam Python DirectRunner
  • [BEAM-3228] - KinesisMockReadTest is flaky
  • [BEAM-3317] - KinesisReaderTest is flaky due to overadvanced watermarks
  • [BEAM-3420] - TimerData#compareTo should respect Timer IDs
  • [BEAM-3423] - Distinct.withRepresentativeValueFn throws CoderException "cannot encode null KV"
  • [BEAM-3456] - Enable large scale JdbcIOIT Performance Test
  • [BEAM-3512] - Python PTransform overrides do not completely remove the overridden transform
  • [BEAM-3526] - Support for checkpointMark finalize in KafkaIO
  • [BEAM-3527] - org.apache.beam.sdk.metrics.DistributionResult mixes up min and max constructor args
  • [BEAM-3531] - Nexmark failed with NPE with DEFAULT suite
  • [BEAM-3547] - [SQL] Nested Query Generates Incompatible Trigger
  • [BEAM-3559] - ValueProvider doesn't support argparse-style 'choices'
  • [BEAM-3565] - Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness
  • [BEAM-3578] - SQL module build breaks because of missing dependency
  • [BEAM-3591] - Undefined name: exc_info
  • [BEAM-3598] - kinesis.ShardReadersPoolTest.shouldStopReadersPoolAlsoWhenExceptionsOccurDuringStopping is flaky
  • [BEAM-3599] - kinesis.ShardReadersPoolTest.shouldInterruptKinesisReadingAndStopShortly is flaky
  • [BEAM-3605] - Kinesis ShardReadersPoolTest shouldForgetClosedShardIterator failure
  • [BEAM-3613] - SpannerIO: Typo in "witHost"
  • [BEAM-3627] - Dataflow ValidatesRunner failing ViewTest
  • [BEAM-3628] - Python postcommit broken
  • [BEAM-3632] - Table partitioning in DynamicDestination is lost when project is not set in Table Destination
  • [BEAM-3637] - HBaseIOTest methods do not clean up tables
  • [BEAM-3646] - Add comments about appropriate use of DoFn.Teardown
  • [BEAM-3681] - S3Filesystem fails when copying empty files
  • [BEAM-3683] - Support BigQuery column-based time partitioning
  • [BEAM-3690] - Dependency Conflict problems: several conflicting classes exist in different JARs (mockito-all/hamcrest-all)
  • [BEAM-3692] - Hadoop Input Format module is skipped from deployment after mix with Java 1.8
  • [BEAM-3695] - beam_PostCommit_Python_ValidatesContainer_Dataflow red for a few days
  • [BEAM-3705] - ApproximateUnique discards accumulated data with multiple firings
  • [BEAM-3720] - Python Precommit Broken
  • [BEAM-3728] - Failing ParDoTest for Flink Runner
  • [BEAM-3729] - Spark ValidatesRunner broken with "org.apache.beam.sdk.options.$Proxy72 cannot access its superinterface"
  • [BEAM-3732] - Building with profiles io-it-suite and io-it-suite-local fails
  • [BEAM-3735] - Beam 2.3.0 release archetypes missing mobile gaming examples
  • [BEAM-3739] - @Parameter annotation does not work for UDFs in Beam SQL
  • [BEAM-3754] - KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with KafkaIO.readBytes()
  • [BEAM-3768] - Compile error for Flink translation
  • [BEAM-3799] - Nexmark Query 10 breaks with direct runner
  • [BEAM-3815] - 2.4.0 RC2 uses java worker version beam-master-20180228, should be 2.4.0 or beam-2.4.0
  • [BEAM-3877] - Performance tests flaky due to NoClassDefFoundError
  • [BEAM-3881] - Failure reading backlog in KinesisIO
  • [BEAM-4631] - KafkaIO should run in streaming mode on the Spark runner
  • [BEAM-4752] - Import error in apache_beam.internal.pickler: "'module' object has no attribute 'dill'"
  • [BEAM-5409] - Beam Java SDK 2.4/2.5 PAssert with CoGroupByKey

New Feature

  • [BEAM-79] - Gearpump runner
  • [BEAM-4186] - Need to be able to set QuerySplitter in DatastoreIO.v1()

Improvement

  • [BEAM-230] - Remove WindowedValue#valueInEmptyWindows
  • [BEAM-1442] - Performance improvement of the Python DirectRunner
  • [BEAM-2469] - Handling Kinesis shards splits and merges
  • [BEAM-3124] - Make flatten explicit with portability
  • [BEAM-3154] - Support multiple KeyRanges when reading from BigTable
  • [BEAM-3205] - Publicly document known coder wire formats and their URNs
  • [BEAM-3207] - Publicly document primitive transforms and their URNs - give "impulse" a URN
  • [BEAM-3291] - Add Kinesis Write transform
  • [BEAM-3441] - Allow ValueProvider for JdbcIO.DataSourceConfiguration
  • [BEAM-3538] - Remove (or merge) Java 8 specific tests module into the main one
  • [BEAM-3550] - Add support for different serviceEndpoint in the S3 filesystem
  • [BEAM-3552] - Support Python 3 in the smaller modules
  • [BEAM-3566] - Replace Python DirectRunner apply_* hooks with PTransformOverrides
  • [BEAM-3572] - Reduce inefficient allocations in coders
  • [BEAM-3575] - Update contribution guidelines to add the option of cloning from a fork
  • [BEAM-3593] - Remove methods that just call super()
  • [BEAM-3602] - Add source set for generated Java gRPC code
  • [BEAM-3603] - Add a ReadAll transform to tfrecordio
  • [BEAM-3611] - Split KafkaIO.java into smaller files
  • [BEAM-3618] - Remove extraneous "return" statement
  • [BEAM-3620] - Deprecate older Kafka clients and make Kafka clients a provided dependency
  • [BEAM-3624] - Remove collapsible if statements
  • [BEAM-3635] - Infer type hints on PTransformOverrides
  • [BEAM-3644] - Speed up Python DirectRunner execution by using the FnApiRunner when possible
  • [BEAM-3688] - Add setup/teardown for BeamSqlSeekableTable
  • [BEAM-3689] - Direct runner leaks a reader for every 10 input records
  • [BEAM-3762] - Update Dataflow worker image to support unlimited JCE policy
  • [BEAM-3874] - Switch AvroIO sink default codec to Snappy

Test

  • [BEAM-3217] - Add a performance test for HadoopInputFormatIO

Task

  • [BEAM-1492] - Avoid potential issue in ASM 5.0
  • [BEAM-3360] - [SQL] Do not assign triggers for HOP/TUMBLE
  • [BEAM-3562] - Update to Checkstyle 8.7
  • [BEAM-3701] - PipelineOptionsFactory doesn't use the right classloader for its SPI
