Release Notes - Beam - Version 2.8.0

Sub-task

  • [BEAM-2594] - Python shim for submitting to a JobService
  • [BEAM-2916] - Python SDK support for portable user state
  • [BEAM-3286] - Go SDK support for portable side input
  • [BEAM-3371] - Add ability to stage directories with compiled classes to Spark
  • [BEAM-3651] - Port BigQueryTornadoesTest off DoFnTester
  • [BEAM-3652] - Port WriteWithShardingFactoryTest off DoFnTester
  • [BEAM-3655] - Port MaxPerKeyExamplesTest off DoFnTester
  • [BEAM-3711] - Implement portable Combiner lifting in Dataflow Runner
  • [BEAM-4496] - Create Jenkins job to push generated HTML to asf-site branch
  • [BEAM-4499] - Migrate Apache website publishing to use apache/beam asf-site branch
  • [BEAM-4553] - Implement a Graphite sink for the metrics pusher
  • [BEAM-4841] - Test Auto JIRA Subtask 789
  • [BEAM-4911] - Beam Dependency Update Request: org.elasticsearch:elasticsearch 6.3.2
  • [BEAM-4914] - Beam Dependency Update Request: org.elasticsearch.client:elasticsearch-rest-client 6.3.2
  • [BEAM-4920] - Beam Dependency Update Request: org.elasticsearch.test:framework 6.3.2
  • [BEAM-5015] - Beam Dependency Update Request: org.elasticsearch.client:transport 6.3.2
  • [BEAM-5017] - Beam Dependency Update Request: org.elasticsearch.plugin:transport-netty4-client 6.3.2
  • [BEAM-5225] - Beam Dependency Update Request: org.elasticsearch:elasticsearch 6.4.0
  • [BEAM-5227] - Beam Dependency Update Request: org.elasticsearch.client:elasticsearch-rest-client 6.4.0
  • [BEAM-5228] - Beam Dependency Update Request: org.elasticsearch.test:framework 6.4.0
  • [BEAM-5237] - Beam Dependency Update Request: org.elasticsearch.client:transport 6.4.0
  • [BEAM-5238] - Beam Dependency Update Request: org.elasticsearch.plugin:transport-netty4-client 6.4.0
  • [BEAM-5327] - Go support for custom dataflow worker jar
  • [BEAM-5626] - Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)
  • [BEAM-5678] - Jekyll redirect_from incompatible with HTMLProofer url validation
  • [BEAM-5989] - Create ParDo Load Test
  • [BEAM-5990] - Create Combine Load Test
  • [BEAM-5991] - Create CoGroupByKey Load Test
  • [BEAM-5992] - Create GroupByKey Load Test

Bug

  • [BEAM-1909] - BigQuery read transform fails for DirectRunner when querying non-US regions
  • [BEAM-3089] - Issue with setting the parallelism at client level using Flink runner
  • [BEAM-3727] - Never shutdown sources in Flink Streaming execution mode
  • [BEAM-3955] - Need a way to translate step names for Beam names and runner names
  • [BEAM-4704] - String operations yield incorrect results when executed through SQL shell
  • [BEAM-4826] - Flink runner sends bad flatten to SDK
  • [BEAM-4858] - Clean up _BatchSizeEstimator in element-batching transform.
  • [BEAM-4861] - Hadoop Filesystem silently fails
  • [BEAM-5144] - [beam_PostCommit_Java_GradleBuild][org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMark][Flake] Expected messages count assert fails
  • [BEAM-5190] - Python pipeline options are not picked correctly by PortableRunner
  • [BEAM-5194] - Pipeline options with multi value are not deserialized correctly from map
  • [BEAM-5261] - Sql array indexing should be 1-based
  • [BEAM-5262] - JobState support for Reference Runner
  • [BEAM-5269] - Create integration tests for BigQueryIORead pipeline
  • [BEAM-5277] - Python SDK wordcount fails due to side inputs in streaming mode
  • [BEAM-5279] - Metadata is lost in CalciteUtils schema conversion
  • [BEAM-5301] - Migrate integration tests for datastore_wordcount
  • [BEAM-5332] - SDK harness containers are not eventually shut down after job ends
  • [BEAM-5337] - [beam_PostCommit_Java_GradleBuild][:beam-runners-flink_2.11:test][Flake] Build times out in beam-runners-flink target
  • [BEAM-5339] - Implement new policy on Beam dependency tooling
  • [BEAM-5341] - Migrate integration tests for TfIdf
  • [BEAM-5365] - Migrate integration tests for bigquery_tornadoes
  • [BEAM-5367] - [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website test failure
  • [BEAM-5383] - Migrate integration tests for python bigquery io read
  • [BEAM-5389] - [beam_PostCommit_Java_GradleBuild][:beam-runners-google-cloud-dataflow-java:examplesJavaIntegrationTest]
  • [BEAM-5395] - BeamPython data plane streams data
  • [BEAM-5406] - NullPointerException when converting Null Datetime to TableRow
  • [BEAM-5407] - [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post commit
  • [BEAM-5408] - (Java) Using Compression.GZIP with TFRecordIO
  • [BEAM-5412] - TFRecordIO fails with records larger than 8K
  • [BEAM-5417] - FileSystems.match behaviour diff between GCS and local file system
  • [BEAM-5457] - BigQuerySource(query=...) in DirectRunner creates temp dataset in the wrong location
  • [BEAM-5486] - Python: Filesystems.match(['gs://bucket/*']) fails
  • [BEAM-5487] - ByteKeyRangeTracker restrictions do not cover the entire interval because of incorrect next key
  • [BEAM-5500] - Portable python sdk worker leaks memory in streaming mode
  • [BEAM-5509] - Python pipeline_options doesn't handle int type
  • [BEAM-5513] - Upgrade google-cloud-pubsub to 0.35.4
  • [BEAM-5515] - beam_Release_Gradle_NightlySnapshot failed
  • [BEAM-5516] - Flink master option from Python SDK not honored by Flink runner
  • [BEAM-5518] - :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com
  • [BEAM-5528] - Java PortableRunner pipeline fails on FlinkRunner due to CREATE_VIEW
  • [BEAM-5529] - Dataflow runner raises AssertionError if job takes > 50 seconds to go from PENDING to RUNNING
  • [BEAM-5533] - Fix the comparer import in the dependency tool
  • [BEAM-5598] - :beam-website:testWebsite is flaky
  • [BEAM-5603] - Fix broken 2.6.0 links in beam-site
  • [BEAM-5608] - KeyError on Python 3 when checking environment var BEAM_EXPERIMENTAL_PY3
  • [BEAM-5619] - Fix minor bug in JdbcIO example code
  • [BEAM-5625] - PortableRunner.PipelineResult.cancel not working
  • [BEAM-5633] - Python SDK harness logging client failure
  • [BEAM-5643] - Fix broken user_score_it_test
  • [BEAM-5667] - Remove .gradle files from beam-site
  • [BEAM-5684] - Need a test that verifies Flattening / not-flattening of BQ nested records
  • [BEAM-5685] - TopWikipediaSessionsIT is flaky
  • [BEAM-5687] - Checkpointing in portable pipelines does not work
  • [BEAM-5693] - Python SDK tests failing on Windows
  • [BEAM-5695] - dataflow worker jar should be built against beam project
  • [BEAM-5697] - Support parsing legacy options in FnHarness
  • [BEAM-5712] - Need an integration test for Datastore IO in Python

New Feature

  • [BEAM-3446] - RedisIO non-prefix read operations
  • [BEAM-5288] - Modify Environment to support non-dockerized SDK harness deployments
  • [BEAM-5441] - Portable Wordcount fails in GreedyPipelineFuser

Improvement

  • [BEAM-2445] - DSL SQL to use service locator pattern to automatically register UDFs
  • [BEAM-2769] - Java SDK support for submitting a Portable Pipeline
  • [BEAM-2884] - Dataflow runs portable pipelines
  • [BEAM-3820] - SolrIO: Allow changing batchSize for writes
  • [BEAM-4042] - Get rid of deprecated gradle API
  • [BEAM-4643] - Allow to check early panes of a window
  • [BEAM-5022] - Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to BeamModulePlugin
  • [BEAM-5062] - Add ability to configure S3ClientOptions
  • [BEAM-5105] - Move load job poll to finishBundle() method to better parallelize execution
  • [BEAM-5107] - Support ES 6.x for ElasticsearchIO
  • [BEAM-5202] - register UDF/UDAF with ServiceLoader
  • [BEAM-5219] - Expose OutboundMessage in PubSub client
  • [BEAM-5247] - Remove slf4j-simple binding from dependencies
  • [BEAM-5250] - Python Wordcount fails with Flink portable streaming
  • [BEAM-5342] - Migrate google-api-client libraries to 1.24.1
  • [BEAM-5372] - [Flink Runner] Make minPauseBetweenCheckpoints setting available in FlinkPipelineOptions
  • [BEAM-5376] - Row interface doesn't support nullability on all fields.
  • [BEAM-5382] - Combiner panics at runtime if MergeAccumulators has a context parameter
  • [BEAM-5403] - Update Flink Runner to 1.5.3
  • [BEAM-5405] - Remove deprecated AbstractStateBackend from FlinkPipelineOptions
  • [BEAM-5413] - Add method for defining composite transforms as lambda expressions
  • [BEAM-5418] - Add Flink version compatibility table to Runner page
  • [BEAM-5427] - Fix sample code (AverageFn) in Combine.java
  • [BEAM-5443] - Simplify Python pipeline options for portable runner
  • [BEAM-5455] - Don't info log for every bundle in the python sdk
  • [BEAM-5460] - Update Dataflow Python API client
  • [BEAM-5520] - Flink runner per operator SDK harness option
  • [BEAM-5531] - gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/5000_gaming_data.csv is being periodically deleted
  • [BEAM-5532] - Update Spark runner to Spark version 2.3.2

Test

  • [BEAM-5283] - Enable Python Portable Flink PostCommit Tests to Jenkins
  • [BEAM-5369] - Portable wordcount java broken because of create_view usage
  • [BEAM-5653] - Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
  • [BEAM-5754] - beam-sdks-java-io-xml:test target fails in 2.7.0 release

Task

  • [BEAM-3904] - Don't use UUID when worker_id is missing
  • [BEAM-4711] - LocalFileSystem.delete doesn't support globbing
  • [BEAM-5308] - JobBundleFactory BindException with FlinkRunner and remote cluster
  • [BEAM-5331] - Flink portable runner Python validate runner failures
  • [BEAM-5634] - Bring Dataflow Java Worker Code into Beam
  • [BEAM-5660] - Add dataflow java worker unit tests into precommit
