Release Notes - Beam - Version 2.18.0

Sub-task

  • [BEAM-3658] - Port SpannerIOReadTest off DoFnTester
  • [BEAM-5733] - Pushdown filter to table scan
  • [BEAM-5878] - Support DoFns with Keyword-only arguments in Python 3.
  • [BEAM-6756] - Support lazy iterables in schemas
  • [BEAM-7078] - Beam Dependency Update Request: com.amazonaws:amazon-kinesis-client
  • [BEAM-7636] - Migrate SqsIO to AWS SDK for Java 2
  • [BEAM-7948] - Add time-based cache threshold support in the Java data service
  • [BEAM-7952] - Make the input queue of the input buffer in Python SDK Harness size limited.
  • [BEAM-8252] - (Python SDK) Add worker_region and worker_zone options
  • [BEAM-8254] - (Java SDK) Add workerRegion and workerZone options
  • [BEAM-8442] - Unify bundle register in Python SDK harness
  • [BEAM-8557] - Clean up useless null check.

Bug

  • [BEAM-3493] - Prevent users from "implementing" PipelineOptions
  • [BEAM-4776] - Java PortableRunner should support metrics
  • [BEAM-4777] - Python PortableRunner should support metrics
  • [BEAM-7917] - Python datastore v1new fails on retry
  • [BEAM-7981] - ParDo function wrapper doesn't support Iterable output types
  • [BEAM-8146] - SchemaCoder/RowCoder have no equals() function
  • [BEAM-8204] - ParDoTest.testSideInputAnnotationWithMultipleSideInputs & AvroSchemaTest failed on ApexRunner
  • [BEAM-8205] - AvroSchemaTest failed on FlinkRunner
  • [BEAM-8347] - UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
  • [BEAM-8352] - Reading records in background may lead to OOM errors
  • [BEAM-8450] - ParDoLifecycleTest does not allow for empty bundles
  • [BEAM-8451] - Interactive Beam example failing from stack overflow
  • [BEAM-8460] - Flink/Spark runner ignores UsesStrictTimerOrdering category tests
  • [BEAM-8480] - Explicitly set restriction coder for bounded reader wrapper SDF.
  • [BEAM-8515] - Ensure that ValueProvider types have equals/hashCode implemented for comparison reasons
  • [BEAM-8517] - Jdbc Schema timestamp conversion has timezone inconsistency issue in test
  • [BEAM-8518] - Pipeline options translation fails silently with incompatible jackson-core library
  • [BEAM-8521] - beam_PostCommit_XVR_Flink failing
  • [BEAM-8530] - Dataflow portable runner fails timer ordering tests
  • [BEAM-8565] - Update .test-infra/jenkins/README with missing entries and correct wrong entries
  • [BEAM-8574] - [SQL] MongoDb PostCommit_SQL fails
  • [BEAM-8579] - Strip UTF-8 BOM bytes (if present) in TextSource.
  • [BEAM-8592] - DataCatalogTableProvider should not squash table components together into a string
  • [BEAM-8621] - [Java] beam_Dependency_Check reads smaller number of dependencies than expected
  • [BEAM-8657] - Not doing Combiner lifting for data-driven triggers
  • [BEAM-8663] - BundleBasedRunner Stacked Bundles don't respect PaneInfo
  • [BEAM-8667] - Data channel should avoid unlimited buffering in Python SDK
  • [BEAM-8733] - The "KeyError: u'-47'" error from line 305 of sdk_worker.py
  • [BEAM-8740] - TestPubsub ignores timeout
  • [BEAM-8747] - Remove Unused non-vendored Guava compile dependencies
  • [BEAM-8802] - Timestamp combiner not respected across bundles in streaming mode.
  • [BEAM-8803] - Default behaviour for Python BQ Streaming inserts sink should be to retry always
  • [BEAM-8814] - --no_auth flag is boolean type and is misleading
  • [BEAM-8822] - Upgrade Hadoop dependencies to version 2.8
  • [BEAM-8825] - OOM when writing large numbers of 'narrow' rows
  • [BEAM-8835] - Artifact retrieval fails with FlinkUberJarJobServer
  • [BEAM-8836] - ExternalTransform is not providing a unique name
  • [BEAM-8882] - Allow Dataflow to automatically choose portability or not.
  • [BEAM-8884] - Python MongoDBIO TypeError when splitting
  • [BEAM-8974] - apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
  • [BEAM-9013] - Multi-output TestStream breaks the DataflowRunner
  • [BEAM-9041] - SchemaCoder equals should not rely on from/toRowFunction equality
  • [BEAM-9042] - AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder

New Feature

  • [BEAM-7760] - Interactive Beam Caching PCollections bound to user defined vars in notebook
  • [BEAM-8343] - Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
  • [BEAM-8365] - Add project push-down capability to IO APIs
  • [BEAM-8379] - Cache Eviction for Interactive Beam
  • [BEAM-8383] - Add metrics to Python state cache
  • [BEAM-8402] - Create a class hierarchy to represent environments
  • [BEAM-8427] - [SQL] Add support for MongoDB source
  • [BEAM-8468] - Add predicate/filter push-down capability to IO APIs
  • [BEAM-8470] - Create a new Spark runner based on Spark Structured streaming framework
  • [BEAM-8484] - Add ToJson transform
  • [BEAM-8523] - Add useful timestamp to job servicer GetJobs
  • [BEAM-8736] - Support window allowed lateness in python sdk

Improvement

  • [BEAM-876] - Support schemaUpdateOption in BigQueryIO
  • [BEAM-6303] - Add .parquet extension to files in ParquetIO
  • [BEAM-7886] - Make row coder a standard coder and implement in python
  • [BEAM-8016] - Render Beam Pipeline as DOT with Interactive Beam
  • [BEAM-8151] - Allow the Python SDK to use many threads
  • [BEAM-8337] - Add Flink job server container images to release process
  • [BEAM-8428] - [SQL] BigQuery should support project push-down in DIRECT_READ mode
  • [BEAM-8434] - Allow trigger transcript tests to be run as ValidatesRunner tests.
  • [BEAM-8456] - Add pipeline option to control truncate of BigQuery data processed by Beam SQL
  • [BEAM-8457] - Instrument Dataflow jobs that are launched from Notebooks
  • [BEAM-8503] - Improve TestBigQuery and TestPubsub
  • [BEAM-8508] - [SQL] Support predicate push-down without project push-down
  • [BEAM-8509] - TestPortableRunner should use JobServerDriver interface
  • [BEAM-8513] - RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
  • [BEAM-8514] - ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule
  • [BEAM-8516] - sdist build fails when artifacts from different versions are present
  • [BEAM-8524] - Stop using Pub/Sub in FnApi streaming Dataflow Impulse
  • [BEAM-8540] - Fix CSVSink example in FileIO docs
  • [BEAM-8554] - Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
  • [BEAM-8570] - Use SDK version in default Java container tag
  • [BEAM-8573] - @SplitRestriction's documented signature is incorrect
  • [BEAM-8583] - [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
  • [BEAM-8585] - Include path in error message in path_to_beam_jar
  • [BEAM-8594] - Remove unnecessary error check on control service access in Dataflow Runner
  • [BEAM-8597] - Allow TestStream trigger tests to run on other runners.
  • [BEAM-8617] - Tear down the DoFns upon the control service termination in Python SDK harness
  • [BEAM-8619] - Tear down the DoFns upon the control service termination in Java SDK harness
  • [BEAM-8658] - Optionally set artifact staging port in FlinkUberJarJobServer
  • [BEAM-8659] - RowJsonTest should test serialization independently
  • [BEAM-8664] - [SQL] MongoDb should use project push-down
  • [BEAM-8666] - Remove dependency between DataflowRunner and PortableRunner introduced by PR#9811
  • [BEAM-8743] - Add support for flat schemas in pubsub
  • [BEAM-8781] - FlinkUberJarJobServer should respect --flink_job_server_jar
  • [BEAM-8794] - Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
  • [BEAM-8796] - Optionally configure static job port for JavaJarJobServer
  • [BEAM-8805] - Remove obsolete worker_threads experiment in tests
  • [BEAM-8824] - Add support for allowed lateness in python sdk
  • [BEAM-8883] - Downgrade "Failed to remove job staging directory" log level
  • [BEAM-8902] - Parameterize input type of Java external transform
  • [BEAM-8903] - Handle --jar_packages experimental flag in PortableRunner for staging external dependencies used in expanded transforms
  • [BEAM-8904] - Properly update output PCollections from expanded transforms
  • [BEAM-8905] - Match Java PCollectionTuple translation naming convention (index only) in expansion service
  • [BEAM-8976] - No default logging story for Pipeline construction time in Python

Test

  • [BEAM-8028] - Simplify running of Beam Python on Spark
  • [BEAM-8586] - [SQL] Add a server for MongoDb Integration Test

Task

  • [BEAM-3288] - Guard against unsafe triggers at construction time
  • [BEAM-4226] - Migrate hadoop dependency to 2.7.4 or upper to fix a CVE
  • [BEAM-8398] - Upgrade Dataflow Java Client API
  • [BEAM-8670] - Manage environment parallelism in DefaultJobBundleFactory
