Release Notes - Beam - Version 2.0.0

Sub-task

  • [BEAM-772] - Implement Metrics support for Dataflow Runner
  • [BEAM-773] - Implement Metrics support for Flink runner
  • [BEAM-775] - Remove Aggregators from the Java SDK
  • [BEAM-827] - Remove PipelineOptions from construction time in WriteFiles
  • [BEAM-1617] - Add Gauge metric type to Java SDK
  • [BEAM-1651] - Add code style xml to the project repository
  • [BEAM-1684] - Add unit tests for iobase.py
  • [BEAM-1722] - Move PubsubIO out of the core SDK
  • [BEAM-1726] - Verify PAssert execution in TestFlinkRunner
  • [BEAM-1763] - TestPipeline should ensure that all assertions succeeded
  • [BEAM-1912] - Move HashingFn into io/common so it can be used by other tests
  • [BEAM-1958] - Standard IO Metrics in Java SDK
  • [BEAM-2002] - Verify PAssert execution in TestSparkRunner
  • [BEAM-2003] - Verify PAssert execution in TestDataflowRunner
  • [BEAM-2030] - Implement beam FileSystem's copy()
  • [BEAM-2031] - Hadoop FileSystem needs to receive Hadoop Configuration
  • [BEAM-2032] - Implement delete
  • [BEAM-2033] - Implement ResourceIds for HadoopFileSystem
  • [BEAM-2070] - Implement match for HadoopFileSystem
  • [BEAM-2329] - ABS Function
  • [BEAM-2330] - MOD Function
  • [BEAM-2331] - SQRT Function

Bug

  • [BEAM-145] - OutputTimeFn#assignOutputTime overrides WindowFn#getOutputTime in unfortunate ways
  • [BEAM-260] - WindowMappingFn: Know the getSideInputWindow upper bound to release side input resources
  • [BEAM-437] - Data-dependent BigQueryIO in batch
  • [BEAM-463] - BoundedHeapCoder should be a StandardCoder and not a CustomCoder
  • [BEAM-539] - Error when writing to the root of a GCS location
  • [BEAM-632] - Dataflow runner does not correctly flatten duplicate inputs
  • [BEAM-655] - Rename @RunnableOnService to something more descriptive
  • [BEAM-662] - SlidingWindows should support sub-second periods
  • [BEAM-828] - Remove PipelineOptions from construction time in BigQueryIO
  • [BEAM-1013] - Recheck all existing programming guide code snippets for correctness
  • [BEAM-1022] - WindowNamespace and WindowAndTriggerNamespace should not use Java object equality when comparing windows
  • [BEAM-1040] - Hadoop InputFormat - IO Transform for reads
  • [BEAM-1048] - Spark Runner streaming batch duration does not include duration of reading from source
  • [BEAM-1053] - ApexGroupByKeyOperator serialization issues
  • [BEAM-1068] - Service Account Credentials File Specified via Pipeline Option Ignored
  • [BEAM-1101] - Remove inconsistencies in Python PipelineOptions
  • [BEAM-1213] - WordCount example failure on Apex Runner
  • [BEAM-1247] - Session state should not be lost when discardingFiredPanes
  • [BEAM-1264] - Python ChannelFactory Raise Inconsistent Error for Local FS and GCS
  • [BEAM-1283] - DoFn finishBundle should be required to specify the window for output
  • [BEAM-1316] - DoFn#startBundle should not be able to output
  • [BEAM-1355] - HDFS IO should comply with PTransform style guide
  • [BEAM-1362] - Update the beam release process to include python sdk
  • [BEAM-1366] - Add metrics checks to Python SDK once metrics have been implemented
  • [BEAM-1381] - Implement DataflowMetrics.query method
  • [BEAM-1383] - Consistency in the Metrics examples
  • [BEAM-1402] - Make TextIO and AvroIO use best-practice types.
  • [BEAM-1414] - CountingInput should comply with PTransform style guide
  • [BEAM-1415] - PubsubIO should comply with PTransform style guide
  • [BEAM-1418] - MapElements and FlatMapElements should comply with PTransform style guide
  • [BEAM-1422] - ParDo should comply with PTransform style guide
  • [BEAM-1425] - Window should comply with PTransform style guide
  • [BEAM-1428] - KinesisIO should comply with PTransform style guide
  • [BEAM-1459] - Dataflow runner has deprecated metricsUpdates in favor of counterUpdates. Add setters.
  • [BEAM-1508] - PInput, POutput#expand should not be ordered
  • [BEAM-1546] - Specify exact version for Python in the SDK
  • [BEAM-1568] - Ineffective null check in IsmFormat#structuralValue
  • [BEAM-1569] - HDFSFileSource: Unable to read from filePattern with spaces in path
  • [BEAM-1571] - Flatten on a single input PCollection should have a test associated with it
  • [BEAM-1572] - Add per-stage matching of scope in metrics for the DirectRunner
  • [BEAM-1575] - Add ValidatesRunner test to PipelineTest.test_metrics_in_source
  • [BEAM-1578] - Runners should put PT overrides into a list rather than map
  • [BEAM-1579] - Runners should verify that PT overrides converged
  • [BEAM-1580] - Typo in bigquery_tornadoes example
  • [BEAM-1594] - Treat JOB_STATE_DRAINED as terminal in DataflowRunner
  • [BEAM-1629] - Metrics/aggregators accumulators should be instantiated before traversing pipeline
  • [BEAM-1635] - TypeError in AfterWatermark class's __repr__ method
  • [BEAM-1642] - Combine transformation evaluation fails on direct runner with Avro as a fallback coder
  • [BEAM-1644] - IO ITs: shared directory for kubernetes resources and PipelineOptions?
  • [BEAM-1645] - Display data not populated on Window.Assign
  • [BEAM-1649] - Fix unresolved references in Python SDK
  • [BEAM-1653] - Error when using PubsubIO with the DirectRunner
  • [BEAM-1656] - DirectRunner should not call finalize twice in UnboundedSourceExecutorFactory
  • [BEAM-1657] - DirectRunner should not call close twice in UnboundedSourceExecutorFactory
  • [BEAM-1671] - Support bypassing `validate` flag when using tfrecordio
  • [BEAM-1673] - PubSubIO can't write attributes
  • [BEAM-1676] - SdkCoreApiSurfaceTest Failed When Directory Contains Space
  • [BEAM-1686] - MQTT IO throws exception when client id is not specified
  • [BEAM-1690] - BigQueryTornadoesIT failing
  • [BEAM-1694] - Fix docstring inaccuracies in Python-SDK
  • [BEAM-1695] - Improve Python-SDK's programming guide
  • [BEAM-1709] - Implement Single-output ParDo as Multi-output ParDo
  • [BEAM-1711] - Document extra features on quick start guide
  • [BEAM-1713] - SparkRuntimeContext instances are leaking via StateSpecFunctions#mapSourceFunction
  • [BEAM-1718] - Returning Duration.millis(Long.MAX_VALUE) in DoFn.getAllowedTimestampSkew() causes Overflow/Underflow
  • [BEAM-1719] - Test modules are included in generated documentation
  • [BEAM-1721] - Reshuffle can shift elements in time
  • [BEAM-1723] - FlinkRunner should deduplicate when an UnboundedSource requires Deduping
  • [BEAM-1732] - Window.Assign does not properly populate DisplayData of the enclosing Window transform
  • [BEAM-1737] - Implement a Single-output ParDo as a Multi-output ParDo with a single output
  • [BEAM-1741] - Update runner pages for Python
  • [BEAM-1742] - UnboundedSource CheckpointMark should have more precise documentation
  • [BEAM-1751] - Singleton ByteKeyRange with BigtableIO and Dataflow runner
  • [BEAM-1762] - Python SDK Error Message not Python 3 compatible
  • [BEAM-1767] - Remove Aggregators from Dataflow runner
  • [BEAM-1768] - assert_that always passes for empty inputs
  • [BEAM-1769] - Travis - python only executes py27 tox environment
  • [BEAM-1770] - DoFn javadoc claims no runner supports state or timers
  • [BEAM-1772] - Support merging WindowFn other than IntervalWindow on Flink Runner
  • [BEAM-1776] - Timers should be delivered in the window they were set in
  • [BEAM-1777] - If PipelineEnforcement throws an exception after Pipeline.run() fails, it overwrites the original failure
  • [BEAM-1780] - BigtableReader.splitIntoFraction should more carefully guard input
  • [BEAM-1784] - DataflowPipelineJob.cancel() should be idempotent
  • [BEAM-1792] - Spark runner uses its own filtering logic to match metrics
  • [BEAM-1793] - Frequent python post commit errors
  • [BEAM-1795] - Upgrade google-cloud-bigquery to 0.23.0
  • [BEAM-1801] - default_job_name can generate names not accepted by Dataflow
  • [BEAM-1802] - Spark Runner does not shutdown correctly when executing multiple pipelines in sequence
  • [BEAM-1803] - Metrics filters have a mismatch in class-based namespace
  • [BEAM-1810] - Spark runner combineGlobally uses Kryo serialization
  • [BEAM-1815] - Avoid shuffling twice in GABW
  • [BEAM-1818] - Expose side-channel inputs in PTransform
  • [BEAM-1828] - GlobalWatermarkHolder uses unpersist instead of destroy
  • [BEAM-1832] - Potentially unclosed OutputStream in ApexYarnLauncher
  • [BEAM-1835] - NPE in DirectRunner PubsubReader.ackBatch
  • [BEAM-1837] - NPE in KafkaIO writer
  • [BEAM-1838] - GlobalWindow equals() and hashCode() don't work with other serialization frameworks
  • [BEAM-1842] - Stop matching composite PCollectionView PTransforms
  • [BEAM-1844] - test_memory_usage fails in post commit
  • [BEAM-1849] - Output from OnTimer method has windows re-assigned
  • [BEAM-1856] - HDFSFileSink class does not use the same configuration in master and slave
  • [BEAM-1862] - SplittableDoFnOperator should close the ScheduledExecutorService
  • [BEAM-1865] - Input Coder of GroupByKey should be a KV Coder in the Python SDK
  • [BEAM-1867] - Element counts missing on Cloud Dataflow when PCollection has anything other than hardcoded name pattern
  • [BEAM-1869] - getProducingTransformInternal should not be available on any PValue
  • [BEAM-1873] - Javadoc in BigQueryIO doesn't reflect recent changes
  • [BEAM-1886] - Remove TextIO override in Flink runner
  • [BEAM-1902] - Datastore IO never retries on errors
  • [BEAM-1903] - Splittable DoFn should report watermarks via ProcessContext
  • [BEAM-1904] - Remove DoFn.ProcessContinuation
  • [BEAM-1913] - TFRecordIO should comply with PTransform style guide
  • [BEAM-1914] - XML IO should comply with PTransform style guide
  • [BEAM-1922] - DataSource in JdbcIO is not closed
  • [BEAM-1926] - Need 3 Python snippets for composite transforms section in programming guide
  • [BEAM-1935] - DirectRunner Cancel should never throw a RejectedExecutionException
  • [BEAM-1937] - PipelineSurgery renumbers already-unique transforms
  • [BEAM-1947] - DisplayData raises exception when passed unicode string
  • [BEAM-1954] - "test" extra needs nose in the requirements list
  • [BEAM-1963] - Quick start on home page redirects to java quickstart
  • [BEAM-1964] - Upgrade pylint to 1.7.0
  • [BEAM-1966] - ApexRunner in cluster mode does not register standard FileSystems/IOChannelFactories
  • [BEAM-1969] - GCP extras should not require a fixed version of proto-google-cloud-datastore-v1
  • [BEAM-1970] - Cannot run UserScore on Flink runner due to AvroCoder classload issues
  • [BEAM-1972] - HIFIO jdk module fails enforcer when only java 7 is installed on machine
  • [BEAM-1977] - PubsubIO fails with NPE on ACK when running locally
  • [BEAM-1981] - Serialization error with TimerInternals in ApexGroupByKeyOperator
  • [BEAM-1988] - utils.path.join does not correctly handle GCS bucket roots
  • [BEAM-1989] - clean SyntaxWarning
  • [BEAM-1992] - Count.perElement javadoc refers to Count.PerElement, but Count.PerElement is private
  • [BEAM-1998] - Update json_values_test.py for ValueProvider
  • [BEAM-2017] - DataflowRunner: fix NullPointerException that can occur when no metrics are present
  • [BEAM-2019] - Count.globally() requires default values for non-GlobalWindows
  • [BEAM-2022] - ApexTimerInternals seems to treat processing time timers as event time timers
  • [BEAM-2023] - BigQueryIO.Write needs a way of dynamically specifying table schemas
  • [BEAM-2029] - NullPointerException when using multi output ParDo in Spark runner in streaming mode.
  • [BEAM-2040] - Occasional build failures caused by AutoValue
  • [BEAM-2052] - Windowed file sinks should support dynamic sharding
  • [BEAM-2071] - AttributeError in dataflow_metrics
  • [BEAM-2072] - MicrobatchSource.reader stops reading after reaching maxNumRecords for the first time
  • [BEAM-2073] - Change SourceDStream.rateControlledMaxRecords() to better reflect its intention
  • [BEAM-2074] - SourceDStream's rate control mechanism may not work
  • [BEAM-2077] - Remove AvroCoder#createDatum(Reader/Writer)
  • [BEAM-2084] - Distribution metrics should be queriable in the Dataflow Runner
  • [BEAM-2086] - TestDataflowRunner relies on metrics which are not present in streaming jobs
  • [BEAM-2091] - Typo in build instructions in Apex Runner's README.md
  • [BEAM-2092] - MicrobatchSource can be relieved of some of its methods since it's never used as an actual BoundedSource
  • [BEAM-2093] - Update Jackson version to 2.8.8 in archetype (or align with parent pom)
  • [BEAM-2094] - WordCount examples produce garbage for non-English input text
  • [BEAM-2095] - The hasNext method of the iterator returned by SourceRDD#compute is not idempotent
  • [BEAM-2096] - NullPointerException in DataflowMetrics
  • [BEAM-2098] - Walkthrough URL in example code Javadoc is 404 not found
  • [BEAM-2105] - Audit that user-facing stuff is in main jars, not the test suite jars
  • [BEAM-2106] - NotSerializableException thrown when serializing EvaluationContext
  • [BEAM-2113] - Apex Runner is not able to submit any job to YARN
  • [BEAM-2114] - KafkaIO broken with CoderException
  • [BEAM-2116] - PubsubJsonClient doesn't write user created attributeMap
  • [BEAM-2119] - FileSystems doesn't install the local filesystem on initialization by default
  • [BEAM-2120] - DataflowPipelineJob processes all log messages with each waitUntilFinish
  • [BEAM-2122] - Writing to partitioned BigQuery tables from Dataflow is causing errors
  • [BEAM-2130] - Ensure options id is never null
  • [BEAM-2136] - AvroCoderTest.testTwoClassLoaders fails on beam_PostCommit_Java_ValidatesRunner_Dataflow
  • [BEAM-2143] - (Mis)Running Dataflow Wordcount gives non-helpful errors
  • [BEAM-2152] - Authentication fails if there is an unauthenticated gcloud tool even if application default credentials are available
  • [BEAM-2154] - Writing to large numbers of BigQuery tables causes out-of-memory
  • [BEAM-2157] - HadoopFileSystemModuleTest Failed in Some JDK Versions on Jenkins
  • [BEAM-2162] - Add logging during and after long running BigQuery jobs
  • [BEAM-2170] - PubsubIO.readStrings should handle messages without metadata
  • [BEAM-2181] - Upgrade Bigtable dependency to 0.9.6.2
  • [BEAM-2183] - Maven-archetypes should depend on all Beam modules that their sources compile against
  • [BEAM-2184] - OutputTimeFn is not a Fn: in Python, rename to TimestampCombiner
  • [BEAM-2187] - SparkRuntimeContextTest fails to compile
  • [BEAM-2190] - User depending on IO-GCP still gets a dependency on protobuf-lite
  • [BEAM-2205] - AttributeError when running datastore wordcount
  • [BEAM-2210] - PubsubIO.readPubsubMessagesWithoutAttributes is awkward
  • [BEAM-2211] - DataflowRunner (Java) rejects all but GCS paths for FileBasedSource/Sink
  • [BEAM-2212] - ValueProvider-ification of core transforms makes logs and errors worse
  • [BEAM-2213] - Java DirectRunner takes 60s to shut down after wordcount runs
  • [BEAM-2222] - Clean up readme files
  • [BEAM-2223] - java8 examples are not running
  • [BEAM-2224] - maptask_executor_runner_test fails in windows
  • [BEAM-2229] - GcsFileSystem attempts to create invalid Metadata
  • [BEAM-2233] - Java 8 examples should separate runners into distinct profiles, like Java 7 examples
  • [BEAM-2236] - Move test utilities out of python core
  • [BEAM-2239] - Step context not always available when exceptions raised.
  • [BEAM-2240] - Step context not always available when exceptions raised.
  • [BEAM-2242] - Apache Beam Java modules do not correctly shade test artifacts
  • [BEAM-2243] - org.apache.beam.GcpCoreApiSurfaceTest.testApiSurface fails at release-2.0.0 head
  • [BEAM-2244] - Move runner-facing Metrics classes to runners core
  • [BEAM-2249] - AvroIO does not handle partial reads
  • [BEAM-2256] - mongodb sdk MongoDbIO.BoundedMongoDbSource.splitKeysToFilters incorrect
  • [BEAM-2259] - Reshuffle may set watermark holds past the end of time
  • [BEAM-2260] - When using WindowedWrites and default FilenamePolicy, TextIO should throw at construction time
  • [BEAM-2275] - SerializableCoder fails to serialize when used with a generic type token
  • [BEAM-2277] - IllegalArgumentException when using Hadoop file system for WordCount example.
  • [BEAM-2279] - Hadoop file system support should be included in examples/archetype profiles of Spark runner.
  • [BEAM-2305] - Distinct transform produces unexpected output when triggered
  • [BEAM-2326] - Verbose INFO logging with stateful DoFns and Dataflow
  • [BEAM-2429] - Conflicting filesystems with use of HadoopFileSystem

New Feature

  • [BEAM-59] - Switch from IOChannelFactory to FileSystems
  • [BEAM-73] - IO design pattern: Decouple Parsers and Coders
  • [BEAM-135] - Utilities for "batching" elements in a DoFn
  • [BEAM-147] - Introduce an easy API for pipeline metrics
  • [BEAM-404] - PubsubIO should have a mode that supports maintaining message attributes.
  • [BEAM-596] - Support cancel() and waitUntilFinish() in DirectRunner
  • [BEAM-638] - Add sink transform to write bounded data per window, pane, [and key] even when PCollection is unbounded
  • [BEAM-846] - Decouple side input window mapping from WindowFn
  • [BEAM-885] - Move PipelineOptions from Pipeline.create() to Pipeline.run()
  • [BEAM-1047] - DataflowRunner: support regionalization.
  • [BEAM-1076] - DatastoreIO template Options
  • [BEAM-1195] - Give triggers a cross-language serialization schema
  • [BEAM-1198] - ViewFn: explicitly decouple runner materialization of side inputs from SDK-specific mapping
  • [BEAM-1327] - Replace OutputTimeFn with enum
  • [BEAM-1328] - Serialize/deserialize WindowingStrategy in a language-agnostic manner
  • [BEAM-1397] - Introduce IO metrics
  • [BEAM-1398] - KafkaIO metrics
  • [BEAM-1441] - Add FileSystem support to Python SDK
  • [BEAM-1855] - Support Splittable DoFn in Flink Streaming runner
  • [BEAM-1960] - Hadoop InputFormat - Add Kubernetes large and small cluster Scripts for Cassandra and Elasticsearch tests
  • [BEAM-2005] - Add a Hadoop FileSystem implementation of Beam's FileSystem
  • [BEAM-2054] - Upgrade dataflow.version to v1b3-rev196-1.22.0
  • [BEAM-2147] - Re-enable UsesTimersInParDo tests for DataflowRunner

Improvement

  • [BEAM-447] - Stop referring to types with Bound/Unbound
  • [BEAM-649] - Smarter caching of RDDs
  • [BEAM-720] - Run WindowedWordCount Integration Test in Flink
  • [BEAM-806] - Maven Release Plugin Does Not Set Archetype Versions
  • [BEAM-818] - ValueProvider for tempLocation, runner, etc, that is unavailable to transforms during construction
  • [BEAM-831] - ParDo Chaining
  • [BEAM-848] - Shuffle input read-values to get maximum parallelism.
  • [BEAM-911] - Mark API of multiple IOs as @Experimental
  • [BEAM-1071] - Support pre-existing tables with streaming BigQueryIO
  • [BEAM-1074] - Set default-partitioner in SourceRDD.Unbounded.
  • [BEAM-1148] - Port PAssert away from Aggregators
  • [BEAM-1179] - Update assertions of source_test_utils from camelcase to underscore-separated
  • [BEAM-1182] - Direct runner should enforce encodability of unbounded source checkpoints
  • [BEAM-1199] - Condense recordAsOutput, finishSpecifyingOutput from POutput
  • [BEAM-1242] - convert older IO/Sources to use standard ReadTransform style
  • [BEAM-1269] - BigtableIO should make more efficient use of connections
  • [BEAM-1272] - Align the naming of "generateInitialSplits" and "splitIntoBundles" to better reflect their intention
  • [BEAM-1294] - Long running UnboundedSource Readers
  • [BEAM-1336] - A StateSpec that doesn't care about the key shouldn't be forced to declare it as type Object
  • [BEAM-1337] - Use our coder infrastructure for coders for state
  • [BEAM-1340] - Remove or make private public bits of the SDK that shouldn't be public
  • [BEAM-1345] - Mark @Experimental and @Internal where needed in user-facing bits of the codebase
  • [BEAM-1401] - Sinks in Beam should support windowed unbounded PCollections
  • [BEAM-1447] - Autodetect streaming/not streaming in DataflowRunner
  • [BEAM-1491] - HadoopFileSystemOptions should be able to read the HADOOP_CONF_DIR(YARN_CONF_DIR) environment variable
  • [BEAM-1514] - change default timestamp in KafkaIO
  • [BEAM-1520] - Implement TFRecordIO (Reading/writing Tensorflow Standard format)
  • [BEAM-1530] - BigQueryIO should support value-dependent windows
  • [BEAM-1539] - Support unknown length iterables for IterableCoder in Python SDK
  • [BEAM-1562] - Use a "signal" to stop streaming tests as they finish.
  • [BEAM-1573] - KafkaIO does not allow using Kafka serializers and deserializers
  • [BEAM-1633] - Move .tox/ directory under target/ in Python SDK
  • [BEAM-1660] - withCoder() error in JdbcIO JavaDoc example
  • [BEAM-1661] - shade guava in beam-sdks-java-io-jdbc
  • [BEAM-1672] - Accumulable MetricsContainers.
  • [BEAM-1689] - Apply changes for Flink's StatefulDoFnRunner to the primary StatefulDoFnRunner
  • [BEAM-1693] - Detect supported Python & pip executables in Python-SDK
  • [BEAM-1704] - Create.TimestampedValues should take a TypeDescriptor as an alternative to explicitly specifying the Coder
  • [BEAM-1708] - Better error messages when GCP features are not installed
  • [BEAM-1727] - Add setForNowAlign(period, offset) to Timer
  • [BEAM-1740] - Update bigtable version to 0.9.5.1
  • [BEAM-1743] - View.AsSingleton should be implemented in terms of a Global Combine, not the reverse
  • [BEAM-1749] - Upgrade pep8 to pycodestyle
  • [BEAM-1786] - AutoService registration of coders, like we do with PipelineRunners
  • [BEAM-1794] - Bigtable: improve user agent
  • [BEAM-1799] - IO ITs: simplify data loading design pattern
  • [BEAM-1807] - IO ITs: shared language neutral directory for kubernetes resources
  • [BEAM-1812] - Allow configuring checkpoints in Flink Runner PipelineOptions
  • [BEAM-1827] - Fix use of deprecated Spark APIs in the runner.
  • [BEAM-1829] - MQTT message compression not working on Raspberry Pi
  • [BEAM-1830] - add 'withTopic()' api to KafkaIO Reader
  • [BEAM-1839] - Optimize StatelessJavaSerializer
  • [BEAM-1851] - Sample.fixedSizedGlobally documentation should include single worker memory constraint
  • [BEAM-1858] - improve error message when Create.of() is called with an empty iterator
  • [BEAM-1863] - Allow users to override the base container image but still choose image type
  • [BEAM-1864] - Shorten combining state names: "CombiningValue" and "AccumulatorCombiningState" to Combining (as appropriate)
  • [BEAM-1870] - ByteKey / ByteKeyRangeTracker should not use ByteString on public API surface
  • [BEAM-1871] - Thin Java SDK Core
  • [BEAM-1875] - Remove Spark runner custom Hadoop and Avro IOs.
  • [BEAM-1876] - GroupIntoBatches may be able to use Combine.BinaryCombineLongFn
  • [BEAM-1877] - Use Iterables.isEmpty in GroupIntoBatches
  • [BEAM-1882] - Jdbc k8s scripts: switch pod -> replicaController
  • [BEAM-1895] - Create transform in python sdk should be a custom source
  • [BEAM-1897] - Remove Sink
  • [BEAM-1907] - Delete PubsubBoundedReader
  • [BEAM-1908] - Allow setting CREATE_NEVER when using a tablespec in BigQueryIO
  • [BEAM-1921] - expose connectionProperties in JdbcIO
  • [BEAM-1923] - Improve python log messages for temporary BigQuery tables
  • [BEAM-1949] - Rename DoFn.Context#sideOutput to #output
  • [BEAM-1990] - Window.Assign should not be public since it is not meant to be used publicly
  • [BEAM-1991] - Update references to SumDoubleFn => Sum.ofDoubles
  • [BEAM-1993] - Remove special unbounded Flink source/sink
  • [BEAM-1994] - Remove Flink examples package
  • [BEAM-2013] - Upgrade to Jackson 2.8.8
  • [BEAM-2014] - Upgrade to Google Auth 0.6.1
  • [BEAM-2020] - Move CloudObject to Dataflow runner
  • [BEAM-2021] - Fix Java's Coder class hierarchy
  • [BEAM-2044] - Downgrade HBaseIO to use the stable HBase client version (1.2.x)
  • [BEAM-2047] - PubsubStreamingWrite should use the input coder by default
  • [BEAM-2049] - Remove KeyedCombineFn
  • [BEAM-2051] - Reduce scope of the PCollectionView interface
  • [BEAM-2060] - XmlIO uses hardcoded Charset
  • [BEAM-2062] - EventHandler jaxb unmarshaller should be optional
  • [BEAM-2067] - Add support for generic CoderProvider -> CoderFactory mapping with CoderRegistrar
  • [BEAM-2068] - Upgrade Google-Apitools to latest version
  • [BEAM-2075] - Update flink runner to use flink version 1.2.1
  • [BEAM-2076] - DirectRunner: minimal transitive API surface
  • [BEAM-2099] - Create a WordCount example that works with HDFS
  • [BEAM-2135] - Rename hdfs module to hadoop-file-system, rename gcp-core to google-cloud-platform-core
  • [BEAM-2144] - Do not publish javadoc for Java SDK's util directory
  • [BEAM-2165] - Support custom user Jackson modules for PipelineOptions
  • [BEAM-2166] - Remove Coder.Context from the public API
  • [BEAM-2174] - Allow coder factories to create Coders for a wider range of types
  • [BEAM-2206] - Move pipeline options into separate package from beam/utils
  • [BEAM-2218] - PubsubIO.readPubsubMessages function names are too long
  • [BEAM-2221] - Make KafkaIO coder specification less awkward
  • [BEAM-2241] - Correctly mark top level classes and functions as private
  • [BEAM-2245] - Remove user-facing Timer.cancel() until further notice
  • [BEAM-2250] - Remove FnHarness code from PyDocs
  • [BEAM-3770] - The problem of kafkaIO sdk for data latency

Test

  • [BEAM-1184] - Add integration tests for ElasticsearchIO
  • [BEAM-1622] - Java: Rename RunnableOnService to ValidatesRunner
  • [BEAM-1752] - Tag Spark runner tests that recover from checkpoint.
  • [BEAM-2057] - Test metrics are reported to Spark Metrics sink.
  • [BEAM-2368] - one throw "Unable to find registrar for hdfs" with same code
  • [BEAM-3383] - Create validates runner metrics tests

Wish

  • [BEAM-378] - Integrate Python SDK in the Maven build
  • [BEAM-797] - A PipelineVisitor that creates a Spark-native pipeline.
  • [BEAM-1648] - Replace gsutil calls with Cloud Storage API

Task

  • [BEAM-825] - Fill in the documentation/runners/apex portion of the website
  • [BEAM-1027] - Hosting data stores to enable IO Transform testing
  • [BEAM-1353] - Beam should comply with PTransform style guide
  • [BEAM-1764] - Remove Aggregators from Flink Runner
  • [BEAM-1765] - Remove Aggregators from Spark runner
  • [BEAM-1766] - Remove Aggregators from Apex runner
  • [BEAM-1797] - add CoGroupByKey to chapter 'Using GroupByKey'
  • [BEAM-1887] - Switch ParDo execution to use new DoFn in Apex runner
  • [BEAM-1915] - Remove OldDoFn dependency in ApexGroupByKeyOperator
  • [BEAM-2016] - Delete HDFSFileSource/Sink
  • [BEAM-2124] - Deprecate <pipeline>.options usage
  • [BEAM-2139] - Disable SplittableDoFn ValidatesRunner tests for Streaming Flink Runner
  • [BEAM-2180] - Upgrade Apex dependency to 3.6.0
  • [BEAM-2235] - Restore wordcount example to its previous state
