转载

分布式数据处理引擎Apache Flink v1.1.0发布

分布式数据处理引擎Apache Flink v1.1.0发布

Flink 是一个针对流数据和批数据的分布式处理引擎。它主要是由 Java 代码实现。目前主要还是依靠开源社区的贡献而发展。对 Flink 而言,其所要处理的主要场景就是流数据,批数据只是流数据的一个极限特例而已。再换句话说,Flink 会把所有任务当成流来处理,这也是其最大的特点。Flink 可以支持本地的快速迭代,以及一些环形的迭代任务。并且 Flink 可以定制化内存管理。在这点,如果要对比 Flink 和 Spark 的话,Flink 并没有将内存完全交给应用层。这也是为什么 Spark 相对于 Flink,更容易出现 OOM 的原因(out of memory)。就框架本身与应用场景来说,Flink 更相似与 Storm。如果之前了解过 Storm 或者 Flume 的读者,可能会更容易理解 Flink 的架构和很多概念。

Flink 的架构图

分布式数据处理引擎Apache Flink v1.1.0发布

更新日志

  • FLINK-1502: Expose metrics to graphite, ganglia and JMX.
  • FLINK-1550: Show JVM Metrics for JobManager
  • FLINK-3129: Add tooling to ensure interface stability
  • FLINK-3141: Design of NULL values handling in operation
  • FLINK-3223: Translate Table API query into logical relational plans for Apache Calcite
  • FLINK-3225: Optimize logical Table API plans in Calcite
  • FLINK-3226: Translate optimized logical Table API plans into physical plans representing DataSet programs
  • FLINK-3229: Kinesis streaming consumer with integration of Flink's checkpointing mechanics
  • FLINK-3230: Kinesis streaming producer
  • FLINK-3231: Handle Kinesis-side resharding in Kinesis streaming consumer
  • FLINK-3327: Attach the ExecutionConfig to the JobGraph and make it accessible to the AbstractInvocable.
  • FLINK-3489: Refactor Table API before merging into master
  • FLINK-3544: ResourceManager runtime components
  • FLINK-3545: ResourceManager: YARN integration
  • FLINK-3547: Add support for streaming projection, selection, and union
  • FLINK-3550: Rework stream join example
  • FLINK-3552: Change socket WordCount to be properly windowed
  • FLINK-3573: Implement more String functions for Table API
  • FLINK-3574: Implement math functions for Table API
  • FLINK-3586: Risk of data overflow while use sum/count to calculate AVG value
  • FLINK-3612: Fix/adjust Table API examples
  • FLINK-3714: Add Support for "Allowed Lateness"
  • FLINK-3727: Add support for embedded streaming SQL (projection, filter, union)
  • FLINK-3748: Add CASE function to Table API
  • FLINK-3756: Introduce state hierarchy in CheckpointCoordinator
  • FLINK-3837: Create FLOOR/CEIL function
  • FLINK-3923: Unify configuration conventions of the Kinesis producer to the same as the consumer
  • FLINK-3949: Collect Metrics in Runtime Operators
  • FLINK-3951: Add Histogram Metric Type
  • FLINK-4018: Configurable idle time between getRecords requests to Kinesis shards
  • FLINK-4019: Expose approximateArrivalTimestamp through the KinesisDeserializationSchema interface
  • FLINK-4020: Remove shard list querying from Kinesis consumer constructor
  • FLINK-4033: Missing Scala example snippets for the Kinesis Connector documentation
  • FLINK-4057: Expose JobManager Metrics
  • FLINK-4062: Update Windowing Documentation
  • FLINK-4080: Kinesis consumer not exactly-once if stopped in the middle of processing aggregated records
  • FLINK-4085: Set Kinesis Consumer Agent to Flink
  • FLINK-4191: Expose shard information in KinesisDeserializationSchema
  • FLINK-4239: Set Default Allowed Lateness to Zero and Make Triggers Non-Purging

Bug修复

  • FLINK-1159: Case style anonymous functions not supported by Scala API
  • FLINK-1964: Rework TwitterSource to use a Properties object instead of a file path
  • FLINK-2392: Instable test in flink-yarn-tests
  • FLINK-2544: Some test cases using PowerMock fail with Java 8u20
  • FLINK-2832: Failing test: RandomSamplerTest.testReservoirSamplerWithReplacement
  • FLINK-2915: JobManagerProcessFailureBatchRecoveryITCase
  • FLINK-3086: ExpressionParser does not support concatenation of suffix operations
  • FLINK-3105: Submission in per job YARN cluster mode reuses properties file of long-lived session
  • FLINK-3126: Remove accumulator type from "value" in web frontend
  • FLINK-3179: Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
  • FLINK-3333: Documentation about object reuse should be improved
  • FLINK-3380: Unstable Test: JobSubmissionFailsITCase
  • FLINK-3396: Job submission Savepoint restore logic flawed
  • FLINK-3411: Failed recovery can lead to removal of HA state
  • FLINK-3444: env.fromElements relies on the first input element for determining the DataSet/DataStream type
  • FLINK-3466: Job might get stuck in restoreState() from HDFS due to interrupt
  • FLINK-3471: JDBCInputFormat cannot handle null fields of certain types
  • FLINK-3472: JDBCInputFormat.nextRecord(..) has misleading message on NPE
  • FLINK-3488: Kafka08ITCase.testBigRecordJob fails on Travis
  • FLINK-3491: HDFSCopyUtilitiesTest fails on Windows
  • FLINK-3495: RocksDB Tests can't run on Windows
  • FLINK-3505: JoinUnionTransposeRule fails to push Join past Union.
  • FLINK-3519: Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant
  • FLINK-3530: Kafka09ITCase.testBigRecordJob fails on Travis
  • FLINK-3533: Update the Gelly docs wrt examples and cluster execution
  • FLINK-3534: Cancelling a running job can lead to restart instead of stopping
  • FLINK-3540: Hadoop 2.6.3 build contains /com/google/common (guava) classes in flink-dist.jar
  • FLINK-3556: Unneeded check in HA blob store configuration
  • FLINK-3561: ExecutionConfig's timestampsEnabled is unused
  • FLINK-3562: Update docs in the course of EventTimeSourceFunction removal
  • FLINK-3563: .returns() doesn't compile when using .map() with a custom MapFunction
  • FLINK-3565: FlinkKafkaConsumer does not work with Scala 2.11
  • FLINK-3566: Input type validation often fails on custom TypeInfo implementations
  • FLINK-3567: Rework selection when grouping in Table API
  • FLINK-3569: Test cases fail due to Maven Shade plugin
  • FLINK-3577: Display anchor links when hovering over headers.
  • FLINK-3578: Scala DataStream API does not support Rich Window Functions
  • FLINK-3579: Improve String concatenation
  • FLINK-3583: Configuration not visible in gui when job is running
  • FLINK-3585: Deploy scripts don't support spaces in paths
  • FLINK-3593: DistinctITCase is failing
  • FLINK-3595: Kafka09 consumer thread does not interrupt when stuck in record emission
  • FLINK-3601: JobManagerTest times out on StopSignal test
  • FLINK-3602: Recursive Types are not supported / crash TypeExtractor
  • FLINK-3608: ImmutableSettings error in ElasticsearchSink
  • FLINK-3611: Wrong link in CONTRIBUTING.md
  • FLINK-3619: SavepointCoordinator test failure
  • FLINK-3621: Misleading documentation of memory configuration parameters
  • FLINK-3622: Improve error messages for invalid joins
  • FLINK-3630: Little mistake in documentation
  • FLINK-3631: CodeGenerator does not check type compatibility for equality expressions
  • FLINK-3633: Job submission silently fails when using user code types
  • FLINK-3635: Potential null deference in TwitterExample#SelectEnglishAndTokenizeFlatMap#flatMap
  • FLINK-3636: NoClassDefFoundError while running WindowJoin example
  • FLINK-3638: Invalid default ports in documentation
  • FLINK-3644: WebRuntimMonitor set java.io.tmpdir does not work for change upload dir.
  • FLINK-3645: HDFSCopyUtilitiesTest fails in a Hadoop cluster
  • FLINK-3651: Fix faulty RollingSink Restore
  • FLINK-3653: recovery.zookeeper.storageDir is not documented on the configuration page
  • FLINK-3663: FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
  • FLINK-3669: WindowOperator registers a lot of timers at StreamTask
  • FLINK-3675: YARN ship folder incosistent behavior
  • FLINK-3676: WebClient hasn't been removed from the docs
  • FLINK-3681: CEP library does not support Java 8 lambdas as select function
  • FLINK-3682: CEP operator does not set the processing timestamp correctly
  • FLINK-3684: CEP operator does not forward watermarks properly
  • FLINK-3689: JobManager blocks cluster shutdown when not connected to ResourceManager
  • FLINK-3693: JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is unstable
  • FLINK-3696: Some Union tests fail for TableConfigMode.EFFICIENT
  • FLINK-3697: keyBy() with nested POJO computes invalid field position indexes
  • FLINK-3701: Cant call execute after first execution
  • FLINK-3712: YARN client dynamic properties are not passed correctly to the leader election service on the client
  • FLINK-3713: DisposeSavepoint message uses system classloader to discard state
  • FLINK-3716: Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass
  • FLINK-3718: Add Option For Completely Async Backup in RocksDB State Backend
  • FLINK-3725: Exception in thread "main" scala.MatchError: … (of class scala.Tuple4)
  • FLINK-3728: Throw meaningful exceptions for unsupported SQL features
  • FLINK-3729: Several SQL tests fail on Windows OS
  • FLINK-3730: Fix RocksDB Local Directory Initialization
  • FLINK-3731: Embedded SQL outer joins should fail during translation
  • FLINK-3732: Potential null deference in ExecutionConfig#equals()
  • FLINK-3735: Embedded SQL union should fail during translation
  • FLINK-3737: WikipediaEditsSourceTest.testWikipediaEditsSource() fails locally
  • FLINK-3745: TimestampITCase testWatermarkPropagationNoFinalWatermarkOnStop failing intermittently
  • FLINK-3757: addAccumulator does not throw Exception on duplicate accumulator name
  • FLINK-3759: Table API should throw exception is null value is encountered in non-null mode.
  • FLINK-3762: Kryo StackOverflowError due to disabled Kryo Reference tracking
  • FLINK-3773: Scanners are left unclosed in SqlExplainTest
  • FLINK-3774: Flink configuration is not correctly forwarded to PlanExecutor in ScalaShellRemoteEnvironment
  • FLINK-3781: BlobClient may be left unclosed in BlobCache#deleteGlobal()
  • FLINK-3790: Rolling File sink does not pick up hadoop configuration
  • FLINK-3792: RowTypeInfo equality should not depend on field names
  • FLINK-3793: Re-organize the Table API and SQL docs
  • FLINK-3796: FileSourceFunction doesn't respect InputFormat's life cycle methods
  • FLINK-3803: Checkpoint Stats Tracker Reads from Wrong Configuration
  • FLINK-3824: ResourceManager may repeatedly connect to outdated JobManager in HA mode
  • FLINK-3826: Broken test: StreamCheckpointingITCase
  • FLINK-3835: JSON execution plan not helpful to debug plans with KeySelectors
  • FLINK-3838: CLI parameter parser is munging application params
  • FLINK-3840: RocksDB local parent dir is polluted with empty folders with random names
  • FLINK-3842: Fix handling null record/row in generated code
  • FLINK-3845: Gelly allows duplicate vertices in Graph.addVertices
  • FLINK-3846: Graph.removeEdges also removes duplicate edges
  • FLINK-3860: WikipediaEditsSourceTest.testWikipediaEditsSource times out
  • FLINK-3863: Yarn Cluster shutdown may fail if leader changed recently
  • FLINK-3864: Yarn tests don't check for prohibited strings in log output
  • FLINK-3877: Create TranslateFunction interface for Graph translators
  • FLINK-3878: File cache doesn't support multiple duplicate temp directories
  • FLINK-3882: Errors in sample Java code for the Elasticsearch 2.x sink
  • FLINK-3890: Deprecate streaming mode flag from Yarn CLI
  • FLINK-3892: ConnectionUtils may die with NullPointerException
  • FLINK-3893: LeaderChangeStateCleanupTest times out
  • FLINK-3908: FieldParsers error state is not reset correctly to NONE
  • FLINK-3909: Maven Failsafe plugin may report SUCCESS on failed tests
  • FLINK-3914: BlobServer.createTemporaryFilename() has concurrency safety problem
  • FLINK-3922: Infinite recursion on TypeExtractor
  • FLINK-3926: Incorrect implementation of getFieldIndex in TupleTypeInfo
  • FLINK-3927: TaskManager registration may fail if Yarn versions don't match
  • FLINK-3928: Potential overflow due to 32-bit int arithmetic
  • FLINK-3933: Add an auto-type-extracting DeserializationSchema
  • FLINK-3934: Prevent translation of non-equi joins in DataSetJoinRule
  • FLINK-3935: Invalid check of key and ordering fields in PartitionNode
  • FLINK-3938: Yarn tests don't run on the current master
  • FLINK-3939: Prevent distinct aggregates and grouping sets from being translated
  • FLINK-3944: Add optimization rules to reorder Cartesian products and joins
  • FLINK-3948: EventTimeWindowCheckpointingITCase Fails with Core Dump
  • FLINK-3953: Surefire plugin executes unit tests twice
  • FLINK-3956: Make FileInputFormats independent from Configuration
  • FLINK-3960: Disable, fix and re-enable EventTimeWindowCheckpointingITCase
  • FLINK-3962: JMXReporter doesn't properly register/deregister metrics
  • FLINK-3963: AbstractReporter uses shaded dependency
  • FLINK-3971: Aggregates handle null values incorrectly.
  • FLINK-3972: Subclasses of ResourceID may not to be serializable
  • FLINK-3973: Table API documentation is "hidden" in Programming Guide menu list
  • FLINK-3974: enableObjectReuse fails when an operator chains to multiple downstream operators
  • FLINK-3975: Override baseurl when serving docs locally
  • FLINK-3977: Subclasses of InternalWindowFunction must support OutputTypeConfigurable
  • FLINK-3981: Don't log duplicate TaskManager registrations as exceptions
  • FLINK-3982: Multiple ResourceManagers register at JobManager in standalone HA mode
  • FLINK-3994: Instable KNNITSuite
  • FLINK-3995: Properly Structure Test Utils and Dependencies
  • FLINK-4000: Exception: Could not restore checkpointed state to operators and functions; during Job Restart (Job restart is triggered due to one of the task manager failure)
  • FLINK-4002: [py] Improve testing infraestructure
  • FLINK-4009: Scala Shell fails to find library for inclusion in test
  • FLINK-4011: Unable to access completed job in web frontend
  • FLINK-4012: Docs: Links to "Iterations" are broken (404)
  • FLINK-4016: FoldApplyWindowFunction is not properly initialized
  • FLINK-4024: FileSourceFunction not adjusted to new IF lifecycle
  • FLINK-4027: FlinkKafkaProducer09 sink can lose messages
  • FLINK-4028: AbstractAlignedProcessingTimeWindowOperator creates wrong TimeWindow
  • FLINK-4030: ScalaShellITCase gets stuck
  • FLINK-4031: Nightly Jenkins job doesn't deploy sources
  • FLINK-4038: Impossible to set more than 1 JVM argument in env.java.opts
  • FLINK-4040: Same env.java.opts is applied for TM , JM and ZK
  • FLINK-4041: Failure while asking ResourceManager for RegisterResource
  • FLINK-4046: Failing a restarting job can get stuck in JobStatus.FAILING
  • FLINK-4052: Unstable test ConnectionUtilsTest
  • FLINK-4053: Return value from Connection should be checked against null
  • FLINK-4056: SavepointITCase.testCheckpointHasBeenRemoved failed on Travis
  • FLINK-4076: BoltWrapper#dispose() should call AbstractStreamOperator#dispose()
  • FLINK-4077: Register Pojo DataSet/DataStream as Table requires alias expression.
  • FLINK-4078: Use ClosureCleaner for CoGroup where
  • FLINK-4079: YARN properties file used for per-job cluster
  • FLINK-4083: Use ClosureCleaner for Join where and equalTo
  • FLINK-4089: Ineffective null check in YarnClusterClient#getApplicationStatus()
  • FLINK-4090: Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()
  • FLINK-4097: Cassandra Sink throws NPE on closing if server is not available
  • FLINK-4099: CliFrontendYarnAddressConfigurationTest fails
  • FLINK-4111: Flink Table & SQL doesn't work in very simple example
  • FLINK-4113: Always copy first value in ChainedAllReduceDriver
  • FLINK-4115: FsStateBackend filesystem verification can cause classpath exceptions
  • FLINK-4122: Cassandra jar contains 2 guava versions
  • FLINK-4123: CassandraWriteAheadSink can hang on cassandra failure
  • FLINK-4125: Yarn CLI incorrectly calculates slotsPerTM when parallelism < task manager count
  • FLINK-4130: CallGenerator could generate illegal code when taking no operands
  • FLINK-4132: Fix boxed comparison in CommunityDetection algorithm
  • FLINK-4133: Reflect streaming file source changes in documentation
  • FLINK-4134: EventTimeSessionWindows trigger for empty windows when dropping late events
  • FLINK-4139: Yarn: Adjust parallelism and task slots correctly
  • FLINK-4141: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
  • FLINK-4142: Recovery problem in HA on Hadoop Yarn 2.4.1
  • FLINK-4144: Yarn properties file: replace hostname/port with Yarn application id
  • FLINK-4145: JmxReporterTest fails due to port conflicts
  • FLINK-4146: CliFrontendYarnAddressConfigurationTest picks wrong IP address on Travis
  • FLINK-4149: Fix Serialization of NFA in AbstractKeyedCEPPatternOperator
  • FLINK-4150: Problem with Blobstore in Yarn HA setting on recovery after cluster shutdown
  • FLINK-4152: TaskManager registration exponential backoff doesn't work
  • FLINK-4154: Correction of murmur hash breaks backwards compatibility
  • FLINK-4156: Job with -m yarn-cluster registers TaskManagers to another running Yarn session
  • FLINK-4157: FlinkKafkaMetrics cause TaskManager shutdown during cancellation
  • FLINK-4158: Scala QuickStart StreamingJob fails to compile
  • FLINK-4159: Quickstart poms exclude unused dependencies
  • FLINK-4160: YARN session doesn't show input validation errors
  • FLINK-4166: Generate automatic different namespaces in Zookeeper for Flink applications
  • FLINK-4167: TaskMetricGroup does not close IOMetricGroup
  • FLINK-4168: ForkableFlinkMiniCluster not available in Kinesis connector tests
  • FLINK-4171: StatsD does not accept metrics whose name contains ":"
  • FLINK-4172: Don't proxy a ProxiedObject
  • FLINK-4176: Travis build fails at flink-connector-kinesis for JDK: openjdk7
  • FLINK-4184: Ganglia and GraphiteReporter report metric names with invalid characters
  • FLINK-4196: Remove "recoveryTimestamp"
  • FLINK-4199: Misleading messages by CLI upon job submission
  • FLINK-4201: Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
  • FLINK-4214: JobExceptionsHandler will return all exceptions
  • FLINK-4216: WordWithCount example with Java has wrong generics type
  • FLINK-4232: Flink executable does not return correct pid
  • FLINK-4235: ClassLoaderITCase.testDisposeSavepointWithCustomKvState timed out on Travis
  • FLINK-4238: Only allow/require query for Tuple Stream in CassandraSink
  • FLINK-4258: Potential null pointer dereference in SavepointCoordinator#onFullyAcknowledgedCheckpoint
  • FLINK-4261: Setup atomic deployment of snapshots
  • FLINK-4277: TaskManagerConfigurationTest fails
  • FLINK-4284: DataSet/CEP link to non-existant "Linking with Flink" section
  • FLINK-4296: Scheduler accepts more tasks than it has task slots available
  • FLINK-4307: Broken user-facing API for ListState

改进

  • FLINK-1827: Move test classes in test folders and fix scope of test dependencies
  • FLINK-1991: Return Table as DataSet<Tuple>
  • FLINK-1996: Add output methods to Table API
  • FLINK-2227: .yarn-properties file is not cleaned up
  • FLINK-2522: Integrate Streaming Api into Flink-scala-shell
  • FLINK-2788: Add type hint with TypeExtactor call on Hint Type
  • FLINK-2829: Confusing error message when Flink cannot create enough task threads
  • FLINK-2929: Recovery of jobs on cluster restarts
  • FLINK-2935: Allow scala shell to read yarn properties
  • FLINK-2985: Allow different field names for unionAll() in Table API
  • FLINK-3115: Update Elasticsearch connector to 2.X
  • FLINK-3152: Support all comparisons for Date type
  • FLINK-3153: Support all comparisons for String type
  • FLINK-3174: Add merging WindowAssigner
  • FLINK-3272: Generalize vertex value type in ConnectedComponents
  • FLINK-3323: Add documentation for NiFi connector
  • FLINK-3332: Provide an exactly-once Cassandra connector
  • FLINK-3375: Allow Watermark Generation in the Kafka Source
  • FLINK-3383: Separate Maven deployment from CI testing
  • FLINK-3405: Extend NiFiSource with interface StoppableFunction
  • FLINK-3407: Extend TwitterSource with interface StoppableFunction
  • FLINK-3428: Add fixed time trailing timestamp/watermark extractor
  • FLINK-3461: Remove duplicate condition check in ZooKeeperLeaderElectionService
  • FLINK-3467: Remove superfluous objects from DataSourceTask.invoke
  • FLINK-3469: Improve documentation for grouping keys
  • FLINK-3487: FilterAggregateTransposeRule does not transform logical plan as desired.
  • FLINK-3503: ProjectJoinTransposeRule fails to push down project.
  • FLINK-3506: ReduceExpressionsRule does not remove duplicate expression in Filter
  • FLINK-3507: PruneEmptyRules does not prune empty node as expected.
  • FLINK-3524: Provide a JSONDeserialisationSchema in the kafka connector package
  • FLINK-3535: Decrease logging verbosity of StackTraceSampleCoordinator
  • FLINK-3541: Clean up workaround in FlinkKafkaConsumer09
  • FLINK-3559: Don't print pid file check if no active PID
  • FLINK-3560: Examples shouldn't always print usage statement
  • FLINK-3570: Replace random NIC selection heuristic by InetAddress.getLocalHost
  • FLINK-3575: Update Working With State Section in Doc
  • FLINK-3589: Allow setting Operator parallelism to default value
  • FLINK-3590: JDBC Format tests don't hide derby logs
  • FLINK-3591: Replace Quickstart K-Means Example by Streaming Example
  • FLINK-3596: DataSet RelNode refactoring
  • FLINK-3597: Table API operator names should reflect relational expression
  • FLINK-3603: Re-enable Table API explain
  • FLINK-3604: Enable ignored Table API tests
  • FLINK-3607: Decrease default forkCount for tests
  • FLINK-3614: Remove Non-Keyed Window Operator
  • FLINK-3618: Rename abstract UDF classes in Scatter-Gather implementation
  • FLINK-3623: Adjust MurmurHash algorithm
  • FLINK-3632: Clean up Table API exceptions
  • FLINK-3634: Fix documentation for DataSetUtils.zipWithUniqueId()
  • FLINK-3637: Change RollingSink Writer interface to allow wider range of outputs
  • FLINK-3641: Document registerCachedFile API call
  • FLINK-3649: Document stable API methods maxBy/minBy
  • FLINK-3650: Add maxBy/minBy to Scala DataSet API
  • FLINK-3654: Disable Write-Ahead-Log in RocksDB State
  • FLINK-3657: Change access of DataSetUtils.countElements() to 'public'
  • FLINK-3658: Allow the FlinkKafkaProducer to send data to multiple topics
  • FLINK-3664: Create a method to easily Summarize a DataSet
  • FLINK-3665: Range partitioning lacks support to define sort orders
  • FLINK-3666: Remove Nephele references
  • FLINK-3667: Generalize client<->cluster communication
  • FLINK-3678: Make Flink logs directory configurable
  • FLINK-3691: Extend AvroInputFormat to support Avro GenericRecord
  • FLINK-3700: Replace Guava Preconditions class with Flink Preconditions
  • FLINK-3736: Move toRexNode and toAggCall logic into Expressions
  • FLINK-3739: Add a null literal to Table API
  • FLINK-3747: Consolidate TimestampAssigner Methods in Kafka Consumer
  • FLINK-3750: Make JDBCInputFormat a parallel source
  • FLINK-3752: Add Per-Kafka-Partition Watermark Generation to the docs
  • FLINK-3754: Add a validation phase before construct RelNode using TableAPI
  • FLINK-3763: RabbitMQ Source/Sink standardize connection parameters
  • FLINK-3770: Fix TriangleEnumerator performance
  • FLINK-3776: Flink Scala shell does not allow to set configuration for local execution
  • FLINK-3794: Add checks for unsupported operations in streaming table API
  • FLINK-3799: Graph checksum should execute single job
  • FLINK-3804: Update YARN documentation
  • FLINK-3806: Revert use of DataSet.count() in Gelly
  • FLINK-3811: Refactor ExecutionEnvironment in TableEnvironment
  • FLINK-3815: Replace Guava Preconditions usage in flink-gelly
  • FLINK-3816: Replace Guava Preconditions usage in flink-clients
  • FLINK-3817: Remove unused Guava dependency from RocksDB StateBackend
  • FLINK-3818: Remove Guava dependency from flink-gelly-examples
  • FLINK-3819: Replace Guava Preconditions usage in flink-gelly-scala
  • FLINK-3821: Reduce Guava usage in flink-java
  • FLINK-3853: Reduce object creation in Gelly utility mappers
  • FLINK-3854: Support Avro key-value rolling sink writer
  • FLINK-3855: Upgrade Jackson version
  • FLINK-3868: Specialized CopyableValue serializers and comparators
  • FLINK-3876: Improve documentation of Scala Shell
  • FLINK-3880: Improve performance of Accumulator map
  • FLINK-3886: Give a better error when the application Main class is not public.
  • FLINK-3887: Improve dependency management for building docs
  • FLINK-3891: Add a class containing all supported Table API types
  • FLINK-3900: Set nullCheck=true as default in TableConfig
  • FLINK-3901: Create a RowCsvInputFormat to use as default CSV IF in Table API
  • FLINK-3912: Typos in Batch Scala API Documentation
  • FLINK-3913: Clean up documentation typos
  • FLINK-3916: Allow generic types passing the Table API
  • FLINK-3936: Add MIN / MAX aggregations function for BOOLEAN types
  • FLINK-3937: Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
  • FLINK-3945: Degree annotation for directed graphs
  • FLINK-3955: Change Table.toSink() to Table.writeToSink()
  • FLINK-3978: Add hasBroadcastVariable method to RuntimeContext
  • FLINK-3979: [documentation]add missed import classes in run_example_quickstart
  • FLINK-3980: Remove ExecutionConfig.PARALLELISM_UNKNOWN
  • FLINK-3993: [py] Add generateSequence() support to Python API
  • FLINK-4003: Use intrinsics for MathUtils logarithms
  • FLINK-4017: [py] Add Aggregation support to Python API
  • FLINK-4025: Add possiblity for the RMQ Streaming Source to customize the queue
  • FLINK-4026: Fix code, grammar, and link issues in the Streaming documentation
  • FLINK-4032: Replace all usage of Guava Preconditions
  • FLINK-4049: Mark RichInputFormat.openInputFormat and closeInputFormat as @PublicEvolving
  • FLINK-4054: Inconsistent Reporter synchronization within report()
  • FLINK-4063: Add Metrics Support for Triggers
  • FLINK-4066: RabbitMQ source, customize queue arguments
  • FLINK-4067: Add version header to savepoints
  • FLINK-4070: Support literals on left side of binary expressions
  • FLINK-4074: Reporter can block TaskManager shutdown
  • FLINK-4082: Add Setting for LargeRecordHandler
  • FLINK-4086: Hide internal Expression methods from Table API
  • FLINK-4087: JMXReporter can't handle port conflicts
  • FLINK-4093: Expose metric interfaces
  • FLINK-4095: Add configDir argument to shell scripts
  • FLINK-4103: Modify CsvTableSource to implement StreamTableSource
  • FLINK-4105: Restructure Gelly docs
  • FLINK-4106: Restructure Gelly docs
  • FLINK-4107: Restructure Gelly docs
  • FLINK-4109: Change the name of ternary condition operator 'eval' to '?'
  • FLINK-4116: Document metrics
  • FLINK-4118: The docker-flink image is outdated (1.0.2) and can be slimmed down
  • FLINK-4119: Null checks in close() for Cassandra Input/Output Formats, checking arguments via Flink Preconditions
  • FLINK-4121: Add timeunit (ms) to docs for timestamps and watermarks
  • FLINK-4127: Clean up configuration and check breaking API changes
  • FLINK-4135: Replace ChecksumHashCode as GraphAnalytic
  • FLINK-4170: Remove CONFIG_ prefix from KinesisConfigConstants variables
  • FLINK-4183: Move checking for StreamTableEnvironment into validation layer
  • FLINK-4185: Reflecting rename from Tachyon to Alluxio
  • FLINK-4186: Expose Kafka metrics through Flink metrics
  • FLINK-4192: Move Metrics API to separate module
  • FLINK-4197: Allow Kinesis Endpoint to be Overridden via Config
  • FLINK-4202: Add JM metric which shows the restart duration
  • FLINK-4206: Metric names should allow special characters
  • FLINK-4209: Fix issue on docker with multiple NICs and remove supervisord dependency
  • FLINK-4210: Move close()/isClosed() out of MetricGroup interface
  • FLINK-4229: Do not start Metrics Reporter by default
  • FLINK-4244: Field names for union operator do not have to be equal
  • FLINK-4246: Allow Specifying Multiple Metrics Reporters
  • FLINK-4269: Decrease log level in RuntimeMonitorHandler
  • FLINK-4279: [py] Set flink dependencies to provided
  • FLINK-4299: Show loss of job manager in Client

新特性

  • FLINK-1745: Add exact k-nearest-neighbours algorithm to machine learning library
  • FLINK-2044: Implementation of Gelly HITS Algorithm
  • FLINK-2099: Add a SQL API
  • FLINK-2166: Add fromCsvFile() to TableEnvironment
  • FLINK-2828: Add interfaces for Table API input formats
  • FLINK-2909: Gelly Graph Generators
  • FLINK-2946: Add orderBy() to Table API
  • FLINK-2971: Add outer joins to the Table API
  • FLINK-2997: Support range partition with user customized data distribution.
  • FLINK-2998: Support range partition comparison for multi input nodes.
  • FLINK-3034: Redis SInk Connector
  • FLINK-3068: Add a Table API configuration to TableEnvironment
  • FLINK-3207: Add a Pregel iteration abstraction to Gelly
  • FLINK-3311: Add a connector for streaming data into Cassandra
  • FLINK-3543: Introduce ResourceManager component
  • FLINK-3546: Streaming Table API
  • FLINK-3564: Implement distinct() for Table API
  • FLINK-3626: zipWithIndex in Python API
  • FLINK-3639: Add methods and utilities to register DataSets and Tables in the TableEnvironment
  • FLINK-3640: Add support for SQL queries in DataSet programs
  • FLINK-3721: Min and max accumulators
  • FLINK-3768: Clustering Coefficient
  • FLINK-3771: Methods for translating Graphs
  • FLINK-3772: Graph algorithms for vertex and edge degree
  • FLINK-3780: Jaccard Similarity
  • FLINK-3786: Add BigDecimal and BigInteger as Basic types
  • FLINK-3856: Create types for java.sql.Date/Time/Timestamp
  • FLINK-3859: Add BigDecimal/BigInteger support to Table API
  • FLINK-3872: Add Kafka TableSource with JSON serialization
  • FLINK-3879: Native implementation of HITS algorithm
  • FLINK-3898: Adamic-Adar Similarity
  • FLINK-3906: Global Clustering Coefficient
  • FLINK-3907: Directed Clustering Coefficient
  • FLINK-3919: Distributed Linear Algebra: row-based matrix
  • FLINK-3925: GraphAlgorithm to filter by maximum degree
  • FLINK-3941: Add support for UNION (with duplicate elimination)
  • FLINK-3942: Add support for INTERSECT
  • FLINK-3943: Add support for EXCEPT (set minus)
  • FLINK-3965: Delegating GraphAlgorithm
  • FLINK-4013: GraphAlgorithms to simplify directed and undirected graphs

Task

  • FLINK-3139: NULL values handling in Table API
  • FLINK-3221: Move Table API on top of Apache Calcite
  • FLINK-3609: Revisit selection of Calcite rules
  • FLINK-3738: Refactor TableEnvironment and TranslationContext
  • FLINK-3743: Upgrade breeze from 0.11.2 to 0.12
  • FLINK-3847: Reorganize package structure of flink-table
  • FLINK-3852: Use a StreamExecutionEnvironment in the quickstart job skeleton

Test

  • FLINK-2100: Add ITCases for all Table API examples
  • FLINK-2444: Add tests for HadoopInputFormats
  • FLINK-2445: Add tests for HadoopOutputFormats
  • FLINK-3517: Number of job and task managers not checked in scripts
  • FLINK-3782: ByteArrayOutputStream and ObjectOutputStream should close
  • FLINK-4010: Scala Shell tests may fail because of a locked STDIN
  • FLINK-4230: Session Windowing IT Case

Wish

  • FLINK-2673: Scala API does not support Option type as key

下载

 

本站原创,转载时保留以下信息:
本文转自:深度开源(open-open.com)
原文地址:http://www.open-open.com/news/view/3123c604

 

正文到此结束
Loading...