Flink 是一个针对流数据和批数据的分布式处理引擎。它主要是由 Java 代码实现。目前主要还是依靠开源社区的贡献而发展。对 Flink 而言,其所要处理的主要场景就是流数据,批数据只是流数据的一个极限特例而已。再换句话说,Flink 会把所有任务当成流来处理,这也是其最大的特点。Flink 可以支持本地的快速迭代,以及一些环形的迭代任务。并且 Flink 可以定制化内存管理。在这点,如果要对比 Flink 和 Spark 的话,Flink 并没有将内存完全交给应用层。这也是为什么 Spark 相对于 Flink,更容易出现 OOM 的原因(out of memory)。就框架本身与应用场景来说,Flink 更相似与 Storm。如果之前了解过 Storm 或者 Flume 的读者,可能会更容易理解 Flink 的架构和很多概念。
Flink 的架构图
更新日志
- FLINK-1502: Expose metrics to graphite, ganglia and JMX.
- FLINK-1550: Show JVM Metrics for JobManager
- FLINK-3129: Add tooling to ensure interface stability
- FLINK-3141: Design of NULL values handling in operation
- FLINK-3223: Translate Table API query into logical relational plans for Apache Calcite
- FLINK-3225: Optimize logical Table API plans in Calcite
- FLINK-3226: Translate optimized logical Table API plans into physical plans representing DataSet programs
- FLINK-3229: Kinesis streaming consumer with integration of Flink's checkpointing mechanics
- FLINK-3230: Kinesis streaming producer
- FLINK-3231: Handle Kinesis-side resharding in Kinesis streaming consumer
- FLINK-3327: Attach the ExecutionConfig to the JobGraph and make it accessible to the AbstractInvocable.
- FLINK-3489: Refactor Table API before merging into master
- FLINK-3544: ResourceManager runtime components
- FLINK-3545: ResourceManager: YARN integration
- FLINK-3547: Add support for streaming projection, selection, and union
- FLINK-3550: Rework stream join example
- FLINK-3552: Change socket WordCount to be properly windowed
- FLINK-3573: Implement more String functions for Table API
- FLINK-3574: Implement math functions for Table API
- FLINK-3586: Risk of data overflow while use sum/count to calculate AVG value
- FLINK-3612: Fix/adjust Table API examples
- FLINK-3714: Add Support for "Allowed Lateness"
- FLINK-3727: Add support for embedded streaming SQL (projection, filter, union)
- FLINK-3748: Add CASE function to Table API
- FLINK-3756: Introduce state hierarchy in CheckpointCoordinator
- FLINK-3837: Create FLOOR/CEIL function
- FLINK-3923: Unify configuration conventions of the Kinesis producer to the same as the consumer
- FLINK-3949: Collect Metrics in Runtime Operators
- FLINK-3951: Add Histogram Metric Type
- FLINK-4018: Configurable idle time between getRecords requests to Kinesis shards
- FLINK-4019: Expose approximateArrivalTimestamp through the KinesisDeserializationSchema interface
- FLINK-4020: Remove shard list querying from Kinesis consumer constructor
- FLINK-4033: Missing Scala example snippets for the Kinesis Connector documentation
- FLINK-4057: Expose JobManager Metrics
- FLINK-4062: Update Windowing Documentation
- FLINK-4080: Kinesis consumer not exactly-once if stopped in the middle of processing aggregated records
- FLINK-4085: Set Kinesis Consumer Agent to Flink
- FLINK-4191: Expose shard information in KinesisDeserializationSchema
- FLINK-4239: Set Default Allowed Lateness to Zero and Make Triggers Non-Purging
Bug修复
- FLINK-1159: Case style anonymous functions not supported by Scala API
- FLINK-1964: Rework TwitterSource to use a Properties object instead of a file path
- FLINK-2392: Instable test in flink-yarn-tests
- FLINK-2544: Some test cases using PowerMock fail with Java 8u20
- FLINK-2832: Failing test: RandomSamplerTest.testReservoirSamplerWithReplacement
- FLINK-2915: JobManagerProcessFailureBatchRecoveryITCase
- FLINK-3086: ExpressionParser does not support concatenation of suffix operations
- FLINK-3105: Submission in per job YARN cluster mode reuses properties file of long-lived session
- FLINK-3126: Remove accumulator type from "value" in web frontend
- FLINK-3179: Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
- FLINK-3333: Documentation about object reuse should be improved
- FLINK-3380: Unstable Test: JobSubmissionFailsITCase
- FLINK-3396: Job submission Savepoint restore logic flawed
- FLINK-3411: Failed recovery can lead to removal of HA state
- FLINK-3444: env.fromElements relies on the first input element for determining the DataSet/DataStream type
- FLINK-3466: Job might get stuck in restoreState() from HDFS due to interrupt
- FLINK-3471: JDBCInputFormat cannot handle null fields of certain types
- FLINK-3472: JDBCInputFormat.nextRecord(..) has misleading message on NPE
- FLINK-3488: Kafka08ITCase.testBigRecordJob fails on Travis
- FLINK-3491: HDFSCopyUtilitiesTest fails on Windows
- FLINK-3495: RocksDB Tests can't run on Windows
- FLINK-3505: JoinUnionTransposeRule fails to push Join past Union.
- FLINK-3519: Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant
- FLINK-3530: Kafka09ITCase.testBigRecordJob fails on Travis
- FLINK-3533: Update the Gelly docs wrt examples and cluster execution
- FLINK-3534: Cancelling a running job can lead to restart instead of stopping
- FLINK-3540: Hadoop 2.6.3 build contains /com/google/common (guava) classes in flink-dist.jar
- FLINK-3556: Unneeded check in HA blob store configuration
- FLINK-3561: ExecutionConfig's timestampsEnabled is unused
- FLINK-3562: Update docs in the course of EventTimeSourceFunction removal
- FLINK-3563: .returns() doesn't compile when using .map() with a custom MapFunction
- FLINK-3565: FlinkKafkaConsumer does not work with Scala 2.11
- FLINK-3566: Input type validation often fails on custom TypeInfo implementations
- FLINK-3567: Rework selection when grouping in Table API
- FLINK-3569: Test cases fail due to Maven Shade plugin
- FLINK-3577: Display anchor links when hovering over headers.
- FLINK-3578: Scala DataStream API does not support Rich Window Functions
- FLINK-3579: Improve String concatenation
- FLINK-3583: Configuration not visible in gui when job is running
- FLINK-3585: Deploy scripts don't support spaces in paths
- FLINK-3593: DistinctITCase is failing
- FLINK-3595: Kafka09 consumer thread does not interrupt when stuck in record emission
- FLINK-3601: JobManagerTest times out on StopSignal test
- FLINK-3602: Recursive Types are not supported / crash TypeExtractor
- FLINK-3608: ImmutableSettings error in ElasticsearchSink
- FLINK-3611: Wrong link in CONTRIBUTING.md
- FLINK-3619: SavepointCoordinator test failure
- FLINK-3621: Misleading documentation of memory configuration parameters
- FLINK-3622: Improve error messages for invalid joins
- FLINK-3630: Little mistake in documentation
- FLINK-3631: CodeGenerator does not check type compatibility for equality expressions
- FLINK-3633: Job submission silently fails when using user code types
- FLINK-3635: Potential null deference in TwitterExample#SelectEnglishAndTokenizeFlatMap#flatMap
- FLINK-3636: NoClassDefFoundError while running WindowJoin example
- FLINK-3638: Invalid default ports in documentation
- FLINK-3644: WebRuntimMonitor set java.io.tmpdir does not work for change upload dir.
- FLINK-3645: HDFSCopyUtilitiesTest fails in a Hadoop cluster
- FLINK-3651: Fix faulty RollingSink Restore
- FLINK-3653: recovery.zookeeper.storageDir is not documented on the configuration page
- FLINK-3663: FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
- FLINK-3669: WindowOperator registers a lot of timers at StreamTask
- FLINK-3675: YARN ship folder incosistent behavior
- FLINK-3676: WebClient hasn't been removed from the docs
- FLINK-3681: CEP library does not support Java 8 lambdas as select function
- FLINK-3682: CEP operator does not set the processing timestamp correctly
- FLINK-3684: CEP operator does not forward watermarks properly
- FLINK-3689: JobManager blocks cluster shutdown when not connected to ResourceManager
- FLINK-3693: JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is unstable
- FLINK-3696: Some Union tests fail for TableConfigMode.EFFICIENT
- FLINK-3697: keyBy() with nested POJO computes invalid field position indexes
- FLINK-3701: Cant call execute after first execution
- FLINK-3712: YARN client dynamic properties are not passed correctly to the leader election service on the client
- FLINK-3713: DisposeSavepoint message uses system classloader to discard state
- FLINK-3716: Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass
- FLINK-3718: Add Option For Completely Async Backup in RocksDB State Backend
- FLINK-3725: Exception in thread "main" scala.MatchError: … (of class scala.Tuple4)
- FLINK-3728: Throw meaningful exceptions for unsupported SQL features
- FLINK-3729: Several SQL tests fail on Windows OS
- FLINK-3730: Fix RocksDB Local Directory Initialization
- FLINK-3731: Embedded SQL outer joins should fail during translation
- FLINK-3732: Potential null deference in ExecutionConfig#equals()
- FLINK-3735: Embedded SQL union should fail during translation
- FLINK-3737: WikipediaEditsSourceTest.testWikipediaEditsSource() fails locally
- FLINK-3745: TimestampITCase testWatermarkPropagationNoFinalWatermarkOnStop failing intermittently
- FLINK-3757: addAccumulator does not throw Exception on duplicate accumulator name
- FLINK-3759: Table API should throw exception is null value is encountered in non-null mode.
- FLINK-3762: Kryo StackOverflowError due to disabled Kryo Reference tracking
- FLINK-3773: Scanners are left unclosed in SqlExplainTest
- FLINK-3774: Flink configuration is not correctly forwarded to PlanExecutor in ScalaShellRemoteEnvironment
- FLINK-3781: BlobClient may be left unclosed in BlobCache#deleteGlobal()
- FLINK-3790: Rolling File sink does not pick up hadoop configuration
- FLINK-3792: RowTypeInfo equality should not depend on field names
- FLINK-3793: Re-organize the Table API and SQL docs
- FLINK-3796: FileSourceFunction doesn't respect InputFormat's life cycle methods
- FLINK-3803: Checkpoint Stats Tracker Reads from Wrong Configuration
- FLINK-3824: ResourceManager may repeatedly connect to outdated JobManager in HA mode
- FLINK-3826: Broken test: StreamCheckpointingITCase
- FLINK-3835: JSON execution plan not helpful to debug plans with KeySelectors
- FLINK-3838: CLI parameter parser is munging application params
- FLINK-3840: RocksDB local parent dir is polluted with empty folders with random names
- FLINK-3842: Fix handling null record/row in generated code
- FLINK-3845: Gelly allows duplicate vertices in Graph.addVertices
- FLINK-3846: Graph.removeEdges also removes duplicate edges
- FLINK-3860: WikipediaEditsSourceTest.testWikipediaEditsSource times out
- FLINK-3863: Yarn Cluster shutdown may fail if leader changed recently
- FLINK-3864: Yarn tests don't check for prohibited strings in log output
- FLINK-3877: Create TranslateFunction interface for Graph translators
- FLINK-3878: File cache doesn't support multiple duplicate temp directories
- FLINK-3882: Errors in sample Java code for the Elasticsearch 2.x sink
- FLINK-3890: Deprecate streaming mode flag from Yarn CLI
- FLINK-3892: ConnectionUtils may die with NullPointerException
- FLINK-3893: LeaderChangeStateCleanupTest times out
- FLINK-3908: FieldParsers error state is not reset correctly to NONE
- FLINK-3909: Maven Failsafe plugin may report SUCCESS on failed tests
- FLINK-3914: BlobServer.createTemporaryFilename() has concurrency safety problem
- FLINK-3922: Infinite recursion on TypeExtractor
- FLINK-3926: Incorrect implementation of getFieldIndex in TupleTypeInfo
- FLINK-3927: TaskManager registration may fail if Yarn versions don't match
- FLINK-3928: Potential overflow due to 32-bit int arithmetic
- FLINK-3933: Add an auto-type-extracting DeserializationSchema
- FLINK-3934: Prevent translation of non-equi joins in DataSetJoinRule
- FLINK-3935: Invalid check of key and ordering fields in PartitionNode
- FLINK-3938: Yarn tests don't run on the current master
- FLINK-3939: Prevent distinct aggregates and grouping sets from being translated
- FLINK-3944: Add optimization rules to reorder Cartesian products and joins
- FLINK-3948: EventTimeWindowCheckpointingITCase Fails with Core Dump
- FLINK-3953: Surefire plugin executes unit tests twice
- FLINK-3956: Make FileInputFormats independent from Configuration
- FLINK-3960: Disable, fix and re-enable EventTimeWindowCheckpointingITCase
- FLINK-3962: JMXReporter doesn't properly register/deregister metrics
- FLINK-3963: AbstractReporter uses shaded dependency
- FLINK-3971: Aggregates handle null values incorrectly.
- FLINK-3972: Subclasses of ResourceID may not to be serializable
- FLINK-3973: Table API documentation is "hidden" in Programming Guide menu list
- FLINK-3974: enableObjectReuse fails when an operator chains to multiple downstream operators
- FLINK-3975: Override baseurl when serving docs locally
- FLINK-3977: Subclasses of InternalWindowFunction must support OutputTypeConfigurable
- FLINK-3981: Don't log duplicate TaskManager registrations as exceptions
- FLINK-3982: Multiple ResourceManagers register at JobManager in standalone HA mode
- FLINK-3994: Instable KNNITSuite
- FLINK-3995: Properly Structure Test Utils and Dependencies
- FLINK-4000: Exception: Could not restore checkpointed state to operators and functions; during Job Restart (Job restart is triggered due to one of the task manager failure)
- FLINK-4002: [py] Improve testing infraestructure
- FLINK-4009: Scala Shell fails to find library for inclusion in test
- FLINK-4011: Unable to access completed job in web frontend
- FLINK-4012: Docs: Links to "Iterations" are broken (404)
- FLINK-4016: FoldApplyWindowFunction is not properly initialized
- FLINK-4024: FileSourceFunction not adjusted to new IF lifecycle
- FLINK-4027: FlinkKafkaProducer09 sink can lose messages
- FLINK-4028: AbstractAlignedProcessingTimeWindowOperator creates wrong TimeWindow
- FLINK-4030: ScalaShellITCase gets stuck
- FLINK-4031: Nightly Jenkins job doesn't deploy sources
- FLINK-4038: Impossible to set more than 1 JVM argument in env.java.opts
- FLINK-4040: Same env.java.opts is applied for TM , JM and ZK
- FLINK-4041: Failure while asking ResourceManager for RegisterResource
- FLINK-4046: Failing a restarting job can get stuck in JobStatus.FAILING
- FLINK-4052: Unstable test ConnectionUtilsTest
- FLINK-4053: Return value from Connection should be checked against null
- FLINK-4056: SavepointITCase.testCheckpointHasBeenRemoved failed on Travis
- FLINK-4076: BoltWrapper#dispose() should call AbstractStreamOperator#dispose()
- FLINK-4077: Register Pojo DataSet/DataStream as Table requires alias expression.
- FLINK-4078: Use ClosureCleaner for CoGroup where
- FLINK-4079: YARN properties file used for per-job cluster
- FLINK-4083: Use ClosureCleaner for Join where and equalTo
- FLINK-4089: Ineffective null check in YarnClusterClient#getApplicationStatus()
- FLINK-4090: Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()
- FLINK-4097: Cassandra Sink throws NPE on closing if server is not available
- FLINK-4099: CliFrontendYarnAddressConfigurationTest fails
- FLINK-4111: Flink Table & SQL doesn't work in very simple example
- FLINK-4113: Always copy first value in ChainedAllReduceDriver
- FLINK-4115: FsStateBackend filesystem verification can cause classpath exceptions
- FLINK-4122: Cassandra jar contains 2 guava versions
- FLINK-4123: CassandraWriteAheadSink can hang on cassandra failure
- FLINK-4125: Yarn CLI incorrectly calculates slotsPerTM when parallelism < task manager count
- FLINK-4130: CallGenerator could generate illegal code when taking no operands
- FLINK-4132: Fix boxed comparison in CommunityDetection algorithm
- FLINK-4133: Reflect streaming file source changes in documentation
- FLINK-4134: EventTimeSessionWindows trigger for empty windows when dropping late events
- FLINK-4139: Yarn: Adjust parallelism and task slots correctly
- FLINK-4141: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
- FLINK-4142: Recovery problem in HA on Hadoop Yarn 2.4.1
- FLINK-4144: Yarn properties file: replace hostname/port with Yarn application id
- FLINK-4145: JmxReporterTest fails due to port conflicts
- FLINK-4146: CliFrontendYarnAddressConfigurationTest picks wrong IP address on Travis
- FLINK-4149: Fix Serialization of NFA in AbstractKeyedCEPPatternOperator
- FLINK-4150: Problem with Blobstore in Yarn HA setting on recovery after cluster shutdown
- FLINK-4152: TaskManager registration exponential backoff doesn't work
- FLINK-4154: Correction of murmur hash breaks backwards compatibility
- FLINK-4156: Job with -m yarn-cluster registers TaskManagers to another running Yarn session
- FLINK-4157: FlinkKafkaMetrics cause TaskManager shutdown during cancellation
- FLINK-4158: Scala QuickStart StreamingJob fails to compile
- FLINK-4159: Quickstart poms exclude unused dependencies
- FLINK-4160: YARN session doesn't show input validation errors
- FLINK-4166: Generate automatic different namespaces in Zookeeper for Flink applications
- FLINK-4167: TaskMetricGroup does not close IOMetricGroup
- FLINK-4168: ForkableFlinkMiniCluster not available in Kinesis connector tests
- FLINK-4171: StatsD does not accept metrics whose name contains ":"
- FLINK-4172: Don't proxy a ProxiedObject
- FLINK-4176: Travis build fails at flink-connector-kinesis for JDK: openjdk7
- FLINK-4184: Ganglia and GraphiteReporter report metric names with invalid characters
- FLINK-4196: Remove "recoveryTimestamp"
- FLINK-4199: Misleading messages by CLI upon job submission
- FLINK-4201: Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
- FLINK-4214: JobExceptionsHandler will return all exceptions
- FLINK-4216: WordWithCount example with Java has wrong generics type
- FLINK-4232: Flink executable does not return correct pid
- FLINK-4235: ClassLoaderITCase.testDisposeSavepointWithCustomKvState timed out on Travis
- FLINK-4238: Only allow/require query for Tuple Stream in CassandraSink
- FLINK-4258: Potential null pointer dereference in SavepointCoordinator#onFullyAcknowledgedCheckpoint
- FLINK-4261: Setup atomic deployment of snapshots
- FLINK-4277: TaskManagerConfigurationTest fails
- FLINK-4284: DataSet/CEP link to non-existant "Linking with Flink" section
- FLINK-4296: Scheduler accepts more tasks than it has task slots available
- FLINK-4307: Broken user-facing API for ListState
改进
- FLINK-1827: Move test classes in test folders and fix scope of test dependencies
- FLINK-1991: Return Table as DataSet<Tuple>
- FLINK-1996: Add output methods to Table API
- FLINK-2227: .yarn-properties file is not cleaned up
- FLINK-2522: Integrate Streaming Api into Flink-scala-shell
- FLINK-2788: Add type hint with TypeExtactor call on Hint Type
- FLINK-2829: Confusing error message when Flink cannot create enough task threads
- FLINK-2929: Recovery of jobs on cluster restarts
- FLINK-2935: Allow scala shell to read yarn properties
- FLINK-2985: Allow different field names for unionAll() in Table API
- FLINK-3115: Update Elasticsearch connector to 2.X
- FLINK-3152: Support all comparisons for Date type
- FLINK-3153: Support all comparisons for String type
- FLINK-3174: Add merging WindowAssigner
- FLINK-3272: Generalize vertex value type in ConnectedComponents
- FLINK-3323: Add documentation for NiFi connector
- FLINK-3332: Provide an exactly-once Cassandra connector
- FLINK-3375: Allow Watermark Generation in the Kafka Source
- FLINK-3383: Separate Maven deployment from CI testing
- FLINK-3405: Extend NiFiSource with interface StoppableFunction
- FLINK-3407: Extend TwitterSource with interface StoppableFunction
- FLINK-3428: Add fixed time trailing timestamp/watermark extractor
- FLINK-3461: Remove duplicate condition check in ZooKeeperLeaderElectionService
- FLINK-3467: Remove superfluous objects from DataSourceTask.invoke
- FLINK-3469: Improve documentation for grouping keys
- FLINK-3487: FilterAggregateTransposeRule does not transform logical plan as desired.
- FLINK-3503: ProjectJoinTransposeRule fails to push down project.
- FLINK-3506: ReduceExpressionsRule does not remove duplicate expression in Filter
- FLINK-3507: PruneEmptyRules does not prune empty node as expected.
- FLINK-3524: Provide a JSONDeserialisationSchema in the kafka connector package
- FLINK-3535: Decrease logging verbosity of StackTraceSampleCoordinator
- FLINK-3541: Clean up workaround in FlinkKafkaConsumer09
- FLINK-3559: Don't print pid file check if no active PID
- FLINK-3560: Examples shouldn't always print usage statement
- FLINK-3570: Replace random NIC selection heuristic by InetAddress.getLocalHost
- FLINK-3575: Update Working With State Section in Doc
- FLINK-3589: Allow setting Operator parallelism to default value
- FLINK-3590: JDBC Format tests don't hide derby logs
- FLINK-3591: Replace Quickstart K-Means Example by Streaming Example
- FLINK-3596: DataSet RelNode refactoring
- FLINK-3597: Table API operator names should reflect relational expression
- FLINK-3603: Re-enable Table API explain
- FLINK-3604: Enable ignored Table API tests
- FLINK-3607: Decrease default forkCount for tests
- FLINK-3614: Remove Non-Keyed Window Operator
- FLINK-3618: Rename abstract UDF classes in Scatter-Gather implementation
- FLINK-3623: Adjust MurmurHash algorithm
- FLINK-3632: Clean up Table API exceptions
- FLINK-3634: Fix documentation for DataSetUtils.zipWithUniqueId()
- FLINK-3637: Change RollingSink Writer interface to allow wider range of outputs
- FLINK-3641: Document registerCachedFile API call
- FLINK-3649: Document stable API methods maxBy/minBy
- FLINK-3650: Add maxBy/minBy to Scala DataSet API
- FLINK-3654: Disable Write-Ahead-Log in RocksDB State
- FLINK-3657: Change access of DataSetUtils.countElements() to 'public'
- FLINK-3658: Allow the FlinkKafkaProducer to send data to multiple topics
- FLINK-3664: Create a method to easily Summarize a DataSet
- FLINK-3665: Range partitioning lacks support to define sort orders
- FLINK-3666: Remove Nephele references
- FLINK-3667: Generalize client<->cluster communication
- FLINK-3678: Make Flink logs directory configurable
- FLINK-3691: Extend AvroInputFormat to support Avro GenericRecord
- FLINK-3700: Replace Guava Preconditions class with Flink Preconditions
- FLINK-3736: Move toRexNode and toAggCall logic into Expressions
- FLINK-3739: Add a null literal to Table API
- FLINK-3747: Consolidate TimestampAssigner Methods in Kafka Consumer
- FLINK-3750: Make JDBCInputFormat a parallel source
- FLINK-3752: Add Per-Kafka-Partition Watermark Generation to the docs
- FLINK-3754: Add a validation phase before construct RelNode using TableAPI
- FLINK-3763: RabbitMQ Source/Sink standardize connection parameters
- FLINK-3770: Fix TriangleEnumerator performance
- FLINK-3776: Flink Scala shell does not allow to set configuration for local execution
- FLINK-3794: Add checks for unsupported operations in streaming table API
- FLINK-3799: Graph checksum should execute single job
- FLINK-3804: Update YARN documentation
- FLINK-3806: Revert use of DataSet.count() in Gelly
- FLINK-3811: Refactor ExecutionEnvironment in TableEnvironment
- FLINK-3815: Replace Guava Preconditions usage in flink-gelly
- FLINK-3816: Replace Guava Preconditions usage in flink-clients
- FLINK-3817: Remove unused Guava dependency from RocksDB StateBackend
- FLINK-3818: Remove Guava dependency from flink-gelly-examples
- FLINK-3819: Replace Guava Preconditions usage in flink-gelly-scala
- FLINK-3821: Reduce Guava usage in flink-java
- FLINK-3853: Reduce object creation in Gelly utility mappers
- FLINK-3854: Support Avro key-value rolling sink writer
- FLINK-3855: Upgrade Jackson version
- FLINK-3868: Specialized CopyableValue serializers and comparators
- FLINK-3876: Improve documentation of Scala Shell
- FLINK-3880: Improve performance of Accumulator map
- FLINK-3886: Give a better error when the application Main class is not public.
- FLINK-3887: Improve dependency management for building docs
- FLINK-3891: Add a class containing all supported Table API types
- FLINK-3900: Set nullCheck=true as default in TableConfig
- FLINK-3901: Create a RowCsvInputFormat to use as default CSV IF in Table API
- FLINK-3912: Typos in Batch Scala API Documentation
- FLINK-3913: Clean up documentation typos
- FLINK-3916: Allow generic types passing the Table API
- FLINK-3936: Add MIN / MAX aggregations function for BOOLEAN types
- FLINK-3937: Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
- FLINK-3945: Degree annotation for directed graphs
- FLINK-3955: Change Table.toSink() to Table.writeToSink()
- FLINK-3978: Add hasBroadcastVariable method to RuntimeContext
- FLINK-3979: [documentation]add missed import classes in run_example_quickstart
- FLINK-3980: Remove ExecutionConfig.PARALLELISM_UNKNOWN
- FLINK-3993: [py] Add generateSequence() support to Python API
- FLINK-4003: Use intrinsics for MathUtils logarithms
- FLINK-4017: [py] Add Aggregation support to Python API
- FLINK-4025: Add possiblity for the RMQ Streaming Source to customize the queue
- FLINK-4026: Fix code, grammar, and link issues in the Streaming documentation
- FLINK-4032: Replace all usage of Guava Preconditions
- FLINK-4049: Mark RichInputFormat.openInputFormat and closeInputFormat as @PublicEvolving
- FLINK-4054: Inconsistent Reporter synchronization within report()
- FLINK-4063: Add Metrics Support for Triggers
- FLINK-4066: RabbitMQ source, customize queue arguments
- FLINK-4067: Add version header to savepoints
- FLINK-4070: Support literals on left side of binary expressions
- FLINK-4074: Reporter can block TaskManager shutdown
- FLINK-4082: Add Setting for LargeRecordHandler
- FLINK-4086: Hide internal Expression methods from Table API
- FLINK-4087: JMXReporter can't handle port conflicts
- FLINK-4093: Expose metric interfaces
- FLINK-4095: Add configDir argument to shell scripts
- FLINK-4103: Modify CsvTableSource to implement StreamTableSource
- FLINK-4105: Restructure Gelly docs
- FLINK-4106: Restructure Gelly docs
- FLINK-4107: Restructure Gelly docs
- FLINK-4109: Change the name of ternary condition operator 'eval' to '?'
- FLINK-4116: Document metrics
- FLINK-4118: The docker-flink image is outdated (1.0.2) and can be slimmed down
- FLINK-4119: Null checks in close() for Cassandra Input/Output Formats, checking arguments via Flink Preconditions
- FLINK-4121: Add timeunit (ms) to docs for timestamps and watermarks
- FLINK-4127: Clean up configuration and check breaking API changes
- FLINK-4135: Replace ChecksumHashCode as GraphAnalytic
- FLINK-4170: Remove
CONFIG_
prefix from KinesisConfigConstants variables
- FLINK-4183: Move checking for StreamTableEnvironment into validation layer
- FLINK-4185: Reflecting rename from Tachyon to Alluxio
- FLINK-4186: Expose Kafka metrics through Flink metrics
- FLINK-4192: Move Metrics API to separate module
- FLINK-4197: Allow Kinesis Endpoint to be Overridden via Config
- FLINK-4202: Add JM metric which shows the restart duration
- FLINK-4206: Metric names should allow special characters
- FLINK-4209: Fix issue on docker with multiple NICs and remove supervisord dependency
- FLINK-4210: Move close()/isClosed() out of MetricGroup interface
- FLINK-4229: Do not start Metrics Reporter by default
- FLINK-4244: Field names for union operator do not have to be equal
- FLINK-4246: Allow Specifying Multiple Metrics Reporters
- FLINK-4269: Decrease log level in RuntimeMonitorHandler
- FLINK-4279: [py] Set flink dependencies to provided
- FLINK-4299: Show loss of job manager in Client
新特性
- FLINK-1745: Add exact k-nearest-neighbours algorithm to machine learning library
- FLINK-2044: Implementation of Gelly HITS Algorithm
- FLINK-2099: Add a SQL API
- FLINK-2166: Add fromCsvFile() to TableEnvironment
- FLINK-2828: Add interfaces for Table API input formats
- FLINK-2909: Gelly Graph Generators
- FLINK-2946: Add orderBy() to Table API
- FLINK-2971: Add outer joins to the Table API
- FLINK-2997: Support range partition with user customized data distribution.
- FLINK-2998: Support range partition comparison for multi input nodes.
- FLINK-3034: Redis SInk Connector
- FLINK-3068: Add a Table API configuration to TableEnvironment
- FLINK-3207: Add a Pregel iteration abstraction to Gelly
- FLINK-3311: Add a connector for streaming data into Cassandra
- FLINK-3543: Introduce ResourceManager component
- FLINK-3546: Streaming Table API
- FLINK-3564: Implement distinct() for Table API
- FLINK-3626: zipWithIndex in Python API
- FLINK-3639: Add methods and utilities to register DataSets and Tables in the TableEnvironment
- FLINK-3640: Add support for SQL queries in DataSet programs
- FLINK-3721: Min and max accumulators
- FLINK-3768: Clustering Coefficient
- FLINK-3771: Methods for translating Graphs
- FLINK-3772: Graph algorithms for vertex and edge degree
- FLINK-3780: Jaccard Similarity
- FLINK-3786: Add BigDecimal and BigInteger as Basic types
- FLINK-3856: Create types for java.sql.Date/Time/Timestamp
- FLINK-3859: Add BigDecimal/BigInteger support to Table API
- FLINK-3872: Add Kafka TableSource with JSON serialization
- FLINK-3879: Native implementation of HITS algorithm
- FLINK-3898: Adamic-Adar Similarity
- FLINK-3906: Global Clustering Coefficient
- FLINK-3907: Directed Clustering Coefficient
- FLINK-3919: Distributed Linear Algebra: row-based matrix
- FLINK-3925: GraphAlgorithm to filter by maximum degree
- FLINK-3941: Add support for UNION (with duplicate elimination)
- FLINK-3942: Add support for INTERSECT
- FLINK-3943: Add support for EXCEPT (set minus)
- FLINK-3965: Delegating GraphAlgorithm
- FLINK-4013: GraphAlgorithms to simplify directed and undirected graphs
Task
- FLINK-3139: NULL values handling in Table API
- FLINK-3221: Move Table API on top of Apache Calcite
- FLINK-3609: Revisit selection of Calcite rules
- FLINK-3738: Refactor TableEnvironment and TranslationContext
- FLINK-3743: Upgrade breeze from 0.11.2 to 0.12
- FLINK-3847: Reorganize package structure of flink-table
- FLINK-3852: Use a StreamExecutionEnvironment in the quickstart job skeleton
Test
- FLINK-2100: Add ITCases for all Table API examples
- FLINK-2444: Add tests for HadoopInputFormats
- FLINK-2445: Add tests for HadoopOutputFormats
- FLINK-3517: Number of job and task managers not checked in scripts
- FLINK-3782: ByteArrayOutputStream and ObjectOutputStream should close
- FLINK-4010: Scala Shell tests may fail because of a locked STDIN
- FLINK-4230: Session Windowing IT Case
Wish
- FLINK-2673: Scala API does not support Option type as key
下载
本站原创,转载时保留以下信息:
本文转自:深度开源(open-open.com)
原文地址:http://www.open-open.com/news/view/3123c604