Spring Cloud Data Flow 是一个用于构建和编排数据处理流水线的云原生框架。它提供了一种简化的方式来定义、部署和管理数据处理任务和流应用程序。以下是一些关键特性和组件:
流处理:
批处理任务:
可扩展性:
可视化界面:
监控和管理:
Spring Cloud Data Flow Server:
Skipper:
数据流应用程序:
任务应用程序:
Spring Cloud Data Flow 适用于需要处理大规模数据流和批处理任务的场景,特别是在分布式系统和云环境中。它简化了数据管道的开发和管理,使开发者能够专注于业务逻辑的实现
version: '3'
services:
mysql:
image: mysql:5.7.25
container_name: dataflow-mysql
environment:
MYSQL_DATABASE: dataflow
MYSQL_USER: root
MYSQL_ROOT_PASSWORD: rootpw
expose:
- 3306
ports:
- "3306:3306"
volumes:
- ./my.cnf:/etc/mysql/my.cnf
kafka-broker:
image: confluentinc/cp-kafka:5.3.1
container_name: dataflow-kafka
expose:
- "9092"
environment:
- KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-broker:9092
- KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
- KAFKA_ADVERTISED_HOST_NAME=kafka-broker
- KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
depends_on:
- zookeeper
zookeeper:
image: confluentinc/cp-zookeeper:5.3.1
container_name: dataflow-kafka-zookeeper
expose:
- "2181"
environment:
- ZOOKEEPER_CLIENT_PORT=2181
dataflow-server:
image: springcloud/spring-cloud-dataflow-server:2.6.3
container_name: dataflow-server
ports:
- "9393:9393"
environment:
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers=PLAINTEXT://kafka-broker:9092
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.streams.binder.brokers=PLAINTEXT://kafka-broker:9092
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes=zookeeper:2181
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.streams.binder.zkNodes=zookeeper:2181
- spring.cloud.skipper.client.serverUri=http://skipper-server:7577/api
- SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/dataflow
- SPRING_DATASOURCE_USERNAME=root
- SPRING_DATASOURCE_PASSWORD=rootpw
- SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.mariadb.jdbc.Driver
depends_on:
- kafka-broker
entrypoint: "./wait-for-it.sh mysql:3306 -- java -jar /maven/spring-cloud-dataflow-server.jar"
volumes:
- ${HOST_MOUNT_PATH:-.}:${DOCKER_MOUNT_PATH:-/root/scdf}
app-import:
image: springcloud/openjdk:2.0.0.RELEASE
container_name: dataflow-app-import
depends_on:
- dataflow-server
command: >
/bin/sh -c "
./wait-for-it.sh -t 180 dataflow-server:9393;
wget -qO- 'http://dataflow-server:9393/apps' --post-data='uri=${STREAM_APPS_URI:-https://dataflow.spring.io/kafka-maven-latest&force=true}';
echo 'Stream apps imported'
wget -qO- 'http://dataflow-server:9393/apps' --post-data='uri=${TASK_APPS_URI:-https://dataflow.spring.io/task-maven-latest&force=true}';
echo 'Task apps imported'"
skipper-server:
image: springcloud/spring-cloud-skipper-server:2.5.2
container_name: skipper
ports:
- "7577:7577"
- "20000-20105:20000-20105"
environment:
- SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_LOCAL_ACCOUNTS_DEFAULT_PORTRANGE_LOW=20000
- SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_LOCAL_ACCOUNTS_DEFAULT_PORTRANGE_HIGH=20100
- SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/dataflow
- SPRING_DATASOURCE_USERNAME=root
- SPRING_DATASOURCE_PASSWORD=rootpw
- SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.mariadb.jdbc.Driver
entrypoint: "./wait-for-it.sh mysql:3306 -- java -jar /maven/spring-cloud-skipper-server.jar"
volumes:
- ${HOST_MOUNT_PATH:-.}:${DOCKER_MOUNT_PATH:-/root/scdf}
docker-compose -f .\docker-compose.yml up -d
任务处理用于一次性或批量数据处理,适合处理需要在特定时间点或周期性执行的任务。任务应用程序通常具有以下特性:
一次性执行:
批处理支持:
任务应用程序适用于需要在后台执行的长时间运行作业,特别是在需要处理大量数据的情况下
流处理主要用于实时数据处理,适合处理持续不断的数据流。流应用程序通常由以下三种组件组成:
源(Source):
处理器(Processor):
接收器(Sink):
流应用程序通常用于需要低延迟和高吞吐量的场景,如实时数据分析、事件驱动架构和物联网数据处理。