How to Independently Deploy the Data Pipeline Service
The data pipeline is an extension module of the HAP system, and users can choose whether to enable it. For how to turn it on, see Enable data pipeline.
In a quick deployment, the data pipeline service runs on the same server as the HAP microservices, which demands substantial hardware resources. If a single server cannot meet the requirements, follow this article to deploy the data pipeline service independently on a new server. For hardware requirements, see More details on server configuration.
Install Docker
To install Docker, follow the official installation instructions for your Linux distribution, or see the Docker Installation section in the deployment examples.
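As a quick reference, a minimal sketch using Docker's official convenience script (assumes the new server has internet access; for production servers, prefer the distribution-specific instructions):
# Download and run Docker's convenience installation script
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
# Start Docker and enable it on boot
systemctl enable --now docker
# Verify the installation
docker version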
Microservices Adjustment
The data pipeline service depends on the file storage and Kafka components, so in standalone mode the access points of these two components must be exposed from the sc service.
If your HAP Server environment runs in cluster mode, no adjustment is needed; simply configure the data pipeline service to connect directly to the file storage and Kafka components.
In standalone mode, to expose the ports of the file storage and Kafka components, modify the docker-compose.yaml file and add the environment variable and port mappings shown below.
app:
  environment:
    ENV_FLINK_URL: http://192.168.10.30:58081 # Add: the address of the Flink data pipeline service; be sure to change it to the actual IP of the new server
sc:
  ports:
    - 9000:9000
    - 9092:9092
docker-compose.yaml Configuration File Example
version: '3'
services:
  app:
    image: nocoly/hap-community:5.8.3
    environment: &app-environment
      ENV_ADDRESS_MAIN: "https://hap.domain.com"
      ENV_APP_VERSION: "5.8.3"
      ENV_API_TOKEN: "******"
      ENV_FLINK_URL: http://192.168.10.30:58081 # Add: the address of the Flink data pipeline service; be sure to change it to the actual IP of the new server
    ports:
      - 8880:8880
    volumes:
      - ./volume/data/:/data/
      - ../data:/data/hap/data
  sc:
    image: nocoly/hap-sc:3.0.0
    environment:
      <<: *app-environment
    volumes:
      - ./volume/data/:/data/
    ports:
      - 9000:9000 # Add
      - 9092:9092 # Add
After making the modifications, execute bash service.sh restartall in the manager directory to restart the microservices.
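After the restart, you can verify from the new data pipeline server that the exposed ports are reachable; a minimal check, assuming nc (netcat) is installed and using 192.168.10.28 as the example HAP server address from this article:
# From the new data pipeline server, confirm the mapped ports on the sc service are reachable
# (replace 192.168.10.28 with the actual IP of the HAP server)
nc -zv 192.168.10.28 9000   # file storage
nc -zv 192.168.10.28 9092   # Kafka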
Deploy Data Pipeline Service
- Initialize the swarm environment
docker swarm init
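Optionally, confirm that swarm mode is active before continuing:
# Prints "active" once the node is part of a swarm
docker info --format '{{.Swarm.LocalNodeState}}'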
- Create a directory
mkdir -p /data/hap/script/volume/data
- Create a configuration file
cat > /data/hap/script/flink.yaml <<EOF
version: '3'
services:
  flink:
    image: nocoly/hap-flink:1.17.1.530
    entrypoint: ["/bin/bash"]
    command: ["/run.sh"]
    environment:
      ENV_FLINK_S3_ACCESSKEY: "mdstorage"
      ENV_FLINK_S3_SECRETKEY: "eBxExGQJNhGosgv5FQJiVNqH"
      ENV_FLINK_S3_SSL: "false"
      ENV_FLINK_S3_PATH_STYLE_ACCESS: "true"
      ENV_FLINK_S3_ENDPOINT: "sc:9000" # For versions earlier than 5.1.0, fill in "app"; for 5.1.0 and later, fill in "sc"
      ENV_FLINK_S3_BUCKET: "mdoc"
      ENV_FLINK_LOG_LEVEL: "INFO"
      ENV_FLINK_JOBMANAGER_MEMORY: "2000m"
      ENV_FLINK_TASKMANAGER_MEMORY: "10000m"
      ENV_FLINK_TASKMANAGER_SLOTS: "50"
      ENV_KAFKA_ENDPOINTS: "sc:9092" # For versions earlier than 5.1.0, fill in "app"; for 5.1.0 and later, fill in "sc"; if Kafka is an external component, fill in the actual IP of Kafka
    ports:
      - 58081:8081
    volumes:
      - ./volume/data/:/data/
    extra_hosts:
      - "sc:192.168.10.28" # Host resolution for the sc service (matches the "sc:9092" value of ENV_KAFKA_ENDPOINTS); be sure to change it to the actual IP address
      #- "app:192.168.10.28" # For versions earlier than 5.1.0, use this app host resolution instead (matching "app:9092"); be sure to change it to the actual IP address
EOF
- Configure the startup script
cat > /data/hap/script/startflink.sh <<-EOF
docker stack deploy -c /data/hap/script/flink.yaml flink
EOF
chmod +x /data/hap/script/startflink.sh
- Start the data pipeline service
bash /data/hap/script/startflink.sh
- After startup, it takes about 5 minutes for the data pipeline service container to fully start; you can check its progress with the commands below.
- Stop command: docker stack rm flink
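A minimal way to check the startup, assuming Docker's default <stack>_<service> naming (flink_flink) and the 58081 port mapping from flink.yaml above:
# List the tasks of the flink stack; they should reach the Running state
docker stack ps flink
# Follow the service logs during startup
docker service logs -f flink_flink
# Once fully started, the Flink web UI responds on the mapped port
curl -I http://127.0.0.1:58081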
Other Considerations
The data pipeline service needs to create two directories, checkpoints and recovery, under the bucket of the file storage service to store relevant data.
If external object storage is enabled, the file storage switches to S3 mode, which causes issues for the data pipeline, because the data pipeline service currently does not support using external object storage directly via the S3 protocol.
Therefore, if external object storage is enabled, a separate file storage service must be deployed for the data pipeline service.
Deploy File Storage Service
- Create file-flink.yaml
version: '3'
services:
  file-flink:
    image: nocoly/hap-file:1.6.0
    volumes:
      - /usr/share/zoneinfo/Etc/GMT-8:/etc/localtime
      - ./volume/data/file-flink/volume:/data/storage
    environment:
      MINIO_ACCESS_KEY: storage
      MINIO_SECRET_KEY: ITwWPDGvSLxxxxxxM46XiSEmEdF4 # customize the authentication key
    command: ["./main", "server", "/data/storage/data"]
- Download the image for the file service
docker pull nocoly/hap-file:1.6.0
- Create the persistent storage directory for the file-flink service
mkdir -p /data/hap/script/volume/data/file-flink/volume
- Start the file-flink file storage service
docker stack deploy -c file-flink.yaml file-flink
- Enter the file-flink container to create the required bucket
docker exec -it xxx bash
- Replace xxx with the container id of file-flink
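If you prefer not to look up the ID by hand, a small sketch using a name filter (matching the file-flink stack deployed above):
# Find the running file-flink container and open a shell in it
docker ps --filter name=file-flink
docker exec -it $(docker ps -q --filter name=file-flink) bash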
- Create the required bucket
# mc command configuration
mc config host add file-flink http://127.0.0.1:9000 storage ITwWPDGvSLxxxxxxM46XiSEmEdF4 # modify it to your custom authentication key
# Create the required bucket: mdoc
mc mb file-flink/mdoc
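Optionally, verify that the bucket was created (still inside the container):
# The mdoc bucket should be listed under the file-flink alias
mc ls file-flink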
- Modify the relevant variables of the data pipeline service (in flink.yaml) so that it connects to the file-flink service
ENV_FLINK_S3_ACCESSKEY: "storage"
ENV_FLINK_S3_SECRETKEY: "ITwWPDGvSLxxxxxxM46XiSEmEdF4" # modify it to your custom authentication key
ENV_FLINK_S3_ENDPOINT: "192.168.10.30:9000" # replace with the actual IP of the file-flink service
ENV_FLINK_S3_BUCKET: "mdoc"
- Restart the flink service
docker stack rm flink
sleep 30
bash /data/hap/script/startflink.sh
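After the restart, a minimal check that the data pipeline is using the new file storage (the mc command is run inside the file-flink container; the checkpoints and recovery directories mentioned under Other Considerations should appear under the mdoc bucket once the service starts writing to it):
# Confirm the flink stack is running again
docker stack ps flink
# Inside the file-flink container: the checkpoints and recovery
# directories should appear under the mdoc bucket
mc ls file-flink/mdoc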