Deploying on Kubernetes (Standard Edition and Above)
Overview
This document provides a detailed guide to deploying a multi-node Ops Platform within a Kubernetes cluster, covering Node Exporter deployment, image preparation, node label configuration, service definition, and access settings. It applies mainly to standard edition and higher clusters, where components run with multiple replicas.
Deploy Node Exporter
When the Ops Platform is deployed within a Kubernetes cluster, the ops-nodeagent container has a built-in Node Exporter service, running as a DaemonSet resource on each node of the Kubernetes cluster, automatically collecting monitoring data from all nodes within the cluster.
If you need to monitor servers outside the Kubernetes cluster, you should deploy Node Exporter separately on external servers.
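As a minimal sketch of running Node Exporter on an external server, assuming Docker is available there and using the upstream prom/node_exporter image (the image choice is an assumption; the listen port is set to 59100 to match the addresses expected by ENV_PROMETHEUS_HOST below):

```shell
# Run Node Exporter on an external (non-Kubernetes) server.
# Image and port are assumptions; adjust to your environment.
docker run -d --name node-exporter \
  --net host --pid host \
  -v "/:/host:ro,rslave" \
  prom/node_exporter:latest \
  --path.rootfs=/host \
  --web.listen-address=:59100
```

After the container starts, add the server's address (e.g. `svc_04/<external-ip>:59100`) to ENV_PROMETHEUS_HOST in ops.yaml.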
Ops Platform Deployment
Step 1: Download Images (Offline Package Download)
crictl pull nocoly/ops-gateway:1.1.0
crictl pull nocoly/ops-prometheus:1.1.0
crictl pull nocoly/ops-agent:1.1.0
crictl pull nocoly/ops-nodeagent:1.0.0
Note:
- The ops-gateway, ops-prometheus, and ops-agent images only need to be downloaded on the nodes where the Ops Platform is deployed.
- The ops-nodeagent image needs to be downloaded on every node within the Kubernetes cluster, as it runs as a DaemonSet.
Step 2: Create Node Labels and Taints
Configure Deployment Node Labels
By default, the Ops Platform is deployed on fixed nodes using nodeSelector, and monitoring data in the ops-prometheus service is persistently stored on local disk via hostPath.
Create the hap-ops=true label for the node where the Ops Platform services will be deployed:
kubectl label node <node-name> hap-ops=true
Tip: Replace <node-name> with the actual node name where the Ops Platform services are to be deployed. You can view node names in the cluster using the command kubectl get node -o wide.
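To confirm the label took effect, list nodes filtered by it; the labeled node should appear in the output:

```shell
# Only nodes carrying the hap-ops=true label are listed
kubectl get node -l hap-ops=true
```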
Configure Dedicated Monitoring Node (Optional)
If you wish to add a new dedicated monitoring worker node, you need to create a taint on the new node to ensure only Ops Platform components can be scheduled to it:
kubectl taint nodes <node-name> hap-ops=true:NoSchedule
Note: If the above taint is added, you will also need to add corresponding toleration configurations for all deployed components in the ops.yaml file:
# Add to each Deployment's spec.template.spec section
# Uncomment and adjust as needed
# tolerations:
# - key: "hap-ops"
#   operator: "Equal"
#   value: "true"
#   effect: "NoSchedule"
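To check which taints a node currently carries, or to undo the taint later, the standard kubectl commands can be used (replace <node-name> as before):

```shell
# Show the node's taints
kubectl describe node <node-name> | grep -A 3 Taints

# Remove the taint if needed (note the trailing "-")
kubectl taint nodes <node-name> hap-ops=true:NoSchedule-
```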
Step 3: Create Ops Platform Service Yaml File
Prepare Configuration File Directory
mkdir -p /data/hap/script/kubernetes/ops/
cd /data/hap/script/kubernetes/ops/
Create the Main Configuration File ops.yaml
Create the ops.yaml file, which defines the complete component architecture of the Ops Platform:
- ops-agent-1: Responsible for monitoring mongodb-1, elasticsearch-1, kafka-1, and other shared components
- ops-agent-2: Responsible for monitoring mongodb-2, elasticsearch-2, kafka-2
- ops-agent-3: Responsible for monitoring mongodb-3, elasticsearch-3, kafka-3
cat > ops.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: ops-config
  namespace: hap-ops
data:
  TZ: "Asia/Shanghai"
  ENV_OPS_TOKEN: "SS9PobGG7SDTpcyfSZ1VVmn3gCmy2P52tYk" # Must change during initial deployment, this is the Ops Platform access authentication key
  ENV_PROMETHEUS_HOST: "svc_01/192.168.1.5:59100,svc_02/192.168.1.6:59100,svc_03/192.168.1.7:59100" # Replace with actual Node Exporter service addresses
  ENV_PROMETHEUS_SERVER: "http://ops-prometheus:9090"
  ENV_PROMETHEUS_GRAFANA: "http://ops-prometheus:3000"
  ENV_PROMETHEUS_ALERT: "http://ops-prometheus:9093"
  ENV_PROMETHEUS_KARMA: "http://ops-prometheus:8080"
  ENV_PROMETHEUS_KAFKA: "kafka_1/ops-agent-1:9308,kafka_2/ops-agent-2:9308,kafka_3/ops-agent-3:9308"
  ENV_PROMETHEUS_ELASTICSEARCH: "elasticsearch_1/ops-agent-1:9114,elasticsearch_2/ops-agent-2:9114,elasticsearch_3/ops-agent-3:9114"
  ENV_PROMETHEUS_REDIS: "redis_1/ops-agent-1:9121"
  ENV_PROMETHEUS_MONGODB: "mongodb_1/ops-agent-1:9216,mongodb_2/ops-agent-2:9216,mongodb_3/ops-agent-3:9216"
  ENV_PROMETHEUS_MYSQL: "mysql_1/ops-agent-1:9104"
  # The following is storage component connection information, modify during deployment according to the actual environment
  ENV_MYSQL_HOST: "192.168.1.7"
  ENV_MYSQL_PORT: "3306"
  ENV_MYSQL_USERNAME: "root"
  ENV_MYSQL_PASSWORD: "changeme"
  ENV_MONGODB_URI: "mongodb://root:changeme@192.168.1.9:27017,192.168.1.10:27017,192.168.1.11:27017" # Configure ops-gateway service to collect mongodb-agent metrics
  ENV_MONGODB_OPTIONS: "?authSource=admin"
  ENV_REDIS_HOST: "192.168.1.8"
  ENV_REDIS_PORT: "6379"
  ENV_REDIS_PASSWORD: "changeme"
  ENV_FLINK_URL: "http://flink-jobmanager.flink:8081"
  ENV_PROMETHEUS_RETENTION: "30d" # Prometheus data retention period, default 15d (when not configured)
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ops-config-agent-1 # Exclusive configuration for agent-1
  namespace: hap-ops
data:
  ENV_MONGODB_URI: "mongodb://root:changeme@192.168.1.9:27017" # Connection address for the first mongodb node
  ENV_MONGODB_OPTIONS: "?authSource=admin"
  ENV_KAFKA_ENDPOINTS: "192.168.1.12:9092" # Connection address for the first kafka node
  ENV_ELASTICSEARCH_ENDPOINTS: "http://192.168.1.12:9200" # Connection address for the first elasticsearch node
  ENV_ELASTICSEARCH_PASSWORD: "elastic:changeme" # Account password for the first elasticsearch node
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ops-config-agent-2 # Exclusive configuration for agent-2
  namespace: hap-ops
data:
  ENV_MONGODB_URI: "mongodb://root:changeme@192.168.1.10:27017" # Connection address for the second mongodb node
  ENV_MONGODB_OPTIONS: "?authSource=admin"
  ENV_KAFKA_ENDPOINTS: "192.168.1.13:9092" # Connection address for the second kafka node
  ENV_ELASTICSEARCH_ENDPOINTS: "http://192.168.1.13:9200" # Connection address for the second elasticsearch node
  ENV_ELASTICSEARCH_PASSWORD: "elastic:changeme" # Account password for the second elasticsearch node
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ops-config-agent-3 # Exclusive configuration for agent-3
  namespace: hap-ops
data:
  ENV_MONGODB_URI: "mongodb://root:changeme@192.168.1.11:27017" # Connection address for the third mongodb node
  ENV_MONGODB_OPTIONS: "?authSource=admin"
  ENV_KAFKA_ENDPOINTS: "192.168.1.14:9092" # Connection address for the third kafka node
  ENV_ELASTICSEARCH_ENDPOINTS: "http://192.168.1.14:9200" # Connection address for the third elasticsearch node
  ENV_ELASTICSEARCH_PASSWORD: "elastic:changeme" # Account password for the third elasticsearch node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ops-gateway
  namespace: hap-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ops-gateway
  template:
    metadata:
      labels:
        app: ops-gateway
    spec:
      # Uncomment if node taints are configured
      # tolerations:
      # - key: "hap-ops"
      #   operator: "Equal"
      #   value: "true"
      #   effect: "NoSchedule"
      nodeSelector:
        hap-ops: "true"
      containers:
      - name: ops-gateway
        image: nocoly/ops-gateway:1.1.0
        envFrom:
        - configMapRef:
            name: ops-config
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "0.1"
            memory: "200Mi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ops-prometheus
  namespace: hap-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ops-prometheus
  template:
    metadata:
      labels:
        app: ops-prometheus
    spec:
      # Uncomment if node taints are configured
      # tolerations:
      # - key: "hap-ops"
      #   operator: "Equal"
      #   value: "true"
      #   effect: "NoSchedule"
      nodeSelector:
        hap-ops: "true"
      containers:
      - name: ops-prometheus
        image: nocoly/ops-prometheus:1.1.0
        volumeMounts:
        - mountPath: /data/
          name: prometheus-data
        envFrom:
        - configMapRef:
            name: ops-config
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "0.1"
            memory: "200Mi"
      volumes:
      - name: prometheus-data
        hostPath:
          path: /data/ops-prometheus-data # Persistent storage path
          type: DirectoryOrCreate # Create directory if it does not exist
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ops-agent-1
  namespace: hap-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ops-agent-1
  template:
    metadata:
      labels:
        app: ops-agent-1
    spec:
      # Uncomment if node taints are configured
      # tolerations:
      # - key: "hap-ops"
      #   operator: "Equal"
      #   value: "true"
      #   effect: "NoSchedule"
      nodeSelector:
        hap-ops: "true"
      containers:
      - name: ops-agent-1
        image: nocoly/ops-agent:1.1.0
        envFrom:
        - configMapRef:
            name: ops-config
        - configMapRef:
            name: ops-config-agent-1 # Exclusive configuration (overrides public configuration)
        resources:
          limits:
            cpu: "1"
            memory: "2Gi"
          requests:
            cpu: "0.05"
            memory: "128Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ops-agent-1
  namespace: hap-ops
spec:
  selector:
    app: ops-agent-1
  ports:
  - name: prometheus
    port: 9104
    targetPort: 9104
  - name: mongodb
    port: 9216
    targetPort: 9216
  - name: redis
    port: 9121
    targetPort: 9121
  - name: kafka
    port: 9308
    targetPort: 9308
  - name: elasticsearch
    port: 9114
    targetPort: 9114
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ops-agent-2
  namespace: hap-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ops-agent-2
  template:
    metadata:
      labels:
        app: ops-agent-2
    spec:
      # Uncomment if node taints are configured
      # tolerations:
      # - key: "hap-ops"
      #   operator: "Equal"
      #   value: "true"
      #   effect: "NoSchedule"
      nodeSelector:
        hap-ops: "true"
      containers:
      - name: ops-agent-2
        image: nocoly/ops-agent:1.1.0
        envFrom:
        - configMapRef:
            name: ops-config
        - configMapRef:
            name: ops-config-agent-2 # Exclusive configuration (overrides public configuration)
        resources:
          limits:
            cpu: "1"
            memory: "2Gi"
          requests:
            cpu: "0.05"
            memory: "128Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ops-agent-2
  namespace: hap-ops
spec:
  selector:
    app: ops-agent-2
  ports:
  - name: mongodb
    port: 9216
    targetPort: 9216
  - name: kafka
    port: 9308
    targetPort: 9308
  - name: elasticsearch
    port: 9114
    targetPort: 9114
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ops-agent-3
  namespace: hap-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ops-agent-3
  template:
    metadata:
      labels:
        app: ops-agent-3
    spec:
      # Uncomment if node taints are configured
      # tolerations:
      # - key: "hap-ops"
      #   operator: "Equal"
      #   value: "true"
      #   effect: "NoSchedule"
      nodeSelector:
        hap-ops: "true"
      containers:
      - name: ops-agent-3
        image: nocoly/ops-agent:1.1.0
        envFrom:
        - configMapRef:
            name: ops-config
        - configMapRef:
            name: ops-config-agent-3 # Exclusive configuration (overrides public configuration)
        resources:
          limits:
            cpu: "1"
            memory: "2Gi"
          requests:
            cpu: "0.05"
            memory: "128Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ops-agent-3
  namespace: hap-ops
spec:
  selector:
    app: ops-agent-3
  ports:
  - name: mongodb
    port: 9216
    targetPort: 9216
  - name: kafka
    port: 9308
    targetPort: 9308
  - name: elasticsearch
    port: 9114
    targetPort: 9114
  type: ClusterIP
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ops-nodeagent
  namespace: hap-ops
spec:
  selector:
    matchLabels:
      app: ops-nodeagent
  template:
    metadata:
      labels:
        app: ops-nodeagent
    spec:
      containers:
      - name: ops-nodeagent
        image: nocoly/ops-nodeagent:1.0.0
        envFrom:
        - configMapRef:
            name: ops-config
        ports:
        - containerPort: 59100
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "0.1"
            memory: "200Mi"
        volumeMounts:
        - name: host-root
          mountPath: /host
          readOnly: true
          mountPropagation: HostToContainer
      volumes:
      - name: host-root
        hostPath:
          path: /
      hostNetwork: true # Use host network
      hostPID: true # Use host PID namespace
---
apiVersion: v1
kind: Service
metadata:
  name: ops-prometheus
  namespace: hap-ops
spec:
  selector:
    app: ops-prometheus
  ports:
  - name: server
    port: 9090
    targetPort: 9090
  - name: grafana
    port: 3000
    targetPort: 3000
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: ops-gateway
  namespace: hap-ops
spec:
  selector:
    app: ops-gateway
  ports:
  - name: gateway
    port: 48881
    targetPort: 48881
    nodePort: 30081
  type: NodePort
EOF
Step 4: Create Namespace and Start Services
Create Namespace
kubectl create ns hap-ops
Note: The Ops Platform is by default deployed in the hap-ops namespace.
Start Ops Platform Services
kubectl apply -f ops.yaml
Tip:
- To stop the services, execute: kubectl delete -f ops.yaml
- It is recommended to closely monitor the Pod status during deployment to ensure all components start up properly.
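To wait for the main workloads to finish rolling out rather than polling manually, a sketch using kubectl's rollout status (the command blocks until each workload is ready or the rollout fails):

```shell
# Block until each core workload reports a successful rollout
kubectl -n hap-ops rollout status deployment/ops-gateway
kubectl -n hap-ops rollout status deployment/ops-prometheus
kubectl -n hap-ops rollout status daemonset/ops-nodeagent
```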
Step 5: Check Ops Platform Service Status
kubectl -n hap-ops get pod -o wide
Validation Criteria: All Pods should show 1/1 in the READY column, indicating that the components are running properly.
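If some Pods are not ready, a quick way to narrow down the problem is to list only Pods outside the Running phase and then inspect them individually (replace <pod-name> with the affected Pod):

```shell
# List Pods that are not in the Running phase
kubectl -n hap-ops get pod --field-selector=status.phase!=Running

# Inspect a problematic Pod's events and container logs
kubectl -n hap-ops describe pod <pod-name>
kubectl -n hap-ops logs <pod-name>
```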
Step 6: Configure Nginx Reverse Proxy
To facilitate access to the Ops Platform, it is recommended to configure an Nginx reverse proxy:
cat > hap-ops.conf << 'EOF'
upstream hap-ops {
    server 172.29.202.34:30081; # Replace with the IP of the K8S node deploying the Ops Platform
}

server {
    listen 48881;
    server_name _;

    access_log /data/logs/weblogs/hap-ops.log main;
    error_log /data/logs/weblogs/hap-ops.error.log;

    underscores_in_headers on;
    client_max_body_size 2048m;

    gzip on;
    gzip_proxied any;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_min_length 512;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_types text/plain text/css application/json application/x-javascript application/javascript application/octet-stream text/xml application/xml application/xml+rss text/javascript image/jpeg image/gif image/png;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_pass http://hap-ops;
    }
}
EOF
Note: It is recommended to use port 48881 for the access entry, maintaining consistency with the fixed backend port of the Ops Platform. After completing the configuration, place this configuration file in the Nginx configuration directory and restart the Nginx service.
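Before reloading, it is worth validating the configuration; a sketch assuming Nginx is run directly (if it is managed by systemd, "systemctl reload nginx" is the equivalent reload step):

```shell
# Check configuration syntax without affecting the running server
nginx -t

# Reload without dropping existing connections
nginx -s reload
```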
Step 7: Access the Ops Platform
Using the above Nginx proxy as an example, access the Nginx entry point:
http://hap-ops.demo.com:48881
- The login token is the value of the ENV_OPS_TOKEN environment variable in ops.yaml.