Service Health Check
Standalone Mode
When some functions behave abnormally or the system cannot be accessed at all, follow the steps below to troubleshoot in order.
Container Service Log Check
Check Health Check Logs of Microservice Application Containers
docker logs $(docker ps -a | grep -E 'hap-community' | awk '{print $1}')
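If the full log output is too long to scan, a general Docker pattern (not specific to HAP) is to show only the most recent lines and filter for error-level entries, for example:
docker logs --tail 200 $(docker ps -a | grep -E 'hap-community' | awk '{print $1}') 2>&1 | grep -iE 'error|exception'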
Check Health Check Logs of Storage Component Containers
docker logs $(docker ps -a | grep hap-sc | awk '{print $1}')
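Before reading the logs in detail, it can also help to confirm whether any container keeps restarting or has exited. The following is a generic docker ps sketch that assumes the container names contain "hap", as in the commands above:
docker ps -a --filter name=hap --format 'table {{.Names}}\t{{.Status}}'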
Log Analysis and Troubleshooting Methods:
- Normal: Logs are mainly at the INFO level and update steadily in a rolling manner.
- Abnormal: Continuous ERROR logs or stack trace information indicates issues requiring targeted analysis.
- Kafka Error: If storage component logs indicate that Kafka failed to start, refer to the Kafka Startup Failure Troubleshooting Steps.
- MongoDB Error: If the logs show that MongoDB has been restarting automatically, the cause is typically server memory overload (see the OOM check sketched after this list); a temporary restart of the HAP service can often resolve the issue.
- Microservice Error: If storage component logs are normal but microservice application logs are abnormal, attempt to resolve the issue by restarting the HAP service.
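For the MongoDB case above, one way to confirm that memory overload was the trigger is to check whether the kernel OOM killer terminated the mongod process. This is a generic Linux check rather than an HAP-specific command:
dmesg -T | grep -iE 'out of memory|killed process'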
Restarting the HAP Service
Run the following command in the directory where the installation manager is extracted:
bash service.sh restartall
- If the path of the service.sh file is forgotten, use the command below to locate it:
find / -path /proc -prune -o -name "service.sh" -print
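If you want to locate the file and restart in a single step, the two commands can be combined. This is only a convenience sketch; it assumes the first service.sh that find returns belongs to the installation manager:
cd "$(dirname "$(find / -path /proc -prune -o -name 'service.sh' -print | head -n 1)")" && bash service.sh restartall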
Server Physical Resource Check
CPU Usage Check
top -c
- In standalone mode, a 16-core CPU is sufficient for most scenarios. If the CPU is still fully utilized and the process with the highest CPU usage is the mongod process, this is usually caused by slow queries; refer to the Slow Query Optimization Documentation. A batch-mode snapshot command for capturing the heaviest CPU consumers is sketched after this list.
- The wa field in the CPU metrics of the top command shows the percentage of CPU time spent waiting for disk I/O. Normally it should be 0 or 0.x; if it reaches 5 or higher, disk performance is poor and switching to SSD disks is strongly recommended.
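To capture a one-off snapshot of the heaviest CPU consumers (for example, to attach to a support ticket), top can be run in batch mode. This is standard procps behavior, not an HAP-specific tool; by default the output is sorted by CPU usage:
top -b -n 1 | head -n 20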
Memory Usage Check
free -h
- When memory usage is close to capacity, it can easily trigger system anomalies, which may also lead to abnormally high CPU usage.
- If memory utilization is excessively high even in environments with 64 GB or more of memory, use the top -co %MEM command to sort processes by memory usage percentage and identify the problematic processes.
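As an alternative to sorting inside top, a static snapshot with ps is easier to copy out; this is a standard procps command, not specific to HAP:
ps aux --sort=-%mem | head -n 10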
Disk Space Check
df -Th
- A full data partition will cause system functions to become unavailable.
- Refer to documentation for cleaning up old images, deleting redundant log data, or expanding disk space. Afterward, restart the service to restore functionality.
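Before cleaning up, it helps to see where the space is actually going. The sketch below lists the largest top-level directories on a mount point and removes dangling Docker images; /data is only an example path and should be replaced with your actual data partition, and any pruning should be reviewed before running in production:
du -xh --max-depth=1 /data 2>/dev/null | sort -rh | head -n 10
docker image prune -f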
Historical Resource Usage Trend Check
System anomalies sometimes appear only after a delay, so it is necessary to review historical resource trends for retrospective analysis.
- Existing Monitoring: If monitoring tools (such as Zabbix or Prometheus) are already installed on the server, prioritize reviewing trends in CPU, memory, and I/O metrics during the fault timeframe.
- Additional Installation: If no monitoring mechanism is currently in place, it is recommended to install an Ops Platform to provide real-time monitoring and historical analysis of system resources.
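If a full monitoring platform is not available but the sysstat package happens to be installed, sar can give a rough historical view. The log path below follows the common Red Hat-style layout and may differ on your distribution; the file name sa12 is only an example for the 12th day of the month:
sar -u    # CPU usage history for the current day
sar -r    # memory usage history for the current day
sar -u -f /var/log/sa/sa12    # CPU history for a specific day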