Log Data Archiving

In HAP Server, workflow execution data and worksheet logs are retained long-term, which can grow into a huge volume of data and occupy considerable space in the database. We therefore provide a log archiving solution: according to the rules you set, it archives the relevant logs to a new MongoDB instance and deletes the archived data from the source database. After deletion, the execution history of workflows (including approval workflows) and the worksheet logs for that time period are no longer displayed on the page.

First, deploy a new MongoDB instance. It is recommended that its version match the MongoDB version currently used by HAP. The built-in MongoDB in the standalone environment is v3.4.24 by default, and we provide a detailed document, Single Node MongoDB Deployment of v3.4.24. If you only need to delete the relevant data without archiving it, there is no need to prepare a new MongoDB instance.

In a standalone HAP Server environment, to ensure that the log archiving program can access MongoDB, you must first map the MongoDB port to the host. Never expose this port to the public network.

More details on how to access storage components externally

Configuration Steps:

  1. Download the image (Offline Package)

    docker pull nocoly/hap-archivetools:1.0.2
  2. Create a config.json configuration file with the following content:

    [
      {
        "id": "1",
        "text": "Archiving of workflows execution history",
        "start": "2023-01-01",
        "end": "2023-02-01",
        "src": "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
        "archive": "mongodb://root:password@192.168.1.30:27017/mdworkflow-archive-2023271003?authSource=admin",
        "table": "wf_instance",
        "delete": true,
        "batchSize": 500,
        "retentionDays": 0
      },
      {
        "id": "2",
        "text": "Archiving of workflows execution history",
        "start": "2023-01-01",
        "end": "2023-02-01",
        "src": "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
        "archive": "mongodb://root:password@192.168.1.30:27017/mdworkflow-archive-2023271003?authSource=admin",
        "table": "wf_subInstanceActivity",
        "delete": true,
        "batchSize": 500,
        "retentionDays": 0
      },
      {
        "id": "3",
        "text": "Archiving of worksheets logs",
        "start": "2023-01-01",
        "end": "2023-02-01",
        "src": "mongodb://root:password@192.168.1.20:27017/mdworksheetlog?authSource=admin",
        "archive": "mongodb://root:password@192.168.1.30:27017/mdworksheetlog-archive-2023271003?authSource=admin",
        "table": "wslog*",
        "delete": true,
        "batchSize": 500,
        "retentionDays": 0
      }
    ]

    Parameter Description:

    "id": "Service Identification ID",
    "text": "Description",
    "start": "Start date of the data being archived, UTC (if the value of retentionDays is greater than 0, the configuration is not valid)",
    "end": "End date of the data being archived, UTC (if the value of retentionDays is greater than 0, the configuration is not valid)",
    "src": "Address of source base",
    "archive": "Address of target base (if empty, it is not archived but only deleted according to the set rules)",
    "table": "Data table",
    "delete": "It is fixed to true. Clean up the data that has been archived in the source base when the current archiving is complete and the number of records is checked and correct.",
    "batchSize": "Number of single batch insertions and batch deletions",
    "retentionDays": "It defaults to 0. When it is greater than 0, it means that the data from X days ago is deleted, and the timed archiving mode is enabled, and the specified start and end dates are automatically invalidated, which is performed every 24h by default."

    Whitelist of data tables that can be cleaned:

    code_catch
    hooks_catch
    webhooks_catch
    app_multiple_catch
    wf_instance
    wf_subInstanceActivity
    wf_subInstanceCallback
    custom_apipackageapi_catch
    wslog* # * is a wildcard matching all tables whose names start with wslog
    • All data tables are from mdworkflow, except wslog*, which represents data tables from mdworksheetlog.

    • code_catch, hooks_catch, and webhooks_catch can be deleted by dropping them while the microservices are stopped, which directly frees the disk space occupied by those tables.
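Before mounting config.json into the container, it can be worth sanity-checking that the file is valid JSON and that every "table" value is on the whitelist above. A minimal sketch (the inline example config and the use of python3 for parsing are illustrative assumptions; point the checks at your real config.json):

```shell
#!/bin/sh
# Write a minimal example config (in practice, use your real config.json).
cat > config.json <<'EOF'
[
  {
    "id": "1",
    "text": "Archiving of workflows execution history",
    "start": "2023-01-01",
    "end": "2023-02-01",
    "src": "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
    "archive": "",
    "table": "wf_instance",
    "delete": true,
    "batchSize": 500,
    "retentionDays": 0
  }
]
EOF

# 1) Validate the JSON syntax.
python3 -m json.tool config.json > /dev/null && echo "JSON OK"

# 2) Check every "table" value against the whitelist (wslog* is a glob).
for t in $(python3 -c 'import json; print("\n".join(e["table"] for e in json.load(open("config.json"))))'); do
  case "$t" in
    code_catch|hooks_catch|webhooks_catch|app_multiple_catch|wf_instance|wf_subInstanceActivity|wf_subInstanceCallback|custom_apipackageapi_catch|wslog*)
      echo "table $t: whitelisted" ;;
    *)
      echo "table $t: NOT on the whitelist" ;;
  esac
done
```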

  3. Start the archiving service by executing the following command in the directory where the config.json file is located:

    docker run -d -it -v $(pwd)/config.json:/usr/local/MDArchiveTools/config.json  -v /usr/share/zoneinfo/Etc/GMT-8:/etc/localtime nocoly/hap-archivetools:1.0.2
    • While the program is running, it puts resource pressure on the source and target databases as well as on the machine running the program, so it is recommended to run it during off-peak business hours.

    • After the service is started, the program execution log is output in the container log, and the container exits when program execution is complete (it does not exit in timed archiving mode).

    • You can find the running container with docker ps -a and check the program execution logs with docker logs <container ID>.

    • The container runs in the background by default. If you want to watch the progress of archiving or deletion, remove the -d parameter from the docker run command; the logs are then output in the foreground and you can see the progress bar. Make sure the foreground process is not interrupted while the program is running.

    • In the example config.json configuration file, the target database is named in the format <source database name>-archive-<date-time>. Change the target database name each time you run the tool.

      • This is because, after archiving completes, the tool checks the record count in the target table and will not delete the source data if it does not match the source table. If you reuse the same target database name on a second run, the target table may end up with more records than were archived in that run, and the source data will not be deleted.
    • In timed archiving mode, you can set the ENV_ARCHIVE_INTERVAL environment variable to customize the execution interval, in milliseconds; the default value is 86400000.
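To avoid reusing a target database name, one option is to generate it from the current date and time when preparing config.json. A sketch, assuming the <source database name>-archive-<date-time> naming convention described above (the exact timestamp format is an illustrative choice):

```shell
#!/bin/sh
# Build a unique target database name such as mdworkflow-archive-202301151030.
ARCHIVE_DB="mdworkflow-archive-$(date +%Y%m%d%H%M)"
echo "$ARCHIVE_DB"
# The name can then be substituted into the "archive" connection string, e.g.:
echo "mongodb://root:password@192.168.1.30:27017/${ARCHIVE_DB}?authSource=admin"
```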