Reclaim Disk Space

After data is deleted from a MongoDB instance, the storage space it occupied is marked as free. New data written to the same collection is usually stored directly in this free space, but other collections cannot reuse it. Such unused free space is referred to as disk fragments. The more disk fragments there are, the lower the disk utilization.

There are two methods for reclaiming disk space: using the compact command and rebuilding data files.

  • compact command: This is a collection-level operation, so collections must be compacted one at a time.

  • Rebuild data files: This is a database instance-level operation, performed on the entire database, and is generally more comprehensive.

compact

Precautions

  • Please make sure to have a complete backup of the database first.

  • In versions earlier than MongoDB 4.4, the compact command may lock the database that contains the collection, blocking all read and write operations on that database. Perform this operation during off-peak business hours or after upgrading to a later version. For details on the blocking behavior, refer to the official MongoDB documentation.

    • The time required to reclaim disk fragments using the compact command is related to the data volume of the collection, system load, disk performance, etc. During execution, there will also be a certain increase in CPU and memory usage.

    • For versions below MongoDB 4.4.9, nodes currently executing the compact command will be forced into RECOVERING state. If this state persists for an extended period, the node may no longer be able to synchronize with the PRIMARY node's data.

    • For versions between MongoDB 4.4.9 and 4.4.17, nodes executing the compact command remain in SECONDARY state but are still unable to synchronize with the PRIMARY node's data.

    • For versions above MongoDB 4.4.17, when executing the compact command, SECONDARY nodes will continue to replicate data from the PRIMARY node. (It is recommended to execute the compact command on versions above MongoDB 4.4.17)
  • The compact command may have no effect under the following conditions. For details, refer to the open-source code.

    • The size of the physical collection is less than 1 MB.

    • Less than 20% of the storage space is free within the first 80% of the file, and less than 10% is free within the first 90% of the file.

  • The compact command may release less storage space than the amount reported as free. If this occurs, you can run the compact command again to release more disk fragments, but running it frequently is not recommended.
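The conditions under which compact has no effect can be modeled with a small helper. This is a minimal sketch, not WiredTiger's actual code: the function and parameter names are hypothetical, and it assumes that the percentages are measured against the total file size and that both free-space checks must fail for compact to be skipped.

```javascript
// Simplified model of the "compact may be ineffective" conditions.
// Assumption: percentages are relative to the total file size.
// All names here are hypothetical and exist only for illustration.
const ONE_MB = 1024 * 1024;

function compactLikelyIneffective({ fileSizeBytes, freeInFirst80Bytes, freeInFirst90Bytes }) {
  // Condition 1: the physical collection is smaller than 1 MB.
  if (fileSizeBytes < ONE_MB) {
    return true;
  }
  // Condition 2: too little free space in the leading part of the file.
  const enoughInFirst80 = freeInFirst80Bytes >= fileSizeBytes / 5;  // at least 20%
  const enoughInFirst90 = freeInFirst90Bytes >= fileSizeBytes / 10; // at least 10%
  return !enoughInFirst80 && !enoughInFirst90;
}
```

The inputs would have to come from storage-engine statistics that are not shown in this document, so the helper is mainly useful for reasoning about why a compact run released nothing.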

Estimate Reclaimable Disk Fragment Space

  1. Switch to the database that contains the collection.

    use database_name
    • database_name is the name of the database where the collection is located.
  2. View the disk fragment space that can be reclaimed for the collection.

    db.collection_name.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    • collection_name is the name of the collection.

    The returned result is as follows:

    1485426688

    This result indicates that an estimated 1485426688 bytes (about 1.38 GiB) of disk fragment space can be reclaimed.
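To compare several collections at once, the same statistic can be collected in a loop. The following is a sketch for mongosh under stated assumptions: the helper names are hypothetical, the database handle is passed in explicitly, and collections that do not expose the WiredTiger statistic (views, for example) are skipped.

```javascript
// Hypothetical helper: convert a statistic to a plain number.
// Some shells may return 64-bit values as Long-like objects instead of numbers.
function toPlainNumber(value) {
  return typeof value === "number" ? value : Number(value.toString());
}

// Hypothetical helper: list reusable bytes per collection, largest first.
function reusableBytesByCollection(db) {
  const rows = [];
  for (const name of db.getCollectionNames()) {
    try {
      const stats = db.getCollection(name).stats();
      const raw = stats.wiredTiger["block-manager"]["file bytes available for reuse"];
      rows.push({ name, bytes: toPlainNumber(raw) });
    } catch (err) {
      // No WiredTiger block-manager statistic for this namespace; skip it.
    }
  }
  return rows.sort((a, b) => b.bytes - a.bytes);
}
```

In mongosh, after switching to the target database, this could be called as reusableBytesByCollection(db) to see which collections are worth compacting first.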

Reclaim Disk Fragments for Single Node or Replica Set Instances

Single Node

A single node instance has only one node, so you only need to execute the compact command for this instance.

Replica Set

A replica set instance has multiple nodes. Follow these steps:

  1. Execute the compact command on one of the SECONDARY nodes. After the compact command is completed, repeat this operation on each remaining SECONDARY node in sequence.

  2. Reassign the primary node. Use the rs.stepDown() method on the PRIMARY node to trigger the election of a new PRIMARY node. Once the former PRIMARY node has changed to SECONDARY status and a new PRIMARY node has been elected, execute the compact command on the former PRIMARY node.

    • If you need to force the execution of the compact command on the PRIMARY node, you will need to add the force parameter, for example:

      db.runCommand({compact:"collection_name",force:true})
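The order described above (SECONDARY nodes first, the PRIMARY node last) can be derived from the output of rs.status(). This is a minimal sketch; the function name compactOrder is hypothetical, and it relies only on the name and stateStr fields of each member.

```javascript
// Hypothetical helper: return member names in the order they should be compacted.
// SECONDARY members come first; the current PRIMARY comes last, because it
// should be stepped down before compact is run on it.
function compactOrder(replStatus) {
  const namesInState = (state) =>
    replStatus.members
      .filter((member) => member.stateStr === state)
      .map((member) => member.name);
  return [...namesInState("SECONDARY"), ...namesInState("PRIMARY")];
}
```

In mongosh this could be called as compactOrder(rs.status()). Members in any other state (for example ARBITER or RECOVERING) are left out of the list on purpose.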

compact Operation

  1. Connect to the database node using the Mongo Shell.

  2. Switch to the database that contains the collection.

    use database_name
    • database_name is the name of the database where the collection is located.
  3. Run the compact command on the collection to reclaim disk fragments.

    db.runCommand({compact:"collection_name"})
    • collection_name is the name of the collection.

    If the command succeeds, it returns the following result:

    { "ok" : 1 }
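Because compact works on one collection at a time, compacting several collections means repeating step 3. The following sketch wraps that repetition; compactCollections is a hypothetical helper, the database handle is passed in explicitly, and collections are processed one after another rather than in parallel.

```javascript
// Hypothetical helper: run compact on each named collection in sequence
// and record whether each run reported success.
function compactCollections(db, collectionNames) {
  const results = [];
  for (const name of collectionNames) {
    try {
      const res = db.runCommand({ compact: name });
      results.push({ name, ok: Number(res.ok) === 1, detail: res });
    } catch (err) {
      // Some shells throw on command failure instead of returning { ok: 0 }.
      results.push({ name, ok: false, detail: String(err) });
    }
  }
  return results;
}
```

In mongosh this could be called as compactCollections(db, ["collection_a", "collection_b"]), where both collection names are placeholders. The precautions listed earlier still apply to every collection in the list.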

Rebuild Data Files

Precautions

  • Before proceeding, please make sure to have a complete backup of the database.

  • The time required to rebuild data files depends on the data volume, system load, disk performance, and other factors.

Single Node

  1. Stop the application service.

  2. Stop the MongoDB database.

  3. Use the --repair parameter of mongod to rebuild data files and reclaim disk space.

    Example:

    mongod --repair --dbpath /data/mongodb/
    • /data/mongodb/ is the MongoDB data storage directory.

    • Do not interrupt the operation during execution, as it may affect data integrity and prevent the database from starting.

  4. Start the MongoDB database.

Replica Set

Reclaim disk space by deleting data on SECONDARY nodes and leveraging MongoDB replica set's internal resynchronization mechanism to rebuild data files.

  1. Stop the mongod process on any SECONDARY node, then execute the following command on that node to delete its data (excluding the keyfile):

    find /data/mongodb/ -mindepth 1 ! -name 'keyfile' -exec rm -rf {} +
    • This command excludes the keyfile file in the /data/mongodb/ directory and deletes all other files and subdirectories.
  2. Start the MongoDB node again.

  3. Use the rs.status() command to check the node status. During the synchronization process, the node status will display as STARTUP2, and once synchronization is complete, it will change to SECONDARY.

  4. After the previous node has completed synchronization and the node status changes to SECONDARY, repeat the same operation on the remaining SECONDARY nodes in sequence.

  5. Finally, on the PRIMARY node, use the rs.stepDown() method to trigger an election. Once the former PRIMARY node has changed to SECONDARY status and a new PRIMARY node has been elected, perform the same operation on the former PRIMARY node.
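The status check in step 3 can be expressed as a small helper that inspects the output of rs.status(). This is a minimal sketch: the function names are hypothetical, and the member name must match the name field that rs.status() reports for the node.

```javascript
// Hypothetical helper: look up a member's state string, or null if not found.
function memberState(replStatus, memberName) {
  const member = replStatus.members.find((m) => m.name === memberName);
  return member ? member.stateStr : null;
}

// Hypothetical helper: resynchronization is treated as complete once the
// member has left STARTUP2 and reports SECONDARY.
function isResyncComplete(replStatus, memberName) {
  return memberState(replStatus, memberName) === "SECONDARY";
}
```

In mongosh this could be called as isResyncComplete(rs.status(), "host2:27017"), where the host name is a placeholder. Move on to the next node only after the call returns true.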