Reclaim Disk Space
After data is deleted from a MongoDB instance, the storage space it occupied is marked as free. New data written to the same collection is usually stored directly in this free space, but other collections cannot reuse it. This unused free storage space is referred to as disk fragments. The more disk fragments there are, the lower the disk utilization.
There are two methods for reclaiming disk space: using the compact command and rebuilding data files.
- compact command: This is a collection-level operation, so each collection must be compacted individually.
- Rebuild data files: This is an instance-level operation performed on the entire database, and it is generally more comprehensive.
compact
Precautions
- Make sure you have a complete backup of the database first.
- For versions prior to MongoDB 4.4, executing the compact command may lock the database that contains the collection, blocking read and write operations on that database. It is recommended to perform this operation during off-peak business hours or after upgrading MongoDB. For more details on the blocking behavior, refer to the MongoDB official documentation.
- The time required to reclaim disk fragments with the compact command depends on the data volume of the collection, system load, disk performance, and similar factors. CPU and memory usage also increase somewhat during execution.
- For versions below MongoDB 4.4.9, a node that is executing the compact command is forced into the RECOVERING state. If this state persists for an extended period, the node may no longer be able to synchronize with the PRIMARY node's data.
  - For versions between MongoDB 4.4.9 and 4.4.17, a node executing the compact command remains in the SECONDARY state, but while the command is running it still cannot synchronize with the PRIMARY node's data.
  - For versions above MongoDB 4.4.17, SECONDARY nodes continue to replicate data from the PRIMARY node while executing the compact command. (It is recommended to execute the compact command on versions above MongoDB 4.4.17.)
- The following conditions may cause the compact command to have no effect. For more details, refer to the open-source code.
  - The size of the physical collection is less than 1 MB.
  - In the first 80% of the storage space in a file, the amount of free storage space is less than 20%; in the first 90% of the storage space in a file, the amount of free storage space is less than 10%.
- When the compact command is executed, the released storage space may be less than the free storage space. If this occurs, you can run the compact command again to release more disk fragments, but running it frequently is not recommended.
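The version-dependent behavior listed in the precautions above can be checked before running compact on a SECONDARY node. The following is a minimal sketch, not an official utility: `parseVersion`, `compareVersions`, and `compactSecondaryBehavior` are hypothetical helper names. It assumes that 4.4.17 itself already belongs to the replicating group, and it reports releases from 4.5 onward as "unknown" because the boundaries above are stated only for the 4.4 series.

```javascript
// Parse "major.minor.patch"; any suffix such as "-rc0" is ignored.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error("Unrecognized version string: " + version);
  }
  return match.slice(1).map(Number);
}

// Compare two [major, minor, patch] arrays: negative, zero, or positive.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return 0;
}

// Hypothetical helper: expected state of a SECONDARY while compact runs.
// Assumption: 4.4.17 is treated as the first release that keeps replicating.
// Assumption: 4.5 and later return "unknown"; check the release notes instead.
function compactSecondaryBehavior(version) {
  const v = parseVersion(version);
  if (compareVersions(v, [4, 4, 9]) < 0) {
    return "RECOVERING";
  }
  if (compareVersions(v, [4, 4, 17]) < 0) {
    return "SECONDARY (not replicating)";
  }
  if (compareVersions(v, [4, 5, 0]) < 0) {
    return "SECONDARY (replicating)";
  }
  return "unknown";
}
```

In the shell, the helper could be called as `compactSecondaryBehavior(db.version())` after pasting the functions into the session.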
Estimated Reclaimed Disk Fragment Space
- Switch to the database where the collection is located.
  use database_name
  database_name is the name of the database where the collection is located.
- View the disk fragment space that can be reclaimed for the collection.
  db.collection_name.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
  collection_name is the name of the collection.
The returned result is as follows:
1485426688
This result indicates that the estimated disk fragment space to be reclaimed is 1485426688 bytes.
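A raw byte count is hard to read at a glance, so the lookup above can be wrapped in a small helper. This is a sketch under assumptions: `reusableBytes` and `toGiB` are hypothetical names, and `db` is expected to be the shell's database handle (or any object exposing `getCollection(name).stats()` with the same shape).

```javascript
// Read the "file bytes available for reuse" statistic for one collection.
// `db` is the shell's current database handle, passed in explicitly.
function reusableBytes(db, collectionName) {
  const stats = db.getCollection(collectionName).stats();
  return stats.wiredTiger["block-manager"]["file bytes available for reuse"];
}

// Convert a byte count to GiB, rounded to two decimal places (as a string).
function toGiB(bytes) {
  return (bytes / Math.pow(1024, 3)).toFixed(2);
}
```

For the sample result above, `toGiB(1485426688)` returns "1.38", so roughly 1.38 GiB is estimated to be reclaimable.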
Reclaim Disk Fragments for Single Node or Replica Set Instances
Single Node
A single node instance has only one node, so you only need to execute the compact command for this instance.
Replica Set
A replica set instance has multiple nodes. Follow these steps:
- Execute the compact command on one of the SECONDARY nodes. After the command completes, repeat this operation on each remaining SECONDARY node in sequence.
- Reassign the primary node. Use the rs.stepDown() method on the PRIMARY node to trigger the election of a new PRIMARY node. Once the former PRIMARY node has changed to SECONDARY status and a new PRIMARY node has been elected, execute the compact command on the former PRIMARY node.
  - If you need to force the execution of the compact command on the PRIMARY node, add the force parameter, for example: db.runCommand({compact:"collection_name",force:true})
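The node order described in the steps above (every SECONDARY first, the current PRIMARY last) can be derived from the output of rs.status(). The following is a minimal sketch: `compactionOrder` is a hypothetical helper, and it relies only on the `name` and `stateStr` fields of each entry in `members`.

```javascript
// Return member names in the order they should be compacted:
// all SECONDARY members first, then the current PRIMARY.
// Members in any other state (for example ARBITER) are left out.
function compactionOrder(status) {
  const namesInState = function (state) {
    return status.members
      .filter(function (member) { return member.stateStr === state; })
      .map(function (member) { return member.name; });
  };
  return namesInState("SECONDARY").concat(namesInState("PRIMARY"));
}
```

In the shell this could be called as `compactionOrder(rs.status())`. The PRIMARY entry at the end is a reminder that the node still needs rs.stepDown() before compact is run on it.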
compact Operation
- Connect to the database node using the Mongo Shell.
- Switch to the database where the collection is located.
  use database_name
  database_name is the name of the database where the collection is located.
- Run the compact command on the target collection to reclaim disk fragments.
  db.runCommand({compact:"collection_name"})
  collection_name is the name of the collection.
If successful, the return result is as follows:
{ "ok" : 1 }
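When several collections need to be compacted, the command above can be wrapped so that a failed run is not silently overlooked. This is a sketch under assumptions: `compactCollection` is a hypothetical helper, `db` is the shell's database handle passed in explicitly, and success is detected only through the `ok` field shown above.

```javascript
// Run compact on one collection and fail loudly if the server does not
// report { ok: 1 }. Set `force` to true only when the command must run
// on a PRIMARY node.
function compactCollection(db, collectionName, force) {
  const command = { compact: collectionName };
  if (force) {
    command.force = true;
  }
  const result = db.runCommand(command);
  if (result.ok !== 1) {
    throw new Error("compact failed: " + JSON.stringify(result));
  }
  return result;
}
```

A call such as `compactCollection(db, "collection_name")` sends the same command as the manual step, and `compactCollection(db, "collection_name", true)` adds the force parameter.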
Rebuild Data Files
Precautions
- Before proceeding, make sure you have a complete backup of the database.
- The time required to rebuild data files depends on the data volume of the collection, system load, disk performance, and similar factors.
Single Node
- Stop the application service.
- Stop the MongoDB database.
- Use the --repair parameter of mongod to rebuild data files and reclaim disk space. Example:
  mongod --repair --dbpath /data/mongodb/
  - /data/mongodb/ is the MongoDB data storage directory.
  - Do not interrupt the operation during execution, as doing so may affect data integrity and prevent the database from starting.
- Start the MongoDB database.
Replica Set
Reclaim disk space by deleting the data on SECONDARY nodes and letting the MongoDB replica set's internal resynchronization mechanism rebuild the data files.
- Execute the following command on any SECONDARY node to delete the data on the current node (excluding the keyfile):
  find /data/mongodb/ -mindepth 1 ! -name 'keyfile' -exec rm -rf {} +
  - This command excludes the keyfile file in the /data/mongodb/ directory and deletes all other files and subdirectories.
- Restart the current MongoDB node.
- Use the rs.status() command to check the node status. During synchronization, the node status is displayed as STARTUP2; once synchronization is complete, it changes to SECONDARY.
- After the previous node has completed synchronization and its status has changed to SECONDARY, repeat the same operation on the remaining SECONDARY nodes in sequence.
- Finally, on the PRIMARY node, use the rs.stepDown() method to trigger a re-election. Once the PRIMARY node has changed to SECONDARY status and a new PRIMARY node has been elected, perform the same operation on that node.
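Before moving on to the next node, the status check described in the steps above can be expressed as a small function over the rs.status() output. This is a sketch, not part of MongoDB: `resyncState` is a hypothetical helper, and the member name passed in is whatever appears in the `name` field for the node being resynchronized.

```javascript
// Classify the resynchronization progress of one replica set member.
// Returns "syncing" while the member is in STARTUP2, "done" once it is
// SECONDARY, and the raw state string prefixed with "other: " otherwise.
function resyncState(status, memberName) {
  const member = status.members.find(function (m) {
    return m.name === memberName;
  });
  if (!member) {
    throw new Error("Member not found in rs.status(): " + memberName);
  }
  if (member.stateStr === "SECONDARY") {
    return "done";
  }
  if (member.stateStr === "STARTUP2") {
    return "syncing";
  }
  return "other: " + member.stateStr;
}
```

Calling `resyncState(rs.status(), "host:port")` repeatedly until it returns "done" corresponds to waiting for the node to reach SECONDARY status before the next node is processed.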