The symptoms typically associated with this problem are:
ERROR: Insufficient disk space on this machine
HTTP 503: "The appliance has been shut down"
Disk usage warning on machine Discovery_Appliance (localhost)
Appliance Specification Baseline - MAJOR: This appliance has insufficient resources
To identify the affected partition and folders, SSH to the appliance and run the following commands:
df -h                                                # overall usage of each mounted filesystem
du -h $TIDEWAY 2>/dev/null | sort -rh | head -n 5    # five largest directories under the Discovery home
The most frequently saturated partitions are /usr/tideway and the datastore partition. The output of 'df -h' may not show any partition at 100% capacity, but the partition reporting the highest usage percentage is the likely culprit.
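For example, the following one-liner (a minimal sketch using standard Linux tools only, nothing Discovery-specific) lists the mounted filesystems with the fullest one first:
# Print usage percentage and mount point, sorted with the fullest partition first
df -P | awk 'NR>1 {print $5, $6}' | sort -rn | head -n 5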
Until the permanent solution is found, here are some temporary workarounds:
1- Add more disk space and configure Discovery to use it. For more details, see this article: Discovery: How to modify the disk configuration without using the GUI (tw_disk_utils)? - INCLUDES VIDEO
2- Configure Discovery to consume less disk (without fixing the underlying issue, if any). For example, stop scans and/or compact the datastore: Discovery: How does the compaction work? INCLUDES VIDEO
3- Delete some files using this procedure; a sketch for reviewing the largest files first follows below.
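For workaround 3, the following sketch (the 100M threshold is illustrative, not a Discovery recommendation) lists the largest files under the Discovery home so that candidates can be reviewed before anything is deleted:
# List the ten largest files under /usr/tideway, biggest first (review before deleting anything)
find /usr/tideway -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null | sort -k5 -rh | head -n 10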
The KA Discovery: Impact diagram showing the causes of disk full and performance issues can help identify the root cause and, from there, the permanent solution. It shows the relationships between the different events/issues that can lead to a full disk. The potential root causes are represented by the black boxes.
Possible root causes of saturation of the datastore partition:
- The storage is undersized. In the case of a standalone appliance, it is not compliant with the documented requirements.
- The disk that contains the datastore is virtual and was extended, but the partition was not. For more details, see this article: Discovery: the size of an added Disk is bigger than its partition
- Discovery accumulated an unreasonable number of nodes to delete. See this article: Discovery: the volumes of DA (DiscoveryAccess) eligible/pending for removal do not converge towards 0
- Discovery accumulated an unreasonable amount of fragmentation. See this article: Discovery: How frequent should the datastore be compacted?
- Multi-generational datastore is enabled and the 3 settings below are not suitable:
- Defect DRUD1-46885 - Released in 24.2. See this article: Discovery: Disk full, 502 errors, "Connection to service lost" when creating a credential and outposts can't be deleted
Possible root causes of saturation of the transaction log partition (if moved to a new disk):
- The "Time before history entries are purged" setting was decreased. See this article: Discovery: The partition of the datastore transaction logs is unreasonably big
- In theory, a performance issue could slow down the datastore, lead to an accumulation of pending transactions in the datastore log partition, and then saturate that partition. This cause-effect link has never been confirmed so far; it is a suspected root cause.
Possible root causes of a /usr/tideway saturation:
- If the datastore (or the datastore transaction logs) is stored in /usr, see the section "Possible root causes of saturation of the datastore partition" above.
- Old log files: navigate to Administration > Logs > Log Files and click the "Delete Old Logs" button to remove them.
- Old appliance support files: navigate to Administration > Appliance Support > All > Actions and select "Delete" to clean them up.
- Large reasoning persist (*.pq) files; a sketch for locating them follows this list.
- Record mode is enabled: make sure that it is not (see "Recording Mode" in this documentation page).
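A minimal sketch for locating large reasoning persist files (the 100M threshold is an arbitrary starting point):
# Find reasoning persist (*.pq) files larger than 100 MB under the Discovery home
find /usr/tideway -name '*.pq' -size +100M -exec ls -lh {} + 2>/dev/null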
Possible root causes of a /var/log saturation:
- A customization can generate an unreasonable number of lines in /var/log/messages. For example, a customization in /etc/rsyslog.conf can redirect Discovery logs into /var/log/messages.
- The heartbeats of a load balancer generate logs in /var/log at an unreasonable rate. A sketch for identifying the noisiest writer follows below.
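To see which program writes the most lines to /var/log/messages, the following sketch counts lines per writer (it assumes the traditional syslog line format, where the fifth field is the program name):
# Count log lines per program name, most talkative first
awk '{print $5}' /var/log/messages | sort | uniq -c | sort -rn | head -n 10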
Possible root causes of a /var saturation:
- Discovery was deployed in Azure and the AzureMonitorLinuxAgent (unsupported) was installed in /var. This agent can be confused with the one that is required by BMC in this documentation page (see "To install the Azure Virtual Machine Agent"). A sketch for finding the largest directories under /var follows below.
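A minimal sketch for finding what occupies /var (the -x flag keeps du on the /var filesystem and prevents it from descending into other mounts):
# Show the five largest directories on the /var filesystem
du -xh /var 2>/dev/null | sort -rh | head -n 5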
Possible root causes of a /boot saturation:
- The latest OS upgrade created new rescue files which saturated /boot. See this article: Discovery: What files can be deleted from /boot? A sketch for listing these files follows below.
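A minimal sketch for listing the rescue files (rescue kernels and initramfs images typically contain "rescue" in their names; confirm with the article above before deleting anything):
# List rescue kernel and initramfs files in /boot with their sizes
ls -lh /boot/*rescue* 2>/dev/null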
If more assistance is required, go through the procedure Discovery: How to report a health or performance issue? INCLUDES VIDEO, then open a case with BMC Support and attach the resulting file to the case.
Please also see the video "How to resolve disk space problems in BMC Discovery".