The symptoms typically associated with this problem are:
ERROR: Insufficient disk space on this machine
HTTP 503: "The appliance has been shut down"
Disk usage warning on machine Discovery_Appliance (localhost)
Appliance Specification Baseline - MAJOR: This appliance has insufficient resources
To identify the affected partition and folders, SSH to the appliance and run the following commands:
df -h                                                # overall usage of each mounted filesystem
du -h $TIDEWAY 2>/dev/null | sort -rh | head -n 5    # five largest directories under the Discovery home
The most frequently saturated partitions are /usr/tideway and the datastore partition. The output of 'df -h' may not show any partition at 100% capacity, but the partition reporting the highest usage percentage is the likely culprit.
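For example, the following one-liner (a minimal sketch using standard Linux tools only, nothing Discovery-specific) lists the mounted filesystems with the fullest one first:
# Print usage percentage and mount point, sorted with the fullest partition first
df -P | awk 'NR>1 {print $5, $6}' | sort -rn | head -n 5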
Until the permanent solution is found, here are some temporary workarounds:
1- Add more disk space and configure Discovery to use it. For more details, see this article: Discovery: How to modify the disk configuration without using the GUI (tw_disk_utils)? - INCLUDES VIDEO
2- Configure Discovery to consume less disk (without fixing the underlying issue, if any). For example, stop scans and/or compact the datastore: Discovery: How does the compaction work? INCLUDES VIDEO
3- Delete some files using this procedure; a sketch for reviewing the largest files first follows below.
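For workaround 3, the following sketch (the 100M threshold is illustrative, not a Discovery recommendation) lists the largest files under the Discovery home so that candidates can be reviewed before anything is deleted:
# List the ten largest files under /usr/tideway, biggest first (review before deleting anything)
find /usr/tideway -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null | sort -k5 -rh | head -n 10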
The KA Discovery: Impact diagram showing the causes of disk full and performance issues can help identify the root cause and, from there, the permanent solution. It shows the relationships between the different events/issues that can lead to a full disk. The potential root causes are represented by the black boxes.
Possible root causes of saturation of the datastore partition:
- The storage is undersized. In the case of a standalone appliance, it is not compliant with the documented requirements.
- The disk that contains the datastore is virtual and was extended, but the partition was not. For more details, see this article: Discovery: the size of an added Disk is bigger than its partition
- Discovery accumulated an unreasonable number of nodes to delete. See this article: Discovery: the volumes of DA (DiscoveryAccess) eligible/pending for removal do not converge towards 0
- Discovery accumulated an unreasonable amount of fragmentation. See this article: Discovery: How frequent should the datastore be compacted?
- Multi-generational datastore is enabled and the 3 settings below are not suitable:
- Defect DRUD1-46885 - Released in 24.2. See this article: Discovery: Disk full, 502 errors, "Connection to service lost" when creating a credential and outposts can't be deleted
Possible root causes of saturation of the transaction log partition (if moved to a new disk):
- The "Time before history entries are purged" setting was decreased. See this article: Discovery: The partition of the datastore transaction logs is unreasonably big
- In theory, a performance issue could slow down the datastore, lead to an accumulation of pending transactions in the datastore log partition, and then saturate that partition. This cause-effect link has never been confirmed so far; it is a suspected root cause.
Possible root causes of a /usr/tideway saturation:
- If the datastore (or the datastore transaction logs) is stored in /usr, see the section "Possible root causes of saturation of the datastore partition" above.
- Old log files: navigate to Administration > Logs > Log Files and click the "Delete Old Logs" button to remove them.
- Old appliance support files: navigate to Administration > Appliance Support > All > Actions and select "Delete" to clean them up.
- Large reasoning persist (*.pq) files; a sketch for locating them follows this list.
- Record mode is enabled: make sure that it is not (see "Recording Mode" in this documentation page).
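A minimal sketch for locating large reasoning persist files (the 100M threshold is an arbitrary starting point):
# Find reasoning persist (*.pq) files larger than 100 MB under the Discovery home
find /usr/tideway -name '*.pq' -size +100M -exec ls -lh {} + 2>/dev/null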
Possible root causes of a /var/log saturation:
- A customization can generate an unreasonable number of lines in /var/log/messages. For example, a customization in /etc/rsyslog.conf can redirect Discovery logs into /var/log/messages.
- The heartbeats of a load balancer generate logs in /var/log at an unreasonable rate. A sketch for identifying the noisiest writer follows below.
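To see which program writes the most lines to /var/log/messages, the following sketch counts lines per writer (it assumes the traditional syslog line format, where the fifth field is the program name):
# Count log lines per program name, most talkative first
awk '{print $5}' /var/log/messages | sort | uniq -c | sort -rn | head -n 10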
Possible root causes of a /var saturation:
- Discovery was deployed in Azure and the AzureMonitorLinuxAgent (unsupported) was installed in /var. This agent can be confused with the one that is required by BMC in this documentation page (see "To install the Azure Virtual Machine Agent"). A sketch for finding the largest directories under /var follows below.
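A minimal sketch for finding what occupies /var (the -x flag keeps du on the /var filesystem and prevents it from descending into other mounts):
# Show the five largest directories on the /var filesystem
du -xh /var 2>/dev/null | sort -rh | head -n 5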
Possible root causes of a /boot saturation:
- The latest OS upgrade created new rescue files which saturated /boot. See this article: Discovery: What files can be deleted from /boot? A sketch for listing these files follows below.
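A minimal sketch for listing the rescue files (rescue kernels and initramfs images typically contain "rescue" in their names; confirm with the article above before deleting anything):
# List rescue kernel and initramfs files in /boot with their sizes
ls -lh /boot/*rescue* 2>/dev/null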
If more assistance is required, go through the procedure Discovery: How to report a health or performance issue? INCLUDES VIDEO, then open a case with BMC Support and attach the resulting file to the case.
Please also see the video "How to resolve disk space problems in BMC Discovery".