Discovery: 100% of the Outpost's CPU is consumed by ssh_worker processes

Knowledge Article

Article Number

000389870

Old Article Number

Article Type

Solutions to a Product Problem

Title

Discovery: 100% of the Outpost's CPU is consumed by ssh_worker processes

Summary

Product

BMC Discovery

Component

BMC Discovery

Applies to

12x

Problem

100% of the Outpost's CPU is consumed by ssh_worker processes

Cause

Solution

Root cause 1: Defect fixed in 24.2.

The symptom may appear only when "Maximum concurrent discovery requests per engine" is set higher than the default value (30). With the default value, the processes may consume only memory, not CPU.

To confirm: If the symptom is still visible after the Outpost service has been stopped, this root cause is ruled out.

Workaround: Restart the Outpost services or reboot.

Root cause 2: The scanned SSH daemon does not respond fast enough. When this happens, a default TCP timeout closes the connection after 30 minutes.

To confirm:

If the symptom is still visible after the Outpost service has been stopped, the root cause is probable.
If the Discovery_worker log file contains lines like those below, the root cause is possible:

<timestamp>: discovery.session: DEBUG: 111.111.111.111: getLoginSession: Trying user MyUser
<timestamp>: discovery.session.ssh.using_worker: DEBUG: default worker: Established 6001 to 111.111.111.111
<timestamp>: discovery.session.base: DEBUG: ssh: Default scope: 111.111.111.111: Connected session 6001
<timestamp 30 minutes later>: discovery.session.base: DEBUG: ssh: Default scope: 111.111.111.111: establishConnection: CORBA.TIMEOUT(omniORB.TIMEOUT_CallTimedOutOnClient, CORBA.COMPLETED_MAYBE)

If the issue correlates with "Unable to get the deviceInfo: TIMEOUT" errors in DiscoveryAccess nodes, the root cause is probable.

If it is not possible to open an SSH session and run commands on the IP addresses named in the error messages above, the root cause is confirmed.
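
The log check above can be partly automated. The following is a minimal sketch, not a BMC-provided tool: it assumes the log lines have the same shape as the sample quoted in this article, and the names timed_out_ips and scan_log are hypothetical. It counts establishConnection CORBA.TIMEOUT lines per IP address so that the affected endpoints can be listed.

```python
import re
from collections import Counter

# Matches the timeout line quoted in this article. The IPv4 address is taken
# from the field immediately before "establishConnection" (assumed layout).
TIMEOUT_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}): establishConnection: CORBA\.TIMEOUT"
)


def timed_out_ips(lines):
    """Count establishConnection CORBA.TIMEOUT lines per IP address."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_log(path):
    """Run timed_out_ips over a log file; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return timed_out_ips(handle)
```

Calling scan_log with the path of a copy of the log file returns a mapping from IP address to the number of timeout lines. The sketch does not check the 30-minute gap between the "Connected session" line and the timeout, because the timestamp format is not shown here.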

Workaround: Exclude the IP addresses affected by this issue from the scan.

Solution: Engage your IT team to investigate why SSH cannot be used on the affected IP addresses.
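
Before handing the affected addresses to the IT team, a quick first test is whether the SSH daemon sends its identification string within a short time. The sketch below is an illustration only: the function name ssh_banner, the port and the timeout are assumptions, and it uses only the Python standard library. It checks TCP connectivity and the SSH banner. It does not authenticate or run commands, so a successful result does not by itself rule out this root cause.

```python
import socket


def ssh_banner(host, port=22, timeout=10.0):
    """Return the SSH identification string, or None if none arrives in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # Simplification: assumes the banner arrives in the first read.
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    banner = data.decode("ascii", errors="replace").strip()
    return banner if banner.startswith("SSH-") else None
```

A return value of None for an address that also appears in the timeout log lines suggests the daemon is slow or unreachable and is worth reporting to the IT team.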

Root cause 3: Unknown cause.

To confirm: If this issue correlates with a significant decrease in the scan rate (-50%), this cause is possible.

If the symptoms disappear after the workaround has been applied, the cause is possible.

Workaround: Reboot all the Outposts.
