In Discovery SaaS or on-prem, scans are stuck
Cause 1: The status of the stuck scan is "on hold"
See this documentation page: The discovery run has not finished within its scheduled scan window. This scan is considered on hold. The run continues or restarts at the next scan window.
On prem: The command "tw_reasoningstatus --waiting" lists the "on hold" requests.
Solution: Extend the duration of the scan window.
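On prem, a minimal command-line check might look like the sketch below; it assumes shell access to the appliance as the tideway service account and only reuses the tw_reasoningstatus command mentioned above.

    # Sketch (assumption: run on the appliance as the tideway user)
    # List the waiting / "on hold" requests, then count them for a quick overview
    tw_reasoningstatus --waiting
    tw_reasoningstatus --waiting | wc -l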
See this documentation page: Discovering an endpoint requires additional discovery on another endpoint that is not currently in an open scan window.
For example: During scan A, a pattern decides to run a command on a remote ip. That ip is scanned by another scheduled scan B that starts later.
Consequence: Scan A will be blocked by that ip and will be unblocked when scan B scans it.
Solution: Click on "start all scans".
To confirm: Sometimes, the stuck scans are correlated with an outpost update. In this case, if some outposts stay "out of date" several days after the upgrade, this root cause is possible.
If the stuck scans are correlated with "many" DAs failing with ERROR/Timeout (peaks at 10-20% of the DAs), this root cause is very probable.
SEARCH FLAGS(no_segment) Host, NetworkDevice
Workaround:
1- Restart all outposts: in the UI of each outpost, go to Manage > Configuration
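As a sketch of the confirmation step above, the search can also be run from the appliance command line; tw_query and its exact invocation here are assumptions to adapt to your environment (it normally prompts for credentials).

    # Sketch: run the confirmation search from the appliance CLI (assumption: tw_query is available)
    tw_query "SEARCH FLAGS(no_segment) Host, NetworkDevice"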
To confirm: If there were no pattern changes in the last 24h, this root cause is not probable
To confirm: If the same symptoms as the root cause above are observed (a big persistence file) AND the size continues to grow until the disk is full, this root cause is probable.
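A simple way to confirm that the file keeps growing is to sample its size over time, as in the sketch below; the path is a placeholder, not the actual location of the persistence file.

    # Sketch: sample the size of the suspected persistence file every 5 minutes
    # PERSISTENCE_FILE is a placeholder - substitute the real file observed above
    PERSISTENCE_FILE=/path/to/persistence/file
    while true; do
        date
        du -sh "$PERSISTENCE_FILE"
        df -h "$(dirname "$PERSISTENCE_FILE")"   # also watch the free disk space
        sleep 300
    done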
To confirm: Monitor the scans (ideally for at least 30 minutes; see the sketch below). If some scans made some progress, this root cause is possible.
On prem: Measure the performance with this troubleshooting guide (see "How to measure the scan performance?"). If the scan rate is reasonably high, this root cause is possible.
Potential workaround: Increase the "Maximum concurrent discovery requests per engine" setting. It may not increase the performance in general (especially if the hardware is already overloaded), but it could help Discovery process more scans in parallel.
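One rough on-prem way to see whether the request queue is moving is to snapshot the waiting requests twice, about 30 minutes apart, and compare the two lists; this sketch only reuses the tw_reasoningstatus command mentioned in Cause 1, and the UI scan pages remain the primary way to monitor progress.

    # Sketch: compare the waiting requests now and 30 minutes later
    # If the two snapshots differ, the scans are making (slow) progress
    tw_reasoningstatus --waiting > /tmp/waiting_before.txt
    sleep 1800
    tw_reasoningstatus --waiting > /tmp/waiting_after.txt
    diff /tmp/waiting_before.txt /tmp/waiting_after.txt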
To confirm: Check the status of the outpost connection
Cause 9: A large number of sudo attempts failed.
To confirm: Review the session logs of an ip scan. If it contains an unreasonable number of sudo password prompts, this root cause is probable. For example: [sudo]
If the issue can no longer be reproduced after having configured Discovery not to use sudo, the root cause is confirmed.
Solution:
- Wait for this RFE to be accepted/planned/released
Workaround:
- Configure the scanned device to not ask for a password when using sudo (a sketch is provided after Cause 10 below)
Cause 10: Some ip scans are very long. This matches the article below:
Discovery: The scan duration for a single ip is unreasonably long (> 30 minutes)
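As a sketch of the Cause 9 check and workaround: the grep counts sudo password prompts in a saved session log (the log path is a placeholder), and the sudoers line is the standard way to let an account use sudo without a password prompt; the account name and command list are assumptions to adapt and restrict as tightly as possible.

    # Sketch: count sudo password prompts in a saved session log
    # SESSION_LOG is a placeholder - use the session log of the affected ip scan
    SESSION_LOG=/path/to/session.log
    grep -c '\[sudo\]' "$SESSION_LOG"

    # Workaround sketch on the scanned device: edit the sudoers file with "visudo"
    # "discouser" is a placeholder account name
    # discouser ALL=(ALL) NOPASSWD: ALL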
Cause 11: Discovery sends requests faster than AWS accepts. See the article below:
Discovery: Sometimes, AWS scans fail with TooManyRequestsException in SessionResult
Cause 12: Dynatrace data imports run too frequently.
To confirm:
search ImportRecord where type has substring "Dynatrace"
If the import runs more than once a day, this root cause is probable (a CLI variant of this check is sketched after the Solution below).
Note: Although not yet observed, this could possibly happen with other types of imports. Run "search ImportRecord" to see all types of imports.
Solution: Schedule the Dynatrace imports to run once a day only
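For the confirmation step of Cause 12, the search can also be run from the appliance command line and the listed import runs reviewed by hand; tw_query and its exact invocation here are assumptions to adapt to your environment.

    # Sketch: run the confirmation search from the appliance CLI (assumption: tw_query is available)
    tw_query "search ImportRecord where type has substring \"Dynatrace\""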
Cause 13: The appliance did not shut down while a file system ran out of disk space.
To confirm: If the logs contain "OSError: [Errno 28] No space left on device:", this cause is possible. If the error above was found in the reasoning logs, this cause is probable.
Solution: Restart the appliance after having resolved the file system saturation.
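A quick on-prem check might look like the sketch below; /usr/tideway/log is an assumed default log location (adjust if your installation differs), and the grep string is the error quoted above.

    # Sketch: check file system usage, then look for the error in the appliance logs
    # /usr/tideway/log is an assumed default log directory - adjust as needed
    df -h
    grep -R "No space left on device" /usr/tideway/log | tail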