TrueSight Capacity Optimization (TSCO) Gateway Server Automation tools and techniques
For the TrueSight Capacity Optimization (TSCO) Gateway Server, the General Manager Lite utility is designed to monitor the collection, transfer, processing, and population status of nightly Manager runs and to provide a time-series view of the stability of the environment over the last 30 days (by default). If TSCO has been implemented, it can also track the progression of that data into the TSCO data warehouse. Note: Below, 'BPA console' refers to the TSCO Gateway Server console.
NOTE: For the email notification to work, the underlying system must have 'send mail' deployed and configured appropriately.
IMPLEMENTATION VIA THE BCO_BPAStatusAndRecoveryManager.pl SCRIPT
>> Enter Gateway Server Console Name[s] (Multiple Consoles comma separated( Example console1,console2)) (Current Value=localhost)
General Manager Lite (GMLite) must run on a Linux system where the BPA (Gateway Server) console is installed, but it can communicate with all BPA Unix, Linux, and Windows consoles in your environment in order to build a centralized view of your Gateway Server data processing. For this prompt, specify the list of BPA consoles that GMLite should contact nightly to obtain BPA data processing information.
>> Enter Daily Script Execution Time (HH:MM) (Current Value=20:00)
This prompt sets the time at which the GMLite scripts are executed each day. By default, the script executes at 8 PM. Choose a time (a) after your last Manager run has finished processing for the day, (b) after data import into TSCO is complete (if applicable), and (c) when recovery populates of data into TSCO can be attempted (if applicable).
>> Enter Gateway Server GeneralManager Port (Current Value=10129)
The default port is 10129, and it is not commonly customized.
>> Enter Gateway Server Output Directory Where the Data will be put (Current Value=$BEST1_HOME/local/manager/status/GeneralManagerLite)
Specify where the GMLite output should be written on this console if you don't want to use the default location.
>> Enter gnuplot install directory (Do not specify anything if you wish to use one in your path) (Current Value=undefined)
General Manager Lite can create a web page that includes charts reporting the number of computers configured, collected, transferred, processed, populated into the BPA database, and imported into TSCO. This functionality requires that the 'gnuplot' utility be installed on your TSCO Gateway Server console. If gnuplot is installed and you would like to enable this functionality, specify the gnuplot location here. On Linux, the default installation path for gnuplot is /usr/bin.
>> Enter are you configuring BCO Gateway Server ETL Status reporting [Y|N] (You will need ORACLE_HOME, DSN, user name, and password) (Current Value=N)
If you are importing BPA data into TSCO, General Manager Lite can be configured to monitor the success rate of the import of the VIS files into TSCO. Answer 'Y' if GMLite should be configured to monitor TSCO population success.
>> Enter BCO Oracle DSN (must be configured via tnsnames.ora (see http://www.orafaq.com/wiki/Tnsnames.ora)) (Current Value=undefined)
When integrated with TSCO, supply the TNS name of your TSCO database (as defined in the $ORACLE_HOME/network/admin/tnsnames.ora file).
>> Enter BCO ORACLE_HOME (Current Value=undefined)
When integrated with TSCO, supply the path to your Oracle Client installation on the TSCO Gateway Server console. If you are using Unix Populate, this can be the same path specified in the $BEST1_HOME/local/setup/MpopulateOracleHome.loc file.
>> Enter BCO Oracle Password (Displayed encrypted) (Current Value=undefined)
When integrated with TSCO, supply the password for the BCO_OWN database user (schema owner).
>> Enter BCO Oracle User Name (Current Value=undefined)
When integrated with TSCO, supply the TSCO database account that owns the TSCO installation (by default 'BCO_OWN'). In older TSCO installations this may be CPIT_OWN. You can validate this in the TSCO web interface under Administration -> System -> Configuration -> General -> Database Username (Schema Owner).
>> Enter Number of Days to recover starting from today (Current Value=2)
NOTE: This parameter is associated with a deprecated configuration of the TSCO Gateway Server VIS parser ETL that is generally not used, so the default value can be selected.
>> Enter Number per day of top Gateway Server visualizer file errors to recover (Current Value=10)
NOTE: This parameter is associated with a deprecated configuration of the TSCO Gateway Server VIS parser ETL that is generally not used, so the default value can be selected.
>> Enter Gateway Server vis file directory (Current Value=undefined)
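The Oracle DSN prompt above expects a TNS name that is already defined in $ORACLE_HOME/network/admin/tnsnames.ora. Before running the configuration script, it may help to confirm that the alias is actually present. The following is a minimal Python sketch of such a check, not part of the product: the function names, the sample alias TSCODB, and the sample host are hypothetical, and the parsing is a simple heuristic that assumes alias definitions start in the first column of the file.

```python
import re
from pathlib import Path

# Hypothetical sample entry, used only to illustrate the file layout.
SAMPLE_TNSNAMES = """\
# sample entry
TSCODB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = tsco))
  )
"""


def tns_aliases(text):
    """Return the alias names found in tnsnames.ora text, upper-cased."""
    aliases = set()
    for line in text.splitlines():
        # Heuristic: alias definitions start in the first column, while
        # continuation lines are indented or start with a parenthesis,
        # and comment lines start with '#'.
        if not line or line[0] in " \t#()":
            continue
        match = re.match(r"([\w.,\s-]+?)\s*=", line)
        if match:
            # One entry may define several comma-separated aliases.
            for name in match.group(1).split(","):
                if name.strip():
                    aliases.add(name.strip().upper())
    return aliases


def dsn_is_defined(dsn, oracle_home):
    """Check whether dsn is an alias in oracle_home/network/admin/tnsnames.ora."""
    tns_file = Path(oracle_home) / "network" / "admin" / "tnsnames.ora"
    if not tns_file.is_file():
        return False
    return dsn.upper() in tns_aliases(tns_file.read_text())


if __name__ == "__main__":
    print(sorted(tns_aliases(SAMPLE_TNSNAMES)))
```

A call such as dsn_is_defined("TSCODB", oracle_home) returns False both when the file is missing and when the alias is absent, so a False result only indicates that the tnsnames.ora configuration should be checked by hand.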
Sample output of configuring the script
> $BEST1_HOME/bgs/scripts/BCO_BPAStatusAndRecoveryManager.pl
GENERAL MANAGER LITE WEB PAGE OUTPUT
NOTE: This section describes deprecated functionality of the General Manager Lite reporting that is not typically used. The typical use case of the GMLite reporting is the daily status e-mail that it sends. Below, 'BPA console' refers to the TSCO Gateway Server console. The reports are created by default in the $BEST1_HOME/local/manager/status/GeneralManagerLite/BCO_BPAWebReport directory and can be viewed via a local web browser running on your Linux console or shared out via a web server running on the BPA console. Below is a sample of the charts from three different BPA consoles. The breakdown of the available data is:
DETAILED INFORMATION ABOUT THE UNDERLYING GENERAL MANAGER LITE SCRIPTS
To obtain this functionality, the following are required:
(1) A Unix console, 7.5.10 or later; for 9.5 and later, this must be a Linux console.
(2) The Perl script GeneralManagerLite.pl and an updated GeneralManagerClient (this enhancement is recorded as QM001745812). These are available as part of the 7.5.10 UNIX console patches, beginning with June 2012. A further update, QM001764244, is included in the 7.5.10 SP2 console patch from December 2012. Enhancements made since December 2012, including support for Windows consoles (QM001781969 and QM001779687), were included in 7.5.12 Cumulative Patch 2 (May 2013). An enhancement to support environments with multiple Manager runs per day was introduced by fix QM001850974, first available in 9.5 SP1 from August 2014. For 9.0.00, install 9.0 SP4 or later.
$BEST1_HOME/bgs/scripts/GeneralManagerLite.pl -c <Console Name> [-o <Output Directory> -p <General Manager Port> -l -d -i <manager run pattern>]
Where:
Required:
  Console Name        BPA console with GeneralManagerServer running, or a comma-separated list of BPA consoles
Optional:
  Output Directory    Output directory where results will be deposited: default is the current directory
  GeneralManagerPort  General Manager Port: default 10129
  -l                  Get the Remote Agent and Proxy Logs for detailed analysis (warning: this can take a lot of disk space) : Default
  -d                  Save results in date-stamped directories for the last 30 days (recommended configuration)
  -i                  Ignore/remove results for manager runs that match the pattern specified; multiple patterns may be specified by using a comma
Note that the BCO_BPAStatusAndRecoveryManager implementation method described above is just a semi-automated method for running this script and supplying the necessary input parameters.
If you have "special" manager runs, such as ones with no data collection where data is simply being reprocessed, you should remove these from the output by using the -i option. Otherwise, they will produce incorrect results because they don't have the full complement of activities occurring. Note that this is implemented as a pattern match, so you don't have to specify the full names of manager runs. NOTE: If your installation has only Windows consoles, you can install a BPA console on a Linux VM in order to run the script. That console doesn't need to be running any Manager runs.
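The usage above can be assembled programmatically when GeneralManagerLite.pl is driven from a wrapper. The following is a minimal Python sketch that only builds the argument list from the documented options (-c, -o, -p, -l, -d, -i). The function name build_gmlite_command, its parameter names, and the example paths are hypothetical; the sketch does not execute the script.

```python
import shlex


def build_gmlite_command(best1_home, consoles, output_dir=None, port=None,
                         collect_logs=False, date_stamped=True,
                         ignore_patterns=()):
    """Build the argument list for GeneralManagerLite.pl.

    best1_home is the value of $BEST1_HOME; consoles is a list of console
    names; ignore_patterns is a list of manager run patterns for -i.
    """
    if not consoles:
        raise ValueError("at least one console name is required")
    # -c takes a single console or a comma-separated list of consoles.
    cmd = [f"{best1_home}/bgs/scripts/GeneralManagerLite.pl",
           "-c", ",".join(consoles)]
    if output_dir:
        cmd += ["-o", output_dir]
    if port is not None:
        cmd += ["-p", str(port)]
    if collect_logs:
        cmd.append("-l")   # also fetch Remote Agent and Proxy logs
    if date_stamped:
        cmd.append("-d")   # date-stamped directories (recommended)
    if ignore_patterns:
        # Multiple patterns are separated by commas.
        cmd += ["-i", ",".join(ignore_patterns)]
    return cmd


if __name__ == "__main__":
    # Hypothetical install path, output directory, and run pattern.
    example = build_gmlite_command("/opt/bmc/best1", ["console1", "console2"],
                                   output_dir="/tmp/gmlite",
                                   ignore_patterns=["reprocess"])
    print(shlex.join(example))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a console where the script is installed. Keeping the arguments as a list avoids shell quoting problems with console names or patterns.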
INTERPRETING GENERAL MANAGER LITE OUTPUT FILES
"Nodes" which didn't get successfully put into the database for a particular day are divided between and because the type of followup required is likely to be different between the two groups of nodes. The error code associated with each node's status is provided in the .csv file: C means collect failure, T means data transfer failure, P means processing (no data created for input to the CDB) failure The error code numbers are available through this document (see attached spreadsheet or 000097173), and are detailed in the associated logs for that node (if requested using -l). This enables a summary level understanding of how many failures there are for the date, and how many nodes have the same kind of failure. The purpose is to provide a convenient way to troubleshoot groups of nodes rather than doing them one at a time. The details for each node are available in the associated log (if requested via -l), so low-level reporting is fully supported as well. lists all nodes with Collect errors 91, 92, or 94: 91 Error SD_COMM_BAD_HOST Service daemon invalid hostname provided (can not find server or DNS error) The agent name is not known by the OS. 92 Error SD_COMM_BAD_PORT Service daemon not installed on the remote node (connection refused) The product is not installed on the agent computer or the service daemon is not running. 94 Error SD_COMM_CONNECT_TIMEOUT Service daemon connection timed out (node offline The agent node is off the network.
A field-by-field description of all the output .csv files is provided as part of 000097397.
"BEST PRACTICES" FOR DOING A DAILY HEALTH CHECK OF YOUR BPA CONSOLES
(1) Review the GeneralManagerLite output as described above. This gives an overall summary of how many nodes are under management and the status of each node. Comparing results from day to day also immediately highlights any change in overall health and pinpoints the source of the change. Prior to 9.5 SP2, an unsupported script was available to summarize this daily review and email you the results. The script (coded for a 9.0 console) is attached to this article and described in the attached Word document. The email option is now part of 9.5 SP2; see 000085246.
(2) Use the General Manager GUI (displayed in Perceiver or BCO 9.0), Console Operations -> "Recover Runs" view. Alternatively, you can use failedNodes.csv (output from GeneralManagerLite) or export the "Recover Runs" view to csv if you prefer. The methodology is to initiate any Recovery actions first, then work on the data collection problems, which typically require more analysis to resolve.
(3) Sort by "Populate Status". For any Manager run that is not "OK", select the run, and then select "Recover".
(4) Sort by "Transfer Fail". For any Manager run that doesn't have a value of 0, select the run, and then select "Recover".
(5) Sort by "Collect Fail". Use the corresponding Console Reports -> "Node History" view to establish the precise problem (using the error code), how many nodes have the same problem, and whether the problem is persistent (using the 3 or 5 day history setting). Perform remediation as indicated by the error code and cause. Note that the results of successful remediation may not appear for up to 2 days, depending on the problem fixed and how often the Manager run is scheduled for execution. If you've specified the optional log gathering feature, the corresponding logs have already been retrieved from the remote nodes and zipped so that they can be sent to Customer Support.
(6) Rerun the GeneralManagerLite script after the recovery actions have been completed in order to assess the "recovered" overall health of the data flow for today.
When additional troubleshooting time is available, determining the root cause of repeating Population, Processing, or Transfer errors can avoid the need to "Recover" the run(s) each day. Nodes that BPA lists as under management but that have no collection agent present are reported separately. Typically this requires an internal ticket to get the agent software installed (either on a proxy or as a local agent). Note that this condition can occur when a node has its OS upgraded but the corresponding BPA agent wasn't upgraded at the same time.
See also: BEST FAQ on TrueSight and BMC Helix Continuous Optimization Gateway Server