Remedy IT Service Management - Troubleshooting DMT UDM Load Balance and Server Group issues
Knowledge Article
This guide is a quick overview of how to troubleshoot DMT / UDM in Load Balance and Server Group environments. It will be maintained and updated by Support throughout its lifecycle.
Remedy IT Service Management Suite
Remedy ITSM Foundation
All Versions
This knowledge article may contain information that does not apply to version 21.05 or later which runs in a container environment. Please refer to Article Number 000385088 for more information about troubleshooting BMC products in containers.
This guide is divided into the sections outlined below.
It is important to read and understand the documentation on Load Balance environments, because this guide covers only some of the basics. Please see the following documentation.
Arjavaplugin.log file typically found in the C:\Program Files\BMC Software\ARSystem\Arserver\Db directory
arcarte.log file typically found in the C:\Program Files\BMC Software\ARSystem\Arserver\Db directory
arerror.log file typically found in the C:\Program Files\BMC Software\ARSystem\Arserver\Db directory
ar.cfg file typically found in the C:\Program Files\BMC Software\ARSystem\Conf directory
pluginsvr_config.xml file typically found in the C:\Program Files\BMC Software\ARSystem\pluginsvr directory
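When reviewing these log files, it can help to pull out the numeric AR error codes so that they can be matched against the error messages listed later in this article. The following is a minimal sketch, assuming the codes appear in the two shapes quoted in this article, "ERROR (90)" and "(ARERR 8753)". The function name extract_error_codes is hypothetical and is not part of any BMC tool.

```python
import re

# Matches the two code shapes quoted in this article:
#   "ERROR (90): ..."   and   "... (ARERR 8753) ..."
CODE_PATTERN = re.compile(r"ERROR \((\d+)\)|\(ARERR (\d+)\)")


def extract_error_codes(line):
    """Return the numeric error codes found in one log line, in order of appearance."""
    codes = []
    for match in CODE_PATTERN.finditer(line):
        # Exactly one of the two groups is set for each match.
        codes.append(int(match.group(1) or match.group(2)))
    return codes


sample = ("Error in plugin : servername.xyz.com (ARERR 8753) "
          "An application command failed. (ARERR 4554)")
print(extract_error_codes(sample))
```

Lines such as "ERROR (version 4.1.0, ...)" from the Pentaho logs are not matched, because the parenthesis does not contain a bare number.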
Example Server group configuration and how to set it up correctly.
Recommendations:
Before configuring the UDM:Config form for a server group environment, you must rank the Atrium Integrator servers by using the AR System Server Group Operation Ranking form. If you assign ranking 1 to a server, that server becomes primary server and runs the jobs. If the primary server fails, the secondary server (failover server) runs the jobs. Failover server is the server to which you assigned ranking 2. If you do not assign ranking to the servers in a server group environment, jobs run on the server which receives the request first. For details, see Setting failover rankings for servers and operations.
BMC recommends selecting a non-user-facing server (admin server) as the primary server.
The Default checkbox should be set for the primary server in the UDM:Config form
The Atrium Integrator engine server name should match the Server-Connect-Name value in the ar.cfg file
The 'Host' value for each entry should match the diserver server hostname defined in armonitor.cfg for the diserver/carte plugin
No long names, aliases, IP addresses or host names should be entered in the UDM:Config form
'Is default' should be set to YES for the server defined as Rank 1 in the 'AR System Server Group Operation Ranking' form
The 'Failover server name' field should not have any entries
The Port value should be 20000
The form should have entries for all AR servers in the group and for the load balancer
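The recommendations above can be turned into a simple checklist. The following is a minimal sketch, assuming the UDM:Config entries have already been exported into plain Python dictionaries. The key names (name, port, is_default, failover_server_name), the function check_udm_config, and the server names other than NEWSC-PD-AR-01 are hypothetical and chosen only for illustration. Treating "exactly one default entry" as a rule is also an assumption drawn from the recommendations, not a documented constraint.

```python
def check_udm_config(entries, expected_names, expected_port=20000):
    """Return a list of problems found in UDM:Config-style entries."""
    problems = []

    # One entry should carry the Default flag (the Rank 1 server).
    defaults = [e["name"] for e in entries if e["is_default"]]
    if len(defaults) != 1:
        problems.append(f"expected exactly one default entry, found {len(defaults)}")

    # Every expected server (and the load balancer, if listed) needs an entry.
    present = {e["name"] for e in entries}
    for name in expected_names:
        if name not in present:
            problems.append(f"{name}: no entry found")

    for e in entries:
        if e["port"] != expected_port:
            problems.append(f"{e['name']}: port is {e['port']}, expected {expected_port}")
        if e["failover_server_name"]:
            problems.append(f"{e['name']}: 'Failover server name' should be empty")
    return problems


# Hypothetical export of a three-server group.
entries = [
    {"name": "NEWSC-PD-AR-01", "port": 20000, "is_default": True, "failover_server_name": ""},
    {"name": "NEWSC-PD-AR-02", "port": 20000, "is_default": False, "failover_server_name": ""},
    {"name": "NEWSC-PD-AR-03", "port": 20000, "is_default": False, "failover_server_name": ""},
]
expected = ["NEWSC-PD-AR-01", "NEWSC-PD-AR-02", "NEWSC-PD-AR-03"]
print(check_udm_config(entries, expected))
```

An empty list means none of the checked rules were violated. The load balancer name can be added to the expected list in the same way.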
Here is the correct way to configure entries for a server group of three AR servers. In this example, the diserver/carte plugin is enabled on all three servers in the server group, which is the recommended best practice. If the default server goes down, the second server in the ranking form runs the jobs, because its plugin is available and running to handle the jobs created after server 1 goes down.
In this scenario, the job always runs on NEWSC-PD-AR-01, regardless of where it was triggered and which AR server the user session is on, because the 'Is Default' value is set to "yes" for server NEWSC-PD-AR-01.
The recommendation is to always restart the AR Service after making any changes to the UDM:Config form.
Important note: If UDM jobs (for example, 3 jobs) were running when server 1 went down, they must be reviewed, and you will need to manually create a new job with the non-promoted data and run it again on the second server. Failover is not automatic; rather, it means that jobs can be run on the second server if server 1 goes down.
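The ranking behaviour described in this section can be sketched as a small selection function. This is an illustration only, assuming the rankings are available as a mapping from server name to rank. The function pick_job_server and the server names other than NEWSC-PD-AR-01 are hypothetical. It models only where new jobs would run. Jobs that were already running when a server went down are not moved and must be recreated manually, as noted above.

```python
def pick_job_server(rankings, available, receiving_server):
    """Choose the server that would run a new job.

    rankings: dict of server name -> rank (1 is the primary server).
    available: set of server names that are currently up.
    receiving_server: the server that received the request.
    """
    # Without rankings, the job runs on the server that received the request first.
    if not rankings:
        return receiving_server
    # Otherwise, take the best-ranked server that is still available.
    for name in sorted(rankings, key=rankings.get):
        if name in available:
            return name
    # No ranked server is up; this case is not covered by the article.
    return None


rankings = {"NEWSC-PD-AR-01": 1, "NEWSC-PD-AR-02": 2, "NEWSC-PD-AR-03": 3}
all_up = set(rankings)
print(pick_job_server(rankings, all_up, "NEWSC-PD-AR-03"))
print(pick_job_server(rankings, all_up - {"NEWSC-PD-AR-01"}, "NEWSC-PD-AR-03"))
```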
2. UDM:RAppPassword form:
This form authenticates the Remedy Application Service password for the $SERVER$ value from the mid-tier and then finds the correct server name from the UDM:Config
This form should contain entries for ALL possible server names which can be used to connect to the AR Server, including
Host names
IP Addresses
Alias Names
Load Balancer names
Changes to the UDM:RAppPassword form do not require an AR server restart.
3. Below is the configuration for the UDM:RAppPassword using the above server samples for the UDM:Config form
newsc-s: AR Server alias name
newscorp-vip: LB alias name
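Because a missing server name in UDM:RAppPassword is a common cause of the errors listed later in this article, a quick coverage check can be useful. The following is a minimal sketch, assuming the server names in the form and the names clients use to connect are available as plain lists. The function missing_rapp_entries, the IP address 192.0.2.10, and the exact-match comparison are assumptions for illustration. The names newsc-s and newscorp-vip are the sample aliases above.

```python
def missing_rapp_entries(rapp_server_names, connection_names):
    """Return the connection names that have no UDM:RAppPassword entry.

    The comparison is an exact string match; whether the product compares
    names case-insensitively is not stated in this article.
    """
    known = set(rapp_server_names)
    return sorted(name for name in connection_names if name not in known)


# Names currently present in the form (hypothetical export).
in_form = ["NEWSC-PD-AR-01", "newsc-s"]
# Every name that can be used to reach the AR Server:
# host name, IP address, alias name, load balancer name.
used_to_connect = ["NEWSC-PD-AR-01", "192.0.2.10", "newsc-s", "newscorp-vip"]

print(missing_rapp_entries(in_form, used_to_connect))
```

Any name returned by the check is a candidate for a new entry in the form.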
UDM:PermissionInfo:
UDM:PermissionInfo is a regular form that lists all Pentaho transformations, jobs, database connections, slave servers, partition schemas, directories, and cluster schemas, with the corresponding user group permissions in field 112, as shown below.
Carte Server Name (optional): If the "Carte server name" is set for a particular transformation or job, the ARDBC plugin always executes that transformation or job on that Carte server. In this way, data integration jobs can be load balanced across multiple Carte servers. If the Carte server name is not configured for a transformation or job, that transformation or job is always executed on the local Carte server.
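The Carte server selection rule described above can be expressed in a few lines. This is only a sketch of the rule as stated in this article, not the ARDBC plugin's actual code. The function resolve_carte_server and the server names in the example are hypothetical.

```python
def resolve_carte_server(configured_name, local_carte_server):
    """Return the Carte server on which a transformation or job would run.

    configured_name: the "Carte server name" value for the transformation
    or job, or None / empty if it is not configured.
    local_carte_server: the Carte server local to the AR server handling the request.
    """
    name = (configured_name or "").strip()
    # A configured name always wins; otherwise the job runs locally.
    return name if name else local_carte_server


print(resolve_carte_server("NEWSC-PD-AR-02", "NEWSC-PD-AR-01"))
print(resolve_carte_server(None, "NEWSC-PD-AR-01"))
```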
Forms that would contain data if a database migration has taken place, and where to clean them:
Server References should be fixed in the following forms (i.e. correct AR server names should be present on these forms):
2015/12/22 14:02:36 - CI-CS-CMDBErrorOutput.0 - ERROR (version 4.1.0, build 1 from 2014-07-20 22.27.24) : org.pentaho.di.core.exception.KettleDatabaseException: 2015/12/22 14:02:36 - CI-CS-CMDBErrorOutput.0 - ERROR (version 4.1.0, build 1 from 2014-07-20 22.27.24) : Did not find Remedy Application Service password for server X in UDM:RAppPassword Form on server Y
Verify that the load balancer server is listed in the UDM:RAppPassword form and that the Windows hosts file contains the following entries:
AR Server IP servergroupname
AR Server IP servergroupname.domain.net
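The hosts file check above can be sketched as follows. This is a minimal example, assuming the standard hosts file layout of an IP address followed by one or more names, with "#" starting a comment. The functions hosts_entries and missing_hosts are hypothetical, and 192.0.2.10 is a placeholder address standing in for the AR Server IP.

```python
def hosts_entries(hosts_text):
    """Map each host name found in hosts-file text to its IP address."""
    mapping = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            mapping[name.lower()] = ip  # host names are case-insensitive
    return mapping


def missing_hosts(hosts_text, required_names):
    """Return the required names that have no entry in the hosts-file text."""
    mapping = hosts_entries(hosts_text)
    return [name for name in required_names if name.lower() not in mapping]


sample_hosts = """
# placeholder address, replace with the AR Server IP
192.0.2.10 servergroupname
"""
print(missing_hosts(sample_hosts, ["servergroupname", "servergroupname.domain.net"]))
```

On a real server, the text would be read from the Windows hosts file instead of the sample string.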
The jobs are failing with the error messages: "Error Connecting to ARSystem" and "Did not find Remedy Application Service password for server xxxxxxx in UDM:RAppPassword Form"
Make sure the UDM:Config and UDM:RAppPassword forms contain the correct entries described in this guide
ERROR (90): Cannot establish a network connection to the AR System server; servername:31500
Make sure the UDM:Config and UDM:RAppPassword forms contain the correct entries described in this guide
Error while fetching data from form UDM:ExecutionStatus ERROR (623): Authentication failed; aradmin
Ensure the UDM:Config form is configured correctly
Make sure the UDM:RAppPassword form is configured and the passwords are correct
ERROR [pool-4-thread-25] com.bmc.arsys.pluginsvr.plugins.a (?:?) - createEntry() FAILs in plugin: ARSYS.ARDBC.PENTAHO ERROR (8753): Error in plugin; servername.name.com
Verify that the load balancer server is listed in the UDM:RAppPassword form and that the Windows hosts file contains the following entries:
AR Server IP servergroupname
AR Server IP servergroupname.domain.net
Error in plugin : servername.xyz.com (ARERR 8753) An application command failed. (ARERR 4554) Application-Delete-Entry "DMT:Action" 000000000060008
Verify that the load balancer server is listed in the UDM:RAppPassword form and that the Windows hosts file contains the following entries:
AR Server IP servergroupname
AR Server IP servergroupname.domain.net
Error in plugin : No Carte Server with name servername exists in UDM:Config form. (ARERR 8753) Tue Jun 03 18:36:39 2014 390626 : An application command failed. (ARERR 4554) Tue Jun 03 18:36:39 2014 Application-Delete-Entry "DMT:Action" 000000000002304
Verify the server entries in the UDM:Config form
Nothing is being written to the arcarte or arcarte-stdout log files.
If you are in a server group, verify that the logs are being written on the correct servers as defined in the UDM:Config form. If the logs are being written on server 2 rather than on the default server, the server marked as default in the UDM:Config form is not being used as the primary.
Checker error trying to join Job Console: The check could not query the server parameters: %s
In Centralized Configuration, ensure that the Server-Connect-Name setting is not missing.
Validate that Server-Connect-Name and the other parameters in the ar.conf file are spelled correctly (for example, check for a misspelling such as "conect").
Remove REPORTING server names from the following tables: select * from servgrp_board
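The Server-Connect-Name check above can be partly automated. The following is a minimal sketch, assuming the ar.cfg / ar.conf file uses one "Option-Name: value" pair per line and that lines starting with "#" are comments. The functions parse_ar_cfg and check_option are hypothetical, and the sample text is invented for illustration. The value newsc-s is reused from the sample alias earlier in this article.

```python
import difflib


def parse_ar_cfg(text):
    """Parse 'Option-Name: value' lines into a dict (assumed file layout)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_option(settings, option="Server-Connect-Name"):
    """Report whether an option is present, or whether a likely misspelling exists."""
    if option in settings:
        return f"{option} is set to {settings[option]!r}"
    # Look for a near match such as 'Server-Conect-Name'.
    close = difflib.get_close_matches(option, list(settings), n=1, cutoff=0.8)
    if close:
        return f"{option} is missing; found similar key {close[0]!r} (possible typo)"
    return f"{option} is missing"


sample_cfg = """
# invented sample, not a real configuration
Server-Conect-Name: newsc-s
"""
print(check_option(parse_ar_cfg(sample_cfg)))
```

The similarity cutoff of 0.8 is an arbitrary choice that catches single-character slips without flagging unrelated options.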