Enabling cell tracing and starting cell in foreground
To determine why the cell is unable to start, enable cell trace and start the cell in foreground mode. Starting the cell in foreground mode is the preferred way to troubleshoot cell startup failures because it will output messages even if the cell is unable to write to the trace file.
- In the pw/server/etc/<cell> directory, create a file named mcell.trace (if not present) with the contents: ALL ALL stderr
- Start the cell in foreground mode: mcell -d -n <cell>
The trace is written to the command window (stderr, as set in mcell.trace) and shows why the cell failed to start.
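The two steps above can be sketched in Python. This is a minimal illustration, not a BMC-supplied tool: the helper names `ensure_trace_file` and `foreground_command` are hypothetical, and the directory you pass in is assumed to be the cell's pw/server/etc/<cell> directory.

```python
from pathlib import Path

TRACE_LINE = "ALL ALL stderr"  # trace contents given in the steps above


def ensure_trace_file(cell_etc_dir: Path) -> bool:
    """Create mcell.trace with the trace line if it does not exist yet.

    Returns True if the file was created and False if it was already
    present (an existing file is left untouched).
    """
    trace_file = cell_etc_dir / "mcell.trace"
    if trace_file.exists():
        return False
    trace_file.write_text(TRACE_LINE + "\n")
    return True


def foreground_command(cell: str) -> list[str]:
    """Build the foreground start command from the steps above."""
    return ["mcell", "-d", "-n", cell]
```

The command list can be passed to `subprocess.run` from a command window in which mcell is on the path. Running it that way keeps the trace output attached to the console.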
One possible workaround is to rename the xact file to xact.1. If the xact file is corrupt and does not contain any events or data of consequence, you can remove it instead, after which the cell may be able to start right away.
The common causes of cell startup failure are as follows:
Cell fails to start due to corrupt mcdb
Starting the cell in foreground mode displays the following error:
20130515 102612.423000 mcell: EVTLOG: BMC_TS-IMC065024E: Error in transaction history file C:/Program Files/BMC Software/TrueSight/pw/server/var/Admin/mcdb, line 1: bad command header
20130515 102612.426000 mcell: CONTROL: BMC_TS-IMC200015F: Could not reload State file
20130515 102612.472000 mcell: SERVICE: BMC_TS-IMC050141V: Disconnecting from destinations...
20130515 102612.473000 mcell: SERVICE: BMC_TS-IMC050114V: Disconnecting clients...
20130515 102612.473000 mcell: SERVICE: BMC_TS-IMC050106V: Cell shutdown...
The preceding error message indicates an inconsistency in the state file (mcdb). In this situation, revert to a previous mcdb file located in the same directory. A directory listing shows:
Directory of C:\Program Files\BMC Software\TrueSight\pw\server\var\Admin
15/05/2013 10:38 <DIR> .
15/05/2013 10:38 <DIR> ..
15/05/2013 10:38 11 datid.txt
15/05/2013 10:38 11 evtid.txt
15/05/2013 10:38 0 mcdb
15/05/2013 08:26 237,483 mcdb.11932a910
15/05/2013 09:28 237,326 mcdb.119338a70
15/05/2013 10:17 237,495 mcdb.119344760
14/05/2013 12:51 12 smid
15/05/2013 10:37 4,286 xact
15/05/2013 09:28 47,343 xact.119338a70.1
15/05/2013 10:16 38,431 xact.119344760.1
15/05/2013 10:17 299 xact.119344870.1
15/05/2013 10:37 9,105 xact.119349320.1
12 File(s) 811,802 bytes
2 Dir(s) 10,376,409,088 bytes free
The directory listing shows that mcdb.119344760 is the most recent previous state file and that xact.119349320.1 is the transaction file that needs to be reapplied. Perform the following steps to ensure that no data is lost:
- Take a backup of the cell var directory
- Rename mcdb to mcdb.bak
- Rename mcdb.119344760 to mcdb
- Rename xact to xact.2
- Rename xact.119349320.1 to xact.1
- Run statbld to create a new mcdb from the xact.1 and xact.2 files with the command:
statbld -n <cell>
- After statbld has run successfully, start the cell.
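The rename steps above can be planned programmatically before any file is touched. The sketch below only computes the renames; it does not take the backup or perform them. It rests on an assumption drawn from the directory listing rather than from product documentation: that the suffix on mcdb.<suffix> and xact.<suffix>.1 is a hexadecimal value that grows over time, so the largest value marks the newest file. The function names are hypothetical.

```python
import re

SNAPSHOT = re.compile(r"^mcdb\.([0-9a-f]+)$")
ROTATED_XACT = re.compile(r"^xact\.([0-9a-f]+)\.1$")


def newest(names, pattern):
    """Return the name with the largest hex suffix, or None if none match.

    Assumption: a larger suffix means a newer file.
    """
    matches = [(int(m.group(1), 16), n) for n in names if (m := pattern.match(n))]
    return max(matches)[1] if matches else None


def plan_recovery(names):
    """Return (old_name, new_name) renames in the order of the steps above."""
    mcdb = newest(names, SNAPSHOT)
    xact = newest(names, ROTATED_XACT)
    if mcdb is None or xact is None:
        raise ValueError("no earlier mcdb or rotated xact file found")
    return [("mcdb", "mcdb.bak"), (mcdb, "mcdb"), ("xact", "xact.2"), (xact, "xact.1")]
```

Compare the plan against the file timestamps before applying it, and take the backup of the cell var directory first, as in the steps above.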
Cell fails to start due to statbld not working
Starting the cell in foreground mode shows the following error:
20130515 103831.109000 mcell: EVTLOG: BMC_TS-IMC065102V: Checking for trailing transaction log file C:/Program Files/BMC Software/TrueSight/pw/server/var/Admin/xact
20130515 103831.111000 mcell: EVTLOG: BMC_TS-IMC065103V: Processing trailing transaction log file
20130515 103831.126000 mcell: EVTLOG: BMC_TS-IMC065051I: Performing State Build - please wait
BMC Impact State Builder 9.0.20 (Build 231155889 - 18-Feb-2013) [w4]
Copyright 1998-2012 BMC Software, Inc. as an unpublished work. All rights reserved.
20130515 103831.293000 mcell: SYSTEM: BMC_TS-IMC012011V: Executed program C:/Program Files/BMC Software/TrueSight/pw/server/bin/statbld.exe - exit code 1
20130515 103831.294000 mcell: EVTLOG: BMC_TS-IMC065012E: State Builder failed to process trailing transactions
20130515 103831.296000 mcell: EVTLOG: BMC_TS-IMC065011E: Cannot activate State Builder
20130515 103831.296000 mcell: EVTLOG: BMC_TS-IMC065004F: Cannot start with trailing transaction log file C:/Program Files/BMC Software/TrueSight/pw/server/var/Admin/xact - repair first
20130515 103831.297000 mcell: SERVICE: BMC_TS-IMC050141V: Disconnecting from destinations...
20130515 103831.298000 mcell: SERVICE: BMC_TS-IMC050114V: Disconnecting clients...
20130515 103831.298000 mcell: SERVICE: BMC_TS-IMC050106V: Cell shutdown...
The preceding error messages indicate a problem with the statbld process, which can fail for a number of reasons. First, run the command mlogchk -n <cell>, which performs a consistency check and advises of any action required. If mlogchk does not find any inconsistency, run statbld with trace enabled:
- In the pw/server/etc directory, modify the file statbld.trace so that it contains: ALL ALL stderr
- Run statbld from a command window:
statbld -n <cell>
The trace is written to the command window and shows the reason for the statbld failure.
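When the trace is long, it can help to filter it down to the failing messages. The sketch below parses lines in the layout of the mcell samples shown in this article. Two things are assumptions, not documented behavior: that statbld trace lines follow the same layout, and that the final letter of the message ID marks the level (E and F appear on the error lines in the samples). The function name `failures` is hypothetical.

```python
import re

# Layout taken from the sample lines: date, time, program, module, ID, text.
LINE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{6}\.\d+) "
    r"(?P<program>\w+): (?P<module>\w+): "
    r"(?P<msg_id>[A-Z_]+-IMC\d+(?P<level>[A-Z])): (?P<text>.*)$"
)


def failures(lines, levels=("E", "F")):
    """Return (message ID, text) for lines whose ID ends in one of `levels`.

    Lines that do not match the layout (banners, blank lines) are skipped.
    """
    found = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("level") in levels:
            found.append((m.group("msg_id"), m.group("text")))
    return found
```

To use it, redirect the trace to a file and pass the file's lines to `failures`.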
Cell fails to start with message "Impossible to bind endpoint"
Starting the cell in foreground mode shows the following error:
20130515 142116.785000 mcell: SERVICE: BMC_TS-IMC050005F: Server <ANY> xx.xx.xx.xx/ 1828 setup error 5 (Impossible to bind endpoint (10048))
This indicates that the cell has been unable to bind to the port defined in the pw\server\etc\mcell.dir file. From the message we can see it is port 1828. The following are known reasons for this problem:
- There is another process using that port. Run a netstat command to see if anything is already listening on that port.
- This is an HA cell and the definitions for that cell in the mcell.dir file on the primary server and the secondary server are different.
- This is a secondary HA cell and the mcell.conf incorrectly contains CellDuplicateMode=1
- The cell is already running.
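As a complement to netstat, the first cause in the list can be checked with a short probe that tries to bind the port itself. This is a generic socket sketch, not part of the product, and the function name `port_in_use` is hypothetical. Run it on the cell host and pass the address the cell is expected to bind; the default of 127.0.0.1 is only a placeholder.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails.

    A failure usually means another process already holds the port, but
    any other OS-level bind error (for example, insufficient privileges)
    is reported the same way.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False
```

For example, `port_in_use(1828)` checks the port named in the error message above. The probe releases the port as soon as the check finishes.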
___
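The same message can also appear with Windows socket error 10049 (address not available) instead of 10048 (address already in use). Messages from the mcell log:
20231002 161753.208000 mcell: SERVICE: BMC_TS-IMC050004F: Server 10.x.x.x/1828 setup error 5 (Impossible to bind endpoint (10049))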
20231002 161753.208000 mcell: SERVICE: BMC_TS-IMC050141V: Disconnecting from destinations...
20231002 161753.208000 mcell: SERVICE: BMC_TS-IMC050114V: Disconnecting clients...
20231002 161753.209000 mcell: SERVICE: BMC_TS-IMC050106V: Cell shutdown...
Follow these steps to start the cell:
- Verify whether the primary node of the cluster resolves to an IPv6 or an IPv4 address. If it resolves to IPv6, the cell won't start.
- To make the cell use IPv4, set ServerIPVersion=4 in the cell's mcell.conf (path: pw/server/etc/<cell name>). Add the parameter if it is not present in mcell.conf.
- Check the event DB size. If required, change it from the default value EventDBSize=360000 to EventDBSize=500000 in the cell's mcell.conf (path: pw/server/etc/<cell name>).
- Check the event count by using this query: mquery -n CellName -q -s COUNT
- In the .load file of the cell KB, comment out the deprecated collectors, which have not been used since 11.3.04 (path: \pw\server\etc\<cell name>\kb\collectors).
Sample:
#
# File name: .load
# Version: 11.3.04
# Copyright 1998-2020 BMC Software, Inc. All Rights Reserved
#
self_collector
catchall_collector
#pom_activeevents_collectors
#pom_intelligentevents_collectors
#catchall_collector
#pom_byuser_collectors
#mc_bystatus_collectors
#mc_evr_collectors
#bii4p_collectors
#bco_collector
#itda_collectors
#ppm_sm_collector
#euem_collectors
#ibrsd_collectors
#mc_sm_collectors
#eye_collector
- Start the cell and check whether it comes up.
- Once the cell is up, restart TSIM.
Note: The cell is the primary process on which the remaining processes depend, so the TSIM server does not start at all while the cell is down.
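The .load edit in the steps above can be sketched as a small text transformation. This is an illustration only: the function name `comment_out` is hypothetical, and the set of collector names to disable must come from your own review of the file. Back up the .load file before changing it.

```python
def comment_out(load_text: str, deprecated: set[str]) -> str:
    """Prefix each line that names a deprecated collector with '#'.

    Lines that are already commented, and all other lines, pass through
    unchanged.
    """
    result = []
    for line in load_text.splitlines():
        if line.strip() in deprecated:
            result.append("#" + line.strip())
        else:
            result.append(line)
    return "\n".join(result) + "\n"
```

Entries that are already commented are never commented twice, so the function is safe to run again on an edited file.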
Cell fails to start with message "BMC-IMC032205F: Cannot read knowledge base file"
Starting the cell in foreground mode shows the following error:
20130515 160755.218000 mcell: BAROC: BMC_TS-IMC032270V: Signature 1691 3811407335
20130515 160755.219000 mcell: BAROC: BMC_TS-IMC032269V: Installing from file C:/Program Files/BMC Software/TrueSight/pw/server/etc/Admin/kb/rules/mv_admin.wic
20130515 160755.222000 mcell: BAROC: BMC_TS-IMC032270V: Signature 1071 2064456062
20130515 160755.223000 mcell: EVTPROC: BMC_TS-IMC090004F: Failed to load knowledgebase definitions
20130515 160755.302000 mcell: SERVICE: BMC_TS-IMC050141V: Disconnecting from destinations...
20130515 160755.303000 mcell: SERVICE: BMC_TS-IMC050114V: Disconnecting clients...
20130515 160755.304000 mcell: SERVICE: BMC_TS-IMC050106V: Cell shutdown...
This indicates that the cell is unable to load the knowledge base (KB). Open a command window and run mccomp -n <cell> to recompile the KB. Resolve any errors it reports, and then start the cell again.