RTSCsupport,
We use a number of tools to monitor the health of our systems.
1) We have a batch job (R0006P, XJE0005) that we launch once an hour. Since we know this report takes only a few seconds to run, we have a timer on the job that sends an alert if it runs for 10 minutes or more. That gives me a feel for whether the server is up and how busy it is. If I don't hear from the job, life is good.
2) We have various system monitoring tools, put in place by our infrastructure team, that check whether the servers are up, log in to web pages, check whether the JDE services are running, etc. If an error crops up, I get an email. If it's a system-down error, an alert goes to the operators, and they call me.
3) We are also using the built-in alerts in Server Manager for several types of events. Those alerts send out an email or, if I feel it's important enough, a text message to my cell phone.
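For anyone who wants to copy the idea in item 1, here is a minimal sketch of a "slow job" watchdog in Python. It is not how our JDE scheduler does it; the job and alert callables are hypothetical stand-ins (in real use the job would submit the hourly report and the alert would send the email or text). The timer only reports a long run, it does not kill the job.

```python
import threading
import time
from typing import Callable


def run_with_watchdog(job: Callable[[], None], timeout_s: float,
                      alert: Callable[[str], None]) -> bool:
    """Run job(); if it is still running after timeout_s seconds, call alert() once.

    Returns True if the alert fired. The job is allowed to keep running;
    the watchdog only tells you it is taking too long.
    """
    fired = threading.Event()

    def on_timeout() -> None:
        fired.set()
        alert(f"job still running after {timeout_s:g} s")

    # Start the timer before the job, like the timer attached to the hourly report.
    timer = threading.Timer(timeout_s, on_timeout)
    timer.start()
    try:
        job()
    finally:
        # Job finished (or raised): stop the timer if it has not fired yet.
        timer.cancel()
    return fired.is_set()


if __name__ == "__main__":
    # Hypothetical example: a "report" that takes a fraction of a second,
    # checked against a 10-minute (600 s) threshold.
    slow = run_with_watchdog(lambda: time.sleep(0.1), 600, print)
    print("alert sent" if slow else "no news is good news")
```

Note the weakness this shares with any single tool: if the box running the watchdog is down along with everything else, you hear nothing, which is exactly why we layer it with the other checks.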
My point is, we put our eggs in several baskets. If your only monitoring tool goes down at the same time as your system, you will be lulled into a false sense of confidence. But if I get body slammed from multiple sources, I know that the fecal matter has hit the spinning blades...
Just my $.02 worth.
- Gregg