Crashing enterprise servers

gregglarkin


Quick question: have anyone else's Windows 2003 SQL Server enterprise servers gone crazy lately? My enterprise servers have crashed twice in the last three days. The servers are:

Windows 2003
SQL server 2005
XE, SP 23V

The only major change lately is the latest round of Microsoft patches, which were installed last week. My infrastructure guys and HP are looking into the issue, but my boss wanted me to quiz the list to see if anyone else's servers have become unstable too.

- Gregg
 
Hi,

Please check the following:

1. Viruses. Don't trust your current antivirus; install
a different one, run it, and compare results.
2. Run a hardware check.
3. Blame MS Updates (which sometimes add quite a lot
of instability to the mix).
 
Haven't installed today's patches yet but we're not having any issues here.... yet
 
Oddly, my 8.12 Enterprise Server, also on SQL Server 2005, rebooted this morning due to a blue screen.

This server has not yet taken the last round of MS patches, though.
 
[ QUOTE ]
...servers have crashed twice in the last three days

[/ QUOTE ]
You might consider your daylight savings time rules. Three days ago at 2 am was on the last Sunday in October, which was the pre-2007 standard in the US for changing time of day. You could have more than one rule in effect on your various servers.
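
To make the two rules concrete, here is a minimal Python sketch that computes both candidate change-back dates. It assumes the year is 2009 (the year on the patch dates posted in this thread), and the helper names `last_sunday` and `first_sunday` are made up for illustration. It only shows which calendar dates the old and new US rules point to; it does not inspect what rule any server is actually using.

```python
import calendar
from datetime import date, timedelta


def last_sunday(year, month):
    """Return the date of the last Sunday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # date.weekday(): Monday is 0, Sunday is 6. Step back to the nearest Sunday.
    return d - timedelta(days=(d.weekday() - 6) % 7)


def first_sunday(year, month):
    """Return the date of the first Sunday in the given month."""
    d = date(year, month, 1)
    # Step forward to the nearest Sunday (zero days if the 1st is a Sunday).
    return d + timedelta(days=(6 - d.weekday()) % 7)


YEAR = 2009  # assumption: taken from the patch dates in this thread

old_rule_end = last_sunday(YEAR, 10)   # pre-2007 US rule: last Sunday in October
new_rule_end = first_sunday(YEAR, 11)  # 2007 and later: first Sunday in November

print("Pre-2007 rule falls back on:", old_rule_end)
print("Current rule falls back on: ", new_rule_end)
```

A server still on the old rule would shift its clock a week earlier than one on the current rule, so comparing clocks across servers during that week is a quick way to spot a mismatch.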
 
Hi,

Do you have any suspicious messages in the System event log in Event Viewer?
Failing services, NTFS warnings, etc.?
 
I've seen and gotten calls from two sites with quirky things around messages about "rebooting the machine" due to MS updates. But as in your case, Sir, each issue was a single reboot, or on one server a double reboot...all has been well since then.

Other than that...I'd blame your CNC Admin...she/he probably doesn't know what she/he is doing....hee FRICKIN hee. GO ISERIES....

But really...I've seen some issues on client machines as well with the latest round of MS updates.

I've opened calls with the Windows Servers...I'll let you know if anything comes from it.
 
Hey Jim,

Thanks for the jab, very funny, well played. At first we thought the issue was hardware. But we ruled that out. The first reboot was on Cluster Server A. The second reboot was on Cluster Server B. The commonality is that it was the node with Prod business data on it that did the reboots. Our SQL servers are active/active. We have Prod and the system databases on one cluster resource, and the DV and PY databases on the other cluster resource. We generally try to use both sides of the cluster, but can run both cluster resources on the same node if necessary.

Other than a few entries in the event log, the issue is not logging anything for the infrastructure guys. These are high-end HP servers with an auto-recovery feature. I spoke with our infrastructure director last night, and they decided to turn off the auto-recovery feature to see if they can capture some logs or screen dumps. So Ken, hopefully if (when) it fails again, we will have a blue screen to capture data from.

Hopefully we can get this fixed soon, because month-end closing is right around the corner.

Veggy - no, it's not the daylight savings rule; that is set correctly.

Sebastion - we have the same virus scanning solution on hundreds of other servers, including forty other SQL 2005 servers, without the same issue. We think your last thought is right: the MS patches are the culprits.

Here are the patches we installed:

10/20/2009 Security Update for Windows Internet Explorer 7 (KB974455)
10/20/2009 Security Update for Windows Server 2003 (KB954155)
10/20/2009 Security Update for Windows Server 2003 (KB958869)
10/20/2009 Security Update for Windows Server 2003 (KB969059)
10/20/2009 Security Update for Windows Server 2003 (KB971486)
10/20/2009 Security Update for Windows Server 2003 (KB973525)
10/20/2009 Security Update for Windows Server 2003 (KB974112)
10/20/2009 Security Update for Windows Server 2003 (KB974571)
10/20/2009 Security Update for Windows Server 2003 (KB975025)
10/20/2009 Security Update for Windows Server 2003 (KB975467)

Gregg Larkin, MBA
North American JDE Systems Engineer
 
We have all of those updates on our SQL server with the exception of the IE7 update (we have IE8) and have not experienced any issues...
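
For anyone else who wants to run the same comparison, here is a rough Python sketch. The patch list is the one posted above; `other_server` is a hypothetical stand-in for whatever list you pull from your own box (here it simply mirrors the case just described, everything except the IE7 update), and `kb_ids` is a made-up helper name.

```python
import re

# Patch list as posted earlier in this thread.
PATCH_LIST = """\
10/20/2009 Security Update for Windows Internet Explorer 7 (KB974455)
10/20/2009 Security Update for Windows Server 2003 (KB954155)
10/20/2009 Security Update for Windows Server 2003 (KB958869)
10/20/2009 Security Update for Windows Server 2003 (KB969059)
10/20/2009 Security Update for Windows Server 2003 (KB971486)
10/20/2009 Security Update for Windows Server 2003 (KB973525)
10/20/2009 Security Update for Windows Server 2003 (KB974112)
10/20/2009 Security Update for Windows Server 2003 (KB974571)
10/20/2009 Security Update for Windows Server 2003 (KB975025)
10/20/2009 Security Update for Windows Server 2003 (KB975467)
"""


def kb_ids(text):
    """Return the set of KB identifiers (e.g. 'KB974455') found in free-form text."""
    return set(re.findall(r"KB\d+", text))


posted = kb_ids(PATCH_LIST)

# Hypothetical second server: replace this with the KB list from your own machine.
other_server = posted - {"KB974455"}

# Patches on the crashing servers that the second server does not have.
missing = sorted(posted - other_server)
print("Only on the crashing servers:", missing)
```

If the stable server turns out to have every patch the crashing servers have, the patches alone are less likely to be the whole story.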
 
Hi,

I just went through a similar situation on 2 accounts with
32-bit W2003 servers.

On one, running WAS ND 6.1.0.15 PK63203, WAS was
suddenly unable to start its Windows services
because some files were missing or gave "read past EOF".
CHKDSK warned about filesystem errors; we ran CHKDSK
again and got no errors that time, but we finally had to
restore the WAS folders.

On the other, Citrix XenApp 5.0 RP5, also on 32-bit W2003,
it was a similar situation: Citrix was unable to find some of
its binary files and folders under c:\program files\citrix,
with mysterious CHKDSK errors too. In the end, we had to
restore the Citrix folders.
 
Sebastion,

So far we have not had any file corruption, just random reboots. We have opened cases with HP and Microsoft. So far no smoking guns. The current theory is that it might have something to do with some HP management tools. The servers haven't crashed since Tuesday, so that is a positive. One other thing to note: this isn't a volume thing. The previous reboots happened during very low utilization periods.

Gregg Larkin, MBA
North American JDE Systems Engineer
 