Don't know if this is your culprit, but here is a recent posting from our iWATCHDOG service about crashes on Solaris:
Hello folks,
We have discovered an outstanding issue with user concurrency computation. It was simulated on a Sun V880 box running Oracle 9.0.2.4 on Solaris 2.8 for Denver Labs. Denver has acknowledged this as a problem and suspects it affects all releases using the SP23/SP22 foundation (it may not apply to all Enterprise platforms because of the IPC handling specific to that foundation; all Unix-like platforms are suspect). Our records show that you are on the targeted foundation release, so, as part of our new iWATCHDOG services initiative, you are being informed about the problem. Please ignore this mail if you are not on the targeted foundation release.
Denver is working on CASE ID 4045025, which is registered under one of our esteemed clients. We are trying to get an engineering fix or a possible roll-in into SP23 M1 (which is highly unlikely given the schedule of the one-off release: http://www.peoplesoft.com/corp/en/update_fix/kgwrapper.jsp?app=uc). We have the luxury of testing the fix on an asymmetrical production server configuration at one of our clients and of regression-testing it in partnership with Denver engineers. We will keep you posted.
Problem: the DDTEXT spec file drops to zero bytes under certain conditions, and OneWorld crashes.
Similarity: Denver is trying to determine whether the fix they put into the 8.9 tools release applies to this issue. Here is the SAR for that fix:
Title: 5879131 IPC Lock issues in JDEDDAPI RPT1068473000 SystemH93
Document Detail
SAR Number: 5879131
ICE Report ID: 1068473000
Parent Number: 4558984
Program ID: JDB -
System Code: H93 - Database and Communications
Release Fixed: 8.9
Product: EnterpriseOne Tools and Technology
Priority: 1 Critical
Type: 1 - Correction
Status: 01 - Completed
Code Change: (none)
Date Entered: 2002-03-21
Date Complete: 2002-06-13
Description
A ORIGINAL REQUEST
This SAR is the result of investigations into:
- Call 4857368
- Call 4922159
Defect:
In the file system/jdekrnl/jdeddapi/jdeddapi.c, a global counter (nDDLocks) is used to count how many IPC **READ** locks are created when calling LockDDFiles.
In the function ReadToWrite(), only one read lock is released, and then LockDDFiles is called to get a WRITE lock.
If the number of read locks was greater than 1 before calling ReadToWrite:
-> 1) the read lock at the IPC level is not unlocked,
-> 2) the lock operation in write mode fails at the IPC level (ipcLockResource returns eIPCInvalidLockState),
-> 3) the error is ignored in LockDDFiles.
The net result is:
-> at the IPC level, the lock is still a READ lock, not a WRITE lock,
-> the code in JDEDDAPI performs WRITE operations (TamAdd, ...) against the spec files.
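The lock sequence above can be modeled in a few lines of C. This is a minimal, single-process sketch under stated assumptions, not the jdeddapi.c code: sim_ipc_lock, lock_dd_read, unlock_dd_read and read_to_write_buggy are hypothetical stand-ins for ipcLockResource, LockDDFiles and ReadToWrite, and the sketch assumes the IPC read lock is taken on the first nested lock and released only when nDDLocks returns to 0, which is our reading of the defect description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-process model of the SAR 5879131 defect.
 * None of these names are the real JDEDDAPI or JDE IPC functions. */

typedef enum { SIM_OK = 0, SIM_INVALID_LOCK_STATE } SimIpcResult;
typedef enum { MODE_READ, MODE_WRITE } LockMode;

/* One simulated IPC read/write lock guarding the DD spec files. */
typedef struct {
    int readers;  /* read locks held at the IPC level */
    bool writer;  /* true while a write lock is held */
} SimIpcResource;

static SimIpcResource dd_res = {0, false};
static int nDDLocks = 0;  /* models the global read-lock counter */

/* Stand-in for ipcLockResource: refuses a lock that conflicts with one held. */
static SimIpcResult sim_ipc_lock(SimIpcResource *r, LockMode mode) {
    if (r->writer || (mode == MODE_WRITE && r->readers > 0))
        return SIM_INVALID_LOCK_STATE;
    if (mode == MODE_WRITE)
        r->writer = true;
    else
        r->readers++;
    return SIM_OK;
}

static void sim_ipc_unlock(SimIpcResource *r, LockMode mode) {
    if (mode == MODE_WRITE) {
        r->writer = false;
    } else {
        assert(r->readers > 0);  /* releasing a read lock that is not held */
        r->readers--;
    }
}

/* Models LockDDFiles in read mode: only the first nested call takes the IPC lock. */
static void lock_dd_read(void) {
    if (nDDLocks == 0)
        (void)sim_ipc_lock(&dd_res, MODE_READ);
    nDDLocks++;
}

/* Models releasing one read lock: the IPC lock goes only when the count hits 0. */
static void unlock_dd_read(void) {
    if (nDDLocks > 0 && --nDDLocks == 0)
        sim_ipc_unlock(&dd_res, MODE_READ);
}

/* Models the defective ReadToWrite: releases ONE read lock, then requests a
 * write lock and ignores the result, as the SAR says LockDDFiles does. */
static void read_to_write_buggy(void) {
    unlock_dd_read();
    (void)sim_ipc_lock(&dd_res, MODE_WRITE);  /* any failure is silently dropped */
}
```

Taking two nested read locks and then calling read_to_write_buggy() leaves nDDLocks at 1, the simulated IPC resource still holding a read lock, and no write lock: the state in which, per the description above, the subsequent write operations run without a real write lock.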
This defect happens only rarely, and only when a "Just In Time" replication of specs from RBD to the kernel specs on the enterprise server takes place.
As a result of this defect, kernel DDTEXT specs are corrupted on the server, because the IPC lock semantics are not respected.
Found by an extensive review of the code against customer logs.
This defect is present in all releases of Xe, and in B9.
Please include me in the code review for the fix.
MA6817134, 03/21/2002.
B FINAL DISPOSITION
Based on the description of this SAR, there is a problem in JDEDDAPI when a process locks the DDTEXT file multiple times and later tries to update DDTEXT. If this happens, the process will not be able to get a write-mode lock on DDTEXT and therefore will not be able to update it. This will not cause other kernel processes to hang or die.
In the two customer calls mentioned in section A, from which this SAR resulted, there is no evidence that this situation occurred. That said, even if it had occurred, it would not have caused kernels to hang or die, whereas in both customer calls some kernels hung and eventually died. My conclusion is that the problem pointed out by this SAR is a side effect of researching the original problems in those customer calls; it is by no means their cause. So the fix provided here can only fix the problem described in this SAR, not necessarily the problems in the two customer calls. To prove this, I sent the fix to the originator of one of the customer calls and had him test it on their system. It turned out that their problem still existed after applying the fix.
The problem was clearly described in section A. The solution is to decrement the global counter to 0 in ReadToWrite before requesting the write-mode lock, and then, in WriteToRead, to restore the counter to its previous value.
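A matching sketch of the fix, under the same kind of hypothetical single-process model (the names are illustrative, not the real jdeddapi.c functions): read_to_write_fixed saves nDDLocks and drops it to 0 so the simulated IPC read lock is actually released before the write lock is requested; write_to_read_fixed releases the write lock, re-takes the read lock and restores the saved count. Returning the write-lock result instead of discarding it is our own assumption; the final disposition does not say whether the fix also stops LockDDFiles from ignoring the error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fix described in the final disposition.
 * Names are illustrative only, not the real jdeddapi.c code. */

typedef enum { SIM_OK = 0, SIM_INVALID_LOCK_STATE } SimIpcResult;

static int ipc_readers = 0;      /* read locks held at the simulated IPC level */
static bool ipc_writer = false;  /* write lock held at the simulated IPC level */
static int nDDLocks = 0;         /* models the global read-lock counter */
static int savedDDLocks = 0;     /* counter value saved by read_to_write_fixed */

/* Nested read lock: only the first call takes the simulated IPC read lock. */
static void lock_dd_read(void) {
    if (nDDLocks == 0)
        ipc_readers++;
    nDDLocks++;
}

/* Stand-in for a write-mode ipcLockResource call. */
static SimIpcResult sim_ipc_lock_write(void) {
    if (ipc_writer || ipc_readers > 0)
        return SIM_INVALID_LOCK_STATE;
    ipc_writer = true;
    return SIM_OK;
}

/* Fixed ReadToWrite: take the counter all the way to 0 so the IPC read lock is
 * really released, then request the write lock and report the result. */
static SimIpcResult read_to_write_fixed(void) {
    savedDDLocks = nDDLocks;
    if (nDDLocks > 0) {
        nDDLocks = 0;
        ipc_readers--;  /* counter reached 0: release the IPC read lock */
    }
    return sim_ipc_lock_write();
}

/* Fixed WriteToRead: drop the write lock, re-take the IPC read lock and
 * restore the counter to its value before read_to_write_fixed. */
static void write_to_read_fixed(void) {
    assert(ipc_writer);
    ipc_writer = false;
    if (savedDDLocks > 0)
        ipc_readers++;
    nDDLocks = savedDDLocks;
}
```

With two nested read locks held, read_to_write_fixed() now obtains the write lock, and write_to_read_fixed() returns the process to two nested read locks. A real IPC lock is shared with other processes, which this model does not attempt to represent.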
C MEMBERS AFFECTED
MEMBER SRC LIB OBJ LIB
B9_SPF\system\jdekrnl\jdeddapi jdeddapi.c
--------------------------------------------------------
Tony Brackett
The iConsortium
[email protected]