OneWorld crashing on Solaris

kieranm

Hi everyone.

We've had some issues recently with all the OneWorld processes dying simultaneously on our Sun application server. This has happened about four times in the last two weeks, and nothing that I can think of has changed on the system.

There are no messages in any of the JDE logs, and our Unix people are telling me that there is nothing in the Unix logs either. I'm a little surprised that nothing is logged by the OS, compared with other OSs (such as OS400).

Has anyone had these sorts of issues, or does anyone know of a way to capture more information about what is happening? I've logged a call with the RL, but their response has been slow.

Thanks
Kieran


Xe Sp22_M1, Solaris 8, Oracle 8.1.6
 
You don't by any chance have LogErrors=0, or set to NONE, do you? Post the JDE.INI and the logs from your Sun server (dummy up the passwords, unless you'd like to share them with your coworkers and anyone else who has access to JDEList).
 
Hi
Here's a copy of our server JDE.INI. We have LogErrors=TRUE.

Unfortunately nothing shows up in our logs (apart from normal messages) when the processes die.

Thanks
Kieran

[DEBUG]
Output=NONE
Trace=FALSE
ClientLog=1
DebugFile=/data/jdedwardsoneworld/b7333/log/jdedebug.log
JobFile=/data/jdedwardsoneworld/b7333/log/jde.log
GlobalCompactSizeInit=1024
GlobalCompactSizeDestroy=0
LogErrors=TRUE
JDETSFile=/data/jdedwardsoneworld/b7333/log/JDETS.log
KeepLogs=1
RepTrace=0

[TAM]
TAMTraceLevel=0

[MEMORY DEBUG]
Frequency=10000
Full=1

[SVR]
EnvType=1
EnvironmentName=PD27333
SpecPath=spec
SourcePath=source
ObjectPath=obj
HeaderPath=include
HeaderVPath=includev
BinPath=bin32
LibPath=lib32
LibVPath=libv32
MakePath=make
WorkPath=work
CodeGeneratorPath=cg
ResourcePath=res
HelpPath=helps
NextIDPath=nextid
LibraryListName=PD27333

[INSTALL]
DefaultSystem=system
ClientPath=client
PackagePath=package
DataPath=data
B733=/data/jdedwardsoneworld/b7333
Double_Byte=0
LocalCodeSet=WE_ISO88591

[DB CACHE INFORMATION]
ODBC Tables=50
Maximum Request Cache=50
Library Cache=15
#Next line added as per KG Document, by Kieran 04 March 04
DataCaching=1

[JDEIPC]
ipcTrace=0
maxNumberOfSemaphores=300
startIPCKeyValue=11000

[JDEMAIL]
Rule1=90|OPT|MAILSERVER=mailhost
Rule2=100|DEFAULT|[email protected]
Rule3=110|DEFAULT|[email protected]
Rule4=120|DEFAULT|[email protected]
Rule5=130|OPT|MERGELOCAL=1
Rule6=140|OPT|UPDATELOCAL=0

[JDENET]
serviceNameListen=6009
serviceNameConnect=6009
maxNetProcesses=5
maxNetConnections=1000
maxKernelProcesses=48
maxKernelRanges=13
netTrace=0
HandleKrnlSignals=0

[JDENET_KERNEL_DEF1]
krnlName=JDENET RESERVED KERNEL
dispatchDLLName=libjdenet.so
dispatchDLLFunction=JDENET_DispatchMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF2]
krnlName=UBE KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchUBEMessage
maxNumberOfProcesses=3
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF3]
krnlName=REPLICATION KERNEL
dispatchDLLName=libjderepl.so
dispatchDLLFunction=DispatchRepMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF4]
krnlName=SECURITY KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchSecurity
maxNumberOfProcesses=2
numberOfAutoStartProcesses=1

[JDENET_KERNEL_DEF5]
krnlName=LOCK MANAGER KERNEL
dispatchDLLName=libtransmon.so
dispatchDLLFunction=TM_DispatchTransactionManager
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF6]
krnlName=CALL OBJECT KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchCallObjectMessage
maxNumberOfProcesses=20
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF7]
krnlName=JDBNET KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchJDBNETMessage
maxNumberOfProcesses=2
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF8]
krnlName=PACKAGE INSTALL KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchPkgInstallMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF9]
krnlName=SAW KERNEL
dispatchDLLName=libjdesaw.so
dispatchDLLFunction=JDEK_DispatchSAWMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF10]
krnlName=SCHEDULER KERNEL
dispatchDLLName=libjdeschr.so
dispatchDLLFunction=JDEK_DispatchScheduler
maxNumberOfProcesses=1
numberOfAutoStartProcesses=1

[JDENET_KERNEL_DEF11]
krnlName=PACKAGE BUILD KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchPkgBuildMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF12]
krnlName=UBE SUBSYSTEM KERNEL
dispatchDLLName=libjdeknet.so
dispatchDLLFunction=JDEK_DispatchUBESBSMessage
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF13]
krnlName=WORK FLOW KERNEL
dispatchDLLName=libworkflow.so
dispatchDLLFunction=JDEK_DispatchWFServerProcess
maxNumberOfProcesses=4
numberOfAutoStartProcesses=0

[NETWORK QUEUE SETTINGS]
UBE Semaphore Key=3600
DefaultPrinterOUTQ=laserjet
UBEQueue=QB7333

[BSFN BUILD]
BuildArea=/data/jdedwardsoneworld/b7333/packages
OptimizationFlags=-O
DebugFlags=-g -D_DEBUG -DJDEDEBUG
InliningFlags=
DefineFlags=-DKERNEL -DPRODUCTION_VERSION -DNATURAL_ALIGNMENT -D_SUN_SOURCE
CompilerFlags=-xCC -Xa -misalign -KPIC -c
OSReleaseLevel=
LinkFlags=-dy -G -L/data/jdedwardsoneworld/b7333/system/lib -ljdesaw
LinkLibraries=
SimultaneousBuilds=1

[UBE]
UBEDebugLevel=0
UBEPrintDataItems=1
UBESubsystemLimit=10


[DB SYSTEM SETTINGS]
Version=43
Default User=XXXX
Default Pwd=XXXX
Default Env=PD27333
Default PathCode=PD27333
Base Datasource=MOLLY - B7333 Server Map
Object Owner=SVM7333
Server=CASE
Database=ONEWORLD
Load Library=libora80.so
Decimal Shift=Y
Julian Dates=Y
Use Owner=Y
Secured=Y
Type=O
Library List=
TriggerLibrary=JDBTRIG

[LOCK MANAGER]
Server=
AvailableService=NONE
RequestedService=NONE

[JDB RECORD LOCKING]
;Note: When assigning the following setting, take into
;consideration the network traffic and overall system performance.
QueryTimeout=60

[SERVER ENVIRONMENT MAP]

[SECURITY]
User=XXXX
Password=XXXX
DefaultEnvironment=PD27333
DataSource=System - B7333
SecurityServer=molly
ServerPswdFile=TRUE
History=0

[CLUSTER]
;PrimaryNode=sundev1
 
Is anything being logged in your start/stop log files, or in any of the log files in the installation mount root?

Are any of the UNIX kernels/processes going into zombie state? Run "top" on your system and see how the resources are being used.
 
I don't know if this might be your culprit, but here is a recent posting from our iWATCHDOG services about Solaris crashes:

Hello folks,



We have discovered an outstanding issue with user concurrency computation. This was simulated on a Sun V880 box running Oracle 9.0.2.4 on Solaris 2.8 for Denver Labs. Denver has acknowledged this as a problem and suspects it is prevalent on all releases using the SP23/SP22 foundation (it may not apply to all Enterprise platforms because of the IPC handling pertinent to that foundation; all Unix-like platforms are suspect). Our records show that you are on the targeted foundation release, so you are being informed about the problem as part of our new iWATCHDOG services initiative. Please ignore this mail if you are not on the targeted foundation release.



Denver is working on CASE ID 4045025, which is registered under one of our esteemed clients. We are trying to get an engineering fix or a possible roll-in into SP23 M1 (which is highly unlikely, based on the schedule of the one-off release: http://www.peoplesoft.com/corp/en/update_fix/kgwrapper.jsp?app=uc). We have the luxury of testing the fix on the asymmetrical production server configuration of one of our clients and regression-testing it in partnership with Denver engineers. We will keep you posted.



Problem: DDTEXT goes to zero bytes in some unusual situations, and OneWorld crashes.



Similarity: Denver is trying to determine whether the fix they put into the 8.9 tools applies to this issue. Here is the SAR for that fix:



Title
5879131 IPC Lock issues in JDEDDAPI RPT1068473000 SystemH93




Document Detail
SAR Number: 5879131
ICE Report ID: 1068473000
Parent Number: 4558984
Program ID: JDB -
System Code: H93 - Database and Communications
Release Fixed: 8.9
Product: EnterpriseOne Tools and Technology
Priority: 1 Critical
Type: 1 - Correction
Status: 01 - Completed
Code Change: (none)
Date Entered: 2002-03-21
Date Complete: 2002-06-13

Description

A ORIGINAL REQUEST

This SAR is the result of investigations into:
- Call 4857368
- Call 4922159

Defect:
In the file system/jdekrnl/jdeddapi/jdeddapi.c, a global counter (nDDLocks) is used to count how many IPC **READ** locks are created when calling LockDDFiles. In the function ReadToWrite(), only one read lock is released, and then LockDDFiles is called to get a WRITE lock. If the number of read locks was greater than 1 before calling ReadToWrite:
-> 1) the read lock at the IPC level is not unlocked,
-> 2) the lock operation in write mode fails at the IPC level (ipcLockResource returns eIPCInvalidLockState),
-> 3) the error is ignored in LockDDFiles.
The net result is:
-> At the IPC level, the lock is still a READ lock, not a WRITE lock.
-> The code in JDEDDAPI performs WRITE operations (TamAdd, ...) against the spec files.
This defect happens only rarely, and only when a "Just In Time" replication of specs from RBD to the kernel specs on the enterprise server takes place. As a result of this defect, kernel DDTEXT specs are corrupted on the server, because the IPC lock semantics are not respected.
Found by extensive review of code against customer logs.
This defect is present in all releases of Xe, and in B9.
Please include me in the code review for the fix,
MA6817134, 03/21/2002.

B FINAL DISPOSITION

Based on the description of this SAR, there is a problem in JDEDDAPI when a process tries to lock the ddtext file multiple times and later tries to update ddtext. If this situation happens, the process will not be able to get a write-mode lock on ddtext, and therefore will not be able to update ddtext. This will not cause other kernel processes to hang or die. In the two customer calls mentioned in section A, from which this SAR resulted, there is no evidence that the above situation happened. That said, even if this situation had happened, it would not cause kernels to hang or die. In both customer calls, some kernels hung and eventually died. My conclusion is that the problem pointed out by this SAR is a side effect of researching the original problems in those customer calls; it is by no means the cause of the problems in the customer calls. So the fix provided here can only fix the problem described in this SAR, not necessarily the problems in the two customer calls. To prove this, I sent the fix to the originator of one of the customer calls and had him test it on their system. It turned out that their problem still existed after applying this fix.

The problem is clearly described in section A. The solution is to decrement the global counter to 0 in ReadToWrite before requesting the write-mode lock, and then, in WriteToRead, restore the counter to its previous value.

C MEMBERS AFFECTED MEMBER SRC LIB OBJ LIB

B9_SPF\system\jdekrnl\jdeddapi jdeddapi.c

--------------------------------------------------------

Tony Brackett
The iConsortium
[email protected]
 
Thanks for the info so far.

Nothing is getting logged in our stop.log (JDE doesn't even think it has been shut down!). The start.log typically looks like this:

Tue Dec 13 18:21:57 NZDT 2005 Starting JD Edwards OneWorld on molly
delete all MessageQueues, Shared Memories, and Semaphores, range:
0x00002af8 to 0x00002edf [11000 to 11999], owned by jde
Starting jdenet_n...
Running cleanup to check for unfinished jobs...
Starting OneWorld spec install queue...
Starting OneWorld batch queues...
Starting OneWorld package queue...

Tue Dec 13 18:22:13 NZDT 2005 JD Edwards OneWorld startup complete.


I will monitor more closely for zombie processes and check top, but most of the time our box is not breaking a sweat.

Thanks
Kieran
 