CallObject kernels hanging with high CPU and BSFNs timing out

antoine_mpo

Reputable Poster
Hi List,

We were previously on EnterpriseOne Xe SP20 (Windows 2000 servers, Oracle 8i database, WebSphere 4), but we are migrating to a new platform.
The new platform is Xe SP23_Q1 on Windows 2003 servers (most of them VMware virtual machines, except for the Oracle server), with an Oracle 10.2.0.3 database and WebSphere 6.0.2.13.
We first installed a development platform (a single Oracle/enterprise/WebSphere server, no VMware), and now we are installing the production platform.
But we are experiencing an issue on both SP23 platforms (dev and prod) that we can't reproduce:
During some Voucher Match processes (P4314), we sometimes have CallObject kernels hanging (most of the time, two of them). Each of these Windows processes takes 25% of the CPU (the level stays that high constantly, with no decrease), but they don't seem to be doing any real work, because all BSFN calls end in a timeout.
We have hit the issue several times, during the same tests, but never while I was watching for it with full logs activated.
So the only thing I can see is that the first BSFN ending in a timeout is F4314EditLine (according to the JAS logs), but there is nothing special in the jde.log for these processes.
I opened a case, but as you may imagine, they don't have enough information.

That's why I'm posting here, to find out whether someone has ever encountered similar issues.

We don't have such an issue on our previous SP20 platform.
The only ESUs we installed on the SP23 platform are technical ones (the ESU Planner required for SP23, an ESU related to P98OWSEC, and an ESU for a missing index on the F91300 table, used by the Scheduler).

Any help would be appreciated.

Thanks for your help.
 
Hi Antoine,

Could you post the JAS and Enterprise Server logs?
They may give us a hint about what's going on.
Thanks,
 
Hi Sebastian,

Sorry I didn't answer sooner, but we went live with SP23 in production on Tuesday, and I've been a bit busy.
Now that we are in production under full load, we hit the issue more often. We kill jdenet_k processes several times a day.
It always seems to happen in the same transaction (voucher match). Yesterday a user tried 4 times to enter her voucher match, and we had to kill 4 jdenet_k processes ...
So I asked the user to come to my office and try again while I watched and collected logs. I enabled, on the fly, jdedebug on the CallObject kernel the user was connected to.
And guess what?
As usual, nothing happened! She could validate her voucher match without any issue!
Damn, it's driving me crazy!
Have you ever met a vicious problem that stops occurring as soon as you enable debug logs?


The only thing I noticed is that jde.log always looks the same before a CallObject kernel crashes. The last entries are always several lines like the following:

2704/7144 Thu May 31 10:10:49 2007 JDEERR.C1679
COB0000120 - Business function passed incorrect error map to CallObject!

And that's all.

I attached a zip file (CallObject_Issue.zip) with the following information:
a jde.log of a CallObject kernel (Log_CallObject_pid4372.pdf)
a jas.log (jas.log_Q6_port9082.txt)
a real-time log (rt.log_Q6_port9082.txt)
a screenshot of the enterprise server's Web SAW, showing the crashed CallObject kernels with several outstanding requests (saw_Serveur_Q11.jpg)
a screenshot of the web server's SAW with the timed-out BSFN calls (timeout_Q6_port_9082.jpg)


Thanks for your help,
Cheers,
 

Attachments

  • 121193-CallObject_Issue.zip
    199.7 KB
Antoine,

I believe I had a similar problem on E810 with TR 8.94D1. The symptoms were exactly the same, and the only resolution was to kill the kernel.

It happened very randomly, and we couldn't produce debug logs. I don't recall ever getting a resolution for it. I was frustrated, just like you.

Now we are on TR 8.96c1 and we don't have that issue anymore.
Jaise
 
Hi Antoine,

It seems your BSFNs are "dying" because they time out after 300000 ms (5 minutes) of waiting, which should be more than enough time to flush the JDE caches to the database.
Have you updated statistics and regenerated indexes for all tables between F40 and F43?
All of the failing calls I saw were related to F42xxEndDoc, F43xxEndDoc, etc. EndDoc is the part where the BSFN collects all of its cache records and writes them to the database tables.
I also recommend that you purge the F40UI*, F41UI*, F42UI* and F43UI* tables from time to time. Those are temporary tables used by the JDE caches, and they tend to accumulate thousands or millions of records. That purge has to be run when users are gone and no scheduled UBEs are running on the system.
Those tables are used intensively as temporary storage and could be slowing down these BSFNs too.
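That purge can be sketched as a small script that generates the SQL rather than running it. The PRODDTA schema name and the table list below are assumptions to adapt to your environment; the output is meant to be reviewed and then fed to sqlplus during a quiet window:

```shell
#!/bin/sh
# Sketch only: generate purge statements for the JDE cache work tables.
# Assumptions: the tables live in a PRODDTA schema, and the resulting
# SQL is run via sqlplus only when no users or scheduled UBEs are active.
SCHEMA="PRODDTA"
for t in F40UI74 F41UI001 F42UI01 F42UI11 F42UI12 F42UI130 \
         F42UI520 F42UI521 F42UI800; do
  echo "TRUNCATE TABLE ${SCHEMA}.${t};"
done
```

TRUNCATE is faster than DELETE and resets the table's high-water mark, but it cannot be rolled back, which is one more reason to run it only when nothing else is touching the system.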
 
We had the same issue with the same business functions on Xe at my previous company. We used to have multiple occurrences a day, but as data was cleaned up and more proactive database maintenance was implemented, they almost went away completely, with no major application or program changes. From that experience I would say the source of the problem has more to do with the data than with the programs themselves.
 
Hi Sebastian,

Thanks for having taken a look at my log files and for your suggestions.

About regenerating indexes: no, we didn't do it. But the database was recreated last weekend (by exporting data from our Oracle 8 database and importing it into the Oracle 10g database), so is it still necessary to regenerate indexes? (Furthermore, I already talked with our DBA about index regeneration, and it seems this point is a big subject of debate in the Oracle DBA world. According to Tom Kyte, the well-known Oracle expert behind the "Ask Tom" website, index rebuilding is useless except when an index is invalid.
But I'm not an Oracle DBA, so ...)
About the statistics update, we let Oracle update them automatically.

I took a look at the tables you mentioned that are used by the JDE caches. Most of them are empty. Here are the count results:
F40UI00T : 0
F40UI002 : 0
F40UI16 : 0
F40UI17 : 0
F40UI74 : 437
F40UI801 : 0
F40UI84 : 0
F41UI001 : 0
F41UI002 : 0
F41UI003 : 0
F42UI01 : 154129
F42UI11 : 53
F42UI11C : 0
F42UI12 : 53
F42UI130 : 40
F42UI520 : 644
F42UI521 : 709
F42UI800 : 24
F43UI50D : 0
F43UI50H : 0

I checked in my development database, and the results are pretty much the same.
Do you think that F42UI01, with its 150,000+ records, could be a problem?

Should we try to delete all its records during the next Oracle shutdown?

In my previous post it may have sounded like I was joking about the issue not occurring when I activate jdedebug ... but honestly, I'm not! That's really how it happens!
And so far it's my only "workaround"! The user(s) doing that transaction call me beforehand, I look in the enterprise server's Web SAW to find which CallObject kernels that user's login is connected to, I activate jdedebug on those processes, and then they can enter their voucher without issue.
 
This is probably correct, but your stats are way off or missing after such an import. "Automatically" never works. Create the stats manually and try again; this should improve things.
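A minimal sketch of what gathering stats manually could look like, again generating the commands rather than running them. The PRODDTA schema and the table list are assumptions; the output would be run through sqlplus with a suitably privileged account:

```shell
#!/bin/sh
# Sketch only: emit DBMS_STATS calls to gather fresh optimizer statistics
# after an export/import, since imported tables can carry stale or no stats.
SCHEMA="PRODDTA"
for t in F0411 F4311 F43121 F42UI01; do
  echo "EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => '${SCHEMA}', tabname => '${t}', cascade => TRUE);"
done
```

Passing cascade => TRUE also refreshes the index statistics, which, as far as the optimizer is concerned, sidesteps the rebuild-vs-no-rebuild debate.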
 
This may be late, but Voucher Match does not use those work tables; the issues are with the input PO tables. From the SAW screenshots, I can tell you there are database locks or other slowness caused by business functions. I don't trust the performance of the Xe functions. I had similar issues in Xe back in 2001 with some of the next-number retrieval functions, which do full table scans on the transaction table when checking the next number against the existing records. Whenever we updated statistics it would work fine for a while, then take forever again later. You can tell from the debug logs. But at the same time, there may be other locks on the input tables.
 
Antoine, you might want to bear in mind that Oracle 10.2.0.3 is not yet supported with JDE (check the MTRs).
We hear that certification may start to come through from August onwards.
 