Could ProcOpt Data Size Mismatch be causing my Zombie Kernels issue?

HighOnTek

My apologies for the duplicate; I forgot to attach my logs to my last post. Feel free to delete the one with no attachments.
We're running JDE 9.0 on an AS400.

Am I headed in the right direction in thinking that our zombie kernel issues are being caused by these RUN0000066 warnings I'm seeing over and over in each zombie log?
Our P13714, P13730, and P13732 have undergone a few customizations over the years.

I can see the Call Object Kernel starting up, followed by a RUN0000066 warning. Shortly thereafter, from what I can tell, we get a zombie kernel.
I've attached two log files for review; an example is shown below.

196955/422 SYS:Dispatch Mon Nov 10 08:13:17.094032 jdekdisp.c2398
KNT0000888 - Call Object Kernel Thread Pool in multi-threaded mode.

196955/422 SYS:Dispatch Mon Nov 10 08:13:17.094376 jdekdisp.c2405
KNT0000999 - Call Object Kernel Thread Pool Setting: size 20, increment 1

196955/425 WRK:OKEA_80000000_P13714 Mon Nov 10 08:14:34.650880 rtk_frms.c648
RUN0000066 - Warning - ProcOpt Data Size Mismatch: Requested 1226 is less than Retrieved 1228 for App ,Version .… Data Structure allocated successfully. This usually means items have been added to the template and existing… business function will function correctly.

etc.

Would redeploying the data structures for P13714, P13730, and P13732 resolve this issue, or is something else going on that I'm not seeing?

Any assistance, input, or feedback is appreciated. Complete newbie here.
Thanks,

HTK
 

Attachments

  • JDE_171387.txt (9.7 KB)
  • JDE_171884.txt (14.2 KB)
Couple of things. Just redeploying the processing option templates is not going to fix those warnings. Unless things have changed in your tools release, the .h file that gets generated for a PO template is not checked in to the deployment server, so you have to take the contents of that .h file and put it in your BSFN .h file; you can't do an #include. Doing the latter (putting the updated struct definition in the BSFN .h file) and then doing a full build will get rid of the warnings, but probably not your zombies (although the full build might inadvertently fix that issue as well).

The warnings about the proc opt templates are annoying, but they are just that: warnings. Basically, the C API that allocates memory and fills the PO template DS takes a size as a parameter, and the warning is just saying that the caller of the API is using a smaller DS (i.e., an old struct without the new params), so the requested size is less than the data that is available. The API will allocate the requested memory and only fill up to the requested size; in other words, it's not causing a buffer overrun and a subsequent zombie.

Now, having said all that, there could be some weird scenario or code path related to the changed PO templates... anything is possible. Best practice, IMO, is that any time you change a PO template, you find the struct definition in the .h file(s) and update it.
 
Thank you for taking the time to reply, B0ster; I appreciate it.
We ended up doing a full build in June; it didn't seem to reduce these zombies, let alone resolve them. We average 3 to 5 a day, sometimes more. :(
We've also applied a few ESUs, with no difference.

Is there anything else I could be doing or checking to identify the root cause of these zombie kernels?

As you saw from my thread in the DEV section, another theory that came to mind is that the developer who worked on our P13714, P13730, and P13732 may have created a new BSFN, or customized an existing one, and used a global or static variable, which would make it not thread-safe. Complete shot in the dark; thoughts?
 
Do you know for sure that the developer used a global or static variable, or are you just wondering if that could be the problem? If you have the code where the static/global variable is used, post it and we can probably determine whether it might pose a problem. I find it hard to believe that a developer would use a static or global variable... if he or she did, they may have had a valid reason to do so.

Are you able to reproduce the zombie kernel at all? Do you know what process is running when the zombie happens? Do you have a stack dump or stack trace that shows the function that caused the zombie?
 
Based on the two logs above, the kernels crashed while running IsStringBlank and MathZeroTest, neither of which should cause a kernel to crash. This usually indicates that memory was corrupted before these functions were called. This behavior is common in C-based programs and makes the cause of the issue hard to identify. Try increasing the logging level on your kernels (or the entire subsystem) to *SECLVL; you should see more information in your job logs after you do this. Hope this helps.
 