UBEs submitted to AS/400 go to a TIMW status.

sashton

Reputable Poster
Hi List,

Okay, here is my strange situation. I built a full PY package over the weekend, and everything finished successfully. Yesterday (Monday) morning, though, when I came in to work, there were 20+ jobs running in QBATCH at status MTXW (mutex wait). I have seen that before when deploying packages while jobs are running, but I had already deployed my full PY successfully. When I looked closer, it turned out the deployment to the AS/400 server had taken 3.5 hours, way more than the usual 10 min. So I figured something must have gotten hosed in the deploy.

I killed all the running jobs, put the job queues on hold, and redeployed the full package. This time it was fine: it only took 7 min instead of 3.5 hours. But when I started to submit jobs, they still went to either SEMW (semaphore wait) or MTXW. This confused me, so we ended up just IPLing the machine and bringing it back up. Once that was complete, I ran an R0006P, and this time it just went to TIMW (time wait). After about 7-8 min, it came out of time wait and finished.

I thought that perhaps the package was bad, so last night I rebuilt another full PY package. Everything built and deployed fine, with normal times. Again, all UBEs that are submitted immediately go into TIMW, then "wake up" after several minutes and complete normally. But why? What is now causing them all to go to TIMW for such a long time?

Any help would be greatly appreciated. I have attached a PDF with screenshots of the call stack of one of the UBEs in time wait. Any idea what some of the process names are, such as "jdesleepmilliseconds"?
 

Attachments

  • 102672-Processes.pdf
    93 KB · Views: 111
Hi Steven,
Check from WRKACTJOB whether any kernel is at status SELW/SEMW/DEQW,
and check from SAW whether any kernel has gone into a "Zombie" state.
If either of those two cases applies, bounce the JDE services once again.
Clear the interprocess communication (IPC) resources and then bring the services back up:
ENDNET B7334SYS
CLRIPC
STRNET B7334SYS.

Check the kernel logs from SAW, especially the UBE kernel logs.
Does the UBE kernel start after the services are bounced?
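
If it helps while watching the system, here is a minimal Python sketch of the status check described above. It assumes you have typed or pasted job names and statuses into simple "JOBNAME STATUS" lines yourself; that two-column format and the function name `stuck_jobs` are hypothetical, not a real WRKACTJOB or SAW export.

```python
# Wait statuses mentioned in this thread as worth a closer look.
WAIT_STATUSES = {"MTXW", "SEMW", "SELW", "DEQW", "TIMW"}


def stuck_jobs(rows):
    """Return (job, status) pairs whose status is one of WAIT_STATUSES.

    `rows` is a hypothetical list of "JOBNAME STATUS" strings copied by
    hand from a job display; it is not a real WRKACTJOB output format.
    """
    flagged = []
    for row in rows:
        parts = row.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job, status = parts
        if status.upper() in WAIT_STATUSES:
            flagged.append((job, status.upper()))
    return flagged


# Made-up sample rows for illustration only.
sample = ["JDENET_K SELW", "R0006P TIMW", "R09801 RUN", ""]
print(stuck_jobs(sample))
```

A status on this list is only a hint of where to look; whether a given wait is a problem depends on the job and on how long it stays there.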
 
Hmmm... That does indeed sound strange. Were there any upgrades, OS PTFs, or other OS changes going on before the PY build?
Anyway, are you running multi-foundation? You can turn on debug for the eServer, or on the AS400 change the JDENET_K jobs to seclvl 00-40 and logCL *yes. When you bounce services you'll get job logs. Post back if you figure it out.
Also, have you cleared out SQL packages lately? Is there anything in the JDE.LOG in the IFS?
 
This one is odd. It has kind of just cleaned itself up. I haven't done anything special (at least I don't think I have), but the jobs seem to be running on a more normal basis now. I'll let you know if I uncover anything else.
 
Have been having the same thing for many months now. You have your good and bad days.
 
Same thing happened to us. Jobs would go to TIMW or MTXW. JDE recommended tweaking the JDE.INI file on the AS400, specifically the maxNumberOfProcesses and maxNetProcesses settings, and we have been okay since then.
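
For anyone who wants to compare those two settings across environments, a rough Python sketch follows. It assumes you have copied the INI text off the server as plain text. The section names ([JDEIPC], [JDENET]) and the values in the sample are assumptions for illustration only, not recommendations, and `read_process_limits` is a made-up helper name.

```python
import configparser

# Hypothetical excerpt of a JDE.INI; section names and values are
# illustrative assumptions, not recommended settings.
SAMPLE_INI = """
[JDEIPC]
maxNumberOfProcesses=1000

[JDENET]
maxNetProcesses=4
"""


def read_process_limits(ini_text):
    """Return {"SECTION.key": value} for the two settings named above."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key capitalization as written in the file
    parser.read_string(ini_text)

    wanted = {"maxnumberofprocesses", "maxnetprocesses"}
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.lower() in wanted:
                found[f"{section}.{key}"] = value
    return found


print(read_process_limits(SAMPLE_INI))
```

The lookup matches keys case-insensitively because the capitalization of these settings varies between posts and files. A real JDE.INI may contain lines that Python's `configparser` does not accept, so treat this as a starting point only.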
 
Call IBM while it's happening. Using SST tools, they can look a lot lower in the operating system call stacks than what we see with WRKJOB, and they can tell you what's happening. I once had the same situation with TIMWs and also had to IPL; IBM said that the services were waiting on some object that was not going to be available until the instruction after the one waiting was executed. For me, that first IPL fixed the issue and it hasn't happened again, but you may have some other software that is in contention(?)

You haven't upgraded to a new SP or anything recently?
 