Severe TIMW on iSeries 570

jlowell

We are currently experiencing severe issues with Time Wait (TIMW) status on our 570. IBM states that it is not their issue, since TIMW states are handled and issued by the application. A few UBEs in question (R12807, R42800, and R47011) will go into TIMW status for hours and hold up the rest of the jobs. These jobs go into a coma and end several hours later without errors. They complete the task, but only after taking several hours. PeopleSoft asks for logs, but there are no logs stating that anything is going wrong. I am chasing a ghost.

The IBM call stack clearly shows a PeopleSoft program invoking a wait state.

PeopleSoft suggested that we delete SQLPKG files, but we already do that on a regular basis. As it is, PeopleSoft suggests we remove SQLPKG files on almost every call.

We run v5r3, are on the latest IBM CUM levels per the IBM/PeopleSoft APAR, meet the PeopleSoft MTR, and are on service pack 8.93_I1. The service pack is a bit old, but PeopleSoft doesn't think it is a problem in our case.

Anyone with similar issues?
 
I’ve seen the TIMW state in a couple of circumstances.

The more common one was where I called a sleep() function in my program and passed the length of time I wanted the job to be ‘asleep’. Another similar function is waittime().

The other was a socket program I wrote that called the accept() function, which waits for an incoming connection on that particular socket. This also shows as a TIMW state. Other socket functions, such as recv(), may do the same.
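
As an illustration only, here is a minimal Python sketch of the two kinds of blocking call described above. It is an analogy, not what the UBE code does: the UBEs are not written in Python, and the function names timed_sleep and timed_accept are made up for this example. While either call is blocked the process is simply waiting and uses no CPU.

```python
import socket
import time


def timed_sleep(seconds: float) -> float:
    """Block in a plain sleep and return how long the call actually took."""
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start


def timed_accept(timeout: float) -> float:
    """Wait for a connection that never arrives; return the time spent blocked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(timeout)
        start = time.monotonic()
        try:
            listener.accept()
        except socket.timeout:
            pass  # nobody connected, so the wait simply expired
        return time.monotonic() - start
```

Without the settimeout() call, accept() would block until a client connects, however long that takes.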

Based on the comments you received from IBM, it sounds like that particular UBE program is calling a sleep()-like function. IBM can’t control program execution, so obviously they’ll state that the problem lies with the UBE program. However, you state that the UBE used to run fine, so there is some external factor driving this program into prolonged sleep states. So far you’ve been investigating SQL packages, but the problem could lie somewhere else.

Where exactly, it’s impossible to say without more information.

Perhaps if you post your call stack at the sleep() time and the joblog up to that point – DSPJOBLOG OUTPUT(*PRINT) – someone will be able to offer some suggestions?

Otherwise, it seems to me that the right people to look at this would be Oracle development support as they know exactly what the UBE program is doing and what could drive it to enter lengthy sleep periods.
 
May I suggest running the UBE with tracing on to generate the jdedebug.log? You may find the point at which the program is freezing, and whether it is within the UBE kernel or in the ER of the report.
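
Once a trace exists, one way to locate the freeze is to look for the longest pause between consecutive log entries. The Python sketch below does that under a loud assumption: it expects each line to start with a timestamp in the layout given by TIMESTAMP_FORMAT, which is a guess and not the documented jdedebug.log format, so adjust it to match your file. The function name largest_gap is made up for this example.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# ASSUMPTION: each log line starts with a timestamp in this layout.
# Adjust the format and width to whatever your jdedebug.log really contains.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_WIDTH = 19


def largest_gap(lines: List[str]) -> Optional[Tuple[float, str, str]]:
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    best = None
    previous_time = None
    previous_line = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_WIDTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # continuation line or unexpected layout: skip it
        if previous_time is not None:
            gap = (stamp - previous_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, previous_line, line)
        previous_time, previous_line = stamp, line
    return best
```

The two lines returned on either side of the gap show what the job was doing just before it stalled and what it did first when it woke up.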

Strange how it's only on certain UBEs. I know R42800 pretty well, and there aren't any waittime() or sleep() system calls. I would guess it's having a problem writing the PDF or reading specs or sending a network message. Do these UBEs generate output with a lot of pages?

I once saw TIMW happen on every job because the server JDE.INI had the mail server set to jdedwards.com and UBE completion messages were being forwarded via SMTP. The UBE kernel would wait 3 minutes before timing out.
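
A quick way to rule this out is to check, from a machine on the same network, whether the configured mail host answers at all. The following is a minimal Python sketch; the function name can_connect is made up for this example, port 25 is the standard SMTP port, and the host to test is whatever your own JDE.INI names.

```python
import socket


def can_connect(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, DNS failure, or timed out
```

If this returns False for the mail host, every job that tries to send a completion message is likely to sit waiting for its own timeout to expire.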
 
Strange, I just started seeing TIMW on UBEs myself, a few days after putting the latest CUM on a 550. The weird thing is that the latest CUM was applied on our DEV partition only, yet the TIMW jobs occur on the Production LPAR. I've managed to hold the UBEs and then release them, and that seems to wake them up. I've only noticed this in the past 2 days and haven't recorded which jobs are affected, but I will start noting them down.
 
[ QUOTE ]
May I suggest running the UBE with tracing on to generate the jdedebug.log? You may find the point at which the program is freezing, and whether it is within the UBE kernel or in the ER of the report.

[/ QUOTE ]

Turning on tracing is good advice, but it has been difficult to schedule it and then get the problem to occur. In my experience, tracing slows down production, and the TIMW jobs are already causing enough problems, so I am stuck between a rock and a hard place.

I installed the latest PTFs to meet the PeopleSoft MTR. The problem existed prior to the PTFs. I opened a call with IBM concerning this issue and they had nothing to offer me at this time. IBM and I went through a few procedures, but came up without a smoking gun.
 
Are you using Citrix at all? I noticed that the UBEs that went into extended TIMW for me were being run by a user on a Wyse thin terminal device and not by the normal PC Citrix ICA user.
 
[ QUOTE ]
Are you using Citrix at all? I noticed that the UBEs that went into extended TIMW for me were being run by a user on a Wyse thin terminal device and not by the normal PC Citrix ICA user.

[/ QUOTE ]

I am using Citrix; however, I had the UBE, R42800, submitted via Fat Client, using a role without any security and the results were the same.

There is a "breaking news" link on the Peoplesoft site listing must-have ESUs to help with performance problems. That may be our next step.
 
Hi,

We have a similar configuration to what you describe, but on the v5r2 release. We also suffered severe TIMW issues at one stage. At the time, we upgraded to the latest Database Fixpack level (which was more recent than the one specified on the IBM APAR document). We also upgraded to Tools Release 8.93_R1. Another thing to try is to turn off all Real Time Events and then see if you still get the issue.

Hope this helps.

Best Regards,

Sanjeev
 
[ QUOTE ]
We have a similar configuration to what you describe, but on the v5r2 release. We also suffered severe TIMW issues at one stage. At the time, we upgraded to the latest Database Fixpack level (which was more recent than the one specified on the IBM APAR document). We also upgraded to Tools Release 8.93_R1. Another thing to try is to turn off all Real Time Events and then see if you still get the issue.

[/ QUOTE ]

This is good information. From what I see, most of the programs in the service packs were compiled on a v5r2 level system.

Can you explain where I can find out about "Real Time Events"? By how much did your TIMW improve? How often do you delete SQLPKG files?

We have been bitten by loading the latest CUM PTFs and hipers. At one time we were actually ahead of PS on this matter and the application crashed. IBM had a fix in 3 days. Now, we try to stay one level behind the latest.
 
Hi,

"Can you explain where I can find out about 'Real Time Events'? By how much did your TIMW improve? How often do you delete SQLPKG files?"

The RTEs can be marked inactive using the P90701 application. The TIMW issue was holding up our go-live preparation, as some of the data migration jobs were affected. We resolved the issue altogether through the actions I described. We delete SQL packages on a weekly basis but also do it 'ad hoc' if we are taking the OneWorld Services down for any reason.

"We have been bitten by loading the latest CUM PTFs and hipers. At one time we were actually ahead of PS on this matter and the application crashed. IBM had a fix in 3 days. Now, we try to stay one level behind the latest."

We had a senior JDE person from Denver fly over and assist us with this issue. At the time, he actually recommended we upgrade to the latest Database Fixpack.

Unfortunately, we are currently experiencing some different issues with UBEs running on our iSeries, which are under investigation (but not TIMW issues!).

With regards to the Performance ESUs, we haven't applied these as such. But we did apply over 100 ESUs when we upgraded from 8.9 to 8.10 (as recommended by Oracle).

I'd imagine you have a case open with Oracle?
 
[ QUOTE ]
I'd imagine you have a case open with Oracle?

[/ QUOTE ]

Yes, we have an open case with Oracle. We actually have lots of open cases with Oracle. We are paying for support, so I use them as a resource as much as I can. I imagine their AS400 people are really getting sick of me.
 
Hi,

I'm curious to know how you are getting on? Any progress on this issue?

We are currently experiencing intermittent UBE problems on our iSeries as well. Not TIMW but similar behaviour.

Thanks/Best Regards,

Sanjeev
 
This may be way off base from your problem, but we experienced several weeks of slowness and TIMW issues on several UBEs on v5r2 on an 840, and PeopleSoft kept pointing to IBM. After thoroughly cleaning up the AS400 and creating many, many logs on the OS, it turned out to be the Type 9 security setup in our application. If you use Type 9 security, use it sparingly; we created our own problem by going overboard with it.
 
[ QUOTE ]
Hi,

I'm curious to know how you are getting on? Any progress on this issue?

We are currently experiencing intermittent UBE problems on our iSeries as well. Not TIMW but similar behaviour.

Thanks/Best Regards,

Sanjeev

[/ QUOTE ]

We are still experiencing the problem; however, we may have a solution at hand. A few people from Oracle have commented that our JDE.INI may not be tuned as well as it should be. We are currently re-tuning it on our development box with some success; however, we do not yet know whether the changes will have any effect on our production system. We should know within a few days to a few weeks. If we are successful, I will publish what we changed.

Apart from the JDE.INI changes, we have also implemented about 150 performance ESUs, so if there is a change for the better, it will be tough to tell whether it was the changes to the INI file or the performance ESUs.
 
We are still working on promoting all the performance ESUs to the production environment; however, we think we have found the solution to our problems.

A little-documented portion of the INI was tweaked. In the stanza for the IEO kernel, maxNumberOfProcesses was changed from 1 to 3.

krnlName=IEO KERNEL
dispatchDLLName=JDEIEO
dispatchDLLFunction=JDEK_DispatchIEOMessage
maxNumberOfProcesses=3
numberOfAutoStartProcesses=0

Raising the IEO kernel limit from 1 to 3 has improved performance. Some jobs that ran 4 to 5 hours now run in 20 to 45 minutes. A very, very big improvement.
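
For anyone who wants to check their own settings, here is a minimal Python sketch that reads kernel stanzas like the one above and reports the process limit for each kernel. The section name [EXAMPLE_KERNEL_STANZA] is a placeholder, because the stanza header was not included in the post, and the function name kernel_limits is made up for this example. A real JDE.INI may contain lines this simple parser does not accept.

```python
import configparser

# The section name below is a placeholder: the original post did not show
# the stanza header, so use the one from your own JDE.INI.
SAMPLE_INI = """
[EXAMPLE_KERNEL_STANZA]
krnlName=IEO KERNEL
dispatchDLLName=JDEIEO
dispatchDLLFunction=JDEK_DispatchIEOMessage
maxNumberOfProcesses=3
numberOfAutoStartProcesses=0
"""


def kernel_limits(ini_text: str) -> dict:
    """Map each kernel name to its maxNumberOfProcesses value."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(ini_text)
    limits = {}
    for section in parser.sections():
        options = parser[section]
        if "krnlName" in options and "maxNumberOfProcesses" in options:
            limits[options["krnlName"]] = int(options["maxNumberOfProcesses"])
    return limits
```

Calling kernel_limits(SAMPLE_INI) on the stanza above gives a dictionary with a single entry for "IEO KERNEL". Any kernel that reports a limit of 1 is a candidate for the same kind of bottleneck.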

On a side note, other jobs on our system have also performed better since we changed all the classes on the system. Class timeslices were changed from the standard 2000ms for interactive and 5000ms for batch to 500ms for everything. We simply went down the list of all classes and changed every single one to 500ms. According to a trusted IBM source, most timeslice values are shipped with settings based on 1970s data. Allowing a job more than 500ms with today's fast CPUs is TOO much time.
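
If you have many classes, it may be easier to generate the commands than to type them. The Python sketch below builds one CHGCLS command string per class using its CLS and TIMESLICE parameters. The function name chgcls_commands is made up for this example, and the class names in the usage note are illustrative only, so substitute the list from your own system and review each command before running it.

```python
from typing import Iterable, List, Tuple


def chgcls_commands(
    classes: Iterable[Tuple[str, str]], timeslice_ms: int = 500
) -> List[str]:
    """Build one CHGCLS command string per (library, class) pair."""
    return [
        f"CHGCLS CLS({library}/{name}) TIMESLICE({timeslice_ms})"
        for library, name in classes
    ]
```

For example, chgcls_commands([("QGPL", "QBATCH"), ("QGPL", "QINTER")]) produces two command strings that can be pasted into a CL source member or run one at a time. This only prints text; nothing on the system changes until the commands are actually executed.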

Try it, and post if you get good results.
 