Some Job Queues do not work

husni

We recently upgraded our OW servers from Windows NT to Windows 2003. Since then, we sometimes have a problem with our job queues when the OW services are restarted on the enterprise server.

We have set up 8 single-threaded job queues. Yesterday, I restarted the services on the server. After the restart, only 5 job queues were working. Jobs submitted to the other 3 queues hang with status "W".

We have set the services to manual startup and tried starting them with a 2-minute gap between each. This method sometimes works, but yesterday we had no luck.

Any idea what's going on with our services?

Thanks in advance for your kind assistance.
 
Look to see if you are getting any Zombie processes in your logs on your batch servers.
 
I don't mean to hijack this thread, but do you have any more information on the impact of zombies on queues?

The reason I ask is that we have been having problems similar to Husni's for the last couple of weeks. After working perfectly well for months, we have had issues with queues just stopping, with jobs being marked as processing when nothing appears to be happening, or with jobs just queueing up with a status of W or S. We have sometimes been able to reactivate the queues by making them inactive, then active, and then refreshing them.

This has not always worked, and we have had to stop and start services once or twice as well. On investigation, the only other issue we have come across is high levels of zombie processes on our (Unix) Enterprise server, again requiring a stop/start of services, though whether this is a symptom or a cause we don't know.

We have had no major system changes recently that might explain this sudden change in behaviour, though we have had a few network issues which took a while to pin down.

I would welcome any advice that you have.

Thanks
 
Zombies are a symptom of a problem, not the cause. Zombie processes are dead processes that have not been "cleaned up" by their parent. They do nothing other than take up a slot in the process table. In E1, if you see zombies, they will typically be zombie call object and runube processes. These crop up when some piece of code encounters a fatal exception. Typically, runube processes crash because of bad data (usually unexpected null values), changes in data selection and sequencing that alter the flow of the UBE code in an unexpected way, or corrupt version specs.
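For anyone who wants to see which processes are turning into zombies on a Unix server, here is a rough sketch in Python. It assumes a Linux-style `ps` that accepts `-eo stat,comm` and marks zombies with a state starting with `Z` and a ` <defunct>` suffix on the command name; other Unix flavours may need different `ps` options. The function names are made up for illustration and are not part of any JDE tooling.

```python
import subprocess
from collections import Counter

DEFUNCT_SUFFIX = " <defunct>"


def count_zombies(ps_output: str) -> Counter:
    """Count zombie processes per command name.

    Expects text in the shape produced by `ps -eo stat,comm`
    (assumption: Linux/procps-style output with a header row).
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)  # state column, then the command
        if len(parts) != 2 or not parts[0].startswith("Z"):
            continue  # not a zombie (or a blank/short line)
        name = parts[1].strip()
        if name.endswith(DEFUNCT_SUFFIX):
            name = name[: -len(DEFUNCT_SUFFIX)]
        counts[name] += 1
    return counts


def current_zombies() -> Counter:
    """Run ps on this machine and count the zombies it reports."""
    result = subprocess.run(
        ["ps", "-eo", "stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_zombies(result.stdout)
```

Calling `current_zombies().most_common()` on the server lists the command names with the most zombies first, which should show whether runube is the main offender.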

Why should a failing UBE cause the queues to lose threads? Who knows. The JDE programmers responsible for coding the job queues should have been able to deal with the situation. Until they get their act together, we can only deal with crashing UBEs on a case-by-case basis.

I would suggest: redeploy the version and template of the crashing UBE; investigate whether there has been a recent deployment of a new UBE or a change to an existing UBE; and investigate whether there have been recent changes to processing options, data selection or sequencing.

Regards,
 
Thanks for the advice, Justin.

I will speak to the development team about recent changes to see if we can identify possible causes.
 
Thanks Shawn, Justin, for your valuable input. I have got a lead on resolving the problem.
 