E9.2 Delay in UBE Processing

Soumen

Dear List,

We have observed in our environment a slight delay in batch jobs moving from 'S' to 'P' status. The delay is around 3-4 seconds across multiple servers and multiple environments. I tried rebuilding the indexes on the F986110 table, but that did not help.

Is this normal or is there any tweaking that can be done to improve this performance?

We are on 9.2.7.1 (64 bit), Windows 2019 and SQL Server 2019.


13316/12688 Mon Jan 29 14:24:04.173000 ipcmisc.c348
EnterpriseOne Tools Release: '9.2.7.1'
13316/12688 Mon Jan 29 14:24:04.174000 runbatch.c399
Startup for User=MRIVERO, Env=JPD920, Role=*ALL, Job#=1286945
13316/12688 Mon Jan 29 14:24:07.218000 zdrv.cpp345
Initializing the Z Driver
13316/12688 Mon Jan 29 14:24:07.219000 zdrv.cpp359
Gettting user data from hEnv
 
It's very normal, and there are loads of things you can do to speed that up. (What type of queue are they running in? Is it a very large-spec UBE?)
It's basically looking for resources/allocating threads and memory before it starts properly.


Issue 2: UBEs Remain in 'S' Status Indefinitely

If a job is in 'S' status indefinitely, it means it has already been picked up by the QUEUE kernel (8.9 and above) or by the JDEQUEUE process (Xe and ERP8), but that there was then a problem starting the RUNBATCH process, or that EnterpriseOne services ended unexpectedly while jobs were getting ready to be processed. In this scenario there is no automated process in EnterpriseOne that will update the job status out of 'S'.

If you experience this problem, review the QUEUE kernel logs as well as any RUNBATCH logs in the regular EnterpriseOne logs directory (RUNBATCH logs are copied to the PrintQueue directory only if the job completes).

Additional information can be found in Document 1077518.1 - E1: UBE: What Causes Jobs to Remain in S (InQueue) Status as Seen from Work with Submitted Jobs?

You should never see jobs in 'S' status if your server is AS/400.


Is it all UBEs or just certain ones?
 
It's actually happening for all UBEs: 3-4 secs sitting in 'S'. I can see it even for R0006P.
I have seen that doc and checked a few things already.

The question I had is: is 3-4 secs normal for a table as big as mine (3,321,335 records)?
 
3-4 secs seems excessive to me; 3 million records isn't huge. From 'S' to 'P' should be near instant if there's nothing ahead of the job. I know you've checked your SQL Server indexes, but nine times out of ten, when I see this issue, it's a problem there. Are you sure there's no index fragmentation in play? Do you have the option on to log long-running SQL queries? Maybe change the threshold to 3 seconds for a bit and see if anything comes out in the kernel logs.
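The fragmentation check suggested above can be run directly against SQL Server's index DMVs. A sketch, assuming the job master table F986110 lives in a JDE920 database (the schema name, SVM920 here, varies by installation and data source):

```sql
-- Report fragmentation per index on F986110; schema/database names
-- are illustrative and must be adjusted to your installation.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('JDE920'),
         OBJECT_ID('SVM920.F986110'),
         NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

A common rule of thumb is to reorganize above roughly 10% fragmentation and rebuild above roughly 30%, though small indexes (low page counts) rarely matter.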
 
That's not too big, and I'm with @TFZ here: that should be faster. I mean, purging that table never hurts, so keep that in mind.
Did you check for expensive queries in SQL Server, and does it maybe suggest another index? JDE indexes are good, yes, but I am not saying they're perfect or the last hope. Do you have processes that poll the table, like orchestrations, custom reports, subsystems, etc.?
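One way to check whether the optimizer itself has been wanting another index is SQL Server's missing-index DMVs. A sketch (the LIKE filter on the job tables is an assumption; widen or drop it as needed):

```sql
-- List index suggestions the optimizer has recorded for the job tables,
-- ranked by estimated impact. Table filter is illustrative.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
WHERE d.statement LIKE '%F986110%'
ORDER BY s.avg_user_impact DESC;
```

Note these DMVs are reset when the instance restarts, so an empty result only means nothing has been recorded since the last restart.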
 
Actually, the queries popping up with the 2-sec threshold are on the F986110R table, but I'm not seeing an index covering those queries.
Size of F986110R: ~10,548,422 rows.


11588/12252 Mon Jan 15 11:17:30.242000 jdbodbc.c8989
doTimeOutQueryDiagnostics: The following SQL query took 2 seconds which is equal to or greater than QueryExecutionTimeThreshold (2 seconds) for E1User(JDE) with DBProxyUser(JDE).

11588/12252 Mon Jan 15 11:17:30.243000 jdbodbc.c9018
SELECT * FROM JDE920.SY920.F986110R WHERE ( SBEXEHOST = 'XXXXXXXX ' AND SBJOBNBR = 1221033.000000 AND SBFITYPE = 'IN' ) ORDER BY SBEXEHOST ASC,SBJOBNBR ASC,SBFITYPE ASC,SBSBMDATE ASC,SBSBMTIME ASC



11588/12252 Sat Jan 20 09:27:23.025000 jdbodbc.c8989
doTimeOutQueryDiagnostics: The following SQL query took 2 seconds which is equal to or greater than QueryExecutionTimeThreshold (2 seconds) for E1User(JDE) with DBProxyUser(JDE).

11588/12252 Sat Jan 20 09:27:23.026000 jdbodbc.c9018
DELETE FROM JDE920.SY920.F986110R WHERE ( SBEXEHOST = 'XXXXXX' AND SBJOBNBR = 1092143.000000 )
 
We had something like this, and we deleted a load of old PDFs, CSVs, logs, etc. using R9861101 and the like.
 
Well that's sort of interesting, but I would think operations there would be from 'P' -> 'D' (i.e. the UBE is done, and now it writes the output to the repository). I'm assuming it's larger because you're across environments / multiple server maps?

Did you try putting the SELECT query into SSMS and analyzing it?
 
Yes that's right ... multiple env and server maps.

Yes, SSMS comes back in an instant with no index recommendation. The execution plan looks fine as well.
 
Just thinking again: F986110R would be the write/open of the PDF. I'm assuming the row gets inserted there after processing is completed, and maybe that's a normal amount of time. I was more worried about queries against F986110, or polling in general on the database from the BIP kernels, maybe adding strain, but if you're not seeing that, I'm somewhat stumped without looking at it. Still, 'S' -> 'P' taking that long definitely is not normal. Even with fragmentation, I would think you'd see a long-running query pop up in the monitor... Anything going on with malware scanning/antivirus on the enterprise server?
 
One of my coworkers found this, and we changed QKOnIdle from 20 to 30 seconds and all our batch delays went away. This assumes you are using virtual batch (VB) queues.

E1: UBE: Virtual Batch Queues - Balancing Of UBE's (Doc ID 2819398.1)


And it mentions:


3) The servers work together by randomly being sent job-scheduling messages, and then by scanning for waiting jobs in the F986110 table on the "QKOnIdle" delay time in the JDE.INI file. The default setting for this parameter is 300 seconds (5 minutes). Reduce this setting to 30 or 45 seconds in Server Manager to allow the other server to see waiting jobs sooner. The setting in Server Manager is found under [Configuration --> Basic --> Batch Processing] after selecting the enterprise server; scroll down to the "Network Queue Settings" and change "Queue Kernel Idle Time" from 300 to 30 or 45 seconds. DO NOT set this lower than 30 seconds. Do this on both servers. The "QKOnIdle" setting is cached, and the change requires the server's EnterpriseOne services to be stopped and restarted.
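For reference, the resulting JDE.INI fragment on the enterprise server would look something like the sketch below. The change is normally made through Server Manager rather than by hand-editing the file, and the section name should be verified against your own JDE.INI:

```ini
[NETWORK QUEUE SETTINGS]
; Queue Kernel Idle Time in seconds -- do not set below 30
QKOnIdle=30
```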

Make sure that, in Server Manager, the Queue Kernel definitions are set to auto-start the queue kernel when services start (Server Manager, select the server, [Configuration --> Basic --> Kernel Definitions], then under the [Queue Kernel (Type 14)] section, set "Auto-start Process count = 1").
 
Along with the above notes... QKOnIdle in regards to VBQ, etc....

Note that since a change in 9.2.5, the spec-prep step moved from the UBE kernel (which performed it at submission, in 'W' status) into the queue kernel, which moves the job to 'S' status ahead of spec prep. There is a "pre-prepare" of up to the queue depth of jobs in 'S' status.

Example:
If you have a queue configured for, say, four jobs, up to 4 will be in 'P' status, and up to 4 will be in 'S' status having their specs prepared for when there is room in the queue to go to 'P'. If the 4 jobs in 'P' status all take 4 seconds to complete, then the jobs in 'S' status will appear to have a delay, but in reality your queue should be running very close to its defined depth in 'P' status.

Please double-check the counts in 'P' and 'S': since 9.2.5.x you should see almost a full depth count in 'P' and another almost full depth count in 'S' on a busy server.
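The pre-prepare behaviour described above can be sketched with a small simulation (hypothetical numbers, not E1 code): with a queue depth of 4 and 4-second jobs, the first wave enters 'P' immediately, while every later job sits in 'S' for roughly one run cycle even though the queue is saturated.

```python
from collections import deque

QUEUE_DEPTH = 4   # jobs allowed in 'P' (processing) at once
RUN_TIME = 4      # seconds each job spends in 'P'

jobs = deque(range(12))   # 12 submitted jobs waiting to be staged
processing = []           # (job, finish_time) pairs currently in 'P'
staged = deque()          # jobs in 'S', capped at QUEUE_DEPTH (pre-prepare)
entered_s = {}            # clock time each job entered 'S'
s_delays = []             # seconds each job spent in 'S'

clock = 0
while jobs or staged or processing:
    # Jobs whose run time has elapsed leave 'P' (move on to 'D')
    processing = [(j, t) for j, t in processing if t > clock]
    # Pre-prepare: stage up to QUEUE_DEPTH jobs in 'S'
    while jobs and len(staged) < QUEUE_DEPTH:
        j = jobs.popleft()
        staged.append(j)
        entered_s[j] = clock
    # Move staged jobs into free 'P' slots
    while staged and len(processing) < QUEUE_DEPTH:
        j = staged.popleft()
        s_delays.append(clock - entered_s[j])
        processing.append((j, clock + RUN_TIME))
    clock += 1

print(s_delays)  # first wave: 0s in 'S'; later waves: ~one run cycle
```

The point of the sketch is that a few seconds in 'S' can simply be a full queue doing its job, not a stuck kernel.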

This later spec prep is what enabled the 9.2.5 'Move job' and 'Resubmit job' capabilities, if I recall my Tech Task Force notes correctly. It also went in with the VBQ enablement, where you don't know which server will run the job until it is taken into 'S' status.
 