Database lock on F0002 and failure of X0010 BSFN

tlwalker3

Without notice, the JDE system stops responding and users begin to get "Unauthorized to access..." type error messages. Rebooting the JAS and Logic servers restores service. In reviewing the logs, the issue seems to start with a database lock on the F0002 table and the failure of the X0010GetNextNumber (X0010) business function. Once the lock occurs, all DB connections are lost and all JDE user sessions lock up. We've increased the "enterpriseserverTimeout" and "JDENETTimeout" settings to allow the asynchronous business functions more time for processing. One other solution we are looking at is enabling RCSI (Read Committed Snapshot Isolation) on the SQL database. Has anyone seen this issue, or does anyone have experience with RCSI?

3328/3576 SYS:Dispatch Fri Jun 28 10:23:23.006000 JDEKDISP.C2063
KNT0000099 - JDEK_DispatchCallObjectMessage: Pooled Connection: 09D2AEA0 timed out!

3328/3576 SYS:Dispatch Fri Jun 28 10:24:06.249000 JDB_CTL.C6855
JDB4200003 - OPEN TABLE NOT CLOSED = F0002, opened from File=X0010.C, Function=X0010GetNextNumber, Line=333.

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085000 ODBC_P1.C1485
ODB0000163 - wSQLExtendedFetch failure. rc=-1

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085001 ODBC_P1.C1485
ODB0000164 - STMT:00 [08S01][121][2] [Microsoft][SQL Server Native Client 10.0]TCP Provider: The semaphore timeout period has expired.
3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085002 ODBC_P1.C1485
ODB0000164 - STMT:01 [08S01][121][2] [Microsoft][SQL Server Native Client 10.0]Communication link failure

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085003 JDB_RQ1.C1559
JDB3100001 - Failed to open table F9860 because of invalid user handle

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085004 JTP_TM.C1385
JDB9909103 - Fetch failed because the requested row of (F0002) is locked by another user.

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085005 jdbodbc.C2692
ODB0000019 - DBResetRequest failed - lost database connection.

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085006 JDB_DRVM.C1082
JDB9900172 - Failed to execute db fetch

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085007 jdbodbc.C2692
ODB0000019 - DBResetRequest failed - lost database connection.

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085008 JDB_DRVM.C1082
JDB9900172 - Failed to execute db fetch

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085009 ODBCLOG.C436
ODB0000162 - Connection lost during earlier operation.


Config:
JDE E1 9.1 Tools 9.1.0.3
DB: Windows 2008 R2/ SQL 2008
ES: Windows 2008 R2
HTML WLS 10.3.5.0
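
To see which message appears first, the log excerpt above can be turned into a simple timeline. This is only a rough sketch: it assumes every entry is a header line (process/thread, thread name, date, time, source file) followed by one message line starting with a ten-character code, which is what the excerpt shows. The names parse_jde_log and first_mention are made up for illustration and are not part of any JDE tooling.

```python
import re

# Header line: "<pid>/<tid> <thread name> <Day Mon DD> <HH:MM:SS.ffffff> <source>"
HEADER = re.compile(
    r"^(?P<pid>\d+)/(?P<tid>\d+)\s+(?P<thread>\S+)\s+"
    r"\w{3} \w{3} +\d{1,2} (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<source>\S+)$"
)
# Message line: "<CODE> - <text>", e.g. "JDB9909103 - Fetch failed ..."
MESSAGE = re.compile(r"^(?P<code>[A-Z]{3}\d{7}) - (?P<text>.*)$")

# Three entries copied from the log excerpt in this post.
SAMPLE = """\
3328/3576 SYS:Dispatch Fri Jun 28 10:23:23.006000 JDEKDISP.C2063
KNT0000099 - JDEK_DispatchCallObjectMessage: Pooled Connection: 09D2AEA0 timed out!

3328/3576 SYS:Dispatch Fri Jun 28 10:24:06.249000 JDB_CTL.C6855
JDB4200003 - OPEN TABLE NOT CLOSED = F0002, opened from File=X0010.C, Function=X0010GetNextNumber, Line=333.

3328/2660 WRK:MECONNO_073BA1B0_P03B102 Fri Jun 28 10:25:10.085004 JTP_TM.C1385
JDB9909103 - Fetch failed because the requested row of (F0002) is locked by another user.
"""


def parse_jde_log(text):
    """Pair each header line with the message line that follows it."""
    events = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            header = match.groupdict()
            continue
        match = MESSAGE.match(line)
        if match and header is not None:
            events.append({**header, **match.groupdict()})
            header = None  # one message per header in this sketch
    return events


def first_mention(events, needle):
    """Return the first event (in file order) whose text contains needle."""
    for event in events:
        if needle in event["text"]:
            return event
    return None


if __name__ == "__main__":
    for event in parse_jde_log(SAMPLE):
        print(event["time"], event["thread"], event["code"])
```

Calling first_mention(parse_jde_log(SAMPLE), "F0002") picks out the first entry that names the table, which helps when deciding whether the lock or the connection timeout came first.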
 
Never, ever, ever, ever increase the "enterpriseserverTimeout" and "JDENETTimeout" settings to allow the asynchronous business functions more time for processing.

This will simply mask the real issues and cause a lot of pain later on.

Customers often do this to allow "bad code" to execute. All works well in testing until go-live, when the wheels really come off the bus...


RCSI is low-hanging fruit and should be on. You have to do some tuning around tempdb, but this is relatively straightforward.
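
For reference, turning RCSI on is a single ALTER DATABASE statement in SQL Server. The sketch below only builds the T-SQL text so it can be reviewed before anyone runs it. The helper name rcsi_statements and the database name JDE_PRODUCTION are assumptions for illustration; substitute your own database name.

```python
def rcsi_statements(database):
    """Build T-SQL to check and to enable RCSI for one database.

    Nothing is executed here; the strings are meant to be reviewed and then
    run by a DBA in a maintenance window.
    """
    # Bracket-quote the identifier, doubling any closing bracket inside it.
    quoted = "[" + database.replace("]", "]]") + "]"
    # Escape single quotes for use inside a string literal.
    literal = database.replace("'", "''")

    # is_read_committed_snapshot_on is 1 when RCSI is already enabled.
    check = (
        "SELECT name, is_read_committed_snapshot_on "
        f"FROM sys.databases WHERE name = N'{literal}';"
    )
    # The change needs the database free of other connections.
    # WITH ROLLBACK IMMEDIATE rolls back open transactions to get there,
    # so users should be off the system first.
    enable = (
        f"ALTER DATABASE {quoted} SET READ_COMMITTED_SNAPSHOT ON "
        "WITH ROLLBACK IMMEDIATE;"
    )
    return check, enable


if __name__ == "__main__":
    # "JDE_PRODUCTION" is a placeholder name, not taken from this thread.
    for statement in rcsi_statements("JDE_PRODUCTION"):
        print(statement)
```

Row versions are kept in tempdb once this is on, which is why the tempdb tuning goes hand in hand with it.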

You should approach this issue from a SQL Server tuning standpoint and not from a JDE perspective. I've seen this issue countless times and it's always SQL Server.

I've also never resolved this without changing the infrastructure (disk) configuration, so you might need to prepare for some small investment there.

There are only a few JDE values you can change with respect to this specific issue:

msSQLQueryAttempts=3
msSQLQueryTimeout=10000

You can do a quick search on the above parameters to get a detailed explanation.


Good luck!


Colin
 
Colin,
Thank you for the response. We have already adjusted the settings you mentioned; things improved, but the issue is not resolved.

msSQLQueryTimeout=180000
msSQLQueryAttempts=10
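
To put the two sets of values side by side, here is a back-of-the-envelope calculation. It rests on two assumptions that the thread does not confirm: that msSQLQueryTimeout is in milliseconds, and that each attempt can wait for the full timeout before the next one starts. The function name is made up for illustration.

```python
def worst_case_wait_seconds(attempts, timeout_ms):
    """Upper bound on how long one query could wait before giving up.

    Assumes timeout_ms is in milliseconds and every attempt runs to the
    full timeout. Both are assumptions, not documented behaviour.
    """
    return attempts * timeout_ms / 1000


if __name__ == "__main__":
    # Values suggested earlier in the thread.
    print(worst_case_wait_seconds(attempts=3, timeout_ms=10000))
    # Values currently in use.
    print(worst_case_wait_seconds(attempts=10, timeout_ms=180000))
```

Under those assumptions the suggested values cap a query at 30 seconds, while the current values allow up to 1800 seconds (30 minutes), which fits the concern that longer timeouts hide the blocking rather than fix it.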

Oracle recommended the JDENET timeout setting changes, and yes, I agree it is only masking the real issue, which is why we are looking at enabling RCSI in addition to other tuning opportunities. Can you provide any more detail on the disk configuration changes, like SSD for tempdb and logs?
 
I have used SSD for tempdb and it has worked well and improved performance, but mainly for UBEs that dealt with large data sets, sorting, etc.

We also set tempdb to use multiple files. The usual advice is one file for each available CPU, but I personally prefer N - 1, where N is the number of CPUs available. There is a threshold you do not want to go over, though (plenty of research and reading material is available on the internet on this topic).

If you are a batch-job-heavy site, I think you will definitely stand to gain from tempdb tuning.
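
The file-count rule above can be written down as a tiny helper. This is a sketch only: the post does not say what the threshold is, so the default cap of 8 is an assumption based on commonly cited guidance, and the function name is made up for illustration.

```python
def tempdb_file_count(cpu_count, cap=8, leave_one_free=True):
    """Suggest a number of tempdb data files for a server.

    cpu_count      -- number of CPUs available to SQL Server
    cap            -- upper limit on files (8 is an assumed default,
                      not a value taken from this thread)
    leave_one_free -- True for the N - 1 preference, False for one per CPU
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    wanted = cpu_count - 1 if leave_one_free else cpu_count
    # Never fewer than one file, never more than the cap.
    return max(1, min(wanted, cap))


if __name__ == "__main__":
    for cpus in (1, 4, 8, 16):
        print(cpus, tempdb_file_count(cpus))
```

With these defaults, a 4-CPU server gets 3 files and a 16-CPU server stops at the cap of 8.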
 
Thank you for the recommendations. We will look into what you've suggested.
 
Colin,

I both agree and disagree with you.

It does mask the issue, but if the issue is high concurrency on a particular record (or records), it may be necessary to up the timeout. We finally did this because we had a UBE (vanilla, I forget the name) that would lock a bunch of records. The online users were getting the BSFN timed-out message in Order Entry. And as you know, it doesn't tell you which order it was; orders were just 'disappearing' from the users' standpoint via virtual rollback.

We finally changed the timeout and the issue was resolved. However, it wasn't my first choice.

Just one man's opinion.

Tom
 
