E9.2 JDE Slowness - fixed by DB Reboot

Soumen
Dear List,

We have been facing slowness issues with JDE for the last few months. The issue is reported across the entire system by all users, regardless of module, and it affects batch jobs as well.

So far we have observed that when the slowness occurs it is usually accompanied by high CPU spikes on the database side. We are running SQL Server 2016 with SP2. The CPU spike typically starts later in the week and progressively gets worse. When this happens, our only option has been to reboot the SQL Server database, which alleviates the issue and returns system performance to normal.

We have not noticed any errors in the JDE logs or in the SQL Server logs. We have not made any changes to the database (upgrades, etc.) for months. We do have monthly production deployments, but so far we have not been able to isolate the issue to any specific objects, so code changes do not seem to play a role here.

We have also reached out to Microsoft to try to understand which component of the SQL Server database could be causing this. So far we have not been able to identify anything concrete. We have also re-indexed several tables based on our analysis.

We have not taken any major ESU/ASU. The last Tools upgrade was done in April 2021, and we upgraded WebLogic in November.
I can share our JDE.INI; no kernel changes have been made, and we have not faced any issues with kernels getting overloaded.

I would appreciate any advice on how we can further narrow down the issue and find the root cause.

Thanks,
Soumen
 
Run these scripts and report the results:

SQL:
SELECT total_physical_memory_kb/1024 'Physical Memory (MB)'
FROM sys.dm_os_sys_memory

SQL:
EXEC sp_configure 'max server memory'

SQL:
SELECT
    physical_memory_in_use_kb/1024 AS sql_physical_memory_in_use_MB,
    large_page_allocations_kb/1024 AS sql_large_page_allocations_MB,
    locked_page_allocations_kb/1024 AS sql_locked_page_allocations_MB,
    virtual_address_space_reserved_kb/1024 AS sql_VAS_reserved_MB,
    virtual_address_space_committed_kb/1024 AS sql_VAS_committed_MB,
    virtual_address_space_available_kb/1024 AS sql_VAS_available_MB,
    page_fault_count AS sql_page_fault_count,
    memory_utilization_percentage AS sql_memory_utilization_percentage,
    process_physical_memory_low AS sql_process_physical_memory_low,
    process_virtual_memory_low AS sql_process_virtual_memory_low
FROM sys.dm_os_process_memory;
 
Had something similar happen a year or so ago. Network admins had implemented new security policies. Windows was set to log everything, log sizes were set very high, and the logs were never truncated. The Windows server was being kept busy logging and managing all of it.
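If you want to rule out the same scenario, the event log configuration and current on-disk sizes can be checked from a command prompt. This is only a sketch using the stock wevtutil tool; the Security log is just an example, and the log names that matter may differ in your environment.

Code:
:: Show configured maximum size and retention policy for the Security log
wevtutil gl Security

:: Show the current file size and record count for the Security log
wevtutil gli Security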
 
SQL:
SELECT total_physical_memory_kb/1024 'Physical Memory (MB)' FROM sys.dm_os_sys_memory

Physical Memory (MB)
4177887

SQL:
EXEC sp_configure 'max server memory'
name                    minimum  maximum     config_value  run_value
max server memory (MB)  128      2147483647  3686400       3686400



SQL:
SELECT
physical_memory_in_use_kb/1024 AS sql_physical_memory_in_use_MB,
large_page_allocations_kb/1024 AS sql_large_page_allocations_MB,
locked_page_allocations_kb/1024 AS sql_locked_page_allocations_MB,
virtual_address_space_reserved_kb/1024 AS sql_VAS_reserved_MB,
virtual_address_space_committed_kb/1024 AS sql_VAS_committed_MB,
virtual_address_space_available_kb/1024 AS sql_VAS_available_MB,
page_fault_count AS sql_page_fault_count,
memory_utilization_percentage AS sql_memory_utilization_percentage,
process_physical_memory_low AS sql_process_physical_memory_low,
process_virtual_memory_low AS sql_process_virtual_memory_low
FROM sys.dm_os_process_memory;

sql_physical_memory_in_use_MB: 2421491
sql_large_page_allocations_MB: 0
sql_locked_page_allocations_MB: 0
sql_VAS_reserved_MB: 5765037
sql_VAS_committed_MB: 2427953
sql_VAS_available_MB: 128452690
sql_page_fault_count: 844124067
sql_memory_utilization_percentage: 99
sql_process_physical_memory_low: 0
sql_process_virtual_memory_low: 0
 

SQL Max Memory is configured properly.

Is this on a VM? Run from command prompt:
Code:
reg query HKLM\SYSTEM\CurrentControlSet\Control\SystemInformation\

If it is a VM, have your guys find out what the VMware reserved memory is for this virtual server.

Also, please report the results of this query:
SQL:
select
    physical_memory_in_use_kb/1024 as Memory_usedby_Sqlserver_MB,
    locked_page_allocations_kb/1024 as Locked_pages_used_Sqlserver_MB,
    total_virtual_address_space_kb/1024 as Total_VAS_in_MB,
    process_physical_memory_low,
    process_virtual_memory_low
from sys.dm_os_process_memory

and this query:
SQL:
exec ('sp_readerrorlog 0, 1, ''Using locked pages''')
 
Hi @brother_of_karamazov ,

This is not a VM; it is a dedicated Amazon EC2 X2iedn 32xlarge server.

Results of the two queries are below.

Memory_usedby_Sqlserver_MB: 2889440
Locked_pages_used_Sqlserver_MB: 0
Total_VAS_in_MB: 134217727
process_physical_memory_low: 0
process_virtual_memory_low: 0


SQL:
exec ('sp_readerrorlog 0, 1, ''Using locked pages''')

(0 rows affected)

No records returned.
 
You should really have a conversation with Amazon about memory reservation for this SQL server and ensure that no other processes or containers are contending with SQL for memory.

Also, Lock Pages in Memory should be in use on any SQL server of any decent size, assuming the SQL server is on a machine solely dedicated to SQL (not a combination application server, web server, etc.).

https://blog.sqlauthority.com/2019/09/16/sql-server-enable-lock-pages-in-memory-lpim/
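As a quick check, the memory model actually in use can also be read from a DMV rather than parsing the error log. This is only a sketch; the sql_memory_model_desc column requires SQL Server 2016 SP1 or later, which applies here.

SQL:
-- CONVENTIONAL means Lock Pages in Memory is not in use;
-- LOCK_PAGES means the privilege was granted and is in effect
SELECT sql_memory_model_desc
FROM sys.dm_os_sys_info;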

Can you also run the following and report the results:

SQL:
select Instancename,
       TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
       TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
       PLE,
       (TargetServerMemoryKB/1024)/PLE as 'ChurnMB/sec'
from (
    select @@SERVERNAME as 'Instancename',
           max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
           max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
           max(case when counter_name = 'Page life expectancy' then cntr_value end) as 'PLE'
    from sys.dm_os_performance_counters
) as p

and

SQL:
SELECT @@servername 'Server Name', getdate(),
numa_node = ISNULL(NULLIF(ple.instance_name, ''), 'ALL'),
    ple_sec = ple.cntr_value, db_node_mem_GB = dnm.cntr_value*8/1048576,
    db_node_mem_MB = dnm.cntr_value*8/(1048576/1000),
    ple_per_4gb = CASE WHEN ple.instance_name = ''
            THEN (SUM(ple.cntr_value) OVER()-ple.cntr_value) * 4194304 / (dnm.cntr_value*8)
            ELSE ple.cntr_value * 4194304 / (dnm.cntr_value*8)
        END
FROM sys.dm_os_performance_counters ple join sys.dm_os_performance_counters dnm
    on ple.instance_name = dnm.instance_name
    and ple.counter_name='Page life expectancy' -- PLE per NUMA node
    and dnm.counter_name='Database pages' -- buffer pool size (in pages) per NUMA node
 
Jeff,

Here it is:


Instancename: <Server_Name_Masked>
TargetServerMemoryMB: 3686400
TotalServerMemoryMB: 3274077
PLE: 180691
ChurnMB/sec: 20



Server Name           (No column name)          numa_node  ple_sec  db_node_mem_GB  db_node_mem_MB  ple_per_4gb
<Server_Name_Masked>  2022-07-07 14:51:42.027   ALL        180731   2935            2936889         492
<Server_Name_Masked>  2022-07-07 14:51:42.027   1          180731   1432            1433755         504
<Server_Name_Masked>  2022-07-07 14:51:42.027   0          180731   1502            1503134         481
 
Hmmm, Buffer Cache churn is not terribly high, but the Page Life Expectancy values may indicate that pages are not staying in memory very long - around 500 seconds per 4 GB per NUMA node, not far off the recommended minimum value of 300 seconds.
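For context, a rough way to see which databases are actually occupying the buffer pool is the sketch below; note that scanning sys.dm_os_buffer_descriptors can take a while against a buffer pool of this size, so run it off-hours.

SQL:
-- Approximate buffer pool usage per database (pages are 8 KB)
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024 AS buffer_pool_MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_MB DESC;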

How big are your databases?

Also, tell me what the results are for this:

SQL:
SELECT name, value_in_use
    FROM sys.configurations
    WHERE
    description LIKE '%parallelism%'

and

SQL:
select (physical_memory_in_use_kb/1024) as Phy_Mem_in_mb,
(virtual_address_space_committed_kb/1024) as Total_mem_used_MB,
(virtual_address_space_committed_kb - physical_memory_in_use_kb)/1024 as Mem_as_Pagefile_MB
 from sys.dm_os_process_memory
 
Are there multiple SQL Server instances running on this SQL server?
 
Jeff,

The proddta database is around 6 TB. There is one primary node and a secondary node in an Always On Availability Group (AOAG). There is also a replicated database node for the proddta database that is used heavily for BI-related reporting. We have also been using ROW compression across all tables.
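(As a side note, for anyone who wants to confirm the compression setting, the per-partition state can be listed along these lines. This is only a sketch; the filter to user tables is an assumption about what is relevant here.)

SQL:
-- List any user-table partitions that are not ROW compressed
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.data_compression_desc
FROM sys.partitions p
WHERE OBJECTPROPERTY(p.object_id, 'IsUserTable') = 1
  AND p.data_compression_desc <> 'ROW';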


SQL:
SELECT name, value_in_use
FROM sys.configurations
WHERE
description LIKE '%parallelism%'

name                            value_in_use
cost threshold for parallelism  5
max degree of parallelism       4


SQL:
select (physical_memory_in_use_kb/1024) as Phy_Mem_in_mb,
(virtual_address_space_committed_kb/1024) as Total_mem_used_MB,
(virtual_address_space_committed_kb - physical_memory_in_use_kb)/1024 as Mem_as_Pagefile_MB
from sys.dm_os_process_memory


Phy_Mem_in_mb: 3274264
Total_mem_used_MB: 3282161
Mem_as_Pagefile_MB: 7896
 
We saw a similar issue once when an orchestration ran wild and pushed 10 million records at once into the subsystem control table; SQL Server then kept a bad query plan cached, even after we cleared the entries from the table.

It's worth a try to execute DBCC FREEPROCCACHE when performance goes down. You can clear a single cached plan (if you can find the bad one) or all of them. Use it with caution, though; read up on it here, especially the Remarks section: https://docs.microsoft.com/en-us/sq...eproccache-transact-sql?view=sql-server-ver16
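A concrete starting point for finding a suspect plan is sketched below; it is generic, not JDE-specific, and the top-10 cutoff is arbitrary.

SQL:
-- Worst CPU consumers currently in the plan cache
SELECT TOP (10)
       qs.plan_handle,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;

-- Evict a single suspect plan instead of flushing the entire cache
-- DBCC FREEPROCCACHE (<plan_handle from the query above>);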
 

Ahhhh, it's in an Availability Group. Wish you would have mentioned that earlier.

Tell Microsoft Support that you want to talk to Stuart Miller. Don't stop insisting until you are talking to that guy, and only that guy. Tell Stuart that you are talking with Jeff Stevenson and that you are experiencing that same Version Store issue we dealt with from early to mid-2019 with a customer in Connecticut.

Also, unrelated to your issue - you need more memory, probably double what you currently have, and increase your SQL max memory accordingly. You also should be using Lock Pages in Memory.
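If and when the memory is added, raising the cap is straightforward; the sketch below simply doubles the current 3,686,400 MB setting to mirror the "probably double" suggestion above, and the actual value should be sized to the new physical RAM minus OS headroom.

SQL:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Example value only: double the current cap, per the suggestion above
EXEC sp_configure 'max server memory (MB)', 7372800;
RECONFIGURE;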

Get in touch with Stuart Miller. If you need to, contact me at [email protected], your issue is a very complex one related to AG's and Version Store and the fix is not easy.
 

I figured you'd like to know exactly what is causing your high CPU issue while you wait for MS. I searched back through my notes and found this:


The issue with Ghost Cleanup causing high CPU was eventually determined to be caused by the Ghost Cleanup process working through a very large version store in use. A large version store in use *can* also show up as a high number of version ghost records and *can* also show up as a very high version store reserved count (measured by Spotlight). However, the reverse is not always true: a high ghost record count and/or a high version store reserved value do not necessarily mean a high version store in use value.

Long-running open transactions can cause high version store in use, high version store reserved and high transaction log space usage. The impact on each is different. High version store in use will cause the high CPU condition. High version store reserved may eventually consume all tempdb space. High transaction log space usage may eventually consume all transaction log space.


The long-running open transaction mentioned in the notes is likely a BI query. I've got a query to identify exactly *which* transaction, but that will cost you since the pain and effort involved in creating it has value.
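For anyone who wants a generic, public starting point while waiting on Microsoft (this is only a sketch and is not the transaction-hunting query referred to above), the version store counters and the oldest snapshot transactions can be watched like this:

SQL:
-- Version store size and churn from the Transactions performance counters
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Version Store Size (KB)',
                       'Version Generation rate (KB/s)',
                       'Version Cleanup rate (KB/s)');

-- Oldest snapshot transactions still holding the version store open
SELECT session_id, transaction_id, elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;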
 
Jeff,

Thank you very much for all your help with this. We are trying to get Microsoft to engage Stuart Miller, and if I need your assistance I will reach out to you separately in a formal capacity. Thanks again.
 