E9.2 JDE Slowness - fixed by DB Reboot

Soumen
Dear List,

We have been facing slowness issues with JDE for the last few months. The issue is reported across the entire system by all users, regardless of module, and it affects batch jobs as well.

So far we have observed that when the slowness occurs it is usually accompanied by high CPU spikes on the database side. We are running SQL Server 2016 with SP2. The CPU spike typically starts later in the week and progressively gets worse. When this happens, our only option has been to reboot the SQL Server database, which alleviates the issue and returns system performance to normal.

We have not noticed any errors in the JDE logs or in the SQL Server logs. We have not made any changes to the database (upgrades, etc.) for months. We do have monthly production deployments, but so far we have not been able to isolate the issue to any specific objects, so code changes do not seem to play a role here.

We have also reached out to Microsoft to try to understand which component of the SQL Server database could be causing this. So far we have not been able to identify anything concrete. We have also re-indexed several tables based on our analysis.

We have not taken any major ESU/ASU. The last Tools upgrade was done in April 2021, and we upgraded WebLogic in November.
I can share our JDE.INI; no kernel changes have been made, and we have not faced any issues with kernels getting overloaded.

I would appreciate any advice on how we can further narrow down the issue and find the root cause.

Thanks,
Soumen
 
Run these scripts and report the results:

SQL:
SELECT total_physical_memory_kb/1024 'Physical Memory (MB)'
FROM sys.dm_os_sys_memory

SQL:
EXEC sp_configure 'max server memory'

SQL:
SELECT
    physical_memory_in_use_kb/1024 AS sql_physical_memory_in_use_MB,
    large_page_allocations_kb/1024 AS sql_large_page_allocations_MB,
    locked_page_allocations_kb/1024 AS sql_locked_page_allocations_MB,
    virtual_address_space_reserved_kb/1024 AS sql_VAS_reserved_MB,
    virtual_address_space_committed_kb/1024 AS sql_VAS_committed_MB,
    virtual_address_space_available_kb/1024 AS sql_VAS_available_MB,
    page_fault_count AS sql_page_fault_count,
    memory_utilization_percentage AS sql_memory_utilization_percentage,
    process_physical_memory_low AS sql_process_physical_memory_low,
    process_virtual_memory_low AS sql_process_virtual_memory_low
FROM sys.dm_os_process_memory;
 
Had something similar happen a year or so ago. Network admins had implemented new security policies. Windows was set to log everything, log sizes were set very high, and the logs were never truncated. The Windows server was being kept busy logging and managing all of it.
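If you want to rule out the same scenario, the event log configuration and current on-disk sizes can be checked from a command prompt. This is only a sketch using the stock wevtutil tool; the Security log is just an example, and the log names that matter may differ in your environment.

Code:
:: Show configured maximum size and retention policy for the Security log
wevtutil gl Security

:: Show the current file size and record count for the Security log
wevtutil gli Security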
 
SQL:
SELECT total_physical_memory_kb/1024 'Physical Memory (MB)' FROM sys.dm_os_sys_memory

Physical Memory (MB)
4177887

SQL:
EXEC sp_configure 'max server memory'
name                    minimum  maximum     config_value  run_value
max server memory (MB)  128      2147483647  3686400       3686400



SQL:
SELECT
physical_memory_in_use_kb/1024 AS sql_physical_memory_in_use_MB,
large_page_allocations_kb/1024 AS sql_large_page_allocations_MB,
locked_page_allocations_kb/1024 AS sql_locked_page_allocations_MB,
virtual_address_space_reserved_kb/1024 AS sql_VAS_reserved_MB,
virtual_address_space_committed_kb/1024 AS sql_VAS_committed_MB,
virtual_address_space_available_kb/1024 AS sql_VAS_available_MB,
page_fault_count AS sql_page_fault_count,
memory_utilization_percentage AS sql_memory_utilization_percentage,
process_physical_memory_low AS sql_process_physical_memory_low,
process_virtual_memory_low AS sql_process_virtual_memory_low
FROM sys.dm_os_process_memory;

sql_physical_memory_in_use_MB: 2421491
sql_large_page_allocations_MB: 0
sql_locked_page_allocations_MB: 0
sql_VAS_reserved_MB: 5765037
sql_VAS_committed_MB: 2427953
sql_VAS_available_MB: 128452690
sql_page_fault_count: 844124067
sql_memory_utilization_percentage: 99
sql_process_physical_memory_low: 0
sql_process_virtual_memory_low: 0
 

SQL Max Memory is configured properly.

Is this on a VM? Run from command prompt:
Code:
reg query HKLM\SYSTEM\CurrentControlSet\Control\SystemInformation\

If it is a VM, have your guys find out what the VMware reserved memory is for this virtual server.

Also, please report the results of this query:
SQL:
select
    physical_memory_in_use_kb/1024 as Memory_usedby_Sqlserver_MB,
    locked_page_allocations_kb/1024 as Locked_pages_used_Sqlserver_MB,
    total_virtual_address_space_kb/1024 as Total_VAS_in_MB,
    process_physical_memory_low,
    process_virtual_memory_low
from sys.dm_os_process_memory

and this query:
SQL:
exec ('sp_readerrorlog 0, 1, ''Using locked pages''')
 
Hi @brother_of_karamazov ,

This is not a VM; it is a dedicated Amazon EC2 X2iedn 32xlarge server.

Results of the two queries are below.

Memory_usedby_Sqlserver_MB: 2889440
Locked_pages_used_Sqlserver_MB: 0
Total_VAS_in_MB: 134217727
process_physical_memory_low: 0
process_virtual_memory_low: 0


SQL:
exec ('sp_readerrorlog 0, 1, ''Using locked pages''')

(0 rows affected)

No records returned.
 
You should really have a conversation with Amazon about memory reservation for this SQL server and ensure that no other processes or containers are contending with SQL for memory.

Also, Lock Pages in Memory should be in use on any SQL server of any decent size, assuming the SQL server is on a machine solely dedicated to SQL (not a combination application server, web server, etc.).

https://blog.sqlauthority.com/2019/09/16/sql-server-enable-lock-pages-in-memory-lpim/
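As a quick check, the memory model actually in use can also be read from a DMV rather than parsing the error log. This is only a sketch; the sql_memory_model_desc column requires SQL Server 2016 SP1 or later, which applies here.

SQL:
-- CONVENTIONAL means Lock Pages in Memory is not in use;
-- LOCK_PAGES means the privilege was granted and is in effect
SELECT sql_memory_model_desc
FROM sys.dm_os_sys_info;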

Can you also run the following and report the results:

SQL:
select Instancename,
       TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
       TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
       PLE,
       (TargetServerMemoryKB/1024)/PLE as 'ChurnMB/sec'
from (
    select @@SERVERNAME as 'Instancename',
           max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
           max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
           max(case when counter_name = 'Page life expectancy' then cntr_value end) as 'PLE'
    from sys.dm_os_performance_counters
) as p

and

SQL:
SELECT @@servername 'Server Name', getdate(),
numa_node = ISNULL(NULLIF(ple.instance_name, ''), 'ALL'),
    ple_sec = ple.cntr_value, db_node_mem_GB = dnm.cntr_value*8/1048576,
    db_node_mem_MB = dnm.cntr_value*8/(1048576/1000),
    ple_per_4gb = CASE WHEN ple.instance_name = ''
            THEN (SUM(ple.cntr_value) OVER()-ple.cntr_value) * 4194304 / (dnm.cntr_value*8)
            ELSE ple.cntr_value * 4194304 / (dnm.cntr_value*8)
        END
FROM sys.dm_os_performance_counters ple join sys.dm_os_performance_counters dnm
    on ple.instance_name = dnm.instance_name
    and ple.counter_name='Page life expectancy' -- PLE per NUMA node
    and dnm.counter_name='Database pages' -- buffer pool size (in pages) per NUMA node
 
Jeff,

Here it is:


Instancename: <Server_Name_Masked>
TargetServerMemoryMB: 3686400
TotalServerMemoryMB: 3274077
PLE: 180691
ChurnMB/sec: 20



Server Name           (No column name)          numa_node  ple_sec  db_node_mem_GB  db_node_mem_MB  ple_per_4gb
<Server_Name_Masked>  2022-07-07 14:51:42.027   ALL        180731   2935            2936889         492
<Server_Name_Masked>  2022-07-07 14:51:42.027   1          180731   1432            1433755         504
<Server_Name_Masked>  2022-07-07 14:51:42.027   0          180731   1502            1503134         481
 
Hmmm, Buffer Cache churn is not terribly high, but the Page Life Expectancy values may indicate that pages are not staying in memory very long - around 500 seconds per 4 GB per NUMA node, not far off the recommended minimum value of 300 seconds.
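For context, a rough way to see which databases are actually occupying the buffer pool is the sketch below; note that scanning sys.dm_os_buffer_descriptors can take a while against a buffer pool of this size, so run it off-hours.

SQL:
-- Approximate buffer pool usage per database (pages are 8 KB)
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024 AS buffer_pool_MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_MB DESC;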

How big are your databases?

Also, tell me what the results are for this:

SQL:
SELECT name, value_in_use
    FROM sys.configurations
    WHERE
    description LIKE '%parallelism%'

and

SQL:
select (physical_memory_in_use_kb/1024) as Phy_Mem_in_mb,
(virtual_address_space_committed_kb/1024) as Total_mem_used_MB,
(virtual_address_space_committed_kb - physical_memory_in_use_kb)/1024 as Mem_as_Pagefile_MB
 from sys.dm_os_process_memory
 
Are there multiple SQL Server instances running on this SQL server?
 
Jeff,

The proddta database is around 6 TB. There is one primary node and a secondary node in an Always On Availability Group (AOAG). There is also a replicated database node for the proddta database that is used heavily for BI-related reporting. We have also been using ROW compression across all tables.
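(As a side note, for anyone who wants to confirm the compression setting, the per-partition state can be listed along these lines. This is only a sketch; the filter to user tables is an assumption about what is relevant here.)

SQL:
-- List any user-table partitions that are not ROW compressed
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.data_compression_desc
FROM sys.partitions p
WHERE OBJECTPROPERTY(p.object_id, 'IsUserTable') = 1
  AND p.data_compression_desc <> 'ROW';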


SQL:
SELECT name, value_in_use
FROM sys.configurations
WHERE
description LIKE '%parallelism%'

name                            value_in_use
cost threshold for parallelism  5
max degree of parallelism       4


SQL:
select (physical_memory_in_use_kb/1024) as Phy_Mem_in_mb,
(virtual_address_space_committed_kb/1024) as Total_mem_used_MB,
(virtual_address_space_committed_kb - physical_memory_in_use_kb)/1024 as Mem_as_Pagefile_MB
from sys.dm_os_process_memory


Phy_Mem_in_mb: 3274264
Total_mem_used_MB: 3282161
Mem_as_Pagefile_MB: 7896
 
We saw a similar issue once when an orchestration ran wild and pushed 10 million records at once into the subsystem control table; SQL Server then kept a bad query plan cached, even after we cleared the entries from the table.

It's worth a try to execute DBCC FREEPROCCACHE when performance goes down. You can clear a single cached plan (if you can find the bad one) or all of them. Use it with caution, though; read up on it here, especially the Remarks section: https://docs.microsoft.com/en-us/sq...eproccache-transact-sql?view=sql-server-ver16
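A concrete starting point for finding a suspect plan is sketched below; it is generic, not JDE-specific, and the top-10 cutoff is arbitrary.

SQL:
-- Worst CPU consumers currently in the plan cache
SELECT TOP (10)
       qs.plan_handle,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;

-- Evict a single suspect plan instead of flushing the entire cache
-- DBCC FREEPROCCACHE (<plan_handle from the query above>);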
 

Ahhhh, it's in an Availability Group. Wish you would have mentioned that earlier.

Tell Microsoft Support that you want to talk to Stuart Miller. Don't stop insisting until you are talking to that guy, and only that guy. Tell Stuart that you are talking with Jeff Stevenson and that you are experiencing that same Version Store issue we dealt with from early to mid-2019 with a customer in Connecticut.

Also, unrelated to your issue - you need more memory, probably double what you currently have, and increase your SQL max memory accordingly. You also should be using Lock Pages in Memory.
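If and when the memory is added, raising the cap is straightforward; the sketch below simply doubles the current 3,686,400 MB setting to mirror the "probably double" suggestion above, and the actual value should be sized to the new physical RAM minus OS headroom.

SQL:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Example value only: double the current cap, per the suggestion above
EXEC sp_configure 'max server memory (MB)', 7372800;
RECONFIGURE;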

Get in touch with Stuart Miller. If you need to, contact me at [email protected], your issue is a very complex one related to AG's and Version Store and the fix is not easy.
 

I figured you'd like to know exactly what is causing your high CPU issue while you wait for MS. I searched back through my notes and found this:


The issue with Ghost Cleanup causing high CPU was eventually determined to be caused by the Ghost Cleanup process working through a very large version store in use. A large version store in use *can* also show up as a high number of version ghost records and *can* also show up as a very high version store reserved count (measured by Spotlight). However, the reverse is not always true: a high ghost record count and/or a high version store reserved value do not necessarily mean a high version store in use value.

Long-running open transactions can cause high version store in use, high version store reserved and high transaction log space usage. The impact on each is different. High version store in use will cause the high CPU condition. High version store reserved may eventually consume all tempdb space. High transaction log space usage may eventually consume all transaction log space.


The long-running open transaction mentioned in the notes is likely a BI query. I've got a query to identify exactly *which* transaction, but that will cost you since the pain and effort involved in creating it has value.
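For anyone who wants a generic, public starting point while waiting on Microsoft (this is only a sketch and is not the transaction-hunting query referred to above), the version store counters and the oldest snapshot transactions can be watched like this:

SQL:
-- Version store size and churn from the Transactions performance counters
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Version Store Size (KB)',
                       'Version Generation rate (KB/s)',
                       'Version Cleanup rate (KB/s)');

-- Oldest snapshot transactions still holding the version store open
SELECT session_id, transaction_id, elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;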
 
Jeff,

Thank you very much for all your help with this. We are trying to get Microsoft to engage Stuart Miller, and if I need your assistance I will reach out to you separately in a formal capacity. Thanks again.
 