Multi-threading and Performance

kipkingle

Active Member
Prior to upgrading to 8.12/8.9.7.1.2, we tested the multi-threading feature and it seemed to significantly degrade performance. Some of our custom applications wouldn't even run without erroring out. At the time, we hadn't done any official tuning (modifying INI settings or taking additional SARs), so we reverted to the single-threaded option, and the system went back to performing as expected. Since then, we hired outside consultants to assist us with our tuning efforts, but the multi-threading feature remained unused.

So, this week, we decided to turn the multi-threading feature back on in our test environments to assess whether any other performance enhancements could be gained. At first glance, the system doesn't behave any better or worse with the multi-threading feature turned on. (Fortunately for us, the errors we originally discovered are no longer there, which I would attribute to the couple of SARs we took.) Our DV and PY environments are set up for multi-threading while our QA and PD environments are set up as single-threaded, so I can compare the two options with a fair amount of consistency.

1. Has anyone seen a significant performance improvement by utilizing multi-threading?

2. Can anyone recommend a way to meaningfully test the feature? (I am having 5 different people add a new Item at the same time.)

Thanks,
Kip.
 
I would guess that the errors you experienced with custom business functions when you first turned it on were caused by the custom functions not being thread-safe.

Multi-threading will not increase the performance of an individual business function. It will increase the throughput of multiple business function executions on a CallObject kernel. Multiple users on a single CallObject kernel will not block each other as they do under single-threaded mode.

In your test with 5 people adding an item at the same time, there will be many business function requests passed to CallObject kernels. Are all 5 users attached to a single kernel during your test? If they are spread out across multiple CallObject kernels, you will not see contention. Even if they are attached to the same CallObject kernel, you will probably not see much of a difference with a single iteration per user. You need to get up to a few hundred or a few thousand requests to really see the difference.

Try this for a test approach:

1) Configure the system for a single-threaded CallObject kernel and configure only 1 CallObject kernel (see the first jde.ini sketch after step 5).

2) Run your tests and get total execution time.

3) Reconfigure the system for multi-threaded CallObject kernels and leave the configuration with only 1 CallObject kernel (second sketch below).

4) Run your tests and get total execution time.

5) Compare the execution times.
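
To make steps 1 and 3 concrete, the CallObject kernel definition in the jde.ini might look roughly like the sketches below. The section number (KERNEL_DEF6 here) and the exact values vary by installation, so treat these as placeholders rather than recommendations.

Step 1 - single-threaded, one kernel:

[JDENET_KERNEL_DEF6]
krnlName=CALL OBJECT KERNEL
dispatchDLLName=XMLCALLOBJ
dispatchDLLFunction=XMLCallObjectDispatch
maxNumberOfProcesses=1
numberOfAutoStartProcesses=1
singleThreadedMode=Y

Step 3 - multi-threaded, still only one kernel so the comparison is fair:

[JDENET_KERNEL_DEF6]
krnlName=CALL OBJECT KERNEL
dispatchDLLName=XMLCALLOBJ
dispatchDLLFunction=XMLCallObjectDispatch
maxNumberOfProcesses=1
numberOfAutoStartProcesses=1
singleThreadedMode=N
ThreadPoolSize=20
ThreadPoolSizeIncrement=5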

Multi-threaded kernels improve throughput and reduce contention, especially for high-volume sites with sustained business function load. When I have run tests of this sort, I create a wrapper NER that calls a series of business functions in a loop. I then build a simple application with a Launch Test button and Start/End time fields. The Launch Test button calls the wrapper NER. Finally, I launch the test application from 5 or 10 web sessions and press the button in each session. To simulate real-world operation, I include short (1 sec range), medium (10 sec range) and long (60 sec range) running business functions in the NER.

My definition of short, medium and long running functions is arbitrary. For some of my sites I would probably define long as 3 minutes or more. It really depends on how you use the system.
 
We only have approximately 60 users on the system during peak times of the day. Would you say this is not a high enough volume to see the benefits of the multi-threading functionality?

In addition, can you explain the call object kernel methodology in a little more detail? I'm not familiar with this topic. I assume that when I kick off a business function, it grabs the first available kernel to run on. If I kick off the same business function again, it will stay on the same kernel and wait until the first one is completed before proceeding to the second. Is this a fair statement?

Thanks,
Kip.
 
The 2nd part of your assumption is not true if you are in multi-threaded mode. Your function will run on the first available kernel.

Search for the document on Metalink titled: "Multi-threaded Kernel Characterization Testing EnterpriseOne Release 8.95". File name is "rp_e1perf_multithreaded_kernel.pdf".

Colin
 
I am not sure I read Colin's response correctly. I don't believe that requests go to the first available kernel in either single-threaded or multi-threaded mode. In multi-threaded mode a bsfn request will go to the first available thread, or a new thread will be spawned if none are available.

Here is my understanding:

When a user is logged in, they are bound to a specific call object kernel. This may be first-available or round-robin; the user counts do tend to equalize across active kernels in any case. Although it would make sense for scalability to allow each bsfn request to float to the next available or least-busy kernel, doing so would make it impossible to use cache memory: jdeCache structures are maintained in the CallObject kernel's memory, not in some sort of shared-memory structure.

If you had 60 users and 6 CallObject kernels, I would generally expect to see 6 users attached to each kernel. In single-threaded mode these 6 users would contend with each other, as only one bsfn can be executed at a time on a kernel. In multi-threaded mode those six users are able to have their requests serviced in parallel by threads in the CallObject kernel's thread pool. This thread pool grows as needed in an attempt to never make a bsfn wait for execution.

On a separate note, the thread pool has no hard limit on the total number of threads that can be spawned. It is therefore possible to completely saturate the CPU of the machine and launch more threads than can reasonably be managed by the OS. I have done this in heavy, scripted load testing. In the real world it probably isn't much of a worry, as long as you watch CPU utilization trends and expand capacity as needed.

As for bsfn mechanics in your code: whether your application stops and waits for execution, or continues to run after the bsfn is launched, depends on whether you have launched the function synchronously or asynchronously. Both modes are available in E1.
 
Yep, Justin's right ... brain's still on vacation. Multiple threads will be spawned on the same call object kernel to satisfy the requests from the same user.

There are generally two scenarios for tuning the Call Object Kernels and threading (illustrative jde.ini sketches follow below):

Scenario 1 – More Call Object Kernels, Fewer Threads
Higher fixed memory cost at startup
Fewer users per COK
Less variation in size growth when threading occurs

Scenario 2 – Fewer Call Object Kernels, More Threads
Lower fixed memory cost at startup
Memory will grow dynamically to accommodate load
More users increase the probability of COK growth
Less predictable, primarily during peak use

Most people (including me) used to do scenario 1 primarily. This was "Xe" logic, since threading was not available then.


With threading I've found that scenario 2 works well, and it also preserves system memory and better utilizes resources.
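
For illustration only (the right numbers depend entirely on your hardware and load), the two scenarios might translate into jde.ini settings along these lines:

Scenario 1 - more kernels, fewer threads:

maxNumberOfProcesses=20
singleThreadedMode=N
ThreadPoolSize=5
ThreadPoolSizeIncrement=1

Scenario 2 - fewer kernels, more threads:

maxNumberOfProcesses=6
singleThreadedMode=N
ThreadPoolSize=40
ThreadPoolSizeIncrement=5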

Colin
 
Correction to my previous post:

60 users across 6 call object kernels would usually mean 10 users per kernel.
 
Thank you both for all of your help, it has been very insightful. Our current configuration is as follows:

[JDENET_KERNEL_DEF6]
krnlName=CALL OBJECT KERNEL
dispatchDLLName=XMLCALLOBJ
dispatchDLLFunction=XMLCallObjectDispatch
maxNumberOfProcesses=10
numberOfAutoStartProcesses=0
singleThreadedMode=N (N means that multi-threading is on.)
ThreadPoolSize=20
ThreadPoolSizeIncrement=5

Does this indicate that I have 10 possible kernels that could process up to 20 threads? (I apologize if this is a really basic question.)
 
Your understanding is close. In your configuration you have a maximum of 10 kernels, each of which will initially spawn 5 threads (per ThreadPoolSizeIncrement). Additional threads will be spawned as needed in increments of 5, up to the ThreadPoolSize. If additional threads are needed after the ThreadPoolSize is reached, the kernel will spawn "overflow" threads. These are created for the request and then destroyed when the request is complete.

There is no upper limit to the number of overflow threads that will be spawned. A message will be written to the call object kernel's log when 1,000 overflow threads have been created and destroyed. If you get to the point that these messages are logged, you need to consider increasing the size of the thread pool, increasing the number of CallObject kernels, or both. The goal is to reduce the amount of dynamic thread creation and destruction.
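
For example, if those overflow messages start appearing regularly, a revised definition along these lines (the values are illustrative, not a recommendation) gives each kernel more headroom before overflow threads are created:

[JDENET_KERNEL_DEF6]
krnlName=CALL OBJECT KERNEL
dispatchDLLName=XMLCALLOBJ
dispatchDLLFunction=XMLCallObjectDispatch
maxNumberOfProcesses=12
numberOfAutoStartProcesses=0
singleThreadedMode=N
ThreadPoolSize=30
ThreadPoolSizeIncrement=5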
 