OAS Load Balancing on multiple JVMs doesn't work properly

PAULDCLARK

OAS Load Balancing on multiple JVMs doesn't work properly

I recently switched from BlueStack to Redstack for a few reasons, mainly because EBSS wouldn’t work on BlueStack with all sorts of compilers etc., and because it’s the direction for Oracle (at present; the rumour in the PeopleSoft world is that WebLogic is back in favour).

I have two servers with separate applications for PD installed on them, balanced via round robin DNS, which seems to work quite nicely: I average 35 users on one and 38 on the other. Each of those applications has 4 JVMs, so there should be fewer than 10 users per JVM. When I was running BlueStack, as near as makes no odds it ran at exactly that loading.

With Redstack, it seems to pick on one JVM and load that up, so this morning I had on one server:

Group 1: 2
Group 2: 23
Group 3: 0
Group 4: 2

Group 2 then failed with a java.lang.OutOfMemoryError: Java heap space

It then does this:

com.jdedwards.database.base.JDBException: [SQL_EXCEPTION_OCCURRED] An SQL exception occurred: The TDS protocol stream is not valid.. com.microsoft.sqlserver.jdbc.SQLServerException: The TDS protocol stream is not valid.

Followed by this:

com.jdedwards.database.base.JDBException: [OBJECT_IS_CLOSED] The object is closed.

Then this repeated for assorted tables:

SQLException occured in the SQLPhysicalConnection.select(): | Table or View Name = F03B13 | Data Source = Business Data - PROD com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

And that’s it, game over for the JVM. This has happened 3 times now; 2 out of the 3 times I had to kill the process manually, although this morning it restarted the JVM by itself.

It’s not machine specific, and a java.lang.OutOfMemoryError doesn’t always kill the JVM.

The install is standard OAS 10.1.3.1, no web cache or anything is installed, the install is as simple as is possible. JVM heap size is set to 64Mb with a max of 1024Mb.
Anyone getting this behaviour on OAS?
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

My experience has been that round-robin DNS and web applications do not work together, and it is not an effective way to load-balance web app servers. DNS requests are very random and the Windows DNS cache gets updated without notice. The only way it can technically work is if session replication is supported by the web app and a sticky session is established. JDE E1 on OAS does not support replicated sessions among web app servers.

My recommendation to you is that you find out from your network people if your switch supports a VIP for IP load balancing. Then you can configure Apache virtual hosts for E1 that use Oracle modules for load-balancing.

There are several posts on this forum that explain how to configure virtual hosts that load balance the application servers. That is the easy part. Search for the thread "JAS port".
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

The round robin itself is working perfectly. What you have to remember is that http://enterpriseone812 is the DNS entry and that is only used once. After that the resolution is directly to the machine name. I wouldn't go near round robin directly into the web apps, precisely for the reasons you give.

So a user hits http://enterpriseone812

DNS resolves either PON209 or PON210.

The apache server listening on port 80 has the following Index.html.html file (different address on the other server)

<HTML>
<HEAD>
<META HTTP-EQUIV="REFRESH"
CONTENT="0; URL=http://pon209:12090/jde/owhtml">
<TITLE> JDE Edwards Enterpriseone - 8.12</TITLE>
<META http-equiv="Content-Language" content="en-us">
</head>
<BODY>
</body>
</html>

Which redirects to the http://pon209:12090/jde/owhtml application.

That’s it. The original URL isn’t used again until the user fires it from a favorite or the intranet homepage. DNS for the apps\machines always remains the same. Effectively it’s a virtual application. Considering how primitive it is, it’s pretty effective; usually the machines are within 3-4 users of each other.

The user is now running against 1 machine and 1 machine only.

Once the application is invoked, the user session then runs on one of four JVMs, which is presumably determined somehow by OAS. This is where the issue lies; it’s got nothing to do with apps running across two or more servers. I could have only one server and get the same issue.

With WebSphere a vertical cluster worked beautifully, it would (give or take) spread the number of users equally over four JVMs on one machine. (I actually had horizontal and vertical configured on a network deployment – but the limitations of that approach are a whole different story), but with one machine active it worked just as well.

OAS doesn’t appear to do that. There is only 1 installation on each machine and they are completely independent from each other.

Right now the load on each server is:

Server 1:

JVM Group 1: 8
JVM Group 2: 12
JVM Group 3: 1
JVM Group 4: 8

Server 2:

JVM Group 1: 12
JVM Group 2: 4
JVM Group 3: 5
JVM Group 4: 5

And over the next day that will get worse so I will end up with, as per this morning:

JVM Group 1: 2
JVM Group 2: 23
JVM Group 3: 0
JVM Group 4: 2

Server 2:

JVM Group 1: 21
JVM Group 2: 2
JVM Group 3: 0
JVM Group 4: 5

Then at some point the JVM with the highest number of users will crash and restart. At that point the system will then be reasonably well balanced, but gradually it starts to favour 1 JVM over the others.

It’s around the 25 mark when it goes horribly wrong, Oracle say that it should support up to 30 users per JVM. Personally I doubt that very much.
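
To put a number on the skew, here is a minimal Python sketch using the per-JVM counts quoted above. The 25-user danger mark comes from the observation in this post; the 5-user warning margin and the function names are assumptions made purely for illustration.

```python
# Per-JVM user counts quoted above ("as per this morning").
observed = {"JVM Group 1": 2, "JVM Group 2": 23, "JVM Group 3": 0, "JVM Group 4": 2}

# Assumption: roughly 25 users is where a JVM gets into trouble, per the post.
DANGER_MARK = 25


def even_share(counts):
    """Users per JVM if the total were spread evenly."""
    return sum(counts.values()) / len(counts)


def at_risk(counts, danger_mark=DANGER_MARK, margin=5):
    """Names of JVMs within `margin` users of the danger mark (margin is hypothetical)."""
    return [name for name, users in counts.items() if users >= danger_mark - margin]


print(f"even share: {even_share(observed):.2f} users per JVM")
print("at risk:", at_risk(observed))
```

With an even spread, the same 27 users would sit at under 7 per JVM, well away from the danger mark.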

A proper hardware load balancer is my ideal solution. I inherited this, and in the original installation I had 2 servers with 1 JVM on each as standalone WebSphere installations, with no possibility to vertically clone. As a consequence they would heapdump 10-12 times a day, so I’m a quantum leap forward from there; it was running great under WebSphere, less than one crash a month. I've switched to OAS because I had very little choice in the matter, and I've had three crashes in the last fortnight and I'm expecting another this afternoon on OAS.

Are you running more than 1 JVM per application? What does the load look like?
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Paul,

a customer of ours has multiple JAS servers load balanced through a CISCO load balancer and also stand alone servers with multiple JVM's per machine (configured through server manager) on OAS 10.1.3.1

We DO NOT see the problem you are hitting; our users do get spread across the 3-5 JVM's (dependent on the server they hit) we have configured.

Obviously as users log out the numbers fluctuate slightly, but in essence it works OK straight out of the box.

I would suggest that you raise a ticket with the Oracle OAS Support team BUT you will need the patience of a saint as they do not understand E1 and will only attack this from a pure OAS angle (which is where your issue probably lies).

We have had an issue with performance and the servers grinding to a halt, where we had to apply a patch to OAS (you can find details of it if you search the posts; I cannot remember the patch number off the top of my head), which seems similar to what you are getting with your user count.

You know where I am if you want to talk more.

Terry

E1 8.11_SP1 / 8.97.03 / 10.1.3.1 / Oracle 10g RAC Clustered (Linux 64 Bit) / Windows everything else
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Hi Paul

Can you post your OAS servers' technical specs? You mentioned that there wasn't enough money to increase the specifications (or something along those lines). Your DNS round robin IS perfectly ok in the way you have it configured - that is in effect what the Cisco CSS does - since the session ends up with a correct IP and port to work through.

The 30 users per JVM is based on the amount of memory that each session uses - and the "traditional" sizing estimate for OneWorld - ever since Terminal Server sizing under B7321 - has been 40Mb per concurrent session. With a 1.2Gb JVM Heap size - you can therefore get about 30 concurrent users running. With a 1Gb JVM heap size, the maximum is about 25 users - so these are both in line with what you are seeing.
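
The sizing arithmetic above can be checked with a minimal Python sketch. It assumes the 40Mb-per-session estimate applies to the whole heap and ignores any heap the JVM needs for itself, so treat the results as upper bounds rather than guarantees; the function name is made up for the example.

```python
def max_sessions(heap_mb, mb_per_session=40):
    """Whole number of concurrent sessions that fit in a heap of `heap_mb`."""
    return int(heap_mb // mb_per_session)


print(max_sessions(1200))  # 1.2Gb heap, taken as 1200Mb
print(max_sessions(1024))  # 1Gb heap
```

This gives 30 sessions for the 1.2Gb heap and 25 for the 1Gb heap, matching the figures quoted.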

How OAS balances each of the JVM's is dependent on how OAS perceives whether that JVM is "busy" or not - hence for some reason your configuration is tending towards one JVM over the day, even though the other 3 JVM's aren't at all utilized.

One question I have is whether you could reduce the number of JVM's to 3 to see if that impacts the numbers at all (after all, you're not utilizing more than 90 users per JAS machine). I have a belief that OAS isn't able to address all the memory you believe it should be able to for some reason and that certain activities are "paging" the system and are therefore incorrectly showing a JVM process as being heavily utilized.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Hi Terry,

Nice to hear from you. I spoke some time back with a mutual friend and did check out your post, and came to the conclusion that I probably wouldn't have the same issues, as we are more vanilla than you were:

"We had an escalated issue with Oracle's OAS support that provided us with a couple of opatch fixes (that were backported from 10.1.3.3 to specifically fix this issue).

The OAS patches are:-

6390846 (which is the actual fix for 10.1.3.1)
6880880 (which is a patch to opatch in order to be able to apply the other fix).

*** I haven't applied these ****


We were also required to amend the startup parameters in our JDE OC4J containers to add the following:-

-Dajp.keepalive=true

*** I tried this, but killed the server and had to restore, I probably had it in the wrong place ***


We were also asked to amend our mod_oc4j configuration (in the Apache conf directory) with the following:-
<IfModule mod_oc4j.c>
Oc4jCacheSize 400
Oc4jConnTimeout 850
</IfModule>

*** I've done this as it seemed a reasonable thing to do

"

As I suspected, it should just work out of the box.

I have indeed logged a call with Oracle, but am not holding my breath, hence posting here!

Will be in touch offline if that's OK. Will be next week now, mad panic for a promotion to PD is going on right now.

Paul
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Hi Jon,

The servers are not small. 4 way 3Ghz with 8Gb memory. The rest of the system is similarly over specced and on the whole we barely touch it!

Interestingly I had problems with 2 JVMs per server on WebSphere, so with the available memory just increased it to 4 with about 3 in hand, although I was trying to keep it to 1 per processor, and left it like that for OAS. Might try shrinking it; provided I keep below 25 I should be OK. I can do that at the weekend; reducing on an active system is harder than increasing after all!

Long time since I last saw you, you trained me back in 1998, 11 years ago almost to the day. Feel a bit out of my depth with this OAS at the moment, had WebSphere nailed after god knows how many years of it!

Paul
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Hi Paul

It's good to see another CNC guy I trained still out in the world struggling with the rest of us!

I wouldn't worry too much about OAS - stick with it, you'll "get" it just like you understood CNC. It just takes time.

I think the patch that Terry recommended is probably your biggest thing to ensure is implemented - ignore my advice about going to 3 JVM's - I actually thought you had a lot less memory. 8Gb should be fine, as long as the OS can address it all.

The last couple of OAS servers I put into production, I made sure that the patch was installed immediately before I implemented E1 and I haven't seen this issue.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Cheers Jon, been consulting since 98, then took a permie role last year with an end user.

I'll have a chat with Terry and probably apply the patches, I know the low down on the story that led to those patches and understood it to be a vastly complex custom program that was the issue. So left them on the principle that a couple of other guys didn't have them and didn't have any issues.

Will get there with OAS I have no doubt, just getting old these days!
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

I found this post interesting. Thought I would try to provide some background information, straight from Oracle docs, on what might be going on behind the scenes. Please accept my advance apology if you've already perused these docs.

Excerpt from the attached OHS guide:


"2.3 What are the different routing/load balancing algorithms?
Mod_oc4j provides three distinct kinds of routing: (a) round robin, (b) random and (c) metric based. The effective performance of round robin and random algorithms is the same. The latter, metric based routing, is based on OC4J process informing mod_oc4j of a metric based on its internal resource availability (ex. connection pools). Mod_oc4j then uses this metric to make routing decisions.
These load balancing/routing algorithms also have a flavor - affinity based. In this mode (it is the default mode), these algorithms will always route to the local node, except in cases when no process is available on the local node. The random and round robin algorithms have an extra flavor - weight based. In case of weight based, mod_oc4j distributes requests according to the routing weight configured for each host. Refer to Oracle HTTP Server Administrator’s Guide for more details on load balancing algorithms."
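
To make the excerpt a little more concrete, here is a minimal Python sketch of two of the selection styles it names: plain round robin and weight-based random. It is only a simulation of the general idea, not mod_oc4j's actual implementation; the process names, weights and function names are all hypothetical, and it counts individual requests rather than E1 user sessions.

```python
import itertools
import random
from collections import Counter


def round_robin(processes):
    """Yield processes in a fixed repeating order."""
    return itertools.cycle(processes)


def weighted_random(weights, rng):
    """Yield processes at random, in proportion to their routing weight."""
    names = list(weights)
    while True:
        yield rng.choices(names, weights=[weights[n] for n in names])[0]


def route(selector, requests):
    """Count how many of `requests` land on each process."""
    return Counter(next(selector) for _ in range(requests))


jvms = ["jvm1", "jvm2", "jvm3", "jvm4"]  # hypothetical process names
rr_counts = route(round_robin(jvms), 40)
weighted_counts = route(weighted_random({"jvm1": 3, "jvm2": 1}, random.Random(0)), 4000)
print(rr_counts)
print(weighted_counts)
```

Round robin spreads the 40 requests evenly, while the weighted selector sends roughly three times as many to the process with weight 3. Neither style on its own would explain one JVM steadily collecting most of the users.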


In your spare time, you could take a look at these links:

http://download.oracle.com/docs/cd/B32110_01/web.1013/b28948/load.htm#CIHEABJD

http://download.oracle.com/docs/cd/B32110_01/web.1013/b28948/load.htm#CIHBDGCI

It seems to me you could try to influence the behavior to "work around" your current problem. Testing someplace other than production, of course.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

No I haven't, thanks for the links, I'll have a browse...
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Your statistics are precisely indicative of a problem with load balancing. All you are doing is redirecting to a set of servers. The numbers are uneven because there is no load-balancing being done. There is no way to achieve balancing with round-robin DNS. DNS replies using round-robin produce a random IP from a set of addresses. If you do a search on jdelist for "round robin", you will see this problem has come up before.

The other issue that you see in the logs sounds like it is related to a session failure. If you are trying to do load-balancing with a fixed virtual host for all servers called "enterpriseone", then you will have a problem with the sessions. This would imply that there are session failures - a symptom of non-replicated sessions.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

[ QUOTE ]
Your DNS round robin IS perfectly ok in the way you have it configured - that is in effect what the Cisco CSS does - since the session ends up with a correct IP and port to work through.

[/ QUOTE ]

Round-robin DNS resolution and CSS hardware switch logic are different animals. Cisco CSS manages a sticky session using the session id. CSS uses a session table to govern the packet header contents (ie. origination and destination IP addresses). The session table on the switch governs what addresses are swapped and when they are swapped. This is not the same as round-robin DNS resolution. With round-robin DNS you are relying upon the client application and HTTP service to persist a connection to a given address.
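
A toy Python sketch of the session-table idea follows. It is an illustration of sticky routing in general, not of how the CSS works internally: the class and method names are made up, and the second backend address (pon210 on the same port) is an assumption based on the redirect shown earlier in the thread.

```python
import itertools


class StickyBalancer:
    """Toy session table: the first request picks a backend, later ones reuse it."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)
        self._table = {}  # session id -> backend

    def route(self, session_id):
        # A new session is assigned the next backend in rotation;
        # a known session always goes back to the backend it started on.
        if session_id not in self._table:
            self._table[session_id] = next(self._next_backend)
        return self._table[session_id]


# Backend addresses: pon209:12090 is from this thread, pon210:12090 is assumed.
lb = StickyBalancer(["pon209:12090", "pon210:12090"])
print(lb.route("session-a"))
print(lb.route("session-b"))
print(lb.route("session-a"))
```

The point is that the balancer itself remembers where each session lives, whereas with round-robin DNS that memory exists only in the client and whatever URL it was redirected to.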
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Ditto.

I have installed OAS clusters on four different installations. I have always gotten an almost perfect distribution of users on all of my JVMs using a switch VIP and OAS-Apache load balancing.

My guess is that if Paul asked his network admins, they could and would tell him whether his company's switch supported a VIP or not.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Nope, I get the exact opposite of that. When the JVM bombs it leaves the session active, but in an unusable state.

So the users then close their windows, log off their PCs, which apart from clearing cache makes no odds.

They reconnect, and the same DNS takes them back to the same server and OAS sticks it back to the same unusable session. After about 10-30 mins OAS catches up and it kills the jvm and starts a new one.

Workaround is to get them to connect directly to the other server. But that's nothing to do with dns.

RR is perfectly valid for this configuration provided:

The machines are the same size and have the same JVMs.

You accept that it's not going to be perfect.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

There is one other situation where I have seen a similar problem (on WebSphere). That problem occurred because the JAS sessions were not evenly load-balanced. As a result, the server blew the max number of JDBC connections. This caused the web app server and all associated JAS sessions to crash. I increased the JAS connection pool size and the web app server stopped crashing everyone.

Other than that, I'm resigned to say that you are cursing in your darkness. You are not load-balancing; you are redirecting. Until you balance evenly, you will not be able to control for the other variables. Best of luck.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

It looks like the affinity "flavor". I agree that you would most likely have the same problem with only one server, taking DNS round robin out of the equation, and thus DNS round robin doesn't appear to be the issue here, especially if you are redirecting to one of two machines and not using virtual hostnames.

The issue certainly appears to be with routing of users to the same OC4J, due to affinity, which is recorded as the default behavior. That is why I posted a suggestion to look at "working around" the problem. Hard to tell without seeing the complete config.

Just curious, I didn't read every word of this thread, but do you have separate islands configured within OC4J mounted apps, or separate OC4Js with only one island per?
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

To be fair, this is where my knowledge hits a brick wall.

I have no idea on the islands etc. The manuals delivered for OAS 10.1.3.1 were as useless as the WebSphere 6.1 version, in that they were for the wrong release.

How I installed was pretty simple:

I installed the agent
Then OAS
Then the fixes
Then I created the J2EE server
Added the JDBC
Started it
Created a new instance
Modified the JAS.INI and JDBJ.INI
Added all the performance bits
Set the default apache port to 80
Added the redirector (yes it's a redirector not a balancer!)
Restarted
Set the JVM heap size
Set the timeout
Restarted
In SM set the JVM count to 4
That's it

After the cutover it was fine for a day or so and then started to gradually go wrong.
 
Re: OAS Load Balancing on multiple JVMs doesn't work properly

Agreed, all I'm using is a redirector of a DNS entry, nothing more, no virtual hosts etc. I knew about the round robin of the server:port issues because I did exactly that back in 2003 and got session disconnects, as previously described.

I don't believe that RR is the issue here; it's internal to Apache\OAS, which begs the question: what did I do wrong on it? Clearly there's nothing blindingly obvious, no 'Aha' moment, which I was hoping for!

I'll apply the fixes to DV next week, nag Oracle and read through the documentation supplied. It's 8:30pm here, so I'm signing off for the weekend.

Thanks for all the help everyone. Will get there in the end.
 