E1 Web server Question

DJH

Hello list

We have a Websphere 6.1 farm balanced over 2 servers with 2 instances on each server.

Our Enterprise\Data server is a failover cluster with 2 nodes (Windows Server 2003). We have found that if one node crashes and fails over to the other node, the web servers need to be restarted as well. If not, we get loads of JAS errors. I have cleared the caches but it does not seem to help.

Anyone else have this setup or have any ideas on this? The full clients we have are fine when the server fails over, it just seems to affect the web clients.

We are on 8.10 with Tools release 8.98.1.3.
SQL2005 SP2
All Intel
Websphere 6.1.0.27

Many thanks
 
Sounds like an active-passive cluster using Microsoft Clustering Services?

Yep that's the way it works.

Why not move to a hardware load balanced active-active solution?

I've got many of these running and you can turn off a database server with no one noticing anything (11g RAC and 8.98.3), turn off an Enterprise Server and the only things that fail are UBEs (BSFNs fail and switch over automatically).

For Web if it goes down then it goes down and users need to log in again (not doing memory replication).

Colin
 
[ QUOTE ]

Why not move to a hardware load balanced active-active solution?

I've got many of these running and you can turn off a database server with no one noticing anything (11g RAC and 8.98.3), turn off an Enterprise Server and the only things that fail are UBEs (BSFNs fail and switch over automatically).

[/ QUOTE ]

Colin

Is setting up active/active with a hardware load balancer documented anywhere? I have two of six new Linux 9.0 servers built. The next step is to set up the active/active/active/active/active/active cluster. The database is on a separate box (our Exadata). These boxes are just batch/logic servers. The print queues will be directed to a single highly available network share. We will be using an F5 for the load balancing. Is there a published document that can point me in the right direction?

-Gregg
 
Ah yes...EXADATA.

If you were at OpenWorld (I was) you would have found out about EXALOGIC, and if you were a good soldier and went to Larry's 2 painfully boring keynotes, which were almost an exact duplicate of each other, you would have thought that the whole thing was EXCREMENT!

Okay - I feel better after my rant but really what a waste of time. Black Eyed Peas (thank you Fergie) almost made up for it.

Since you've already drunk most of the Oracle kool-aid how about another glass......this time EXALOGIC for the Web Tier?

I'm sure EXAAPP will come out soon for the Application (Enterprise Server) Tier.

For the active/active there are no published documents.......perhaps a trip south of the border is in order for Mr. Dawes? I can bring Tim Hortons, Beaver Tails and Maple Sugar.

The setup is actually pretty simple (< 1 week). You'll need to modify WSJ to read all Enterprise Servers (don't know why this isn't the default).

You need to worry about Scheduled Jobs and Single Concurrent Jobs as well. Here you can either (1) map these using OCMs to one machine, (2) create an active-passive cluster, or (3) customize JDE to create a daemon that intercepts the batch jobs, checks what's running and sends each job to the right server.

For #1 and 2 see me. For #3 Altquark's your man. I prefer not to customize to this extent. I did develop this for a customer, but they decided they didn't want to maintain the mods.
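For anyone wondering what the routing decision in option 3 might boil down to, here is a very rough Python sketch. It is not the mod that was built for the customer and not anything JDE ships. The function name, the job name R55SINGLE and the server names are all made up for illustration, and it assumes "the right server" means the one already running the same single concurrent job, so the new submission queues behind it.

```python
def choose_server(job_name, single_concurrent_jobs, running, servers, submitted_count):
    """Pick the server a submitted batch job should be sent to.

    single_concurrent_jobs: set of job names that must never run twice at once.
    running: dict mapping job name -> server currently executing that job.
    servers: list of Enterprise Server names (hypothetical).
    submitted_count: number of jobs routed so far, used for simple rotation.
    """
    if job_name in single_concurrent_jobs and job_name in running:
        # Assumption: send the job to the server already running it so it
        # waits in that server's single-threaded queue instead of running
        # in parallel on another machine.
        return running[job_name]
    # Everything else is simply rotated across the servers.
    return servers[submitted_count % len(servers)]
```

The real work in a daemon like this is finding out what is running and intercepting the submission, which this sketch does not attempt.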

Finally, for the actual Enterprise Server balancing you need to create a virtual entry and point all OCMs to the virtual entry. Then you have the load balancer use this entry to split the traffic between the various servers.

You'll need to have the exact same job queues set up on all servers - this makes life easy. Actually all servers should be identical for simplicity. I also do not separate Batch and Application.

There are a few other tricks like keeping the F96511 updated (database triggers) and a few other tables.

When it comes to the load balancer I use the Cisco ACE 4710. I see you're using the F5 (hey no one's perfect :>). You do need to be careful about the maxnet processes (multiple of 2 so even numbers).

If you're balancing by IP your sticky timeout should be the same as the session timeout on the Web.

Load balancing on the Enterprise Server isn't fully redundant and you need to be careful about the number of kernels you configure, i.e. if you're on 8.12 and the default port is 6014 and it goes down, is the system down? --> YES.

However, if all 3 Metadata kernels go down, is the system down? Well, yes and no. If you're only monitoring port 6014 then no, and traffic will be sent to an Enterprise Server that according to the load balancer is up but really can't do anything.

I'm working on this issue with the Product Managers at JDE - I've given my 2 cents and they've reached out to a few other experts on this as well but I don't know where it's gone. My suggestion was to allow JDE to set predefined port numbers for each kernel so that we could program the load balancer based on certain kernels on certain ports being up or down.
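To illustrate the difference between "port 6014 answers" and "the server can actually do work", here is a minimal Python sketch of the kind of check a monitor could run if kernels did listen on known ports. It is only an illustration: JDE does not offer predefined kernel ports today (that is the suggestion above), the function names are made up, and any port other than 6014 passed in would be a hypothetical placeholder.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "down" here.
        return False


def server_is_usable(host, required_ports, probe=port_is_open):
    """Treat a server as up only if every required port answers.

    required_ports is an assumption: the listener port plus whatever
    kernel ports you would want to see alive before sending traffic.
    probe can be swapped out, e.g. for testing without a network.
    """
    return all(probe(host, port) for port in required_ports)
```

Checking only [6014] reproduces the problem described above: the listener answers, the load balancer marks the server up, and users land on a box whose kernels are gone. A pure TCP check also says nothing about whether the process behind the port is healthy, so treat this as the bare minimum.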

I actually presented a bit on this at OpenWorld last week as part of Gary Grieshaber's tools update.

I've attached a version of the presentation I did at a previous conference back in June. My part starts on pg 30 and goes to pg 43.

The customer I did this for now has 2100+ users and pretty much zero downtime. Deployments do take longer as you need to migrate users from one machine to another. We were able to create documentation and train the local admin to do all of these tasks so sadly (for us) they are 99% self sufficient.

So rest assured it works well. However, please note that this is not GSC supported but only field supported (i.e. don't call JDE asking them to fix your hardware-based load balanced solution).

Colin
 
Hi Colin, thanks for your reply.

Yes sorry, we are using MS Clustering with an active passive cluster.

So basically is it just a flaw with the way the web servers cache the DB connections then? Not much point us having a cluster at all then once we go 100% web client if that is the case.

Hardware load balancing sounds good, can you recommend one to look at? As long as it wasn't too expensive to install I could see it being a good solution.

Dave
 
[ QUOTE ]
Ah yes...EXADATA.

If you were at OpenWorld (I was) you would have found out about EXALOGIC, and if you were a good soldier and went to Larry's 2 painfully boring keynotes, which were almost an exact duplicate of each other, you would have thought that the whole thing was EXCREMENT!

Okay - I feel better after my rant but really what a waste of time. Black Eyed Peas (thank you Fergie) almost made up for it.

Since you've already drunk most of the Oracle kool-aid how about another glass......this time EXALOGIC for the Web Tier?

I'm sure EXAAPP will come out soon for the Application (Enterprise Server) Tier.

[/ QUOTE ]

Don't forget about EXAFusion - the best of JDE, Peoplesoft, and Ebusiness rolled into one.

By the way, my profile picture on this site is a series of Exadata servers. That's really what they look like. Got a big old X on the front of the cabinet. Exadata saves the world! Cape not included....


- Gregg
 
Not a flaw but a "feature". I don't think this has anything to do with the cluster. To test, disable the cluster, then simply stop the database, restart it and see what happens to the web..........can't log in. The issue is more on the database side.

Only Oracle RAC can overcome this issue.


Not too many people do active-passive clusters these days. Especially in small and mid size shops.

There is definite value in clustering on Web (also Enterprise and DB) so later versions benefit from this a great deal.

Nowadays I wouldn't do active-passive but as a cheap solution I would do physical --> virtual replication or virtual to virtual replication.

We've set a bunch of these up recently. We use Double-Take and Vizioncore products but there is also Veeam and a few others out there.

The VM solution would be much cheaper but might not hold your full user count.

For active-active there are a number of people on JDEList that can do this including myself. A turn key simple solution for 2 web, 2 Ent and 1 or 2 DB should take about 1 - 2 weeks depending on the config plus the time to program the switch which could be 24 - 40 hours depending on how elaborate you get.

Others may have higher or lower numbers but that's my general ballpark.

For the Load Balancer you can look at the Coyote, F5, Citrix NetScaler, and Cisco ACE 4710 (my personal favorite).

Do a search on hardware load balancer.......Altquark did a list of a bunch of these a few hundred years ago when he was young.

Colin
 
Hi Colin,

I have to take you to task on a few points here. Firstly, a lot of small and mid-size shops are using Active/Passive clustering as it is a very cost-effective way to provide HA. Secondly, RAC is not the only option here: Mirroring can also help, as the failover to the target is typically < 5 seconds and the JDBC connection can be set up to include the principal and target (in this case it is the data access layer that provides the retry functionality). Thirdly, and lastly, it is absolutely an application issue: if the application simply retried the database connection it would work, since the DB would be online on the second node and processing would continue as before. BSFNs would still fail, as would batch jobs, but that is the same if a node fails in RAC: any open transactions roll back, and it is only selects that can automatically be rerouted to another node, and again only if the application supports it. RAC is a great feature but it is also very expensive and much more complex to configure.
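To show what "simply retried the database connection" could mean in practice, here is a small Python sketch of a retry loop over a principal and a mirror. It is not how JAS or the JDBC driver does it. The connect callable, the host names and the attempt/delay values are placeholders I have made up, and it assumes a failed connection raises ConnectionError.

```python
import time


def connect_with_failover(connect, hosts, attempts=5, delay=1.0, sleep=time.sleep):
    """Try each host in order, repeating the whole list up to `attempts` times.

    connect: callable taking a host name and returning a connection,
             raising ConnectionError when that host is unavailable.
    hosts:   e.g. ["principal", "mirror"] (hypothetical names).
    """
    last_error = None
    for attempt in range(attempts):
        for host in hosts:
            try:
                return connect(host)
            except ConnectionError as exc:
                last_error = exc  # remember why the last try failed
        if attempt < attempts - 1:
            # Wait before the next round, to give the failover time to finish.
            sleep(delay)
    raise ConnectionError(f"no database host reachable: {hosts}") from last_error
```

With a failover of a few seconds, a handful of attempts one second apart would ride through it. Anything that was mid-transaction when the node went down still rolls back and has to be resubmitted, as noted above.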

Jack.
 