How does WebSphere 4 regenerate its plug-in?

antoine_mpo

Hi list,
Because of a problem, I'm going deeper and deeper into WebSphere (I don't have a choice: WebSphere 4 is no longer supported by IBM).
In order to change our web architecture (to change the WebSphere load balancing), we installed 2 web servers, with 3 clones per server.
At the beginning of the install, we chose some names for the "server group" and the "application servers". Once it was working, we decided to change them to make them more understandable. We renamed the server group and all the application servers.
Everything worked fine for several days, until we did a "regen WebSphere plugin" on the node where the IIS web server is (we were trying to enable SAW for a cloned environment). After that, we were unable to access JDE.
After several tries (to undo all the modifications) we were able to get access again, but in degraded mode: all the load was sent to the first server and nothing to the other one.
After some reading of the IBM WebSphere guides, we understood the role of the "regen plugin" and the essential role of the plug-in configuration file (an XML file) in the communication between IIS and WebSphere.
This XML file contains a description of what you see in the WebSphere administrative console (server groups, clones, virtual hosts, ...) and is used as a routing table, to know where to send requests.
What we noticed, looking at the XML file on the first server (where IIS is), is that it doesn't mention the second server, which is why there are no connections to the second one.
We then took a look at the old XML file, from before the problem, and noticed that the server group name and the application server names were the old ones (the first names after the install, before the renaming).
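For anyone who wants to run the same kind of comparison, here is a minimal Python sketch that lists the server groups, clones and hosts found in a plug-in configuration file. The element and attribute names (ServerGroup, Server, Transport, Hostname, Port) and all the sample values are assumptions made for the illustration, so check them against your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified plug-in configuration. Element names, attribute
# names and values are assumptions for this sketch, not a real export.
SAMPLE = """
<Config>
  <ServerGroup Name="JDE_Group">
    <Server Name="JDE_Clone1" CloneID="a1">
      <Transport Hostname="serverA" Port="9081" Protocol="http"/>
    </Server>
    <Server Name="JDE_Clone2" CloneID="a2">
      <Transport Hostname="serverA" Port="9082" Protocol="http"/>
    </Server>
  </ServerGroup>
</Config>
"""


def list_clones(xml_text):
    """Return (server group, clone name, host, port) for every transport."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.iter("ServerGroup"):
        for server in group.iter("Server"):
            for transport in server.iter("Transport"):
                rows.append((group.get("Name"), server.get("Name"),
                             transport.get("Hostname"), transport.get("Port")))
    return rows


def hosts_in_config(xml_text):
    """Return the sorted list of distinct hosts the file can route to."""
    return sorted({host for _, _, host, _ in list_clones(xml_text)})


for row in list_clones(SAMPLE):
    print(row)
print("hosts:", hosts_in_config(SAMPLE))
```

Run against the old and the new file in turn, this makes it easy to spot a missing host or a stale server group name. In the sample above only serverA appears, which is the situation we had.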
Yesterday we tried the following things:
- We stopped IIS on the first server and regenerated the WebSphere plugin of the first server. The XML file then looked all right (the correct names, the ones we see in the WebSphere console, and the 2 servers with each clone). But after restarting IIS, we were not able to access JDE.
- Then we stopped IIS again, restored the XML file that was working (so with the old names), and restarted IIS. Everything was working well again (connections OK, load balancing between the 2 servers).

So, what we think is that somewhere in WebSphere there are still some references to the old names, and that they are the active ones. I took a look at the Oracle tables used by WebSphere, but it's not really easy to see how it works. I can only say that in one table I can find both the old and the new names of the application servers.

Does anyone know how WebSphere (4.0.7) regenerates the plug-in? Are any other files affected besides plugin-config.xml? How are the WebSphere database tables used?

Another question, about enabling SAW:
To do so, you have to add the transport port of each clone to the aliases of the virtual host concerned. Most of the time you type "*:port", but if you have several clones (on different servers) using the same transport port, WebSphere throws a warning saying it could lead to problems. Do you know if it's better to explicitly type "server_name:transportPort" in the aliases, to avoid trouble?

Thanks for your help.
 
Here is some feedback about what we found for our problem.
I still don't know how WebSphere recreates the plugin configuration file, but I know a bit more about how the plugin uses this file, and why we probably had this issue:
1 - When you regenerate the plugin in the WebSphere console on a server, the WebSphere service must be running on all the servers of your architecture. If it's not, the servers where the service is not running won't appear in the XML configuration file.
2 - The WebSphere plugin parses the request "http://webserver:port/jde/jdeservlet" to first find a virtual host matching "webserver:port", then a URI matching "/jde/jdeservlet", and then a server group matching that virtual host/URI pair. After that, the plugin uses an algorithm to determine which clone to send the request to. In a virtual host, most of the time you use *:port, so it can be used by any server.
The problem we seemed to have is that in our configuration several virtual hosts were defined with *:port with the same port. So even with the right XML file, the plugin either didn't manage to find which one to use, or used the first one and sent the request to the wrong clone. To check that, we edited the XML file manually to change the port of several virtual hosts (to a port that is not used), and left the good port only on the virtual host we wanted to use. With that change, everything was working well.
I think (but I haven't checked it yet) that if we replace "*:port" with "webservername:port" in the virtual hosts, it should work too, and the plugin would be able to find the right route.
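To illustrate why identical "*:port" aliases are ambiguous, here is a small Python sketch of the lookup described in point 2. It is not the plugin's real algorithm, only a simplified model of it: the group names, routes and ports are all hypothetical, and the wildcard matching is done with Python's fnmatch rather than with whatever the plugin actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical routing data, loosely modelled on the plug-in XML file.
URI_GROUPS = {
    "old_uris": ["/jde/*"],
    "jde_uris": ["/jde/*"],
}
# Each route ties a virtual host group and a URI group to a server group.
ROUTES = [
    ("old_host", "old_uris", "OldServerGroup"),
    ("jde_host", "jde_uris", "JdeServerGroup"),
]


def matching_server_groups(host, port, uri, vhost_groups):
    """Return every server group whose virtual host and URI both match."""
    target = f"{host}:{port}"
    hits = []
    for vhost_group, uri_group, server_group in ROUTES:
        host_ok = any(fnmatchcase(target, alias)
                      for alias in vhost_groups[vhost_group])
        uri_ok = any(fnmatchcase(uri, pattern)
                     for pattern in URI_GROUPS[uri_group])
        if host_ok and uri_ok:
            hits.append(server_group)
    return hits


# Both virtual hosts use the same wildcard alias: two candidates.
WILDCARD = {"old_host": ["*:80"], "jde_host": ["*:80"]}
# Unused port on one, explicit web server name on the other: one candidate.
EXPLICIT = {"old_host": ["*:8081"], "jde_host": ["webserver:80"]}

print(matching_server_groups("webserver", 80, "/jde/jdeservlet", WILDCARD))
print(matching_server_groups("webserver", 80, "/jde/jdeservlet", EXPLICIT))
```

With the wildcard aliases the request matches both server groups, so whichever one is picked first wins. With the second set of aliases only JdeServerGroup is left, which corresponds to the manual change that fixed things for us.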

During my search, I found some interesting documents about parameters worth looking at for tuning the web part (I haven't tested them in production yet):
- The web server (IIS, Apache, ...) loads the XML configuration into memory when it starts, and then checks the file every x seconds to see if it has changed. The reload interval can be defined in the XML file itself. By default (if not specified), it's 60 seconds. In a production environment this should be set to a higher value.
- If you have clones on several servers and one server goes down, it can generate a lot of waiting time for each request sent to the plugin. When the plugin tries to reach a clone and gets no answer because the server is down, it waits until the operating system TCP timeout expires, and then marks this clone as unavailable (so it's not used in the routing algorithm) for a period of time set by the "RetryInterval" parameter in the XML file.
Here is an example of what can happen if RetryInterval is not well adjusted:
2 servers A and B, with 2 clones on each. TCP timeout set to 75s, RetryInterval set to 60 (default value).
Server A is down.
In the routing algorithm, the plugin tries to contact clone 1 on server A, waits for the TCP timeout (75s), and marks this clone as unavailable for 60s. It then tries to contact clone 2 on server A, waits 75s again, and marks clone 2 as unavailable for 60s. Then it contacts clone 1 on server B, which is OK, but the request has already waited 150s. And during the second 75s wait, the RetryInterval of clone 1 on server A expired, so it was put back into the routing algorithm. The next time the plugin tries to contact clone 1 on server A, it will wait again ...
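The arithmetic of this example can be checked with a short Python sketch. It is only a model of the scenario above (clones tried one after the other, a fixed TCP timeout for each dead clone), not the plugin's real code. The clone names and the alternative value of 600 seconds are made up for the illustration.

```python
def simulate(clones, tcp_timeout, retry_interval):
    """Walk the clones in order until one answers.

    clones is a list of (name, is_up) pairs in the order they are tried.
    Returns (clone that served the request, seconds waited,
    list of dead clones already back in rotation at that moment).
    """
    now = 0
    retry_at = {}  # clone name -> time at which it is retried again
    served_by = None
    for name, is_up in clones:
        if is_up:
            served_by = name
            break
        now += tcp_timeout                      # wait for the TCP timeout
        retry_at[name] = now + retry_interval   # marked down until then
    back = [name for name, t in retry_at.items() if t <= now]
    return served_by, now, back


# Server A is down (clones A1, A2), server B is up (clones B1, B2).
CLONES = [("A1", False), ("A2", False), ("B1", True), ("B2", True)]

print(simulate(CLONES, tcp_timeout=75, retry_interval=60))
print(simulate(CLONES, tcp_timeout=75, retry_interval=600))
```

With RetryInterval at 60 the request is served by B1 after 150s and A1 is already back in rotation (it was due for a retry at 135s). With a hypothetical value of 600, the request still waits 150s the first time, but neither dead clone has come back by then.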

Some tuning on IIS:
- Directory permissions on the web server should be set to "script only" instead of "script and executable" (default value). It seems that "script and executable" can decrease WebSphere performance.
- Number of expected hits per day: by default it's set to fewer than 100000. Recommended value: more than 100000. This parameter controls the memory allocated by IIS to connections.

If some of you have already used those parameters and have some feedback, I'd be interested.

Cheers,
 