HA (active/active) for Single Concurrent Jobs and Scheduler

cdawes

VIP Member
Anyone doing HA (active/active) on the Enterprise Server?

If so what are you doing for:
(1) Single Concurrent Job Queues
(2) Scheduler in an HA configuration.

In an HA config, the risk with single concurrent jobs is that 2 jobs run in the same queue at the same time on different Enterprise Servers. An OCM mapping is the quick solution, but it moves these jobs to active/passive rather than active/active.

The JDE Scheduler is a server kernel that looks at the F91300 and can only point to one server (unless you have multiple F91300s). If you start multiple identically configured schedulers, the same job gets launched more than once. Simply mapping the scheduler to one server and accepting active/passive works. Has anyone done active/active HA with the JDE Scheduler?

Thanks,

Colin
 
You need a load balancing daemon!

For a large Unix customer with multiple application servers, jobs were submitted to a "fake" server queue (for example, QWAIT). A shell script would then query the F986110 and update the queue name to an actual queue, based on a lookup of that job in a custom table.

For example, say you have 3 application servers and 60 Citrix servers (or a number of JAS servers), with an F5 or something similar load balancing "APPSVR" as a virtual IP between your client servers and the real application servers (whose names, for our example, are APPSVR1 through APPSVR3).

So you submit R0006P from CTX1 and CTX11, and R43500 from CTX25 and CTX45, all at the same time. Each of these submits to something called "APPSVR" and goes into the F986110 with the QWAIT queue. The jobs from CTX1 and CTX11 end up on APPSVR1, and the other two end up on APPSVR2 and APPSVR3.

On APPSVR1, the load balancing daemon shell script queries the F986110 and finds two jobs for itself. Right now, nothing else is running (roughly: select * from F986110 where the status is S or P). So after the daemon looks up its custom table and finds that R0006P is perfectly OK to run in parallel, it changes the queue on both jobs in WSJ from QWAIT to SHORT - a multithreaded queue - and both jobs run at the same time...good stuff.

Now, on APPSVR2, the lookup also happens - but R43500 is marked as having to go to "SINGLE" - so it submits LOCALLY to the single queue. Everything is fine so far.

On APPSVR3, the same lookup occurs. If the daemon released its R43500 right away, that job would run in parallel with the one on APPSVR2, which is bad. So the rule here is that the job waits until there aren't any other R43500s running before it is submitted - hence truly running single threaded.
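
To make the walkthrough above concrete, here is a minimal sketch of one pass of such a daemon, in Python rather than shell and against an in-memory list instead of the real F986110. Everything named here (the Job fields, QUEUE_RULES, DEFAULT_QUEUE, release_jobs, the status codes) is a hypothetical stand-in for illustration, not the actual table layout or the script described in the post. One deliberate difference: the sketch also counts jobs that were already moved to a real queue but have not started yet, so that two R43500s are not released in the same pass.

```python
from dataclasses import dataclass
from typing import List

WAIT_QUEUE = "QWAIT"               # the "fake" holding queue from the post
ACTIVE_STATUSES = {"W", "S", "P"}  # assumed codes: waiting, in queue, processing

# Hypothetical stand-in for the custom lookup table: report -> real queue.
QUEUE_RULES = {"R0006P": "SHORT", "R43500": "SINGLE"}
DEFAULT_QUEUE = "SHORT"            # assumption: unlisted reports may run in parallel


@dataclass
class Job:
    number: int
    report: str
    server: str   # application server the job landed on
    queue: str
    status: str


def release_jobs(jobs: List[Job], this_server: str) -> List[Job]:
    """Run one daemon pass for this_server and return the jobs it released."""
    released = []
    for job in jobs:
        # Only look at this server's own jobs that are still parked in QWAIT.
        if job.server != this_server or job.queue != WAIT_QUEUE:
            continue
        target = QUEUE_RULES.get(job.report, DEFAULT_QUEUE)
        if target == "SINGLE":
            # Hold the job while the same report is active on ANY server.
            busy = any(
                other is not job
                and other.report == job.report
                and other.queue != WAIT_QUEUE
                and other.status in ACTIVE_STATUSES
                for other in jobs
            )
            if busy:
                continue  # stays in QWAIT until a later pass
        job.queue = target  # the equivalent of changing the queue in WSJ
        released.append(job)
    return released
```

With the four jobs from the example, a pass on APPSVR1 moves both R0006P jobs to SHORT, while a pass on APPSVR3 leaves its R43500 in QWAIT for as long as the one on APPSVR2 is still active. Against a real database, the check and the queue update would presumably need to happen atomically so that two servers cannot release the same report at the same moment.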

Now, this can be extended somewhat. For example, the lookup can take the version into account - only run single threaded by job & version, otherwise run in parallel. Or even by user or, if you're very clever, by a substring of a version (such as a branch!)

In fact, there are different ways that jobs could be configured to run:

1. the job can always run in parallel
2. the job must run single threaded by user
3. the job must run single threaded by version
4. the job must run single threaded by machine
5. the job must run single threaded
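
One way to picture these five modes is as a "conflict key": two jobs may not run together when their keys match. The sketch below is only an illustration under that assumption. RunMode, Job, conflict_key and may_start are hypothetical names, and reading "by machine" as "one at a time per application server" is my interpretation rather than something the list spells out.

```python
from enum import Enum
from typing import List, NamedTuple, Optional, Tuple


class RunMode(Enum):
    PARALLEL = 1
    SINGLE_BY_USER = 2
    SINGLE_BY_VERSION = 3
    SINGLE_BY_MACHINE = 4
    SINGLE = 5


class Job(NamedTuple):
    report: str
    version: str
    user: str
    machine: str


def conflict_key(job: Job, mode: RunMode) -> Optional[Tuple[str, ...]]:
    """Return the key two jobs must share to block each other, or None."""
    if mode is RunMode.PARALLEL:
        return None
    if mode is RunMode.SINGLE_BY_USER:
        return (job.report, job.user)
    if mode is RunMode.SINGLE_BY_VERSION:
        return (job.report, job.version)
    if mode is RunMode.SINGLE_BY_MACHINE:
        return (job.report, job.machine)
    return (job.report,)  # SINGLE: one instance of the report anywhere


def may_start(job: Job, mode: RunMode, running: List[Job]) -> bool:
    """True if no running job shares this job's conflict key under mode."""
    key = conflict_key(job, mode)
    if key is None:
        return True
    return all(conflict_key(other, mode) != key for other in running)
```

For instance, with one R43500 already running for the same user under a different version on another server, a second R43500 could start under the by-version and by-machine modes but would be held under the by-user and fully single threaded modes.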

I've hurried through the description of this - but it really isn't too difficult to put together - and it will work for an active/active cluster solution on Unix or Windows, since a little more intelligence is applied before the job is actually submitted. It's already been put in place at several larger corporations...
 
Ya okay sure.......haven't tried your suggestion but a free coffee is in order.

How's Toronto, Boston, Montreal, Vancouver or Denver sound?

I could mail you a pack of Tim Horton's Coffee but it might set off a few alarms with Customs, especially if it's headed to Florida.
 