E9.2 Notifying users that a scheduled ORCH failed

JohnDanter2 · Apr 29, 2024

Hi folks

Just wondering how you guys go about notifying users (I have 4 email addresses to inform) that a scheduled ORCH didn't work.
It's a simple automatic currency update orchestration, but it could fail to run for several reasons: the scheduler has stopped, the rates have already been entered, or the bank's API is down.

What would you guys do in a situation like this?

Thanks

John

alfredorz · Apr 29, 2024

I think you could create an advanced query in Orchestrator Health (P980060|W980060B) or Orchestrator Exceptions (P980060|W980060A) and create a Watchlist and/or Notification. As for the scheduler being stopped, I don't know; you could use an application, batch job or external app to check the scheduler status (https://docs.oracle.com/en/applicat...t-api/op-v2-scheduler-pingscheduler-post.html )

Kevin Long · Apr 29, 2024

It depends on the situation, but you can add a notification at the start and end of the orchestration so the user knows that it ran to completion. I will often add a notification as part of the error handling so there is an alert when something goes wrong. You can configure resiliency so multiple AIS servers can be set up to run the scheduler in case one of them goes down. Check out Doc ID 2368608.1 on Oracle Support for more information.

jolly · Apr 29, 2024

The thing is, if the scheduler is down you don't get any critical notifications. I have used an external dead man's switch for this, which is a python script running on a Windows schedule to invoke the AIS API to list the critical Orchestrations and their last run times and send emails when stuff is missed.
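A minimal sketch of the checking half of such a dead man's switch, in Python. It assumes the last run times have already been fetched from AIS into a dictionary; the fetch itself, the orchestration names and the 25-hour threshold are all hypothetical and not taken from the script described above.

```python
from datetime import datetime, timedelta, timezone


def find_missed_runs(last_runs, max_age, now=None):
    """Return the names of jobs whose last run is missing or older than max_age.

    last_runs maps a job name to a timezone-aware datetime, or to None if the
    job has never run. How this mapping is obtained from AIS is not shown here.
    """
    now = now or datetime.now(timezone.utc)
    missed = []
    for name, last_run in last_runs.items():
        if last_run is None or now - last_run > max_age:
            missed.append(name)
    return sorted(missed)


def build_alert(missed):
    """Build the alert text to email out, or None when nothing was missed."""
    if not missed:
        return None
    return "Missed scheduled orchestrations: " + ", ".join(missed)


if __name__ == "__main__":
    # Hypothetical sample data: one recent run, one stale run.
    now = datetime.now(timezone.utc)
    sample = {
        "ORCH_CurrencyUpdate": now - timedelta(hours=30),
        "ORCH_Other": now - timedelta(hours=1),
    }
    print(build_alert(find_missed_runs(sample, timedelta(hours=25), now=now)))
```

The point of running this outside JDE is that it still alerts when the scheduler, or AIS itself, is the thing that is down.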

DaveWagoner · Apr 29, 2024

jolly said:
I have used an external dead man's switch for this

This is the way. If scheduler is down then orch bits and bobs will probably also be down, so can't rely on an orch-based notification framework at that point

Kevin Long · Apr 29, 2024

Just to clarify, I was not suggesting setting up notifications to alert people when the scheduler was down. That would be quite the trick. However, "the rates have already been entered or the banks API is down" would be where I would use notifications. I do suggest setting up multiple schedulers with resiliency to mitigate the risk of one going down.

JohnDanter2 · Apr 30, 2024

Thanks folks

@Kevin Long, that is exactly what I have done yes.

Both steps handle the errors in yellow below.

For now I have gone with simple error handling in those steps. I just call an ORCH that sends a message with a few substitution variables (which is emailed out). If the exchange API or the SREQ encounters an error, I also add the step's output to my email (Exception Message).

So you end up with this type of email

I plan to make this generic and just have one ORCH to handle this going forward
My idea is to create a generic error handling ORCH to email people.
So have a new config table to store ORCH names, email addresses, subject text, body text, 10 substitution variables for things like DOCO, AN8, ITM etc., and an active flag.
In each ORCH or step we want to monitor, we just plug in this ORCH. Pass in the ORCH name and any variables and let the code create a generic message to email out.
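To make the idea concrete, here is a rough sketch of the message-building logic in Python rather than as an orchestration. The record layout mirrors the config table described above, but the class name, field names and the `$NAME` placeholder syntax are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class NotifyConfig:
    """One row of the hypothetical config table, keyed by ORCH name."""
    orch_name: str
    recipients: list
    subject: str
    body: str
    active: bool = True


def build_message(config, variables):
    """Fill the subject and body templates; return None if the row is inactive.

    safe_substitute leaves any placeholder without a value untouched, so a
    missing variable never stops the error email from being built.
    """
    if not config.active:
        return None
    return {
        "to": list(config.recipients),
        "subject": Template(config.subject).safe_substitute(variables),
        "body": Template(config.body).safe_substitute(variables),
    }


if __name__ == "__main__":
    row = NotifyConfig(
        orch_name="ORCH_CurrencyUpdate",          # hypothetical name
        recipients=["finance@example.com"],       # hypothetical address
        subject="$ORCH failed",
        body="Step output: $ERROR",
    )
    print(build_message(row, {"ORCH": row.orch_name, "ERROR": "bank API down"}))
```

Each monitored ORCH or step would only need to pass its name and whatever variables it has; everything else comes from the config row.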

But yes, the trick is when the scheduler is down. How can I tell them that?

DaveWagoner · Apr 30, 2024

In my experience, Postman offers a function in its testing suite that can do "dead man switches", you can schedule an API collection to run at specific times, and if the endpoint is down you can handle it using scripting on that end. That's just one example-- anything with scheduling, curl ability, and messaging can be a suitable solution for you, as long as it's not on or near the system you're trying to test

JohnDanter2 · Apr 30, 2024

DaveWagoner said:
In my experience, Postman offers a function in its testing suite that can do "dead man switches", you can schedule an API collection to run at specific times, and if the endpoint is down you can handle it using scripting on that end. That's just one example-- anything with scheduling, curl ability, and messaging can be a suitable solution for you, as long as it's not on or near the system you're trying to test

To test the API is 'broken' I just planned to change the url for a day or two and let the scheduler run.....Or let it use the non trimmed UDC values!!! That does it lol

I may give up on the scheduler side of it and rely on success messages instead. If you don't get them one day, well

Kevin Long · Apr 30, 2024

JohnDanter2 said:
Thanks folks

@Kevin Long, that is exactly what I have done yes.
Both steps handle the errors in yellow below.

For now I have gone with simple error handling in those steps. I just call an ORCH that sends a message with a few substitution variables (which is emailed out). If the exchange API or the SREQ encounters an error, I also add the step's output to my email (Exception Message).

So you end up with this type of email

I plan to make this generic and just have one ORCH to handle this going forward
My idea is to create a generic error handling ORCH to email people.
So have a new config table to store ORCH names, email addresses, subject text, body text, 10 substitution variables for things like DOCO, AN8, ITM etc., and an active flag.
In each ORCH or step we want to monitor, we just plug in this ORCH. Pass in the ORCH name and any variables and let the code create a generic message to email out.

But yes, the trick is when the scheduler is down. How can I tell them that?

@JohnDanter2, do you have other scheduling tools in your organization that also have a monitoring solution to alert people when they go down, or is this specifically an AIS scheduler concern?

DaveWagoner · Apr 30, 2024

JohnDanter2 said:
To test the API is 'broken' I just planned to change the url for a day or two and let the scheduler run.....Or let it use the non trimmed UDC values!!! That does it lol

I may give up on the scheduler side of it and rely on success messages instead. If you don't get them one day, well

True canary method

DaveWagoner · Apr 30, 2024

Kevin Long said:
@JohnDanter2, do you have other scheduling tools in your organization that also have a monitoring solution to alert people when they go down, or is this specifically an AIS scheduler concern?

This and things like assertions for automated testing are some frontiers many of us haven't taken the time to attempt. I really really really want to start using orch assertions more and test driven development. Need to find the time and put in the research & effort

JohnDanter2 · Apr 30, 2024

@Kevin Long
Yeah I'm sure we do. That's a good idea. See if I can piggyback off one of those, you mean? Yes, AIS scheduler concern

@DaveWagoner
I did think about assertions, but I am the same as you, not researched it loads. Apart from thinking it's more to do with expected values and ranges, I know nothing else about it

Kevin Long · Apr 30, 2024

JohnDanter2 said:
@Kevin Long
Yeah I'm sure we do. That's a good idea. See if I can piggyback off one of those, you mean? Yes, AIS scheduler concern

@DaveWagoner
I did think about assertions, but I am the same as you, not researched it loads. Apart from thinking it's more to do with expected values and ranges, I know nothing else about it

@JohnDanter2 If you have concerns with the reliability of the AIS scheduler, then I think either those concerns need to be addressed, or I would suggest using a different scheduling tool. The biggest issue I have seen with the AIS scheduler is it not starting after a JDE restart. However, that issue has been resolved. You may have encountered other problems which have you questioning its reliability. If that is the case, then have a look at using an external scheduling solution.

JohnDanter2 · Apr 30, 2024

Kevin Long said:
@JohnDanter2 If you have concerns with the reliability of the AIS scheduler, then I think either those concerns need to be addressed, or I would suggest using a different scheduling tool. The biggest issue I have seen with the AIS scheduler is it not starting after a JDE restart. However, that issue has been resolved. You may have encountered other problems which have you questioning its reliability. If that is the case, then have a look at using an external scheduling solution.

Hi Kevin. My concerns are mainly to do with restarts, yes. It seems to fire off the ORCH at odd times, and also I am not sure what time zone it is actually using in the CRON string.
It's meant to go off Mon to Fri 1PM CEST. 0 0 0,6 ? * MON,TUE,WED,THU,FRI *
Yet it's just fired off today at 6AM my time (GMT) (but I think my string is wrong, the 0,6 part)
Sometimes it also seems to fire off after a restart, and it can't do this as the bank doesn't change the rate until midday. At 6AM I'd basically be creating a new rate from yesterday's rate.

I am on GMT, the users want this at 1PM CEST, yet our servers are using some US time zone lol.
So I'm sure it's just education on my and my CNC team's part, but for now it's a bit random. I also want this to fire off from a generic scheduler account and not mine
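For anyone puzzling over the same string, a small Python sketch that splits it into named fields and converts a wall-clock hour between fixed UTC offsets. It assumes the Quartz-style field order (seconds, minutes, hours, day of month, month, day of week, year), which the seven fields and the `?` suggest; the helper names are made up, and the US offset used in the example is only a placeholder since the actual server time zone isn't stated.

```python
from datetime import datetime, timedelta, timezone

# Assumed Quartz-style field order for a 6- or 7-field cron string.
QUARTZ_FIELDS = ("second", "minute", "hour", "day_of_month",
                 "month", "day_of_week", "year")


def parse_quartz(expr):
    """Split a cron string into a dict of named fields (no validation of values)."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(QUARTZ_FIELDS, parts))


def hour_in_zone(hour, source_offset, target_offset):
    """Convert a wall-clock hour between two fixed UTC offsets given in hours."""
    source = timezone(timedelta(hours=source_offset))
    target = timezone(timedelta(hours=target_offset))
    moment = datetime(2024, 4, 30, hour, 0, tzinfo=source)
    return moment.astimezone(target).hour


if __name__ == "__main__":
    fields = parse_quartz("0 0 0,6 ? * MON,TUE,WED,THU,FRI *")
    print(fields["hour"])           # the hours field of the string above
    print(hour_in_zone(13, 2, 0))   # 13:00 at UTC+2 (CEST) expressed in UTC
    print(hour_in_zone(13, 2, -5))  # same moment at UTC-5 (placeholder US offset)
```

Read this way, an hours field of `0,6` asks for two runs per weekday, at hour 0 and hour 6 in whatever time zone the scheduler evaluates the string, so a single 1PM run would need a single value in that field, adjusted for that zone.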

Kevin Long · May 1, 2024

Have a look at Oracle Doc ID 2800801.1 on how to configure timezones for the AIS Scheduler. There is a setting in the WebLogic console where you can set the timezone for the server instance. I don't have that setting in my instance and it defaults to the server's timezone. The display time in the Scheduler itself is determined by the user profile's timezone setting in the Universal Time setting and the timezone rule.

If you don't want a job to run automatically after a restart, make sure Autostart is turned off. To have the scheduled job run with a scheduler account, the scheduler account needs to be the account that signs into scheduler and turns the job on.

ArnoldWillems · May 2, 2024

Kevin Long said:
@JohnDanter2 If you have concerns with the reliability of the AIS scheduler, then I think either those concerns need to be addressed, or I would suggest using a different scheduling tool. The biggest issue I have seen with the AIS scheduler is it not starting after a JDE restart. However, that issue has been resolved. You may have encountered other problems which have you questioning its reliability. If that is the case, then have a look at using an external scheduling solution.

Hi Kevin,

You mention the issue with the scheduler is resolved; here are some further thoughts on it which may be of interest to some on the forum.

To determine if the scheduler is offline, I have a few steps I currently use:
You can employ Postman (or any other tool/script running on a private VM) to schedule a POST API call to "{{account_url}}/v2/scheduler/list". This call will return either true or false.

With this approach, you can also set up a secondary POST request to restart the scheduler in case it's offline.
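For those who would rather script it than use Postman, a minimal Python sketch of the same check. The endpoint path is the one quoted above; the empty request body, the headers and the assumption that the response is a bare JSON boolean are guesses to adapt to your own AIS setup, and the function names are made up. The restart call is left out because its endpoint isn't given here.

```python
import json
import urllib.request


def post_json(url, headers):
    """POST an empty JSON body and return the decoded JSON response.

    The caller supplies whatever authentication headers the AIS server needs.
    """
    request_headers = {"Content-Type": "application/json"}
    request_headers.update(headers)
    request = urllib.request.Request(
        url, data=b"{}", method="POST", headers=request_headers
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def scheduler_is_up(account_url, post=post_json, headers=None):
    """Return True only if the call succeeds and the response is literally true.

    Network failures and unparseable responses count as "down", since an
    unreachable AIS server cannot be running scheduled jobs either.
    """
    url = account_url.rstrip("/") + "/v2/scheduler/list"
    try:
        result = post(url, headers or {})
    except (OSError, ValueError):
        return False
    return result is True


if __name__ == "__main__":
    # Hypothetical base URL; replace with your own {{account_url}}.
    if not scheduler_is_up("https://ais.example.com/jderest"):
        print("Scheduler appears to be down: send the alert from here")
```

Passing the HTTP call in as `post` keeps the decision logic testable without a live server.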

Cheers
