LDAP and security kernel failures

Bulldog

Active Member
I have a customer who runs iSeries V7R1 (DB2 V7R1) for their enterprise servers, with JDEdwards 9.0 TR 9.1.2.3. They recently switched on LDAP, and every couple of days we now get security kernel failures and 'cannot authenticate user' error messages in the logs. They are on multifoundation (non-production has two E1 instances on separate ports: E900SYS lib and E9003SYS lib). We have to restart the non-production enterprise server to get everything working again. Production seems unaffected, probably because it has a single E1 instance. Oracle has suggested that the issue lies with multifoundation. Has anyone switched on LDAP for an iSeries enterprise system with multifoundation, or can you suggest any reasons why we would be getting such failures on a regular basis? I have uploaded a small extract of the kind of logs we are getting.
 

Attachments

  • 183878-Log Extract.txt
    931 bytes · Views: 76
Multi-foundation has nothing to do with it.

Please detail your config a bit more.

The key with any LDAP install for E1 is to point to the Active Directory Global Catalogue. You also need to replicate the GC to a close location.

In the LDAP config you just point each Enterprise Server port to the LDAP server - you do need a separate entry for each Enterprise Server.

Also in the Security section of the JDE.INI file you need to make sure that each enterprise server only has one security server (itself).
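
For anyone who wants to verify that last point on more than one instance, here is a minimal Python sketch that reads a copy of a JDE.INI and lists the security server entries in its [SECURITY] section. It assumes the entries are keys whose names start with "SecurityServer", which is how I remember that section; check the key names against your own file. The host names in the usage note are made up.

```python
import configparser


def security_servers(ini_text):
    """Return the values of all [SECURITY] keys whose name starts with 'SecurityServer'.

    Assumption: security server entries use key names beginning with
    'SecurityServer'. Adjust the prefix if your JDE.INI differs.
    """
    parser = configparser.ConfigParser(
        interpolation=None,   # values may contain '%'
        strict=False,         # tolerate duplicate keys/sections
        allow_no_value=True,  # tolerate keys without '='
    )
    parser.optionxform = str  # keep key names as written
    parser.read_string(ini_text)

    servers = []
    for section in parser.sections():
        if section.upper() != "SECURITY":
            continue
        for key, value in parser.items(section):
            if key.lower().startswith("securityserver") and value and value.strip():
                servers.append(value.strip())
    return servers


def points_only_at_self(ini_text, own_host):
    """True if exactly one security server is listed and it is this host."""
    servers = security_servers(ini_text)
    return len(servers) == 1 and servers[0].lower() == own_host.lower()
```

Usage would be something like `points_only_at_self(open("JDE.INI").read(), "ENTSVR1")`, run once per enterprise server instance, with the path and host name replaced by your own.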
 
I had this exact same problem at a client some time ago. After x number of days (and it was always x), the kernels quit working. I tried changing the token settings. I tried timeout settings and nothing ever worked. The AD admin said that the service was always running and that it must be JDE. I opened an SR with Oracle and they could never resolve it.

To this day, I don't know what the problem was. My bet is that the AD LDAP (GC) service was being stopped or some other NT timeout setting. The workaround was to restart services once a week.

I am at another client and have the exact same config (except a newer tools release and AD 2008). We don't have this problem.
 
Bulldog,

I have a similar issue with one of my customers.

Customer is on iseries
Apps 9.1
Tools 9.1.0.4
Websphere app server.

If we let their prod server run, in roughly 8 days I see the security kernel crash error. Interestingly, I see the issue from two directions. The users are all on the iSeries for JDE access. They also have a Windows web server running WebSphere for some custom mobile apps. I start to see the impending crash on the Windows server first: the mobile app crashes and then needs to be restarted. An hour or so after the Windows WebSphere apps crash, the iSeries WebSphere instances start to crash.

When we see this condition on the windows servers, I found this message in the logs: "Cannot connect to any OneWorld Security Server.Failure in retrieving extended token from Security Kernel....."

When it starts to hit the iseries, I see this message: "Sign on: error message ID = 340 (Security Service is down, please contact system adminstrator)"

They are running multiple security kernels. Not all of the kernels crash. If they are persistent and try to login a few times, they eventually hit a working security kernel and can get in.

They have a prod and a non-prod instance for their iSeries. I only see the issue on the prod server. Oracle support could not find anything. Oracle tried to blame the issue on WebSphere.

I looked at all of the various timeout settings and scoured the knowledge jungle for clues. We have all of the recommended settings in place, but the issue persists and is consistent. The closest hit I found to their issue was in Oracle document 885414.1. In that document, Oracle points out some issues with the regular and extended token lifetime. They list some parameters for optimizing the token lifetime settings and point out that they are trying to resolve this in a later tools release. I'm still holding my breath.

Our workaround was a scheduled reboot process. We cycle JDE nightly. With that process in place, we have not seen a recurrence of the security kernel issue.

We have added in some extensive scripting and testing around this nightly reboot that ensures that the customer's system reboots cleanly. If there are some issues that our scripts can't handle, we have a secondary process that proactively tests their system on a 24/7 basis. If their system has a meltdown, our automated system sends an alert for myself or a member of my CNC team to start troubleshooting their system, before the customer even knows that they are having an issue.
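
To give a rough idea of the kind of check involved (this is not our actual tooling, just a minimal Python sketch), the function below counts occurrences of the two messages quoted above in a set of log lines. The pattern names are made up for illustration, and how you collect the log lines from each server is up to you.

```python
import re

# Patterns taken from the messages quoted earlier in this post.
# The dictionary keys are arbitrary labels chosen for this sketch.
PATTERNS = {
    "no_security_server": re.compile(
        r"Cannot connect to any OneWorld Security Server", re.IGNORECASE
    ),
    "extended_token": re.compile(
        r"Failure in retrieving extended token", re.IGNORECASE
    ),
    "signon_340": re.compile(r"error message ID\s*=\s*340", re.IGNORECASE),
}


def scan_log_lines(lines):
    """Count how many lines match each known security kernel symptom."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def needs_attention(counts):
    """True if any symptom was seen at least once."""
    return any(count > 0 for count in counts.values())
```

Something like `needs_attention(scan_log_lines(open(path, errors="replace")))` run on a schedule would flag the impending crash on the Windows side before it reaches the iSeries, assuming the messages show up in the logs you feed it.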


- Gregg
 
>> If their system has a meltdown, our automated system sends an alert for myself or a member of my CNC team to start troubleshooting their system, before the customer even knows that they are having an issue.

Have you ever thought of monetizing that?

I mean, I think everyone who's worked on the CNC side tends to figure out pretty quick to check the kernels when things go wrong, but that might be an interesting add-on to JDE.

Malcolm
 
Thanks all for the information, we have had this issue for several weeks.

Colin, you asked for more information on our customer's system, so here goes:

JDE Enterprise One 9.0 TR9.1.2.3
Web servers Solaris 10, Websphere 7.0.0.19
Database DB2 V7R1
Fat Clients: Windows 7, websphere Express 7.0.0.19, Local DB=SSE Jdeveloper Studio 11.1.1.5.0, JAX-WS for BSSV

I have attached the JDE.INI files from the iSeries. If you need anything else, let me know.
 

Attachments

  • 183912-JDE INI File from E9003SYS_E900SYS.zip
    12.9 KB · Views: 29
Hi Colin,
Can you expand on your reply with specifics on the following:

The key with any LDAP install for E1 is to point to the Active Directory Global Catalogue. You also need to replicate the GC to a close location.

In the LDAP config you just point each Enterprise Server port to the LDAP server - you do need a separate entry for each Enterprise Server.

Also if you need more info within JDE such as the config in P95928 let me know.

A separate LDAP server is defined for each E1 instance in our multifoundation setup, each allocated to a different port. The LDAP server location at the customer looks like an IAM address, the LDAP server type is Novell AD, there is no role, and SSL is Y. The mapping values are the same for both instances. Regards.
 
Hi All,

I have a similar situation with one of my customers.
Setup information:
AS400 OS V7R1
E1 9.1
TR 9.1.3

I have to restart services in order to get rid of this issue. It is a multi-foundation setup.
We do not have a trusted node setting in the JDE.INI files, so the setup picks it up from the system.

If anyone has a solution for this problem, please share.

Thanks in advance!
 
Arny - is your Domain Services server a reliable service? That is, is it load balanced between a number of domain servers? If not, then any downtime from the domain services will "kill" the security kernel and make it necessary to restart JDE services. This includes network downtime for whatever reason: if the Domain Service is not available when JDE makes a request, it can kill the JDE services. Just a thought.
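
If you want to test that theory, a minimal Python sketch like the one below can be scheduled to probe each directory server and record which ones do not accept a TCP connection. The host names are placeholders, and the ports are assumptions: 3268 is the standard Active Directory Global Catalog port (3269 with SSL), but use whatever your LDAP configuration actually points at. A successful connect only shows the port is open, not that the directory is answering queries correctly.

```python
import socket

GC_PORT = 3268      # standard AD Global Catalog port; assumption for this sketch
GC_SSL_PORT = 3269  # Global Catalog over SSL


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable(servers, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in servers
            if not is_reachable(host, port, timeout)]
```

For example, `unreachable([("dc1.example.com", GC_SSL_PORT), ("dc2.example.com", GC_SSL_PORT)])` run every few minutes and logged with a timestamp would show whether the kernel failures line up with gaps in directory availability.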
 