IPC2100015 - createIPCShm failed, errno=28: No space left on device

peterbruce

We have a problem in both our production and test systems that results in the following error:
IPC2100015 - createIPCShm failed, errno=28: No space left on device.

It was first noticed after we had made a full package deployment in production (our first full package deployment in production in a number of years) as part of the installation of the Purge-IT archiving solution. Users were getting a lot of errors relating to business function calls.

The limit set in Solaris for IPC Shared Memory IDs is 128 on both the production and test enterprise servers.
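For anyone wanting to check how close a server is to that limit, here is a minimal sketch that counts the shared memory segments in `ipcs -m` output and compares the count with 128. It assumes the Solaris-style layout where each segment row starts with `m` followed by the ID; the sample text, owner names and keys are made up for illustration and are not from our servers.

```python
import re

# Assumption: Solaris-style `ipcs -m` rows look like
# "m          1   0x00001388 --rw-rw-rw-   jde811   jde811"
SHM_ROW = re.compile(r"^m\s+(\d+)\s+")


def count_shm_ids(ipcs_text: str) -> int:
    """Count the shared memory segment rows in `ipcs -m` output."""
    return sum(1 for line in ipcs_text.splitlines() if SHM_ROW.match(line))


def headroom(ipcs_text: str, limit: int = 128) -> int:
    """Return how many more IDs can be created before the limit is hit."""
    return limit - count_shm_ids(ipcs_text)


# Illustrative sample only (hypothetical owners and keys).
SAMPLE = """\
IPC status from <running system> as of Thu Jun 14 10:00:00 EST 2012
T         ID      KEY        MODE        OWNER    GROUP
Shared Memory:
m          0   0x50000b3f --rw-r--r--     root     root
m          1   0x00001388 --rw-rw-rw-   jde811   jde811
m          2   0x00001389 --rw-rw-rw-   jde811   jde811
"""

print(count_shm_ids(SAMPLE), headroom(SAMPLE))
```

On a live server the text could come from `subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout` instead of the sample string.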

Investigation of the logs on the JDE test system enterprise server shows that the IPC Shared Memory error did not appear in the logs between the time the full package was deployed on 26th October 2011 and the Solaris patching on both the production and test enterprise servers on 7th April 2012. It was during this period that archive testing was done. The delay between the archive testing and the installation in production was caused by problems with a couple of the archive data inquiry applications.

On the production system, the IPC Shared Memory error only occurred in the server logs on the enterprise server after the full package was deployed to the production enterprise server on Tuesday 12th June 2012. The JDE services on the enterprise server were restarted on Thursday 12th June and the IPC Shared Memory error has not occurred since, though the IPC Shared Memory ID count does climb slowly, apparently in jumps of 15 to 20.

In both test and production all the IPC Shared Memory IDs in excess of the initial 6 (created when the JDE services are started) are created by the SAW Kernel process.
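The way to attribute segments to a process can be sketched roughly as below: group the rows of `ipcs -m -p` by creator PID. This assumes the `-p` option appends the CPID and LPID as the last two columns of each row, as it does in the Solaris output I have seen; the PIDs, owners and keys in the sample are hypothetical.

```python
from collections import Counter


def segments_by_creator(ipcs_text: str) -> Counter:
    """Count shared memory segments per creator PID from `ipcs -m -p` output.

    Assumes rows of the form: m ID KEY MODE OWNER GROUP CPID LPID
    """
    counts = Counter()
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a shared memory row.
        if len(fields) < 8 or fields[0] != "m" or not fields[1].isdigit():
            continue
        if fields[-2].isdigit():
            counts[int(fields[-2])] += 1  # second-to-last column is the CPID
    return counts


# Illustrative sample only (hypothetical PIDs, owners and keys).
SAMPLE_P = """\
T         ID      KEY        MODE        OWNER    GROUP  CPID  LPID
Shared Memory:
m          1   0x00001388 --rw-rw-rw-   jde811   jde811  2050  2050
m          2   0x00001389 --rw-rw-rw-   jde811   jde811  2050  2101
m          7   0x0000138e --rw-rw-rw-   jde811   jde811  2101  2101
m          8   0x0000138f --rw-rw-rw-   jde811   jde811  2101  2101
m          9   0x00001390 --rw-rw-rw-   jde811   jde811  2101  2101
"""

for pid, n in segments_by_creator(SAMPLE_P).most_common():
    print(pid, n)
```

Each PID can then be matched to a process name with something like `ps -p <pid> -o comm=` to see which kernel it belongs to.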

The conclusion that seems plausible is that the Solaris patching on both production and test enterprise servers on 7th April 2012 has somehow caused the SAW Kernel process to consume IPC Shared Memory IDs. The reason it did not show up in the production logs prior to the full package deployment on the production enterprise server seems to be that the rate of creation of extra IPC Shared Memory IDs was slower. After the full package deployment to the production enterprise server, the rate of creation appears to have increased. After the subsequent restart of JDE services on the production enterprise server, the rate appears to have returned to its previous level (that seen between the time of the Solaris patch and the time of deployment of the full package). The rate of creation of extra IPC Shared Memory IDs in the test system seems to be high even though it is not used much. The difference in the rate between production and test may be related to the number of environments in each (1 in production and 3 in test).

The response from Oracle (before the solaris patch involvement was discovered) was:

[ QUOTE ]

This is an expired tool release. This release has been expired for over 2 years. Please have the customer upgrade.
There had been several fixes addressing resource leaks, ipc threading issues, etc. Most of the fixes were complicated and depends on other fixes.

Recommendation from Development is to upgrade the Tools Release.


[/ QUOTE ]

The situation is that we are upgrading! As a prerequisite to upgrading we need to archive data so that the upgrade will be simpler, easier, smoother and quicker. In the process of installing the archiving solution, we came across this problem.

Our target upgrade is E9.1 and TR9.1. If I understand the developers' response, they want us to add an additional step to the upgrade: an intermediate TR upgrade. I doubt that this is a practical solution.

After being informed of the Solaris patch involvement, Oracle want the list of patches that were applied and their release dates.

As it appears that there is no fix available to us for this problem, is there any workaround?

Any comments or help would be greatly appreciated.

Config:
Oracle JD Edwards EnterpriseOne,
E8.11sp1 8.97.2.1, ES Sun Solaris 10, Oracle DB 10.2, Websphere 6 Win2K3.
Forms: Create!form Server 3/Server 6
 
JDEList,

A Clarification:

The Purge-IT software was not the problem and was never suspected of being so. We are happy with the Purge-IT archive solution. This software was installed and run without problems on our test system.

Update:

The real cause of our problem is a combination of the tools release and a Solaris patch. We are on TR 8.97.2.1, which is old and no longer supported. A kernel patch, 147440-14, that included a different /kernel/sys/sparcv9/shmsys was applied, and then the problems occurred. Another patch, 147440-19, has since been applied to our test system, and indications so far are that it has not fixed the problem.

Because we will be upgrading soon, it is not worth chasing a fix. So we have a "band-aid" approach to keep the system running until we upgrade. We have increased the IPC Shared Memory ID limit in Solaris, which gives us more time between related errors, and we are monitoring the number of IPC Shared Memory IDs and will schedule a restart when the count approaches the limit.
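The monitoring side of the band-aid could look something like the sketch below. It is only an illustration of the idea, not the script we actually run: the function names, the limit of 256 and the sample counts are all hypothetical, and the jump size of 20 is simply the upper end of the 15 to 20 jumps mentioned earlier.

```python
def should_schedule_restart(current_ids, limit, jump=20, safety_jumps=2):
    """True when `safety_jumps` more jumps of `jump` IDs would reach the limit."""
    return current_ids + jump * safety_jumps >= limit


def hours_until_limit(samples, limit):
    """Linear projection of hours left before the ID count reaches `limit`.

    `samples` is a list of (hours_since_restart, id_count) pairs; the rate is
    taken from the first and last sample. Returns None if the count is not
    growing, since no projection is possible then.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0 or n1 <= n0:
        return None
    rate = (n1 - n0) / (t1 - t0)  # IDs per hour
    return (limit - n1) / rate


# Hypothetical figures: 6 IDs at startup, 26 a day later, limit raised to 256.
samples = [(0, 6), (24, 26)]
print(should_schedule_restart(26, 256))
print(hours_until_limit(samples, 256))
```

Because the count grows in jumps rather than steadily, the threshold check is the safer trigger; the projection is only a rough guide for picking a restart window.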
 