Parsing Text NER BSFN

peterbruce

JDEList,

A number of JDEList members have appreciated the ER I posted (post 86032; download the attachment to post 86032) to parse text without breaking a word. I have had to use this ER, or variations of it, repeatedly in a number of projects. Anticipating a continuing need for this functionality, I wrote a NER BSFN to accomplish the task so that I would not have to keep recoding it. I also had to make it a little more flexible than the original ER.

In the attached zip file there is a Boomerang package called ParseTextToLines.BG2 (thanks Alex Pastuhov). If a BG1 (non-Unicode) Boomerang package is required, please ask. The Boomerang package can be installed on your development client using Boomerang Lite, which is free to download from Everest Software International.

The Boomerang package includes:
Data Structure: D55PRSTXT
NER: N55PRSTX
Data Dictionary Messages:
PRSTXT0001 - No Text to Parse Found
PRSTXT0002 - Line Length of Parsed Text Too Large
PRSTXT0003 - Parsed Text Line No Found
PRSTXT0004 - Parsing Text Starting Position Invalid

The NER has been tested on my configuration only (see below) and works on both Windows and Unix. Before using it in production, test it thoroughly on your own installation.

I welcome any comments and/or questions, especially improvements I can incorporate (and use myself).

My configuration is:
Oracle JD Edwards EnterpriseOne,
E8.11sp1 8.97.2.1, ES Sun Solaris 10, Oracle DB 9207
Compiler is Sun Studio 12

Here is a description of the NER BSFN

NER Name: N55PRSTX
Function Name: ParseTextToLine
Function Description: Parse text To Line

Purpose:

N55PRSTX is designed to parse a large text object, which may contain multiple lines (separated by a new line or end of line character, or a combination of such characters), and break it up into smaller strings of text. It will always end an output line at a new line (or end of line character or character combination) and exclude the new line from the output. If the output length is reached before a new line is reached, and the output line contains a space, "-" or "_", it will break at that point instead of breaking in the middle of a word. If the line break occurs at a "-" or "_", that character will be included in the output line. If the line break occurs at a space, the space will be omitted from the output line.
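To illustrate the rules above outside of JDE, here is a minimal Python sketch. It is not the NER itself: the names next_line and wrap are hypothetical, it assumes the break happens at the last space, "-" or "_" that fits in the line, and it assumes a space or new line sitting immediately after a full line is consumed without forcing an earlier break.

```python
def next_line(text, start=0, max_len=80):
    """Return (line, last_index) for the output line starting at text[start].

    last_index is the 0-based position of the last character consumed,
    including any new line or space that was dropped from the line.
    """
    window_end = min(start + max_len, len(text))

    # Rule 1: a new line inside the window (or right after it) ends the line
    # and is excluded from the output.
    for i in range(start, min(window_end + 1, len(text))):
        if text[i] in "\r\n":
            last = i
            # Treat a "\r\n" pair as a single new line (assumption).
            if text[i] == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                last = i + 1
            return text[start:i], last

    # The remaining text fits on one line.
    if len(text) - start <= max_len:
        return text[start:], len(text) - 1

    # A space right after a full window: keep the window, drop the space.
    if text[window_end] == " ":
        return text[start:window_end], window_end

    # Rule 2: search backwards for the last break character in the window.
    for i in range(window_end - 1, start, -1):
        if text[i] == " ":
            return text[start:i], i          # space is omitted
        if text[i] in "-_":
            return text[start:i + 1], i      # "-" or "_" is kept

    # No break character at all: break in the middle of the word.
    return text[start:window_end], window_end - 1


def wrap(text, max_len=80):
    """Collect every output line by calling next_line repeatedly."""
    lines, start = [], 0
    while start < len(text):
        line, last = next_line(text, start, max_len)
        lines.append(line)
        start = last + 1
    return lines
```

For example, under these assumptions wrap("well-known fact", 7) gives "well-", "known" and "fact": the hyphen stays on the first line and the space is dropped.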

Inputs:

BF szFullText_ISTDATA2 - Required
The text which is to be broken up into lines.

BF mnLineLength_IDROW
The maximum number of characters in the line to return. Maximum is 500. Default 80.

BF mnLineNumToReturn_IDROW
The number of the line to return from the lines produced. Default is the first line (1).

If BF cSuccessiveLines_EV01 is set to '1' then this will be ignored and the default of the first line will be returned. This will allow successive calls to this function to return successive lines in the most efficient way.

BF mnStartCharacterNum_IDROW
The character position in BF szFullText_ISTDATA2 to start breaking up text. 0 is the first character position. Default 0.

BF cSuccessiveLines_EV01
A flag indicating that successive calls to this function will return successive lines from BF szFullText_ISTDATA2. If this is set to '1' then BF mnStartCharacterNum_IDROW will have an output value equal to BF mnLastCharacterNum_IDROW plus one. If this is set to '1' then BF mnLineNumToReturn_IDROW will be ignored and the default of the first line will be returned. This will allow successive calls to this function to return successive lines in the most efficient way.
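The successive-lines calling pattern can be sketched in Python as follows. This is only an illustration of the interface described here: parse_text_to_line is a hypothetical stand-in for the NER that breaks at spaces only, the dictionary keys are the data structure names without their data item suffixes, and the loop's stop condition (start position past the end of the text) is an assumption about how a caller might detect the end.

```python
def parse_text_to_line(ds):
    """Hypothetical stand-in for N55PRSTX (breaks at spaces only)."""
    text = ds["szFullText"]
    start = int(ds.get("mnStartCharacterNum") or 0)   # default 0
    length = int(ds.get("mnLineLength") or 80)        # default 80
    end = min(start + length, len(text))

    # If the window ends mid-word, fall back to the last space in it.
    if end < len(text) and text[end] != " ":
        cut = text.rfind(" ", start + 1, end)
        if cut != -1:
            end = cut

    ds["szLineOut"] = text[start:end]
    # Last character read: the dropped space if there is one,
    # otherwise the last character of the line itself.
    if end < len(text) and text[end] == " ":
        last = end
    else:
        last = end - 1
    ds["mnLastCharacterNum"] = last

    # With the flag set, the start position is advanced for the next call.
    if ds.get("cSuccessiveLines") == "1":
        ds["mnStartCharacterNum"] = last + 1


def all_lines(full_text, line_length):
    """Call the stand-in repeatedly, reusing the same data structure."""
    ds = {
        "szFullText": full_text,
        "mnLineLength": line_length,
        "mnStartCharacterNum": 0,
        "cSuccessiveLines": "1",
    }
    lines = []
    while ds["mnStartCharacterNum"] < len(full_text):
        parse_text_to_line(ds)
        lines.append(ds["szLineOut"])
    return lines
```

The point of the pattern is that the caller never recomputes the start position: it passes the same data structure back in and each call picks up where the previous one stopped.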

Outputs:

BF mnStartCharacterNum_IDROW
The input value or default (0) unless:
If BF cSuccessiveLines_EV01 is set to '1' then this will have an output value equal to BF mnLastCharacterNum_IDROW plus one. This will allow successive calls to this function to return successive lines in the most efficient way.

BF szLineOut_AGEN - required
The line to output.

BF mnLastCharacterNum_IDROW
The position of the last character read in BF szFullText_ISTDATA2, counting from 0 as the first character position.

BF szErrorID_DTAI
The message ID to identify the error that has occurred. This is a Data Dictionary Workflow item (Glossary Group Y). The value returned is the Data Item value of the message.
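As a rough guide to how the message IDs relate to the inputs, here is a small Python sketch of the input checks. The conditions are inferred from the message descriptions and the documented limits (maximum 500, default 80, positions counted from 0); the function name validate_inputs and the exact conditions are assumptions, not the NER's actual logic. PRSTXT0003 is left out because it would depend on the line number requested versus the lines actually produced.

```python
MAX_LINE_LENGTH = 500      # documented maximum for mnLineLength
DEFAULT_LINE_LENGTH = 80   # documented default for mnLineLength


def validate_inputs(full_text, line_length=0, start_char=0):
    """Return (error_id, effective_line_length).

    error_id is "" when the inputs look valid. The conditions below are
    assumptions based on the message descriptions only.
    """
    # PRSTXT0001 - No Text to Parse Found
    if not full_text:
        return "PRSTXT0001", 0

    # A missing or zero line length falls back to the default.
    length = line_length if line_length > 0 else DEFAULT_LINE_LENGTH

    # PRSTXT0002 - Line Length of Parsed Text Too Large
    if length > MAX_LINE_LENGTH:
        return "PRSTXT0002", 0

    # PRSTXT0004 - Parsing Text Starting Position Invalid
    if start_char < 0 or start_char >= len(full_text):
        return "PRSTXT0004", 0

    return "", length
```

A caller would test the returned ID for blank before using the output line, in the same way that BF szErrorID_DTAI would be tested after calling the NER.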
 

Attachments

  • 155542-ParseTextToLines.zip
    86.5 KB
Hi Peter,

Great work! Thanks for sharing this program with us. Could you please attach the BSFN (ParseTextToLines.BG2) in the non-Unicode BG1 format? We are on 8.0 and would love to use this functionality.

Thanks

Libu.
 
Libu,

Sorry for the delay. One of the standard JDE Data Dictionary items I used is relatively new. The Data Dictionary item AGEN, which is not in Xe, was copied from a later version and added when building the BG1 Boomerang package (thanks Alex Pastuhov). The attached zip file contains the BG1 Boomerang package.

The only standard JDE business function used in the NER (details below) appears to have been there since 1997.

Function Description: Convert CR And LF to Hexadecimal
Function Name: ConvertCRLFtoHex
Source Module: B7400150
Data Structure: D7400150

The BG1 Boomerang package includes:
Data Structure: D55PRSTXT
NER: N55PRSTX
Data Dictionary Item: AGEN
Data Dictionary Messages:
PRSTXT0001 - No Text to Parse Found
PRSTXT0002 - Line Length of Parsed Text Too Large
PRSTXT0003 - Parsed Text Line No Found
PRSTXT0004 - Parsing Text Starting Position Invalid
 

Attachments

  • 156064-ParseTextToLinesBG1.zip
    105.4 KB