E9.2 How To Read A Web Page for Processing

Euroboy · Jan 11, 2023

Hi All

I have a requirement where the data I need to process is provided on a web page controlled by a third party (one of our suppliers).
I can build the URL that lands me at the right place on the page, which displays all the data I need (the contents of the page are basically XML).
The problem is: how can I "read" the contents of the page?

I guess the starting point would be to use Orchestrator to land on the web page... but what then?

Thanks for any help that may come my way.

We are on E1 9.2.6.3.

BOster · Jan 11, 2023

If you want to scrape data off a web page, your best bet is to do that piece outside of the JDE toolset, in another language or toolset such as Java or Python, and expose the functionality as a web service that Orchestrator (or a BSSV/BSFN, etc.) can consume. The obvious downside is that you need the infrastructure to deploy it.

If all you have are the JDE servers, I am sure there are Orchestrator gurus here who could detail how to do this with Orchestrator/Groovy. Within the JDE toolset I would probably try BSSVs and Java, or even a C BSFN. To me, though, it would be easier to do it outside the confines of the JDE toolset; Python, for example, has libraries that help you scrape data off a web page.

You may also want to make sure that scraping data off the page doesn't violate your supplier's terms of service (TOS).
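A minimal sketch of the "outside the JDE toolset" approach, assuming Python 3.9+ and only the standard library: fetch the supplier page, turn each record element into a dict, and serve the result as JSON for an Orchestrator REST connector to call. The URL, the `order` record tag and port 8080 are hypothetical placeholders, and the page is assumed to return plain XML.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical supplier URL and record element name; replace with the real ones.
SUPPLIER_URL = "https://supplier.example.com/orders.xml"
RECORD_TAG = "order"


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download the page body as text (falls back to UTF-8 if no charset is sent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def xml_to_records(xml_text: str, record_tag: str = RECORD_TAG) -> list[dict]:
    """Turn every <record_tag> element into a flat dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter(record_tag)
    ]


class ScrapeHandler(BaseHTTPRequestHandler):
    """Answers any GET with the supplier data reshaped as JSON."""

    def do_GET(self):
        try:
            payload = {"records": xml_to_records(fetch_page(SUPPLIER_URL))}
            status = 200
        except Exception as exc:  # report fetch/parse failures to the caller as JSON
            payload, status = {"error": str(exc)}, 502
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8080) -> None:
    """Start the service; port 8080 is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), ScrapeHandler).serve_forever()
```

Started with `run()`, the service would be called from an Orchestrator REST connector with a plain GET to `http://<host>:8080/`, and the connector gets JSON back instead of raw XML.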

Euroboy · Jan 11, 2023

Thanks Brian.

We need to do this using the standard JDE toolset, and my preferred route is Orchestrator rather than BSSV.

If there are any Orchestrator gurus out there, would you show the rest of us how to do it?

Thanks again.

DaveWagoner · Jan 11, 2023

You can certainly set up a connection/connector out to an HTTP website and get the entire HTML/CSS back. Parsing the returned body then becomes a really "fun" exercise.
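As a rough illustration of that parsing step, here is a Python standard-library sketch for one common layout: the XML is shown HTML-escaped inside a `<pre>` element. That layout is an assumption about the supplier's page; if the XML sits in the body unescaped, the HTML parser would treat its elements as tags and this approach would not apply.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collects the text inside every <pre> element of an HTML page.

    With convert_charrefs=True (the default) entities such as &lt; are
    already decoded when handle_data() is called, so escaped XML comes
    back as real markup.
    """

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a <pre>
        self._chunks = []    # text pieces of the current <pre>
        self.blocks = []     # one string per <pre> element

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._chunks))
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_pre_blocks(html_body: str) -> list[str]:
    """Return the decoded text of each <pre> element in the page."""
    parser = PreTextExtractor()
    parser.feed(html_body)
    parser.close()
    return parser.blocks
```

Each returned string could then be handed to an XML parser; the HTML around it is simply ignored.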

nomir@ · May 25, 2024

Hi Dave! What configuration did you use in the orchestration so that the output comes back under "response:"?

Kevin Long · May 26, 2024

The REST Connector has a section called Manipulating Output. When the call does not return JSON, Orchestrator wraps whatever was received in a JSON object, as a string named response. If your source website is primarily XML, you should be able to separate the XML from the HTML and then parse it using REXML. There is some commented-out boilerplate code that shows a simple example, and if you search for Ruby and REXML you should find more examples of how to parse your XML.

Following on from @DaveWagoner's response: you can return the response to your orchestration and then pass it into a custom request to extract what you need. Personally, while I may write my parsing code in a custom request so I can test as I go, I then incorporate it into the REST connector's Manipulating Output section, so that what I return from the API is ready to use in my orchestration.
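A minimal Python sketch of the "separate the XML from the HTML, then parse it" step, using `xml.etree.ElementTree` in place of the Ruby REXML library mentioned above. It assumes the connector output is the `{"response": "..."}` wrapper and that you know the name of the XML root element; the root name `orders` used in the test is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def xml_from_wrapped_response(wrapped: str, root_tag: str) -> ET.Element:
    """Pull the XML document out of a {"response": "..."} wrapper.

    Everything before the first <root_tag ...> and after the last
    </root_tag> (i.e. the surrounding HTML) is discarded. This is a plain
    string search, so a longer tag such as <ordersSummary> would also match
    "<orders"; tighten it if your page has such tags.
    """
    text = json.loads(wrapped)["response"]
    start = text.find(f"<{root_tag}")
    end = text.rfind(f"</{root_tag}>")
    if start < 0 or end < 0:
        raise ValueError(f"no <{root_tag}> element found in response")
    # len("</") + len(root_tag) + len(">") == len(root_tag) + 3
    return ET.fromstring(text[start:end + len(root_tag) + 3])
```

Once parsed, `root.findtext("order/id")` returns the text of the first `order/id` element, which is the kind of value you would map to an orchestration input.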
