EnterpriseOne on iSCSI...

altquark

Legendary Poster
I know we've talked about this in the past - and it's somewhat of a holy war, I'm sure - but I couldn't find any reference to the last discussion (I think it was a hijacked thread, to be honest), and I thought the forum was quiet enough recently that it was time to start another crusade...

So...has anyone used iSCSI hardware for their database server array - specifically for Microsoft SQL Server? Any experience in a database setting would be good information.

I've scoured the net, and once I eliminated the obvious "it works great under Exchange" crud - I managed to find a couple of whitepapers (one of which was from Oracle) that indicate pretty good performance of iSCSI arrays in comparison with traditional arrays.

iSCSI, by the way, is IP-attached storage - but the latest iSCSI array systems seem to be architected a lot differently from the old cruddy "NAS" stuff, and the whitepapers I've read seem to indicate performance comparable with fiber arrays.

So - has anyone tested or even implemented this stuff with an ERP system? I'd be very interested to hear...
 
Oh man did you open up a hurt locker here.

Yep, I recently did a major tech upgrade from a 1-CPU (non-HT) Enterprise Server (4 GB) and a 2-CPU (HT) SQL 2000 Database Server (4 GB) to a combined 4-way dual-core machine with 16 GB RAM running SQL 2005.

The SAN was an iSCSI EMC Celerra. I'm an FC SAN guy and I didn't have control over the SAN config because it "is what it is" - and what it happened to be was a HUGE RAID 5 array. Not only was the database on there, but also a bunch of file servers, and for a short experimental period the Exchange Server was on there as well.

So I guess you know where I'm going with this......performance was actually faster with local disk! I finally got the client to let me move the transaction logs and tempdb onto local disk. That helped bring performance back on par with local disk.
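If anyone needs to do the same, relocating tempdb is just a couple of ALTER DATABASE statements - the D:\ paths below are only hypothetical local-disk locations, and the change takes effect after the next SQL Server restart (moving a user database's log files additionally means taking that database offline or detaching it first). A minimal sketch:

[ CODE ]
-- Point tempdb's data and log files at local disk (hypothetical D:\ paths).
-- tempdev/templog are the default logical file names; the move takes effect on restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLLocal\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLLocal\templog.ldf');
[/ CODE ]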

I've got tons of documentation on how to set up an iSCSI SAN which is just sitting around, so you're welcome to it (gimme FTP credentials). All of the manufacturers have tons of info on how to make this work - you just need to sit down and read the info, which didn't happen here (again, I had no control).

Here are a few good docs you'll want to check:


http://www.emc.com/techlib/pdf/H2372_microsoft_sql_svr_2005_ns_ser_gde_ldv.pdf

http://www.emc.com/techlib/abstract.jsp?id=1764&c=US&l=en

http://www.emc.com/techlib/abstract.jsp?id=1661&c=US&l=en

http://www.emc.com/solutions/microsoft/sql_server/to_the_point/
 
That's what I thought would happen - that performance would be faster with "local" disk compared to the iSCSI array. Thanks for the documents, I'll look into them....
 
All the evidence that I have says it works fine if done correctly. In my case it wasn't.

Check out the attachment......it summarizes the issues.
 

Attachments

  • 128599-IP SAN or Fibre Channel SAN.pdf
    93.4 KB
Everything I've read suggests that iSCSI transfers should be on a par with FC SAN transfers. It helps if you've got some dedicated network cards with TOE off-loading so that the card handles the network/storage traffic and not the CPU.

I hope you're not put off from the technology by a sub-optimally configured RAID array ...
 
Right - but everything I've read seems to indicate that BANDWIDTH is fast. Nothing and nobody has compared LATENCY - and, given the way that SQL communicates with disks, I'm concerned that latency is the biggest issue. Can anyone with an iSCSI array provide any information on how the latency stacks up against fiber?
 
I started to look around, because I hadn't seen that anywhere. I still haven't seen anything that really talks about the latency differences, other than to say that, latency-wise, iSCSI > FC > local SCSI.

This is an old article (http://searchstorage.techtarget.com/columnItem/0,294698,sid5_gci1161824,00.html), but it suggests that the effects of the latency differences can be minimized by ensuring that your iSCSI target has good caching and a TOE card to minimize the CPU workload. And, of course, ensuring that you have enough disk arms to ensure good performance. All standard stuff.

Did anybody ever get back to you with numbers comparing the two? The cost isn't trivial, so I'd imagine that only the larger firms would have the resources to be able to play with iSCSI vs. FC.
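I haven't found hard numbers either, but anyone running SQL Server 2005 on both setups could pull average latency straight from the engine - it tracks per-file I/O stall times. A minimal sketch (sys.dm_io_virtual_file_stats is only there from SQL 2005 onwards):

[ CODE ]
-- Average read/write latency per database file since the last SQL Server restart.
SELECT DB_NAME(vfs.database_id)                             AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
[/ CODE ]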
 
As someone who has JUST gone through this on an EMC CLARiiON....don't do it.

Here are a few key points to consider (NOTE: FOR EMC CLARiiON ARRAYS):
1. MS MPIO allows for round-robin load balancing across the interfaces. When the database is talking to the SAN, it is actually talking to the datafiles. If you have 2+ paths (FC or iSCSI) with EMC PowerPath 5+ and FLARE 26, you will get the bandwidth of each port to the database. Granted, data must be spread across the datafiles equally.

2. You need four 1gb iSCSI ports to match each 4gb fibre port, and fibre does not have the overhead iSCSI has. In my situation, I have an 8gb bus to the data. I now see disk queuing, which shows I am disk-head bound. On iSCSI, I was bus bound using two 1gb ports into the array.

3. Salesmen will sell you anything to make the sale. EMC and EqualLogic both pulled a log of 24 hours of usage of the system. By looking at the 95th percentile, peaks, etc., they said that 2gb iSCSI should be fine - actually, that 1gb iSCSI should be fine. What they did not factor in was that users/batch jobs use the system 16 hours out of every 24, which skews the numbers. While we did not average more than 2gb/s of throughput over the entire 24 hours, we maxed out our Ultra320 SCSI (which is 320MB/s, or roughly 2.5gb/s) during the day - see the quick arithmetic after this list. Cutting mid-day performance to users by 50% would cause me to lose my job after I asked to spend a lot of money on the SAN!

4. No matter which way you go, be sure to set your partition offsets based on the array striping.

5. Host your LUNs on different storage processors to distribute the processor/port load for performance.
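To put rough numbers on points 2 and 3, here is the back-of-the-envelope arithmetic as a quick T-SQL scratchpad (decimal units, protocol overhead ignored):

[ CODE ]
-- Rough bandwidth comparison behind points 2 and 3 above.
SELECT 320 * 8 / 1000.0 AS ultra320_gb_per_s,   -- 320MB/s DAS ~= 2.56gb/s
       2 * 1.0          AS dual_1gb_iscsi_gb_s, -- the two 1gb iSCSI ports we had
       2 * 4.0          AS dual_4gb_fc_gb_s;    -- the two 4gb fibre HBAs
[/ CODE ]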

Here are some real world performance numbers to chew on.

All tests were run with 15GB of data loaded into the database across 2 datafiles. A single "select sum(field1) from table1" was run for the test, and the DB cache was cleared after each run.
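If you want to reproduce this kind of cold-read test on your own hardware, here is a minimal sketch (the table and column names are just the placeholders from the statement above, and DBCC DROPCLEANBUFFERS needs sysadmin rights):

[ CODE ]
-- Flush dirty pages, drop the clean buffer cache, then time a cold full scan.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
SELECT SUM(field1) FROM table1;   -- placeholder table/column from the test above
SET STATISTICS TIME OFF;
[/ CODE ]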

Old Production Server:
SQL 2000 Standard (2GB of RAM to the DB), 2.5Ghz + HT (4 procs), 12 72GB 15k RPM Ultra320 disks (12 disks in RAID 5), 384MB controller cache (50%/50% read/write)
1. Statement run once: 1m 10s
(Benchmark)

Test Server:
SQL 2000 Standard (2GB of RAM to the DB), dual 3.6Ghz + HT (4 procs), 20 15k fibre disks in a CX3-20 in 2 LUNs (10 disks in 2 RAID 5s). 8 iSCSI ports on the array and 2 iSCSI ports on the test server, 9000 MTU
1. Statement run once: 3m 14s
2. Statement run 10 times by 3 users (at the same time): not tested - the single-statement run was already too slow to bother.

Test Server:
SQL 2000 Standard (2GB of RAM to the DB), dual 3.6Ghz + HT (4 procs), 20 15k fibre disks in a CX3-20 in 2 LUNs (10 disks in 2 RAID 5s). 4 4Gb fibre ports on the array and 2 4Gb ports on the test server
1. Statement run once: 30s
2. Statement run 10 times by 3 users (at same time): 4m39s, 4m23s, 4m24s

Test Server:
SQL 2005 Enterprise (30GB of RAM to the DB), dual quad-core 3.6Ghz (8 procs), 20 15k fibre disks in a CX3-20 in 2 LUNs (10 disks in 2 RAID 5s). 4 4Gb fibre ports on the array and 2 4Gb ports on the test server
1. Statement run once: 30s
2. Statement run 10 times by 3 users (at same time): 37s, 42s, 41s

Obviously the RAM is a factor in that last configuration's tuning, but the raw disk-access numbers show the problem.

It really depends on how your system runs and what is expected. For us, the minimum goal was to at LEAST match the production system on DAS. For my JDE system, iSCSI will not perform. I spent 2 months trying to make it perform. You just don't have the throughput *yet* for the business needs.
 
With my current setup, I actually pull about 5-6gb to the database (cap of 8gb of course). The number of drive heads hosting the datafiles is now my limit. Which is exactly where I want it to be.
 
[ QUOTE ]
fibre does not have the overhead iSCSI has.

[/ QUOTE ]
This was one of my answers to the customer - iSCSI carries real protocol overhead compared to FC: every block of data gets wrapped in iSCSI, TCP and IP headers and pushed through the full network stack at both ends. I'm glad that someone brought this up - I believe that iSCSI has issues with latency BECAUSE of the extra layers of the OSI model being exercised: all that encapsulation has to be built and unpacked for every packet, and that takes time!

A lot of architecture is common sense - and while iSCSI is a nice, easy method to add storage to your enterprise relatively inexpensively, the cost saving isn't dramatic, and the performance issues are more than a challenge for the customers who have opted for it. Most IT organizations assume that because it works for Exchange servers, it's got to work for database servers. Pity that few of those IT organizations understand database technology!
 
Changing the MTU to 9000 helps a lot with performance, but you are still lacking the bus speed to run a database with any "real" amount of data. Small email traffic is nothing compared to someone running a large GL report where gigs of data are coming from the SAN to the DB server.

1gb iSCSI has its place. Just not in large OLTP databases (IMHO). A large data warehouse where it does not matter whether a UBE takes 1 hour vs 2-3 hours - go for it. Mission-critical apps, it ain't going to happen. You would put it in production only to promptly be told to take it back out, as it will not perform.

Let's look at some costs...

2 single port fibre HBAs ~ $1200 each ($2400)
8 port cisco fibre switch ~ $4000 (you *can* connect directly to the SAN without a switch)

So for roughly $6,400, you get a bare-bones 8gb connection to the SAN. Add a second switch for failover, more servers, etc., and the price goes up. It's only $2,400 if you just want the DB server on fibre for performance (connected directly to the SAN) and throw the rest of the servers on iSCSI.

An iSCSI HBA (or you can use a regular gb NIC) costs around $750. A Cisco 24-port gb switch is around $5,000, I think (you are using dedicated hardware for the iSCSI network, aren't you....).

When you work the numbers, the cost is about the same, but with fibre it works correctly. The only advantage iSCSI gets you is if you already have the hardware. For my ERP, I would not just VLAN an existing switch for the SAN - dedicated hardware for the job is the best route, IMHO.

The cost has really come down in the fibre market. Set the system up right for the data you will be moving. The next thing the bosses will tell you is to install everything on 5 750GB SATA drives.

A total solution does not just mean working on one piece. You need drive performance, number of drive heads, memory for the server, bus speed, and good indexes. Mess up any one of the five and you might as well not change anything; they all depend on each other.

And if any SAN vendor tells you to put everything on the same drives, or that "we have a lot of cache for fetches so iSCSI is fine", run.

Let's talk cache real quick.

According to EMC, read cache should be NO MORE than 20% of the total cache. If you have it set higher than that, you are wasting cache. The slow part of the process is the write - it needs all the cache it can get. I have tested this out as well and found it to be true. Did you know that M$ has even stated that you should set your array controller for SCSI to use 100% write cache and no read cache? MS Document

Lots of ways to tune a system. Know your data, know your server, know your DB software, know your bus, know your drives, know your SAN, and know how to CONFIGURE your SAN. If you do it right, you will get terrific performance and a pat on the back. If not, you may be looking for a new job.
 
Excellent article - and this completely supports what I was telling the customer, but from a different angle. I hope others read this and realize that disk architecture IS important and that "copping out" to some hardware vendor with iSCSI is just a bad, bad methodology - and realistically does NOT save them money.

Hardware vendors have introduced some really bad concepts in the past few years, including iSCSI and blade systems. I'm certain there are others!
 
And for those who are thinking "Gee, I wonder how this test would run on 5 750GB SATA disks in Raid 5 using fibre channel or iSCSI?", I did that test too.

2m22s for FC
7m55s for iSCSI

And SATA drives can only run at up to 3gb, not the full 4gb of FC.

On all my tests, I never bothered to test write performance on iSCSI. If I can't read sequential data as fast as or faster than DAS, there is no need to continue testing.

Just because you have a lot of disk SPACE, does not mean you have a lot of disk PERFORMANCE.
 
Typo on my benchmark number - it was 1m 10s on Ultra320, not 1m 59s. I've corrected it in the post above.
 
This is great info! Thanks for sharing your experiences, and thanks to Jon for starting the thread!
 
No problem. I went through a lot of pain to get the system running as well as it could for the drives I had access to. A lot of long days and nights working on it. Now if I can get my replication manager licensed correctly, I can get users in to test SQL 2005 64bit!

I have attached a doc showing the drive layouts we are using. Hope it helps someone. There just isn't a lot of good information out there to form an informed opinion until you try it all for yourself. Your mileage may vary, but make sure you understand all the players in the design before listening to a salesman.
 

Attachments

  • 129066-EMCDriveLayout.doc
    32 KB