From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adam Goryachev
Subject: Re: RAID performance
Date: Fri, 08 Feb 2013 18:11:55 +1100
Message-ID: <5114A53B.9060103@websitemanagers.com.au>
References: <51134E43.7090508@websitemanagers.com.au>
 <51137FB8.6060003@websitemanagers.com.au>
 <511471EA.2000605@hardwarefreak.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <511471EA.2000605@hardwarefreak.com>
Sender: linux-raid-owner@vger.kernel.org
To: stan@hardwarefreak.com
Cc: Dave Cundiff, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 08/02/13 14:32, Stan Hoeppner wrote:
> On 2/7/2013 5:07 AM, Dave Cundiff wrote:
>
>> Its not going to help your remote access any. From your configuration
>> it looks like you are limited to 4 gigabits. At least as long as your
>> NICs are not in the slot shared with the disks. If they are you might
>> get some contention.
>>
>> http://download.intel.com/support/motherboards/server/sb/g13326004_s1200bt_tps_r2_0.pdf
>>
>> See page 17 for a block diagram of your motherboard. You have a 4x DMI
>> connection that PCI slot 3, your disks, and every other onboard device
>> share. That should be about 1.2GB(10Gigabits) of bandwidth.
>
> This is not an issue. The C204 to LGA1155 connection is 4 lane DMI 2.0,
> not 1.0, so that's 40Gb/s and 5GB/s duplex, 2.5GB/s each way, which is
> more than sufficient for his devices.

Thanks, good to know, though I will still check that the network cards
are in the right slots (i.e., in slots 4 and 5).

>> Your SSDs alone could saturate that if you performed a local operation.
>
> See above. However, using an LSI 9211-8i, or better yet a 9207-8i, in
> SLOT6 would be more optimal:
>
> 1. These board's ASICs are capable of 320K and 700K IOPS respectively.
> As good as it may be, the Intel C204 Southbridge SATA IO processor is
> simply not in this league. Whether it is a bottleneck in this case is
> unknown at this time, but it's a possibility, as the C204 wasn't
> designed with SSDs in mind.
>
> 2. SLOT6 is PCIe x8 with 8GB/s bandwidth, 4GB/s each way, which can
> handle the full bandwidth of 8 of these Intel 480GB SSDs.

OK, so potentially I may need to get a new controller card. Is there a
test I can run which will determine the capability of the chipset? I
can shut down all the VMs tonight and run the required tests...

>> Get your NIC's going at 4Gig and all of it a sudden you'll really
>> want that SATA card in slot 4 or 5.
>
> Which brings me to the issue of the W2K DC that seems to be at the root
> of the performance problems. Adam mentioned one scenario, where a user
> was copying a 50GB file from "one drive to another" through the Windows
> DC. That's a big load on any network, and would tie up both bonded GbE
> links for quite a while. All of these Windows machines are VM guests
> whose local disks are apparently iSCSI targets on the server holding
> the SSD md/RAID5 array. This suggests a few possible causes:
>
> 1. Ethernet interface saturation on Xen host under this W2K file server
>
> 2. Ethernet bonding isn't configured properly and all iSCSI traffic
> for this W2K DC is over a single GbE link, limiting throughput to
> less than 100MB/s.
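To rule out a bonding misconfiguration on the Linux side, I can also
dump the bond state directly on the storage server. Something along
these lines should show the bonding mode, the transmit hash policy, and
whether all four slaves are up (I'm assuming the bond device is named
bond0 -- I'll confirm the actual name before running it):

    cat /proc/net/bonding/bond0

    # or just the fields of interest:
    grep -E 'Bonding Mode|Transmit Hash|Slave Interface|MII Status' \
        /proc/net/bonding/bond0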
From the switch stats, ports 5 to 8 are the bonded ports on the storage
server (iSCSI traffic):

Int  PacketsRX   ErrorsRX  BroadcastRX  PacketsTX   ErrorsTX  BroadcastTX
5    734007958   0         110          120729310   0         0
6    733085348   0         114          54059704    0         0
7    734264296   0         113          45917956    0         0
8    732964685   0         102          95655835    0         0

So, traffic seems reasonably well balanced across all four links,
though these stats were reset 16.5 days ago, so I'm not sure if they
have wrapped. The PacketsTX numbers look a little odd, with ports 5 and
8 seeing roughly double the counts of 6 and 7, but all four links are
certainly in use.

> 3. All traffic, user and iSCSI, traversing a single link.

This is true for the VMs, but "all traffic" is mostly iSCSI; user
traffic is just RDP, which is minimal.

> 4. A deficiency in the iSCSI configuration yielding significantly less
> than 100MB/s throughput.

Possible, but in the past, some admittedly crude performance testing
with a single physical machine (all VMs stopped) produced read
performance of 100 to 110MB/s (using dd with the direct option). I also
tested two machines in parallel and did see some reduction, but I think
both still got around 90MB/s... I can do more of this testing tonight.
(That testing did turn up one issue where some machines were only
getting 70MB/s, but this was due to being connected to a second gigabit
switch over a single gigabit uplink. Now all physical machines and all
4 of the iSCSI ports are on the same switch.)
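For reference, the parallel test I have in mind is just something like
the following, run on two physical hosts at the same time against their
iSCSI-backed block devices (/dev/sdX below is a placeholder -- the
actual device name differs per host):

    # sequential read of 5GB, bypassing the page cache
    dd if=/dev/sdX of=/dev/null bs=1M count=5120 iflag=direct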
> 5. A deficiency in IO traffic between the W2K guest and the Xen host.

I can try to do some basic testing here... The Xen hosts have a single
Intel SSD drive, but I think disk space is very limited. I might be
able to copy a small WinXP guest onto a local disk to test performance.

> 6. And number of kernel tuning issues on the W2K DC guest causing
> network and/or iSCSI IO issues, memory allocation problems, pagefile
> problems, etc.

Most of the terminal servers and application servers were using the
pagefile during peak load (win2003 is limited to 4G RAM), so I have
allocated 4GB of RAM as a block device and passed it through to each
Windows guest, which then puts a 4G pagefile on it. This removed the
pagefile load from the network and storage system, but had minimal
noticeable impact.

> 7. A problem with the 16 port GbE switch, bonding or other. It would
> be very worthwhile to gather metrics from the switch for the ports
> connected to the Xen host with the W2K DC, and the storage server.
> This could prove to be enlightening.

The win2k DC is on physical machine 1, which is on port 9 of the
switch; I've repeated the stats above with port 9 added:

Int  PacketsRX   ErrorsRX  BroadcastRX  PacketsTX   ErrorsTX  BroadcastTX
5    734007958   0         110          120729310   0         0
6    733085348   0         114          54059704    0         0
7    734264296   0         113          45917956    0         0
8    732964685   0         102          95655835    0         0
9    1808508983  0         72998        1942345594  0         0

I can also see very detailed stats per port. I'll show the port 9
detailed stats here and comment below; if you think any other port
would be useful, please let me know.

Interface                                 g9
MST ID                                    CST
ifIndex                                   9
Port Type
Port Channel ID                           Disable
Port Role                                 Disabled
STP Mode
STP State                                 Manual forwarding
Admin Mode                                Enable
LACP Mode                                 Enable
Physical Mode                             Auto
Physical Status                           1000 Mbps Full Duplex
Link Status                               Link Up
Link Trap                                 Enable
Packets RX and TX 64 Octets               49459211
Packets RX and TX 65-127 Octets           1618637216
Packets RX and TX 128-255 Octets          226809713
Packets RX and TX 256-511 Octets          26365450
Packets RX and TX 512-1023 Octets         246692277
Packets RX and TX 1024-1518 Octets        1587427388
Packets RX and TX > 1522 Octets           0
Octets Received                           625082658823
Packets Received 64 Octets                15738586
Packets Received 65-127 Octets            1232246454
Packets Received 128-255 Octets           104644153
Packets Received 256-511 Octets           9450877
Packets Received 512-1023 Octets          208875645
Packets Received 1024-1518 Octets         239934983
Packets Received > 1522 Octets            0
Total Packets Received Without Errors     1810890698
Unicast Packets Received                  1810793833
Multicast Packets Received                23697
Broadcast Packets Received                73168
Total Packets Received with MAC Errors    0
Jabbers Received                          0
Fragments Received                        0
Undersize Received                        0
Alignment Errors                          0
Rx FCS Errors                             0
Overruns                                  0
Total Received Packets Not Forwarded      0
Local Traffic Frames                      0
802.3x Pause Frames Received              0
Unacceptable Frame Type                   0
Multicast Tree Viable Discards            0
Reserved Address Discards                 0
Broadcast Storm Recovery                  0
CFI Discards                              0
Upstream Threshold                        0
Total Packets Transmitted (Octets)        2070575251257
Packets Transmitted 64 Octets             33720625
Packets Transmitted 65-127 Octets         386390762
Packets Transmitted 128-255 Octets        122165560
Packets Transmitted 256-511 Octets        16914573
Packets Transmitted 512-1023 Octets       37816632
Packets Transmitted 1024-1518 Octets      1347492405
Packets Transmitted > 1522 Octets         0
Maximum Frame Size                        1518
Total Packets Transmitted Successfully    1944500557
Unicast Packets Transmitted               1940616380
Multicast Packets Transmitted             2164121
Broadcast Packets Transmitted             1720056
Total Transmit Errors                     0
Tx FCS Errors                             0
Underrun Errors                           0
Total Transmit Packets Discarded          0
Single Collision Frames                   0
Multiple Collision Frames                 0
Excessive Collision Frames                0
Port Membership Discards                  0
STP BPDUs Received                        0
STP BPDUs Transmitted                     0
RSTP BPDUs Received                       0
RSTP BPDUs Transmitted                    0
MSTP BPDUs Received                       0
MSTP BPDUs Transmitted                    0
802.3x Pause Frames Transmitted           1230476
EAPOL Frames Received                     0
EAPOL Frames Transmitted                  0
Time Since Counters Last Cleared          16 day 16 hr 11 min 52 sec

To me, there are two interesting bits of information here:

1) We can see a breakdown of packet sizes, and this shows no jumbo
frames at all. I'm not really sure whether jumbo frames would help, or
how to go about configuring them, though I guess it only needs to be
done on the Linux storage server, the Linux physical machines, and the
switch.

2) The value for Pause Frames Transmitted; I'm not sure what this is,
but it doesn't sound like a good thing...
http://en.wikipedia.org/wiki/Ethernet_flow_control
seems to indicate that the switch is telling the physical machine to
slow down sending data, and if these were spread evenly over time, that
would be an average of one per second for the past 16 days... Looking
at port 5 (one of the ports connected to the storage server), this
value is much higher (approx 24 per second averaged over 16 days). I
can understand that the storage server can send faster than any
individual receiver, so I can see why the switch might tell it to slow
down, but I don't see why the switch would tell the physical machine to
slow down.
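If it would help with the diagnosis, I can also check (and if necessary
change) the flow control and MTU settings on the Linux hosts. As far as
I know, something like the following shows the current state per NIC
(eth0 is just a placeholder here, and not every driver exposes pause
counters):

    # current flow control (pause) settings, plus any pause counters
    # the NIC driver happens to expose
    ethtool -a eth0
    ethtool -S eth0 | grep -i pause

    # current MTU; jumbo frames would mean raising this to ~9000 on the
    # NICs, the bond, and the switch ports
    ip link show eth0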
So, to summarise, I think I need to look into the network performance
and find out what is going on there. However, before I get too complex,
I'd like to confirm that things are working properly on the local
machine at the RAID5 layer, and hopefully also the DRBD + LVM layers.
I got some fio tests to run last night; I'll do that after hours
tonight and then post the results. If that shows:

1) RAID5 performance is excellent, then I should be able to avoid
   purchasing an extra controller card and mark the SATA chipset OK
2) DRBD performance is excellent, then I can ignore config errors there
3) LVM performance is excellent, then I can ignore config errors there

That then leaves me with iSCSI issues, network issues, etc, but, like I
said, one thing at a time.

Are there any other or related tests you think I should be running on
the local machine to ensure things are working properly? Any other
suggestions or information I need to provide? Should I set up and start
graphing some of these values from the switch? I'm sure it supports
SNMP, so I could poll the values and dump them into some RRD files for
analysis.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au