* Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
@ 2013-04-19 22:58 Andrei Banu
[not found] ` <CAH3kUhEaZGON=fAyVMZOz5fH_DcfKv=hCa96UCeK4pN7k81c_Q@mail.gmail.com>
` (3 more replies)
0 siblings, 4 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-19 22:58 UTC (permalink / raw)
To: linux-raid
Hello!
I come to you with a difficult problem. We have an otherwise snappy
server fitted with an mdraid-1 array built from two Samsung 840 PRO
SSDs. If we copy a larger file to the server (from the same server or
over the network, it doesn't matter), the load average climbs from
roughly 0.7 to over 100 for files of several GB. Apparently the reason
is that the array can't sustain writes.
A few examples:
root [~]# dd if=testfile.tar.gz of=test20 oflag=sync bs=4M
130+1 records in
130+1 records out
547682517 bytes (548 MB) copied, 7.99664 s, 68.5 MB/s
And 10-20 seconds later I try the very same test:
root [~]# dd if=testfile.tar.gz of=test21 oflag=sync bs=4M
130+1 records in / 130+1 records out
547682517 bytes (548 MB) copied, 52.1958 s, 10.5 MB/s
A different test with 'bs=1G'
root [~]# w
12:08:34 up 1 day, 13:09, 1 user, load average: 0.37, 0.60, 0.72
root [~]# dd if=testfile.tar.gz of=test oflag=sync bs=1G
0+1 records in / 0+1 records out
547682517 bytes (548 MB) copied, 75.3476 s, 7.3 MB/s
root [~]# w
12:09:56 up 1 day, 13:11, 1 user, load average: 39.29, 12.67, 4.93
It took 75 seconds to copy a half-GB file, and the server load
increased a hundredfold.
And a final test:
root@ [~]# dd if=/dev/zero of=test24 bs=64k count=16k conv=fdatasync
16384+0 records in / 16384+0 records out
1073741824 bytes (1.1 GB) copied, 61.8796 s, 17.4 MB/s
This time the load spiked to only ~ 20.
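One detail worth keeping in mind when comparing the dd runs above: oflag=sync forces a flush after every output block, while conv=fdatasync buffers everything and issues a single flush at the end, so the two measure quite different things. A minimal sketch of the difference (sizes deliberately shrunk; the scratch file is illustrative, not one of the test files above):

```shell
# Compare dd flush strategies on a small scratch file. Only the flush
# pattern differs; both writes produce the same file.
scratch=$(mktemp)

# oflag=sync: flush after every 64k block (worst case for slow flushes)
dd if=/dev/zero of="$scratch" bs=64k count=16 oflag=sync 2>/dev/null

# conv=fdatasync: buffer all writes, then one flush at the end
dd if=/dev/zero of="$scratch" bs=64k count=16 conv=fdatasync 2>/dev/null

stat -c %s "$scratch"   # 1048576 bytes (16 * 64k) either way
rm -f "$scratch"
```

With a device that handles flushes poorly, the per-block variant can be dramatically slower even though the data volume is identical.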
A few other peculiarities:
root@ [~]# hdparm -t /dev/sda
Timing buffered disk reads: 654 MB in 3.01 seconds = 217.55 MB/sec
root@ [~]# hdparm -t /dev/sdb
Timing buffered disk reads: 272 MB in 3.01 seconds = 90.44 MB/sec
The read speed differs greatly between the two devices (sda measures
about 140% faster), but look what happens when I run it with --direct:
root@ [~]# hdparm --direct -t /dev/sda
Timing O_DIRECT disk reads: 788 MB in 3.00 seconds = 262.23 MB/sec
root@ [~]# hdparm --direct -t /dev/sdb
Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.53 MB/sec
So with --direct both devices sustain roughly 200MB/s or more, yet the
buffered numbers differ greatly: sda improved by about 20% while sdb
doubled. Maybe there's a problem with the page cache?
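One way to rule the page cache in or out is to repeat the buffered-read test from a cold cache. A sketch (requires root; the sysctl path is parameterized here only so the snippet can be exercised against a scratch file — on a real system it is /proc/sys/vm/drop_caches):

```shell
# Flush dirty pages, then drop the page cache, dentries and inodes,
# so the next buffered-read test starts cold.
drop_caches=${DROP_CACHES:-/proc/sys/vm/drop_caches}
sync                      # write out dirty pages first
echo 3 > "$drop_caches"   # 3 = drop page cache + dentries + inodes
# hdparm -t /dev/sda      # re-run the buffered read test here
```

If the buffered numbers converge with the --direct numbers after dropping caches, the discrepancy was cache state rather than the devices.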
BACKGROUND INFORMATION
Server type: general shared hosting server (3 weeks old)
O/S: CentOS 6.4 / 64 bit (2.6.32-358.2.1.el6.x86_64)
Hardware: SuperMicro 5017C-MTRF, E3-1270v2, 16GB RAM, 2 x Samsung 840
PRO 512GB
Partitioning: ~100GB left unpartitioned for over-provisioning; filesystems are ext4.
I believe it is aligned:
root [~]# fdisk -lu
Disk /dev/sda: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00026d59
Device Boot Start End Blocks Id System
/dev/sda1 2048 4196351 2097152 fd Linux raid
autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 * 4196352 4605951 204800 fd Linux raid
autodetect
Partition 2 does not end on cylinder boundary.
/dev/sda3 4605952 814106623 404750336 fd Linux raid
autodetect
Disk /dev/sdb: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003dede
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4196351 2097152 fd Linux raid
autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2 * 4196352 4605951 204800 fd Linux raid
autodetect
Partition 2 does not end on cylinder boundary.
/dev/sdb3 4605952 814106623 404750336 fd Linux raid
autodetect
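The alignment can be confirmed arithmetically from the start sectors above: with 512-byte sectors, a start sector that is a multiple of 2048 sits on a 1 MiB boundary, which is safe for any common SSD page or erase-block size. A quick check over the partitions listed:

```shell
# Start sectors from the fdisk output above; 2048 sectors = 1 MiB.
for start in 2048 4196352 4605952; do
    if [ $((start % 2048)) -eq 0 ]; then
        echo "sector $start: 1MiB-aligned"
    else
        echo "sector $start: NOT aligned"
    fi
done
```

All three start sectors divide evenly by 2048, so misalignment can be excluded as the cause here (the "does not end on cylinder boundary" warnings are harmless CHS leftovers).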
The array is NOT degraded:
root@ [~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[1] sda2[0]
204736 blocks super 1.0 [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
404750144 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb1[1] sda1[0]
2096064 blocks super 1.1 [2/2] [UU]
unused devices: <none>
Write cache is on:
root@ [~]# hdparm -W /dev/sda
write-caching = 1 (on)
root@ [~]# hdparm -W /dev/sdb
write-caching = 1 (on)
SMART seems to be OK:
SMART overall-health self-assessment test result: PASSED (for both devices)
I have tried switching the I/O scheduler to noop and deadline but saw
no improvement.
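For reference, the scheduler switch itself is a one-liner per device and takes effect immediately. A sketch (the sysfs path is parameterized only so the snippet can run against a scratch file; on the live system it is /sys/block/sda/queue/scheduler):

```shell
# Switch the elevator for one disk at runtime, no reboot needed.
sched=${SCHED_FILE:-/sys/block/sda/queue/scheduler}
echo noop > "$sched"
cat "$sched"   # on a real sysfs the active scheduler is shown in brackets
```

The same write must be repeated for each member device (sda and sdb), since md itself has no scheduler of its own.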
I have tried running fstrim but it errors out:
root [~]# fstrim -v /
fstrim: /: FITRIM ioctl failed: Operation not supported
So I have added noatime and discard to /etc/fstab and rebooted the
server, but to no avail.
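The FITRIM failure is most likely expected rather than a fault: as far as I know, md RAID1 did not pass TRIM/discard through to member devices until around kernel 3.7, so on 2.6.32 any filesystem on top of md will report "Operation not supported" (this is background knowledge, not something established in this thread). On kernels that expose it, discard support can be probed per layer; a sketch (path parameterized so the snippet is runnable anywhere — real paths would be /sys/block/sda/queue/discard_max_bytes for the raw SSD and /sys/block/md2/queue/discard_max_bytes for the array):

```shell
# discard_max_bytes of 0 means the layer does not support discard;
# a non-zero value means it does.
probe=${DISCARD_FILE:-/sys/block/sda/queue/discard_max_bytes}
val=$(cat "$probe")
if [ "$val" -eq 0 ]; then
    echo "no discard support"
else
    echo "discard supported ($val bytes max)"
fi
```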
I no longer know what to do, and I need to come up with some sort of
solution: it's neither reasonable nor acceptable to hit three-digit
load averages from copying a few GB of files. If anyone can help me,
please do!
Thanks in advance!
Andy
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
[not found] ` <CAH3kUhHxBiqugFQm=PPJNNe9jOdKy0etUjQNsoDz_LJNUCLCCQ@mail.gmail.com>
@ 2013-04-20 23:25 ` Andrei Banu
2013-04-20 23:26 ` Andrei Banu
1 sibling, 0 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-20 23:25 UTC (permalink / raw)
To: linux-raid
The previous test was done with the noop scheduler (the speed test
completed at about 8MB/s). Then I rebooted the server and redid the
test (also with noop); the result was slightly better but still not
what it should be (21MB/s). A third test 5-10 minutes later (after the
load subsided) completed at 16MB/s, and a fourth at 14.6MB/s.
Something else: the weekly automatic RAID check started a little while
ago and is running at an average of 60MB/s (anywhere between 25 and
100MB/s), regardless of scheduler (noop, cfq or deadline). A RAID
check on ordinary mechanical drives proceeds at about 160MB/s on the
outer cylinders. Why are these SSDs so slow?
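For completeness, the md check/resync rate is bounded by two sysctls, so it's worth confirming the 60MB/s isn't simply the throttle. A sketch (the directory is parameterized only so the snippet can be tested against a scratch copy; on the live box it is /proc/sys/dev/raid):

```shell
# Per-device resync/check rate bounds, in KB/s. The values shown in
# the comments are the usual kernel defaults.
raid_sysctl=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}
cat "$raid_sysctl/speed_limit_min"   # floor, default 1000
cat "$raid_sysctl/speed_limit_max"   # ceiling, default 200000
```

With the default 200000 KB/s ceiling, a check averaging 60MB/s is well below the throttle, which would point back at the devices or the stack rather than these limits.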
These are the results from the 21MB/s test (5-second interval):
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 204.94 1918.02 389.30 1367719 277605
sdb 154.80 1196.21 389.30 853008 277605
md1 0.65 2.59 0.00 1848 0
md2 355.45 3106.05 388.53 2214890 277056
md0 1.10 2.90 0.01 2069 9
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 583.40 42764.80 29452.90 213824 147264
sdb 234.80 23172.00 14950.50 115860 74752
md1 0.00 0.00 0.00 0 0
md2 8079.60 65886.40 29862.40 329432 149312
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 15.00 1.60 1740.00 8 8700
sdb 15.00 0.00 7196.80 0 35984
md1 0.00 0.00 0.00 0 0
md2 333.20 1.60 1330.40 8 6652
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 167.20 538.40 37432.80 2692 187164
sdb 86.20 16.00 33688.80 80 168444
md1 0.00 0.00 0.00 0 0
md2 9510.80 572.80 37934.40 2864 189672
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 150.20 585.60 29090.40 2928 145452
sdb 71.20 44.00 30355.20 220 151776
md1 0.00 0.00 0.00 0 0
md2 7306.20 615.20 28998.40 3076 144992
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 257.20 1624.00 9913.80 8120 49569
sdb 137.20 372.00 21438.60 1860 107193
md1 0.00 0.00 0.00 0 0
md2 2600.80 1991.20 9504.00 9956 47520
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 186.80 972.80 292.70 4864 1463
sdb 150.40 733.60 292.70 3668 1463
md1 0.00 0.00 0.00 0 0
md2 283.80 1706.40 291.20 8532 1456
md0 0.00 0.00 0.00 0 0
If you have any idea what I can do to improve this, please let me know.
Thanks!!
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
[not found] ` <CAH3kUhEaZGON=fAyVMZOz5fH_DcfKv=hCa96UCeK4pN7k81c_Q@mail.gmail.com>
@ 2013-04-20 23:26 ` Andrei Banu
[not found] ` <51725458.7020109@redhost.ro>
1 sibling, 0 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-20 23:26 UTC (permalink / raw)
To: linux-raid
Hi,
I ran iostat with '-d 3' during a "heavy" (540MB) copy. It took a bit
over a minute and completed at less than 9MB/s. These are some of the
results (this does NOT include the first batch, i.e. the since-boot
average):
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 503.00 1542.67 28157.33 4628 84472
sdb 66.00 72.00 13162.67 216 39488
md1 373.00 1492.00 0.00 4476 0
md2 6951.67 126.67 27734.67 380 83204
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 56.67 20.00 1177.50 60 3532
sdb 47.33 12.00 10824.17 36 32472
md1 0.67 2.67 0.00 8 0
md2 322.00 25.33 1266.67 76 3800
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 122.00 16.00 45773.33 48 137320
sdb 96.67 14.67 19472.00 44 58416
md1 0.00 0.00 0.00 0 0
md2 11431.00 32.00 45684.00 96 137052
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 13.67 8.00 5973.33 24 17920
md1 0.00 0.00 0.00 0 0
md2 2.00 8.00 0.00 24 0
md0 0.00 0.00 0.00 0 0
This is the "normal" iostat taken after 10 minutes (this DOES include
the first batch, i.e. the since-boot average):
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 281.83 973.99 641.55 212615675 140045467
sdb 215.51 665.94 641.55 145369465 140045467
md1 1.18 2.17 2.56 473492 558452
md2 470.71 1596.29 638.01 348460340 139272912
md0 0.08 0.27 0.00 59983 171
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 41.67 237.33 133.67 712 401
sdb 39.33 90.67 133.67 272 401
md1 0.00 0.00 0.00 0 0
md2 83.00 328.00 133.33 984 400
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 29.33 2.67 110.00 8 330
sdb 29.33 2.67 110.00 8 330
md1 0.00 0.00 0.00 0 0
md2 28.67 5.33 109.33 16 328
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 175.67 1.33 747.50 4 2242
sdb 182.00 56.00 747.50 168 2242
md1 0.00 0.00 0.00 0 0
md2 191.33 57.33 746.67 172 2240
md0 0.00 0.00 0.00 0 0
Best regards!
On 20/04/2013 3:59 AM, Roberto Spadim wrote:
> Run 'iostat -d 1 -k' and check the read/write IOPS and kB/s.
>
>
> 2013/4/19 Andrei Banu <andrei.banu@redhost.ro
> <mailto:andrei.banu@redhost.ro>>
>
> [original message quoted in full; snipped]
> --
> Roberto Spadim
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
[not found] ` <CAH3kUhHxBiqugFQm=PPJNNe9jOdKy0etUjQNsoDz_LJNUCLCCQ@mail.gmail.com>
2013-04-20 23:25 ` Andrei Banu
@ 2013-04-20 23:26 ` Andrei Banu
2013-04-21 2:48 ` Stan Hoeppner
2013-04-25 11:38 ` Thomas Jarosch
1 sibling, 2 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-20 23:26 UTC (permalink / raw)
To: linux-raid
Hi!
They are connected through SATA2 ports in AHCI mode (this explains the
read speed but not the pitiful write speed).
OK, I redid the test during the same file copy, this time with a
6-second interval and the noop scheduler; these are the full results:
root [~]# iostat -d 6 -k
Linux 2.6.32-358.2.1.el6.x86_64 (host) 04/21/2013 _x86_64_(8 CPU)
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 245.95 832.69 591.13 219499895 155823699
sdb 190.80 572.24 590.88 150844446 155758671
md1 1.15 2.15 2.43 567732 641156
md2 406.02 1368.44 587.74 360725304 154930520
md0 0.06 0.23 0.00 59992 171
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 34.17 0.00 4466.00 0 26796
sdb 9.67 0.00 4949.33 0 29696
md1 0.00 0.00 0.00 0 0
md2 1116.50 0.00 4466.00 0 26796
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 35.17 0.00 5475.33 0 32852
sdb 9.33 2.00 4522.67 12 27136
md1 0.00 0.00 0.00 0 0
md2 1369.67 8.00 5475.33 48 32852
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 40.33 0.00 3160.00 0 18960
sdb 19.50 0.00 7882.00 0 47292
md1 0.00 0.00 0.00 0 0
md2 790.50 2.67 3160.00 16 18960
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 77.67 4.00 15328.00 24 91968
sdb 50.33 16.00 10972.67 96 65836
md1 0.00 0.00 0.00 0 0
md2 3834.33 9.33 15328.00 56 91968
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 66.67 48.00 10604.00 288 63624
sdb 23.17 0.00 9660.00 0 57960
md1 0.00 0.00 0.00 0 0
md2 2653.50 51.33 10604.00 308 63624
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 37.83 24.67 5378.67 148 32272
sdb 13.17 3.33 6315.33 20 37892
md1 0.00 0.00 0.00 0 0
md2 1345.17 26.00 5378.67 156 32272
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 132.50 4.67 22714.00 28 136284
sdb 32.33 20.00 12328.00 120 73968
md1 0.00 0.00 0.00 0 0
md2 5713.67 31.33 22843.33 188 137060
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 58.17 6.00 8200.00 36 49200
sdb 23.00 8.00 11349.33 48 68096
md1 0.00 0.00 0.00 0 0
md2 1936.17 21.33 7729.33 128 46376
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 6.17 0.00 24.67 0 148
sdb 10.00 0.00 5120.00 0 30720
md1 0.00 0.00 0.00 0 0
md2 6.17 0.00 24.67 0 148
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.50 0.00 5.33 0 32
sdb 14.17 0.00 7170.67 0 43024
md1 0.00 0.00 0.00 0 0
md2 1.50 0.00 5.33 0 32
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 256.00 346.67 1105.17 2080 6631
sdb 270.83 544.00 7029.17 3264 42175
md1 49.33 170.00 27.33 1020 164
md2 311.83 705.33 1076.67 4232 6460
md0 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 51.17 46.67 219.08 280 1314
sdb 48.67 140.00 219.08 840 1314
md1 20.67 82.67 0.00 496 0
md2 58.00 104.00 218.00 624 1308
md0 0.00 0.00 0.00 0 0
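A rough sanity check on the samples above: dividing kB written per second by transactions per second gives the average request size per device. Taking one of the early samples (sda at 35.17 tps writing 5475.33 kB/s, sdb at 9.33 tps writing 4522.67 kB/s), computed with awk:

```shell
# Average request size = kB written per second / transactions per second.
# Figures taken from one iostat sample above.
awk 'BEGIN {
    printf "sda: %.0f kB/request\n", 5475.33 / 35.17
    printf "sdb: %.0f kB/request\n", 4522.67 / 9.33
}'
# sda: 156 kB/request
# sdb: 485 kB/request
```

The two members receive the same data, yet sdb's writes arrive in far larger merged requests than sda's, which hints that the two devices (or their links) are behaving differently under the same load.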
Thank you for your time.
Kind regards!
On 20/04/2013 4:11 PM, Roberto Spadim wrote:
>
> Hmm, at the beginning you have more IOPS than at the end. How are
> these devices connected? Normally an SSD can handle more than 1000
> IOPS and a hard disk no more than ~300. How did you configure the
> queue/scheduler for the SSD disks? Could you change it to noop and
> test again?
>
> Em 20/04/2013 05:39, "Andrei Banu" <andrei.banu@redhost.ro
> <mailto:andrei.banu@redhost.ro>> escreveu:
>
> [earlier messages quoted in full; snipped]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-19 22:58 Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO Andrei Banu
[not found] ` <CAH3kUhEaZGON=fAyVMZOz5fH_DcfKv=hCa96UCeK4pN7k81c_Q@mail.gmail.com>
@ 2013-04-21 0:10 ` Stan Hoeppner
[not found] ` <51732E2B.6090607@hardwarefreak.com>
2013-04-23 6:01 ` Stan Hoeppner
3 siblings, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-21 0:10 UTC (permalink / raw)
To: Andrei Banu, Linux RAID
Forgot to CC the list. Sorry for the dup, Andrei.
On 4/19/2013 5:58 PM, Andrei Banu wrote:
> I come to you with a difficult problem. We have a server otherwise
> snappy fitted with mdraid-1 made of Samsung 840 PRO SSDs. If we copy a
> larger file to the server (from the same server, from net doesn't
> matter) the server load will increase from roughly 0.7 to over 100 (for
> several GB files). Apparently the reason is that the raid can't write well.
...
> 547682517 bytes (548 MB) copied, 7.99664 s, 68.5 MB/s
> 547682517 bytes (548 MB) copied, 52.1958 s, 10.5 MB/s
> 547682517 bytes (548 MB) copied, 75.3476 s, 7.3 MB/s
> 1073741824 bytes (1.1 GB) copied, 61.8796 s, 17.4 MB/s
> Timing buffered disk reads: 654 MB in 3.01 seconds = 217.55 MB/sec
> Timing buffered disk reads: 272 MB in 3.01 seconds = 90.44 MB/sec
> Timing O_DIRECT disk reads: 788 MB in 3.00 seconds = 262.23 MB/sec
> Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.53 MB/sec
...
Obviously this is frustrating, but the fix should be pretty easy.
> O/S: CentOS 6.4 / 64 bit (2.6.32-358.2.1.el6.x86_64)
I'd guess your problem is the following regression. I don't believe
this regression is fixed in Red Hat 2.6.32-* kernels:
http://www.archivum.info/linux-ide@vger.kernel.org/2010-02/00243/bad-performance-with-SSD-since-kernel-version-2.6.32.html
After I discovered this regression and recommended Adam Goryachev
upgrade from Debian's 2.6.32 to 3.2.x, his SSD RAID5 throughput
increased by a factor of 5x, though much of that was due to testing
methods. His raw per-drive SSD throughput more than doubled. The
thread detailing this is long but a good read:
http://marc.info/?l=linux-raid&m=136098921212920&w=2
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-20 23:26 ` Andrei Banu
@ 2013-04-21 2:48 ` Stan Hoeppner
2013-04-21 12:23 ` Tommy Apel
2013-04-25 11:38 ` Thomas Jarosch
1 sibling, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-21 2:48 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/20/2013 6:26 PM, Andrei Banu wrote:
> They are connected through SATA2 ports (this does explain the read speed
> but not the pitiful write one) in AHCI.
These SSDs are capable of 500MB/s, and cost ~$1000 USD. Spend ~$200 USD
on a decent HBA. The 6G SAS/SATA LSI 9211-4i seems perfectly suited to
your RAID1 SSD application. It is a 4 port enterprise JBOD HBA that
also supports ASIC level RAID 1, 1E, 10.
Also, the difference in throughput you show between RAID maintenance,
direct device access, and filesystem access suggests you have something
running between the block and filesystem layers, for instance LUKS.
LUKS alone shouldn't hammer your CPU and I/O throughput so
dramatically. However, if the SSDs compress or encrypt data
internally, and I believe the 840s do, LUKS-encrypted blocks may cause
the SSD firmware to take considerably more time to process them.
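Whether such a layer exists is easy to verify from the mount table: if / is mounted from a /dev/mapper/* or /dev/dm-* node, a device-mapper layer (LUKS, LVM) sits between md and the filesystem; a /dev/md* node means there is none. A sketch (the mount-table path is parameterized only so the logic can be tested against a sample file; on a real system it is /proc/mounts):

```shell
# Detect whether the root filesystem sits on a device-mapper node.
mounts=${PROC_MOUNTS:-/proc/mounts}
root_dev=$(awk '$2 == "/" { print $1 }' "$mounts")
case "$root_dev" in
    /dev/mapper/*|/dev/dm-*) echo "device-mapper layer present: $root_dev" ;;
    *)                       echo "no dm layer: $root_dev" ;;
esac
```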
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 2:48 ` Stan Hoeppner
@ 2013-04-21 12:23 ` Tommy Apel
2013-04-21 16:48 ` Tommy Apel
2013-04-21 19:33 ` Stan Hoeppner
0 siblings, 2 replies; 38+ messages in thread
From: Tommy Apel @ 2013-04-21 12:23 UTC (permalink / raw)
To: stan; +Cc: Andrei Banu, linux-raid Raid
Hello, FYI: I'm getting ~68MB/s on two Intel 330s in RAID1 as well, on
vanilla 3.8.8 and 3.9.0-rc3, when writing random data, and ~236MB/s
when writing from /dev/zero:
mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb > /run/fill   # ~1.06GB/s
dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock   # ~5.7GB/s
dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct   # ~68MB/s
dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct   # ~236MB/s
iostat claims 100% util on both drives while doing so, under both the
deadline and the noop scheduler.
Doing the same with 4 threads, each offset by 1GB on the disk and
pinned to its own core with taskset, makes no difference: still ~68MB/s
with random data.
# for x in `seq 0 3`; do taskset -c $x dd if=/run/fill of=/dev/md0
bs=1M count=1024 seek=$(($x * 1024)) oflag=direct & done
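For anyone wanting to reproduce the incompressible-data source above, here is a self-contained sketch of the same openssl trick; the 16 MiB size and the temp-file path are my own illustrative choices (the original wrote multiple GB to /run/fill):

```shell
# Generate incompressible test data the same way as above: an AES-CTR
# keystream (openssl enc reading /dev/zero) is effectively random, so a
# compressing SSD controller cannot cheat on it the way it can with
# plain /dev/zero input. Written to a temp file here instead of /run/fill.
set -e
FILL=$(mktemp)
KEY=$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64 | tr -d '\n')
openssl enc -aes-128-ctr -pass pass:"$KEY" -nosalt < /dev/zero 2>/dev/null \
  | dd of="$FILL" bs=1M count=16 iflag=fullblock 2>/dev/null
# Sanity check: gzip should not be able to shrink it noticeably.
ORIG=$(wc -c < "$FILL")
COMP=$(gzip -c "$FILL" | wc -c)
echo "wrote $ORIG bytes, gzips to $COMP bytes"
rm -f "$FILL"
```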
/Tommy
2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
> On 4/20/2013 6:26 PM, Andrei Banu wrote:
>
>> They are connected through SATA2 ports (this does explain the read speed
>> but not the pitiful write one) in AHCI.
>
> These SSDs are capable of 500MB/s, and cost ~$1000 USD. Spend ~$200 USD
> on a decent HBA. The 6G SAS/SATA LSI 9211-4i seems perfectly suited to
> your RAID1 SSD application. It is a 4 port enterprise JBOD HBA that
> also supports ASIC level RAID 1, 1E, 10.
>
> Also, the difference in throughput your show between RAID maintenance,
> direct device access, and filesystem access suggests you have something
> running between the block and filesystem layers, for instance LUKS.
> Though LUKS alone shouldn't hammer your CPU and IO throughput so
> dramatically. However, if the SSDs do compression or encryption
> automatically, and I believe the 840s do, the LUKS encrypted blocks may
> cause the SSD firmware to take considerably more time to process the blocks.
>
> --
> Stan
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 12:23 ` Tommy Apel
@ 2013-04-21 16:48 ` Tommy Apel
2013-04-21 19:33 ` Stan Hoeppner
1 sibling, 0 replies; 38+ messages in thread
From: Tommy Apel @ 2013-04-21 16:48 UTC (permalink / raw)
To: stan; +Cc: Andrei Banu, linux-raid Raid
Just did a blockwise test as well, with fio:
Single SSD:
# ./scst-trunk/scripts/blockdev-perftest -d -f -i 1 -j -m 10 -M 20 -s 30 -f /dev/sdb
blocksize      W(s)   W(avg,MB/s)  W(std,MB/s)    W(IOPS)    R(s)   R(avg,MB/s)  R(std,MB/s)    R(IOPS)
  1048576     6.548      156.384        0.000     156.384   2.383      429.710        0.000     429.710
   524288     6.311      162.256        0.000     324.513   2.521      406.188        0.000     812.376
   262144     6.183      165.615        0.000     662.462   3.003      340.992        0.000    1363.969
   131072     6.096      167.979        0.000    1343.832   3.140      326.115        0.000    2608.917
    65536     5.973      171.438        0.000    2743.010   3.807      268.978        0.000    4303.651
    32768     5.748      178.149        0.000    5700.765   4.609      222.174        0.000    7109.568
    16384     5.693      179.870        0.000   11511.681   5.203      196.810        0.000   12595.810
     8192     6.188      165.482        0.000   21181.642   7.339      139.529        0.000   17859.654
     4096    10.190      100.491        0.000   25725.613  13.816       74.117        0.000   18973.943
     2048    25.018       40.931        0.000   20956.431  26.136       39.180        0.000   20059.994
     1024    39.693       25.798        0.000   26417.152  50.580       20.245        0.000   20731.040
RAID1 with two Intel 330 SSDs:
# ./scst-trunk/scripts/blockdev-perftest -d -f -i 1 -j -m 10 -M 20 -s 30 -f /dev/md0
blocksize      W(s)   W(avg,MB/s)  W(std,MB/s)    W(IOPS)    R(s)   R(avg,MB/s)  R(std,MB/s)    R(IOPS)
  1048576     7.053      145.186        0.000     145.186   2.384      429.530        0.000     429.530
   524288     6.906      148.277        0.000     296.554   2.518      406.672        0.000     813.344
   262144     6.763      151.412        0.000     605.648   2.871      356.670        0.000    1426.681
   131072     6.558      156.145        0.000    1249.161   3.166      323.437        0.000    2587.492
    65536     6.578      155.670        0.000    2490.727   3.835      267.014        0.000    4272.229
    32768     6.311      162.256        0.000    5192.204   4.379      233.843        0.000    7482.987
    16384     6.406      159.850        0.000   10230.409   5.953      172.014        0.000   11008.903
     8192     7.776      131.687        0.000   16855.967   8.621      118.780        0.000   15203.805
     4096    11.137       91.946        0.000   23538.116  14.138       72.429        0.000   18541.802
     2048    38.440       26.639        0.000   13639.126  22.512       45.487        0.000   23289.268
     1024    60.933       16.805        0.000   17208.672  43.247       23.678        0.000   24246.214
It sort of confirms that performance goes down under RAID1, but I would
expect that to some degree anyway, as the write confirmation has to
come from both disks.
/Tommy
2013/4/21 Tommy Apel <tommyapeldk@gmail.com>:
> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
> writing from /dev/zero
>
> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
> /run/fill ~1.06GB/s
> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>
> iostat claiming 100% util on both drives when doing so, running both
> deadline and noop scheduler,
> doing the same with 4 threads and offset by 1.1GB on the disk and
> taske set to 4 cores makes no difference, still ~68MB/s with random
> data
> # for x in `seq 0 4`; do taskset -c $x dd if=/run/fill of=/dev/md0
> bs=1M count=1024 seek=$(($x * 1024)) oflag=direct & done
>
> /Tommy
>
> 2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
>> On 4/20/2013 6:26 PM, Andrei Banu wrote:
>>
>>> They are connected through SATA2 ports (this does explain the read speed
>>> but not the pitiful write one) in AHCI.
>>
>> These SSDs are capable of 500MB/s, and cost ~$1000 USD. Spend ~$200 USD
>> on a decent HBA. The 6G SAS/SATA LSI 9211-4i seems perfectly suited to
>> your RAID1 SSD application. It is a 4 port enterprise JBOD HBA that
>> also supports ASIC level RAID 1, 1E, 10.
>>
>> Also, the difference in throughput your show between RAID maintenance,
>> direct device access, and filesystem access suggests you have something
>> running between the block and filesystem layers, for instance LUKS.
>> Though LUKS alone shouldn't hammer your CPU and IO throughput so
>> dramatically. However, if the SSDs do compression or encryption
>> automatically, and I believe the 840s do, the LUKS encrypted blocks may
>> cause the SSD firmware to take considerably more time to process the blocks.
>>
>> --
>> Stan
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 12:23 ` Tommy Apel
2013-04-21 16:48 ` Tommy Apel
@ 2013-04-21 19:33 ` Stan Hoeppner
2013-04-21 19:56 ` Tommy Apel
1 sibling, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-21 19:33 UTC (permalink / raw)
To: Tommy Apel; +Cc: Andrei Banu, linux-raid Raid
On 4/21/2013 7:23 AM, Tommy Apel wrote:
> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
> writing from /dev/zero
>
> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
> /run/fill ~1.06GB/s
What's the purpose of all of this? Surely not simply to create random
data, which can be accomplished much more easily. Are you sandbagging
us here with a known bug, or simply trying to show off your mad skillz?
Either way this is entirely unnecessary for troubleshooting an IO
performance issue. dd doesn't (shouldn't) care if the bits are random
or not, though the Intel SSD controller might, as well as other layers
you may have in your IO stack. Keep it simple so we can isolate one
layer at a time.
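For reference, the "much more easily" route would be something like the sketch below; the size and temp path are arbitrary illustrative choices:

```shell
# Reading /dev/urandom directly is the straightforward way to get
# incompressible data. It is slower per byte than the AES-CTR pipe
# (urandom output is CPU-bound), which is why people reach for the
# openssl trick on multi-GB fills, but it keeps the test stack simple.
head -c 4194304 /dev/urandom > /tmp/fill.sample   # 4 MiB sample
SZ=$(wc -c < /tmp/fill.sample)
CZ=$(gzip -c /tmp/fill.sample | wc -c)
echo "generated $SZ bytes; gzip leaves $CZ (incompressible)"
rm -f /tmp/fill.sample
```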
> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
Noting the above, it's interesting that you omitted this test
dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
preventing an apples-to-apples comparison between the raw SSD device
and md/RAID1 performance with your uber-random file as input.
--
Stan
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 19:33 ` Stan Hoeppner
@ 2013-04-21 19:56 ` Tommy Apel
2013-04-22 0:47 ` Stan Hoeppner
0 siblings, 1 reply; 38+ messages in thread
From: Tommy Apel @ 2013-04-21 19:56 UTC (permalink / raw)
To: stan; +Cc: Andrei Banu, linux-raid Raid
Calm the f. down, I was just handing over some information. Sorry your
day was ruined, mr. high and mighty; use the info for whatever you want
to, but flaming me isn't going to help anyone.
2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
> On 4/21/2013 7:23 AM, Tommy Apel wrote:
>> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
>> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
>> writing from /dev/zero
>>
>> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
>
>
>> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
>> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
>> /run/fill ~1.06GB/s
>
> What's the purpose of all of this? Surely not simply to create random
> data, which is accomplished much more easily. Are you sand bagging us
> here with a known bug, or simply trying to show off your mad skillz?
> Either way this is entirely unnecessary for troubleshooting an IO
> performance issue. dd doesn't (shouldn't) care if the bits are random
> or not, though the Intel SSD controller might, as well as other layers
> you may have in your IO stack. Keep it simple so we can isolate one
> layer at a time.
>
>> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
>> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
>> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>
> Noting the above, it's interesting that you omitted this test
>
> dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
>
> preventing an apples to apples comparison between raw SSD device and
> md/RAID1 performance with your uber random file as input.
>
> --
> Stan
>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
[not found] ` <51732E2B.6090607@hardwarefreak.com>
@ 2013-04-21 20:46 ` Andrei Banu
2013-04-21 23:17 ` Stan Hoeppner
0 siblings, 1 reply; 38+ messages in thread
From: Andrei Banu @ 2013-04-21 20:46 UTC (permalink / raw)
To: linux-raid
Hello,
At this point I probably should state that I am not an experienced
sysadmin. Knowing this, I do have a server management company but they
said they don't know what to do so now I am trying to fix things myself
but I am something of a noob. I normally try to keep my actions to
cautious config changes and testing. I have never done a kernel update.
Any easy way to do this?
Regarding your second advice (to purchase a decent HBA), I have already
thought about it, but I guess it comes with its own drivers that need
to be compiled into the initramfs, etc. So I am trying to replace the baseboard
with one with SATA3 support to avoid any configuration changes (the old
board has the C202 chipset and the new one has C204 so I guess this
replacement is as simple as it gets - just remove the old board and plug
the new one without any software changes or recompiles). Again I need to
say this server is in production and I can't move the data or the users.
I can have a few hours downtime during the night but that's about all.
Regarding the kernel upgrade, do we need to compile one from source or
there's an easier way?
Thanks!
On 21/04/2013 3:09 AM, Stan Hoeppner wrote:
> On 4/19/2013 5:58 PM, Andrei Banu wrote:
>
>> I come to you with a difficult problem. We have a server otherwise
>> snappy fitted with mdraid-1 made of Samsung 840 PRO SSDs. If we copy a
>> larger file to the server (from the same server, from net doesn't
>> matter) the server load will increase from roughly 0.7 to over 100 (for
>> several GB files). Apparently the reason is that the raid can't write well.
> ...
>> 547682517 bytes (548 MB) copied, 7.99664 s, 68.5 MB/s
>> 547682517 bytes (548 MB) copied, 52.1958 s, 10.5 MB/s
>> 547682517 bytes (548 MB) copied, 75.3476 s, 7.3 MB/s
>> 1073741824 bytes (1.1 GB) copied, 61.8796 s, 17.4 MB/s
>> Timing buffered disk reads: 654 MB in 3.01 seconds = 217.55 MB/sec
>> Timing buffered disk reads: 272 MB in 3.01 seconds = 90.44 MB/sec
>> Timing O_DIRECT disk reads: 788 MB in 3.00 seconds = 262.23 MB/sec
>> Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.53 MB/sec
> ...
>
> Obviously this is frustrating, but the fix should be pretty easy.
>
>> O/S: CentOS 6.4 / 64 bit (2.6.32-358.2.1.el6.x86_64)
> I'd guess your problem is the following regression. I don't believe
> this regression is fixed in Red Hat 2.6.32-* kernels:
>
> http://www.archivum.info/linux-ide@vger.kernel.org/2010-02/00243/bad-performance-with-SSD-since-kernel-version-2.6.32.html
>
> After I discovered this regression and recommended Adam Goryachev
> upgrade from Debian 2.6.32 to 3.2.x, his SSD RAID5 throughput increased
> by a factor of 5x, though much of this was due testing methods. His raw
> SSD throughput more than doubled per drive. The thread detailing this
> is long but is a good read:
>
> http://marc.info/?l=linux-raid&m=136098921212920&w=2
>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 20:46 ` Andrei Banu
@ 2013-04-21 23:17 ` Stan Hoeppner
2013-04-22 10:19 ` Andrei Banu
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-21 23:17 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/21/2013 3:46 PM, Andrei Banu wrote:
> Hello,
>
> At this point I probably should state that I am not an experienced
> sysadmin.
Things are becoming more clear now.
> Knowing this, I do have a server management company but they
> said they don't know what to do
So you own this hardware and it is colocated, correct?
> so now I am trying to fix things myself
> but I am something of a noob. I normally try to keep my actions to
> cautious config changes and testing.
Why did you choose Centos? Was this installed by the company?
> I have never done a kernel update.
> Any easy way to do this?
It may not be necessary, at least to solve any SSD performance problems
anyway. Reexamining your numbers shows you hit 262MB/s to /dev/sda.
That's 65% of SATA2 interface bandwidth, so this kernel probably does
have the patch. Your problem lies elsewhere.
> Regarding your second advice (to purchase a decent HBA) I have already
> thought about it but I guess it comes with it's own drivers that need to
> be compiled into initramfs etc.
The default CentOS (RHEL) initramfs should include mpt2sas, which
supports the LSI SAS2 HBAs such as the 9211. The LSI caching RAID cards
are supported as well, with megaraid_sas.
The question is, do you really need more than the ~260MB/s of peak
throughput you currently have? And is it worth the hassle?
> So I am trying to replace the baseboard
> with one with SATA3 support to avoid any configuration changes (the old
> board has the C202 chipset and the new one has C204 so I guess this
> replacement is as simple as it gets - just remove the old board and plug
> the new one without any software changes or recompiles). Again I need to
> say this server is in production and I can't move the data or the users.
> I can have a few hours downtime during the night but that's about all.
It's not clear your problem is hardware bandwidth. In fact it seems the
problem lies elsewhere. It may simply be that you're running these tests
while other substantial IO is occurring. Actually, your numbers show
this is exactly the case. What they don't show is how much other IO is
hitting the SSDs while you're running your tests.
> Regarding the kernel upgrade, do we need to compile one from source or
> there's an easier way?
I don't believe at this point you need a new kernel to fix the problem
you have. If this patch were not present, you'd not be able to get
260MB/s from SATA2. Your problem lies elsewhere.
In the future, instead of making a post saying "md is slow, my SSDs are
slow" and pasting test data which appears to back that claim, you'd be
better served by describing a general problem, such as "users say the
system is slow and I think it may be md or SSD related". This way we
don't waste time following a troubleshooting path based on incorrect
assumptions, as we've done here. Or at least as I've done here, as I'm
the only one assisting.
Boot all users off the system, shut down any daemons that may generate
any meaningful load on the disks or CPUs. Disable any encryption or
compression. Then rerun your tests while completely idle. Then we'll
go from there.
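A quick way to verify the box really is quiet before rerunning the tests is to sample /proc/diskstats twice; a minimal sketch (the device auto-picked here is a placeholder; on the real server you'd pass sda or sdb):

```shell
# Idle check before benchmarking: sample /proc/diskstats twice and
# report how many sectors were written to one device in the interval.
# A busy box shows a large delta; a quiet one is near zero.
# Field 10 of /proc/diskstats is "sectors written" for the device.
DEV=${DEV:-$(awk 'NR==1 {print $3; exit}' /proc/diskstats)}
w_sectors() { awk -v d="$DEV" '$3 == d {print $10; exit}' /proc/diskstats; }
W1=$(w_sectors); sleep 2; W2=$(w_sectors)
DELTA=$(( ${W2:-0} - ${W1:-0} ))
echo "sectors written on $DEV in 2s: $DELTA"
```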
--
Stan
> Thanks!
>
> On 21/04/2013 3:09 AM, Stan Hoeppner wrote:
>> On 4/19/2013 5:58 PM, Andrei Banu wrote:
>>
>>> I come to you with a difficult problem. We have a server otherwise
>>> snappy fitted with mdraid-1 made of Samsung 840 PRO SSDs. If we copy a
>>> larger file to the server (from the same server, from net doesn't
>>> matter) the server load will increase from roughly 0.7 to over 100 (for
>>> several GB files). Apparently the reason is that the raid can't write
>>> well.
>> ...
>>> 547682517 bytes (548 MB) copied, 7.99664 s, 68.5 MB/s
>>> 547682517 bytes (548 MB) copied, 52.1958 s, 10.5 MB/s
>>> 547682517 bytes (548 MB) copied, 75.3476 s, 7.3 MB/s
>>> 1073741824 bytes (1.1 GB) copied, 61.8796 s, 17.4 MB/s
>>> Timing buffered disk reads: 654 MB in 3.01 seconds = 217.55 MB/sec
>>> Timing buffered disk reads: 272 MB in 3.01 seconds = 90.44 MB/sec
>>> Timing O_DIRECT disk reads: 788 MB in 3.00 seconds = 262.23 MB/sec
>>> Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.53 MB/sec
>> ...
>>
>> Obviously this is frustrating, but the fix should be pretty easy.
>>
>>> O/S: CentOS 6.4 / 64 bit (2.6.32-358.2.1.el6.x86_64)
>> I'd guess your problem is the following regression. I don't believe
>> this regression is fixed in Red Hat 2.6.32-* kernels:
>>
>> http://www.archivum.info/linux-ide@vger.kernel.org/2010-02/00243/bad-performance-with-SSD-since-kernel-version-2.6.32.html
>>
>>
>> After I discovered this regression and recommended Adam Goryachev
>> upgrade from Debian 2.6.32 to 3.2.x, his SSD RAID5 throughput increased
>> by a factor of 5x, though much of this was due testing methods. His raw
>> SSD throughput more than doubled per drive. The thread detailing this
>> is long but is a good read:
>>
>> http://marc.info/?l=linux-raid&m=136098921212920&w=2
>>
>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 19:56 ` Tommy Apel
@ 2013-04-22 0:47 ` Stan Hoeppner
2013-04-22 7:51 ` Tommy Apel
0 siblings, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-22 0:47 UTC (permalink / raw)
To: Tommy Apel; +Cc: Andrei Banu, linux-raid Raid
On 4/21/2013 2:56 PM, Tommy Apel wrote:
> Calm the f. down, I was just handing over some information, sorry your
> day was ruined mr. high and mighty, use the info for whatever you want
> to but flaming me is't going to help anyone.
Your tantrum aside, the Intel 330, as well as all current Intel SSDs,
uses the SandForce 2281 controller. The SF2xxx series' write
performance is limited by the compressibility of the data. What you're
doing below is simply showcasing the write bandwidth limitation of the
SF2xxx controllers with incompressible data.
This is not relevant to md. And it's not relevant to Andrei. It turns
out that the Samsung 840 SSDs have consistent throughput because they
don't rely on compression.
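The compressibility effect is easy to demonstrate in userspace, with gzip standing in for the SandForce compressor; a rough illustration (not a device benchmark, and the 8 MiB size is arbitrary):

```shell
# Why /dev/zero flatters a compressing controller: 8 MiB of zeros
# collapses to a few KB under gzip, while 8 MiB of random data stays
# ~8 MiB. A controller that compresses internally writes (and waits on)
# flash roughly in proportion to the compressed size.
ZBYTES=$(dd if=/dev/zero bs=1M count=8 2>/dev/null | gzip -c | wc -c)
RBYTES=$(dd if=/dev/urandom bs=1M count=8 2>/dev/null | gzip -c | wc -c)
echo "zeros:  8 MiB -> $ZBYTES bytes compressed"
echo "random: 8 MiB -> $RBYTES bytes compressed"
```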
--
Stan
> 2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
>> On 4/21/2013 7:23 AM, Tommy Apel wrote:
>>> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
>>> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
>>> writing from /dev/zero
>>>
>>> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
>>
>>
>>> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
>>> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
>>> /run/fill ~1.06GB/s
>>
>> What's the purpose of all of this? Surely not simply to create random
>> data, which is accomplished much more easily. Are you sand bagging us
>> here with a known bug, or simply trying to show off your mad skillz?
>> Either way this is entirely unnecessary for troubleshooting an IO
>> performance issue. dd doesn't (shouldn't) care if the bits are random
>> or not, though the Intel SSD controller might, as well as other layers
>> you may have in your IO stack. Keep it simple so we can isolate one
>> layer at a time.
>>
>>> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
>>> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
>>> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>>
>> Noting the above, it's interesting that you omitted this test
>>
>> dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
>>
>> preventing an apples to apples comparison between raw SSD device and
>> md/RAID1 performance with your uber random file as input.
>>
>> --
>> Stan
>>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 0:47 ` Stan Hoeppner
@ 2013-04-22 7:51 ` Tommy Apel
2013-04-22 8:29 ` Tommy Apel
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Tommy Apel @ 2013-04-22 7:51 UTC (permalink / raw)
To: stan; +Cc: Andrei Banu, linux-raid Raid
Stan>
That was exactly what I was trying to show: that your results may vary
depending on the data and the backing device. As far as the RAID1 goes,
it doesn't care much about the data being passed through it.
Ben>
Could you run iostat -x 2 for a few minutes, just to make sure there is
no other I/O going on against the device before running your tests, and
then run the tests with fio instead of dd?
fio write test > fio --rw=write --filename=testfile --bs=1048576
--size=4294967296 --ioengine=psync --end_fsync=1 --invalidate=1
--direct=1 --name=writeperftest
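The same test expressed as a fio job file (the writeperftest.fio name is my own choice), which is easier to tweak than the long command line; run it as `fio writeperftest.fio`:

```ini
; Job-file form of the command above. All values mirror the flags:
; sequential write, 1 MiB blocks, 4 GiB file, psync engine, O_DIRECT,
; page-cache invalidation, fsync at end.
[writeperftest]
rw=write
filename=testfile
bs=1048576
size=4294967296
ioengine=psync
end_fsync=1
invalidate=1
direct=1
```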
/Tommy
2013/4/22 Stan Hoeppner <stan@hardwarefreak.com>:
> On 4/21/2013 2:56 PM, Tommy Apel wrote:
>> Calm the f. down, I was just handing over some information, sorry your
>> day was ruined mr. high and mighty, use the info for whatever you want
>> to but flaming me is't going to help anyone.
>
> Your tantrum aside, the Intel 330, as well as all current Intel SSDs,
> uses the SandForce 2281 controller. The SF2xxx series' write
> performance is limited by the compressibility of the data. What you're
> doing below is simply showcasing the write bandwidth limitation of the
> SF2xxx controllers with incompressible data.
>
> This is not relevant to md. And it's not relevant to Andrei. It turns
> out that the Samsung 840 SSDs have consistent throughput because they
> don't rely on compression.
>
> --
> Stan
>
>
>> 2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
>>> On 4/21/2013 7:23 AM, Tommy Apel wrote:
>>>> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
>>>> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
>>>> writing from /dev/zero
>>>>
>>>> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
>>>
>>>
>>>> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
>>>> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
>>>> /run/fill ~1.06GB/s
>>>
>>> What's the purpose of all of this? Surely not simply to create random
>>> data, which is accomplished much more easily. Are you sand bagging us
>>> here with a known bug, or simply trying to show off your mad skillz?
>>> Either way this is entirely unnecessary for troubleshooting an IO
>>> performance issue. dd doesn't (shouldn't) care if the bits are random
>>> or not, though the Intel SSD controller might, as well as other layers
>>> you may have in your IO stack. Keep it simple so we can isolate one
>>> layer at a time.
>>>
>>>> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
>>>> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
>>>> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>>>
>>> Noting the above, it's interesting that you omitted this test
>>>
>>> dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
>>>
>>> preventing an apples to apples comparison between raw SSD device and
>>> md/RAID1 performance with your uber random file as input.
>>>
>>> --
>>> Stan
>>>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 7:51 ` Tommy Apel
@ 2013-04-22 8:29 ` Tommy Apel
2013-04-22 10:26 ` Andrei Banu
2013-04-22 23:21 ` Stan Hoeppner
2 siblings, 0 replies; 38+ messages in thread
From: Tommy Apel @ 2013-04-22 8:29 UTC (permalink / raw)
To: Andrei Banu, stan; +Cc: linux-raid Raid
Ben = Andrei, sorry for the typo.
2013/4/22 Tommy Apel <tommyapeldk@gmail.com>:
> Stan>
> That was exactly what I was trying to show, that you result may vary
> depending on data and backing device, as far as the raid1 goes it
> doesn't care much for the data beeing passed through it.
>
> Ben>
> could you try to run iostat -x 2 for a few minuts just to make sure
> there is no other I/O going on the device before running your tests,
> and then run the tests with fio instead of dd ?
>
> fio write test > fio --rw=write --filename=testfile --bs=1048576
> --size=4294967296 --ioengine=psync --end_fsync=1 --invalidate=1
> --direct=1 --name=writeperftest
>
> /Tommy
>
> 2013/4/22 Stan Hoeppner <stan@hardwarefreak.com>:
>> On 4/21/2013 2:56 PM, Tommy Apel wrote:
>>> Calm the f. down, I was just handing over some information, sorry your
>>> day was ruined mr. high and mighty, use the info for whatever you want
>>> to but flaming me is't going to help anyone.
>>
>> Your tantrum aside, the Intel 330, as well as all current Intel SSDs,
>> uses the SandForce 2281 controller. The SF2xxx series' write
>> performance is limited by the compressibility of the data. What you're
>> doing below is simply showcasing the write bandwidth limitation of the
>> SF2xxx controllers with incompressible data.
>>
>> This is not relevant to md. And it's not relevant to Andrei. It turns
>> out that the Samsung 840 SSDs have consistent throughput because they
>> don't rely on compression.
>>
>> --
>> Stan
>>
>>
>>> 2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
>>>> On 4/21/2013 7:23 AM, Tommy Apel wrote:
>>>>> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 aswell on
>>>>> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
>>>>> writing from /dev/zero
>>>>>
>>>>> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb /dev/sdc
>>>>
>>>>
>>>>> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
>>>>> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
>>>>> /run/fill ~1.06GB/s
>>>>
>>>> What's the purpose of all of this? Surely not simply to create random
>>>> data, which is accomplished much more easily. Are you sand bagging us
>>>> here with a known bug, or simply trying to show off your mad skillz?
>>>> Either way this is entirely unnecessary for troubleshooting an IO
>>>> performance issue. dd doesn't (shouldn't) care if the bits are random
>>>> or not, though the Intel SSD controller might, as well as other layers
>>>> you may have in your IO stack. Keep it simple so we can isolate one
>>>> layer at a time.
>>>>
>>>>> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock ~5.7GB/s
>>>>> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
>>>>> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>>>>
>>>> Noting the above, it's interesting that you omitted this test
>>>>
>>>> dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
>>>>
>>>> preventing an apples to apples comparison between raw SSD device and
>>>> md/RAID1 performance with your uber random file as input.
>>>>
>>>> --
>>>> Stan
>>>>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 23:17 ` Stan Hoeppner
@ 2013-04-22 10:19 ` Andrei Banu
2013-04-23 2:51 ` Stan Hoeppner
2013-04-22 23:11 ` Andrei Banu
2013-04-22 23:25 ` Stan Hoeppner
2 siblings, 1 reply; 38+ messages in thread
From: Andrei Banu @ 2013-04-22 10:19 UTC (permalink / raw)
To: linux-raid
Hello!
First off, allow me to apologize if my rambling sent you in the wrong
direction, and thank you for assisting.
Most of the data I supplied was background information. Let me start
fresh, but first allow me to answer your explicit questions:
1. Yes, I own the hardware and it's colocated in a datacenter.
2. I am quite happy with 260MB/s read for SATA2. I think that's decent
and I never meant it as a problem.
3. I have run iostat -x -m 2 for a few minutes, and from what I see the
normal write rate is about 0-500KB/s; sometimes it gets to 1-2MB/s and
rarely to 3-4MB/s.
4. I will redo the tests during off-peak hours, when I can afford to
shut down various services.
The actual problem is that when I write any larger file (hundreds of MB
or more) to the server, from the network or from the same server, the
server starts to overload. The load can climb past 100 for files of
~5GB. This server has an average load of 0.52 (sar -q), but it can
spike to three-digit loads in a few minutes from making or downloading
a larger cPanel backup file. I have to rely only on R1Soft for backups
right now, because the normal cPanel backups make the server unstable
when backing up accounts over 1GB (of which there are many).
So I concluded this is due to very low write speeds, and I ran the 'dd'
tests to evaluate that assumption. You know, I don't think the problem
is that I ran these tests during other I/O-intensive tasks. It's as if,
after a number of megabytes written at a time, the SSD devices
themselves overload. During off-peak hours I can sometimes get a decent
speed (60-100MB/s write), but if I redo the test soon after (tens of
seconds to minutes) I get much lower write speeds (under 10MB/s). Or
maybe the write speed itself is not the problem, but the fact that when
I write a large file the server seems to stop doing anything else.
So... the speed test results are poor AND the server overloads. A lot!
Most write results are in the 10-20MB/s range. I have seen more than
25MB/s very rarely, and almost never was I able to reproduce it within
the same hour. If I do a 'dd' test with a 'bs' of 2-4MB I sometimes get
good results (40-60MB/s), but never with a 'bs' of 1GB (the top speed I
got with a 1GB 'bs' was 27MB/s, during the night). But the essential
problem is that this server can't copy large files without seriously
overloading itself.
Now let me elaborate on why I gave the read speeds (as I am not unhappy
with them):
1. Some said the low write speed might be due to a bad cable, so I
stated the 260MB/s read speed to show it probably isn't: if the link
can push 260MB/s, it's probably not a bad cable.
2. I have observed a very big difference between /dev/sda and /dev/sdb,
and I thought it might be indicative of a problem somewhere. If I run
hdparm -t /dev/sda I get about 215MB/s, but on /dev/sdb I get about
80-90MB/s. Only if I add the --direct flag do I get 260MB/s for
/dev/sda. Previously, when I added --direct for /dev/sdb, I was getting
about 180MB/s, but now I get ~85MB/s with or without --direct.
root [/]# hdparm -t /dev/sdb
Timing buffered disk reads: 262 MB in 3.01 seconds = 86.92 MB/sec
root [/]# hdparm --direct -t /dev/sdb
Timing O_DIRECT disk reads: 264 MB in 3.08 seconds = 85.74 MB/sec
This is something new. /dev/sdb no longer gets to nearly 200MB/s (with
--direct) but stays under 100MB/s in all cases. Maybe indeed it's a
problem with the cable or with the device itself.
An update 30 minutes later: /dev/sdb returned to 90MB/s read speed
WITHOUT --direct and 180MB/s WITH --direct. /dev/sda is constant (215
without --direct and 260 with --direct). What do you make of this?
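One way to pin down this fluctuation would be to log both numbers on a schedule; a sketch follows (the script path and the idea of driving it from cron are assumptions, and hdparm needs root):

```shell
# Hypothetical logger: sample buffered and O_DIRECT reads on both
# devices so sdb's swings can be correlated with time of day.
cat > /tmp/hdparm-log.sh <<'EOF'
#!/bin/sh
for dev in /dev/sda /dev/sdb; do
    stamp=$(date +%F.%T)
    buf=$(hdparm -t "$dev" | awk '/Timing/ { print $(NF-1) }')
    dir=$(hdparm --direct -t "$dev" | awk '/Timing/ { print $(NF-1) }')
    echo "$stamp $dev buffered=${buf}MB/s direct=${dir}MB/s"
done
EOF
chmod +x /tmp/hdparm-log.sh   # e.g. append its output to a file from cron
```

A few hours of samples would show whether sdb's drop to ~85MB/s correlates with other activity or happens at random.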
Kind regards!
On 2013-04-22 02:17, Stan Hoeppner wrote:
> On 4/21/2013 3:46 PM, Andrei Banu wrote:
>> Hello,
>> At this point I probably should state that I am not an experienced
>> sysadmin.
> Things are becoming more clear now.
>
>> Knowing this, I do have a server management company but they
>> said they don't know what to do
> So you own this hardware and it is colocated, correct?
>
>> so now I am trying to fix things myself
>> but I am something of a noob. I normally try to keep my actions to
>> cautious config changes and testing.
> Why did you choose Centos? Was this installed by the company?
>
>> I have never done a kernel update.
>> Any easy way to do this?
> It may not be necessary, at least to solve any SSD performance
> problems
> anyway. Reexamining your numbers shows you hit 262MB/s to /dev/sda.
> That's 65% of SATA2 interface bandwidth, so this kernel probably does
> have the patch. Your problem lies elsewhere.
>
>> Regarding your second advice (to purchase a decent HBA) I have
>> already
>> thought about it but I guess it comes with its own drivers that need
>> to
>> be compiled into initramfs etc.
> The default CentOS (RHEL) initramfs should include mptsas, which
> supports all the LSI HBAs. The LSI caching RAID cards are supported
> as
> well with megaraid_sas.
> The question is, do you really need more than the ~260MB/s of peak
> throughput you currently have? And is it worth the hassle?
>
>> So I am trying to replace the baseboard
>> with one with SATA3 support to avoid any configuration changes (the
>> old
>> board has the C202 chipset and the new one has C204 so I guess this
>> replacement is as simple as it gets - just remove the old board and
>> plug
>> the new one without any software changes or recompiles). Again I need
>> to
>> say this server is in production and I can't move the data or the
>> users.
>> I can have a few hours downtime during the night but that's about
>> all.
> It's not clear your problem is hardware bandwidth. In fact it seems
> the
> problem lies elsewhere. It may simply be that you're running these
> tests
> while other substantial IO is occurring. Actually, your numbers show
> this is exactly the case. What they don't show is how much other IO
> is
> hitting the SSDs while you're running your tests.
>
>> Regarding the kernel upgrade, do we need to compile one from source
>> or
>> there's an easier way?
> I don't believe at this point you need a new kernel to fix the problem
> you have. If this patch was not present you'd not be able to get
> 260MB/s from SATA2. Your problem lies elsewhere.
> In the future, instead of making a post saying "md is slow, my SSDs
> are
> slow" and pasting test data which appears to back that claim, you'd be
> better served by describing a general problem, such as "users say the
> system is slow and I think it may be md or SSD related". This way we
> don't waste time following a troubleshooting path based on incorrect
> assumptions, as we've done here. Or at least as I've done here, as
> I'm
> the only one assisting.
> Boot all users off the system, shut down any daemons that may generate
> any meaningful load on the disks or CPUs. Disable any encryption or
> compression. Then rerun your tests while completely idle. Then we'll
> go from there.
> --
> Stan
>
>
>> Thanks!
>> On 21/04/2013 3:09 AM, Stan Hoeppner wrote:
>>> On 4/19/2013 5:58 PM, Andrei Banu wrote:
>>>
>>>> I come to you with a difficult problem. We have a server otherwise
>>>> snappy fitted with mdraid-1 made of Samsung 840 PRO SSDs. If we
>>>> copy a
>>>> larger file to the server (from the same server, from net doesn't
>>>> matter) the server load will increase from roughly 0.7 to over 100
>>>> (for
>>>> several GB files). Apparently the reason is that the raid can't
>>>> write
>>>> well.
>>> ...
>>>> 547682517 bytes (548 MB) copied, 7.99664 s, 68.5 MB/s
>>>> 547682517 bytes (548 MB) copied, 52.1958 s, 10.5 MB/s
>>>> 547682517 bytes (548 MB) copied, 75.3476 s, 7.3 MB/s
>>>> 1073741824 bytes (1.1 GB) copied, 61.8796 s, 17.4 MB/s
>>>> Timing buffered disk reads: 654 MB in 3.01 seconds = 217.55
>>>> MB/sec
>>>> Timing buffered disk reads: 272 MB in 3.01 seconds = 90.44
>>>> MB/sec
>>>> Timing O_DIRECT disk reads: 788 MB in 3.00 seconds = 262.23
>>>> MB/sec
>>>> Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.53
>>>> MB/sec
>>> ...
>>> Obviously this is frustrating, but the fix should be pretty easy.
>>>
>>>> O/S: CentOS 6.4 / 64 bit (2.6.32-358.2.1.el6.x86_64)
>>> I'd guess your problem is the following regression. I don't believe
>>> this regression is fixed in Red Hat 2.6.32-* kernels:
>>> http://www.archivum.info/linux-ide@vger.kernel.org/2010-02/00243/bad-performance-with-SSD-since-kernel-version-2.6.32.html
>>>
>>> After I discovered this regression and recommended Adam Goryachev
>>> upgrade from Debian 2.6.32 to 3.2.x, his SSD RAID5 throughput
>>> increased
>>> by a factor of 5x, though much of this was due to testing methods. His
>>> raw
>>> SSD throughput more than doubled per drive. The thread detailing
>>> this
>>> is long but is a good read:
>>> http://marc.info/?l=linux-raid&m=136098921212920&w=2
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 7:51 ` Tommy Apel
2013-04-22 8:29 ` Tommy Apel
@ 2013-04-22 10:26 ` Andrei Banu
2013-04-22 12:02 ` Tommy Apel
2013-04-22 23:21 ` Stan Hoeppner
2 siblings, 1 reply; 38+ messages in thread
From: Andrei Banu @ 2013-04-22 10:26 UTC (permalink / raw)
To: linux-raid
Hi,
No worries about the typo. I ran iostat -x -m 2 for a few minutes and I
get:
- 0-500KB/s 70% of the time
- 1-2MB/s 20% of the time
- 3-4MB/s 10% of the time.
It never went beyond 4MB/s write speed. But I guess none of this
qualifies as a heavy write. Right?
The fio test can be carried out safely on an active production server
just as you gave it?
Thanks!
Andrei
On 2013-04-22 10:51, Tommy Apel wrote:
> Stan>
> That was exactly what I was trying to show: that your results may vary
> depending on the data and backing device; as far as the raid1 goes, it
> doesn't care much about the data being passed through it.
>
> Ben>
> could you try to run iostat -x 2 for a few minutes just to make sure
> there is no other I/O going on the device before running your tests,
> and then run the tests with fio instead of dd ?
>
> fio write test > fio --rw=write --filename=testfile --bs=1048576
> --size=4294967296 --ioengine=psync --end_fsync=1 --invalidate=1
> --direct=1 --name=writeperftest
>
> /Tommy
>
> 2013/4/22 Stan Hoeppner <stan@hardwarefreak.com>:
>> On 4/21/2013 2:56 PM, Tommy Apel wrote:
>>> Calm the f. down, I was just handing over some information, sorry
>>> your
>>> day was ruined mr. high and mighty, use the info for whatever you
>>> want
>>> to, but flaming me isn't going to help anyone.
>>
>> Your tantrum aside, the Intel 330, as well as all current Intel SSDs,
>> uses the SandForce 2281 controller. The SF2xxx series' write
>> performance is limited by the compressibility of the data. What
>> you're
>> doing below is simply showcasing the write bandwidth limitation of
>> the
>> SF2xxx controllers with incompressible data.
>>
>> This is not relevant to md. And it's not relevant to Andrei. It
>> turns
>> out that the Samsung 840 SSDs have consistent throughput because they
>> don't rely on compression.
>>
>> --
>> Stan
>>
>>
>>> 2013/4/21 Stan Hoeppner <stan@hardwarefreak.com>:
>>>> On 4/21/2013 7:23 AM, Tommy Apel wrote:
>>>>> Hello, FYI I'm getting ~68MB/s on two intel330 in RAID1 as well on
>>>>> vanilla 3.8.8 and 3.9.0-rc3 when writing random data and ~236MB/s
>>>>> writing from /dev/zero
>>>>>
>>>>> mdadm -C /dev/md0 -l 1 -n 2 --assume-clean --force --run /dev/sdb
>>>>> /dev/sdc
>>>>
>>>>
>>>>> openssl enc -aes-128-ctr -pass pass:"$(dd if=/dev/urandom bs=128
>>>>> count=1 2>/dev/null | base64)" -nosalt < /dev/zero | pv -pterb >
>>>>> /run/fill ~1.06GB/s
>>>>
>>>> What's the purpose of all of this? Surely not simply to create
>>>> random
>>>> data, which is accomplished much more easily. Are you sandbagging
>>>> us
>>>> here with a known bug, or simply trying to show off your mad
>>>> skillz?
>>>> Either way this is entirely unnecessary for troubleshooting an IO
>>>> performance issue. dd doesn't (shouldn't) care if the bits are
>>>> random
>>>> or not, though the Intel SSD controller might, as well as other
>>>> layers
>>>> you may have in your IO stack. Keep it simple so we can isolate
>>>> one
>>>> layer at a time.
>>>>
>>>>> dd if=/run/fill of=/dev/null bs=1M count=1024 iflag=fullblock
>>>>> ~5.7GB/s
>>>>> dd if=/run/fill of=/dev/md0 bs=1M count=1024 oflag=direct ~68MB/s
>>>>> dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct ~236MB/s
>>>>
>>>> Noting the above, it's interesting that you omitted this test
>>>>
>>>> dd if=/run/fill of=/dev/sdb bs=1M count=1024 oflag=direct
>>>>
>>>> preventing an apples to apples comparison between raw SSD device
>>>> and
>>>> md/RAID1 performance with your uber random file as input.
>>>>
>>>> --
>>>> Stan
>>>>
>>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 10:26 ` Andrei Banu
@ 2013-04-22 12:02 ` Tommy Apel
2013-04-23 2:59 ` Stan Hoeppner
0 siblings, 1 reply; 38+ messages in thread
From: Tommy Apel @ 2013-04-22 12:02 UTC (permalink / raw)
To: Andrei Banu, stan; +Cc: linux-raid Raid
Yes it can be run as it is, it will write to the file given by --filename=
Well, from what I make of it so far I wouldn't rule out a bad device,
but at the same time there could be other things involved, although I
don't believe it to be the md part.
Stan> do you know anything about the state of ext4 on centos 6.x ?
/Tommy
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 23:17 ` Stan Hoeppner
2013-04-22 10:19 ` Andrei Banu
@ 2013-04-22 23:11 ` Andrei Banu
2013-04-23 4:39 ` Stan Hoeppner
2013-04-22 23:25 ` Stan Hoeppner
2 siblings, 1 reply; 38+ messages in thread
From: Andrei Banu @ 2013-04-22 23:11 UTC (permalink / raw)
To: linux-raid
Hello again!
I have closed all the load generating services, waited a few minutes for
the server load to reach a clean 0.00 and then I have re-performed the
dd tests with various bs sizes. I was not able to set up fio correctly
(it fails with a compile error) but I'll get it done.
One more thing before the results: I omitted to answer something earlier
today. CentOS was installed because cPanel can only be installed on a
few OSes (CentOS, RHEL and I think that's about it). So I picked
CentOS. The installation was done remotely over KVM with a minimal
CentOS CD (datacenter does not offer any server related services so we
had to do it ourselves over a Raritan KVM).
Tests were done roughly 1 minute apart.
1. First test (bs=1G): same as always.
root [~]# dd if=testfile.tar.gz of=test oflag=sync bs=1G
547682517 bytes (548 MB) copied, 53.3767 s, 10.3 MB/s
2. With a bs of 4MB: niceeee! Best result ever. I am not sure what
happened this time. However, it's short-lived.
root [~]# dd if=testfile.tar.gz of=test2 oflag=sync bs=4M
547682517 bytes (548 MB) copied, 4.43305 s, 124 MB/s
3. bs=2MB, starting to decay.
root [~]# dd if=testfile.tar.gz of=test3 oflag=sync bs=2M
547682517 bytes (548 MB) copied, 20.3647 s, 26.9 MB/s
4. bs=4MB again. Back to square 1.
root [~]# dd if=testfile.tar.gz of=test4 oflag=sync bs=4M
547682517 bytes (548 MB) copied, 56.7124 s, 9.7 MB/s
As services were shut down prior to the test, the biggest load it
reached was about 2.
5. Finally I restarted the services and redone the bs=4MB test (going
from a load of 0.23):
root [~]# dd if=testfile.tar.gz of=test6 oflag=sync bs=4M
547682517 bytes (548 MB) copied, 116.469 s, 4.7 MB/s
Again, I don't think my problem is related to any concurrent I/O
starvation. These SSDs or this mdraid or I don't know what simply can't
take any sustained write task. And this is not due to the server load.
Even during very low server loads it's enough to write about 1GB of data
within a short time frame (minutes) to bring the I/O system to its
knees for a considerable time (at least tens of minutes).
4.7MB per second for writing a 548MB file starting from a load of 0.23
during off peak hours on SSDs. Nice!!!
Thanks!
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 7:51 ` Tommy Apel
2013-04-22 8:29 ` Tommy Apel
2013-04-22 10:26 ` Andrei Banu
@ 2013-04-22 23:21 ` Stan Hoeppner
2 siblings, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-22 23:21 UTC (permalink / raw)
To: Tommy Apel; +Cc: Andrei Banu, linux-raid Raid
On 4/22/2013 2:51 AM, Tommy Apel wrote:
> Stan>
> That was exactly what I was trying to show: that your results may vary
> depending on the data and backing device; as far as the raid1 goes, it
> doesn't care much about the data being passed through it.
As I mentioned, this is true of the SandForce 2nd gen ASICs, maybe some
others. The Samsung SSDs use a home grown Samsung controller which
doesn't do compression. Its performance doesn't vary due to data
content. Thus the performance gap you demonstrated doesn't apply to Andrei.
We can eliminate this as a possible cause of his apparently horrible
performance. And I think we can eliminate the regression in 2.6.32 as
that patch seems to be included in his kernel, otherwise he'd likely not
get 260MB/s in his dd raw read tests. The mystery continues...
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-21 23:17 ` Stan Hoeppner
2013-04-22 10:19 ` Andrei Banu
2013-04-22 23:11 ` Andrei Banu
@ 2013-04-22 23:25 ` Stan Hoeppner
2013-04-23 4:49 ` Mikael Abrahamsson
2 siblings, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-22 23:25 UTC (permalink / raw)
To: stan; +Cc: Andrei Banu, linux-raid
On 4/21/2013 6:17 PM, Stan Hoeppner wrote:
> It may not be necessary, at least to solve any SSD performance problems
> anyway. Reexamining your numbers shows you hit 262MB/s to /dev/sda.
> That's 65% of SATA2 interface bandwidth, so this kernel probably does
> have the patch. Your problem lie elsewhere.
Big correction. That should state 87% of SATA2 interface bandwidth. I
must have been thinking of three things at once when I fubar'd that, as
that's not simply a typo.
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 10:19 ` Andrei Banu
@ 2013-04-23 2:51 ` Stan Hoeppner
2013-04-23 10:17 ` Andrei Banu
0 siblings, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-23 2:51 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/22/2013 5:19 AM, Andrei Banu wrote:
> Hello!
>
> First off allow me to apologize if my rambling sent you in the wrong
> direction and thank you for assisting.
No harm done, and you're welcome.
> The actual problem is that when I write any larger file hundreds of MB
> or more to the server (from network or from the same server) the server
> starts to overload. The server can overload to over 100 for files of ~
> 5GB. I mean this server has an average load of 0.52 (sar -q) but it can
> spike to 3 digit server loads in a few minutes from making or
> downloading a larger cPanel backup file. I have to rely only on R1Soft
> for backups right now because the normal cPanel backups make the server
> unstable when it backs up accounts over 1GB (many).
Describing this problem in terms of load average isn't very helpful.
What would be is 'perf top -U' output so we can see what is eating cpu,
simultaneously with 'iotop' so we see what's eating IO.
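A sketch of collecting both at once (the log paths are made up; perf and iotop are assumed installed, and both need root):

```shell
# Write a small collector script rather than juggling two terminals:
# perf top samples CPU, iotop samples per-process I/O in batch mode.
cat > /tmp/diag.sh <<'EOF'
#!/bin/sh
perf top -U --stdio > /tmp/perf.log 2>&1 &
perfpid=$!
iotop -botqqq -n 30 > /tmp/iotop.log 2>&1   # 30 one-second batch samples
kill "$perfpid"
EOF
chmod +x /tmp/diag.sh   # start it, then kick off the big-file copy
```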
> So I concluded this is due to very low write speeds so I ran the 'dd'
It's most likely that the low disk throughput is a symptom of the
problem, which is lurking elsewhere awaiting discovery.
> 1. Some said the low write speed might be due to a bad cable.
Very unlikely, but possible. This is easy to verify. Does dmesg show
hundreds of "hard resetting link" messages?
> 2. I have observed a very big difference between /dev/sda and /dev/sdb
> and I thought it might me indicative of a problem somewhere. If I run
> hdparm -t /dev/sda I get about 215MB/s but on /dev/sdb I get about
> 80-90MB/s. Only if I add --direct flag I get 260MB/s for /dev/sda.
> Previously when I added --direct for /dev/sdb I was getting about
> 180MB/s but now I get ~85MB/s with or without --direct.
I simply chalked up the difference to IO load variance between test runs
of hdparm. If one SSD is always that much slower there may be a problem
with the drive or controller but it's not likely. If you haven't
already, swap the cable on the slow drive with new one. In fact, SATA
cables are cheap as dirt so I'd swap them both just for peace of mind.
> root [/]# hdparm -t /dev/sdb
> Timing buffered disk reads: 262 MB in 3.01 seconds = 86.92 MB/sec
>
> root [/]# hdparm --direct -t /dev/sdb
> Timing O_DIRECT disk reads: 264 MB in 3.08 seconds = 85.74 MB/sec
...
> This is something new. /dev/sdb no longer gets to nearly 200MB/s (with
> --direct) but stays under 100MB/s in all cases. Maybe indeed it's a
> problem with the cable or with the device itself.
...
> And a 30 minutes later update: /dev/sdb returned to 90MB/s read speed
> WITHOUT --direct and 180MB/s WITH --direct. /dev/sda is constant (215
> without --direct and 260 with --direct). What do you make of this?
Show your partition tables again. My gut instinct tells me you have a
swap partition on /dev/sdb, and/or some other partition that is not part
of the RAID1, nor equally present on /dev/sda, that is/are being
accessed heavily at some times and not others, thus the throughput
discrepancy.
If this is the case, and the kernel is low on RAM due to an application
memory leak or just normal process load, that swap partition may become
critical. When you start the $big_file copy, the kernel goes into
overdrive swapping and/or dropping cache to make room for $big_file in
the write buffers. This could explain both your triple digit system
load and the decreased throughput on /dev/sdb.
The fdisk output you provided previously showed only 3 partitions per
SSD, all RAID autodetect, all in md/RAID1 I assume. However, the
symptoms you're reporting tend to suggest the partition layout I just
described, and could be responsible for the odd up/down throughput on sdb.
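This hypothesis is cheap to check before touching anything; all of the following are read-only:

```shell
# Where does swap live, and does swap traffic spike during a big copy?
cat /proc/swaps                               # swap partition on sdb?
grep -E 'SwapTotal|SwapFree' /proc/meminfo    # headroom left
grep -E '^pswp(in|out) ' /proc/vmstat         # cumulative; sample twice
```

Sampling the pswpin/pswpout counters before and after a $big_file copy would show directly whether the kernel swaps hard during the copy.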
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 12:02 ` Tommy Apel
@ 2013-04-23 2:59 ` Stan Hoeppner
0 siblings, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-23 2:59 UTC (permalink / raw)
To: Tommy Apel; +Cc: Andrei Banu, linux-raid Raid
On 4/22/2013 7:02 AM, Tommy Apel wrote:
> Yes it can be run as it is, it will write to the file given by --filename=
>
> well from what I make of it so far I wouldn't rule out the bad device
> part but at the same time there could be other things involved
> although I don't belive it to be the md part
>
> Stan> do you know anything about the state of ext4 on centos 6.x ?
Enough to assume it's not part of the problem here. Andrei's
below-the-filesystem hdparm throughput is bouncing up/down by ~100MB/s
depending on when he runs it.
If he's using LVM and has active snapshots that would definitely cause
some extra load, but in that case given his 3 RAID1 pairs it should
affect both drives equally. And that's not what we're seeing.
I hope my last post gets him closer to identifying the problem. The
perf top and iotop data doing $bigfile copy should be instructive.
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 23:11 ` Andrei Banu
@ 2013-04-23 4:39 ` Stan Hoeppner
0 siblings, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-23 4:39 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/22/2013 6:11 PM, Andrei Banu wrote:
...
> 1. First test (bs=1G): same as always.
> root [~]# dd if=testfile.tar.gz of=test oflag=sync bs=1G
> 547682517 bytes (548 MB) copied, 53.3767 s, 10.3 MB/s
...
> root [~]# dd if=testfile.tar.gz of=test6 oflag=sync bs=4M
> 547682517 bytes (548 MB) copied, 116.469 s, 4.7 MB/s
...
> Again, I don't think my problem is related to any concurrent I/O
> starvation. These SSDs or this mdraid or I don't know what simply can't
> take any sustained write task. And this is not due to the server load.
> Even during very low server loads it's enough to write about 1GB of data
> within a short time frame (minutes) to bring the I/O system to its
> knees for a considerable time (at least tens of minutes).
Something's going on here. Ditch dd for now. What's the result of:
$ echo 3 > /proc/sys/vm/drop_caches
$ time cp testfile.tar.gz testxx.tmp; sync
548/real = xx MB/s
And now ditch flushing FS buffers:
$ echo 3 > /proc/sys/vm/drop_caches
$ time cp testfile.tar.gz testxx.tmp
548/real = xx MB/s
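The 548/real arithmetic as a throwaway helper (the function name is made up; decimal megabytes, the way dd reports them):

```shell
# Convert bytes and elapsed seconds into dd-style MB/s.
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}
mbps 547682517 53.3767   # the earlier bs=1G run: prints 10.3
```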
And please paste this so we can see how you're mounting EXT4.
$ cat /etc/fstab |grep ext
Mounting data=journal will decrease write throughput by 50% as
everything is written twice: once to the journal, once into the
filesystem. This wouldn't account for the entire performance deficit
though.
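Checking the journalling mode is quick (the device name /dev/md2 is a guess; substitute whichever md device holds the filesystem):

```shell
# data=journal would appear in the active mount options; the superblock
# shows the default mount options baked in at mkfs/tune2fs time.
grep ext4 /proc/mounts || true
tune2fs -l /dev/md2 2>/dev/null | grep -iE 'journal|mount options' || true
```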
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-22 23:25 ` Stan Hoeppner
@ 2013-04-23 4:49 ` Mikael Abrahamsson
0 siblings, 0 replies; 38+ messages in thread
From: Mikael Abrahamsson @ 2013-04-23 4:49 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: Andrei Banu, linux-raid
On Mon, 22 Apr 2013, Stan Hoeppner wrote:
> On 4/21/2013 6:17 PM, Stan Hoeppner wrote:
>
>> It may not be necessary, at least to solve any SSD performance problems
>> anyway. Reexamining your numbers shows you hit 262MB/s to /dev/sda.
>> That's 65% of SATA2 interface bandwidth, so this kernel probably does
>> have the patch. Your problem lie elsewhere.
>
> Big correction. That should state 87% of SATA2 interface bandwidth. I
> must have been thinking of three things at once when I fubar'd that, as
> that's not simply a typo.
As far as I know, the 300 megabyte/s of SATA2 bw doesn't include coding
overhead etc, so it's not theoretically possible to reach all the way up
to 300. From all tests I've seen, around 260-270 megabyte/s seems to be
maximum that can be achievable, so I'd say 262 MB/s is basically as much
as can be expected from SATA2.
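The arithmetic behind this: SATA2 signals at 3.0 Gbit/s with 8b/10b line coding, so the payload ceiling before framing and protocol overhead works out to:

```shell
# 3.0 Gbit/s line rate * 8/10 coding efficiency / 8 bits per byte
awk 'BEGIN { printf "%d MB/s\n", 3.0e9 * (8 / 10) / 8 / 1e6 }'
```

Subtracting FIS framing and protocol overhead from that 300 MB/s figure is what lands real-world transfers in the observed 260-270 MB/s range.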
--
Mikael Abrahamsson email: swmike@swm.pp.se
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-19 22:58 Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO Andrei Banu
` (2 preceding siblings ...)
[not found] ` <51732E2B.6090607@hardwarefreak.com>
@ 2013-04-23 6:01 ` Stan Hoeppner
3 siblings, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-23 6:01 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/19/2013 5:58 PM, Andrei Banu wrote:
> Hardware: SuperMicro 5017C-MTRF
Not relevant if you're using SATA ports 0-1, but may well be if using
2-5, assuming this system isn't brand new. As I said previously, you'd
see some errors in dmesg if you had port/cable issues. From:
Intel® 6 Series Chipset and Intel® C200 Series Chipset Specification Update
Problem: Due to a circuit design issue on Intel 6 Series Chipset and
Intel C200 Series Chipset, electrical lifetime wear out may affect clock
distribution for SATA ports 2-5. This may manifest itself as a
functional issue on SATA ports 2-5 over time.
•The electrical lifetime wear out may result in device oxide degradation
which over time can cause drain to gate leakage current.
•This issue has time, temperature and voltage sensitivities.
Implication: The increased leakage current may result in an unstable
clock and potentially functional issues on SATA ports 2-5 in the form of
receive errors, transmit errors, and unrecognized drives.
...
•SATA ports 0-1 are not affected by this design issue as they have
separate clock generation circuitry.
Workaround: Intel has worked with board and system manufacturers to
identify and implement solutions for affected systems.
•Use only SATA ports 0-1.
•Use an add-in PCIe SATA bridge solution.
Not all boards are affected by this. You'd have to check the spec
revision on your C202, which means contacting SuperMicro with your board
revision/serial number. To be certain you're not affected simply use
only ports 0-1. But on that note...
It may be an opportune time to consider dropping in an LSI 9211-4i.
4GB/s raw throughput, plenty for 4 SSDs at full boogie should you
expand. The kit version comes with a 1-4 breakout cable for your 1U SM
chassis drive backplane. Even if we get your issue fixed via software
and both drives are humming away at ~260MB/s, that nightly backup
process you mentioned, and others, would surely benefit from an
additional ~200MB/s throughput.
--
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-23 2:51 ` Stan Hoeppner
@ 2013-04-23 10:17 ` Andrei Banu
2013-04-24 3:24 ` Stan Hoeppner
0 siblings, 1 reply; 38+ messages in thread
From: Andrei Banu @ 2013-04-23 10:17 UTC (permalink / raw)
Cc: linux-raid
Hi,
I am sorry for the very long email. And thanks a lot for all your patience.
1. DMESG doesn't show any "hard resetting link" at all.
2. The SSDs are connected to ATA 0 and ATA1. The server is brand new (or
at least it should be).
3. Partition table:
root [~]# cat /etc/fstab
# Created by anaconda on Wed Apr 3 17:22:52 2013
UUID=8fedde2c-f5b7-4edf-975f-d8d087d79ebf / ext4
noatime,usrjquota=quota.user,jqfmt=vfsv0 1 1
UUID=bfc50d02-6d4d-4510-93ea-27941cd49cf4 /boot ext4
noatime,defaults 1 2
UUID=cef1d19d-2578-43db-9ffc-b6b70e227bfa swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/usr/tmpDSK /tmp ext3
noatime,defaults,noauto 0 0
root [~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid1 num-devices=2
UUID=8a4b7005:a4f71a13:7d4659cf:104f9a4f
ARRAY /dev/md1 level=raid1 num-devices=2
UUID=ead5b5ca:9f5397a2:3b488cbe:11eb8bdb
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=44efd14d:8bcd26d4:4d1fda9f:a4b5fe14
root [/]# mount
/dev/md2 on / type ext4 (rw,noatime,usrjquota=quota.user,jqfmt=vfsv0)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/md0 on /boot type ext4 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
And now the tests you indicated:
4.
root [/]# echo 3 > /proc/sys/vm/drop_caches
root [~]# time cp largefile.tar.gz test03.tmp; time sync;
(this is probably when the file is read into some swap/cache)
real 0m3.052s
user 0m0.010s
sys 0m0.612s
(this is probably when the file is actually written)
real 1m2.570s
user 0m0.000s
sys 0m0.011s
root [/]# echo 3 > /proc/sys/vm/drop_caches
root [~]# time cp largefile.tar.gz test04.tmp;
real 0m3.848s
user 0m0.004s
sys 0m0.634s
After about 15 seconds the server load started to increase from 1,
spiked to 40 in about a minute and then it started decreasing.
5. The perf top -U output during a dd copy:
Samples: 2M of event 'cycles', Event count (approx.): 19505138470
9.10% [kernel] [k] page_fault
5.56% [kernel] [k] clear_page_c_e
3.29% [kernel] [k] list_del
2.51% [kernel] [k] unmap_vmas
2.50% [kernel] [k] __mem_cgroup_commit_charge
2.50% [kernel] [k] mem_cgroup_update_file_mapped
2.26% [kernel] [k] port_inb
1.89% [kernel] [k] shmem_getpage_gfp
1.78% [kernel] [k] _spin_lock
1.72% [kernel] [k] __alloc_pages_nodemask
1.67% [kernel] [k] __mem_cgroup_uncharge_common
1.61% [kernel] [k] free_pcppages_bulk
1.59% [kernel] [k] get_page_from_freelist
1.56% [kernel] [k] alloc_pages_vma
1.37% [kernel] [k] get_page
1.26% [kernel] [k] release_pages
1.22% [kernel] [k] radix_tree_lookup_slot
1.19% [kernel] [k] lookup_page_cgroup
1.11% [kernel] [k] handle_mm_fault
0.98% [kernel] [k] __wake_up_bit
0.98% [kernel] [k] copy_page_c
0.97% [kernel] [k] __d_lookup
0.94% [kernel] [k] __do_fault
0.92% [kernel] [k] free_hot_cold_page
0.80% [kernel] [k] find_vma
6. iotop is very dynamic and I am afraid the data I am providing will be
unclear but let me give a number of snapshots from during the large file
copy and maybe you can make something of it (samples a few seconds apart):
Total DISK READ: 15.39 K/s | Total DISK WRITE: 169.29 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
4236 be/4 nobody 0.00 B/s 0.00 B/s 0.00 % 0.00 % [httpd]
4662 be/4 nobody 0.00 B/s 0.00 B/s 0.00 % 0.00 % [httpd]
31126 be/4 mysql 0.00 B/s 46.17 K/s 0.00 % 0.00 % mysqld
--basedir=/ --datadir=/var/lib/mysql --user=mysql
--log-error=/var/lib/mysql/server.err --open-files-limit=50000
--pid-file=/var/$
4971 be/4 nobody 0.00 B/s 23.08 K/s 0.00 % 0.00 % [httpd]
5284 be/4 nobody 0.00 B/s 7.69 K/s 0.00 % 0.00 % [httpd]
9522 be/4 user 7.69 K/s 38.47 K/s 0.00 % 0.00 % spamd child
5547 be/4 nobody 0.00 B/s 7.69 K/s 0.00 % 0.00 % [httpd]
!!!!!! 6085 be/4 root 7.69 K/s 1004.85 M/s 0.00 % 0.00 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
Total DISK READ: 7.71 K/s | Total DISK WRITE: 29.91 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [md2_raid1]
30861 be/4 root 0.00 B/s 7.71 K/s 0.00 % 0.00 % httpd -k
start -DSSL
31346 be/4 root 0.00 B/s 7.71 K/s 0.00 % 0.00 % tailwatchd
1457 be/3 root 0.00 B/s 7.71 K/s 0.00 % 0.00 % auditd
5914 be/4 root 7.71 K/s 0.00 B/s 0.00 % 0.00 % cpanellogd
- scanning logs
6085 be/4 root 0.00 B/s 7.71 K/s 0.00 % 0.00 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
Total DISK READ: 0.00 B/s | Total DISK WRITE: 29.30 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
9522 be/4 user 0.00 B/s 0.00 B/s 0.00 % 99.99 % spamd child
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [md2_raid1]
31346 be/4 root 0.00 B/s 7.73 K/s 0.00 % 0.00 % tailwatchd
1397 be/4 root 0.00 B/s 7.73 K/s 0.00 % 0.00 % [flush-9:2]
6085 be/4 root 0.00 B/s 15.45 K/s 0.00 % 0.00 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
Total DISK READ: 12.43 K/s | Total DISK WRITE: 5.96 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
5914 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % cpanellogd
- setting up logs for promusic
6101 be/4 mailnull 0.00 B/s 353.61 B/s 0.00 % 99.99 % exim -bd -q1h
6107 be/4 user 0.00 B/s 0.00 B/s 0.00 % 99.99 % pop3
6124 be/4 nobody 0.00 B/s 353.61 B/s 0.00 % 99.99 % httpd -k
start -DSSL
9522 be/4 user 1060.83 B/s 184.06 K/s 0.00 % 99.99 % spamd child
1669 be/4 root 0.00 B/s 2.42 K/s 0.00 % 99.99 % rsyslogd -i
/var/run/syslogd.pid -c 5
1235 be/4 root 0.00 B/s 2.42 K/s 0.00 % 98.28 % [kjournald]
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 28.46 % [md2_raid1]
541 be/3 root 0.00 B/s 34.04 M/s 0.00 % 3.43 % [jbd2/md2-8]
Total DISK READ: 303.21 K/s | Total DISK WRITE: 60.64 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
1235 be/4 root 0.00 B/s 60.64 K/s 0.00 % 99.99 % [kjournald]
541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.16 % [jbd2/md2-8]
1232 be/0 root 0.00 B/s 0.00 B/s 0.00 % 81.07 % [loop0]
11449 be/4 mysql 250.15 K/s 0.00 B/s 0.00 % 12.84 % mysqld
--basedir=/ --datadir=/var/lib/mysql --user=mysql
--log-error=/var/lib/mysql/server.err --open-files-limit=50000
--pid-file=/var/$
6085 be/4 root 7.58 K/s 30.32 K/s 0.00 % 5.24 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
Total DISK READ: 2023.83 K/s | Total DISK WRITE: 82.31 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
6085 be/4 root 0.00 B/s 38.04 K/s 0.00 % 99.99 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
6267 be/4 user 0.00 B/s 0.00 B/s 0.00 % 99.99 % pop3
6291 be/4 user 0.00 B/s 0.00 B/s 0.00 % 99.99 % pop3
541 be/3 root 0.00 B/s 492.43 M/s 0.00 % 99.99 % [jbd2/md2-8]
6282 be/4 nobody 730.40 K/s 0.00 B/s 0.00 % 99.99 % httpd -k
start -DSSL
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 52.39 % [md2_raid1]
Total DISK READ: 74.61 K/s | Total DISK WRITE: 8.66 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
6282 be/4 nobody 26.55 K/s 0.00 B/s 0.00 % 97.65 % httpd -k
start -DSSL
541 be/3 root 0.00 B/s 7.04 M/s 0.00 % 95.64 % [jbd2/md2-8]
1235 be/4 root 0.00 B/s 0.00 B/s 0.00 % 94.07 % [kjournald]
1394 be/4 root 0.00 B/s 0.00 B/s 0.00 % 89.26 % [flush-7:0]
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 31.66 % [md2_raid1]
Total DISK READ: 544.44 K/s | Total DISK WRITE: 82.08 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
1235 be/4 root 0.00 B/s 129.31 K/s 0.00 % 99.99 % [kjournald]
541 be/3 root 0.00 B/s 63.57 M/s 0.00 % 99.99 % [jbd2/md2-8]
31119 be/4 mysql 0.00 B/s 61.25 K/s 0.00 % 88.49 % mysqld
--basedir=/ --datadir=/var/lib/mysql --user=mysql
--log-error=/var/lib/mysql/server.err --open-files-limit=50000
--pid-file=/var/$
506 be/4 root 0.00 B/s 0.00 B/s 0.00 % 72.41 % [md2_raid1]
31346 be/4 root 0.00 B/s 20.42 K/s 0.00 % 69.36 % tailwatchd
1232 be/0 root 0.00 B/s 183.75 K/s 0.00 % 54.04 % [loop0]
6085 be/4 root 3.40 K/s 40.83 K/s 0.00 % 26.49 % dd
if=largefile.tar.gz of=test10 oflag=sync bs=1G
11561 be/4 mysql 0.00 B/s 45.64 M/s 0.00 % 0.00 % mysqld
--basedir=/ --datadir=/var/lib/mysql --user=mysql
--log-error=/var/lib/mysql/server.err --open-files-limit=50000
--pid-file=/var/$
I have also run it with the "-a" flag and there is something interesting
(long though heavily grepped output below).
This is taken during the 'dd oflag=sync' copy. It seems it does
something right at the beginning (writes about 250MB of that file) then
it mostly idles through to the end:
Total DISK READ: 333.35 K/s | Total DISK WRITE: 38.76 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 332.00 K 0.00 % 0.49 % [jbd2/md2-8]
13467 be/4 root 0.00 B 4.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
13479 be/4 root 4.00 K 250.12 M 0.00 % 0.00 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
Total DISK READ: 4.84 M/s | Total DISK WRITE: 11.77 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 332.00 K 0.00 % 0.37 % [jbd2/md2-8]
13467 be/4 root 0.00 B 4.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
13479 be/4 root 4.00 K 250.12 M 0.00 % 0.00 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
Total DISK READ: 0.00 B/s | Total DISK WRITE: 379.93 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 332.00 K 0.00 % 0.30 % [jbd2/md2-8]
13467 be/4 root 0.00 B 8.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
13479 be/4 root 4.00 K 250.12 M 0.00 % 0.00 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1232 be/0 root 0.00 B 244.00 K 0.00 % 0.00 % [loop0]
1397 be/4 root 0.00 B 24.00 K 0.00 % 0.00 % [flush-9:2]
Total DISK READ: 0.00 B/s | Total DISK WRITE: 69.69 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
13479 be/4 root 4.00 K 250.16 M 0.00 % 79.98 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
541 be/3 root 0.00 B 458.64 M 0.00 % 0.25 % [jbd2/md2-8]
13467 be/4 root 0.00 B 8.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
1232 be/0 root 0.00 B 244.00 K 0.00 % 0.00 % [loop0]
1397 be/4 root 0.00 B 24.00 K 0.00 % 0.00 % [flush-9:2]
Total DISK READ: 20.81 K/s | Total DISK WRITE: 6.07 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 765.19 M 0.00 % 83.17 % [jbd2/md2-8]
1235 be/4 root 0.00 B 0.00 B 0.00 % 78.06 % [kjournald]
13479 be/4 root 8.00 K 250.24 M 0.00 % 60.66 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 35.01 % [md2_raid1]
1394 be/4 root 0.00 B 0.00 B 0.00 % 11.25 % [flush-7:0]
Total DISK READ: 43.28 K/s | Total DISK WRITE: 34.09 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 767.47 M 0.00 % 84.84 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 70.65 % [kjournald]
13479 be/4 root 12.00 K 250.29 M 0.00 % 65.12 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 31.81 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 14.57 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 9.71 % [flush-7:0]
1397 be/4 root 0.00 B 3.44 M 0.00 % 1.47 % [flush-9:2]
13467 be/4 root 0.00 B 12.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 3.85 K/s | Total DISK WRITE: 35.28 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.32 M 0.00 % 84.36 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 83.53 % [kjournald]
13479 be/4 root 12.00 K 250.30 M 0.00 % 65.05 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 32.55 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 14.21 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 9.46 % [flush-7:0]
1397 be/4 root 0.00 B 3.45 M 0.00 % 1.48 % [flush-9:2]
Total DISK READ: 3.91 K/s | Total DISK WRITE: 3.91 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.32 M 0.00 % 82.29 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 81.48 % [kjournald]
13479 be/4 root 12.00 K 250.30 M 0.00 % 63.37 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 31.75 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 13.86 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 9.23 % [flush-7:0]
1397 be/4 root 0.00 B 3.45 M 0.00 % 1.44 % [flush-9:2]
13467 be/4 root 0.00 B 28.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 15.64 K/s | Total DISK WRITE: 15.32 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.71 M 0.00 % 85.51 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 79.53 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 61.78 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 36.15 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 13.53 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 9.01 % [flush-7:0]
1397 be/4 root 0.00 B 3.45 M 0.00 % 6.60 % [flush-9:2]
13467 be/4 root 0.00 B 32.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.71 M 0.00 % 85.77 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 75.90 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 58.82 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 34.51 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 12.91 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 8.60 % [flush-7:0]
1397 be/4 root 0.00 B 3.45 M 0.00 % 6.30 % [flush-9:2]
31346 be/4 root 0.00 B 120.00 K 0.00 % 3.42 % tailwatchd
13467 be/4 root 0.00 B 44.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 19.56 K/s | Total DISK WRITE: 10.12 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 86.39 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 74.21 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 64.36 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 39.86 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 12.62 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 8.41 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 6.16 % [flush-9:2]
13467 be/4 root 0.00 B 48.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 15.65 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 87.13 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 72.58 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 65.64 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 38.98 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 12.34 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 8.22 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 6.03 % [flush-9:2]
13467 be/4 root 0.00 B 52.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 46.71 K/s | Total DISK WRITE: 38.92 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 87.24 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 71.03 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 66.24 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 38.15 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 12.08 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 8.05 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.90 % [flush-9:2]
13467 be/4 root 0.00 B 56.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 87.63 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 69.54 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 67.10 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 42.88 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 11.83 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.88 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.78 % [flush-9:2]
13467 be/4 root 0.00 B 60.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 7.82 K/s | Total DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 87.91 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 68.12 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 67.83 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 42.01 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 11.59 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.72 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.66 % [flush-9:2]
13467 be/4 root 0.00 B 68.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 50.84 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 88.16 % [jbd2/md2-8]
1235 be/4 root 0.00 B 28.00 K 0.00 % 66.75 % [kjournald]
13479 be/4 root 12.00 K 250.31 M 0.00 % 68.51 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
506 be/4 root 0.00 B 0.00 B 0.00 % 41.16 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 11.35 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.56 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.54 % [flush-9:2]
13467 be/4 root 0.00 B 72.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 3.91 K/s | Total DISK WRITE: 93.83 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 88.33 % [jbd2/md2-8]
13479 be/4 root 12.00 K 250.31 M 0.00 % 69.09 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1235 be/4 root 0.00 B 28.00 K 0.00 % 65.44 % [kjournald]
506 be/4 root 0.00 B 0.00 B 0.00 % 40.35 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 11.13 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.41 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.43 % [flush-9:2]
31346 be/4 root 0.00 B 120.00 K 0.00 % 2.95 % tailwatchd
13467 be/4 root 0.00 B 76.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 88.53 % [jbd2/md2-8]
13479 be/4 root 12.00 K 250.31 M 0.00 % 69.69 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1235 be/4 root 0.00 B 28.00 K 0.00 % 64.18 % [kjournald]
506 be/4 root 0.00 B 0.00 B 0.00 % 39.57 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 10.91 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.27 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.33 % [flush-9:2]
13467 be/4 root 0.00 B 80.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 15.64 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 88.20 % [jbd2/md2-8]
13479 be/4 root 12.00 K 250.31 M 0.00 % 69.72 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1235 be/4 root 0.00 B 28.00 K 0.00 % 62.96 % [kjournald]
506 be/4 root 0.00 B 0.00 B 0.00 % 38.82 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 10.71 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.13 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.23 % [flush-9:2]
13467 be/4 root 0.00 B 84.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 88.61 % [jbd2/md2-8]
13479 be/4 root 12.00 K 250.31 M 0.00 % 70.50 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1235 be/4 root 0.00 B 28.00 K 0.00 % 61.79 % [kjournald]
506 be/4 root 0.00 B 0.00 B 0.00 % 38.10 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 10.51 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 7.00 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.13 % [flush-9:2]
13467 be/4 root 0.00 B 92.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
Total DISK READ: 258.12 K/s | Total DISK WRITE: 86.04 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
541 be/3 root 0.00 B 768.76 M 0.00 % 89.19 % [jbd2/md2-8]
13479 be/4 root 12.00 K 250.31 M 0.00 % 71.45 % dd if=largefile.tar.gz of=test11 oflag=sync bs=1G
1235 be/4 root 0.00 B 28.00 K 0.00 % 60.66 % [kjournald]
506 be/4 root 0.00 B 0.00 B 0.00 % 37.40 % [md2_raid1]
1232 be/0 root 0.00 B 1568.00 K 0.00 % 10.32 % [loop0]
1394 be/4 root 0.00 B 0.00 B 0.00 % 6.87 % [flush-7:0]
1397 be/4 root 0.00 B 3.46 M 0.00 % 5.04 % [flush-9:2]
13467 be/4 root 0.00 B 96.00 K 0.00 % 0.00 % python /usr/bin/iotop -baoP -d 1
I apologize for such a lengthy email!
Kind regards!
Andrei Banu
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-23 10:17 ` Andrei Banu
@ 2013-04-24 3:24 ` Stan Hoeppner
2013-04-24 8:26 ` Andrei Banu
0 siblings, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-24 3:24 UTC (permalink / raw)
To: Andrei Banu
On 4/23/2013 5:17 AM, Andrei Banu wrote:
> I am sorry for the very long email. And thanks a lot for all your patience.
From now on simply provide what is asked for. That keeps the length
manageable and the info relevant, and allows us to help you get to a
solution more quickly without being bogged down.
> 1. DMESG doesn't show any "hard resetting link" at all.
Then it seems you don't have hardware problems.
> 2. The SSDs are connected to ATA 0 and ATA1. The server is brand new (or
> at least it should be).
Nor the Intel 6 Series SATA problem.
> 3. Partition table:
/etc/fstab contains mount points, not the partition table.
> root [~]# cat /etc/fstab
> UUID=cef1d19d-2578-43db-9ffc-b6b70e227bfa swap swap defaults 0 0
I can't discern from UUID where your swap partition is located. Is it a
partition directly on an SSD or is it a partition atop md1?
> root [/]# echo 3 > /proc/sys/vm/drop_caches
> root [~]# time cp largefile.tar.gz test03.tmp; time sync;
You're slowing us down here. Please execute commands as instructed
without modification. The above is wrong. You don't call time twice.
If you're worried about the sync execution time being included, use:
$ time (cp src.tmp src.temp; sync)
Though it makes little difference as Linux is pretty good about flushing
the last few write buffers. But you missed the important part, the math
for bandwidth determination: 548/real = xx MB/s
This is cp, not dd, so it's up to you to do the math; using time allows
you to do so. 548 MB is my example, using the file size from your
previous tests. Modify accordingly if needed.
*Important note* The job of this list is to provide knowledge transfer,
advice, and assistance. You must do the work, and you must learn along
the way. We don't fix people's problems, as we don't have access to
their computers. What we do is *enable* people to fix their problems
themselves.
> After about 15 seconds the server load started to increase from 1,
> spiked to 40 in about a minute and then it started decreasing.
Please stop telling us this. Linux load average is irrelevant.
> 5. The perf top -U output during a dd copy:
This was supposed to be executed before and simultaneously with the cp
operation above. Do you know how to use multiple terminal windows?
> 6. iotop
Again, this was supposed to be run with the cp command, exited toward
the end of the cp operation, then copy/pasted.
> is very dynamic and I am afraid the data I am providing will be
> unclear but let me give a number of snapshots from during the large file
> copy and maybe you can make something of it (samples a few seconds apart):
> !!!!!! 6085 be/4 root 7.69 K/s 1004.85 M/s 0.00 % 0.00 % dd
> if=largefile.tar.gz of=test10 oflag=sync bs=1G
This is another example of why you don't use dd for IO testing, and
especially with a block size of 1GB. dd buffers into RAM up to
$block_size bytes before it begins flushing to disk. So what you're
seeing here is that massive push at the beginning of the run. Your SSDs
in RAID1 peak at ~265MB/s. iotop is showing 1GB/s, 4 times what the
drives can do. This is obviously not real.
You can get away with oflag=sync using a 1GB block size. But if you run
dd the only way it can be run for realistic results, using bs=4096
(which matches the filesystem block size of EXTx, XFS, and JFS), then
oflag=sync will degrade your performance, as an ack is required on
each block. That's what sync does. With an SSD it won't be nearly as
dramatic as with rust, where the runtime is 100-200x slower due
to rotational latency.
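The difference is easy to see if you count the sync operations dd issues (a sketch: one sync per output block, so the count scales with file_size / bs):

```python
# With oflag=sync, dd waits for an ack after every output block,
# so the number of syncs is ceil(file_size / block_size).
def sync_count(file_bytes, bs_bytes):
    return -(-file_bytes // bs_bytes)  # ceiling division

size = 547682517                  # the 548 MB test file from the thread
print(sync_count(size, 1 << 30))  # bs=1G -> 1 sync for the whole file
print(sync_count(size, 4096))     # bs=4k -> 133712 syncs
```

One sync versus ~134 thousand is why bs=4096 with sync flags punishes rotational disks so badly, and still hurts on SSDs.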
> I apologize for such a lengthy email!
Don't apologize, just don't send more information than needed,
especially if you don't know it's relevant. ;) Send only what's
requested, and as requested, please.
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 3:24 ` Stan Hoeppner
@ 2013-04-24 8:26 ` Andrei Banu
2013-04-24 9:12 ` Adam Goryachev
2013-04-24 16:37 ` Stan Hoeppner
0 siblings, 2 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-24 8:26 UTC (permalink / raw)
To: linux-raid
Hello,
I am sorry for the irrelevant feedback. Where I misunderstood your
request, I filled in the blanks (poorly).
1. SWAP
root [~]# blkid | grep cef1d19d-2578-43db-9ffc-b6b70e227bfa
/dev/md1: UUID="cef1d19d-2578-43db-9ffc-b6b70e227bfa" TYPE="swap"
So yes, swap is on md1. This *md1 has a size of 2GB*. Isn't this way too
low for a system with 16GB of memory?
2. Let me try again to give you the right test results:
Before the bigfile copy:
root [~]# perf top -U
Samples: 768 of event 'cycles', Event count (approx.): 499088870
18.58% [kernel] [k] port_inb
6.21% [kernel] [k] page_fault
3.36% [kernel] [k] clear_page_c_e
2.82% [kernel] [k] kallsyms_expand_symbol
1.99% [kernel] [k] __mem_cgroup_commit_charge
1.84% [kernel] [k] shmem_getpage_gfp
1.51% [kernel] [k] alloc_pages_vma
1.51% [kernel] [k] __alloc_pages_nodemask
1.46% [kernel] [k] avtab_search_node
1.45% [kernel] [k] format_decode
1.40% [kernel] [k] list_del
1.36% [kernel] [k] get_page_from_freelist
1.35% [kernel] [k] vsnprintf
1.29% [kernel] [k] avc_has_perm_noaudit
1.28% [kernel] [k] number
1.22% [kernel] [k] free_pcppages_bulk
1.21% [kernel] [k] ____pagevec_lru_add
1.14% [kernel] [k] get_page
1.08% [kernel] [k] memcpy
1.07% [kernel] [k] mem_cgroup_update_file_mapped
1.07% [kernel] [k] page_waitqueue
0.98% [kernel] [k] __d_lookup
0.97% [kernel] [k] unmap_vmas
0.91% [kernel] [k] _spin_lock
0.87% [kernel] [k] inode_has_perm
0.81% [kernel] [k] string
0.77% [kernel] [k] page_remove_rmap
0.73% [kernel] [k] __audit_syscall_exit
0.68% [kernel] [k] lookup_page_cgroup
0.61% [kernel] [k] unlock_page
0.61% [kernel] [k] shmem_find_get_pages_and_swap
0.61% [kernel] [k] free_hot_cold_page
0.61% [kernel] [k] release_pages
0.56% [kernel] [k] mem_cgroup_lru_del_list
0.55% [kernel] [k] strncpy_from_user
0.54% [kernel] [k] module_get_kallsym
0.52% [kernel] [k] find_get_page
0.50% [kernel] [k] __do_fault
0.48% [kernel] [k] path_put
0.46% [kernel] [k] __list_add
0.46% [kernel] [k] handle_mm_fault
0.45% [kernel] [k] __wake_up_bit
0.44% [kernel] [k] handle_pte_fault
0.43% [kernel] [k] audit_syscall_entry
0.43% [kernel] [k] thread_return
0.42% [kernel] [k] path_init
0.41% [kernel] [k] dput
0.40% [kernel] [k] task_has_capability
0.40% [kernel] [k] get_task_cred
0.40% [kernel] [k] pointer
0.40% [kernel] [k] _atomic_dec_and_lock
0.39% [kernel] [k] __link_path_walk
0.38% [kernel] [k] memset
0.37% [kernel] [k] do_lookup
0.34% [kernel] [k] radix_tree_lookup_slot
0.34% [kernel] [k] down_read_trylock
0.33% [kernel] [k] kmem_cache_alloc
0.31% [kernel] [k] __set_page_dirty_no_writeback
0.31% [kernel] [k] __inc_zone_state
0.31% [kernel] [k] __mem_cgroup_uncharge_common
root [~]# iotop
Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.33 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
541 be/3 root 0.00 B/s 7.83 K/s 0.00 % 2.27 % [jbd2/md2-8]
8568 be/4 root 0.00 B/s 7.83 K/s 0.00 % 0.00 % lfd - sleeping
1457 be/3 root 0.00 B/s 7.83 K/s 0.00 % 0.00 % auditd
1669 be/4 root 0.00 B/s 3.91 K/s 0.00 % 0.00 % rsyslogd -i
/var/run/syslogd.pid -c 5
1695 be/4 named 0.00 B/s 3.91 K/s 0.00 % 0.00 % named -u named
31391 be/4 mysql 0.00 B/s 23.48 K/s 0.00 % 0.00 % mysqld
--basedir=/ --datadir=/var/lib/mysql --user=mysql --log-error=/var~r
--open-files-limit=50000 --pid-file=/var/lib/mysql/server.pid
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
13 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2]
14 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/2]
15 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3]
18 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/3]
19 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/4]
20 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/4]
21 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/4]
22 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/4]
23 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/5]
24 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/5]
25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/5]
26 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/5]
27 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/6]
28 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/6]
29 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/6]
30 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/6]
31 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/7]
32 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/7]
33 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/7]
34 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/7]
35 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/0]
36 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/1]
37 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/2]
38 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/3]
39 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/4]
40 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/5]
41 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/6]
42 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [events/7]
43 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cgroup]
44 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [khelper]
45 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [netns]
46 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [async/mgr]
47 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [pm]
48 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [sync_supers]
49 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [bdi-default]
50 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd/0]
51 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd/1]
52 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd/2]
53 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd/3]
54 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd/4]
Now the file copy with sync:
root [~]# time (cp largefile.tar.gz test05.tmp; sync)
real 1m33.923s
user 0m0.002s
sys 0m0.713s
Large file size: 523MB
BW determination: 523MB / 93.923 seconds = 5.56MB/s
File copy without sync:
root [~]# echo 3 > /proc/sys/vm/drop_caches
root [~]# time cp largefile.tar.gz test07.tmp
real 0m6.452s
user 0m0.007s
sys 0m0.687s
Large file size: 523MB
BW determination: 523MB / 6.452 seconds = 81.06 MB/s
During the copy (near the end: about 70 seconds into the copy - results
with sync):
Samples: 17K of event 'cycles', Event count (approx.): 5067697991
7.48% [kernel] [k] port_inb
5.40% [kernel] [k] page_fault
2.92% [kernel] [k] clear_page_c_e
2.29% [kernel] [k] list_del
2.21% [kernel] [k] _spin_lock
1.99% [kernel] [k] __d_lookup
1.92% [kernel] [k] avtab_search_node
1.64% [kernel] [k] unmap_vmas
1.59% [kernel] [k] get_page_from_freelist
1.55% [kernel] [k] __mem_cgroup_commit_charge
1.22% [kernel] [k] mem_cgroup_update_file_mapped
1.21% [kernel] [k] copy_page_c
1.04% [kernel] [k] find_vma
1.00% [kernel] [k] _spin_lock_irq
0.97% [kernel] [k] __wake_up_bit
0.94% [kernel] [k] __mem_cgroup_uncharge_common
0.92% [kernel] [k] get_page
0.91% [kernel] [k] __alloc_pages_nodemask
0.87% [kernel] [k] handle_mm_fault
0.85% [kernel] [k] __link_path_walk
0.84% [kernel] [k] avc_has_perm_noaudit
0.83% [kernel] [k] alloc_pages_vma
0.81% [kernel] [k] lookup_page_cgroup
0.80% [kernel] [k] __do_page_fault
0.80% [kernel] [k] free_pcppages_bulk
0.77% [kernel] [k] _spin_lock_irqsave
0.75% [kernel] [k] radix_tree_lookup_slot
0.73% [kernel] [k] kmem_cache_alloc
0.68% [ip_tables] [k] ipt_do_table
0.66% [kernel] [k] _atomic_dec_and_lock
0.65% [kernel] [k] release_pages
0.62% [kernel] [k] find_get_page
0.61% [kernel] [k] schedule
0.60% [kernel] [k] inode_has_perm
0.56% [kernel] [k] sidtab_context_to_sid
0.54% [kernel] [k] handle_pte_fault
0.53% [kernel] [k] _spin_unlock_irqrestore
0.53% [kernel] [k] memset
0.52% [kernel] [k] __inc_zone_state
0.51% [kernel] [k] update_curr
0.51% [kernel] [k] kfree
0.50% [kernel] [k] __list_add
0.50% [kernel] [k] __do_fault
0.49% [kernel] [k] shmem_getpage_gfp
0.47% [kernel] [k] filemap_fault
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.96 % [jbd2/md2-8]
12468 be/4 nobody 0.00 B/s 3.89 K/s 0.00 % 0.00 % httpd -k
start -DSSL
18818 be/4 mysql 0.00 B/s 3.89 K/s 0.00 % 0.00 % mysqld
--basedir=/ --da~sql/server.pid
12333 be/4 nobody 0.00 B/s 3.89 K/s 0.00 % 0.00 % httpd -k
start -DSSL
12560 be/4 nobody 0.00 B/s 3.89 K/s 0.00 % 0.00 % httpd -k
start -DSSL
12568 be/4 nobody 0.00 B/s 3.89 K/s 0.00 % 0.00 % httpd -k
start -DSSL
12281 be/4 nobody 0.00 B/s 3.89 K/s 0.00 % 0.00 % [httpd]
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
13 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2]
14 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/2]
15 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3]
18 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/3]
19 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/4]
20 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/4]
21 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/4]
22 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/4]
23 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/5]
24 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/5]
25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/5]
26 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/5]
27 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/6]
Please let me know if I messed up again so that I can correct it.
@Adam
3. root [~]# fdisk -lu /dev/sd*
Disk /dev/sda: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00026d59
Device Boot Start End Blocks Id System
/dev/sda1 2048 4196351 2097152 fd Linux raid
autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 * 4196352 4605951 204800 fd Linux raid
autodetect
Partition 2 does not end on cylinder boundary.
/dev/sda3 4605952 814106623 404750336 fd Linux raid
autodetect
Disk /dev/sda1: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders, total 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xfffefffe
Disk /dev/sda2: 209 MB, 209715200 bytes
255 heads, 63 sectors/track, 25 cylinders, total 409600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sda3: 414.5 GB, 414464344064 bytes
255 heads, 63 sectors/track, 50389 cylinders, total 809500672 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003dede
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4196351 2097152 fd Linux raid
autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2 * 4196352 4605951 204800 fd Linux raid
autodetect
Partition 2 does not end on cylinder boundary.
/dev/sdb3 4605952 814106623 404750336 fd Linux raid
autodetect
Disk /dev/sdb1: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders, total 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xfffefffe
Disk /dev/sdb2: 209 MB, 209715200 bytes
255 heads, 63 sectors/track, 25 cylinders, total 409600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb3: 414.5 GB, 414464344064 bytes
255 heads, 63 sectors/track, 50389 cylinders, total 809500672 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Kind regards!
Andrei Banu
On 4/24/2013 6:24 AM, Stan Hoeppner wrote:
> root [~]# cat /etc/fstab
>> UUID=cef1d19d-2578-43db-9ffc-b6b70e227bfa swap swap defaults 0 0
> I can't discern from UUID where your swap partition is located. Is it a
> partition directly on an SSD or is it a partition atop md1?
>
>> root [/]# echo 3 > /proc/sys/vm/drop_caches
>> root [~]# time cp largefile.tar.gz test03.tmp; time sync;
> You're slowing us down here. Please execute commands as instructed
> without modification. The above is wrong. You don't call time twice.
> If you're worried about sync execution being included in the time, use:
> $ time (cp src.tmp src.temp; sync)
>
> Though it makes little difference as Linux is pretty good about flushing
> the last few write buffers. But you missed the important part, the math
> for bandwidth determination: 548/real = xx MB/s
>
> This is cp not dd. It's up to you to do the math. Using time allows
> you to do so. 548MB is my example using your previous file size in your
> tests. Modify accordingly if needed.
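The arithmetic Stan describes (file size divided by the `real` time) can be scripted; a minimal sketch, using the size and elapsed time from the earlier test as placeholder values:

```shell
# Compute copy bandwidth from a timed "cp + sync" run.
# SIZE_MB and ELAPSED are taken from the example above; substitute your own.
SIZE_MB=523
ELAPSED=93.923
awk -v size="$SIZE_MB" -v t="$ELAPSED" 'BEGIN { printf "%.2f MB/s\n", size / t }'
```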
>
> *Important note* The job of this list is to provide knowledge transfer,
> advice, and assistance. You must do the work, and you must learn along
> the way. We don't fix people's problems, as we don't have access to
> their computers. What we do is *enable* people to fix their problems
> themselves.
>
>> After about 15 seconds the server load started to increase from 1,
>> spiked to 40 in about a minute and then it started decreasing.
> Please stop telling us this. Linux load average is irrelevant.
>
>> 5. The perf top -U output during a dd copy:
> This was supposed to be executed before and simultaneously with the cp
> operation above. Do you know how to use multiple terminal windows?
>
>> 6. iotop
> Again, this was supposed to be run with the cp command, exited toward
> the end of the cp operation, then copy/pasted.
>
>> is very dynamic and I am afraid the data I am providing will be
>> unclear but let me give a number of snapshots from during the large file
>> copy and maybe you can make something of it (samples a few seconds apart):
>> !!!!!! 6085 be/4 root 7.69 K/s 1004.85 M/s 0.00 % 0.00 % dd
>> if=largefile.tar.gz of=test10 oflag=sync bs=1G
> This is another example of why you don't use dd for IO testing, and
> especially with a block size of 1GB. dd buffers into RAM up to
> $block_size bytes before it begins flushing to disk. So what you're
> seeing here is that massive push at the beginning of the run. Your SSDs
> in RAID1 peak at ~265MB/s. iotop is showing 1GB/s, 4 times what the
> drives can do. This is obviously not real.
>
> You can get away with oflag=sync using a 1GB block size. But if you run
> dd the only way it can be run for realistic results, using bs=4096, which
> matches every filesystem block size including EXTx, XFS, and JFS, then
> using oflag=sync will degrade your performance, as an ack is required on
> each block. That's what sync does. With SSD it won't be nearly as
> dramatic as with spinning rust, where the difference in runtime is
> 100-200x due to rotational latency.
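Stan's point about realistic dd testing can be illustrated with a small run; a sketch using GNU dd's oflag=sync with a filesystem-sized block (the temp-file path and byte counts are illustrative only, not from the thread):

```shell
# Hypothetical dd run with bs=4096 and oflag=sync, as Stan recommends.
# Writes 1 MiB of zeros (4096 * 256) to a temp file, then reports its size.
TMP=$(mktemp)
dd if=/dev/zero of="$TMP" bs=4096 count=256 oflag=sync 2>/dev/null
wc -c < "$TMP"    # 1048576 bytes
rm -f "$TMP"
```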
>
>> I apologize for such a lengthy email!
> Don't apologize, just don't send more information than needed,
> especially if you don't know it's relevant. ;) Send only what's
> requested, and as requested, please.
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 8:26 ` Andrei Banu
@ 2013-04-24 9:12 ` Adam Goryachev
2013-04-24 10:24 ` Tommy Apel
2013-04-24 21:40 ` Andrei Banu
2013-04-24 16:37 ` Stan Hoeppner
1 sibling, 2 replies; 38+ messages in thread
From: Adam Goryachev @ 2013-04-24 9:12 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 24/04/13 18:26, Andrei Banu wrote:
> Hello,
>
> I am sorry for the irrelevant feedback. Where I misunderstood your
> request, I filled in the blanks (poorly).
>
> 1. SWAP
> root [~]# blkid | grep cef1d19d-2578-43db-9ffc-b6b70e227bfa
> /dev/md1: UUID="cef1d19d-2578-43db-9ffc-b6b70e227bfa" TYPE="swap"
>
> So yes, swap is on md1. This *md1 has a size of 2GB*. Isn't this way
> too low for a system with 16GB of memory?
>
Provide the output of "free"; if there is RAM available, then it isn't
too small (that is my personal opinion, but at least it won't affect
performance/operations until you are using most of that swap space).
>
> 3. root [~]# fdisk -lu /dev/sd*
>
My mistake, I should have said:
fdisk -lu /dev/sd?
In any case, all of the relevant information was included, so no harm done.
> Disk /dev/sda: 512.1 GB, 512110190592 bytes
> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00026d59
>
> Device Boot Start End Blocks Id System
> /dev/sda1 2048 4196351 2097152 fd Linux raid
> autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sda2 * 4196352 4605951 204800 fd Linux raid
> autodetect
> Partition 2 does not end on cylinder boundary.
> /dev/sda3 4605952 814106623 404750336 fd Linux raid
> autodetect
>
> Disk /dev/sdb: 512.1 GB, 512110190592 bytes
> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0003dede
>
> Device Boot Start End Blocks Id System
> /dev/sdb1 2048 4196351 2097152 fd Linux raid
> autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sdb2 * 4196352 4605951 204800 fd Linux raid
> autodetect
> Partition 2 does not end on cylinder boundary.
> /dev/sdb3 4605952 814106623 404750336 fd Linux raid
> autodetect
>
I'm assuming from this you have three md RAID1 arrays where sda1/sdb1
are a pair, sda2/sdb2 are a pair and sda3/sdb3 are a pair?
Can you describe what is on each of these arrays?
Output of
cat /proc/mdstat
df
pvs
lvs
Might be helpful....
Regards,
Adam
--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 9:12 ` Adam Goryachev
@ 2013-04-24 10:24 ` Tommy Apel
2013-04-24 21:42 ` Andrei Banu
2013-04-24 21:40 ` Andrei Banu
1 sibling, 1 reply; 38+ messages in thread
From: Tommy Apel @ 2013-04-24 10:24 UTC (permalink / raw)
To: Adam Goryachev; +Cc: Andrei Banu, linux-raid Raid, stan
Looks to me like it's the journaled quota process that holds everything back.
2013/4/24 Adam Goryachev <mailinglists@websitemanagers.com.au>:
> On 24/04/13 18:26, Andrei Banu wrote:
>> Hello,
>>
>> I am sorry for the irrelevant feedback. Where I misunderstood your
>> request, I filled in the blanks (poorly).
>>
>> 1. SWAP
>> root [~]# blkid | grep cef1d19d-2578-43db-9ffc-b6b70e227bfa
>> /dev/md1: UUID="cef1d19d-2578-43db-9ffc-b6b70e227bfa" TYPE="swap"
>>
>> So yes, swap is on md1. This *md1 has a size of 2GB*. Isn't this way
>> too low for a system with 16GB of memory?
>>
> Provide the output of "free", if there is RAM available, then it isn't
> too small (that is my personal opinion, but at least it won't affect
> performance/operations until you are using most of that swap space).
>
>>
>> 3. root [~]# fdisk -lu /dev/sd*
>>
> My mistake, I should have said:
> fdisk -lu /dev/sd?
>
> In any case, all of the relevant information was included, so no harm done.
>> Disk /dev/sda: 512.1 GB, 512110190592 bytes
>> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk identifier: 0x00026d59
>>
>> Device Boot Start End Blocks Id System
>> /dev/sda1 2048 4196351 2097152 fd Linux raid
>> autodetect
>> Partition 1 does not end on cylinder boundary.
>> /dev/sda2 * 4196352 4605951 204800 fd Linux raid
>> autodetect
>> Partition 2 does not end on cylinder boundary.
>> /dev/sda3 4605952 814106623 404750336 fd Linux raid
>> autodetect
>>
>> Disk /dev/sdb: 512.1 GB, 512110190592 bytes
>> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk identifier: 0x0003dede
>>
>> Device Boot Start End Blocks Id System
>> /dev/sdb1 2048 4196351 2097152 fd Linux raid
>> autodetect
>> Partition 1 does not end on cylinder boundary.
>> /dev/sdb2 * 4196352 4605951 204800 fd Linux raid
>> autodetect
>> Partition 2 does not end on cylinder boundary.
>> /dev/sdb3 4605952 814106623 404750336 fd Linux raid
>> autodetect
>>
> I'm assuming from this you have three md RAID1 arrays where sda1/sdb1
> are a pair, sda2/sdb2 are a pair and sda3/sdb3 are a pair?
>
> Can you describe what is on each of these arrays?
> Output of
> cat /proc/mdstat
> df
> pvs
> lvs
>
> Might be helpful....
>
> Regards,
> Adam
>
> --
> Adam Goryachev
> Website Managers
> www.websitemanagers.com.au
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 8:26 ` Andrei Banu
2013-04-24 9:12 ` Adam Goryachev
@ 2013-04-24 16:37 ` Stan Hoeppner
2013-04-24 21:46 ` Andrei Banu
1 sibling, 1 reply; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-24 16:37 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On 4/24/2013 3:26 AM, Andrei Banu wrote:
> Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
> 541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.96 % [jbd2/md2-8]
This seems to be your problem. jbd2 (journal block device) is causing
97% iowait, yet without doing much physical IO. This is a component of
EXT4. As this will fire intermittently it explains why you see such a
wide throughput gap between tests at different points in time.
This isn't a bug, or Google would reveal that. Andrei, you need to
identify which daemon or kernel feature is causing this. Do you happen
to have realtime TRIM enabled? It is well known to bring IO to a crawl.
If not realtime TRIM, I'd guess you turned a knob you should not have in
some config file, causing a daemon to frequently issue a few gazillion
atomic updates.
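Stan's realtime-TRIM question can be answered directly from the mount table; a minimal sketch (the sample line below is illustrative and stands in for a real /proc/mounts entry on the affected box):

```shell
# Check whether the root filesystem is mounted with 'discard' (realtime TRIM).
# The sample line is hypothetical; on a live system read /proc/mounts instead.
sample='/dev/md2 / ext4 rw,noatime,discard,data=ordered 0 0'
if printf '%s\n' "$sample" | grep -q 'discard'; then
    echo "realtime TRIM enabled"
else
    echo "realtime TRIM not enabled"
fi
```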
--
Stan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 9:12 ` Adam Goryachev
2013-04-24 10:24 ` Tommy Apel
@ 2013-04-24 21:40 ` Andrei Banu
1 sibling, 0 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-24 21:40 UTC (permalink / raw)
Cc: linux-raid
Hi,
1. free -m
root [~]# free -m
total used free shared buffers cached
Mem: 15921 15542 379 0 1063 11870
-/+ buffers/cache: 2608 13313
Swap: 2046 100 1946
2. Yes, you understood correctly regarding the raid array (all 3 of them
are raid 1):
root@gts6 [~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[1] sda2[0]
204736 blocks super 1.0 [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
404750144 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb1[1] sda1[0]
2096064 blocks super 1.1 [2/2] [UU]
unused devices: <none>
md0 is boot.
md1 is swap.
md2 is /
3. df
root@gts6 [~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md2 380G 246G 116G 68% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/md0 194M 47M 137M 26% /boot
/usr/tmpDSK 3.6G 1.2G 2.2G 36% /tmp
4. pvs
root [~]# pvs -a
PV VG Fmt Attr PSize PFree
/dev/loop0 --- 0 0
/dev/md0 --- 0 0
/dev/md1 --- 0 0
/dev/ram0 --- 0 0
/dev/ram1 --- 0 0
/dev/ram10 --- 0 0
/dev/ram11 --- 0 0
/dev/ram12 --- 0 0
/dev/ram13 --- 0 0
/dev/ram14 --- 0 0
/dev/ram15 --- 0 0
/dev/ram2 --- 0 0
/dev/ram3 --- 0 0
/dev/ram4 --- 0 0
/dev/ram5 --- 0 0
/dev/ram6 --- 0 0
/dev/ram7 --- 0 0
/dev/ram8 --- 0 0
/dev/ram9 --- 0 0
/dev/root --- 0 0
5. lvs (No volume groups).
Thanks!
On 24/04/2013 12:12 PM, Adam Goryachev wrote:
> On 24/04/13 18:26, Andrei Banu wrote:
>> Hello,
>>
>> I am sorry for the irrelevant feedback. Where I misunderstood your
>> request, I filled in the blanks (poorly).
>>
>> 1. SWAP
>> root [~]# blkid | grep cef1d19d-2578-43db-9ffc-b6b70e227bfa
>> /dev/md1: UUID="cef1d19d-2578-43db-9ffc-b6b70e227bfa" TYPE="swap"
>>
>> So yes, swap is on md1. This *md1 has a size of 2GB*. Isn't this way
>> too low for a system with 16GB of memory?
>>
> Provide the output of "free", if there is RAM available, then it isn't
> too small (that is my personal opinion, but at least it won't affect
> performance/operations until you are using most of that swap space).
>
>> 3. root [~]# fdisk -lu /dev/sd*
>>
> My mistake, I should have said:
> fdisk -lu /dev/sd?
>
> In any case, all of the relevant information was included, so no harm done.
>> Disk /dev/sda: 512.1 GB, 512110190592 bytes
>> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk identifier: 0x00026d59
>>
>> Device Boot Start End Blocks Id System
>> /dev/sda1 2048 4196351 2097152 fd Linux raid
>> autodetect
>> Partition 1 does not end on cylinder boundary.
>> /dev/sda2 * 4196352 4605951 204800 fd Linux raid
>> autodetect
>> Partition 2 does not end on cylinder boundary.
>> /dev/sda3 4605952 814106623 404750336 fd Linux raid
>> autodetect
>>
>> Disk /dev/sdb: 512.1 GB, 512110190592 bytes
>> 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk identifier: 0x0003dede
>>
>> Device Boot Start End Blocks Id System
>> /dev/sdb1 2048 4196351 2097152 fd Linux raid
>> autodetect
>> Partition 1 does not end on cylinder boundary.
>> /dev/sdb2 * 4196352 4605951 204800 fd Linux raid
>> autodetect
>> Partition 2 does not end on cylinder boundary.
>> /dev/sdb3 4605952 814106623 404750336 fd Linux raid
>> autodetect
>>
> I'm assuming from this you have three md RAID1 arrays where sda1/sdb1
> are a pair, sda2/sdb2 are a pair and sda3/sdb3 are a pair?
>
> Can you describe what is on each of these arrays?
> Output of
> cat /proc/mdstat
> df
> pvs
> lvs
>
> Might be helpful....
>
> Regards,
> Adam
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 10:24 ` Tommy Apel
@ 2013-04-24 21:42 ` Andrei Banu
0 siblings, 0 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-24 21:42 UTC (permalink / raw)
Cc: linux-raid Raid
Hi,
Why would it do that?
And how do I fix this?
Thanks!
On 24/04/2013 1:24 PM, Tommy Apel wrote:
> Looks to me like it's the journaled quota process that holds everything back.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 16:37 ` Stan Hoeppner
@ 2013-04-24 21:46 ` Andrei Banu
[not found] ` <CAH3kUhHnF0imY=CAHfzaQy4XJuOMgOtbHNp17EYzeSJR2en7Fg@mail.gmail.com>
2013-04-25 10:56 ` Stan Hoeppner
0 siblings, 2 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-24 21:46 UTC (permalink / raw)
Cc: linux-raid
Hi,
1. How can I at least start trying to find the daemon that might be
doing this?
2. I am not sure what realtime TRIM is. I thought there was the 'discard'
option in fstab (which I tried, and it didn't help) and other command-line
trims (fstrim, which errors out when run on /, or mdtrim, which seems to be
somebody's experiment). But I am not sure what realtime TRIM might be.
I am not really sure where to go from here. I am a bit lost, as it seems we
have hit a dead end.
Thanks!
Andrei Banu
On 24/04/2013 7:37 PM, Stan Hoeppner wrote:
> On 4/24/2013 3:26 AM, Andrei Banu wrote:
>
>> Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
>> 541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.96 % [jbd2/md2-8]
> This seems to be your problem. jbd2 (journal block device) is causing
> 97% iowait, yet without doing much physical IO. This is a component of
> EXT4. As this will fire intermittently it explains why you see such a
> wide throughput gap between tests at different points in time.
>
> This isn't a bug or Google would reveal that. Andrei, you need to
> identify which daemon or kernel feature is causing this. Do you happen
> to have realtime TRIM enabled? It is well known to bring IO to a crawl.
>
> If not realtime TRIM, I'd guess you turned a knob you should not have in
> some config file, causing a daemon to frequently issue a few gazillion
> atomic updates.
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
[not found] ` <CAH3kUhHnF0imY=CAHfzaQy4XJuOMgOtbHNp17EYzeSJR2en7Fg@mail.gmail.com>
@ 2013-04-25 10:11 ` Andrei Banu
0 siblings, 0 replies; 38+ messages in thread
From: Andrei Banu @ 2013-04-25 10:11 UTC (permalink / raw)
To: linux-raid
Hi,
I don't have fstab discard option set. I was just enumerating the trim
kinds I know. I did try discard but it didn't do anything good. And the
problem dated from before my discard test.
Regards!
On 2013-04-25 00:53, Roberto Spadim wrote:
> TRIM in ext4 = discard
> 2013/4/24 Andrei Banu <andrei.banu@redhost.ro>
>
>> Hi,
>> 1. How can I at least start trying to find the daemon that might be
>> doing this?
>> 2. I am not sure what realtime TRIM is. I thought there was the 'discard'
>> option in fstab (which I tried, and it didn't help) and other command-line
>> trims (fstrim, which errors out when run on /, or mdtrim, which seems to
>> be somebody's experiment). But I am not sure what realtime TRIM might be.
>> I am not really sure where to go from here. I am a bit lost, as it seems
>> we have hit a dead end.
>> Thanks!
>> Andrei Banu
>> On 24/04/2013 7:37 PM, Stan Hoeppner wrote:
>>
>>> On 4/24/2013 3:26 AM, Andrei Banu wrote:
>>>
>>>> Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
>>>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO>
>>>> COMMAND
>>>> 541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.96 %
>>>> [jbd2/md2-8]
>>> This seems to be your problem. jbd2 (journal block device) is
>>> causing
>>> 97% iowait, yet without doing much physical IO. This is a component
>>> of
>>> EXT4. As this will fire intermittently it explains why you see such
>>> a
>>> wide throughput gap between tests at different points in time.
>>> This isn't a bug or Google would reveal that. Andrei, you need to
>>> identify which daemon or kernel feature is causing this. Do you
>>> happen
>>> to have realtime TRIM enabled? It is well known to bring IO to a
>>> crawl.
>>> If not realtime TRIM, I'd guess you turned a knob you should not
>>> have in
>>> some config file, causing a daemon to frequently issue a few
>>> gazillion
>>> atomic updates.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> [1]
> --
> Roberto Spadim
> Links:
> ------
> [1] http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-24 21:46 ` Andrei Banu
[not found] ` <CAH3kUhHnF0imY=CAHfzaQy4XJuOMgOtbHNp17EYzeSJR2en7Fg@mail.gmail.com>
@ 2013-04-25 10:56 ` Stan Hoeppner
1 sibling, 0 replies; 38+ messages in thread
From: Stan Hoeppner @ 2013-04-25 10:56 UTC (permalink / raw)
To: Andrei Banu
On 4/24/2013 4:46 PM, Andrei Banu wrote:
> 1. How can I at least start trying to find the daemon that might be
> doing this?
For you, I'd say grab a bucket of popcorn and watch top and iotop for a
while during peak use periods. Fire up two ssh sessions and watch both
simultaneously, left and right on your screen. You need to become
familiar with your system, what the applications are doing to cpu, mem,
and io.
When you're not doing that, use Google. Start reading about problems
others have with "[jbd2/]" and/or super slow performance with very fast
SSDs.
> 2. I am not sure what real time TRIM is. I thought there was the
> 'discard' option in
> fstab (which I tried and didn't help) and other command like trims
discard = realtime trim
If it's not enabled then this isn't the source of your problem.
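For concreteness, "discard = realtime trim" means the discard mount option in /etc/fstab; a hypothetical entry (the UUID is a placeholder, not one from this system):

```
# /etc/fstab -- 'discard' here enables realtime TRIM on every delete
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults,noatime,discard  1 1
```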
> I am not really sure where do I go from here. I am a bit lost as it
> seems we hit
> a dead end.
There's only so much we can do. The problem appears to have nothing to
do with md/RAID. I'm doing my best to point you in the right
direction(s), but I'm neither a CentOS nor EXT4 user and am not familiar
with those ecosystems nor support channels.
You need to research your problem via Google, interface with other
CentOS users and others using the same type of cpanel based hosting
software stack.
If I had access to the box I'm sure I could figure this out for you, but
this isn't something I'm willing to do at this time.
Keep at it and you'll eventually figure it out. And you'll learn a lot
along the way.
Best of luck.
--
Stan
> Thanks!
> Andrei Banu
>
> On 24/04/2013 7:37 PM, Stan Hoeppner wrote:
>> On 4/24/2013 3:26 AM, Andrei Banu wrote:
>>
>>> Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
>>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
>>> 541 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.96 %
>>> [jbd2/md2-8]
>> This seems to be your problem. jbd2 (journal block device) is causing
>> 97% iowait, yet without doing much physical IO. This is a component of
>> EXT4. As this will fire intermittently it explains why you see such a
>> wide throughput gap between tests at different points in time.
>>
>> This isn't a bug or Google would reveal that. Andrei, you need to
>> identify which daemon or kernel feature is causing this. Do you happen
>> to have realtime TRIM enabled? It is well known to bring IO to a crawl.
>>
>> If not realtime TRIM, I'd guess you turned a knob you should not have in
>> some config file, causing a daemon to frequently issue a few gazillion
>> atomic updates.
>>
>
* Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
2013-04-20 23:26 ` Andrei Banu
2013-04-21 2:48 ` Stan Hoeppner
@ 2013-04-25 11:38 ` Thomas Jarosch
1 sibling, 0 replies; 38+ messages in thread
From: Thomas Jarosch @ 2013-04-25 11:38 UTC (permalink / raw)
To: Andrei Banu; +Cc: linux-raid
On Sunday, 21. April 2013 02:26:26 Andrei Banu wrote:
> They are connected through SATA2 ports (this does explain the read speed
> but not the pitiful write one) in AHCI.
So the SATA controller is already in AHCI mode. Good.
You didn't say what kind of server hardware you are using, or maybe I missed it.
On the HP DL3xxx servers we usually use, we have to enable AHCI mode _and_
the write cache in the BIOS. Maybe your server needs something similar.
Some RAID controllers only allow you to enable the write cache
when a battery-backed write cache module is installed.
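(From the OS side you can also ask the drive itself; a sketch, assuming hdparm is installed and /dev/sda is the disk in question — adjust the device name — and run as root:)

```shell
# Report whether the drive's volatile write cache is enabled.
# "write-caching = 1 (on)" in hdparm's output means it is.
check_write_cache() {
    dev=$1
    if command -v hdparm >/dev/null 2>&1 && [ -b "$dev" ]; then
        hdparm -W "$dev"
    else
        echo "hdparm or $dev not available"
    fi
}

check_write_cache /dev/sda
# To enable it (risky on power loss without a BBU): hdparm -W1 /dev/sda
```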
HTH,
Thomas
Thread overview: 38+ messages
2013-04-19 22:58 Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO Andrei Banu
[not found] ` <CAH3kUhEaZGON=fAyVMZOz5fH_DcfKv=hCa96UCeK4pN7k81c_Q@mail.gmail.com>
2013-04-20 23:26 ` Andrei Banu
[not found] ` <51725458.7020109@redhost.ro>
[not found] ` <CAH3kUhHxBiqugFQm=PPJNNe9jOdKy0etUjQNsoDz_LJNUCLCCQ@mail.gmail.com>
2013-04-20 23:25 ` Andrei Banu
2013-04-20 23:26 ` Andrei Banu
2013-04-21 2:48 ` Stan Hoeppner
2013-04-21 12:23 ` Tommy Apel
2013-04-21 16:48 ` Tommy Apel
2013-04-21 19:33 ` Stan Hoeppner
2013-04-21 19:56 ` Tommy Apel
2013-04-22 0:47 ` Stan Hoeppner
2013-04-22 7:51 ` Tommy Apel
2013-04-22 8:29 ` Tommy Apel
2013-04-22 10:26 ` Andrei Banu
2013-04-22 12:02 ` Tommy Apel
2013-04-23 2:59 ` Stan Hoeppner
2013-04-22 23:21 ` Stan Hoeppner
2013-04-25 11:38 ` Thomas Jarosch
2013-04-21 0:10 ` Stan Hoeppner
[not found] ` <51732E2B.6090607@hardwarefreak.com>
2013-04-21 20:46 ` Andrei Banu
2013-04-21 23:17 ` Stan Hoeppner
2013-04-22 10:19 ` Andrei Banu
2013-04-23 2:51 ` Stan Hoeppner
2013-04-23 10:17 ` Andrei Banu
2013-04-24 3:24 ` Stan Hoeppner
2013-04-24 8:26 ` Andrei Banu
2013-04-24 9:12 ` Adam Goryachev
2013-04-24 10:24 ` Tommy Apel
2013-04-24 21:42 ` Andrei Banu
2013-04-24 21:40 ` Andrei Banu
2013-04-24 16:37 ` Stan Hoeppner
2013-04-24 21:46 ` Andrei Banu
[not found] ` <CAH3kUhHnF0imY=CAHfzaQy4XJuOMgOtbHNp17EYzeSJR2en7Fg@mail.gmail.com>
2013-04-25 10:11 ` Andrei Banu
2013-04-25 10:56 ` Stan Hoeppner
2013-04-22 23:11 ` Andrei Banu
2013-04-23 4:39 ` Stan Hoeppner
2013-04-22 23:25 ` Stan Hoeppner
2013-04-23 4:49 ` Mikael Abrahamsson
2013-04-23 6:01 ` Stan Hoeppner