* Performance of a software raid 5
@ 2009-04-20 17:12 Johannes Segitz
2009-04-20 23:46 ` John Robinson
0 siblings, 1 reply; 30+ messages in thread
From: Johannes Segitz @ 2009-04-20 17:12 UTC (permalink / raw)
To: linux-raid
Hi,
[First of all, I'm not sure this is the right place. If it's the wrong
list, please just tell me what to RTFM or where to post.]
I'm currently trying to create a RAID 5 from 1 TB HDDs. Three drives are
present and a fourth is still missing, so the degraded array gives me 3 TB
of usable space.
One hdd is connected to
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
the other two to
04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA
Raid II Controller (rev 01)
The CPU is an AMD X2 4200+ and the system has 2 GB of RAM.
The performance of the array is underwhelming:
time dd if=/dev/zero of=big_file bs=4096 count=2560000
10485760000 bytes (10 GB) copied, 187.691 s, 55.9 MB/s
dd if=/dev/zero of=big_file bs=4096 count=2560000  0.70s user 26.05s system 14% cpu 3:08.12 total
time dd if=big_file of=/dev/null bs=4096 count=2560000
10485760000 bytes (10 GB) copied, 297.345 s, 35.3 MB/s
dd if=big_file of=/dev/null bs=4096 count=2560000  0.50s user 10.60s system 3% cpu 4:57.35 total
So I get a write speed of 55 MB/s and a read speed of 35 MB/s. The HDDs
Model=SAMSUNG HD103UJ , FwRev=1AA01113,
SerialNo=S13PJDWS250990
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-3,4,5,6,7
are all the same and get ~70 MB/s when used alone.
The details for the raid device:
/dev/md6:
Version : 00.90
Creation Time : Sun Apr 19 22:30:23 2009
Raid Level : raid5
Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 6
Persistence : Superblock is persistent
Update Time : Mon Apr 20 14:45:32 2009
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 584a0f66:3c075c23:9cae9464:25382498 (local to host johannes-desktop)
Events : 0.21396
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 145 1 active sync /dev/sdj1
2 8 161 2 active sync /dev/sdk1
3 0 0 3 removed
On top of the RAID device there is a crypto layer
cryptsetup --verify-passphrase -c aes-cbc-essiv:sha256 -y -s 256 luksFormat /dev/md6
and then ext4
mkfs.ext4 -v -b 4096 -E lazy_itable_init,stride=64,stripe-width=256 -O large_file,dir_index,extent,sparse_super,uninit_bg -m0 /dev/mapper/data
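For reference, the stride/stripe-width arithmetic behind those mkfs options can be sketched as follows. This is only a sanity check; the 256K chunk, 4K block size, and four planned data discs are taken from this thread, not a general recipe:

```shell
# ext4 stride/stripe-width sanity check for an md RAID-5.
# Assumed values from this thread: 256K chunk, 4K fs blocks,
# and a planned 5-device array (4 data discs + 1 parity).
chunk_kb=256
block_kb=4
data_discs=4

stride=$((chunk_kb / block_kb))          # fs blocks per chunk
stripe_width=$((stride * data_discs))    # fs blocks per full stripe

echo "stride=$stride stripe-width=$stripe_width"
```

With these numbers it reproduces the values on the mkfs.ext4 line above (stride=64, stripe-width=256).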
I use kernel 2.6.29.1
Stride and stripe-width will be correct once I add another two HDDs, of
which one will carry data. Can someone please give me a hint why I get
such bad performance, especially while reading? I don't think it's the
crypto layer, since kcryptd doesn't go over 50% CPU, and having two cores
should prevent other processes from starving. The stride and stripe-width
aren't correct right now, but can that degrade performance like this? I
would expect at least 100+ MB/s for both reading and writing.
Thanks
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Performance of a software raid 5
From: John Robinson @ 2009-04-20 23:46 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
On 20/04/2009 18:12, Johannes Segitz wrote:
> i'm currently trying to create a raid 5 out of three 1 TB hdd. For now
> there is one hdd missing so i get 3 TB of usable space.
[...]
> Stride and stripe-width will be correct when i add another two hdd of
> which one will carry data. Can someone please give me a hint why i
> could get such bad performance especially while reading?
I would have thought it's because you're running in degraded mode and
one in 3 sectors is having to be regenerated from the parity. It still
seems a bit slow, though.
Here I have a 3-disc RAID-5 of similar drives:
# hdparm -i /dev/sda
/dev/sda:
Model=SAMSUNG HD103UJ , FwRev=1AA01112,
SerialNo=S1PVJ1CQ602164
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?0?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5
ATA/ATAPI-6 ATA/ATAPI-7
# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Mon Jul 28 15:49:09 2008
Raid Level : raid5
Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Apr 21 00:35:26 2009
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : d8c57a89:166ee722:23adec48:1574b5fc
Events : 0.6134
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
It has LVM and an ext3 filesystem on it. Here are my timings:
# time dd if=/dev/zero of=big_file bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 264.448 seconds, 39.7 MB/s
real 4m25.740s
user 0m2.272s
sys 0m34.470s
# time dd if=big_file of=/dev/null bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 53.9577 seconds, 194 MB/s
real 0m54.026s
user 0m0.556s
sys 0m4.944s
I'm not quite sure whether I should be disappointed at my writes being
so slow. Certainly there's a lot of rattling during writing, which
probably indicates lots of seeks to write ext3's journal. But reads are
roughly what I expected, at about three times the single-disc throughput.
Cheers,
John.
* Re: Performance of a software raid 5
From: Johannes Segitz @ 2009-04-21 0:10 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 1:46 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> I would have thought it's because you're running in degraded mode and one in
> 3 sectors is having to be regenerated from the parity. It still seems a bit
> slow, though.
I don't think that's the problem. The data is there without redundancy,
so I can't see why anything would need to be calculated.
> I'm not quite sure whether I should be disappointed at my writes being so
> slow. Certainly there's a lot of rattling during writing, which probably
> indicates lots of seeks to write ext3's journal. But reads are roughly what
> I expected, at about three times the single-disc throughput.
200 MB/s reads would be nice, but I expected quite a bit more write speed
too. I know you don't get a linear speedup, but falling behind the normal
performance of a single drive doesn't seem right.
Johannes
Btw.: the controller is on PCI-E, so there's no bottleneck there.
* Poor write performance with write-intent bitmap?
From: John Robinson @ 2009-04-21 0:44 UTC (permalink / raw)
To: Linux RAID
On 21/04/2009 00:46, I wrote:
[...]
> # time dd if=/dev/zero of=big_file bs=4096 count=2560000
> 2560000+0 records in
> 2560000+0 records out
> 10485760000 bytes (10 GB) copied, 264.448 seconds, 39.7 MB/s
[...]
> I'm not quite sure whether I should be disappointed at my writes being
> so slow. Certainly there's a lot of rattling during writing, which
> probably indicates lots of seeks to write ext3's journal.
No, that's not it. Using a scratch logical volume over the md RAID-5
isn't much better:
# time dd if=/dev/zero of=/dev/mapper/vg0-scratch bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 230.036 seconds, 45.6 MB/s
real 3m50.077s
user 0m1.608s
sys 0m11.097s
It still rattles a lot, suggesting a lot of seeking. Now if I turn off
the bitmap and try again:
# mdadm --grow /dev/md1 --bitmap=none
# time dd if=/dev/zero of=/dev/mapper/vg0-scratch bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 110.17 seconds, 95.2 MB/s
real 1m50.346s
user 0m1.900s
sys 0m13.537s
That's more like it, and no more rattling. Can I tune settings for the
internal bitmap, or is this something which will have improved anyway
since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a
prominent North American Enterprise Linux vendor's EL5 codebase for
md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was
quite so expensive (not that it matters much in this particular
application).
Cheers,
John.
* Re: Performance of a software raid 5
From: John Robinson @ 2009-04-21 0:52 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
On 21/04/2009 01:10, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 1:46 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> I would have thought it's because you're running in degraded mode and one in
>> 3 sectors is having to be regenerated from the parity. It still seems a bit
>> slow, though.
>
> i don't think that that is a problem. The data is there without
> redundancy so i can't see
> how there would be the need to calculate anything
There's no redundancy but it's still the RAID-5 4-disc layout with 3
data and 1 parity, the parity on a different disc in each stripe. In
your case with a missing disc, for 3 stripes in 4 you have 2 data and 1
parity. Of course the parity is having to be calculated when you're
writing, and whatever would be written to your missing disc is being
discarded.
On the other hand if you were using RAID-0 over 3 discs there would be
no need to calculate anything.
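A quick way to visualise that rotation (a sketch only: it assumes the md left-symmetric layout described above, a 4-device array, and takes device 3 as the missing one):

```shell
# Sketch of left-symmetric parity placement on a 4-device RAID-5
# with device 3 missing (as in the degraded array in this thread).
ndev=4
missing=3
for stripe in 0 1 2 3; do
  parity=$(( (ndev - 1 - stripe) % ndev ))   # left-symmetric rotation
  if [ "$parity" = "$missing" ]; then
    note="parity lost, full data present"
  else
    note="one data chunk on the missing disc"
  fi
  echo "stripe $stripe: parity on device $parity, $note"
done
```

Only the stripe whose parity landed on the missing disc has all its data intact; in the other 3 stripes in 4 a data chunk must be regenerated on reads.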
Cheers,
John.
* Re: Performance of a software raid 5
From: Johannes Segitz @ 2009-04-21 1:05 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> There's no redundancy but it's still the RAID-5 4-disc layout with 3 data
> and 1 parity, the parity on a different disc in each stripe. In your case
> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
> course the parity is having to be calculated when you're writing, and
> whatever would be written to your missing disc is being discarded.
You're right, I didn't think of that. But calculating an XOR isn't really
a big deal (especially compared to the AES on top of it), so I still can't
see why it's so slow.
Johannes
* Re: Performance of a software raid 5
From: John Robinson @ 2009-04-21 1:12 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
On 21/04/2009 02:05, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> There's no redundancy but it's still the RAID-5 4-disc layout with 3 data
>> and 1 parity, the parity on a different disc in each stripe. In your case
>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>> course the parity is having to be calculated when you're writing, and
>> whatever would be written to your missing disc is being discarded.
>
> you're right, i didn't think of that. But calculating an xor isn't really
> a big deal (especially with the aes on top of it) so i still can't see why
> it's so slow
No, nor can I, especially since your `time` output shows a very modest
amount of system time. It may be worth trying fewer layers (i.e. no
encryption and/or no filesystem) to eliminate them, or monitoring with
other tools like iostat to see if you can get to the bottom of it.
Cheers,
John.
* Re: Performance of a software raid 5
From: NeilBrown @ 2009-04-21 1:19 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
On Tue, April 21, 2009 11:05 am, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> There's no redundancy but it's still the RAID-5 4-disc layout with 3
>> data
>> and 1 parity, the parity on a different disc in each stripe. In your
>> case
>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>> course the parity is having to be calculated when you're writing, and
>> whatever would be written to your missing disc is being discarded.
>
> you're right, i didn't think of that. But calculating an xor isn't really
> a big deal (especially with the aes on top of it) so i still can't see why
> it's so slow
Large sequential writes to a degraded RAID5 will be the same speed as to a
non-degraded RAID5. Smaller random writes can still be slower, as the
amount of pre-reading can increase.
Reads from a degraded RAID5 will be slower not because of the XOR, but
because of needing to read all that extra data to feed into the XOR.
Have you done any testing without the crypto layer to see what effect
that has?
Can I suggest:
for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
do
dd if=$d of=/dev/null bs=1M count=100
done
and report the times.
NeilBrown
* Re: Poor write performance with write-intent bitmap?
From: NeilBrown @ 2009-04-21 1:33 UTC (permalink / raw)
To: John Robinson; +Cc: Linux RAID
On Tue, April 21, 2009 10:44 am, John Robinson wrote:
> That's more like it, and no more rattling. Can I tune settings for the
> internal bitmap, or is this something which will have improved anyway
> since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a
> prominent North American Enterprise Linux vendor's EL5 codebase for
> md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was
> quite so expensive (not that it matters much in this particular
> application).
>
I don't think newer kernels make any difference to bitmap-related
performance, though there might have been some general raid5 improvements
since then.
There are two tunables for bitmaps: chunk size and delay (though the
delay doesn't seem to be in the man page).
Choosing a larger --bitmap-chunk size will require fewer updates to the
bitmap before writes are allowed to proceed. However a larger bitmap-chunk
size will also increase the amount of work needed after a crash or
re-added device.
Check your current (default) chunk size with "mdadm -X /dev/sdxx" and
create a new bitmap with (say) 16 or 64 times the chunk size.
See if that makes a difference.
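In concrete terms, those steps would look something like the sketch below. The device name and the starting 2 MB chunk are assumptions; --bitmap-chunk is in kilobytes, and the old bitmap has to be removed before a new one is created. The commands are echoed rather than executed here, so drop the `echo` on a real array:

```shell
# Recreating an internal write-intent bitmap with a larger chunk.
# Hypothetical values: /dev/md1 with a current 2 MB (2048 KB) bitmap chunk.
dev=/dev/md1
cur_kb=2048
new_kb=$((cur_kb * 16))   # 16x larger, i.e. 32 MB

echo mdadm --grow "$dev" --bitmap=none
echo mdadm --grow "$dev" --bitmap=internal --bitmap-chunk=$new_kb
```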
The delay tunable sets how quickly bits are removed from the bitmap.
This is done fairly lazily and opportunistically, so it isn't likely
to affect throughput directly. However, if you are updating the same
area on disk periodically, and the period is a little longer than
the timeout, you will get excessive bitmap updates that serve little
purpose.
You could try increasing this (to say 30 or 60 seconds), but I doubt
it will have much effect.
Your other option is to put the bitmap in a file on some other device.
If you have a device that is rarely used (maybe your root filesystem)
you can create an external bitmap there. You would want to be
sure that the device storing the bitmap is always going to be
available when you start the array that the bitmap belongs to.
Maybe you could create an external bitmap in a tmpfs.... but then
it wouldn't survive a crash and so has little value. It would be
fast though :-)
NeilBrown
* Re: Performance of a software raid 5
From: Johannes Segitz @ 2009-04-21 2:04 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
> Have you done any testing without the crypto layer to see what effect
> that has?
>
> Can I suggest:
>
> for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
> do
> dd if=$d of=/dev/null bs=1M count=100
> done
>
> and report the times.
Tested it with 1 GB instead of 100 MB:
sdg
1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
sdj
1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
sdk
1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
/dev/md6
1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
/dev/mapper/data
1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
bigfile
1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
So the crypto indeed slows it down (and I'm surprised it's that bad,
because I've read it's not a big hit on current CPUs, and the X2 isn't
new but not that old either). But even so, the read speed from md6 is
worse than from one drive alone.
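Putting a rough number on that slowdown (awk arithmetic on the measured figures above; a single dd run each, so treat the percentage as indicative only):

```shell
# Relative read throughput of the crypto layer vs. the raw md device,
# using the MB/s figures measured above (91.4 for md6, 30.4 for dm-crypt).
awk 'BEGIN {
  md = 91.4; crypt = 30.4
  printf "crypto layer keeps %.0f%% of md6 read throughput\n", 100 * crypt / md
}'
```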
Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Poor write performance with write-intent bitmap?
From: John Robinson @ 2009-04-21 2:13 UTC (permalink / raw)
To: NeilBrown; +Cc: Linux RAID
On 21/04/2009 02:33, NeilBrown wrote:
[...]
> Choosing a larger --bitmap-chunk size will require fewer updates to the
> bitmap before writes are allowed to proceed. However a larger bitmap-chunk
> size will also increase the amount of work needed after a crash or
> re-added device.
Ah, from my reading of the mdadm man page I thought you could only
specify the chunk size when using an external bitmap:
--bitmap-chunk=
Set the chunksize of the bitmap. Each bit corresponds to that
many Kilobytes of storage. When using a file based bitmap, the
default is to use the smallest size that is at-least 4 and
requires no more than 2^21 chunks. When using an internal
bitmap, the chunksize is automatically determined to make best
use of available space.
> Check your current (default) chunk size with "mdadm -X /dev/sdxx" and
> create a new bitmap with (say) 16 or 64 times the chunk size.
> See if that makes a difference.
It certainly does. Upping it from 2M to 32M gets me from 45MB/s to
81MB/s on the scratch LV, and there's noticeably less seek noise.
Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large
chunk size caused a reboot! There's nothing in the log, and I didn't see
the console. I still have my 32M chunksize but I don't want to try that
again in a hurry :-)
[...]
> Your other option is to put the bitmap in a file on some other device.
> If you have a device this is rarely used (maybe your root filesystem)
Can't do that, my root filesystem is on the RAID-5, and part of the
reason for wanting the bitmap is because the md can't be stopped while
shutting down, so it was always wanting to resync at startup, which is
rather tedious.
> Maybe you could create an external bitmap in a tmpfs.... but then
> it wouldn't survive a crash and so has little value. It would be
> fast though :-)
"Ooh, virtual memory! Now I can have a really big RAM disc!"
Now, it's time I checked my discs to see if I've lost data in that
crash. `mdadm -X` is stuck saying there are 10 dirty chunks.
Cheers,
John.
* Re: Performance of a software raid 5
From: Neil Brown @ 2009-04-21 5:46 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
On Tuesday April 21, johannes.segitz@gmail.com wrote:
> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
> > Have you done any testing without the crypto layer to see what effect
> > that has?
> >
> > Can I suggest:
> >
> > for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
> > do
> > dd if=$d of=/dev/null bs=1M count=100
> > done
> >
> > and report the times.
>
> tested it with 1gb instead of 100 mb
>
> sdg
> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
> sdj
> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
> sdk
> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
> /dev/md6
> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
> /dev/mapper/data
> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
> bigfile
> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>
> so the crypto indeed slows it down (and i'm surprised that it's that
> bad because i've read
> it's not a big hit on current CPUs and the X2 isn't new but not that
> old) but still read speed
> from md6 is worse than from one drive alone
I suspect you will see that improve when you add another drive, so that it
isn't running degraded.
NeilBrown
* Re: Poor write performance with write-intent bitmap?
From: Neil Brown @ 2009-04-21 5:50 UTC (permalink / raw)
To: John Robinson; +Cc: Linux RAID
On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
>
> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large
> chunk size caused a reboot! There's nothing in the log, and I didn't see
> the console. I still have my 32M chunksize but I don't want to try that
> again in a hurry :-)
That's a worry... I cannot easily reproduce it. If it happens again
and you get any more detail, I'm sure you'll let me know.
Thanks,
NeilBrown
* Re: Poor write performance with write-intent bitmap?
From: John Robinson @ 2009-04-21 12:05 UTC (permalink / raw)
To: Linux RAID
On 21/04/2009 06:50, Neil Brown wrote:
> On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
>> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large
>> chunk size caused a reboot! There's nothing in the log, and I didn't see
>> the console. I still have my 32M chunksize but I don't want to try that
>> again in a hurry :-)
>
> That's a worry... I cannot easily reproduce it. If it happens again
> and you get any more detail, I'm sure you'll let me know.
Sure will. For the moment I have something that looks slightly
inconsistent: mdadm --detail shows no bitmap after the crash:
# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Mon Jul 28 15:49:09 2008
Raid Level : raid5
Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Apr 21 12:37:15 2009
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : d8c57a89:166ee722:23adec48:1574b5fc
Events : 0.6152
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
and indeed another attempt to remove the bitmap fails gently:
# mdadm --grow /dev/md1 --bitmap none
mdadm: no bitmap found on /dev/md1
However examining any of the devices making up the RAID appears to
suggest there is a bitmap:
# mdadm --examine-bitmap /dev/sda2
Filename : /dev/sda2
Magic : 6d746962
Version : 4
UUID : d8c57a89:166ee722:23adec48:1574b5fc
Events : 6148
Events Cleared : 6148
State : OK
Chunksize : 32 MB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 976655360 (931.41 GiB 1000.10 GB)
Bitmap : 29806 bits (chunks), 10 dirty (0.0%)
Is this to be expected? I would have thought it would say nothing here,
or say there's no bitmap.
Anyway, continuing my experiment: increasing the bitmap chunk size to
128MB improves my streaming write throughput even further, to 86MB/s (vs
45MB/s with the default 2MB chunk, and 81MB/s with a 32MB chunk). But it
looks like a case of diminishing returns: the chunk size is getting large
enough that there could be real work involved in recovery, and I really
ought to be testing this with some real filesystem throughput, not just
streaming writes with dd.
Another `mdadm --grow /dev/md1 --bitmap none` has worked without
side-effects, but afterwards `mdadm --examine-bitmap` still shows the
most recent bitmap settings. This is mdadm 2.6.4, or more specifically
mdadm-2.6.4-1.el5.x86_64.rpm.
I've now gone to a 16MB chunk size, which gives 75MB/s throughput for
streaming writes to the scratch LV: nearly 80% of the bitmap-less setup,
as opposed to less than 50% with the default chunk size. I think I'm
going to settle on that for now.
Many thanks for all your advice and assistance.
Cheers,
John.
* Re: Performance of a software raid 5
From: Johannes Segitz @ 2009-04-21 12:40 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 7:46 AM, Neil Brown <neilb@suse.de> wrote:
> I suspect you will see that improve when you add another drive that it
> isn't running degraded.
I will give it a try, and I hope you're right. I can't recreate the array
with the other drives yet, because they are currently part of another
array, which will be destroyed first. I'll try it later and then post the
results.
Johannes
* Re: Poor write performance with write-intent bitmap?
From: Bill Davidsen @ 2009-04-21 16:00 UTC (permalink / raw)
To: NeilBrown; +Cc: John Robinson, Linux RAID
NeilBrown wrote:
> On Tue, April 21, 2009 10:44 am, John Robinson wrote:
>
>> That's more like it, and no more rattling. Can I tune settings for the
>> internal bitmap, or is this something which will have improved anyway
>> since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a
>> prominent North American Enterprise Linux vendor's EL5 codebase for
>> md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was
>> quite so expensive (not that it matters much in this particular
>> application).
>>
>>
>
> I don't think newer kernels make any different to bitmap related
> performance, though there might be some general raid5 improvements since
> then.
>
> There are two tunables for bitmaps. Chuck size and delay (though the
> delay doesn't seem to be in the man page).
>
It isn't in the man page, and
mdadm --help | egrep 'bitmap|delay'
comes up empty as well. So then I looked at the strings:
strings $(type -p mdadm) | less
and I not only found it, but found that you have a vast bunch of
duplicate strings, including some which are saying the same thing but
expressed in several ways. If I might quote my offspring, "That's ugly,
dude!"
Anyway, setting "--delay N" (sec) does exactly what you predicted, not much.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
"You are disgraced professional losers. And by the way, give us our money back."
- Representative Earl Pomeroy, Democrat of North Dakota
on the A.I.G. executives who were paid bonuses after a federal bailout.
* Re: Performance of a software raid 5
From: Corey Hickey @ 2009-04-21 18:56 UTC (permalink / raw)
To: Johannes Segitz; +Cc: linux-raid
Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>> Have you done any testing without the crypto layer to see what effect
>> that has?
>>
>> Can I suggest:
>>
>> for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>> do
>> dd if=$d of=/dev/null bs=1M count=100
>> done
>>
>> and report the times.
>
> tested it with 1gb instead of 100 mb
>
> sdg
> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
> sdj
> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
> sdk
> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
> /dev/md6
> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
> /dev/mapper/data
> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
> bigfile
> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>
> so the crypto indeed slows it down (and i'm surprised that it's that
> bad because i've read
> it's not a big hit on current CPUs and the X2 isn't new but not that
> old) but still read speed
> from md6 is worse than from one drive alone
If it helps, some recent dd benchmarks I did indicate that twofish is
about 25% faster than aes on my Athlon64.
Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2
Both aes and twofish are using the asm implementations according to
/proc/crypto.
All numbers are in MB/s; average of three tests for a 512MB dd
read/write to the encrypted device.
read write
aes 69.4 61.0
twofish 86.8 76.6
aes-cbc-essiv:sha256 65.1 56.3
twofish-cbc-essiv:sha256 82.6 73.5
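For anyone who wants to repeat this, the numbers above can be reproduced with a loop along these lines (a sketch only: the scratch device path and cipher list are examples, it needs root, and the write pass destroys whatever is on the device):

```shell
#!/bin/sh
# Rough dm-crypt cipher benchmark sketch. DEV is a scratch block
# device whose contents WILL be destroyed; all names are examples.
DEV=/dev/vg0/bench

for cipher in aes-cbc-essiv:sha256 twofish-cbc-essiv:sha256; do
    echo "=== $cipher ==="
    # Map the scratch device in plain mode with the cipher under test
    # (throwaway passphrase, this is only a benchmark).
    echo "dummy-passphrase" | cryptsetup create bench "$DEV" -c "$cipher"
    # Write test: 512MB through the crypto layer, bypassing the page cache.
    dd if=/dev/zero of=/dev/mapper/bench bs=1M count=512 oflag=direct
    # Read test: drop caches first so we measure the device, not RAM.
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/mapper/bench of=/dev/null bs=1M count=512
    cryptsetup remove bench
done
```

Averaging three runs per cipher, as above, smooths out the variance from drive zoning and background I/O.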
-Corey

* Re: Performance of a software raid 5
2009-04-21 1:19 ` NeilBrown
2009-04-21 2:04 ` Johannes Segitz
@ 2009-04-22 9:07 ` Goswin von Brederlow
1 sibling, 0 replies; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22 9:07 UTC (permalink / raw)
To: NeilBrown; +Cc: Johannes Segitz, linux-raid
"NeilBrown" <neilb@suse.de> writes:
> On Tue, April 21, 2009 11:05 am, Johannes Segitz wrote:
>> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
>> <john.robinson@anonymous.org.uk> wrote:
>>> There's no redundancy but it's still the RAID-5 4-disc layout with 3
>>> data
>>> and 1 parity, the parity on a different disc in each stripe. In your
>>> case
>>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>>> course the parity is having to be calculated when you're writing, and
>>> whatever would be written to your missing disc is being discarded.
>>
>> you're right, i didn't think of that. But calculating an xor isn't really
>> a big deal (especially with the aes on top of it) so i still can't see why
>> it's so slow
>
> Large sequential writes to a degraded RAID5 will be the same speed as to a
> non-degraded RAID5. Smaller random write can still be slower as the
> amount of pre-reading can increase.
>
> Reads from a degraded raid5 will be slower not because of the XOR, but
> because of needing to read all that extra data to feed in to the XOR.
But when doing large sequential reads those blocks have already been
loaded or would be loaded next anyway. The number of blocks read
should be exactly the same.
> Have you done any testing without the crypto layer to see what effect
> that has?
>
> Can I suggest:
>
> for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
> do
> dd if=$d of=/dev/null bs=1M count=100
> done
>
> and report the times.
>
> NeilBrown
MfG
Goswin
* Re: Poor write performance with write-intent bitmap?
2009-04-21 2:13 ` John Robinson
2009-04-21 5:50 ` Neil Brown
@ 2009-04-22 9:16 ` Goswin von Brederlow
2009-04-22 12:41 ` John Robinson
1 sibling, 1 reply; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22 9:16 UTC (permalink / raw)
To: John Robinson; +Cc: NeilBrown, Linux RAID
John Robinson <john.robinson@anonymous.org.uk> writes:
> Can't do that, my root filesystem is on the RAID-5, and part of the
> reason for wanting the bitmap is because the md can't be stopped while
> shutting down, so it was always wanting to resync at startup, which is
> rather tedious.
Normal shutdown should put the raid in read-only mode as last step. At
least Debian does that. That way even a mounted raid will be clean
after reboot.
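For a halt script that wants to force this explicitly rather than rely on the kernel, mdadm can switch an array read-only once nothing holds it open for writing (a sketch; /dev/md6 is just an example name):

```shell
# Mark the array clean by switching it read-only at shutdown.
# This fails if a filesystem still has the device open read-write.
mdadm --readonly /dev/md6
# Verify: the state line should now report read-only.
mdadm --detail /dev/md6 | grep -i state
```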
I would also suggest restructuring your system like this:
sdX1 1GB raid1 / (+/boot)
sdX2 rest raid5 lvm with /usr, /var, /home, ...
Both / and /usr can usually be mounted read-only, preventing any filesystem
corruption and raid resyncs in that part of the raid.
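In mdadm/LVM terms that layout might look roughly like this for three drives (a sketch, not a recipe: device names, sizes and the VG/LV names are made up, and --create destroys existing data):

```shell
# sdX1: small raid1 for / (+/boot) mirrored across all three drives
mdadm --create /dev/md0 --level=1 --raid-devices=3 \
    /dev/sda1 /dev/sdb1 /dev/sdc1
# sdX2: the rest as raid5, with LVM on top for /usr, /var, /home, ...
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
    /dev/sda2 /dev/sdb2 /dev/sdc2
pvcreate /dev/md1
vgcreate vg0 /dev/md1
lvcreate -L 10G -n usr  vg0
lvcreate -L 10G -n var  vg0
lvcreate -l 100%FREE -n home vg0
```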
MfG
Goswin
* Re: Performance of a software raid 5
2009-04-21 18:56 ` Corey Hickey
@ 2009-04-22 12:29 ` Bill Davidsen
2009-04-22 22:32 ` Corey Hickey
0 siblings, 1 reply; 30+ messages in thread
From: Bill Davidsen @ 2009-04-22 12:29 UTC (permalink / raw)
To: Corey Hickey; +Cc: Johannes Segitz, linux-raid
Corey Hickey wrote:
> Johannes Segitz wrote:
>
>> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>>
>>> Have you done any testing without the crypto layer to see what effect
>>> that has?
>>>
>>> Can I suggest:
>>>
>>> for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>>> do
>>> dd if=$d of=/dev/null bs=1M count=100
>>> done
>>>
>>> and report the times.
>>>
>> tested it with 1gb instead of 100 mb
>>
>> sdg
>> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
>> sdj
>> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
>> sdk
>> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
>> /dev/md6
>> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
>> /dev/mapper/data
>> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
>> bigfile
>> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>>
>> so the crypto indeed slows it down (and i'm surprised that it's that
>> bad because i've read
>> it's not a big hit on current CPUs and the X2 isn't new but not that
>> old) but still read speed
>> from md6 is worse than from one drive alone
>>
>
> If it helps, some recent dd benchmarks I did indicate that twofish is
> about 25% faster than aes on my Athlon64.
>
> Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2
>
> Both aes and twofish are using the asm implementations according to
> /proc/crypto.
>
> All numbers are in MB/s; average of three tests for a 512MB dd
> read/write to the encrypted device.
>
> read write
> aes 69.4 61.0
> twofish 86.8 76.6
> aes-cbc-essiv:sha256 65.1 56.3
> twofish-cbc-essiv:sha256 82.6 73.5
>
Good info, but was the CPU maxed or was something else the limiting factor?
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
"You are disgraced professional losers. And by the way, give us our money back."
- Representative Earl Pomeroy, Democrat of North Dakota
on the A.I.G. executives who were paid bonuses after a federal bailout.
* Re: Poor write performance with write-intent bitmap?
2009-04-22 9:16 ` Goswin von Brederlow
@ 2009-04-22 12:41 ` John Robinson
2009-04-22 14:02 ` Goswin von Brederlow
2009-04-22 14:21 ` Andre Noll
0 siblings, 2 replies; 30+ messages in thread
From: John Robinson @ 2009-04-22 12:41 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Linux RAID
On 22/04/2009 10:16, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>> Can't do that, my root filesystem is on the RAID-5, and part of the
>> reason for wanting the bitmap is because the md can't be stopped while
>> shutting down, so it was always wanting to resync at startup, which is
>> rather tedious.
>
> Normal shutdown should put the raid in read-only mode as last step. At
> least Debian does that. That way even a mounted raid will be clean
> after reboot.
Yes, I would have thought it should as well. But I've just looked at
CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to
switch md devices to read-only. Of course the root filesystem has gone
read-only but as we know that doesn't mean the device underneath it gets
told that. In particular we know that ext3 normally opens its device
read-write even when you're mounting the filesystem read-only (iirc it's
so it can replay the journal).
Another issue might be the LVM layer; does that need to be stopped or
switched to read-only too?
> I would also suggest restructuring your system like this:
>
> sdX1 1GB raid1 / (+/boot)
> sdX2 rest raid5 lvm with /usr, /var, /home, ...
>
> Both / and /usr can usually be mounted read-only, preventing any filesystem
> corruption and raid resyncs in that part of the raid.
I did do this multiple partition/LV thing once upon a time, but I got
fed up with having to resize things when one partition was full and
others empty. The machine is primarily a fileserver and Xen host, so the
dom0 only has 40GB of its own, and I couldn't be bothered splitting that
up. Having said all this, your suggestion is a good one, it's just my
preference to have it otherwise :-)
Cheers,
John.
* Re: Poor write performance with write-intent bitmap?
2009-04-22 12:41 ` John Robinson
@ 2009-04-22 14:02 ` Goswin von Brederlow
2009-04-23 7:48 ` John Robinson
2009-04-22 14:21 ` Andre Noll
1 sibling, 1 reply; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22 14:02 UTC (permalink / raw)
To: John Robinson; +Cc: Goswin von Brederlow, Linux RAID
John Robinson <john.robinson@anonymous.org.uk> writes:
> On 22/04/2009 10:16, Goswin von Brederlow wrote:
>> John Robinson <john.robinson@anonymous.org.uk> writes:
>>> Can't do that, my root filesystem is on the RAID-5, and part of the
>>> reason for wanting the bitmap is because the md can't be stopped while
>>> shutting down, so it was always wanting to resync at startup, which is
>>> rather tedious.
>>
>> Normal shutdown should put the raid in read-only mode as last step. At
>> least Debian does that. That way even a mounted raid will be clean
>> after reboot.
>
> Yes, I would have thought it should as well. But I've just looked at
> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to
> switch md devices to read-only. Of course the root filesystem has gone
> read-only but as we know that doesn't mean the device underneath it
> gets told that. In particular we know that ext3 normally opens its
> device read-write even when you're mounting the filesystem read-only
> (iirc it's so it can replay the journal).
>
> Another issue might be the LVM layer; does that need to be stopped or
> switched to read-only too?
Debian does
/sbin/vgchange -aln --ignorelockingfailure || return 2
before S60mdadm-raid, S60umountroot and S90reboot.
>> I would also suggest restructuring your system like this:
>>
>> sdX1 1GB raid1 / (+/boot)
>> sdX2 rest raid5 lvm with /usr, /var, /home, ...
>>
>> Both / and /usr can usually be mounted read-only, preventing any filesystem
>> corruption and raid resyncs in that part of the raid.
>
> I did do this multiple partition/LV thing once upon a time, but I got
> fed up with having to resize things when one partition was full and
> others empty. The machine is primarily a fileserver and Xen host, so
> the dom0 only has 40GB of its own, and I couldn't be bothered
> splitting that up. Having said all this, your suggestion is a good
> one, it's just my preference to have it otherwise :-)
>
> Cheers,
>
> John.
I've been using a 1GB / for years and years now so that won't be a
problem. As for the rest, one can also bind mount /usr, /var, /home to
/mnt/space/* respectively, i.e. have just 2 partitions (/ and
everything else).
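That bind-mount arrangement would look something like this in /etc/fstab (a sketch; the device and mount paths are examples):

```shell
# /etc/fstab sketch: one big filesystem on /mnt/space, with the usual
# top-level trees bind-mounted out of it. Paths are examples only.
/dev/vg0/space   /mnt/space   ext3   defaults   0 2
/mnt/space/usr   /usr         none   bind       0 0
/mnt/space/var   /var         none   bind       0 0
/mnt/space/home  /home        none   bind       0 0
```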
Especially for Xen hosts I find LVM very useful. It makes it easy to
create new logical volumes for new Xen domains.
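For example (the volume group and domain names here are made up):

```shell
# Carve disk and swap for a new Xen guest straight out of the VG.
lvcreate -L 10G -n domu-web-disk vg0
lvcreate -L 1G  -n domu-web-swap vg0
# The new LVs can then be referenced from the domU config, e.g.:
#   disk = [ 'phy:/dev/vg0/domu-web-disk,xvda,w',
#            'phy:/dev/vg0/domu-web-swap,xvdb,w' ]
```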
MfG
Goswin
* Re: Poor write performance with write-intent bitmap?
2009-04-22 12:41 ` John Robinson
2009-04-22 14:02 ` Goswin von Brederlow
@ 2009-04-22 14:21 ` Andre Noll
2009-04-23 8:04 ` John Robinson
1 sibling, 1 reply; 30+ messages in thread
From: Andre Noll @ 2009-04-22 14:21 UTC (permalink / raw)
To: John Robinson; +Cc: Goswin von Brederlow, Linux RAID
On 13:41, John Robinson wrote:
> >Normal shutdown should put the raid in read-only mode as last step. At
> >least Debian does that. That way even a mounted raid will be clean
> >after reboot.
>
> Yes, I would have thought it should as well. But I've just looked at
> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to
> switch md devices to read-only.
There's no need to do that in the shutdown script as the kernel will
switch all arrays to read-only mode on halt/reboot.
Moreover, as raid arrays are automatically marked clean if no writes
are pending for some small time period, a simple "sync; sleep 1"
at the end of the shutdown script is usually enough to have a clean
array during the next boot.
An alternative way to deal with this issue is to not have a root file
system at all but to mount/link each top level directory separately.
This allows all md arrays to be stopped cleanly.
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
* Re: Performance of a software raid 5
2009-04-22 12:29 ` Bill Davidsen
@ 2009-04-22 22:32 ` Corey Hickey
0 siblings, 0 replies; 30+ messages in thread
From: Corey Hickey @ 2009-04-22 22:32 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Johannes Segitz, linux-raid
Bill Davidsen wrote:
> Corey Hickey wrote:
>> Johannes Segitz wrote:
>>
>>> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>>>
>>>> Have you done any testing without the crypto layer to see what effect
>>>> that has?
>>>>
>>>> Can I suggest:
>>>>
>>>> for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>>>> do
>>>> dd if=$d of=/dev/null bs=1M count=100
>>>> done
>>>>
>>>> and report the times.
>>>>
>>> tested it with 1gb instead of 100 mb
>>>
>>> sdg
>>> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
>>> sdj
>>> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
>>> sdk
>>> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
>>> /dev/md6
>>> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
>>> /dev/mapper/data
>>> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
>>> bigfile
>>> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>>>
>>> so the crypto indeed slows it down (and i'm surprised that it's that
>>> bad because i've read
>>> it's not a big hit on current CPUs and the X2 isn't new but not that
>>> old) but still read speed
>>> from md6 is worse than from one drive alone
>>>
>> If it helps, some recent dd benchmarks I did indicate that twofish is
>> about 25% faster than aes on my Athlon64.
>>
>> Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2
>>
>> Both aes and twofish are using the asm implementations according to
>> /proc/crypto.
>>
>> All numbers are in MB/s; average of three tests for a 512MB dd
>> read/write to the encrypted device.
>>
>> read write
>> aes 69.4 61.0
>> twofish 86.8 76.6
>> aes-cbc-essiv:sha256 65.1 56.3
>> twofish-cbc-essiv:sha256 82.6 73.5
no encryption 237 131
>>
>
> Good info, but was the CPU maxed or was something else the limiting factor?
To be honest, I didn't check when I benchmarked, but the underlying
device is much faster. I added the numbers to the table above. This is
for an md RAID-0 of two 1TB Samsung drives. I don't know why the write
speed for the RAID-0 is so much slower, except that it's not md's fault;
writing to the individual drives is slower, too. I would have
investigated more, but, at the time, I really wanted to get my computer
operational again. :)
That might be lowering my encrypted write speeds a bit relative to the
read speeds, but, even if so, I think it would affect the faster of the
two ciphers more than the slower--and twofish still leads by a
significant margin.
Also, to the original poster:
Check which crypto drivers in your kernel have ASM implementations loaded:
$ grep asm /proc/crypto
AES, twofish, and salsa20 are available.
-Corey
* Re: Poor write performance with write-intent bitmap?
2009-04-22 14:02 ` Goswin von Brederlow
@ 2009-04-23 7:48 ` John Robinson
0 siblings, 0 replies; 30+ messages in thread
From: John Robinson @ 2009-04-23 7:48 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Linux RAID
On 22/04/2009 15:02, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>> Another issue might be the LVM layer; does that need to be stopped or
>> switched to read-only too?
>
> Debian does
>
> /sbin/vgchange -aln --ignorelockingfailure || return 2
>
> before S60mdadm-raid, S60umountroot and S90reboot.
But that's not going to switch any VG with a still-mounted filesystem
(e.g. /) to read-only or make it go away; it's going to fail. Still
probably a good idea for other circumstances, though.
> I've been using a 1GB / for years and years now so that won't be a
> problem. As for the rest one can also bind mount /usr, /var, /home to
> /mnt/space/* respectively. I.e. have just 2 (/ and everything else)
> partitions.
Well, I have just 2, /boot and everything else, but I might in the
future switch to your suggestion.
> Especially for Xen hosts I find LVM very useful. It makes it easy to
> create new logical volumes for new Xen domains.
My thoughts exactly :-)
Many thanks,
John.
* Re: Poor write performance with write-intent bitmap?
2009-04-22 14:21 ` Andre Noll
@ 2009-04-23 8:04 ` John Robinson
2009-04-23 20:23 ` Goswin von Brederlow
0 siblings, 1 reply; 30+ messages in thread
From: John Robinson @ 2009-04-23 8:04 UTC (permalink / raw)
To: Linux RAID; +Cc: Goswin von Brederlow
On 22/04/2009 15:21, Andre Noll wrote:
> On 13:41, John Robinson wrote:
>>> Normal shutdown should put the raid in read-only mode as last step. At
>>> least Debian does that. That way even a mounted raid will be clean
>>> after reboot.
>> Yes, I would have thought it should as well. But I've just looked at
>> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to
>> switch md devices to read-only.
>
> There's no need to do that in the shutdown script as the kernel will
> switch all arrays to read-only mode on halt/reboot.
>
> Moreover, as raid arrays are automatically marked clean if no writes
> are pending for some small time period, a simple "sync; sleep 1"
> at the end of the shutdown script is usually enough to have a clean
> array during the next boot.
But that's still only "usually". Considering the enormous efforts taken
to unmount filesystems (or remount them read-only) so they're certain to
be clean at the next startup, it seems odd to settle for "usually"...
and CentOS 5 doesn't even appear to do that.
Goswin, please can you tell me what command Debian uses? I think I want
to combine both of these into my systems' halt scripts.
Cheers,
John.
* Re: Poor write performance with write-intent bitmap?
2009-04-23 8:04 ` John Robinson
@ 2009-04-23 20:23 ` Goswin von Brederlow
0 siblings, 0 replies; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-23 20:23 UTC (permalink / raw)
To: John Robinson; +Cc: Linux RAID, Goswin von Brederlow
John Robinson <john.robinson@anonymous.org.uk> writes:
> On 22/04/2009 15:21, Andre Noll wrote:
>> On 13:41, John Robinson wrote:
>>>> Normal shutdown should put the raid in read-only mode as last step. At
>>>> least Debian does that. That way even a mounted raid will be clean
>>>> after reboot.
>>> Yes, I would have thought it should as well. But I've just looked
>>> at CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try
>>> to switch md devices to read-only.
>>
>> There's no need to do that in the shutdown script as the kernel will
>> switch all arrays to read-only mode on halt/reboot.
>>
>> Moreover, as raid arrays are automatically marked clean if no writes
>> are pending for some small time period, a simple "sync; sleep 1"
>> at the end of the shutdown script is usually enough to have a clean
>> array during the next boot.
>
> But that's still only "usually". Considering the enormous efforts
> taken to unmount filesystems (or remount them read-only) so they're
> certain to be clean at the next startup, it seems odd to settle for
> "usually"... and CentOS 5 doesn't even appear to do that.
>
> Goswin, please can you tell me what command Debian uses? I think I
> want to combine both of these into my systems' halt scripts.
>
> Cheers,
>
> John.
On halt I do see a message about the raid being switched to read-only
but I don't see any command that would do that. So I do believe this
is, as Andre says, the kernel switching the raid read-only before
halting.
Maybe your kernel is too old to have this feature?
MfG
Goswin
* Re: Performance of a software raid 5
2009-04-21 12:40 ` Johannes Segitz
@ 2009-04-24 13:49 ` Johannes Segitz
0 siblings, 0 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-24 13:49 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 2:40 PM, Johannes Segitz
<johannes.segitz@gmail.com> wrote:
> I will give it a try and i hope you're right since i can't recreate the array
> when i used the other drives because currently they are used in another
> array which then will be destroyed. I'll try it later and then post the results
One of the drives is failing so i have to wait till i get a
replacement. Currently
the replacement drive is connected via USB so a benchmark wouldn't make
much sense. I'll retry the dd test when the new drive is built in.
Johannes
* Re: Performance of a software raid 5
2009-04-21 5:46 ` Neil Brown
2009-04-21 12:40 ` Johannes Segitz
@ 2009-04-26 17:03 ` Johannes Segitz
1 sibling, 0 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-26 17:03 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 21, 2009 at 7:46 AM, Neil Brown <neilb@suse.de> wrote:
> I suspect you will see that improve when you add another drive that it
> isn't running degraded.
Well, i didn't, but it doesn't really matter since the crypto layer slows
it down to a crawl.
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md6 : active raid5 sdd1[0] sdc1[1] md7[3] md8[4] sde1[2]
3907039232 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 1/15 pages [4KB], 32768KB chunk
md7 : active raid0 sdg1[0] sda1[1]
976767744 blocks 128k chunks
md8 : active raid0 sdh1[1] sdb1[0]
976767744 blocks 128k chunks
So everything is okay now, no missing drive and the bad drive is now gone.
for d in /dev/sd[cde]1 /dev/md[678] /dev/mapper/daten bigfile
do
echo $d
dd if=$d of=/dev/null bs=1M count=1000
done
/dev/sdc1
1048576000 bytes (1.0 GB) copied, 6.13302 s, 171 MB/s
/dev/sdd1
1048576000 bytes (1.0 GB) copied, 12.2261 s, 85.8 MB/s
/dev/sde1
1048576000 bytes (1.0 GB) copied, 11.8026 s, 88.8 MB/s
/dev/md6
1048576000 bytes (1.0 GB) copied, 6.42977 s, 163 MB/s
/dev/md7
1048576000 bytes (1.0 GB) copied, 9.51655 s, 110 MB/s
/dev/md8
1048576000 bytes (1.0 GB) copied, 7.97321 s, 132 MB/s
/dev/mapper/daten
1048576000 bytes (1.0 GB) copied, 28.6309 s, 36.6 MB/s
bigfile
1048576000 bytes (1.0 GB) copied, 31.9715 s, 32.8 MB/s
So the raid works okay, although i'm not thrilled by 163 MB/s
read speed when i see what the underlying devices are capable
of. But the really bad drop seems to be the crypto, and with
http://tynne.de/linux-crypto-speed in mind it seems quite a
reasonable speed, although i'm disappointed since i expected
more. Next time i won't just assume but will do better tests beforehand.
Thanks to everyone for their help.
Johannes
* Re: Poor write performance with write-intent bitmap?
2009-04-21 12:05 ` John Robinson
@ 2009-05-22 23:00 ` Redeeman
0 siblings, 0 replies; 30+ messages in thread
From: Redeeman @ 2009-05-22 23:00 UTC (permalink / raw)
To: John Robinson; +Cc: Linux RAID
On Tue, 2009-04-21 at 13:05 +0100, John Robinson wrote:
> On 21/04/2009 06:50, Neil Brown wrote:
> > On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
> >> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large
> >> chunk size caused a reboot! There's nothing in the log, and I didn't see
> >> the console. I still have my 32M chunksize but I don't want to try that
> >> again in a hurry :-)
> >
> > That's a worry... I cannot easily reproduce it. If it happens again
> > and you get any more detail, I'm sure you'll let me know.
>
> Sure will. For the moment I have something that looks slightly
> inconsistent: mdadm --detail shows no bitmap after the crash:
> # mdadm --detail /dev/md1
> /dev/md1:
> Version : 00.90.03
> Creation Time : Mon Jul 28 15:49:09 2008
> Raid Level : raid5
> Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
> Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Tue Apr 21 12:37:15 2009
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> UUID : d8c57a89:166ee722:23adec48:1574b5fc
> Events : 0.6152
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync /dev/sda2
> 1 8 18 1 active sync /dev/sdb2
> 2 8 34 2 active sync /dev/sdc2
>
> and indeed another attempt to remove the bitmap fails gently:
> # mdadm --grow /dev/md1 --bitmap none
> mdadm: no bitmap found on /dev/md1
>
> However examining any of the devices making up the RAID appears to
> suggest there is a bitmap:
> # mdadm --examine-bitmap /dev/sda2
> Filename : /dev/sda2
> Magic : 6d746962
> Version : 4
> UUID : d8c57a89:166ee722:23adec48:1574b5fc
> Events : 6148
> Events Cleared : 6148
> State : OK
> Chunksize : 32 MB
> Daemon : 5s flush period
> Write Mode : Normal
> Sync Size : 976655360 (931.41 GiB 1000.10 GB)
> Bitmap : 29806 bits (chunks), 10 dirty (0.0%)
>
> Is this to be expected? I would have thought it would say nothing here,
> or say there's no bitmap.
Hmm, very good question, I'd like to know that as well.
<snip>
> Many thanks for all your advice and assistance.
>
> Cheers,
>
> John.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview: 30+ messages
2009-04-20 17:12 Performance of a software raid 5 Johannes Segitz
2009-04-20 23:46 ` John Robinson
2009-04-21 0:10 ` Johannes Segitz
2009-04-21 0:52 ` John Robinson
2009-04-21 1:05 ` Johannes Segitz
2009-04-21 1:12 ` John Robinson
2009-04-21 1:19 ` NeilBrown
2009-04-21 2:04 ` Johannes Segitz
2009-04-21 5:46 ` Neil Brown
2009-04-21 12:40 ` Johannes Segitz
2009-04-24 13:49 ` Johannes Segitz
2009-04-26 17:03 ` Johannes Segitz
2009-04-21 18:56 ` Corey Hickey
2009-04-22 12:29 ` Bill Davidsen
2009-04-22 22:32 ` Corey Hickey
2009-04-22 9:07 ` Goswin von Brederlow
2009-04-21 0:44 ` Poor write performance with write-intent bitmap? John Robinson
2009-04-21 1:33 ` NeilBrown
2009-04-21 2:13 ` John Robinson
2009-04-21 5:50 ` Neil Brown
2009-04-21 12:05 ` John Robinson
2009-05-22 23:00 ` Redeeman
2009-04-22 9:16 ` Goswin von Brederlow
2009-04-22 12:41 ` John Robinson
2009-04-22 14:02 ` Goswin von Brederlow
2009-04-23 7:48 ` John Robinson
2009-04-22 14:21 ` Andre Noll
2009-04-23 8:04 ` John Robinson
2009-04-23 20:23 ` Goswin von Brederlow
2009-04-21 16:00 ` Bill Davidsen