* Performance of a software raid 5
@ 2009-04-20 17:12 Johannes Segitz
  2009-04-20 23:46 ` John Robinson
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Segitz @ 2009-04-20 17:12 UTC (permalink / raw)
  To: linux-raid

Hi,

[First of all, I'm not really sure whether this is the right place. If it is
the wrong place, please just tell me what to read or where to post instead.]

I'm currently trying to create a RAID 5 from three 1 TB HDDs. The fourth HDD
is missing for now, so I get 3 TB of usable space with the array running degraded.

One hdd is connected to
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)

the other two to
04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA
Raid II Controller (rev 01)

The CPU is an AMD X2 4200+ and the system has 2 GB RAM.

The performance of the array is underwhelming.
time dd if=/dev/zero of=big_file bs=4096 count=2560000
10485760000 bytes (10 GB) copied, 187.691 s, 55.9 MB/s
dd if=/dev/zero of=big_file bs=4096 count=2560000 0.70s user 26.05s
system 14% cpu 3:08.12 total

time dd if=big_file of=/dev/null bs=4096 count=2560000
10485760000 bytes (10 GB) copied, 297.345 s, 35.3 MB/s
dd if=big_file of=/dev/null bs=4096 count=2560000 0.50s user 10.60s
system 3% cpu 4:57.35 total

So I get a write speed of roughly 56 MB/s and a read speed of 35 MB/s. The HDDs

 Model=SAMSUNG HD103UJ                         , FwRev=1AA01113,
SerialNo=S13PJDWS250990
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-3,4,5,6,7

are all the same model and each gets ~70 MB/s when used on its own.

The details for the raid device:
/dev/md6:
        Version : 00.90
  Creation Time : Sun Apr 19 22:30:23 2009
     Raid Level : raid5
     Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 6
    Persistence : Superblock is persistent

    Update Time : Mon Apr 20 14:45:32 2009
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           UUID : 584a0f66:3c075c23:9cae9464:25382498 (local to host
johannes-desktop)
         Events : 0.21396

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       1       8      145        1      active sync   /dev/sdj1
       2       8      161        2      active sync   /dev/sdk1
       3       0        0        3      removed

On top of the raid device there is a crypto layer
cryptsetup --verify-passphrase -c aes-cbc-essiv:sha256 -y -s 256
luksFormat /dev/md6
and then ext4
mkfs.ext4 -v -b 4096 -E lazy_itable_init,stride=64,stripe-width=256 -O
large_file,dir_index,extent,sparse_super,uninit_bg -m0
/dev/mapper/data
I'm using kernel 2.6.29.1.

Stride and stripe-width are not correct for the current layout; they will be
once I add another two HDDs, of which one will carry data. Can someone please
give me a hint as to why I get such bad performance, especially while reading?
I don't think it's the crypto layer, since kcryptd doesn't go over 50% CPU and
having two cores should prevent other processes from starving. Can the wrong
stride and stripe-width alone degrade performance like that? I would expect at
least 100+ MB/s for both reading and writing.
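
(For reference, the stride/stripe-width arithmetic behind those numbers, as a
sketch assuming a 256K chunk, 4K ext4 blocks and four data-bearing members per
stripe in the final array:

    stride       = chunk size / block size = 256K / 4K = 64
    stripe-width = stride * data drives    = 64 * 4    = 256

which is where the -E stride=64,stripe-width=256 above comes from.)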

Thanks


* Re: Performance of a software raid 5
  2009-04-20 17:12 Performance of a software raid 5 Johannes Segitz
@ 2009-04-20 23:46 ` John Robinson
  2009-04-21  0:10   ` Johannes Segitz
  2009-04-21  0:44   ` Poor write performance with write-intent bitmap? John Robinson
  0 siblings, 2 replies; 30+ messages in thread
From: John Robinson @ 2009-04-20 23:46 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

On 20/04/2009 18:12, Johannes Segitz wrote:
> i'm currently trying to create a raid 5 out of three 1 TB hdd. For now
> there is one hdd missing so i get 3 TB of usable space.
[...]
> Stride and stripe-width will be correct when i add another two hdd of
> which one will carry data. Can someone please give me a hint why i
> could get such bad performance especially while reading?

I would have thought it's because you're running in degraded mode and 
one in 3 sectors is having to be regenerated from the parity. It still 
seems a bit slow, though.

Here I have a 3-disc RAID-5 of similar drives:

# hdparm -i /dev/sda
/dev/sda:

  Model=SAMSUNG HD103UJ                         , FwRev=1AA01112, 
SerialNo=S1PVJ1CQ602164
  Config={ Fixed }
  RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
  BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?0?
  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
  PIO modes:  pio0 pio1 pio2 pio3 pio4
  DMA modes:  mdma0 mdma1 mdma2
  UDMA modes: udma0 udma1 udma2
  AdvancedPM=yes: disabled (255) WriteCache=enabled
  Drive conforms to: unknown:  ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 
ATA/ATAPI-6 ATA/ATAPI-7

# mdadm --detail /dev/md1
/dev/md1:
         Version : 00.90.03
   Creation Time : Mon Jul 28 15:49:09 2008
      Raid Level : raid5
      Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
   Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
    Raid Devices : 3
   Total Devices : 3
Preferred Minor : 1
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Tue Apr 21 00:35:26 2009
           State : active
  Active Devices : 3
Working Devices : 3
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 256K

            UUID : d8c57a89:166ee722:23adec48:1574b5fc
          Events : 0.6134

     Number   Major   Minor   RaidDevice State
        0       8        2        0      active sync   /dev/sda2
        1       8       18        1      active sync   /dev/sdb2
        2       8       34        2      active sync   /dev/sdc2

It has LVM and an ext3 filesystem on it. Here are my timings:

# time dd if=/dev/zero of=big_file bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 264.448 seconds, 39.7 MB/s

real    4m25.740s
user    0m2.272s
sys     0m34.470s

# time dd if=big_file of=/dev/null bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 53.9577 seconds, 194 MB/s

real    0m54.026s
user    0m0.556s
sys     0m4.944s

I'm not quite sure whether I should be disappointed at my writes being 
so slow. Certainly there's a lot of rattling during writing, which 
probably indicates lots of seeks to write ext3's journal. But reads are 
roughly what I expected, at about three times the single-disc throughput.

Cheers,

John.



* Re: Performance of a software raid 5
  2009-04-20 23:46 ` John Robinson
@ 2009-04-21  0:10   ` Johannes Segitz
  2009-04-21  0:52     ` John Robinson
  2009-04-21  0:44   ` Poor write performance with write-intent bitmap? John Robinson
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Segitz @ 2009-04-21  0:10 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 1:46 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> I would have thought it's because you're running in degraded mode and one in
> 3 sectors is having to be regenerated from the parity. It still seems a bit
> slow, though.

I don't think that's the problem. The data is all there, just without
redundancy, so I can't see why anything would need to be calculated.

> I'm not quite sure whether I should be disappointed at my writes being so
> slow. Certainly there's a lot of rattling during writing, which probably
> indicates lots of seeks to write ext3's journal. But reads are roughly what
> I expected, at about three times the single-disc throughput.

200 MB/s reads would be nice, but I expected quite a bit more write speed too.
I know you don't get a linear speedup, but falling behind the normal
performance of a single drive doesn't seem right.

Johannes
Btw.: the controller is on PCIe, so there is no bottleneck there.


* Poor write performance with write-intent bitmap?
  2009-04-20 23:46 ` John Robinson
  2009-04-21  0:10   ` Johannes Segitz
@ 2009-04-21  0:44   ` John Robinson
  2009-04-21  1:33     ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: John Robinson @ 2009-04-21  0:44 UTC (permalink / raw)
  To: Linux RAID

On 21/04/2009 00:46, I wrote:
[...]
> # time dd if=/dev/zero of=big_file bs=4096 count=2560000
> 2560000+0 records in
> 2560000+0 records out
> 10485760000 bytes (10 GB) copied, 264.448 seconds, 39.7 MB/s
[...]
> I'm not quite sure whether I should be disappointed at my writes being 
> so slow. Certainly there's a lot of rattling during writing, which 
> probably indicates lots of seeks to write ext3's journal.

No, that's not it. Using a scratch logical volume over the md RAID-5 
isn't much better:

# time dd if=/dev/zero of=/dev/mapper/vg0-scratch bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 230.036 seconds, 45.6 MB/s

real    3m50.077s
user    0m1.608s
sys     0m11.097s

It still rattles a lot, suggesting a lot of seeking. Now if I turn off 
the bitmap and try again:
# mdadm --grow /dev/md1 --bitmap=none
# time dd if=/dev/zero of=/dev/mapper/vg0-scratch bs=4096 count=2560000
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 110.17 seconds, 95.2 MB/s

real    1m50.346s
user    0m1.900s
sys     0m13.537s

That's more like it, and no more rattling. Can I tune settings for the 
internal bitmap, or is this something which will have improved anyway 
since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a 
prominent North American Enterprise Linux vendor's EL5 codebase for 
md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was 
quite so expensive (not that it matters much in this particular 
application).

Cheers,

John.



* Re: Performance of a software raid 5
  2009-04-21  0:10   ` Johannes Segitz
@ 2009-04-21  0:52     ` John Robinson
  2009-04-21  1:05       ` Johannes Segitz
  0 siblings, 1 reply; 30+ messages in thread
From: John Robinson @ 2009-04-21  0:52 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

On 21/04/2009 01:10, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 1:46 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> I would have thought it's because you're running in degraded mode and one in
>> 3 sectors is having to be regenerated from the parity. It still seems a bit
>> slow, though.
> 
> i don't think that that is a problem. The data is there without
> redundancy so i can't see
> how there would be the need to calculate anything

There's no redundancy but it's still the RAID-5 4-disc layout with 3 
data and 1 parity, the parity on a different disc in each stripe. In 
your case with a missing disc, for 3 stripes in 4 you have 2 data and 1 
parity. Of course the parity is having to be calculated when you're 
writing, and whatever would be written to your missing disc is being 
discarded.
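
To illustrate, here is a rough sketch of a 4-disc left-symmetric RAID-5 (the
chunk numbering is only illustrative), with your removed device as the last
column:

              disc0   disc1   disc2   disc3 (missing)
  stripe 0:    D0      D1      D2      P
  stripe 1:    D4      D5      P       D3
  stripe 2:    D8      P       D6      D7
  stripe 3:    P       D9      D10     D11

Where the missing column held parity (1 stripe in 4), reads are unaffected;
where it held data (3 stripes in 4), that chunk has to be rebuilt by XORing
the surviving chunks of the stripe.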

On the other hand if you were using RAID-0 over 3 discs there would be 
no need to calculate anything.

Cheers,

John.


* Re: Performance of a software raid 5
  2009-04-21  0:52     ` John Robinson
@ 2009-04-21  1:05       ` Johannes Segitz
  2009-04-21  1:12         ` John Robinson
  2009-04-21  1:19         ` NeilBrown
  0 siblings, 2 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-21  1:05 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> There's no redundancy but it's still the RAID-5 4-disc layout with 3 data
> and 1 parity, the parity on a different disc in each stripe. In your case
> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
> course the parity is having to be calculated when you're writing, and
> whatever would be written to your missing disc is being discarded.

You're right, I didn't think of that. But calculating an XOR isn't really
a big deal (especially compared with the AES on top of it), so I still can't
see why it's so slow.

Johannes


* Re: Performance of a software raid 5
  2009-04-21  1:05       ` Johannes Segitz
@ 2009-04-21  1:12         ` John Robinson
  2009-04-21  1:19         ` NeilBrown
  1 sibling, 0 replies; 30+ messages in thread
From: John Robinson @ 2009-04-21  1:12 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

On 21/04/2009 02:05, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> There's no redundancy but it's still the RAID-5 4-disc layout with 3 data
>> and 1 parity, the parity on a different disc in each stripe. In your case
>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>> course the parity is having to be calculated when you're writing, and
>> whatever would be written to your missing disc is being discarded.
> 
> you're right, i didn't think of that. But calculating an xor isn't really
> a big deal (especially with the aes on top of it) so i still can't see why
> it's so slow

No, nor can I, especially since your `time` output shows a very modest 
amount of system time. It may be worth trying fewer layers (i.e. no 
encryption and/or no filesystem) to eliminate them, or monitoring with 
other tools like iostat to see if you can get to the bottom of it.
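
For example, something along these lines in another terminal while the dd is
running (iostat comes with the sysstat package; the device names are just the
ones from your earlier mail):

  iostat -x -k 5                      # all devices, 5-second intervals
  iostat -x -k sdg sdj sdk md6 5      # or limited to the drives and the array

That should show whether one of the drives is unexpectedly busy or slow.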

Cheers,

John.



* Re: Performance of a software raid 5
  2009-04-21  1:05       ` Johannes Segitz
  2009-04-21  1:12         ` John Robinson
@ 2009-04-21  1:19         ` NeilBrown
  2009-04-21  2:04           ` Johannes Segitz
  2009-04-22  9:07           ` Goswin von Brederlow
  1 sibling, 2 replies; 30+ messages in thread
From: NeilBrown @ 2009-04-21  1:19 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

On Tue, April 21, 2009 11:05 am, Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> There's no redundancy but it's still the RAID-5 4-disc layout with 3
>> data
>> and 1 parity, the parity on a different disc in each stripe. In your
>> case
>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>> course the parity is having to be calculated when you're writing, and
>> whatever would be written to your missing disc is being discarded.
>
> you're right, i didn't think of that. But calculating an xor isn't really
> a big deal (especially with the aes on top of it) so i still can't see why
> it's so slow

Large sequential writes to a degraded RAID5 will be the same speed as to a
non-degraded RAID5.  Smaller random writes can still be slower, as the
amount of pre-reading can increase.

Reads from a degraded raid5 will be slower not because of the XOR, but
because of needing to read all that extra data to feed in to the XOR.

Have you done any testing without the crypto layer to see what effect
that has?

Can I suggest:

  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
  do
    dd if=$d of=/dev/null bs=1M count=100
  done

and report the times.

NeilBrown



* Re: Poor write performance with write-intent bitmap?
  2009-04-21  0:44   ` Poor write performance with write-intent bitmap? John Robinson
@ 2009-04-21  1:33     ` NeilBrown
  2009-04-21  2:13       ` John Robinson
  2009-04-21 16:00       ` Bill Davidsen
  0 siblings, 2 replies; 30+ messages in thread
From: NeilBrown @ 2009-04-21  1:33 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID

On Tue, April 21, 2009 10:44 am, John Robinson wrote:
> That's more like it, and no more rattling. Can I tune settings for the
> internal bitmap, or is this something which will have improved anyway
> since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a
> prominent North American Enterprise Linux vendor's EL5 codebase for
> md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was
> quite so expensive (not that it matters much in this particular
> application).
>

I don't think newer kernels make any difference to bitmap-related
performance, though there might have been some general raid5 improvements
since then.

There are two tunables for bitmaps: chunk size and delay (though the
delay doesn't seem to be in the man page).

Choosing a larger --bitmap-chunk size will require fewer updates to the
bitmap before writes are allowed to proceed.  However, a larger bitmap-chunk
size will also increase the amount of work needed after a crash or after a
device is re-added.
Check your current (default) chunk size with "mdadm -X /dev/sdxx" and
create a new bitmap with (say) 16 or 64 times the chunk size.
See if that makes a difference.

The delay tunable sets how quickly bits are removed from the bitmap.
This is done fairly lazily and opportunistically, so it isn't likely
to affect throughput directly.  However, if you are updating the same
area on disk periodically, and this period is a little bit more than
the timeout, you will get excessive bitmap updates that serve little
purpose.
You could try increasing this (to say 30 or 60 seconds), but I doubt
it will have much effect.
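
As a rough sketch (assuming the internal bitmap on /dev/md1; --bitmap-chunk is
given in kilobytes, so 32768 gives a 32MB bitmap chunk, and --delay is in
seconds):

  mdadm -X /dev/sda2        # check the current bitmap chunk size on a member
  mdadm --grow /dev/md1 --bitmap=none
  mdadm --grow /dev/md1 --bitmap=internal --bitmap-chunk=32768 --delay=30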

Your other option is to put the bitmap in a file on some other device.
If you have a device that is rarely used (maybe your root filesystem)
you can create an external bitmap there.  You would want to be
sure that the device storing the bitmap is always going to be
available when you start the array that the bitmap belongs to.

Maybe you could create an external bitmap in a tmpfs.... but then
it wouldn't survive a crash and so has little value.  It would be
fast though :-)
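
A sketch of the external-bitmap variant (the path here is made up; it just
needs to live on a filesystem that is available before the array is
assembled, and the same file then has to be given at assembly time, e.g. via
bitmap= on the ARRAY line in mdadm.conf):

  mdadm --grow /dev/md1 --bitmap=none
  mdadm --grow /dev/md1 --bitmap=/some/other/fs/md1-bitmap --bitmap-chunk=32768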

NeilBrown


> Cheers,
>
> John.
>



* Re: Performance of a software raid 5
  2009-04-21  1:19         ` NeilBrown
@ 2009-04-21  2:04           ` Johannes Segitz
  2009-04-21  5:46             ` Neil Brown
  2009-04-21 18:56             ` Corey Hickey
  2009-04-22  9:07           ` Goswin von Brederlow
  1 sibling, 2 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-21  2:04 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
> Have you done any testing without the crypto layer to see what effect
> that has?
>
> Can I suggest:
>
>  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>  do
>    dd if=$d of=/dev/null bs=1M count=100
>  done
>
> and report the times.

I tested it with 1 GB instead of 100 MB:

sdg
1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
sdj
1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
sdk
1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
/dev/md6
1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
/dev/mapper/data
1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
bigfile
1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s

So the crypto does indeed slow it down (and I'm surprised it's that bad,
because I've read it's not a big hit on current CPUs, and while the X2 isn't
new, it isn't that old either), but the read speed from md6 is still worse
than from a single drive alone.

Johannes


* Re: Poor write performance with write-intent bitmap?
  2009-04-21  1:33     ` NeilBrown
@ 2009-04-21  2:13       ` John Robinson
  2009-04-21  5:50         ` Neil Brown
  2009-04-22  9:16         ` Goswin von Brederlow
  2009-04-21 16:00       ` Bill Davidsen
  1 sibling, 2 replies; 30+ messages in thread
From: John Robinson @ 2009-04-21  2:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux RAID

On 21/04/2009 02:33, NeilBrown wrote:
[...]
> Choosing a larger --bitmap-chunk size will require fewer updates to the
> bitmap before writes are allowed to proceed.   However a larger bitmap-chunk
> size will also increase the amount of work needed after a crash or
> re-added device.

Ah, from my reading of the mdadm man page I thought you could only 
specify the chunk size when using an external bitmap:
--bitmap-chunk=
        Set  the  chunksize  of the bitmap. Each bit corresponds to that
        many Kilobytes of storage.  When using a file based bitmap,  the
        default  is  to  use  the  smallest  size that is at-least 4 and
        requires no more than  2^21  chunks.   When  using  an  internal
        bitmap,  the  chunksize is automatically determined to make best
        use of available space.

> Check your current (default) chunk size with "mdadm -X /dev/sdxx" and
> create a new bitmap with (say) 16 or 64 times the chunk size.
> See if that makes a difference.

It certainly does. Upping it from 2M to 32M gets me from 45MB/s to 
81MB/s on the scratch LV, and there's noticeably less seek noise.

Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large 
chunk size caused a reboot! There's nothing in the log, and I didn't see 
the console. I still have my 32M chunksize but I don't want to try that 
again in a hurry :-)

[...]
> Your other option is to put the bitmap in a file on some other device.
> If you have a device this is rarely used (maybe your root filesystem)

Can't do that, my root filesystem is on the RAID-5, and part of the 
reason for wanting the bitmap is because the md can't be stopped while 
shutting down, so it was always wanting to resync at startup, which is 
rather tedious.

> Maybe you could create an external bitmap in a tmpfs.... but then
> it wouldn't survive a crash and so has little value.  It would be
> fast though :-)

"Ooh, virtual memory! Now I can have a really big RAM disc!"

Now, it's time I checked my discs to see if I've lost data in that 
crash. `mdadm -X` is stuck saying there are 10 dirty chunks.

Cheers,

John.


* Re: Performance of a software raid 5
  2009-04-21  2:04           ` Johannes Segitz
@ 2009-04-21  5:46             ` Neil Brown
  2009-04-21 12:40               ` Johannes Segitz
  2009-04-26 17:03               ` Johannes Segitz
  2009-04-21 18:56             ` Corey Hickey
  1 sibling, 2 replies; 30+ messages in thread
From: Neil Brown @ 2009-04-21  5:46 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

On Tuesday April 21, johannes.segitz@gmail.com wrote:
> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
> > Have you done any testing without the crypto layer to see what effect
> > that has?
> >
> > Can I suggest:
> >
> >  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
> >  do
> >    dd if=$d of=/dev/null bs=1M count=100
> >  done
> >
> > and report the times.
> 
> tested it with 1gb instead of 100 mb
> 
> sdg
> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
> sdj
> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
> sdk
> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
> /dev/md6
> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
> /dev/mapper/data
> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
> bigfile
> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
> 
> so the crypto indeed slows it down (and i'm surprised that it's that
> bad because i've read
> it's not a big hit on current CPUs and the X2 isn't new but not that
> old) but still read speed
> from md6 is worse than from one drive alone

I suspect you will see that improve when you add another drive so that it
isn't running degraded.

NeilBrown


* Re: Poor write performance with write-intent bitmap?
  2009-04-21  2:13       ` John Robinson
@ 2009-04-21  5:50         ` Neil Brown
  2009-04-21 12:05           ` John Robinson
  2009-04-22  9:16         ` Goswin von Brederlow
  1 sibling, 1 reply; 30+ messages in thread
From: Neil Brown @ 2009-04-21  5:50 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID

On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
> 
> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large 
> chunk size caused a reboot! There's nothing in the log, and I didn't see 
> the console. I still have my 32M chunksize but I don't want to try that 
> again in a hurry :-)

That's a worry... I cannot easily reproduce it.  If it happens again
and you get any more detail, I'm sure you'll let me know.

Thanks,
NeilBrown


* Re: Poor write performance with write-intent bitmap?
  2009-04-21  5:50         ` Neil Brown
@ 2009-04-21 12:05           ` John Robinson
  2009-05-22 23:00             ` Redeeman
  0 siblings, 1 reply; 30+ messages in thread
From: John Robinson @ 2009-04-21 12:05 UTC (permalink / raw)
  To: Linux RAID

On 21/04/2009 06:50, Neil Brown wrote:
> On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
>> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large 
>> chunk size caused a reboot! There's nothing in the log, and I didn't see 
>> the console. I still have my 32M chunksize but I don't want to try that 
>> again in a hurry :-)
> 
> That's a worry... I cannot easily reproduce it.  If it happens again
> and you get any more detail, I'm sure you'll let me know.

Sure will. For the moment I have something that looks slightly 
inconsistent: mdadm --detail shows no bitmap after the crash:
# mdadm --detail /dev/md1
/dev/md1:
         Version : 00.90.03
   Creation Time : Mon Jul 28 15:49:09 2008
      Raid Level : raid5
      Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
   Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
    Raid Devices : 3
   Total Devices : 3
Preferred Minor : 1
     Persistence : Superblock is persistent

     Update Time : Tue Apr 21 12:37:15 2009
           State : clean
  Active Devices : 3
Working Devices : 3
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 256K

            UUID : d8c57a89:166ee722:23adec48:1574b5fc
          Events : 0.6152

     Number   Major   Minor   RaidDevice State
        0       8        2        0      active sync   /dev/sda2
        1       8       18        1      active sync   /dev/sdb2
        2       8       34        2      active sync   /dev/sdc2

and indeed another attempt to remove the bitmap fails gently:
# mdadm --grow /dev/md1 --bitmap none
mdadm: no bitmap found on /dev/md1

However examining any of the devices making up the RAID appears to 
suggest there is a bitmap:
# mdadm --examine-bitmap /dev/sda2
         Filename : /dev/sda2
            Magic : 6d746962
          Version : 4
             UUID : d8c57a89:166ee722:23adec48:1574b5fc
           Events : 6148
   Events Cleared : 6148
            State : OK
        Chunksize : 32 MB
           Daemon : 5s flush period
       Write Mode : Normal
        Sync Size : 976655360 (931.41 GiB 1000.10 GB)
           Bitmap : 29806 bits (chunks), 10 dirty (0.0%)

Is this to be expected? I would have thought it would say nothing here, 
or say there's no bitmap.

Anyway, continuing my experiment, increasing the bitmap chunk size to 
128MB improves my streaming write throughput even further, to 86MB/s (vs 
45MB/s with the default 2MB chunk, and 81MB/s with a 32MB chunk). But it 
looks like a case of diminishing returns: the chunk size is getting large 
enough that there could be real work involved in recovery, and I really 
ought to be testing this with some real filesystem throughput, not just 
streaming writes with dd.

Another `mdadm --grow /dev/md1 --bitmap none` has worked without 
side-effects, but afterwards `mdadm --examine-bitmap` still shows the 
most recent bitmap settings. This is mdadm 2.6.4, or more specifically 
mdadm-2.6.4-1.el5.x86_64.rpm.

I've now gone to a 16MB chunk size, which gives 75MB/s throughput for 
streaming writes to the scratch LV, better than 80% of the bitmap-less 
setup as opposed to less than 50% with the default chunk size, and I 
think I'm going to settle at that for now.

Many thanks for all your advice and assistance.

Cheers,

John.



* Re: Performance of a software raid 5
  2009-04-21  5:46             ` Neil Brown
@ 2009-04-21 12:40               ` Johannes Segitz
  2009-04-24 13:49                 ` Johannes Segitz
  2009-04-26 17:03               ` Johannes Segitz
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Segitz @ 2009-04-21 12:40 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 7:46 AM, Neil Brown <neilb@suse.de> wrote:
> I suspect you will see that improve when you add another drive that it
> isn't running degraded.

I will give it a try, and I hope you're right. I can't recreate the array
with the other drives yet, because they are currently part of another array
which will be destroyed first. I'll try it later and then post the results.

Johannes


* Re: Poor write performance with write-intent bitmap?
  2009-04-21  1:33     ` NeilBrown
  2009-04-21  2:13       ` John Robinson
@ 2009-04-21 16:00       ` Bill Davidsen
  1 sibling, 0 replies; 30+ messages in thread
From: Bill Davidsen @ 2009-04-21 16:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Robinson, Linux RAID

NeilBrown wrote:
> On Tue, April 21, 2009 10:44 am, John Robinson wrote:
>   
>> That's more like it, and no more rattling. Can I tune settings for the
>> internal bitmap, or is this something which will have improved anyway
>> since my kernel (2.6.18-128.1.6.el5.centos.plusxen so essentially a
>> prominent North American Enterprise Linux vendor's EL5 codebase for
>> md/raid5)? I mean, I do want the bitmap, but I hadn't realised it was
>> quite so expensive (not that it matters much in this particular
>> application).
>>
>>     
>
> I don't think newer kernels make any different to bitmap related
> performance, though there might be some general raid5 improvements since
> then.
>
> There are two tunables for bitmaps.  Chuck size and delay (though the
> delay doesn't seem to be in the man page).
>   

It isn't in the man page, and
  mdadm --help | egrep 'bitmap|delay'
comes up empty as well. So then I looked at the strings:
  strings $(type -p mdadm) | less
and I not only found it, but found that you have a vast bunch of 
duplicate strings, including some which are saying the same thing but 
expressed in several ways. If I might quote my offspring, "That's ugly, 
dude!"

Anyway, setting "--delay N" (sec) does exactly what you predicted, not much.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Performance of a software raid 5
  2009-04-21  2:04           ` Johannes Segitz
  2009-04-21  5:46             ` Neil Brown
@ 2009-04-21 18:56             ` Corey Hickey
  2009-04-22 12:29               ` Bill Davidsen
  1 sibling, 1 reply; 30+ messages in thread
From: Corey Hickey @ 2009-04-21 18:56 UTC (permalink / raw)
  To: Johannes Segitz; +Cc: linux-raid

Johannes Segitz wrote:
> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>> Have you done any testing without the crypto layer to see what effect
>> that has?
>>
>> Can I suggest:
>>
>>  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>>  do
>>    dd if=$d of=/dev/null bs=1M count=100
>>  done
>>
>> and report the times.
> 
> tested it with 1gb instead of 100 mb
> 
> sdg
> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
> sdj
> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
> sdk
> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
> /dev/md6
> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
> /dev/mapper/data
> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
> bigfile
> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
> 
> so the crypto indeed slows it down (and i'm surprised that it's that
> bad because i've read
> it's not a big hit on current CPUs and the X2 isn't new but not that
> old) but still read speed
> from md6 is worse than from one drive alone

If it helps, some recent dd benchmarks I did indicate that twofish is
about 25% faster than aes on my Athlon64.

Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2

Both aes and twofish are using the asm implementations according to
/proc/crypto.

All numbers are in MB/s; average of three tests for a 512MB dd
read/write to the encrypted device.

                                 read       write
aes                              69.4        61.0
twofish                          86.8        76.6
aes-cbc-essiv:sha256             65.1        56.3
twofish-cbc-essiv:sha256         82.6        73.5
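
Roughly, numbers like these can be gathered with something along these lines
(a sketch rather than the exact commands I ran; it overwrites the target, so
only run it against a scratch device; /dev/sdX is a placeholder):

  cryptsetup -c aes-cbc-essiv:sha256 -s 256 -d /dev/urandom create bench /dev/sdX
  dd if=/dev/mapper/bench of=/dev/null bs=1M count=512     # read test
  dd if=/dev/zero of=/dev/mapper/bench bs=1M count=512     # write test
  cryptsetup remove bench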


-Corey


* Re: Performance of a software raid 5
  2009-04-21  1:19         ` NeilBrown
  2009-04-21  2:04           ` Johannes Segitz
@ 2009-04-22  9:07           ` Goswin von Brederlow
  1 sibling, 0 replies; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22  9:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Johannes Segitz, linux-raid

"NeilBrown" <neilb@suse.de> writes:

> On Tue, April 21, 2009 11:05 am, Johannes Segitz wrote:
>> On Tue, Apr 21, 2009 at 2:52 AM, John Robinson
>> <john.robinson@anonymous.org.uk> wrote:
>>> There's no redundancy but it's still the RAID-5 4-disc layout with 3
>>> data
>>> and 1 parity, the parity on a different disc in each stripe. In your
>>> case
>>> with a missing disc, for 3 stripes in 4 you have 2 data and 1 parity. Of
>>> course the parity is having to be calculated when you're writing, and
>>> whatever would be written to your missing disc is being discarded.
>>
>> you're right, i didn't think of that. But calculating an xor isn't really
>> a big deal (especially with the aes on top of it) so i still can't see why
>> it's so slow
>
> Large sequential writes to a degraded RAID5 will be the same speed as to a
> non-degraded RAID5.  Smaller random write can still be slower as the
> amount of pre-reading can increase.
>
> Reads from a degraded raid5 will be slower not because of the XOR, but
> because of needing to read all that extra data to feed in to the XOR.

But when doing large sequential reads those blocks have already been
loaded or would be loaded next anyway. The number of blocks read
should be exactly the same.

> Have you done any testing without the crypto layer to see what effect
> that has?
>
> Can I suggest:
>
>   for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>   do
>     dd if=$d of=/dev/null bs=1M count=100
>   done
>
> and report the times.
>
> NeilBrown

MfG
        Goswin


* Re: Poor write performance with write-intent bitmap?
  2009-04-21  2:13       ` John Robinson
  2009-04-21  5:50         ` Neil Brown
@ 2009-04-22  9:16         ` Goswin von Brederlow
  2009-04-22 12:41           ` John Robinson
  1 sibling, 1 reply; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22  9:16 UTC (permalink / raw)
  To: John Robinson; +Cc: NeilBrown, Linux RAID

John Robinson <john.robinson@anonymous.org.uk> writes:

> Can't do that, my root filesystem is on the RAID-5, and part of the
> reason for wanting the bitmap is because the md can't be stopped while
> shutting down, so it was always wanting to resync at startup, which is
> rather tedious.

Normal shutdown should put the raid in read-only mode as the last step. At
least Debian does that. That way even a still-mounted raid will be clean
after a reboot.

I would also suggest restructuring your system like this:

sdX1 1GB  raid1  / (+/boot)
sdX2 rest raid5  lvm with /usr, /var, /home, ...

Both / and /usr can usually be mounted read-only, preventing any filesystem
corruption and raid resyncs in that part of the raid.
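
For three discs, that could be set up roughly like this (a sketch; device
names and sizes are only examples):

  mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
  pvcreate /dev/md1
  vgcreate vg0 /dev/md1
  lvcreate -L 20G -n usr vg0      # and similarly for /var, /home, ...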

MfG
        Goswin


* Re: Performance of a software raid 5
  2009-04-21 18:56             ` Corey Hickey
@ 2009-04-22 12:29               ` Bill Davidsen
  2009-04-22 22:32                 ` Corey Hickey
  0 siblings, 1 reply; 30+ messages in thread
From: Bill Davidsen @ 2009-04-22 12:29 UTC (permalink / raw)
  To: Corey Hickey; +Cc: Johannes Segitz, linux-raid

Corey Hickey wrote:
> Johannes Segitz wrote:
>   
>> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>>     
>>> Have you done any testing without the crypto layer to see what effect
>>> that has?
>>>
>>> Can I suggest:
>>>
>>>  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>>>  do
>>>    dd if=$d of=/dev/null bs=1M count=100
>>>  done
>>>
>>> and report the times.
>>>       
>> tested it with 1gb instead of 100 mb
>>
>> sdg
>> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
>> sdj
>> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
>> sdk
>> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
>> /dev/md6
>> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
>> /dev/mapper/data
>> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
>> bigfile
>> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>>
>> so the crypto indeed slows it down (and i'm surprised that it's that
>> bad because i've read
>> it's not a big hit on current CPUs and the X2 isn't new but not that
>> old) but still read speed
>> from md6 is worse than from one drive alone
>>     
>
> If it helps, some recent dd benchmarks I did indicate that twofish is
> about 25% faster than aes on my Athlon64.
>
> Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2
>
> Both aes and twofish are using the asm implementations according to
> /proc/crypto.
>
> All numbers are in MB/s; average of three tests for a 512MB dd
> read/write to the encrypted device.
>
>                                  read       write
> aes                              69.4        61.0
> twofish                          86.8        76.6
> aes-cbc-essiv:sha256             65.1        56.3
> twofish-cbc-essiv:sha256         82.6        73.5
>   

Good info, but was the CPU maxed or was something else the limiting factor?

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Poor write performance with write-intent bitmap?
  2009-04-22  9:16         ` Goswin von Brederlow
@ 2009-04-22 12:41           ` John Robinson
  2009-04-22 14:02             ` Goswin von Brederlow
  2009-04-22 14:21             ` Andre Noll
  0 siblings, 2 replies; 30+ messages in thread
From: John Robinson @ 2009-04-22 12:41 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Linux RAID

On 22/04/2009 10:16, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>> Can't do that, my root filesystem is on the RAID-5, and part of the
>> reason for wanting the bitmap is because the md can't be stopped while
>> shutting down, so it was always wanting to resync at startup, which is
>> rather tedious.
> 
> Normal shutdown should put the raid in read-only mode as last step. At
> least Debian does that. That way even a mounted raid will be clean
> after reboot.

Yes, I would have thought it should as well. But I've just looked at 
CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to 
switch md devices to read-only. Of course the root filesystem has gone 
read-only but as we know that doesn't mean the device underneath it gets 
told that. In particular we know that ext3 normally opens its device 
read-write even when you're mounting the filesystem read-only (iirc it's 
so it can replay the journal).

Another issue might be the LVM layer; does that need to be stopped or 
switched to read-only too?

> I would also suggest restructuring your system like this:
> 
> sdX1 1GB  raid1  / (+/boot)
> sdX2 rest raid5  lvm with /usr, /var, /home, ...
> 
> Both / and /usr can usualy be read-only preventing any filesystem
> corruption and raid resyncs in that part of the raid.

I did do this multiple partition/LV thing once upon a time, but I got 
fed up with having to resize things when one partition was full and 
others empty. The machine is primarily a fileserver and Xen host, so the 
dom0 only has 40GB of its own, and I couldn't be bothered splitting that 
up. Having said all this, your suggestion is a good one, it's just my 
preference to have it otherwise :-)

Cheers,

John.



* Re: Poor write performance with write-intent bitmap?
  2009-04-22 12:41           ` John Robinson
@ 2009-04-22 14:02             ` Goswin von Brederlow
  2009-04-23  7:48               ` John Robinson
  2009-04-22 14:21             ` Andre Noll
  1 sibling, 1 reply; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-22 14:02 UTC (permalink / raw)
  To: John Robinson; +Cc: Goswin von Brederlow, Linux RAID

John Robinson <john.robinson@anonymous.org.uk> writes:

> On 22/04/2009 10:16, Goswin von Brederlow wrote:
>> John Robinson <john.robinson@anonymous.org.uk> writes:
>>> Can't do that, my root filesystem is on the RAID-5, and part of the
>>> reason for wanting the bitmap is because the md can't be stopped while
>>> shutting down, so it was always wanting to resync at startup, which is
>>> rather tedious.
>>
>> Normal shutdown should put the raid in read-only mode as last step. At
>> least Debian does that. That way even a mounted raid will be clean
>> after reboot.
>
> Yes, I would have thought it should as well. But I've just looked at
> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to
> switch md devices to read-only. Of course the root filesystem has gone
> read-only but as we know that doesn't mean the device underneath it
> gets told that. In particular we know that ext3 normally opens its
> device read-write even when you're mounting the filesystem read-only
> (iirc it's so it can replay the journal).
>
> Another issue might be the LVM layer; does that need to be stopped or
> switched to read-only too?

Debian does

/sbin/vgchange -aln --ignorelockingfailure || return 2

before S60mdadm-raid, S60umountroot and S90reboot.


>> I would also suggest restructuring your system like this:
>>
>> sdX1 1GB  raid1  / (+/boot)
>> sdX2 rest raid5  lvm with /usr, /var, /home, ...
>>
>> Both / and /usr can usualy be read-only preventing any filesystem
>> corruption and raid resyncs in that part of the raid.
>
> I did do this multiple partition/LV thing once upon a time, but I got
> fed up with having to resize things when one partition was full and
> others empty. The machine is primarily a fileserver and Xen host, so
> the dom0 only has 40GB of its own, and I couldn't be bothered
> splitting that up. Having said all this, your suggestion is a good
> one, it's just my preference to have it otherwise :-)
>
> Cheers,
>
> John.

I've been using a 1GB / for years and years now, so that won't be a
problem. As for the rest, one can also bind mount /usr, /var, /home to
/mnt/space/* respectively, i.e. have just two partitions (/ and everything
else).

Especially for Xen hosts I find LVM very useful. It makes it easy to
create new logical volumes for new Xen domains.

MfG
        Goswin


* Re: Poor write performance with write-intent bitmap?
  2009-04-22 12:41           ` John Robinson
  2009-04-22 14:02             ` Goswin von Brederlow
@ 2009-04-22 14:21             ` Andre Noll
  2009-04-23  8:04               ` John Robinson
  1 sibling, 1 reply; 30+ messages in thread
From: Andre Noll @ 2009-04-22 14:21 UTC (permalink / raw)
  To: John Robinson; +Cc: Goswin von Brederlow, Linux RAID


On 13:41, John Robinson wrote:

> >Normal shutdown should put the raid in read-only mode as last step. At
> >least Debian does that. That way even a mounted raid will be clean
> >after reboot.
> 
> Yes, I would have thought it should as well. But I've just looked at 
> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to 
> switch md devices to read-only.

There's no need to do that in the shutdown script as the kernel will
switch all arrays to read-only mode on halt/reboot.

Moreover, as raid arrays are automatically marked clean if no writes
are pending for some small time period, a simple "sync; sleep 1"
at the end of the shutdown script is usually enough to have a clean
array during the next boot.

An alternative way to deal with this issue is to not have a root file
system at all but to mount/link each top-level directory separately.
This allows all md arrays to be stopped cleanly.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: Performance of a software raid 5
  2009-04-22 12:29               ` Bill Davidsen
@ 2009-04-22 22:32                 ` Corey Hickey
  0 siblings, 0 replies; 30+ messages in thread
From: Corey Hickey @ 2009-04-22 22:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Johannes Segitz, linux-raid

Bill Davidsen wrote:
> Corey Hickey wrote:
>> Johannes Segitz wrote:
>>   
>>> On Tue, Apr 21, 2009 at 3:19 AM, NeilBrown <neilb@suse.de> wrote:
>>>     
>>>> Have you done any testing without the crypto layer to see what effect
>>>> that has?
>>>>
>>>> Can I suggest:
>>>>
>>>>  for d in /dev/sd[gjk]1 /dev/md6 /dev/mapper/data bigfile
>>>>  do
>>>>    dd if=$d of=/dev/null bs=1M count=100
>>>>  done
>>>>
>>>> and report the times.
>>>>       
>>> tested it with 1gb instead of 100 mb
>>>
>>> sdg
>>> 1048576000 bytes (1.0 GB) copied, 9.89311 s, 106 MB/s
>>> sdj
>>> 1048576000 bytes (1.0 GB) copied, 10.094 s, 104 MB/s
>>> sdk
>>> 1048576000 bytes (1.0 GB) copied, 8.53513 s, 123 MB/s
>>> /dev/md6
>>> 1048576000 bytes (1.0 GB) copied, 11.4741 s, 91.4 MB/s
>>> /dev/mapper/data
>>> 1048576000 bytes (1.0 GB) copied, 34.4544 s, 30.4 MB/s
>>> bigfile
>>> 1048576000 bytes (1.0 GB) copied, 26.6532 s, 39.3 MB/s
>>>
>>> so the crypto indeed slows it down (and i'm surprised that it's that
>>> bad because i've read
>>> it's not a big hit on current CPUs and the X2 isn't new but not that
>>> old) but still read speed
>>> from md6 is worse than from one drive alone
>>>     
>> If it helps, some recent dd benchmarks I did indicate that twofish is
>> about 25% faster than aes on my Athlon64.
>>
>> Athlon64 3400+ 2.4 GHz, 64-bit Linux 2.6.28.2
>>
>> Both aes and twofish are using the asm implementations according to
>> /proc/crypto.
>>
>> All numbers are in MB/s; average of three tests for a 512MB dd
>> read/write to the encrypted device.
>>
>>                                  read       write
>> aes                              69.4        61.0
>> twofish                          86.8        76.6
>> aes-cbc-essiv:sha256             65.1        56.3
>> twofish-cbc-essiv:sha256         82.6        73.5

   no encryption                    237        131

>>   
> 
> Good info, but was the CPU maxed or was something else the limiting factor?

To be honest, I didn't check when I benchmarked, but the underlying
device is much faster. I added the numbers to the table above. This is
for an md RAID-0 of two 1TB Samsung drives. I don't know why the write
speed for the RAID-0 is so much slower, except that it's not md's fault;
writing to the individual drives is slower, too. I would have
investigated more, but, at the time, I really wanted to get my computer
operational again. :)

That might be lowering my encrypted write speeds a bit relative to the
read speeds, but, even if so, I think it would affect the faster of the
two ciphers more than the slower--and twofish still leads by a
significant margin.


Also, to the original poster:
Check which crypto drivers in your kernel have ASM implementations loaded:
$ grep asm /proc/crypto

AES, twofish, and salsa20 are available.


-Corey


* Re: Poor write performance with write-intent bitmap?
  2009-04-22 14:02             ` Goswin von Brederlow
@ 2009-04-23  7:48               ` John Robinson
  0 siblings, 0 replies; 30+ messages in thread
From: John Robinson @ 2009-04-23  7:48 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Linux RAID

On 22/04/2009 15:02, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>> Another issue might be the LVM layer; does that need to be stopped or
>> switched to read-only too?
> 
> Debian does
> 
> /sbin/vgchange -aln --ignorelockingfailure || return 2
> 
> before S60mdadm-raid, S60umountroot and S90reboot.

But that's not going to switch any VG with a still-mounted filesystem 
(e.g. /) to read-only or make it go away; it's going to fail. It's still 
probably a good idea for other circumstances, though.

> I've been using a 1GB / for years and years now so that won't be a
> problem. As for the rest one can also bind mount /usr, /var, /home to
> /mnt/space/* respectively. I.e. have just 2 (/ and everything else)
> partitions.

Well, I have just 2, /boot and everything else, but I might in the 
future switch to your suggestion.

> Esspecially for XEN hosts I find LVM verry usefull. Makes it easy to
> create new logical volumes for new xen domains.

My thoughts exactly :-)

Many thanks,

John.


* Re: Poor write performance with write-intent bitmap?
  2009-04-22 14:21             ` Andre Noll
@ 2009-04-23  8:04               ` John Robinson
  2009-04-23 20:23                 ` Goswin von Brederlow
  0 siblings, 1 reply; 30+ messages in thread
From: John Robinson @ 2009-04-23  8:04 UTC (permalink / raw)
  To: Linux RAID; +Cc: Goswin von Brederlow

On 22/04/2009 15:21, Andre Noll wrote:
> On 13:41, John Robinson wrote:
>>> Normal shutdown should put the raid in read-only mode as last step. At
>>> least Debian does that. That way even a mounted raid will be clean
>>> after reboot.
>> Yes, I would have thought it should as well. But I've just looked at 
>> CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try to 
>> switch md devices to read-only.
> 
> There's no need to do that in the shutdown script as the kernel will
> switch all arrays to read-only mode on halt/reboot.
> 
> Moreover, as raid arrays are automatically marked clean if no writes
> are pending for some small time period, a simple "sync; sleep 1"
> at the end of the shutdown script is usually enough to have a clean
> array during the next boot.

But that's still only "usually". Considering the enormous efforts taken 
to unmount filesystems (or remount them read-only) so they're certain to 
be clean at the next startup, it seems odd to settle for "usually"... 
and CentOS 5 doesn't even appear to do that.

Goswin, please can you tell me what command Debian uses? I think I want 
to combine both of these into my systems' halt scripts.

Cheers,

John.



* Re: Poor write performance with write-intent bitmap?
  2009-04-23  8:04               ` John Robinson
@ 2009-04-23 20:23                 ` Goswin von Brederlow
  0 siblings, 0 replies; 30+ messages in thread
From: Goswin von Brederlow @ 2009-04-23 20:23 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID, Goswin von Brederlow

John Robinson <john.robinson@anonymous.org.uk> writes:

> On 22/04/2009 15:21, Andre Noll wrote:
>> On 13:41, John Robinson wrote:
>>>> Normal shutdown should put the raid in read-only mode as last step. At
>>>> least Debian does that. That way even a mounted raid will be clean
>>>> after reboot.
>>> Yes, I would have thought it should as well. But I've just looked
>>> at CentOS 5's /etc/rc.d/halt and as far as I can see it doesn't try
>>> to switch md devices to read-only.
>>
>> There's no need to do that in the shutdown script as the kernel will
>> switch all arrays to read-only mode on halt/reboot.
>>
>> Moreover, as raid arrays are automatically marked clean if no writes
>> are pending for some small time period, a simple "sync; sleep 1"
>> at the end of the shutdown script is usually enough to have a clean
>> array during the next boot.
>
> But that's still only "usually". Considering the enormous efforts
> taken to unmount filesystems (or remount them read-only) so they're
> certain to be clean at the next startup, it seems odd to settle for
> "usually"... and CentOS 5 doesn't even appear to do that.
>
> Goswin, please can you tell me what command Debian uses? I think I
> want to combine both of these into my systems' halt scripts.
>
> Cheers,
>
> John.

On halt I do see a message about the raid being switched to read-only
but I don't see any command that would do that. So I do believe this
is, as Andre says, the kernel switching the raid read-only before
halting.

Maybe your kernel is too old to have this feature?

MfG
        Goswin


* Re: Performance of a software raid 5
  2009-04-21 12:40               ` Johannes Segitz
@ 2009-04-24 13:49                 ` Johannes Segitz
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-24 13:49 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 2:40 PM, Johannes Segitz
<johannes.segitz@gmail.com> wrote:
> I will give it a try and i hope you're right since i can't recreate the array
> when i used the other drives because currently they are used in another
> array which then will be destroyed. I'll try it later and then post the results

One of the drives is failing, so I have to wait until I get a replacement.
Currently a replacement drive is connected via USB, so a benchmark wouldn't
make much sense. I'll retry the dd test when the new drive is built in.

Johannes


* Re: Performance of a software raid 5
  2009-04-21  5:46             ` Neil Brown
  2009-04-21 12:40               ` Johannes Segitz
@ 2009-04-26 17:03               ` Johannes Segitz
  1 sibling, 0 replies; 30+ messages in thread
From: Johannes Segitz @ 2009-04-26 17:03 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 21, 2009 at 7:46 AM, Neil Brown <neilb@suse.de> wrote:
> I suspect you will see that improve when you add another drive that it
> isn't running degraded.

Well, I didn't, but it doesn't really matter, since the crypto layer slows
it down to a crawl.

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md6 : active raid5 sdd1[0] sdc1[1] md7[3] md8[4] sde1[2]
      3907039232 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 1/15 pages [4KB], 32768KB chunk

md7 : active raid0 sdg1[0] sda1[1]
      976767744 blocks 128k chunks

md8 : active raid0 sdh1[1] sdb1[0]
      976767744 blocks 128k chunks

So everything is okay now, no missing drive and the bad drive is now gone.

for d in /dev/sd[cde]1 /dev/md[678] /dev/mapper/daten bigfile
do
   echo $d
   dd if=$d of=/dev/null bs=1M count=1000
done

/dev/sdc1
1048576000 bytes (1.0 GB) copied, 6.13302 s, 171 MB/s
/dev/sdd1
1048576000 bytes (1.0 GB) copied, 12.2261 s, 85.8 MB/s
/dev/sde1
1048576000 bytes (1.0 GB) copied, 11.8026 s, 88.8 MB/s
/dev/md6
1048576000 bytes (1.0 GB) copied, 6.42977 s, 163 MB/s
/dev/md7
1048576000 bytes (1.0 GB) copied, 9.51655 s, 110 MB/s
/dev/md8
1048576000 bytes (1.0 GB) copied, 7.97321 s, 132 MB/s
/dev/mapper/daten
1048576000 bytes (1.0 GB) copied, 28.6309 s, 36.6 MB/s
bigfile
1048576000 bytes (1.0 GB) copied, 31.9715 s, 32.8 MB/s

So the raid works okay, although I'm not thrilled by 163 MB/s read speed
when I see what the underlying devices are capable of. But the really bad
drop seems to be the crypto, and with http://tynne.de/linux-crypto-speed in
mind it seems quite a reasonable speed, although I'm disappointed since I
expected more. Next time I won't just assume; I'll do better tests
beforehand.

Thanks, everyone, for your help.
Johannes


* Re: Poor write performance with write-intent bitmap?
  2009-04-21 12:05           ` John Robinson
@ 2009-05-22 23:00             ` Redeeman
  0 siblings, 0 replies; 30+ messages in thread
From: Redeeman @ 2009-05-22 23:00 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID

On Tue, 2009-04-21 at 13:05 +0100, John Robinson wrote:
> On 21/04/2009 06:50, Neil Brown wrote:
> > On Tuesday April 21, john.robinson@anonymous.org.uk wrote:
> >> Eeek! Trying to `mdadm --grow /dev/md1 --bitmap=none` from my large 
> >> chunk size caused a reboot! There's nothing in the log, and I didn't see 
> >> the console. I still have my 32M chunksize but I don't want to try that 
> >> again in a hurry :-)
> > 
> > That's a worry... I cannot easily reproduce it.  If it happens again
> > and you get any more detail, I'm sure you'll let me know.
> 
> Sure will. For the moment I have something that looks slightly 
> inconsistent: mdadm --detail shows no bitmap after the crash:
> # mdadm --detail /dev/md1
> /dev/md1:
>          Version : 00.90.03
>    Creation Time : Mon Jul 28 15:49:09 2008
>       Raid Level : raid5
>       Array Size : 1953310720 (1862.82 GiB 2000.19 GB)
>    Used Dev Size : 976655360 (931.41 GiB 1000.10 GB)
>     Raid Devices : 3
>    Total Devices : 3
> Preferred Minor : 1
>      Persistence : Superblock is persistent
> 
>      Update Time : Tue Apr 21 12:37:15 2009
>            State : clean
>   Active Devices : 3
> Working Devices : 3
>   Failed Devices : 0
>    Spare Devices : 0
> 
>           Layout : left-symmetric
>       Chunk Size : 256K
> 
>             UUID : d8c57a89:166ee722:23adec48:1574b5fc
>           Events : 0.6152
> 
>      Number   Major   Minor   RaidDevice State
>         0       8        2        0      active sync   /dev/sda2
>         1       8       18        1      active sync   /dev/sdb2
>         2       8       34        2      active sync   /dev/sdc2
> 
> and indeed another attempt to remove the bitmap fails gently:
> # mdadm --grow /dev/md1 --bitmap none
> mdadm: no bitmap found on /dev/md1
> 
> However examining any of the devices making up the RAID appears to 
> suggest there is a bitmap:
> # mdadm --examine-bitmap /dev/sda2
>          Filename : /dev/sda2
>             Magic : 6d746962
>           Version : 4
>              UUID : d8c57a89:166ee722:23adec48:1574b5fc
>            Events : 6148
>    Events Cleared : 6148
>             State : OK
>         Chunksize : 32 MB
>            Daemon : 5s flush period
>        Write Mode : Normal
>         Sync Size : 976655360 (931.41 GiB 1000.10 GB)
>            Bitmap : 29806 bits (chunks), 10 dirty (0.0%)
> 
> Is this to be expected? I would have thought it would say nothing here, 
> or say there's no bitmap.

Hmm, very good question; I'd like to know that as well.

<snip>
> Many thanks for all your advice and assistance.
> 
> Cheers,
> 
> John.
> 


