* slow 'check'
@ 2007-02-10  5:41 Eyal Lebedinsky
  2007-02-10  7:41 ` Raz Ben-Jehuda(caro)
  2007-02-10  9:25 ` Justin Piszcz
  0 siblings, 2 replies; 10+ messages in thread
From: Eyal Lebedinsky @ 2007-02-10  5:41 UTC (permalink / raw)
  To: linux-raid list

I have a six-disk RAID5 over sata. First two disks are on the mobo and last four
are on a Promise SATA-II-150-TX4. The sixth disk was added recently and I decided
to run a 'check' periodically, and started one manually to see how long it should
take. Vanilla 2.6.20.

A 'dd' test shows:

# dd if=/dev/md0 of=/dev/null bs=1024k count=10240
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)

This is good for this setup. A check shows:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  check =  0.8% (2518144/312568576) finish=2298.3min speed=2246K/sec

unused devices: <none>

which is an order of magnitude slower (the speed is per-disk, call it 13MB/s
for the six). There is no activity on the RAID. Is this expected? I assume
that the simple dd does the same amount of work (don't we check parity on
read?).

I have these tweaked at bootup:
	echo 4096 >/sys/block/md0/md/stripe_cache_size
	blockdev --setra 32768 /dev/md0

Changing the above parameters seems to not have a significant effect.

The check logs the following:

md: data-check of RAID array md0
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
md: using 128k window, over a total of 312568576 blocks.

Does it need a larger window (whatever a window is)? If so, can it
be set dynamically?

TIA

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat


* Re: slow 'check'
  2007-02-10  5:41 slow 'check' Eyal Lebedinsky
@ 2007-02-10  7:41 ` Raz Ben-Jehuda(caro)
  2007-02-10  9:57   ` Eyal Lebedinsky
  2007-02-10 20:18   ` Bill Davidsen
  2007-02-10  9:25 ` Justin Piszcz
  1 sibling, 2 replies; 10+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2007-02-10  7:41 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: Linux RAID Mailing List

On 2/10/07, Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
> I have a six-disk RAID5 over sata. First two disks are on the mobo and last four
> are on a Promise SATA-II-150-TX4. The sixth disk was added recently and I decided
> to run a 'check' periodically, and started one manually to see how long it should
> take. Vanilla 2.6.20.
>
> A 'dd' test shows:
>
> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
try dd with bs of 4x(5x256) = 5 M.

> This is good for this setup. A check shows:
>
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>      [>....................]  check =  0.8% (2518144/312568576) finish=2298.3min speed=2246K/sec
>
> unused devices: <none>
>
> which is an order of magnitude slower (the speed is per-disk, call it 13MB/s
> for the six). There is no activity on the RAID. Is this expected? I assume
> that the simple dd does the same amount of work (don't we check parity on
> read?).
>
> I have these tweaked at bootup:
>        echo 4096 >/sys/block/md0/md/stripe_cache_size
>        blockdev --setra 32768 /dev/md0
>
> Changing the above parameters seems to not have a significant effect.
The stripe cache size is less effective than in previous versions
of raid5, since in some cases it is bypassed.
Why do you check random access to the raid
and not sequential access?
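
One way to see whether the stripe cache is in play at all during the check --
assuming your kernel exposes this read-only attribute next to
stripe_cache_size -- is to watch:

# cat /sys/block/md0/md/stripe_cache_active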

> The check logs the following:
>
> md: data-check of RAID array md0
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> md: using 128k window, over a total of 312568576 blocks.
>
> Does it need a larger window (whatever a window is)? If so, can it
> be set dynamically?
>
> TIA
>
> --
> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
>        attach .zip as .dat


-- 
Raz


* Re: slow 'check'
  2007-02-10  5:41 slow 'check' Eyal Lebedinsky
  2007-02-10  7:41 ` Raz Ben-Jehuda(caro)
@ 2007-02-10  9:25 ` Justin Piszcz
  2007-02-10 10:15   ` Eyal Lebedinsky
  1 sibling, 1 reply; 10+ messages in thread
From: Justin Piszcz @ 2007-02-10  9:25 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-raid list



On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:

> I have a six-disk RAID5 over sata. First two disks are on the mobo and last four
> are on a Promise SATA-II-150-TX4. The sixth disk was added recently and I decided
> to run a 'check' periodically, and started one manually to see how long it should
> take. Vanilla 2.6.20.
>
> A 'dd' test shows:
>
> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
>
> This is good for this setup. A check shows:
>
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>      [>....................]  check =  0.8% (2518144/312568576) finish=2298.3min speed=2246K/sec
>
> unused devices: <none>
>
> which is an order of magnitude slower (the speed is per-disk, call it 13MB/s
> for the six). There is no activity on the RAID. Is this expected? I assume
> that the simple dd does the same amount of work (don't we check parity on
> read?).
>
> I have these tweaked at bootup:
> 	echo 4096 >/sys/block/md0/md/stripe_cache_size
> 	blockdev --setra 32768 /dev/md0
>
> Changing the above parameters seems to not have a significant effect.
>
> The check logs the following:
>
> md: data-check of RAID array md0
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> md: using 128k window, over a total of 312568576 blocks.
>
> Does it need a larger window (whatever a window is)? If so, can it
> be set dynamically?
>
> TIA
>
> -- 
> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
> 	attach .zip as .dat

As you add disks onto the PCI bus it will get slower.  For 6 disks you
should get faster than 2MB/s, however.

You can try increasing the min speed of the raid rebuild.
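
For example (just a sketch -- check the exact paths on your kernel, but these
are the tunables I believe 2.6.20 exposes):

# per-array, via sysfs (KB/sec/disk):
echo 20000 >/sys/block/md0/md/sync_speed_min
cat /sys/block/md0/md/sync_speed_max        # ceiling, defaults to 200000

# or system-wide, via procfs:
echo 20000 >/proc/sys/dev/raid/speed_limit_min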

Justin.


* Re: slow 'check'
  2007-02-10  7:41 ` Raz Ben-Jehuda(caro)
@ 2007-02-10  9:57   ` Eyal Lebedinsky
  2007-02-10 20:18   ` Bill Davidsen
  1 sibling, 0 replies; 10+ messages in thread
From: Eyal Lebedinsky @ 2007-02-10  9:57 UTC (permalink / raw)
  To: Raz Ben-Jehuda(caro); +Cc: Linux RAID Mailing List

Raz Ben-Jehuda(caro) wrote:
> On 2/10/07, Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
> 
>> I have a six-disk RAID5 over sata. First two disks are on the mobo and
>> last four
>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently
>> and I decided
>> to run a 'check' periodically, and started one manually to see how
>> long it should
>> take. Vanilla 2.6.20.
>>
>> A 'dd' test shows:
>>
>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
> 
> try dd with bs of 4x(5x256) = 5 M.

About the same:

# dd if=/dev/md0 of=/dev/null bs=5120k count=1024
1024+0 records in
1024+0 records out
5368709120 bytes transferred in 42.736373 seconds (125623883 bytes/sec)

Each disk pulls about 65MB/s alone; however, with six concurrent dd's
the two mobo disks manage ~60MB/s while the four on the TX4 do only ~20MB/s.

>> This is good for this setup. A check shows:
>>
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>      [>....................]  check =  0.8% (2518144/312568576)
>> finish=2298.3min speed=2246K/sec
>>
>> unused devices: <none>
>>
>> which is an order of magnitude slower (the speed is per-disk, call it
>> 13MB/s
>> for the six). There is no activity on the RAID. Is this expected? I
>> assume
>> that the simple dd does the same amount of work (don't we check parity on
>> read?).
>>
>> I have these tweaked at bootup:
>>        echo 4096 >/sys/block/md0/md/stripe_cache_size
>>        blockdev --setra 32768 /dev/md0
>>
>> Changing the above parameters seems to not have a significant effect.
> 
> Stripe cache size is less effective than previous versions
> of raid5 since in some cases it is being bypassed.
> Why do you check random access to the raid
> and not sequential access.

What do you mean? I understand that 'setra' sets the readahead which
should not hurt sequential access. But I did try to take it down
without seeing any improvement:

# blockdev --setra 1024 /dev/md0
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  check =  0.0% (51456/312568576) finish=2326.1min speed=2237K/sec

Anyway, I was not testing random access, just running a raid 'check', which
I recall was doing much better (20MB/s+) with 5 devices on older kernels.

>> The check logs the following:
>>
>> md: data-check of RAID array md0
>> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> md: using maximum available idle IO bandwidth (but not more than
>> 200000 KB/sec) for data-check.
>> md: using 128k window, over a total of 312568576 blocks.
>>
>> Does it need a larger window (whatever a window is)? If so, can it
>> be set dynamically?
>>
>> TIA
>>
>> -- 
>> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat


* Re: slow 'check'
  2007-02-10  9:25 ` Justin Piszcz
@ 2007-02-10 10:15   ` Eyal Lebedinsky
  2007-02-10 10:23     ` Justin Piszcz
  0 siblings, 1 reply; 10+ messages in thread
From: Eyal Lebedinsky @ 2007-02-10 10:15 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid list

Justin Piszcz wrote:
> 
> 
> On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
> 
>> I have a six-disk RAID5 over sata. First two disks are on the mobo and
>> last four
>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently
>> and I decided
>> to run a 'check' periodically, and started one manually to see how
>> long it should
>> take. Vanilla 2.6.20.
>>
>> A 'dd' test shows:
>>
>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
>>
>> This is good for this setup. A check shows:
>>
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>      [>....................]  check =  0.8% (2518144/312568576)
>> finish=2298.3min speed=2246K/sec
>>
>> unused devices: <none>
>>
>> which is an order of magnitude slower (the speed is per-disk, call it
>> 13MB/s
>> for the six). There is no activity on the RAID. Is this expected? I
>> assume
>> that the simple dd does the same amount of work (don't we check parity on
>> read?).
>>
>> I have these tweaked at bootup:
>>     echo 4096 >/sys/block/md0/md/stripe_cache_size
>>     blockdev --setra 32768 /dev/md0
>>
>> Changing the above parameters seems to not have a significant effect.
>>
>> The check logs the following:
>>
>> md: data-check of RAID array md0
>> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> md: using maximum available idle IO bandwidth (but not more than
>> 200000 KB/sec) for data-check.
>> md: using 128k window, over a total of 312568576 blocks.
>>
>> Does it need a larger window (whatever a window is)? If so, can it
>> be set dynamically?
>>
>> TIA
>>
>> -- 
>> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
>>     attach .zip as .dat
> 
> As you add disks onto the PCI bus it will get slower.  For 6 disks you
> should get faster than 2MB/s however..
> 
> You can try increasing the min speed of the raid rebuild.

Interesting - this does help. I wonder why it used much more i/o by
default before. It still uses only ~16% CPU.

# echo 20000 >/sys/block/md0/md/sync_speed_min
# echo check >/sys/block/md0/md/sync_action
... wait about 10s for the process to settle...
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  check =  0.1% (364928/312568576) finish=256.6min speed=20273K/sec
# echo idle >/sys/block/md0/md/sync_action

Raising it further only manages about 21MB/s (the _max is set to 200MB/s)
as expected; this is what the TX4 delivers with four disks. I need a better
controller (or is the linux driver slow?).

> Justin.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat


* Re: slow 'check'
  2007-02-10 10:15   ` Eyal Lebedinsky
@ 2007-02-10 10:23     ` Justin Piszcz
  2007-02-10 20:35       ` Bill Davidsen
  0 siblings, 1 reply; 10+ messages in thread
From: Justin Piszcz @ 2007-02-10 10:23 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-raid list



On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:

> Justin Piszcz wrote:
>>
>>
>> On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
>>
>>> I have a six-disk RAID5 over sata. First two disks are on the mobo and
>>> last four
>>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently
>>> and I decided
>>> to run a 'check' periodically, and started one manually to see how
>>> long it should
>>> take. Vanilla 2.6.20.
>>>
>>> A 'dd' test shows:
>>>
>>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
>>> 10240+0 records in
>>> 10240+0 records out
>>> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
>>>
>>> This is good for this setup. A check shows:
>>>
>>> $ cat /proc/mdstat
>>> Personalities : [raid6] [raid5] [raid4]
>>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>>      [>....................]  check =  0.8% (2518144/312568576)
>>> finish=2298.3min speed=2246K/sec
>>>
>>> unused devices: <none>
>>>
>>> which is an order of magnitude slower (the speed is per-disk, call it
>>> 13MB/s
>>> for the six). There is no activity on the RAID. Is this expected? I
>>> assume
>>> that the simple dd does the same amount of work (don't we check parity on
>>> read?).
>>>
>>> I have these tweaked at bootup:
>>>     echo 4096 >/sys/block/md0/md/stripe_cache_size
>>>     blockdev --setra 32768 /dev/md0
>>>
>>> Changing the above parameters seems to not have a significant effect.
>>>
>>> The check logs the following:
>>>
>>> md: data-check of RAID array md0
>>> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>>> md: using maximum available idle IO bandwidth (but not more than
>>> 200000 KB/sec) for data-check.
>>> md: using 128k window, over a total of 312568576 blocks.
>>>
>>> Does it need a larger window (whatever a window is)? If so, can it
>>> be set dynamically?
>>>
>>> TIA
>>>
>>> --
>>> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
>>>     attach .zip as .dat
>>
>> As you add disks onto the PCI bus it will get slower.  For 6 disks you
>> should get faster than 2MB/s however..
>>
>> You can try increasing the min speed of the raid rebuild.
>
> Interesting - this does help. I wonder why it used much more i/o by
> default before. It still uses only ~16% CPU.
>
> # echo 20000 >/sys/block/md0/md/sync_speed_min
> # echo check >/sys/block/md0/md/sync_action
> ... wait about 10s for the process to settle...
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>      [>....................]  check =  0.1% (364928/312568576) finish=256.6min speed=20273K/sec
> # echo idle >/sys/block/md0/md/sync_action
>
> Raising it further only manages about 21MB/s (the _max is set to 200MB/s)
> as expected; this is what the TX4 delivers with four disks. I need a better
> controller (or is the linux driver slow?).
>
>> Justin.

You are maxing out the PCI bus; remember, each bit/parity/verify operation
has to go to each disk.  If you get an entirely PCI-e system you will see
rates of 50, 100, 150, even 200MB/s easily.  I used to have 10 x 400GB drives
on a PCI bus; after 2 or 3 drives you max out the bus.  This is why you need
PCI-e: each slot has its own lane of bandwidth.

21MB/s is about right for 5-6 disks; when you go to 10 it drops to about
5-8MB/s on a PCI system.
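
As a rough sanity check -- assuming the TX4 sits in an ordinary 32-bit/33MHz
PCI slot -- the numbers line up:

  32-bit x 33MHz PCI  ~= 133MB/s theoretical, ~100MB/s in practice
  shared by 4 drives  ~= 25MB/s per drive

which is close to the ~20MB/s per TX4 drive seen in both the concurrent dd
test and the check.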

Justin.

>
> -- 
> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
> 	attach .zip as .dat


* Re: slow 'check'
  2007-02-10  7:41 ` Raz Ben-Jehuda(caro)
  2007-02-10  9:57   ` Eyal Lebedinsky
@ 2007-02-10 20:18   ` Bill Davidsen
  1 sibling, 0 replies; 10+ messages in thread
From: Bill Davidsen @ 2007-02-10 20:18 UTC (permalink / raw)
  To: Raz Ben-Jehuda(caro); +Cc: Eyal Lebedinsky, Linux RAID Mailing List

Raz Ben-Jehuda(caro) wrote:
> On 2/10/07, Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>> I have a six-disk RAID5 over sata. First two disks are on the mobo 
>> and last four
>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently 
>> and I decided
>> to run a 'check' periodically, and started one manually to see how 
>> long it should
>> take. Vanilla 2.6.20.
>>
>> A 'dd' test shows:
>>
>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec)
> try dd with bs of 4x(5x256) = 5 M.
>
>> This is good for this setup. A check shows:
>>
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>      [>....................]  check =  0.8% (2518144/312568576) 
>> finish=2298.3min speed=2246K/sec
>>
>> unused devices: <none>
>>
>> which is an order of magnitude slower (the speed is per-disk, call it 
>> 13MB/s
>> for the six). There is no activity on the RAID. Is this expected? I 
>> assume
>> that the simple dd does the same amount of work (don't we check 
>> parity on
>> read?).
>>
>> I have these tweaked at bootup:
>>        echo 4096 >/sys/block/md0/md/stripe_cache_size
>>        blockdev --setra 32768 /dev/md0
>>
>> Changing the above parameters seems to not have a significant effect.
> Stripe cache size is less effective than previous versions
> of raid5 since in some cases it is being bypassed.
> Why do you check random access to the raid
> and not sequential access. 
What on Earth makes you think dd uses random access???

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: slow 'check'
  2007-02-10 10:23     ` Justin Piszcz
@ 2007-02-10 20:35       ` Bill Davidsen
  2007-02-11  9:30         ` Raz Ben-Jehuda(caro)
  0 siblings, 1 reply; 10+ messages in thread
From: Bill Davidsen @ 2007-02-10 20:35 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Eyal Lebedinsky, linux-raid list

Justin Piszcz wrote:
>
>
> On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
>
>> Justin Piszcz wrote:
>>>
>>>
>>> On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
>>>
>>>> I have a six-disk RAID5 over sata. First two disks are on the mobo and
>>>> last four
>>>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently
>>>> and I decided
>>>> to run a 'check' periodically, and started one manually to see how
>>>> long it should
>>>> take. Vanilla 2.6.20.
>>>>
>>>> A 'dd' test shows:
>>>>
>>>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
>>>> 10240+0 records in
>>>> 10240+0 records out
>>>> 10737418240 bytes transferred in 84.449870 seconds (127145468 
>>>> bytes/sec)
>>>>
>>>> This is good for this setup. A check shows:
>>>>
>>>> $ cat /proc/mdstat
>>>> Personalities : [raid6] [raid5] [raid4]
>>>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>>>      [>....................]  check =  0.8% (2518144/312568576)
>>>> finish=2298.3min speed=2246K/sec
>>>>
>>>> unused devices: <none>
>>>>
>>>> which is an order of magnitude slower (the speed is per-disk, call it
>>>> 13MB/s
>>>> for the six). There is no activity on the RAID. Is this expected? I
>>>> assume
>>>> that the simple dd does the same amount of work (don't we check 
>>>> parity on
>>>> read?).
>>>>
>>>> I have these tweaked at bootup:
>>>>     echo 4096 >/sys/block/md0/md/stripe_cache_size
>>>>     blockdev --setra 32768 /dev/md0
>>>>
>>>> Changing the above parameters seems to not have a significant effect.
>>>>
>>>> The check logs the following:
>>>>
>>>> md: data-check of RAID array md0
>>>> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>>>> md: using maximum available idle IO bandwidth (but not more than
>>>> 200000 KB/sec) for data-check.
>>>> md: using 128k window, over a total of 312568576 blocks.
>>>>
>>>> Does it need a larger window (whatever a window is)? If so, can it
>>>> be set dynamically?
>>>>
>>>> TIA
>>>>
>>>> -- 
>>>> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
>>>>     attach .zip as .dat
>>>
>>> As you add disks onto the PCI bus it will get slower.  For 6 disks you
>>> should get faster than 2MB/s however..
>>>
>>> You can try increasing the min speed of the raid rebuild.
>>
>> Interesting - this does help. I wonder why it used much more i/o by
>> default before. It still uses only ~16% CPU.
>>
>> # echo 20000 >/sys/block/md0/md/sync_speed_min
>> # echo check >/sys/block/md0/md/sync_action
>> ... wait about 10s for the process to settle...
>> # cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>>      [>....................]  check =  0.1% (364928/312568576) 
>> finish=256.6min speed=20273K/sec
>> # echo idle >/sys/block/md0/md/sync_action
>>
>> Raising it further only manages about 21MB/s (the _max is set to 
>> 200MB/s)
>> as expected; this is what the TX4 delivers with four disks. I need a 
>> better
>> controller (or is the linux driver slow?).
>>
>>> Justin.
>
> You are maxing out the PCI Bus, remember each bit/parity/verify 
> operation has to go to each disk.  If you get an entirely PCI-e system 
> you will see rates 50-100-150-200MB/s easily.  I used to have 10 x 
> 400GB drives on a PCI bus, after 2 or 3 drives, you max out the PCI 
> bus, this is why you need PCI-e, each slot has its own lane of bandwidth.

>
> 21MB/s is about right for 5-6 disks, when you go to 10 it drops to 
> about 5-8MB/s on a PCI system. 
Wait, let's say that we have three drives and a 1M chunk size. So we read
1M here, 1M there, and 1M somewhere else, and get 2M of data and 1M of parity
which we check. With five we would read 4M of data and 1M of parity, but have
4M checked. In general, for each stripe we read N*chunk bytes and verify
(N-1)*chunk. In fact the data is (N-1)/N of the stripe, and that fraction
gets higher (not lower) as you add drives. I see no reason why more drives
would be slower; a higher percentage of the bytes read is data.
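
With the numbers from this thread (6 drives, 256k chunks) that works out,
per stripe, as:

  read:    6 x 256k = 1536k  (all members)
  checked: 5 x 256k = 1280k  (data), i.e. 5/6 ~= 83% of what was read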

That doesn't mean that you can't run out of bus bandwidth, but the number of
drives is not obviously the issue.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: slow 'check'
  2007-02-10 20:35       ` Bill Davidsen
@ 2007-02-11  9:30         ` Raz Ben-Jehuda(caro)
  2007-02-14 16:41           ` Bill Davidsen
  0 siblings, 1 reply; 10+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2007-02-11  9:30 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-raid list

I suggest you test all drives concurrently with dd.
Load dd on sda, then sdb, slowly one after the other, and
see whether the throughput degrades. Use iostat.
Furthermore, dd is not a measure of random access.
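
Something like this, for instance (device names taken from the mdstat output
above; only a rough sketch):

# start one sequential reader per member disk, a few seconds apart
for d in sda sdb sdc sdd sde sdf; do
    dd if=/dev/$d of=/dev/null bs=1M count=2048 &
    sleep 10
done
# in another terminal, watch per-drive throughput as readers are added
iostat -x 5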

On 2/10/07, Bill Davidsen <davidsen@tmr.com> wrote:
> Justin Piszcz wrote:
> >
> >
> > On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
> >
> >> Justin Piszcz wrote:
> >>>
> >>>
> >>> On Sat, 10 Feb 2007, Eyal Lebedinsky wrote:
> >>>
> >>>> I have a six-disk RAID5 over sata. First two disks are on the mobo and
> >>>> last four
> >>>> are on a Promise SATA-II-150-TX4. The sixth disk was added recently
> >>>> and I decided
> >>>> to run a 'check' periodically, and started one manually to see how
> >>>> long it should
> >>>> take. Vanilla 2.6.20.
> >>>>
> >>>> A 'dd' test shows:
> >>>>
> >>>> # dd if=/dev/md0 of=/dev/null bs=1024k count=10240
> >>>> 10240+0 records in
> >>>> 10240+0 records out
> >>>> 10737418240 bytes transferred in 84.449870 seconds (127145468
> >>>> bytes/sec)
> >>>>
> >>>> This is good for this setup. A check shows:
> >>>>
> >>>> $ cat /proc/mdstat
> >>>> Personalities : [raid6] [raid5] [raid4]
> >>>> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
> >>>>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
> >>>>      [>....................]  check =  0.8% (2518144/312568576)
> >>>> finish=2298.3min speed=2246K/sec
> >>>>
> >>>> unused devices: <none>
> >>>>
> >>>> which is an order of magnitude slower (the speed is per-disk, call it
> >>>> 13MB/s
> >>>> for the six). There is no activity on the RAID. Is this expected? I
> >>>> assume
> >>>> that the simple dd does the same amount of work (don't we check
> >>>> parity on
> >>>> read?).
> >>>>
> >>>> I have these tweaked at bootup:
> >>>>     echo 4096 >/sys/block/md0/md/stripe_cache_size
> >>>>     blockdev --setra 32768 /dev/md0
> >>>>
> >>>> Changing the above parameters seems to not have a significant effect.
> >>>>
> >>>> The check logs the following:
> >>>>
> >>>> md: data-check of RAID array md0
> >>>> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> >>>> md: using maximum available idle IO bandwidth (but not more than
> >>>> 200000 KB/sec) for data-check.
> >>>> md: using 128k window, over a total of 312568576 blocks.
> >>>>
> >>>> Does it need a larger window (whatever a window is)? If so, can it
> >>>> be set dynamically?
> >>>>
> >>>> TIA
> >>>>
> >>>> --
> >>>> Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
> >>>>     attach .zip as .dat
> >>>
> >>> As you add disks onto the PCI bus it will get slower.  For 6 disks you
> >>> should get faster than 2MB/s however..
> >>>
> >>> You can try increasing the min speed of the raid rebuild.
> >>
> >> Interesting - this does help. I wonder why it used much more i/o by
> >> default before. It still uses only ~16% CPU.
> >>
> >> # echo 20000 >/sys/block/md0/md/sync_speed_min
> >> # echo check >/sys/block/md0/md/sync_action
> >> ... wait about 10s for the process to settle...
> >> # cat /proc/mdstat
> >> Personalities : [raid6] [raid5] [raid4]
> >> md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
> >>      1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
> >>      [>....................]  check =  0.1% (364928/312568576)
> >> finish=256.6min speed=20273K/sec
> >> # echo idle >/sys/block/md0/md/sync_action
> >>
> >> Raising it further only manages about 21MB/s (the _max is set to
> >> 200MB/s)
> >> as expected; this is what the TX4 delivers with four disks. I need a
> >> better
> >> controller (or is the linux driver slow?).
> >>
> >>> Justin.
> >
> > You are maxing out the PCI Bus, remember each bit/parity/verify
> > operation has to go to each disk.  If you get an entirely PCI-e system
> > you will see rates 50-100-150-200MB/s easily.  I used to have 10 x
> > 400GB drives on a PCI bus, after 2 or 3 drives, you max out the PCI
> > bus, this is why you need PCI-e, each slot has its own lane of bandwidth.
>
> >
> > 21MB/s is about right for 5-6 disks, when you go to 10 it drops to
> > about 5-8MB/s on a PCI system.
> Wait, let's say that we have three drives and 1m chunk size. So we read
> 1M here, 1M there, and 1M somewhere else, and get 2M data and 1M parity
> which we check. With five we would read 4M data and 1M parity, but have
> 4M checked. The end case is that for each stripe we read N*chunk bytes
> and verify (N-1)*chunk. In fact the data is (N-1)/N of the stripe, and
> the percentage gets higher (not lower) as you add drives. I see no
> reason why more drives would be slower, a higher percentage of the bytes
> read are data.
>
> That doesn't mean that you can't run out of Bus bandwidth, but number of
> drives is not obviously the issue.
>
> --
> bill davidsen <davidsen@tmr.com>
>   CTO TMR Associates, Inc
>   Doing interesting things with small computers since 1979
>



-- 
Raz


* Re: slow 'check'
  2007-02-11  9:30         ` Raz Ben-Jehuda(caro)
@ 2007-02-14 16:41           ` Bill Davidsen
  0 siblings, 0 replies; 10+ messages in thread
From: Bill Davidsen @ 2007-02-14 16:41 UTC (permalink / raw)
  To: Raz Ben-Jehuda(caro); +Cc: Eyal Lebedinsky, linux-raid list

Raz Ben-Jehuda(caro) wrote:
> I suggest you test all drives concurrently with dd.
> load dd on sda , then sdb slowly one after the other and
> see whether the throughput degrades. use iostat.
> furtheremore, dd is not the measure for random access.
AFAIK 'check' does not do random access, which was the original question.
My figures relate only to that.

For random access, a read should touch only one drive unless there's an
error, and a write two: data and updated parity. I don't have the tool I
want to measure this properly; perhaps later this week I'll generate one.
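
A crude stand-in, purely as a sketch (random 4k reads scattered over md0 with
plain dd; a real tool would want concurrency and proper statistics):

#!/bin/bash
DEV=/dev/md0
BLOCKS=390710720    # 4k blocks: 1562842880 1k-blocks (from mdstat above) / 4
time for i in $(seq 1 1000); do
    off=$(( (RANDOM * RANDOM) % BLOCKS ))   # not uniform, but good enough here
    dd if=$DEV of=/dev/null bs=4k count=1 skip=$off 2>/dev/null
done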
>
> On 2/10/07, Bill Davidsen <davidsen@tmr.com> wrote:
>>
>> Wait, let's say that we have three drives and 1m chunk size. So we read
>> 1M here, 1M there, and 1M somewhere else, and get 2M data and 1M parity
>> which we check. With five we would read 4M data and 1M parity, but have
>> 4M checked. The end case is that for each stripe we read N*chunk bytes
>> and verify (N-1)*chunk. In fact the data is (N-1)/N of the stripe, and
>> the percentage gets higher (not lower) as you add drives. I see no
>> reason why more drives would be slower, a higher percentage of the bytes
>> read are data.
>>
>> That doesn't mean that you can't run out of Bus bandwidth, but number of
>> drives is not obviously the issue. 


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

