* raid6check extremely slow ?
@ 2020-05-10 12:07 Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-10 12:07 UTC (permalink / raw)
To: linux-raid
Hi,
I'm running raid6check on a 12 TB (8 x 2 TB hard disks)
RAID6 array and wonder why it is so extremely slow...
It seems to be reading the disks at only about 400 kB/s,
which results in an estimated time of some 57 days!!!
to complete checking the array. The system is basically idle; there
is neither any significant CPU load nor any other I/O (neither to the
tested array nor to any other storage on this system).
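For what it's worth, the 57-day estimate is consistent with the observed
per-disk rate (a rough back-of-the-envelope check, using the mdadm and
iostat figures quoted in this mail):

```python
# Sanity check of the ETA: every member disk must be read end to end,
# and all disks advance in lockstep, so the per-disk rate bounds the run.
used_dev_size_kib = 1953383504      # "Used Dev Size" from mdadm --detail (KiB)
per_disk_rate_kib_s = 389           # steady kB_read/s per disk from iostat
eta_days = used_dev_size_kib / per_disk_rate_kib_s / 86400
print(f"~{eta_days:.0f} days")      # roughly 58 days, matching the estimate
```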
Am I doing something wrong?
The command I'm running is simply:
# raid6check /dev/md0 0 0
This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
The array data:
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu Nov 7 19:30:03 2013
Raid Level : raid6
Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
Raid Devices : 8
Total Devices : 8
Persistence : Superblock is persistent
Update Time : Mon May 4 22:12:02 2020
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
Consistency Policy : resync
Name : atlas.denx.de:0 (local to host atlas.denx.de)
UUID : 4df90724:87913791:1700bb31:773735d0
Events : 181544
Number Major Minor RaidDevice State
12 8 64 0 active sync /dev/sde
11 8 80 1 active sync /dev/sdf
13 8 112 2 active sync /dev/sdh
8 8 128 3 active sync /dev/sdi
9 8 144 4 active sync /dev/sdj
10 8 160 5 active sync /dev/sdk
14 8 176 6 active sync /dev/sdl
15 8 192 7 active sync /dev/sdm
# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-07 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.01 1.11 0.21 0.00 98.49
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sde 19.23 388.93 0.09 0.00 158440224 35218 0
sdf 19.20 388.94 0.09 0.00 158447574 34894 0
sdh 19.23 388.89 0.08 0.00 158425596 34178 0
sdi 19.23 388.99 0.09 0.00 158466326 34690 0
sdj 20.18 388.93 0.09 0.00 158439780 34766 0
sdk 19.23 388.88 0.09 0.00 158419988 35366 0
sdl 19.20 388.97 0.08 0.00 158457352 34426 0
sdm 19.21 388.92 0.08 0.00 158435748 34566 0
top - 09:08:19 up 4 days, 17:10, 3 users, load average: 1.00, 1.00, 1.00
Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.5 sy, 0.0 ni, 98.5 id, 0.1 wa, 0.6 hi, 0.1 si, 0.0 st
MiB Mem : 24034.6 total, 11198.4 free, 1871.8 used, 10964.3 buff/cache
MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21767.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19719 root 20 0 2852 2820 2020 D 5.1 0.0 285:40.07 raid6check
1123 root 20 0 0 0 0 S 0.3 0.0 25:47.54 md0_raid6
37816 root 20 0 0 0 0 I 0.3 0.0 0:00.08 kworker/3:1-events
37903 root 20 0 219680 4540 3716 R 0.3 0.0 0:00.02 top
...
HDD in use:
/dev/sde : ST2000NM0033-9ZM175
/dev/sdf : ST2000NM0033-9ZM175
/dev/sdh : ST2000NM0033-9ZM175
/dev/sdi : ST2000NM0033-9ZM175
/dev/sdj : ST2000NM0033-9ZM175
/dev/sdk : ST2000NM0033-9ZM175
/dev/sdl : ST2000NM0033-9ZM175
/dev/sdm : ST2000NM0008-2F3100
3 days later:
# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-10 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.00 1.07 0.17 0.00 98.57
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sde 20.15 370.73 0.10 0.00 253186948 68154 0
sdf 20.13 370.74 0.10 0.00 253194646 68138 0
sdh 20.15 370.71 0.10 0.00 253172656 67738 0
sdi 20.15 370.77 0.10 0.00 253213854 68158 0
sdj 20.72 370.73 0.10 0.00 253187084 68066 0
sdk 20.15 370.70 0.10 0.00 253166960 69286 0
sdl 20.13 370.76 0.10 0.00 253204572 68070 0
sdm 20.14 370.73 0.10 0.00 253182964 68070 0
I've tried playing with speed_limit_min/speed_limit_max, but this
didn't change anything:
# cat /proc/sys/dev/raid/speed_limit_max
2000000
# cat /proc/sys/dev/raid/speed_limit_min
10000
Any ideas welcome!
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The inappropriate cannot be beautiful.
- Frank Lloyd Wright _The Future of Architecture_ (1953)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
@ 2020-05-10 13:26 ` Piergiorgio Sartor
2020-05-11 6:33 ` Wolfgang Denk
2020-05-10 22:16 ` Guoqing Jiang
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-10 13:26 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: linux-raid
On Sun, May 10, 2020 at 02:07:25PM +0200, Wolfgang Denk wrote:
> Hi,
>
> I'm running raid6check on a 12 TB (8 x 2 TB hard disks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks at only about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array. The system is basically idle; there
> is neither any significant CPU load nor any other I/O (neither to the
> tested array nor to any other storage on this system).
>
> Am I doing something wrong?
>
>
> The command I'm running is simply:
>
> # raid6check /dev/md0 0 0
>
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
>
> The array data:
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Thu Nov 7 19:30:03 2013
> Raid Level : raid6
> Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
> Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
> Raid Devices : 8
> Total Devices : 8
> Persistence : Superblock is persistent
>
> Update Time : Mon May 4 22:12:02 2020
> State : active
> Active Devices : 8
> Working Devices : 8
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 16K
>
> Consistency Policy : resync
>
> Name : atlas.denx.de:0 (local to host atlas.denx.de)
> UUID : 4df90724:87913791:1700bb31:773735d0
> Events : 181544
>
> Number Major Minor RaidDevice State
> 12 8 64 0 active sync /dev/sde
> 11 8 80 1 active sync /dev/sdf
> 13 8 112 2 active sync /dev/sdh
> 8 8 128 3 active sync /dev/sdi
> 9 8 144 4 active sync /dev/sdj
> 10 8 160 5 active sync /dev/sdk
> 14 8 176 6 active sync /dev/sdl
> 15 8 192 7 active sync /dev/sdm
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-07 _x86_64_ (8 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.18 0.01 1.11 0.21 0.00 98.49
>
> Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> sde 19.23 388.93 0.09 0.00 158440224 35218 0
> sdf 19.20 388.94 0.09 0.00 158447574 34894 0
> sdh 19.23 388.89 0.08 0.00 158425596 34178 0
> sdi 19.23 388.99 0.09 0.00 158466326 34690 0
> sdj 20.18 388.93 0.09 0.00 158439780 34766 0
> sdk 19.23 388.88 0.09 0.00 158419988 35366 0
> sdl 19.20 388.97 0.08 0.00 158457352 34426 0
> sdm 19.21 388.92 0.08 0.00 158435748 34566 0
>
>
> top - 09:08:19 up 4 days, 17:10, 3 users, load average: 1.00, 1.00, 1.00
> Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.2 us, 0.5 sy, 0.0 ni, 98.5 id, 0.1 wa, 0.6 hi, 0.1 si, 0.0 st
> MiB Mem : 24034.6 total, 11198.4 free, 1871.8 used, 10964.3 buff/cache
> MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21767.6 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 19719 root 20 0 2852 2820 2020 D 5.1 0.0 285:40.07 raid6check
> 1123 root 20 0 0 0 0 S 0.3 0.0 25:47.54 md0_raid6
> 37816 root 20 0 0 0 0 I 0.3 0.0 0:00.08 kworker/3:1-events
> 37903 root 20 0 219680 4540 3716 R 0.3 0.0 0:00.02 top
> ...
>
>
> HDD in use:
>
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
>
>
> 3 days later:
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-10 _x86_64_ (8 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.18 0.00 1.07 0.17 0.00 98.57
>
> Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> sde 20.15 370.73 0.10 0.00 253186948 68154 0
> sdf 20.13 370.74 0.10 0.00 253194646 68138 0
> sdh 20.15 370.71 0.10 0.00 253172656 67738 0
> sdi 20.15 370.77 0.10 0.00 253213854 68158 0
> sdj 20.72 370.73 0.10 0.00 253187084 68066 0
> sdk 20.15 370.70 0.10 0.00 253166960 69286 0
> sdl 20.13 370.76 0.10 0.00 253204572 68070 0
> sdm 20.14 370.73 0.10 0.00 253182964 68070 0
>
>
> I've tried playing with speed_limit_min/speed_limit_max, but this
> didn't change anything:
>
> # cat /proc/sys/dev/raid/speed_limit_max
> 2000000
> cat /proc/sys/dev/raid/speed_limit_min
> 10000
>
> Any ideas welcome!
Difficult to say.
raid6check is CPU bound, with no vector
optimization and no multithreading.
Nevertheless, if you see no CPU load (single-core
load), then something else is not OK, but I've no
idea what it could be.
Please check whether one core is at 100%; if this is
the case, then that is the limit.
If not, sorry, I cannot help.
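One way to check for a saturated core, besides pressing '1' in top, is to
diff two /proc/stat snapshots. A small sketch (not from the thread;
`per_cpu_busy` and its signature are my own illustration):

```python
import time

def per_cpu_busy(interval=1.0, read_stat=None):
    """Fractional busy time per CPU over `interval` seconds, computed
    from two snapshots of /proc/stat (Linux).  `read_stat` may be
    overridden with any callable returning /proc/stat-style text."""
    if read_stat is None:
        def read_stat():
            with open("/proc/stat") as f:
                return f.read()

    def snapshot():
        stats = {}
        for line in read_stat().splitlines():
            parts = line.split()
            # keep per-core rows like "cpu0", skip the aggregate "cpu" row
            if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
                fields = [int(x) for x in parts[1:]]
                idle = fields[3] + fields[4]        # idle + iowait ticks
                stats[parts[0]] = (sum(fields), idle)
        return stats

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    return {cpu: 1.0 - (after[cpu][1] - before[cpu][1])
                       / max(1, after[cpu][0] - before[cpu][0])
            for cpu in after}
```

If one entry sits near 1.0 while the rest idle, the single-threaded
checker is the bottleneck.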
bye,
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-10 22:16 ` Guoqing Jiang
2020-05-11 6:40 ` Wolfgang Denk
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2 siblings, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-10 22:16 UTC (permalink / raw)
To: Wolfgang Denk, linux-raid
On 5/10/20 2:07 PM, Wolfgang Denk wrote:
> Hi,
>
> I'm running raid6check on a 12 TB (8 x 2 TB hard disks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks at only about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array. The system is basically idle; there
> is neither any significant CPU load nor any other I/O (neither to the
> tested array nor to any other storage on this system).
>
> Am I doing something wrong?
>
>
> The command I'm running is simply:
>
> # raid6check /dev/md0 0 0
>
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
>
> The array data:
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Thu Nov 7 19:30:03 2013
> Raid Level : raid6
> Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
> Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
> Raid Devices : 8
> Total Devices : 8
> Persistence : Superblock is persistent
>
> Update Time : Mon May 4 22:12:02 2020
> State : active
> Active Devices : 8
> Working Devices : 8
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 16K
>
> Consistency Policy : resync
>
> Name : atlas.denx.de:0 (local to host atlas.denx.de)
> UUID : 4df90724:87913791:1700bb31:773735d0
> Events : 181544
>
> Number Major Minor RaidDevice State
> 12 8 64 0 active sync /dev/sde
> 11 8 80 1 active sync /dev/sdf
> 13 8 112 2 active sync /dev/sdh
> 8 8 128 3 active sync /dev/sdi
> 9 8 144 4 active sync /dev/sdj
> 10 8 160 5 active sync /dev/sdk
> 14 8 176 6 active sync /dev/sdl
> 15 8 192 7 active sync /dev/sdm
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-07 _x86_64_ (8 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.18 0.01 1.11 0.21 0.00 98.49
>
> Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> sde 19.23 388.93 0.09 0.00 158440224 35218 0
> sdf 19.20 388.94 0.09 0.00 158447574 34894 0
> sdh 19.23 388.89 0.08 0.00 158425596 34178 0
> sdi 19.23 388.99 0.09 0.00 158466326 34690 0
> sdj 20.18 388.93 0.09 0.00 158439780 34766 0
> sdk 19.23 388.88 0.09 0.00 158419988 35366 0
> sdl 19.20 388.97 0.08 0.00 158457352 34426 0
> sdm 19.21 388.92 0.08 0.00 158435748 34566 0
>
>
> top - 09:08:19 up 4 days, 17:10, 3 users, load average: 1.00, 1.00, 1.00
> Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.2 us, 0.5 sy, 0.0 ni, 98.5 id, 0.1 wa, 0.6 hi, 0.1 si, 0.0 st
> MiB Mem : 24034.6 total, 11198.4 free, 1871.8 used, 10964.3 buff/cache
> MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21767.6 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 19719 root 20 0 2852 2820 2020 D 5.1 0.0 285:40.07 raid6check
Seems raid6check is in 'D' state. What is the output of 'cat
/proc/19719/stack', and what does /proc/mdstat show?
> 1123 root 20 0 0 0 0 S 0.3 0.0 25:47.54 md0_raid6
> 37816 root 20 0 0 0 0 I 0.3 0.0 0:00.08 kworker/3:1-events
> 37903 root 20 0 219680 4540 3716 R 0.3 0.0 0:00.02 top
> ...
>
>
> HDD in use:
>
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
>
>
> 3 days later:
Is raid6check still in 'D' state as before?
Thanks,
Guoqing
* Re: raid6check extremely slow ?
2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-11 6:33 ` Wolfgang Denk
0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-11 6:33 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: linux-raid
Dear Piergiorgio,
In message <20200510132611.GA12994@lazy.lzy> you wrote:
>
> raid6check is CPU bound, with no vector
> optimization and no multithreading.
>
> Nevertheless, if you see no CPU load (single-core
> load), then something else is not OK, but I've no
> idea what it could be.
>
> Please check whether one core is at 100%; if this is
> the case, then that is the limit.
> If not, sorry, I cannot help.
No, there is virtually no CPU load at all:
top - 08:32:36 up 8 days, 16:34, 3 users, load average: 1.00, 1.01, 1.00
Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 1.3 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.3 us, 1.3 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 1.7 us, 3.7 sy, 0.0 ni, 90.4 id, 3.0 wa, 0.7 hi, 0.7 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 24034.6 total, 10921.2 free, 1882.4 used, 11230.9 buff/cache
MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21757.0 avail Mem
What I find interesting is that all disks are more or less
constantly at around 400 kB/s (390...400, never more).
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
That's their goal, remember, a goal that's really contrary to that of
the programmer or administrator. We just want to get our jobs done.
$Bill just wants to become $$Bill. These aren't even marginally
congruent.
-- Tom Christiansen in <6jhtqk$qls$1@csnews.cs.colorado.edu>
* Re: raid6check extremely slow ?
2020-05-10 22:16 ` Guoqing Jiang
@ 2020-05-11 6:40 ` Wolfgang Denk
2020-05-11 8:58 ` Guoqing Jiang
0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-11 6:40 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid
Dear Guoqing Jiang,
In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>
> Seems raid6check is in 'D' state. What is the output of 'cat
> /proc/19719/stack', and what does /proc/mdstat show?
# for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
all the time? I thought it was _reading_ the disks only?
And iostat does not report any writes either?
# iostat /dev/sd[efhijklm] | cat
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.00 1.07 0.17 0.00 98.58
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sde 20.30 368.76 0.10 0.00 277022327 75178 0
sdf 20.28 368.77 0.10 0.00 277030081 75170 0
sdh 20.30 368.74 0.10 0.00 277007903 74854 0
sdi 20.30 368.79 0.10 0.00 277049113 75246 0
sdj 20.82 368.76 0.10 0.00 277022363 74986 0
sdk 20.30 368.73 0.10 0.00 277002179 76322 0
sdl 20.29 368.78 0.10 0.00 277039743 74982 0
sdm 20.29 368.75 0.10 0.00 277018163 74958 0
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid10 sdc1[0] sdd1[1]
234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
bitmap: 0/2 pages [0KB], 65536KB chunk
md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
md1 : active raid1 sdb3[0] sda3[1]
484118656 blocks [2/2] [UU]
md2 : active raid1 sdb1[0] sda1[1]
255936 blocks [2/2] [UU]
unused devices: <none>
> > 3 days later:
>
> Is raid6check still in 'D' state as before?
Yes, nothing changed, still running:
top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00
Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache
MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check
1123 root 20 0 0 0 0 S 0.7 0.0 60:55.64 md0_raid6
10 root 20 0 0 0 0 I 0.3 0.0 9:09.26 rcu_sched
655 root 0 -20 0 0 0 I 0.3 0.0 21:28.95 kworker/1:1H-kblockd
60161 root 20 0 0 0 0 I 0.3 0.0 0:01.18 kworker/6:1-events
61997 root 20 0 0 0 0 I 0.3 0.0 0:01.48 kworker/1:3-events
...
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Every program has at least one bug and can be shortened by at least
one instruction -- from which, by induction, one can deduce that
every program can be reduced to one instruction which doesn't work.
* Re: raid6check extremely slow ?
2020-05-11 6:40 ` Wolfgang Denk
@ 2020-05-11 8:58 ` Guoqing Jiang
2020-05-11 15:39 ` Piergiorgio Sartor
2020-05-11 16:14 ` Piergiorgio Sartor
0 siblings, 2 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 8:58 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: linux-raid
Hi Wolfgang,
On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> Dear Guoqing Jiang,
>
> In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>> Seems raid6check is in 'D' state. What is the output of 'cat
>> /proc/19719/stack', and what does /proc/mdstat show?
> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
It looks like raid6check keeps writing to the suspend_lo/hi sysfs nodes,
which causes mddev_suspend to be called; that means synchronize_rcu and
other synchronization mechanisms are triggered on every such write ...
> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> all the time? I thought it was _reading_ the disks only?
I hadn't read raid6check before, but check_stripes contains:
while (length > 0) {
lock_stripe -> write suspend_lo/hi node
...
unlock_all_stripes -> write suspend_lo/hi node
}
I think that explains the stack of raid6check; maybe this is just the
way raid6check works: lock a stripe, check it, then unlock it. Just my
guess ...
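As a rough illustration of that pattern (names are illustrative, not the
actual mdadm source, which is C): each stripe check is bracketed by a pair
of sysfs writes, and each write pays a full mddev_suspend/resume cycle.

```python
# Sketch of the per-stripe lock/check/unlock loop described above.
# In the real raid6check, lock_stripe/unlock_all_stripes write the
# stripe's range to the md sysfs files suspend_lo and suspend_hi;
# here write_sysfs is a stand-in callback so the shape is testable.
def check_stripes(start, num_stripes, stripe_bytes, check_one, write_sysfs):
    pos = start
    for _ in range(num_stripes):
        # lock_stripe: suspend array writes overlapping this stripe
        write_sysfs("suspend_lo", pos)
        write_sysfs("suspend_hi", pos + stripe_bytes)  # -> mddev_suspend
        check_one(pos)
        # unlock_all_stripes: collapse the suspended window again
        write_sysfs("suspend_lo", 0)
        write_sysfs("suspend_hi", 0)                   # -> mddev_suspend again
        pos += stripe_bytes
```

Four sysfs writes per stripe, each waiting out an RCU grace period in
mddev_suspend, would explain both the 'D' state and the lack of CPU load.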
> And iostat does not report any writes either?
Because the CPU is busy in mddev_suspend, I think; those writes go to
sysfs attribute files, not to the block devices, so iostat never sees them.
> # iostat /dev/sd[efhijklm] | cat
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.18 0.00 1.07 0.17 0.00 98.58
>
> Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> sde 20.30 368.76 0.10 0.00 277022327 75178 0
> sdf 20.28 368.77 0.10 0.00 277030081 75170 0
> sdh 20.30 368.74 0.10 0.00 277007903 74854 0
> sdi 20.30 368.79 0.10 0.00 277049113 75246 0
> sdj 20.82 368.76 0.10 0.00 277022363 74986 0
> sdk 20.30 368.73 0.10 0.00 277002179 76322 0
> sdl 20.29 368.78 0.10 0.00 277039743 74982 0
> sdm 20.29 368.75 0.10 0.00 277018163 74958 0
>
>
> # cat /proc/mdstat
> Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> md3 : active raid10 sdc1[0] sdd1[1]
> 234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> bitmap: 0/2 pages [0KB], 65536KB chunk
>
> md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> 11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
>
> md1 : active raid1 sdb3[0] sda3[1]
> 484118656 blocks [2/2] [UU]
>
> md2 : active raid1 sdb1[0] sda1[1]
> 255936 blocks [2/2] [UU]
>
> unused devices: <none>
>
>>> 3 days later:
>> Is raid6check still in 'D' state as before?
> Yes, nothing changed, still running:
>
> top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00
> Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> %Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
> %Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st
> %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache
> MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check
I think the stack of raid6check is pretty much the same as before.
Since the estimated time for the 12 TB array is about 57 days, and the
estimated time should be linear in the number of stripes on the same
machine, this is presumably just how raid6check works, as I guessed.
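A quick back-of-the-envelope check supports this (stripe count derived
from the mdadm output earlier in the thread; the per-stripe figure is
the interesting one):

```python
# If the run time is dominated by per-stripe locking, each stripe costs
# a fixed overhead; with ~122 million stripes the 57-day ETA works out
# to ~40 ms per stripe -- the right order of magnitude for a few RCU
# grace periods (synchronize_rcu) per lock/unlock cycle.
used_dev_size_kib = 1953383504          # per-member data size (KiB)
chunk_kib = 16                          # chunk size from mdadm --detail
stripes = used_dev_size_kib // chunk_kib
per_stripe_ms = 57 * 86400 / stripes * 1000
print(f"{stripes} stripes, ~{per_stripe_ms:.0f} ms each")
```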
Thanks,
Guoqing
* Re: raid6check extremely slow ?
2020-05-11 8:58 ` Guoqing Jiang
@ 2020-05-11 15:39 ` Piergiorgio Sartor
2020-05-12 7:37 ` Wolfgang Denk
2020-05-11 16:14 ` Piergiorgio Sartor
1 sibling, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-11 15:39 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid
On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
>
>
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> >
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state. What is the output of 'cat
> > > /proc/19719/stack', and what does /proc/mdstat show?
> > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> It looks like raid6check keeps writing to the suspend_lo/hi sysfs nodes,
> which causes mddev_suspend to be called; that means synchronize_rcu and
> other synchronization mechanisms are triggered on every such write ...
>
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time? I thought it was _reading_ the disks only?
>
> I hadn't read raid6check before, but check_stripes contains:
>
>
> while (length > 0) {
> lock_stripe -> write suspend_lo/hi node
> ...
> unlock_all_stripes -> write suspend_lo/hi node
> }
>
> I think that explains the stack of raid6check; maybe this is just the
> way raid6check works: lock a stripe, check it, then unlock it. Just my
> guess ...
Yes, that's the way it works.
raid6check locks the stripe, checks it, releases it.
This is required in order to avoid race conditions
between raid6check and any write to the stripe.
The alternative is to set the array R/O and do
the check, avoiding the lock / unlock.
This could be a way to test whether the problem is
really here.
That is, remove the lock / unlock (I guess
there should be only one pair, but better
check) and run the check with the array in R/O mode.
Hope this helps,
bye,
pg
> > And iostat does not report any writes either?
>
> Because the CPU is busy in mddev_suspend, I think; those writes go to
> sysfs attribute files, not to the block devices, so iostat never sees them.
>
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU)
> >
> > avg-cpu: %user %nice %system %iowait %steal %idle
> > 0.18 0.00 1.07 0.17 0.00 98.58
> >
> > Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> > sde 20.30 368.76 0.10 0.00 277022327 75178 0
> > sdf 20.28 368.77 0.10 0.00 277030081 75170 0
> > sdh 20.30 368.74 0.10 0.00 277007903 74854 0
> > sdi 20.30 368.79 0.10 0.00 277049113 75246 0
> > sdj 20.82 368.76 0.10 0.00 277022363 74986 0
> > sdk 20.30 368.73 0.10 0.00 277002179 76322 0
> > sdl 20.29 368.78 0.10 0.00 277039743 74982 0
> > sdm 20.29 368.75 0.10 0.00 277018163 74958 0
> >
> >
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> > 234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> > bitmap: 0/2 pages [0KB], 65536KB chunk
> >
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> > 11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> >
> > md1 : active raid1 sdb3[0] sda3[1]
> > 484118656 blocks [2/2] [UU]
> >
> > md2 : active raid1 sdb1[0] sda1[1]
> > 255936 blocks [2/2] [UU]
> >
> > unused devices: <none>
> >
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> >
> > top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00
> > Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> > %Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
> > %Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st
> > %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache
> > MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check
>
> I think the stack of raid6check is pretty much the same as before.
>
> Since the estimated time for the 12 TB array is about 57 days, and the
> estimated time should be linear in the number of stripes on the same
> machine, this is presumably just how raid6check works, as I guessed.
>
> Thanks,
> Guoqing
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-11 8:58 ` Guoqing Jiang
2020-05-11 15:39 ` Piergiorgio Sartor
@ 2020-05-11 16:14 ` Piergiorgio Sartor
2020-05-11 20:53 ` Giuseppe Bilotta
2020-05-11 21:07 ` Guoqing Jiang
1 sibling, 2 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-11 16:14 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid
On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
>
>
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> >
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > /proc/19719/stack' and /proc/mdstat?
> > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> It looks like raid6check keeps writing to the suspend_lo/hi nodes, which
> causes mddev_suspend to be called, meaning synchronize_rcu and other
> synchronization mechanisms are triggered in that path ...
>
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time? I thought it was _reading_ the disks only?
>
> I hadn't read raid6check before; I just found that check_stripes has
>
>
> while (length > 0) {
> lock_stripe -> write suspend_lo/hi node
> ...
> unlock_all_stripes -> write suspend_lo/hi node
> }
>
> I think this explains the stack of raid6check, and maybe it is the way that
> raid6check works: lock the stripe, check the stripe, then unlock the stripe;
> just my guess ...
Hi again!
I made a quick test.
I disabled the lock / unlock in raid6check.
With lock / unlock, I get around 1.2MB/sec
per device component, with ~13% CPU load.
Without lock / unlock, I get around 15.5MB/sec
per device component, with ~30% CPU load.
So, it seems the lock / unlock mechanism is
quite expensive.
I'm not sure what's the best solution, since
we still need to avoid race conditions.
Any suggestion is welcome!
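For context on where the cost comes from: as described earlier in the thread, the per-stripe lock boils down to writing a byte range around the stripe into the array's suspend_lo/suspend_hi sysfs nodes, each write suspending and resuming the whole array. A rough, hypothetical sketch of the per-stripe range arithmetic (the helper names, the sector unit, and the assumption that a RAID6 stripe carries ndevs - 2 data chunks are illustrative, not taken from the mdadm sources):

```c
#include <stdint.h>

/* Hypothetical sketch: map a stripe number to the [suspend_lo, suspend_hi)
 * range that would be written to sysfs before checking it.  Assumes the
 * range is expressed in 512-byte sectors and that a RAID6 stripe carries
 * (ndevs - 2) * chunk_bytes of data (two devices hold P and Q parity). */

uint64_t stripe_data_sectors(int ndevs, uint64_t chunk_bytes)
{
    return (uint64_t)(ndevs - 2) * chunk_bytes / 512;
}

/* Value for suspend_lo: start of stripe n's data range. */
uint64_t suspend_lo_for_stripe(uint64_t n, int ndevs, uint64_t chunk_bytes)
{
    return n * stripe_data_sectors(ndevs, chunk_bytes);
}

/* Value for suspend_hi: end of stripe n's data range (exclusive). */
uint64_t suspend_hi_for_stripe(uint64_t n, int ndevs, uint64_t chunk_bytes)
{
    return (n + 1) * stripe_data_sectors(ndevs, chunk_bytes);
}
```

With the 8-disk, 16 KiB chunk array above, each stripe covers only 6 x 16 KiB = 96 KiB of data, so every checked stripe costs two tiny sysfs writes, each of which suspends and resumes the entire array.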
bye,
pg
> > And iostat does not report any writes either?
>
> Because the CPU is busy with mddev_suspend, I think.
>
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU)
> >
> > avg-cpu: %user %nice %system %iowait %steal %idle
> > 0.18 0.00 1.07 0.17 0.00 98.58
> >
> > Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
> > sde 20.30 368.76 0.10 0.00 277022327 75178 0
> > sdf 20.28 368.77 0.10 0.00 277030081 75170 0
> > sdh 20.30 368.74 0.10 0.00 277007903 74854 0
> > sdi 20.30 368.79 0.10 0.00 277049113 75246 0
> > sdj 20.82 368.76 0.10 0.00 277022363 74986 0
> > sdk 20.30 368.73 0.10 0.00 277002179 76322 0
> > sdl 20.29 368.78 0.10 0.00 277039743 74982 0
> > sdm 20.29 368.75 0.10 0.00 277018163 74958 0
> >
> >
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> > 234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> > bitmap: 0/2 pages [0KB], 65536KB chunk
> >
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> > 11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> >
> > md1 : active raid1 sdb3[0] sda3[1]
> > 484118656 blocks [2/2] [UU]
> >
> > md2 : active raid1 sdb1[0] sda1[1]
> > 255936 blocks [2/2] [UU]
> >
> > unused devices: <none>
> >
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> >
> > top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00
> > Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> > %Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
> > %Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st
> > %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache
> > MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check
>
> I think the stack of raid6check is pretty much the same as before.
>
> Since the estimated time for the 12 TB array is about 57 days, and if the
> estimated time scales linearly with the number of stripes on the same
> machine, then this is how raid6check works, as I guessed.
>
> Thanks,
> Guoqing
--
piergiorgio
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 16:14 ` Piergiorgio Sartor
@ 2020-05-11 20:53 ` Giuseppe Bilotta
2020-05-11 21:12 ` Guoqing Jiang
2020-05-12 16:05 ` Piergiorgio Sartor
2020-05-11 21:07 ` Guoqing Jiang
1 sibling, 2 replies; 38+ messages in thread
From: Giuseppe Bilotta @ 2020-05-11 20:53 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Guoqing Jiang, Wolfgang Denk, linux-raid
Hello Piergiorgio,
On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:
> Hi again!
>
> I made a quick test.
> I disabled the lock / unlock in raid6check.
>
> With lock / unlock, I get around 1.2MB/sec
> per device component, with ~13% CPU load.
> Without lock / unlock, I get around 15.5MB/sec
> per device component, with ~30% CPU load.
>
> So, it seems the lock / unlock mechanism is
> quite expensive.
>
> I'm not sure what's the best solution, since
> we still need to avoid race conditions.
>
> Any suggestion is welcome!
Would it be possible/effective to lock multiple stripes at once? Lock,
say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
internals, but if locking is O(1) on the number of stripes (at least
if they are consecutive), this would help reduce (potentially by a
factor of 8 or 16) the costs of the locks/unlocks at the expense of
longer locks and their influence on external I/O.
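A minimal sketch of this batching idea (hypothetical code, not from mdadm; the lock/check helpers are stand-in stubs with counters), showing how the number of lock/unlock pairs drops from one per stripe to one per batch:

```c
#include <stdint.h>

/* Stub counters stand in for the real sysfs suspend/resume writes. */
unsigned long batch_lock_calls;
unsigned long batch_stripes_checked;

void lock_stripe_range(uint64_t lo, uint64_t hi)
{
    (void)lo; (void)hi;
    batch_lock_calls++;          /* one suspend_lo/suspend_hi write pair */
}

void unlock_stripe_range(void) { /* one resume */ }

void check_one_stripe(uint64_t s)
{
    (void)s;
    batch_stripes_checked++;     /* read data + parity, verify P/Q */
}

/* Check 'n' stripes starting at 'first', locking 'batch' stripes at a time. */
void check_with_batched_locks(uint64_t first, uint64_t n, uint64_t batch)
{
    for (uint64_t s = first; s < first + n; s += batch) {
        uint64_t end = s + batch < first + n ? s + batch : first + n;
        lock_stripe_range(s, end);
        for (uint64_t i = s; i < end; i++)
            check_one_stripe(i);
        unlock_stripe_range();
    }
}
```

With a batch size of 16, checking 100 stripes would cost 7 suspend/resume cycles instead of 100, at the price of suspending external writes across a 16-stripe window at a time.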
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 16:14 ` Piergiorgio Sartor
2020-05-11 20:53 ` Giuseppe Bilotta
@ 2020-05-11 21:07 ` Guoqing Jiang
2020-05-11 22:44 ` Peter Grandi
2020-05-12 16:07 ` Piergiorgio Sartor
1 sibling, 2 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:07 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid
On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
>> Hi Wolfgang,
>>
>>
>> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
>>> Dear Guoqing Jiang,
>>>
>>> In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>>>> Seems raid6check is in 'D' state, what are the output of 'cat
>>>> /proc/19719/stack' and /proc/mdstat?
>>> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_lo_store+0x50/0xa0
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_lo_store+0x50/0xa0
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_hi_store+0x44/0x90
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_hi_store+0x44/0x90
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> It looks like raid6check keeps writing to the suspend_lo/hi nodes, which
>> causes mddev_suspend to be called, meaning synchronize_rcu and other
>> synchronization mechanisms are triggered in that path ...
>>
>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
>>> all the time? I thought it was _reading_ the disks only?
>> I hadn't read raid6check before; I just found that check_stripes has
>>
>>
>> while (length > 0) {
>> lock_stripe -> write suspend_lo/hi node
>> ...
>> unlock_all_stripes -> write suspend_lo/hi node
>> }
>>
>> I think this explains the stack of raid6check, and maybe it is the way that
>> raid6check works: lock the stripe, check the stripe, then unlock the stripe;
>> just my guess ...
> Hi again!
>
> I made a quick test.
> I disabled the lock / unlock in raid6check.
>
> With lock / unlock, I get around 1.2MB/sec
> per device component, with ~13% CPU load.
> Without lock / unlock, I get around 15.5MB/sec
> per device component, with ~30% CPU load.
>
> So, it seems the lock / unlock mechanism is
> quite expensive.
Yes, since mddev_suspend/resume are triggered by locking/unlocking the stripe.
> I'm not sure what's the best solution, since
> we still need to avoid race conditions.
I guess there are two possible ways:
1. Per your previous reply, only call raid6check when the array is RO; then
we don't need the lock.
2. Investigate whether it is possible to acquire stripe_lock in
suspend_lo/hi_store to avoid the race between raid6check and writes to the
same stripe. IOW, try fine-grained protection instead of calling the
expensive suspend/resume in suspend_lo/hi_store. But I am not sure whether
it is doable right now.
BTW, seems there are build problems for raid6check ...
mdadm$ make raid6check
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
-Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
-DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
-DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
-DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
-DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\""
-DUSE_PTHREADS -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
xmalloc.o dlink.o
sysfs.o: In function `sysfsline':
sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
collect2: error: ld returned 1 exit status
Makefile:220: recipe for target 'raid6check' failed
make: *** [raid6check] Error 1
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 20:53 ` Giuseppe Bilotta
@ 2020-05-11 21:12 ` Guoqing Jiang
2020-05-11 21:16 ` Guoqing Jiang
2020-05-12 16:05 ` Piergiorgio Sartor
1 sibling, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:12 UTC (permalink / raw)
To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid
On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> Hello Piergiorgio,
>
> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
>> Hi again!
>>
>> I made a quick test.
>> I disabled the lock / unlock in raid6check.
>>
>> With lock / unlock, I get around 1.2MB/sec
>> per device component, with ~13% CPU load.
>> Without lock / unlock, I get around 15.5MB/sec
>> per device component, with ~30% CPU load.
>>
>> So, it seems the lock / unlock mechanism is
>> quite expensive.
>>
>> I'm not sure what's the best solution, since
>> we still need to avoid race conditions.
>>
>> Any suggestion is welcome!
> Would it be possible/effective to lock multiple stripes at once? Lock,
> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> internals, but if locking is O(1) on the number of stripes (at least
> if they are consecutive), this would help reduce (potentially by a
> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> longer locks and their influence on external I/O.
>
Hmm, maybe something like:
check_stripes
-> mddev_suspend
while (whole_stripe_num--) {
check each stripe
}
-> mddev_resume
Then just need to call suspend/resume once.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 21:12 ` Guoqing Jiang
@ 2020-05-11 21:16 ` Guoqing Jiang
2020-05-12 1:52 ` Giuseppe Bilotta
0 siblings, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:16 UTC (permalink / raw)
To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid
On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
>> Hello Piergiorgio,
>>
>> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
>> <piergiorgio.sartor@nexgo.de> wrote:
>>> Hi again!
>>>
>>> I made a quick test.
>>> I disabled the lock / unlock in raid6check.
>>>
>>> With lock / unlock, I get around 1.2MB/sec
>>> per device component, with ~13% CPU load.
>>> Without lock / unlock, I get around 15.5MB/sec
>>> per device component, with ~30% CPU load.
>>>
>>> So, it seems the lock / unlock mechanism is
>>> quite expensive.
>>>
>>> I'm not sure what's the best solution, since
>>> we still need to avoid race conditions.
>>>
>>> Any suggestion is welcome!
>> Would it be possible/effective to lock multiple stripes at once? Lock,
>> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
>> internals, but if locking is O(1) on the number of stripes (at least
>> if they are consecutive), this would help reduce (potentially by a
>> factor of 8 or 16) the costs of the locks/unlocks at the expense of
>> longer locks and their influence on external I/O.
>>
>
> Hmm, maybe something like.
>
> check_stripes
>
> -> mddev_suspend
>
> while (whole_stripe_num--) {
> check each stripe
> }
>
> -> mddev_resume
>
>
> Then just need to call suspend/resume once.
But basically, the array can't process any new requests when checking is
in progress ...
Guoqing
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 21:07 ` Guoqing Jiang
@ 2020-05-11 22:44 ` Peter Grandi
2020-05-12 16:09 ` Piergiorgio Sartor
2020-05-12 16:07 ` Piergiorgio Sartor
1 sibling, 1 reply; 38+ messages in thread
From: Peter Grandi @ 2020-05-11 22:44 UTC (permalink / raw)
To: Linux RAID
>>> With lock / unlock, I get around 1.2MB/sec per device
> >>> component, with ~13% CPU load. Without lock / unlock, I get
>>> around 15.5MB/sec per device component, with ~30% CPU load.
>> [...] we still need to avoid race conditions. [...]
Not all race conditions are equally bad in this situation.
> 1. Per your previous reply, only call raid6check when array is
> RO, then we don't need the lock.
> 2. Investigate if it is possible that acquire stripe_lock in
> suspend_lo/hi_store [...]
Some other ways could be considered:
* Read a stripe without locking and check it; if it checks good,
no problem, else either it was modified during the read, or it
was faulty, so acquire a W lock, reread and recheck it (it
could have become good in the meantime).
The assumption here is that there is a modest write load from
applications on the RAID set, so the check will almost always
succeed, and it is worth rereading the stripe in very rare
cases of "collisions" or faults.
* Variants, like acquiring a W lock (if possible) on the stripe
solely while reading it ("atomic" read, which may be possible
in other ways without locking) and then if check fails we know
it was faulty, so optionally acquire a new W lock and reread
and recheck it (it could have become good in the meantime).
The assumption here is that the write load is less modest, but
there are a lot more reads than writes, so a W lock only
during read will eliminate the rereads and rechecks from
relatively rare "collisions".
The case where there is a large application write load on the
RAID set and checking at the same time is hard to improve; the
best option is probably to eliminate rereads and rechecks by just
acquiring the stripe W lock for the whole duration of the read and
check.
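The first optimistic variant above can be sketched as follows (hypothetical code; the read/lock helpers are stubs, and the stub models a stripe whose unlocked read raced with a concurrent write, so the locked recheck passes):

```c
#include <stdbool.h>
#include <stdint.h>

unsigned long read_attempts;
unsigned long locks_taken;

/* Stub: the first (unlocked) read sees a transient mismatch caused by a
 * concurrent write; the reread under the lock sees a consistent stripe. */
bool read_and_check_stripe(uint64_t stripe)
{
    (void)stripe;
    return ++read_attempts > 1;
}

void lock_stripe(uint64_t stripe)   { (void)stripe; locks_taken++; }
void unlock_stripe(uint64_t stripe) { (void)stripe; }

/* Optimistic check: take the lock only when the unlocked check fails. */
bool stripe_is_consistent(uint64_t stripe)
{
    if (read_and_check_stripe(stripe))
        return true;              /* common case: no lock at all */
    lock_stripe(stripe);          /* rare case: recheck under the lock */
    bool ok = read_and_check_stripe(stripe);
    unlock_stripe(stripe);
    return ok;
}
```

Under a modest write load almost every stripe passes the first, lock-free check, so the expensive suspend/resume path is taken only for genuine mismatches or rare collisions.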
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 21:16 ` Guoqing Jiang
@ 2020-05-12 1:52 ` Giuseppe Bilotta
2020-05-12 6:27 ` Adam Goryachev
0 siblings, 1 reply; 38+ messages in thread
From: Giuseppe Bilotta @ 2020-05-12 1:52 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid
On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
> On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> >> Would it be possible/effective to lock multiple stripes at once? Lock,
> >> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> >> internals, but if locking is O(1) on the number of stripes (at least
> >> if they are consecutive), this would help reduce (potentially by a
> >> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> >> longer locks and their influence on external I/O.
> >>
> >
> > Hmm, maybe something like.
> >
> > check_stripes
> >
> > -> mddev_suspend
> >
> > while (whole_stripe_num--) {
> > check each stripe
> > }
> >
> > -> mddev_resume
> >
> >
> > Then just need to call suspend/resume once.
>
> But basically, the array can't process any new requests when checking is
> in progress ...
Yeah, locking the entire device might be excessive (especially if it's
a big one). Using a granularity larger than 1 but smaller than the
whole device could be a compromise. Since the “no lock” approach seems
to be about an order of magnitude faster (at least in Piergiorgio's
benchmark), my guess was that something between 8 and 16 could bring
the speed up to be close to the “no lock” case without having dramatic
effects on I/O. Reading all 8/16 stripes before processing (assuming
sufficient memory) might even lead to better disk utilization during
the check.
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-12 1:52 ` Giuseppe Bilotta
@ 2020-05-12 6:27 ` Adam Goryachev
2020-05-12 16:11 ` Piergiorgio Sartor
0 siblings, 1 reply; 38+ messages in thread
From: Adam Goryachev @ 2020-05-12 6:27 UTC (permalink / raw)
To: Giuseppe Bilotta, Guoqing Jiang
Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid
On 12/5/20 11:52, Giuseppe Bilotta wrote:
> On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> On 5/11/20 11:12 PM, Guoqing Jiang wrote:
>>> On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
>>>> Would it be possible/effective to lock multiple stripes at once? Lock,
>>>> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
>>>> internals, but if locking is O(1) on the number of stripes (at least
>>>> if they are consecutive), this would help reduce (potentially by a
>>>> factor of 8 or 16) the costs of the locks/unlocks at the expense of
>>>> longer locks and their influence on external I/O.
>>>>
>>> Hmm, maybe something like.
>>>
>>> check_stripes
>>>
>>> -> mddev_suspend
>>>
>>> while (whole_stripe_num--) {
>>> check each stripe
>>> }
>>>
>>> -> mddev_resume
>>>
>>>
>>> Then just need to call suspend/resume once.
>> But basically, the array can't process any new requests when checking is
> Yeah, locking the entire device might be excessive (especially if it's
> a big one). Using a granularity larger than 1 but smaller than the
> whole device could be a compromise. Since the “no lock” approach seems
> to be about an order of magnitude faster (at least in Piergiorgio's
> benchmark), my guess was that something between 8 and 16 could bring
> the speed up to be close to the “no lock” case without having dramatic
> effects on I/O. Reading all 8/16 stripes before processing (assuming
> sufficient memory) might even lead to better disk utilization during
> the check.
I know very little about this, but could you perhaps lock 2 x 16
stripes, and then after you complete the first 16, release the first 16,
lock the 3rd 16 stripes, and while waiting for the lock continue to
process the 2nd set of 16?
Would that allow you to do more processing and less waiting for
lock/release?
Regards,
Adam
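Adam's pipelining idea above can be sketched like this (hypothetical stubs and counters; real code would overlap the next window's suspend_lo/hi writes with checking of the current window):

```c
#include <stdint.h>

unsigned long win_locks, win_unlocks, stripes_checked;

void lock_window(uint64_t w)   { (void)w; win_locks++; }
void unlock_window(uint64_t w) { (void)w; win_unlocks++; }

void check_window(uint64_t w, uint64_t span)
{
    (void)w;
    stripes_checked += span;      /* verify 'span' stripes in this window */
}

/* Check 'nwin' windows of 'span' stripes each, always acquiring the next
 * window's lock before processing and releasing the current one, so the
 * check of one window overlaps with lock acquisition of the next. */
void check_pipelined(uint64_t nwin, uint64_t span)
{
    if (nwin == 0)
        return;
    lock_window(0);
    for (uint64_t w = 0; w < nwin; w++) {
        if (w + 1 < nwin)
            lock_window(w + 1);   /* next window locked ahead of time */
        check_window(w, span);    /* two windows are held while checking */
        unlock_window(w);
    }
}
```

One caveat: since suspend_lo/suspend_hi describe a single contiguous range, holding two adjacent windows would in practice mean suspending one larger range rather than two independent locks.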
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 15:39 ` Piergiorgio Sartor
@ 2020-05-12 7:37 ` Wolfgang Denk
2020-05-12 16:17 ` Piergiorgio Sartor
0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-12 7:37 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid
Dear Piergiorgio,
In message <20200511153937.GA3225@lazy.lzy> you wrote:
> > while (length > 0) {
> > lock_stripe -> write suspend_lo/hi node
> > ...
> > unlock_all_stripes -> write suspend_lo/hi node
> > }
> >
> > I think this explains the stack of raid6check, and maybe it is the way that
> > raid6check works: lock the stripe, check the stripe, then unlock the stripe;
> > just my guess ...
>
> Yes, that's the way it works.
> raid6check locks the stripe, checks it, releases it.
> This is required in order to avoid race conditions
> between raid6check and some write to the stripe.
This still does not really explain what is so slow here. I mean,
even if the locking were an expensive operation code-wise, I would
expect to see at least one of the CPU cores near 100% then - but
both CPU _and_ I/O are basically idle, and the disks are _all_ and
_always_ really close to a throughput of 400 kB/s - this looks like
some intentional bandwidth limit - I just can't see where this can be
configured?
> This could be a way to test if the problem is
> really here.
> That is, remove the lock / unlock (I guess
> there should be only one pair, but better
> check) and check with the array in R/O mode.
I may try this again after this test completed ;-)
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It's certainly convenient the way the crime (or condition) of
stupidity carries with it its own punishment, automatically
admisistered without remorse, pity, or prejudice. :-)
-- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 20:53 ` Giuseppe Bilotta
2020-05-11 21:12 ` Guoqing Jiang
@ 2020-05-12 16:05 ` Piergiorgio Sartor
1 sibling, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:05 UTC (permalink / raw)
To: Giuseppe Bilotta
Cc: Piergiorgio Sartor, Guoqing Jiang, Wolfgang Denk, linux-raid
On Mon, May 11, 2020 at 10:53:05PM +0200, Giuseppe Bilotta wrote:
> Hello Piergiorgio,
>
> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> > Hi again!
> >
> > I made a quick test.
> > I disabled the lock / unlock in raid6check.
> >
> > With lock / unlock, I get around 1.2MB/sec
> > per device component, with ~13% CPU load.
> > Without lock / unlock, I get around 15.5MB/sec
> > per device component, with ~30% CPU load.
> >
> > So, it seems the lock / unlock mechanism is
> > quite expensive.
> >
> > I'm not sure what's the best solution, since
> > we still need to avoid race conditions.
> >
> > Any suggestion is welcome!
>
> Would it be possible/effective to lock multiple stripes at once? Lock,
> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> internals, but if locking is O(1) on the number of stripes (at least
> if they are consecutive), this would help reduce (potentially by a
> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> longer locks and their influence on external I/O.
Probably possible, from the technical
point of view, even if I do not know
the effect either.
From the coding point of view, it is a bit
tricky; boundary conditions and so on
must be properly considered.
>
> --
> Giuseppe "Oblomov" Bilotta
--
piergiorgio
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 21:07 ` Guoqing Jiang
2020-05-11 22:44 ` Peter Grandi
@ 2020-05-12 16:07 ` Piergiorgio Sartor
2020-05-12 18:16 ` Guoqing Jiang
2020-05-13 6:07 ` Wolfgang Denk
1 sibling, 2 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:07 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid
On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> > > Hi Wolfgang,
> > >
> > >
> > > On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > > > Dear Guoqing Jiang,
> > > >
> > > > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > > > /proc/19719/stack' and /proc/mdstat?
> > > > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_hi_store+0x44/0x90
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_hi_store+0x44/0x90
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > It looks like raid6check keeps writing to the suspend_lo/hi nodes, which
> > > causes mddev_suspend to be called, meaning synchronize_rcu and other
> > > synchronization mechanisms are triggered in that path ...
> > >
> > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > > > all the time? I thought it was _reading_ the disks only?
> > > I hadn't read raid6check before; I just found that check_stripes has
> > >
> > >
> > > while (length > 0) {
> > > lock_stripe -> write suspend_lo/hi node
> > > ...
> > > unlock_all_stripes -> write suspend_lo/hi node
> > > }
> > >
> > > I think this explains the stack of raid6check, and maybe it is the way that
> > > raid6check works: lock the stripe, check the stripe, then unlock the stripe;
> > > just my guess ...
> > Hi again!
> >
> > I made a quick test.
> > I disabled the lock / unlock in raid6check.
> >
> > With lock / unlock, I get around 1.2MB/sec
> > per device component, with ~13% CPU load.
> > Without lock / unlock, I get around 15.5MB/sec
> > per device component, with ~30% CPU load.
> >
> > So, it seems the lock / unlock mechanism is
> > quite expensive.
>
> Yes, since mddev_suspend/resume are triggered by locking/unlocking the stripe.
>
> > I'm not sure what's the best solution, since
> > we still need to avoid race conditions.
>
> I guess there are two possible ways:
>
> 1. Per your previous reply, only call raid6check when the array is RO; then
> we don't need the lock.
>
> 2. Investigate whether it is possible to acquire stripe_lock in
> suspend_lo/hi_store to avoid the race between raid6check and writes to the
> same stripe. IOW, try fine-grained protection instead of calling the
> expensive suspend/resume in suspend_lo/hi_store. But I am not sure whether
> it is doable right now.
Could you please elaborate on the
"fine grained protection" thing?
>
> BTW, seems there are build problems for raid6check ...
>
> mdadm$ make raid6check
> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
> -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
> xmalloc.o dlink.o
> sysfs.o: In function `sysfsline':
> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> collect2: error: ld returned 1 exit status
> Makefile:220: recipe for target 'raid6check' failed
> make: *** [raid6check] Error 1
I cannot see this problem.
I could compile without issue.
Maybe some library is missing somewhere,
but I'm not sure where.
bye,
pg
>
> Thanks,
> Guoqing
--
piergiorgio
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
2020-05-11 22:44 ` Peter Grandi
@ 2020-05-12 16:09 ` Piergiorgio Sartor
2020-05-12 20:54 ` antlists
0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:09 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux RAID
On Mon, May 11, 2020 at 11:44:11PM +0100, Peter Grandi wrote:
> >>> With lock / unlock, I get around 1.2MB/sec per device
> >>> component, with ~13% CPU load. Without lock / unlock, I get
> >>> around 15.5MB/sec per device component, with ~30% CPU load.
>
> >> [...] we still need to avoid race conditions. [...]
>
> Not all race conditions are equally bad in this situation.
>
> > 1. Per your previous reply, only call raid6check when array is
> > RO, then we don't need the lock.
> > 2. Investigate if it is possible that acquire stripe_lock in
> > suspend_lo/hi_store [...]
>
> Some other ways could be considered:
>
> * Read a stripe without locking and check it; if it checks good,
> no problem, else either it was modified during the read, or it
> was faulty, so acquire a W lock, reread and recheck it (it
> could have become good in the meantime).
>
> The assumption here is that there is a modest write load from
> applications on the RAID set, so the check will almost always
> succeed, and it is worth rereading the stripe in very rare
> cases of "collisions" or faults.
>
> * Variants, like acquiring a W lock (if possible) on the stripe
> solely while reading it ("atomic" read, which may be possible
> in other ways without locking) and then if check fails we know
> it was faulty, so optionally acquire a new W lock and reread
> and recheck it (it could have become good in the meantime).
>
> The assumption here is that the write load is less modest, but
> there are a lot more reads than writes, so a W lock only
> during read will eliminate the rereads and rechecks from
> relatively rare "collisions".
The locking method was suggested by Neil,
I'm not aware of other methods.
About the check -> maybe lock -> re-check,
it is a possible workaround, but I find it
a bit extreme.
In any case, we should keep it in mind.
bye,
pg
> The case where there is at the same time a large application
> write load on the RAID set and checking at the same time is hard
> to improve; probably the best option is eliminating rereads and
> rechecks by just acquiring the stripe W lock for the whole
> duration of read and check.
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-12 6:27 ` Adam Goryachev
@ 2020-05-12 16:11 ` Piergiorgio Sartor
0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:11 UTC (permalink / raw)
To: Adam Goryachev
Cc: Giuseppe Bilotta, Guoqing Jiang, Piergiorgio Sartor,
Wolfgang Denk, linux-raid
On Tue, May 12, 2020 at 04:27:59PM +1000, Adam Goryachev wrote:
>
> On 12/5/20 11:52, Giuseppe Bilotta wrote:
> > On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
> > <guoqing.jiang@cloud.ionos.com> wrote:
> > > On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> > > > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> > > > > Would it be possible/effective to lock multiple stripes at once? Lock,
> > > > > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> > > > > internals, but if locking is O(1) on the number of stripes (at least
> > > > > if they are consecutive), this would help reduce (potentially by a
> > > > > factor of 8 or 16) the costs of the locks/unlocks at the expense of
> > > > > longer locks and their influence on external I/O.
> > > > >
> > > > Hmm, maybe something like.
> > > >
> > > > check_stripes
> > > >
> > > > -> mddev_suspend
> > > >
> > > > while (whole_stripe_num--) {
> > > > check each stripe
> > > > }
> > > >
> > > > -> mddev_resume
> > > >
> > > >
> > > > Then just need to call suspend/resume once.
> > > But basically, the array can't process any new requests when checking is
> > Yeah, locking the entire device might be excessive (especially if it's
> > a big one). Using a granularity larger than 1 but smaller than the
> > whole device could be a compromise. Since the “no lock” approach seems
> > to be about an order of magnitude faster (at least in Piergiorgio's
> > benchmark), my guess was that something between 8 and 16 could bring
> > the speed up to be close to the “no lock” case without having dramatic
> > effects on I/O. Reading all 8/16 stripes before processing (assuming
> > sufficient memory) might even lead to better disk utilization during
> > the check.
>
> I know very little about this, but could you perhaps lock 2 x 16 stripes,
> and then after you complete the first 16, release the first 16, lock the 3rd
> 16 stripes, and while waiting for the lock continue to process the 2nd set
> of 16?
For some reason I do not know, the unlock
is global.
If I recall correctly, this was the way
Neil said was "more" correct.
> Would that allow you to do more processing and less waiting for
> lock/release?
I think the general concept of pipelining
is good, this would really improve the
performance of the whole thing.
If we could just multithread, I suspect
it could improve.
We need to solve the unlock problem...
bye,
>
> Regards,
> Adam
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-12 7:37 ` Wolfgang Denk
@ 2020-05-12 16:17 ` Piergiorgio Sartor
2020-05-13 6:13 ` Wolfgang Denk
0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:17 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid
On Tue, May 12, 2020 at 09:37:47AM +0200, Wolfgang Denk wrote:
> Dear Piergiorgio,
>
> In message <20200511153937.GA3225@lazy.lzy> you wrote:
> > >     while (length > 0) {
> > >             lock_stripe -> write suspend_lo/hi node
> > >             ...
> > >             unlock_all_stripes -> write suspend_lo/hi node
> > >     }
> > >
> > > I think it explains the stack of raid6check, and maybe it is way that
> > > raid6check works, lock
> > > stripe, check the stripe then unlock the stripe, just my guess ...
> >
> > Yes, that's the way it works.
> > raid6check lock the stripe, check it, release it.
> > This is required in order to avoid race conditions
> > between raid6check and some write to the stripe.
>
> This still does not really explain what is so slow here. I mean,
> even if the locking was an expensive operation code-wise, I would
> expect to see at least one of the CPU cores near 100% then - but
> both CPU _and_ I/O are basically idle, and disks are _all_ and
> _always_ really close to a throughput of 400 kB/s - this looks like
> some intentional bandwidth limit - I just can't see where this can be
> configured?
The code has 2 functions: lock_stripe() and
unlock_all_stripes().
These are doing more than just lock / unlock.
First, the memory pages of the process are
locked, then some signals are set to
"ignore", then the stripe is locked.
The unlock does the opposite in reverse
order (unlock the stripe, restore the
signals, unlock the memory pages).
The difference is that, whatever the reason,
the unlock unlocks *all* the stripes, not
only the one locked.
Not sure why.
> > This could be a way to test if the problem is
> > really here.
> > That is, remove the lock / unlock (I guess
> > there should be only one pair, but better
> > check) and check with the array in R/O mode.
>
> I may try this again after this test completed ;-)
I did it, some performance improvement,
even if not really the possible max.
bye,
pg
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> It's certainly convenient the way the crime (or condition) of
> stupidity carries with it its own punishment, automatically
> administered without remorse, pity, or prejudice. :-)
> -- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu>
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-12 16:07 ` Piergiorgio Sartor
@ 2020-05-12 18:16 ` Guoqing Jiang
2020-05-12 18:32 ` Piergiorgio Sartor
2020-05-13 6:07 ` Wolfgang Denk
1 sibling, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-12 18:16 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid
On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
>> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
>>> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
>>>> Hi Wolfgang,
>>>>
>>>>
>>>> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
>>>>> Dear Guoqing Jiang,
>>>>>
>>>>> In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>>>>>> Seems raid6check is in 'D' state, what are the output of 'cat
>>>>>> /proc/19719/stack' and /proc/mdstat?
>>>>> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
>>>> is called,
>>>> means synchronize_rcu and other synchronize mechanisms are triggered in the
>>>> path ...
>>>>
>>>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
>>>>> all the time? I thought it was_reading_ the disks only?
>>>> I didn't read raid6check before, just find check_stripes has
>>>>
>>>>
>>>> while (length > 0) {
>>>> lock_stripe -> write suspend_lo/hi node
>>>> ...
>>>> unlock_all_stripes -> write suspend_lo/hi node
>>>> }
>>>>
>>>> I think it explains the stack of raid6check, and maybe it is way that
>>>> raid6check works, lock
>>>> stripe, check the stripe then unlock the stripe, just my guess ...
>>> Hi again!
>>>
>>> I made a quick test.
>>> I disabled the lock / unlock in raid6check.
>>>
>>> With lock / unlock, I get around 1.2MB/sec
>>> per device component, with ~13% CPU load.
>>> Without lock / unlock, I get around 15.5MB/sec
>>> per device component, with ~30% CPU load.
>>>
>>> So, it seems the lock / unlock mechanism is
>>> quite expensive.
>> Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
>>
>>> I'm not sure what's the best solution, since
>>> we still need to avoid race conditions.
>> I guess there are two possible ways:
>>
>> 1. Per your previous reply, only call raid6check when array is RO, then
>> we don't need the lock.
>>
>> 2. Investigate if it is possible that acquire stripe_lock in
>> suspend_lo/hi_store
>> to avoid the race between raid6check and write to the same stripe. IOW,
>> try fine grained protection instead of call the expensive suspend/resume
>> in suspend_lo/hi_store. But I am not sure it is doable or not right now.
> Could you please elaborate on the
> "fine grained protection" thing?
Even though raid6check checks and locks stripes one by one, the thing
is different in kernel space: locking one stripe triggers mddev_suspend
and mddev_resume, which affect all stripes ...
If the kernel exposed an interface to actually lock a single stripe,
then raid6check could use it to lock only that stripe (this is what I
call fine grained), instead of triggering suspend/resume, which are
time consuming.
>
>> BTW, seems there are build problems for raid6check ...
>>
>> mdadm$ make raid6check
>> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
>> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
>> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
>> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
>> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
>> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
>> -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
>> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
>> xmalloc.o dlink.o
>> sysfs.o: In function `sysfsline':
>> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
>> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
>> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
>> collect2: error: ld returned 1 exit status
>> Makefile:220: recipe for target 'raid6check' failed
>> make: *** [raid6check] Error 1
> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.
Do you try with the latest mdadm tree? But could be environment issue ...
Thanks,
Guoqing
* Re: raid6check extremely slow ?
2020-05-12 18:16 ` Guoqing Jiang
@ 2020-05-12 18:32 ` Piergiorgio Sartor
2020-05-13 6:18 ` Wolfgang Denk
0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 18:32 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid
On Tue, May 12, 2020 at 08:16:27PM +0200, Guoqing Jiang wrote:
> On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> > On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
> > > On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> > > > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> > > > > Hi Wolfgang,
> > > > >
> > > > >
> > > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > > > > > Dear Guoqing Jiang,
> > > > > >
> > > > > > In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > > > > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > > > > > /proc/19719/stack' and /proc/mdstat?
> > > > > > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > >
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > >
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > >
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
> > > > > is called,
> > > > > means synchronize_rcu and other synchronize mechanisms are triggered in the
> > > > > path ...
> > > > >
> > > > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > > > > > all the time? I thought it was_reading_ the disks only?
> > > > > I didn't read raid6check before, just find check_stripes has
> > > > >
> > > > >
> > > > > while (length > 0) {
> > > > > lock_stripe -> write suspend_lo/hi node
> > > > > ...
> > > > > unlock_all_stripes -> write suspend_lo/hi node
> > > > > }
> > > > >
> > > > > I think it explains the stack of raid6check, and maybe it is way that
> > > > > raid6check works, lock
> > > > > stripe, check the stripe then unlock the stripe, just my guess ...
> > > > Hi again!
> > > >
> > > > I made a quick test.
> > > > I disabled the lock / unlock in raid6check.
> > > >
> > > > With lock / unlock, I get around 1.2MB/sec
> > > > per device component, with ~13% CPU load.
> > > > Without lock / unlock, I get around 15.5MB/sec
> > > > per device component, with ~30% CPU load.
> > > >
> > > > So, it seems the lock / unlock mechanism is
> > > > quite expensive.
> > > Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
> > >
> > > > I'm not sure what's the best solution, since
> > > > we still need to avoid race conditions.
> > > I guess there are two possible ways:
> > >
> > > 1. Per your previous reply, only call raid6check when array is RO, then
> > > we don't need the lock.
> > >
> > > 2. Investigate if it is possible that acquire stripe_lock in
> > > suspend_lo/hi_store
> > > to avoid the race between raid6check and write to the same stripe. IOW,
> > > try fine grained protection instead of call the expensive suspend/resume
> > > in suspend_lo/hi_store. But I am not sure it is doable or not right now.
> > Could you please elaborate on the
> > "fine grained protection" thing?
>
> Even though raid6check checks and locks stripes one by one, the thing
> is different in kernel space: locking one stripe triggers mddev_suspend
> and mddev_resume, which affect all stripes ...
>
> If the kernel exposed an interface to actually lock a single stripe,
> then raid6check could use it to lock only that stripe (this is what I
> call fine grained), instead of triggering suspend/resume, which are
> time consuming.
I see, you mean we need a different
interface to this lock / unlock thing.
> > > BTW, seems there are build problems for raid6check ...
> > >
> > > mdadm$ make raid6check
> > > gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
> > > -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
> > > -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
> > > -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
> > > -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
> > > -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
> > > -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
> > > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
> > > xmalloc.o dlink.o
> > > sysfs.o: In function `sysfsline':
> > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> > > collect2: error: ld returned 1 exit status
> > > Makefile:220: recipe for target 'raid6check' failed
> > > make: *** [raid6check] Error 1
> > I cannot see this problem.
> > I could compile without issue.
> > Maybe some library is missing somewhere,
> > but I'm not sure where.
>
> Do you try with the latest mdadm tree? But could be environment issue ...
I'm using Fedora, so I downloaded
the .srpm package, installed, enabled
raid6check, patched and rebuilt...
My background idea was to have the
mdadm rpm *with* raid6check, but I
did not go so far...
Sorry...
bye,
pg
> Thanks,
> Guoqing
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-12 16:09 ` Piergiorgio Sartor
@ 2020-05-12 20:54 ` antlists
2020-05-13 16:18 ` Piergiorgio Sartor
0 siblings, 1 reply; 38+ messages in thread
From: antlists @ 2020-05-12 20:54 UTC (permalink / raw)
To: Piergiorgio Sartor, Peter Grandi; +Cc: Linux RAID
On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> About the check -> maybe lock -> re-check,
> it is a possible workaround, but I find it
> a bit extreme.
This seems the best (most obvious?) solution to me.
If the system is under light write pressure, and the disk is healthy, it
will scan pretty quickly with almost no locking.
If the system is under heavy pressure, chances are there'll be a fair
few stripes needing rechecking, but even at its worst it'll only be as
bad as the current setup.
And if the system is somewhere inbetween, you still stand a good chance
of a fast scan.
At the end of the day, the rule should always be "lock only if you need
to" so looking for problems with an optimistic no-lock scan, then
locking only if needed to check and fix the problem, just feels right.
Cheers,
Wol
* Re: raid6check extremely slow ?
2020-05-12 16:07 ` Piergiorgio Sartor
2020-05-12 18:16 ` Guoqing Jiang
@ 2020-05-13 6:07 ` Wolfgang Denk
2020-05-15 10:34 ` Andrey Jr. Melnikov
1 sibling, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13 6:07 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid
Dear Piergiorgio,
In message <20200512160712.GB7261@lazy.lzy> you wrote:
>
> > BTW, seems there are build problems for raid6check ...
...
> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.
I see the same problem when trying to build current top of tree
(mdadm-4.1-74-g5cfb79d):
-> make raid6check
...
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o dlink.o -c dlink.c
In function "dl_strndup",
inlined from "dl_strdup" at dlink.c:73:12:
dlink.c:66:5: error: "strncpy" output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
66 | strncpy(n, s, l);
| ^~~~~~~~~~~~~~~~
dlink.c: In function "dl_strdup":
dlink.c:73:31: note: length computed here
73 | return dl_strndup(s, (int)strlen(s));
| ^~~~~~~~~
cc1: all warnings being treated as errors
removing the "-Werror" from the CWFLAGS setting in the Makefile then
leads to:
...
gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
/usr/bin/ld: sysfs.o: in function `sysfsline':
sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
/usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
/usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
This might come from commit b06815989 "mdadm: load default
sysfs attributes after assemblation"; mdadm-4.1 builds ok.
Build tests were run on Fedora 32.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Calm down, it's *__only* ones and zeroes.
* Re: raid6check extremely slow ?
2020-05-12 16:17 ` Piergiorgio Sartor
@ 2020-05-13 6:13 ` Wolfgang Denk
2020-05-13 16:22 ` Piergiorgio Sartor
0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13 6:13 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid
Dear Piergiorgio,
In message <20200512161731.GE7261@lazy.lzy> you wrote:
>
> > This still does not really explain what is so slow here. I mean,
> > even if the locking was an expensive operation code-wise, I would
> > expect to see at least one of the CPU cores near 100% then - but
> > both CPU _and_ I/O are basically idle, and disks are _all_ and
> > _always_ really close to a throughput of 400 kB/s - this looks like
> > some intentional bandwidth limit - I just can't see where this can be
> > configured?
>
> The code has 2 functions: lock_stripe() and
> unlock_all_stripes().
>
> These are doing more than just lock / unlock.
> First, the memory pages of the process will
> be locked, then some signal will be set to
> "ignore", then the strip will be locked.
>
> The unlock does the opposite in the reverse
> order (unlock, set the signal back, unlock
> the memory pages).
> The difference is that, whatever the reason,
> the unlock unlocks *all* the stripes, not
> only the one locked.
>
> Not sure why.
It does not matter how complex the operation is - I wonder why it is
taking so long: it cannot be CPU bound, as then I would expect to
see some significant CPU load, but none of the cores shows more than
5...6% usage, ever. Or, if it is I/O bound, then I would expect to see
I/O wait, but this is also never more than 0.2...0.3%.
And why are all disks running at pretty exactly 400 kB/s read rate,
all the time? This looks like some intentional bandwidth limit, but I
cannot find any knob to change it.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Be careful what you wish for. You never know who will be listening.
- Terry Pratchett, _Soul Music_
* Re: raid6check extremely slow ?
2020-05-12 18:32 ` Piergiorgio Sartor
@ 2020-05-13 6:18 ` Wolfgang Denk
0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13 6:18 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid
Dear Piergiorgio,
In message <20200512183251.GA11548@lazy.lzy> you wrote:
>
> > > > xmalloc.o dlink.o
> > > > sysfs.o: In function `sysfsline':
> > > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> > > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> > > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> > > > collect2: error: ld returned 1 exit status
> > > > Makefile:220: recipe for target 'raid6check' failed
> > > > make: *** [raid6check] Error 1
> > > I cannot see this problem.
> > > I could compile without issue.
> > > Maybe some library is missing somewhere,
> > > but I'm not sure where.
> >
> > Do you try with the fastest mdadm tree? But could be environment issue ...
>
> I'm using Fedora, so I downloaded
> the .srpm package, installed, enabled
> raid6check, patched and rebuild...
Fedora 32 is still at mdadm-4.1 (Mon Oct 1 14:27:52 2018), but it
seems the significant change was introduced by commit b06815989
"mdadm: load default sysfs attributes after assemblation" (Wed Jul 10
13:38:53 2019).
If you try to build top of tree you should see the problem, too
[and the -Werror issue I mentioned before, which is also fixed
in Fedora by local distro patches.]
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
As far as the laws of mathematics refer to reality, they are not cer-
tain, and as far as they are certain, they do not refer to reality.
-- Albert Einstein
* Re: raid6check extremely slow ?
2020-05-12 20:54 ` antlists
@ 2020-05-13 16:18 ` Piergiorgio Sartor
2020-05-13 17:37 ` Wols Lists
0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 16:18 UTC (permalink / raw)
To: antlists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID
On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> > About the check -> maybe lock -> re-check,
> > it is a possible workaround, but I find it
> > a bit extreme.
>
> This seems the best (most obvious?) solution to me.
>
> If the system is under light write pressure, and the disk is healthy, it
> will scan pretty quickly with almost no locking.
I've some concerns about optimization
solutions which can result in lower
performance than the original status.
You mention "write pressure", but there
is another case which will cause
read -> lock -> re-read...
Namely, when some chunk is really corrupted.
Now, I do not know, maybe there are other
things we overlook, or maybe not.
I do not know either how likely is that some
situations will occur to reduce performances.
I would prefer a solution which will *only*
improve, without any possible drawback.
Again, this does not mean this approach is
wrong, actually is to be considered.
In the end, I would like also to understand
why the lock / unlock is so expensive.
> If the system is under heavy pressure, chances are there'll be a fair few
> stripes needing rechecking, but even at it's worst it'll only be as bad as
> the current setup.
It will be worse (or worst, I'm always
confused...).
The read and the check will double.
I'm not sure about the read, but the
check is currently expensive.
bye,
pg
> And if the system is somewhere inbetween, you still stand a good chance of a
> fast scan.
>
> At the end of the day, the rule should always be "lock only if you need to"
> so looking for problems with an optimistic no-lock scan, then locking only
> if needed to check and fix the problem, just feels right.
>
> Cheers,
> Wol
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-13 6:13 ` Wolfgang Denk
@ 2020-05-13 16:22 ` Piergiorgio Sartor
0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 16:22 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid
On Wed, May 13, 2020 at 08:13:58AM +0200, Wolfgang Denk wrote:
> Dear Piergiorgio,
>
> In message <20200512161731.GE7261@lazy.lzy> you wrote:
> >
> > > This still does not really explain what is so slow here. I mean,
> > > even if the locking was an expensive operation code-wise, I would
> > > expect to see at least one of the CPU cores near 100% then - but
> > > both CPU _and_ I/O are basically idle, and disks are _all_ and
> > > _always_ really close to a throughput of 400 kB/s - this looks like
> > > some intentional bandwidth limit - I just can't see where this can be
> > > configured?
> >
> > The code has 2 functions: lock_stripe() and
> > unlock_all_stripes().
> >
> > These are doing more than just lock / unlock.
> > First, the memory pages of the process will
> > be locked, then some signal will be set to
> > "ignore", then the stripe will be locked.
> >
> > The unlock does the opposite in the reverse
> > order (unlock, set the signal back, unlock
> > the memory pages).
> > The difference is that, whatever the reason,
> > the unlock unlocks *all* the stripes, not
> > only the one locked.
> >
> > Not sure why.
>
> It does not matter how complex the operation is - I wonder why it is
> taking so long: it cannot be CPU bound, as then I would expect to
> see some significant CPU load, but none of the cores shows more than
> 5...6% usage, ever. Or, if it is I/O bound, then I would expect to see
> I/O wait, but this is also never more than 0.2...0.3%.
>
> And why are all disks running at pretty exactly 400 kB/s read rate,
> all the time? This looks like some intentional bandwidth limit, but I
> cannot find any knob to change it.
In my test I see 1200KB/sec, or 1.2MB/sec,
which is different from yours.
I do not think there is any bandwidth
limitation, otherwise we should see the
same, I guess.
The low CPU load and low data rate seem
to me a symptom of the CPU just systematically
waiting (for something).
It would be like putting in the code, here
and there, some usleep().
In the end we'll see low CPU load and low
data rate, *but* very constant.
Likely, it is not I/O wait either, but
some other wait.
It could be not the stripe locking, but
the locking of the process memory pages...
This could be easily tested, BTW.
Maybe I'll try...
bye,
pg
>
>
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Be careful what you wish for. You never know who will be listening.
> - Terry Pratchett, _Soul Music_
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-13 16:18 ` Piergiorgio Sartor
@ 2020-05-13 17:37 ` Wols Lists
2020-05-13 18:23 ` Piergiorgio Sartor
0 siblings, 1 reply; 38+ messages in thread
From: Wols Lists @ 2020-05-13 17:37 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: Peter Grandi, Linux RAID
On 13/05/20 17:18, Piergiorgio Sartor wrote:
> On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
>> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
>>> About the check -> maybe lock -> re-check,
>>> it is a possible workaround, but I find it
>>> a bit extreme.
>>
>> This seems the best (most obvious?) solution to me.
>>
>> If the system is under light write pressure, and the disk is healthy, it
>> will scan pretty quickly with almost no locking.
>
> I've some concerns about optimization
> solutions which can result in worse
> performance than the original status.
>
> You mention "write pressure", but there
> is an other case, which will cause
> read -> lock -> re-read...
> Namely, when some chunk is really corrupted.
>
Yup. That's why I said "the disk is healthy" :-)
> Now, I do not know, maybe there are other
> things we overlook, or maybe not.
>
> I do not know either how likely it is that
> some situation will occur that reduces performance.
>
> I would prefer a solution which will *only*
> improve, without any possible drawback.
Wouldn't we all. But if the *normal* case shows an appreciable
improvement, then I'm inclined to write off a "shouldn't happen" case as
"tough luck, shit happens".
>
> Again, this does not mean this approach is
> wrong; actually it is to be considered.
>
> In the end, I would like also to understand
> why the lock / unlock is so expensive.
Agreed.
>
>> If the system is under heavy pressure, chances are there'll be a fair few
>> stripes needing rechecking, but even at its worst it'll only be as bad as
>> the current setup.
>
> It will be worse (or worst, I'm always
> confused...).
> The read and the check will double.
Touché - my logic was off ...
But a bit of grammar - bad = descriptive, worse = comparative, worst =
absolute, so you were correct with worse.
>
> I'm not sure about the read, but the
> check is currently expensive.
But you're still going to need a very unlucky state of affairs for the
optimised check to be worse. Okay, if the disk IS damaged, then the
optimised check could easily be the worst, but if it's just write
pressure, you're going to need every second stripe to be messed up by a
collision. Rather unlikely imho.
>
> bye,
>
> pg
Cheers,
Wol
>
>> And if the system is somewhere in between, you still stand a good chance of a
>> fast scan.
>>
>> At the end of the day, the rule should always be "lock only if you need to"
>> so looking for problems with an optimistic no-lock scan, then locking only
>> if needed to check and fix the problem, just feels right.
>>
>> Cheers,
>> Wol
>
* Re: raid6check extremely slow ?
2020-05-13 17:37 ` Wols Lists
@ 2020-05-13 18:23 ` Piergiorgio Sartor
0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 18:23 UTC (permalink / raw)
To: Wols Lists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID
On Wed, May 13, 2020 at 06:37:18PM +0100, Wols Lists wrote:
> On 13/05/20 17:18, Piergiorgio Sartor wrote:
> > On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
> >> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> >>> About the check -> maybe lock -> re-check,
> >>> it is a possible workaround, but I find it
> >>> a bit extreme.
> >>
> >> This seems the best (most obvious?) solution to me.
> >>
> >> If the system is under light write pressure, and the disk is healthy, it
> >> will scan pretty quickly with almost no locking.
> >
> > I've some concerns about optimization
> > solutions which can result in worse
> > performance than the original status.
> >
> > You mention "write pressure", but there
> > is another case, which will cause
> > read -> lock -> re-read...
> > Namely, when some chunk is really corrupted.
> >
> Yup. That's why I said "the disk is healthy" :-)
We need to consider all possibilities...
> > Now, I do not know, maybe there are other
> > things we overlook, or maybe not.
> >
> > I do not know either how likely it is that
> > some situation will occur that reduces performance.
> >
> > I would prefer a solution which will *only*
> > improve, without any possible drawback.
>
> Wouldn't we all. But if the *normal* case shows an appreciable
> improvement, then I'm inclined to write off a "shouldn't happen" case as
> "tough luck, shit happens".
> >
> > Again, this does not mean this approach is
> > wrong; actually it is to be considered.
> >
> > In the end, I would like also to understand
> > why the lock / unlock is so expensive.
>
> Agreed.
> >
> >> If the system is under heavy pressure, chances are there'll be a fair few
> >> stripes needing rechecking, but even at its worst it'll only be as bad as
> >> the current setup.
> >
> > It will be worse (or worst, I'm always
> > confused...).
> > The read and the check will double.
>
> Touché - my logic was off ...
>
> But a bit of grammar - bad = descriptive, worse = comparative, worst =
> absolute, so you were correct with worse.
Ah! Thank you.
That's always confusing me. Usually I check
with some search engine, but sometimes I'm
too lazy... And then I forget.
BTW, somewhat related, please do not
refrain from correcting my English.
> > I'm not sure about the read, but the
> > check is currently expensive.
>
> But you're still going to need a very unlucky state of affairs for the
> optimised check to be worse. Okay, if the disk IS damaged, then the
> optimised check could easily be the worst, but if it's just write
> pressure, you're going to need every second stripe to be messed up by a
> collision. Rather unlikely imho.
Well, as Neil would say, patches are welcome! :-)
Really, I've too little time to make
changes to the code.
I can do some test and, hopefully,
some support.
bye,
pg
> >
> > bye,
> >
> > pg
>
> Cheers,
> Wol
> >
> >> And if the system is somewhere in between, you still stand a good chance of a
> >> fast scan.
> >>
> >> At the end of the day, the rule should always be "lock only if you need to"
> >> so looking for problems with an optimistic no-lock scan, then locking only
> >> if needed to check and fix the problem, just feels right.
> >>
> >> Cheers,
> >> Wol
> >
--
piergiorgio
* Re: raid6check extremely slow ?
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
2020-05-10 22:16 ` Guoqing Jiang
@ 2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2020-05-14 18:20 ` Wolfgang Denk
2 siblings, 1 reply; 38+ messages in thread
From: Roy Sigurd Karlsbakk @ 2020-05-14 17:20 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Linux Raid
> I'm running raid6check on a 12 TB (8 x 2 TB harddisks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks at only about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array. The system is basically idle, there
> is neither any significant CPU load nor any other I/O (neither to the
> tested array, nor to any other storage on this system).
>
> Am I doing something wrong?
Try checking with iostat -x to see if one disk is performing worse than the other ones. This sometimes happens and can indicate a failure that the normal SMART/smartctl stuff can't identify. If you see a utilisation of one of the disks at 100%, that's the bastard. Under normal circumstances, you probably won't be able to return that, since it "works". There's a quick fix for that, though. Just unplug the disk, plug it into a power cable, let it spin up and then sharply twist it 90 degrees a few times, and it's all sorted out and you can return it ;)
Vennlig hilsen
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.
* Re: raid6check extremely slow ?
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
@ 2020-05-14 18:20 ` Wolfgang Denk
2020-05-14 19:51 ` Roy Sigurd Karlsbakk
0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-14 18:20 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk; +Cc: Linux Raid
Dear Roy,
In message <1999694976.3317399.1589476824607.JavaMail.zimbra@karlsbakk.net> you wrote:
>
> Try checking with iostat -x to see if one disk is performing worse
> than the other ones. This sometimes happens and can indicate a
> failure that the normal SMART/smartctl stuff can't identify. If
> you see a utilisation of one of the disks at 100%, that's the
> bastard. Under normal circumstances, you probably won't be able to
> return that, since it "works". There's a quick fix for that,
> though. Just unplug the disk, plug it into a power cable, let it
> spin up and then sharply twist it 90 degrees a few times, and it's
> all sorted out and you can return it ;)
Everything looks unsuspicious to me - all disks behave the same:
# iostat -x /dev/sd[efhijklm] 1 3
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-14 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.19 0.00 1.06 0.15 0.00 98.60
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sde 20.08 360.56 2.53 11.20 0.34 17.95 0.49 0.10 0.02 3.41 32.36 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 32.74 0.02 2.11
sdf 20.07 360.56 2.54 11.24 0.33 17.96 0.49 0.10 0.02 3.40 44.23 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 44.77 0.02 2.09
sdh 20.08 360.54 2.53 11.17 0.35 17.95 0.49 0.10 0.02 3.40 43.47 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 44.01 0.02 2.40
sdi 20.08 360.58 2.54 11.23 0.34 17.96 0.49 0.10 0.02 3.40 26.22 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.49 26.50 0.01 2.84
sdj 20.45 360.56 2.16 9.54 0.34 17.63 0.49 0.10 0.02 3.38 35.19 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 35.60 0.02 2.46
sdk 20.08 360.54 2.53 11.21 0.35 17.95 0.49 0.10 0.02 3.42 40.63 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.49 41.13 0.02 2.36
sdl 20.07 360.57 2.54 11.24 0.34 17.96 0.49 0.10 0.02 3.39 23.61 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 23.84 0.01 2.70
sdm 20.08 360.55 2.53 11.21 0.53 17.96 0.49 0.10 0.02 3.41 21.52 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 21.67 0.01 2.64
avg-cpu: %user %nice %system %iowait %steal %idle
0.38 0.00 1.12 0.12 0.00 98.38
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sde 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
sdf 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
sdh 20.00 320.00 0.00 0.00 0.30 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
sdi 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00
sdj 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
sdk 20.00 320.00 0.00 0.00 0.30 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
sdl 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00
sdm 20.00 320.00 0.00 0.00 0.35 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 0.88 0.00 0.00 98.87
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sde 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10
sdf 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10
sdh 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.30
sdi 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00
sdj 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10
sdk 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10
sdl 21.00 336.00 0.00 0.00 0.29 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.20
sdm 21.00 336.00 0.00 0.00 0.38 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
We see things not as they are, but as we are. - H. M. Tomlinson
* Re: raid6check extremely slow ?
2020-05-14 18:20 ` Wolfgang Denk
@ 2020-05-14 19:51 ` Roy Sigurd Karlsbakk
2020-05-15 8:08 ` Wolfgang Denk
0 siblings, 1 reply; 38+ messages in thread
From: Roy Sigurd Karlsbakk @ 2020-05-14 19:51 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Linux Raid
what?
Vennlig hilsen
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.
----- Original Message -----
> From: "Wolfgang Denk" <wd@denx.de>
> To: "Roy Sigurd Karlsbakk" <roy@karlsbakk.net>
> Cc: "Linux Raid" <linux-raid@vger.kernel.org>
> Sent: Thursday, 14 May, 2020 20:20:41
> Subject: Re: raid6check extremely slow ?
> Dear Roy,
>
> In message <1999694976.3317399.1589476824607.JavaMail.zimbra@karlsbakk.net> you
> wrote:
>>
>> Try checking with iostat -x to see if one disk is performing worse
>> than the other ones. This sometimes happens and can indicate a
>> failure that the normal SMART/smartctl stuff can't identify. If
>> you see a utilisation of one of the disks at 100%, that's the
>> bastard. Under normal circumstances, you probably won't be able to
>> return that, since it "works". There's a quick fix for that,
>> though. Just unplug the disk, plug it into a power cable, let it
>> spin up and then sharply twist it 90 degrees a few times, and it's
>> all sorted out and you can return it ;)
>
> Everything looks unsuspicious to me - all disks behave the same:
>
> # iostat -x /dev/sd[efhijklm] 1 3
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-14 _x86_64_
> (8 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.19 0.00 1.06 0.15 0.00 98.60
>
> Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s
> wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm
> d_await dareq-sz f/s f_await aqu-sz %util
> sde 20.08 360.56 2.53 11.20 0.34 17.95 0.49
> 0.10 0.02 3.41 32.36 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 32.74 0.02 2.11
> sdf 20.07 360.56 2.54 11.24 0.33 17.96 0.49
> 0.10 0.02 3.40 44.23 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 44.77 0.02 2.09
> sdh 20.08 360.54 2.53 11.17 0.35 17.95 0.49
> 0.10 0.02 3.40 43.47 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 44.01 0.02 2.40
> sdi 20.08 360.58 2.54 11.23 0.34 17.96 0.49
> 0.10 0.02 3.40 26.22 0.21 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 26.50 0.01 2.84
> sdj 20.45 360.56 2.16 9.54 0.34 17.63 0.49
> 0.10 0.02 3.38 35.19 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 35.60 0.02 2.46
> sdk 20.08 360.54 2.53 11.21 0.35 17.95 0.49
> 0.10 0.02 3.42 40.63 0.21 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 41.13 0.02 2.36
> sdl 20.07 360.57 2.54 11.24 0.34 17.96 0.49
> 0.10 0.02 3.39 23.61 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 23.84 0.01 2.70
> sdm 20.08 360.55 2.53 11.21 0.53 17.96 0.49
> 0.10 0.02 3.41 21.52 0.20 0.00 0.00 0.00 0.00
> 0.00 0.00 0.49 21.67 0.01 2.64
>
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.38 0.00 1.12 0.12 0.00 98.38
>
> Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s
> wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm
> d_await dareq-sz f/s f_await aqu-sz %util
> sde 20.00 320.00 0.00 0.00 0.25 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
> sdf 20.00 320.00 0.00 0.00 0.25 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
> sdh 20.00 320.00 0.00 0.00 0.30 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
> sdi 20.00 320.00 0.00 0.00 0.25 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 4.00
> sdj 20.00 320.00 0.00 0.00 0.25 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
> sdk 20.00 320.00 0.00 0.00 0.30 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
> sdl 20.00 320.00 0.00 0.00 0.25 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 4.00
> sdm 20.00 320.00 0.00 0.00 0.35 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.00
>
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.25 0.00 0.88 0.00 0.00 98.87
>
> Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s
> wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm
> d_await dareq-sz f/s f_await aqu-sz %util
> sde 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.10
> sdf 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.10
> sdh 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.30
> sdi 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 4.00
> sdj 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.10
> sdk 21.00 336.00 0.00 0.00 0.24 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.10
> sdl 21.00 336.00 0.00 0.00 0.29 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 4.20
> sdm 21.00 336.00 0.00 0.00 0.38 16.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 2.10
>
>
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> We see things not as they are, but as we are. - H. M. Tomlinson
* Re: raid6check extremely slow ?
2020-05-14 19:51 ` Roy Sigurd Karlsbakk
@ 2020-05-15 8:08 ` Wolfgang Denk
0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-15 8:08 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk; +Cc: Linux Raid
Dear Roy Sigurd Karlsbakk,
In message <1430936688.3381175.1589485881380.JavaMail.zimbra@karlsbakk.net> you wrote:
> what?
You suggested: "Try checking with iostat -x to see if one disk is
performing worse than the other ones."
The output of "iostat -x" which I posted shows clearly that all disks
behave very much the same - there are just minimal statistical
fluctuations, but again equally distributed over all 8 disks.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I used to be indecisive, now I'm not sure.
* Re: raid6check extremely slow ?
2020-05-13 6:07 ` Wolfgang Denk
@ 2020-05-15 10:34 ` Andrey Jr. Melnikov
2020-05-15 11:54 ` Wolfgang Denk
0 siblings, 1 reply; 38+ messages in thread
From: Andrey Jr. Melnikov @ 2020-05-15 10:34 UTC (permalink / raw)
To: linux-raid
Wolfgang Denk <wd@denx.de> wrote:
> Dear Piergiorgio,
> In message <20200512160712.GB7261@lazy.lzy> you wrote:
> >
> > > BTW, seems there are build problems for raid6check ...
> ...
> > I cannot see this problem.
> > I could compile without issue.
> > Maybe some library is missing somewhere,
> > but I'm not sure where.
> ...
> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
> /usr/bin/ld: sysfs.o: in function `sysfsline':
> sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
> /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
> /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
raid6check is missing the util.o object. Add it to CHECK_OBJS.
* Re: raid6check extremely slow ?
2020-05-15 10:34 ` Andrey Jr. Melnikov
@ 2020-05-15 11:54 ` Wolfgang Denk
2020-05-15 12:58 ` Guoqing Jiang
0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-15 11:54 UTC (permalink / raw)
To: Andrey Jr. Melnikov; +Cc: linux-raid
Dear "Andrey Jr. Melnikov",
In message <sq72pg-98v.ln1@banana.localnet> you wrote:
>
> > ...
> > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
> > /usr/bin/ld: sysfs.o: in function `sysfsline':
> > sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
> > /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
> > /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
>
> raid6check miss util.o object. Add it to CHECK_OBJS
This makes things just worse. With this, I get:
...
gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o util.o -c util.c
gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o
/usr/bin/ld: util.o: in function `mdadm_version':
util.c:(.text+0x702): undefined reference to `Version'
/usr/bin/ld: util.o: in function `fname_from_uuid':
util.c:(.text+0xdce): undefined reference to `super1'
/usr/bin/ld: util.o: in function `is_subarray_active':
util.c:(.text+0x30b3): undefined reference to `mdstat_read'
/usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat'
/usr/bin/ld: util.o: in function `flush_metadata_updates':
util.c:(.text+0x3ad3): undefined reference to `connect_monitor'
/usr/bin/ld: util.c:(.text+0x3af1): undefined reference to `send_message'
/usr/bin/ld: util.c:(.text+0x3afb): undefined reference to `wait_reply'
/usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack'
/usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply'
/usr/bin/ld: util.o: in function `container_choose_spares':
util.c:(.text+0x3c84): undefined reference to `devid_policy'
/usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain'
/usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add'
/usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test'
/usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free'
/usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free'
/usr/bin/ld: util.o: in function `set_cmap_hooks':
util.c:(.text+0x3f80): undefined reference to `dlopen'
/usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym'
/usr/bin/ld: util.o: in function `set_dlm_hooks':
util.c:(.text+0x4310): undefined reference to `dlopen'
/usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym'
/usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow
/usr/bin/ld: util.o: in function `set_cmap_hooks':
util.c:(.text+0x3fed): undefined reference to `dlclose'
/usr/bin/ld: util.o: in function `set_dlm_hooks':
util.c:(.text+0x43e5): undefined reference to `dlclose'
/usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0'
/usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1'
/usr/bin/ld: util.o:(.data+0x10): undefined reference to `super_ddf'
/usr/bin/ld: util.o:(.data+0x18): undefined reference to `super_imsm'
/usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr'
/usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt'
collect2: error: ld returned 1 exit status
make: *** [Makefile:221: raid6check] Error 1
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Ninety-Ninety Rule of Project Schedules:
The first ninety percent of the task takes ninety percent of
the time, and the last ten percent takes the other ninety percent.
* Re: raid6check extremely slow ?
2020-05-15 11:54 ` Wolfgang Denk
@ 2020-05-15 12:58 ` Guoqing Jiang
0 siblings, 0 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-15 12:58 UTC (permalink / raw)
To: Wolfgang Denk, Andrey Jr. Melnikov; +Cc: linux-raid
On 5/15/20 1:54 PM, Wolfgang Denk wrote:
> Dear "Andrey Jr. Melnikov",
>
> In message <sq72pg-98v.ln1@banana.localnet> you wrote:
>>> ...
>>> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
>>> /usr/bin/ld: sysfs.o: in function `sysfsline':
>>> sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
>>> /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
>>> /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
>> raid6check miss util.o object. Add it to CHECK_OBJS
> This makes things just worse. With this, I get:
>
> ...
> gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o util.o -c util.c
> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o
> /usr/bin/ld: util.o: in function `mdadm_version':
> util.c:(.text+0x702): undefined reference to `Version'
> /usr/bin/ld: util.o: in function `fname_from_uuid':
> util.c:(.text+0xdce): undefined reference to `super1'
> /usr/bin/ld: util.o: in function `is_subarray_active':
> util.c:(.text+0x30b3): undefined reference to `mdstat_read'
> /usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat'
> /usr/bin/ld: util.o: in function `flush_metadata_updates':
> util.c:(.text+0x3ad3): undefined reference to `connect_monitor'
> /usr/bin/ld: util.c:(.text+0x3af1): undefined reference to `send_message'
> /usr/bin/ld: util.c:(.text+0x3afb): undefined reference to `wait_reply'
> /usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack'
> /usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply'
> /usr/bin/ld: util.o: in function `container_choose_spares':
> util.c:(.text+0x3c84): undefined reference to `devid_policy'
> /usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain'
> /usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add'
> /usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test'
> /usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free'
> /usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free'
> /usr/bin/ld: util.o: in function `set_cmap_hooks':
> util.c:(.text+0x3f80): undefined reference to `dlopen'
> /usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym'
> /usr/bin/ld: util.o: in function `set_dlm_hooks':
> util.c:(.text+0x4310): undefined reference to `dlopen'
> /usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym'
> /usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow
> /usr/bin/ld: util.o: in function `set_cmap_hooks':
> util.c:(.text+0x3fed): undefined reference to `dlclose'
> /usr/bin/ld: util.o: in function `set_dlm_hooks':
> util.c:(.text+0x43e5): undefined reference to `dlclose'
> /usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0'
> /usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1'
> /usr/bin/ld: util.o:(.data+0x10): undefined reference to `super_ddf'
> /usr/bin/ld: util.o:(.data+0x18): undefined reference to `super_imsm'
> /usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr'
> /usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt'
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:221: raid6check] Error 1
>
I think we need a new uuid.c, separated from util.c, to address
the issue; I will send a patch for it later.
Thanks,
Guoqing
end of thread, other threads:[~2020-05-15 12:58 UTC | newest]
Thread overview: 38+ messages
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
2020-05-11 6:33 ` Wolfgang Denk
2020-05-10 22:16 ` Guoqing Jiang
2020-05-11 6:40 ` Wolfgang Denk
2020-05-11 8:58 ` Guoqing Jiang
2020-05-11 15:39 ` Piergiorgio Sartor
2020-05-12 7:37 ` Wolfgang Denk
2020-05-12 16:17 ` Piergiorgio Sartor
2020-05-13 6:13 ` Wolfgang Denk
2020-05-13 16:22 ` Piergiorgio Sartor
2020-05-11 16:14 ` Piergiorgio Sartor
2020-05-11 20:53 ` Giuseppe Bilotta
2020-05-11 21:12 ` Guoqing Jiang
2020-05-11 21:16 ` Guoqing Jiang
2020-05-12 1:52 ` Giuseppe Bilotta
2020-05-12 6:27 ` Adam Goryachev
2020-05-12 16:11 ` Piergiorgio Sartor
2020-05-12 16:05 ` Piergiorgio Sartor
2020-05-11 21:07 ` Guoqing Jiang
2020-05-11 22:44 ` Peter Grandi
2020-05-12 16:09 ` Piergiorgio Sartor
2020-05-12 20:54 ` antlists
2020-05-13 16:18 ` Piergiorgio Sartor
2020-05-13 17:37 ` Wols Lists
2020-05-13 18:23 ` Piergiorgio Sartor
2020-05-12 16:07 ` Piergiorgio Sartor
2020-05-12 18:16 ` Guoqing Jiang
2020-05-12 18:32 ` Piergiorgio Sartor
2020-05-13 6:18 ` Wolfgang Denk
2020-05-13 6:07 ` Wolfgang Denk
2020-05-15 10:34 ` Andrey Jr. Melnikov
2020-05-15 11:54 ` Wolfgang Denk
2020-05-15 12:58 ` Guoqing Jiang
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2020-05-14 18:20 ` Wolfgang Denk
2020-05-14 19:51 ` Roy Sigurd Karlsbakk
2020-05-15 8:08 ` Wolfgang Denk