* raid6check extremely slow ? @ 2020-05-10 12:07 Wolfgang Denk
  2020-05-10 13:26 ` Piergiorgio Sartor
  ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread

From: Wolfgang Denk @ 2020-05-10 12:07 UTC (permalink / raw)
To: linux-raid

Hi,

I'm running raid6check on a 12 TB (8 x 2 TB hard disks) RAID6 array
and wonder why it is so extremely slow... It seems to be reading the
disks at only about 400 kB/s, which results in an estimated time of
some 57 days!!! to complete checking the array. The system is
basically idle; there is neither any significant CPU load nor any
other I/O (neither to the tested array, nor to any other storage on
this system).

Am I doing something wrong?

The command I'm running is simply:

# raid6check /dev/md0 0 0

This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).

The array data:

# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Nov  7 19:30:03 2013
        Raid Level : raid6
        Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
     Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
      Raid Devices : 8
     Total Devices : 8
       Persistence : Superblock is persistent

       Update Time : Mon May  4 22:12:02 2020
             State : active
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 16K

Consistency Policy : resync

              Name : atlas.denx.de:0  (local to host atlas.denx.de)
              UUID : 4df90724:87913791:1700bb31:773735d0
            Events : 181544

    Number   Major   Minor   RaidDevice State
      12       8       64        0      active sync   /dev/sde
      11       8       80        1      active sync   /dev/sdf
      13       8      112        2      active sync   /dev/sdh
       8       8      128        3      active sync   /dev/sdi
       9       8      144        4      active sync   /dev/sdj
      10       8      160        5      active sync   /dev/sdk
      14       8      176        6      active sync   /dev/sdl
      15       8      192        7      active sync   /dev/sdm

# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-07  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.01    1.11    0.21    0.00   98.49

Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
sde     19.23     388.93       0.09       0.00  158440224    35218        0
sdf     19.20     388.94       0.09       0.00  158447574    34894        0
sdh     19.23     388.89       0.08       0.00  158425596    34178        0
sdi     19.23     388.99       0.09       0.00  158466326    34690        0
sdj     20.18     388.93       0.09       0.00  158439780    34766        0
sdk     19.23     388.88       0.09       0.00  158419988    35366        0
sdl     19.20     388.97       0.08       0.00  158457352    34426        0
sdm     19.21     388.92       0.08       0.00  158435748    34566        0

top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check
 1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
...

HDD in use:

/dev/sde : ST2000NM0033-9ZM175
/dev/sdf : ST2000NM0033-9ZM175
/dev/sdh : ST2000NM0033-9ZM175
/dev/sdi : ST2000NM0033-9ZM175
/dev/sdj : ST2000NM0033-9ZM175
/dev/sdk : ST2000NM0033-9ZM175
/dev/sdl : ST2000NM0033-9ZM175
/dev/sdm : ST2000NM0008-2F3100

3 days later:

# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-10  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.00    1.07    0.17    0.00   98.57

Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
sde     20.15     370.73       0.10       0.00  253186948    68154        0
sdf     20.13     370.74       0.10       0.00  253194646    68138        0
sdh     20.15     370.71       0.10       0.00  253172656    67738        0
sdi     20.15     370.77       0.10       0.00  253213854    68158        0
sdj     20.72     370.73       0.10       0.00  253187084    68066        0
sdk     20.15     370.70       0.10       0.00  253166960    69286        0
sdl     20.13     370.76       0.10       0.00  253204572    68070        0
sdm     20.14     370.73       0.10       0.00  253182964    68070        0

I've tried playing with speed_limit_min/speed_limit_max, but this
didn't change anything:

# cat /proc/sys/dev/raid/speed_limit_max
2000000
# cat /proc/sys/dev/raid/speed_limit_min
10000

Any ideas welcome!

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The inappropriate cannot be beautiful.
                - Frank Lloyd Wright _The Future of Architecture_ (1953)

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
  2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
@ 2020-05-10 13:26 ` Piergiorgio Sartor
  2020-05-11  6:33   ` Wolfgang Denk
  2020-05-10 22:16 ` Guoqing Jiang
  2020-05-14 17:20 ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 38+ messages in thread

From: Piergiorgio Sartor @ 2020-05-10 13:26 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: linux-raid

On Sun, May 10, 2020 at 02:07:25PM +0200, Wolfgang Denk wrote:
> Hi,
>
> I'm running raid6check on a 12 TB (8 x 2 TB hard disks) RAID6 array
> and wonder why it is so extremely slow... It seems to be reading the
> disks at only about 400 kB/s, which results in an estimated time of
> some 57 days!!! to complete checking the array. The system is
> basically idle; there is neither any significant CPU load nor any
> other I/O (neither to the tested array, nor to any other storage on
> this system).
>
> Am I doing something wrong?
>
> The command I'm running is simply:
>
> # raid6check /dev/md0 0 0
>
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
>
> The array data:
>
> # mdadm --detail /dev/md0
> /dev/md0:
>            Version : 1.2
>      Creation Time : Thu Nov  7 19:30:03 2013
>         Raid Level : raid6
>         Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
>      Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
>       Raid Devices : 8
>      Total Devices : 8
>        Persistence : Superblock is persistent
>
>        Update Time : Mon May  4 22:12:02 2020
>              State : active
>     Active Devices : 8
>    Working Devices : 8
>     Failed Devices : 0
>      Spare Devices : 0
>
>             Layout : left-symmetric
>         Chunk Size : 16K
>
> Consistency Policy : resync
>
>               Name : atlas.denx.de:0  (local to host atlas.denx.de)
>               UUID : 4df90724:87913791:1700bb31:773735d0
>             Events : 181544
>
>     Number   Major   Minor   RaidDevice State
>       12       8       64        0      active sync   /dev/sde
>       11       8       80        1      active sync   /dev/sdf
>       13       8      112        2      active sync   /dev/sdh
>        8       8      128        3      active sync   /dev/sdi
>        9       8      144        4      active sync   /dev/sdj
>       10       8      160        5      active sync   /dev/sdk
>       14       8      176        6      active sync   /dev/sdl
>       15       8      192        7      active sync   /dev/sdm
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-07  _x86_64_  (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.01    1.11    0.21    0.00   98.49
>
> Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
> sde     19.23     388.93       0.09       0.00  158440224    35218        0
> sdf     19.20     388.94       0.09       0.00  158447574    34894        0
> sdh     19.23     388.89       0.08       0.00  158425596    34178        0
> sdi     19.23     388.99       0.09       0.00  158466326    34690        0
> sdj     20.18     388.93       0.09       0.00  158439780    34766        0
> sdk     19.23     388.88       0.09       0.00  158419988    35366        0
> sdl     19.20     388.97       0.08       0.00  158457352    34426        0
> sdm     19.21     388.92       0.08       0.00  158435748    34566        0
>
> top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
> MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check
>  1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
> 37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
> 37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
> ...
>
> HDD in use:
>
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
>
> 3 days later:
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-10  _x86_64_  (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.00    1.07    0.17    0.00   98.57
>
> Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
> sde     20.15     370.73       0.10       0.00  253186948    68154        0
> sdf     20.13     370.74       0.10       0.00  253194646    68138        0
> sdh     20.15     370.71       0.10       0.00  253172656    67738        0
> sdi     20.15     370.77       0.10       0.00  253213854    68158        0
> sdj     20.72     370.73       0.10       0.00  253187084    68066        0
> sdk     20.15     370.70       0.10       0.00  253166960    69286        0
> sdl     20.13     370.76       0.10       0.00  253204572    68070        0
> sdm     20.14     370.73       0.10       0.00  253182964    68070        0
>
> I've tried playing with speed_limit_min/speed_limit_max, but this
> didn't change anything:
>
> # cat /proc/sys/dev/raid/speed_limit_max
> 2000000
> # cat /proc/sys/dev/raid/speed_limit_min
> 10000
>
> Any ideas welcome!

Difficult to say.

raid6check is CPU-bound, with no vector optimization
and no multithreading.

Nevertheless, if you see no CPU load (single-core
load), then something else is not OK, but I have no
idea what it could be.

Please check whether one core is at 100%; if it is,
then that is the limit.
If not, sorry, I cannot help.

bye,

--

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread
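[Editor's note: Piergiorgio's suggestion to look for a single saturated core can be checked without top's interactive per-core view. The sketch below is not from the thread; it samples /proc/stat twice, one second apart, and prints a rough busy percentage per core, using the documented field layout of /proc/stat (field 5 is idle, field 6 is iowait).]

```shell
#!/bin/sh
# Print an approximate busy percentage for each CPU core by sampling
# /proc/stat twice, one second apart. A CPU-bound single-threaded
# raid6check would show exactly one core close to 100%.
core_busy() {
    t0=$(grep '^cpu[0-9]' /proc/stat)
    sleep 1
    t1=$(grep '^cpu[0-9]' /proc/stat)
    { echo "$t0"; echo "$t1"; } | awk '
    {
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        idle = $5 + $6                    # idle + iowait jiffies
        if ($1 in tot0) {                 # second sample: print the delta
            dt = total - tot0[$1]
            di = idle  - idle0[$1]
            if (dt > 0)
                printf "%s %5.1f%% busy\n", $1, 100 * (dt - di) / dt
        } else {                          # first sample: remember counters
            tot0[$1] = total; idle0[$1] = idle
        }
    }'
}
core_busy
```

If the sysstat package is installed, `mpstat -P ALL 1` reports the same per-core breakdown.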
* Re: raid6check extremely slow ?
  2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-11  6:33   ` Wolfgang Denk
  0 siblings, 0 replies; 38+ messages in thread

From: Wolfgang Denk @ 2020-05-11 6:33 UTC (permalink / raw)
To: Piergiorgio Sartor; +Cc: linux-raid

Dear Piergiorgio,

In message <20200510132611.GA12994@lazy.lzy> you wrote:
>
> raid6check is CPU-bound, with no vector optimization
> and no multithreading.
>
> Nevertheless, if you see no CPU load (single-core
> load), then something else is not OK, but I have no
> idea what it could be.
>
> Please check whether one core is at 100%; if it is,
> then that is the limit.
> If not, sorry, I cannot help.

No, there is virtually no CPU load at all:

top - 08:32:36 up 8 days, 16:34,  3 users,  load average: 1.00, 1.01, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  1.3 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.3 us,  1.3 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  1.7 us,  3.7 sy,  0.0 ni, 90.4 id,  3.0 wa,  0.7 hi,  0.7 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  24034.6 total,  10921.2 free,   1882.4 used,  11230.9 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21757.0 avail Mem

What I find interesting is that all disks are more or less constantly
at around 400 kB/s (390...400, never more).

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
That's their goal, remember, a goal that's really contrary to that
of the programmer or administrator. We just want to get our jobs
done. $Bill just wants to become $$Bill. These aren't even
marginally congruent.
         -- Tom Christiansen in <6jhtqk$qls$1@csnews.cs.colorado.edu>

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
  2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
  2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-10 22:16 ` Guoqing Jiang
  2020-05-11  6:40   ` Wolfgang Denk
  2020-05-14 17:20 ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 38+ messages in thread

From: Guoqing Jiang @ 2020-05-10 22:16 UTC (permalink / raw)
To: Wolfgang Denk, linux-raid

On 5/10/20 2:07 PM, Wolfgang Denk wrote:
> Hi,
>
> I'm running raid6check on a 12 TB (8 x 2 TB hard disks) RAID6 array
> and wonder why it is so extremely slow... It seems to be reading the
> disks at only about 400 kB/s, which results in an estimated time of
> some 57 days!!! to complete checking the array. The system is
> basically idle; there is neither any significant CPU load nor any
> other I/O (neither to the tested array, nor to any other storage on
> this system).
>
> Am I doing something wrong?
>
> The command I'm running is simply:
>
> # raid6check /dev/md0 0 0
>
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
>
> The array data:
>
> # mdadm --detail /dev/md0
> /dev/md0:
>            Version : 1.2
>      Creation Time : Thu Nov  7 19:30:03 2013
>         Raid Level : raid6
>         Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
>      Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
>       Raid Devices : 8
>      Total Devices : 8
>        Persistence : Superblock is persistent
>
>        Update Time : Mon May  4 22:12:02 2020
>              State : active
>     Active Devices : 8
>    Working Devices : 8
>     Failed Devices : 0
>      Spare Devices : 0
>
>             Layout : left-symmetric
>         Chunk Size : 16K
>
> Consistency Policy : resync
>
>               Name : atlas.denx.de:0  (local to host atlas.denx.de)
>               UUID : 4df90724:87913791:1700bb31:773735d0
>             Events : 181544
>
>     Number   Major   Minor   RaidDevice State
>       12       8       64        0      active sync   /dev/sde
>       11       8       80        1      active sync   /dev/sdf
>       13       8      112        2      active sync   /dev/sdh
>        8       8      128        3      active sync   /dev/sdi
>        9       8      144        4      active sync   /dev/sdj
>       10       8      160        5      active sync   /dev/sdk
>       14       8      176        6      active sync   /dev/sdl
>       15       8      192        7      active sync   /dev/sdm
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-07  _x86_64_  (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.01    1.11    0.21    0.00   98.49
>
> Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
> sde     19.23     388.93       0.09       0.00  158440224    35218        0
> sdf     19.20     388.94       0.09       0.00  158447574    34894        0
> sdh     19.23     388.89       0.08       0.00  158425596    34178        0
> sdi     19.23     388.99       0.09       0.00  158466326    34690        0
> sdj     20.18     388.93       0.09       0.00  158439780    34766        0
> sdk     19.23     388.88       0.09       0.00  158419988    35366        0
> sdl     19.20     388.97       0.08       0.00  158457352    34426        0
> sdm     19.21     388.92       0.08       0.00  158435748    34566        0
>
> top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
> MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check

Seems raid6check is in 'D' state; what is the output of 'cat
/proc/19719/stack' and /proc/mdstat?

>  1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
> 37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
> 37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
> ...
>
> HDD in use:
>
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
>
> 3 days later:

Is raid6check still in 'D' state as before?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
  2020-05-10 22:16 ` Guoqing Jiang
@ 2020-05-11  6:40   ` Wolfgang Denk
  2020-05-11  8:58     ` Guoqing Jiang
  0 siblings, 1 reply; 38+ messages in thread

From: Wolfgang Denk @ 2020-05-11 6:40 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid

Dear Guoqing Jiang,

In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>
> Seems raid6check is in 'D' state; what is the output of 'cat
> /proc/19719/stack' and /proc/mdstat?

# for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
all the time? I thought it was _reading_ the disks only?
And iostat does not report any writes either?

# iostat /dev/sd[efhijklm] | cat
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-11  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.00    1.07    0.17    0.00   98.58

Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
sde     20.30     368.76       0.10       0.00  277022327    75178        0
sdf     20.28     368.77       0.10       0.00  277030081    75170        0
sdh     20.30     368.74       0.10       0.00  277007903    74854        0
sdi     20.30     368.79       0.10       0.00  277049113    75246        0
sdj     20.82     368.76       0.10       0.00  277022363    74986        0
sdk     20.30     368.73       0.10       0.00  277002179    76322        0
sdl     20.29     368.78       0.10       0.00  277039743    74982        0
sdm     20.29     368.75       0.10       0.00  277018163    74958        0

# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid10 sdc1[0] sdd1[1]
      234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
      11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]

md1 : active raid1 sdb3[0] sda3[1]
      484118656 blocks [2/2] [UU]

md2 : active raid1 sdb1[0] sda1[1]
      255936 blocks [2/2] [UU]

unused devices: <none>

> > 3 days later:
>
> Is raid6check still in 'D' state as before?

Yes, nothing changed, still running:

top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check
 1123 root      20   0       0      0      0 S   0.7   0.0  60:55.64 md0_raid6
   10 root      20   0       0      0      0 I   0.3   0.0   9:09.26 rcu_sched
  655 root       0 -20       0      0      0 I   0.3   0.0  21:28.95 kworker/1:1H-kblockd
60161 root      20   0       0      0      0 I   0.3   0.0   0:01.18 kworker/6:1-events
61997 root      20   0       0      0      0 I   0.3   0.0   0:01.48 kworker/1:3-events
...

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Every program has at least one bug and can be shortened by at least
one instruction -- from which, by induction, one can deduce that
every program can be reduced to one instruction which doesn't work.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ?
  2020-05-11  6:40 ` Wolfgang Denk
@ 2020-05-11  8:58   ` Guoqing Jiang
  2020-05-11 15:39     ` Piergiorgio Sartor
  2020-05-11 16:14     ` Piergiorgio Sartor
  0 siblings, 2 replies; 38+ messages in thread

From: Guoqing Jiang @ 2020-05-11 8:58 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: linux-raid

Hi Wolfgang,

On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> Dear Guoqing Jiang,
>
> In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>> Seems raid6check is in 'D' state; what is the output of 'cat
>> /proc/19719/stack' and /proc/mdstat?
> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

It looks like raid6check keeps writing the suspend_lo/hi nodes, which
causes mddev_suspend to be called; this means synchronize_rcu and
other synchronization mechanisms are triggered in that path ...

> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> all the time? I thought it was _reading_ the disks only?

I hadn't read raid6check before, but check_stripes has:

while (length > 0) {
        lock_stripe          -> writes the suspend_lo/hi nodes
        ...
        unlock_all_stripes   -> writes the suspend_lo/hi nodes
}

I think this explains the stack of raid6check, and maybe it is the
way raid6check works: lock the stripe, check the stripe, then unlock
the stripe. Just my guess ...

> And iostat does not report any writes either?

Because the CPU is busy with mddev_suspend, I think.

> # iostat /dev/sd[efhijklm] | cat
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-11  _x86_64_  (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.00    1.07    0.17    0.00   98.58
>
> Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
> sde     20.30     368.76       0.10       0.00  277022327    75178        0
> sdf     20.28     368.77       0.10       0.00  277030081    75170        0
> sdh     20.30     368.74       0.10       0.00  277007903    74854        0
> sdi     20.30     368.79       0.10       0.00  277049113    75246        0
> sdj     20.82     368.76       0.10       0.00  277022363    74986        0
> sdk     20.30     368.73       0.10       0.00  277002179    76322        0
> sdl     20.29     368.78       0.10       0.00  277039743    74982        0
> sdm     20.29     368.75       0.10       0.00  277018163    74958        0
>
> # cat /proc/mdstat
> Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> md3 : active raid10 sdc1[0] sdd1[1]
>       234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
>       bitmap: 0/2 pages [0KB], 65536KB chunk
>
> md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
>       11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
>
> md1 : active raid1 sdb3[0] sda3[1]
>       484118656 blocks [2/2] [UU]
>
> md2 : active raid1 sdb1[0] sda1[1]
>       255936 blocks [2/2] [UU]
>
> unused devices: <none>
>
>>> 3 days later:
>> Is raid6check still in 'D' state as before?
> Yes, nothing changed, still running:
>
> top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
> %Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check

I think the stack of raid6check is pretty much the same as before.

Since the estimated time for the 12TB array is about 57 days, if the
estimated time is linear in the number of stripes on the same
machine, then this is how raid6check works, as I guessed.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread
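[Editor's note: Guoqing's "linear in the number of stripes" theory can be sanity-checked with the numbers already in the thread. The arithmetic below is a back-of-the-envelope sketch, not a measurement from the thread: with a 16 KiB chunk and 6 data disks per 8-device RAID6 stripe, each stripe carries 96 KiB of data, so the observed 57-day estimate corresponds to roughly 40 ms per stripe. That is plausible for per-stripe suspend_lo/suspend_hi writes (each implying a synchronize_rcu grace period), and far too slow to be a disk-bandwidth effect.]

```shell
#!/bin/sh
# Per-stripe cost implied by the figures quoted in this thread.
data_kib=11720301024           # array data size in KiB (from mdadm --detail)
stripe_kib=$((16 * 6))         # 16 KiB chunk * 6 data disks (8 devices, RAID6)
stripes=$((data_kib / stripe_kib))
runtime_s=$((57 * 86400))      # the estimated 57 days, in seconds
us_per_stripe=$((runtime_s * 1000000 / stripes))
echo "stripes=$stripes"
echo "us_per_stripe=$us_per_stripe"
```

This yields about 122 million stripes and roughly 40,000 microseconds (40 ms) per stripe, consistent with a fixed synchronization cost per stripe dominating the runtime.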
* Re: raid6check extremely slow ?
  2020-05-11  8:58 ` Guoqing Jiang
@ 2020-05-11 15:39   ` Piergiorgio Sartor
  2020-05-12  7:37     ` Wolfgang Denk
  2020-05-11 16:14   ` Piergiorgio Sartor
  1 sibling, 1 reply; 38+ messages in thread

From: Piergiorgio Sartor @ 2020-05-11 15:39 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
>
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> >
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state; what is the output of 'cat
> > > /proc/19719/stack' and /proc/mdstat?
> > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> It looks like raid6check keeps writing the suspend_lo/hi nodes, which
> causes mddev_suspend to be called; this means synchronize_rcu and
> other synchronization mechanisms are triggered in that path ...
>
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time? I thought it was _reading_ the disks only?
>
> I hadn't read raid6check before, but check_stripes has:
>
> while (length > 0) {
>         lock_stripe          -> writes the suspend_lo/hi nodes
>         ...
>         unlock_all_stripes   -> writes the suspend_lo/hi nodes
> }
>
> I think this explains the stack of raid6check, and maybe it is the
> way raid6check works: lock the stripe, check the stripe, then unlock
> the stripe. Just my guess ...

Yes, that's the way it works.
raid6check locks the stripe, checks it, releases it.
This is required in order to avoid race conditions
between raid6check and some write to the stripe.

The alternative is to set the array R/O and do the
check, avoiding the lock / unlock.

This could be a way to test if the problem is
really here.
That is, remove the lock / unlock (I guess
there should be only one pair, but better
check) and check with the array in R/O mode.

Hope this helps,

bye,

pg

> > And iostat does not report any writes either?
>
> Because the CPU is busy with mddev_suspend, I think.
>
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)  2020-05-11  _x86_64_  (8 CPU)
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >            0.18    0.00    1.07    0.17    0.00   98.58
> >
> > Device    tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read  kB_wrtn  kB_dscd
> > sde     20.30     368.76       0.10       0.00  277022327    75178        0
> > sdf     20.28     368.77       0.10       0.00  277030081    75170        0
> > sdh     20.30     368.74       0.10       0.00  277007903    74854        0
> > sdi     20.30     368.79       0.10       0.00  277049113    75246        0
> > sdj     20.82     368.76       0.10       0.00  277022363    74986        0
> > sdk     20.30     368.73       0.10       0.00  277002179    76322        0
> > sdl     20.29     368.78       0.10       0.00  277039743    74982        0
> > sdm     20.29     368.75       0.10       0.00  277018163    74958        0
> >
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> >       234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> >       bitmap: 0/2 pages [0KB], 65536KB chunk
> >
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> >       11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> >
> > md1 : active raid1 sdb3[0] sda3[1]
> >       484118656 blocks [2/2] [UU]
> >
> > md2 : active raid1 sdb1[0] sda1[1]
> >       255936 blocks [2/2] [UU]
> >
> > unused devices: <none>
> >
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> >
> > top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
> > Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
> > %Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
> > %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
> > MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> > 19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check
>
> I think the stack of raid6check is pretty much the same as before.
>
> Since the estimated time for the 12TB array is about 57 days, if the
> estimated time is linear in the number of stripes on the same
> machine, then this is how raid6check works, as I guessed.
>
> Thanks,
> Guoqing

--

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread
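[Editor's note: Piergiorgio's read-only experiment could look roughly like the following. This is a sketch of the idea only: it assumes a raid6check binary rebuilt with the lock/unlock calls removed (no such option exists in the stock tool), and it only echoes the commands as a dry run, since freezing writes on a live array needs planned downtime. `--readonly` and `--readwrite` are mdadm's standard misc-mode switches for this.]

```shell
#!/bin/sh
# Dry-run sketch of the proposed read-only check. DEV and the patched
# raid6check build are assumptions; 'echo' keeps this from touching
# any real array.
DEV=/dev/md0
ro_check_plan() {
    echo "mdadm --readonly $DEV"    # freeze writes: per-stripe locking becomes unnecessary
    echo "raid6check $DEV 0 0"      # hypothetical build with lock/unlock removed
    echo "mdadm --readwrite $DEV"   # restore normal operation afterwards
}
ro_check_plan
```

Dropping the `echo`s would run the three steps for real, at the cost of the array rejecting all writes for the duration of the check.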
* Re: raid6check extremely slow ? 2020-05-11 15:39 ` Piergiorgio Sartor @ 2020-05-12 7:37 ` Wolfgang Denk 2020-05-12 16:17 ` Piergiorgio Sartor 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Denk @ 2020-05-12 7:37 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 1793 bytes --] Dear Piergiorgio, In message <20200511153937.GA3225@lazy.lzy> you wrote: > > while (length > 0) { > > lock_stripe -> write suspend_lo/hi node > > ... > > unlock_all_stripes -> -> write suspend_lo/hi node > > } > > > > I think it explains the stack of raid6check, and maybe it is way that > > raid6check works, lock > > stripe, check the stripe then unlock the stripe, just my guess ... > > Yes, that's the way it works. > raid6check lock the stripe, check it, release it. > This is required in order to avoid race conditions > between raid6check and some write to the stripe. This still does not really explain what is so slow here. I mean, even if the locking was an expenive operation code-wise, I would expect to see at least one of the CPU cores near 100% then - but botch CPU _and_ I/O are basically idle, and disks are _all_ and _always_ really close at a trhoughput of 400 kB/s - this looks like some intentional bandwith limit - I just can't see where this can be configured? > This could be a way to test if the problem is > really here. > That is, remove the lock / unlock (I guess > there should be only one pair, but better > check) and check with the array in R/O mode. 
I may try this again after this test has completed ;-) Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de It's certainly convenient the way the crime (or condition) of stupidity carries with it its own punishment, automatically administered without remorse, pity, or prejudice. :-) -- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 7:37 ` Wolfgang Denk @ 2020-05-12 16:17 ` Piergiorgio Sartor 2020-05-13 6:13 ` Wolfgang Denk 0 siblings, 1 reply; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 16:17 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid On Tue, May 12, 2020 at 09:37:47AM +0200, Wolfgang Denk wrote: > Dear Piergiorgio, > > In message <20200511153937.GA3225@lazy.lzy> you wrote: > > > while (length > 0) { > > > lock_stripe -> write suspend_lo/hi node > > > ... > > > unlock_all_stripes -> -> write suspend_lo/hi node > > > } > > > > > > I think it explains the stack of raid6check, and maybe it is way that > > > raid6check works, lock > > > stripe, check the stripe then unlock the stripe, just my guess ... > > > > Yes, that's the way it works. > > raid6check lock the stripe, check it, release it. > > This is required in order to avoid race conditions > > between raid6check and some write to the stripe. > > This still does not really explain what is so slow here. I mean, > even if the locking was an expenive operation code-wise, I would > expect to see at least one of the CPU cores near 100% then - but > botch CPU _and_ I/O are basically idle, and disks are _all_ and > _always_ really close at a trhoughput of 400 kB/s - this looks like > some intentional bandwith limit - I just can't see where this can be > configured? The code has two functions: lock_stripe() and unlock_all_stripes(). These are doing more than just lock / unlock. First, the memory pages of the process will be locked, then some signal will be set to "ignore", then the stripe will be locked. The unlock does the opposite in the reverse order (unlock, set the signal back, unlock the memory pages). The difference is that, whatever the reason, the unlock unlocks *all* the stripes, not only the one locked. Not sure why. > > This could be a way to test if the problem is > > really here.
> > That is, remove the lock / unlock (I guess > > there should be only one pair, but better > > check) and check with the array in R/O mode. > > I may try this again after this test completed ;-) I did it, some performance improvement, even if not really the possible max. bye, pg > Best regards, > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, Managing Director: Wolfgang Denk > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > It's certainly convenient the way the crime (or condition) of > stupidity carries with it its own punishment, automatically > admisistered without remorse, pity, or prejudice. :-) > -- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu> -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 16:17 ` Piergiorgio Sartor @ 2020-05-13 6:13 ` Wolfgang Denk 2020-05-13 16:22 ` Piergiorgio Sartor 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Denk @ 2020-05-13 6:13 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid Dear Piergiorgio, In message <20200512161731.GE7261@lazy.lzy> you wrote: > > > This still does not really explain what is so slow here. I mean, > > even if the locking was an expenive operation code-wise, I would > > expect to see at least one of the CPU cores near 100% then - but > > botch CPU _and_ I/O are basically idle, and disks are _all_ and > > _always_ really close at a trhoughput of 400 kB/s - this looks like > > some intentional bandwith limit - I just can't see where this can be > > configured? > > The code has 2 functions: lock_stripe() and > unlock_all_stripes(). > > These are doing more than just lock / unlock. > First, the memory pages of the process will > be locked, then some signal will be set to > "ignore", then the strip will be locked. > > The unlock does the opposite in the reverse > order (unlock, set the signal back, unlock > the memory pages). > The difference is that, whatever the reason, > the unlock unlocks *all* the stripes, not > only the one locked. > > Not sure why. It does not matter how complex the operation is - I wonder why it is taking so long: it cannot be CPU bound, as then I would expect to see some significant CPU load, but none of the cores shows more than 5...6% usage, ever. If it were I/O bound, I would expect to see I/O wait, but this is also never more than 0.2...0.3%. And why are all disks running at pretty exactly 400 kB/s read rate, all the time? This looks like some intentional bandwidth limit, but I cannot find any knob to change it.
Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Be careful what you wish for. You never know who will be listening. - Terry Pratchett, _Soul Music_ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-13 6:13 ` Wolfgang Denk @ 2020-05-13 16:22 ` Piergiorgio Sartor 0 siblings, 0 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-13 16:22 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid On Wed, May 13, 2020 at 08:13:58AM +0200, Wolfgang Denk wrote: > Dear Piergiorgio, > > In message <20200512161731.GE7261@lazy.lzy> you wrote: > > > > > This still does not really explain what is so slow here. I mean, > > > even if the locking was an expenive operation code-wise, I would > > > expect to see at least one of the CPU cores near 100% then - but > > > botch CPU _and_ I/O are basically idle, and disks are _all_ and > > > _always_ really close at a trhoughput of 400 kB/s - this looks like > > > some intentional bandwith limit - I just can't see where this can be > > > configured? > > > > The code has 2 functions: lock_stripe() and > > unlock_all_stripes(). > > > > These are doing more than just lock / unlock. > > First, the memory pages of the process will > > be locked, then some signal will be set to > > "ignore", then the strip will be locked. > > > > The unlock does the opposite in the reverse > > order (unlock, set the signal back, unlock > > the memory pages). > > The difference is that, whatever the reason, > > the unlock unlocks *all* the stripes, not > > only the one locked. > > > > Not sure why. > > It does not matter how omplex the operation is - I wonder why it is > taking so long: it cannot be CPU bound, as then I would expect to > see any significant CPU load, but none of the cores shows more than > 5...6% usage, ever. Or it is I/O bound, then I would expect to see > I/O wait, but this is also never more than 0.2...0.3%. > > And why are all disks running at pretty exaclty 400 kB/s read rate, > all the time? this looks like some intentinal bandwith limit, but I > cannot find any knob to change it. 
In my test I see 1200KB/sec, or 1.2MB/sec, which is different from yours. I do not think there is any bandwidth limitation, otherwise we should see the same, I guess. The low CPU load and low data rate seem to me a symptom of the CPU just systematically waiting (for something). It would be like putting in the code, here and there, some usleep(). In the end we'll see low CPU load and low data rate, *but* very constant. Likely, it is not I/O wait either, but some other wait. It might not be the stripe lock, but the locking of the process memory pages... This could be easily tested, BTW. Maybe I'll try... bye, pg > > > > Best regards, > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, Managing Director: Wolfgang Denk > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > Be careful what you wish for. You never know who will be listening. > - Terry Pratchett, _Soul Music_ -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 8:58 ` Guoqing Jiang 2020-05-11 15:39 ` Piergiorgio Sartor @ 2020-05-11 16:14 ` Piergiorgio Sartor 2020-05-11 20:53 ` Giuseppe Bilotta 2020-05-11 21:07 ` Guoqing Jiang 1 sibling, 2 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-11 16:14 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote: > Hi Wolfgang, > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote: > > Dear Guoqing Jiang, > > > > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote: > > > Seems raid6check is in 'D' state, what are the output of 'cat > > > /proc/19719/stack' and /proc/mdstat? > > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done > > [<0>] __wait_rcu_gp+0x10d/0x110 > > [<0>] synchronize_rcu+0x47/0x50 > > [<0>] mddev_suspend+0x4a/0x140 > > [<0>] suspend_lo_store+0x50/0xa0 > > [<0>] md_attr_store+0x86/0xe0 > > [<0>] kernfs_fop_write+0xce/0x1b0 > > [<0>] vfs_write+0xb6/0x1a0 > > [<0>] ksys_write+0x4f/0xc0 > > [<0>] do_syscall_64+0x5b/0xf0 > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > [<0>] synchronize_rcu+0x47/0x50 > > [<0>] mddev_suspend+0x4a/0x140 > > [<0>] suspend_lo_store+0x50/0xa0 > > [<0>] md_attr_store+0x86/0xe0 > > [<0>] kernfs_fop_write+0xce/0x1b0 > > [<0>] vfs_write+0xb6/0x1a0 > > [<0>] ksys_write+0x4f/0xc0 > > [<0>] do_syscall_64+0x5b/0xf0 > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > [<0>] synchronize_rcu+0x47/0x50 > > [<0>] mddev_suspend+0x4a/0x140 > > [<0>] suspend_hi_store+0x44/0x90 > > [<0>] md_attr_store+0x86/0xe0 > > [<0>] kernfs_fop_write+0xce/0x1b0 > > [<0>] vfs_write+0xb6/0x1a0 > > [<0>] ksys_write+0x4f/0xc0 > > [<0>] do_syscall_64+0x5b/0xf0 > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > [<0>] synchronize_rcu+0x47/0x50 > > [<0>] mddev_suspend+0x4a/0x140 
> > [<0>] suspend_hi_store+0x44/0x90 > > [<0>] md_attr_store+0x86/0xe0 > > [<0>] kernfs_fop_write+0xce/0x1b0 > > [<0>] vfs_write+0xb6/0x1a0 > > [<0>] ksys_write+0x4f/0xc0 > > [<0>] do_syscall_64+0x5b/0xf0 > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend > is called, > means synchronize_rcu and other synchronize mechanisms are triggered in the > path ... > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write > > all the time? I thought it was _reading_ the disks only? > > I didn't read raid6check before, just find check_stripes has > > > while (length > 0) { > lock_stripe -> write suspend_lo/hi node > ... > unlock_all_stripes -> -> write suspend_lo/hi node > } > > I think it explains the stack of raid6check, and maybe it is way that > raid6check works, lock > stripe, check the stripe then unlock the stripe, just my guess ... Hi again! I made a quick test. I disabled the lock / unlock in raid6check. With lock / unlock, I get around 1.2MB/sec per device component, with ~13% CPU load. Without lock / unlock, I get around 15.5MB/sec per device component, with ~30% CPU load. So, it seems the lock / unlock mechanism is quite expensive. I'm not sure what the best solution is, since we still need to avoid race conditions. Any suggestion is welcome! bye, pg > > And iostat does not report any writes either? > > Because CPU is busying with mddev_suspend I think.
> > > # iostat /dev/sd[efhijklm] | cat > > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU) > > > > avg-cpu: %user %nice %system %iowait %steal %idle > > 0.18 0.00 1.07 0.17 0.00 98.58 > > > > Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd > > sde 20.30 368.76 0.10 0.00 277022327 75178 0 > > sdf 20.28 368.77 0.10 0.00 277030081 75170 0 > > sdh 20.30 368.74 0.10 0.00 277007903 74854 0 > > sdi 20.30 368.79 0.10 0.00 277049113 75246 0 > > sdj 20.82 368.76 0.10 0.00 277022363 74986 0 > > sdk 20.30 368.73 0.10 0.00 277002179 76322 0 > > sdl 20.29 368.78 0.10 0.00 277039743 74982 0 > > sdm 20.29 368.75 0.10 0.00 277018163 74958 0 > > > > > > # cat /proc/mdstat > > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4] > > md3 : active raid10 sdc1[0] sdd1[1] > > 234878976 blocks 512K chunks 2 far-copies [2/2] [UU] > > bitmap: 0/2 pages [0KB], 65536KB chunk > > > > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11] > > 11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU] > > > > md1 : active raid1 sdb3[0] sda3[1] > > 484118656 blocks [2/2] [UU] > > > > md2 : active raid1 sdb1[0] sda1[1] > > 255936 blocks [2/2] [UU] > > > > unused devices: <none> > > > > > > 3 days later: > > > Is raid6check still in 'D' state as before? 
> > Yes, nothing changed, still running: > > > > top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00 > > Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie > > %Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st > > %Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st > > %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st > > MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache > > MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check > > I think the stack of raid6check is pretty much the same as before. > > Since the estimated time of 12TB array is about 57 days, if the estimated > time is linear to > the number of stripes in the same machine, then it is how raid6check works > as I guessed. > > Thanks, > Guoqing -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 16:14 ` Piergiorgio Sartor @ 2020-05-11 20:53 ` Giuseppe Bilotta 2020-05-11 21:12 ` Guoqing Jiang 2020-05-12 16:05 ` Piergiorgio Sartor 2020-05-11 21:07 ` Guoqing Jiang 1 sibling, 2 replies; 38+ messages in thread From: Giuseppe Bilotta @ 2020-05-11 20:53 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Guoqing Jiang, Wolfgang Denk, linux-raid Hello Piergiorgio, On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote: > Hi again! > > I made a quick test. > I disabled the lock / unlock in raid6check. > > With lock / unlock, I get around 1.2MB/sec > per device component, with ~13% CPU load. > Wihtout lock / unlock, I get around 15.5MB/sec > per device component, with ~30% CPU load. > > So, it seems the lock / unlock mechanism is > quite expensive. > > I'm not sure what's the best solution, since > we still need to avoid race conditions. > > Any suggestion is welcome! Would it be possible/effective to lock multiple stripes at once? Lock, say, 8 or 16 stripes, process them, unlock. I'm not familiar with the internals, but if locking is O(1) on the number of stripes (at least if they are consecutive), this would help reduce (potentially by a factor of 8 or 16) the costs of the locks/unlocks at the expense of longer locks and their influence on external I/O. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 20:53 ` Giuseppe Bilotta @ 2020-05-11 21:12 ` Guoqing Jiang 2020-05-11 21:16 ` Guoqing Jiang 2020-05-12 16:05 ` Piergiorgio Sartor 1 sibling, 1 reply; 38+ messages in thread From: Guoqing Jiang @ 2020-05-11 21:12 UTC (permalink / raw) To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid On 5/11/20 10:53 PM, Giuseppe Bilotta wrote: > Hello Piergiorgio, > > On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor > <piergiorgio.sartor@nexgo.de> wrote: >> Hi again! >> >> I made a quick test. >> I disabled the lock / unlock in raid6check. >> >> With lock / unlock, I get around 1.2MB/sec >> per device component, with ~13% CPU load. >> Wihtout lock / unlock, I get around 15.5MB/sec >> per device component, with ~30% CPU load. >> >> So, it seems the lock / unlock mechanism is >> quite expensive. >> >> I'm not sure what's the best solution, since >> we still need to avoid race conditions. >> >> Any suggestion is welcome! > Would it be possible/effective to lock multiple stripes at once? Lock, > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the > internals, but if locking is O(1) on the number of stripes (at least > if they are consecutive), this would help reduce (potentially by a > factor of 8 or 16) the costs of the locks/unlocks at the expense of > longer locks and their influence on external I/O. > Hmm, maybe something like. check_stripes -> mddev_suspend while (whole_stripe_num--) { check each stripe } -> mddev_resume Then just need to call suspend/resume once. Thanks, Guoqing ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 21:12 ` Guoqing Jiang @ 2020-05-11 21:16 ` Guoqing Jiang 2020-05-12 1:52 ` Giuseppe Bilotta 0 siblings, 1 reply; 38+ messages in thread From: Guoqing Jiang @ 2020-05-11 21:16 UTC (permalink / raw) To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid On 5/11/20 11:12 PM, Guoqing Jiang wrote: > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote: >> Hello Piergiorgio, >> >> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor >> <piergiorgio.sartor@nexgo.de> wrote: >>> Hi again! >>> >>> I made a quick test. >>> I disabled the lock / unlock in raid6check. >>> >>> With lock / unlock, I get around 1.2MB/sec >>> per device component, with ~13% CPU load. >>> Wihtout lock / unlock, I get around 15.5MB/sec >>> per device component, with ~30% CPU load. >>> >>> So, it seems the lock / unlock mechanism is >>> quite expensive. >>> >>> I'm not sure what's the best solution, since >>> we still need to avoid race conditions. >>> >>> Any suggestion is welcome! >> Would it be possible/effective to lock multiple stripes at once? Lock, >> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the >> internals, but if locking is O(1) on the number of stripes (at least >> if they are consecutive), this would help reduce (potentially by a >> factor of 8 or 16) the costs of the locks/unlocks at the expense of >> longer locks and their influence on external I/O. >> > > Hmm, maybe something like. > > check_stripes > > -> mddev_suspend > > while (whole_stripe_num--) { > check each stripe > } > > -> mddev_resume > > > Then just need to call suspend/resume once. But basically, the array can't process any new requests when checking is in progress ... Guoqing ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 21:16 ` Guoqing Jiang @ 2020-05-12 1:52 ` Giuseppe Bilotta 2020-05-12 6:27 ` Adam Goryachev 0 siblings, 1 reply; 38+ messages in thread From: Giuseppe Bilotta @ 2020-05-12 1:52 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang <guoqing.jiang@cloud.ionos.com> wrote: > On 5/11/20 11:12 PM, Guoqing Jiang wrote: > > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote: > >> Would it be possible/effective to lock multiple stripes at once? Lock, > >> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the > >> internals, but if locking is O(1) on the number of stripes (at least > >> if they are consecutive), this would help reduce (potentially by a > >> factor of 8 or 16) the costs of the locks/unlocks at the expense of > >> longer locks and their influence on external I/O. > >> > > > > Hmm, maybe something like. > > > > check_stripes > > > > -> mddev_suspend > > > > while (whole_stripe_num--) { > > check each stripe > > } > > > > -> mddev_resume > > > > > > Then just need to call suspend/resume once. > > But basically, the array can't process any new requests when checking is Yeah, locking the entire device might be excessive (especially if it's a big one). Using a granularity larger than 1 but smaller than the whole device could be a compromise. Since the “no lock” approach seems to be about an order of magnitude faster (at least in Piergiorgio's benchmark), my guess was that something between 8 and 16 could bring the speed up to be close to the “no lock” case without having dramatic effects on I/O. Reading all 8/16 stripes before processing (assuming sufficient memory) might even lead to better disk utilization during the check. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 1:52 ` Giuseppe Bilotta @ 2020-05-12 6:27 ` Adam Goryachev 2020-05-12 16:11 ` Piergiorgio Sartor 0 siblings, 1 reply; 38+ messages in thread From: Adam Goryachev @ 2020-05-12 6:27 UTC (permalink / raw) To: Giuseppe Bilotta, Guoqing Jiang Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid On 12/5/20 11:52, Giuseppe Bilotta wrote: > On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang > <guoqing.jiang@cloud.ionos.com> wrote: >> On 5/11/20 11:12 PM, Guoqing Jiang wrote: >>> On 5/11/20 10:53 PM, Giuseppe Bilotta wrote: >>>> Would it be possible/effective to lock multiple stripes at once? Lock, >>>> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the >>>> internals, but if locking is O(1) on the number of stripes (at least >>>> if they are consecutive), this would help reduce (potentially by a >>>> factor of 8 or 16) the costs of the locks/unlocks at the expense of >>>> longer locks and their influence on external I/O. >>>> >>> Hmm, maybe something like. >>> >>> check_stripes >>> >>> -> mddev_suspend >>> >>> while (whole_stripe_num--) { >>> check each stripe >>> } >>> >>> -> mddev_resume >>> >>> >>> Then just need to call suspend/resume once. >> But basically, the array can't process any new requests when checking is > Yeah, locking the entire device might be excessive (especially if it's > a big one). Using a granularity larger than 1 but smaller than the > whole device could be a compromise. Since the “no lock” approach seems > to be about an order of magnitude faster (at least in Piergiorgio's > benchmark), my guess was that something between 8 and 16 could bring > the speed up to be close to the “no lock” case without having dramatic > effects on I/O. Reading all 8/16 stripes before processing (assuming > sufficient memory) might even lead to better disk utilization during > the check. 
I know very little about this, but could you perhaps lock 2 x 16 stripes, and then after you complete the first 16, release the first 16, lock the 3rd 16 stripes, and while waiting for the lock continue to process the 2nd set of 16? Would that allow you to do more processing and less waiting for lock/release? Regards, Adam ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 6:27 ` Adam Goryachev @ 2020-05-12 16:11 ` Piergiorgio Sartor 0 siblings, 0 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 16:11 UTC (permalink / raw) To: Adam Goryachev Cc: Giuseppe Bilotta, Guoqing Jiang, Piergiorgio Sartor, Wolfgang Denk, linux-raid On Tue, May 12, 2020 at 04:27:59PM +1000, Adam Goryachev wrote: > > On 12/5/20 11:52, Giuseppe Bilotta wrote: > > On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang > > <guoqing.jiang@cloud.ionos.com> wrote: > > > On 5/11/20 11:12 PM, Guoqing Jiang wrote: > > > > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote: > > > > > Would it be possible/effective to lock multiple stripes at once? Lock, > > > > > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the > > > > > internals, but if locking is O(1) on the number of stripes (at least > > > > > if they are consecutive), this would help reduce (potentially by a > > > > > factor of 8 or 16) the costs of the locks/unlocks at the expense of > > > > > longer locks and their influence on external I/O. > > > > > > > > > Hmm, maybe something like. > > > > > > > > check_stripes > > > > > > > > -> mddev_suspend > > > > > > > > while (whole_stripe_num--) { > > > > check each stripe > > > > } > > > > > > > > -> mddev_resume > > > > > > > > > > > > Then just need to call suspend/resume once. > > > But basically, the array can't process any new requests when checking is > > Yeah, locking the entire device might be excessive (especially if it's > > a big one). Using a granularity larger than 1 but smaller than the > > whole device could be a compromise. Since the “no lock” approach seems > > to be about an order of magnitude faster (at least in Piergiorgio's > > benchmark), my guess was that something between 8 and 16 could bring > > the speed up to be close to the “no lock” case without having dramatic > > effects on I/O. 
Reading all 8/16 stripes before processing (assuming > > sufficient memory) might even lead to better disk utilization during > > the check. > > I know very little about this, but could you perhaps lock 2 x 16 stripes, > and then after you complete the first 16, release the first 16, lock the 3rd > 16 stripes, and while waiting for the lock continue to process the 2nd set > of 16? For some reason I do not know, the unlock is global. If I recall correctly, this was the way Neil mentioned as being "more" correct. > Would that allow you to do more processing and less waiting for > lock/release? I think the general concept of pipelining is good; this would really improve the performance of the whole thing. If we could just multithread, I suspect it could improve. We need to solve the unlock problem... bye, > > Regards, > Adam -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 20:53 ` Giuseppe Bilotta 2020-05-11 21:12 ` Guoqing Jiang @ 2020-05-12 16:05 ` Piergiorgio Sartor 1 sibling, 0 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 16:05 UTC (permalink / raw) To: Giuseppe Bilotta Cc: Piergiorgio Sartor, Guoqing Jiang, Wolfgang Denk, linux-raid On Mon, May 11, 2020 at 10:53:05PM +0200, Giuseppe Bilotta wrote: > Hello Piergiorgio, > > On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor > <piergiorgio.sartor@nexgo.de> wrote: > > Hi again! > > > > I made a quick test. > > I disabled the lock / unlock in raid6check. > > > > With lock / unlock, I get around 1.2MB/sec > > per device component, with ~13% CPU load. > > Wihtout lock / unlock, I get around 15.5MB/sec > > per device component, with ~30% CPU load. > > > > So, it seems the lock / unlock mechanism is > > quite expensive. > > > > I'm not sure what's the best solution, since > > we still need to avoid race conditions. > > > > Any suggestion is welcome! > > Would it be possible/effective to lock multiple stripes at once? Lock, > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the > internals, but if locking is O(1) on the number of stripes (at least > if they are consecutive), this would help reduce (potentially by a > factor of 8 or 16) the costs of the locks/unlocks at the expense of > longer locks and their influence on external I/O. Probably possible, from the technical point of view, even if I do not know the effect either. From the coding point of view, it is a bit tricky: boundary conditions and so on must be properly considered. > > -- > Giuseppe "Oblomov" Bilotta -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 16:14 ` Piergiorgio Sartor 2020-05-11 20:53 ` Giuseppe Bilotta @ 2020-05-11 21:07 ` Guoqing Jiang 2020-05-11 22:44 ` Peter Grandi 2020-05-12 16:07 ` Piergiorgio Sartor 1 sibling, 2 replies; 38+ messages in thread From: Guoqing Jiang @ 2020-05-11 21:07 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid On 5/11/20 6:14 PM, Piergiorgio Sartor wrote: > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote: >> Hi Wolfgang, >> >> >> On 5/11/20 8:40 AM, Wolfgang Denk wrote: >>> Dear Guoqing Jiang, >>> >>> In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote: >>>> Seems raid6check is in 'D' state, what are the output of 'cat >>>> /proc/19719/stack' and /proc/mdstat? >>> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done >>> [<0>] __wait_rcu_gp+0x10d/0x110 >>> [<0>] synchronize_rcu+0x47/0x50 >>> [<0>] mddev_suspend+0x4a/0x140 >>> [<0>] suspend_lo_store+0x50/0xa0 >>> [<0>] md_attr_store+0x86/0xe0 >>> [<0>] kernfs_fop_write+0xce/0x1b0 >>> [<0>] vfs_write+0xb6/0x1a0 >>> [<0>] ksys_write+0x4f/0xc0 >>> [<0>] do_syscall_64+0x5b/0xf0 >>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> >>> [<0>] __wait_rcu_gp+0x10d/0x110 >>> [<0>] synchronize_rcu+0x47/0x50 >>> [<0>] mddev_suspend+0x4a/0x140 >>> [<0>] suspend_lo_store+0x50/0xa0 >>> [<0>] md_attr_store+0x86/0xe0 >>> [<0>] kernfs_fop_write+0xce/0x1b0 >>> [<0>] vfs_write+0xb6/0x1a0 >>> [<0>] ksys_write+0x4f/0xc0 >>> [<0>] do_syscall_64+0x5b/0xf0 >>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> >>> [<0>] __wait_rcu_gp+0x10d/0x110 >>> [<0>] synchronize_rcu+0x47/0x50 >>> [<0>] mddev_suspend+0x4a/0x140 >>> [<0>] suspend_hi_store+0x44/0x90 >>> [<0>] md_attr_store+0x86/0xe0 >>> [<0>] kernfs_fop_write+0xce/0x1b0 >>> [<0>] vfs_write+0xb6/0x1a0 >>> [<0>] ksys_write+0x4f/0xc0 >>> [<0>] do_syscall_64+0x5b/0xf0 >>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> >>> [<0>] __wait_rcu_gp+0x10d/0x110 >>> [<0>] 
synchronize_rcu+0x47/0x50 >>> [<0>] mddev_suspend+0x4a/0x140 >>> [<0>] suspend_hi_store+0x44/0x90 >>> [<0>] md_attr_store+0x86/0xe0 >>> [<0>] kernfs_fop_write+0xce/0x1b0 >>> [<0>] vfs_write+0xb6/0x1a0 >>> [<0>] ksys_write+0x4f/0xc0 >>> [<0>] do_syscall_64+0x5b/0xf0 >>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend >> is called, >> means synchronize_rcu and other synchronize mechanisms are triggered in the >> path ... >> >>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write >>> all the time? I thought it was_reading_ the disks only? >> I didn't read raid6check before, just find check_stripes has >> >> >> while (length > 0) { >> lock_stripe -> write suspend_lo/hi node >> ... >> unlock_all_stripes -> -> write suspend_lo/hi node >> } >> >> I think it explains the stack of raid6check, and maybe it is way that >> raid6check works, lock >> stripe, check the stripe then unlock the stripe, just my guess ... > Hi again! > > I made a quick test. > I disabled the lock / unlock in raid6check. > > With lock / unlock, I get around 1.2MB/sec > per device component, with ~13% CPU load. > Wihtout lock / unlock, I get around 15.5MB/sec > per device component, with ~30% CPU load. > > So, it seems the lock / unlock mechanism is > quite expensive. Yes, since mddev_suspend/resume are triggered by the stripe lock/unlock. > I'm not sure what's the best solution, since > we still need to avoid race conditions. I guess there are two possible ways: 1. Per your previous reply, only call raid6check when the array is RO, then we don't need the lock. 2. Investigate whether it is possible to acquire stripe_lock in suspend_lo/hi_store to avoid the race between raid6check and a write to the same stripe. IOW, try fine-grained protection instead of calling the expensive suspend/resume in suspend_lo/hi_store. But I am not sure whether that is doable right now. BTW, it seems there are build problems for raid6check ...
mdadm$ make raid6check
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
sysfs.o: In function `sysfsline':
sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
collect2: error: ld returned 1 exit status
Makefile:220: recipe for target 'raid6check' failed
make: *** [raid6check] Error 1

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 21:07 ` Guoqing Jiang @ 2020-05-11 22:44 ` Peter Grandi 2020-05-12 16:09 ` Piergiorgio Sartor 2020-05-12 16:07 ` Piergiorgio Sartor 1 sibling, 1 reply; 38+ messages in thread From: Peter Grandi @ 2020-05-11 22:44 UTC (permalink / raw) To: Linux RAID >>> With lock / unlock, I get around 1.2MB/sec per device >>> component, with ~13% CPU load. Wihtout lock / unlock, I get >>> around 15.5MB/sec per device component, with ~30% CPU load. >> [...] we still need to avoid race conditions. [...] Not all race conditions are equally bad in this situation. > 1. Per your previous reply, only call raid6check when array is > RO, then we don't need the lock. > 2. Investigate if it is possible that acquire stripe_lock in > suspend_lo/hi_store [...] Some other ways could be considered: * Read a stripe without locking and check it; if it checks good, no problem, else either it was modified during the read, or it was faulty, so acquire a W lock, reread and recheck it (it could have become good in the meantime). The assumption here is that there is a modest write load from applications on the RAID set, so the check will almost always succeed, and it is worth rereading the stripe in very rare cases of "collisions" or faults. * Variants, like acquiring a W lock (if possible) on the stripe solely while reading it ("atomic" read, which may be possible in other ways without locking) and then if check fails we know it was faulty, so optionally acquire a new W lock and reread and recheck it (it could have become good in the meantime). The assumption here is that the write load is less modest, but there are a lot more reads than writes, so a W lock only during read will eliminate the rereads and rechecks from relatively rare "collisions". 
The case where there is a large application write load on the RAID set and a check running at the same time is hard to improve; there the best option is probably to eliminate rereads and rechecks by just acquiring the stripe W lock for the whole duration of the read and check.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 22:44 ` Peter Grandi @ 2020-05-12 16:09 ` Piergiorgio Sartor 2020-05-12 20:54 ` antlists 0 siblings, 1 reply; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 16:09 UTC (permalink / raw) To: Peter Grandi; +Cc: Linux RAID On Mon, May 11, 2020 at 11:44:11PM +0100, Peter Grandi wrote: > >>> With lock / unlock, I get around 1.2MB/sec per device > >>> component, with ~13% CPU load. Wihtout lock / unlock, I get > >>> around 15.5MB/sec per device component, with ~30% CPU load. > > >> [...] we still need to avoid race conditions. [...] > > Not all race conditions are equally bad in this situation. > > > 1. Per your previous reply, only call raid6check when array is > > RO, then we don't need the lock. > > 2. Investigate if it is possible that acquire stripe_lock in > > suspend_lo/hi_store [...] > > Some other ways could be considered: > > * Read a stripe without locking and check it; if it checks good, > no problem, else either it was modified during the read, or it > was faulty, so acquire a W lock, reread and recheck it (it > could have become good in the meantime). > > The assumption here is that there is a modest write load from > applications on the RAID set, so the check will almost always > succeed, and it is worth rereading the stripe in very rare > cases of "collisions" or faults. > > * Variants, like acquiring a W lock (if possible) on the stripe > solely while reading it ("atomic" read, which may be possible > in other ways without locking) and then if check fails we know > it was faulty, so optionally acquire a new W lock and reread > and recheck it (it could have become good in the meantime). > > The assumption here is that the write load is less modest, but > there are a lot more reads than writes, so a W lock only > during read will eliminate the rereads and rechecks from > relatively rare "collisions". The locking method was suggested by Neil, I'm not aware of other methods. 
About the check -> maybe lock -> re-check, it is a possible workaround, but I find it a bit extreme. In any case, we should keep it in mind. bye, pg > The case where there is at the same time a large application > write load on the RAID set and checking at the same time is hard > to improve and probably eliminating rereads and rechecks by just > acquiring the stripe W lock for the whole duration of read and > check. -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 16:09 ` Piergiorgio Sartor @ 2020-05-12 20:54 ` antlists 2020-05-13 16:18 ` Piergiorgio Sartor 0 siblings, 1 reply; 38+ messages in thread From: antlists @ 2020-05-12 20:54 UTC (permalink / raw) To: Piergiorgio Sartor, Peter Grandi; +Cc: Linux RAID On 12/05/2020 17:09, Piergiorgio Sartor wrote: > About the check -> maybe lock -> re-check, > it is a possible workaround, but I find it > a bit extreme. This seems the best (most obvious?) solution to me. If the system is under light write pressure, and the disk is healthy, it will scan pretty quickly with almost no locking. If the system is under heavy pressure, chances are there'll be a fair few stripes needing rechecking, but even at it's worst it'll only be as bad as the current setup. And if the system is somewhere inbetween, you still stand a good chance of a fast scan. At the end of the day, the rule should always be "lock only if you need to" so looking for problems with an optimistic no-lock scan, then locking only if needed to check and fix the problem, just feels right. Cheers, Wol ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 20:54 ` antlists @ 2020-05-13 16:18 ` Piergiorgio Sartor 2020-05-13 17:37 ` Wols Lists 0 siblings, 1 reply; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-13 16:18 UTC (permalink / raw) To: antlists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote: > On 12/05/2020 17:09, Piergiorgio Sartor wrote: > > About the check -> maybe lock -> re-check, > > it is a possible workaround, but I find it > > a bit extreme. > > This seems the best (most obvious?) solution to me. > > If the system is under light write pressure, and the disk is healthy, it > will scan pretty quickly with almost no locking. I've some concerns about optimization solutions which can result in less performances than the original status. You mention "write pressure", but there is an other case, which will cause read -> lock -> re-read... Namely, when some chunk is really corrupted. Now, I do not know, maybe there are other things we overlook, or maybe not. I do not know either how likely is that some situations will occur to reduce performances. I would prefer a solution which will *only* improve, without any possible drawback. Again, this does not mean this approach is wrong, actually is to be considered. In the end, I would like also to understand why the lock / unlock is so expensive. > If the system is under heavy pressure, chances are there'll be a fair few > stripes needing rechecking, but even at it's worst it'll only be as bad as > the current setup. It will be worse (or worst, I'm always confused...). The read and the check will double. I'm not sure about the read, but the check is currently expensive. bye, pg > And if the system is somewhere inbetween, you still stand a good chance of a > fast scan. 
> > At the end of the day, the rule should always be "lock only if you need to" > so looking for problems with an optimistic no-lock scan, then locking only > if needed to check and fix the problem, just feels right. > > Cheers, > Wol -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-13 16:18 ` Piergiorgio Sartor @ 2020-05-13 17:37 ` Wols Lists 2020-05-13 18:23 ` Piergiorgio Sartor 0 siblings, 1 reply; 38+ messages in thread From: Wols Lists @ 2020-05-13 17:37 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Peter Grandi, Linux RAID On 13/05/20 17:18, Piergiorgio Sartor wrote: > On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote: >> On 12/05/2020 17:09, Piergiorgio Sartor wrote: >>> About the check -> maybe lock -> re-check, >>> it is a possible workaround, but I find it >>> a bit extreme. >> >> This seems the best (most obvious?) solution to me. >> >> If the system is under light write pressure, and the disk is healthy, it >> will scan pretty quickly with almost no locking. > > I've some concerns about optimization > solutions which can result in less > performances than the original status. > > You mention "write pressure", but there > is an other case, which will cause > read -> lock -> re-read... > Namely, when some chunk is really corrupted. > Yup. That's why I said "the disk is healthy" :-) > Now, I do not know, maybe there are other > things we overlook, or maybe not. > > I do not know either how likely is that some > situations will occur to reduce performances. > > I would prefer a solution which will *only* > improve, without any possible drawback. Wouldn't we all. But if the *normal* case shows an appreciable improvement, then I'm inclined to write off a "shouldn't happen" case as "tough luck, shit happens". > > Again, this does not mean this approach is > wrong, actually is to be considered. > > In the end, I would like also to understand > why the lock / unlock is so expensive. Agreed. > >> If the system is under heavy pressure, chances are there'll be a fair few >> stripes needing rechecking, but even at it's worst it'll only be as bad as >> the current setup. > > It will be worse (or worst, I'm always > confused...). > The read and the check will double. 
Touche - my logic was off ... But a bit of grammar - bad = descriptive, worse = comparative, worst = absolute, so you were correct with worse. > > I'm not sure about the read, but the > check is currently expensive. But you're still going to need a very unlucky state of affairs for the optimised check to be worse. Okay, if the disk IS damaged, then the optimised check could easily be the worst, but if it's just write pressure, you're going to need every second stripe to be messed up by a collision. Rather unlikely imho. > > bye, > > pg Cheers, Wol > >> And if the system is somewhere inbetween, you still stand a good chance of a >> fast scan. >> >> At the end of the day, the rule should always be "lock only if you need to" >> so looking for problems with an optimistic no-lock scan, then locking only >> if needed to check and fix the problem, just feels right. >> >> Cheers, >> Wol > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-13 17:37 ` Wols Lists @ 2020-05-13 18:23 ` Piergiorgio Sartor 0 siblings, 0 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-13 18:23 UTC (permalink / raw) To: Wols Lists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID On Wed, May 13, 2020 at 06:37:18PM +0100, Wols Lists wrote: > On 13/05/20 17:18, Piergiorgio Sartor wrote: > > On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote: > >> On 12/05/2020 17:09, Piergiorgio Sartor wrote: > >>> About the check -> maybe lock -> re-check, > >>> it is a possible workaround, but I find it > >>> a bit extreme. > >> > >> This seems the best (most obvious?) solution to me. > >> > >> If the system is under light write pressure, and the disk is healthy, it > >> will scan pretty quickly with almost no locking. > > > > I've some concerns about optimization > > solutions which can result in less > > performances than the original status. > > > > You mention "write pressure", but there > > is an other case, which will cause > > read -> lock -> re-read... > > Namely, when some chunk is really corrupted. > > > Yup. That's why I said "the disk is healthy" :-) We need to consider all posibilities... > > Now, I do not know, maybe there are other > > things we overlook, or maybe not. > > > > I do not know either how likely is that some > > situations will occur to reduce performances. > > > > I would prefer a solution which will *only* > > improve, without any possible drawback. > > Wouldn't we all. But if the *normal* case shows an appreciable > improvement, then I'm inclined to write off a "shouldn't happen" case as > "tough luck, shit happens". > > > > Again, this does not mean this approach is > > wrong, actually is to be considered. > > > > In the end, I would like also to understand > > why the lock / unlock is so expensive. > > Agreed. 
> > > >> If the system is under heavy pressure, chances are there'll be a fair few > >> stripes needing rechecking, but even at it's worst it'll only be as bad as > >> the current setup. > > > > It will be worse (or worst, I'm always > > confused...). > > The read and the check will double. > > Touche - my logic was off ... > > But a bit of grammar - bad = descriptive, worse = comparative, worst = > absolute, so you were correct with worse. Ah! Thank you. That's always confusing me. Usually I check with some search engine, but sometimes I'm too lazy... And then I forgot. BTW, somehow related, please do not refrain to correct my English. > > I'm not sure about the read, but the > > check is currently expensive. > > But you're still going to need a very unlucky state of affairs for the > optimised check to be worse. Okay, if the disk IS damaged, then the > optimised check could easily be the worst, but if it's just write > pressure, you're going to need every second stripe to be messed up by a > collision. Rather unlikely imho. Well, as Neil would say, patch are welcome! :-) Really, I've too little time to make changes to the code. I can do some test and, hopefully, some support. bye, pg > > > > bye, > > > > pg > > Cheers, > Wol > > > >> And if the system is somewhere inbetween, you still stand a good chance of a > >> fast scan. > >> > >> At the end of the day, the rule should always be "lock only if you need to" > >> so looking for problems with an optimistic no-lock scan, then locking only > >> if needed to check and fix the problem, just feels right. > >> > >> Cheers, > >> Wol > > -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-11 21:07 ` Guoqing Jiang 2020-05-11 22:44 ` Peter Grandi @ 2020-05-12 16:07 ` Piergiorgio Sartor 2020-05-12 18:16 ` Guoqing Jiang 2020-05-13 6:07 ` Wolfgang Denk 1 sibling, 2 replies; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 16:07 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote: > On 5/11/20 6:14 PM, Piergiorgio Sartor wrote: > > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote: > > > Hi Wolfgang, > > > > > > > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote: > > > > Dear Guoqing Jiang, > > > > > > > > In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote: > > > > > Seems raid6check is in 'D' state, what are the output of 'cat > > > > > /proc/19719/stack' and /proc/mdstat? > > > > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > [<0>] suspend_lo_store+0x50/0xa0 > > > > [<0>] md_attr_store+0x86/0xe0 > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > [<0>] ksys_write+0x4f/0xc0 > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > [<0>] suspend_lo_store+0x50/0xa0 > > > > [<0>] md_attr_store+0x86/0xe0 > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > [<0>] ksys_write+0x4f/0xc0 > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > [<0>] suspend_hi_store+0x44/0x90 > > > > [<0>] md_attr_store+0x86/0xe0 > > > > [<0>] 
kernfs_fop_write+0xce/0x1b0 > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > [<0>] ksys_write+0x4f/0xc0 > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > [<0>] suspend_hi_store+0x44/0x90 > > > > [<0>] md_attr_store+0x86/0xe0 > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > [<0>] ksys_write+0x4f/0xc0 > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend > > > is called, > > > means synchronize_rcu and other synchronize mechanisms are triggered in the > > > path ... > > > > > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write > > > > all the time? I thought it was_reading_ the disks only? > > > I didn't read raid6check before, just find check_stripes has > > > > > > > > > while (length > 0) { > > > lock_stripe -> write suspend_lo/hi node > > > ... > > > unlock_all_stripes -> -> write suspend_lo/hi node > > > } > > > > > > I think it explains the stack of raid6check, and maybe it is way that > > > raid6check works, lock > > > stripe, check the stripe then unlock the stripe, just my guess ... > > Hi again! > > > > I made a quick test. > > I disabled the lock / unlock in raid6check. > > > > With lock / unlock, I get around 1.2MB/sec > > per device component, with ~13% CPU load. > > Wihtout lock / unlock, I get around 15.5MB/sec > > per device component, with ~30% CPU load. > > > > So, it seems the lock / unlock mechanism is > > quite expensive. > > Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe. > > > I'm not sure what's the best solution, since > > we still need to avoid race conditions. > > I guess there are two possible ways: > > 1. 
Per your previous reply, only call raid6check when array is RO, then > we don't need the lock. > > 2. Investigate if it is possible that acquire stripe_lock in > suspend_lo/hi_store > to avoid the race between raid6check and write to the same stripe. IOW, > try fine grained protection instead of call the expensive suspend/resume > in suspend_lo/hi_store. But I am not sure it is doable or not right now. Could you please elaborate on the "fine grained protection" thing? > > BTW, seems there are build problems for raid6check ... > > mdadm$ make raid6check > gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter > -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" > -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" > -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" > -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM > -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS > -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o > xmalloc.o dlink.o > sysfs.o: In function `sysfsline': > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid' > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero' > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero' > collect2: error: ld returned 1 exit status > Makefile:220: recipe for target 'raid6check' failed > make: *** [raid6check] Error 1 I cannot see this problem. I could compile without issue. Maybe some library is missing somewhere, but I'm not sure where. bye, pg > > Thanks, > Guoqing -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 16:07 ` Piergiorgio Sartor @ 2020-05-12 18:16 ` Guoqing Jiang 2020-05-12 18:32 ` Piergiorgio Sartor 2020-05-13 6:07 ` Wolfgang Denk 1 sibling, 1 reply; 38+ messages in thread From: Guoqing Jiang @ 2020-05-12 18:16 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid On 5/12/20 6:07 PM, Piergiorgio Sartor wrote: > On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote: >> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote: >>> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote: >>>> Hi Wolfgang, >>>> >>>> >>>> On 5/11/20 8:40 AM, Wolfgang Denk wrote: >>>>> Dear Guoqing Jiang, >>>>> >>>>> In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote: >>>>>> Seems raid6check is in 'D' state, what are the output of 'cat >>>>>> /proc/19719/stack' and /proc/mdstat? >>>>> # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done >>>>> [<0>] __wait_rcu_gp+0x10d/0x110 >>>>> [<0>] synchronize_rcu+0x47/0x50 >>>>> [<0>] mddev_suspend+0x4a/0x140 >>>>> [<0>] suspend_lo_store+0x50/0xa0 >>>>> [<0>] md_attr_store+0x86/0xe0 >>>>> [<0>] kernfs_fop_write+0xce/0x1b0 >>>>> [<0>] vfs_write+0xb6/0x1a0 >>>>> [<0>] ksys_write+0x4f/0xc0 >>>>> [<0>] do_syscall_64+0x5b/0xf0 >>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> >>>>> [<0>] __wait_rcu_gp+0x10d/0x110 >>>>> [<0>] synchronize_rcu+0x47/0x50 >>>>> [<0>] mddev_suspend+0x4a/0x140 >>>>> [<0>] suspend_lo_store+0x50/0xa0 >>>>> [<0>] md_attr_store+0x86/0xe0 >>>>> [<0>] kernfs_fop_write+0xce/0x1b0 >>>>> [<0>] vfs_write+0xb6/0x1a0 >>>>> [<0>] ksys_write+0x4f/0xc0 >>>>> [<0>] do_syscall_64+0x5b/0xf0 >>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> >>>>> [<0>] __wait_rcu_gp+0x10d/0x110 >>>>> [<0>] synchronize_rcu+0x47/0x50 >>>>> [<0>] mddev_suspend+0x4a/0x140 >>>>> [<0>] suspend_hi_store+0x44/0x90 >>>>> [<0>] md_attr_store+0x86/0xe0 >>>>> [<0>] kernfs_fop_write+0xce/0x1b0 >>>>> [<0>] vfs_write+0xb6/0x1a0 >>>>> [<0>] 
ksys_write+0x4f/0xc0 >>>>> [<0>] do_syscall_64+0x5b/0xf0 >>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> >>>>> [<0>] __wait_rcu_gp+0x10d/0x110 >>>>> [<0>] synchronize_rcu+0x47/0x50 >>>>> [<0>] mddev_suspend+0x4a/0x140 >>>>> [<0>] suspend_hi_store+0x44/0x90 >>>>> [<0>] md_attr_store+0x86/0xe0 >>>>> [<0>] kernfs_fop_write+0xce/0x1b0 >>>>> [<0>] vfs_write+0xb6/0x1a0 >>>>> [<0>] ksys_write+0x4f/0xc0 >>>>> [<0>] do_syscall_64+0x5b/0xf0 >>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend >>>> is called, >>>> means synchronize_rcu and other synchronize mechanisms are triggered in the >>>> path ... >>>> >>>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write >>>>> all the time? I thought it was_reading_ the disks only? >>>> I didn't read raid6check before, just find check_stripes has >>>> >>>> >>>> while (length > 0) { >>>> lock_stripe -> write suspend_lo/hi node >>>> ... >>>> unlock_all_stripes -> -> write suspend_lo/hi node >>>> } >>>> >>>> I think it explains the stack of raid6check, and maybe it is way that >>>> raid6check works, lock >>>> stripe, check the stripe then unlock the stripe, just my guess ... >>> Hi again! >>> >>> I made a quick test. >>> I disabled the lock / unlock in raid6check. >>> >>> With lock / unlock, I get around 1.2MB/sec >>> per device component, with ~13% CPU load. >>> Wihtout lock / unlock, I get around 15.5MB/sec >>> per device component, with ~30% CPU load. >>> >>> So, it seems the lock / unlock mechanism is >>> quite expensive. >> Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe. >> >>> I'm not sure what's the best solution, since >>> we still need to avoid race conditions. >> I guess there are two possible ways: >> >> 1. Per your previous reply, only call raid6check when array is RO, then >> we don't need the lock. >> >> 2. 
Investigate whether it is possible to acquire stripe_lock in
>> suspend_lo/hi_store
>> to avoid the race between raid6check and writes to the same stripe. IOW,
>> try fine-grained protection instead of calling the expensive suspend/resume
>> in suspend_lo/hi_store. But I am not sure whether that is doable right now.
> Could you please elaborate on the
> "fine grained protection" thing?

raid6check does check and lock stripes one by one, but the situation is
different in kernel space: locking one stripe triggers mddev_suspend and
mddev_resume, which affect all stripes ...

If the kernel could expose an interface for locking just one stripe, then
raid6check could use it to lock only that stripe (this is what I call
fine-grained) instead of triggering the time-consuming suspend/resume.

>
>> BTW, seems there are build problems for raid6check ...
>>
>> mdadm$ make raid6check
>> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
>> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
>> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
>> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
>> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
>> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
>> -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c
>> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
>> xmalloc.o dlink.o
>> sysfs.o: In function `sysfsline':
>> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
>> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
>> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
>> collect2: error: ld returned 1 exit status
>> Makefile:220: recipe for target 'raid6check' failed
>> make: *** [raid6check] Error 1

> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.

Do you try with the latest mdadm tree?
But it could be an environment issue ...

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 18:16 ` Guoqing Jiang @ 2020-05-12 18:32 ` Piergiorgio Sartor 2020-05-13 6:18 ` Wolfgang Denk 0 siblings, 1 reply; 38+ messages in thread From: Piergiorgio Sartor @ 2020-05-12 18:32 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid On Tue, May 12, 2020 at 08:16:27PM +0200, Guoqing Jiang wrote: > On 5/12/20 6:07 PM, Piergiorgio Sartor wrote: > > On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote: > > > On 5/11/20 6:14 PM, Piergiorgio Sartor wrote: > > > > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote: > > > > > Hi Wolfgang, > > > > > > > > > > > > > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote: > > > > > > Dear Guoqing Jiang, > > > > > > > > > > > > In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote: > > > > > > > Seems raid6check is in 'D' state, what are the output of 'cat > > > > > > > /proc/19719/stack' and /proc/mdstat? > > > > > > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > > > [<0>] suspend_lo_store+0x50/0xa0 > > > > > > [<0>] md_attr_store+0x86/0xe0 > > > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > > > [<0>] ksys_write+0x4f/0xc0 > > > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > > > [<0>] suspend_lo_store+0x50/0xa0 > > > > > > [<0>] md_attr_store+0x86/0xe0 > > > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > > > [<0>] ksys_write+0x4f/0xc0 > > > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > > > > > [<0>] 
__wait_rcu_gp+0x10d/0x110 > > > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > > > [<0>] suspend_hi_store+0x44/0x90 > > > > > > [<0>] md_attr_store+0x86/0xe0 > > > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > > > [<0>] ksys_write+0x4f/0xc0 > > > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > > > > > [<0>] __wait_rcu_gp+0x10d/0x110 > > > > > > [<0>] synchronize_rcu+0x47/0x50 > > > > > > [<0>] mddev_suspend+0x4a/0x140 > > > > > > [<0>] suspend_hi_store+0x44/0x90 > > > > > > [<0>] md_attr_store+0x86/0xe0 > > > > > > [<0>] kernfs_fop_write+0xce/0x1b0 > > > > > > [<0>] vfs_write+0xb6/0x1a0 > > > > > > [<0>] ksys_write+0x4f/0xc0 > > > > > > [<0>] do_syscall_64+0x5b/0xf0 > > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend > > > > > is called, > > > > > means synchronize_rcu and other synchronize mechanisms are triggered in the > > > > > path ... > > > > > > > > > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write > > > > > > all the time? I thought it was_reading_ the disks only? > > > > > I didn't read raid6check before, just find check_stripes has > > > > > > > > > > > > > > > while (length > 0) { > > > > > lock_stripe -> write suspend_lo/hi node > > > > > ... > > > > > unlock_all_stripes -> -> write suspend_lo/hi node > > > > > } > > > > > > > > > > I think it explains the stack of raid6check, and maybe it is way that > > > > > raid6check works, lock > > > > > stripe, check the stripe then unlock the stripe, just my guess ... > > > > Hi again! > > > > > > > > I made a quick test. > > > > I disabled the lock / unlock in raid6check. > > > > > > > > With lock / unlock, I get around 1.2MB/sec > > > > per device component, with ~13% CPU load. 
> > > > Wihtout lock / unlock, I get around 15.5MB/sec > > > > per device component, with ~30% CPU load. > > > > > > > > So, it seems the lock / unlock mechanism is > > > > quite expensive. > > > Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe. > > > > > > > I'm not sure what's the best solution, since > > > > we still need to avoid race conditions. > > > I guess there are two possible ways: > > > > > > 1. Per your previous reply, only call raid6check when array is RO, then > > > we don't need the lock. > > > > > > 2. Investigate if it is possible that acquire stripe_lock in > > > suspend_lo/hi_store > > > to avoid the race between raid6check and write to the same stripe. IOW, > > > try fine grained protection instead of call the expensive suspend/resume > > > in suspend_lo/hi_store. But I am not sure it is doable or not right now. > > Could you please elaborate on the > > "fine grained protection" thing? > > Even raid6check checks stripe and locks stripe one by one, but the thing > is different in kernel space, locking of one stripe triggers mddev_suspend > and mddev_resume which affect all stripes ... > > If kernel can expose interface to actually locking one stripe, then > raid6check > could use it to actually lock only one stripe (this is what I call fine > grained) > instead of trigger suspend/resume which are time consuming. I see, you mean we need a different interface to this lock / unlock thing. > > > BTW, seems there are build problems for raid6check ... 
> > > > > > mdadm$ make raid6check > > > gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter > > > -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" > > > -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" > > > -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" > > > -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM > > > -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS > > > -DBINDIR=\"/sbin\" -o sysfs.o -c sysfs.c > > > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o > > > xmalloc.o dlink.o > > > sysfs.o: In function `sysfsline': > > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid' > > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero' > > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero' > > > collect2: error: ld returned 1 exit status > > > Makefile:220: recipe for target 'raid6check' failed > > > make: *** [raid6check] Error 1 > > I cannot see this problem. > > I could compile without issue. > > Maybe some library is missing somewhere, > > but I'm not sure where. > > Did you try with the latest mdadm tree? But it could be an environment issue ... I'm using Fedora, so I downloaded the .srpm package, installed it, enabled raid6check, patched and rebuilt... My background idea was to have the mdadm rpm *with* raid6check, but I did not get that far... Sorry... bye, pg > Thanks, > Guoqing -- piergiorgio ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 18:32 ` Piergiorgio Sartor @ 2020-05-13 6:18 ` Wolfgang Denk 0 siblings, 0 replies; 38+ messages in thread From: Wolfgang Denk @ 2020-05-13 6:18 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid Dear Piergiorgio, In message <20200512183251.GA11548@lazy.lzy> you wrote: > > > > > xmalloc.o dlink.o > > > > sysfs.o: In function `sysfsline': > > > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid' > > > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero' > > > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero' > > > > collect2: error: ld returned 1 exit status > > > > Makefile:220: recipe for target 'raid6check' failed > > > > make: *** [raid6check] Error 1 > > > I cannot see this problem. > > > I could compile without issue. > > > Maybe some library is missing somewhere, > > > but I'm not sure where. > > > > Did you try with the latest mdadm tree? But it could be an environment issue ... > > I'm using Fedora, so I downloaded > the .srpm package, installed it, enabled > raid6check, patched and rebuilt... Fedora 32 is still at mdadm-4.1 (Mon Oct 1 14:27:52 2018), but it seems the significant change was introduced by commit b06815989 "mdadm: load default sysfs attributes after assemblation" (Wed Jul 10 13:38:53 2019). If you try to build the current top of tree you should see the problem, too [and the -Werror issue I mentioned before, which is also fixed in Fedora by local distro patches.] Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality. -- Albert Einstein ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-12 16:07 ` Piergiorgio Sartor 2020-05-12 18:16 ` Guoqing Jiang @ 2020-05-13 6:07 ` Wolfgang Denk 2020-05-15 10:34 ` Andrey Jr. Melnikov 1 sibling, 1 reply; 38+ messages in thread From: Wolfgang Denk @ 2020-05-13 6:07 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid Dear Piergiorgio, In message <20200512160712.GB7261@lazy.lzy> you wrote: > > > BTW, seems there are build problems for raid6check ... ... > I cannot see this problem. > I could compile without issue. > Maybe some library is missing somewhere, > but I'm not sure where. I see the same problem when trying to build the current top of tree (mdadm-4.1-74-g5cfb79d): -> make raid6check ... gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o dlink.o -c dlink.c In function "dl_strndup", inlined from "dl_strdup" at dlink.c:73:12: dlink.c:66:5: error: "strncpy" output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation] 66 | strncpy(n, s, l); | ^~~~~~~~~~~~~~~~ dlink.c: In function "dl_strdup": dlink.c:73:31: note: length computed here 73 | return dl_strndup(s, (int)strlen(s)); | ^~~~~~~~~ cc1: all warnings being treated as errors Removing the "-Werror" from the CWFLAGS setting in the Makefile then leads to: ...
gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o /usr/bin/ld: sysfs.o: in function `sysfsline': sysfs.c:(.text+0x2707): undefined reference to `parse_uuid' /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero' /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero' This might come from commit b06815989 "mdadm: load default sysfs attributes after assemblation"; mdadm-4.1 builds ok. Build tests were run on Fedora 32. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Calm down, it's *only* ones and zeroes. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-13 6:07 ` Wolfgang Denk @ 2020-05-15 10:34 ` Andrey Jr. Melnikov 2020-05-15 11:54 ` Wolfgang Denk 0 siblings, 1 reply; 38+ messages in thread From: Andrey Jr. Melnikov @ 2020-05-15 10:34 UTC (permalink / raw) To: linux-raid Wolfgang Denk <wd@denx.de> wrote: > Dear Piergiorgio, > In message <20200512160712.GB7261@lazy.lzy> you wrote: > > > > > BTW, seems there are build problems for raid6check ... > ... > > I cannot see this problem. > > I could compile without issue. > > Maybe some library is missing somewhere, > > but I'm not sure where. > ... > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o > /usr/bin/ld: sysfs.o: in function `sysfsline': > sysfs.c:(.text+0x2707): undefined reference to `parse_uuid' > /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero' > /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero' raid6check is missing the util.o object. Add it to CHECK_OBJS. ^ permalink raw reply [flat|nested] 38+ messages in thread
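The suggestion translates to a one-line Makefile change along these lines. The object list is taken from the link command quoted above; the rule shape and variable placement are approximate for this tree, so treat this as a sketch rather than a patch:

```make
# Hypothetical fragment of mdadm's Makefile: append util.o to the
# object list used to link raid6check.
CHECK_OBJS = raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o

raid6check : $(CHECK_OBJS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o raid6check $(CHECK_OBJS)
```

As the follow-up messages show, util.o itself then drags in many further unresolved symbols (superblock handlers, mdmon IPC, dlopen), so this change alone is not sufficient.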
* Re: raid6check extremely slow ? 2020-05-15 10:34 ` Andrey Jr. Melnikov @ 2020-05-15 11:54 ` Wolfgang Denk 2020-05-15 12:58 ` Guoqing Jiang 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Denk @ 2020-05-15 11:54 UTC (permalink / raw) To: Andrey Jr. Melnikov; +Cc: linux-raid Dear "Andrey Jr. Melnikov", In message <sq72pg-98v.ln1@banana.localnet> you wrote: > > > ... > > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o > > /usr/bin/ld: sysfs.o: in function `sysfsline': > > sysfs.c:(.text+0x2707): undefined reference to `parse_uuid' > > /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero' > > /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero' > > raid6check miss util.o object. Add it to CHECK_OBJS This makes things just worse. With this, I get: ... gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o util.o -c util.c gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o /usr/bin/ld: util.o: in function `mdadm_version': util.c:(.text+0x702): undefined reference to `Version' /usr/bin/ld: util.o: in function `fname_from_uuid': util.c:(.text+0xdce): undefined reference to `super1' /usr/bin/ld: util.o: in function `is_subarray_active': util.c:(.text+0x30b3): undefined reference to `mdstat_read' /usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat' /usr/bin/ld: util.o: in function `flush_metadata_updates': util.c:(.text+0x3ad3): undefined reference to `connect_monitor' /usr/bin/ld: util.c:(.text+0x3af1): undefined reference to `send_message' /usr/bin/ld: 
util.c:(.text+0x3afb): undefined reference to `wait_reply' /usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack' /usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply' /usr/bin/ld: util.o: in function `container_choose_spares': util.c:(.text+0x3c84): undefined reference to `devid_policy' /usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain' /usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add' /usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test' /usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free' /usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free' /usr/bin/ld: util.o: in function `set_cmap_hooks': util.c:(.text+0x3f80): undefined reference to `dlopen' /usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym' /usr/bin/ld: util.o: in function `set_dlm_hooks': util.c:(.text+0x4310): undefined reference to `dlopen' /usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym' /usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym' /usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow /usr/bin/ld: util.o: in function `set_cmap_hooks': util.c:(.text+0x3fed): undefined reference to `dlclose' /usr/bin/ld: util.o: in function `set_dlm_hooks': util.c:(.text+0x43e5): undefined reference to `dlclose' /usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0' /usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1' /usr/bin/ld: util.o:(.data+0x10): undefined reference to `super_ddf' /usr/bin/ld: util.o:(.data+0x18): undefined reference to 
`super_imsm' /usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr' /usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt' collect2: error: ld returned 1 exit status make: *** [Makefile:221: raid6check] Error 1 Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Ninety-Ninety Rule of Project Schedules: The first ninety percent of the task takes ninety percent of the time, and the last ten percent takes the other ninety percent. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-15 11:54 ` Wolfgang Denk @ 2020-05-15 12:58 ` Guoqing Jiang 0 siblings, 0 replies; 38+ messages in thread From: Guoqing Jiang @ 2020-05-15 12:58 UTC (permalink / raw) To: Wolfgang Denk, Andrey Jr. Melnikov; +Cc: linux-raid On 5/15/20 1:54 PM, Wolfgang Denk wrote: > Dear "Andrey Jr. Melnikov", > > In message <sq72pg-98v.ln1@banana.localnet> you wrote: >>> ... >>> gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o >>> /usr/bin/ld: sysfs.o: in function `sysfsline': >>> sysfs.c:(.text+0x2707): undefined reference to `parse_uuid' >>> /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero' >>> /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero' >> raid6check miss util.o object. Add it to CHECK_OBJS > This makes things just worse. With this, I get: > > ... > gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -o util.o -c util.c > gcc -O2 -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o > /usr/bin/ld: util.o: in function `mdadm_version': > util.c:(.text+0x702): undefined reference to `Version' > /usr/bin/ld: util.o: in function `fname_from_uuid': > util.c:(.text+0xdce): undefined reference to `super1' > /usr/bin/ld: util.o: in function `is_subarray_active': > util.c:(.text+0x30b3): undefined reference to `mdstat_read' > /usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat' > /usr/bin/ld: util.o: in function `flush_metadata_updates': > util.c:(.text+0x3ad3): undefined reference to `connect_monitor' > /usr/bin/ld: util.c:(.text+0x3af1): 
undefined reference to `send_message' > /usr/bin/ld: util.c:(.text+0x3afb): undefined reference to `wait_reply' > /usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack' > /usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply' > /usr/bin/ld: util.o: in function `container_choose_spares': > util.c:(.text+0x3c84): undefined reference to `devid_policy' > /usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain' > /usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add' > /usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test' > /usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free' > /usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free' > /usr/bin/ld: util.o: in function `set_cmap_hooks': > util.c:(.text+0x3f80): undefined reference to `dlopen' > /usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym' > /usr/bin/ld: util.o: in function `set_dlm_hooks': > util.c:(.text+0x4310): undefined reference to `dlopen' > /usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym' > /usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym' > /usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow > /usr/bin/ld: util.o: in function `set_cmap_hooks': > util.c:(.text+0x3fed): undefined reference to `dlclose' > /usr/bin/ld: util.o: in function `set_dlm_hooks': > util.c:(.text+0x43e5): undefined reference to `dlclose' > /usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0' > /usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1' > /usr/bin/ld: 
util.o:(.data+0x10): undefined reference to `super_ddf' > /usr/bin/ld: util.o:(.data+0x18): undefined reference to `super_imsm' > /usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr' > /usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt' > collect2: error: ld returned 1 exit status > make: *** [Makefile:221: raid6check] Error 1 > I think we need a new uuid.c which is separated from util.c to address the issue, will send patch for it later. Thanks, Guoqing ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk 2020-05-10 13:26 ` Piergiorgio Sartor 2020-05-10 22:16 ` Guoqing Jiang @ 2020-05-14 17:20 ` Roy Sigurd Karlsbakk 2020-05-14 18:20 ` Wolfgang Denk 2 siblings, 1 reply; 38+ messages in thread From: Roy Sigurd Karlsbakk @ 2020-05-14 17:20 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Linux Raid > I'm running raid6check on a 12 TB (8 x 2 TB harddisks) > RAID6 array and wonder why it is so extremely slow... > It seems to be reading the disks at only about 400 kB/s, > which results in an estimated time of some 57 days!!! > to complete checking the array. The system is basically idle, there > is neither any significant CPU load nor any other I/O (not to the > tested array, nor to any other storage on this system). > > Am I doing something wrong? Try checking with iostat -x to see if one disk is performing worse than the other ones. This sometimes happens and can indicate a failure that the normal SMART/smartctl stuff can't identify. If you see a utilisation of one of the disks at 100%, that's the bastard. Under normal circumstances, you probably won't be able to return that, since it "works". There's a quick fix for that, though. Just unplug the disk, plug it into a power cable, let it spin up and then sharply twist it 90 degrees a few times, and it's all sorted out and you can return it ;) Vennlig hilsen roy -- Roy Sigurd Karlsbakk (+47) 98013356 http://blogg.karlsbakk.net/ GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt -- Hið góða skaltu í stein höggva, hið illa í snjó rita. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-14 17:20 ` Roy Sigurd Karlsbakk @ 2020-05-14 18:20 ` Wolfgang Denk 2020-05-14 19:51 ` Roy Sigurd Karlsbakk 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Denk @ 2020-05-14 18:20 UTC (permalink / raw) To: Roy Sigurd Karlsbakk; +Cc: Linux Raid Dear Roy, In message <1999694976.3317399.1589476824607.JavaMail.zimbra@karlsbakk.net> you wrote: > > Try checking with iostat -x to see if one disk is performing worse > than the other ones. This sometimes happens and can indicate a > failure that the normal SMART/smartctl stuff can't identify. If > you see a utilisation of one of the disks at 100%, that's the > bastard. Under normal circumstances, you probably won't be able to > return that, since it "works". There's a quick fix for that, > though. Just unplug the disk, plug it into a power cable, let it > spin up and then sharpy twist it 90 degees a few times, and it's > all sorted out and you can return it ;) Everything looks unsuspicious to me - all disks behave the same: # iostat -x /dev/sd[efhijklm] 1 3 Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-14 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.19 0.00 1.06 0.15 0.00 98.60 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sde 20.08 360.56 2.53 11.20 0.34 17.95 0.49 0.10 0.02 3.41 32.36 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 32.74 0.02 2.11 sdf 20.07 360.56 2.54 11.24 0.33 17.96 0.49 0.10 0.02 3.40 44.23 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 44.77 0.02 2.09 sdh 20.08 360.54 2.53 11.17 0.35 17.95 0.49 0.10 0.02 3.40 43.47 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 44.01 0.02 2.40 sdi 20.08 360.58 2.54 11.23 0.34 17.96 0.49 0.10 0.02 3.40 26.22 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.49 26.50 0.01 2.84 sdj 20.45 360.56 2.16 9.54 0.34 17.63 0.49 0.10 0.02 3.38 35.19 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 35.60 0.02 2.46 sdk 20.08 360.54 2.53 
11.21 0.35 17.95 0.49 0.10 0.02 3.42 40.63 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.49 41.13 0.02 2.36 sdl 20.07 360.57 2.54 11.24 0.34 17.96 0.49 0.10 0.02 3.39 23.61 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 23.84 0.01 2.70 sdm 20.08 360.55 2.53 11.21 0.53 17.96 0.49 0.10 0.02 3.41 21.52 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.49 21.67 0.01 2.64 avg-cpu: %user %nice %system %iowait %steal %idle 0.38 0.00 1.12 0.12 0.00 98.38 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sde 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 sdf 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 sdh 20.00 320.00 0.00 0.00 0.30 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 sdi 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 sdj 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 sdk 20.00 320.00 0.00 0.00 0.30 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 sdl 20.00 320.00 0.00 0.00 0.25 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 sdm 20.00 320.00 0.00 0.00 0.35 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.25 0.00 0.88 0.00 0.00 98.87 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sde 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10 sdf 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10 sdh 21.00 
336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.30 sdi 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 sdj 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10 sdk 21.00 336.00 0.00 0.00 0.24 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10 sdl 21.00 336.00 0.00 0.00 0.29 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.20 sdm 21.00 336.00 0.00 0.00 0.38 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.10 Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de We see things not as they are, but as we are. - H. M. Tomlinson ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-14 18:20 ` Wolfgang Denk @ 2020-05-14 19:51 ` Roy Sigurd Karlsbakk 2020-05-15 8:08 ` Wolfgang Denk 0 siblings, 1 reply; 38+ messages in thread From: Roy Sigurd Karlsbakk @ 2020-05-14 19:51 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Linux Raid what? Vennlig hilsen roy -- Roy Sigurd Karlsbakk (+47) 98013356 http://blogg.karlsbakk.net/ GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt -- Hið góða skaltu í stein höggva, hið illa í snjó rita. ----- Original Message ----- > From: "Wolfgang Denk" <wd@denx.de> > To: "Roy Sigurd Karlsbakk" <roy@karlsbakk.net> > Cc: "Linux Raid" <linux-raid@vger.kernel.org> > Sent: Thursday, 14 May, 2020 20:20:41 > Subject: Re: raid6check extremely slow ? [...] ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: raid6check extremely slow ? 2020-05-14 19:51 ` Roy Sigurd Karlsbakk @ 2020-05-15 8:08 ` Wolfgang Denk 0 siblings, 0 replies; 38+ messages in thread From: Wolfgang Denk @ 2020-05-15 8:08 UTC (permalink / raw) To: Roy Sigurd Karlsbakk; +Cc: Linux Raid Dear Roy Sigurd Karlsbakk, In message <1430936688.3381175.1589485881380.JavaMail.zimbra@karlsbakk.net> you wrote: > what? You asked: "Try checking with iostat -x to see if one disk is performing worse than the other ones." The output of "iostat -x" which I posted clearly shows that all disks behave very much the same - there are just minimal statistical fluctuations, again equally distributed over all 8 disks. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de I used to be indecisive, now I'm not sure. ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2020-05-15 12:58 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk 2020-05-10 13:26 ` Piergiorgio Sartor 2020-05-11 6:33 ` Wolfgang Denk 2020-05-10 22:16 ` Guoqing Jiang 2020-05-11 6:40 ` Wolfgang Denk 2020-05-11 8:58 ` Guoqing Jiang 2020-05-11 15:39 ` Piergiorgio Sartor 2020-05-12 7:37 ` Wolfgang Denk 2020-05-12 16:17 ` Piergiorgio Sartor 2020-05-13 6:13 ` Wolfgang Denk 2020-05-13 16:22 ` Piergiorgio Sartor 2020-05-11 16:14 ` Piergiorgio Sartor 2020-05-11 20:53 ` Giuseppe Bilotta 2020-05-11 21:12 ` Guoqing Jiang 2020-05-11 21:16 ` Guoqing Jiang 2020-05-12 1:52 ` Giuseppe Bilotta 2020-05-12 6:27 ` Adam Goryachev 2020-05-12 16:11 ` Piergiorgio Sartor 2020-05-12 16:05 ` Piergiorgio Sartor 2020-05-11 21:07 ` Guoqing Jiang 2020-05-11 22:44 ` Peter Grandi 2020-05-12 16:09 ` Piergiorgio Sartor 2020-05-12 20:54 ` antlists 2020-05-13 16:18 ` Piergiorgio Sartor 2020-05-13 17:37 ` Wols Lists 2020-05-13 18:23 ` Piergiorgio Sartor 2020-05-12 16:07 ` Piergiorgio Sartor 2020-05-12 18:16 ` Guoqing Jiang 2020-05-12 18:32 ` Piergiorgio Sartor 2020-05-13 6:18 ` Wolfgang Denk 2020-05-13 6:07 ` Wolfgang Denk 2020-05-15 10:34 ` Andrey Jr. Melnikov 2020-05-15 11:54 ` Wolfgang Denk 2020-05-15 12:58 ` Guoqing Jiang 2020-05-14 17:20 ` Roy Sigurd Karlsbakk 2020-05-14 18:20 ` Wolfgang Denk 2020-05-14 19:51 ` Roy Sigurd Karlsbakk 2020-05-15 8:08 ` Wolfgang Denk