* raid6check extremely slow ?
@ 2020-05-10 12:07 Wolfgang Denk
  2020-05-10 13:26 ` Piergiorgio Sartor
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-10 12:07 UTC (permalink / raw)
  To: linux-raid

Hi,

I'm running raid6check on a 12 TB (8 x 2 TB harddisks)
RAID6 array and wonder why it is so extremely slow...
It seems to be reading the disks at only about 400 kB/s,
which results in an estimated time of some 57 days!!!
to complete checking the array.  The system is basically idle; there
is neither any significant CPU load nor any other I/O (neither to the
tested array nor to any other storage on this system).
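
(A quick sanity check on that estimate: each component holds about
1.95e9 kB of data, so at ~390 kB/s per disk a full pass takes about
5.0e6 seconds, i.e. roughly 58 days, which matches the 57-day figure.)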

Am I doing something wrong?


The command I'm running is simply:

# raid6check /dev/md0 0 0

This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).

The array data:

# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Nov  7 19:30:03 2013
        Raid Level : raid6
        Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
     Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
      Raid Devices : 8
     Total Devices : 8
       Persistence : Superblock is persistent

       Update Time : Mon May  4 22:12:02 2020
             State : active
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 16K

Consistency Policy : resync

              Name : atlas.denx.de:0  (local to host atlas.denx.de)
              UUID : 4df90724:87913791:1700bb31:773735d0
            Events : 181544

    Number   Major   Minor   RaidDevice State
      12       8       64        0      active sync   /dev/sde
      11       8       80        1      active sync   /dev/sdf
      13       8      112        2      active sync   /dev/sdh
       8       8      128        3      active sync   /dev/sdi
       9       8      144        4      active sync   /dev/sdj
      10       8      160        5      active sync   /dev/sdk
      14       8      176        6      active sync   /dev/sdl
      15       8      192        7      active sync   /dev/sdm

# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-07      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.01    1.11    0.21    0.00   98.49

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sde              19.23       388.93         0.09         0.00  158440224      35218          0
sdf              19.20       388.94         0.09         0.00  158447574      34894          0
sdh              19.23       388.89         0.08         0.00  158425596      34178          0
sdi              19.23       388.99         0.09         0.00  158466326      34690          0
sdj              20.18       388.93         0.09         0.00  158439780      34766          0
sdk              19.23       388.88         0.09         0.00  158419988      35366          0
sdl              19.20       388.97         0.08         0.00  158457352      34426          0
sdm              19.21       388.92         0.08         0.00  158435748      34566          0


top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check
   1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
  37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
  37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
...


HDD in use:

/dev/sde : ST2000NM0033-9ZM175
/dev/sdf : ST2000NM0033-9ZM175
/dev/sdh : ST2000NM0033-9ZM175
/dev/sdi : ST2000NM0033-9ZM175
/dev/sdj : ST2000NM0033-9ZM175
/dev/sdk : ST2000NM0033-9ZM175
/dev/sdl : ST2000NM0033-9ZM175
/dev/sdm : ST2000NM0008-2F3100


3 days later:

# iostat /dev/sd[efhijklm]
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-10      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.00    1.07    0.17    0.00   98.57

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sde              20.15       370.73         0.10         0.00  253186948      68154          0
sdf              20.13       370.74         0.10         0.00  253194646      68138          0
sdh              20.15       370.71         0.10         0.00  253172656      67738          0
sdi              20.15       370.77         0.10         0.00  253213854      68158          0
sdj              20.72       370.73         0.10         0.00  253187084      68066          0
sdk              20.15       370.70         0.10         0.00  253166960      69286          0
sdl              20.13       370.76         0.10         0.00  253204572      68070          0
sdm              20.14       370.73         0.10         0.00  253182964      68070          0


I've tried playing with speed_limit_min/speed_limit_max, but this
didn't change anything:

# cat /proc/sys/dev/raid/speed_limit_max
2000000
# cat /proc/sys/dev/raid/speed_limit_min
10000

Any ideas welcome!

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The inappropriate cannot be beautiful.
             - Frank Lloyd Wright _The Future of Architecture_ (1953)


* Re: raid6check extremely slow ?
  2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
@ 2020-05-10 13:26 ` Piergiorgio Sartor
  2020-05-11  6:33   ` Wolfgang Denk
  2020-05-10 22:16 ` Guoqing Jiang
  2020-05-14 17:20 ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-10 13:26 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

On Sun, May 10, 2020 at 02:07:25PM +0200, Wolfgang Denk wrote:
> Hi,
> 
> I'm running raid6check on a 12 TB (8 x 2 TB harddisks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks only a about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array.  The system is basically idle, there
> is neither any significant CPU load nor any other I/o (no to the
> tested array, nor to any other storage on this system).
> 
> Am I doing something wrong?
> 
> 
> The command I'm running is simply:
> 
> # raid6check /dev/md0 0 0
> 
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
> 
> The array data:
> 
> # mdadm --detail /dev/md0
> /dev/md0:
>            Version : 1.2
>      Creation Time : Thu Nov  7 19:30:03 2013
>         Raid Level : raid6
>         Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
>      Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
>       Raid Devices : 8
>      Total Devices : 8
>        Persistence : Superblock is persistent
> 
>        Update Time : Mon May  4 22:12:02 2020
>              State : active
>     Active Devices : 8
>    Working Devices : 8
>     Failed Devices : 0
>      Spare Devices : 0
> 
>             Layout : left-symmetric
>         Chunk Size : 16K
> 
> Consistency Policy : resync
> 
>               Name : atlas.denx.de:0  (local to host atlas.denx.de)
>               UUID : 4df90724:87913791:1700bb31:773735d0
>             Events : 181544
> 
>     Number   Major   Minor   RaidDevice State
>       12       8       64        0      active sync   /dev/sde
>       11       8       80        1      active sync   /dev/sdf
>       13       8      112        2      active sync   /dev/sdh
>        8       8      128        3      active sync   /dev/sdi
>        9       8      144        4      active sync   /dev/sdj
>       10       8      160        5      active sync   /dev/sdk
>       14       8      176        6      active sync   /dev/sdl
>       15       8      192        7      active sync   /dev/sdm
> 
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-07      _x86_64_        (8 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.01    1.11    0.21    0.00   98.49
> 
> Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> sde              19.23       388.93         0.09         0.00  158440224      35218          0
> sdf              19.20       388.94         0.09         0.00  158447574      34894          0
> sdh              19.23       388.89         0.08         0.00  158425596      34178          0
> sdi              19.23       388.99         0.09         0.00  158466326      34690          0
> sdj              20.18       388.93         0.09         0.00  158439780      34766          0
> sdk              19.23       388.88         0.09         0.00  158419988      35366          0
> sdl              19.20       388.97         0.08         0.00  158457352      34426          0
> sdm              19.21       388.92         0.08         0.00  158435748      34566          0
> 
> 
> top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
> MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check
>    1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
>   37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
>   37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
> ...
> 
> 
> HDD in use:
> 
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
> 
> 
> 3 days later:
> 
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-10      _x86_64_        (8 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.00    1.07    0.17    0.00   98.57
> 
> Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> sde              20.15       370.73         0.10         0.00  253186948      68154          0
> sdf              20.13       370.74         0.10         0.00  253194646      68138          0
> sdh              20.15       370.71         0.10         0.00  253172656      67738          0
> sdi              20.15       370.77         0.10         0.00  253213854      68158          0
> sdj              20.72       370.73         0.10         0.00  253187084      68066          0
> sdk              20.15       370.70         0.10         0.00  253166960      69286          0
> sdl              20.13       370.76         0.10         0.00  253204572      68070          0
> sdm              20.14       370.73         0.10         0.00  253182964      68070          0
> 
> 
> I've tried playing with speed_limit_min/speed_limit_max, but this
> didn't change anything:
> 
> # cat /proc/sys/dev/raid/speed_limit_max
> 2000000
> cat /proc/sys/dev/raid/speed_limit_min
> 10000
> 
> Any ideas welcome!

Difficult to say.

raid6check is CPU bound: no vector optimization
and no multithreading.

Nevertheless, if you see no CPU load (single core
load), then something else is not OK, but I've no
idea what it could be.

Please check whether one core is at 100%; if it is,
then that is the limit.
If not, sorry, I cannot help.

bye,

-- 

piergiorgio


* Re: raid6check extremely slow ?
  2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
  2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-10 22:16 ` Guoqing Jiang
  2020-05-11  6:40   ` Wolfgang Denk
  2020-05-14 17:20 ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-10 22:16 UTC (permalink / raw)
  To: Wolfgang Denk, linux-raid

On 5/10/20 2:07 PM, Wolfgang Denk wrote:
> Hi,
>
> I'm running raid6check on a 12 TB (8 x 2 TB harddisks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks only a about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array.  The system is basically idle, there
> is neither any significant CPU load nor any other I/o (no to the
> tested array, nor to any other storage on this system).
>
> Am I doing something wrong?
>
>
> The command I'm running is simply:
>
> # raid6check /dev/md0 0 0
>
> This is with mdadm-4.1 on a Fedora 32 system (mdadm-4.1-4.fc32.x86_64).
>
> The array data:
>
> # mdadm --detail /dev/md0
> /dev/md0:
>             Version : 1.2
>       Creation Time : Thu Nov  7 19:30:03 2013
>          Raid Level : raid6
>          Array Size : 11720301024 (11177.35 GiB 12001.59 GB)
>       Used Dev Size : 1953383504 (1862.89 GiB 2000.26 GB)
>        Raid Devices : 8
>       Total Devices : 8
>         Persistence : Superblock is persistent
>
>         Update Time : Mon May  4 22:12:02 2020
>               State : active
>      Active Devices : 8
>     Working Devices : 8
>      Failed Devices : 0
>       Spare Devices : 0
>
>              Layout : left-symmetric
>          Chunk Size : 16K
>
> Consistency Policy : resync
>
>                Name : atlas.denx.de:0  (local to host atlas.denx.de)
>                UUID : 4df90724:87913791:1700bb31:773735d0
>              Events : 181544
>
>      Number   Major   Minor   RaidDevice State
>        12       8       64        0      active sync   /dev/sde
>        11       8       80        1      active sync   /dev/sdf
>        13       8      112        2      active sync   /dev/sdh
>         8       8      128        3      active sync   /dev/sdi
>         9       8      144        4      active sync   /dev/sdj
>        10       8      160        5      active sync   /dev/sdk
>        14       8      176        6      active sync   /dev/sdl
>        15       8      192        7      active sync   /dev/sdm
>
> # iostat /dev/sd[efhijklm]
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-07      _x86_64_        (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>             0.18    0.01    1.11    0.21    0.00   98.49
>
> Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> sde              19.23       388.93         0.09         0.00  158440224      35218          0
> sdf              19.20       388.94         0.09         0.00  158447574      34894          0
> sdh              19.23       388.89         0.08         0.00  158425596      34178          0
> sdi              19.23       388.99         0.09         0.00  158466326      34690          0
> sdj              20.18       388.93         0.09         0.00  158439780      34766          0
> sdk              19.23       388.88         0.09         0.00  158419988      35366          0
> sdl              19.20       388.97         0.08         0.00  158457352      34426          0
> sdm              19.21       388.92         0.08         0.00  158435748      34566          0
>
>
> top - 09:08:19 up 4 days, 17:10,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 98.5 id,  0.1 wa,  0.6 hi,  0.1 si,  0.0 st
> MiB Mem :  24034.6 total,  11198.4 free,   1871.8 used,  10964.3 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21767.6 avail Mem
>
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>    19719 root      20   0    2852   2820   2020 D   5.1   0.0 285:40.07 raid6check

It seems raid6check is in the 'D' state; what is the output of 'cat
/proc/19719/stack' and of /proc/mdstat?

>     1123 root      20   0       0      0      0 S   0.3   0.0  25:47.54 md0_raid6
>    37816 root      20   0       0      0      0 I   0.3   0.0   0:00.08 kworker/3:1-events
>    37903 root      20   0  219680   4540   3716 R   0.3   0.0   0:00.02 top
> ...
>
>
> HDD in use:
>
> /dev/sde : ST2000NM0033-9ZM175
> /dev/sdf : ST2000NM0033-9ZM175
> /dev/sdh : ST2000NM0033-9ZM175
> /dev/sdi : ST2000NM0033-9ZM175
> /dev/sdj : ST2000NM0033-9ZM175
> /dev/sdk : ST2000NM0033-9ZM175
> /dev/sdl : ST2000NM0033-9ZM175
> /dev/sdm : ST2000NM0008-2F3100
>
>
> 3 days later:

Is raid6check still in 'D' state as before?

Thanks,
Guoqing


* Re: raid6check extremely slow ?
  2020-05-10 13:26 ` Piergiorgio Sartor
@ 2020-05-11  6:33   ` Wolfgang Denk
  0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-11  6:33 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

Dear Piergiorgio,

In message <20200510132611.GA12994@lazy.lzy> you wrote:
>
> raid6check is CPU bounded, no vector optimization
> and no multithread.
>
> Nevertheless, if you see no CPU load (single core
> load), then something else is not OK, but I've no
> idea what it could be.
>
> Please check if one core is up 100%, if this is
> the case, then there is the limit.
> If not, sorry, I cannot help.

No, there is virtually no CPU load at all:

top - 08:32:36 up 8 days, 16:34,  3 users,  load average: 1.00, 1.01, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  1.3 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.3 us,  1.3 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  1.7 us,  3.7 sy,  0.0 ni, 90.4 id,  3.0 wa,  0.7 hi,  0.7 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  24034.6 total,  10921.2 free,   1882.4 used,  11230.9 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21757.0 avail Mem

What I find interesting is that all disks are more or less
constantly at around 400 kB/s (390...400, never more).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
That's their goal, remember, a goal that's really contrary to that of
the programmer or administrator. We just want to get our  jobs  done.
$Bill  just  wants  to  become  $$Bill. These aren't even marginallly
congruent.
         -- Tom Christiansen in <6jhtqk$qls$1@csnews.cs.colorado.edu>


* Re: raid6check extremely slow ?
  2020-05-10 22:16 ` Guoqing Jiang
@ 2020-05-11  6:40   ` Wolfgang Denk
  2020-05-11  8:58     ` Guoqing Jiang
  0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-11  6:40 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

Dear Guoqing Jiang,

In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>
> Seems raid6check is in 'D' state, what are the output of 'cat 
> /proc/19719/stack' and /proc/mdstat?

# for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
all the time?  I thought it was _reading_ the disks only?

And iostat does not report any writes either?

# iostat /dev/sd[efhijklm] | cat
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-11      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.00    1.07    0.17    0.00   98.58

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sde              20.30       368.76         0.10         0.00  277022327      75178          0
sdf              20.28       368.77         0.10         0.00  277030081      75170          0
sdh              20.30       368.74         0.10         0.00  277007903      74854          0
sdi              20.30       368.79         0.10         0.00  277049113      75246          0
sdj              20.82       368.76         0.10         0.00  277022363      74986          0
sdk              20.30       368.73         0.10         0.00  277002179      76322          0
sdl              20.29       368.78         0.10         0.00  277039743      74982          0
sdm              20.29       368.75         0.10         0.00  277018163      74958          0


# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid10 sdc1[0] sdd1[1]
      234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
      11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]

md1 : active raid1 sdb3[0] sda3[1]
      484118656 blocks [2/2] [UU]

md2 : active raid1 sdb1[0] sda1[1]
      255936 blocks [2/2] [UU]

unused devices: <none>

> > 3 days later:
>
> Is raid6check still in 'D' state as before?

Yes, nothing changed, still running:

top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check
   1123 root      20   0       0      0      0 S   0.7   0.0  60:55.64 md0_raid6
     10 root      20   0       0      0      0 I   0.3   0.0   9:09.26 rcu_sched
    655 root       0 -20       0      0      0 I   0.3   0.0  21:28.95 kworker/1:1H-kblockd
  60161 root      20   0       0      0      0 I   0.3   0.0   0:01.18 kworker/6:1-events
  61997 root      20   0       0      0      0 I   0.3   0.0   0:01.48 kworker/1:3-events
...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Every program has at least one bug and can be shortened by  at  least
one  instruction  --  from  which,  by induction, one can deduce that
every program can be reduced to one instruction which doesn't work.


* Re: raid6check extremely slow ?
  2020-05-11  6:40   ` Wolfgang Denk
@ 2020-05-11  8:58     ` Guoqing Jiang
  2020-05-11 15:39       ` Piergiorgio Sartor
  2020-05-11 16:14       ` Piergiorgio Sartor
  0 siblings, 2 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11  8:58 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

Hi Wolfgang,


On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> Dear Guoqing Jiang,
>
> In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>> Seems raid6check is in 'D' state, what are the output of 'cat
>> /proc/19719/stack' and /proc/mdstat?
> # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_lo_store+0x50/0xa0
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [<0>] __wait_rcu_gp+0x10d/0x110
> [<0>] synchronize_rcu+0x47/0x50
> [<0>] mddev_suspend+0x4a/0x140
> [<0>] suspend_hi_store+0x44/0x90
> [<0>] md_attr_store+0x86/0xe0
> [<0>] kernfs_fop_write+0xce/0x1b0
> [<0>] vfs_write+0xb6/0x1a0
> [<0>] ksys_write+0x4f/0xc0
> [<0>] do_syscall_64+0x5b/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

It looks like raid6check keeps writing the suspend_lo/hi nodes, which causes
mddev_suspend to be called; that means synchronize_rcu and the other
synchronization mechanisms are triggered in that path ...

> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> all the time?  I thought it was _reading_ the disks only?

I hadn't read raid6check before, but check_stripes has

     while (length > 0) {
             lock_stripe -> write suspend_lo/hi node
             ...
             unlock_all_stripes -> write suspend_lo/hi node
     }

I think that explains the stack of raid6check, and maybe this is simply the
way raid6check works: lock the stripe, check the stripe, then unlock the
stripe. Just my guess ...
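
To make that concrete: the lock/unlock presumably boils down to writing
stripe boundaries into the two md sysfs attributes that show up in the
stack trace.  A rough sketch of the idea (the helper name, sysfs path and
the offset arithmetic are illustrative assumptions, not the exact
raid6check code):

    #include <stdio.h>

    /* hypothetical helper: write a number into /sys/block/<md>/md/<attr>;
     * raid6check uses mdadm's own sysfs helpers, this is just the idea */
    static int md_set_ull(const char *md_sysfs_dir, const char *attr,
                          unsigned long long val)
    {
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path), "%s/md/%s", md_sysfs_dir, attr);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%llu\n", val);
            return fclose(f);
    }

    /* suspend writes to one stripe; each sysfs write below ends up in
     * suspend_lo_store()/suspend_hi_store() and thus in mddev_suspend(),
     * which is exactly the stack shown above.  stripe_span is the data
     * span of one stripe in whatever unit suspend_lo/hi expect (the unit
     * and arithmetic here are illustrative only). */
    static int lock_stripe(const char *md_sysfs_dir,
                           unsigned long long stripe,
                           unsigned long long stripe_span)
    {
            if (md_set_ull(md_sysfs_dir, "suspend_lo",
                           stripe * stripe_span) < 0)
                    return -1;
            return md_set_ull(md_sysfs_dir, "suspend_hi",
                              (stripe + 1) * stripe_span);
    }

    /* drop the suspended window again */
    static int unlock_all_stripes(const char *md_sysfs_dir)
    {
            if (md_set_ull(md_sysfs_dir, "suspend_lo", 0) < 0)
                    return -1;
            return md_set_ull(md_sysfs_dir, "suspend_hi", 0);
    }

So every stripe costs several sysfs writes, and each of them goes through
the mddev_suspend/mddev_resume machinery; that would explain why the check
is bound by this rather than by disk throughput.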

> And iostat does not report any writes either?

Because the CPU is busy with mddev_suspend, I think.

> # iostat /dev/sd[efhijklm] | cat
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-11      _x86_64_        (8 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>             0.18    0.00    1.07    0.17    0.00   98.58
>
> Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> sde              20.30       368.76         0.10         0.00  277022327      75178          0
> sdf              20.28       368.77         0.10         0.00  277030081      75170          0
> sdh              20.30       368.74         0.10         0.00  277007903      74854          0
> sdi              20.30       368.79         0.10         0.00  277049113      75246          0
> sdj              20.82       368.76         0.10         0.00  277022363      74986          0
> sdk              20.30       368.73         0.10         0.00  277002179      76322          0
> sdl              20.29       368.78         0.10         0.00  277039743      74982          0
> sdm              20.29       368.75         0.10         0.00  277018163      74958          0
>
>
> # cat /proc/mdstat
> Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> md3 : active raid10 sdc1[0] sdd1[1]
>        234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
>        bitmap: 0/2 pages [0KB], 65536KB chunk
>
> md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
>        11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
>
> md1 : active raid1 sdb3[0] sda3[1]
>        484118656 blocks [2/2] [UU]
>
> md2 : active raid1 sdb1[0] sda1[1]
>        255936 blocks [2/2] [UU]
>
> unused devices: <none>
>
>>> 3 days later:
>> Is raid6check still in 'D' state as before?
> Yes, nothing changed, still running:
>
> top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
> Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
> %Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
> MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem
>
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>    19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check

I think the stack of raid6check is pretty much the same as before.

Since the estimated time for the 12 TB array is about 57 days, if the
estimated time is linear in the number of stripes on the same machine,
then this is simply how raid6check works, as I guessed.

Thanks,
Guoqing


* Re: raid6check extremely slow ?
  2020-05-11  8:58     ` Guoqing Jiang
@ 2020-05-11 15:39       ` Piergiorgio Sartor
  2020-05-12  7:37         ` Wolfgang Denk
  2020-05-11 16:14       ` Piergiorgio Sartor
  1 sibling, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-11 15:39 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
> 
> 
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> > 
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > /proc/19719/stack' and /proc/mdstat?
> > # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
> is called,
> means synchronize_rcu and other synchronize mechanisms are triggered in the
> path ...
> 
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time?  I thought it was _reading_ the disks only?
> 
> I didn't read raid6check before, just find check_stripes has
> 
> 
>     while (length > 0) {
>             lock_stripe -> write suspend_lo/hi node
>             ...
>             unlock_all_stripes -> -> write suspend_lo/hi node
>     }
> 
> I think it explains the stack of raid6check, and maybe it is way that
> raid6check works, lock
> stripe, check the stripe then unlock the stripe, just my guess ...

Yes, that's the way it works.
raid6check locks the stripe, checks it, and releases it.
This is required in order to avoid race conditions
between raid6check and some write to the stripe.

The alternative is to set the array R/O and do
the check, avoiding the lock / unlock.

This could be a way to test if the problem is
really here.
That is, remove the lock / unlock (I guess
there should be only one pair, but better
check) and check with the array in R/O mode.

Hope this helps,

bye,

pg
 
> > And iostat does not report any writes either?
> 
> Because CPU is busying with mddev_suspend I think.
> 
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-11      _x86_64_        (8 CPU)
> > 
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >             0.18    0.00    1.07    0.17    0.00   98.58
> > 
> > Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> > sde              20.30       368.76         0.10         0.00  277022327      75178          0
> > sdf              20.28       368.77         0.10         0.00  277030081      75170          0
> > sdh              20.30       368.74         0.10         0.00  277007903      74854          0
> > sdi              20.30       368.79         0.10         0.00  277049113      75246          0
> > sdj              20.82       368.76         0.10         0.00  277022363      74986          0
> > sdk              20.30       368.73         0.10         0.00  277002179      76322          0
> > sdl              20.29       368.78         0.10         0.00  277039743      74982          0
> > sdm              20.29       368.75         0.10         0.00  277018163      74958          0
> > 
> > 
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> >        234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> >        bitmap: 0/2 pages [0KB], 65536KB chunk
> > 
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> >        11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> > 
> > md1 : active raid1 sdb3[0] sda3[1]
> >        484118656 blocks [2/2] [UU]
> > 
> > md2 : active raid1 sdb1[0] sda1[1]
> >        255936 blocks [2/2] [UU]
> > 
> > unused devices: <none>
> > 
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> > 
> > top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
> > Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
> > %Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
> > %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
> > MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem
> > 
> >      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >    19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check
> 
> I think the stack of raid6check is pretty much the same as before.
> 
> Since the estimated time of 12TB array is about 57 days, if the estimated
> time is linear to
> the number of stripes in the same machine, then it is how raid6check works
> as I guessed.
> 
> Thanks,
> Guoqing

-- 

piergiorgio


* Re: raid6check extremely slow ?
  2020-05-11  8:58     ` Guoqing Jiang
  2020-05-11 15:39       ` Piergiorgio Sartor
@ 2020-05-11 16:14       ` Piergiorgio Sartor
  2020-05-11 20:53         ` Giuseppe Bilotta
  2020-05-11 21:07         ` Guoqing Jiang
  1 sibling, 2 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-11 16:14 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
> 
> 
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> > 
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > /proc/19719/stack' and /proc/mdstat?
> > # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
> is called,
> means synchronize_rcu and other synchronize mechanisms are triggered in the
> path ...
> 
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time?  I thought it was _reading_ the disks only?
> 
> I didn't read raid6check before, just find check_stripes has
> 
> 
>     while (length > 0) {
>             lock_stripe -> write suspend_lo/hi node
>             ...
>             unlock_all_stripes -> -> write suspend_lo/hi node
>     }
> 
> I think it explains the stack of raid6check, and maybe it is way that
> raid6check works, lock
> stripe, check the stripe then unlock the stripe, just my guess ...

Hi again!

I made a quick test.
I disabled the lock / unlock in raid6check.

With lock / unlock, I get around 1.2MB/sec
per device component, with ~13% CPU load.
Without lock / unlock, I get around 15.5MB/sec
per device component, with ~30% CPU load.

So, it seems the lock / unlock mechanism is
quite expensive.

I'm not sure what the best solution is, since
we still need to avoid race conditions.

Any suggestion is welcome!

bye,

pg
 
> > And iostat does not report any writes either?
> 
> Because CPU is busying with mddev_suspend I think.
> 
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-11      _x86_64_        (8 CPU)
> > 
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >             0.18    0.00    1.07    0.17    0.00   98.58
> > 
> > Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> > sde              20.30       368.76         0.10         0.00  277022327      75178          0
> > sdf              20.28       368.77         0.10         0.00  277030081      75170          0
> > sdh              20.30       368.74         0.10         0.00  277007903      74854          0
> > sdi              20.30       368.79         0.10         0.00  277049113      75246          0
> > sdj              20.82       368.76         0.10         0.00  277022363      74986          0
> > sdk              20.30       368.73         0.10         0.00  277002179      76322          0
> > sdl              20.29       368.78         0.10         0.00  277039743      74982          0
> > sdm              20.29       368.75         0.10         0.00  277018163      74958          0
> > 
> > 
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> >        234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> >        bitmap: 0/2 pages [0KB], 65536KB chunk
> > 
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> >        11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> > 
> > md1 : active raid1 sdb3[0] sda3[1]
> >        484118656 blocks [2/2] [UU]
> > 
> > md2 : active raid1 sdb1[0] sda1[1]
> >        255936 blocks [2/2] [UU]
> > 
> > unused devices: <none>
> > 
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> > 
> > top - 08:39:30 up 8 days, 16:41,  3 users,  load average: 1.00, 1.00, 1.00
> > Tasks: 243 total,   1 running, 242 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
> > %Cpu1  :  1.0 us,  5.4 sy,  0.0 ni, 92.2 id,  0.7 wa,  0.3 hi,  0.3 si,  0.0 st
> > %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > MiB Mem :  24034.6 total,  10920.6 free,   1883.0 used,  11231.1 buff/cache
> > MiB Swap:   7828.5 total,   7828.5 free,      0.0 used.  21756.5 avail Mem
> > 
> >      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >    19719 root      20   0    2852   2820   2020 D   7.6   0.0 679:04.39 raid6check
> 
> I think the stack of raid6check is pretty much the same as before.
> 
> Since the estimated time of 12TB array is about 57 days, if the estimated
> time is linear to
> the number of stripes in the same machine, then it is how raid6check works
> as I guessed.
> 
> Thanks,
> Guoqing

-- 

piergiorgio


* Re: raid6check extremely slow ?
  2020-05-11 16:14       ` Piergiorgio Sartor
@ 2020-05-11 20:53         ` Giuseppe Bilotta
  2020-05-11 21:12           ` Guoqing Jiang
  2020-05-12 16:05           ` Piergiorgio Sartor
  2020-05-11 21:07         ` Guoqing Jiang
  1 sibling, 2 replies; 38+ messages in thread
From: Giuseppe Bilotta @ 2020-05-11 20:53 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Guoqing Jiang, Wolfgang Denk, linux-raid

Hello Piergiorgio,

On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:
> Hi again!
>
> I made a quick test.
> I disabled the lock / unlock in raid6check.
>
> With lock / unlock, I get around 1.2MB/sec
> per device component, with ~13% CPU load.
> Wihtout lock / unlock, I get around 15.5MB/sec
> per device component, with ~30% CPU load.
>
> So, it seems the lock / unlock mechanism is
> quite expensive.
>
> I'm not sure what's the best solution, since
> we still need to avoid race conditions.
>
> Any suggestion is welcome!

Would it be possible/effective to lock multiple stripes at once? Lock,
say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
internals, but if locking is O(1) on the number of stripes (at least
if they are consecutive), this would help reduce (potentially by a
factor of 8 or 16) the costs of the locks/unlocks at the expense of
longer locks and their influence on external I/O.
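
In code, the batched variant could look roughly like this.  It is only an
illustrative sketch: md_set_ull is an assumed helper that echoes a value
into /sys/block/<md>/md/<attr>, check_one_stripe stands for the existing
per-stripe read-and-verify, and the window arithmetic/units are made up
for the example:

    #define BATCH 16   /* 8 or 16, as suggested above */

    /* assumed helpers, not actual raid6check functions */
    extern int md_set_ull(const char *md_dir, const char *attr,
                          unsigned long long val);
    extern int check_one_stripe(unsigned long long stripe);

    static int check_batched(const char *md_dir, unsigned long long start,
                             unsigned long long count,
                             unsigned long long stripe_span)
    {
            unsigned long long s, i, n;
            int rc = 0;

            for (s = start; s < start + count; s += n) {
                    n = start + count - s;
                    if (n > BATCH)
                            n = BATCH;

                    /* one suspend window now covers n stripes, so the
                     * expensive suspend_lo/hi writes are amortized */
                    md_set_ull(md_dir, "suspend_lo", s * stripe_span);
                    md_set_ull(md_dir, "suspend_hi", (s + n) * stripe_span);

                    for (i = 0; i < n; i++)
                            rc |= check_one_stripe(s + i);

                    /* release the window before moving to the next batch */
                    md_set_ull(md_dir, "suspend_lo", 0);
                    md_set_ull(md_dir, "suspend_hi", 0);
            }
            return rc;
    }

Writes to the suspended region are then blocked only for the duration of
one batch, so the impact on concurrent I/O stays bounded while the
lock/unlock overhead drops by roughly the batch factor.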

--
Giuseppe "Oblomov" Bilotta


* Re: raid6check extremely slow ?
  2020-05-11 16:14       ` Piergiorgio Sartor
  2020-05-11 20:53         ` Giuseppe Bilotta
@ 2020-05-11 21:07         ` Guoqing Jiang
  2020-05-11 22:44           ` Peter Grandi
  2020-05-12 16:07           ` Piergiorgio Sartor
  1 sibling, 2 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:07 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid

On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
>> Hi Wolfgang,
>>
>>
>> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
>>> Dear Guoqing Jiang,
>>>
>>> In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com>  you wrote:
>>>> Seems raid6check is in 'D' state, what are the output of 'cat
>>>> /proc/19719/stack' and /proc/mdstat?
>>> # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_lo_store+0x50/0xa0
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_lo_store+0x50/0xa0
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_hi_store+0x44/0x90
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>> [<0>] synchronize_rcu+0x47/0x50
>>> [<0>] mddev_suspend+0x4a/0x140
>>> [<0>] suspend_hi_store+0x44/0x90
>>> [<0>] md_attr_store+0x86/0xe0
>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>> [<0>] vfs_write+0xb6/0x1a0
>>> [<0>] ksys_write+0x4f/0xc0
>>> [<0>] do_syscall_64+0x5b/0xf0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
>> is called,
>> means synchronize_rcu and other synchronize mechanisms are triggered in the
>> path ...
>>
>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
>>> all the time?  I thought it was_reading_  the disks only?
>> I didn't read raid6check before, just find check_stripes has
>>
>>
>>      while (length > 0) {
>>              lock_stripe -> write suspend_lo/hi node
>>              ...
>>              unlock_all_stripes -> -> write suspend_lo/hi node
>>      }
>>
>> I think it explains the stack of raid6check, and maybe it is way that
>> raid6check works, lock
>> stripe, check the stripe then unlock the stripe, just my guess ...
> Hi again!
>
> I made a quick test.
> I disabled the lock / unlock in raid6check.
>
> With lock / unlock, I get around 1.2MB/sec
> per device component, with ~13% CPU load.
> Wihtout lock / unlock, I get around 15.5MB/sec
> per device component, with ~30% CPU load.
>
> So, it seems the lock / unlock mechanism is
> quite expensive.

Yes, since mddev_suspend/resume are triggered by the per-stripe lock/unlock.

> I'm not sure what's the best solution, since
> we still need to avoid race conditions.

I guess there are two possible ways:

1. Per your previous reply, only call raid6check when the array is RO; then
we don't need the lock.

2. Investigate whether it is possible to acquire stripe_lock in
suspend_lo/hi_store to avoid the race between raid6check and a write to the
same stripe. IOW, try fine-grained protection instead of calling the
expensive suspend/resume in suspend_lo/hi_store. But I am not sure yet
whether that is doable.


BTW, it seems there are build problems for raid6check ...

mdadm$ make raid6check
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter 
-Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" 
-DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" 
-DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" 
-DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM 
-DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" 
-DUSE_PTHREADS -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o 
xmalloc.o dlink.o
sysfs.o: In function `sysfsline':
sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
collect2: error: ld returned 1 exit status
Makefile:220: recipe for target 'raid6check' failed
make: *** [raid6check] Error 1


Thanks,
Guoqing


* Re: raid6check extremely slow ?
  2020-05-11 20:53         ` Giuseppe Bilotta
@ 2020-05-11 21:12           ` Guoqing Jiang
  2020-05-11 21:16             ` Guoqing Jiang
  2020-05-12 16:05           ` Piergiorgio Sartor
  1 sibling, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:12 UTC (permalink / raw)
  To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid

On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> Hello Piergiorgio,
>
> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
>> Hi again!
>>
>> I made a quick test.
>> I disabled the lock / unlock in raid6check.
>>
>> With lock / unlock, I get around 1.2MB/sec
>> per device component, with ~13% CPU load.
>> Wihtout lock / unlock, I get around 15.5MB/sec
>> per device component, with ~30% CPU load.
>>
>> So, it seems the lock / unlock mechanism is
>> quite expensive.
>>
>> I'm not sure what's the best solution, since
>> we still need to avoid race conditions.
>>
>> Any suggestion is welcome!
> Would it be possible/effective to lock multiple stripes at once? Lock,
> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> internals, but if locking is O(1) on the number of stripes (at least
> if they are consecutive), this would help reduce (potentially by a
> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> longer locks and their influence on external I/O.
>

Hmm, maybe something like this:

check_stripes
	
	-> mddev_suspend
	
	while (whole_stripe_num--) {
		check each stripe
	}
	
	-> mddev_resume


Then we only need to call suspend/resume once.

Thanks,
Guoqing


* Re: raid6check extremely slow ?
  2020-05-11 21:12           ` Guoqing Jiang
@ 2020-05-11 21:16             ` Guoqing Jiang
  2020-05-12  1:52               ` Giuseppe Bilotta
  0 siblings, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-11 21:16 UTC (permalink / raw)
  To: Giuseppe Bilotta, Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid

On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
>> Hello Piergiorgio,
>>
>> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
>> <piergiorgio.sartor@nexgo.de> wrote:
>>> Hi again!
>>>
>>> I made a quick test.
>>> I disabled the lock / unlock in raid6check.
>>>
>>> With lock / unlock, I get around 1.2MB/sec
>>> per device component, with ~13% CPU load.
>>> Wihtout lock / unlock, I get around 15.5MB/sec
>>> per device component, with ~30% CPU load.
>>>
>>> So, it seems the lock / unlock mechanism is
>>> quite expensive.
>>>
>>> I'm not sure what's the best solution, since
>>> we still need to avoid race conditions.
>>>
>>> Any suggestion is welcome!
>> Would it be possible/effective to lock multiple stripes at once? Lock,
>> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
>> internals, but if locking is O(1) on the number of stripes (at least
>> if they are consecutive), this would help reduce (potentially by a
>> factor of 8 or 16) the costs of the locks/unlocks at the expense of
>> longer locks and their influence on external I/O.
>>
>
> Hmm, maybe something like.
>
> check_stripes
>
>     -> mddev_suspend
>
>     while (whole_stripe_num--) {
>         check each stripe
>     }
>
>     -> mddev_resume
>
>
> Then just need to call suspend/resume once.

But basically, the array can't process any new requests when checking is 
in progress ...

Guoqing


* Re: raid6check extremely slow ?
  2020-05-11 21:07         ` Guoqing Jiang
@ 2020-05-11 22:44           ` Peter Grandi
  2020-05-12 16:09             ` Piergiorgio Sartor
  2020-05-12 16:07           ` Piergiorgio Sartor
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Grandi @ 2020-05-11 22:44 UTC (permalink / raw)
  To: Linux RAID

>>> With lock / unlock, I get around 1.2MB/sec per device
>>> component, with ~13% CPU load.  Wihtout lock / unlock, I get
>>> around 15.5MB/sec per device component, with ~30% CPU load.

>> [...] we still need to avoid race conditions. [...]

Not all race conditions are equally bad in this situation.

> 1. Per your previous reply, only call raid6check when array is
> RO, then we don't need the lock.
> 2. Investigate if it is possible that acquire stripe_lock in
> suspend_lo/hi_store [...]

Some other ways could be considered:

* Read a stripe without locking and check it; if it checks good,
  no problem, else either it was modified during the read, or it
  was faulty, so acquire a W lock, reread and recheck it (it
  could have become good in the meantime).

  The assumption here is that there is a modest write load from
  applications on the RAID set, so the check will almost always
  succeed, and it is worth rereading the stripe in very rare
  cases of "collisions" or faults.

* Variants, like acquiring a W lock (if possible) on the stripe
  solely while reading it ("atomic" read, which may be possible
  in other ways without locking) and then if check fails we know
  it was faulty, so optionally acquire a new W lock and reread
  and recheck it (it could have become good in the meantime).

  The assumption here is that the write load is less modest, but
  there are a lot more reads than writes, so a W lock only
  during read will eliminate the rereads and rechecks from
  relatively rare "collisions".

The case where there is a large application write load on the
RAID set while checking is running is hard to improve; it is
probably best handled by eliminating rereads and rechecks
altogether, i.e. by acquiring the stripe W lock for the whole
duration of the read and check.
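
A minimal sketch of the first (optimistic) variant, with placeholder
function names rather than the actual raid6check helpers:

/* Optimistic scheme: check unlocked first, lock + re-read + re-check only
 * on a mismatch.  The four helpers below are placeholders, not the real
 * raid6check functions. */
void read_stripe(unsigned long long stripe);	/* unlocked read of one stripe */
int  parity_ok(unsigned long long stripe);	/* P/Q check of the buffered stripe */
void lock_stripe(unsigned long long stripe);	/* e.g. via suspend_lo/hi */
void unlock_stripe(unsigned long long stripe);

enum verdict { STRIPE_OK, STRIPE_BAD };

static enum verdict check_one_stripe(unsigned long long stripe)
{
	enum verdict v;

	read_stripe(stripe);
	if (parity_ok(stripe))
		return STRIPE_OK;	/* common case: no lock taken at all */

	/* Mismatch: either really bad, or a write raced with our read.
	 * Take the (expensive) lock and decide for sure. */
	lock_stripe(stripe);
	read_stripe(stripe);
	v = parity_ok(stripe) ? STRIPE_OK : STRIPE_BAD;
	unlock_stripe(stripe);

	return v;
}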

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-11 21:16             ` Guoqing Jiang
@ 2020-05-12  1:52               ` Giuseppe Bilotta
  2020-05-12  6:27                 ` Adam Goryachev
  0 siblings, 1 reply; 38+ messages in thread
From: Giuseppe Bilotta @ 2020-05-12  1:52 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
> On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> >> Would it be possible/effective to lock multiple stripes at once? Lock,
> >> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> >> internals, but if locking is O(1) on the number of stripes (at least
> >> if they are consecutive), this would help reduce (potentially by a
> >> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> >> longer locks and their influence on external I/O.
> >>
> >
> > Hmm, maybe something like.
> >
> > check_stripes
> >
> >     -> mddev_suspend
> >
> >     while (whole_stripe_num--) {
> >         check each stripe
> >     }
> >
> >     -> mddev_resume
> >
> >
> > Then just need to call suspend/resume once.
>
> But basically, the array can't process any new requests when checking is

Yeah, locking the entire device might be excessive (especially if it's
a big one). Using a granularity larger than 1 but smaller than the
whole device could be a compromise. Since the “no lock” approach seems
to be about an order of magnitude faster (at least in Piergiorgio's
benchmark), my guess was that something between 8 and 16 could bring
the speed up to be close to the “no lock” case without having dramatic
effects on I/O. Reading all 8/16 stripes before processing (assuming
sufficient memory) might even lead to better disk utilization during
the check.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12  1:52               ` Giuseppe Bilotta
@ 2020-05-12  6:27                 ` Adam Goryachev
  2020-05-12 16:11                   ` Piergiorgio Sartor
  0 siblings, 1 reply; 38+ messages in thread
From: Adam Goryachev @ 2020-05-12  6:27 UTC (permalink / raw)
  To: Giuseppe Bilotta, Guoqing Jiang
  Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid


On 12/5/20 11:52, Giuseppe Bilotta wrote:
> On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> On 5/11/20 11:12 PM, Guoqing Jiang wrote:
>>> On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
>>>> Would it be possible/effective to lock multiple stripes at once? Lock,
>>>> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
>>>> internals, but if locking is O(1) on the number of stripes (at least
>>>> if they are consecutive), this would help reduce (potentially by a
>>>> factor of 8 or 16) the costs of the locks/unlocks at the expense of
>>>> longer locks and their influence on external I/O.
>>>>
>>> Hmm, maybe something like.
>>>
>>> check_stripes
>>>
>>>      -> mddev_suspend
>>>
>>>      while (whole_stripe_num--) {
>>>          check each stripe
>>>      }
>>>
>>>      -> mddev_resume
>>>
>>>
>>> Then just need to call suspend/resume once.
>> But basically, the array can't process any new requests when checking is
> Yeah, locking the entire device might be excessive (especially if it's
> a big one). Using a granularity larger than 1 but smaller than the
> whole device could be a compromise. Since the “no lock” approach seems
> to be about an order of magnitude faster (at least in Piergiorgio's
> benchmark), my guess was that something between 8 and 16 could bring
> the speed up to be close to the “no lock” case without having dramatic
> effects on I/O. Reading all 8/16 stripes before processing (assuming
> sufficient memory) might even lead to better disk utilization during
> the check.

I know very little about this, but could you perhaps lock 2 x 16 
stripes, and then after you complete the first 16, release the first 16, 
lock the 3rd 16 stripes, and while waiting for the lock continue to 
process the 2nd set of 16?

Would that allow you to do more processing and less waiting for 
lock/release?

Regards,
Adam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-11 15:39       ` Piergiorgio Sartor
@ 2020-05-12  7:37         ` Wolfgang Denk
  2020-05-12 16:17           ` Piergiorgio Sartor
  0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-12  7:37 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid

Dear Piergiorgio,

In message <20200511153937.GA3225@lazy.lzy> you wrote:
> >     while (length > 0) {
> >             lock_stripe -> write suspend_lo/hi node
> >             ...
> >             unlock_all_stripes -> -> write suspend_lo/hi node
> >     }
> > 
> > I think it explains the stack of raid6check, and maybe it is way that
> > raid6check works, lock
> > stripe, check the stripe then unlock the stripe, just my guess ...
>
> Yes, that's the way it works.
> raid6check lock the stripe, check it, release it.
> This is required in order to avoid race conditions
> between raid6check and some write to the stripe.

This still does not really explain what is so slow here.  I mean,
even if the locking was an expensive operation code-wise, I would
expect to see at least one of the CPU cores near 100% then - but
both CPU _and_ I/O are basically idle, and the disks are _all_ and
_always_ really close to a throughput of 400 kB/s - this looks like
some intentional bandwidth limit - I just can't see where this can be
configured?

> This could be a way to test if the problem is
> really here.
> That is, remove the lock / unlock (I guess
> there should be only one pair, but better
> check) and check with the array in R/O mode.

I may try this again after this test completed ;-)

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It's certainly  convenient  the  way  the  crime  (or  condition)  of
stupidity   carries   with   it  its  own  punishment,  automatically
admisistered without remorse, pity, or prejudice. :-)
         -- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-11 20:53         ` Giuseppe Bilotta
  2020-05-11 21:12           ` Guoqing Jiang
@ 2020-05-12 16:05           ` Piergiorgio Sartor
  1 sibling, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:05 UTC (permalink / raw)
  To: Giuseppe Bilotta
  Cc: Piergiorgio Sartor, Guoqing Jiang, Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 10:53:05PM +0200, Giuseppe Bilotta wrote:
> Hello Piergiorgio,
> 
> On Mon, May 11, 2020 at 6:15 PM Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> > Hi again!
> >
> > I made a quick test.
> > I disabled the lock / unlock in raid6check.
> >
> > With lock / unlock, I get around 1.2MB/sec
> > per device component, with ~13% CPU load.
> > Wihtout lock / unlock, I get around 15.5MB/sec
> > per device component, with ~30% CPU load.
> >
> > So, it seems the lock / unlock mechanism is
> > quite expensive.
> >
> > I'm not sure what's the best solution, since
> > we still need to avoid race conditions.
> >
> > Any suggestion is welcome!
> 
> Would it be possible/effective to lock multiple stripes at once? Lock,
> say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> internals, but if locking is O(1) on the number of stripes (at least
> if they are consecutive), this would help reduce (potentially by a
> factor of 8 or 16) the costs of the locks/unlocks at the expense of
> longer locks and their influence on external I/O.

Probably possible, from the technical
point of view, even if I do not know
what the effect would be either.

From the coding point of view, it is a bit
tricky; boundary conditions and so on
must be properly considered.

> 
> --
> Giuseppe "Oblomov" Bilotta

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-11 21:07         ` Guoqing Jiang
  2020-05-11 22:44           ` Peter Grandi
@ 2020-05-12 16:07           ` Piergiorgio Sartor
  2020-05-12 18:16             ` Guoqing Jiang
  2020-05-13  6:07             ` Wolfgang Denk
  1 sibling, 2 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:07 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid

On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> > > Hi Wolfgang,
> > > 
> > > 
> > > On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > > > Dear Guoqing Jiang,
> > > > 
> > > > In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com>  you wrote:
> > > > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > > > /proc/19719/stack' and /proc/mdstat?
> > > > # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > 
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > 
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_hi_store+0x44/0x90
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > 
> > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > [<0>] synchronize_rcu+0x47/0x50
> > > > [<0>] mddev_suspend+0x4a/0x140
> > > > [<0>] suspend_hi_store+0x44/0x90
> > > > [<0>] md_attr_store+0x86/0xe0
> > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > [<0>] vfs_write+0xb6/0x1a0
> > > > [<0>] ksys_write+0x4f/0xc0
> > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
> > > is called,
> > > means synchronize_rcu and other synchronize mechanisms are triggered in the
> > > path ...
> > > 
> > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > > > all the time?  I thought it was_reading_  the disks only?
> > > I didn't read raid6check before, just find check_stripes has
> > > 
> > > 
> > >      while (length > 0) {
> > >              lock_stripe -> write suspend_lo/hi node
> > >              ...
> > >              unlock_all_stripes -> -> write suspend_lo/hi node
> > >      }
> > > 
> > > I think it explains the stack of raid6check, and maybe it is way that
> > > raid6check works, lock
> > > stripe, check the stripe then unlock the stripe, just my guess ...
> > Hi again!
> > 
> > I made a quick test.
> > I disabled the lock / unlock in raid6check.
> > 
> > With lock / unlock, I get around 1.2MB/sec
> > per device component, with ~13% CPU load.
> > Wihtout lock / unlock, I get around 15.5MB/sec
> > per device component, with ~30% CPU load.
> > 
> > So, it seems the lock / unlock mechanism is
> > quite expensive.
> 
> Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
> 
> > I'm not sure what's the best solution, since
> > we still need to avoid race conditions.
> 
> I guess there are two possible ways:
> 
> 1. Per your previous reply, only call raid6check when array is RO, then
> we don't need the lock.
> 
> 2. Investigate if it is possible that acquire stripe_lock in
> suspend_lo/hi_store
> to avoid the race between raid6check and write to the same stripe. IOW,
> try fine grained protection instead of call the expensive suspend/resume
> in suspend_lo/hi_store. But I am not sure it is doable or not right now.

Could you please elaborate on the
"fine grained protection" thing?
 
> 
> BTW, seems there are build problems for raid6check ...
> 
> mdadm$ make raid6check
> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
> -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
> xmalloc.o dlink.o
> sysfs.o: In function `sysfsline':
> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> collect2: error: ld returned 1 exit status
> Makefile:220: recipe for target 'raid6check' failed
> make: *** [raid6check] Error 1

I cannot see this problem.
I could compile without issue.
Maybe some library is missing somewhere,
but I'm not sure where.

bye,

pg

> 
> Thanks,
> Guoqing

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-11 22:44           ` Peter Grandi
@ 2020-05-12 16:09             ` Piergiorgio Sartor
  2020-05-12 20:54               ` antlists
  0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:09 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID

On Mon, May 11, 2020 at 11:44:11PM +0100, Peter Grandi wrote:
> >>> With lock / unlock, I get around 1.2MB/sec per device
> >>> component, with ~13% CPU load.  Wihtout lock / unlock, I get
> >>> around 15.5MB/sec per device component, with ~30% CPU load.
> 
> >> [...] we still need to avoid race conditions. [...]
> 
> Not all race conditions are equally bad in this situation.
> 
> > 1. Per your previous reply, only call raid6check when array is
> > RO, then we don't need the lock.
> > 2. Investigate if it is possible that acquire stripe_lock in
> > suspend_lo/hi_store [...]
> 
> Some other ways could be considered:
> 
> * Read a stripe without locking and check it; if it checks good,
>   no problem, else either it was modified during the read, or it
>   was faulty, so acquire a W lock, reread and recheck it (it
>   could have become good in the meantime).
> 
>   The assumption here is that there is a modest write load from
>   applications on the RAID set, so the check will almost always
>   succeed, and it is worth rereading the stripe in very rare
>   cases of "collisions" or faults.
> 
> * Variants, like acquiring a W lock (if possible) on the stripe
>   solely while reading it ("atomic" read, which may be possible
>   in other ways without locking) and then if check fails we know
>   it was faulty, so optionally acquire a new W lock and reread
>   and recheck it (it could have become good in the meantime).
> 
>   The assumption here is that the write load is less modest, but
>   there are a lot more reads than writes, so a W lock only
>   during read will eliminate the rereads and rechecks from
>   relatively rare "collisions".

The locking method was suggested by Neil;
I'm not aware of other methods.

About the check -> maybe lock -> re-check,
it is a possible workaround, but I find it
a bit extreme.

In any case, we should keep it in mind.

bye,

pg
 
> The case where there is at the same time a large application
> write load on the RAID set and checking at the same time is hard
> to improve and probably eliminating rereads and rechecks by just
> acquiring the stripe W lock for the whole duration of read and
> check.

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12  6:27                 ` Adam Goryachev
@ 2020-05-12 16:11                   ` Piergiorgio Sartor
  0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:11 UTC (permalink / raw)
  To: Adam Goryachev
  Cc: Giuseppe Bilotta, Guoqing Jiang, Piergiorgio Sartor,
	Wolfgang Denk, linux-raid

On Tue, May 12, 2020 at 04:27:59PM +1000, Adam Goryachev wrote:
> 
> On 12/5/20 11:52, Giuseppe Bilotta wrote:
> > On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
> > <guoqing.jiang@cloud.ionos.com> wrote:
> > > On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> > > > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> > > > > Would it be possible/effective to lock multiple stripes at once? Lock,
> > > > > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> > > > > internals, but if locking is O(1) on the number of stripes (at least
> > > > > if they are consecutive), this would help reduce (potentially by a
> > > > > factor of 8 or 16) the costs of the locks/unlocks at the expense of
> > > > > longer locks and their influence on external I/O.
> > > > > 
> > > > Hmm, maybe something like.
> > > > 
> > > > check_stripes
> > > > 
> > > >      -> mddev_suspend
> > > > 
> > > >      while (whole_stripe_num--) {
> > > >          check each stripe
> > > >      }
> > > > 
> > > >      -> mddev_resume
> > > > 
> > > > 
> > > > Then just need to call suspend/resume once.
> > > But basically, the array can't process any new requests when checking is
> > Yeah, locking the entire device might be excessive (especially if it's
> > a big one). Using a granularity larger than 1 but smaller than the
> > whole device could be a compromise. Since the “no lock” approach seems
> > to be about an order of magnitude faster (at least in Piergiorgio's
> > benchmark), my guess was that something between 8 and 16 could bring
> > the speed up to be close to the “no lock” case without having dramatic
> > effects on I/O. Reading all 8/16 stripes before processing (assuming
> > sufficient memory) might even lead to better disk utilization during
> > the check.
> 
> I know very little about this, but could you perhaps lock 2 x 16 stripes,
> and then after you complete the first 16, release the first 16, lock the 3rd
> 16 stripes, and while waiting for the lock continue to process the 2nd set
> of 16?

For some reason I do not know, the unlock
is global.
If I recall correctly, this was the way
Neil said was "more" correct.
 
> Would that allow you to do more processing and less waiting for
> lock/release?

I think the general concept of pipelining
is good; this would really improve the
performance of the whole thing.
If we could just multithread, I suspect
it would improve as well.

We need to solve the unlock problem...

bye,

> 
> Regards,
> Adam

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12  7:37         ` Wolfgang Denk
@ 2020-05-12 16:17           ` Piergiorgio Sartor
  2020-05-13  6:13             ` Wolfgang Denk
  0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 16:17 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid

On Tue, May 12, 2020 at 09:37:47AM +0200, Wolfgang Denk wrote:
> Dear Piergiorgio,
> 
> In message <20200511153937.GA3225@lazy.lzy> you wrote:
> > >     while (length > 0) {
> > >             lock_stripe -> write suspend_lo/hi node
> > >             ...
> > >             unlock_all_stripes -> -> write suspend_lo/hi node
> > >     }
> > > 
> > > I think it explains the stack of raid6check, and maybe it is way that
> > > raid6check works, lock
> > > stripe, check the stripe then unlock the stripe, just my guess ...
> >
> > Yes, that's the way it works.
> > raid6check lock the stripe, check it, release it.
> > This is required in order to avoid race conditions
> > between raid6check and some write to the stripe.
> 
> This still does not really explain what is so slow here.  I mean,
> even if the locking was an expenive operation code-wise, I would
> expect to see at least one of the CPU cores near 100% then - but
> botch CPU _and_ I/O are basically idle, and disks are _all_ and
> _always_ really close at a trhoughput of 400 kB/s - this looks like
> some intentional bandwith limit - I just can't see where this can be
> configured?

The code has 2 functions: lock_stripe() and
unlock_all_stripes().

These are doing more than just lock / unlock.
First, the memory pages of the process are
locked, then some signals are set to
"ignore", and then the stripe is locked.

The unlock does the opposite in reverse
order (unlock the stripe, restore the signals,
unlock the memory pages).
The difference is that, for whatever reason,
the unlock releases *all* the stripes, not
only the one that was locked.

Not sure why.
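
Roughly like this (a sketch of the shape described above, not the actual
raid6check source; which signals are ignored, and the sysfs_write_ull
helper, are assumptions):

#include <signal.h>
#include <sys/mman.h>

int sysfs_write_ull(const char *path, unsigned long long v);	/* assumed helper */

static int lock_stripe_sketch(const char *suspend_lo, const char *suspend_hi,
			      unsigned long long start, unsigned long long end)
{
	/* 1. pin the process pages so we cannot be paged out while part
	 *    of the array is suspended */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -1;

	/* 2. ignore signals that could kill us while holding the suspend */
	signal(SIGINT, SIG_IGN);
	signal(SIGTERM, SIG_IGN);

	/* 3. the actual "lock": suspend writes to [start, end), which is
	 *    what triggers mddev_suspend() in the kernel */
	sysfs_write_ull(suspend_lo, start);
	sysfs_write_ull(suspend_hi, end);
	return 0;
}

static void unlock_all_stripes_sketch(const char *suspend_lo, const char *suspend_hi)
{
	/* reverse order: collapse the suspended window ... */
	sysfs_write_ull(suspend_hi, 0);
	sysfs_write_ull(suspend_lo, 0);
	/* ... restore signal handling ... */
	signal(SIGTERM, SIG_DFL);
	signal(SIGINT, SIG_DFL);
	/* ... and unpin the memory pages */
	munlockall();
}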
 
> > This could be a way to test if the problem is
> > really here.
> > That is, remove the lock / unlock (I guess
> > there should be only one pair, but better
> > check) and check with the array in R/O mode.
> 
> I may try this again after this test completed ;-)

I did it; some performance improvement,
even if not really the possible maximum.

bye,

pg

> Best regards,
> 
> Wolfgang Denk
> 
> -- 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> It's certainly  convenient  the  way  the  crime  (or  condition)  of
> stupidity   carries   with   it  its  own  punishment,  automatically
> admisistered without remorse, pity, or prejudice. :-)
>          -- Tom Christiansen in <559seq$ag1$1@csnews.cs.colorado.edu>

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 16:07           ` Piergiorgio Sartor
@ 2020-05-12 18:16             ` Guoqing Jiang
  2020-05-12 18:32               ` Piergiorgio Sartor
  2020-05-13  6:07             ` Wolfgang Denk
  1 sibling, 1 reply; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-12 18:16 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Wolfgang Denk, linux-raid

On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
>> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
>>> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
>>>> Hi Wolfgang,
>>>>
>>>>
>>>> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
>>>>> Dear Guoqing Jiang,
>>>>>
>>>>> In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com>  you wrote:
>>>>>> Seems raid6check is in 'D' state, what are the output of 'cat
>>>>>> /proc/19719/stack' and /proc/mdstat?
>>>>> # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
>>>> is called,
>>>> means synchronize_rcu and other synchronize mechanisms are triggered in the
>>>> path ...
>>>>
>>>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
>>>>> all the time?  I thought it was_reading_  the disks only?
>>>> I didn't read raid6check before, just find check_stripes has
>>>>
>>>>
>>>>       while (length > 0) {
>>>>               lock_stripe -> write suspend_lo/hi node
>>>>               ...
>>>>               unlock_all_stripes -> -> write suspend_lo/hi node
>>>>       }
>>>>
>>>> I think it explains the stack of raid6check, and maybe it is way that
>>>> raid6check works, lock
>>>> stripe, check the stripe then unlock the stripe, just my guess ...
>>> Hi again!
>>>
>>> I made a quick test.
>>> I disabled the lock / unlock in raid6check.
>>>
>>> With lock / unlock, I get around 1.2MB/sec
>>> per device component, with ~13% CPU load.
>>> Wihtout lock / unlock, I get around 15.5MB/sec
>>> per device component, with ~30% CPU load.
>>>
>>> So, it seems the lock / unlock mechanism is
>>> quite expensive.
>> Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
>>
>>> I'm not sure what's the best solution, since
>>> we still need to avoid race conditions.
>> I guess there are two possible ways:
>>
>> 1. Per your previous reply, only call raid6check when array is RO, then
>> we don't need the lock.
>>
>> 2. Investigate if it is possible that acquire stripe_lock in
>> suspend_lo/hi_store
>> to avoid the race between raid6check and write to the same stripe. IOW,
>> try fine grained protection instead of call the expensive suspend/resume
>> in suspend_lo/hi_store. But I am not sure it is doable or not right now.
> Could you please elaborate on the
> "fine grained protection" thing?

Even though raid6check checks and locks stripes one by one, the picture is
different in kernel space: locking one stripe triggers mddev_suspend
and mddev_resume, which affect all stripes ...

If the kernel could expose an interface to actually lock a single stripe,
then raid6check could use it to lock only that stripe (this is what I call
fine grained) instead of triggering suspend/resume, which are time
consuming.
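
Purely as an illustration of what "fine grained" could look like from
user space, assuming a hypothetical per-stripe lock attribute (no such
md sysfs node exists today, and sysfs_write_ull is an assumed helper):

#include <stdio.h>

int sysfs_write_ull(const char *path, unsigned long long v);	/* assumed helper */

/* HYPOTHETICAL: "stripe_lock" is a made-up sysfs attribute, only meant to
 * show what a fine-grained interface could look like for raid6check. */
static int lock_one_stripe(const char *md_dir, unsigned long long stripe)
{
	char path[256];

	snprintf(path, sizeof(path), "%s/stripe_lock", md_dir);
	return sysfs_write_ull(path, stripe);	/* kernel would lock just this stripe */
}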

>   
>> BTW, seems there are build problems for raid6check ...
>>
>> mdadm$ make raid6check
>> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
>> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
>> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
>> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
>> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
>> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
>> -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
>> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
>> xmalloc.o dlink.o
>> sysfs.o: In function `sysfsline':
>> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
>> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
>> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
>> collect2: error: ld returned 1 exit status
>> Makefile:220: recipe for target 'raid6check' failed
>> make: *** [raid6check] Error 1
> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.

Did you try with the latest mdadm tree? But it could be an environment issue ...

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 18:16             ` Guoqing Jiang
@ 2020-05-12 18:32               ` Piergiorgio Sartor
  2020-05-13  6:18                 ` Wolfgang Denk
  0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-12 18:32 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Piergiorgio Sartor, Wolfgang Denk, linux-raid

On Tue, May 12, 2020 at 08:16:27PM +0200, Guoqing Jiang wrote:
> On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> > On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
> > > On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> > > > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> > > > > Hi Wolfgang,
> > > > > 
> > > > > 
> > > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > > > > > Dear Guoqing Jiang,
> > > > > > 
> > > > > > In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com>  you wrote:
> > > > > > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > > > > > /proc/19719/stack' and /proc/mdstat?
> > > > > > # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > Looks raid6check keeps writing suspend_lo/hi node which causes mddev_suspend
> > > > > is called,
> > > > > means synchronize_rcu and other synchronize mechanisms are triggered in the
> > > > > path ...
> > > > > 
> > > > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > > > > > all the time?  I thought it was_reading_  the disks only?
> > > > > I didn't read raid6check before, just find check_stripes has
> > > > > 
> > > > > 
> > > > >       while (length > 0) {
> > > > >               lock_stripe -> write suspend_lo/hi node
> > > > >               ...
> > > > >               unlock_all_stripes -> -> write suspend_lo/hi node
> > > > >       }
> > > > > 
> > > > > I think it explains the stack of raid6check, and maybe it is way that
> > > > > raid6check works, lock
> > > > > stripe, check the stripe then unlock the stripe, just my guess ...
> > > > Hi again!
> > > > 
> > > > I made a quick test.
> > > > I disabled the lock / unlock in raid6check.
> > > > 
> > > > With lock / unlock, I get around 1.2MB/sec
> > > > per device component, with ~13% CPU load.
> > > > Wihtout lock / unlock, I get around 15.5MB/sec
> > > > per device component, with ~30% CPU load.
> > > > 
> > > > So, it seems the lock / unlock mechanism is
> > > > quite expensive.
> > > Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
> > > 
> > > > I'm not sure what's the best solution, since
> > > > we still need to avoid race conditions.
> > > I guess there are two possible ways:
> > > 
> > > 1. Per your previous reply, only call raid6check when array is RO, then
> > > we don't need the lock.
> > > 
> > > 2. Investigate if it is possible that acquire stripe_lock in
> > > suspend_lo/hi_store
> > > to avoid the race between raid6check and write to the same stripe. IOW,
> > > try fine grained protection instead of call the expensive suspend/resume
> > > in suspend_lo/hi_store. But I am not sure it is doable or not right now.
> > Could you please elaborate on the
> > "fine grained protection" thing?
> 
> Even raid6check checks stripe and locks stripe one by one, but the thing
> is different in kernel space, locking of one stripe triggers mddev_suspend
> and mddev_resume which affect all stripes ...
> 
> If kernel can expose interface to actually locking one stripe, then
> raid6check
> could use it to actually lock only one stripe (this is what I call fine
> grained)
> instead of trigger suspend/resume which are time consuming.

I see, you mean we need a different
interface to this lock / unlock thing.

> > > BTW, seems there are build problems for raid6check ...
> > > 
> > > mdadm$ make raid6check
> > > gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
> > > -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
> > > -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
> > > -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
> > > -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
> > > -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
> > > -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
> > > gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
> > > xmalloc.o dlink.o
> > > sysfs.o: In function `sysfsline':
> > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> > > collect2: error: ld returned 1 exit status
> > > Makefile:220: recipe for target 'raid6check' failed
> > > make: *** [raid6check] Error 1
> > I cannot see this problem.
> > I could compile without issue.
> > Maybe some library is missing somewhere,
> > but I'm not sure where.
> 
> Do you try with the fastest mdadm tree? But could be environment issue ...

I'm using Fedora, so I downloaded
the .srpm package, installed it, enabled
raid6check, patched and rebuilt...

My background idea was to have the
mdadm rpm *with* raid6check, but I
did not get that far...

Sorry...

bye,

pg
 
> Thanks,
> Guoqing

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 16:09             ` Piergiorgio Sartor
@ 2020-05-12 20:54               ` antlists
  2020-05-13 16:18                 ` Piergiorgio Sartor
  0 siblings, 1 reply; 38+ messages in thread
From: antlists @ 2020-05-12 20:54 UTC (permalink / raw)
  To: Piergiorgio Sartor, Peter Grandi; +Cc: Linux RAID

On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> About the check -> maybe lock -> re-check,
> it is a possible workaround, but I find it
> a bit extreme.

This seems the best (most obvious?) solution to me.

If the system is under light write pressure, and the disk is healthy, it 
will scan pretty quickly with almost no locking.

If the system is under heavy pressure, chances are there'll be a fair
few stripes needing rechecking, but even at its worst it'll only be as
bad as the current setup.

And if the system is somewhere inbetween, you still stand a good chance 
of a fast scan.

At the end of the day, the rule should always be "lock only if you need 
to" so looking for problems with an optimistic no-lock scan, then 
locking only if needed to check and fix the problem, just feels right.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 16:07           ` Piergiorgio Sartor
  2020-05-12 18:16             ` Guoqing Jiang
@ 2020-05-13  6:07             ` Wolfgang Denk
  2020-05-15 10:34               ` Andrey Jr. Melnikov
  1 sibling, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13  6:07 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid

Dear Piergiorgio,

In message <20200512160712.GB7261@lazy.lzy> you wrote:
>
> > BTW, seems there are build problems for raid6check ...
...
> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.

I see the same problem when trying to build the current top of tree
(mdadm-4.1-74-g5cfb79d):

-> make raid6check
...
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\"  -o dlink.o -c dlink.c
In function "dl_strndup",
    inlined from "dl_strdup" at dlink.c:73:12:
dlink.c:66:5: error: "strncpy" output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
   66 |     strncpy(n, s, l);
      |     ^~~~~~~~~~~~~~~~
dlink.c: In function "dl_strdup":
dlink.c:73:31: note: length computed here
   73 |     return dl_strndup(s, (int)strlen(s));
      |                               ^~~~~~~~~
cc1: all warnings being treated as errors


removing the "-Werror" from the CWFLAGS setting in the Makefile then
leads to:

...
gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
/usr/bin/ld: sysfs.o: in function `sysfsline':
sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
/usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
/usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'

This might come from commit b06815989 "mdadm: load default
sysfs attributes after assemblation"; mdadm-4.1 builds ok.
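
As an aside, the strncpy warning itself looks easy to silence
independently of -Werror: assuming dl_strndup is the usual
allocate-copy-terminate helper (its shape is guessed from the diagnostic
here, not copied from dlink.c), copying with memcpy and terminating
explicitly avoids -Wstringop-truncation:

#include <stdlib.h>
#include <string.h>

/* sketch only, using malloc instead of whatever allocator dlink.c really uses */
char *dl_strndup_sketch(const char *s, int l)
{
	char *n = malloc(l + 1);

	if (!n)
		return NULL;
	memcpy(n, s, l);	/* strncpy(n, s, l) is what trips -Wstringop-truncation */
	n[l] = '\0';
	return n;
}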


Build tests were run on Fedora 32.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Calm down, it's *__only* ones and zeroes.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 16:17           ` Piergiorgio Sartor
@ 2020-05-13  6:13             ` Wolfgang Denk
  2020-05-13 16:22               ` Piergiorgio Sartor
  0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13  6:13 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid

Dear Piergiorgio,

In message <20200512161731.GE7261@lazy.lzy> you wrote:
>
> > This still does not really explain what is so slow here.  I mean,
> > even if the locking was an expenive operation code-wise, I would
> > expect to see at least one of the CPU cores near 100% then - but
> > botch CPU _and_ I/O are basically idle, and disks are _all_ and
> > _always_ really close at a trhoughput of 400 kB/s - this looks like
> > some intentional bandwith limit - I just can't see where this can be
> > configured?
>
> The code has 2 functions: lock_stripe() and
> unlock_all_stripes().
>
> These are doing more than just lock / unlock.
> First, the memory pages of the process will
> be locked, then some signal will be set to
> "ignore", then the strip will be locked.
>
> The unlock does the opposite in the reverse
> order (unlock, set the signal back, unlock
> the memory pages).
> The difference is that, whatever the reason,
> the unlock unlocks *all* the stripes, not
> only the one locked.
>
> Not sure why.

It does not matter how complex the operation is - I wonder why it is
taking so long: it cannot be CPU bound, as then I would expect to
see some significant CPU load, but none of the cores shows more than
5...6% usage, ever.  Or, if it is I/O bound, then I would expect to see
I/O wait, but this is also never more than 0.2...0.3%.

And why are all disks running at pretty exactly 400 kB/s read rate,
all the time?  This looks like some intentional bandwidth limit, but I
cannot find any knob to change it.



Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Be careful what you wish for. You never know who will be listening.
                                      - Terry Pratchett, _Soul Music_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 18:32               ` Piergiorgio Sartor
@ 2020-05-13  6:18                 ` Wolfgang Denk
  0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-13  6:18 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Guoqing Jiang, linux-raid

Dear Piergiorgio,

In message <20200512183251.GA11548@lazy.lzy> you wrote:
>
> > > > xmalloc.o dlink.o
> > > > sysfs.o: In function `sysfsline':
> > > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> > > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> > > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> > > > collect2: error: ld returned 1 exit status
> > > > Makefile:220: recipe for target 'raid6check' failed
> > > > make: *** [raid6check] Error 1
> > > I cannot see this problem.
> > > I could compile without issue.
> > > Maybe some library is missing somewhere,
> > > but I'm not sure where.
> > 
> > Do you try with the fastest mdadm tree? But could be environment issue ...
>
> I'm using Fedora, so I downloaded
> the .srpm package, installed, enabled
> raid6check, patched and rebuild...

Fedora 32 is still at mdadm-4.1 (Mon Oct 1 14:27:52 2018), but it
seems the significant change was introduced bu commit b06815989
"mdadm: load default sysfs attributes after assemblation" (Wed Jul 10
13:38:53 2019).

If you try to build top of tree you should see the problem, too
[and the -Werror issue I mentioned before, which is also fixed
in Fedora by local distro patches.]

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
As far as the laws of mathematics refer to reality, they are not cer-
tain, and as far as they are certain, they do not refer  to  reality.
                                                   -- Albert Einstein

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-12 20:54               ` antlists
@ 2020-05-13 16:18                 ` Piergiorgio Sartor
  2020-05-13 17:37                   ` Wols Lists
  0 siblings, 1 reply; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 16:18 UTC (permalink / raw)
  To: antlists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID

On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> > About the check -> maybe lock -> re-check,
> > it is a possible workaround, but I find it
> > a bit extreme.
> 
> This seems the best (most obvious?) solution to me.
> 
> If the system is under light write pressure, and the disk is healthy, it
> will scan pretty quickly with almost no locking.

I have some concerns about optimization
solutions which can result in lower
performance than the original status.

You mention "write pressure", but there
is another case, which will cause
read -> lock -> re-read...
Namely, when some chunk is really corrupted.

Now, I do not know, maybe there are other
things we overlook, or maybe not.

Nor do I know how likely it is that such
situations will occur and reduce performance.

I would prefer a solution which will *only*
improve, without any possible drawback.

Again, this does not mean this approach is
wrong; actually, it is to be considered.

In the end, I would also like to understand
why the lock / unlock is so expensive.

> If the system is under heavy pressure, chances are there'll be a fair few
> stripes needing rechecking, but even at it's worst it'll only be as bad as
> the current setup.

It will be worse (or worst, I'm always
confused...).
The read and the check will double.

I'm not sure about the read, but the
check is currently expensive.

bye,

pg

> And if the system is somewhere inbetween, you still stand a good chance of a
> fast scan.
> 
> At the end of the day, the rule should always be "lock only if you need to"
> so looking for problems with an optimistic no-lock scan, then locking only
> if needed to check and fix the problem, just feels right.
> 
> Cheers,
> Wol

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-13  6:13             ` Wolfgang Denk
@ 2020-05-13 16:22               ` Piergiorgio Sartor
  0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 16:22 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Piergiorgio Sartor, Guoqing Jiang, linux-raid

On Wed, May 13, 2020 at 08:13:58AM +0200, Wolfgang Denk wrote:
> Dear Piergiorgio,
> 
> In message <20200512161731.GE7261@lazy.lzy> you wrote:
> >
> > > This still does not really explain what is so slow here.  I mean,
> > > even if the locking was an expenive operation code-wise, I would
> > > expect to see at least one of the CPU cores near 100% then - but
> > > botch CPU _and_ I/O are basically idle, and disks are _all_ and
> > > _always_ really close at a trhoughput of 400 kB/s - this looks like
> > > some intentional bandwith limit - I just can't see where this can be
> > > configured?
> >
> > The code has 2 functions: lock_stripe() and
> > unlock_all_stripes().
> >
> > These are doing more than just lock / unlock.
> > First, the memory pages of the process will
> > be locked, then some signal will be set to
> > "ignore", then the strip will be locked.
> >
> > The unlock does the opposite in the reverse
> > order (unlock, set the signal back, unlock
> > the memory pages).
> > The difference is that, whatever the reason,
> > the unlock unlocks *all* the stripes, not
> > only the one locked.
> >
> > Not sure why.
> 
> It does not matter how omplex the operation is - I wonder why it is
> taking so long: it cannot be CPU bound, as then I would expect to
> see any significant CPU load, but none of the cores shows more than
> 5...6% usage, ever.  Or it is I/O bound, then I would expect to see
> I/O wait, but this is also never more than 0.2...0.3%.
> 
> And why are all disks running at pretty exaclty 400 kB/s read rate,
> all the time?  this looks like some intentinal bandwith limit, but I
> cannot find any knob to change it.

In my test I see 1200KB/sec, or 1.2MB/sec,
which is different than yours.

I do not think there is any bandwidth
limitation, otherwise we should see the
same, I guess.

The low CPU load and low data rate seem
to me a symptom of the CPU just systematically
waiting (for something).

It would be like putting in the code, here
and there, some usleep().
In the end we'll see low CPU load and low
data rate, *but* very constant.

Likely, it is not I/O wait either, but some
other wait.
It might not be the stripe lock itself, but the
locking of the process memory pages...

This could easily be tested, BTW.
Maybe I'll try...
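
For example, a throwaway instrumentation sketch (not something in the
tree) that times the suspected calls with clock_gettime() would show
right away which one eats the time:

#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

static void time_call(const char *name, int (*fn)(void))
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	fn();
	clock_gettime(CLOCK_MONOTONIC, &t1);
	fprintf(stderr, "%s: %.3f ms\n", name, elapsed_ms(&t0, &t1));
}

static int do_mlockall(void) { return mlockall(MCL_CURRENT | MCL_FUTURE); }

int main(void)
{
	time_call("mlockall", do_mlockall);
	/* a second wrapper around the suspend_lo/hi sysfs writes would go here */
	return 0;
}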

bye,

pg

> 
> 
> 
> Best regards,
> 
> Wolfgang Denk
> 
> -- 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Be careful what you wish for. You never know who will be listening.
>                                       - Terry Pratchett, _Soul Music_

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-13 16:18                 ` Piergiorgio Sartor
@ 2020-05-13 17:37                   ` Wols Lists
  2020-05-13 18:23                     ` Piergiorgio Sartor
  0 siblings, 1 reply; 38+ messages in thread
From: Wols Lists @ 2020-05-13 17:37 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Peter Grandi, Linux RAID

On 13/05/20 17:18, Piergiorgio Sartor wrote:
> On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
>> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
>>> About the check -> maybe lock -> re-check,
>>> it is a possible workaround, but I find it
>>> a bit extreme.
>>
>> This seems the best (most obvious?) solution to me.
>>
>> If the system is under light write pressure, and the disk is healthy, it
>> will scan pretty quickly with almost no locking.
> 
> I've some concerns about optimization
> solutions which can result in less
> performances than the original status.
> 
> You mention "write pressure", but there
> is an other case, which will cause
> read -> lock -> re-read...
> Namely, when some chunk is really corrupted.
> 
Yup. That's why I said "the disk is healthy" :-)

> Now, I do not know, maybe there are other
> things we overlook, or maybe not.
> 
> I do not know either how likely is that some
> situations will occur to reduce performances.
> 
> I would prefer a solution which will *only*
> improve, without any possible drawback.

Wouldn't we all. But if the *normal* case shows an appreciable
improvement, then I'm inclined to write off a "shouldn't happen" case as
"tough luck, shit happens".
> 
> Again, this does not mean this approach is
> wrong, actually is to be considered.
> 
> In the end, I would like also to understand
> why the lock / unlock is so expensive.

Agreed.
> 
>> If the system is under heavy pressure, chances are there'll be a fair few
>> stripes needing rechecking, but even at it's worst it'll only be as bad as
>> the current setup.
> 
> It will be worse (or worst, I'm always
> confused...).
> The read and the check will double.

Touche - my logic was off ...

But a bit of grammar - bad = descriptive, worse = comparative, worst =
absolute, so you were correct with worse.
> 
> I'm not sure about the read, but the
> check is currently expensive.

But you're still going to need a very unlucky state of affairs for the
optimised check to be worse. Okay, if the disk IS damaged, then the
optimised check could easily be the worst, but if it's just write
pressure, you're going to need every second stripe to be messed up by a
collision. Rather unlikely imho.
> 
> bye,
> 
> pg

Cheers,
Wol
> 
>> And if the system is somewhere inbetween, you still stand a good chance of a
>> fast scan.
>>
>> At the end of the day, the rule should always be "lock only if you need to"
>> so looking for problems with an optimistic no-lock scan, then locking only
>> if needed to check and fix the problem, just feels right.
>>
>> Cheers,
>> Wol
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-13 17:37                   ` Wols Lists
@ 2020-05-13 18:23                     ` Piergiorgio Sartor
  0 siblings, 0 replies; 38+ messages in thread
From: Piergiorgio Sartor @ 2020-05-13 18:23 UTC (permalink / raw)
  To: Wols Lists; +Cc: Piergiorgio Sartor, Peter Grandi, Linux RAID

On Wed, May 13, 2020 at 06:37:18PM +0100, Wols Lists wrote:
> On 13/05/20 17:18, Piergiorgio Sartor wrote:
> > On Tue, May 12, 2020 at 09:54:21PM +0100, antlists wrote:
> >> On 12/05/2020 17:09, Piergiorgio Sartor wrote:
> >>> About the check -> maybe lock -> re-check,
> >>> it is a possible workaround, but I find it
> >>> a bit extreme.
> >>
> >> This seems the best (most obvious?) solution to me.
> >>
> >> If the system is under light write pressure, and the disk is healthy, it
> >> will scan pretty quickly with almost no locking.
> > 
> > I've some concerns about optimization
> > solutions which can result in less
> > performances than the original status.
> > 
> > You mention "write pressure", but there
> > is an other case, which will cause
> > read -> lock -> re-read...
> > Namely, when some chunk is really corrupted.
> > 
> Yup. That's why I said "the disk is healthy" :-)

We need to consider all possibilities...
 
> > Now, I do not know, maybe there are other
> > things we overlook, or maybe not.
> > 
> > I do not know either how likely is that some
> > situations will occur to reduce performances.
> > 
> > I would prefer a solution which will *only*
> > improve, without any possible drawback.
> 
> Wouldn't we all. But if the *normal* case shows an appreciable
> improvement, then I'm inclined to write off a "shouldn't happen" case as
> "tough luck, shit happens".
> > 
> > Again, this does not mean this approach is
> > wrong, actually is to be considered.
> > 
> > In the end, I would like also to understand
> > why the lock / unlock is so expensive.
> 
> Agreed.
> > 
> >> If the system is under heavy pressure, chances are there'll be a fair few
> >> stripes needing rechecking, but even at it's worst it'll only be as bad as
> >> the current setup.
> > 
> > It will be worse (or worst, I'm always
> > confused...).
> > The read and the check will double.
> 
> Touche - my logic was off ...
> 
> But a bit of grammar - bad = descriptive, worse = comparative, worst =
> absolute, so you were correct with worse.

Ah! Thank you.
That's always confusing me. Usually I check
with some search engine, but sometimes I'm
too lazy... And then I forget.

BTW, somewhat related, please do not
refrain from correcting my English.

> > I'm not sure about the read, but the
> > check is currently expensive.
> 
> But you're still going to need a very unlucky state of affairs for the
> optimised check to be worse. Okay, if the disk IS damaged, then the
> optimised check could easily be the worst, but if it's just write
> pressure, you're going to need every second stripe to be messed up by a
> collision. Rather unlikely imho.

Well, as Neil would say, patch are welcome! :-)

Really, I've too little time to make
changes to the code.
I can do some test and, hopefully,
some support.

bye,

pg

> > 
> > bye,
> > 
> > pg
> 
> Cheers,
> Wol
> > 
> >> And if the system is somewhere inbetween, you still stand a good chance of a
> >> fast scan.
> >>
> >> At the end of the day, the rule should always be "lock only if you need to"
> >> so looking for problems with an optimistic no-lock scan, then locking only
> >> if needed to check and fix the problem, just feels right.
> >>
> >> Cheers,
> >> Wol
> > 

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
  2020-05-10 13:26 ` Piergiorgio Sartor
  2020-05-10 22:16 ` Guoqing Jiang
@ 2020-05-14 17:20 ` Roy Sigurd Karlsbakk
  2020-05-14 18:20   ` Wolfgang Denk
  2 siblings, 1 reply; 38+ messages in thread
From: Roy Sigurd Karlsbakk @ 2020-05-14 17:20 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Linux Raid

> I'm running raid6check on a 12 TB (8 x 2 TB harddisks)
> RAID6 array and wonder why it is so extremely slow...
> It seems to be reading the disks only a about 400 kB/s,
> which results in an estimated time of some 57 days!!!
> to complete checking the array.  The system is basically idle, there
> is neither any significant CPU load nor any other I/o (no to the
> tested array, nor to any other storage on this system).
> 
> Am I doing something wrong?

Try checking with iostat -x to see if one disk is performing worse than the other ones. This sometimes happens and can indicate a failure that the normal SMART/smartctl stuff can't identify. If you see a utilisation of one of the disks at 100%, that's the bastard. Under normal circumstances, you probably won't be able to return that, since it "works". There's a quick fix for that, though. Just unplug the disk, plug it into a power cable, let it spin up and then sharply twist it 90 degrees a few times, and it's all sorted out and you can return it ;)

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita. (The good you shall carve in stone, the bad write in snow.)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-14 17:20 ` Roy Sigurd Karlsbakk
@ 2020-05-14 18:20   ` Wolfgang Denk
  2020-05-14 19:51     ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-14 18:20 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Linux Raid

Dear Roy,

In message <1999694976.3317399.1589476824607.JavaMail.zimbra@karlsbakk.net> you wrote:
>
> Try checking with iostat -x to see if one disk is performing worse
> than the other ones. This sometimes happens and can indicate a
> failure that the normal SMART/smartctl stuff can't identify. If
> you see a utilisation of one of the disks at 100%, that's the
> bastard. Under normal circumstances, you probably won't be able to
> return that, since it "works". There's a quick fix for that,
> though. Just unplug the disk, plug it into a power cable, let it
>> spin up and then sharply twist it 90 degrees a few times, and it's
> all sorted out and you can return it ;)

Nothing looks suspicious to me - all disks behave the same:

# iostat -x /dev/sd[efhijklm] 1 3
Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-14      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.19    0.00    1.06    0.15    0.00   98.60

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sde             20.08    360.56     2.53  11.20    0.34    17.95    0.49      0.10     0.02   3.41   32.36     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   32.74    0.02   2.11
sdf             20.07    360.56     2.54  11.24    0.33    17.96    0.49      0.10     0.02   3.40   44.23     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   44.77    0.02   2.09
sdh             20.08    360.54     2.53  11.17    0.35    17.95    0.49      0.10     0.02   3.40   43.47     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   44.01    0.02   2.40
sdi             20.08    360.58     2.54  11.23    0.34    17.96    0.49      0.10     0.02   3.40   26.22     0.21    0.00      0.00     0.00   0.00    0.00     0.00    0.49   26.50    0.01   2.84
sdj             20.45    360.56     2.16   9.54    0.34    17.63    0.49      0.10     0.02   3.38   35.19     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   35.60    0.02   2.46
sdk             20.08    360.54     2.53  11.21    0.35    17.95    0.49      0.10     0.02   3.42   40.63     0.21    0.00      0.00     0.00   0.00    0.00     0.00    0.49   41.13    0.02   2.36
sdl             20.07    360.57     2.54  11.24    0.34    17.96    0.49      0.10     0.02   3.39   23.61     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   23.84    0.01   2.70
sdm             20.08    360.55     2.53  11.21    0.53    17.96    0.49      0.10     0.02   3.41   21.52     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   21.67    0.01   2.64


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.38    0.00    1.12    0.12    0.00   98.38

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sde             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
sdf             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
sdh             20.00    320.00     0.00   0.00    0.30    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
sdi             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
sdj             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
sdk             20.00    320.00     0.00   0.00    0.30    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
sdl             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
sdm             20.00    320.00     0.00   0.00    0.35    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.88    0.00    0.00   98.87

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sde             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
sdf             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
sdh             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.30
sdi             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
sdj             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
sdk             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
sdl             21.00    336.00     0.00   0.00    0.29    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.20
sdm             21.00    336.00     0.00   0.00    0.38    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10



Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
We see things not as they are, but as we are.       - H. M. Tomlinson

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-14 18:20   ` Wolfgang Denk
@ 2020-05-14 19:51     ` Roy Sigurd Karlsbakk
  2020-05-15  8:08       ` Wolfgang Denk
  0 siblings, 1 reply; 38+ messages in thread
From: Roy Sigurd Karlsbakk @ 2020-05-14 19:51 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Linux Raid

what?

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita. (The good you shall carve in stone, the bad write in snow.)

----- Original Message -----
> From: "Wolfgang Denk" <wd@denx.de>
> To: "Roy Sigurd Karlsbakk" <roy@karlsbakk.net>
> Cc: "Linux Raid" <linux-raid@vger.kernel.org>
> Sent: Thursday, 14 May, 2020 20:20:41
> Subject: Re: raid6check extremely slow ?

> Dear Roy,
> 
> In message <1999694976.3317399.1589476824607.JavaMail.zimbra@karlsbakk.net> you
> wrote:
>>
>> Try checking with iostat -x to see if one disk is performing worse
>> than the other ones. This sometimes happens and can indicate a
>> failure that the normal SMART/smartctl stuff can't identify. If
>> you see a utilisation of one of the disks at 100%, that's the
>> bastard. Under normal circumstances, you probably won't be able to
>> return that, since it "works". There's a quick fix for that,
>> though. Just unplug the disk, plug it into a power cable, let it
>> spin up and then sharply twist it 90 degrees a few times, and it's
>> all sorted out and you can return it ;)
> 
> Nothing looks suspicious to me - all disks behave the same:
> 
> # iostat -x /dev/sd[efhijklm] 1 3
> Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de)     2020-05-14      _x86_64_        (8 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.19    0.00    1.06    0.15    0.00   98.60
> 
> Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> sde             20.08    360.56     2.53  11.20    0.34    17.95    0.49      0.10     0.02   3.41   32.36     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   32.74    0.02   2.11
> sdf             20.07    360.56     2.54  11.24    0.33    17.96    0.49      0.10     0.02   3.40   44.23     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   44.77    0.02   2.09
> sdh             20.08    360.54     2.53  11.17    0.35    17.95    0.49      0.10     0.02   3.40   43.47     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   44.01    0.02   2.40
> sdi             20.08    360.58     2.54  11.23    0.34    17.96    0.49      0.10     0.02   3.40   26.22     0.21    0.00      0.00     0.00   0.00    0.00     0.00    0.49   26.50    0.01   2.84
> sdj             20.45    360.56     2.16   9.54    0.34    17.63    0.49      0.10     0.02   3.38   35.19     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   35.60    0.02   2.46
> sdk             20.08    360.54     2.53  11.21    0.35    17.95    0.49      0.10     0.02   3.42   40.63     0.21    0.00      0.00     0.00   0.00    0.00     0.00    0.49   41.13    0.02   2.36
> sdl             20.07    360.57     2.54  11.24    0.34    17.96    0.49      0.10     0.02   3.39   23.61     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   23.84    0.01   2.70
> sdm             20.08    360.55     2.53  11.21    0.53    17.96    0.49      0.10     0.02   3.41   21.52     0.20    0.00      0.00     0.00   0.00    0.00     0.00    0.49   21.67    0.01   2.64
> 
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.38    0.00    1.12    0.12    0.00   98.38
> 
> Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> sde             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> sdf             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> sdh             20.00    320.00     0.00   0.00    0.30    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> sdi             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
> sdj             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> sdk             20.00    320.00     0.00   0.00    0.30    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> sdl             20.00    320.00     0.00   0.00    0.25    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
> sdm             20.00    320.00     0.00   0.00    0.35    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
> 
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.25    0.00    0.88    0.00    0.00   98.87
> 
> Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> sde             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
> sdf             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
> sdh             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.30
> sdi             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.00
> sdj             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
> sdk             21.00    336.00     0.00   0.00    0.24    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
> sdl             21.00    336.00     0.00   0.00    0.29    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   4.20
> sdm             21.00    336.00     0.00   0.00    0.38    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.10
> 
> 
> 
> Best regards,
> 
> Wolfgang Denk
> 
> --
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> We see things not as they are, but as we are.       - H. M. Tomlinson

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-14 19:51     ` Roy Sigurd Karlsbakk
@ 2020-05-15  8:08       ` Wolfgang Denk
  0 siblings, 0 replies; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-15  8:08 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Linux Raid

Dear Roy Sigurd Karlsbakk,

In message <1430936688.3381175.1589485881380.JavaMail.zimbra@karlsbakk.net> you wrote:
> what?

You suggested: "Try checking with iostat -x to see if one disk is
performing worse than the other ones."

The output of "iostat -x" which I posted shows clearly that all disks
behave very much the same - there are just minimal statistical
fluctuations, and those again are equally distributed over all 8 disks.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I used to be indecisive, now I'm not sure.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-13  6:07             ` Wolfgang Denk
@ 2020-05-15 10:34               ` Andrey Jr. Melnikov
  2020-05-15 11:54                 ` Wolfgang Denk
  0 siblings, 1 reply; 38+ messages in thread
From: Andrey Jr. Melnikov @ 2020-05-15 10:34 UTC (permalink / raw)
  To: linux-raid

Wolfgang Denk <wd@denx.de> wrote:
> Dear Piergiorgio,

> In message <20200512160712.GB7261@lazy.lzy> you wrote:
> >
> > > BTW, seems there are build problems for raid6check ...
> ...
> > I cannot see this problem.
> > I could compile without issue.
> > Maybe some library is missing somewhere,
> > but I'm not sure where.

> ...
> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
> /usr/bin/ld: sysfs.o: in function `sysfsline':
> sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
> /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
> /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'

raid6check is missing the util.o object. Add it to CHECK_OBJS.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-15 10:34               ` Andrey Jr. Melnikov
@ 2020-05-15 11:54                 ` Wolfgang Denk
  2020-05-15 12:58                   ` Guoqing Jiang
  0 siblings, 1 reply; 38+ messages in thread
From: Wolfgang Denk @ 2020-05-15 11:54 UTC (permalink / raw)
  To: Andrey Jr. Melnikov; +Cc: linux-raid

Dear "Andrey Jr. Melnikov",

In message <sq72pg-98v.ln1@banana.localnet> you wrote:
>
> > ...
> > gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
> > /usr/bin/ld: sysfs.o: in function `sysfsline':
> > sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
> > /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
> > /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
>
> raid6check miss util.o object. Add it to CHECK_OBJS

This makes things just worse.  With this, I get:

...
gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\"  -o util.o -c util.c
gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o
/usr/bin/ld: util.o: in function `mdadm_version':
util.c:(.text+0x702): undefined reference to `Version'
/usr/bin/ld: util.o: in function `fname_from_uuid':
util.c:(.text+0xdce): undefined reference to `super1'
/usr/bin/ld: util.o: in function `is_subarray_active':
util.c:(.text+0x30b3): undefined reference to `mdstat_read'
/usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat'
/usr/bin/ld: util.o: in function `flush_metadata_updates':
util.c:(.text+0x3ad3): undefined reference to `connect_monitor'
/usr/bin/ld: util.c:(.text+0x3af1): undefined reference to `send_message'
/usr/bin/ld: util.c:(.text+0x3afb): undefined reference to `wait_reply'
/usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack'
/usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply'
/usr/bin/ld: util.o: in function `container_choose_spares':
util.c:(.text+0x3c84): undefined reference to `devid_policy'
/usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain'
/usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add'
/usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test'
/usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free'
/usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free'
/usr/bin/ld: util.o: in function `set_cmap_hooks':
util.c:(.text+0x3f80): undefined reference to `dlopen'
/usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym'
/usr/bin/ld: util.o: in function `set_dlm_hooks':
util.c:(.text+0x4310): undefined reference to `dlopen'
/usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym'
/usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym'
/usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow
/usr/bin/ld: util.o: in function `set_cmap_hooks':
util.c:(.text+0x3fed): undefined reference to `dlclose'
/usr/bin/ld: util.o: in function `set_dlm_hooks':
util.c:(.text+0x43e5): undefined reference to `dlclose'
/usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0'
/usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1'
/usr/bin/ld: util.o:(.data+0x10): undefined reference to `super_ddf'
/usr/bin/ld: util.o:(.data+0x18): undefined reference to `super_imsm'
/usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr'
/usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt'
collect2: error: ld returned 1 exit status
make: *** [Makefile:221: raid6check] Error 1


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Ninety-Ninety Rule of Project Schedules:
        The first ninety percent of the task takes ninety percent of
the time, and the last ten percent takes the other ninety percent.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: raid6check extremely slow ?
  2020-05-15 11:54                 ` Wolfgang Denk
@ 2020-05-15 12:58                   ` Guoqing Jiang
  0 siblings, 0 replies; 38+ messages in thread
From: Guoqing Jiang @ 2020-05-15 12:58 UTC (permalink / raw)
  To: Wolfgang Denk, Andrey Jr. Melnikov; +Cc: linux-raid

On 5/15/20 1:54 PM, Wolfgang Denk wrote:
> Dear "Andrey Jr. Melnikov",
>
> In message <sq72pg-98v.ln1@banana.localnet> you wrote:
>>> ...
>>> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
>>> /usr/bin/ld: sysfs.o: in function `sysfsline':
>>> sysfs.c:(.text+0x2707): undefined reference to `parse_uuid'
>>> /usr/bin/ld: sysfs.c:(.text+0x271a): undefined reference to `uuid_zero'
>>> /usr/bin/ld: sysfs.c:(.text+0x2721): undefined reference to `uuid_zero'
>> raid6check is missing the util.o object. Add it to CHECK_OBJS.
> This makes things just worse.  With this, I get:
>
> ...
> gcc -Wall -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-77-g3b7aae9\" -DVERS_DATE="\"2020-05-14\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\"  -o util.o -c util.c
> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o util.o
> /usr/bin/ld: util.o: in function `mdadm_version':
> util.c:(.text+0x702): undefined reference to `Version'
> /usr/bin/ld: util.o: in function `fname_from_uuid':
> util.c:(.text+0xdce): undefined reference to `super1'
> /usr/bin/ld: util.o: in function `is_subarray_active':
> util.c:(.text+0x30b3): undefined reference to `mdstat_read'
> /usr/bin/ld: util.c:(.text+0x3122): undefined reference to `free_mdstat'
> /usr/bin/ld: util.o: in function `flush_metadata_updates':
> util.c:(.text+0x3ad3): undefined reference to `connect_monitor'
> /usr/bin/ld: util.c:(.text+0x3af1): undefined reference to `send_message'
> /usr/bin/ld: util.c:(.text+0x3afb): undefined reference to `wait_reply'
> /usr/bin/ld: util.c:(.text+0x3b1f): undefined reference to `ack'
> /usr/bin/ld: util.c:(.text+0x3b29): undefined reference to `wait_reply'
> /usr/bin/ld: util.o: in function `container_choose_spares':
> util.c:(.text+0x3c84): undefined reference to `devid_policy'
> /usr/bin/ld: util.c:(.text+0x3c9b): undefined reference to `pol_domain'
> /usr/bin/ld: util.c:(.text+0x3caa): undefined reference to `pol_add'
> /usr/bin/ld: util.c:(.text+0x3cbc): undefined reference to `domain_test'
> /usr/bin/ld: util.c:(.text+0x3ccb): undefined reference to `dev_policy_free'
> /usr/bin/ld: util.c:(.text+0x3d11): undefined reference to `dev_policy_free'
> /usr/bin/ld: util.o: in function `set_cmap_hooks':
> util.c:(.text+0x3f80): undefined reference to `dlopen'
> /usr/bin/ld: util.c:(.text+0x3f9c): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x3fad): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x3fbe): undefined reference to `dlsym'
> /usr/bin/ld: util.o: in function `set_dlm_hooks':
> util.c:(.text+0x4310): undefined reference to `dlopen'
> /usr/bin/ld: util.c:(.text+0x4330): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4341): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4352): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4363): undefined reference to `dlsym'
> /usr/bin/ld: util.c:(.text+0x4374): undefined reference to `dlsym'
> /usr/bin/ld: util.o:util.c:(.text+0x4385): more undefined references to `dlsym' follow
> /usr/bin/ld: util.o: in function `set_cmap_hooks':
> util.c:(.text+0x3fed): undefined reference to `dlclose'
> /usr/bin/ld: util.o: in function `set_dlm_hooks':
> util.c:(.text+0x43e5): undefined reference to `dlclose'
> /usr/bin/ld: util.o:(.data+0x0): undefined reference to `super0'
> /usr/bin/ld: util.o:(.data+0x8): undefined reference to `super1'
> /usr/bin/ld: util.o:(.data+0x10): undefined reference to `super_ddf'
> /usr/bin/ld: util.o:(.data+0x18): undefined reference to `super_imsm'
> /usr/bin/ld: util.o:(.data+0x20): undefined reference to `mbr'
> /usr/bin/ld: util.o:(.data+0x28): undefined reference to `gpt'
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:221: raid6check] Error 1
>

I think we need a new uuid.c, separated from util.c, to address
the issue; I will send a patch for it later.
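
For context, the idea is to move the UUID helpers that sysfs.c needs
into their own translation unit, so raid6check can link them without
dragging in the rest of util.c (superblock handlers, dlopen hooks,
mdmon messaging, and so on). A minimal sketch of such a file follows;
the prototypes are guesses based on how the symbols are used in the
link errors above, not mdadm's actual declarations.

    /* uuid.c (sketch) -- standalone UUID helpers for raid6check */
    #include <stdio.h>
    #include <stdlib.h>

    int uuid_zero[4] = { 0, 0, 0, 0 };

    /* Parse "aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" (':' or '.' separated)
     * into four 32-bit words; return 1 on success, 0 on failure. */
    int parse_uuid(const char *str, int uuid[4])
    {
        for (int i = 0; i < 4; i++) {
            char *end;
            unsigned long v = strtoul(str, &end, 16);

            if (end == str)
                return 0;
            uuid[i] = (int)v;
            if (i < 3) {
                if (*end != ':' && *end != '.')
                    return 0;
                str = end + 1;
            } else if (*end != '\0') {
                return 0;
            }
        }
        return 1;
    }

    /* Small self-test so the sketch is runnable on its own. */
    int main(void)
    {
        int uuid[4];

        if (parse_uuid("00112233:44556677:8899aabb:ccddeeff", uuid))
            printf("%08x:%08x:%08x:%08x\n",
                   (unsigned)uuid[0], (unsigned)uuid[1],
                   (unsigned)uuid[2], (unsigned)uuid[3]);
        return 0;
    }

Linking raid6check against a file like this instead of the whole util.o
would resolve the parse_uuid/uuid_zero references without pulling in the
unrelated symbols listed in the errors above.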

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-05-15 12:58 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
2020-05-11  6:33   ` Wolfgang Denk
2020-05-10 22:16 ` Guoqing Jiang
2020-05-11  6:40   ` Wolfgang Denk
2020-05-11  8:58     ` Guoqing Jiang
2020-05-11 15:39       ` Piergiorgio Sartor
2020-05-12  7:37         ` Wolfgang Denk
2020-05-12 16:17           ` Piergiorgio Sartor
2020-05-13  6:13             ` Wolfgang Denk
2020-05-13 16:22               ` Piergiorgio Sartor
2020-05-11 16:14       ` Piergiorgio Sartor
2020-05-11 20:53         ` Giuseppe Bilotta
2020-05-11 21:12           ` Guoqing Jiang
2020-05-11 21:16             ` Guoqing Jiang
2020-05-12  1:52               ` Giuseppe Bilotta
2020-05-12  6:27                 ` Adam Goryachev
2020-05-12 16:11                   ` Piergiorgio Sartor
2020-05-12 16:05           ` Piergiorgio Sartor
2020-05-11 21:07         ` Guoqing Jiang
2020-05-11 22:44           ` Peter Grandi
2020-05-12 16:09             ` Piergiorgio Sartor
2020-05-12 20:54               ` antlists
2020-05-13 16:18                 ` Piergiorgio Sartor
2020-05-13 17:37                   ` Wols Lists
2020-05-13 18:23                     ` Piergiorgio Sartor
2020-05-12 16:07           ` Piergiorgio Sartor
2020-05-12 18:16             ` Guoqing Jiang
2020-05-12 18:32               ` Piergiorgio Sartor
2020-05-13  6:18                 ` Wolfgang Denk
2020-05-13  6:07             ` Wolfgang Denk
2020-05-15 10:34               ` Andrey Jr. Melnikov
2020-05-15 11:54                 ` Wolfgang Denk
2020-05-15 12:58                   ` Guoqing Jiang
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2020-05-14 18:20   ` Wolfgang Denk
2020-05-14 19:51     ` Roy Sigurd Karlsbakk
2020-05-15  8:08       ` Wolfgang Denk
