From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: Wolfgang Denk <wd@denx.de>, linux-raid@vger.kernel.org
Subject: Re: raid6check extremely slow ?
Date: Tue, 12 May 2020 20:16:27 +0200	[thread overview]
Message-ID: <e24b0703-a599-45ef-f6b6-0a713cfa414c@cloud.ionos.com> (raw)
In-Reply-To: <20200512160712.GB7261@lazy.lzy>

On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
>> On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
>>> On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
>>>> Hi Wolfgang,
>>>>
>>>>
>>>> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
>>>>> Dear Guoqing Jiang,
>>>>>
>>>>> In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
>>>>>> Seems raid6check is in 'D' state, what are the output of 'cat
>>>>>> /proc/19719/stack' and /proc/mdstat?
>>>>> # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_lo_store+0x50/0xa0
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>
>>>>> [<0>] __wait_rcu_gp+0x10d/0x110
>>>>> [<0>] synchronize_rcu+0x47/0x50
>>>>> [<0>] mddev_suspend+0x4a/0x140
>>>>> [<0>] suspend_hi_store+0x44/0x90
>>>>> [<0>] md_attr_store+0x86/0xe0
>>>>> [<0>] kernfs_fop_write+0xce/0x1b0
>>>>> [<0>] vfs_write+0xb6/0x1a0
>>>>> [<0>] ksys_write+0x4f/0xc0
>>>>> [<0>] do_syscall_64+0x5b/0xf0
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> Looks like raid6check keeps writing to the suspend_lo/hi nodes, which
>>>> causes mddev_suspend to be called, meaning synchronize_rcu and other
>>>> synchronization mechanisms are triggered in that path ...
>>>>
>>>>> Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
>>>>> all the time?  I thought it was _reading_ the disks only?
>>>> I didn't read raid6check before, just find check_stripes has
>>>>
>>>>
>>>>       while (length > 0) {
>>>>               lock_stripe -> write suspend_lo/hi node
>>>>               ...
>>>>               unlock_all_stripes -> write suspend_lo/hi node
>>>>       }
>>>>
>>>> I think that explains the stack of raid6check, and maybe that is just
>>>> how raid6check works: lock the stripe, check the stripe, then unlock
>>>> it ... just my guess ...
>>> Hi again!
>>>
>>> I made a quick test.
>>> I disabled the lock / unlock in raid6check.
>>>
>>> With lock / unlock, I get around 1.2MB/sec
>>> per device component, with ~13% CPU load.
>>> Without lock / unlock, I get around 15.5MB/sec
>>> per device component, with ~30% CPU load.
>>>
>>> So, it seems the lock / unlock mechanism is
>>> quite expensive.
>> Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
>>
>>> I'm not sure what's the best solution, since
>>> we still need to avoid race conditions.
>> I guess there are two possible ways:
>>
>> 1. Per your previous reply, only call raid6check when array is RO, then
>> we don't need the lock.
>>
>> 2. Investigate whether it is possible to acquire the stripe lock in
>> suspend_lo/hi_store, to avoid the race between raid6check and writes to
>> the same stripe. IOW, try fine-grained protection instead of calling the
>> expensive suspend/resume in suspend_lo/hi_store. But I am not sure yet
>> whether that is doable.
> Could you please elaborate on the
> "fine grained protection" thing?

raid6check does check and lock stripes one by one, but things are different
in kernel space: locking one stripe triggers mddev_suspend and mddev_resume,
which affect all stripes ...

If the kernel exposed an interface to actually lock a single stripe, then
raid6check could use it to lock only that one stripe (this is what I call
fine grained) instead of triggering the time-consuming suspend/resume.

>   
>> BTW, seems there are build problems for raid6check ...
>>
>> mdadm$ make raid6check
>> gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
>> -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
>> -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
>> -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
>> -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
>> -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
>> -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
>> gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
>> xmalloc.o dlink.o
>> sysfs.o: In function `sysfsline':
>> sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
>> sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
>> sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
>> collect2: error: ld returned 1 exit status
>> Makefile:220: recipe for target 'raid6check' failed
>> make: *** [raid6check] Error 1
> I cannot see this problem.
> I could compile without issue.
> Maybe some library is missing somewhere,
> but I'm not sure where.

Did you try with the latest mdadm tree? But it could be an environment issue ...

Thanks,
Guoqing

Thread overview: 38+ messages
2020-05-10 12:07 raid6check extremely slow ? Wolfgang Denk
2020-05-10 13:26 ` Piergiorgio Sartor
2020-05-11  6:33   ` Wolfgang Denk
2020-05-10 22:16 ` Guoqing Jiang
2020-05-11  6:40   ` Wolfgang Denk
2020-05-11  8:58     ` Guoqing Jiang
2020-05-11 15:39       ` Piergiorgio Sartor
2020-05-12  7:37         ` Wolfgang Denk
2020-05-12 16:17           ` Piergiorgio Sartor
2020-05-13  6:13             ` Wolfgang Denk
2020-05-13 16:22               ` Piergiorgio Sartor
2020-05-11 16:14       ` Piergiorgio Sartor
2020-05-11 20:53         ` Giuseppe Bilotta
2020-05-11 21:12           ` Guoqing Jiang
2020-05-11 21:16             ` Guoqing Jiang
2020-05-12  1:52               ` Giuseppe Bilotta
2020-05-12  6:27                 ` Adam Goryachev
2020-05-12 16:11                   ` Piergiorgio Sartor
2020-05-12 16:05           ` Piergiorgio Sartor
2020-05-11 21:07         ` Guoqing Jiang
2020-05-11 22:44           ` Peter Grandi
2020-05-12 16:09             ` Piergiorgio Sartor
2020-05-12 20:54               ` antlists
2020-05-13 16:18                 ` Piergiorgio Sartor
2020-05-13 17:37                   ` Wols Lists
2020-05-13 18:23                     ` Piergiorgio Sartor
2020-05-12 16:07           ` Piergiorgio Sartor
2020-05-12 18:16             ` Guoqing Jiang [this message]
2020-05-12 18:32               ` Piergiorgio Sartor
2020-05-13  6:18                 ` Wolfgang Denk
2020-05-13  6:07             ` Wolfgang Denk
2020-05-15 10:34               ` Andrey Jr. Melnikov
2020-05-15 11:54                 ` Wolfgang Denk
2020-05-15 12:58                   ` Guoqing Jiang
2020-05-14 17:20 ` Roy Sigurd Karlsbakk
2020-05-14 18:20   ` Wolfgang Denk
2020-05-14 19:51     ` Roy Sigurd Karlsbakk
2020-05-15  8:08       ` Wolfgang Denk
