All of lore.kernel.org
 help / color / mirror / Atom feed
From: Coly Li <colyli@suse.de>
To: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Cc: songliubraving@fb.com, linux-raid@vger.kernel.org,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
Date: Fri, 10 Apr 2020 17:36:13 +0800	[thread overview]
Message-ID: <3c878bde-bba8-fa75-15d4-051d826dbcdc@suse.de> (raw)
In-Reply-To: <204e9fd0-3712-4864-2bf5-38913511e658@cloud.ionos.com>

On 2020/4/10 5:38 上午, Guoqing Jiang wrote:
> On 07.04.20 17:09, Coly Li wrote:
>> On 2020/4/5 11:53 下午, Guoqing Jiang wrote:
>>> On 02.04.20 10:13, Coly Li wrote:
>>>> -    scribble = kvmalloc_array(cnt, obj_size, flags);
>>>> +    scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
>>> Maybe it is simpler to call kvmalloc_array between memalloc_noio_save
>>> and memalloc_noio_restore.
>>> And seems sched/mm.h need to be included per the report from LKP.
>> The falgs can be,
>> - GFP_KERNEL: when called from alloc_scratch_buffer()
>> - GFP_NOIO: when called from resize_chunks().
>>
>> If the scope APIs are used inside scribble_alloc(), the first call path
>> is restricted as noio, which is not expected. So I only use the scope
>> APIs around the location where GFP_NOIO is used.
> 
> Thanks for the explanation. I assume it can be distinguished by check
> the flag,
> eg, FYI.
> 
> if (flag == GFP_NOIO)
>     memalloc_noio_save();
> kvmalloc_array();
> if (flag == GFP_NOIO)
>     memalloc_noio_restore();

This was similar to my original idea, but finally I decide to accept
Michal's suggestion to add them into mddev_suspend()/mddev_resume. Let
me explain.

>> Anyway, Michal Hocko suggests to add the scope APIs in
>> mddev_suspend()/mddev_resume(). Then in the whole code path where md
>> raid array is suspend, we don't need to worry the recursive memory
>> reclaim I/Os onto the array. After checking the complicated raid5 code,
>> I come to realize this suggestion makes sense.
> 
> Hmm, mddev_suspend/resume are called at lots of places, then it's better to
> check if all the places don't allocate memory with GFP_KERNEL.
> 

When mddev_suspend() is called, then all I/Os on the suspending raid
array have to wait until mddev_resume() is called. Inside the suspending
region, all memory allocation should be careful to avoid memory reclaim
I/Os going back to the suspending raid array. If such recursive I/Os
happen, we will experience deadlock.

Therefore we should be very careful to use GFP_KERNEL to allocate memory
during the period when the raid array is suspending. No matter
physically continuous or virtually continuous pages.

> And seems In level_store(), sysfs_create_group could be called between
> suspend
> and resume, then the two functions (kstrdup_const(name, GFP_KERNEL) and
> kmem_cache_zalloc(kernfs_node_cache, GFP_KERNEL)) could be triggered by the
> path:
> 
> sysfs_create_group ->internal_create_group -> kernfs_create_dir_ns ->
> kernfs_new_node -> __kernfs_new_node
> 

From your information, the above code path is suspicious. Although the
deadlock might not happen in practice, unless the raid array is used as
the only fs volume, and all memory writeback or swap I/Os will only go
into this array.

Checking all the location of memory allocation is trivial but possible,
how to prevent new code with memory allocation to avoid the potential
deadlock risk during the suspending period is still a problem.

> Not know memalloc_noio_{save,restore} well, but I guess it is better to
> use them
> to mark a small scope, just my two cents.

Since mddev_suspend() is the entry point to prevent I/Os on the array,
it is the good location to restrict memory reclaim I/Os of memory
allocation. Finally I realize this is a brilliant idea after Michal
Hocko suggests me for a while.

Thanks.

-- 

Coly Li

  reply	other threads:[~2020-04-10  9:36 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-02  8:13 [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks() Coly Li
2020-04-03 13:17 ` kbuild test robot
2020-04-03 13:17   ` kbuild test robot
2020-04-05 15:53 ` Guoqing Jiang
2020-04-07 15:09   ` Coly Li
2020-04-09 21:38     ` Guoqing Jiang
2020-04-10  9:36       ` Coly Li [this message]
2020-04-15 11:48       ` Michal Hocko
2020-04-15 14:10         ` Guoqing Jiang
2020-04-15 14:23           ` Michal Hocko
2020-04-15 14:57             ` Guoqing Jiang
2020-04-30  6:36               ` Song Liu
2020-04-05 17:43 ` Song Liu
2020-04-07 14:42   ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3c878bde-bba8-fa75-15d4-051d826dbcdc@suse.de \
    --to=colyli@suse.de \
    --cc=guoqing.jiang@cloud.ionos.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.