All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
@ 2020-04-02  8:13 Coly Li
  2020-04-03 13:17   ` kbuild test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Coly Li @ 2020-04-02  8:13 UTC (permalink / raw)
  To: songliubraving
  Cc: linux-raid, guoqing.jiang, Coly Li, Kent Overstreet, Michal Hocko

Commit b330e6a49dc3 ("md: convert to kvmalloc") uses kvmalloc_array()
to allocate memory with GFP_NOIO flag in resize_chunks() via function
scribble_alloc(),
2269	err = scribble_alloc(percpu, new_disks,
2270			     new_sectors / STRIPE_SECTORS,
2271			     GFP_NOIO);

The purpose of GFP_NOIO flag to kvmalloc_array() is to allocate
non-physically continuous pages and avoid extra I/Os of page reclaim
which triggered by memory allocation. When system memory is under
heavy pressure, non-physically continuous pages allocation is more
probably to success than allocating physically continuous pages.

But as a non GFP_KERNEL compatible flag, GFP_NOIO is not acceptible
by kvmalloc_node() and the memory allocation indeed is handled with
kmalloc_node() to allocate physically continuous pages. This is not
the expected behavior of the original purpose when mistakenly using
GFP_NOIO flag.

In this patch, the memalloc scope APIs memalloc_noio_save() and
memalloc_noio_restore() are used when calling scribble_alloc(). Then
when calling kvmalloc_array() with GFP_KERNEL mask, the scope APIs
may indicatet the allocating context to avoid memory reclaim related
I/Os, to avoid recursive I/O deadlock on the md raid array itself
which is calling scribble_alloc() to allocate non-physically continuous
pages.

This patch also removes gfp_t flags from scribble_alloc() parameters
list, because the invalid GFP_NOIO is replaced by memalloc scope APIs.

Fixes: b330e6a49dc3 ("md: convert to kvmalloc")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
---
 drivers/md/raid5.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ba00e9877f02..6b23f8aba169 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2228,14 +2228,15 @@ static int grow_stripes(struct r5conf *conf, int num)
  * of the P and Q blocks.
  */
 static int scribble_alloc(struct raid5_percpu *percpu,
-			  int num, int cnt, gfp_t flags)
+			  int num, int cnt)
 {
 	size_t obj_size =
 		sizeof(struct page *) * (num+2) +
 		sizeof(addr_conv_t) * (num+2);
 	void *scribble;
+	unsigned int noio_flag;
 
-	scribble = kvmalloc_array(cnt, obj_size, flags);
+	scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
 	if (!scribble)
 		return -ENOMEM;
 
@@ -2250,6 +2251,7 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 {
 	unsigned long cpu;
 	int err = 0;
+	unsigned int noio_flag;
 
 	/*
 	 * Never shrink. And mddev_suspend() could deadlock if this is called
@@ -2262,16 +2264,25 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	mddev_suspend(conf->mddev);
 	get_online_cpus();
 
+	/*
+	 * scribble_alloc() allocates memory by kvmalloc_array(), if
+	 * the memory allocation triggers memory reclaim I/Os onto
+	 * this raid array, there might be potential deadlock if this
+	 * raid array happens to be suspended during memory allocation.
+	 * Here the scope APIs are used to disable such recursive memory
+	 * reclaim I/Os.
+	 */
+	noio_flag = memalloc_noio_save();
 	for_each_present_cpu(cpu) {
 		struct raid5_percpu *percpu;
 
 		percpu = per_cpu_ptr(conf->percpu, cpu);
 		err = scribble_alloc(percpu, new_disks,
-				     new_sectors / STRIPE_SECTORS,
-				     GFP_NOIO);
+				     new_sectors / STRIPE_SECTORS);
 		if (err)
 			break;
 	}
+	memalloc_noio_restore(noio_flag);
 
 	put_online_cpus();
 	mddev_resume(conf->mddev);
@@ -6759,8 +6770,7 @@ static int alloc_scratch_buffer(struct r5conf *conf, struct raid5_percpu *percpu
 			       conf->previous_raid_disks),
 			   max(conf->chunk_sectors,
 			       conf->prev_chunk_sectors)
-			   / STRIPE_SECTORS,
-			   GFP_KERNEL)) {
+			   / STRIPE_SECTORS)) {
 		free_scratch_buffer(conf, percpu);
 		return -ENOMEM;
 	}
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-02  8:13 [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks() Coly Li
@ 2020-04-03 13:17   ` kbuild test robot
  2020-04-05 15:53 ` Guoqing Jiang
  2020-04-05 17:43 ` Song Liu
  2 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-04-03 13:17 UTC (permalink / raw)
  Cc: kbuild-all, songliubraving, linux-raid, guoqing.jiang, Coly Li,
	Kent Overstreet, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 3509 bytes --]

Hi Coly,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.6 next-20200403]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Coly-Li/raid5-use-memalloc_noio_save-restore-in-resize_chunks/20200402-215014
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 919dce24701f7b34681a6a1d3ef95c9f6c4fb1cc
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/md/raid5.c: In function 'scribble_alloc':
   drivers/md/raid5.c:2237:15: warning: unused variable 'noio_flag' [-Wunused-variable]
     unsigned int noio_flag;
                  ^~~~~~~~~
   drivers/md/raid5.c: In function 'resize_chunks':
>> drivers/md/raid5.c:2275:14: error: implicit declaration of function 'memalloc_noio_save'; did you mean 'vmalloc_to_page'? [-Werror=implicit-function-declaration]
     noio_flag = memalloc_noio_save();
                 ^~~~~~~~~~~~~~~~~~
                 vmalloc_to_page
>> drivers/md/raid5.c:2285:2: error: implicit declaration of function 'memalloc_noio_restore'; did you mean '__vmalloc_node_range'? [-Werror=implicit-function-declaration]
     memalloc_noio_restore(noio_flag);
     ^~~~~~~~~~~~~~~~~~~~~
     __vmalloc_node_range
   cc1: some warnings being treated as errors

vim +2275 drivers/md/raid5.c

  2249	
  2250	static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
  2251	{
  2252		unsigned long cpu;
  2253		int err = 0;
  2254		unsigned int noio_flag;
  2255	
  2256		/*
  2257		 * Never shrink. And mddev_suspend() could deadlock if this is called
  2258		 * from raid5d. In that case, scribble_disks and scribble_sectors
  2259		 * should equal to new_disks and new_sectors
  2260		 */
  2261		if (conf->scribble_disks >= new_disks &&
  2262		    conf->scribble_sectors >= new_sectors)
  2263			return 0;
  2264		mddev_suspend(conf->mddev);
  2265		get_online_cpus();
  2266	
  2267		/*
  2268		 * scribble_alloc() allocates memory by kvmalloc_array(), if
  2269		 * the memory allocation triggers memory reclaim I/Os onto
  2270		 * this raid array, there might be potential deadlock if this
  2271		 * raid array happens to be suspended during memory allocation.
  2272		 * Here the scope APIs are used to disable such recursive memory
  2273		 * reclaim I/Os.
  2274		 */
> 2275		noio_flag = memalloc_noio_save();
  2276		for_each_present_cpu(cpu) {
  2277			struct raid5_percpu *percpu;
  2278	
  2279			percpu = per_cpu_ptr(conf->percpu, cpu);
  2280			err = scribble_alloc(percpu, new_disks,
  2281					     new_sectors / STRIPE_SECTORS);
  2282			if (err)
  2283				break;
  2284		}
> 2285		memalloc_noio_restore(noio_flag);
  2286	
  2287		put_online_cpus();
  2288		mddev_resume(conf->mddev);
  2289		if (!err) {
  2290			conf->scribble_disks = new_disks;
  2291			conf->scribble_sectors = new_sectors;
  2292		}
  2293		return err;
  2294	}
  2295	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 44475 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
@ 2020-04-03 13:17   ` kbuild test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-04-03 13:17 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3602 bytes --]

Hi Coly,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.6 next-20200403]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Coly-Li/raid5-use-memalloc_noio_save-restore-in-resize_chunks/20200402-215014
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 919dce24701f7b34681a6a1d3ef95c9f6c4fb1cc
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/md/raid5.c: In function 'scribble_alloc':
   drivers/md/raid5.c:2237:15: warning: unused variable 'noio_flag' [-Wunused-variable]
     unsigned int noio_flag;
                  ^~~~~~~~~
   drivers/md/raid5.c: In function 'resize_chunks':
>> drivers/md/raid5.c:2275:14: error: implicit declaration of function 'memalloc_noio_save'; did you mean 'vmalloc_to_page'? [-Werror=implicit-function-declaration]
     noio_flag = memalloc_noio_save();
                 ^~~~~~~~~~~~~~~~~~
                 vmalloc_to_page
>> drivers/md/raid5.c:2285:2: error: implicit declaration of function 'memalloc_noio_restore'; did you mean '__vmalloc_node_range'? [-Werror=implicit-function-declaration]
     memalloc_noio_restore(noio_flag);
     ^~~~~~~~~~~~~~~~~~~~~
     __vmalloc_node_range
   cc1: some warnings being treated as errors

vim +2275 drivers/md/raid5.c

  2249	
  2250	static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
  2251	{
  2252		unsigned long cpu;
  2253		int err = 0;
  2254		unsigned int noio_flag;
  2255	
  2256		/*
  2257		 * Never shrink. And mddev_suspend() could deadlock if this is called
  2258		 * from raid5d. In that case, scribble_disks and scribble_sectors
  2259		 * should equal to new_disks and new_sectors
  2260		 */
  2261		if (conf->scribble_disks >= new_disks &&
  2262		    conf->scribble_sectors >= new_sectors)
  2263			return 0;
  2264		mddev_suspend(conf->mddev);
  2265		get_online_cpus();
  2266	
  2267		/*
  2268		 * scribble_alloc() allocates memory by kvmalloc_array(), if
  2269		 * the memory allocation triggers memory reclaim I/Os onto
  2270		 * this raid array, there might be potential deadlock if this
  2271		 * raid array happens to be suspended during memory allocation.
  2272		 * Here the scope APIs are used to disable such recursive memory
  2273		 * reclaim I/Os.
  2274		 */
> 2275		noio_flag = memalloc_noio_save();
  2276		for_each_present_cpu(cpu) {
  2277			struct raid5_percpu *percpu;
  2278	
  2279			percpu = per_cpu_ptr(conf->percpu, cpu);
  2280			err = scribble_alloc(percpu, new_disks,
  2281					     new_sectors / STRIPE_SECTORS);
  2282			if (err)
  2283				break;
  2284		}
> 2285		memalloc_noio_restore(noio_flag);
  2286	
  2287		put_online_cpus();
  2288		mddev_resume(conf->mddev);
  2289		if (!err) {
  2290			conf->scribble_disks = new_disks;
  2291			conf->scribble_sectors = new_sectors;
  2292		}
  2293		return err;
  2294	}
  2295	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 44475 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-02  8:13 [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks() Coly Li
  2020-04-03 13:17   ` kbuild test robot
@ 2020-04-05 15:53 ` Guoqing Jiang
  2020-04-07 15:09   ` Coly Li
  2020-04-05 17:43 ` Song Liu
  2 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2020-04-05 15:53 UTC (permalink / raw)
  To: Coly Li, songliubraving; +Cc: linux-raid, Kent Overstreet, Michal Hocko

On 02.04.20 10:13, Coly Li wrote:
> -	scribble = kvmalloc_array(cnt, obj_size, flags);
> +	scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);

Maybe it is simpler to call kvmalloc_array between memalloc_noio_save 
and memalloc_noio_restore.
And seems sched/mm.h need to be included per the report from LKP.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-02  8:13 [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks() Coly Li
  2020-04-03 13:17   ` kbuild test robot
  2020-04-05 15:53 ` Guoqing Jiang
@ 2020-04-05 17:43 ` Song Liu
  2020-04-07 14:42   ` Coly Li
  2 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2020-04-05 17:43 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-raid, guoqing.jiang, Kent Overstreet, Michal Hocko

Hi Coly,

Thanks for the patch!

> On Apr 2, 2020, at 1:13 AM, Coly Li <colyli@suse.de> wrote:
> 
> Commit b330e6a49dc3 ("md: convert to kvmalloc") uses kvmalloc_array()
> to allocate memory with GFP_NOIO flag in resize_chunks() via function
> scribble_alloc(),
> 2269	err = scribble_alloc(percpu, new_disks,
> 2270			     new_sectors / STRIPE_SECTORS,
> 2271			     GFP_NOIO);
> 
> The purpose of GFP_NOIO flag to kvmalloc_array() is to allocate
> non-physically continuous pages and avoid extra I/Os of page reclaim
> which triggered by memory allocation. When system memory is under
> heavy pressure, non-physically continuous pages allocation is more
> probably to success than allocating physically continuous pages.
> 
> But as a non GFP_KERNEL compatible flag, GFP_NOIO is not acceptible
> by kvmalloc_node() and the memory allocation indeed is handled with
> kmalloc_node() to allocate physically continuous pages. This is not
> the expected behavior of the original purpose when mistakenly using
> GFP_NOIO flag.
> 
> In this patch, the memalloc scope APIs memalloc_noio_save() and
> memalloc_noio_restore() are used when calling scribble_alloc(). Then
> when calling kvmalloc_array() with GFP_KERNEL mask, the scope APIs
> may indicatet the allocating context to avoid memory reclaim related
> I/Os, to avoid recursive I/O deadlock on the md raid array itself
> which is calling scribble_alloc() to allocate non-physically continuous
> pages.
> 
> This patch also removes gfp_t flags from scribble_alloc() parameters
> list, because the invalid GFP_NOIO is replaced by memalloc scope APIs.
> 
> Fixes: b330e6a49dc3 ("md: convert to kvmalloc")
> Signed-off-by: Coly Li <colyli@suse.de>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
> drivers/md/raid5.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index ba00e9877f02..6b23f8aba169 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2228,14 +2228,15 @@ static int grow_stripes(struct r5conf *conf, int num)
>  * of the P and Q blocks.
>  */

I noticed the comment before scribble_alloc() is outdated. Maybe
fix also that as we are on it? 

> static int scribble_alloc(struct raid5_percpu *percpu,
> -			  int num, int cnt, gfp_t flags)
> +			  int num, int cnt)
> {
> 	size_t obj_size =
> 		sizeof(struct page *) * (num+2) +
> 		sizeof(addr_conv_t) * (num+2);
> 	void *scribble;
> +	unsigned int noio_flag;
I think we don't use noio_flag in scribble_alloc()? 

Thanks,
Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-05 17:43 ` Song Liu
@ 2020-04-07 14:42   ` Coly Li
  0 siblings, 0 replies; 14+ messages in thread
From: Coly Li @ 2020-04-07 14:42 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, guoqing.jiang, Kent Overstreet, Michal Hocko

On 2020/4/6 1:43 上午, Song Liu wrote:
> Hi Coly,
> 
> Thanks for the patch!
> 
>> On Apr 2, 2020, at 1:13 AM, Coly Li <colyli@suse.de> wrote:
>>
>> Commit b330e6a49dc3 ("md: convert to kvmalloc") uses kvmalloc_array()
>> to allocate memory with GFP_NOIO flag in resize_chunks() via function
>> scribble_alloc(),
>> 2269	err = scribble_alloc(percpu, new_disks,
>> 2270			     new_sectors / STRIPE_SECTORS,
>> 2271			     GFP_NOIO);
>>
>> The purpose of GFP_NOIO flag to kvmalloc_array() is to allocate
>> non-physically continuous pages and avoid extra I/Os of page reclaim
>> which triggered by memory allocation. When system memory is under
>> heavy pressure, non-physically continuous pages allocation is more
>> probably to success than allocating physically continuous pages.
>>
>> But as a non GFP_KERNEL compatible flag, GFP_NOIO is not acceptible
>> by kvmalloc_node() and the memory allocation indeed is handled with
>> kmalloc_node() to allocate physically continuous pages. This is not
>> the expected behavior of the original purpose when mistakenly using
>> GFP_NOIO flag.
>>
>> In this patch, the memalloc scope APIs memalloc_noio_save() and
>> memalloc_noio_restore() are used when calling scribble_alloc(). Then
>> when calling kvmalloc_array() with GFP_KERNEL mask, the scope APIs
>> may indicatet the allocating context to avoid memory reclaim related
>> I/Os, to avoid recursive I/O deadlock on the md raid array itself
>> which is calling scribble_alloc() to allocate non-physically continuous
>> pages.
>>
>> This patch also removes gfp_t flags from scribble_alloc() parameters
>> list, because the invalid GFP_NOIO is replaced by memalloc scope APIs.
>>
>> Fixes: b330e6a49dc3 ("md: convert to kvmalloc")
>> Signed-off-by: Coly Li <colyli@suse.de>
>> Cc: Kent Overstreet <kent.overstreet@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> ---
>> drivers/md/raid5.c | 22 ++++++++++++++++------
>> 1 file changed, 16 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index ba00e9877f02..6b23f8aba169 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -2228,14 +2228,15 @@ static int grow_stripes(struct r5conf *conf, int num)
>>  * of the P and Q blocks.
>>  */
> 
> I noticed the comment before scribble_alloc() is outdated. Maybe
> fix also that as we are on it? 
> 

OK, I will do that.

>> static int scribble_alloc(struct raid5_percpu *percpu,
>> -			  int num, int cnt, gfp_t flags)
>> +			  int num, int cnt)
>> {
>> 	size_t obj_size =
>> 		sizeof(struct page *) * (num+2) +
>> 		sizeof(addr_conv_t) * (num+2);
>> 	void *scribble;
>> +	unsigned int noio_flag;
> I think we don't use noio_flag in scribble_alloc()? 

You are right of cause. It should not be here :-)

Thanks.
-- 

Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-05 15:53 ` Guoqing Jiang
@ 2020-04-07 15:09   ` Coly Li
  2020-04-09 21:38     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Coly Li @ 2020-04-07 15:09 UTC (permalink / raw)
  To: Guoqing Jiang, songliubraving; +Cc: linux-raid, Kent Overstreet, Michal Hocko

On 2020/4/5 11:53 下午, Guoqing Jiang wrote:
> On 02.04.20 10:13, Coly Li wrote:
>> -    scribble = kvmalloc_array(cnt, obj_size, flags);
>> +    scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
> 
> Maybe it is simpler to call kvmalloc_array between memalloc_noio_save
> and memalloc_noio_restore.
> And seems sched/mm.h need to be included per the report from LKP.

The falgs can be,
- GFP_KERNEL: when called from alloc_scratch_buffer()
- GFP_NOIO: when called from resize_chunks().

If the scope APIs are used inside scribble_alloc(), the first call path
is restricted as noio, which is not expected. So I only use the scope
APIs around the location where GFP_NOIO is used.

Anyway, Michal Hocko suggests to add the scope APIs in
mddev_suspend()/mddev_resume(). Then in the whole code path where md
raid array is suspend, we don't need to worry the recursive memory
reclaim I/Os onto the array. After checking the complicated raid5 code,
I come to realize this suggestion makes sense.

I will try to compose a v2 patch following Michal's suggestion.
-- 

Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-07 15:09   ` Coly Li
@ 2020-04-09 21:38     ` Guoqing Jiang
  2020-04-10  9:36       ` Coly Li
  2020-04-15 11:48       ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Guoqing Jiang @ 2020-04-09 21:38 UTC (permalink / raw)
  To: Coly Li, songliubraving; +Cc: linux-raid, Kent Overstreet, Michal Hocko

On 07.04.20 17:09, Coly Li wrote:
> On 2020/4/5 11:53 下午, Guoqing Jiang wrote:
>> On 02.04.20 10:13, Coly Li wrote:
>>> -    scribble = kvmalloc_array(cnt, obj_size, flags);
>>> +    scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
>> Maybe it is simpler to call kvmalloc_array between memalloc_noio_save
>> and memalloc_noio_restore.
>> And seems sched/mm.h need to be included per the report from LKP.
> The falgs can be,
> - GFP_KERNEL: when called from alloc_scratch_buffer()
> - GFP_NOIO: when called from resize_chunks().
>
> If the scope APIs are used inside scribble_alloc(), the first call path
> is restricted as noio, which is not expected. So I only use the scope
> APIs around the location where GFP_NOIO is used.

Thanks for the explanation. I assume it can be distinguished by check 
the flag,
eg, FYI.

if (flag == GFP_NOIO)
     memalloc_noio_save();
kvmalloc_array();
if (flag == GFP_NOIO)
     memalloc_noio_restore();
> Anyway, Michal Hocko suggests to add the scope APIs in
> mddev_suspend()/mddev_resume(). Then in the whole code path where md
> raid array is suspend, we don't need to worry the recursive memory
> reclaim I/Os onto the array. After checking the complicated raid5 code,
> I come to realize this suggestion makes sense.

Hmm, mddev_suspend/resume are called at lots of places, then it's better to
check if all the places don't allocate memory with GFP_KERNEL.

And seems In level_store(), sysfs_create_group could be called between 
suspend
and resume, then the two functions (kstrdup_const(name, GFP_KERNEL) and
kmem_cache_zalloc(kernfs_node_cache, GFP_KERNEL)) could be triggered by the
path:

sysfs_create_group ->internal_create_group -> kernfs_create_dir_ns ->
kernfs_new_node -> __kernfs_new_node

Not know memalloc_noio_{save,restore} well, but I guess it is better to 
use them
to mark a small scope, just my two cents.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-09 21:38     ` Guoqing Jiang
@ 2020-04-10  9:36       ` Coly Li
  2020-04-15 11:48       ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Coly Li @ 2020-04-10  9:36 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: songliubraving, linux-raid, Kent Overstreet, Michal Hocko

On 2020/4/10 5:38 上午, Guoqing Jiang wrote:
> On 07.04.20 17:09, Coly Li wrote:
>> On 2020/4/5 11:53 下午, Guoqing Jiang wrote:
>>> On 02.04.20 10:13, Coly Li wrote:
>>>> -    scribble = kvmalloc_array(cnt, obj_size, flags);
>>>> +    scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
>>> Maybe it is simpler to call kvmalloc_array between memalloc_noio_save
>>> and memalloc_noio_restore.
>>> And seems sched/mm.h need to be included per the report from LKP.
>> The falgs can be,
>> - GFP_KERNEL: when called from alloc_scratch_buffer()
>> - GFP_NOIO: when called from resize_chunks().
>>
>> If the scope APIs are used inside scribble_alloc(), the first call path
>> is restricted as noio, which is not expected. So I only use the scope
>> APIs around the location where GFP_NOIO is used.
> 
> Thanks for the explanation. I assume it can be distinguished by check
> the flag,
> eg, FYI.
> 
> if (flag == GFP_NOIO)
>     memalloc_noio_save();
> kvmalloc_array();
> if (flag == GFP_NOIO)
>     memalloc_noio_restore();

This was similar to my original idea, but finally I decide to accept
Michal's suggestion to add them into mddev_suspend()/mddev_resume. Let
me explain.

>> Anyway, Michal Hocko suggests to add the scope APIs in
>> mddev_suspend()/mddev_resume(). Then in the whole code path where md
>> raid array is suspend, we don't need to worry the recursive memory
>> reclaim I/Os onto the array. After checking the complicated raid5 code,
>> I come to realize this suggestion makes sense.
> 
> Hmm, mddev_suspend/resume are called at lots of places, then it's better to
> check if all the places don't allocate memory with GFP_KERNEL.
> 

When mddev_suspend() is called, then all I/Os on the suspending raid
array have to wait until mddev_resume() is called. Inside the suspending
region, all memory allocation should be careful to avoid memory reclaim
I/Os going back to the suspending raid array. If such recursive I/Os
happen, we will experience deadlock.

Therefore we should be very careful to use GFP_KERNEL to allocate memory
during the period when the raid array is suspending. No matter
physically continuous or virtually continuous pages.

> And seems In level_store(), sysfs_create_group could be called between
> suspend
> and resume, then the two functions (kstrdup_const(name, GFP_KERNEL) and
> kmem_cache_zalloc(kernfs_node_cache, GFP_KERNEL)) could be triggered by the
> path:
> 
> sysfs_create_group ->internal_create_group -> kernfs_create_dir_ns ->
> kernfs_new_node -> __kernfs_new_node
> 

From your information, the above code path is suspicious. Although the
deadlock might not happen in practice, unless the raid array is used as
the only fs volume, and all memory writeback or swap I/Os will only go
into this array.

Checking all the location of memory allocation is trivial but possible,
how to prevent new code with memory allocation to avoid the potential
deadlock risk during the suspending period is still a problem.

> Not know memalloc_noio_{save,restore} well, but I guess it is better to
> use them
> to mark a small scope, just my two cents.

Since mddev_suspend() is the entry point to prevent I/Os on the array,
it is the good location to restrict memory reclaim I/Os of memory
allocation. Finally I realize this is a brilliant idea after Michal
Hocko suggests me for a while.

Thanks.

-- 

Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-09 21:38     ` Guoqing Jiang
  2020-04-10  9:36       ` Coly Li
@ 2020-04-15 11:48       ` Michal Hocko
  2020-04-15 14:10         ` Guoqing Jiang
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-15 11:48 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Coly Li, songliubraving, linux-raid, Kent Overstreet

On Thu 09-04-20 23:38:13, Guoqing Jiang wrote:
[...]
> Not know memalloc_noio_{save,restore} well, but I guess it is better
> to use them to mark a small scope, just my two cents.

This would go against the intentio of the api. It is really meant to
define reclaim recursion problematic scope. If there is a clear entry
point where any further allocation recursing to FS/IO could deadlock
then it should be used at that level. This might be a lock which is
taken from the reclaim or like this case a device is suspended and no IO
is processed so anything that would wait for an IO or rely on IO making
progress in the reclaim path would deadlock.

Please have a look at Documentation/core-api/gfp_mask-from-fs-io.rst
and let me know is something could be made more clear or explicit.
I am more than happy to improve the documentation.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-15 11:48       ` Michal Hocko
@ 2020-04-15 14:10         ` Guoqing Jiang
  2020-04-15 14:23           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2020-04-15 14:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Coly Li, songliubraving, linux-raid, Kent Overstreet

On 15.04.20 13:48, Michal Hocko wrote:
> On Thu 09-04-20 23:38:13, Guoqing Jiang wrote:
> [...]
>> Not know memalloc_noio_{save,restore} well, but I guess it is better
>> to use them to mark a small scope, just my two cents.
> This would go against the intentio of the api. It is really meant to
> define reclaim recursion problematic scope.

Well, in current proposal, the scope is just when 
scribble_allo/kvmalloc_array is called.

memalloc_noio_save
scribble_allo/kvmalloc_array
memalloc_noio_restore

With the new proposal, the marked scope would be bigger than current one 
since there
are lots of places call mddev_suspend/resume.

mddev_suspend
memalloc_noio_save
...
memalloc_noio_restore
mddev_resume

IMHO, if the current proposal works then what is the advantage to 
increase the scope.
If all the callers of mddev_suspend/resume could suffer from the 
deadlock issue due to
recursing fs io, then it is definitely need to use the new proposal.

> If there is a clear entry point where any further allocation recursing to FS/IO could deadlock
> then it should be used at that level.

Agree.

At the end of mddev_suspend, I guess there is no FS/IO could happen to 
the array,
because mddev->suspended++ and quiesce(mddev, 1) are called previously in
mddev_suspend. And it makes me curious to go through to the call chain:

layout_store / chunk_size_store / update_raid_disks / update_array_info 
/ md_check_recovery
         => pers->check_reshape => raid{5,6}_check_reshape => check_reshape
         => resize_chunks => scribble_alloc(..., GFP_NOIO)

Perhaps I missed something, but seems all the 5 original callers are not 
called between
mddev_suspend and mddev_resume, if so, then with the new proposal, 
scribble_alloc()
could not be protected by the memalloc_noio_{save,restore}.

> This might be a lock which is
> taken from the reclaim or like this case a device is suspended and no IO
> is processed so anything that would wait for an IO or rely on IO making
> progress in the reclaim path would deadlock.

Thanks for teaching.

> Please have a look at Documentation/core-api/gfp_mask-from-fs-io.rst
> and let me know is something could be made more clear or explicit.
> I am more than happy to improve the documentation.

Thanks again for your lighting, will read it.

Regards,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-15 14:10         ` Guoqing Jiang
@ 2020-04-15 14:23           ` Michal Hocko
  2020-04-15 14:57             ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-15 14:23 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Coly Li, songliubraving, linux-raid, Kent Overstreet

On Wed 15-04-20 16:10:08, Guoqing Jiang wrote:
> On 15.04.20 13:48, Michal Hocko wrote:
> > On Thu 09-04-20 23:38:13, Guoqing Jiang wrote:
> > [...]
> > > Not know memalloc_noio_{save,restore} well, but I guess it is better
> > > to use them to mark a small scope, just my two cents.
> > This would go against the intentio of the api. It is really meant to
> > define reclaim recursion problematic scope.
> 
> Well, in current proposal, the scope is just when
> scribble_allo/kvmalloc_array is called.
> 
> memalloc_noio_save
> scribble_allo/kvmalloc_array
> memalloc_noio_restore
> 
> With the new proposal, the marked scope would be bigger than current one
> since there
> are lots of places call mddev_suspend/resume.
> 
> mddev_suspend
> memalloc_noio_save
> ...
> memalloc_noio_restore
> mddev_resume
> 
> IMHO, if the current proposal works then what is the advantage to increase
> the scope.

The advantage is twofold. It serves the documentation purpose because it
is clear _what_ and _why_ is the actual allocation restricted context.
In this case mddev_{suspend,resume} because XYZ and you do not have to
care about __GFP_IO for _any_ allocation inside the scope.

> If all the callers of mddev_suspend/resume could suffer from the
> deadlock issue due to recursing fs io, then it is definitely need to
> use the new proposal.

Why some of them wouldn't? Isn't the mddev_suspend going to block any IO
on the device? The thing is that you cannot tell which devices the
allocator can issue IO for therefore GFP_NOIO is a global flag. Please
also note that the scope API is bound to a process context so it only
affects the current execution.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-15 14:23           ` Michal Hocko
@ 2020-04-15 14:57             ` Guoqing Jiang
  2020-04-30  6:36               ` Song Liu
  0 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2020-04-15 14:57 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Coly Li, songliubraving, linux-raid, Kent Overstreet

On 15.04.20 16:23, Michal Hocko wrote:
> On Wed 15-04-20 16:10:08, Guoqing Jiang wrote:
>> On 15.04.20 13:48, Michal Hocko wrote:
>>> On Thu 09-04-20 23:38:13, Guoqing Jiang wrote:
>>> [...]
>>>> Not know memalloc_noio_{save,restore} well, but I guess it is better
>>>> to use them to mark a small scope, just my two cents.
>>> This would go against the intentio of the api. It is really meant to
>>> define reclaim recursion problematic scope.
>> Well, in current proposal, the scope is just when
>> scribble_allo/kvmalloc_array is called.
>>
>> memalloc_noio_save
>> scribble_allo/kvmalloc_array
>> memalloc_noio_restore
>>
>> With the new proposal, the marked scope would be bigger than current one
>> since there
>> are lots of places call mddev_suspend/resume.
>>
>> mddev_suspend
>> memalloc_noio_save
>> ...
>> memalloc_noio_restore
>> mddev_resume
>>
>> IMHO, if the current proposal works then what is the advantage to increase
>> the scope.
> The advantage is twofold. It serves the documentation purpose because it
> is clear _what_ and _why_ is the actual allocation restricted context.
> In this case mddev_{suspend,resume} because XYZ and you do not have to
> care about __GFP_IO for _any_ allocation inside the scope.

Personally, I'd prefer fine grained protection scope, anyway just my own
flavor. And I think Song has his opinion about the proposal, I will respect
his decision.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks()
  2020-04-15 14:57             ` Guoqing Jiang
@ 2020-04-30  6:36               ` Song Liu
  0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-04-30  6:36 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Michal Hocko, Coly Li, linux-raid, Kent Overstreet



> On Apr 15, 2020, at 7:57 AM, Guoqing Jiang <guoqing.jiang@cloud.ionos.com> wrote:
> 
> On 15.04.20 16:23, Michal Hocko wrote:
>> On Wed 15-04-20 16:10:08, Guoqing Jiang wrote:
>>> On 15.04.20 13:48, Michal Hocko wrote:
>>>> On Thu 09-04-20 23:38:13, Guoqing Jiang wrote:
>>>> [...]
>>>>> Not know memalloc_noio_{save,restore} well, but I guess it is better
>>>>> to use them to mark a small scope, just my two cents.
>>>> This would go against the intentio of the api. It is really meant to
>>>> define reclaim recursion problematic scope.
>>> Well, in current proposal, the scope is just when
>>> scribble_allo/kvmalloc_array is called.
>>> 
>>> memalloc_noio_save
>>> scribble_allo/kvmalloc_array
>>> memalloc_noio_restore
>>> 
>>> With the new proposal, the marked scope would be bigger than current one
>>> since there
>>> are lots of places call mddev_suspend/resume.
>>> 
>>> mddev_suspend
>>> memalloc_noio_save
>>> ...
>>> memalloc_noio_restore
>>> mddev_resume
>>> 
>>> IMHO, if the current proposal works then what is the advantage to increase
>>> the scope.
>> The advantage is twofold. It serves the documentation purpose because it
>> is clear _what_ and _why_ is the actual allocation restricted context.
>> In this case mddev_{suspend,resume} because XYZ and you do not have to
>> care about __GFP_IO for _any_ allocation inside the scope.
> 
> Personally, I'd prefer fine grained protection scope, anyway just my own
> flavor. And I think Song has his opinion about the proposal, I will respect
> his decision.

Thanks Coly, Guoqing, and Michal for these great inputs. 

Coly's v3 set looks good to me. I will go ahead apply that version to md-next
branch. 

Thanks again,
Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-30  6:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02  8:13 [PATCH] raid5: use memalloc_noio_save()/restore in resize_chunks() Coly Li
2020-04-03 13:17 ` kbuild test robot
2020-04-03 13:17   ` kbuild test robot
2020-04-05 15:53 ` Guoqing Jiang
2020-04-07 15:09   ` Coly Li
2020-04-09 21:38     ` Guoqing Jiang
2020-04-10  9:36       ` Coly Li
2020-04-15 11:48       ` Michal Hocko
2020-04-15 14:10         ` Guoqing Jiang
2020-04-15 14:23           ` Michal Hocko
2020-04-15 14:57             ` Guoqing Jiang
2020-04-30  6:36               ` Song Liu
2020-04-05 17:43 ` Song Liu
2020-04-07 14:42   ` Coly Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.