All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] improve memalloc scope APIs usage
@ 2020-04-09 11:52 colyli
  2020-04-09 11:52 ` [PATCH v2 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume() colyli
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: colyli @ 2020-04-09 11:52 UTC (permalink / raw)
  To: songliubraving, linux-raid
  Cc: mhocko, kent.overstreet, guoqing.jiang, Coly Li

From: Coly Li <colyli@suse.de>

Hi folks,

The motivation of this series is to fix the incorrect GFP_NOIO flag
usage in drivers/md/raid5.c:resize_chunks(). I take the suggestion
from Michal Hocko to use memalloc scope APIs in unified entry point
mddev_suspend()/mddev_resume(). Also I get rid of the incorect GFP_NOIO
usage for scribble_alloc(), and remove redundant memalloc scope APIs
usage in mddev_create_serial_pool(), also as Song Liu suggested, update
the code comments on the header of scribble_alloc().

Thank you in advance for the review and comments.

Coly Li
---
Changelog:
v2: Add memalloc scope APIs in raid array suspend context.
v1: Original version to add memalloc scope APIs in resize_chunks().

Coly Li (4):
  md: use memalloc scope APIs in mddev_suspend()/mddev_resume()
  raid5: remove gfp flags from scribble_alloc()
  raid5: update code comment of scribble_alloc()
  md: remove redundant memalloc scope API usage

 drivers/md/md.c    |  6 ++++--
 drivers/md/md.h    |  1 +
 drivers/md/raid5.c | 22 ++++++++++++++--------
 3 files changed, 19 insertions(+), 10 deletions(-)

-- 
2.25.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume()
  2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
@ 2020-04-09 11:52 ` colyli
  2020-04-09 11:52 ` [PATCH v2 2/4] raid5: remove gfp flags from scribble_alloc() colyli
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: colyli @ 2020-04-09 11:52 UTC (permalink / raw)
  To: songliubraving, linux-raid
  Cc: mhocko, kent.overstreet, guoqing.jiang, Coly Li

From: Coly Li <colyli@suse.de>

In raid5.c:resize_chunks(), scribble_alloc() is called with GFP_NOIO
flag, then it is sent into kvmalloc_array() inside scribble_alloc().

The problem is kvmalloc_array() eventually calls kvmalloc_node() which
does not accept non GFP_KERNEL compatible flag like GFP_NOIO, then
kmalloc_node() is called indeed to allocate physically continuous
pages. When system memory is under heavy pressure, and the requesting
size is large, there is high probability that allocating continueous
pages will fail.

But simply using GFP_KERNEL flag to call kvmalloc_array() is also
progblematic. In the code path where scribble_alloc() is called, the
raid array is suspended, if kvmalloc_node() triggers memory reclaim I/Os
and such I/Os go back to the suspend raid array, deadlock will happen.

What is desired here is to allocate non-physically (a.k.a virtually)
continuous pages and avoid memory reclaim I/Os. Michal Hocko suggests
to use the mmealloc sceope APIs to restrict memory reclaim I/O in
allocating context, specifically to call memalloc_noio_save() when
suspend the raid array and to call memalloc_noio_restore() when
resume the raid array.

This patch adds the memalloc scope APIs in mddev_suspend() and
mddev_resume(), to restrict memory reclaim I/Os during the raid array
is suspended. The benifit of adding the memalloc scope API in the
unified entry point mddev_suspend()/mddev_resume() is, no matter which
md raid array type (personality), we are sure the deadlock by recursive
memory reclaim I/O won't happen on the suspending context.

Please notice that the memalloc scope APIs only take effect on the raid
array suspending context, if the memory allocation is from another new
created kthread after raid array suspended, the recursive memory reclaim
I/Os won't be restricted. The mddev_suspend()/mddev_resume() entries are
used for the critical section where the raid metadata is modifying,
creating a kthread to allocate memory inside the critical section is
queer and very probably being buggy.

Fixes: b330e6a49dc3 ("md: convert to kvmalloc")
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/md.c | 4 ++++
 drivers/md/md.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 271e8a587354..1a8e1bb3a7f4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -527,11 +527,15 @@ void mddev_suspend(struct mddev *mddev)
 	wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
 
 	del_timer_sync(&mddev->safemode_timer);
+	/* restrict memory reclaim I/O during raid array is suspend */
+	mddev->noio_flag = memalloc_noio_save();
 }
 EXPORT_SYMBOL_GPL(mddev_suspend);
 
 void mddev_resume(struct mddev *mddev)
 {
+	/* entred the memalloc scope from mddev_suspend() */
+	memalloc_noio_restore(mddev->noio_flag);
 	lockdep_assert_held(&mddev->reconfig_mutex);
 	if (--mddev->suspended)
 		return;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index acd681939112..612814d07d35 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -497,6 +497,7 @@ struct mddev {
 	void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev);
 	struct md_cluster_info		*cluster_info;
 	unsigned int			good_device_nr;	/* good device num within cluster raid */
+	unsigned int			noio_flag; /* for memalloc scope API */
 
 	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/4] raid5: remove gfp flags from scribble_alloc()
  2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
  2020-04-09 11:52 ` [PATCH v2 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume() colyli
@ 2020-04-09 11:52 ` colyli
  2020-04-09 11:52 ` [PATCH v2 3/4] raid5: update code comment of scribble_alloc() colyli
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: colyli @ 2020-04-09 11:52 UTC (permalink / raw)
  To: songliubraving, linux-raid
  Cc: mhocko, kent.overstreet, guoqing.jiang, Coly Li

From: Coly Li <colyli@suse.de>

Using GFP_NOIO flag to call scribble_alloc() from resize_chunks() does
not have the expected behavior. kvmalloc_array() inside scribble_alloc()
which receives the GFP_NOIO flag will eventually call kmalloc_node() to
allocate physically continuous pages.

Now we have memalloc scope APIs in mddev_suspend()/mddev_resume() to
prevent memory reclaim I/Os during raid array suspend context, calling
to kvmalloc_array() with GFP_KERNEL flag may avoid deadlock of recursive
I/O as expected.

This patch removes the useless gfp flags from parameters list of
scribble_alloc(), and call kvmalloc_array() with GFP_KERNEL flag. The
incorrect GFP_NOIO flag does not exist anymore.

Fixes: b330e6a49dc3 ("md: convert to kvmalloc")
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
---
 drivers/md/raid5.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ba00e9877f02..190dd70db514 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2228,14 +2228,19 @@ static int grow_stripes(struct r5conf *conf, int num)
  * of the P and Q blocks.
  */
 static int scribble_alloc(struct raid5_percpu *percpu,
-			  int num, int cnt, gfp_t flags)
+			  int num, int cnt)
 {
 	size_t obj_size =
 		sizeof(struct page *) * (num+2) +
 		sizeof(addr_conv_t) * (num+2);
 	void *scribble;
 
-	scribble = kvmalloc_array(cnt, obj_size, flags);
+	/*
+	 * If here is in raid array suspend context, it is in memalloc noio
+	 * context as well, there is no potential recursive memory reclaim
+	 * I/Os with the GFP_KERNEL flag.
+	 */
+	scribble = kvmalloc_array(cnt, obj_size, GFP_KERNEL);
 	if (!scribble)
 		return -ENOMEM;
 
@@ -2267,8 +2272,7 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 
 		percpu = per_cpu_ptr(conf->percpu, cpu);
 		err = scribble_alloc(percpu, new_disks,
-				     new_sectors / STRIPE_SECTORS,
-				     GFP_NOIO);
+				     new_sectors / STRIPE_SECTORS);
 		if (err)
 			break;
 	}
@@ -6759,8 +6763,7 @@ static int alloc_scratch_buffer(struct r5conf *conf, struct raid5_percpu *percpu
 			       conf->previous_raid_disks),
 			   max(conf->chunk_sectors,
 			       conf->prev_chunk_sectors)
-			   / STRIPE_SECTORS,
-			   GFP_KERNEL)) {
+			   / STRIPE_SECTORS)) {
 		free_scratch_buffer(conf, percpu);
 		return -ENOMEM;
 	}
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/4] raid5: update code comment of scribble_alloc()
  2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
  2020-04-09 11:52 ` [PATCH v2 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume() colyli
  2020-04-09 11:52 ` [PATCH v2 2/4] raid5: remove gfp flags from scribble_alloc() colyli
@ 2020-04-09 11:52 ` colyli
  2020-04-09 11:52 ` [PATCH v2 4/4] md: remove redundant memalloc scope API usage colyli
  2020-04-30  6:49 ` [PATCH v2 0/4] improve memalloc scope APIs usage Song Liu
  4 siblings, 0 replies; 7+ messages in thread
From: colyli @ 2020-04-09 11:52 UTC (permalink / raw)
  To: songliubraving, linux-raid
  Cc: mhocko, kent.overstreet, guoqing.jiang, Coly Li

From: Coly Li <colyli@suse.de>

Code comments of scribble_alloc() is outdated for a while. This patch
update the comments in function header for the new parameter list.

Suggested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/raid5.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 190dd70db514..ab8067f9ce8c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2215,10 +2215,13 @@ static int grow_stripes(struct r5conf *conf, int num)
 }
 
 /**
- * scribble_len - return the required size of the scribble region
+ * scribble_alloc - allocate percpu scribble buffer for required size
+ *		    of the scribble region
+ * @percpu - from for_each_present_cpu() of the caller
  * @num - total number of disks in the array
+ * @cnt - scribble objs count for required size of the scribble region
  *
- * The size must be enough to contain:
+ * The scribble buffer size must be enough to contain:
  * 1/ a struct page pointer for each device in the array +2
  * 2/ room to convert each entry in (1) to its corresponding dma
  *    (dma_map_page()) or page (page_address()) address.
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 4/4] md: remove redundant memalloc scope API usage
  2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
                   ` (2 preceding siblings ...)
  2020-04-09 11:52 ` [PATCH v2 3/4] raid5: update code comment of scribble_alloc() colyli
@ 2020-04-09 11:52 ` colyli
  2020-04-09 11:55   ` Coly Li
  2020-04-30  6:49 ` [PATCH v2 0/4] improve memalloc scope APIs usage Song Liu
  4 siblings, 1 reply; 7+ messages in thread
From: colyli @ 2020-04-09 11:52 UTC (permalink / raw)
  To: songliubraving, linux-raid
  Cc: mhocko, kent.overstreet, guoqing.jiang, Coly Li

From: Coly Li <colyli@suse.de>

In mddev_create_serial_pool(), memalloc scope APIs memalloc_noio_save()
and memalloc_noio_restore() are used when allocating memory by calling
mempool_create_kmalloc_pool(). After adding the memalloc scope APIs in
raid array suspend context, it is unncessary to explicitly call them
around mempool_create_kmalloc_pool() any longer.

This patch removes the redundant memalloc scope APIs in
mddev_create_serial_pool().

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 drivers/md/md.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1a8e1bb3a7f4..ef29f49a0f51 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -229,11 +229,9 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 	if (mddev->serial_info_pool == NULL) {
 		unsigned int noio_flag;
 
-		noio_flag = memalloc_noio_save();
 		mddev->serial_info_pool =
 			mempool_create_kmalloc_pool(NR_SERIAL_INFOS,
 						sizeof(struct serial_info));
-		memalloc_noio_restore(noio_flag);
 		if (!mddev->serial_info_pool) {
 			rdevs_uninit_serial(mddev);
 			pr_err("can't alloc memory pool for serialization\n");
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 4/4] md: remove redundant memalloc scope API usage
  2020-04-09 11:52 ` [PATCH v2 4/4] md: remove redundant memalloc scope API usage colyli
@ 2020-04-09 11:55   ` Coly Li
  0 siblings, 0 replies; 7+ messages in thread
From: Coly Li @ 2020-04-09 11:55 UTC (permalink / raw)
  To: songliubraving, linux-raid; +Cc: mhocko, kent.overstreet, guoqing.jiang

On 2020/4/9 7:52 下午, colyli@suse.de wrote:
> From: Coly Li <colyli@suse.de>
> 
> In mddev_create_serial_pool(), memalloc scope APIs memalloc_noio_save()
> and memalloc_noio_restore() are used when allocating memory by calling
> mempool_create_kmalloc_pool(). After adding the memalloc scope APIs in
> raid array suspend context, it is unncessary to explicitly call them
> around mempool_create_kmalloc_pool() any longer.
> 
> This patch removes the redundant memalloc scope APIs in
> mddev_create_serial_pool().
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> Cc: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
>  drivers/md/md.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1a8e1bb3a7f4..ef29f49a0f51 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -229,11 +229,9 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
>  	if (mddev->serial_info_pool == NULL) {
>  		unsigned int noio_flag;
		^^^^^^^^^^^^^^^^^^^^^^^ I will remove this one in v3 series.


Coly Li


>  
> -		noio_flag = memalloc_noio_save();
>  		mddev->serial_info_pool =
>  			mempool_create_kmalloc_pool(NR_SERIAL_INFOS,
>  						sizeof(struct serial_info));
> -		memalloc_noio_restore(noio_flag);
>  		if (!mddev->serial_info_pool) {
>  			rdevs_uninit_serial(mddev);
>  			pr_err("can't alloc memory pool for serialization\n");
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/4] improve memalloc scope APIs usage
  2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
                   ` (3 preceding siblings ...)
  2020-04-09 11:52 ` [PATCH v2 4/4] md: remove redundant memalloc scope API usage colyli
@ 2020-04-30  6:49 ` Song Liu
  4 siblings, 0 replies; 7+ messages in thread
From: Song Liu @ 2020-04-30  6:49 UTC (permalink / raw)
  To: Coly Li; +Cc: Song Liu, linux-raid, mhocko, kent.overstreet, Guoqing Jiang

On Thu, Apr 9, 2020 at 4:53 AM <colyli@suse.de> wrote:
>
> From: Coly Li <colyli@suse.de>
>
> Hi folks,
>
> The motivation of this series is to fix the incorrect GFP_NOIO flag
> usage in drivers/md/raid5.c:resize_chunks(). I take the suggestion
> from Michal Hocko to use memalloc scope APIs in unified entry point
> mddev_suspend()/mddev_resume(). Also I get rid of the incorect GFP_NOIO
> usage for scribble_alloc(), and remove redundant memalloc scope APIs
> usage in mddev_create_serial_pool(), also as Song Liu suggested, update
> the code comments on the header of scribble_alloc().
>
> Thank you in advance for the review and comments.
>
> Coly Li

Applied to md-next. Thanks for the fix. And Thanks Michal and Guoqing
for great inputs.

Song

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-04-30  6:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09 11:52 [PATCH v2 0/4] improve memalloc scope APIs usage colyli
2020-04-09 11:52 ` [PATCH v2 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume() colyli
2020-04-09 11:52 ` [PATCH v2 2/4] raid5: remove gfp flags from scribble_alloc() colyli
2020-04-09 11:52 ` [PATCH v2 3/4] raid5: update code comment of scribble_alloc() colyli
2020-04-09 11:52 ` [PATCH v2 4/4] md: remove redundant memalloc scope API usage colyli
2020-04-09 11:55   ` Coly Li
2020-04-30  6:49 ` [PATCH v2 0/4] improve memalloc scope APIs usage Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.