linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] blk-mq: Avoid memory reclaim when allocating
@ 2019-09-30  1:52 xiubli
  2019-09-30  1:52 ` [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map xiubli
  2019-09-30  1:52 ` [PATCH v4 2/2] blk-mq: use BLK_MQ_GFP_FLAGS macro instead xiubli
  0 siblings, 2 replies; 7+ messages in thread
From: xiubli @ 2019-09-30  1:52 UTC (permalink / raw)
  To: josef, axboe; +Cc: mchristi, ming.lei, linux-block, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Changed in V2:
- Addressed the comment from Ming Lei, thanks.

Changed in V3:
- Switch to memalloc_noio_save/restore from Christoph's comment, thanks.

Changed in V4:
- Switch back to v2 by remove the memalloc_ stuff
- With a small fix by making all the gfp flags to BLK_MQ_GFP_FLAGS in
blk_mq_alloc_rq_map where the NOIO is needed.

Xiubo Li (2):
  blk-mq: Avoid memory reclaim when allocating request map
  blk-mq: use BLK_MQ_GFP_FLAGS macro instead

 block/blk-mq-tag.c |  5 +++--
 block/blk-mq-tag.h |  5 ++++-
 block/blk-mq.c     | 20 +++++++++++---------
 3 files changed, 18 insertions(+), 12 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map
  2019-09-30  1:52 [PATCH v4 0/2] blk-mq: Avoid memory reclaim when allocating xiubli
@ 2019-09-30  1:52 ` xiubli
  2019-09-30  5:28   ` Damien Le Moal
  2019-09-30  1:52 ` [PATCH v4 2/2] blk-mq: use BLK_MQ_GFP_FLAGS macro instead xiubli
  1 sibling, 1 reply; 7+ messages in thread
From: xiubli @ 2019-09-30  1:52 UTC (permalink / raw)
  To: josef, axboe
  Cc: mchristi, ming.lei, linux-block, Xiubo Li, Gabriel Krisman Bertazi

From: Xiubo Li <xiubli@redhat.com>

For some storage drivers, such as the nbd, when there has new socket
connections added, it will update the hardware queue number by calling
blk_mq_update_nr_hw_queues(), in which it will freeze all the queues
first. And then tries to do the hardware queue updating stuff.

But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating
memory for tags, it may cause the mm do the memories direct reclaiming,
since the queues has been freezed, so if needs to flush the page cache
to disk, it will stuck in generic_make_request()-->blk_queue_enter() by
waiting the queues to be unfreezed and then cause deadlock here.

Since the memory size requested here is a small one, which will make
it not that easy to happen with a large size, but in theory this could
happen when the OS is running in pressure and out of memory.

Gabriel Krisman Bertazi has hit the similar issue by fixing it in
commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping
queues"), but might forget this part.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 5 +++--
 block/blk-mq-tag.h | 5 ++++-
 block/blk-mq.c     | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 008388e82b5c..04ee0e4c3fa1 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, int alloc_policy)
+				     int node, int alloc_policy,
+				     gfp_t flags)
 {
 	struct blk_mq_tags *tags;
 
@@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 		return NULL;
 	}
 
-	tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);
+	tags = kzalloc_node(sizeof(*tags), flags, node);
 	if (!tags)
 		return NULL;
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..296e0bc97126 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -22,7 +22,10 @@ struct blk_mq_tags {
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy,
+					    gfp_t flags);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 240416057f28..9c52e4dfe132 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 		node = set->numa_node;
 
 	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
-				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
+				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
+				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
 	if (!tags)
 		return NULL;
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/2] blk-mq: use BLK_MQ_GFP_FLAGS macro instead
  2019-09-30  1:52 [PATCH v4 0/2] blk-mq: Avoid memory reclaim when allocating xiubli
  2019-09-30  1:52 ` [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map xiubli
@ 2019-09-30  1:52 ` xiubli
  1 sibling, 0 replies; 7+ messages in thread
From: xiubli @ 2019-09-30  1:52 UTC (permalink / raw)
  To: josef, axboe; +Cc: mchristi, ming.lei, linux-block, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

There are at least 6 places are using the same combined GFP flags,
switch them to one macro instead to make the code get cleaner.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c52e4dfe132..3d3b3e5787b0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -39,6 +39,8 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
+#define BLK_MQ_GFP_FLAGS (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
+
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -2091,21 +2093,19 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 
 	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
-				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
+				BLK_MQ_GFP_FLAGS);
 	if (!tags)
 		return NULL;
 
 	tags->rqs = kcalloc_node(nr_tags, sizeof(struct request *),
-				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
-				 node);
+				 BLK_MQ_GFP_FLAGS, node);
 	if (!tags->rqs) {
 		blk_mq_free_tags(tags);
 		return NULL;
 	}
 
 	tags->static_rqs = kcalloc_node(nr_tags, sizeof(struct request *),
-					GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
-					node);
+					BLK_MQ_GFP_FLAGS, node);
 	if (!tags->static_rqs) {
 		kfree(tags->rqs);
 		blk_mq_free_tags(tags);
@@ -2167,7 +2167,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 
 		do {
 			page = alloc_pages_node(node,
-				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
+				BLK_MQ_GFP_FLAGS | __GFP_ZERO,
 				this_order);
 			if (page)
 				break;
@@ -2188,7 +2188,8 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		 * Allow kmemleak to scan these pages as they contain pointers
 		 * to additional allocations like via ops->init_request().
 		 */
-		kmemleak_alloc(p, order_to_size(this_order), 1, GFP_NOIO);
+		kmemleak_alloc(p, order_to_size(this_order), 1,
+			       BLK_MQ_GFP_FLAGS);
 		entries_per_page = order_to_size(this_order) / rq_size;
 		to_do = min(entries_per_page, depth - i);
 		left -= to_do * rq_size;
@@ -2333,7 +2334,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 		int node)
 {
 	struct blk_mq_hw_ctx *hctx;
-	gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
+	gfp_t gfp = BLK_MQ_GFP_FLAGS;
 
 	hctx = kzalloc_node(blk_mq_hw_ctx_size(set), gfp, node);
 	if (!hctx)
@@ -3194,7 +3195,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
 	if (!q->elevator)
 		return true;
 
-	qe = kmalloc(sizeof(*qe), GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
+	qe = kmalloc(sizeof(*qe), BLK_MQ_GFP_FLAGS);
 	if (!qe)
 		return false;
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map
  2019-09-30  1:52 ` [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map xiubli
@ 2019-09-30  5:28   ` Damien Le Moal
  2019-09-30  5:50     ` Xiubo Li
  0 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2019-09-30  5:28 UTC (permalink / raw)
  To: xiubli, josef, axboe
  Cc: mchristi, ming.lei, linux-block, Gabriel Krisman Bertazi

On 2019/09/29 18:52, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> For some storage drivers, such as the nbd, when there has new socket
> connections added, it will update the hardware queue number by calling
> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues
> first. And then tries to do the hardware queue updating stuff.
> 
> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating
> memory for tags, it may cause the mm do the memories direct reclaiming,
> since the queues has been freezed, so if needs to flush the page cache
> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by
> waiting the queues to be unfreezed and then cause deadlock here.
> 
> Since the memory size requested here is a small one, which will make
> it not that easy to happen with a large size, but in theory this could
> happen when the OS is running in pressure and out of memory.
> 
> Gabriel Krisman Bertazi has hit the similar issue by fixing it in
> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping
> queues"), but might forget this part.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-tag.c | 5 +++--
>  block/blk-mq-tag.h | 5 ++++-
>  block/blk-mq.c     | 3 ++-
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 008388e82b5c..04ee0e4c3fa1 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  
>  struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  				     unsigned int reserved_tags,
> -				     int node, int alloc_policy)
> +				     int node, int alloc_policy,
> +				     gfp_t flags)
>  {
>  	struct blk_mq_tags *tags;
>  
> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  		return NULL;
>  	}
>  
> -	tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);
> +	tags = kzalloc_node(sizeof(*tags), flags, node);
>  	if (!tags)
>  		return NULL;
>  
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 61deab0b5a5a..296e0bc97126 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -22,7 +22,10 @@ struct blk_mq_tags {
>  };
>  
>  
> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
> +					    unsigned int reserved_tags,
> +					    int node, int alloc_policy,
> +					    gfp_t flags);
>  extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>  
>  extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 240416057f28..9c52e4dfe132 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  		node = set->numa_node;
>  
>  	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
> -				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
> +				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
> +				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);

You added the gfp_t argument to blk_mq_init_tags() but you are only using that
argument with a hardcoded value here. So why not simply call kzalloc_node() in
that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That
would avoid needing to add the "gfp_t flags" argument and still fit with your
patch 2 definition of BLK_MQ_GFP_FLAGS.

>  	if (!tags)
>  		return NULL;
>  
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map
  2019-09-30  5:28   ` Damien Le Moal
@ 2019-09-30  5:50     ` Xiubo Li
  2019-09-30  6:20       ` Damien Le Moal
  0 siblings, 1 reply; 7+ messages in thread
From: Xiubo Li @ 2019-09-30  5:50 UTC (permalink / raw)
  To: Damien Le Moal, josef, axboe
  Cc: mchristi, ming.lei, linux-block, Gabriel Krisman Bertazi

On 2019/9/30 13:28, Damien Le Moal wrote:
> On 2019/09/29 18:52, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> For some storage drivers, such as the nbd, when there has new socket
>> connections added, it will update the hardware queue number by calling
>> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues
>> first. And then tries to do the hardware queue updating stuff.
>>
>> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating
>> memory for tags, it may cause the mm do the memories direct reclaiming,
>> since the queues has been freezed, so if needs to flush the page cache
>> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by
>> waiting the queues to be unfreezed and then cause deadlock here.
>>
>> Since the memory size requested here is a small one, which will make
>> it not that easy to happen with a large size, but in theory this could
>> happen when the OS is running in pressure and out of memory.
>>
>> Gabriel Krisman Bertazi has hit the similar issue by fixing it in
>> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping
>> queues"), but might forget this part.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
>> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>> ---
>>   block/blk-mq-tag.c | 5 +++--
>>   block/blk-mq-tag.h | 5 ++++-
>>   block/blk-mq.c     | 3 ++-
>>   3 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 008388e82b5c..04ee0e4c3fa1 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>>   
>>   struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>   				     unsigned int reserved_tags,
>> -				     int node, int alloc_policy)
>> +				     int node, int alloc_policy,
>> +				     gfp_t flags)
>>   {
>>   	struct blk_mq_tags *tags;
>>   
>> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>   		return NULL;
>>   	}
>>   
>> -	tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);
>> +	tags = kzalloc_node(sizeof(*tags), flags, node);
>>   	if (!tags)
>>   		return NULL;
>>   
>> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
>> index 61deab0b5a5a..296e0bc97126 100644
>> --- a/block/blk-mq-tag.h
>> +++ b/block/blk-mq-tag.h
>> @@ -22,7 +22,10 @@ struct blk_mq_tags {
>>   };
>>   
>>   
>> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
>> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
>> +					    unsigned int reserved_tags,
>> +					    int node, int alloc_policy,
>> +					    gfp_t flags);
>>   extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>>   
>>   extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 240416057f28..9c52e4dfe132 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>>   		node = set->numa_node;
>>   
>>   	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
>> -				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>> +				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
>> +				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
> You added the gfp_t argument to blk_mq_init_tags() but you are only using that
> argument with a hardcoded value here. So why not simply call kzalloc_node() in
> that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That
> would avoid needing to add the "gfp_t flags" argument and still fit with your
> patch 2 definition of BLK_MQ_GFP_FLAGS.

The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not.

Thanks,
BRs


>>   	if (!tags)
>>   		return NULL;
>>   
>>
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map
  2019-09-30  5:50     ` Xiubo Li
@ 2019-09-30  6:20       ` Damien Le Moal
  2019-09-30  6:59         ` Xiubo Li
  0 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2019-09-30  6:20 UTC (permalink / raw)
  To: Xiubo Li, josef, axboe
  Cc: mchristi, ming.lei, linux-block, Gabriel Krisman Bertazi

On 2019/09/29 22:50, Xiubo Li wrote:
> On 2019/9/30 13:28, Damien Le Moal wrote:
>> On 2019/09/29 18:52, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> For some storage drivers, such as the nbd, when there has new socket
>>> connections added, it will update the hardware queue number by calling
>>> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues
>>> first. And then tries to do the hardware queue updating stuff.
>>>
>>> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating
>>> memory for tags, it may cause the mm do the memories direct reclaiming,
>>> since the queues has been freezed, so if needs to flush the page cache
>>> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by
>>> waiting the queues to be unfreezed and then cause deadlock here.
>>>
>>> Since the memory size requested here is a small one, which will make
>>> it not that easy to happen with a large size, but in theory this could
>>> happen when the OS is running in pressure and out of memory.
>>>
>>> Gabriel Krisman Bertazi has hit the similar issue by fixing it in
>>> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping
>>> queues"), but might forget this part.
>>>
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
>>> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>   block/blk-mq-tag.c | 5 +++--
>>>   block/blk-mq-tag.h | 5 ++++-
>>>   block/blk-mq.c     | 3 ++-
>>>   3 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>> index 008388e82b5c..04ee0e4c3fa1 100644
>>> --- a/block/blk-mq-tag.c
>>> +++ b/block/blk-mq-tag.c
>>> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>>>   
>>>   struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>>   				     unsigned int reserved_tags,
>>> -				     int node, int alloc_policy)
>>> +				     int node, int alloc_policy,
>>> +				     gfp_t flags)
>>>   {
>>>   	struct blk_mq_tags *tags;
>>>   
>>> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>>   		return NULL;
>>>   	}
>>>   
>>> -	tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);
>>> +	tags = kzalloc_node(sizeof(*tags), flags, node);
>>>   	if (!tags)
>>>   		return NULL;
>>>   
>>> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
>>> index 61deab0b5a5a..296e0bc97126 100644
>>> --- a/block/blk-mq-tag.h
>>> +++ b/block/blk-mq-tag.h
>>> @@ -22,7 +22,10 @@ struct blk_mq_tags {
>>>   };
>>>   
>>>   
>>> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
>>> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
>>> +					    unsigned int reserved_tags,
>>> +					    int node, int alloc_policy,
>>> +					    gfp_t flags);
>>>   extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>>>   
>>>   extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 240416057f28..9c52e4dfe132 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>>>   		node = set->numa_node;
>>>   
>>>   	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
>>> -				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>>> +				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
>>> +				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
>> You added the gfp_t argument to blk_mq_init_tags() but you are only using that
>> argument with a hardcoded value here. So why not simply call kzalloc_node() in
>> that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That
>> would avoid needing to add the "gfp_t flags" argument and still fit with your
>> patch 2 definition of BLK_MQ_GFP_FLAGS.
> 
> The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not.

blk_mq_alloc_rq_map() is currently the only user of blk_mq_init_tags(), so I do
not see the point in doing this change now. If it is needed "in the future" then
do it then.

I do not mind the patch going in as is, but I really think that everything can
be folded into your patch 2 without the addition of blk_mq_init_tags()
additional argument.

> 
> Thanks,
> BRs
> 
> 
>>>   	if (!tags)
>>>   		return NULL;
>>>   
>>>
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map
  2019-09-30  6:20       ` Damien Le Moal
@ 2019-09-30  6:59         ` Xiubo Li
  0 siblings, 0 replies; 7+ messages in thread
From: Xiubo Li @ 2019-09-30  6:59 UTC (permalink / raw)
  To: Damien Le Moal, josef, axboe
  Cc: mchristi, ming.lei, linux-block, Gabriel Krisman Bertazi

On 2019/9/30 14:20, Damien Le Moal wrote:
> On 2019/09/29 22:50, Xiubo Li wrote:
>> On 2019/9/30 13:28, Damien Le Moal wrote:
>>> On 2019/09/29 18:52, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> For some storage drivers, such as the nbd, when there has new socket
>>>> connections added, it will update the hardware queue number by calling
>>>> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues
>>>> first. And then tries to do the hardware queue updating stuff.
>>>>
>>>> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating
>>>> memory for tags, it may cause the mm do the memories direct reclaiming,
>>>> since the queues has been freezed, so if needs to flush the page cache
>>>> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by
>>>> waiting the queues to be unfreezed and then cause deadlock here.
>>>>
>>>> Since the memory size requested here is a small one, which will make
>>>> it not that easy to happen with a large size, but in theory this could
>>>> happen when the OS is running in pressure and out of memory.
>>>>
>>>> Gabriel Krisman Bertazi has hit the similar issue by fixing it in
>>>> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping
>>>> queues"), but might forget this part.
>>>>
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
>>>> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>>>> ---
>>>>    block/blk-mq-tag.c | 5 +++--
>>>>    block/blk-mq-tag.h | 5 ++++-
>>>>    block/blk-mq.c     | 3 ++-
>>>>    3 files changed, 9 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>> index 008388e82b5c..04ee0e4c3fa1 100644
>>>> --- a/block/blk-mq-tag.c
>>>> +++ b/block/blk-mq-tag.c
>>>> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>>>>    
>>>>    struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>>>    				     unsigned int reserved_tags,
>>>> -				     int node, int alloc_policy)
>>>> +				     int node, int alloc_policy,
>>>> +				     gfp_t flags)
>>>>    {
>>>>    	struct blk_mq_tags *tags;
>>>>    
>>>> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>>>>    		return NULL;
>>>>    	}
>>>>    
>>>> -	tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);
>>>> +	tags = kzalloc_node(sizeof(*tags), flags, node);
>>>>    	if (!tags)
>>>>    		return NULL;
>>>>    
>>>> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
>>>> index 61deab0b5a5a..296e0bc97126 100644
>>>> --- a/block/blk-mq-tag.h
>>>> +++ b/block/blk-mq-tag.h
>>>> @@ -22,7 +22,10 @@ struct blk_mq_tags {
>>>>    };
>>>>    
>>>>    
>>>> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
>>>> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
>>>> +					    unsigned int reserved_tags,
>>>> +					    int node, int alloc_policy,
>>>> +					    gfp_t flags);
>>>>    extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>>>>    
>>>>    extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 240416057f28..9c52e4dfe132 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>>>>    		node = set->numa_node;
>>>>    
>>>>    	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
>>>> -				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>>>> +				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
>>>> +				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
>>> You added the gfp_t argument to blk_mq_init_tags() but you are only using that
>>> argument with a hardcoded value here. So why not simply call kzalloc_node() in
>>> that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That
>>> would avoid needing to add the "gfp_t flags" argument and still fit with your
>>> patch 2 definition of BLK_MQ_GFP_FLAGS.
>> The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not.
> blk_mq_alloc_rq_map() is currently the only user of blk_mq_init_tags(), so I do
> not see the point in doing this change now. If it is needed "in the future" then
> do it then.
>
> I do not mind the patch going in as is, but I really think that everything can
> be folded into your patch 2 without the addition of blk_mq_init_tags()
> additional argument.

Actually I do not object to folding it into patch 2 :-)

Just from the v2 I supposed that it would be easier to be reviewed by 
splitting it into a separate one.

Thanks,

BRs

>> Thanks,
>> BRs
>>
>>
>>>>    	if (!tags)
>>>>    		return NULL;
>>>>    
>>>>
>>
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-09-30  6:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-30  1:52 [PATCH v4 0/2] blk-mq: Avoid memory reclaim when allocating xiubli
2019-09-30  1:52 ` [PATCH v4 1/2] blk-mq: Avoid memory reclaim when allocating request map xiubli
2019-09-30  5:28   ` Damien Le Moal
2019-09-30  5:50     ` Xiubo Li
2019-09-30  6:20       ` Damien Le Moal
2019-09-30  6:59         ` Xiubo Li
2019-09-30  1:52 ` [PATCH v4 2/2] blk-mq: use BLK_MQ_GFP_FLAGS macro instead xiubli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).