All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
@ 2023-05-13 14:12 Tian Lan
  2023-05-13 16:41 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 14:12 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: Tian Lan

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
 block/blk-mq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c8dc70020bc..5b374fab169c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1593,6 +1593,9 @@ void blk_mq_put_rq_ref(struct request *rq)
 		if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
 			blk_mq_free_request(rq);
 	} else if (req_ref_put_and_test(rq)) {
+		if (rq->rq_flags & RQF_MQ_INFLIGHT)
+			__blk_mq_dec_active_requests(rq->mq_hctx)
+
 		__blk_mq_free_request(rq);
 	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
@ 2023-05-13 16:41 ` kernel test robot
  2023-05-13 16:41 ` kernel test robot
  2023-05-13 16:52 ` Jens Axboe
  2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-05-13 16:41 UTC (permalink / raw)
  To: Tian Lan, axboe, linux-block; +Cc: oe-kbuild-all, Tian Lan

Hi Tian,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.4-rc1 next-20230512]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230513141234.8395-1-tilan7663%40gmail.com
patch subject: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
config: x86_64-randconfig-a011 (https://download.01.org/0day-ci/archive/20230514/202305140028.fzLbBFSs-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/f958f858d42c76e1439649993bd485ac02200f87
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
        git checkout f958f858d42c76e1439649993bd485ac02200f87
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305140028.fzLbBFSs-lkp@intel.com/

All errors (new ones prefixed by >>):

   block/blk-mq.c: In function 'blk_mq_put_rq_ref':
>> block/blk-mq.c:1574:66: error: expected ';' before '__blk_mq_free_request'
    1574 |                         __blk_mq_dec_active_requests(rq->mq_hctx)
         |                                                                  ^
         |                                                                  ;
    1575 | 
    1576 |                 __blk_mq_free_request(rq);
         |                 ~~~~~~~~~~~~~~~~~~~~~                             


vim +1574 block/blk-mq.c

  1566	
  1567	void blk_mq_put_rq_ref(struct request *rq)
  1568	{
  1569		if (is_flush_rq(rq)) {
  1570			if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
  1571				blk_mq_free_request(rq);
  1572		} else if (req_ref_put_and_test(rq)) {
  1573			if (rq->rq_flags & RQF_MQ_INFLIGHT)
> 1574				__blk_mq_dec_active_requests(rq->mq_hctx)
  1575	
  1576			__blk_mq_free_request(rq);
  1577		}
  1578	}
  1579	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
  2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 16:41 ` kernel test robot
  2023-05-13 19:05   ` Tian Lan
  2023-05-13 19:11   ` Tian Lan
  2023-05-13 16:52 ` Jens Axboe
  2 siblings, 2 replies; 27+ messages in thread
From: kernel test robot @ 2023-05-13 16:41 UTC (permalink / raw)
  To: Tian Lan, axboe, linux-block; +Cc: llvm, oe-kbuild-all, Tian Lan

Hi Tian,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.4-rc1 next-20230512]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230513141234.8395-1-tilan7663%40gmail.com
patch subject: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230514/202305140021.WvuGBjaZ-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/f958f858d42c76e1439649993bd485ac02200f87
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
        git checkout f958f858d42c76e1439649993bd485ac02200f87
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305140021.WvuGBjaZ-lkp@intel.com/

All errors (new ones prefixed by >>):

>> block/blk-mq.c:1574:45: error: expected ';' after expression
                           __blk_mq_dec_active_requests(rq->mq_hctx)
                                                                    ^
                                                                    ;
   1 error generated.


vim +1574 block/blk-mq.c

  1566	
  1567	void blk_mq_put_rq_ref(struct request *rq)
  1568	{
  1569		if (is_flush_rq(rq)) {
  1570			if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
  1571				blk_mq_free_request(rq);
  1572		} else if (req_ref_put_and_test(rq)) {
  1573			if (rq->rq_flags & RQF_MQ_INFLIGHT)
> 1574				__blk_mq_dec_active_requests(rq->mq_hctx)
  1575	
  1576			__blk_mq_free_request(rq);
  1577		}
  1578	}
  1579	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
  2023-05-13 16:41 ` kernel test robot
  2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 16:52 ` Jens Axboe
  2 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-13 16:52 UTC (permalink / raw)
  To: Tian Lan, linux-block; +Cc: Tian Lan

On 5/13/23 8:12 AM, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
>  block/blk-mq.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9c8dc70020bc..5b374fab169c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1593,6 +1593,9 @@ void blk_mq_put_rq_ref(struct request *rq)
>  		if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
>  			blk_mq_free_request(rq);
>  	} else if (req_ref_put_and_test(rq)) {
> +		if (rq->rq_flags & RQF_MQ_INFLIGHT)
> +			__blk_mq_dec_active_requests(rq->mq_hctx)
> +
>  		__blk_mq_free_request(rq);
>  	}
>  }

The fact that you didn't even compile this not withstanding, why
not just move the existing check from blk_mq_free_request() to
__blk_mq_free_request()?

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 19:05   ` Tian Lan
  2023-05-13 20:54     ` Simon Horman
  2023-05-14 12:08     ` Ming Lei
  2023-05-13 19:11   ` Tian Lan
  1 sibling, 2 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 19:05 UTC (permalink / raw)
  To: lkp; +Cc: axboe, linux-block, llvm, oe-kbuild-all, tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
 block/blk-mq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c8dc70020bc..732a39d88cd6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+		__blk_mq_dec_active_requests(hctx);
+
 	if (rq->tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != BLK_MQ_NO_TAG)
@@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
 	    q->elevator->type->ops.finish_request)
 		q->elevator->type->ops.finish_request(rq);
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		__blk_mq_dec_active_requests(hctx);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 16:41 ` kernel test robot
  2023-05-13 19:05   ` Tian Lan
@ 2023-05-13 19:11   ` Tian Lan
  1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 19:11 UTC (permalink / raw)
  To: axboe; +Cc: lkp, linux-block, llvm, oe-kbuild-all, tian.lan, tilan7663

Hi Jens,

>> The fact that you didn't even compile this not withstanding, why
>> not just move the existing check from blk_mq_free_request() to
>> __blk_mq_free_request()?

Could you please review the change again when you have a chance, I made the
change that you suggested earlier. I apologize for the broken patch submitted
earlier, we did test the change on our production host and able to confirm the 
change did address the active request leaking issue. It was totally my fault 
for not proofreading the change before submitting it to the upstream. 

Best,
Tian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 19:05   ` Tian Lan
@ 2023-05-13 20:54     ` Simon Horman
  2023-05-13 21:15       ` Jens Axboe
  2023-05-14 12:08     ` Ming Lei
  1 sibling, 1 reply; 27+ messages in thread
From: Simon Horman @ 2023-05-13 20:54 UTC (permalink / raw)
  To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan

On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
>  block/blk-mq.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9c8dc70020bc..732a39d88cd6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
>  	blk_crypto_free_request(rq);
>  	blk_pm_mark_last_busy(rq);
>  	rq->mq_hctx = NULL;
> +
> +	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> +		__blk_mq_dec_active_requests(hctx);
> +
>  	if (rq->tag != BLK_MQ_NO_TAG)
>  		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
>  	if (sched_tag != BLK_MQ_NO_TAG)
> @@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
>  	    q->elevator->type->ops.finish_request)
>  		q->elevator->type->ops.finish_request(rq);
>  
> -	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> -		__blk_mq_dec_active_requests(hctx);
> -

Unless I am mistaken, hctx is now unused in this function.

>  	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
>  		laptop_io_completion(q->disk->bdi);
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 20:54     ` Simon Horman
@ 2023-05-13 21:15       ` Jens Axboe
  2023-05-13 22:11         ` Tian Lan
  2023-05-13 22:12         ` Tian Lan
  0 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-13 21:15 UTC (permalink / raw)
  To: Simon Horman, Tian Lan; +Cc: lkp, linux-block, llvm, oe-kbuild-all, tian.lan

On 5/13/23 2:54?PM, Simon Horman wrote:
> On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
>> From: Tian Lan <tian.lan@twosigma.com>
>>
>> The nr_active counter continues to increase over time which causes the
>> blk_mq_get_tag to hang until the thread is rescheduled to a different
>> core despite there are still tags available.
>>
>> kernel-stack
>>
>>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>>     Call Trace:
>>     <TASK>
>>     __schedule+0x351/0xa20
>>     scheduler+0x5d/0xe0
>>     io_schedule+0x42/0x70
>>     blk_mq_get_tag+0x11a/0x2a0
>>     ? dequeue_task_stop+0x70/0x70
>>     __blk_mq_alloc_requests+0x191/0x2e0
>>
>> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
>> __blk_mq_free_request being called.
>>
>>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>>          b'__blk_mq_free_request+0x1 [kernel]'
>>          b'bt_iter+0x50 [kernel]'
>>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>>          b'blk_mq_timeout_work+0x7c [kernel]'
>>          b'process_one_work+0x1c4 [kernel]'
>>          b'worker_thread+0x4d [kernel]'
>>          b'kthread+0xe6 [kernel]'
>>          b'ret_from_fork+0x1f [kernel]'
>>
>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>> ---
>>  block/blk-mq.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 9c8dc70020bc..732a39d88cd6 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
>>  	blk_crypto_free_request(rq);
>>  	blk_pm_mark_last_busy(rq);
>>  	rq->mq_hctx = NULL;
>> +
>> +	if (rq->rq_flags & RQF_MQ_INFLIGHT)
>> +		__blk_mq_dec_active_requests(hctx);
>> +
>>  	if (rq->tag != BLK_MQ_NO_TAG)
>>  		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
>>  	if (sched_tag != BLK_MQ_NO_TAG)
>> @@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
>>  	    q->elevator->type->ops.finish_request)
>>  		q->elevator->type->ops.finish_request(rq);
>>  
>> -	if (rq->rq_flags & RQF_MQ_INFLIGHT)
>> -		__blk_mq_dec_active_requests(hctx);
>> -
> 
> Unless I am mistaken, hctx is now unused in this function.

Indeed. Tian, first one didn't compile and this one will most certainly
spew a warning. That's 2 for 2 so far. I realize that you tested this
one a different kernel in prod, but don't seen stuff you haven't even
compiled.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 21:15       ` Jens Axboe
@ 2023-05-13 22:11         ` Tian Lan
  2023-05-14  1:39           ` Jens Axboe
  2023-05-13 22:12         ` Tian Lan
  1 sibling, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-13 22:11 UTC (permalink / raw)
  To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

Thanks Simon and Jens, I will remove the unused reference. 

And I'm sorry again, I'm very fresh in kernel development and still 
trying to learn the whole process. Hopefully this is the last time I will 
have to make the change (previously, I was compiling the entire kernel and
missed the warning message). Thank you all for being patient with me and 
tolerate all the silly mistakes.

Tian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 21:15       ` Jens Axboe
  2023-05-13 22:11         ` Tian Lan
@ 2023-05-13 22:12         ` Tian Lan
  2023-05-14  1:52           ` Jens Axboe
                             ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 22:12 UTC (permalink / raw)
  To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+		__blk_mq_dec_active_requests(hctx);
+
 	if (rq->tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if ((rq->rq_flags & RQF_ELVPRIV) &&
 	    q->elevator->type->ops.finish_request)
 		q->elevator->type->ops.finish_request(rq);
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		__blk_mq_dec_active_requests(hctx);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 22:11         ` Tian Lan
@ 2023-05-14  1:39           ` Jens Axboe
  2023-05-15 12:28             ` Tian Lan
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2023-05-14  1:39 UTC (permalink / raw)
  To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan

On 5/13/23 4:11 PM, Tian Lan wrote:
> Thanks Simon and Jens, I will remove the unused reference. 
> 
> And I'm sorry again, I'm very fresh in kernel development and still 
> trying to learn the whole process. Hopefully this is the last time I will 
> have to make the change (previously, I was compiling the entire kernel and
> missed the warning message). Thank you all for being patient with me and 
> tolerate all the silly mistakes.

There should be zero warnings when compiling, so the easiest way to
check and catch it would be:

$ make -jX -s

as that will be quiet unless a warning or error is triggered.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 22:12         ` Tian Lan
@ 2023-05-14  1:52           ` Jens Axboe
  2023-05-14 12:20             ` Ming Lei
                               ` (2 more replies)
  2023-05-14 14:22           ` Ming Lei
  2023-06-03 23:20           ` [PATCH 1/1] " Jens Axboe
  2 siblings, 3 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-14  1:52 UTC (permalink / raw)
  To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, Ming Lei

On 5/13/23 4:12 PM, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>

I think this needs:

Cc: stable@vger.kernel.org
Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")

tags, but I'm also now confused as to whether the flush handling part
of that patch. Ming, what am I missing in terms of not honoring the
flush ref on put? What happens if two iterators both grab the
flush at the same time, and then subsequently put them?

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 19:05   ` Tian Lan
  2023-05-13 20:54     ` Simon Horman
@ 2023-05-14 12:08     ` Ming Lei
  2023-05-14 14:12       ` Tian Lan
  2023-05-14 14:15       ` Ming Lei
  1 sibling, 2 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 12:08 UTC (permalink / raw)
  To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan, ming.lei

Hello Tian,

On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.

RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
from blk_mq_free_request().

> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1

RQF_MQ_INFLIGHT/RQF_DONTPREP/RQF_IO_STAT/RQF_STATS is set, and it isn't
a FLUSH request.

>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'

If __blk_mq_free_request() is called from timeout, that means this
request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
should have been run.

However, one case is that __blk_mq_dec_active_requests isn't called in
blk_mq_end_request_batch, so maybe your driver is nvme with multiple
NSs, so can you try the following patch?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..9c5dd5aa289c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1062,6 +1062,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
 		if (iob->need_ts)
 			__blk_mq_end_request_acct(rq, now);
 
+		if (rq->rq_flags & RQF_MQ_INFLIGHT)
+			__blk_mq_dec_active_requests(rq->mq_hctx);
+
 		rq_qos_done(rq->q, rq);
 
 		/*

Thanks, 
Ming


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14  1:52           ` Jens Axboe
@ 2023-05-14 12:20             ` Ming Lei
  2023-06-03 22:39             ` [PATCH] " Tian Lan
  2023-06-03 22:39             ` [PATCH 1/1] " Tian Lan
  2 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 12:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tian Lan, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan,
	ming.lei

Hi Jens,

On Sat, May 13, 2023 at 07:52:37PM -0600, Jens Axboe wrote:
> On 5/13/23 4:12 PM, Tian Lan wrote:
> > From: Tian Lan <tian.lan@twosigma.com>
> > 
> > The nr_active counter continues to increase over time which causes the
> > blk_mq_get_tag to hang until the thread is rescheduled to a different
> > core despite there are still tags available.
> > 
> > kernel-stack
> > 
> >   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> >   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> >   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
> >     Call Trace:
> >     <TASK>
> >     __schedule+0x351/0xa20
> >     scheduler+0x5d/0xe0
> >     io_schedule+0x42/0x70
> >     blk_mq_get_tag+0x11a/0x2a0
> >     ? dequeue_task_stop+0x70/0x70
> >     __blk_mq_alloc_requests+0x191/0x2e0
> > 
> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.
> > 
> >   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> >          b'__blk_mq_free_request+0x1 [kernel]'
> >          b'bt_iter+0x50 [kernel]'
> >          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> >          b'blk_mq_timeout_work+0x7c [kernel]'
> >          b'process_one_work+0x1c4 [kernel]'
> >          b'worker_thread+0x4d [kernel]'
> >          b'kthread+0xe6 [kernel]'
> >          b'ret_from_fork+0x1f [kernel]'
> > 
> > Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> 
> I think this needs:
> 
> Cc: stable@vger.kernel.org
> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")

I am still not sure what is wrong with above commit, and the real cause
of Tian's issue & this fix.

> 
> tags, but I'm also now confused as to whether the flush handling part
> of that patch. Ming, what am I missing in terms of not honoring the
> flush ref on put?

From Tian's log, the issue didn't happen on flush request.

> What happens if two iterators both grab the
> flush at the same time, and then subsequently put them?

Both two code paths try to acquire the request refcount, and put it
if the reference is grabbed, and reference release is just done
in flush_end_io().


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14 12:08     ` Ming Lei
@ 2023-05-14 14:12       ` Tian Lan
  2023-05-14 14:15       ` Ming Lei
  1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:12 UTC (permalink / raw)
  To: ming.lei
  Cc: axboe, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

Hi Ming,

> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.

> RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
> from blk_mq_free_request().

Yes you are correct, maybe I should capture both rq->rq_flags and 
rq->state so we know for sure if either of blk_mq_free_request or 
__blk_mq_put_driver_tag was being called before hitting __blk_mq_free_request.


> >          b'__blk_mq_free_request+0x1 [kernel]'
> >          b'bt_iter+0x50 [kernel]'
> >          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> >          b'blk_mq_timeout_work+0x7c [kernel]'
> >          b'process_one_work+0x1c4 [kernel]'
> >          b'worker_thread+0x4d [kernel]'
> >          b'kthread+0xe6 [kernel]'
> >          b'ret_from_fork+0x1f [kernel]'

> If __blk_mq_free_request() is called from timeout, that means this
> request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
> should have been run.

We are also seeing a different call stack that could also potentially by-pass 
__blk_mq_dec_active_requests. Do you think they could be caused by the same 
underlying issue.

1976    2000    collectd    __blk_mq_free_request rq_flags 0x620c0 in-flight 1
        b'__blk_mq_free_request+0x1 [kernel]'
        b'bt_iter+0x50 [kernel]'
        b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
        b'blk_mq_in_flight+0x35 [kernel]'
        b'diskstats_show+0x205 [kernel]'
        b'seq_read_iter+0x11f [kernel]'
        b'proc_req_read_iter+0x4a [kernel]'
        b'vfs_read+0x239 [kernel]'
        b'ksys_read+0xb [kernel]'
        b'do_syscall_64+0x58 [kernel]'
        b'entry_SYSCALL_64_after_hwframe+0x63 [kernel]'


> However, one case is that __blk_mq_dec_active_requests isn't called in
> blk_mq_end_request_batch, so maybe your driver is nvme with multiple
> NSs, so can you try the following patch?

Yes, we are using nvme driver with multiple NSs. 

I can test this patch and will update you on the results. I'm just curious 
shouldn't the counter be subtracted via __blk_mq_sub_active_requests when 
blk_mq_flush_tag_batch is invoked in that case. Then this would result in 
double counting, is that correct.


Thanks,
Tian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14 12:08     ` Ming Lei
  2023-05-14 14:12       ` Tian Lan
@ 2023-05-14 14:15       ` Ming Lei
  1 sibling, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 14:15 UTC (permalink / raw)
  To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan, ming.lei

On Sun, May 14, 2023 at 08:08:12PM +0800, Ming Lei wrote:
> Hello Tian,
> 
> On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> > From: Tian Lan <tian.lan@twosigma.com>
> > 
> > The nr_active counter continues to increase over time which causes the
> > blk_mq_get_tag to hang until the thread is rescheduled to a different
> > core despite there are still tags available.
> > 
> > kernel-stack
> > 
> >   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> >   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> >   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
> >     Call Trace:
> >     <TASK>
> >     __schedule+0x351/0xa20
> >     scheduler+0x5d/0xe0
> >     io_schedule+0x42/0x70
> >     blk_mq_get_tag+0x11a/0x2a0
> >     ? dequeue_task_stop+0x70/0x70
> >     __blk_mq_alloc_requests+0x191/0x2e0
> > 
> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.
> 
> RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
> from blk_mq_free_request().
> 
> > 
> >   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> 
> RQF_MQ_INFLIGHT/RQF_DONTPREP/RQF_IO_STAT/RQF_STATS is set, and it isn't
> a FLUSH request.
> 
> >          b'__blk_mq_free_request+0x1 [kernel]'
> >          b'bt_iter+0x50 [kernel]'
> >          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> >          b'blk_mq_timeout_work+0x7c [kernel]'
> >          b'process_one_work+0x1c4 [kernel]'
> >          b'worker_thread+0x4d [kernel]'
> >          b'kthread+0xe6 [kernel]'
> >          b'ret_from_fork+0x1f [kernel]'
> 
> If __blk_mq_free_request() is called from timeout, that means this
> request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
> should have been run.
> 
> However, one case is that __blk_mq_dec_active_requests isn't called in
> blk_mq_end_request_batch, so maybe your driver is nvme with multiple
> NSs, so can you try the following patch?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f6dad0886a2f..9c5dd5aa289c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1062,6 +1062,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
>  		if (iob->need_ts)
>  			__blk_mq_end_request_acct(rq, now);
>  
> +		if (rq->rq_flags & RQF_MQ_INFLIGHT)
> +			__blk_mq_dec_active_requests(rq->mq_hctx);
> +
>  		rq_qos_done(rq->q, rq);

The above is actually not correct, given __blk_mq_sub_active_requests()
is done as batch in blk_mq_flush_tag_batch().

So the problem is in the difference between blk_mq_free_request() and
blk_mq_end_request_batch(), wrt. when to call
__blk_mq_dec_active_requests().

So your V2 looks correct, given now both two paths decrease active
requests after req_ref_put_and_test() is done.

And the issue should start from f794f3351f26 ("block: add support for
blk_mq_end_request_batch()").


Thanks,
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 22:12         ` Tian Lan
  2023-05-14  1:52           ` Jens Axboe
@ 2023-05-14 14:22           ` Ming Lei
  2023-05-14 14:28             ` Tian Lan
  2023-05-14 14:53             ` [PATCH] " Tian Lan
  2023-06-03 23:20           ` [PATCH 1/1] " Jens Axboe
  2 siblings, 2 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 14:22 UTC (permalink / raw)
  To: Tian Lan
  Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, ming.lei

On Sat, May 13, 2023 at 06:12:27PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
>  block/blk-mq.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f6dad0886a2f..850bfb844ed2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
>  	blk_crypto_free_request(rq);
>  	blk_pm_mark_last_busy(rq);
>  	rq->mq_hctx = NULL;
> +
> +	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> +		__blk_mq_dec_active_requests(hctx);
> +
>  	if (rq->tag != BLK_MQ_NO_TAG)
>  		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
>  	if (sched_tag != BLK_MQ_NO_TAG)
> @@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
>  void blk_mq_free_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> -	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
>  	if ((rq->rq_flags & RQF_ELVPRIV) &&
>  	    q->elevator->type->ops.finish_request)
>  		q->elevator->type->ops.finish_request(rq);
>  
> -	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> -		__blk_mq_dec_active_requests(hctx);
> -
>  	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
>  		laptop_io_completion(q->disk->bdi);

This patch looks fine, but please add words about why this way fixes the issue
with fixes tag:

- the difference between blk_mq_free_request() and blk_mq_end_request_batch(),
wrt. when to call __blk_mq_dec_active_requests(), the former does it
before calling req_ref_put_and_test(), and the later decreases active
request after req_ref_put_and_test().

- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")

Once the above is done, feel free to add:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14 14:22           ` Ming Lei
@ 2023-05-14 14:28             ` Tian Lan
  2023-05-14 14:53             ` [PATCH] " Tian Lan
  1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:28 UTC (permalink / raw)
  To: ming.lei
  Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

Thanks Ming, will make the requested changes.

Tian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14 14:22           ` Ming Lei
  2023-05-14 14:28             ` Tian Lan
@ 2023-05-14 14:53             ` Tian Lan
  2023-05-15  5:54               ` Hannes Reinecke
  1 sibling, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:53 UTC (permalink / raw)
  To: ming.lei
  Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

The issue is caused by the difference between blk_mq_free_request() and
blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
The former does it before calling req_ref_put_and_test(), and the latter
decreases the active request after req_ref_put_and_test().

- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+		__blk_mq_dec_active_requests(hctx);
+
 	if (rq->tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if ((rq->rq_flags & RQF_ELVPRIV) &&
 	    q->elevator->type->ops.finish_request)
 		q->elevator->type->ops.finish_request(rq);
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		__blk_mq_dec_active_requests(hctx);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14 14:53             ` [PATCH] " Tian Lan
@ 2023-05-15  5:54               ` Hannes Reinecke
  2023-05-15 12:26                 ` Tian Lan
  0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2023-05-15  5:54 UTC (permalink / raw)
  To: Tian Lan, ming.lei
  Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan

On 5/14/23 16:53, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>    INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>    Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>      Call Trace:
>      <TASK>
>      __schedule+0x351/0xa20
>      scheduler+0x5d/0xe0
>      io_schedule+0x42/0x70
>      blk_mq_get_tag+0x11a/0x2a0
>      ? dequeue_task_stop+0x70/0x70
>      __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>    320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
>           b'__blk_mq_free_request+0x1 [kernel]'
>           b'bt_iter+0x50 [kernel]'
>           b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>           b'blk_mq_timeout_work+0x7c [kernel]'
>           b'process_one_work+0x1c4 [kernel]'
>           b'worker_thread+0x4d [kernel]'
>           b'kthread+0xe6 [kernel]'
>           b'ret_from_fork+0x1f [kernel]'
> 
> The issue is caused by the difference between blk_mq_free_request() and
> blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
> The former does it before calling req_ref_put_and_test(), and the latter
> decreases the active request after req_ref_put_and_test().
> 
> - Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-15  5:54               ` Hannes Reinecke
@ 2023-05-15 12:26                 ` Tian Lan
  2023-05-25  2:12                   ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-15 12:26 UTC (permalink / raw)
  To: hare
  Cc: axboe, horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all,
	tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

The issue is caused by the difference between blk_mq_free_request() and
blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
The former does it before calling req_ref_put_and_test(), and the latter
decreases the active request after req_ref_put_and_test().

- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+		__blk_mq_dec_active_requests(hctx);
+
 	if (rq->tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if ((rq->rq_flags & RQF_ELVPRIV) &&
 	    q->elevator->type->ops.finish_request)
 		q->elevator->type->ops.finish_request(rq);
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		__blk_mq_dec_active_requests(hctx);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14  1:39           ` Jens Axboe
@ 2023-05-15 12:28             ` Tian Lan
  0 siblings, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-15 12:28 UTC (permalink / raw)
  To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663

Thanks Jens, this is very helpful!

Tian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-15 12:26                 ` Tian Lan
@ 2023-05-25  2:12                   ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-25  2:12 UTC (permalink / raw)
  To: Tian Lan
  Cc: hare, axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan

On Mon, May 15, 2023 at 08:26:43AM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> The issue is caused by the difference between blk_mq_free_request() and
> blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
> The former does it before calling req_ref_put_and_test(), and the latter
> decreases the active request after req_ref_put_and_test().
> 
> - Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
> 
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Hello Jens,

Looks this fix is missed, can you take a look?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14  1:52           ` Jens Axboe
  2023-05-14 12:20             ` Ming Lei
@ 2023-06-03 22:39             ` Tian Lan
  2023-06-04  0:47               ` Ming Lei
  2023-06-03 22:39             ` [PATCH 1/1] " Tian Lan
  2 siblings, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-06-03 22:39 UTC (permalink / raw)
  To: axboe
  Cc: horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all, tian.lan,
	tilan7663, Hannes Reinecke, stable

From: Tian Lan <tian.lan@twosigma.com>

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

This issue arises when both bt_iter() and blk_mq_end_request_batch()
are iterating on the same request. The leak happens when
blk_mq_find_and_get_req() is executed(from bt_iter) before
req_ref_put_and_test() gets called by blk_mq_end_request_batch().
And because non-flush request freed by blk_mq_put_rq_ref() bypasses the
active request tracking, the counter would slowly leak overtime.

Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+		__blk_mq_dec_active_requests(hctx);
+
 	if (rq->tag != BLK_MQ_NO_TAG)
 		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if ((rq->rq_flags & RQF_ELVPRIV) &&
 	    q->elevator->type->ops.finish_request)
 		q->elevator->type->ops.finish_request(rq);
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		__blk_mq_dec_active_requests(hctx);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-14  1:52           ` Jens Axboe
  2023-05-14 12:20             ` Ming Lei
  2023-06-03 22:39             ` [PATCH] " Tian Lan
@ 2023-06-03 22:39             ` Tian Lan
  2 siblings, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-06-03 22:39 UTC (permalink / raw)
  To: axboe
  Cc: horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all, tian.lan,
	tilan7663

Hello Jens,

I updated the patch with more detailed description of the issue. Could you
please take a look again when you have a chance and let me know if there is
anything you would like to make a change. 

Best,
Tian Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-05-13 22:12         ` Tian Lan
  2023-05-14  1:52           ` Jens Axboe
  2023-05-14 14:22           ` Ming Lei
@ 2023-06-03 23:20           ` Jens Axboe
  2 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-06-03 23:20 UTC (permalink / raw)
  To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan


On Sat, 13 May 2023 18:12:27 -0400, Tian Lan wrote:
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> [...]

Applied, thanks!

[1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
      commit: ddad59331a4e16088468ca0ad228a9fe32d7955a

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
  2023-06-03 22:39             ` [PATCH] " Tian Lan
@ 2023-06-04  0:47               ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-06-04  0:47 UTC (permalink / raw)
  To: Tian Lan
  Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan,
	Hannes Reinecke, stable

On Sat, Jun 03, 2023 at 06:39:12PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
> 
> kernel-stack
> 
>   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:inboundIORe state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
>     Call Trace:
>     <TASK>
>     __schedule+0x351/0xa20
>     scheduler+0x5d/0xe0
>     io_schedule+0x42/0x70
>     blk_mq_get_tag+0x11a/0x2a0
>     ? dequeue_task_stop+0x70/0x70
>     __blk_mq_alloc_requests+0x191/0x2e0
> 
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
> 
>   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
>          b'__blk_mq_free_request+0x1 [kernel]'
>          b'bt_iter+0x50 [kernel]'
>          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>          b'blk_mq_timeout_work+0x7c [kernel]'
>          b'process_one_work+0x1c4 [kernel]'
>          b'worker_thread+0x4d [kernel]'
>          b'kthread+0xe6 [kernel]'
>          b'ret_from_fork+0x1f [kernel]'
> 
> This issue arises when both bt_iter() and blk_mq_end_request_batch()
> are iterating on the same request. The leak happens when
> blk_mq_find_and_get_req() is executed(from bt_iter) before
> req_ref_put_and_test() gets called by blk_mq_end_request_batch().
> And because non-flush request freed by blk_mq_put_rq_ref() bypasses the
> active request tracking, the counter would slowly leak overtime.
> 
> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")

f794f3351f26 is merged to v5.16, and the leak starts.

> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")

2e315dc07df0 is merged to v5.14, when everything is just fine.

Both two aren't marked as -stable, so 'Fixes: 2e315dc07df0' is actually
not correct.


thanks,
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-06-04  0:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
2023-05-13 16:41 ` kernel test robot
2023-05-13 16:41 ` kernel test robot
2023-05-13 19:05   ` Tian Lan
2023-05-13 20:54     ` Simon Horman
2023-05-13 21:15       ` Jens Axboe
2023-05-13 22:11         ` Tian Lan
2023-05-14  1:39           ` Jens Axboe
2023-05-15 12:28             ` Tian Lan
2023-05-13 22:12         ` Tian Lan
2023-05-14  1:52           ` Jens Axboe
2023-05-14 12:20             ` Ming Lei
2023-06-03 22:39             ` [PATCH] " Tian Lan
2023-06-04  0:47               ` Ming Lei
2023-06-03 22:39             ` [PATCH 1/1] " Tian Lan
2023-05-14 14:22           ` Ming Lei
2023-05-14 14:28             ` Tian Lan
2023-05-14 14:53             ` [PATCH] " Tian Lan
2023-05-15  5:54               ` Hannes Reinecke
2023-05-15 12:26                 ` Tian Lan
2023-05-25  2:12                   ` Ming Lei
2023-06-03 23:20           ` [PATCH 1/1] " Jens Axboe
2023-05-14 12:08     ` Ming Lei
2023-05-14 14:12       ` Tian Lan
2023-05-14 14:15       ` Ming Lei
2023-05-13 19:11   ` Tian Lan
2023-05-13 16:52 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.