* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
@ 2023-05-13 14:12 Tian Lan
2023-05-13 16:41 ` kernel test robot
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 14:12 UTC (permalink / raw)
To: axboe, linux-block; +Cc: Tian Lan
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
block/blk-mq.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c8dc70020bc..5b374fab169c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1593,6 +1593,9 @@ void blk_mq_put_rq_ref(struct request *rq)
if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
blk_mq_free_request(rq);
} else if (req_ref_put_and_test(rq)) {
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(rq->mq_hctx)
+
__blk_mq_free_request(rq);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
@ 2023-05-13 16:41 ` kernel test robot
2023-05-13 16:41 ` kernel test robot
2023-05-13 16:52 ` Jens Axboe
2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-05-13 16:41 UTC (permalink / raw)
To: Tian Lan, axboe, linux-block; +Cc: oe-kbuild-all, Tian Lan
Hi Tian,
kernel test robot noticed the following build errors:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.4-rc1 next-20230512]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20230513141234.8395-1-tilan7663%40gmail.com
patch subject: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
config: x86_64-randconfig-a011 (https://download.01.org/0day-ci/archive/20230514/202305140028.fzLbBFSs-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/f958f858d42c76e1439649993bd485ac02200f87
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
git checkout f958f858d42c76e1439649993bd485ac02200f87
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305140028.fzLbBFSs-lkp@intel.com/
All errors (new ones prefixed by >>):
block/blk-mq.c: In function 'blk_mq_put_rq_ref':
>> block/blk-mq.c:1574:66: error: expected ';' before '__blk_mq_free_request'
1574 | __blk_mq_dec_active_requests(rq->mq_hctx)
| ^
| ;
1575 |
1576 | __blk_mq_free_request(rq);
| ~~~~~~~~~~~~~~~~~~~~~
vim +1574 block/blk-mq.c
1566
1567 void blk_mq_put_rq_ref(struct request *rq)
1568 {
1569 if (is_flush_rq(rq)) {
1570 if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
1571 blk_mq_free_request(rq);
1572 } else if (req_ref_put_and_test(rq)) {
1573 if (rq->rq_flags & RQF_MQ_INFLIGHT)
> 1574 __blk_mq_dec_active_requests(rq->mq_hctx)
1575
1576 __blk_mq_free_request(rq);
1577 }
1578 }
1579
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 16:41 ` kernel test robot
2023-05-13 19:05 ` Tian Lan
2023-05-13 19:11 ` Tian Lan
2023-05-13 16:52 ` Jens Axboe
2 siblings, 2 replies; 27+ messages in thread
From: kernel test robot @ 2023-05-13 16:41 UTC (permalink / raw)
To: Tian Lan, axboe, linux-block; +Cc: llvm, oe-kbuild-all, Tian Lan
Hi Tian,
kernel test robot noticed the following build errors:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.4-rc1 next-20230512]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20230513141234.8395-1-tilan7663%40gmail.com
patch subject: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230514/202305140021.WvuGBjaZ-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/f958f858d42c76e1439649993bd485ac02200f87
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Tian-Lan/blk-mq-fix-blk_mq_hw_ctx-active-request-accounting/20230513-221329
git checkout f958f858d42c76e1439649993bd485ac02200f87
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305140021.WvuGBjaZ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> block/blk-mq.c:1574:45: error: expected ';' after expression
__blk_mq_dec_active_requests(rq->mq_hctx)
^
;
1 error generated.
vim +1574 block/blk-mq.c
1566
1567 void blk_mq_put_rq_ref(struct request *rq)
1568 {
1569 if (is_flush_rq(rq)) {
1570 if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
1571 blk_mq_free_request(rq);
1572 } else if (req_ref_put_and_test(rq)) {
1573 if (rq->rq_flags & RQF_MQ_INFLIGHT)
> 1574 __blk_mq_dec_active_requests(rq->mq_hctx)
1575
1576 __blk_mq_free_request(rq);
1577 }
1578 }
1579
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
2023-05-13 16:41 ` kernel test robot
2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 16:52 ` Jens Axboe
2 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-13 16:52 UTC (permalink / raw)
To: Tian Lan, linux-block; +Cc: Tian Lan
On 5/13/23 8:12 AM, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
> block/blk-mq.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9c8dc70020bc..5b374fab169c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1593,6 +1593,9 @@ void blk_mq_put_rq_ref(struct request *rq)
> if (rq->end_io(rq, 0) == RQ_END_IO_FREE)
> blk_mq_free_request(rq);
> } else if (req_ref_put_and_test(rq)) {
> + if (rq->rq_flags & RQF_MQ_INFLIGHT)
> + __blk_mq_dec_active_requests(rq->mq_hctx)
> +
> __blk_mq_free_request(rq);
> }
> }
The fact that you didn't even compile this not withstanding, why
not just move the existing check from blk_mq_free_request() to
__blk_mq_free_request()?
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 16:41 ` kernel test robot
@ 2023-05-13 19:05 ` Tian Lan
2023-05-13 20:54 ` Simon Horman
2023-05-14 12:08 ` Ming Lei
2023-05-13 19:11 ` Tian Lan
1 sibling, 2 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 19:05 UTC (permalink / raw)
To: lkp; +Cc: axboe, linux-block, llvm, oe-kbuild-all, tian.lan, tilan7663
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
block/blk-mq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c8dc70020bc..732a39d88cd6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
blk_crypto_free_request(rq);
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
+
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(hctx);
+
if (rq->tag != BLK_MQ_NO_TAG)
blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != BLK_MQ_NO_TAG)
@@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
q->elevator->type->ops.finish_request)
q->elevator->type->ops.finish_request(rq);
- if (rq->rq_flags & RQF_MQ_INFLIGHT)
- __blk_mq_dec_active_requests(hctx);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 16:41 ` kernel test robot
2023-05-13 19:05 ` Tian Lan
@ 2023-05-13 19:11 ` Tian Lan
1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 19:11 UTC (permalink / raw)
To: axboe; +Cc: lkp, linux-block, llvm, oe-kbuild-all, tian.lan, tilan7663
Hi Jens,
>> The fact that you didn't even compile this not withstanding, why
>> not just move the existing check from blk_mq_free_request() to
>> __blk_mq_free_request()?
Could you please review the change again when you have a chance, I made the
change that you suggested earlier. I apologize for the broken patch submitted
earlier, we did test the change on our production host and able to confirm the
change did address the active request leaking issue. It was totally my fault
for not proofreading the change before submitting it to the upstream.
Best,
Tian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 19:05 ` Tian Lan
@ 2023-05-13 20:54 ` Simon Horman
2023-05-13 21:15 ` Jens Axboe
2023-05-14 12:08 ` Ming Lei
1 sibling, 1 reply; 27+ messages in thread
From: Simon Horman @ 2023-05-13 20:54 UTC (permalink / raw)
To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan
On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
> block/blk-mq.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9c8dc70020bc..732a39d88cd6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
> blk_crypto_free_request(rq);
> blk_pm_mark_last_busy(rq);
> rq->mq_hctx = NULL;
> +
> + if (rq->rq_flags & RQF_MQ_INFLIGHT)
> + __blk_mq_dec_active_requests(hctx);
> +
> if (rq->tag != BLK_MQ_NO_TAG)
> blk_mq_put_tag(hctx->tags, ctx, rq->tag);
> if (sched_tag != BLK_MQ_NO_TAG)
> @@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
> q->elevator->type->ops.finish_request)
> q->elevator->type->ops.finish_request(rq);
>
> - if (rq->rq_flags & RQF_MQ_INFLIGHT)
> - __blk_mq_dec_active_requests(hctx);
> -
Unless I am mistaken, hctx is now unused in this function.
> if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
> laptop_io_completion(q->disk->bdi);
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 20:54 ` Simon Horman
@ 2023-05-13 21:15 ` Jens Axboe
2023-05-13 22:11 ` Tian Lan
2023-05-13 22:12 ` Tian Lan
0 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-13 21:15 UTC (permalink / raw)
To: Simon Horman, Tian Lan; +Cc: lkp, linux-block, llvm, oe-kbuild-all, tian.lan
On 5/13/23 2:54?PM, Simon Horman wrote:
> On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
>> From: Tian Lan <tian.lan@twosigma.com>
>>
>> The nr_active counter continues to increase over time which causes the
>> blk_mq_get_tag to hang until the thread is rescheduled to a different
>> core despite there are still tags available.
>>
>> kernel-stack
>>
>> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
>> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
>> Call Trace:
>> <TASK>
>> __schedule+0x351/0xa20
>> scheduler+0x5d/0xe0
>> io_schedule+0x42/0x70
>> blk_mq_get_tag+0x11a/0x2a0
>> ? dequeue_task_stop+0x70/0x70
>> __blk_mq_alloc_requests+0x191/0x2e0
>>
>> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
>> __blk_mq_free_request being called.
>>
>> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>> b'__blk_mq_free_request+0x1 [kernel]'
>> b'bt_iter+0x50 [kernel]'
>> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
>> b'blk_mq_timeout_work+0x7c [kernel]'
>> b'process_one_work+0x1c4 [kernel]'
>> b'worker_thread+0x4d [kernel]'
>> b'kthread+0xe6 [kernel]'
>> b'ret_from_fork+0x1f [kernel]'
>>
>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>> ---
>> block/blk-mq.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 9c8dc70020bc..732a39d88cd6 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -716,6 +716,10 @@ static void __blk_mq_free_request(struct request *rq)
>> blk_crypto_free_request(rq);
>> blk_pm_mark_last_busy(rq);
>> rq->mq_hctx = NULL;
>> +
>> + if (rq->rq_flags & RQF_MQ_INFLIGHT)
>> + __blk_mq_dec_active_requests(hctx);
>> +
>> if (rq->tag != BLK_MQ_NO_TAG)
>> blk_mq_put_tag(hctx->tags, ctx, rq->tag);
>> if (sched_tag != BLK_MQ_NO_TAG)
>> @@ -733,9 +737,6 @@ void blk_mq_free_request(struct request *rq)
>> q->elevator->type->ops.finish_request)
>> q->elevator->type->ops.finish_request(rq);
>>
>> - if (rq->rq_flags & RQF_MQ_INFLIGHT)
>> - __blk_mq_dec_active_requests(hctx);
>> -
>
> Unless I am mistaken, hctx is now unused in this function.
Indeed. Tian, first one didn't compile and this one will most certainly
spew a warning. That's 2 for 2 so far. I realize that you tested this
one a different kernel in prod, but don't seen stuff you haven't even
compiled.
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 21:15 ` Jens Axboe
@ 2023-05-13 22:11 ` Tian Lan
2023-05-14 1:39 ` Jens Axboe
2023-05-13 22:12 ` Tian Lan
1 sibling, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-13 22:11 UTC (permalink / raw)
To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
Thanks Simon and Jens, I will remove the unused reference.
And I'm sorry again, I'm very fresh in kernel development and still
trying to learn the whole process. Hopefully this is the last time I will
have to make the change (previously, I was compiling the entire kernel and
missed the warning message). Thank you all for being patient with me and
tolerate all the silly mistakes.
Tian
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 21:15 ` Jens Axboe
2023-05-13 22:11 ` Tian Lan
@ 2023-05-13 22:12 ` Tian Lan
2023-05-14 1:52 ` Jens Axboe
` (2 more replies)
1 sibling, 3 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-13 22:12 UTC (permalink / raw)
To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
block/blk-mq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
blk_crypto_free_request(rq);
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
+
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(hctx);
+
if (rq->tag != BLK_MQ_NO_TAG)
blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
if ((rq->rq_flags & RQF_ELVPRIV) &&
q->elevator->type->ops.finish_request)
q->elevator->type->ops.finish_request(rq);
- if (rq->rq_flags & RQF_MQ_INFLIGHT)
- __blk_mq_dec_active_requests(hctx);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 22:11 ` Tian Lan
@ 2023-05-14 1:39 ` Jens Axboe
2023-05-15 12:28 ` Tian Lan
0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2023-05-14 1:39 UTC (permalink / raw)
To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan
On 5/13/23 4:11 PM, Tian Lan wrote:
> Thanks Simon and Jens, I will remove the unused reference.
>
> And I'm sorry again, I'm very fresh in kernel development and still
> trying to learn the whole process. Hopefully this is the last time I will
> have to make the change (previously, I was compiling the entire kernel and
> missed the warning message). Thank you all for being patient with me and
> tolerate all the silly mistakes.
There should be zero warnings when compiling, so the easiest way to
check and catch it would be:
$ make -jX -s
as that will be quiet unless a warning or error is triggered.
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 22:12 ` Tian Lan
@ 2023-05-14 1:52 ` Jens Axboe
2023-05-14 12:20 ` Ming Lei
` (2 more replies)
2023-05-14 14:22 ` Ming Lei
2023-06-03 23:20 ` [PATCH 1/1] " Jens Axboe
2 siblings, 3 replies; 27+ messages in thread
From: Jens Axboe @ 2023-05-14 1:52 UTC (permalink / raw)
To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, Ming Lei
On 5/13/23 4:12 PM, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
I think this needs:
Cc: stable@vger.kernel.org
Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
tags, but I'm also now confused as to whether the flush handling part
of that patch. Ming, what am I missing in terms of not honoring the
flush ref on put? What happens if two iterators both grab the
flush at the same time, and then subsequently put them?
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 19:05 ` Tian Lan
2023-05-13 20:54 ` Simon Horman
@ 2023-05-14 12:08 ` Ming Lei
2023-05-14 14:12 ` Tian Lan
2023-05-14 14:15 ` Ming Lei
1 sibling, 2 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 12:08 UTC (permalink / raw)
To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan, ming.lei
Hello Tian,
On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
from blk_mq_free_request().
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
RQF_MQ_INFLIGHT/RQF_DONTPREP/RQF_IO_STAT/RQF_STATS is set, and it isn't
a FLUSH request.
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
If __blk_mq_free_request() is called from timeout, that means this
request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
should have been run.
However, one case is that __blk_mq_dec_active_requests isn't called in
blk_mq_end_request_batch, so maybe your driver is nvme with multiple
NSs, so can you try the following patch?
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..9c5dd5aa289c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1062,6 +1062,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
if (iob->need_ts)
__blk_mq_end_request_acct(rq, now);
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(rq->mq_hctx);
+
rq_qos_done(rq->q, rq);
/*
Thanks,
Ming
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 1:52 ` Jens Axboe
@ 2023-05-14 12:20 ` Ming Lei
2023-06-03 22:39 ` [PATCH] " Tian Lan
2023-06-03 22:39 ` [PATCH 1/1] " Tian Lan
2 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 12:20 UTC (permalink / raw)
To: Jens Axboe
Cc: Tian Lan, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan,
ming.lei
Hi Jens,
On Sat, May 13, 2023 at 07:52:37PM -0600, Jens Axboe wrote:
> On 5/13/23 4:12 PM, Tian Lan wrote:
> > From: Tian Lan <tian.lan@twosigma.com>
> >
> > The nr_active counter continues to increase over time which causes the
> > blk_mq_get_tag to hang until the thread is rescheduled to a different
> > core despite there are still tags available.
> >
> > kernel-stack
> >
> > INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> > Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> > Call Trace:
> > <TASK>
> > __schedule+0x351/0xa20
> > scheduler+0x5d/0xe0
> > io_schedule+0x42/0x70
> > blk_mq_get_tag+0x11a/0x2a0
> > ? dequeue_task_stop+0x70/0x70
> > __blk_mq_alloc_requests+0x191/0x2e0
> >
> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.
> >
> > 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> > b'__blk_mq_free_request+0x1 [kernel]'
> > b'bt_iter+0x50 [kernel]'
> > b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> > b'blk_mq_timeout_work+0x7c [kernel]'
> > b'process_one_work+0x1c4 [kernel]'
> > b'worker_thread+0x4d [kernel]'
> > b'kthread+0xe6 [kernel]'
> > b'ret_from_fork+0x1f [kernel]'
> >
> > Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>
> I think this needs:
>
> Cc: stable@vger.kernel.org
> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
I am still not sure what is wrong with above commit, and the real cause
of Tian's issue & this fix.
>
> tags, but I'm also now confused as to whether the flush handling part
> of that patch. Ming, what am I missing in terms of not honoring the
> flush ref on put?
From Tian's log, the issue didn't happen on flush request.
> What happens if two iterators both grab the
> flush at the same time, and then subsequently put them?
Both two code paths try to acquire the request refcount, and put it
if the reference is grabbed, and reference release is just done
in flush_end_io().
Thanks,
Ming
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 12:08 ` Ming Lei
@ 2023-05-14 14:12 ` Tian Lan
2023-05-14 14:15 ` Ming Lei
1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:12 UTC (permalink / raw)
To: ming.lei
Cc: axboe, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
Hi Ming,
> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.
> RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
> from blk_mq_free_request().
Yes you are correct, maybe I should capture both rq->rq_flags and
rq->state so we know for sure if either of blk_mq_free_request or
__blk_mq_put_driver_tag was being called before hitting __blk_mq_free_request.
> > b'__blk_mq_free_request+0x1 [kernel]'
> > b'bt_iter+0x50 [kernel]'
> > b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> > b'blk_mq_timeout_work+0x7c [kernel]'
> > b'process_one_work+0x1c4 [kernel]'
> > b'worker_thread+0x4d [kernel]'
> > b'kthread+0xe6 [kernel]'
> > b'ret_from_fork+0x1f [kernel]'
> If __blk_mq_free_request() is called from timeout, that means this
> request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
> should have been run.
We are also seeing a different call stack that could also potentially by-pass
__blk_mq_dec_active_requests. Do you think they could be caused by the same
underlying issue.
1976 2000 collectd __blk_mq_free_request rq_flags 0x620c0 in-flight 1
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_in_flight+0x35 [kernel]'
b'diskstats_show+0x205 [kernel]'
b'seq_read_iter+0x11f [kernel]'
b'proc_req_read_iter+0x4a [kernel]'
b'vfs_read+0x239 [kernel]'
b'ksys_read+0xb [kernel]'
b'do_syscall_64+0x58 [kernel]'
b'entry_SYSCALL_64_after_hwframe+0x63 [kernel]'
> However, one case is that __blk_mq_dec_active_requests isn't called in
> blk_mq_end_request_batch, so maybe your driver is nvme with multiple
> NSs, so can you try the following patch?
Yes, we are using nvme driver with multiple NSs.
I can test this patch and will update you on the results. I'm just curious
shouldn't the counter be subtracted via __blk_mq_sub_active_requests when
blk_mq_flush_tag_batch is invoked in that case. Then this would result in
double counting, is that correct.
Thanks,
Tian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 12:08 ` Ming Lei
2023-05-14 14:12 ` Tian Lan
@ 2023-05-14 14:15 ` Ming Lei
1 sibling, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 14:15 UTC (permalink / raw)
To: Tian Lan; +Cc: lkp, axboe, linux-block, llvm, oe-kbuild-all, tian.lan, ming.lei
On Sun, May 14, 2023 at 08:08:12PM +0800, Ming Lei wrote:
> Hello Tian,
>
> On Sat, May 13, 2023 at 03:05:34PM -0400, Tian Lan wrote:
> > From: Tian Lan <tian.lan@twosigma.com>
> >
> > The nr_active counter continues to increase over time which causes the
> > blk_mq_get_tag to hang until the thread is rescheduled to a different
> > core despite there are still tags available.
> >
> > kernel-stack
> >
> > INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> > Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> > Call Trace:
> > <TASK>
> > __schedule+0x351/0xa20
> > scheduler+0x5d/0xe0
> > io_schedule+0x42/0x70
> > blk_mq_get_tag+0x11a/0x2a0
> > ? dequeue_task_stop+0x70/0x70
> > __blk_mq_alloc_requests+0x191/0x2e0
> >
> > kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request being called.
>
> RQF_MQ_INFLIGHT won't be cleared when the request is freed normally
> from blk_mq_free_request().
>
> >
> > 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
>
> RQF_MQ_INFLIGHT/RQF_DONTPREP/RQF_IO_STAT/RQF_STATS is set, and it isn't
> a FLUSH request.
>
> > b'__blk_mq_free_request+0x1 [kernel]'
> > b'bt_iter+0x50 [kernel]'
> > b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> > b'blk_mq_timeout_work+0x7c [kernel]'
> > b'process_one_work+0x1c4 [kernel]'
> > b'worker_thread+0x4d [kernel]'
> > b'kthread+0xe6 [kernel]'
> > b'ret_from_fork+0x1f [kernel]'
>
> If __blk_mq_free_request() is called from timeout, that means this
> request has been freed by blk_mq_free_request() already, so __blk_mq_dec_active_requests
> should have been run.
>
> However, one case is that __blk_mq_dec_active_requests isn't called in
> blk_mq_end_request_batch, so maybe your driver is nvme with multiple
> NSs, so can you try the following patch?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f6dad0886a2f..9c5dd5aa289c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1062,6 +1062,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
> if (iob->need_ts)
> __blk_mq_end_request_acct(rq, now);
>
> + if (rq->rq_flags & RQF_MQ_INFLIGHT)
> + __blk_mq_dec_active_requests(rq->mq_hctx);
> +
> rq_qos_done(rq->q, rq);
The above is actually not correct, given __blk_mq_sub_active_requests()
is done as batch in blk_mq_flush_tag_batch().
So the problem is in the difference between blk_mq_free_request() and
blk_mq_end_request_batch(), wrt. when to call
__blk_mq_dec_active_requests().
So your V2 looks correct, given now both two paths decrease active
requests after req_ref_put_and_test() is done.
And the issue should start from f794f3351f26 ("block: add support for
blk_mq_end_request_batch()").
Thanks,
Ming
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 22:12 ` Tian Lan
2023-05-14 1:52 ` Jens Axboe
@ 2023-05-14 14:22 ` Ming Lei
2023-05-14 14:28 ` Tian Lan
2023-05-14 14:53 ` [PATCH] " Tian Lan
2023-06-03 23:20 ` [PATCH 1/1] " Jens Axboe
2 siblings, 2 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-14 14:22 UTC (permalink / raw)
To: Tian Lan
Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, ming.lei
On Sat, May 13, 2023 at 06:12:27PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
> block/blk-mq.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f6dad0886a2f..850bfb844ed2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
> blk_crypto_free_request(rq);
> blk_pm_mark_last_busy(rq);
> rq->mq_hctx = NULL;
> +
> + if (rq->rq_flags & RQF_MQ_INFLIGHT)
> + __blk_mq_dec_active_requests(hctx);
> +
> if (rq->tag != BLK_MQ_NO_TAG)
> blk_mq_put_tag(hctx->tags, ctx, rq->tag);
> if (sched_tag != BLK_MQ_NO_TAG)
> @@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
> void blk_mq_free_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
> - struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>
> if ((rq->rq_flags & RQF_ELVPRIV) &&
> q->elevator->type->ops.finish_request)
> q->elevator->type->ops.finish_request(rq);
>
> - if (rq->rq_flags & RQF_MQ_INFLIGHT)
> - __blk_mq_dec_active_requests(hctx);
> -
> if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
> laptop_io_completion(q->disk->bdi);
This patch looks fine, but please add words about why this way fixes the issue
with fixes tag:
- the difference between blk_mq_free_request() and blk_mq_end_request_batch(),
wrt. when to call __blk_mq_dec_active_requests(), the former does it
before calling req_ref_put_and_test(), and the later decreases active
request after req_ref_put_and_test().
- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Once the above is done, feel free to add:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 14:22 ` Ming Lei
@ 2023-05-14 14:28 ` Tian Lan
2023-05-14 14:53 ` [PATCH] " Tian Lan
1 sibling, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:28 UTC (permalink / raw)
To: ming.lei
Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
Thanks Ming, will make the requested changes.
Tian
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 14:22 ` Ming Lei
2023-05-14 14:28 ` Tian Lan
@ 2023-05-14 14:53 ` Tian Lan
2023-05-15 5:54 ` Hannes Reinecke
1 sibling, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-14 14:53 UTC (permalink / raw)
To: ming.lei
Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
The issue is caused by the difference between blk_mq_free_request() and
blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
The former does it before calling req_ref_put_and_test(), and the latter
decreases the active request after req_ref_put_and_test().
- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
blk_crypto_free_request(rq);
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
+
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(hctx);
+
if (rq->tag != BLK_MQ_NO_TAG)
blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
if ((rq->rq_flags & RQF_ELVPRIV) &&
q->elevator->type->ops.finish_request)
q->elevator->type->ops.finish_request(rq);
- if (rq->rq_flags & RQF_MQ_INFLIGHT)
- __blk_mq_dec_active_requests(hctx);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 14:53 ` [PATCH] " Tian Lan
@ 2023-05-15 5:54 ` Hannes Reinecke
2023-05-15 12:26 ` Tian Lan
0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2023-05-15 5:54 UTC (permalink / raw)
To: Tian Lan, ming.lei
Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan
On 5/14/23 16:53, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> The issue is caused by the difference between blk_mq_free_request() and
> blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
> The former does it before calling req_ref_put_and_test(), and the latter
> decreases the active request after req_ref_put_and_test().
>
> - Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-mq.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-15 5:54 ` Hannes Reinecke
@ 2023-05-15 12:26 ` Tian Lan
2023-05-25 2:12 ` Ming Lei
0 siblings, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-05-15 12:26 UTC (permalink / raw)
To: hare
Cc: axboe, horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all,
tian.lan, tilan7663
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
The issue is caused by the difference between blk_mq_free_request() and
blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
The former does it before calling req_ref_put_and_test(), and the latter
decreases the active request after req_ref_put_and_test().
- Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
block/blk-mq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
blk_crypto_free_request(rq);
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
+
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(hctx);
+
if (rq->tag != BLK_MQ_NO_TAG)
blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
if ((rq->rq_flags & RQF_ELVPRIV) &&
q->elevator->type->ops.finish_request)
q->elevator->type->ops.finish_request(rq);
- if (rq->rq_flags & RQF_MQ_INFLIGHT)
- __blk_mq_dec_active_requests(hctx);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 1:39 ` Jens Axboe
@ 2023-05-15 12:28 ` Tian Lan
0 siblings, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-05-15 12:28 UTC (permalink / raw)
To: axboe; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan, tilan7663
Thanks Jens, this is very helpful!
Tian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-15 12:26 ` Tian Lan
@ 2023-05-25 2:12 ` Ming Lei
0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-05-25 2:12 UTC (permalink / raw)
To: Tian Lan
Cc: hare, axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan
On Mon, May 15, 2023 at 08:26:43AM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> The issue is caused by the difference between blk_mq_free_request() and
> blk_mq_end_request_batch() wrt. when to call __blk_mq_dec_active_requests().
> The former does it before calling req_ref_put_and_test(), and the latter
> decreases the active request after req_ref_put_and_test().
>
> - Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
>
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
Hello Jens,
Looks this fix is missed, can you take a look?
Thanks,
Ming
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 1:52 ` Jens Axboe
2023-05-14 12:20 ` Ming Lei
@ 2023-06-03 22:39 ` Tian Lan
2023-06-04 0:47 ` Ming Lei
2023-06-03 22:39 ` [PATCH 1/1] " Tian Lan
2 siblings, 1 reply; 27+ messages in thread
From: Tian Lan @ 2023-06-03 22:39 UTC (permalink / raw)
To: axboe
Cc: horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all, tian.lan,
tilan7663, Hannes Reinecke, stable
From: Tian Lan <tian.lan@twosigma.com>
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
This issue arises when both bt_iter() and blk_mq_end_request_batch()
are iterating on the same request. The leak happens when
blk_mq_find_and_get_req() is executed(from bt_iter) before
req_ref_put_and_test() gets called by blk_mq_end_request_batch().
And because non-flush request freed by blk_mq_put_rq_ref() bypasses the
active request tracking, the counter would slowly leak overtime.
Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
---
block/blk-mq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..850bfb844ed2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -683,6 +683,10 @@ static void __blk_mq_free_request(struct request *rq)
blk_crypto_free_request(rq);
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
+
+ if (rq->rq_flags & RQF_MQ_INFLIGHT)
+ __blk_mq_dec_active_requests(hctx);
+
if (rq->tag != BLK_MQ_NO_TAG)
blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +698,11 @@ static void __blk_mq_free_request(struct request *rq)
void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
if ((rq->rq_flags & RQF_ELVPRIV) &&
q->elevator->type->ops.finish_request)
q->elevator->type->ops.finish_request(rq);
- if (rq->rq_flags & RQF_MQ_INFLIGHT)
- __blk_mq_dec_active_requests(hctx);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-14 1:52 ` Jens Axboe
2023-05-14 12:20 ` Ming Lei
2023-06-03 22:39 ` [PATCH] " Tian Lan
@ 2023-06-03 22:39 ` Tian Lan
2 siblings, 0 replies; 27+ messages in thread
From: Tian Lan @ 2023-06-03 22:39 UTC (permalink / raw)
To: axboe
Cc: horms, linux-block, lkp, llvm, ming.lei, oe-kbuild-all, tian.lan,
tilan7663
Hello Jens,
I updated the patch with more detailed description of the issue. Could you
please take a look again when you have a chance and let me know if there is
anything you would like to make a change.
Best,
Tian Lan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-05-13 22:12 ` Tian Lan
2023-05-14 1:52 ` Jens Axboe
2023-05-14 14:22 ` Ming Lei
@ 2023-06-03 23:20 ` Jens Axboe
2 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-06-03 23:20 UTC (permalink / raw)
To: Tian Lan; +Cc: horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan
On Sat, 13 May 2023 18:12:27 -0400, Tian Lan wrote:
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> [...]
Applied, thanks!
[1/1] blk-mq: fix blk_mq_hw_ctx active request accounting
commit: ddad59331a4e16088468ca0ad228a9fe32d7955a
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] blk-mq: fix blk_mq_hw_ctx active request accounting
2023-06-03 22:39 ` [PATCH] " Tian Lan
@ 2023-06-04 0:47 ` Ming Lei
0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2023-06-04 0:47 UTC (permalink / raw)
To: Tian Lan
Cc: axboe, horms, linux-block, lkp, llvm, oe-kbuild-all, tian.lan,
Hannes Reinecke, stable
On Sat, Jun 03, 2023 at 06:39:12PM -0400, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> The nr_active counter continues to increase over time which causes the
> blk_mq_get_tag to hang until the thread is rescheduled to a different
> core despite there are still tags available.
>
> kernel-stack
>
> INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:inboundIORe state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x351/0xa20
> scheduler+0x5d/0xe0
> io_schedule+0x42/0x70
> blk_mq_get_tag+0x11a/0x2a0
> ? dequeue_task_stop+0x70/0x70
> __blk_mq_alloc_requests+0x191/0x2e0
>
> kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
> __blk_mq_free_request being called.
>
> 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0
> b'__blk_mq_free_request+0x1 [kernel]'
> b'bt_iter+0x50 [kernel]'
> b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> b'blk_mq_timeout_work+0x7c [kernel]'
> b'process_one_work+0x1c4 [kernel]'
> b'worker_thread+0x4d [kernel]'
> b'kthread+0xe6 [kernel]'
> b'ret_from_fork+0x1f [kernel]'
>
> This issue arises when both bt_iter() and blk_mq_end_request_batch()
> are iterating on the same request. The leak happens when
> blk_mq_find_and_get_req() is executed(from bt_iter) before
> req_ref_put_and_test() gets called by blk_mq_end_request_batch().
> And because non-flush request freed by blk_mq_put_rq_ref() bypasses the
> active request tracking, the counter would slowly leak overtime.
>
> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
f794f3351f26 is merged to v5.16, and the leak starts.
> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
2e315dc07df0 is merged to v5.14, when everything is just fine.
Both two aren't marked as -stable, so 'Fixes: 2e315dc07df0' is actually
not correct.
thanks,
Ming
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2023-06-04 0:47 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-13 14:12 [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting Tian Lan
2023-05-13 16:41 ` kernel test robot
2023-05-13 16:41 ` kernel test robot
2023-05-13 19:05 ` Tian Lan
2023-05-13 20:54 ` Simon Horman
2023-05-13 21:15 ` Jens Axboe
2023-05-13 22:11 ` Tian Lan
2023-05-14 1:39 ` Jens Axboe
2023-05-15 12:28 ` Tian Lan
2023-05-13 22:12 ` Tian Lan
2023-05-14 1:52 ` Jens Axboe
2023-05-14 12:20 ` Ming Lei
2023-06-03 22:39 ` [PATCH] " Tian Lan
2023-06-04 0:47 ` Ming Lei
2023-06-03 22:39 ` [PATCH 1/1] " Tian Lan
2023-05-14 14:22 ` Ming Lei
2023-05-14 14:28 ` Tian Lan
2023-05-14 14:53 ` [PATCH] " Tian Lan
2023-05-15 5:54 ` Hannes Reinecke
2023-05-15 12:26 ` Tian Lan
2023-05-25 2:12 ` Ming Lei
2023-06-03 23:20 ` [PATCH 1/1] " Jens Axboe
2023-05-14 12:08 ` Ming Lei
2023-05-14 14:12 ` Tian Lan
2023-05-14 14:15 ` Ming Lei
2023-05-13 19:11 ` Tian Lan
2023-05-13 16:52 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.