* [PATCH v2] blk-mq: release scheduler resource when request complete
@ 2023-08-13 15:23 chengming.zhou
2023-08-13 15:34 ` Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: chengming.zhou @ 2023-08-13 15:23 UTC (permalink / raw)
To: axboe, hch, chuck.lever
Cc: bvanassche, cel, linux-block, linux-kernel, zhouchengming
From: Chengming Zhou <zhouchengming@bytedance.com>
Chuck reported [1] a IO hang problem on NFS exports that reside on SATA
devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
submission path for post-flush requests").
We analysed the IO hang problem, found there are two postflush requests
are waiting for each other.
The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
failed to blk_kick_flush() because of the second postflush request which
is inflight waiting in scheduler queue.
The second postflush waiting in scheduler queue can't be dispatched because
the first postflush hasn't released scheduler resource even though it has
completed by itself.
Fix it by releasing scheduler resource when the first postflush request
completed, so the second postflush can be dispatched and completed, then
make blk_kick_flush() succeed.
[1] https://lore.kernel.org/all/7A57C7AE-A51A-4254-888B-FE15CA21F9E9@oracle.com/
Fixes: 615939a2ae73 ("blk-mq: defer to the normal submission path for post-flush requests")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
---
v2:
- All IO schedulers do set ->finish_request(), so remove the
check and warn on not setting when register.
---
block/blk-mq.c | 16 ++++++++++++----
block/elevator.c | 3 +++
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f14b8669ac69..a8c63bef8ff1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -682,6 +682,14 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
+static void blk_mq_finish_request(struct request *rq)
+{
+ struct request_queue *q = rq->q;
+
+ if (rq->rq_flags & RQF_USE_SCHED)
+ q->elevator->type->ops.finish_request(rq);
+}
+
static void __blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
@@ -708,10 +716,6 @@ void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- if ((rq->rq_flags & RQF_USE_SCHED) &&
- q->elevator->type->ops.finish_request)
- q->elevator->type->ops.finish_request(rq);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
@@ -1021,6 +1025,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
if (blk_mq_need_time_stamp(rq))
__blk_mq_end_request_acct(rq, ktime_get_ns());
+ blk_mq_finish_request(rq);
+
if (rq->end_io) {
rq_qos_done(rq->q, rq);
if (rq->end_io(rq, error) == RQ_END_IO_FREE)
@@ -1075,6 +1081,8 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
if (iob->need_ts)
__blk_mq_end_request_acct(rq, now);
+ blk_mq_finish_request(rq);
+
rq_qos_done(rq->q, rq);
/*
diff --git a/block/elevator.c b/block/elevator.c
index 8400e303fbcb..ac2cb3814eac 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -499,6 +499,9 @@ void elv_unregister_queue(struct request_queue *q)
int elv_register(struct elevator_type *e)
{
+ if (WARN_ON_ONCE(!e->ops.finish_request))
+ return -EINVAL;
+
/* insert_requests and dispatch_request are mandatory */
if (WARN_ON_ONCE(!e->ops.insert_requests || !e->ops.dispatch_request))
return -EINVAL;
--
2.41.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-13 15:23 [PATCH v2] blk-mq: release scheduler resource when request complete chengming.zhou
@ 2023-08-13 15:34 ` Jens Axboe
2023-08-13 15:45 ` Chengming Zhou
2023-08-14 21:42 ` Jens Axboe
2023-08-17 14:41 ` kernel test robot
2 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2023-08-13 15:34 UTC (permalink / raw)
To: chengming.zhou, hch, chuck.lever
Cc: bvanassche, cel, linux-block, linux-kernel, zhouchengming
On 8/13/23 9:23 AM, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Chuck reported [1] a IO hang problem on NFS exports that reside on SATA
> devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
> submission path for post-flush requests").
>
> We analysed the IO hang problem, found there are two postflush requests
> are waiting for each other.
>
> The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
> the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
> failed to blk_kick_flush() because of the second postflush request which
> is inflight waiting in scheduler queue.
>
> The second postflush waiting in scheduler queue can't be dispatched because
> the first postflush hasn't released scheduler resource even though it has
> completed by itself.
>
> Fix it by releasing scheduler resource when the first postflush request
> completed, so the second postflush can be dispatched and completed, then
> make blk_kick_flush() succeed.
Thanks, applied with a bit of commit message massaging and adding a
comment for the newly added WARN_ON_ONCE().
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-13 15:34 ` Jens Axboe
@ 2023-08-13 15:45 ` Chengming Zhou
0 siblings, 0 replies; 11+ messages in thread
From: Chengming Zhou @ 2023-08-13 15:45 UTC (permalink / raw)
To: Jens Axboe, hch, chuck.lever
Cc: bvanassche, cel, linux-block, linux-kernel, zhouchengming
On 2023/8/13 23:34, Jens Axboe wrote:
> On 8/13/23 9:23 AM, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Chuck reported [1] a IO hang problem on NFS exports that reside on SATA
>> devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
>> submission path for post-flush requests").
>>
>> We analysed the IO hang problem, found there are two postflush requests
>> are waiting for each other.
>>
>> The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
>> the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
>> failed to blk_kick_flush() because of the second postflush request which
>> is inflight waiting in scheduler queue.
>>
>> The second postflush waiting in scheduler queue can't be dispatched because
>> the first postflush hasn't released scheduler resource even though it has
>> completed by itself.
>>
>> Fix it by releasing scheduler resource when the first postflush request
>> completed, so the second postflush can be dispatched and completed, then
>> make blk_kick_flush() succeed.
>
> Thanks, applied with a bit of commit message massaging and adding a
> comment for the newly added WARN_ON_ONCE().
>
Ok, thanks!
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-13 15:23 [PATCH v2] blk-mq: release scheduler resource when request complete chengming.zhou
2023-08-13 15:34 ` Jens Axboe
@ 2023-08-14 21:42 ` Jens Axboe
2023-08-17 14:41 ` kernel test robot
2 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2023-08-14 21:42 UTC (permalink / raw)
To: hch, chuck.lever, chengming.zhou
Cc: bvanassche, cel, linux-block, linux-kernel, zhouchengming
On Sun, 13 Aug 2023 23:23:25 +0800, chengming.zhou@linux.dev wrote:
> Chuck reported [1] a IO hang problem on NFS exports that reside on SATA
> devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
> submission path for post-flush requests").
>
> We analysed the IO hang problem, found there are two postflush requests
> are waiting for each other.
>
> [...]
Applied, thanks!
[1/1] blk-mq: release scheduler resource when request complete
(no commit info)
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-13 15:23 [PATCH v2] blk-mq: release scheduler resource when request complete chengming.zhou
2023-08-13 15:34 ` Jens Axboe
2023-08-14 21:42 ` Jens Axboe
@ 2023-08-17 14:41 ` kernel test robot
2023-08-17 14:50 ` Bart Van Assche
2 siblings, 1 reply; 11+ messages in thread
From: kernel test robot @ 2023-08-17 14:41 UTC (permalink / raw)
To: chengming.zhou
Cc: oe-lkp, lkp, Chuck Lever, linux-block, axboe, hch, bvanassche,
cel, linux-kernel, zhouchengming, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_block/mq-deadline.c:#dd_exit_sched" on:
commit: 567b5a4f2f8b9e9fe97d5f8dca266d95c734ae91 ("[PATCH v2] blk-mq: release scheduler resource when request complete")
url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/blk-mq-release-scheduler-resource-when-request-complete/20230813-232513
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20230813152325.3017343-1-chengming.zhou@linux.dev/
patch subject: [PATCH v2] blk-mq: release scheduler resource when request complete
this also caused xfstests.generic.704.fail
in testcase: xfstests
version: xfstests-x86_64-bb8af9c-1_20230807
with following parameters:
disk: 4HDD
fs: ext4
test: generic-704
compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com
[ 222.617396][ T2216] ------------[ cut here ]------------
[ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
[ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.638218][ T2216] Modules linked in: scsi_debug(-) dm_mod btrfs blake2b_generic xor intel_rapl_msr raid6_pq intel_rapl_common zstd_compress libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sd_mod t10_pi kvm crc64_rocksoft_generic crc64_rocksoft crc64 irqbypass sg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 ipmi_devintf ipmi_msghandler mei_wdt i915 rapl drm_buddy intel_cstate intel_gtt wmi_bmof drm_display_helper intel_uncore drm_kms_helper ahci libahci mei_me libata intel_pch_thermal ttm mei video wmi intel_pmc_core acpi_pad drm fuse ip_tables
[ 222.690679][ T2216] CPU: 0 PID: 2216 Comm: modprobe Tainted: G I 6.5.0-rc5-00190-g567b5a4f2f8b #1
[ 222.700878][ T2216] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
[ 222.708919][ T2216] RIP: 0010:dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.714130][ T2216] Code: 89 da 8b 4b 04 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e c4 00 00 00 8b 13 44 89 e6 48 c7 c7 20 d3 e2 83 e8 e2 d1 08 ff <0f> 0b e9 f4 fe ff ff 0f 0b e9 78 fe ff ff 48 89 ee 48 c7 c7 20 cb
All code
========
0: 89 da mov %ebx,%edx
2: 8b 4b 04 mov 0x4(%rbx),%ecx
5: 48 c1 ea 03 shr $0x3,%rdx
9: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
d: 84 c0 test %al,%al
f: 74 08 je 0x19
11: 3c 03 cmp $0x3,%al
13: 0f 8e c4 00 00 00 jle 0xdd
19: 8b 13 mov (%rbx),%edx
1b: 44 89 e6 mov %r12d,%esi
1e: 48 c7 c7 20 d3 e2 83 mov $0xffffffff83e2d320,%rdi
25: e8 e2 d1 08 ff callq 0xffffffffff08d20c
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 f4 fe ff ff jmpq 0xffffffffffffff25
31: 0f 0b ud2
33: e9 78 fe ff ff jmpq 0xfffffffffffffeb0
38: 48 89 ee mov %rbp,%rsi
3b: 48 rex.W
3c: c7 .byte 0xc7
3d: c7 (bad)
3e: 20 cb and %cl,%bl
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 f4 fe ff ff jmpq 0xfffffffffffffefb
7: 0f 0b ud2
9: e9 78 fe ff ff jmpq 0xfffffffffffffe86
e: 48 89 ee mov %rbp,%rsi
11: 48 rex.W
12: c7 .byte 0xc7
13: c7 (bad)
14: 20 cb and %cl,%bl
[ 222.733468][ T2216] RSP: 0018:ffffc9000092f858 EFLAGS: 00010282
[ 222.739367][ T2216] RAX: 0000000000000000 RBX: ffff88819e33e0b0 RCX: ffffffff821fb87e
[ 222.747166][ T2216] RDX: 1ffff110fe1c57b8 RSI: 0000000000000008 RDI: ffffc9000092f610
[ 222.754975][ T2216] RBP: 0000000000000116 R08: 0000000000000001 R09: fffff52000125ec2
[ 222.762782][ T2216] R10: ffffc9000092f617 R11: 0000000000000001 R12: 0000000000000001
[ 222.770591][ T2216] R13: dffffc0000000000 R14: ffff88819e33e000 R15: ffff88819e33e0bc
[ 222.778386][ T2216] FS: 00007fd9aea5e540(0000) GS:ffff8887f0e00000(0000) knlGS:0000000000000000
[ 222.787144][ T2216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 222.793561][ T2216] CR2: 00007fbe00432bb0 CR3: 0000000126856005 CR4: 00000000003706f0
[ 222.801359][ T2216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 222.809156][ T2216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 222.816970][ T2216] Call Trace:
[ 222.820093][ T2216] <TASK>
[ 222.822872][ T2216] ? __warn (kernel/panic.c:673)
[ 222.826794][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.831403][ T2216] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 222.835752][ T2216] ? handle_bug (arch/x86/kernel/traps.c:324)
[ 222.839912][ T2216] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
[ 222.844433][ T2216] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 222.849286][ T2216] ? llist_add_batch (lib/llist.c:33 (discriminator 14))
[ 222.853980][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.858586][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.863193][ T2216] blk_mq_exit_sched (block/blk-mq-sched.c:550)
[ 222.867973][ T2216] ? blk_mq_sched_free_rqs (block/blk-mq-sched.c:527)
[ 222.873257][ T2216] ? __mutex_lock_slowpath (kernel/locking/mutex.c:282)
[ 222.878384][ T2216] ? ioc_clear_queue (include/linux/list.h:292 block/blk-ioc.c:174)
[ 222.883076][ T2216] ? blk_mq_poll (block/blk-mq.c:4832)
[ 222.887324][ T2216] elevator_exit (block/elevator.c:168)
[ 222.891572][ T2216] del_gendisk (block/genhd.c:701)
[ 222.895834][ T2216] ? diskstats_show (block/genhd.c:631)
[ 222.900526][ T2216] ? __pm_runtime_resume (drivers/base/power/runtime.c:1174)
[ 222.905568][ T2216] sd_remove (drivers/scsi/sd.c:3736) sd_mod
[ 222.910436][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293)
[ 222.916335][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581)
[ 222.921225][ T2216] device_del (drivers/base/core.c:3815)
[ 222.925388][ T2216] ? attribute_container_device_trigger (drivers/base/attribute_container.c:357)
[ 222.931807][ T2216] ? __device_link_del (drivers/base/core.c:3769)
[ 222.936760][ T2216] ? attribute_container_device_trigger_safe (drivers/base/attribute_container.c:357)
[ 222.943612][ T2216] __scsi_remove_device (drivers/scsi/scsi_sysfs.c:1489)
[ 222.948651][ T2216] scsi_forget_host (drivers/scsi/scsi_scan.c:1988)
[ 222.953243][ T2216] scsi_remove_host (drivers/scsi/hosts.c:182)
[ 222.957838][ T2216] sdebug_driver_remove (drivers/scsi/scsi_debug.c:7841) scsi_debug
[ 222.963933][ T2216] ? kernfs_remove_by_name_ns (fs/kernfs/dir.c:1679)
[ 222.969492][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293)
[ 222.975392][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581)
[ 222.980173][ T2216] device_del (drivers/base/core.c:3815)
[ 222.984349][ T2216] ? __device_link_del (drivers/base/core.c:3769)
[ 222.989299][ T2216] ? mutex_unlock (arch/x86/include/asm/atomic64_64.h:109 include/linux/atomic/atomic-arch-fallback.h:4303 include/linux/atomic/atomic-long.h:1499 include/linux/atomic/atomic-instrumented.h:4446 kernel/locking/mutex.c:181 kernel/locking/mutex.c:540)
[ 222.993645][ T2216] ? __mutex_unlock_slowpath+0x2b0/0x2b0
[ 223.000160][ T2216] device_unregister (drivers/base/core.c:3732 drivers/base/core.c:3845)
[ 223.004772][ T2216] sdebug_do_remove_host (drivers/scsi/scsi_debug.c:7299) scsi_debug
[ 223.011039][ T2216] scsi_debug_exit (drivers/scsi/scsi_debug.c:7670) scsi_debug
[ 223.016625][ T2216] __do_sys_delete_module+0x316/0x540
[ 223.022880][ T2216] ? module_flags (kernel/module/main.c:698)
[ 223.027401][ T2216] ? task_work_cancel (kernel/task_work.c:147)
[ 223.032081][ T2216] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:115 include/linux/atomic/atomic-arch-fallback.h:2155 include/linux/atomic/atomic-instrumented.h:1296 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 223.036590][ T2216] ? exit_to_user_mode_loop (include/linux/sched.h:2337 include/linux/resume_user_mode.h:61 kernel/entry/common.c:171)
[ 223.041890][ T2216] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 223.046154][ T2216] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 223.051867][ T2216] RIP: 0033:0x7fd9aeb83417
[ 223.056129][ T2216] Code: 73 01 c3 48 8b 0d 79 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 1a 0d 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d 79 1a 0d 00 mov 0xd1a79(%rip),%rcx # 0xd1a83
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
23: b8 b0 00 00 00 mov $0xb0,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a83
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a59
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230817/202308172100.8ce4b853-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 14:41 ` kernel test robot
@ 2023-08-17 14:50 ` Bart Van Assche
2023-08-17 15:29 ` Chengming Zhou
0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2023-08-17 14:50 UTC (permalink / raw)
To: chengming.zhou
Cc: oe-lkp, lkp, Chuck Lever, linux-block, axboe, hch, cel,
linux-kernel, zhouchengming, kernel test robot
On 8/17/23 07:41, kernel test robot wrote:
> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
The above information shows that dd_inserted_request() has been called
276 times and also that dd_finish_request() has been called 278 times.
Calling dd_finish_request() more than once per request breaks the code
for priority handling since that code checks how many requests are
pending per priority level by subtracting the number of completion calls
from the number of insertion calls (see also dd_queued()). I think the
above output indicates that this patch introduced a regression.
Bart.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 14:50 ` Bart Van Assche
@ 2023-08-17 15:29 ` Chengming Zhou
2023-08-17 17:17 ` Chengming Zhou
2023-08-17 17:20 ` Jens Axboe
0 siblings, 2 replies; 11+ messages in thread
From: Chengming Zhou @ 2023-08-17 15:29 UTC (permalink / raw)
To: Bart Van Assche, axboe, hch
Cc: oe-lkp, lkp, Chuck Lever, linux-block, cel, linux-kernel,
kernel test robot, chengming.zhou
On 2023/8/17 22:50, Bart Van Assche wrote:
> On 8/17/23 07:41, kernel test robot wrote:
>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>
> The above information shows that dd_inserted_request() has been called
> 276 times and also that dd_finish_request() has been called 278 times.
Thanks much for your help.
This patch indeed introduced a regression, postflush requests will be completed
twice, so here dd_finish_request() is more than dd_inserted_request().
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a8c63bef8ff1..7cd47ffc04ce 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
{
struct request_queue *q = rq->q;
- if (rq->rq_flags & RQF_USE_SCHED)
+ if (rq->rq_flags & RQF_USE_SCHED) {
q->elevator->type->ops.finish_request(rq);
+ rq->rq_flags &= ~RQF_USE_SCHED;
+ }
}
Clear RQF_USE_SCHED flag here should fix this problem, which should be ok
since finish_request() is the last callback, this flag isn't needed anymore.
Jens, should I send this diff as another patch or resend updated v3?
Thanks.
> Calling dd_finish_request() more than once per request breaks the code
> for priority handling since that code checks how many requests are
> pending per priority level by subtracting the number of completion calls
> from the number of insertion calls (see also dd_queued()). I think the
> above output indicates that this patch introduced a regression.
>
> Bart.
>
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 15:29 ` Chengming Zhou
@ 2023-08-17 17:17 ` Chengming Zhou
2023-08-17 17:26 ` Jens Axboe
2023-08-17 17:20 ` Jens Axboe
1 sibling, 1 reply; 11+ messages in thread
From: Chengming Zhou @ 2023-08-17 17:17 UTC (permalink / raw)
To: Bart Van Assche, axboe, hch
Cc: oe-lkp, lkp, Chuck Lever, linux-block, cel, linux-kernel,
kernel test robot
On 2023/8/17 23:29, Chengming Zhou wrote:
> On 2023/8/17 22:50, Bart Van Assche wrote:
>> On 8/17/23 07:41, kernel test robot wrote:
>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>
>> The above information shows that dd_inserted_request() has been called
>> 276 times and also that dd_finish_request() has been called 278 times.
>
> Thanks much for your help.
>
> This patch indeed introduced a regression, postflush requests will be completed
> twice, so here dd_finish_request() is more than dd_inserted_request().
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a8c63bef8ff1..7cd47ffc04ce 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
>
> - if (rq->rq_flags & RQF_USE_SCHED)
> + if (rq->rq_flags & RQF_USE_SCHED) {
> q->elevator->type->ops.finish_request(rq);
> + rq->rq_flags &= ~RQF_USE_SCHED;
> + }
> }
>
I just tried to run LKP and xfstests, firstly failed to run LKP on my server
which seems to miss some dependencies. Then I ran xfstests successfully.
But xfstests generic/704 always pass and no WARN in dmesg. (I don't know why,
maybe my server settings are some different from the test robot.)
So I try to reproduce it manually. Steps:
```
echo mq-deadline > /sys/block/sdb/queue/scheduler
mkfs.ext4 /dev/sdb
mount /dev/sdb /fs/sdb
cd /fs/sdb
stress-ng --symlink 4 --timeout 60
echo none > /sys/block/sdb/queue/scheduler
```
This way the WARNING in mq-deadline can be reproduced easily.
Then retest with the diff, mq-deadline WARNING still happened... So there
are still other requests which have RQF_USE_SCHED flag completed without
being inserted into elevator.
Will use some tracing and look again.
Thanks.
>
> Clear RQF_USE_SCHED flag here should fix this problem, which should be ok
> since finish_request() is the last callback, this flag isn't needed anymore.
>
> Jens, should I send this diff as another patch or resend updated v3?
>
> Thanks.
>
>> Calling dd_finish_request() more than once per request breaks the code
>> for priority handling since that code checks how many requests are
>> pending per priority level by subtracting the number of completion calls
>> from the number of insertion calls (see also dd_queued()). I think the
>> above output indicates that this patch introduced a regression.
>>
>> Bart.
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 15:29 ` Chengming Zhou
2023-08-17 17:17 ` Chengming Zhou
@ 2023-08-17 17:20 ` Jens Axboe
2023-08-17 17:24 ` Jens Axboe
1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2023-08-17 17:20 UTC (permalink / raw)
To: Chengming Zhou, Bart Van Assche, hch
Cc: oe-lkp, lkp, Chuck Lever, linux-block, cel, linux-kernel,
kernel test robot
On 8/17/23 9:29 AM, Chengming Zhou wrote:
> On 2023/8/17 22:50, Bart Van Assche wrote:
>> On 8/17/23 07:41, kernel test robot wrote:
>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>
>> The above information shows that dd_inserted_request() has been called
>> 276 times and also that dd_finish_request() has been called 278 times.
>
> Thanks much for your help.
>
> This patch indeed introduced a regression, postflush requests will be completed
> twice, so here dd_finish_request() is more than dd_inserted_request().
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a8c63bef8ff1..7cd47ffc04ce 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
>
> - if (rq->rq_flags & RQF_USE_SCHED)
> + if (rq->rq_flags & RQF_USE_SCHED) {
> q->elevator->type->ops.finish_request(rq);
> + rq->rq_flags &= ~RQF_USE_SCHED;
> + }
> }
>
>
> Clear RQF_USE_SCHED flag here should fix this problem, which should be ok
> since finish_request() is the last callback, this flag isn't needed anymore.
>
> Jens, should I send this diff as another patch or resend updated v3?
I don't think this is the right solution, it makes all kinds of
assumptions on what that flag is and when it's safe to clear it. It's a
very fragile fix, I think we need to do better than that.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 17:20 ` Jens Axboe
@ 2023-08-17 17:24 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2023-08-17 17:24 UTC (permalink / raw)
To: Chengming Zhou, Bart Van Assche, hch
Cc: oe-lkp, lkp, Chuck Lever, linux-block, cel, linux-kernel,
kernel test robot
On 8/17/23 11:20 AM, Jens Axboe wrote:
> On 8/17/23 9:29 AM, Chengming Zhou wrote:
>> On 2023/8/17 22:50, Bart Van Assche wrote:
>>> On 8/17/23 07:41, kernel test robot wrote:
>>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>>
>>> The above information shows that dd_inserted_request() has been called
>>> 276 times and also that dd_finish_request() has been called 278 times.
>>
>> Thanks much for your help.
>>
>> This patch indeed introduced a regression, postflush requests will be completed
>> twice, so here dd_finish_request() is more than dd_inserted_request().
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a8c63bef8ff1..7cd47ffc04ce 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
>> {
>> struct request_queue *q = rq->q;
>>
>> - if (rq->rq_flags & RQF_USE_SCHED)
>> + if (rq->rq_flags & RQF_USE_SCHED) {
>> q->elevator->type->ops.finish_request(rq);
>> + rq->rq_flags &= ~RQF_USE_SCHED;
>> + }
>> }
>>
>>
>> Clear RQF_USE_SCHED flag here should fix this problem, which should be ok
>> since finish_request() is the last callback, this flag isn't needed anymore.
>>
>> Jens, should I send this diff as another patch or resend updated v3?
>
> I don't think this is the right solution, it makes all kinds of
> assumptions on what that flag is and when it's safe to clear it. It's a
> very fragile fix, I think we need to do better than that.
Well maybe this is actually fine, since we're freeing the request now
anyway. I can fold it in the fix, I'll add a comment as well. If this is
subtle enough that it caused this issue, we definitely should have a
comment on why we're clearing this flag.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] blk-mq: release scheduler resource when request complete
2023-08-17 17:17 ` Chengming Zhou
@ 2023-08-17 17:26 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2023-08-17 17:26 UTC (permalink / raw)
To: Chengming Zhou, Bart Van Assche, hch
Cc: oe-lkp, lkp, Chuck Lever, linux-block, cel, linux-kernel,
kernel test robot
On 8/17/23 11:17 AM, Chengming Zhou wrote:
> On 2023/8/17 23:29, Chengming Zhou wrote:
>> On 2023/8/17 22:50, Bart Van Assche wrote:
>>> On 8/17/23 07:41, kernel test robot wrote:
>>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>>
>>> The above information shows that dd_inserted_request() has been called
>>> 276 times and also that dd_finish_request() has been called 278 times.
>>
>> Thanks much for your help.
>>
>> This patch indeed introduced a regression, postflush requests will be completed
>> twice, so here dd_finish_request() is more than dd_inserted_request().
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a8c63bef8ff1..7cd47ffc04ce 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
>> {
>> struct request_queue *q = rq->q;
>>
>> - if (rq->rq_flags & RQF_USE_SCHED)
>> + if (rq->rq_flags & RQF_USE_SCHED) {
>> q->elevator->type->ops.finish_request(rq);
>> + rq->rq_flags &= ~RQF_USE_SCHED;
>> + }
>> }
>>
>
> I just tried to run LKP and xfstests, firstly failed to run LKP on my server
> which seems to miss some dependencies. Then I ran xfstests successfully.
>
> But xfstests generic/704 always pass and no WARN in dmesg. (I don't know why,
> maybe my server settings are some different from the test robot.)
>
> So I try to reproduce it manually. Steps:
>
> ```
> echo mq-deadline > /sys/block/sdb/queue/scheduler
>
> mkfs.ext4 /dev/sdb
> mount /dev/sdb /fs/sdb
> cd /fs/sdb
> stress-ng --symlink 4 --timeout 60
>
> echo none > /sys/block/sdb/queue/scheduler
> ```
>
> This way the WARNING in mq-deadline can be reproduced easily.
>
> Then retest with the diff, mq-deadline WARNING still happened... So there
> are still other requests which have RQF_USE_SCHED flag completed without
> being inserted into elevator.
>
> Will use some tracing and look again.
Ah missed this, thanks for doing this testing. I'll wait for an update
version. We can just fold in whatever change we need, and most likely
just push the patch to next week rather than send off a pull request for
this week. It's the only important thing pending on the block side.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-08-17 17:27 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-13 15:23 [PATCH v2] blk-mq: release scheduler resource when request complete chengming.zhou
2023-08-13 15:34 ` Jens Axboe
2023-08-13 15:45 ` Chengming Zhou
2023-08-14 21:42 ` Jens Axboe
2023-08-17 14:41 ` kernel test robot
2023-08-17 14:50 ` Bart Van Assche
2023-08-17 15:29 ` Chengming Zhou
2023-08-17 17:17 ` Chengming Zhou
2023-08-17 17:26 ` Jens Axboe
2023-08-17 17:20 ` Jens Axboe
2023-08-17 17:24 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).