* [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 @ 2024-04-23 13:17 Yi Zhang 2024-04-26 8:30 ` [bug report][bisected] " Yi Zhang 2024-04-29 14:35 ` [bug report] " Johannes Thumshirn 0 siblings, 2 replies; 11+ messages in thread From: Yi Zhang @ 2024-04-23 13:17 UTC (permalink / raw) To: linux-block, open list:NVM EXPRESS DRIVER Hi I found this issue on the latest linux-block/for-next by blktests nvme/tcp nvme/012, please help check it and let me know if you need any info/testing for it, thanks. [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 [ 1873.761900] loop0: detected capacity change from 0 to 2097152 [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) [ 1874.208883] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 1874.243423] nvme nvme0: creating 48 I/O queues. [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 [ 1875.649484] XFS (nvme0n1): Mounting V5 Filesystem b5a2998b-e06a-40d0-a291-a46b67aa43db [ 1875.770202] XFS (nvme0n1): Ending clean mount [ 1936.174561] nvme nvme0: I/O tag 49 (a031) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.182573] nvme nvme0: starting error recovery [ 1936.187622] nvme nvme0: I/O tag 50 (5032) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.196829] nvme nvme0: I/O tag 51 (b033) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.205619] block nvme0n1: no usable path - requeuing I/O [ 1936.211228] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.218975] block nvme0n1: no usable path - requeuing I/O [ 1936.224691] nvme nvme0: I/O tag 53 (3035) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.232395] block nvme0n1: no usable path - requeuing I/O [ 1936.237916] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.245731] block nvme0n1: no usable path - requeuing I/O [ 1936.251165] block nvme0n1: no usable path - requeuing I/O [ 1936.256967] nvme nvme0: I/O tag 55 (8037) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.264676] block nvme0n1: no usable path - requeuing I/O [ 1936.270159] nvme nvme0: I/O tag 56 (1038) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.277848] block nvme0n1: no usable path - requeuing I/O [ 1936.283294] block nvme0n1: no usable path - requeuing I/O [ 1936.288743] nvme nvme0: I/O tag 57 (b039) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.296450] block nvme0n1: no usable path - requeuing I/O [ 1936.301920] nvme nvme0: I/O tag 58 (503a) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.309621] block nvme0n1: no usable path - requeuing I/O [ 1936.315076] nvme nvme0: I/O tag 59 (a03b) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.323271] nvme nvme0: I/O tag 60 (803c) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.331018] nvme nvme0: I/O tag 61 (203d) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.338769] nvme nvme0: I/O tag 62 (503e) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.346515] nvme nvme0: I/O tag 63 (c03f) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.354693] nvme nvme0: I/O tag 64 (a040) type 4 opcode 0x1 (I/O Cmd) QID 2 timeout [ 1936.362479] nvme nvme0: I/O tag 49 (0031) type 4 opcode 0x0 (I/O Cmd) QID 8 timeout [ 1936.370819] nvme 
nvme0: I/O tag 50 (0032) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.378679] nvme nvme0: I/O tag 51 (0033) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.386464] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.394235] nvme nvme0: I/O tag 53 (0035) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.402062] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.409847] nvme nvme0: I/O tag 55 (0037) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.417773] nvme nvme0: I/O tag 56 (0038) type 4 opcode 0x1 (I/O Cmd) QID 8 timeout [ 1936.449539] general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN PTI [ 1936.450749] nvme nvme0: Reconnecting in 10 seconds... [ 1936.460764] KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] [ 1936.460775] CPU: 15 PID: 848 Comm: kworker/15:1H Not tainted 6.9.0-rc4.v1+ #2 [ 1936.480528] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.20.1 09/13/2023 [ 1936.488094] Workqueue: kblockd blk_mq_run_work_fn [ 1936.492806] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 [ 1936.498216] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 b8 00 [ 1936.516960] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 [ 1936.522185] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a [ 1936.529318] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 [ 1936.536449] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 [ 1936.543582] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 [ 1936.550714] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 [ 1936.557847] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) knlGS:0000000000000000 [ 1936.565930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1936.571675] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 [ 1936.578809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1936.585942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1936.593072] PKRU: 55555554 [ 1936.595784] Call Trace: [ 1936.598237] <TASK> [ 1936.600344] ? __die_body.cold+0x19/0x25 [ 1936.604279] ? die_addr+0x46/0x70 [ 1936.607606] ? exc_general_protection+0x14f/0x250 [ 1936.612325] ? asm_exc_general_protection+0x26/0x30 [ 1936.617212] ? blk_flush_complete_seq+0x450/0x1060 [ 1936.622012] ? __pfx_blk_flush_complete_seq+0x10/0x10 [ 1936.627072] mq_flush_data_end_io+0x279/0x500 [ 1936.631430] ? __pfx_mq_flush_data_end_io+0x10/0x10 [ 1936.636308] blk_mq_end_request+0x1b8/0x690 [ 1936.640493] ? _raw_spin_unlock_irqrestore+0x45/0x80 [ 1936.645460] nvme_failover_req+0x37d/0x4e0 [nvme_core] [ 1936.650634] nvme_fail_nonready_command+0x130/0x180 [nvme_core] [ 1936.656578] blk_mq_dispatch_rq_list+0x3b3/0x2100 [ 1936.661292] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 1936.666436] __blk_mq_sched_dispatch_requests+0x554/0x1510 [ 1936.671925] ? __pfx___lock_acquire+0x10/0x10 [ 1936.676285] ? __pfx_lock_acquired+0x10/0x10 [ 1936.680558] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 1936.686477] ? __pfx_lock_acquire+0x10/0x10 [ 1936.690668] blk_mq_sched_dispatch_requests+0xa6/0xf0 [ 1936.695721] blk_mq_run_work_fn+0x122/0x2c0 [ 1936.699908] ? blk_mq_run_work_fn+0x114/0x2c0 [ 1936.704266] process_one_work+0x85d/0x13f0 [ 1936.708371] ? worker_thread+0xcc/0x1130 [ 1936.712301] ? 
__pfx_process_one_work+0x10/0x10 [ 1936.716835] ? assign_work+0x16c/0x240 [ 1936.720596] worker_thread+0x6da/0x1130 [ 1936.724438] ? __pfx_worker_thread+0x10/0x10 [ 1936.728712] kthread+0x2ed/0x3c0 [ 1936.731944] ? _raw_spin_unlock_irq+0x28/0x60 [ 1936.736303] ? __pfx_kthread+0x10/0x10 [ 1936.740058] ret_from_fork+0x31/0x70 [ 1936.743636] ? __pfx_kthread+0x10/0x10 [ 1936.747387] ret_from_fork_asm+0x1a/0x30 [ 1936.751319] </TASK> [ 1936.753516] Modules linked in: nvmet_tcp nvmet nvme_tcp nvme_fabrics nvme_keyring nvme_core nvme_auth intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm spi_nor iTCO_wdt intel_pmc_bxt cdc_ether mtd usbnet iTCO_vendor_support dell_smbios rapl intel_cstate rfkill sunrpc dcdbas dell_wmi_descriptor wmi_bmof intel_uncore pcspkr bnxt_en mii tg3 i2c_i801 spi_intel_pci i2c_smbus spi_intel mei_me ipmi_ssif mei lpc_ich intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler nd_pmem nd_btt dax_pmem acpi_power_meter fuse loop nfnetlink zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 mgag200 megaraid_sas i2c_algo_bit wmi nfit libnvdimm [last unloaded: nvmet] [ 1936.829942] ---[ end trace 0000000000000000 ]--- [ 1936.880774] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 [ 1936.886183] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 b8 00 [ 1936.904925] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 [ 1936.910150] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a [ 1936.917285] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 [ 1936.924416] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 [ 1936.931548] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 [ 1936.938680] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 [ 1936.945811] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) knlGS:0000000000000000 [ 1936.953899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1936.959642] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 [ 1936.966776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1936.973906] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1936.981039] PKRU: 55555554 [ 1936.983752] note: kworker/15:1H[848] exited with irqs disabled [ 1936.989617] note: kworker/15:1H[848] exited with preempt_count 1 [ 1946.926323] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 1946.944673] nvme nvme0: creating 48 I/O queues. (gdb) l *(blk_flush_complete_seq+0x450) 0xffffffff826c5370 is in blk_flush_complete_seq (block/blk-flush.c:133). 128 * After flush data completion, @rq->bio is %NULL but we need to 129 * complete the bio again. @rq->biotail is guaranteed to equal the 130 * original @rq->bio. Restore it. 131 */ 132 rq->bio = rq->biotail; 133 rq->__sector = rq->bio->bi_iter.bi_sector; 134 135 /* make @rq a normal request */ 136 rq->rq_flags &= ~RQF_FLUSH_SEQ; 137 rq->end_io = rq->flush.saved_end_io; -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 11+ messages in thread
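A side note on decoding the oops above: KASAN reports a null-ptr-deref in the range 0x0000000000000020-0x0000000000000027, which is consistent with an 8-byte read of bi_iter.bi_sector through a NULL struct bio pointer (the load at block/blk-flush.c:133 in the gdb listing). The "non-canonical" address in the GPF line is simply the KASAN shadow of that range. A minimal sketch of the arithmetic, assuming the generic x86-64 KASAN layout (shadow offset 0xdffffc0000000000, which is also visible in RAX in the register dump):
--
/* Sketch only: reproduce the shadow-address arithmetic from the oops.
 * Assumes KASAN_SHADOW_OFFSET = 0xdffffc0000000000 and the usual 8:1
 * memory-to-shadow scaling used by generic KASAN on x86-64. */
#include <stdio.h>

int main(void)
{
	unsigned long shadow_offset = 0xdffffc0000000000UL;
	unsigned long access = 0x20;	/* NULL struct bio * plus the offset of bi_iter.bi_sector */
	unsigned long shadow = (access >> 3) + shadow_offset;

	printf("%#lx\n", shadow);	/* prints 0xdffffc0000000004, the GPF address above */
	return 0;
}
--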
* Re: [bug report][bisected] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang @ 2024-04-26 8:30 ` Yi Zhang 2024-04-29 14:35 ` [bug report] " Johannes Thumshirn 1 sibling, 0 replies; 11+ messages in thread From: Yi Zhang @ 2024-04-26 8:30 UTC (permalink / raw) To: linux-block, open list:NVM EXPRESS DRIVER Cc: Damien Le Moal, Jens Axboe, Christoph Hellwig, Bart Van Assche, Hannes Reinecke Hi Damien Bisect shows it was introduced with this patch, could you help check it commit 6f8fd758de63bab513c551bb1796a14f8cdc40d9 Author: Damien Le Moal <dlemoal@kernel.org> Date: Mon Apr 8 10:41:01 2024 +0900 block: Restore sector of flush requests On Tue, Apr 23, 2024 at 9:17 PM Yi Zhang <yi.zhang@redhat.com> wrote: > > Hi > I found this issue on the latest linux-block/for-next by blktests > nvme/tcp nvme/012, please help check it and let me know if you need > any info/testing for it, thanks. > > [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 > [ 1873.761900] loop0: detected capacity change from 0 to 2097152 > [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) > [ 1874.208883] nvmet: creating nvm controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > [ 1874.243423] nvme nvme0: creating 48 I/O queues. > [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. > [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr > 127.0.0.1:4420, hostnqn: > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 > [ 1875.649484] XFS (nvme0n1): Mounting V5 Filesystem > b5a2998b-e06a-40d0-a291-a46b67aa43db > [ 1875.770202] XFS (nvme0n1): Ending clean mount > [ 1936.174561] nvme nvme0: I/O tag 49 (a031) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.182573] nvme nvme0: starting error recovery > [ 1936.187622] nvme nvme0: I/O tag 50 (5032) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.196829] nvme nvme0: I/O tag 51 (b033) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.205619] block nvme0n1: no usable path - requeuing I/O > [ 1936.211228] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.218975] block nvme0n1: no usable path - requeuing I/O > [ 1936.224691] nvme nvme0: I/O tag 53 (3035) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.232395] block nvme0n1: no usable path - requeuing I/O > [ 1936.237916] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.245731] block nvme0n1: no usable path - requeuing I/O > [ 1936.251165] block nvme0n1: no usable path - requeuing I/O > [ 1936.256967] nvme nvme0: I/O tag 55 (8037) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.264676] block nvme0n1: no usable path - requeuing I/O > [ 1936.270159] nvme nvme0: I/O tag 56 (1038) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.277848] block nvme0n1: no usable path - requeuing I/O > [ 1936.283294] block nvme0n1: no usable path - requeuing I/O > [ 1936.288743] nvme nvme0: I/O tag 57 (b039) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.296450] block nvme0n1: no usable path - requeuing I/O > [ 1936.301920] nvme nvme0: I/O tag 58 (503a) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.309621] block nvme0n1: no usable path - requeuing I/O > [ 1936.315076] nvme nvme0: I/O 
tag 59 (a03b) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.323271] nvme nvme0: I/O tag 60 (803c) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.331018] nvme nvme0: I/O tag 61 (203d) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.338769] nvme nvme0: I/O tag 62 (503e) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.346515] nvme nvme0: I/O tag 63 (c03f) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.354693] nvme nvme0: I/O tag 64 (a040) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.362479] nvme nvme0: I/O tag 49 (0031) type 4 opcode 0x0 (I/O > Cmd) QID 8 timeout > [ 1936.370819] nvme nvme0: I/O tag 50 (0032) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.378679] nvme nvme0: I/O tag 51 (0033) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.386464] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.394235] nvme nvme0: I/O tag 53 (0035) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.402062] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.409847] nvme nvme0: I/O tag 55 (0037) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.417773] nvme nvme0: I/O tag 56 (0038) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.449539] general protection fault, probably for non-canonical > address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN PTI > [ 1936.450749] nvme nvme0: Reconnecting in 10 seconds... > [ 1936.460764] KASAN: null-ptr-deref in range > [0x0000000000000020-0x0000000000000027] > [ 1936.460775] CPU: 15 PID: 848 Comm: kworker/15:1H Not tainted 6.9.0-rc4.v1+ #2 > [ 1936.480528] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS > 2.20.1 09/13/2023 > [ 1936.488094] Workqueue: kblockd blk_mq_run_work_fn > [ 1936.492806] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 > [ 1936.498216] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 > 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 > c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 > b8 00 > [ 1936.516960] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 > [ 1936.522185] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a > [ 1936.529318] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 > [ 1936.536449] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 > [ 1936.543582] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 > [ 1936.550714] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 > [ 1936.557847] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) > knlGS:0000000000000000 > [ 1936.565930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1936.571675] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 > [ 1936.578809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1936.585942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1936.593072] PKRU: 55555554 > [ 1936.595784] Call Trace: > [ 1936.598237] <TASK> > [ 1936.600344] ? __die_body.cold+0x19/0x25 > [ 1936.604279] ? die_addr+0x46/0x70 > [ 1936.607606] ? exc_general_protection+0x14f/0x250 > [ 1936.612325] ? asm_exc_general_protection+0x26/0x30 > [ 1936.617212] ? blk_flush_complete_seq+0x450/0x1060 > [ 1936.622012] ? __pfx_blk_flush_complete_seq+0x10/0x10 > [ 1936.627072] mq_flush_data_end_io+0x279/0x500 > [ 1936.631430] ? __pfx_mq_flush_data_end_io+0x10/0x10 > [ 1936.636308] blk_mq_end_request+0x1b8/0x690 > [ 1936.640493] ? 
_raw_spin_unlock_irqrestore+0x45/0x80 > [ 1936.645460] nvme_failover_req+0x37d/0x4e0 [nvme_core] > [ 1936.650634] nvme_fail_nonready_command+0x130/0x180 [nvme_core] > [ 1936.656578] blk_mq_dispatch_rq_list+0x3b3/0x2100 > [ 1936.661292] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 > [ 1936.666436] __blk_mq_sched_dispatch_requests+0x554/0x1510 > [ 1936.671925] ? __pfx___lock_acquire+0x10/0x10 > [ 1936.676285] ? __pfx_lock_acquired+0x10/0x10 > [ 1936.680558] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 > [ 1936.686477] ? __pfx_lock_acquire+0x10/0x10 > [ 1936.690668] blk_mq_sched_dispatch_requests+0xa6/0xf0 > [ 1936.695721] blk_mq_run_work_fn+0x122/0x2c0 > [ 1936.699908] ? blk_mq_run_work_fn+0x114/0x2c0 > [ 1936.704266] process_one_work+0x85d/0x13f0 > [ 1936.708371] ? worker_thread+0xcc/0x1130 > [ 1936.712301] ? __pfx_process_one_work+0x10/0x10 > [ 1936.716835] ? assign_work+0x16c/0x240 > [ 1936.720596] worker_thread+0x6da/0x1130 > [ 1936.724438] ? __pfx_worker_thread+0x10/0x10 > [ 1936.728712] kthread+0x2ed/0x3c0 > [ 1936.731944] ? _raw_spin_unlock_irq+0x28/0x60 > [ 1936.736303] ? __pfx_kthread+0x10/0x10 > [ 1936.740058] ret_from_fork+0x31/0x70 > [ 1936.743636] ? __pfx_kthread+0x10/0x10 > [ 1936.747387] ret_from_fork_asm+0x1a/0x30 > [ 1936.751319] </TASK> > [ 1936.753516] Modules linked in: nvmet_tcp nvmet nvme_tcp > nvme_fabrics nvme_keyring nvme_core nvme_auth intel_rapl_msr > intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common > isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp > kvm_intel kvm spi_nor iTCO_wdt intel_pmc_bxt cdc_ether mtd usbnet > iTCO_vendor_support dell_smbios rapl intel_cstate rfkill sunrpc dcdbas > dell_wmi_descriptor wmi_bmof intel_uncore pcspkr bnxt_en mii tg3 > i2c_i801 spi_intel_pci i2c_smbus spi_intel mei_me ipmi_ssif mei > lpc_ich intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf > ipmi_msghandler nd_pmem nd_btt dax_pmem acpi_power_meter fuse loop > nfnetlink zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel > polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 > sha256_ssse3 sha1_ssse3 mgag200 megaraid_sas i2c_algo_bit wmi nfit > libnvdimm [last unloaded: nvmet] > [ 1936.829942] ---[ end trace 0000000000000000 ]--- > [ 1936.880774] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 > [ 1936.886183] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 > 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 > c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 > b8 00 > [ 1936.904925] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 > [ 1936.910150] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a > [ 1936.917285] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 > [ 1936.924416] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 > [ 1936.931548] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 > [ 1936.938680] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 > [ 1936.945811] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) > knlGS:0000000000000000 > [ 1936.953899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1936.959642] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 > [ 1936.966776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1936.973906] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1936.981039] PKRU: 55555554 > [ 1936.983752] note: kworker/15:1H[848] exited with irqs disabled > [ 1936.989617] note: kworker/15:1H[848] 
exited with preempt_count 1 > [ 1946.926323] nvmet: creating nvm controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > [ 1946.944673] nvme nvme0: creating 48 I/O queues. > > (gdb) l *(blk_flush_complete_seq+0x450) > 0xffffffff826c5370 is in blk_flush_complete_seq (block/blk-flush.c:133). > 128 * After flush data completion, @rq->bio is %NULL but we need to > 129 * complete the bio again. @rq->biotail is guaranteed to equal the > 130 * original @rq->bio. Restore it. > 131 */ > 132 rq->bio = rq->biotail; > 133 rq->__sector = rq->bio->bi_iter.bi_sector; > 134 > 135 /* make @rq a normal request */ > 136 rq->rq_flags &= ~RQF_FLUSH_SEQ; > 137 rq->end_io = rq->flush.saved_end_io; > > > > -- > Best Regards, > Yi Zhang -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang 2024-04-26 8:30 ` [bug report][bisected] " Yi Zhang @ 2024-04-29 14:35 ` Johannes Thumshirn 2024-04-29 22:18 ` Chaitanya Kulkarni 1 sibling, 1 reply; 11+ messages in thread From: Johannes Thumshirn @ 2024-04-29 14:35 UTC (permalink / raw) To: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER On 23.04.24 15:18, Yi Zhang wrote: > Hi > I found this issue on the latest linux-block/for-next by blktests > nvme/tcp nvme/012, please help check it and let me know if you need > any info/testing for it, thanks. > > [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 > [ 1873.761900] loop0: detected capacity change from 0 to 2097152 > [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) > [ 1874.208883] nvmet: creating nvm controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > [ 1874.243423] nvme nvme0: creating 48 I/O queues. > [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. > [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr > 127.0.0.1:4420, hostnqn: > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 > [ 1875.649484] XFS (nvme0n1): Mounting V5 Filesystem > b5a2998b-e06a-40d0-a291-a46b67aa43db > [ 1875.770202] XFS (nvme0n1): Ending clean mount > [ 1936.174561] nvme nvme0: I/O tag 49 (a031) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.182573] nvme nvme0: starting error recovery > [ 1936.187622] nvme nvme0: I/O tag 50 (5032) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.196829] nvme nvme0: I/O tag 51 (b033) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.205619] block nvme0n1: no usable path - requeuing I/O > [ 1936.211228] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.218975] block nvme0n1: no usable path - requeuing I/O > [ 1936.224691] nvme nvme0: I/O tag 53 (3035) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.232395] block nvme0n1: no usable path - requeuing I/O > [ 1936.237916] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.245731] block nvme0n1: no usable path - requeuing I/O > [ 1936.251165] block nvme0n1: no usable path - requeuing I/O > [ 1936.256967] nvme nvme0: I/O tag 55 (8037) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.264676] block nvme0n1: no usable path - requeuing I/O > [ 1936.270159] nvme nvme0: I/O tag 56 (1038) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.277848] block nvme0n1: no usable path - requeuing I/O > [ 1936.283294] block nvme0n1: no usable path - requeuing I/O > [ 1936.288743] nvme nvme0: I/O tag 57 (b039) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.296450] block nvme0n1: no usable path - requeuing I/O > [ 1936.301920] nvme nvme0: I/O tag 58 (503a) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.309621] block nvme0n1: no usable path - requeuing I/O > [ 1936.315076] nvme nvme0: I/O tag 59 (a03b) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.323271] nvme nvme0: I/O tag 60 (803c) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.331018] nvme nvme0: I/O tag 61 (203d) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.338769] nvme nvme0: I/O tag 62 (503e) type 4 opcode 0x1 (I/O > Cmd) 
QID 2 timeout > [ 1936.346515] nvme nvme0: I/O tag 63 (c03f) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.354693] nvme nvme0: I/O tag 64 (a040) type 4 opcode 0x1 (I/O > Cmd) QID 2 timeout > [ 1936.362479] nvme nvme0: I/O tag 49 (0031) type 4 opcode 0x0 (I/O > Cmd) QID 8 timeout > [ 1936.370819] nvme nvme0: I/O tag 50 (0032) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.378679] nvme nvme0: I/O tag 51 (0033) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.386464] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.394235] nvme nvme0: I/O tag 53 (0035) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.402062] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.409847] nvme nvme0: I/O tag 55 (0037) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.417773] nvme nvme0: I/O tag 56 (0038) type 4 opcode 0x1 (I/O > Cmd) QID 8 timeout > [ 1936.449539] general protection fault, probably for non-canonical > address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN PTI > [ 1936.450749] nvme nvme0: Reconnecting in 10 seconds... > [ 1936.460764] KASAN: null-ptr-deref in range > [0x0000000000000020-0x0000000000000027] > [ 1936.460775] CPU: 15 PID: 848 Comm: kworker/15:1H Not tainted 6.9.0-rc4.v1+ #2 > [ 1936.480528] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS > 2.20.1 09/13/2023 > [ 1936.488094] Workqueue: kblockd blk_mq_run_work_fn > [ 1936.492806] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 > [ 1936.498216] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 > 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 > c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 > b8 00 > [ 1936.516960] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 > [ 1936.522185] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a > [ 1936.529318] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 > [ 1936.536449] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 > [ 1936.543582] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 > [ 1936.550714] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 > [ 1936.557847] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) > knlGS:0000000000000000 > [ 1936.565930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1936.571675] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 > [ 1936.578809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1936.585942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1936.593072] PKRU: 55555554 > [ 1936.595784] Call Trace: > [ 1936.598237] <TASK> > [ 1936.600344] ? __die_body.cold+0x19/0x25 > [ 1936.604279] ? die_addr+0x46/0x70 > [ 1936.607606] ? exc_general_protection+0x14f/0x250 > [ 1936.612325] ? asm_exc_general_protection+0x26/0x30 > [ 1936.617212] ? blk_flush_complete_seq+0x450/0x1060 > [ 1936.622012] ? __pfx_blk_flush_complete_seq+0x10/0x10 > [ 1936.627072] mq_flush_data_end_io+0x279/0x500 > [ 1936.631430] ? __pfx_mq_flush_data_end_io+0x10/0x10 > [ 1936.636308] blk_mq_end_request+0x1b8/0x690 > [ 1936.640493] ? _raw_spin_unlock_irqrestore+0x45/0x80 > [ 1936.645460] nvme_failover_req+0x37d/0x4e0 [nvme_core] > [ 1936.650634] nvme_fail_nonready_command+0x130/0x180 [nvme_core] > [ 1936.656578] blk_mq_dispatch_rq_list+0x3b3/0x2100 > [ 1936.661292] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 > [ 1936.666436] __blk_mq_sched_dispatch_requests+0x554/0x1510 > [ 1936.671925] ? 
__pfx___lock_acquire+0x10/0x10 > [ 1936.676285] ? __pfx_lock_acquired+0x10/0x10 > [ 1936.680558] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 > [ 1936.686477] ? __pfx_lock_acquire+0x10/0x10 > [ 1936.690668] blk_mq_sched_dispatch_requests+0xa6/0xf0 > [ 1936.695721] blk_mq_run_work_fn+0x122/0x2c0 > [ 1936.699908] ? blk_mq_run_work_fn+0x114/0x2c0 > [ 1936.704266] process_one_work+0x85d/0x13f0 > [ 1936.708371] ? worker_thread+0xcc/0x1130 > [ 1936.712301] ? __pfx_process_one_work+0x10/0x10 > [ 1936.716835] ? assign_work+0x16c/0x240 > [ 1936.720596] worker_thread+0x6da/0x1130 > [ 1936.724438] ? __pfx_worker_thread+0x10/0x10 > [ 1936.728712] kthread+0x2ed/0x3c0 > [ 1936.731944] ? _raw_spin_unlock_irq+0x28/0x60 > [ 1936.736303] ? __pfx_kthread+0x10/0x10 > [ 1936.740058] ret_from_fork+0x31/0x70 > [ 1936.743636] ? __pfx_kthread+0x10/0x10 > [ 1936.747387] ret_from_fork_asm+0x1a/0x30 > [ 1936.751319] </TASK> > [ 1936.753516] Modules linked in: nvmet_tcp nvmet nvme_tcp > nvme_fabrics nvme_keyring nvme_core nvme_auth intel_rapl_msr > intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common > isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp > kvm_intel kvm spi_nor iTCO_wdt intel_pmc_bxt cdc_ether mtd usbnet > iTCO_vendor_support dell_smbios rapl intel_cstate rfkill sunrpc dcdbas > dell_wmi_descriptor wmi_bmof intel_uncore pcspkr bnxt_en mii tg3 > i2c_i801 spi_intel_pci i2c_smbus spi_intel mei_me ipmi_ssif mei > lpc_ich intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf > ipmi_msghandler nd_pmem nd_btt dax_pmem acpi_power_meter fuse loop > nfnetlink zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel > polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 > sha256_ssse3 sha1_ssse3 mgag200 megaraid_sas i2c_algo_bit wmi nfit > libnvdimm [last unloaded: nvmet] > [ 1936.829942] ---[ end trace 0000000000000000 ]--- > [ 1936.880774] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 > [ 1936.886183] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00 > 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 > c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48 > b8 00 > [ 1936.904925] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002 > [ 1936.910150] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a > [ 1936.917285] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020 > [ 1936.924416] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3 > [ 1936.931548] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000 > [ 1936.938680] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000 > [ 1936.945811] FS: 0000000000000000(0000) GS:ffff88c7aba00000(0000) > knlGS:0000000000000000 > [ 1936.953899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1936.959642] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0 > [ 1936.966776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1936.973906] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1936.981039] PKRU: 55555554 > [ 1936.983752] note: kworker/15:1H[848] exited with irqs disabled > [ 1936.989617] note: kworker/15:1H[848] exited with preempt_count 1 > [ 1946.926323] nvmet: creating nvm controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > [ 1946.944673] nvme nvme0: creating 48 I/O queues. 
> > (gdb) l *(blk_flush_complete_seq+0x450) > 0xffffffff826c5370 is in blk_flush_complete_seq (block/blk-flush.c:133). > 128 * After flush data completion, @rq->bio is %NULL but we need to > 129 * complete the bio again. @rq->biotail is guaranteed to equal the > 130 * original @rq->bio. Restore it. > 131 */ > 132 rq->bio = rq->biotail; > 133 rq->__sector = rq->bio->bi_iter.bi_sector; > 134 > 135 /* make @rq a normal request */ > 136 rq->rq_flags &= ~RQF_FLUSH_SEQ; > 137 rq->end_io = rq->flush.saved_end_io; > > > > -- > Best Regards, > Yi Zhang > > > I'm sorry, but I fail to recreate the reported bug: root@virtme-ng:blktests# nvme_trtype=tcp ./check nvme/012 nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) runtime 75.900s ... [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 [ 327.475957] loop0: detected capacity change from 0 to 2097152 [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) [ 327.872343] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 327.878095] nvme nvme0: creating 4 I/O queues. [ 327.882741] nvme nvme0: mapped 4/0/0 default/read/poll queues. [ 327.886786] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 [ 328.465762] XFS (nvme0n1): Mounting V5 Filesystem 1637ca0c-2e42-4816-b46b-e1348e583389 [ 328.510061] XFS (nvme0n1): Ending clean mount [ 328.517226] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038-01-19 (0x7fffffff) [ 386.394225] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79750 [ 472.615859] perf: interrupt took too long (3154 > 3127), lowering kernel.perf_event_max_sample_rate to 63250 nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed] runtime 75.900s ... 203.043s ctrl: NQN "blktests-subsystem-1" root@virtme-ng:blktests# Any pointers what I could be doing wrong? My tested tree HEAD is: commit 518f377c90e510032b4b0f0ebf709dfd9057043d (HEAD -> test, block-for-next) Merge: 9d60dddf961a a4d416dc6098 Author: Jens Axboe <axboe@kernel.dk> Date: Fri Apr 26 07:40:36 2024 -0600 Merge branch 'for-6.10/io_uring' into for-next * for-6.10/io_uring: io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-29 14:35 ` [bug report] " Johannes Thumshirn @ 2024-04-29 22:18 ` Chaitanya Kulkarni 2024-04-30 6:16 ` Johannes Thumshirn 0 siblings, 1 reply; 11+ messages in thread From: Chaitanya Kulkarni @ 2024-04-29 22:18 UTC (permalink / raw) To: Johannes Thumshirn; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER On 4/29/24 07:35, Johannes Thumshirn wrote: > On 23.04.24 15:18, Yi Zhang wrote: >> Hi >> I found this issue on the latest linux-block/for-next by blktests >> nvme/tcp nvme/012, please help check it and let me know if you need >> any info/testing for it, thanks. >> >> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 >> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 >> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem >> blktests-subsystem-1 for NQN >> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >> [ 1874.243423] nvme nvme0: creating 48 I/O queues. >> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. >> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >> 127.0.0.1:4420, hostnqn: >> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 [...] >> >> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 >> [ 327.475957] loop0: detected capacity change from 0 to 2097152 >> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >> >> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >> >> >> [ 327.872343] nvmet: creating nvm controller 1 for subsystem >> blktests-subsystem-1 for NQN >> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >> >> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full >> support of multi-port devices. seems like you don't have multipath enabled that is one difference I can see in above log posted by Yi, and your log. >> [ 327.878095] nvme nvme0: creating 4 I/O queues. >> >> >> [ 327.882741] nvme nvme0: mapped 4/0/0 default/read/poll queues. >> [ 327.886786] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >> 127.0.0.1:4420, hostnqn: >> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 >> -ck ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-29 22:18 ` Chaitanya Kulkarni @ 2024-04-30 6:16 ` Johannes Thumshirn 2024-04-30 14:17 ` Yi Zhang 0 siblings, 1 reply; 11+ messages in thread From: Johannes Thumshirn @ 2024-04-30 6:16 UTC (permalink / raw) To: Chaitanya Kulkarni; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER On 30.04.24 00:18, Chaitanya Kulkarni wrote: > On 4/29/24 07:35, Johannes Thumshirn wrote: >> On 23.04.24 15:18, Yi Zhang wrote: >>> Hi >>> I found this issue on the latest linux-block/for-next by blktests >>> nvme/tcp nvme/012, please help check it and let me know if you need >>> any info/testing for it, thanks. >>> >>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 >>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 >>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem >>> blktests-subsystem-1 for NQN >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>> [ 1874.243423] nvme nvme0: creating 48 I/O queues. >>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. >>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >>> 127.0.0.1:4420, hostnqn: >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 > > [...] > >>> >>> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 >>> [ 327.475957] loop0: detected capacity change from 0 to 2097152 >>> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>> >>> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>> >>> >>> [ 327.872343] nvmet: creating nvm controller 1 for subsystem >>> blktests-subsystem-1 for NQN >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>> >>> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full >>> support of multi-port devices. > > seems like you don't have multipath enabled that is one difference > I can see in above log posted by Yi, and your log. Yup, but even with multipath enabled I can't get the bug to trigger :( nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [ 279.642826] run blktests nvme/012 at 2024-04-29 18:52:26 [ 280.296493] loop0: detected capacity change from 0 to 2097152 [ 280.360139] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 280.426171] nvmet_tcp: enabling port 0 (127.0.0.1:4420) [ 280.712262] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 280.718259] nvme nvme0: creating 4 I/O queues. [ 280.722258] nvme nvme0: mapped 4/0/0 default/read/poll queues. 
[ 280.726088] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 [ 281.343044] XFS (nvme0n1): Mounting V5 Filesystem 513881ee-db18-48c7-a2b0-3e4e3e41f38c [ 281.381925] XFS (nvme0n1): Ending clean mount [ 281.390154] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038-01-19 (0x7fffffff) [ 309.958309] perf: interrupt took too long (2593 > 2500), lowering kernel.perf_event_max_sample_rate to 77000 [ 377.847337] perf: interrupt took too long (3278 > 3241), lowering kernel.perf_event_max_sample_rate to 61000 [ 471.964099] XFS (nvme0n1): Unmounting Filesystem 513881ee-db18-48c7-a2b0-3e4e3e41f38c nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed] runtime ... 192.747s Can you see if you can reproduce it on your side? Thanks, Johannes ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-30 6:16 ` Johannes Thumshirn @ 2024-04-30 14:17 ` Yi Zhang 2024-05-03 7:59 ` Sagi Grimberg 0 siblings, 1 reply; 11+ messages in thread From: Yi Zhang @ 2024-04-30 14:17 UTC (permalink / raw) To: Johannes Thumshirn Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote: > > On 30.04.24 00:18, Chaitanya Kulkarni wrote: > > On 4/29/24 07:35, Johannes Thumshirn wrote: > >> On 23.04.24 15:18, Yi Zhang wrote: > >>> Hi > >>> I found this issue on the latest linux-block/for-next by blktests > >>> nvme/tcp nvme/012, please help check it and let me know if you need > >>> any info/testing for it, thanks. > >>> > >>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 > >>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 > >>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > >>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) > >>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem > >>> blktests-subsystem-1 for NQN > >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > >>> [ 1874.243423] nvme nvme0: creating 48 I/O queues. > >>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. > >>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr > >>> 127.0.0.1:4420, hostnqn: > >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 > > > > [...] > > > >>> > >>> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 > >>> [ 327.475957] loop0: detected capacity change from 0 to 2097152 > >>> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > >>> > >>> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) > >>> > >>> > >>> [ 327.872343] nvmet: creating nvm controller 1 for subsystem > >>> blktests-subsystem-1 for NQN > >>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > >>> > >>> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full > >>> support of multi-port devices. > > > > seems like you don't have multipath enabled that is one difference > > I can see in above log posted by Yi, and your log. > > > Yup, but even with multipath enabled I can't get the bug to trigger :( It's not one 100% reproduced issue, I tried on my another server and it cannot be reproduced. > > nvme/012 (run mkfs and data verification fio job on NVMeOF block > device-backed ns) > > [ 279.642826] run blktests nvme/012 at 2024-04-29 18:52:26 > > [ 280.296493] loop0: detected capacity change from 0 to 2097152 > [ 280.360139] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > > > [ 280.426171] nvmet_tcp: enabling port 0 (127.0.0.1:4420) > [ 280.712262] nvmet: creating nvm controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. > > [ 280.718259] nvme nvme0: creating 4 I/O queues. > > > [ 280.722258] nvme nvme0: mapped 4/0/0 default/read/poll queues. 
> > > [ 280.726088] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr > 127.0.0.1:4420, hostnqn: > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 > [ 281.343044] XFS (nvme0n1): Mounting V5 Filesystem > 513881ee-db18-48c7-a2b0-3e4e3e41f38c > > [ 281.381925] XFS (nvme0n1): Ending clean mount > > > [ 281.390154] xfs filesystem being mounted at /mnt/blktests supports > timestamps until 2038-01-19 (0x7fffffff) > [ 309.958309] perf: interrupt took too long (2593 > 2500), lowering > kernel.perf_event_max_sample_rate to 77000 > > [ 377.847337] perf: interrupt took too long (3278 > 3241), lowering > kernel.perf_event_max_sample_rate to 61000 > > [ 471.964099] XFS (nvme0n1): Unmounting Filesystem > 513881ee-db18-48c7-a2b0-3e4e3e41f38c > > nvme/012 (run mkfs and data verification fio job on NVMeOF block > device-backed ns) [passed] > > runtime ... 192.747s > > > > Can you see if you can reproduce it on your side? > > Thanks, > Johannes -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-04-30 14:17 ` Yi Zhang @ 2024-05-03 7:59 ` Sagi Grimberg 2024-05-03 10:32 ` Johannes Thumshirn 0 siblings, 1 reply; 11+ messages in thread From: Sagi Grimberg @ 2024-05-03 7:59 UTC (permalink / raw) To: Yi Zhang, Johannes Thumshirn, Damien Le Moal Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER On 4/30/24 17:17, Yi Zhang wrote: > On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn > <Johannes.Thumshirn@wdc.com> wrote: >> On 30.04.24 00:18, Chaitanya Kulkarni wrote: >>> On 4/29/24 07:35, Johannes Thumshirn wrote: >>>> On 23.04.24 15:18, Yi Zhang wrote: >>>>> Hi >>>>> I found this issue on the latest linux-block/for-next by blktests >>>>> nvme/tcp nvme/012, please help check it and let me know if you need >>>>> any info/testing for it, thanks. >>>>> >>>>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 >>>>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 >>>>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem >>>>> blktests-subsystem-1 for NQN >>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>> [ 1874.243423] nvme nvme0: creating 48 I/O queues. >>>>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. >>>>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >>>>> 127.0.0.1:4420, hostnqn: >>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 >>> [...] >>> >>>>> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 >>>>> [ 327.475957] loop0: detected capacity change from 0 to 2097152 >>>>> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>> >>>>> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>> >>>>> >>>>> [ 327.872343] nvmet: creating nvm controller 1 for subsystem >>>>> blktests-subsystem-1 for NQN >>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>> >>>>> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full >>>>> support of multi-port devices. >>> seems like you don't have multipath enabled that is one difference >>> I can see in above log posted by Yi, and your log. >> >> Yup, but even with multipath enabled I can't get the bug to trigger :( > It's not one 100% reproduced issue, I tried on my another server and > it cannot be reproduced. Looking at the trace, I think I can see the issue here. In the test case, nvme-mpath fails the request upon submission as the queue is not live, and because it is a mpath request, it is failed over using nvme_failover_request, which steals the bios from the request to its private requeue list. The bisected patch, introduces req->bio dereference to a flush request that has no bios (stolen by the failover sequence). The reproduction seems to be related to in where in the flush sequence the request completion is called. I am unsure if simply making the dereference is the correct fix or not... Damien? -- diff --git a/block/blk-flush.c b/block/blk-flush.c index 2f58ae018464..c17cf8ed8113 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request *rq) * original @rq->bio. Restore it. 
*/ rq->bio = rq->biotail; - rq->__sector = rq->bio->bi_iter.bi_sector; + if (rq->bio) + rq->__sector = rq->bio->bi_iter.bi_sector; /* make @rq a normal request */ rq->rq_flags &= ~RQF_FLUSH_SEQ; -- ^ permalink raw reply related [flat|nested] 11+ messages in thread
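For readers following along, this is roughly how blk_flush_restore_request() reads with the proposed guard folded in. It is a sketch reconstructed from the gdb listing earlier in the thread plus the diff above, not a verbatim copy of any tree:
--
/* block/blk-flush.c -- sketch of blk_flush_restore_request() with the
 * proposed NULL check applied (reconstructed from context, not verbatim). */
static void blk_flush_restore_request(struct request *rq)
{
	/*
	 * After flush data completion, @rq->bio is %NULL but we need to
	 * complete the bio again.  @rq->biotail is guaranteed to equal the
	 * original @rq->bio.  Restore it.
	 */
	rq->bio = rq->biotail;
	/* biotail is also NULL when the bios were stolen on nvme-mpath failover */
	if (rq->bio)
		rq->__sector = rq->bio->bi_iter.bi_sector;

	/* make @rq a normal request */
	rq->rq_flags &= ~RQF_FLUSH_SEQ;
	rq->end_io = rq->flush.saved_end_io;
}
--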
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-05-03 7:59 ` Sagi Grimberg @ 2024-05-03 10:32 ` Johannes Thumshirn 2024-05-03 11:01 ` Sagi Grimberg 0 siblings, 1 reply; 11+ messages in thread From: Johannes Thumshirn @ 2024-05-03 10:32 UTC (permalink / raw) To: Sagi Grimberg, Yi Zhang, Damien Le Moal Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER On 03.05.24 09:59, Sagi Grimberg wrote: > > > On 4/30/24 17:17, Yi Zhang wrote: >> On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn >> <Johannes.Thumshirn@wdc.com> wrote: >>> On 30.04.24 00:18, Chaitanya Kulkarni wrote: >>>> On 4/29/24 07:35, Johannes Thumshirn wrote: >>>>> On 23.04.24 15:18, Yi Zhang wrote: >>>>>> Hi >>>>>> I found this issue on the latest linux-block/for-next by blktests >>>>>> nvme/tcp nvme/012, please help check it and let me know if you need >>>>>> any info/testing for it, thanks. >>>>>> >>>>>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 >>>>>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 >>>>>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem >>>>>> blktests-subsystem-1 for NQN >>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>>> [ 1874.243423] nvme nvme0: creating 48 I/O queues. >>>>>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. >>>>>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >>>>>> 127.0.0.1:4420, hostnqn: >>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 >>>> [...] >>>> >>>>>> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 >>>>>> [ 327.475957] loop0: detected capacity change from 0 to 2097152 >>>>>> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>>> >>>>>> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>>> >>>>>> >>>>>> [ 327.872343] nvmet: creating nvm controller 1 for subsystem >>>>>> blktests-subsystem-1 for NQN >>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>>> >>>>>> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full >>>>>> support of multi-port devices. >>>> seems like you don't have multipath enabled that is one difference >>>> I can see in above log posted by Yi, and your log. >>> >>> Yup, but even with multipath enabled I can't get the bug to trigger :( >> It's not one 100% reproduced issue, I tried on my another server and >> it cannot be reproduced. > > Looking at the trace, I think I can see the issue here. In the test > case, nvme-mpath fails > the request upon submission as the queue is not live, and because it is > a mpath request, it > is failed over using nvme_failover_request, which steals the bios from > the request to its private > requeue list. > > The bisected patch, introduces req->bio dereference to a flush request > that has no bios (stolen > by the failover sequence). The reproduction seems to be related to in > where in the flush sequence > the request completion is called. > > I am unsure if simply making the dereference is the correct fix or > not... Damien? > -- > diff --git a/block/blk-flush.c b/block/blk-flush.c > index 2f58ae018464..c17cf8ed8113 100644 > --- a/block/blk-flush.c > +++ b/block/blk-flush.c > @@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request > *rq) > * original @rq->bio. 
Restore it. > */ > rq->bio = rq->biotail; > - rq->__sector = rq->bio->bi_iter.bi_sector; > + if (rq->bio) > + rq->__sector = rq->bio->bi_iter.bi_sector; > > /* make @rq a normal request */ > rq->rq_flags &= ~RQF_FLUSH_SEQ; > -- > This is something Damien added to his patch series. I just wonder, why I couldn't reproduce the failure, even with nvme-mpath enabled. I tried both nvme-tcp as well as nvme-loop without any problems. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-05-03 10:32 ` Johannes Thumshirn @ 2024-05-03 11:01 ` Sagi Grimberg 2024-05-03 21:14 ` Chaitanya Kulkarni 0 siblings, 1 reply; 11+ messages in thread From: Sagi Grimberg @ 2024-05-03 11:01 UTC (permalink / raw) To: Johannes Thumshirn, Yi Zhang, Damien Le Moal Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER On 03/05/2024 13:32, Johannes Thumshirn wrote: > On 03.05.24 09:59, Sagi Grimberg wrote: >> >> On 4/30/24 17:17, Yi Zhang wrote: >>> On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn >>> <Johannes.Thumshirn@wdc.com> wrote: >>>> On 30.04.24 00:18, Chaitanya Kulkarni wrote: >>>>> On 4/29/24 07:35, Johannes Thumshirn wrote: >>>>>> On 23.04.24 15:18, Yi Zhang wrote: >>>>>>> Hi >>>>>>> I found this issue on the latest linux-block/for-next by blktests >>>>>>> nvme/tcp nvme/012, please help check it and let me know if you need >>>>>>> any info/testing for it, thanks. >>>>>>> >>>>>>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47 >>>>>>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152 >>>>>>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>>>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>>>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem >>>>>>> blktests-subsystem-1 for NQN >>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>>>> [ 1874.243423] nvme nvme0: creating 48 I/O queues. >>>>>>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues. >>>>>>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr >>>>>>> 127.0.0.1:4420, hostnqn: >>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 >>>>> [...] >>>>> >>>>>>> [ 326.827260] run blktests nvme/012 at 2024-04-29 16:28:31 >>>>>>> [ 327.475957] loop0: detected capacity change from 0 to 2097152 >>>>>>> [ 327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >>>>>>> >>>>>>> [ 327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) >>>>>>> >>>>>>> >>>>>>> [ 327.872343] nvmet: creating nvm controller 1 for subsystem >>>>>>> blktests-subsystem-1 for NQN >>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. >>>>>>> >>>>>>> [ 327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full >>>>>>> support of multi-port devices. >>>>> seems like you don't have multipath enabled that is one difference >>>>> I can see in above log posted by Yi, and your log. >>>> Yup, but even with multipath enabled I can't get the bug to trigger :( >>> It's not one 100% reproduced issue, I tried on my another server and >>> it cannot be reproduced. >> Looking at the trace, I think I can see the issue here. In the test >> case, nvme-mpath fails >> the request upon submission as the queue is not live, and because it is >> a mpath request, it >> is failed over using nvme_failover_request, which steals the bios from >> the request to its private >> requeue list. >> >> The bisected patch, introduces req->bio dereference to a flush request >> that has no bios (stolen >> by the failover sequence). The reproduction seems to be related to in >> where in the flush sequence >> the request completion is called. >> >> I am unsure if simply making the dereference is the correct fix or >> not... Damien? 
>> -- >> diff --git a/block/blk-flush.c b/block/blk-flush.c >> index 2f58ae018464..c17cf8ed8113 100644 >> --- a/block/blk-flush.c >> +++ b/block/blk-flush.c >> @@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request >> *rq) >> * original @rq->bio. Restore it. >> */ >> rq->bio = rq->biotail; >> - rq->__sector = rq->bio->bi_iter.bi_sector; >> + if (rq->bio) >> + rq->__sector = rq->bio->bi_iter.bi_sector; >> >> /* make @rq a normal request */ >> rq->rq_flags &= ~RQF_FLUSH_SEQ; >> -- >> > > This is something Damien added to his patch series. I just wonder, why I > couldn't reproduce the failure, even with nvme-mpath enabled. I tried > both nvme-tcp as well as nvme-loop without any problems. Not exactly sure. From what I see blk_flush_complete_seq() will only call blk_flush_restore_request() and panic is for error != 0. And if that is the case, any request with its bios stolen must panic. However, nvme-mpath always ends a stolen request with error = 0. Seems that there is code that may override the request error status in flush_end_io() but I cannot see it in the trace... ^ permalink raw reply [flat|nested] 11+ messages in thread
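To make the "stolen bios" part of the analysis concrete, below is a rough sketch of the failover step being discussed, paraphrased from drivers/nvme/host/multipath.c rather than quoted from it (ANA handling and the per-bio fixups are omitted). The two details that matter for this thread: blk_steal_bios() leaves req->bio and req->biotail NULL, and the bio-less request is then completed with a zero status, matching the call trace above (nvme_failover_req -> blk_mq_end_request -> mq_flush_data_end_io).
--
/* Sketch of the nvme-mpath failover path (paraphrased, not verbatim). */
static void failover_sketch(struct request *req)
{
	struct nvme_ns *ns = req->q->queuedata;
	unsigned long flags;

	spin_lock_irqsave(&ns->head->requeue_lock, flags);
	/* move the bios onto the path head's private requeue list;
	 * this sets req->bio = req->biotail = NULL */
	blk_steal_bios(&ns->head->requeue_list, req);
	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);

	nvme_req(req)->status = 0;
	blk_mq_end_request(req, 0);	/* ends the bio-less request with no error */
	kblockd_schedule_work(&ns->head->requeue_work);
}
--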
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-05-03 11:01 ` Sagi Grimberg @ 2024-05-03 21:14 ` Chaitanya Kulkarni 2024-05-09 6:15 ` Yi Zhang 0 siblings, 1 reply; 11+ messages in thread From: Chaitanya Kulkarni @ 2024-05-03 21:14 UTC (permalink / raw) To: Sagi Grimberg, Johannes Thumshirn, Yi Zhang, Damien Le Moal Cc: linux-block, open list:NVM EXPRESS DRIVER >> This is something Damien added to his patch series. I just wonder, why I >> couldn't reproduce the failure, even with nvme-mpath enabled. I tried >> both nvme-tcp as well as nvme-loop without any problems. > > Not exactly sure. > > From what I see blk_flush_complete_seq() will only call > blk_flush_restore_request() and > panic is for error != 0. And if that is the case, any request with its > bios stolen must panic. > > However, nvme-mpath always ends a stolen request with error = 0. > > Seems that there is code that may override the request error status in > flush_end_io() but I cannot > see it in the trace... I confirm that after several tries I cannot reproduce it either, with or without multipathing; blktests passes without any errors for both nvme-loop and nvme-tcp. -ck ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 2024-05-03 21:14 ` Chaitanya Kulkarni @ 2024-05-09 6:15 ` Yi Zhang 0 siblings, 0 replies; 11+ messages in thread From: Yi Zhang @ 2024-05-09 6:15 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: Sagi Grimberg, Johannes Thumshirn, Damien Le Moal, linux-block, open list:NVM EXPRESS DRIVER Confirmed the issue was fixed with this patch. https://lore.kernel.org/linux-block/20240501110907.96950-9-dlemoal@kernel.org/ On Sat, May 4, 2024 at 5:14 AM Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote: > > > >> This is something Damien added to his patch series. I just wonder, why I > >> couldn't reproduce the failure, even with nvme-mpath enabled. I tried > >> both nvme-tcp as well as nvme-loop without any problems. > > > > Not exactly sure. > > > > From what I see blk_flush_complete_seq() will only call > > blk_flush_restore_request() and > > panic is for error != 0. And if that is the case, any request with its > > bios stolen must panic. > > > > However, nvme-mpath always ends a stolen request with error = 0. > > > > Seems that there is code that may override the request error status in > > flush_end_io() but I cannot > > see it in the trace... > > > I confirm that after several tries I cannot reproduced it either with > and without multi-pathing, blktests is passing without any errors for > both nvme-loop and nvme-tcp .... > > -ck > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 11+ messages in thread