linux-nvme.lists.infradead.org archive mirror
* [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
@ 2024-04-23 13:17 Yi Zhang
  2024-04-26  8:30 ` [bug report][bisected] " Yi Zhang
  2024-04-29 14:35 ` [bug report] " Johannes Thumshirn
  0 siblings, 2 replies; 10+ messages in thread
From: Yi Zhang @ 2024-04-23 13:17 UTC (permalink / raw)
  To: linux-block, open list:NVM EXPRESS DRIVER

Hi,
I found this issue on the latest linux-block/for-next while running blktests
nvme/tcp nvme/012. Please help check it, and let me know if you need any
further info or testing for it. Thanks.

[ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47
[ 1873.761900] loop0: detected capacity change from 0 to 2097152
[ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 1874.208883] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 1874.243423] nvme nvme0: creating 48 I/O queues.
[ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 1875.649484] XFS (nvme0n1): Mounting V5 Filesystem
b5a2998b-e06a-40d0-a291-a46b67aa43db
[ 1875.770202] XFS (nvme0n1): Ending clean mount
[ 1936.174561] nvme nvme0: I/O tag 49 (a031) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.182573] nvme nvme0: starting error recovery
[ 1936.187622] nvme nvme0: I/O tag 50 (5032) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.196829] nvme nvme0: I/O tag 51 (b033) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.205619] block nvme0n1: no usable path - requeuing I/O
[ 1936.211228] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.218975] block nvme0n1: no usable path - requeuing I/O
[ 1936.224691] nvme nvme0: I/O tag 53 (3035) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.232395] block nvme0n1: no usable path - requeuing I/O
[ 1936.237916] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.245731] block nvme0n1: no usable path - requeuing I/O
[ 1936.251165] block nvme0n1: no usable path - requeuing I/O
[ 1936.256967] nvme nvme0: I/O tag 55 (8037) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.264676] block nvme0n1: no usable path - requeuing I/O
[ 1936.270159] nvme nvme0: I/O tag 56 (1038) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.277848] block nvme0n1: no usable path - requeuing I/O
[ 1936.283294] block nvme0n1: no usable path - requeuing I/O
[ 1936.288743] nvme nvme0: I/O tag 57 (b039) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.296450] block nvme0n1: no usable path - requeuing I/O
[ 1936.301920] nvme nvme0: I/O tag 58 (503a) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.309621] block nvme0n1: no usable path - requeuing I/O
[ 1936.315076] nvme nvme0: I/O tag 59 (a03b) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.323271] nvme nvme0: I/O tag 60 (803c) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.331018] nvme nvme0: I/O tag 61 (203d) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.338769] nvme nvme0: I/O tag 62 (503e) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.346515] nvme nvme0: I/O tag 63 (c03f) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.354693] nvme nvme0: I/O tag 64 (a040) type 4 opcode 0x1 (I/O
Cmd) QID 2 timeout
[ 1936.362479] nvme nvme0: I/O tag 49 (0031) type 4 opcode 0x0 (I/O
Cmd) QID 8 timeout
[ 1936.370819] nvme nvme0: I/O tag 50 (0032) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.378679] nvme nvme0: I/O tag 51 (0033) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.386464] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.394235] nvme nvme0: I/O tag 53 (0035) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.402062] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.409847] nvme nvme0: I/O tag 55 (0037) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.417773] nvme nvme0: I/O tag 56 (0038) type 4 opcode 0x1 (I/O
Cmd) QID 8 timeout
[ 1936.449539] general protection fault, probably for non-canonical
address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN PTI
[ 1936.450749] nvme nvme0: Reconnecting in 10 seconds...
[ 1936.460764] KASAN: null-ptr-deref in range
[0x0000000000000020-0x0000000000000027]
[ 1936.460775] CPU: 15 PID: 848 Comm: kworker/15:1H Not tainted 6.9.0-rc4.v1+ #2
[ 1936.480528] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
2.20.1 09/13/2023
[ 1936.488094] Workqueue: kblockd blk_mq_run_work_fn
[ 1936.492806] RIP: 0010:blk_flush_complete_seq+0x450/0x1060
[ 1936.498216] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00
49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48
b8 00
[ 1936.516960] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002
[ 1936.522185] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a
[ 1936.529318] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020
[ 1936.536449] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3
[ 1936.543582] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000
[ 1936.550714] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000
[ 1936.557847] FS:  0000000000000000(0000) GS:ffff88c7aba00000(0000)
knlGS:0000000000000000
[ 1936.565930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1936.571675] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0
[ 1936.578809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1936.585942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1936.593072] PKRU: 55555554
[ 1936.595784] Call Trace:
[ 1936.598237]  <TASK>
[ 1936.600344]  ? __die_body.cold+0x19/0x25
[ 1936.604279]  ? die_addr+0x46/0x70
[ 1936.607606]  ? exc_general_protection+0x14f/0x250
[ 1936.612325]  ? asm_exc_general_protection+0x26/0x30
[ 1936.617212]  ? blk_flush_complete_seq+0x450/0x1060
[ 1936.622012]  ? __pfx_blk_flush_complete_seq+0x10/0x10
[ 1936.627072]  mq_flush_data_end_io+0x279/0x500
[ 1936.631430]  ? __pfx_mq_flush_data_end_io+0x10/0x10
[ 1936.636308]  blk_mq_end_request+0x1b8/0x690
[ 1936.640493]  ? _raw_spin_unlock_irqrestore+0x45/0x80
[ 1936.645460]  nvme_failover_req+0x37d/0x4e0 [nvme_core]
[ 1936.650634]  nvme_fail_nonready_command+0x130/0x180 [nvme_core]
[ 1936.656578]  blk_mq_dispatch_rq_list+0x3b3/0x2100
[ 1936.661292]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[ 1936.666436]  __blk_mq_sched_dispatch_requests+0x554/0x1510
[ 1936.671925]  ? __pfx___lock_acquire+0x10/0x10
[ 1936.676285]  ? __pfx_lock_acquired+0x10/0x10
[ 1936.680558]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[ 1936.686477]  ? __pfx_lock_acquire+0x10/0x10
[ 1936.690668]  blk_mq_sched_dispatch_requests+0xa6/0xf0
[ 1936.695721]  blk_mq_run_work_fn+0x122/0x2c0
[ 1936.699908]  ? blk_mq_run_work_fn+0x114/0x2c0
[ 1936.704266]  process_one_work+0x85d/0x13f0
[ 1936.708371]  ? worker_thread+0xcc/0x1130
[ 1936.712301]  ? __pfx_process_one_work+0x10/0x10
[ 1936.716835]  ? assign_work+0x16c/0x240
[ 1936.720596]  worker_thread+0x6da/0x1130
[ 1936.724438]  ? __pfx_worker_thread+0x10/0x10
[ 1936.728712]  kthread+0x2ed/0x3c0
[ 1936.731944]  ? _raw_spin_unlock_irq+0x28/0x60
[ 1936.736303]  ? __pfx_kthread+0x10/0x10
[ 1936.740058]  ret_from_fork+0x31/0x70
[ 1936.743636]  ? __pfx_kthread+0x10/0x10
[ 1936.747387]  ret_from_fork_asm+0x1a/0x30
[ 1936.751319]  </TASK>
[ 1936.753516] Modules linked in: nvmet_tcp nvmet nvme_tcp
nvme_fabrics nvme_keyring nvme_core nvme_auth intel_rapl_msr
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm spi_nor iTCO_wdt intel_pmc_bxt cdc_ether mtd usbnet
iTCO_vendor_support dell_smbios rapl intel_cstate rfkill sunrpc dcdbas
dell_wmi_descriptor wmi_bmof intel_uncore pcspkr bnxt_en mii tg3
i2c_i801 spi_intel_pci i2c_smbus spi_intel mei_me ipmi_ssif mei
lpc_ich intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf
ipmi_msghandler nd_pmem nd_btt dax_pmem acpi_power_meter fuse loop
nfnetlink zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel
polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
sha256_ssse3 sha1_ssse3 mgag200 megaraid_sas i2c_algo_bit wmi nfit
libnvdimm [last unloaded: nvmet]
[ 1936.829942] ---[ end trace 0000000000000000 ]---
[ 1936.880774] RIP: 0010:blk_flush_complete_seq+0x450/0x1060
[ 1936.886183] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00
49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48
b8 00
[ 1936.904925] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002
[ 1936.910150] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a
[ 1936.917285] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020
[ 1936.924416] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3
[ 1936.931548] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000
[ 1936.938680] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000
[ 1936.945811] FS:  0000000000000000(0000) GS:ffff88c7aba00000(0000)
knlGS:0000000000000000
[ 1936.953899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1936.959642] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0
[ 1936.966776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1936.973906] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1936.981039] PKRU: 55555554
[ 1936.983752] note: kworker/15:1H[848] exited with irqs disabled
[ 1936.989617] note: kworker/15:1H[848] exited with preempt_count 1
[ 1946.926323] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 1946.944673] nvme nvme0: creating 48 I/O queues.

(gdb) l *(blk_flush_complete_seq+0x450)
0xffffffff826c5370 is in blk_flush_complete_seq (block/blk-flush.c:133).
128		 * After flush data completion, @rq->bio is %NULL but we need to
129		 * complete the bio again.  @rq->biotail is guaranteed to equal the
130		 * original @rq->bio.  Restore it.
131		 */
132		rq->bio = rq->biotail;
133		rq->__sector = rq->bio->bi_iter.bi_sector;
134
135		/* make @rq a normal request */
136		rq->rq_flags &= ~RQF_FLUSH_SEQ;
137		rq->end_io = rq->flush.saved_end_io;



--
Best Regards,
  Yi Zhang




* Re: [bug report][bisected] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang
@ 2024-04-26  8:30 ` Yi Zhang
  2024-04-29 14:35 ` [bug report] " Johannes Thumshirn
  1 sibling, 0 replies; 10+ messages in thread
From: Yi Zhang @ 2024-04-26  8:30 UTC (permalink / raw)
  To: linux-block, open list:NVM EXPRESS DRIVER
  Cc: Damien Le Moal, Jens Axboe, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke

Hi Damien

Bisect shows the issue was introduced by the patch below; could you help check it?

commit 6f8fd758de63bab513c551bb1796a14f8cdc40d9
Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Apr 8 10:41:01 2024 +0900

    block: Restore sector of flush requests
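
For reference, the faulting line in the gdb listing from my report sits in
blk_flush_restore_request(). Below is a sketch of that function as
reconstructed from the gdb listing (not taken from the commit diff); the
rq->__sector assignment is presumably what this commit added, judging by
its subject:
--
/*
 * Sketch of blk_flush_restore_request() in block/blk-flush.c,
 * reconstructed from the gdb listing in the original report.
 */
static void blk_flush_restore_request(struct request *rq)
{
	/*
	 * After flush data completion, @rq->bio is %NULL but we need to
	 * complete the bio again.  @rq->biotail is guaranteed to equal the
	 * original @rq->bio.  Restore it.
	 */
	rq->bio = rq->biotail;
	rq->__sector = rq->bio->bi_iter.bi_sector;	/* blk-flush.c:133, the faulting line */

	/* make @rq a normal request */
	rq->rq_flags &= ~RQF_FLUSH_SEQ;
	rq->end_io = rq->flush.saved_end_io;
}
--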

On Tue, Apr 23, 2024 at 9:17 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> [...]



-- 
Best Regards,
  Yi Zhang




* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang
  2024-04-26  8:30 ` [bug report][bisected] " Yi Zhang
@ 2024-04-29 14:35 ` Johannes Thumshirn
  2024-04-29 22:18   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2024-04-29 14:35 UTC (permalink / raw)
  To: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER

On 23.04.24 15:18, Yi Zhang wrote:
> [...]


I'm sorry, but I fail to recreate the reported bug:

root@virtme-ng:blktests# nvme_trtype=tcp ./check nvme/012 
 
 
 

nvme/012 (run mkfs and data verification fio job on NVMeOF block 
device-backed ns)
     runtime  75.900s  ... 
 

[  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
[  327.475957] loop0: detected capacity change from 0 to 2097152
[  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 

[  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420) 
 

[  327.872343] nvmet: creating nvm controller 1 for subsystem 
blktests-subsystem-1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. 

[  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full 
support of multi-port devices. 

[  327.878095] nvme nvme0: creating 4 I/O queues. 
 

[  327.882741] nvme nvme0: mapped 4/0/0 default/read/poll queues.
[  327.886786] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
127.0.0.1:4420, hostnqn: 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 

[  328.465762] XFS (nvme0n1): Mounting V5 Filesystem 
1637ca0c-2e42-4816-b46b-e1348e583389 

[  328.510061] XFS (nvme0n1): Ending clean mount 

[  328.517226] xfs filesystem being mounted at /mnt/blktests supports 
timestamps until 2038-01-19 (0x7fffffff) 

[  386.394225] perf: interrupt took too long (2502 > 2500), lowering 
kernel.perf_event_max_sample_rate to 79750 

[  472.615859] perf: interrupt took too long (3154 > 3127), lowering 
kernel.perf_event_max_sample_rate to 63250 

nvme/012 (run mkfs and data verification fio job on NVMeOF block 
device-backed ns) [passed]
     runtime  75.900s  ...  203.043s ctrl: NQN "blktests-subsystem-1" 

root@virtme-ng:blktests#

Any pointers on what I could be doing wrong?

My tested tree HEAD is:
commit 518f377c90e510032b4b0f0ebf709dfd9057043d (HEAD -> test, 
block-for-next)
Merge: 9d60dddf961a a4d416dc6098
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Apr 26 07:40:36 2024 -0600

     Merge branch 'for-6.10/io_uring' into for-next

     * for-6.10/io_uring:
       io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE 
instead of re-reading it




* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-29 14:35 ` [bug report] " Johannes Thumshirn
@ 2024-04-29 22:18   ` Chaitanya Kulkarni
  2024-04-30  6:16     ` Johannes Thumshirn
  0 siblings, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-29 22:18 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER

On 4/29/24 07:35, Johannes Thumshirn wrote:
> On 23.04.24 15:18, Yi Zhang wrote:
>> Hi
>> I found this issue on the latest linux-block/for-next by blktests
>> nvme/tcp nvme/012, please help check it and let me know if you need
>> any info/testing for it, thanks.
>>
>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47
>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152
>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem
>> blktests-subsystem-1 for NQN
>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>> [ 1874.243423] nvme nvme0: creating 48 I/O queues.
>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues.
>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
>> 127.0.0.1:4420, hostnqn:
>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349

[...]

>>
>> [  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
>> [  327.475957] loop0: detected capacity change from 0 to 2097152
>> [  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>
>> [  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>>   
>>
>> [  327.872343] nvmet: creating nvm controller 1 for subsystem
>> blktests-subsystem-1 for NQN
>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>
>> [  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
>> support of multi-port devices.

It seems you don't have multipath enabled; that is one difference I can
see between the log posted by Yi above and your log.


>> [  327.878095] nvme nvme0: creating 4 I/O queues.
>>   
>>
>> [  327.882741] nvme nvme0: mapped 4/0/0 default/read/poll queues.
>> [  327.886786] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
>> 127.0.0.1:4420, hostnqn:
>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
>>

-ck




* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-29 22:18   ` Chaitanya Kulkarni
@ 2024-04-30  6:16     ` Johannes Thumshirn
  2024-04-30 14:17       ` Yi Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2024-04-30  6:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER

On 30.04.24 00:18, Chaitanya Kulkarni wrote:
> On 4/29/24 07:35, Johannes Thumshirn wrote:
>> On 23.04.24 15:18, Yi Zhang wrote:
>>> [...]
>>>
>>> [  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
>>> [  327.475957] loop0: detected capacity change from 0 to 2097152
>>> [  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>
>>> [  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>>>    
>>>
>>> [  327.872343] nvmet: creating nvm controller 1 for subsystem
>>> blktests-subsystem-1 for NQN
>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>>
>>> [  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
>>> support of multi-port devices.
> 
> seems like you don't have multipath enabled that is one difference
> I can see in above log posted by Yi, and your log.


Yup, but even with multipath enabled I can't get the bug to trigger :(

nvme/012 (run mkfs and data verification fio job on NVMeOF block 
device-backed ns) 

[  279.642826] run blktests nvme/012 at 2024-04-29 18:52:26 

[  280.296493] loop0: detected capacity change from 0 to 2097152
[  280.360139] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 
 

[  280.426171] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  280.712262] nvmet: creating nvm controller 1 for subsystem 
blktests-subsystem-1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. 

[  280.718259] nvme nvme0: creating 4 I/O queues. 
 

[  280.722258] nvme nvme0: mapped 4/0/0 default/read/poll queues. 
 

[  280.726088] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
127.0.0.1:4420, hostnqn: 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  281.343044] XFS (nvme0n1): Mounting V5 Filesystem 
513881ee-db18-48c7-a2b0-3e4e3e41f38c 

[  281.381925] XFS (nvme0n1): Ending clean mount 
 

[  281.390154] xfs filesystem being mounted at /mnt/blktests supports 
timestamps until 2038-01-19 (0x7fffffff)
[  309.958309] perf: interrupt took too long (2593 > 2500), lowering 
kernel.perf_event_max_sample_rate to 77000 

[  377.847337] perf: interrupt took too long (3278 > 3241), lowering 
kernel.perf_event_max_sample_rate to 61000 

[  471.964099] XFS (nvme0n1): Unmounting Filesystem 
513881ee-db18-48c7-a2b0-3e4e3e41f38c 

nvme/012 (run mkfs and data verification fio job on NVMeOF block 
device-backed ns) [passed] 

     runtime    ...  192.747s 



Can you see if you can reproduce it on your side?

Thanks,
	Johannes


* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-30  6:16     ` Johannes Thumshirn
@ 2024-04-30 14:17       ` Yi Zhang
  2024-05-03  7:59         ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2024-04-30 14:17 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER

On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> [...]
>
>
> Yup, but even with multipath enabled I can't get the bug to trigger :(

It's not a 100% reproducible issue; I tried on another of my servers and
it could not be reproduced.

> [...]



-- 
Best Regards,
  Yi Zhang




* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-04-30 14:17       ` Yi Zhang
@ 2024-05-03  7:59         ` Sagi Grimberg
  2024-05-03 10:32           ` Johannes Thumshirn
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2024-05-03  7:59 UTC (permalink / raw)
  To: Yi Zhang, Johannes Thumshirn, Damien Le Moal
  Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER



On 4/30/24 17:17, Yi Zhang wrote:
> On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn
> <Johannes.Thumshirn@wdc.com> wrote:
>> [...]
>>
>> Yup, but even with multipath enabled I can't get the bug to trigger :(
> It's not one 100% reproduced issue, I tried on my another server and
> it cannot be reproduced.

Looking at the trace, I think I can see the issue here. In this test
case, nvme-mpath fails the request upon submission because the queue is
not live, and since it is an mpath request, it is failed over using
nvme_failover_req(), which steals the bios from the request onto its private
requeue list.
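
(To make "steals the bios" concrete: nvme_failover_req() ends up calling
blk_steal_bios(), which as far as I recall looks roughly like the sketch
below -- paraphrased from block/blk-core.c, not a verbatim copy. The point
is that it leaves both rq->bio and rq->biotail NULL.)
--
/* Rough sketch: move all of a request's bios onto a bio list. */
void blk_steal_bios(struct bio_list *list, struct request *rq)
{
	if (rq->bio) {
		/* splice the request's bio chain onto the tail of @list */
		if (list->tail)
			list->tail->bi_next = rq->bio;
		else
			list->head = rq->bio;
		list->tail = rq->biotail;

		/* the request is left without any bios */
		rq->bio = NULL;
		rq->biotail = NULL;
	}

	rq->__data_len = 0;
}
--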

The bisected patch introduces a req->bio dereference on a flush request
that has no bios (they were stolen by the failover sequence). The
reproduction seems to be related to where in the flush sequence the
request completion is called.

I am unsure whether simply making the dereference conditional is the
correct fix or not... Damien?
--
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 2f58ae018464..c17cf8ed8113 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request *rq)
          * original @rq->bio.  Restore it.
          */
         rq->bio = rq->biotail;
-       rq->__sector = rq->bio->bi_iter.bi_sector;
+       if (rq->bio)
+               rq->__sector = rq->bio->bi_iter.bi_sector;

         /* make @rq a normal request */
         rq->rq_flags &= ~RQF_FLUSH_SEQ;
--



* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-05-03  7:59         ` Sagi Grimberg
@ 2024-05-03 10:32           ` Johannes Thumshirn
  2024-05-03 11:01             ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2024-05-03 10:32 UTC (permalink / raw)
  To: Sagi Grimberg, Yi Zhang, Damien Le Moal
  Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER

On 03.05.24 09:59, Sagi Grimberg wrote:
> [...]
> --
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 2f58ae018464..c17cf8ed8113 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request *rq)
>            * original @rq->bio.  Restore it.
>            */
>           rq->bio = rq->biotail;
> -       rq->__sector = rq->bio->bi_iter.bi_sector;
> +       if (rq->bio)
> +               rq->__sector = rq->bio->bi_iter.bi_sector;
> 
>           /* make @rq a normal request */
>           rq->rq_flags &= ~RQF_FLUSH_SEQ;
> --
> 


This is something Damien added to his patch series. I just wonder why I
couldn't reproduce the failure, even with nvme-mpath enabled. I tried
both nvme-tcp and nvme-loop without any problems.


* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-05-03 10:32           ` Johannes Thumshirn
@ 2024-05-03 11:01             ` Sagi Grimberg
  2024-05-03 21:14               ` Chaitanya Kulkarni
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2024-05-03 11:01 UTC (permalink / raw)
  To: Johannes Thumshirn, Yi Zhang, Damien Le Moal
  Cc: Chaitanya Kulkarni, linux-block, open list:NVM EXPRESS DRIVER



On 03/05/2024 13:32, Johannes Thumshirn wrote:
> On 03.05.24 09:59, Sagi Grimberg wrote:
>> [...]
>
> This is something Damien added to his patch series. I just wonder, why I
> couldn't reproduce the failure, even with nvme-mpath enabled. I tried
> both nvme-tcp as well as nvme-loop without any problems.

Not exactly sure.

From what I see, blk_flush_complete_seq() will only call
blk_flush_restore_request(), and hence panic, when error != 0. And if
that is the case, any request with its bios stolen must panic.

However, nvme-mpath always ends a stolen request with error = 0.

It seems there is code that may override the request error status in
flush_end_io(), but I cannot
see it in the trace...
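
(The override I have in mind is roughly the excerpt of flush_end_io()
below -- quoted from memory, so the exact upstream code may differ. A
status stashed in fq->rq_status by an earlier completion attempt, e.g.
after a timeout, is promoted to the error that then gets passed down to
blk_flush_complete_seq().)
--
	/* excerpt, flush_end_io(): pick up a deferred error status */
	if (fq->rq_status != BLK_STS_OK) {
		error = fq->rq_status;
		fq->rq_status = BLK_STS_OK;
	}
--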



* Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
  2024-05-03 11:01             ` Sagi Grimberg
@ 2024-05-03 21:14               ` Chaitanya Kulkarni
  0 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-05-03 21:14 UTC (permalink / raw)
  To: Sagi Grimberg, Johannes Thumshirn, Yi Zhang, Damien Le Moal
  Cc: linux-block, open list:NVM EXPRESS DRIVER


>> This is something Damien added to his patch series. I just wonder, why I
>> couldn't reproduce the failure, even with nvme-mpath enabled. I tried
>> both nvme-tcp as well as nvme-loop without any problems.
> 
> Not exactly sure.
> 
>  From what I see blk_flush_complete_seq() will only call 
> blk_flush_restore_request() and
> panic is for error != 0. And if that is the case, any request with its 
> bios stolen must panic.
> 
> However, nvme-mpath always ends a stolen request with error = 0.
> 
> Seems that there is code that may override the request error status in 
> flush_end_io() but I cannot
> see it in the trace...


I confirm that after several tries I cannot reproduce it either, with or
without multipathing; blktests passes without any errors for both
nvme-loop and nvme-tcp.

-ck



