* nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
@ 2023-03-15 17:48 Belanger, Martin
  2023-03-15 18:13 ` Keith Busch
  2023-03-15 22:24 ` Keith Busch
  0 siblings, 2 replies; 30+ messages in thread
From: Belanger, Martin @ 2023-03-15 17:48 UTC (permalink / raw)
  To: linux-nvme

I'm running tests where I connect/disconnect to/from a few I/O controllers using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the target. The kernel module crashes (trace below) while trying to connect over TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel using the latest nvme-6.4 branch and I'm still seeing the crash.
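
In case it helps, the connect side of my test boils down to something like the sketch below (address, NQN and queue counts are placeholders, not my exact setup): an nvme-cli style connect is just a write of a comma-separated option string, including nr_poll_queues, to /dev/nvme-fabrics.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder values; the real test loops over many option combinations. */
	const char *opts =
		"transport=tcp,traddr=127.0.0.1,trsvcid=4420,"
		"nqn=nqn.2014-08.org.example:testsubsys,"
		"nr_io_queues=8,nr_poll_queues=4";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nvme-fabrics");
		return 1;
	}
	/* The kernel parses the options and performs the connect on write(). */
	if (write(fd, opts, strlen(opts)) < 0)
		perror("connect");
	close(fd);
	return 0;
}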

I'm not sure how to debug this further. Any suggestions?

Thanks,
Martin Belanger

Mar 15 13:30:22.954399 fedora37 kernel: nvme nvme1: failed to connect socket: -110
Mar 15 13:30:22.958393 fedora37 kernel: nvmet: creating nvm controller 2 for subsystem nqn.1988-11.com.dell:PowerSANxxx:01:20210225100113-454f73093ceb4847a7bdfc6e34ae8e28 for NQN nqn.2014-08.org.nvmexpress:uuid:f9ef75fc-1699-418f-ba45-49f9fc766e1b.
Mar 15 13:30:22.958453 fedora37 kernel: nvme nvme1: creating 12 I/O queues.
Mar 15 13:30:22.960320 fedora37 kernel: nvme nvme1: mapped 4/4/4 default/read/poll queues.
Mar 15 13:30:22.960862 fedora37 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000034
Mar 15 13:30:22.960998 fedora37 kernel: #PF: supervisor read access in kernel mode
Mar 15 13:30:22.992915 fedora37 kernel: #PF: error_code(0x0000) - not-present page
Mar 15 13:30:22.994551 fedora37 kernel: PGD 0 P4D 0 
Mar 15 13:30:22.996135 fedora37 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Mar 15 13:30:22.996169 fedora37 kernel: CPU: 0 PID: 3953 Comm: pool Not tainted 6.3.0-rc1-stas+ #1
Mar 15 13:30:22.996192 fedora37 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Mar 15 13:30:22.996210 fedora37 kernel: RIP: 0010:bio_poll+0xd/0x150
Mar 15 13:30:22.996227 fedora37 kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 <8b> 6f 34 48 8b 47 08 48 85 c0 0f 84 a1 00 00 00 4c 8b a8 60 03 00
Mar 15 13:30:22.996245 fedora37 kernel: RSP: 0018:ffffa561851bfae0 EFLAGS: 00010246
Mar 15 13:30:22.996266 fedora37 kernel: RAX: 0000000000000000 RBX: ffff8ff38ae60000 RCX: 0000000000000000
Mar 15 13:30:22.996311 fedora37 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 15 13:30:22.996369 fedora37 kernel: RBP: ffffa561851bfb10 R08: 0000000000000001 R09: ffff8ff38cc0e860
Mar 15 13:30:22.996410 fedora37 kernel: R10: ffff8ff3887af388 R11: 0000000000000110 R12: 0000000000000001
Mar 15 13:30:22.996430 fedora37 kernel: R13: ffff8ff38fbd9c00 R14: 0000000000000400 R15: ffffa561851bfba8
Mar 15 13:30:22.996450 fedora37 kernel: FS:  00007f9aab2ff6c0(0000) GS:ffff8ff84b400000(0000) knlGS:0000000000000000
Mar 15 13:30:22.996467 fedora37 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 15 13:30:22.996484 fedora37 kernel: CR2: 0000000000000034 CR3: 000000011439e002 CR4: 00000000000706f0
Mar 15 13:30:22.996501 fedora37 kernel: Call Trace:
Mar 15 13:30:22.996518 fedora37 kernel:  <TASK>
Mar 15 13:30:22.996535 fedora37 kernel:  blk_execute_rq+0xc9/0x190
Mar 15 13:30:22.996552 fedora37 kernel:  __nvme_submit_sync_cmd+0xa5/0x160 [nvme_core]
Mar 15 13:30:22.996572 fedora37 kernel:  nvmf_connect_io_queue+0x10b/0x200 [nvme_fabrics]
Mar 15 13:30:22.996589 fedora37 kernel:  nvme_tcp_start_queue+0x1a/0x90 [nvme_tcp]
Mar 15 13:30:22.996606 fedora37 kernel:  nvme_tcp_setup_ctrl+0x410/0x7e0 [nvme_tcp]
Mar 15 13:30:22.996626 fedora37 kernel:  nvme_tcp_create_ctrl+0x34f/0x460 [nvme_tcp]
Mar 15 13:30:22.996643 fedora37 kernel:  nvmf_dev_write+0x5da/0xec0 [nvme_fabrics]
Mar 15 13:30:22.996660 fedora37 kernel:  ? selinux_file_permission+0x10b/0x150
Mar 15 13:30:22.996675 fedora37 kernel:  vfs_write+0xb9/0x3e0
Mar 15 13:30:22.996690 fedora37 kernel:  ? __fget_light+0x9d/0x100
Mar 15 13:30:22.996706 fedora37 kernel:  ksys_write+0x5b/0xd0
Mar 15 13:30:22.996721 fedora37 kernel:  do_syscall_64+0x5b/0x80
Mar 15 13:30:22.996735 fedora37 kernel:  ? ksys_write+0xb4/0xd0
Mar 15 13:30:22.996752 fedora37 kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Mar 15 13:30:22.996769 fedora37 kernel:  ? do_syscall_64+0x67/0x80
Mar 15 13:30:22.996788 fedora37 kernel:  ? preempt_count_add+0x47/0xa0
Mar 15 13:30:22.996808 fedora37 kernel:  ? up_read+0x37/0x70
Mar 15 13:30:22.996823 fedora37 kernel:  ? do_user_addr_fault+0x1ef/0x710
Mar 15 13:30:22.996841 fedora37 kernel:  ? do_syscall_64+0x67/0x80
Mar 15 13:30:22.996856 fedora37 kernel:  ? exc_page_fault+0x70/0x170
Mar 15 13:30:22.996871 fedora37 kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Mar 15 13:30:22.996888 fedora37 kernel: RIP: 0033:0x7f9abbf1e2bf
Mar 15 13:30:22.996964 fedora37 kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 19 c3 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 6c c3 f8 ff 48
Mar 15 13:30:22.996984 fedora37 kernel: RSP: 002b:00007f9aab2fd500 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Mar 15 13:30:22.997003 fedora37 kernel: RAX: ffffffffffffffda RBX: 00007f9aa0006aa0 RCX: 00007f9abbf1e2bf
Mar 15 13:30:22.997022 fedora37 kernel: RDX: 0000000000000166 RSI: 00007f9aa0006aa0 RDI: 0000000000000010
Mar 15 13:30:22.997044 fedora37 kernel: RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000073
Mar 15 13:30:22.997061 fedora37 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00005595f875a370
Mar 15 13:30:22.997077 fedora37 kernel: R13: 0000000000000166 R14: 00007f9aac4a35f8 R15: 00007f9aac49502b
Mar 15 13:30:22.997097 fedora37 kernel:  </TASK>
Mar 15 13:30:22.997114 fedora37 kernel: Modules linked in: nvmet_tcp nvmet null_blk nvme_tcp nvme_fabrics nvme_core nvme_common uinput snd_seq_dummy snd_hrtimer qrtr rfkill sunrpc binfmt_misc snd_intel8x0 snd_ac97_codec ac97_bus snd_seq intel_rapl_msr intel_rapl_common snd_seq_device rapl joydev snd_pcm snd_timer pcspkr snd i2c_piix4 vboxguest soundcore loop zram vmwgfx crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni e1000 polyval_generic drm_ttm_helper ttm video wmi ghash_clmulni_intel sha512_ssse3 serio_raw ata_generic pata_acpi ip6_tables ip_tables fuse
Mar 15 13:30:22.997178 fedora37 kernel: CR2: 0000000000000034
Mar 15 13:30:22.997199 fedora37 kernel: ---[ end trace 0000000000000000 ]---
Mar 15 13:30:22.997218 fedora37 kernel: RIP: 0010:bio_poll+0xd/0x150
Mar 15 13:30:22.997234 fedora37 kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 <8b> 6f 34 48 8b 47 08 48 85 c0 0f 84 a1 00 00 00 4c 8b a8 60 03 00
Mar 15 13:30:22.997249 fedora37 kernel: RSP: 0018:ffffa561851bfae0 EFLAGS: 00010246
Mar 15 13:30:22.997264 fedora37 kernel: RAX: 0000000000000000 RBX: ffff8ff38ae60000 RCX: 0000000000000000
Mar 15 13:30:22.997279 fedora37 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 15 13:30:22.997331 fedora37 kernel: RBP: ffffa561851bfb10 R08: 0000000000000001 R09: ffff8ff38cc0e860
Mar 15 13:30:22.997384 fedora37 kernel: R10: ffff8ff3887af388 R11: 0000000000000110 R12: 0000000000000001
Mar 15 13:30:22.997402 fedora37 kernel: R13: ffff8ff38fbd9c00 R14: 0000000000000400 R15: ffffa561851bfba8
Mar 15 13:30:22.997417 fedora37 kernel: FS:  00007f9aab2ff6c0(0000) GS:ffff8ff84b400000(0000) knlGS:0000000000000000
Mar 15 13:30:22.997432 fedora37 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 15 13:30:22.997445 fedora37 kernel: CR2: 0000000000000034 CR3: 000000011439e002 CR4: 00000000000706f0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 17:48 nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034 Belanger, Martin
@ 2023-03-15 18:13 ` Keith Busch
  2023-03-15 18:23   ` Belanger, Martin
  2023-03-15 22:24 ` Keith Busch
  1 sibling, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-15 18:13 UTC (permalink / raw)
  To: Belanger, Martin; +Cc: linux-nvme

On Wed, Mar 15, 2023 at 05:48:14PM +0000, Belanger, Martin wrote:
> I'm running tests where I connect/disconnect to/from a few I/O controllers using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the target. The kernel module crashes (trace below) while trying to connect over TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel using the latest nvme-6.4 branch and I'm still seeing the crash.
> 
> I'm not sure how to debug this further. Any suggestions?

Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
like that would work for a connect command since there's no bdev at this point,
and polling needs a bdev.

> Mar 15 13:30:22.954399 fedora37 kernel: nvme nvme1: failed to connect socket: -110
> Mar 15 13:30:22.958393 fedora37 kernel: nvmet: creating nvm controller 2 for subsystem nqn.1988-11.com.dell:PowerSANxxx:01:20210225100113-454f73093ceb4847a7bdfc6e34ae8e28 for NQN nqn.2014-08.org.nvmexpress:uuid:f9ef75fc-1699-418f-ba45-49f9fc766e1b.
> Mar 15 13:30:22.958453 fedora37 kernel: nvme nvme1: creating 12 I/O queues.
> Mar 15 13:30:22.960320 fedora37 kernel: nvme nvme1: mapped 4/4/4 default/read/poll queues.
> Mar 15 13:30:22.960862 fedora37 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000034
> Mar 15 13:30:22.960998 fedora37 kernel: #PF: supervisor read access in kernel mode
> Mar 15 13:30:22.992915 fedora37 kernel: #PF: error_code(0x0000) - not-present page
> Mar 15 13:30:22.994551 fedora37 kernel: PGD 0 P4D 0 
> Mar 15 13:30:22.996135 fedora37 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> Mar 15 13:30:22.996169 fedora37 kernel: CPU: 0 PID: 3953 Comm: pool Not tainted 6.3.0-rc1-stas+ #1
> Mar 15 13:30:22.996192 fedora37 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> Mar 15 13:30:22.996210 fedora37 kernel: RIP: 0010:bio_poll+0xd/0x150
> Mar 15 13:30:22.996227 fedora37 kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 <8b> 6f 34 48 8b 47 08 48 85 c0 0f 84 a1 00 00 00 4c 8b a8 60 03 00
> Mar 15 13:30:22.996245 fedora37 kernel: RSP: 0018:ffffa561851bfae0 EFLAGS: 00010246
> Mar 15 13:30:22.996266 fedora37 kernel: RAX: 0000000000000000 RBX: ffff8ff38ae60000 RCX: 0000000000000000
> Mar 15 13:30:22.996311 fedora37 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> Mar 15 13:30:22.996369 fedora37 kernel: RBP: ffffa561851bfb10 R08: 0000000000000001 R09: ffff8ff38cc0e860
> Mar 15 13:30:22.996410 fedora37 kernel: R10: ffff8ff3887af388 R11: 0000000000000110 R12: 0000000000000001
> Mar 15 13:30:22.996430 fedora37 kernel: R13: ffff8ff38fbd9c00 R14: 0000000000000400 R15: ffffa561851bfba8
> Mar 15 13:30:22.996450 fedora37 kernel: FS:  00007f9aab2ff6c0(0000) GS:ffff8ff84b400000(0000) knlGS:0000000000000000
> Mar 15 13:30:22.996467 fedora37 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 15 13:30:22.996484 fedora37 kernel: CR2: 0000000000000034 CR3: 000000011439e002 CR4: 00000000000706f0
> Mar 15 13:30:22.996501 fedora37 kernel: Call Trace:
> Mar 15 13:30:22.996518 fedora37 kernel:  <TASK>
> Mar 15 13:30:22.996535 fedora37 kernel:  blk_execute_rq+0xc9/0x190
> Mar 15 13:30:22.996552 fedora37 kernel:  __nvme_submit_sync_cmd+0xa5/0x160 [nvme_core]
> Mar 15 13:30:22.996572 fedora37 kernel:  nvmf_connect_io_queue+0x10b/0x200 [nvme_fabrics]
> Mar 15 13:30:22.996589 fedora37 kernel:  nvme_tcp_start_queue+0x1a/0x90 [nvme_tcp]
> Mar 15 13:30:22.996606 fedora37 kernel:  nvme_tcp_setup_ctrl+0x410/0x7e0 [nvme_tcp]
> Mar 15 13:30:22.996626 fedora37 kernel:  nvme_tcp_create_ctrl+0x34f/0x460 [nvme_tcp]
> Mar 15 13:30:22.996643 fedora37 kernel:  nvmf_dev_write+0x5da/0xec0 [nvme_fabrics]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 18:13 ` Keith Busch
@ 2023-03-15 18:23   ` Belanger, Martin
  2023-03-15 19:39     ` Keith Busch
  2023-03-15 22:49     ` Chaitanya Kulkarni
  0 siblings, 2 replies; 30+ messages in thread
From: Belanger, Martin @ 2023-03-15 18:23 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme

> 
> On Wed, Mar 15, 2023 at 05:48:14PM +0000, Belanger, Martin wrote:
> > I'm running tests where I connect/disconnect to/from a few I/O controllers
> using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the
> target. The kernel module crashes (trace below) while trying to connect over
> TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel
> using the latest nvme-6.4 branch and I'm still seeing the crash.
> >
> > I'm not sure how to debug this further. Any suggestions?
> 
> Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
> like that would work for a connect command since there's no bdev at this point,
> and polling needs a bdev.

Thanks for pointing me in the right direction.
I wrote a test program that exercises all the different options available.
The crash went away once I removed "nr-poll-queues=4". 
But this begs the question: should a user-space program be given the ability
to crash the kernel by simply providing the wrong (or weird) arguments?

Thanks,
Martin 

> 
> > Mar 15 13:30:22.954399 fedora37 kernel: nvme nvme1: failed to connect socket: -110
> > Mar 15 13:30:22.958393 fedora37 kernel: nvmet: creating nvm controller 2 for subsystem nqn.1988-11.com.dell:PowerSANxxx:01:20210225100113-454f73093ceb4847a7bdfc6e34ae8e28 for NQN nqn.2014-08.org.nvmexpress:uuid:f9ef75fc-1699-418f-ba45-49f9fc766e1b.
> > Mar 15 13:30:22.958453 fedora37 kernel: nvme nvme1: creating 12 I/O queues.
> > Mar 15 13:30:22.960320 fedora37 kernel: nvme nvme1: mapped 4/4/4 default/read/poll queues.
> > Mar 15 13:30:22.960862 fedora37 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000034
> > Mar 15 13:30:22.960998 fedora37 kernel: #PF: supervisor read access in kernel mode
> > Mar 15 13:30:22.992915 fedora37 kernel: #PF: error_code(0x0000) - not-present page
> > Mar 15 13:30:22.994551 fedora37 kernel: PGD 0 P4D 0
> > Mar 15 13:30:22.996135 fedora37 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> > Mar 15 13:30:22.996169 fedora37 kernel: CPU: 0 PID: 3953 Comm: pool Not tainted 6.3.0-rc1-stas+ #1
> > Mar 15 13:30:22.996192 fedora37 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > Mar 15 13:30:22.996210 fedora37 kernel: RIP: 0010:bio_poll+0xd/0x150
> > Mar 15 13:30:22.996227 fedora37 kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 <8b> 6f 34 48 8b 47 08 48 85 c0 0f 84 a1 00 00 00 4c 8b a8 60 03 00
> > Mar 15 13:30:22.996245 fedora37 kernel: RSP: 0018:ffffa561851bfae0 EFLAGS: 00010246
> > Mar 15 13:30:22.996266 fedora37 kernel: RAX: 0000000000000000 RBX: ffff8ff38ae60000 RCX: 0000000000000000
> > Mar 15 13:30:22.996311 fedora37 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > Mar 15 13:30:22.996369 fedora37 kernel: RBP: ffffa561851bfb10 R08: 0000000000000001 R09: ffff8ff38cc0e860
> > Mar 15 13:30:22.996410 fedora37 kernel: R10: ffff8ff3887af388 R11: 0000000000000110 R12: 0000000000000001
> > Mar 15 13:30:22.996430 fedora37 kernel: R13: ffff8ff38fbd9c00 R14: 0000000000000400 R15: ffffa561851bfba8
> > Mar 15 13:30:22.996450 fedora37 kernel: FS:  00007f9aab2ff6c0(0000) GS:ffff8ff84b400000(0000) knlGS:0000000000000000
> > Mar 15 13:30:22.996467 fedora37 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Mar 15 13:30:22.996484 fedora37 kernel: CR2: 0000000000000034 CR3: 000000011439e002 CR4: 00000000000706f0
> > Mar 15 13:30:22.996501 fedora37 kernel: Call Trace:
> > Mar 15 13:30:22.996518 fedora37 kernel:  <TASK>
> > Mar 15 13:30:22.996535 fedora37 kernel:  blk_execute_rq+0xc9/0x190
> > Mar 15 13:30:22.996552 fedora37 kernel:  __nvme_submit_sync_cmd+0xa5/0x160 [nvme_core]
> > Mar 15 13:30:22.996572 fedora37 kernel:  nvmf_connect_io_queue+0x10b/0x200 [nvme_fabrics]
> > Mar 15 13:30:22.996589 fedora37 kernel:  nvme_tcp_start_queue+0x1a/0x90 [nvme_tcp]
> > Mar 15 13:30:22.996606 fedora37 kernel:  nvme_tcp_setup_ctrl+0x410/0x7e0 [nvme_tcp]
> > Mar 15 13:30:22.996626 fedora37 kernel:  nvme_tcp_create_ctrl+0x34f/0x460 [nvme_tcp]
> > Mar 15 13:30:22.996643 fedora37 kernel:  nvmf_dev_write+0x5da/0xec0 [nvme_fabrics]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 18:23   ` Belanger, Martin
@ 2023-03-15 19:39     ` Keith Busch
  2023-03-16  8:57       ` Sagi Grimberg
  2023-03-15 22:49     ` Chaitanya Kulkarni
  1 sibling, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-15 19:39 UTC (permalink / raw)
  To: Belanger, Martin; +Cc: linux-nvme

On Wed, Mar 15, 2023 at 06:23:32PM +0000, Belanger, Martin wrote:
> > 
> > On Wed, Mar 15, 2023 at 05:48:14PM +0000, Belanger, Martin wrote:
> > > I'm running tests where I connect/disconnect to/from a few I/O controllers
> > using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the
> > target. The kernel module crashes (trace below) while trying to connect over
> > TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel
> > using the latest nvme-6.4 branch and I'm still seeing the crash.
> > >
> > > I'm not sure how to debug this further. Any suggestions?
> > 
> > Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
> > like that would work for a connect command since there's no bdev at this point,
> > and polling needs a bdev.
> 
> Thanks for pointing me in the right direction.
> I wrote a test program that exercises all the different options available.
> The crash went away once I removed "nr-poll-queues=4". 
> But this begs the question: should a user-space program be given the ability
> to crash the kernel by simply providing the wrong (or weird) arguments?

Right, we certainly don't want to let an easy kernel crash like this exist now
that we know it's there. I'm just considering a couple of different ways to fix it.
We could just reject user polling options for nvme fabrics, or we could make
polling work with just a request_queue instead of needing a bdev.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 17:48 nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034 Belanger, Martin
  2023-03-15 18:13 ` Keith Busch
@ 2023-03-15 22:24 ` Keith Busch
  2023-03-16  9:00   ` Sagi Grimberg
  1 sibling, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-15 22:24 UTC (permalink / raw)
  To: Belanger, Martin; +Cc: linux-nvme

Could you try this patch with your tcp polling queues enabled?

---
diff --git a/block/blk-core.c b/block/blk-core.c
index 9e5e0277a4d95..12f7ab369f7ba 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -844,30 +844,12 @@ void submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
-/**
- * bio_poll - poll for BIO completions
- * @bio: bio to poll for
- * @iob: batches of IO
- * @flags: BLK_POLL_* flags that control the behavior
- *
- * Poll for completions on queue associated with the bio. Returns number of
- * completed entries found.
- *
- * Note: the caller must either be the context that submitted @bio, or
- * be in a RCU critical section to prevent freeing of @bio.
- */
-int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
+static int blk_poll(struct request_queue *q, struct io_comp_batch *iob,
+		    struct bio *bio, unsigned int flags)
 {
 	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
-	struct block_device *bdev;
-	struct request_queue *q;
 	int ret = 0;
 
-	bdev = READ_ONCE(bio->bi_bdev);
-	if (!bdev)
-		return 0;
-
-	q = bdev_get_queue(bdev);
 	if (cookie == BLK_QC_T_NONE ||
 	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		return 0;
@@ -902,6 +884,39 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
 	blk_queue_exit(q);
 	return ret;
 }
+
+/**
+ * blk_rq_poll - poll for request completions
+ * @rq: request to poll for
+ */
+int blk_rq_poll(struct request *rq)
+{
+	return blk_poll(rq->q, NULL, rq->bio, 0);
+}
+EXPORT_SYMBOL_GPL(blk_rq_poll);
+
+/**
+ * bio_poll - poll for BIO completions
+ * @bio: bio to poll for
+ * @iob: batches of IO
+ * @flags: BLK_POLL_* flags that control the behavior
+ *
+ * Poll for completions on queue associated with the bio. Returns number of
+ * completed entries found.
+ *
+ * Note: the caller must either be the context that submitted @bio, or
+ * be in a RCU critical section to prevent freeing of @bio.
+ */
+int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
+{
+	struct block_device *bdev;
+
+	bdev = READ_ONCE(bio->bi_bdev);
+	if (!bdev)
+		return 0;
+
+	return blk_poll(bdev_get_queue(bdev), iob, bio, flags);
+}
 EXPORT_SYMBOL_GPL(bio_poll);
 
 /*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d0cb2ef18fe21..2ada72cbfb24e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1368,7 +1368,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
-		bio_poll(rq->bio, NULL, 0);
+		blk_rq_poll(rq);
 		cond_resched();
 	} while (!completion_done(wait));
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d1aee08f8c181..70532cafd3a96 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -872,6 +872,7 @@ blk_status_t errno_to_blk_status(int errno);
 #define BLK_POLL_ONESHOT		(1 << 0)
 /* do not sleep to wait for the expected completion time */
 #define BLK_POLL_NOSLEEP		(1 << 1)
+int blk_rq_poll(struct request *rq);
 int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags);
 int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
 			unsigned int flags);
--


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 18:23   ` Belanger, Martin
  2023-03-15 19:39     ` Keith Busch
@ 2023-03-15 22:49     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 30+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-15 22:49 UTC (permalink / raw)
  To: Belanger, Martin, Keith Busch; +Cc: linux-nvme


>> Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
>> like that would work for a connect command since there's no bdev at this point,
>> and polling needs a bdev.
> Thanks for pointing me in the right direction.
> I wrote a test program that exercises all the different options available.
> The crash went away once I removed "nr-poll-queues=4".
> But this begs the question: should a user-space program be given the ability
> to crash the kernel by simply providing the wrong (or weird) arguments?
>
>

Can you please submit a blktests test case for the nvme category?
That way it will get exercised by everyone on the mailing list.

-ck



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 19:39     ` Keith Busch
@ 2023-03-16  8:57       ` Sagi Grimberg
  0 siblings, 0 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-16  8:57 UTC (permalink / raw)
  To: Keith Busch, Belanger, Martin; +Cc: linux-nvme


>>>> I'm running tests where I connect/disconnect to/from a few I/O controllers
>>> using the nvme_tcp driver. I use nvmet_tcp with a null_blk device to simulate the
>>> target. The kernel module crashes (trace below) while trying to connect over
>>> TCP. This happens on Fedora 37 and Ubuntu 22.04. I also recompiled the kernel
>>> using the latest nvme-6.4 branch and I'm still seeing the crash.
>>>>
>>>> I'm not sure how to debug this further. Any suggestions?
>>>
>>> Never seen anyone try to use poll queues with nvme tcp before. It doesn't look
>>> like that would work for a connect command since there's no bdev at this point,
>>> and polling needs a bdev.
>>
>> Thanks for pointing me in the right direction.
>> I wrote a test program that exercises all the different options available.
>> The crash went away once I removed "nr-poll-queues=4".
>> But this begs the question: should a user-space program be given the ability
>> to crash the kernel by simply providing the wrong (or weird) arguments?
> 
> Right, we certainly don't want to let an easy kernel crash like this exist now
> that we know it's there. I'm just considering a couple of different ways to fix it.
> We could just reject user polling options for nvme fabrics, or we could make
> polling work with just a request_queue instead of needing a bdev.

Polling used to not need a bdev, but the introduction of bio_poll made
it required.

We can't just reject user polling for fabrics; it exists for RDMA as
well as TCP.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-15 22:24 ` Keith Busch
@ 2023-03-16  9:00   ` Sagi Grimberg
  2023-03-16 15:20     ` Keith Busch
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-16  9:00 UTC (permalink / raw)
  To: Keith Busch, Belanger, Martin; +Cc: linux-nvme

> Could you try this patch with your tcp polling queues enabled?
> 
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9e5e0277a4d95..12f7ab369f7ba 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -844,30 +844,12 @@ void submit_bio(struct bio *bio)
>   }
>   EXPORT_SYMBOL(submit_bio);
>   
> -/**
> - * bio_poll - poll for BIO completions
> - * @bio: bio to poll for
> - * @iob: batches of IO
> - * @flags: BLK_POLL_* flags that control the behavior
> - *
> - * Poll for completions on queue associated with the bio. Returns number of
> - * completed entries found.
> - *
> - * Note: the caller must either be the context that submitted @bio, or
> - * be in a RCU critical section to prevent freeing of @bio.
> - */
> -int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
> +static int blk_poll(struct request_queue *q, struct io_comp_batch *iob,
> +		    struct bio *bio, unsigned int flags)
>   {
>   	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);

You need to pass in the cookie as well.
Probably can pass BLK_QC_T_NONE for blk_rq_poll...


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-16  9:00   ` Sagi Grimberg
@ 2023-03-16 15:20     ` Keith Busch
  2023-03-16 16:11       ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-16 15:20 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Belanger, Martin, linux-nvme

On Thu, Mar 16, 2023 at 11:00:57AM +0200, Sagi Grimberg wrote:
> > Could you try this patch with your tcp polling queues enabled?
> > 
> > ---
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 9e5e0277a4d95..12f7ab369f7ba 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -844,30 +844,12 @@ void submit_bio(struct bio *bio)
> >   }
> >   EXPORT_SYMBOL(submit_bio);
> > -/**
> > - * bio_poll - poll for BIO completions
> > - * @bio: bio to poll for
> > - * @iob: batches of IO
> > - * @flags: BLK_POLL_* flags that control the behavior
> > - *
> > - * Poll for completions on queue associated with the bio. Returns number of
> > - * completed entries found.
> > - *
> > - * Note: the caller must either be the context that submitted @bio, or
> > - * be in a RCU critical section to prevent freeing of @bio.
> > - */
> > -int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
> > +static int blk_poll(struct request_queue *q, struct io_comp_batch *iob,
> > +		    struct bio *bio, unsigned int flags)
> >   {
> >   	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
> 
> You need to pass in the cookie as well.
> Probably can pass BLK_QC_T_NONE for blk_rq_poll...

I don't understand. The bio holds the cookie, and the bio is passed through the
rq. If we send NONE, then polling won't happen. Will the command be completed
some other way?

And since we're talking about this, what happens if a command is dispatched to
a polled queue, but has no data transfer? The nvme driver doesn't attach a bio
to the request in that case, so no cookie?
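
For reference, the relevant part of __nvme_submit_sync_cmd() looks roughly
like this (paraphrased excerpt, not a verbatim quote): a payload is only
mapped when the caller passes a buffer, so a data-less command never gets a
bio attached to its request.

	if (buffer && bufflen) {
		ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
		if (ret)
			goto out;
	}
	/* no buffer -> req->bio stays NULL -> nothing to take a bi_cookie from */
	ret = nvme_execute_rq(req, at_head);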


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-16 15:20     ` Keith Busch
@ 2023-03-16 16:11       ` Sagi Grimberg
  2023-03-16 17:19         ` Keith Busch
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-16 16:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: Belanger, Martin, linux-nvme


>>> Could you try this patch with your tcp polling queues enabled?
>>>
>>> ---
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 9e5e0277a4d95..12f7ab369f7ba 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -844,30 +844,12 @@ void submit_bio(struct bio *bio)
>>>    }
>>>    EXPORT_SYMBOL(submit_bio);
>>> -/**
>>> - * bio_poll - poll for BIO completions
>>> - * @bio: bio to poll for
>>> - * @iob: batches of IO
>>> - * @flags: BLK_POLL_* flags that control the behavior
>>> - *
>>> - * Poll for completions on queue associated with the bio. Returns number of
>>> - * completed entries found.
>>> - *
>>> - * Note: the caller must either be the context that submitted @bio, or
>>> - * be in a RCU critical section to prevent freeing of @bio.
>>> - */
>>> -int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
>>> +static int blk_poll(struct request_queue *q, struct io_comp_batch *iob,
>>> +		    struct bio *bio, unsigned int flags)
>>>    {
>>>    	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
>>
>> You need to pass in the cookie as well.
>> Probably can pass BLK_QC_T_NONE for blk_rq_poll...
> 
> I don't understand. The bio holds the cookie, and the bio is passed through the
> rq. If we send NONE, then polling won't happen. Will the command be completed
> some other way?

You're right, I thought that with NONE the poll would still be called at
least once. That seems to have changed at some point...

> And since we're talking about this, what happens if a command is dispatched to
> a polled queue, but has no data transfer? The nvme driver doesn't attach a bio
> to the request in that case, so no cookie?

IIRC, initially NONE was called once, and the loop in blk_execute_rq 
looped until the completion was met.

I would assume that any other cookie other than NONE would work here, 
because the driver .poll() would be invoked at least once (maybe set
ONESHOT as well).


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-16 16:11       ` Sagi Grimberg
@ 2023-03-16 17:19         ` Keith Busch
  2023-03-19 13:10           ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-16 17:19 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Belanger, Martin, linux-nvme

On Thu, Mar 16, 2023 at 06:11:57PM +0200, Sagi Grimberg wrote:
> > And since we're talking about this, what happens if a command is dispatched to
> > a polled queue, but has no data transfer? The nvme driver doesn't attach a bio
> > to the request in that case, so no cookie?
> 
> IIRC, initially NONE was called once, and the loop in blk_execute_rq looped
> until the completion was met.
> 
> I would assume that any other cookie other than NONE would work here,
> because the driver .poll() would be invoked at least once (maybe set
> ONESHOT as well).

As long as the cookie points to the correct hctx, then it would work.
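
For reference, blk-mq already derives such a cookie from the request itself,
with no bio involved; this is the existing blk_rq_to_qc() helper in
block/blk-mq.c, reproduced here for convenience:

static inline blk_qc_t blk_rq_to_qc(struct request *rq)
{
	return (rq->mq_hctx->queue_num << BLK_QC_T_SHIFT) |
		(rq->tag != -1 ?
		 rq->tag : (rq->internal_tag | BLK_QC_T_INTERNAL));
}

The hctx number sits in the upper bits of the cookie, so a cookie built this
way identifies the hardware queue the request was allocated on.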


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-16 17:19         ` Keith Busch
@ 2023-03-19 13:10           ` Sagi Grimberg
  2023-03-21  8:23             ` Daniel Wagner
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-19 13:10 UTC (permalink / raw)
  To: Keith Busch; +Cc: Belanger, Martin, linux-nvme


>>> And since we're talking about this, what happens if a command is dispatched to
>>> a polled queue, but has no data transfer? The nvme driver doesn't attach a bio
>>> to the request in that case, so no cookie?
>>
>> IIRC, initially NONE was called once, and the loop in blk_execute_rq looped
>> until the completion was met.
>>
>> I would assume that any other cookie other than NONE would work here,
>> because the driver .poll() would be invoked at least once (maybe set
>> ONESHOT as well).
> 
> As long as the cookie points to the correct hctx, then it would work.

The only commands that can go to a polling hctx today either have a bio 
or they are connect... I don't think that any other sync commands has an
interface to end up on a polling hctx.

So perhaps we should just verify that with:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d4be525f8100..29b31d8d9d8e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1051,6 +1051,11 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                 ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
                 if (ret)
                         goto out;
+       } else if (req->cmd_flags & REQ_POLLED) {
+               dev_err(nvme_req(req)->ctrl->device,
+                       "cannot issue a polled request with no bio\n");
+               ret = -EINVAL;
+               goto out;
         }

         ret = nvme_execute_rq(req, at_head);
--

Thoughts?


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-19 13:10           ` Sagi Grimberg
@ 2023-03-21  8:23             ` Daniel Wagner
  2023-03-21  8:49               ` Daniel Wagner
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21  8:23 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Sun, Mar 19, 2023 at 03:10:40PM +0200, Sagi Grimberg wrote:
> Thoughts?

It still crashes in the same way with both patches from this
discussion applied.

 nvme nvme1: mapped 8/0/2 default/read/poll queues.
 general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
 CPU: 5 PID: 16617 Comm: nvme Kdump: loaded Tainted: G        W          6.3.0-rc1+ #9 d97c09c311a99b3c39b25760658850e8f66ae67b
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 RIP: 0010:blk_poll+0x31/0x350
 Code: 57 41 56 41 55 41 54 53 48 83 ec 18 41 89 cd 49 89 f6 48 89 fd 48 b9 00 00 00 00 00 fc ff df 48 8d 5a 34 48 89 d8 48 c1 e8 03 <8a> 04 08 84 c0 0f 85 ea 02 00 00 44 8b 23 45 31 ff 4

 RSP: 0018:ffff888114dbf670 EFLAGS: 00010207
 RAX: 0000000000000006 RBX: 0000000000000034 RCX: dffffc0000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881182f56e0
 RBP: ffff8881182f56e0 R08: dffffc0000000000 R09: fffffbfff3803f46
 R10: fffffbfff3803f46 R11: 1ffffffff3803f45 R12: ffff888132860018
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff888114dbf700
 FS:  00007efde0867780(0000) GS:ffff8881f1400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556f0b5af810 CR3: 0000000105c9a005 CR4: 0000000000170ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? __init_swait_queue_head+0xab/0x140
  blk_execute_rq+0x388/0x590
  ? blk_rq_is_poll+0xb0/0xb0
  ? complete+0x2c/0x1e0
  ? blk_rq_map_kern+0x5e0/0x790
  __nvme_submit_sync_cmd+0x31c/0x6a0 [nvme_core 355464cf83c3fcaf7cde9c80e64f0ce3bbc1f5e0]
  nvmf_connect_io_queue+0x30d/0x5e0 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
  ? nvmf_log_connect_error+0x470/0x470 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
  ? blk_set_default_limits+0x195/0x4d0
  ? blk_alloc_queue+0x3a4/0x460
  nvme_tcp_start_queue+0x30/0x360 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
  nvme_tcp_setup_ctrl+0xc03/0x1690 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
  ? nvme_reset_ctrl_work+0xf0/0xf0 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
  ? _raw_spin_unlock_irqrestore+0x32/0x50
  ? nvme_change_ctrl_state+0xec/0x2d0 [nvme_core 355464cf83c3fcaf7cde9c80e64f0ce3bbc1f5e0]
  nvme_tcp_create_ctrl+0x71e/0xa80 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
  nvmf_dev_write+0x498/0x790 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
  vfs_write+0x1fc/0xaa0
  ? n_tty_read+0x1250/0x1250
  ? file_end_write+0x1a0/0x1a0
  ? vfs_write+0x57f/0xaa0
  ? file_end_write+0x1a0/0x1a0
  ? __fdget_pos+0x51/0x250
  ksys_write+0x128/0x210
  ? __ia32_sys_read+0x80/0x80
  ? syscall_enter_from_user_mode+0x2e/0x1c0
  do_syscall_64+0x60/0x90
  ? do_syscall_64+0x6e/0x90
  ? do_user_addr_fault+0x747/0x8e0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x7efddf706af3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  8:23             ` Daniel Wagner
@ 2023-03-21  8:49               ` Daniel Wagner
  2023-03-21  8:56                 ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21  8:49 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 09:23:08AM +0100, Daniel Wagner wrote:
> On Sun, Mar 19, 2023 at 03:10:40PM +0200, Sagi Grimberg wrote:
> > Thoughts?
> 
> It still crashes in the same way with both patches from this
> discussion applied.

annotated the __nvme_submit_sync_cmd:


[  210.019050] nvme nvme0: rq ffff8881122a0000 bio ffff88810d8cdd00
[  210.022653] nvme nvme0: rq ffff8881122a0200 bio 0000000000000000
[  210.023571] CPU: 4 PID: 15752 Comm: nvme Tainted: G        W          6.3.0-rc1+ #9 d97c09c311a99b3c39b25760658850e8f66ae67b
[  210.025120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  210.026224] Call Trace:
[  210.026679]  <TASK>
[  210.027069]  dump_stack_lvl+0x5a/0x80
[  210.027695]  __nvme_submit_sync_cmd+0x518/0x750 [nvme_core f2d2b58d969ec189999606e54c8a53895e280d91]
[  210.029076]  nvmf_reg_read64+0x14f/0x2a0 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[  210.030331]  ? nvmf_reg_read32+0x290/0x290 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[  210.031669]  ? nvmf_connect_admin_queue+0x381/0x5d0 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[  210.033031]  ? nvmf_reg_write32+0x260/0x260 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[  210.034329]  nvme_enable_ctrl+0xcb/0x580 [nvme_core f2d2b58d969ec189999606e54c8a53895e280d91]
[  210.035589]  ? nvme_wait_ready+0x2f0/0x2f0 [nvme_core f2d2b58d969ec189999606e54c8a53895e280d91]
[  210.036882]  ? nvme_tcp_start_queue+0x87/0x360 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
[  210.038156]  nvme_tcp_setup_ctrl+0x210/0x1690 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
[  210.039439]  ? nvme_reset_ctrl_work+0xf0/0xf0 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
[  210.040692]  ? _raw_spin_unlock_irqrestore+0x32/0x50
[  210.041452]  ? nvme_change_ctrl_state+0xec/0x2d0 [nvme_core f2d2b58d969ec189999606e54c8a53895e280d91]
[  210.042805]  nvme_tcp_create_ctrl+0x71e/0xa80 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]
[  210.044080]  nvmf_dev_write+0x498/0x790 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[  210.045325]  vfs_write+0x1fc/0xaa0
[  210.045895]  ? n_tty_read+0x1250/0x1250
[  210.046530]  ? file_end_write+0x1a0/0x1a0
[  210.047164]  ? vfs_write+0x57f/0xaa0
[  210.047758]  ? file_end_write+0x1a0/0x1a0
[  210.048406]  ? do_user_addr_fault+0x747/0x8e0
[  210.049096]  ? __fdget_pos+0x51/0x250
[  210.049703]  ksys_write+0x128/0x210
[  210.050285]  ? __ia32_sys_read+0x80/0x80
[  210.050929]  ? syscall_enter_from_user_mode+0x2e/0x1c0
[  210.051714]  do_syscall_64+0x60/0x90
[  210.052302]  ? do_syscall_64+0x6e/0x90
[  210.052929]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  210.053693] RIP: 0033:0x7f7ef9f06af3


It looks like the register read/writes are the ones without a bio.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  8:49               ` Daniel Wagner
@ 2023-03-21  8:56                 ` Sagi Grimberg
  2023-03-21  9:09                   ` Daniel Wagner
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21  8:56 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>>> Thoughts?
>>
>> It still crashes in the same way with both patches from this
>> discussion applied.
> 
> annotated the __nvme_submit_sync_cmd:
> 
> 
> [  210.019050] nvme nvme0: rq ffff8881122a0000 bio ffff88810d8cdd00
> [  210.022653] nvme nvme0: rq ffff8881122a0200 bio 0000000000000000

Why is the admin queue created as a polling hctx?
It shouldn't be.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  8:56                 ` Sagi Grimberg
@ 2023-03-21  9:09                   ` Daniel Wagner
  2023-03-21  9:15                     ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21  9:09 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 10:56:21AM +0200, Sagi Grimberg wrote:
> 
> > > > Thoughts?
> > > 
> > > It still crashes in the same way with both patches from this
> > > discussion applied.
> > 
> > annotated the __nvme_submit_sync_cmd:
> > 
> > 
> > [  210.019050] nvme nvme0: rq ffff8881122a0000 bio ffff88810d8cdd00
> > [  210.022653] nvme nvme0: rq ffff8881122a0200 bio 0000000000000000
> 
> Why is the admin queue created as a polling hctx?
> It shouldn't be.

Isn't this because we use NVME_QID_ANY as qid for the register access?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  9:09                   ` Daniel Wagner
@ 2023-03-21  9:15                     ` Sagi Grimberg
  2023-03-21  9:25                       ` Daniel Wagner
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21  9:15 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>>>>> Thoughts?
>>>>
>>>> It still crashes in the same way with both patches from this
>>>> discussion applied.
>>>
>>> annotated the __nvme_submit_sync_cmd:
>>>
>>>
>>> [  210.019050] nvme nvme0: rq ffff8881122a0000 bio ffff88810d8cdd00
>>> [  210.022653] nvme nvme0: rq ffff8881122a0200 bio 0000000000000000
>>
>> Why is the admin queue created as a polling hctx?
>> It shouldn't be.
> 
> Isn't this because we use NVME_QID_ANY as qid for the register access?

The admin tagset does not have a polled hctxs map to begin with,
so I'm unclear how any fabrics or admin requests end up polled...


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  9:15                     ` Sagi Grimberg
@ 2023-03-21  9:25                       ` Daniel Wagner
  2023-03-21  9:37                         ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21  9:25 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 11:15:53AM +0200, Sagi Grimberg wrote:
> The admin tagset does not have a polled hctxs map to begin with,
> so I'm unclear how any fabrics or admin requests end up polled...

Hmm, if no map_queues() callback is provided for the admin tag set, isn't the
default mapping function used and this would add the poll hctxs map? Let me add
a map_queues() callback and see what happens :)
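
Something along these lines is what I have in mind (untested sketch, the
callback name is made up): give the admin tag set an explicit map_queues
callback that only fills the default map, so nothing can be steered to a
poll hctx.

static void nvme_tcp_map_admin_queues(struct blk_mq_tag_set *set)
{
	/* only HCTX_TYPE_DEFAULT; no read or poll maps for the admin queues */
	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
}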


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  9:25                       ` Daniel Wagner
@ 2023-03-21  9:37                         ` Sagi Grimberg
  2023-03-21 10:15                           ` Sagi Grimberg
  2023-03-21 10:40                           ` Daniel Wagner
  0 siblings, 2 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21  9:37 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>> The admin tagset does not have a polled hctxs map to begin with,
>> so I'm unclear how any fabrics or admin requests end up polled...
> 
> Hmm, if no map_queues() callback is provided for the admin tag set, isn't the
> default mapping function used and this would add the poll hctxs map? Let me add
> a map_queues() callback and see what happens :)

admin_tagset.nr_maps = 1 (only the default map, no read, no poll)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  9:37                         ` Sagi Grimberg
@ 2023-03-21 10:15                           ` Sagi Grimberg
  2023-03-21 16:26                             ` Keith Busch
  2023-03-21 10:40                           ` Daniel Wagner
  1 sibling, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21 10:15 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>>> The admin tagset does not have a polled hctxs map to begin with,
>>> so I'm unclear how any fabrics or admin requests end up polled...
>>
>> Hmm, if no map_queues() callback is provided for the admin tag set, 
>> isn't the
>> default mapping function used and this would add the poll hctxs map? 
>> Let me add
>> a map_queues() callback and see what happens :)
> 
> admin_tagset.nr_maps = 1 (only the default map, no read, no poll)

Oddly, I don't see admin/fabrics requests being polled...

I just attempted with the below patch and it seems to work.
The only change I made was to also pass the cookie to blk_poll: from
bio_poll it is bi_cookie, and from blk_rq_poll it is computed from the
hctx directly.
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 9e5e0277a4d9..098840fe8bef 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -844,30 +844,11 @@ void submit_bio(struct bio *bio)
  }
  EXPORT_SYMBOL(submit_bio);

-/**
- * bio_poll - poll for BIO completions
- * @bio: bio to poll for
- * @iob: batches of IO
- * @flags: BLK_POLL_* flags that control the behavior
- *
- * Poll for completions on queue associated with the bio. Returns number of
- * completed entries found.
- *
- * Note: the caller must either be the context that submitted @bio, or
- * be in a RCU critical section to prevent freeing of @bio.
- */
-int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
+static int blk_poll(struct request_queue *q, struct io_comp_batch *iob,
+                   blk_qc_t cookie, struct bio *bio, unsigned int flags)
  {
-       blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
-       struct block_device *bdev;
-       struct request_queue *q;
         int ret = 0;

-       bdev = READ_ONCE(bio->bi_bdev);
-       if (!bdev)
-               return 0;
-
-       q = bdev_get_queue(bdev);
         if (cookie == BLK_QC_T_NONE ||
             !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
                 return 0;
@@ -902,6 +883,39 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
         blk_queue_exit(q);
         return ret;
  }
+
+/**
+ * blk_rq_poll - poll for request completions
+ * @rq: request to poll for
+ */
+int blk_rq_poll(struct request *rq)
+{
+       return blk_poll(rq->q, NULL, blk_rq_to_qc(rq), rq->bio, 0);
+}
+EXPORT_SYMBOL_GPL(blk_rq_poll);
+
+/**
+ * bio_poll - poll for BIO completions
+ * @bio: bio to poll for
+ * @iob: batches of IO
+ * @flags: BLK_POLL_* flags that control the behavior
+ *
+ * Poll for completions on queue associated with the bio. Returns number of
+ * completed entries found.
+ *
+ * Note: the caller must either be the context that submitted @bio, or
+ * be in a RCU critical section to prevent freeing of @bio.
+ */
+int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
+{
+       struct block_device *bdev;
+
+       bdev = READ_ONCE(bio->bi_bdev);
+       if (!bdev)
+               return 0;
+
+       return blk_poll(bdev_get_queue(bdev), iob, READ_ONCE(bio->bi_cookie), bio, flags);
+}
  EXPORT_SYMBOL_GPL(bio_poll);

  /*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cf1a39adf9a5..8161a07ca59d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -66,9 +66,6 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
         return bucket;
  }

-#define BLK_QC_T_SHIFT         16
-#define BLK_QC_T_INTERNAL      (1U << 31)
-
  static inline struct blk_mq_hw_ctx *blk_qc_to_hctx(struct request_queue *q,
                 blk_qc_t qc)
  {
@@ -86,13 +83,6 @@ static inline struct request *blk_qc_to_rq(struct blk_mq_hw_ctx *hctx,
         return blk_mq_tag_to_rq(hctx->tags, tag);
  }

-static inline blk_qc_t blk_rq_to_qc(struct request *rq)
-{
-       return (rq->mq_hctx->queue_num << BLK_QC_T_SHIFT) |
-               (rq->tag != -1 ?
-                rq->tag : (rq->internal_tag | BLK_QC_T_INTERNAL));
-}
-
  /*
   * Check if any of the ctx, dispatch list or elevator
   * have pending work in this hardware queue.
@@ -1359,8 +1349,6 @@ bool blk_rq_is_poll(struct request *rq)
                 return false;
         if (rq->mq_hctx->type != HCTX_TYPE_POLL)
                 return false;
-       if (WARN_ON_ONCE(!rq->bio))
-               return false;
         return true;
  }
  EXPORT_SYMBOL_GPL(blk_rq_is_poll);
@@ -1368,7 +1356,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
  static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
  {
         do {
-               bio_poll(rq->bio, NULL, 0);
+               blk_rq_poll(rq);
                 cond_resched();
         } while (!completion_done(wait));
  }
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a7482d2cc82e..7100ba5edd16 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -86,6 +86,15 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *
         return xa_load(&q->hctx_table, q->tag_set->map[type].mq_map[cpu]);
  }

+#define BLK_QC_T_SHIFT         16
+#define BLK_QC_T_INTERNAL      (1U << 31)
+static inline blk_qc_t blk_rq_to_qc(struct request *rq)
+{
+       return (rq->mq_hctx->queue_num << BLK_QC_T_SHIFT) |
+               (rq->tag != -1 ?
+                rq->tag : (rq->internal_tag | BLK_QC_T_INTERNAL));
+}
+
  static inline enum hctx_type blk_mq_get_hctx_type(blk_opf_t opf)
  {
         enum hctx_type type = HCTX_TYPE_DEFAULT;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d1aee08f8c18..70532cafd3a9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -872,6 +872,7 @@ blk_status_t errno_to_blk_status(int errno);
  #define BLK_POLL_ONESHOT               (1 << 0)
  /* do not sleep to wait for the expected completion time */
  #define BLK_POLL_NOSLEEP               (1 << 1)
+int blk_rq_poll(struct request *rq);
  int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags);
  int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
                         unsigned int flags);
--


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21  9:37                         ` Sagi Grimberg
  2023-03-21 10:15                           ` Sagi Grimberg
@ 2023-03-21 10:40                           ` Daniel Wagner
  2023-03-21 10:53                             ` Sagi Grimberg
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21 10:40 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 11:37:05AM +0200, Sagi Grimberg wrote:
> admin_tagset.nr_maps = 1 (only the default map, no read, no poll)

Indeed, that would be too easy.

I've just triggered a crash where we are passing in a non-null bio. Some
more annotation. This time I am printing from blk_rq_is_poll() and
we see that that is also the case where we have a valid bio but
want to use the poll context:


[   53.663613] rq ffff888107190000 mq_hctx ffff888106244000 type 0 bio ffff88810da4ec00
[   53.665190] nvme nvme1: q ffff888119c40000 rq ffff888124da0000 bio ffff88810da4e600
[   53.665230] rq ffff888124da0000 mq_hctx ffff888106241800 type 0 bio ffff88810da4e600
[   53.666293] nvme nvme1: q ffff888119c40000 rq ffff888106c40000 bio ffff88810da4e100
[   53.669844] rq ffff888106c40000 mq_hctx ffff888106247800 type 2 bio ffff88810da4e100
[   53.670682] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
[   53.670689] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[   53.670694] CPU: 6 PID: 6410 Comm: nvme Tainted: G        W          6.3.0-rc1+ #10 5490073fe695e8e1be1b11c57a398a463ed2e52d
[   53.670701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   53.670705] RIP: 0010:blk_poll+0x31/0x350
[   53.677417] Code: 57 41 56 41 55 41 54 53 48 83 ec 18 41 89 cd 49 89 f6 48 89 fd 48 b9 00 00 00 00 00 fc ff df 48 8d 5a 34 48 89 d8 48 c1 e8 03 <8a> 04 08 84 c0 0f 85 ea 02 00 00 44 8b 23 45 31 ff 41 83 fc ff 0f
[   53.677422] RSP: 0018:ffff88810642f710 EFLAGS: 00010207
[   53.677429] RAX: 0000000000000006 RBX: 0000000000000034 RCX: dffffc0000000000
[   53.677433] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888119c40000
[   53.677436] RBP: ffff888119c40000 R08: dffffc0000000000 R09: ffffed103e33e0f2
[   53.677440] R10: ffffed103e33e0f2 R11: 1ffff1103e33e0f1 R12: 1ffff11020d88002
[   53.677443] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810642f7c0
[   53.677447] FS:  00007fd70718a780(0000) GS:ffff8881f1800000(0000) knlGS:0000000000000000
[   53.677451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.677455] CR2: 00007f25a1c176f8 CR3: 00000001048b6003 CR4: 0000000000170ee0
[   53.677462] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.677465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.677469] Call Trace:
[   53.677472]  <TASK>
[   53.677476]  ? blk_rq_poll+0x40/0x60
[   53.691431]  blk_execute_rq+0x418/0x640
[   53.691445]  ? blk_rq_is_poll+0x170/0x170
[   53.691454]  ? complete+0x2c/0x1e0
[   53.691469]  __nvme_submit_sync_cmd+0x3eb/0x750 [nvme_core 3b8f33cff2a9cda33de352373714dd43a47c79c4]
[   53.694428]  nvmf_connect_io_queue+0x30d/0x5e0 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[   53.694449]  ? nvmf_log_connect_error+0x470/0x470 [nvme_fabrics a56b21f9a9f011a785bd0916f38d0deca6de166d]
[   53.694466]  ? blk_set_default_limits+0x195/0x4d0
[   53.694474]  ? blk_alloc_queue+0x3a4/0x460
[   53.694483]  nvme_tcp_start_queue+0x30/0x360 [nvme_tcp 8413e4e242b091568613e66c1cbb42a8845a3aa7]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 10:40                           ` Daniel Wagner
@ 2023-03-21 10:53                             ` Sagi Grimberg
  2023-03-21 11:06                               ` Daniel Wagner
  0 siblings, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21 10:53 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>> admin_tagset.nr_maps = 1 (only the default map, no read, no poll)
> 
> Indeed, that would be to easy.
> 
> I've just triggered a crash where we are passing in a non-null bio. Some
> more annotation. This time I am printing from blk_rq_is_poll() and
> we see that that is also the case where we have a valid bio but
> want to use the poll context:

That is not a crash, but a WARN stack dump. It is still unclear to me
how exactly you get to poll for a bio-less request.

See my other reply: while I removed that warning and allowed
bio-less request polling, I was not able to observe any bio-less
requests actually being polled.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 10:53                             ` Sagi Grimberg
@ 2023-03-21 11:06                               ` Daniel Wagner
  2023-03-21 11:10                                 ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21 11:06 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 12:53:29PM +0200, Sagi Grimberg wrote:
> 
> > > admin_tagset.nr_maps = 1 (only the default map, no read, no poll)
> > 
> > Indeed, that would be too easy.
> > 
> > I've just triggered a crash where we are passing in a non-null bio. Some
> > more annotation. This time I am printing from blk_rq_is_poll() and
> > we see that that is also the case where we have a valid bio but
> > want to use the poll context:
> 
> That is not a crash, but a WARN stack dump.

Not sure how you get to this conclusion.

> It is still unclear to me
> how exactly you get to poll for a bio-less request.

I don't do anything special here. The only thing which is special is that I am
testing against the Linux soft target with Hannes' TP8013 patches.

> See my other reply: while I removed the below warning and allowed
> bio-less request polling, I was not able to observe any bio-less
> requests actually being polled.

Unfortunately, somehow all your inline patches reach my inbox whitespace
damaged. It takes a few minutes to fix them up manually. Anyway, with your
patch the crash is gone.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 11:06                               ` Daniel Wagner
@ 2023-03-21 11:10                                 ` Sagi Grimberg
  2023-03-21 11:14                                   ` Sagi Grimberg
  2023-03-21 12:58                                   ` Daniel Wagner
  0 siblings, 2 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21 11:10 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>>>> admin_tagset.nr_maps = 1 (only the default map, no read, no poll)
>>>
>>> Indeed, that would be too easy.
>>>
>>> I've just triggered a crash where we are passing in a non-null bio. Some
>>> more annotation. This time I am printing from blk_rq_is_poll() and
>>> we see that that is also the case where we have a valid bio but
>>> want to use the poll context:
>>
>> That is not a crash, but a WARN stack dump.
> 
> Not sure how you get to this conclusion.
> 
>> It is still unclear to me
>> how exactly you get to poll for a bio-less request.
> 
> I don't do anything special here. The only thing which is special is that I am
> testing against Linux soft target with Hannes TP8013 patches.
> 
>> See my other reply: while I removed the below warning and allowed
>> bio-less request polling, I was not able to observe any bio-less
>> requests actually being polled.
> 
> Unfortunately, somehow all your inline patches reach my inbox whitespace
> damaged. It takes a few minutes to fix them up manually. Anyway, with your
> patch the crash is gone.

Can you verify that you never see a bio-less request being polled?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 11:10                                 ` Sagi Grimberg
@ 2023-03-21 11:14                                   ` Sagi Grimberg
  2023-03-21 12:41                                     ` Daniel Wagner
  2023-03-21 12:58                                   ` Daniel Wagner
  1 sibling, 1 reply; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21 11:14 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>>> That is not a crash, but a WARN stack dump.
>>
>> Not sure how you get to this conclusion.

bool blk_rq_is_poll(struct request *rq)
{
         if (!rq->mq_hctx)
                 return false;
         if (rq->mq_hctx->type != HCTX_TYPE_POLL)
                 return false;
         if (WARN_ON_ONCE(!rq->bio)) // this is the stack dump
                 return false;
         return true;
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 11:14                                   ` Sagi Grimberg
@ 2023-03-21 12:41                                     ` Daniel Wagner
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21 12:41 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 01:14:11PM +0200, Sagi Grimberg wrote:
> 
> > > > That is not a crash, but a WARN stack dump.
> > > 
> > > Not sure how you get to this conclusion.
> 
> bool blk_rq_is_poll(struct request *rq)
> {
>         if (!rq->mq_hctx)
>                 return false;
>         if (rq->mq_hctx->type != HCTX_TYPE_POLL)
>                 return false;
>         if (WARN_ON_ONCE(!rq->bio)) // this is the stack dump
>                 return false;
>         return true;
> }

This should trigger if rq->bio is NULL, no? Though the rq->bio
pointer is not NULL in my trace:

[   53.669844] rq ffff888106c40000 mq_hctx ffff888106247800 type 2 bio ffff88810da4e100

And I thought there would be a WARN or something similar, but that
might have changed and I didn't notice.

Sorry for being so daft.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 11:10                                 ` Sagi Grimberg
  2023-03-21 11:14                                   ` Sagi Grimberg
@ 2023-03-21 12:58                                   ` Daniel Wagner
  2023-03-21 13:08                                     ` Sagi Grimberg
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Wagner @ 2023-03-21 12:58 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 01:10:51PM +0200, Sagi Grimberg wrote:
> Can you verify that you never see a bio-less request being polled?

As far as I can tell this is not happening anymore. I could trigger it
just by doing a

   nvme connect-all -P 2

I've added my annotation again and don't see the combination of

  hctx type = 2 && bio == 0

anymore:

[  503.353236] nvme nvme1: creating 12 I/O queues.
[  503.366038] nvme nvme1: mapped 2/8/2 default/read/poll queues.
[  503.370816] nvme nvme1: q ffff888106944580 rq ffff888140090000 bio ffff88810d1c5e00
[  503.372115] rq ffff888140090000 mq_hctx ffff888104e87000 type 0 bio ffff88810d1c5e00
[  503.372977] nvme nvme1: q ffff888106944580 rq ffff888121820000 bio ffff88810d1c4800
[  503.375439] rq ffff888121820000 mq_hctx ffff888104e80800 type 0 bio ffff88810d1c4800
[  503.376712] nvme nvme1: q ffff888106944580 rq ffff888128790000 bio ffff8881ec613100
[  503.379097] rq ffff888128790000 mq_hctx ffff888104e82000 type 1 bio ffff8881ec613100
[  503.380127] nvme nvme1: q ffff888106944580 rq ffff888129870000 bio ffff8881243c6c00
[  503.382560] rq ffff888129870000 mq_hctx ffff888104e84000 type 1 bio ffff8881243c6c00
[  503.383833] nvme nvme1: q ffff888106944580 rq ffff888105910000 bio ffff8881243c7d00
[  503.383889] rq ffff888105910000 mq_hctx ffff888104e82800 type 1 bio ffff8881243c7d00
[  503.386522] nvme nvme1: q ffff888106944580 rq ffff888104350000 bio ffff8881243c6600
[  503.389771] rq ffff888104350000 mq_hctx ffff888104e83800 type 1 bio ffff8881243c6600
[  503.390783] nvme nvme1: q ffff888106944580 rq ffff888107610000 bio ffff8881243c7c00
[  503.393617] rq ffff888107610000 mq_hctx ffff888104e86000 type 1 bio ffff8881243c7c00
[  503.394612] nvme nvme1: q ffff888106944580 rq ffff888104280000 bio ffff8881243c6200
[  503.397470] rq ffff888104280000 mq_hctx ffff888104e81800 type 1 bio ffff8881243c6200
[  503.398530] nvme nvme1: q ffff888106944580 rq ffff888113e30000 bio ffff8881243c7000
[  503.401462] rq ffff888113e30000 mq_hctx ffff888104e84800 type 1 bio ffff8881243c7000
[  503.402353] nvme nvme1: q ffff888106944580 rq ffff888128940000 bio ffff8881243c7800
[  503.405161] rq ffff888128940000 mq_hctx ffff888104e86800 type 1 bio ffff8881243c7800
[  503.406433] nvme nvme1: q ffff888106944580 rq ffff8881070c0000 bio ffff8881243c7200
[  503.409353] rq ffff8881070c0000 mq_hctx ffff888104e85800 type 2 bio ffff8881243c7200
[  503.410438] nvme nvme1: q ffff888106944580 rq ffff888112210000 bio ffff8881243c7700
[  503.413271] rq ffff888112210000 mq_hctx ffff888104e87800 type 2 bio ffff8881243c7700
[  503.414505] nvme nvme1: q ffff888106e84e30 rq ffff888105fc1400 bio 0000000000000000
[  503.417447] rq ffff888105fc1400 mq_hctx ffff8881417c4800 type 0 bio 0000000000000000
[  503.418401] nvme nvme1: q ffff888106e84e30 rq ffff888105fc1000 bio ffff888111242000
[  503.418410] nvme nvme1: new ctrl: NQN "nqn.io-1", addr 10.161.60.136:4420
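
(For reference, an annotation producing the "rq ... mq_hctx ... type ...
bio ..." lines above can be as simple as the hypothetical hunk below --
a reconstruction from the log format, not the actual debug patch --
dropped into e.g. blk_rq_is_poll():)

        printk("rq %px mq_hctx %px type %d bio %px\n",
               rq, rq->mq_hctx,
               rq->mq_hctx ? rq->mq_hctx->type : -1,
               rq->bio);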


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 12:58                                   ` Daniel Wagner
@ 2023-03-21 13:08                                     ` Sagi Grimberg
  0 siblings, 0 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-21 13:08 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Keith Busch, Belanger, Martin, linux-nvme


>> Can you verify that you never see a bio-less request being polled?
> 
> As far as I can tell this is not happening anymore. I could trigger it
> just by doing a
> 
>     nvme connect-all -P 2
> 
> I've added my annotation again and don't see the combination of
> 
>    hctx type = 2 && bio == 0
> 
> anymore:

I think I understand what was happening. blk_execute_rq calls
blk_mq_sched_insert_request and then starts polling. But because this is
a connect, which runs from a single context, it is not running on the
same CPU the I/O queue was mapped to, hence the insert is async.

The bio->bi_cookie assignment happens only when the request is started
(i.e. when blk_mq_start_request is called), but given that the insert is
async, that assignment can and does occur after blk_execute_rq has
already started polling and dereferencing bio->bi_cookie.

So I think we need to untangle both the need for a bdev to poll and the
need for a bio with a stable bi_cookie, and the latter patch does both.
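
Condensed, the code paths involved look roughly like this (paraphrased
from the 6.3-rc1 block layer rather than quoted verbatim; error handling
is elided and the guard in blk_mq_start_request is approximate):

/* block/blk-mq.c -- condensed sketch, not verbatim */

blk_status_t blk_execute_rq(struct request *rq, bool at_head)
{
        DECLARE_COMPLETION_ONSTACK(wait);

        /* ... set up the synchronous end_io ... */
        blk_mq_sched_insert_request(rq, at_head, true, false); /* async */
        if (blk_rq_is_poll(rq))
                blk_rq_poll_completion(rq, &wait); /* polling starts now */
        /* ... else wait_for_completion_io(), then return the status ... */
}

static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
{
        do {
                bio_poll(rq->bio, NULL, 0); /* reads rq->bio->bi_cookie */
                cond_resched();
        } while (!completion_done(wait));
}

/* Meanwhile, only once the request is dispatched on the CPU the polled
 * hctx is mapped to: */
void blk_mq_start_request(struct request *rq)
{
        /* ... */
        if (rq->bio) /* approximate guard */
                WRITE_ONCE(rq->bio->bi_cookie, blk_rq_to_qc(rq));
        /* ... */
}

So the poll loop can observe either a NULL rq->bio (the oops that started
this thread) or a bio whose bi_cookie has not been written yet.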


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 10:15                           ` Sagi Grimberg
@ 2023-03-21 16:26                             ` Keith Busch
  2023-03-22  7:12                               ` Sagi Grimberg
  0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2023-03-21 16:26 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Daniel Wagner, Belanger, Martin, linux-nvme

On Tue, Mar 21, 2023 at 12:15:47PM +0200, Sagi Grimberg wrote:
> 
> > > > The admin tagset does not have a polled hctxs map to begin with,
> > > > so I'm unclear how any fabrics or admin requests end up polled...
> > > 
> > > Hmm, if no map_queues() callback is provided for the admin tag set,
> > > isn't the default mapping function used and this would add the poll
> > > hctxs map? Let me add a map_queues() callback and see what happens :)
> > 
> > admin_tagset.nr_maps = 1 (only the default map, no read, no poll)
> 
> Oddly, I don't see admin/fabrics requests being polled...
> 
> I just attempted this with the below patch and it seems to work.
> The only change I made was to also pass the cookie to blk_poll:
> from bio_poll that is the bi_cookie, and from blk_rq_poll it is
> computed from the hctx directly.

I think we can really simplify this path a lot knowing that we are dealing
with a live request from a polling queue. Then we can poll for requests without
bios too.

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d60ef2f0fa50b..b64c7d491306b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1323,8 +1323,6 @@ bool blk_rq_is_poll(struct request *rq)
 		return false;
 	if (rq->mq_hctx->type != HCTX_TYPE_POLL)
 		return false;
-	if (WARN_ON_ONCE(!rq->bio))
-		return false;
 	return true;
 }
 EXPORT_SYMBOL_GPL(blk_rq_is_poll);
@@ -1332,7 +1330,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
-		bio_poll(rq->bio, NULL, 0);
+		blk_mq_poll(rq->q, blk_rq_to_qc(rq), NULL, 0);
 		cond_resched();
 	} while (!completion_done(wait));
 }
--


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034
  2023-03-21 16:26                             ` Keith Busch
@ 2023-03-22  7:12                               ` Sagi Grimberg
  0 siblings, 0 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-22  7:12 UTC (permalink / raw)
  To: Keith Busch; +Cc: Daniel Wagner, Belanger, Martin, linux-nvme


>> I just attempted this with the below patch and it seems to work.
>> The only change I made was to also pass the cookie to blk_poll:
>> from bio_poll that is the bi_cookie, and from blk_rq_poll it is
>> computed from the hctx directly.
> 
> I think we can really simplify this path a lot knowing that we are dealing
> with a live request from a polling queue. Then we can poll for requests without
> bios too.
> 
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d60ef2f0fa50b..b64c7d491306b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1323,8 +1323,6 @@ bool blk_rq_is_poll(struct request *rq)
>   		return false;
>   	if (rq->mq_hctx->type != HCTX_TYPE_POLL)
>   		return false;
> -	if (WARN_ON_ONCE(!rq->bio))
> -		return false;
>   	return true;
>   }
>   EXPORT_SYMBOL_GPL(blk_rq_is_poll);
> @@ -1332,7 +1330,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
>   static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
>   {
>   	do {
> -		bio_poll(rq->bio, NULL, 0);
> +		blk_mq_poll(rq->q, blk_rq_to_qc(rq), NULL, 0);
>   		cond_resched();
>   	} while (!completion_done(wait));
>   }
> --

Much simpler! I like it.


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-03-22  7:13 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-15 17:48 nvme-tcp: kernel NULL pointer dereference, address: 0000000000000034 Belanger, Martin
2023-03-15 18:13 ` Keith Busch
2023-03-15 18:23   ` Belanger, Martin
2023-03-15 19:39     ` Keith Busch
2023-03-16  8:57       ` Sagi Grimberg
2023-03-15 22:49     ` Chaitanya Kulkarni
2023-03-15 22:24 ` Keith Busch
2023-03-16  9:00   ` Sagi Grimberg
2023-03-16 15:20     ` Keith Busch
2023-03-16 16:11       ` Sagi Grimberg
2023-03-16 17:19         ` Keith Busch
2023-03-19 13:10           ` Sagi Grimberg
2023-03-21  8:23             ` Daniel Wagner
2023-03-21  8:49               ` Daniel Wagner
2023-03-21  8:56                 ` Sagi Grimberg
2023-03-21  9:09                   ` Daniel Wagner
2023-03-21  9:15                     ` Sagi Grimberg
2023-03-21  9:25                       ` Daniel Wagner
2023-03-21  9:37                         ` Sagi Grimberg
2023-03-21 10:15                           ` Sagi Grimberg
2023-03-21 16:26                             ` Keith Busch
2023-03-22  7:12                               ` Sagi Grimberg
2023-03-21 10:40                           ` Daniel Wagner
2023-03-21 10:53                             ` Sagi Grimberg
2023-03-21 11:06                               ` Daniel Wagner
2023-03-21 11:10                                 ` Sagi Grimberg
2023-03-21 11:14                                   ` Sagi Grimberg
2023-03-21 12:41                                     ` Daniel Wagner
2023-03-21 12:58                                   ` Daniel Wagner
2023-03-21 13:08                                     ` Sagi Grimberg
