Linux-NVME Archive on lore.kernel.org
* Sighting: Kernel fault with large write (512k) and io_uring
@ 2020-03-18 23:37 Wunderlich, Mark
  2020-03-19  1:12 ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-18 23:37 UTC (permalink / raw)
  To: linux-nvme; +Cc: Jens Axboe, Sagi Grimberg

Hope all are well out there - wash hands (mine are getting rather chapped) and keep a good social distance.

OK, on to my sighting.  I recently experienced a large write failure and so I retested after removing all development patches.
I rebuilt the initiator kernel without any development patches, except the priority patch (set to 0, making it a no-op) and the three stability patches listed below, which I had to apply to correct a previous fault I had seen when using baseline branch nvme-5.5-rc.  With the FIO verify options turned on I see a different failure report (shown at bottom).  So I am wondering if anyone out there knows of some other lingering large write issue that may have been fixed?

Three stability patches added to tip of branch, suggested previously to fix an issue (list add corruption) I reported on linux-block list:
- io-wq-re-add-io_wq_current_is_worker
- io_uring-ensure-workqueue-offload-grabs-ring-mutex
- io_uring-clear-req-result-always-before-issuing

The initiator host kernel fault details are the following:

[ 1907.415517] nvme nvme0: queue 5: timeout request 0x11 type 6
[ 1907.415519] nvme nvme0: starting error recovery
[ 1908.432805] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[ 1908.433229] #PF: supervisor read access in kernel mode
[ 1908.433536] #PF: error_code(0x0000) - not-present page
[ 1908.433844] PGD 8000001f8e7c2067 P4D 8000001f8e7c2067 PUD 202eab9067 PMD 0
[ 1908.434292] Oops: 0000 [#1] SMP PTI
[ 1908.434498] CPU: 3 PID: 5626 Comm: fio Tainted: G           O      5.5.0-rc2stable+ #56
[ 1908.434967] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
[ 1908.435408] RIP: 0010:nvme_tcp_try_recv+0x59/0x90 [nvme_tcp]
[ 1908.435739] Code: 24 20 31 c0 49 8b 5c 24 18 48 89 df e8 d0 c6 58 d9 c7 45 60 00 00 00 00 49 8b 44 24 20 48 c7 c2 40 c0 26 c0 48 89 e6 48 89 df <48> 8b 80 c8 00 00 00 e8 3b 67 99 d9 48 89 df 89 c5 e8 c1 e6 58 d9
[ 1908.436838] RSP: 0018:ffffb5f309b7bcb0 EFLAGS: 00010286
[ 1908.437144] RAX: 0000000000000000 RBX: ffff89164c076900 RCX: 0000000000000000
[ 1908.437561] RDX: ffffffffc026c040 RSI: ffffb5f309b7bcb0 RDI: ffff89164c076900
[ 1908.437978] RBP: ffff8915f63e0460 R08: 0000000000000000 R09: 0000000000000001
[ 1908.438396] R10: 0000000000000024 R11: 071c71c71c71c71c R12: ffff8916624b2d80
[ 1908.438813] R13: ffff89167045c000 R14: ffff8915f73e5230 R15: ffff8905f53c9800
[ 1908.439231] FS:  00007f1d60ebc700(0000) GS:ffff89167f040000(0000) knlGS:0000000000000000
[ 1908.439705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1908.440041] CR2: 00000000000000c8 CR3: 000000201e2a2003 CR4: 00000000007606e0
[ 1908.440458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1908.440875] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1908.441292] PKRU: 55555554
[ 1908.441450] Call Trace:
[ 1908.441597]  nvme_tcp_poll+0x49/0x70 [nvme_tcp]
[ 1908.441866]  blk_poll+0x25a/0x360
[ 1908.442067]  io_iopoll_getevents+0xe8/0x360
[ 1908.442315]  ? __switch_to_asm+0x40/0x70
[ 1908.442546]  __io_iopoll_check+0x4b/0xa0
[ 1908.442777]  __x64_sys_io_uring_enter+0x19c/0x600
[ 1908.443055]  ? schedule+0x4a/0xb0
[ 1908.443254]  do_syscall_64+0x5b/0x1b0
[ 1908.443469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1908.443765] RIP: 0033:0x7f1dd17ecec9
[ 1908.443975] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 cf 2c 00 f7 d8 64 89 01 48
[ 1908.445073] RSP: 002b:00007f1d60ebbac8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[ 1908.445516] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1dd17ecec9
[ 1908.445933] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000005
[ 1908.446350] RBP: 0000000000000020 R08: 0000000000000000 R09: 00007f1d00000000

If I turn on FIO verification options ( --do_verify=1 --verify=crc32c ) I see the following fault for the same 512k write I/O pattern:

[ 4850.021884] BUG: stack guard page was hit at 00000000291034b3 (stack is 0000000040c9cc3e..00000000e65d9875)
[ 4850.022471] kernel stack overflow (page fault): 0000 [#1] SMP PTI
[ 4850.022829] CPU: 3 PID: 3744 Comm: fio Tainted: G           O      5.5.0-rc2stable+ #56
[ 4850.023298] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
[ 4850.023742] RIP: 0010:memcpy_erms+0x6/0x10
[ 4850.023982] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 4850.025087] RSP: 0018:ffffb8ec09557b68 EFLAGS: 00010206
[ 4850.025392] RAX: ffff943ef5c2d840 RBX: ffff943ee9216500 RCX: 00000000000003e0
[ 4850.025808] RDX: 0000000000000800 RSI: ffffb8ec09558000 RDI: ffff943ef5c2dc60
[ 4850.026225] RBP: 0000000000080000 R08: ffffb8ec09557bd8 R09: 0000000000000080
[ 4850.026640] R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
[ 4850.027056] R13: ffffb8ec09557be0 R14: ffffb8ec09557bb8 R15: 0000000000080000
[ 4850.027472] FS:  00007f5db8053700(0000) GS:ffff943eff040000(0000) knlGS:0000000000000000
[ 4850.027944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4850.028279] CR2: ffffb8ec09558000 CR3: 000000203a052006 CR4: 00000000007606e0
[ 4850.028695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4850.029111] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4850.029527] PKRU: 55555554
[ 4850.029684] Call Trace:
[ 4850.029834]  io_setup_async_io+0x51/0xc0
[ 4850.030065]  io_write+0xe4/0x220
[ 4850.030256]  ? get_page_from_freelist+0x43f/0x1220
[ 4850.030538]  io_issue_sqe+0x419/0xac0
[ 4850.030752]  io_queue_sqe+0x13b/0x620
[ 4850.030967]  ? kmem_cache_alloc_bulk+0x32/0x230
[ 4850.031231]  io_submit_sqes+0x783/0x990
[ 4850.031456]  __x64_sys_io_uring_enter+0x231/0x600
[ 4850.031735]  ? syscall_trace_enter+0x1f8/0x2e0
[ 4850.031995]  do_syscall_64+0x5b/0x1b0
[ 4850.032210]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4850.032506] RIP: 0033:0x7f5e28983ec9
[ 4850.032714] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 cf 2c 00 f7 d8 64 89 01 48
[ 4850.033808] RSP: 002b:00007f5db8052ac8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[ 4850.034249] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5e28983ec9
[ 4850.034665] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
[ 4850.035080] RBP: 00007f5de40008c0 R08: 0000000000000000 R09: 00007f5d00000000
[ 4850.035495] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5deb0ba000
[ 4850.035911] R13: 00007f5db8052d60 R14: 000000000000c8e0 R15: 00007f5deb0e8a90
[ 4850.036328] Modules linked in: nvme_tcp nvme_fabrics nvme nvme_core ice(O) intel_rapl_msr intel_rapl_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rfkill irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore intel_rapl_perf dcdbas ipmi_ssif pcspkr mei_me joydev i2c_i801 lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm_kms_helper drm_vram_helper drm_ttm_helper crc32c_intel ttm drm tg3 bnxt_en megaraid_sas ptp i2c_algo_bit pps_core [last unloaded: ice]
[ 4850.066910] ---[ end trace a22216d511ea2653 ]---
[ 4850.083136] RIP: 0010:memcpy_erms+0x6/0x10
[ 4850.097681] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 4850.126922] RSP: 0018:ffffb8ec09557b68 EFLAGS: 00010206
[ 4850.141247] RAX: ffff943ef5c2d840 RBX: ffff943ee9216500 RCX: 00000000000003e0
[ 4850.155619] RDX: 0000000000000800 RSI: ffffb8ec09558000 RDI: ffff943ef5c2dc60
[ 4850.169919] RBP: 0000000000080000 R08: ffffb8ec09557bd8 R09: 0000000000000080
[ 4850.183902] R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
[ 4850.197649] R13: ffffb8ec09557be0 R14: ffffb8ec09557bb8 R15: 0000000000080000
[ 4850.211195] FS:  00007f5db8053700(0000) GS:ffff943eff040000(0000) knlGS:0000000000000000
[ 4850.224881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4850.238645] CR2: ffffb8ec09558000 CR3: 000000203a052006 CR4: 00000000007606e0
[ 4850.252631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4850.266603] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4850.280477] PKRU: 55555554
[ 4850.294045] ------------[ cut here ]------------
[ 4850.307618] WARNING: CPU: 3 PID: 3744 at kernel/exit.c:723 do_exit+0x50/0xc00
[ 4850.321397] Modules linked in: nvme_tcp nvme_fabrics nvme nvme_core ice(O) intel_rapl_msr intel_rapl_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rfkill irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore intel_rapl_perf dcdbas ipmi_ssif pcspkr mei_me joydev i2c_i801 lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm_kms_helper drm_vram_helper drm_ttm_helper crc32c_intel ttm drm tg3 bnxt_en megaraid_sas ptp i2c_algo_bit pps_core [last unloaded: ice]
[ 4850.397550] CPU: 3 PID: 3744 Comm: fio Tainted: G      D    O      5.5.0-rc2stable+ #56
[ 4850.413832] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
[ 4850.430246] RIP: 0010:do_exit+0x50/0xc00
[ 4850.446603] Code: 25 28 00 00 00 48 89 44 24 28 31 c0 e8 79 04 08 00 49 8b 84 24 b0 0b 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 f9 08 00 00 <0f> 0b 65 8b 05 1f dc f3 78 a9 00 ff 1f 00 0f 85 3c 0a 00 00 45 8b
[ 4850.481036] RSP: 0018:ffffb8ec09557ef0 EFLAGS: 00010012
[ 4850.498330] RAX: ffffb8ec09557dd0 RBX: 000000000000000b RCX: 00000000ffffffff
[ 4850.515836] RDX: ffff943e6c381fc8 RSI: 0000000000000000 RDI: ffffffff884b03c0
[ 4850.533342] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff87ec3be0
[ 4850.550859] R10: 000000000000000f R11: 0000000007070707 R12: ffff943eeaad4000
[ 4850.568303] R13: ffff943eeaad4000 R14: 000000000000000b R15: 0000000000000001
[ 4850.585730] FS:  00007f5db8053700(0000) GS:ffff943eff040000(0000) knlGS:0000000000000000
[ 4850.603111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4850.620223] CR2: ffffb8ec09558000 CR3: 000000203a052006 CR4: 00000000007606e0
[ 4850.637392] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4850.654398] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4850.671306] PKRU: 55555554
[ 4850.688099] Call Trace:
[ 4850.704867]  rewind_stack_do_exit+0x17/0x20
[ 4850.721813] ---[ end trace a22216d511ea2654 ]---


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-18 23:37 Sighting: Kernel fault with large write (512k) and io_uring Wunderlich, Mark
@ 2020-03-19  1:12 ` Sagi Grimberg
  2020-03-19  2:33   ` Jens Axboe
  2020-03-23 21:07   ` Wunderlich, Mark
  0 siblings, 2 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-19  1:12 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe

Hey Mark, thanks for reporting

> Hope all are well out there - wash hands (mine are getting rather chapped) and keep a good social distance.

:)

> OK, on to my sighting.  I recently experienced a large write failure and so I retested after removing all development patches.
> I rebuilt the initiator kernel without any development patches, except the priority patch (set to 0, making it a no-op) and the three stability patches listed below, which I had to apply to correct a previous fault I had seen when using baseline branch nvme-5.5-rc.  With the FIO verify options turned on I see a different failure report (shown at bottom).  So I am wondering if anyone out there knows of some other lingering large write issue that may have been fixed?
> 
> Three stability patches added to tip of branch, suggested previously to fix an issue (list add corruption) I reported on linux-block list:
> - io-wq-re-add-io_wq_current_is_worker
> - io_uring-ensure-workqueue-offload-grabs-ring-mutex
> - io_uring-clear-req-result-always-before-issuing
> 
> The initiator host kernel fault details are the following:
> 
> [ 1907.415517] nvme nvme0: queue 5: timeout request 0x11 type 6
> [ 1907.415519] nvme nvme0: starting error recovery
> [ 1908.432805] BUG: kernel NULL pointer dereference, address: 00000000000000c8
> [ 1908.433229] #PF: supervisor read access in kernel mode
> [ 1908.433536] #PF: error_code(0x0000) - not-present page
> [ 1908.433844] PGD 8000001f8e7c2067 P4D 8000001f8e7c2067 PUD 202eab9067 PMD 0
> [ 1908.434292] Oops: 0000 [#1] SMP PTI
> [ 1908.434498] CPU: 3 PID: 5626 Comm: fio Tainted: G           O      5.5.0-rc2stable+ #56
> [ 1908.434967] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
> [ 1908.435408] RIP: 0010:nvme_tcp_try_recv+0x59/0x90 [nvme_tcp]

Can you run:
gdb drivers/nvme/host/nvme-tcp.ko
...
$ l *(nvme_tcp_try_recv+0x59)

But this looks like a use-after-free condition.

This is interesting: when we go ahead and do error recovery, I am
wondering what is supposed to stop/quiesce blk_poll before safely
tearing down and freeing stuff.

I suspect that this will be common to others that implement blk_poll?

> [ 1908.435739] Code: 24 20 31 c0 49 8b 5c 24 18 48 89 df e8 d0 c6 58 d9 c7 45 60 00 00 00 00 49 8b 44 24 20 48 c7 c2 40 c0 26 c0 48 89 e6 48 89 df <48> 8b 80 c8 00 00 00 e8 3b 67 99 d9 48 89 df 89 c5 e8 c1 e6 58 d9
> [ 1908.436838] RSP: 0018:ffffb5f309b7bcb0 EFLAGS: 00010286
> [ 1908.437144] RAX: 0000000000000000 RBX: ffff89164c076900 RCX: 0000000000000000
> [ 1908.437561] RDX: ffffffffc026c040 RSI: ffffb5f309b7bcb0 RDI: ffff89164c076900
> [ 1908.437978] RBP: ffff8915f63e0460 R08: 0000000000000000 R09: 0000000000000001
> [ 1908.438396] R10: 0000000000000024 R11: 071c71c71c71c71c R12: ffff8916624b2d80
> [ 1908.438813] R13: ffff89167045c000 R14: ffff8915f73e5230 R15: ffff8905f53c9800
> [ 1908.439231] FS:  00007f1d60ebc700(0000) GS:ffff89167f040000(0000) knlGS:0000000000000000
> [ 1908.439705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1908.440041] CR2: 00000000000000c8 CR3: 000000201e2a2003 CR4: 00000000007606e0
> [ 1908.440458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1908.440875] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1908.441292] PKRU: 55555554
> [ 1908.441450] Call Trace:
> [ 1908.441597]  nvme_tcp_poll+0x49/0x70 [nvme_tcp]
> [ 1908.441866]  blk_poll+0x25a/0x360
> [ 1908.442067]  io_iopoll_getevents+0xe8/0x360
> [ 1908.442315]  ? __switch_to_asm+0x40/0x70
> [ 1908.442546]  __io_iopoll_check+0x4b/0xa0
> [ 1908.442777]  __x64_sys_io_uring_enter+0x19c/0x600
> [ 1908.443055]  ? schedule+0x4a/0xb0
> [ 1908.443254]  do_syscall_64+0x5b/0x1b0
> [ 1908.443469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1908.443765] RIP: 0033:0x7f1dd17ecec9
> [ 1908.443975] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 cf 2c 00 f7 d8 64 89 01 48
> [ 1908.445073] RSP: 002b:00007f1d60ebbac8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
> [ 1908.445516] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1dd17ecec9
> [ 1908.445933] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000005
> [ 1908.446350] RBP: 0000000000000020 R08: 0000000000000000 R09: 00007f1d00000000
> 
> If I turn on FIO verification options ( --do_verify=1 --verify=crc32c ) I see the following fault for the same 512k write I/O pattern:
> 
> [ 4850.021884] BUG: stack guard page was hit at 00000000291034b3 (stack is 0000000040c9cc3e..00000000e65d9875)
> [ 4850.022471] kernel stack overflow (page fault): 0000 [#1] SMP PTI
> [ 4850.022829] CPU: 3 PID: 3744 Comm: fio Tainted: G           O      5.5.0-rc2stable+ #56
> [ 4850.023298] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
> [ 4850.023742] RIP: 0010:memcpy_erms+0x6/0x10
> [ 4850.023982] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
> [ 4850.025087] RSP: 0018:ffffb8ec09557b68 EFLAGS: 00010206
> [ 4850.025392] RAX: ffff943ef5c2d840 RBX: ffff943ee9216500 RCX: 00000000000003e0
> [ 4850.025808] RDX: 0000000000000800 RSI: ffffb8ec09558000 RDI: ffff943ef5c2dc60
> [ 4850.026225] RBP: 0000000000080000 R08: ffffb8ec09557bd8 R09: 0000000000000080
> [ 4850.026640] R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
> [ 4850.027056] R13: ffffb8ec09557be0 R14: ffffb8ec09557bb8 R15: 0000000000080000
> [ 4850.027472] FS:  00007f5db8053700(0000) GS:ffff943eff040000(0000) knlGS:0000000000000000
> [ 4850.027944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4850.028279] CR2: ffffb8ec09558000 CR3: 000000203a052006 CR4: 00000000007606e0
> [ 4850.028695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4850.029111] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4850.029527] PKRU: 55555554
> [ 4850.029684] Call Trace:
> [ 4850.029834]  io_setup_async_io+0x51/0xc0
> [ 4850.030065]  io_write+0xe4/0x220
> [ 4850.030256]  ? get_page_from_freelist+0x43f/0x1220
> [ 4850.030538]  io_issue_sqe+0x419/0xac0
> [ 4850.030752]  io_queue_sqe+0x13b/0x620
> [ 4850.030967]  ? kmem_cache_alloc_bulk+0x32/0x230

This one looks related to io_uring and not nvme.

Jens?


* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-19  1:12 ` Sagi Grimberg
@ 2020-03-19  2:33   ` Jens Axboe
  2020-03-23 21:07   ` Wunderlich, Mark
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2020-03-19  2:33 UTC (permalink / raw)
  To: Sagi Grimberg, Wunderlich, Mark, linux-nvme

On 3/18/20 7:12 PM, Sagi Grimberg wrote:
>> If I turn on FIO verification options ( --do_verify=1 --verify=crc32c ) I see the following fault for the same 512k write I/O pattern:
>>
>> [ 4850.021884] BUG: stack guard page was hit at 00000000291034b3 (stack is 0000000040c9cc3e..00000000e65d9875)
>> [ 4850.022471] kernel stack overflow (page fault): 0000 [#1] SMP PTI
>> [ 4850.022829] CPU: 3 PID: 3744 Comm: fio Tainted: G           O      5.5.0-rc2stable+ #56
>> [ 4850.023298] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
>> [ 4850.023742] RIP: 0010:memcpy_erms+0x6/0x10
>> [ 4850.023982] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
>> [ 4850.025087] RSP: 0018:ffffb8ec09557b68 EFLAGS: 00010206
>> [ 4850.025392] RAX: ffff943ef5c2d840 RBX: ffff943ee9216500 RCX: 00000000000003e0
>> [ 4850.025808] RDX: 0000000000000800 RSI: ffffb8ec09558000 RDI: ffff943ef5c2dc60
>> [ 4850.026225] RBP: 0000000000080000 R08: ffffb8ec09557bd8 R09: 0000000000000080
>> [ 4850.026640] R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
>> [ 4850.027056] R13: ffffb8ec09557be0 R14: ffffb8ec09557bb8 R15: 0000000000080000
>> [ 4850.027472] FS:  00007f5db8053700(0000) GS:ffff943eff040000(0000) knlGS:0000000000000000
>> [ 4850.027944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4850.028279] CR2: ffffb8ec09558000 CR3: 000000203a052006 CR4: 00000000007606e0
>> [ 4850.028695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 4850.029111] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 4850.029527] PKRU: 55555554
>> [ 4850.029684] Call Trace:
>> [ 4850.029834]  io_setup_async_io+0x51/0xc0
>> [ 4850.030065]  io_write+0xe4/0x220
>> [ 4850.030256]  ? get_page_from_freelist+0x43f/0x1220
>> [ 4850.030538]  io_issue_sqe+0x419/0xac0
>> [ 4850.030752]  io_queue_sqe+0x13b/0x620
>> [ 4850.030967]  ? kmem_cache_alloc_bulk+0x32/0x230
> 
> This one looks related to io_uring and not nvme.
> 
> Jens?

Looks like you're running 5.5-rc2, which is a somewhat odd choice. I'd
love to see if this reproduces in 5.5 as released, or current 5.6-rc6. I
think this looks like an issue that was fixed previously.

-- 
Jens Axboe



* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-19  1:12 ` Sagi Grimberg
  2020-03-19  2:33   ` Jens Axboe
@ 2020-03-23 21:07   ` Wunderlich, Mark
  2020-03-23 21:11     ` Sagi Grimberg
  1 sibling, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-23 21:07 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe


>Can you run:
>gdb drivers/nvme/host/nvme-tcp.ko
>...
>$ l *(nvme_tcp_try_recv+0x59)

(gdb) l *(nvme_tcp_try_recv+0x59)
0xffffffffc03d04d9 is in nvme_tcp_try_recv (drivers/nvme/host/tcp.c:1046).
1041
1042            rd_desc.arg.data = queue;
1043            rd_desc.count = 1;
1044            lock_sock(sk);
1045            queue->nr_cqe = 0;
1046            consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
1047            release_sock(sk);
1048            return consumed;
1049    }
1050
Reproduced this fault on branch nvme-5.6-rc6.

>
>But this looks like a use-after-free condition.

>This is interesting: when we go ahead and do error recovery, I am wondering what is supposed to stop/quiesce blk_poll before safely tearing down and freeing stuff.

>I suspect that this will be common to others that implement blk_poll?


* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 21:07   ` Wunderlich, Mark
@ 2020-03-23 21:11     ` Sagi Grimberg
  2020-03-23 21:18       ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-23 21:11 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>> Can you run:
>> gdb drivers/nvme/host/nvme-tcp.ko
>> ...
>> $ l *(nvme_tcp_try_recv+0x59)
> 
> (gdb) l *(nvme_tcp_try_recv+0x59)
> 0xffffffffc03d04d9 is in nvme_tcp_try_recv (drivers/nvme/host/tcp.c:1046).
> 1041
> 1042            rd_desc.arg.data = queue;
> 1043            rd_desc.count = 1;
> 1044            lock_sock(sk);
> 1045            queue->nr_cqe = 0;
> 1046            consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
> 1047            release_sock(sk);
> 1048            return consumed;
> 1049    }
> 1050
> Reproduced this fault on branch nvme-5.6-rc6.

Thanks, this makes sense. I'm assuming you don't see an issue with
non-polling I/O, correct?


* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 21:11     ` Sagi Grimberg
@ 2020-03-23 21:18       ` Sagi Grimberg
  2020-03-23 22:04         ` Wunderlich, Mark
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-23 21:18 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>>> Can you run:
>>> gdb drivers/nvme/host/nvme-tcp.ko
>>> ...
>>> $ l *(nvme_tcp_try_recv+0x59)
>>
>> (gdb) l *(nvme_tcp_try_recv+0x59)
>> 0xffffffffc03d04d9 is in nvme_tcp_try_recv (drivers/nvme/host/tcp.c:1046).
>> 1041
>> 1042            rd_desc.arg.data = queue;
>> 1043            rd_desc.count = 1;
>> 1044            lock_sock(sk);
>> 1045            queue->nr_cqe = 0;
>> 1046            consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
>> 1047            release_sock(sk);
>> 1048            return consumed;
>> 1049    }
>> 1050
>> Reproduced this fault on branch nvme-5.6-rc6.

Mark, does this patch make the issue go away?
--
@@ -2326,6 +2328,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx)
         struct nvme_tcp_queue *queue = hctx->driver_data;
         struct sock *sk = queue->sock->sk;

+       if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
+               return 0;
+
         set_bit(NVME_TCP_Q_POLLING, &queue->flags);
         if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
                 sk_busy_loop(sk, true);
--
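
The effect of the check in the patch above can be modelled outside the
kernel. What follows is only a minimal user-space sketch, not driver code:
fake_queue, fake_sock, fake_poll, fake_teardown and the Q_LIVE/Q_POLLING bit
numbers are hypothetical stand-ins, and C11 atomics stand in for the kernel's
test_bit()/set_bit()/clear_bit(). It shows a poll path that returns 0 once
teardown has cleared the live bit, instead of dereferencing a socket pointer
that is no longer valid.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; the real driver defines its own flag bits. */
enum { Q_LIVE = 0, Q_POLLING = 1 };

/* Stand-in for the socket: just a count of completions waiting to be read. */
struct fake_sock {
	int pending;
};

/* Stand-in for the per-queue state the poll path touches. */
struct fake_queue {
	atomic_ulong flags;
	struct fake_sock *sock;	/* set to NULL by teardown */
};

static bool q_test_bit(int nr, atomic_ulong *flags)
{
	return (atomic_load(flags) & (1UL << nr)) != 0;
}

static void q_set_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << nr);
}

static void q_clear_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

/* Returns the number of completions reaped, or 0 if the queue is not live. */
static int fake_poll(struct fake_queue *q)
{
	int reaped;

	/* The guard: bail out before touching the socket on a dead queue. */
	if (!q_test_bit(Q_LIVE, &q->flags))
		return 0;

	/* In this model a live queue always has a socket attached. */
	assert(q->sock != NULL);

	q_set_bit(Q_POLLING, &q->flags);
	reaped = q->sock->pending;
	q->sock->pending = 0;
	q_clear_bit(Q_POLLING, &q->flags);
	return reaped;
}

/* Models error recovery: mark the queue dead, then drop the socket. */
static void fake_teardown(struct fake_queue *q)
{
	q_clear_bit(Q_LIVE, &q->flags);
	q->sock = NULL;
}
```

In this sketch nothing stops fake_teardown() from running between the
live-bit test and the socket dereference, so the check narrows the window
rather than closing it; the model says nothing about what further
synchronization the real driver may need.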


* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 21:18       ` Sagi Grimberg
@ 2020-03-23 22:04         ` Wunderlich, Mark
  2020-03-23 22:09           ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-23 22:04 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe

>Mark, does this patch make the issue go away?
>--
>@@ -2326,6 +2328,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx)
>        struct nvme_tcp_queue *queue = hctx->driver_data;
>        struct sock *sk = queue->sock->sk;
>
>+       if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
>+               return 0;
>+
>        set_bit(NVME_TCP_Q_POLLING, &queue->flags);
>         if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
>                 sk_busy_loop(sk, true);

Do not see the fault (on first attempt), but as part of error recovery the initiator is not
able to reconnect with target.  Another separate issue?

[  304.395405] nvme nvme0: queue 5: timeout request 0x41 type 4
[  304.395407] nvme nvme0: starting error recovery
[  304.534399] nvme nvme0: Reconnecting in 10 seconds...
[  314.636262] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[  314.636323] nvme nvme0: creating 102 I/O queues.
[  378.117435] nvme nvme0: queue 5: timeout request 0x0 type 4
[  378.123398] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[  378.123790] nvme nvme0: failed to connect queue: 5 ret=881
[  378.124338] nvme nvme0: Failed reconnect attempt 1
[  378.124339] nvme nvme0: Reconnecting in 10 seconds...
[  388.357615] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[  388.357670] nvme nvme0: creating 102 I/O queues.
[  451.848084] nvme nvme0: queue 5: timeout request 0x0 type 4
[  452.096044] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[  452.096428] nvme nvme0: failed to connect queue: 5 ret=881
[  452.096762] nvme nvme0: Failed reconnect attempt 2
[  452.096763] nvme nvme0: Reconnecting in 10 seconds...

And of course this does not explain why we have to initiate error recovery
in the first place.

--- Mark

* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 22:04         ` Wunderlich, Mark
@ 2020-03-23 22:09           ` Sagi Grimberg
  2020-03-23 23:16             ` Wunderlich, Mark
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-23 22:09 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>> Mark, does this patch make the issue go away?
>> --
>> @@ -2326,6 +2328,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx)
>>         struct nvme_tcp_queue *queue = hctx->driver_data;
>>         struct sock *sk = queue->sock->sk;
>>
>> +       if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
>> +               return 0;
>> +
>>         set_bit(NVME_TCP_Q_POLLING, &queue->flags);
>>          if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
>>                  sk_busy_loop(sk, true);
> 
> Do not see the fault (on first attempt),

OK, this is a needed fix then.

> but as part of error recovery the initiator is not
> able to reconnect with target.  Another separate issue?

Possibly,

> 
> [  304.395405] nvme nvme0: queue 5: timeout request 0x41 type 4
> [  304.395407] nvme nvme0: starting error recovery
> [  304.534399] nvme nvme0: Reconnecting in 10 seconds...
> [  314.636262] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
> [  314.636323] nvme nvme0: creating 102 I/O queues.
> [  378.117435] nvme nvme0: queue 5: timeout request 0x0 type 4
> [  378.123398] nvme nvme0: Connect command failed, error wo/DNR bit: 881
> [  378.123790] nvme nvme0: failed to connect queue: 5 ret=881
> [  378.124338] nvme nvme0: Failed reconnect attempt 1
> [  378.124339] nvme nvme0: Reconnecting in 10 seconds...
> [  388.357615] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
> [  388.357670] nvme nvme0: creating 102 I/O queues.
> [  451.848084] nvme nvme0: queue 5: timeout request 0x0 type 4
> [  452.096044] nvme nvme0: Connect command failed, error wo/DNR bit: 881
> [  452.096428] nvme nvme0: failed to connect queue: 5 ret=881
> [  452.096762] nvme nvme0: Failed reconnect attempt 2
> [  452.096763] nvme nvme0: Reconnecting in 10 seconds...
> 
> And of course this does not explain why we have to initiate error recovery
> in the first place.

Yes. This is with the nvme-5.7 tree, right? Which is currently based off
5.6.0-rc4, correct?


* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 22:09           ` Sagi Grimberg
@ 2020-03-23 23:16             ` Wunderlich, Mark
  2020-03-23 23:29               ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-23 23:16 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe


>OK, this is a needed fix then.

Yes, so far this removed the kernel fault seen for large write (512k) + queue depth 32 + batch size 8.

>Yes. This is with the nvme-5.7 tree, right? Which is currently based off
>5.6.0-rc4, correct?

No, was not yet able to get branch 5.7 to build with our driver.  For now am running on branch nvme-5.6-rc6, the other branch mentioned by Jens to try.  This appears to be based on 5.6.0-rc3 I believe.  Will re-test on 5.7 once I can get it ready.

* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 23:16             ` Wunderlich, Mark
@ 2020-03-23 23:29               ` Sagi Grimberg
  2020-03-23 23:45                 ` Wunderlich, Mark
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-23 23:29 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe



On 3/23/20 4:16 PM, Wunderlich, Mark wrote:
> 
>> OK, this is a needed fix then.
> 
> Yes, so far this removed the kernel fault seen for large write (512k) + queue depth 32 + batch size 8.
> 
>> Yes, this is with nvme-5.7 tree right? which is currently based off
>> 5.6.0-rc4 correct?
> 
> No, was not yet able to get branch 5.7 to build with our driver.  For now am running on branch nvme-5.6-rc6, the other branch mentioned by Jens to try.  This appears to be based on 5.6.0-rc3 I believe.  Will re-test on 5.7 once I can get it ready.

OK, but this is with nvme-5.6-rc6 as is right? no add-ons correct?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 23:29               ` Sagi Grimberg
@ 2020-03-23 23:45                 ` Wunderlich, Mark
  2020-03-23 23:48                   ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-23 23:45 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe

>> 
>> OK, this is a needed fix then.
>> 
>> Yes, so far this removed the kernel fault seen for large write (512k) + queue depth 32 + batch size 8.
>> 
>>> Yes, this is with nvme-5.7 tree right? which is currently based off
>>> 5.6.0-rc4 correct?
>> 
>> No, was not yet able to get branch 5.7 to build with our driver.  For now am running on branch nvme-5.6-rc6, the other branch mentioned by Jens to try.  This appears to be based on 5.6.0-rc3 I believe.  Will re-test on 5.7 once I can get it ready.

>OK, but this is with nvme-5.6-rc6 as is right? no add-ons correct?

Since the patch you indicated for me to try shows use of the POLLING flag, I went ahead and added that patch on the baseline nvme-5.6-rc6 branch before making the small suggested patch to nvme_tcp_poll that checks LIVE state.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 23:45                 ` Wunderlich, Mark
@ 2020-03-23 23:48                   ` Sagi Grimberg
  2020-03-24  0:34                     ` Wunderlich, Mark
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-23 23:48 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>>> OK, this is a needed fix then.
>>>
>>> Yes, so far this removed the kernel fault seen for large write (512k) + queue depth 32 + batch size 8.
>>>
>>>> Yes, this is with nvme-5.7 tree right? which is currently based off
>>>> 5.6.0-rc4 correct?
>>>
>>> No, was not yet able to get branch 5.7 to build with our driver.  For now am running on branch nvme-5.6-rc6, the other branch mentioned by Jens to try.  This appears to be based on 5.6.0-rc3 I believe.  Will re-test on 5.7 once I can get it ready.
> 
>> OK, but this is with nvme-5.6-rc6 as is right? no add-ons correct?
> 
> Since the patch you indicated for me to try shows use of the POLLING flag, I went ahead and added that patch on the baseline nvme-5.6-rc6 branch before making the small suggested patch to nvme_tcp_poll that checks LIVE state.

Does the I/O timeout only happen when you run polling mode (hipri)? Or
does it happen for non-polling I/O as well?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-23 23:48                   ` Sagi Grimberg
@ 2020-03-24  0:34                     ` Wunderlich, Mark
  2020-03-24  1:29                       ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-24  0:34 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe


> Does the I/O timeout only happen when you run polling mode (hipri)? Or does it happen for non-polling I/O as well?

So far have not seen a failure with io_uring (not setting hipri) or using libaio for same I/O pattern.  Only when hipri is set have I been able to reproduce the recovery failure. 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-24  0:34                     ` Wunderlich, Mark
@ 2020-03-24  1:29                       ` Sagi Grimberg
  2020-03-24 16:31                         ` Wunderlich, Mark
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-24  1:29 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>> Does the I/O timeout only happen when you run polling mode (hipri)? Or does it happen for non-polling I/O as well?
> 
> So far have not seen a failure with io_uring (not setting hipri) or using libaio for same I/O pattern.  Only when hipri is set have I been able to reproduce the recovery failure.

This also reproduces without the POLLING patch correct?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-24  1:29                       ` Sagi Grimberg
@ 2020-03-24 16:31                         ` Wunderlich, Mark
  2020-03-24 19:13                           ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Wunderlich, Mark @ 2020-03-24 16:31 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Jens Axboe


>>> Does the I/O timeout only happen when you run polling mode (hipri)? Or does it happen for non-polling I/O as well?
>> 
>> So far have not seen a failure with io_uring (not setting hipri) or using libaio for same I/O pattern.  Only when hipri is set have I been able to reproduce the recovery failure.
>
>This also reproduces without the POLLING patch correct?

Yes, I reproduced the original reported failure on the baseline nvme-5.6-rc6 branch with no other patches applied, running in polling mode using FIO 'hipri' option to io_uring engine.  Also could not re-create failure, as indicated just above, for libaio or io_uring NOT using 'hipri' option.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Sighting: Kernel fault with large write (512k) and io_uring
  2020-03-24 16:31                         ` Wunderlich, Mark
@ 2020-03-24 19:13                           ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-03-24 19:13 UTC (permalink / raw)
  To: Wunderlich, Mark, linux-nvme; +Cc: Jens Axboe


>>>> Does the I/O timeout only happen when you run polling mode (hipri)? Or does it happen for non-polling I/O as well?
>>>
>>> So far have not seen a failure with io_uring (not setting hipri) or using libaio for same I/O pattern.  Only when hipri is set have I been able to reproduce the recovery failure.
>>
>> This also reproduces without the POLLING patch correct?
> 
> Yes, I reproduced the original reported failure on the baseline nvme-5.6-rc6 branch with no other patches applied, running in polling mode using FIO 'hipri' option to io_uring engine.  Also could not re-create failure, as indicated just above, for libaio or io_uring NOT using 'hipri' option.

Is it possible that when we poll there is not enough time to process H2C
data PDUs from io_work? What happens if you remove the deadline stop
condition from nvme_tcp_io_work?
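
To make the question concrete, here is a minimal user-space sketch of a work loop with a deadline stop condition. It is not the nvme-tcp driver code: every sim_* name is hypothetical, the queue is reduced to two counters, and a pass budget stands in for the time-based deadline (an assumption made so the behaviour is deterministic). A negative budget models "deadline removed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue: units of work still outstanding. */
struct sim_queue {
	int send_pending;	/* e.g. data PDUs still to be sent */
	int recv_pending;	/* e.g. responses still to be consumed */
};

/* Do at most one unit of send work; returns 1 on progress, 0 if idle. */
static int sim_try_send(struct sim_queue *q)
{
	if (q->send_pending == 0)
		return 0;
	q->send_pending--;
	return 1;
}

/* Do at most one unit of receive work; returns 1 on progress, 0 if idle. */
static int sim_try_recv(struct sim_queue *q)
{
	if (q->recv_pending == 0)
		return 0;
	q->recv_pending--;
	return 1;
}

/*
 * Alternate send and receive until a pass makes no progress or the
 * budget is used up. budget < 0 means "no stop condition": the loop
 * only ends once a pass finds nothing to do.
 *
 * Returns true when the loop stopped because the budget ran out, in
 * which case the caller is expected to reschedule the work. Returns
 * false when a pass found the queue idle.
 */
static bool sim_io_work(struct sim_queue *q, int budget)
{
	int passes = 0;

	assert(q != NULL);

	do {
		bool pending = false;

		if (sim_try_send(q) > 0)
			pending = true;
		if (sim_try_recv(q) > 0)
			pending = true;

		if (!pending)
			return false;	/* nothing left, no requeue needed */

		passes++;
	} while (budget < 0 || passes < budget);

	return true;	/* quota exhausted, work must be requeued */
}
```

With a budget, the loop can return while send work is still outstanding, and that work then waits until the next run is scheduled. Without a budget, the same call keeps going until the queue is drained. Whether that delay matters for the hipri case is exactly what the experiment above would show.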

^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2020-03-18 23:37 Sighting: Kernel fault with large write (512k) and io_uring Wunderlich, Mark
2020-03-19  1:12 ` Sagi Grimberg
2020-03-19  2:33   ` Jens Axboe
2020-03-23 21:07   ` Wunderlich, Mark
2020-03-23 21:11     ` Sagi Grimberg
2020-03-23 21:18       ` Sagi Grimberg
2020-03-23 22:04         ` Wunderlich, Mark
2020-03-23 22:09           ` Sagi Grimberg
2020-03-23 23:16             ` Wunderlich, Mark
2020-03-23 23:29               ` Sagi Grimberg
2020-03-23 23:45                 ` Wunderlich, Mark
2020-03-23 23:48                   ` Sagi Grimberg
2020-03-24  0:34                     ` Wunderlich, Mark
2020-03-24  1:29                       ` Sagi Grimberg
2020-03-24 16:31                         ` Wunderlich, Mark
2020-03-24 19:13                           ` Sagi Grimberg
