* nvmet-tcp kernel crashes consistently when doing 64k rw with fio.
From: Mark Ruijter @ 2021-08-11  7:58 UTC
  To: linux-nvme

When I attach an initiator to an nvmet-tcp target running kernel 5.10.57, the target system crashes whenever the initiator runs fio with a 64K block size.
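
In case it helps with reproducing: the target is a plain nvmet-tcp subsystem configured through configfs, roughly along the lines below. The subsystem name, backing device, and addresses are placeholders, not our exact configuration.

---
# Hypothetical target-side setup; adjust device and addresses.
modprobe nvmet_tcp
cd /sys/kernel/config/nvmet
mkdir subsystems/testnqn
echo 1 > subsystems/testnqn/attr_allow_any_host
mkdir subsystems/testnqn/namespaces/1
echo /dev/nvme0n1 > subsystems/testnqn/namespaces/1/device_path
echo 1 > subsystems/testnqn/namespaces/1/enable
mkdir ports/1
echo tcp > ports/1/addr_trtype
echo ipv4 > ports/1/addr_adrfam
echo 192.168.0.10 > ports/1/addr_traddr
echo 4420 > ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn ports/1/subsystems/testnqn

# Initiator side:
nvme connect -t tcp -a 192.168.0.10 -s 4420 -n testnqn
---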

mix:       rwmixread=100
rw:        rw
blocksize: 64k
qdepth:    128
jobs:      12
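
For anyone trying to reproduce this, the parameters above should translate to an fio invocation roughly like the one below (the device path is a placeholder, and mapping "jobs" to --numjobs is an assumption, not our exact command):

---
# Hypothetical fio command; /dev/nvme1n1 stands in for the nvme-tcp
# device as seen on the initiator.
fio --name=64k-rw --filename=/dev/nvme1n1 --ioengine=libaio --direct=1 \
    --rw=rw --rwmixread=100 --bs=64k --iodepth=128 --numjobs=12 \
    --time_based --runtime=60 --group_reporting
---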

Before the system crashes and reboots, dmesg first shows _many_ messages like these:
---
messages-20210720:2021-07-19T16:06:40.084137-06:00 gold kernel: [ 3402.950666] nvmet_tcp: failed cmd 00000000873b89c1 id 22 opcode 2, data_len: 65536
messages-20210720:2021-07-19T16:06:40.084137-06:00 gold kernel: [ 3402.950667] nvmet_tcp: failed cmd 0000000022c079ff id 23 opcode 2, data_len: 65536
messages-20210720:2021-07-19T16:06:40.084138-06:00 gold kernel: [ 3402.950669] nvmet_tcp: failed cmd 0000000093ff4775 id 9 opcode 2, data_len: 65536
messages-20210720:2021-07-19T16:06:40.084138-06:00 gold kernel: [ 3402.950671] nvmet_tcp: failed cmd 00000000dcb0e105 id 10 opcode 2, data_len: 65536
---

We tested with kernels 5.3.18 (SUSE) and 5.10.57.
Running 64K I/O with fio triggers this problem consistently.
Our automated test exercises both 64K reads and 64K writes.

I managed to grab a stack trace from the older SUSE kernel, which may be helpful:
--
[65980.188661] nvmet_tcp: failed cmd 00000000d4ea0295 id 113 opcode 1, data_len: 65536
[65980.188663] #PF: error_code(0x0000) - not-present page
[65980.188665] nvmet_tcp: failed cmd 000000009fad62b2 id 114 opcode 1, data_len: 65536
[65980.188665] PGD 0 P4D 0 
[65980.188674] Oops: 0000 [#1] SMP NOPTI
[65980.188677] CPU: 0 PID: 4193 Comm: kworker/0:7H Kdump: loaded Tainted: G           OE  X  N 5.3.18-24.37-default #1 SLE15-SP2
[65980.188678] Hardware name: Supermicro, Supermicro, Supermicro, Supermicro SYS-2029U-TN24R4T, SYS-2029U-TN24R4T, SYS-2029U-TN24R4T, SYS-2029U-TN24R4T/X11DP
[65980.188683] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[65980.188685] RIP: 0010:nvmet_tcp_map_pdu_iovec+0x66/0xf0 [nvmet_tcp]
[65980.188687] Code: 48 05 ff 0f 00 00 48 c1 e8 0c 81 e5 ff 0f 00 00 89 87 b8 01 00 00 48 c1 e0 05 48 03 47 30 45 85 f6 0f 84 81 00 00 00 41 89 ec <8b> 70 0c 48 8b 08 8b 50 08 29 ee 44 39 f6 41 0f 47 f6 48 83 e1 fc
[65980.188690] RSP: 0018:ffff9aa9602a7d20 EFLAGS: 00010206
[65980.188691] RAX: 0000000000000000 RBX: ffff8f6f208c1c80 RCX: f28d67c52c83a000
[65980.188692] RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff8f6ecfd8a200
[65980.188693] nvmet_tcp: failed cmd 0000000044957200 id 115 opcode 1, data_len: 65536
[65980.188695] RBP: 0000000000000000 R08: 0000000000006ee0 R09: 0000000000000030
[65980.188696] nvmet_tcp: failed cmd 000000002387173b id 127 opcode 1, data_len: 65536
[65980.188697] R10: 0000000000000010 R11: fefefefefefefeff R12: 0000000000000000
[65980.188698] nvmet_tcp: failed cmd 00000000dc7f4316 id 58 opcode 1, data_len: 65536
[65980.188700] nvmet_tcp: failed cmd 000000002b203233 id 60 opcode 1, data_len: 65536
[65980.188701] R13: ffff8f8eb4f1e510 R14: 000000000000f000 R15: ffff8f8e6f05e810
[65980.188702] nvmet_tcp: failed cmd 00000000d75d54dd id 61 opcode 1, data_len: 65536
[65980.188704] nvmet_tcp: failed cmd 00000000040df628 id 62 opcode 1, data_len: 65536
[65980.188705] FS:  0000000000000000(0000) GS:ffff8f6fafe00000(0000) knlGS:0000000000000000
[65980.188706] nvmet_tcp: failed cmd 00000000d2515f84 id 63 opcode 1, data_len: 65536
[65980.188708] nvmet_tcp: failed cmd 000000009705fc83 id 64 opcode 1, data_len: 65536
[65980.188709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[65980.188710] nvmet_tcp: failed cmd 00000000f219ad1a id 65 opcode 1, data_len: 65536
[65980.188712] nvmet_tcp: failed cmd 00000000c6c09c57 id 66 opcode 1, data_len: 65536
[65980.188713] CR2: 000000000000000c CR3: 0000003ed41a2002 CR4: 00000000007606f0
[65980.188714] nvmet_tcp: failed cmd 0000000008d40238 id 67 opcode 1, data_len: 65536
[65980.188715] nvmet_tcp: failed cmd 000000006f8e7979 id 68 opcode 1, data_len: 65536
[65980.188717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[65980.188718] nvmet_tcp: failed cmd 000000006d2f85bb id 69 opcode 1, data_len: 65536
[65980.188720] nvmet_tcp: failed cmd 00000000aaccbded id 70 opcode 1, data_len: 65536
[65980.188721] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[65980.188722] PKRU: 55555554
[65980.188723] Call Trace:
[65980.188727]  nvmet_tcp_try_recv_pdu+0x3dc/0x6f0 [nvmet_tcp]
[65980.188734]  ? __switch_to_asm+0x34/0x70
[65980.188735]  ? __switch_to_asm+0x40/0x70
[65980.188736]  ? __switch_to_asm+0x34/0x70
[65980.188737]  ? __switch_to_asm+0x40/0x70
[65980.188738]  ? __switch_to_asm+0x34/0x70
[65980.188739]  ? __switch_to_asm+0x40/0x70
[65980.188741]  nvmet_tcp_io_work+0x6d/0xa80 [nvmet_tcp]
[65980.188743]  ? __switch_to_asm+0x34/0x70
[65980.188746]  process_one_work+0x1f4/0x3e0
[65980.188748]  worker_thread+0x2d/0x3e0
[65980.188750]  ? process_one_work+0x3e0/0x3e0
[65980.188752]  kthread+0x10d/0x130
[65980.188753]  ? kthread_park+0xa0/0xa0
[65980.188755]  ret_from_fork+0x1f/0x40
[65980.188757] Modules linked in: st sr_mod cdrom lp parport_pc ppdev parport xfrm_user xsk_diag sctp_diag udp_diag raw_diag unix_diag af_packet_diag netlink_diag binfmt_misc xfs dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid0 md_mod tcp_diag inet_diag scst_vdisk(OENN) nfsd auth_rpcgss nfs_acl lockd grace loop nvme nvme_core xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter nvmet_rdma nvmet_tcp null_blk nvmet iscsi_scst(OENN) scst(OENN) dlm sctp rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm configfs ib_umad mlx4_ib mlx4_en mlx4_core mlx5_ib ib_uverbs ib_core mlx5_core mlxfw tls pci_hyperv_intf(XX) ipmi_watchdog
[65980.188788]  af_packet iscsi_ibft iscsi_boot_sysfs rfkill dmi_sysfs msr intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp ast drm_vram_helper kvm_intel i2c_algo_bit ttm kvm drm_kms_helper irqbypass crc32_pclmul ixgbe drm ghash_clmulni_intel xfrm_algo aesni_intel mei_me libphy syscopyarea aes_x86_64 sysfillrect crypto_simd lpc_ich sysimgblt mdio ioatdma cryptd glue_helper joydev fb_sys_fops i2c_i801 mei mfd_core dca ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq acpi_pad button btrfs libcrc32c xor hid_generic usbhid raid6_pq sd_mod crc32c_intel xhci_pci xhci_hcd ahci libahci usbcore libata vmd wmi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod [last unloaded: parport_pc]
[65980.188824] Supported: No, Unsupported modules are loaded
[65980.188826] CR2: 000000000000000c



* Re: nvmet-tcp kernel crashes consistently when doing 64k rw with fio.
From: Daniel Wagner @ 2021-08-13  9:32 UTC
  To: Mark Ruijter; +Cc: linux-nvme

Hi Mark,

On Wed, Aug 11, 2021 at 07:58:28AM +0000, Mark Ruijter wrote:
> messages-20210720:2021-07-19T16:06:40.084138-06:00 gold kernel: [ 3402.950671] nvmet_tcp: failed cmd 00000000dcb0e105 id 10 opcode 2, data_len: 65536

I've tried to reproduce this with the current nvme/nvme-5.15 branch
(that was the kernel I had running) and was not able to trigger it, so
I'm not sure whether it has been fixed or my setup simply cannot
reproduce it. Anyway, I'll try the 5.10 stable tree next.

Daniel

