Fault seen with io_uring and nvmf/tcp

* Fault seen with io_uring and nvmf/tcp
@ 2020-02-11 19:30 Wunderlich, Mark
  2020-02-11 19:45 ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Wunderlich, Mark @ 2020-02-11 19:30 UTC (permalink / raw)
  To: linux-block; +Cc: Sagi Grimberg

Posting to this mail list in hopes someone has already seen this fault before I start digging.  Using the nvme-5.5-rc branch of  git.infradead.org repo.
Pulled this branch and running un-modified.
Performing FIO (io_uring) test: (initiating on 8 host cores, TIME=30, RWMIX=100, BLOCK_SIZE=4k, DEPTH=32, BATCH=8), using latest version of fio.
cmd="fio --filename=/dev/nvme0n1 --time_based --runtime=$TIME --ramp_time=10 --thread --rw=randrw --rwmixread=$RWMIX --refill_buffers --direct=1 --ioengine=io_uring --hipri --fixedbufs --bs=$BLOCK_SIZE --iodepth=$DEPTH --iodepth_batch_complete_min=1 --iodepth_batch_complete_max=$DEPTH --iodepth_batch=$BATCH --numjobs=1 --group_reporting --gtod_reduce=0 --disable_lat=0 --name=cpu3 --cpus_allowed=3 --name=cpu5 --cpus_allowed=5 --name=cpu7 --cpus_allowed=7 --name=cpu9 --cpus_allowed=9 --name=cpu11 --cpus_allowed11 --name=cpu13 --cpus_allowed=13 --name=cpu15 --cpus_allowed=15 --name=cpu17 --cpus_allowed=17

NVMf TCP queue configuration is 1 default queue and 101 poll queues.  Connected to a single remote NVMe ram disk device.
I/O performs normally up to 30 second run, but faults just at the end. Very repeatable.

Thanks for your time --- Mark

[64592.841944] nvme nvme0: mapped 1/0/101 default/read/poll queues.
[64592.867003] nvme nvme0: new ctrl: NQN "testrd", addr 192.168.0.1:4420
[64646.940588] list_add corruption. prev->next should be next (ffff9c1feb2bc7c8), but was ffff9c1ff7ee5368. (prev=ffff9c1ff7ee5468).
[64646.941149] ------------[ cut here ]------------
[64646.941150] kernel BUG at lib/list_debug.c:28!
[64646.941360] invalid opcode: 0000 [#1] SMP PTI
[64646.941561] CPU: 82 PID: 7886 Comm: io_wqe_worker-0 Tainted: G           O      5.5.0-rc2stable+ #32
[64646.941994] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
[64646.942349] RIP: 0010:__list_add_valid+0x64/0x70
[64646.942562] Code: 48 89 fe 31 c0 48 c7 c7 40 21 17 89 e8 f9 5c c6 ff 0f 0b 48 89 d1 48 c7 c7 e8 20 17 89 48 89 f2 48 89 c6 31 c0 e8 e0 5c c6 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 b9 00 01 00 00 00
[64646.943442] RSP: 0018:ffffa78a49137d90 EFLAGS: 00010246
[64646.943687] RAX: 0000000000000075 RBX: ffff9c1ff7ee5a00 RCX: 0000000000000000
[64646.944021] RDX: 0000000000000000 RSI: ffff9c0fffe59d28 RDI: ffff9c0fffe59d28
[64646.944356] RBP: ffffa78a49137df8 R08: 00000000000006ad R09: ffffffff88ec3be0
[64646.944691] R10: 000000000000000f R11: 0000000007070707 R12: ffff9c1feb2bc600
[64646.945025] R13: ffff9c1feb2bc7c8 R14: ffff9c1ff7ee5468 R15: ffff9c1ff7ee5a68
[64646.945360] FS:  0000000000000000(0000) GS:ffff9c0fffe40000(0000) knlGS:0000000000000000
[64646.945739] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[64646.946008] CR2: 00007f4423eb7004 CR3: 000000169940a005 CR4: 00000000007606e0
[64646.946343] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[64646.946677] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[64646.947012] PKRU: 55555554
[64646.947138] Call Trace:
[64646.947260]  io_issue_sqe+0x115/0xa30
[64646.947429]  io_wq_submit_work+0xb5/0x1d0
[64646.947615]  io_worker_handle_work+0x19d/0x4c0
[64646.947823]  io_wqe_worker+0xdc/0x390
[64646.947998]  kthread+0xf8/0x130
[64646.948141]  ? io_wq_for_each_worker+0xb0/0xb0
[64646.948349]  ? kthread_bind+0x10/0x10
[64646.948522]  ret_from_fork+0x35/0x40

^ permalink raw reply	[flat|nested] 3+ messages in thread