* Error when running fio against nvme-of rdma target (mlx5 driver)
@ 2022-02-09  2:50 ` Martin Oliveira
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Oliveira @ 2022-02-09  2:50 UTC (permalink / raw)
  To: linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

Hello,

We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.

When running an fio job targeting directly the fabrics devices (no filesystem, see script at the end), within a minute or so we start seeing errors like this:

[  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
[  408.380235] nvme nvme15: starting error recovery
[  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[  408.380246] block nvme15n2: no usable path - requeuing I/O
[  408.380284] block nvme15n5: no usable path - requeuing I/O
[  408.380298] block nvme15n1: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380330] block nvme15n1: no usable path - requeuing I/O
[  408.380350] block nvme15n2: no usable path - requeuing I/O
[  408.380371] block nvme15n6: no usable path - requeuing I/O
[  408.380377] block nvme15n6: no usable path - requeuing I/O
[  408.380382] block nvme15n4: no usable path - requeuing I/O
[  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  415.131898] nvmet: ctrl 1 fatal error occurred!

Occasionally, we've seen the following stack trace:

[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
[ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
[ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
[ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
[ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
[ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
[ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
[ 1158.542419] Call Trace:
[ 1158.544877]  amd_iommu_unmap+0x2c/0x40
[ 1158.548653]  __iommu_unmap+0xc4/0x170
[ 1158.552344]  iommu_unmap_fast+0xe/0x10
[ 1158.556100]  __iommu_dma_unmap+0x85/0x120
[ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027]  process_one_work+0x220/0x3c0
[ 1158.598038]  worker_thread+0x4d/0x3f0
[ 1158.601696]  kthread+0x114/0x150
[ 1158.604928]  ? process_one_work+0x3c0/0x3c0
[ 1158.609114]  ? kthread_park+0x90/0x90
[ 1158.612783]  ret_from_fork+0x22/0x30

We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.

We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016 [2] suggested that disabling some kernel debug options could work around the "local protection error", but that didn't help either.

As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.

Any suggestions are appreciated.

Thanks,
Martin

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
[2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/

fio script:
[global]
name=fio-seq-write
rw=write
bs=1M
direct=1
numjobs=32
time_based
group_reporting=1
runtime=18000
end_fsync=1
size=10G
ioengine=libaio
iodepth=16

[file1]
filename=/dev/nvme0n1

[file2]
filename=/dev/nvme0n2

[file3]
filename=/dev/nvme0n3

[file4]
filename=/dev/nvme0n4

[file5]
filename=/dev/nvme0n5

[file6]
filename=/dev/nvme0n6

[file7]
filename=/dev/nvme0n7

[file8]
filename=/dev/nvme0n8

[file9]
filename=/dev/nvme0n9

[file10]
filename=/dev/nvme0n10

[file11]
filename=/dev/nvme0n11

[file12]
filename=/dev/nvme0n12

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
@ 2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
  -1 siblings, 0 replies; 18+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-09  8:41 UTC (permalink / raw)
  To: Martin Oliveira
  Cc: linux-nvme, linux-rdma, Kelly Ursenbach, Logan Gunthorpe, Lee,
	Jason, iommu

On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> 

Thanks for reporting this, if you can bisect the problem on your setup
it will help others to help you better.

-ck


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
@ 2022-02-09 12:48   ` Robin Murphy
  -1 siblings, 0 replies; 18+ messages in thread
From: Robin Murphy @ 2022-02-09 12:48 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

On 2022-02-09 02:50, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> 
> When running an fio job targeting directly the fabrics devices (no filesystem, see script at the end), within a minute or so we start seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:

FWIW this is indicative that the scatterlist passed to dma_unmap_sg_attrs() 
was wrong - specifically it looks like an attempt to unmap a region 
that's already unmapped (or was never mapped in the first place). 
Whatever race or data corruption issue is causing that is almost 
certainly happening much earlier, since the IO_PAGE_FAULT logs further 
imply that either some pages have been spuriously unmapped while the 
device was still accessing them, or some DMA address in the scatterlist 
was already bogus by the time it was handed off to the device.
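
One quick way to tell those two cases apart is to group the faulting 
addresses from the log into runs of consecutive pages: long runs look 
like a whole buffer that was unmapped (or never mapped), while isolated 
scattered pages look more like corrupt addresses. A minimal Python 
sketch of that grouping - illustrative only, it assumes the AMD-Vi log 
wording quoted above and 4 KiB IOMMU pages:

#!/usr/bin/env python3
# Group AMD-Vi IO_PAGE_FAULT addresses from a dmesg capture into runs of
# consecutive 4 KiB pages.  Sketch only: assumes the exact
# "AMD-Vi: Event logged [IO_PAGE_FAULT ... address=0x...]" wording above.
import re
import sys

PAGE = 0x1000
FAULT_RE = re.compile(r"IO_PAGE_FAULT.*?address=0x([0-9a-fA-F]+)")

def page_runs(addresses):
    # Collapse the faulting addresses into [start, end] runs of adjacent pages.
    pages = sorted({a & ~(PAGE - 1) for a in addresses})
    runs = []
    for p in pages:
        if runs and p == runs[-1][1] + PAGE:
            runs[-1][1] = p      # extends the previous run
        else:
            runs.append([p, p])  # starts a new run
    return runs

def main():
    addrs = []
    with open(sys.argv[1] if len(sys.argv) > 1 else "/dev/stdin") as f:
        for line in f:
            m = FAULT_RE.search(line)
            if m:
                addrs.append(int(m.group(1), 16))
    for start, end in page_runs(addrs):
        npages = (end - start) // PAGE + 1
        print(f"0x{start:012x}-0x{end + PAGE - 1:012x}  ({npages} page(s))")

if __name__ == "__main__":
    main()

Fed the three faults quoted above it would report a single 3-page run 
at 0x24d08000-0x24d0afff, which looks more like a whole buffer 
disappearing than like random garbage addresses.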

Robin.

> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016[2] suggested that disabling some kernel debug options could workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
@ 2022-02-10 23:58     ` Martin Oliveira
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin Oliveira @ 2022-02-10 23:58 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme, linux-rdma, Kelly Ursenbach, Logan Gunthorpe, Lee,
	Jason, iommu

On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
> On 2/8/22 6:50 PM, Martin Oliveira wrote:
> > Hello,
> >
> > We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> >
> > Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> >
> 
> Thanks for reporting this, if you can bisect the problem on your setup
> it will help others to help you better.
> 
> -ck

Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.

Thanks,
Martin


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-10 23:58     ` Martin Oliveira
@ 2022-02-11 11:35       ` Robin Murphy
  -1 siblings, 0 replies; 18+ messages in thread
From: Robin Murphy @ 2022-02-11 11:35 UTC (permalink / raw)
  To: Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this, if you can bisect the problem on your setup
>> it will help others to help you better.
>>
>> -ck
> 
> Hi Chaitanya,
> 
> I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
> 
> I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
> 
> I'd be happy to try any tests if someone has any suggestions.

The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0xffff24d08000.

That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.
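
As a rough illustration of that cross-check: once the two tracepoints 
have been recorded to a text file (e.g. with "trace-cmd record -e 
iommu:map -e iommu:unmap" and "trace-cmd report"), a small script can 
replay every map/unmap event that covered the faulting address and 
report whether it was still mapped at the end. This is only a sketch - 
it assumes the "map: IOMMU: iova=0x... size=..." / "unmap: IOMMU: 
iova=0x... size=..." line format of the iommu tracepoints, so the regex 
may need adjusting for the running kernel:

#!/usr/bin/env python3
# Replay iommu:map/iommu:unmap trace events that cover a given DMA address
# and report whether it was still mapped after the last such event.
# Sketch only: the regex assumes lines containing
#   "map: IOMMU: iova=0x... paddr=0x... size=..."        and
#   "unmap: IOMMU: iova=0x... size=... unmapped_size=..."
import re
import sys

EVENT_RE = re.compile(
    r"\b(?P<ev>map|unmap): IOMMU: iova=0x(?P<iova>[0-9a-fA-F]+).*?size=(?P<size>\d+)")

def main():
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} <fault-address-hex> <trace-file>")
    fault = int(sys.argv[1], 16)
    mapped = False
    history = []
    with open(sys.argv[2]) as f:
        for line in f:
            m = EVENT_RE.search(line)
            if not m:
                continue
            iova, size = int(m["iova"], 16), int(m["size"])
            if iova <= fault < iova + size:   # this event covered the fault address
                mapped = (m["ev"] == "map")
                history.append(line.rstrip())
    if not history:
        print(f"no map/unmap event ever covered 0x{fault:x}")
        return
    for h in history:
        print(h)
    state = "still mapped" if mapped else "already unmapped"
    print(f"-> 0x{fault:x} was {state} after the last event that covered it")

if __name__ == "__main__":
    main()

Invoked as, say, "python3 check_iova.py 24d08000 trace.txt" (the script 
name is just whatever the sketch is saved under), it should show whether 
the faulting IOVA was torn down before the device was done with it or 
was never a valid mapping at all.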

Robin.


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-11 11:35       ` Robin Murphy
@ 2022-05-17  8:26         ` Mark Ruijter
  -1 siblings, 0 replies; 18+ messages in thread
From: Mark Ruijter @ 2022-05-17  8:26 UTC (permalink / raw)
  To: Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

Hi Robin,

I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" <linux-nvme-bounces@lists.infradead.org on behalf of robin.murphy@arm.com> wrote:

    On 2022-02-10 23:58, Martin Oliveira wrote:
    > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
    >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
    >>> Hello,
    >>>
    >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
    >>>
    >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
    >>>
    >>
    >> Thanks for reporting this, if you can bisect the problem on your setup
    >> it will help others to help you better.
    >>
    >> -ck
    > 
    > Hi Chaitanya,
    > 
    > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
    > 
    > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
    > 
    > I'd be happy to try any tests if someone has any suggestions.

    The IOMMU is probably your friend here - one thing that might be worth 
    trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
    the address reported in subsequent IOMMU faults was previously mapped as 
    a valid DMA address (be warned that there will likely be a *lot* of 
    trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
    should also make it easier to tell real DMA IOVAs from rogue physical 
    addresses or other nonsense, as real DMA addresses should then look more 
    like 0xffff24d08000.

    That could at least help narrow down whether it's some kind of 
    use-after-free race or a completely bogus address creeping in somehow.

    Robin.




* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-05-17  8:26         ` Mark Ruijter
@ 2022-05-17 11:16           ` Max Gurtovoy via iommu
  -1 siblings, 0 replies; 18+ messages in thread
From: Max Gurtovoy @ 2022-05-17 11:16 UTC (permalink / raw)
  To: Mark Ruijter, Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

Hi,

Can you please send the original scenario, setup details and dumps?

I can't find them in my mailbox.

You can send them directly to me to avoid spam.

-Max.

On 5/17/2022 11:26 AM, Mark Ruijter wrote:
> Hi Robin,
>
> I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6.
>
> [ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
> [ 4879.122015] nvme nvme0: starting error recovery
> [ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
> [ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
> [ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...
>
> I assume this means that the problem has still not been resolved?
> If so, I'll try to diagnose the problem.
>
> Thanks,
>
> --Mark
>
> On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" <linux-nvme-bounces@lists.infradead.org on behalf of robin.murphy@arm.com> wrote:
>
>      On 2022-02-10 23:58, Martin Oliveira wrote:
>      > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>      >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>      >>> Hello,
>      >>>
>      >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>      >>>
>      >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>      >>>
>      >>
>      >> Thanks for reporting this, if you can bisect the problem on your setup
>      >> it will help others to help you better.
>      >>
>      >> -ck
>      >
>      > Hi Chaitanya,
>      >
>      > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
>      >
>      > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
>      >
>      > I'd be happy to try any tests if someone has any suggestions.
>
>      The IOMMU is probably your friend here - one thing that might be worth
>      trying is capturing the iommu:map and iommu:unmap tracepoints to see if
>      the address reported in subsequent IOMMU faults was previously mapped as
>      a valid DMA address (be warned that there will likely be a *lot* of
>      trace generated). With 5.13 or newer, booting with "iommu.forcedac=1"
>      should also make it easier to tell real DMA IOVAs from rogue physical
>      addresses or other nonsense, as real DMA addresses should then look more
>      like 0xffff24d08000.
>
>      That could at least help narrow down whether it's some kind of
>      use-after-free race or a completely bogus address creeping in somehow.
>
>      Robin.
>
>


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
                   ` (2 preceding siblings ...)
  (?)
@ 2024-01-31  5:34 ` Arthur Muller
  2024-01-31 13:18   ` Christoph Hellwig
  -1 siblings, 1 reply; 18+ messages in thread
From: Arthur Muller @ 2024-01-31  5:34 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

Dear all,

We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:

https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes

we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
experimented with the latest MLNX_EN 23.10 driver, encountering the
same issue. 

We are now running kernel 5.15.0. This problem first appeared after
upgrading our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS and,
thus, from kernel 5.4 to 5.15. Unfortunately, we are not completely
sure which MLNX_EN driver version we ran prior to the upgrade, but we
strongly assume it was <= 5.8.

The failure sequence appears to start with the following error
message:

mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]

This error results in timeouts and read operation faults within the
Lustre file system on both the client and storage ends, potentially
leading to a complete storage failure. 

The IO_PAGE_FAULT error is often, though not always, followed by a
"local protection error". The error tends to manifest after a prolonged
period of constant network activity, typically after approximately one
day of continuous read and write operations involving a Postgres
database on the file system.

Regrettably, we have been unable to trigger this error deliberately.
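
Since the fault only shows up after long periods of load and cannot be
triggered on demand, one low-effort option is to leave a small watcher
running that wall-clock-timestamps each occurrence, so it can later be
correlated with Postgres/Lustre activity. A minimal sketch - it shells
out to "dmesg --follow" (normally requires root) and matches the log
wording quoted in this thread:

#!/usr/bin/env python3
# Wall-clock-timestamp IO_PAGE_FAULT / "local protection error" kernel
# messages as they appear.  Sketch only: relies on "dmesg --follow" and
# on the log wording quoted in this thread.
import datetime
import re
import subprocess

PATTERN = re.compile(r"IO_PAGE_FAULT|local protection error|dump error cqe")

def main():
    proc = subprocess.Popen(["dmesg", "--follow"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        if PATTERN.search(line):
            now = datetime.datetime.now().isoformat(timespec="seconds")
            print(f"{now}  {line.strip()}", flush=True)

if __name__ == "__main__":
    main()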

Kind regards,
Arthur Müller


On Wed, 2022-02-09 at 02:50 +0000, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup,
> using the mlx5 driver and we are wondering if anyone has seen
> anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over
> RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are
> exposed as a single NVMe fabrics device, one physical SSD per
> namespace.
> 
> When running an fio job targeting directly the fabrics devices (no
> filesystem, see script at the end), within a minute or so we start
> seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0):
> WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error
> cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8
> e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed
> with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:
> 
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted:
> P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS
> R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48
> 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85
> f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90
> 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX:
> 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI:
> 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09:
> 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15:
> 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000)
> knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4:
> 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling
> the IOMMU could help, but even after I disabled it (amd_iommu=off
> iommu=off) I still get errors (nvme IO timeouts). Another thread from
> 2016[2] suggested that disabling some kernel debug options could
> workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job
> targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]:
> https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
                   ` (3 preceding siblings ...)
  (?)
@ 2024-01-31  9:18 ` Arthur Muller
  -1 siblings, 0 replies; 18+ messages in thread
From: Arthur Muller @ 2024-01-31  9:18 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

This is a re-send of an earlier email that bounced from the old iommu
list; it now goes to the new iommu@lists.linux.dev address instead.
The first message was submitted to lore and is archived at:

https://lore.kernel.org/all/9a40e66eb8ffc48a2e3765cf77f49914d57c55e7.camel@gmx.net/




Dear all,

We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:

https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes

we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
experimented with the latest MLNX_EN 23.10 driver, encountering the
same issue. 

We are now on kernel 5.15.0. The problem first appeared after
upgrading our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS and
thus from kernel 5.4 to 5.15. Unfortunately, we are not completely
sure which MLNX_EN driver version was installed prior to the upgrade,
but we strongly assume it was <= 5.8.
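
As an aside, since we are not sure which driver was in place before
the upgrade: a quick way to confirm whether the in-tree mlx5_core or
the MLNX_EN out-of-tree build is the one actually loaded is something
like the following (the interface name is only a placeholder for the
actual mlx5 port):

  ethtool -i <mlx5-interface>    # driver, version, firmware-version
  modinfo mlx5_core | grep -E '^(filename|vermagic)'
  cat /proc/cmdline              # which iommu/amd_iommu options are in effect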

The failure sequence appears to start with the following error
message:

mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]

This error results in timeouts and failed read operations in the
Lustre file system on both the client and the storage side, and can
ultimately lead to a complete storage failure.

The IO_PAGE_FAULT event is often, though not always, followed by a
"local protection error". The error tends to manifest after a
prolonged period of constant network activity, typically after
roughly one day of continuous read and write operations against a
Postgres database on the file system.
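
Because of that long lead time, the simplest thing we can suggest for
catching the first fault with wall-clock timestamps is to leave a
plain dmesg follower running on both the client and the storage side,
e.g.:

  dmesg -wT | grep -E 'AMD-Vi|IO_PAGE_FAULT|local protection error'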

Regrettably, we have not been able to trigger this error on demand.
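
For anyone who wants to try to reproduce this synthetically, a fio
job in the spirit of Martin's script, but pointed at the Lustre mount
instead of raw namespaces, might look roughly like the sketch below.
The mount point, block size and read/write mix are placeholders for
our Postgres-like load; we have not verified that this particular job
triggers the fault.

[global]
name=lustre-mixed-load
rw=randrw
rwmixread=70
bs=8k
direct=1
numjobs=16
time_based
group_reporting=1
runtime=86400
size=20G
ioengine=libaio
iodepth=16
; placeholder: replace with the actual Lustre mount point
directory=/mnt/lustre/fio-test

[pg-like]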

Kind regards,
Arthur Müller


On Wed, 2022-02-09 at 02:50 +0000, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup,
> using the mlx5 driver and we are wondering if anyone has seen
> anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over
> RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are
> exposed as a single NVMe fabrics device, one physical SSD per
> namespace.
> 
> When running an fio job targeting directly the fabrics devices (no
> filesystem, see script at the end), within a minute or so we start
> seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0):
> WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error
> cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8
> e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed
> with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:
> 
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted:
> P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS
> R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48
> 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85
> f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90
> 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX:
> 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI:
> 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09:
> 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15:
> 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000)
> knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4:
> 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling
> the IOMMU could help, but even after I disabled it (amd_iommu=off
> iommu=off) I still get errors (nvme IO timeouts). Another thread from
> 2016[2] suggested that disabling some kernel debug options could
> workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job
> targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]:
> https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2024-01-31  5:34 ` Arthur Muller
@ 2024-01-31 13:18   ` Christoph Hellwig
  2024-01-31 14:01     ` Arthur Muller
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-01-31 13:18 UTC (permalink / raw)
  To: Arthur Muller
  Cc: Martin Oliveira, linux-nvme, iommu, linux-rdma, Kelly Ursenbach,
	Lee, Jason, Logan Gunthorpe

On Wed, Jan 31, 2024 at 06:34:00AM +0100, Arthur Muller wrote:
> Dear all,
> 
> We've encountered a similar issue. In our case, we are using the Lustre
> file system instead of NVMe-oF to connect our storage over the network.
> Our setup involves an AMD EPYC 7282 machine paired with Mellanox
> MT28908 cards. Following the guidelines in the Nvidia documentation:
> 
> https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes
> 
> we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
> experimented with the latest MLNX_EN 23.10 driver, encountering the
> same issue. 

If you use the nvidia out of tree junk you are completely on your own
and have no one to blame but yourself.  Any problems with that do not
belong on a Linux mailing list.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2024-01-31 13:18   ` Christoph Hellwig
@ 2024-01-31 14:01     ` Arthur Muller
  0 siblings, 0 replies; 18+ messages in thread
From: Arthur Muller @ 2024-01-31 14:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin Oliveira, linux-nvme, iommu, linux-rdma, Kelly Ursenbach,
	Lee, Jason, Logan Gunthorpe

On Wed, 2024-01-31 at 05:18 -0800, Christoph Hellwig wrote:
> On Wed, Jan 31, 2024 at 06:34:00AM +0100, Arthur Muller wrote:
> > Dear all,
> > 
> > We've encountered a similar issue. In our case, we are using the
> > Lustre
> > file system instead of NVMe-oF to connect our storage over the
> > network.
> > Our setup involves an AMD EPYC 7282 machine paired with Mellanox
> > MT28908 cards. Following the guidelines in the Nvidia
> > documentation:
> > 
> > https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes
> > 
> > we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
> > experimented with the latest MLNX_EN 23.10 driver, encountering the
> > same issue. 
> 
> If you use the nvidia out of tree junk you are completely on your own
> and have no one to blame but yourself.  Any problems with that do not
> belong on a Linux mailing list.
> 


Dear Christoph,

thank you very much for your reply pointing me to the possible cause
of the problem. I am not blaming anybody. Based on the history of
this thread I assumed there might be an unsolved IOMMU issue and just
provided some context, hoping I could help debug it. There are some
IOMMU-related threads on the kernel Bugzilla, and the setup mentioned
there was the most similar to ours.

Kind regards,
Arthur Müller

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-01-31 14:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09  2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
2022-02-09  2:50 ` Martin Oliveira
2022-02-09  8:41 ` Chaitanya Kulkarni
2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
2022-02-10 23:58   ` Martin Oliveira
2022-02-10 23:58     ` Martin Oliveira
2022-02-11 11:35     ` Robin Murphy
2022-02-11 11:35       ` Robin Murphy
2022-05-17  8:26       ` Mark Ruijter
2022-05-17  8:26         ` Mark Ruijter
2022-05-17 11:16         ` Max Gurtovoy
2022-05-17 11:16           ` Max Gurtovoy via iommu
2022-02-09 12:48 ` Robin Murphy
2022-02-09 12:48   ` Robin Murphy
2024-01-31  5:34 ` Arthur Muller
2024-01-31 13:18   ` Christoph Hellwig
2024-01-31 14:01     ` Arthur Muller
2024-01-31  9:18 ` Arthur Muller
