* Error when running fio against nvme-of rdma target (mlx5 driver)
@ 2022-02-09  2:50 ` Martin Oliveira
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Oliveira @ 2022-02-09  2:50 UTC (permalink / raw)
  To: linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

Hello,

We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.

When running an fio job targeting directly the fabrics devices (no filesystem, see script at the end), within a minute or so we start seeing errors like this:

[  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
[  408.380235] nvme nvme15: starting error recovery
[  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[  408.380246] block nvme15n2: no usable path - requeuing I/O
[  408.380284] block nvme15n5: no usable path - requeuing I/O
[  408.380298] block nvme15n1: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380330] block nvme15n1: no usable path - requeuing I/O
[  408.380350] block nvme15n2: no usable path - requeuing I/O
[  408.380371] block nvme15n6: no usable path - requeuing I/O
[  408.380377] block nvme15n6: no usable path - requeuing I/O
[  408.380382] block nvme15n4: no usable path - requeuing I/O
[  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  415.131898] nvmet: ctrl 1 fatal error occurred!

Occasionally, we've seen the following stack trace:

[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
[ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
[ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
[ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
[ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
[ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
[ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
[ 1158.542419] Call Trace:
[ 1158.544877]  amd_iommu_unmap+0x2c/0x40
[ 1158.548653]  __iommu_unmap+0xc4/0x170
[ 1158.552344]  iommu_unmap_fast+0xe/0x10
[ 1158.556100]  __iommu_dma_unmap+0x85/0x120
[ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027]  process_one_work+0x220/0x3c0
[ 1158.598038]  worker_thread+0x4d/0x3f0
[ 1158.601696]  kthread+0x114/0x150
[ 1158.604928]  ? process_one_work+0x3c0/0x3c0
[ 1158.609114]  ? kthread_park+0x90/0x90
[ 1158.612783]  ret_from_fork+0x22/0x30

We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.

We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016 [2] suggested that disabling some kernel debug options could work around the "local protection error", but that didn't help either.

As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.

Any suggestions are appreciated.

Thanks,
Martin

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
[2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/

fio script:
[global]
name=fio-seq-write
rw=write
bs=1M
direct=1
numjobs=32
time_based
group_reporting=1
runtime=18000
end_fsync=1
size=10G
ioengine=libaio
iodepth=16

[file1]
filename=/dev/nvme0n1

[file2]
filename=/dev/nvme0n2

[file3]
filename=/dev/nvme0n3

[file4]
filename=/dev/nvme0n4

[file5]
filename=/dev/nvme0n5

[file6]
filename=/dev/nvme0n6

[file7]
filename=/dev/nvme0n7

[file8]
filename=/dev/nvme0n8

[file9]
filename=/dev/nvme0n9

[file10]
filename=/dev/nvme0n10

[file11]
filename=/dev/nvme0n11

[file12]
filename=/dev/nvme0n12

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
@ 2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
  -1 siblings, 0 replies; 18+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-09  8:41 UTC (permalink / raw)
  To: Martin Oliveira
  Cc: linux-nvme, linux-rdma, Kelly Ursenbach, Logan Gunthorpe, Lee,
	Jason, iommu

On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> 

Thanks for reporting this, if you can bisect the problem on your setup
it will help others to help you better.

-ck


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
@ 2022-02-09 12:48   ` Robin Murphy
  -1 siblings, 0 replies; 18+ messages in thread
From: Robin Murphy @ 2022-02-09 12:48 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

On 2022-02-09 02:50, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> 
> When running an fio job targeting directly the fabrics devices (no filesystem, see script at the end), within a minute or so we start seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:

FWIW this is indicative that the scatterlist passed to dma_unmap_sg_attrs() 
was wrong - specifically it looks like an attempt to unmap a region 
that's already unmapped (or was never mapped in the first place). 
Whatever race or data corruption issue is causing that is almost 
certainly happening much earlier, since the IO_PAGE_FAULT logs further 
imply that either some pages have been spuriously unmapped while the 
device was still accessing them, or some DMA address in the scatterlist 
was already bogus by the time it was handed off to the device.
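
One quick way to tell those two cases apart is to group the faulting 
addresses from the log into runs of consecutive pages: long runs look 
like a whole buffer that was unmapped (or never mapped), while isolated 
scattered pages look more like corrupt addresses. A minimal Python 
sketch of that grouping - illustrative only, it assumes the AMD-Vi log 
wording quoted above and 4 KiB IOMMU pages:

#!/usr/bin/env python3
# Group AMD-Vi IO_PAGE_FAULT addresses from a dmesg capture into runs of
# consecutive 4 KiB pages.  Sketch only: assumes the exact
# "AMD-Vi: Event logged [IO_PAGE_FAULT ... address=0x...]" wording above.
import re
import sys

PAGE = 0x1000
FAULT_RE = re.compile(r"IO_PAGE_FAULT.*?address=0x([0-9a-fA-F]+)")

def page_runs(addresses):
    # Collapse the faulting addresses into [start, end] runs of adjacent pages.
    pages = sorted({a & ~(PAGE - 1) for a in addresses})
    runs = []
    for p in pages:
        if runs and p == runs[-1][1] + PAGE:
            runs[-1][1] = p      # extends the previous run
        else:
            runs.append([p, p])  # starts a new run
    return runs

def main():
    addrs = []
    with open(sys.argv[1] if len(sys.argv) > 1 else "/dev/stdin") as f:
        for line in f:
            m = FAULT_RE.search(line)
            if m:
                addrs.append(int(m.group(1), 16))
    for start, end in page_runs(addrs):
        npages = (end - start) // PAGE + 1
        print(f"0x{start:012x}-0x{end + PAGE - 1:012x}  ({npages} page(s))")

if __name__ == "__main__":
    main()

Fed the three faults quoted above it would report a single 3-page run 
at 0x24d08000-0x24d0afff, which looks more like a whole buffer 
disappearing than like random garbage addresses.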

Robin.

> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016[2] suggested that disabling some kernel debug options could workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
@ 2022-02-10 23:58     ` Martin Oliveira
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin Oliveira @ 2022-02-10 23:58 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme, linux-rdma, Kelly Ursenbach, Logan Gunthorpe, Lee,
	Jason, iommu

On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
> On 2/8/22 6:50 PM, Martin Oliveira wrote:
> > Hello,
> >
> > We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> >
> > Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> >
> 
> Thanks for reporting this, if you can bisect the problem on your setup
> it will help others to help you better.
> 
> -ck

Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.

Thanks,
Martin


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-10 23:58     ` Martin Oliveira
@ 2022-02-11 11:35       ` Robin Murphy
  -1 siblings, 0 replies; 18+ messages in thread
From: Robin Murphy @ 2022-02-11 11:35 UTC (permalink / raw)
  To: Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this, if you can bisect the problem on your setup
>> it will help others to help you better.
>>
>> -ck
> 
> Hi Chaitanya,
> 
> I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
> 
> I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
> 
> I'd be happy to try any tests if someone has any suggestions.

The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0xffff24d08000.

That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.
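
As a rough illustration of that cross-check: once the two tracepoints 
have been recorded to a text file (e.g. with "trace-cmd record -e 
iommu:map -e iommu:unmap" and "trace-cmd report"), a small script can 
replay every map/unmap event that covered the faulting address and 
report whether it was still mapped at the end. This is only a sketch - 
it assumes the "map: IOMMU: iova=0x... size=..." / "unmap: IOMMU: 
iova=0x... size=..." line format of the iommu tracepoints, so the regex 
may need adjusting for the running kernel:

#!/usr/bin/env python3
# Replay iommu:map/iommu:unmap trace events that cover a given DMA address
# and report whether it was still mapped after the last such event.
# Sketch only: the regex assumes lines containing
#   "map: IOMMU: iova=0x... paddr=0x... size=..."        and
#   "unmap: IOMMU: iova=0x... size=... unmapped_size=..."
import re
import sys

EVENT_RE = re.compile(
    r"\b(?P<ev>map|unmap): IOMMU: iova=0x(?P<iova>[0-9a-fA-F]+).*?size=(?P<size>\d+)")

def main():
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} <fault-address-hex> <trace-file>")
    fault = int(sys.argv[1], 16)
    mapped = False
    history = []
    with open(sys.argv[2]) as f:
        for line in f:
            m = EVENT_RE.search(line)
            if not m:
                continue
            iova, size = int(m["iova"], 16), int(m["size"])
            if iova <= fault < iova + size:   # this event covered the fault address
                mapped = (m["ev"] == "map")
                history.append(line.rstrip())
    if not history:
        print(f"no map/unmap event ever covered 0x{fault:x}")
        return
    for h in history:
        print(h)
    state = "still mapped" if mapped else "already unmapped"
    print(f"-> 0x{fault:x} was {state} after the last event that covered it")

if __name__ == "__main__":
    main()

Invoked as, say, "python3 check_iova.py 24d08000 trace.txt" (the script 
name is just whatever the sketch is saved under), it should show whether 
the faulting IOVA was torn down before the device was done with it or 
was never a valid mapping at all.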

Robin.


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-11 11:35       ` Robin Murphy
@ 2022-05-17  8:26         ` Mark Ruijter
  -1 siblings, 0 replies; 18+ messages in thread
From: Mark Ruijter @ 2022-05-17  8:26 UTC (permalink / raw)
  To: Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

Hi Robin,

I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" <linux-nvme-bounces@lists.infradead.org on behalf of robin.murphy@arm.com> wrote:

    On 2022-02-10 23:58, Martin Oliveira wrote:
    > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
    >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
    >>> Hello,
    >>>
    >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
    >>>
    >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
    >>>
    >>
    >> Thanks for reporting this, if you can bisect the problem on your setup
    >> it will help others to help you better.
    >>
    >> -ck
    > 
    > Hi Chaitanya,
    > 
    > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
    > 
    > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
    > 
    > I'd be happy to try any tests if someone has any suggestions.

    The IOMMU is probably your friend here - one thing that might be worth 
    trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
    the address reported in subsequent IOMMU faults was previously mapped as 
    a valid DMA address (be warned that there will likely be a *lot* of 
    trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
    should also make it easier to tell real DMA IOVAs from rogue physical 
    addresses or other nonsense, as real DMA addresses should then look more 
    like 0xffff24d08000.

    That could at least help narrow down whether it's some kind of 
    use-after-free race or a completely bogus address creeping in somehow.

    Robin.




* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-05-17  8:26         ` Mark Ruijter
@ 2022-05-17 11:16           ` Max Gurtovoy via iommu
  -1 siblings, 0 replies; 18+ messages in thread
From: Max Gurtovoy @ 2022-05-17 11:16 UTC (permalink / raw)
  To: Mark Ruijter, Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
  Cc: Kelly Ursenbach, linux-rdma, Lee, Jason, linux-nvme, iommu,
	Logan Gunthorpe

Hi,

Can you please send the original scenario, setup details and dumps?

I can't find them in my mailbox.

You can send them directly to me to avoid spam.

-Max.

On 5/17/2022 11:26 AM, Mark Ruijter wrote:
> Hi Robin,
>
> I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6.
>
> [ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
> [ 4879.122015] nvme nvme0: starting error recovery
> [ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
> [ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
> [ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...
>
> I assume this means that the problem has still not been resolved?
> If so, I'll try to diagnose the problem.
>
> Thanks,
>
> --Mark
>
> On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" <linux-nvme-bounces@lists.infradead.org on behalf of robin.murphy@arm.com> wrote:
>
>      On 2022-02-10 23:58, Martin Oliveira wrote:
>      > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>      >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>      >>> Hello,
>      >>>
>      >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>      >>>
>      >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>      >>>
>      >>
>      >> Thanks for reporting this, if you can bisect the problem on your setup
>      >> it will help others to help you better.
>      >>
>      >> -ck
>      >
>      > Hi Chaitanya,
>      >
>      > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
>      >
>      > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
>      >
>      > I'd be happy to try any tests if someone has any suggestions.
>
>      The IOMMU is probably your friend here - one thing that might be worth
>      trying is capturing the iommu:map and iommu:unmap tracepoints to see if
>      the address reported in subsequent IOMMU faults was previously mapped as
>      a valid DMA address (be warned that there will likely be a *lot* of
>      trace generated). With 5.13 or newer, booting with "iommu.forcedac=1"
>      should also make it easier to tell real DMA IOVAs from rogue physical
>      addresses or other nonsense, as real DMA addresses should then look more
>      like 0xffff24d08000.
>
>      That could at least help narrow down whether it's some kind of
>      use-after-free race or a completely bogus address creeping in somehow.
>
>      Robin.
>
>


* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
                   ` (2 preceding siblings ...)
  (?)
@ 2024-01-31  5:34 ` Arthur Muller
  2024-01-31 13:18   ` Christoph Hellwig
  -1 siblings, 1 reply; 18+ messages in thread
From: Arthur Muller @ 2024-01-31  5:34 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

Dear all,

We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:

https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes

we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
experimented with the latest MLNX_EN 23.10 driver, encountering the
same issue. 

We are now running kernel 5.15.0. This problem first appeared after
upgrading our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS and,
thus, from kernel 5.4 to 5.15. Unfortunately, we are not completely
sure which MLNX_EN driver version we ran prior to the upgrade, but we
strongly assume it was <= 5.8.

The failure sequence appears to start with the following error
message:

mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]

This error results in timeouts and read operation faults within the
Lustre file system on both the client and storage ends, potentially
leading to a complete storage failure. 

The IO_PAGE_FAULT error is often, though not always, followed by a
"local protection error". The error tends to manifest after a prolonged
period of constant network activity, typically after approximately one
day of continuous read and write operations involving a Postgres
database on the file system.

Regrettably, we have been unable to trigger this error deliberately.
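
Since the fault only shows up after long periods of load and cannot be
triggered on demand, one low-effort option is to leave a small watcher
running that wall-clock-timestamps each occurrence, so it can later be
correlated with Postgres/Lustre activity. A minimal sketch - it shells
out to "dmesg --follow" (normally requires root) and matches the log
wording quoted in this thread:

#!/usr/bin/env python3
# Wall-clock-timestamp IO_PAGE_FAULT / "local protection error" kernel
# messages as they appear.  Sketch only: relies on "dmesg --follow" and
# on the log wording quoted in this thread.
import datetime
import re
import subprocess

PATTERN = re.compile(r"IO_PAGE_FAULT|local protection error|dump error cqe")

def main():
    proc = subprocess.Popen(["dmesg", "--follow"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        if PATTERN.search(line):
            now = datetime.datetime.now().isoformat(timespec="seconds")
            print(f"{now}  {line.strip()}", flush=True)

if __name__ == "__main__":
    main()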

Kind regards,
Arthur Müller


On Wed, 2022-02-09 at 02:50 +0000, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup,
> using the mlx5 driver and we are wondering if anyone has seen
> anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over
> RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are
> exposed as a single NVMe fabrics device, one physical SSD per
> namespace.
> 
> When running an fio job targeting directly the fabrics devices (no
> filesystem, see script at the end), within a minute or so we start
> seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0):
> WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error
> cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8
> e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed
> with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:
> 
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted:
> P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS
> R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48
> 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85
> f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90
> 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX:
> 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI:
> 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09:
> 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15:
> 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000)
> knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4:
> 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling
> the IOMMU could help, but even after I disabled it (amd_iommu=off
> iommu=off) I still get errors (nvme IO timeouts). Another thread from
> 2016[2] suggested that disabling some kernel debug options could
> workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job
> targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]:
> https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2022-02-09  2:50 ` Martin Oliveira
                   ` (3 preceding siblings ...)
  (?)
@ 2024-01-31  9:18 ` Arthur Muller
  -1 siblings, 0 replies; 18+ messages in thread
From: Arthur Muller @ 2024-01-31  9:18 UTC (permalink / raw)
  To: Martin Oliveira, linux-nvme, iommu, linux-rdma
  Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe

This is a re-send of an earlier email that bounced from the old iommu
list; it now goes to the new iommu@lists.linux.dev address instead.
The first message was submitted to lore and is archived at:

https://lore.kernel.org/all/9a40e66eb8ffc48a2e3765cf77f49914d57c55e7.camel@gmx.net/




Dear all,

We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:

https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes

we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
experimented with the latest MLNX_EN 23.10 driver, encountering the
same issue. 

We are now on kernel 5.15.0. The problem first appeared after
upgrading our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS and
thus from kernel 5.4 to 5.15. Unfortunately, we are not completely
sure which MLNX_EN driver version was installed prior to the upgrade,
but we strongly assume it was <= 5.8.
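
As an aside, since we are not sure which driver was in place before
the upgrade: a quick way to confirm whether the in-tree mlx5_core or
the MLNX_EN out-of-tree build is the one actually loaded is something
like the following (the interface name is only a placeholder for the
actual mlx5 port):

  ethtool -i <mlx5-interface>    # driver, version, firmware-version
  modinfo mlx5_core | grep -E '^(filename|vermagic)'
  cat /proc/cmdline              # which iommu/amd_iommu options are in effect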

The failure sequence appears to start with the following error
message:

mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]

This error results in timeouts and failed read operations in the
Lustre file system on both the client and the storage side, and can
ultimately lead to a complete storage failure.

The IO_PAGE_FAULT event is often, though not always, followed by a
"local protection error". The error tends to manifest after a
prolonged period of constant network activity, typically after
roughly one day of continuous read and write operations against a
Postgres database on the file system.
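
Because of that long lead time, the simplest thing we can suggest for
catching the first fault with wall-clock timestamps is to leave a
plain dmesg follower running on both the client and the storage side,
e.g.:

  dmesg -wT | grep -E 'AMD-Vi|IO_PAGE_FAULT|local protection error'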

Regrettably, we have not been able to trigger this error on demand.
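
For anyone who wants to try to reproduce this synthetically, a fio
job in the spirit of Martin's script, but pointed at the Lustre mount
instead of raw namespaces, might look roughly like the sketch below.
The mount point, block size and read/write mix are placeholders for
our Postgres-like load; we have not verified that this particular job
triggers the fault.

[global]
name=lustre-mixed-load
rw=randrw
rwmixread=70
bs=8k
direct=1
numjobs=16
time_based
group_reporting=1
runtime=86400
size=20G
ioengine=libaio
iodepth=16
; placeholder: replace with the actual Lustre mount point
directory=/mnt/lustre/fio-test

[pg-like]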

Kind regards,
Arthur Müller


On Wed, 2022-02-09 at 02:50 +0000, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup,
> using the mlx5 driver and we are wondering if anyone has seen
> anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over
> RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are
> exposed as a single NVMe fabrics device, one physical SSD per
> namespace.
> 
> When running an fio job targeting directly the fabrics devices (no
> filesystem, see script at the end), within a minute or so we start
> seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0):
> WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error
> cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8
> e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed
> with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:
> 
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted:
> P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS
> R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48
> 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85
> f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90
> 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX:
> 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI:
> 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09:
> 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15:
> 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000)
> knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4:
> 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling
> the IOMMU could help, but even after I disabled it (amd_iommu=off
> iommu=off) I still get errors (nvme IO timeouts). Another thread from
> 2016[2] suggested that disabling some kernel debug options could
> workaround the "local protection error" but that didn't help either.
> 
> As far as I can tell, the disks are fine, as running the same fio job
> targeting the real physical devices works fine.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]:
> https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2024-01-31  5:34 ` Arthur Muller
@ 2024-01-31 13:18   ` Christoph Hellwig
  2024-01-31 14:01     ` Arthur Muller
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-01-31 13:18 UTC (permalink / raw)
  To: Arthur Muller
  Cc: Martin Oliveira, linux-nvme, iommu, linux-rdma, Kelly Ursenbach,
	Lee, Jason, Logan Gunthorpe

On Wed, Jan 31, 2024 at 06:34:00AM +0100, Arthur Muller wrote:
> Dear all,
> 
> We've encountered a similar issue. In our case, we are using the Lustre
> file system instead of NVMe-oF to connect our storage over the network.
> Our setup involves an AMD EPYC 7282 machine paired with Mellanox
> MT28908 cards. Following the guidelines in the Nvidia documentation:
> 
> https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes
> 
> we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
> experimented with the latest MLNX_EN 23.10 driver, encountering the
> same issue. 

If you use the nvidia out of tree junk you are completely on your own
and have no one to blame but yourself.  Any problems with that do not
belong on a Linux mailing list.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
  2024-01-31 13:18   ` Christoph Hellwig
@ 2024-01-31 14:01     ` Arthur Muller
  0 siblings, 0 replies; 18+ messages in thread
From: Arthur Muller @ 2024-01-31 14:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin Oliveira, linux-nvme, iommu, linux-rdma, Kelly Ursenbach,
	Lee, Jason, Logan Gunthorpe

On Wed, 2024-01-31 at 05:18 -0800, Christoph Hellwig wrote:
> On Wed, Jan 31, 2024 at 06:34:00AM +0100, Arthur Muller wrote:
> > Dear all,
> > 
> > We've encountered a similar issue. In our case, we are using the
> > Lustre
> > file system instead of NVMe-oF to connect our storage over the
> > network.
> > Our setup involves an AMD EPYC 7282 machine paired with Mellanox
> > MT28908 cards. Following the guidelines in the Nvidia
> > documentation:
> > 
> > https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes
> > 
> > we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
> > experimented with the latest MLNX_EN 23.10 driver, encountering the
> > same issue. 
> 
> If you use the nvidia out of tree junk you are completely on your own
> and have no one to blame but yourself.  Any problems with that do not
> belong on a Linux mailing list.
> 


Dear Christoph,

thank you very much for your reply pointing me to the possible cause
of the problem. I am not blaming anybody. Based on the history of
this thread I assumed there might be an unsolved IOMMU issue and just
provided some context, hoping I could help debug it. There are some
IOMMU-related threads on the kernel Bugzilla, and the setup mentioned
there was the most similar to ours.

Kind regards,
Arthur Müller

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-01-31 14:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09  2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
2022-02-09  2:50 ` Martin Oliveira
2022-02-09  8:41 ` Chaitanya Kulkarni
2022-02-09  8:41   ` Chaitanya Kulkarni via iommu
2022-02-10 23:58   ` Martin Oliveira
2022-02-10 23:58     ` Martin Oliveira
2022-02-11 11:35     ` Robin Murphy
2022-02-11 11:35       ` Robin Murphy
2022-05-17  8:26       ` Mark Ruijter
2022-05-17  8:26         ` Mark Ruijter
2022-05-17 11:16         ` Max Gurtovoy
2022-05-17 11:16           ` Max Gurtovoy via iommu
2022-02-09 12:48 ` Robin Murphy
2022-02-09 12:48   ` Robin Murphy
2024-01-31  5:34 ` Arthur Muller
2024-01-31 13:18   ` Christoph Hellwig
2024-01-31 14:01     ` Arthur Muller
2024-01-31  9:18 ` Arthur Muller
