Linux-NFS Archive on lore.kernel.org
* AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
@ 2020-02-11  5:06 Andre Tomt
  2020-02-11  7:25 ` Joerg Roedel
  0 siblings, 1 reply; 10+ messages in thread
From: Andre Tomt @ 2020-02-11  5:06 UTC
  To: linux-nfs, Joerg Roedel, Tom Murphy, iommu

Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
has stopped working, but only on my AMD Ryzen systems. So far only NFS is
affected; curiously, other RDMA diagnostic tools (like qperf <ip> -cm1 rc_bw)
work fine.

A git bisect points to commit be62dbf554c5b50718a54a359372c148cd9975c7
("iommu/amd: Convert AMD iommu driver to the dma-iommu api").

5.5.3-rc1 and 5.6-rc1 do not work either.

I verified it by booting with amd_iommu=off on the kernel cmdline - it 
makes everything work again.

The NFS config is a pretty simple NFSv4.x-only, sec=sys setup, running
over RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000).
Nothing fancy besides RoCEv1 and the related network bits like PFC
and a storage VLAN. Bare metal, no virtualization.

The impacted systems are:
ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201

I can provide pcaps from a mirror port. They show that on 5.5.x, CM
succeeds and a couple of NFS NULL calls come through (over RoCE),
both acked; after that nothing more goes out from the client until
the mount times out and the connection is torn down.

No messages show up in the kernel log on either side. I was at least
expecting some scary IOMMU warnings.

More serious hardware is not available for RDMA testing currently, so I
don't know if an EPYC system or newer mlx5 cards would have similar
issues. On Intel I've only tested as a server so far; that worked fine,
as expected given the bisect result.


> git bisect start
> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
> git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
> # bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api
> git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
> # good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter to iommu_ops::map
> git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
> # good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the dev->coherent_dma_mask
> git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
> # first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api


* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11  5:06 AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected) Andre Tomt
@ 2020-02-11  7:25 ` Joerg Roedel
  2020-02-11 13:48   ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Joerg Roedel @ 2020-02-11  7:25 UTC
  To: Andre Tomt, Tom Murphy; +Cc: linux-nfs, iommu

Adding Tom's new email address.

Tom, can you have a look, please? 
https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
issue.



* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11  7:25 ` Joerg Roedel
@ 2020-02-11 13:48   ` Chuck Lever
  2020-02-11 15:12     ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-11 13:48 UTC
  To: Andre Tomt, Tom Murphy; +Cc: Linux NFS Mailing List, Joerg Roedel, iommu

Andre-

Thank you for the detailed report!

Tom-

There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
Please keep me in the loop, let me know if there is anything I can do to help.



--
Chuck Lever





* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 13:48   ` Chuck Lever
@ 2020-02-11 15:12     ` Robin Murphy
  2020-02-11 15:24       ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2020-02-11 15:12 UTC
  To: Chuck Lever, Andre Tomt, Tom Murphy
  Cc: Linux NFS Mailing List, Joerg Roedel, iommu

On 11/02/2020 1:48 pm, Chuck Lever wrote:
> Andre-
> 
> Thank you for the detailed report!
> 
> Tom-
> 
> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
> Please keep me in the loop, let me know if there is anything I can do to help.

One aspect that may be worth checking is whether there's anywhere that 
assumes a successful return value from dma_map_sg() is always the same 
as the number of entries passed in - that's the most obvious way the 
iommu-dma code differs (legitimately) from the previous amd-iommu 
implementation.
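
To make the return-value point above concrete, here is a minimal userspace
sketch in C. It does not use the kernel DMA API: struct toy_sg, toy_map_sg()
and toy_total_dma_len() are hypothetical names, and the merge rule (join
entries whose addresses happen to be adjacent) is a simplifying assumption,
not how iommu-dma actually decides what to merge.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a scatterlist entry; NOT the kernel's struct scatterlist. */
struct toy_sg {
	uint64_t addr;     /* CPU-side buffer address (input) */
	uint32_t length;   /* CPU-side buffer length (input, never modified) */
	uint64_t dma_addr; /* device-side address (written by the mapping) */
	uint32_t dma_len;  /* device-side length (written by the mapping) */
};

/*
 * Hypothetical model of a merging map call. Entries whose buffers are
 * adjacent are packed into one device-side segment, so the return value
 * (number of device-side segments) can be smaller than nents.
 * A return value of 0 stands for failure.
 */
static int toy_map_sg(struct toy_sg *sg, int nents)
{
	int mapped = 0;

	for (int i = 0; i < nents; i++) {
		struct toy_sg *prev = mapped ? &sg[mapped - 1] : NULL;

		if (prev && prev->dma_addr + prev->dma_len == sg[i].addr) {
			/* Adjacent to the previous segment: extend it. */
			prev->dma_len += sg[i].length;
		} else {
			/* Start a new device-side segment. */
			sg[mapped].dma_addr = sg[i].addr;
			sg[mapped].dma_len = sg[i].length;
			mapped++;
		}
	}
	return mapped;
}

/* Walk the device-side view: only the first `mapped` entries are valid. */
static uint64_t toy_total_dma_len(const struct toy_sg *sg, int mapped)
{
	uint64_t total = 0;

	for (int i = 0; i < mapped; i++)
		total += sg[i].dma_len;
	return total;
}
```

In this model, three entries of which the first two are adjacent come back
as two segments, and the first entry's dma_len is larger than its length.
Code that walks the device-side view has to use the returned count, while
the total number of bytes mapped stays the same.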

Robin.



* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 15:12     ` Robin Murphy
@ 2020-02-11 15:24       ` Chuck Lever
  2020-02-11 15:32         ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-11 15:24 UTC
  To: Robin Murphy
  Cc: Andre Tomt, Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu



> On Feb 11, 2020, at 10:12 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>> Andre-
>> Thank you for the detailed report!
>> Tom-
>> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
>> Please keep me in the loop, let me know if there is anything I can do to help.
> 
> One aspect that may be worth checking is whether there's anywhere that assumes a successful return value from dma_map_sg() is always the same as the number of entries passed in - that's the most obvious way the iommu-dma code differs (legitimately) from the previous amd-iommu implementation.

net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()

        mr->mr_nents =
                ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
        if (!mr->mr_nents)
                goto out_dmamap_err;

Should that rather be "if (mr->mr_nents != i)" ?



--
Chuck Lever





* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 15:24       ` Chuck Lever
@ 2020-02-11 15:32         ` Robin Murphy
  2020-02-11 16:03           ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2020-02-11 15:32 UTC
  To: Chuck Lever
  Cc: Andre Tomt, Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu

On 11/02/2020 3:24 pm, Chuck Lever wrote:
> 
> 
>> On Feb 11, 2020, at 10:12 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>> Andre-
>>> Thank you for the detailed report!
>>> Tom-
>>> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
>>> Please keep me in the loop, let me know if there is anything I can do to help.
>>
>> One aspect that may be worth checking is whether there's anywhere that assumes a successful return value from dma_map_sg() is always the same as the number of entries passed in - that's the most obvious way the iommu-dma code differs (legitimately) from the previous amd-iommu implementation.
> 
> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
> 
>         mr->mr_nents =
>                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>         if (!mr->mr_nents)
>                 goto out_dmamap_err;
> 
> Should that rather be "if (mr->mr_nents != i)" ?

No, that much is OK - the point is that dma_map_sg() may pack the DMA
addresses such that sg_dma_len(sg) > sg->length. However, subsequently
passing that mr->mr_nents to dma_unmap_sg() in frwr_mr_recycle() (rather
than the original value of i) looks at a glance like an example of how
things may start to get out of whack.
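
A rough userspace sketch in C of the bookkeeping this implies. It does not
use the kernel API: struct toy_mr, struct toy_iommu, toy_mr_map() and
toy_unmap() are hypothetical, and the live-entry counter is only a stand-in
for whatever state an IOMMU driver really keeps. The one fact taken from the
kernel's DMA API documentation is that dma_unmap_sg() expects the same nents
that was passed to dma_map_sg(), not the value it returned.

```c
#include <stdbool.h>

/* Toy per-region bookkeeping; field names are illustrative only. */
struct toy_mr {
	int sg_nents;  /* entries handed to the map call */
	int dma_nents; /* segments the map call reported back */
};

/* Toy IOMMU state: how many scatterlist entries are still mapped. */
struct toy_iommu {
	int live_entries;
};

/*
 * Hypothetical map step. `merged_to` plays the role of the map call's
 * return value: 0 means failure, anything from 1 to nents is success.
 * Both counts are recorded, because they are needed for different things.
 */
static bool toy_mr_map(struct toy_iommu *mmu, struct toy_mr *mr,
		       int nents, int merged_to)
{
	if (nents <= 0 || merged_to <= 0 || merged_to > nents)
		return false;

	mmu->live_entries += nents;
	mr->sg_nents = nents;      /* use this for the unmap call */
	mr->dma_nents = merged_to; /* use this to walk the DMA segments */
	return true;
}

/* Hypothetical unmap step: releases exactly as many entries as it is told. */
static void toy_unmap(struct toy_iommu *mmu, int nents)
{
	mmu->live_entries -= nents;
}
```

In this model, mapping four entries that merge into two segments and then
unmapping with dma_nents leaves two entries still live, while unmapping with
sg_nents brings the counter back to zero.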

Robin.

>>>> On Feb 11, 2020, at 2:25 AM, Joerg Roedel <jroedel@suse.de> wrote:
>>>>
>>>> Adding Tom's new email address.
>>>>
>>>> Tom, can you have a look, please?
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
>>>> issue.
>>>>
>>>> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>>>>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
>>>>> stopped working. But only on my AMD Ryzen systems. And so far only NFS,
>>>>> curiously other RDMA diagnostic tools (like qperf <ip> -cm1 rc_bw) work
>>>>> fine.
>>>>>
>>>>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7 iommu/amd:
>>>>> Convert AMD iommu driver to the dma-iommu api
>>>>>
>>>>> 5.5.3-rc1, 5.6-rc1 are also not working.
>>>>>
>>>>> I verified it by booting with amd_iommu=off on the kernel cmdline - it makes
>>>>> everything work again.
>>>>>
>>>>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running over
>>>>> RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000). Nothing
>>>>> fancy besides the RoCEv1 and related bits network bits like PFC and storage
>>>>> VLAN. Bare metal, no virtualization.
>>>>>
>>>>> The impacted systems are:
>>>>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>>>>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>>>>>
>>>>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>>>>> succeeds, and then a couple of NFS NULL calls comes through (over RoCE),
>>>>> both acked, and then the rest just never goes out from the client until the
>>>>> mount times out and CM is torn down.
>>>>>
>>>>> No messages shows up in the kernel log on either side. I was at least
>>>>> expecting some scary IOMMU warnings.
>>>>>
>>>>> More serious hardware is not available for RDMA testing currently, so I dont
>>>>> know if a EPYC system or newer mlx5 cards would have similar issues. Intel
>>>>> I've only tested as server so far, that worked fine, as expected given the
>>>>> bisect result.
>>>>>
>>>>>
>>>>>> git bisect start
>>>>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>>>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>>>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>>>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>>>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>>>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>>>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>>>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>>>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>>>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>>>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>>>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
>>>>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>>>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>>>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>>>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
>>>>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>>>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>>>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>>>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
>>>>>> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
>>>>>> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
>>>>>> git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
>>>>>> # bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>>>>> git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
>>>>>> # good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter to iommu_ops::map
>>>>>> git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
>>>>>> # good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the dev->coherent_dma_mask
>>>>>> git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
>>>>>> # first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api


* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 15:32         ` Robin Murphy
@ 2020-02-11 16:03           ` Chuck Lever
  2020-02-11 16:36             ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-11 16:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andre Tomt, Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu



> On Feb 11, 2020, at 10:32 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> 
>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>> Andre-
>>>> Thank you for the detailed report!
>>>> Tom-
>>>> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
>>>> Please keep me in the loop, let me know if there is anything I can do to help.
>>> 
>>> One aspect that may be worth checking is whether there's anywhere that assumes a successful return value from dma_map_sg() is always the same as the number of entries passed in - that's the most obvious way the iommu-dma code differs (legitimately) from the previous amd-iommu implementation.
>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>> 317         mr->mr_nents =
>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>> 319         if (!mr->mr_nents)
>> 320                 goto out_dmamap_err;
>> Should that rather be "if (mr->mr_nents != i)" ?
> 
> No, that much is OK - the point is that dma_map_sg() may pack the DMA addresses such that sg_dma_len(sg) > sg->length. However, subsequently passing that mr->mr_nents to dma_unmap_sg() in frwr_mr_recycle() (rather than the original value of i) looks at a glance like an example of how things may start to get out of whack.

Robin, your explanation makes sense to me. I can post a fix for this imbalance later today for Andre to try.
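
The imbalance discussed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_sg, model_map_sg(), model_unmap_sg() and model_roundtrip() are hypothetical names invented for illustration, not kernel APIs, and the merging rule (join entries whose addresses happen to be adjacent) is a simplification of what iommu-dma does. It shows that when a map call returns fewer DMA segments than the number of entries passed in, unmapping with the returned count leaves the tail of the list mapped, while unmapping with the original count does not. The kernel's DMA API documentation states that the nents given to dma_unmap_sg() must be the value passed to dma_map_sg(), not the value it returned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatterlist entry (NOT struct scatterlist). */
struct model_sg {
	uint64_t addr;      /* CPU-side address of the input entry */
	uint32_t length;    /* CPU-side length of the input entry */
	uint64_t dma_addr;  /* DMA segment address, filled in by model_map_sg() */
	uint32_t dma_len;   /* DMA segment length, filled in by model_map_sg() */
	int mapped;         /* bookkeeping: is this input entry still mapped? */
};

/*
 * Map all nents input entries, merging an entry into the previous DMA
 * segment when it starts exactly where that segment ends. Returns the
 * number of DMA segments, which may be smaller than nents.
 */
static int model_map_sg(struct model_sg *sg, int nents)
{
	int count = 0;

	assert(nents > 0);
	for (int i = 0; i < nents; i++) {
		sg[i].mapped = 1;
		if (count > 0 &&
		    sg[count - 1].dma_addr + sg[count - 1].dma_len == sg[i].addr) {
			/* Contiguous: extend the previous DMA segment. */
			sg[count - 1].dma_len += sg[i].length;
		} else {
			/* Not contiguous: start a new DMA segment. */
			sg[count].dma_addr = sg[i].addr;
			sg[count].dma_len = sg[i].length;
			count++;
		}
	}
	return count;
}

/* Unmap the first nents input entries. */
static void model_unmap_sg(struct model_sg *sg, int nents)
{
	for (int i = 0; i < nents; i++)
		sg[i].mapped = 0;
}

/*
 * Map three entries (the first two are adjacent, so they merge), then
 * unmap with either the returned count or the original count.
 * Returns how many input entries are left mapped afterwards.
 */
static int model_roundtrip(int use_returned_count)
{
	struct model_sg sg[3] = {
		{ .addr = 0x1000, .length = 0x1000 },
		{ .addr = 0x2000, .length = 0x1000 },
		{ .addr = 0x8000, .length = 0x1000 },
	};
	const int orig_nents = 3;
	int dma_nents = model_map_sg(sg, orig_nents);
	int leaked = 0;

	model_unmap_sg(sg, use_returned_count ? dma_nents : orig_nents);
	for (int i = 0; i < orig_nents; i++)
		leaked += sg[i].mapped;
	return leaked;
}
```

In this model, model_roundtrip(1) mimics passing the returned count back to the unmap call and leaves one entry mapped; model_roundtrip(0) passes the original count and leaves none.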


--
Chuck Lever





* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 16:03           ` Chuck Lever
@ 2020-02-11 16:36             ` Robin Murphy
  2020-02-11 16:42               ` Chuck Lever
  2020-02-11 17:53               ` Andre Tomt
  0 siblings, 2 replies; 10+ messages in thread
From: Robin Murphy @ 2020-02-11 16:36 UTC (permalink / raw)
  To: Chuck Lever, Andre Tomt
  Cc: Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu

On 11/02/2020 4:03 pm, Chuck Lever wrote:
> 
> 
>> On Feb 11, 2020, at 10:32 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>>> Andre-
>>>>> Thank you for the detailed report!
>>>>> Tom-
>>>>> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
>>>>> Please keep me in the loop, let me know if there is anything I can do to help.
>>>>
>>>> One aspect that may be worth checking is whether there's anywhere that assumes a successful return value from dma_map_sg() is always the same as the number of entries passed in - that's the most obvious way the iommu-dma code differs (legitimately) from the previous amd-iommu implementation.
>>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>>> 317         mr->mr_nents =
>>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>> 319         if (!mr->mr_nents)
>>> 320                 goto out_dmamap_err;
>>> Should that rather be "if (mr->mr_nents != i)" ?
>>
>> No, that much is OK - the point is that dma_map_sg() may pack the DMA addresses such that sg_dma_len(sg) > sg->length. However, subsequently passing that mr->mr_nents to dma_unmap_sg() in frwr_mr_recycle() (rather than the original value of i) looks at a glance like an example of how things may start to get out of whack.
> 
> Robin, your explanation makes sense to me. I can post a fix for this imbalance later today for Andre to try.

FWIW here's a quick hack which *should* suppress the concatenation 
behaviour - if it makes Andre's system any happier then that would 
indeed point towards dma_map_sg() handling being the culprit.

Robin.

----->8-----
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..a6b71bad518e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
  		 * - but doesn't fall at a segment boundary
  		 * - and wouldn't make the resulting output segment too long
  		 */
-		if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
+		if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
  		    (max_len - cur_len >= s_length)) {
  			/* ...then concatenate it with the previous one */
  			cur_len += s_length;
@@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
  		if (s_length + s_iova_off < s_iova_len)
  			cur_len = 0;
  	}
+	WARN_ON(count < nents);
  	return count;
  }
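
For readers following the hunk above, the concatenation test can be modelled as a standalone predicate. This is a rough sketch, not the kernel function: model_can_concatenate() and its suppress flag are hypothetical, the variable names are copied from the diff, and their meanings are inferred only from the comment lines visible in the hunk ("doesn't fall at a segment boundary", "wouldn't make the resulting output segment too long"). Setting suppress plays the role of the "0 &&" added by the hack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the condition shown in the diff. Returns true when the
 * current entry would be concatenated with the previous output segment.
 *
 * suppress   - hypothetical flag standing in for the "0 &&" hack
 * cur_len    - length accumulated in the current output segment
 * s_iova_off - offset of this entry within its IOVA granule
 * dma_addr   - DMA address where this entry would start
 * seg_mask   - segment boundary mask (zero result means "at a boundary")
 * max_len    - maximum allowed output segment length
 * s_length   - length of this entry
 */
static bool model_can_concatenate(bool suppress, uint32_t cur_len,
				  uint32_t s_iova_off, uint64_t dma_addr,
				  uint64_t seg_mask, uint32_t max_len,
				  uint32_t s_length)
{
	assert(cur_len <= max_len);
	return !suppress &&
	       cur_len &&                        /* there is a segment to extend */
	       !s_iova_off &&                    /* entry starts with no offset */
	       (dma_addr & seg_mask) &&          /* not at a segment boundary */
	       (max_len - cur_len >= s_length);  /* result would not be too long */
}
```

With suppress set, the predicate is always false, so every input entry produces its own output segment and the returned count equals nents. That is the situation the added WARN_ON(count < nents) would then confirm.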



* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 16:36             ` Robin Murphy
@ 2020-02-11 16:42               ` Chuck Lever
  2020-02-11 17:53               ` Andre Tomt
  1 sibling, 0 replies; 10+ messages in thread
From: Chuck Lever @ 2020-02-11 16:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andre Tomt, Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu



> On Feb 11, 2020, at 11:36 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:32 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> 
>>> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>> 
>>>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>>>> Andre-
>>>>>> Thank you for the detailed report!
>>>>>> Tom-
>>>>>> There is a rich set of trace points available in the RPC/RDMA implementation in 5.4/5.5, fwiw.
>>>>>> Please keep me in the loop, let me know if there is anything I can do to help.
>>>>> 
>>>>> One aspect that may be worth checking is whether there's anywhere that assumes a successful return value from dma_map_sg() is always the same as the number of entries passed in - that's the most obvious way the iommu-dma code differs (legitimately) from the previous amd-iommu implementation.
>>>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>>>> 317         mr->mr_nents =
>>>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>>> 319         if (!mr->mr_nents)
>>>> 320                 goto out_dmamap_err;
>>>> Should that rather be "if (mr->mr_nents != i)" ?
>>> 
>>> No, that much is OK - the point is that dma_map_sg() may pack the DMA addresses such that sg_dma_len(sg) > sg->length. However, subsequently passing that mr->mr_nents to dma_unmap_sg() in frwr_mr_recycle() (rather than the original value of i) looks at a glance like an example of how things may start to get out of whack.
>> Robin, your explanation makes sense to me. I can post a fix for this imbalance later today for Andre to try.
> 
> FWIW here's a quick hack which *should* suppress the concatenation behaviour - if it makes Andre's system any happier then that would indeed point towards dma_map_sg() handling being the culprit.

Even so, 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
looks like it introduced this problem.


> Robin.
> 
> ----->8-----
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
> 		 * - but doesn't fall at a segment boundary
> 		 * - and wouldn't make the resulting output segment too long
> 		 */
> -		if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +		if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> 		    (max_len - cur_len >= s_length)) {
> 			/* ...then concatenate it with the previous one */
> 			cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
> 		if (s_length + s_iova_off < s_iova_len)
> 			cur_len = 0;
> 	}
> +	WARN_ON(count < nents);
> 	return count;
> }

--
Chuck Lever





* Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
  2020-02-11 16:36             ` Robin Murphy
  2020-02-11 16:42               ` Chuck Lever
@ 2020-02-11 17:53               ` Andre Tomt
  1 sibling, 0 replies; 10+ messages in thread
From: Andre Tomt @ 2020-02-11 17:53 UTC (permalink / raw)
  To: Robin Murphy, Chuck Lever
  Cc: Tom Murphy, Linux NFS Mailing List, Joerg Roedel, iommu

On 11.02.2020 17:36, Robin Murphy wrote:
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>> Robin, your explanation makes sense to me. I can post a fix for this 
>> imbalance later today for Andre to try.
> 
> FWIW here's a quick hack which *should* suppress the concatenation 
> behaviour - if it makes Andre's system any happier then that would 
> indeed point towards dma_map_sg() handling being the culprit.
> 
> Robin.

This hack does indeed make things work again.

> ----->8-----
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>            * - but doesn't fall at a segment boundary
>            * - and wouldn't make the resulting output segment too long
>            */
> -        if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +        if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
>               (max_len - cur_len >= s_length)) {
>               /* ...then concatenate it with the previous one */
>               cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>           if (s_length + s_iova_off < s_iova_len)
>               cur_len = 0;
>       }
> +    WARN_ON(count < nents);
>       return count;
>   }
> 



Thread overview: 10+ messages
2020-02-11  5:06 AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected) Andre Tomt
2020-02-11  7:25 ` Joerg Roedel
2020-02-11 13:48   ` Chuck Lever
2020-02-11 15:12     ` Robin Murphy
2020-02-11 15:24       ` Chuck Lever
2020-02-11 15:32         ` Robin Murphy
2020-02-11 16:03           ` Chuck Lever
2020-02-11 16:36             ` Robin Murphy
2020-02-11 16:42               ` Chuck Lever
2020-02-11 17:53               ` Andre Tomt
