Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/2] Fix NFS/RDMA operation with Ryzen IOMMU
@ 2020-02-12 16:12 Chuck Lever
  2020-02-12 16:12 ` [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance Chuck Lever
  2020-02-12 16:12 ` [PATCH v3 2/2] xprtrdma: Enhance MR-related trace points Chuck Lever
  0 siblings, 2 replies; 10+ messages in thread
From: Chuck Lever @ 2020-02-12 16:12 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-nfs, linux-rdma

Hi Anna-

Here's a fix for the bug reported yesterday morning, plus an
observability enhancement to help diagnose future similar bugs. The
fix needs to go into 5.6-rc and 5.5-stable as soon as you can. The
observability change can go in wherever you see fit. Thanks!


Changes since v2:
- Fix now passes the usual gamut of static checks
- Improve MR trace points even further
- Split out the trace point change; it does not need to be
  backported

Changes since v1:
- Ensure the correct nents value is passed to ib_map_mr_sg
- Record the mr_nents value in the MR trace points

---

Chuck Lever (2):
      xprtrdma: Fix DMA scatter-gather list mapping imbalance
      xprtrdma: Enhance MR-related trace points


 include/trace/events/rpcrdma.h |   56 +++++++++++++++++++++-------------------
 net/sunrpc/xprtrdma/frwr_ops.c |   15 ++++++-----
 2 files changed, 38 insertions(+), 33 deletions(-)

--
Chuck Lever


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 16:12 [PATCH v3 0/2] Fix NFS/RDMA operation with Ryzen IOMMU Chuck Lever
@ 2020-02-12 16:12 ` Chuck Lever
  2020-02-12 18:26   ` Jason Gunthorpe
  2020-02-12 16:12 ` [PATCH v3 2/2] xprtrdma: Enhance MR-related trace points Chuck Lever
  1 sibling, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-12 16:12 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-nfs, linux-rdma

The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
concatenate sg entries, it will return a different nents value than
it was passed.

The bug was exposed by recent changes to the AMD IOMMU driver, which
enabled sg entry concatenation.

Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
new memory registration API") and reviewing other kernel ULPs, it's
not clear that the frwr_map() logic was ever correct for this case.

Reported-by: Andre Tomt <andre@tomt.net>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.5
---
 net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 095be887753e..125297c9aa3e 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -288,8 +288,8 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 {
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct ib_reg_wr *reg_wr;
+	int i, n, dma_nents;
 	struct ib_mr *ibmr;
-	int i, n;
 	u8 key;
 
 	if (nsegs > ia->ri_max_frwr_depth)
@@ -313,15 +313,16 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 			break;
 	}
 	mr->mr_dir = rpcrdma_data_dir(writing);
+	mr->mr_nents = i;
 
-	mr->mr_nents =
-		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
-	if (!mr->mr_nents)
+	dma_nents = ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, mr->mr_nents,
+				  mr->mr_dir);
+	if (!dma_nents)
 		goto out_dmamap_err;
 
 	ibmr = mr->frwr.fr_mr;
-	n = ib_map_mr_sg(ibmr, mr->mr_sg, mr->mr_nents, NULL, PAGE_SIZE);
-	if (unlikely(n != mr->mr_nents))
+	n = ib_map_mr_sg(ibmr, mr->mr_sg, dma_nents, NULL, PAGE_SIZE);
+	if (n != dma_nents)
 		goto out_mapmr_err;
 
 	ibmr->iova &= 0x00000000ffffffff;



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 2/2] xprtrdma: Enhance MR-related trace points
  2020-02-12 16:12 [PATCH v3 0/2] Fix NFS/RDMA operation with Ryzen IOMMU Chuck Lever
  2020-02-12 16:12 ` [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance Chuck Lever
@ 2020-02-12 16:12 ` Chuck Lever
  1 sibling, 0 replies; 10+ messages in thread
From: Chuck Lever @ 2020-02-12 16:12 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-nfs, linux-rdma

Two changes:
- Show the number of SG entries that were mapped. This helps debug
  DMA-related problems.
- Record the MR's resource ID instead of its memory address. This
  groups each MR with its associated rdma-tool output, and reduces
  needless exposure of memory addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/rpcrdma.h |   56 +++++++++++++++++++++-------------------
 net/sunrpc/xprtrdma/frwr_ops.c |    2 +
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index c0e4c93324f5..87f4461ab108 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -228,20 +228,20 @@ DECLARE_EVENT_CLASS(xprtrdma_frwr_done,
 	TP_ARGS(wc, frwr),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
 		__field(unsigned int, status)
 		__field(unsigned int, vendor_err)
 	),
 
 	TP_fast_assign(
-		__entry->mr = container_of(frwr, struct rpcrdma_mr, frwr);
+		__entry->mr_id = frwr->fr_mr->res.id;
 		__entry->status = wc->status;
 		__entry->vendor_err = __entry->status ? wc->vendor_err : 0;
 	),
 
 	TP_printk(
-		"mr=%p: %s (%u/0x%x)",
-		__entry->mr, rdma_show_wc_status(__entry->status),
+		"mr.id=%u: %s (%u/0x%x)",
+		__entry->mr_id, rdma_show_wc_status(__entry->status),
 		__entry->status, __entry->vendor_err
 	)
 );
@@ -274,7 +274,8 @@ DECLARE_EVENT_CLASS(xprtrdma_mr,
 	TP_ARGS(mr),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
+		__field(int, nents)
 		__field(u32, handle)
 		__field(u32, length)
 		__field(u64, offset)
@@ -282,15 +283,16 @@ DECLARE_EVENT_CLASS(xprtrdma_mr,
 	),
 
 	TP_fast_assign(
-		__entry->mr = mr;
+		__entry->mr_id  = mr->frwr.fr_mr->res.id;
+		__entry->nents  = mr->mr_nents;
 		__entry->handle = mr->mr_handle;
 		__entry->length = mr->mr_length;
 		__entry->offset = mr->mr_offset;
 		__entry->dir    = mr->mr_dir;
 	),
 
-	TP_printk("mr=%p %u@0x%016llx:0x%08x (%s)",
-		__entry->mr, __entry->length,
+	TP_printk("mr.id=%u nents=%d %u@0x%016llx:0x%08x (%s)",
+		__entry->mr_id, __entry->nents, __entry->length,
 		(unsigned long long)__entry->offset, __entry->handle,
 		xprtrdma_show_direction(__entry->dir)
 	)
@@ -920,17 +922,17 @@ TRACE_EVENT(xprtrdma_frwr_alloc,
 	TP_ARGS(mr, rc),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
 		__field(int, rc)
 	),
 
 	TP_fast_assign(
-		__entry->mr = mr;
-		__entry->rc	= rc;
+		__entry->mr_id = mr->frwr.fr_mr->res.id;
+		__entry->rc = rc;
 	),
 
-	TP_printk("mr=%p: rc=%d",
-		__entry->mr, __entry->rc
+	TP_printk("mr.id=%u: rc=%d",
+		__entry->mr_id, __entry->rc
 	)
 );
 
@@ -943,7 +945,8 @@ TRACE_EVENT(xprtrdma_frwr_dereg,
 	TP_ARGS(mr, rc),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
+		__field(int, nents)
 		__field(u32, handle)
 		__field(u32, length)
 		__field(u64, offset)
@@ -952,7 +955,8 @@ TRACE_EVENT(xprtrdma_frwr_dereg,
 	),
 
 	TP_fast_assign(
-		__entry->mr = mr;
+		__entry->mr_id  = mr->frwr.fr_mr->res.id;
+		__entry->nents  = mr->mr_nents;
 		__entry->handle = mr->mr_handle;
 		__entry->length = mr->mr_length;
 		__entry->offset = mr->mr_offset;
@@ -960,8 +964,8 @@ TRACE_EVENT(xprtrdma_frwr_dereg,
 		__entry->rc	= rc;
 	),
 
-	TP_printk("mr=%p %u@0x%016llx:0x%08x (%s): rc=%d",
-		__entry->mr, __entry->length,
+	TP_printk("mr.id=%u nents=%d %u@0x%016llx:0x%08x (%s): rc=%d",
+		__entry->mr_id, __entry->nents, __entry->length,
 		(unsigned long long)__entry->offset, __entry->handle,
 		xprtrdma_show_direction(__entry->dir),
 		__entry->rc
@@ -977,21 +981,21 @@ TRACE_EVENT(xprtrdma_frwr_sgerr,
 	TP_ARGS(mr, sg_nents),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
 		__field(u64, addr)
 		__field(u32, dir)
 		__field(int, nents)
 	),
 
 	TP_fast_assign(
-		__entry->mr = mr;
+		__entry->mr_id = mr->frwr.fr_mr->res.id;
 		__entry->addr = mr->mr_sg->dma_address;
 		__entry->dir = mr->mr_dir;
 		__entry->nents = sg_nents;
 	),
 
-	TP_printk("mr=%p dma addr=0x%llx (%s) sg_nents=%d",
-		__entry->mr, __entry->addr,
+	TP_printk("mr.id=%u DMA addr=0x%llx (%s) sg_nents=%d",
+		__entry->mr_id, __entry->addr,
 		xprtrdma_show_direction(__entry->dir),
 		__entry->nents
 	)
@@ -1006,7 +1010,7 @@ TRACE_EVENT(xprtrdma_frwr_maperr,
 	TP_ARGS(mr, num_mapped),
 
 	TP_STRUCT__entry(
-		__field(const void *, mr)
+		__field(u32, mr_id)
 		__field(u64, addr)
 		__field(u32, dir)
 		__field(int, num_mapped)
@@ -1014,15 +1018,15 @@ TRACE_EVENT(xprtrdma_frwr_maperr,
 	),
 
 	TP_fast_assign(
-		__entry->mr = mr;
+		__entry->mr_id = mr->frwr.fr_mr->res.id;
 		__entry->addr = mr->mr_sg->dma_address;
 		__entry->dir = mr->mr_dir;
 		__entry->num_mapped = num_mapped;
 		__entry->nents = mr->mr_nents;
 	),
 
-	TP_printk("mr=%p dma addr=0x%llx (%s) nents=%d of %d",
-		__entry->mr, __entry->addr,
+	TP_printk("mr.id=%u DMA addr=0x%llx (%s) nents=%d of %d",
+		__entry->mr_id, __entry->addr,
 		xprtrdma_show_direction(__entry->dir),
 		__entry->num_mapped, __entry->nents
 	)
@@ -1031,7 +1035,7 @@ TRACE_EVENT(xprtrdma_frwr_maperr,
 DEFINE_MR_EVENT(localinv);
 DEFINE_MR_EVENT(map);
 DEFINE_MR_EVENT(unmap);
-DEFINE_MR_EVENT(remoteinv);
+DEFINE_MR_EVENT(reminv);
 DEFINE_MR_EVENT(recycle);
 
 TRACE_EVENT(xprtrdma_dma_maperr,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 125297c9aa3e..0dc799553a08 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -419,7 +419,7 @@ void frwr_reminv(struct rpcrdma_rep *rep, struct list_head *mrs)
 	list_for_each_entry(mr, mrs, mr_list)
 		if (mr->mr_handle == rep->rr_inv_rkey) {
 			list_del_init(&mr->mr_list);
-			trace_xprtrdma_mr_remoteinv(mr);
+			trace_xprtrdma_mr_reminv(mr);
 			rpcrdma_mr_put(mr);
 			break;	/* only one invalidated MR per RPC */
 		}



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 16:12 ` [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance Chuck Lever
@ 2020-02-12 18:26   ` Jason Gunthorpe
  2020-02-12 18:38     ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-12 18:26 UTC (permalink / raw)
  To: Chuck Lever; +Cc: anna.schumaker, linux-nfs, linux-rdma

On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
> The @nents value that was passed to ib_dma_map_sg() has to be passed
> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
> concatenate sg entries, it will return a different nents value than
> it was passed.
> 
> The bug was exposed by recent changes to the AMD IOMMU driver, which
> enabled sg entry concatenation.
> 
> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
> new memory registration API") and reviewing other kernel ULPs, it's
> not clear that the frwr_map() logic was ever correct for this case.
> 
> Reported-by: Andre Tomt <andre@tomt.net>
> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Cc: stable@vger.kernel.org # v5.5
> ---
>  net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)

Yep

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 18:26   ` Jason Gunthorpe
@ 2020-02-12 18:38     ` Chuck Lever
  2020-02-12 19:05       ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-12 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma


> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
>> concatenate sg entries, it will return a different nents value than
>> it was passed.
>> 
>> The bug was exposed by recent changes to the AMD IOMMU driver, which
>> enabled sg entry concatenation.
>> 
>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
>> new memory registration API") and reviewing other kernel ULPs, it's
>> not clear that the frwr_map() logic was ever correct for this case.
>> 
>> Reported-by: Andre Tomt <andre@tomt.net>
>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Cc: stable@vger.kernel.org # v5.5
>> ---
>> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
>> 1 file changed, 7 insertions(+), 6 deletions(-)
> 
> Yep
> 
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks.

Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".

Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")

--
Chuck Lever




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 18:38     ` Chuck Lever
@ 2020-02-12 19:05       ` Jason Gunthorpe
  2020-02-12 19:09         ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-12 19:05 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma

On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> 
> > On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
> >> The @nents value that was passed to ib_dma_map_sg() has to be passed
> >> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
> >> concatenate sg entries, it will return a different nents value than
> >> it was passed.
> >> 
> >> The bug was exposed by recent changes to the AMD IOMMU driver, which
> >> enabled sg entry concatenation.
> >> 
> >> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
> >> new memory registration API") and reviewing other kernel ULPs, it's
> >> not clear that the frwr_map() logic was ever correct for this case.
> >> 
> >> Reported-by: Andre Tomt <andre@tomt.net>
> >> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> Cc: stable@vger.kernel.org # v5.5
> >> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
> >> 1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > Yep
> > 
> > Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
> 
> Thanks.
> 
> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> 
> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")

Not really, this was broken for other configurations besides AMD

Jason 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 19:05       ` Jason Gunthorpe
@ 2020-02-12 19:09         ` Chuck Lever
  2020-02-12 19:30           ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-12 19:09 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma



> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
>> 
>>> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
>>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
>>>> concatenate sg entries, it will return a different nents value than
>>>> it was passed.
>>>> 
>>>> The bug was exposed by recent changes to the AMD IOMMU driver, which
>>>> enabled sg entry concatenation.
>>>> 
>>>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
>>>> new memory registration API") and reviewing other kernel ULPs, it's
>>>> not clear that the frwr_map() logic was ever correct for this case.
>>>> 
>>>> Reported-by: Andre Tomt <andre@tomt.net>
>>>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> Cc: stable@vger.kernel.org # v5.5
>>>> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
>>>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>> 
>>> Yep
>>> 
>>> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>> 
>> Thanks.
>> 
>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
>> 
>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> 
> Not really, this was broken for other configurations besides AMD

Agreed, but the bug seems to have been inconsequential until now?

Otherwise we should explore backporting farther into the past.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 19:09         ` Chuck Lever
@ 2020-02-12 19:30           ` Jason Gunthorpe
  2020-02-13 14:33             ` Chuck Lever
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-12 19:30 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma

On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
> 
> 
> > On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> >> 
> >>> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>> 
> >>> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
> >>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
> >>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
> >>>> concatenate sg entries, it will return a different nents value than
> >>>> it was passed.
> >>>> 
> >>>> The bug was exposed by recent changes to the AMD IOMMU driver, which
> >>>> enabled sg entry concatenation.
> >>>> 
> >>>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
> >>>> new memory registration API") and reviewing other kernel ULPs, it's
> >>>> not clear that the frwr_map() logic was ever correct for this case.
> >>>> 
> >>>> Reported-by: Andre Tomt <andre@tomt.net>
> >>>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>> Cc: stable@vger.kernel.org # v5.5
> >>>> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
> >>>> 1 file changed, 7 insertions(+), 6 deletions(-)
> >>> 
> >>> Yep
> >>> 
> >>> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
> >> 
> >> Thanks.
> >> 
> >> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> >> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> >> 
> >> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> > 
> > Not really, this was broken for other configurations besides AMD
> 
> Agreed, but the bug seems to have been inconsequential until now?

I imagine it would get you on ARM or other archs, IIRC.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-12 19:30           ` Jason Gunthorpe
@ 2020-02-13 14:33             ` Chuck Lever
  2020-02-13 14:56               ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2020-02-13 14:33 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma



> On Feb 12, 2020, at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
>> 
>> 
>>> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
>>>> 
>>>>> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>> 
>>>>> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
>>>>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>>>>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
>>>>>> concatenate sg entries, it will return a different nents value than
>>>>>> it was passed.
>>>>>> 
>>>>>> The bug was exposed by recent changes to the AMD IOMMU driver, which
>>>>>> enabled sg entry concatenation.
>>>>>> 
>>>>>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
>>>>>> new memory registration API") and reviewing other kernel ULPs, it's
>>>>>> not clear that the frwr_map() logic was ever correct for this case.
>>>>>> 
>>>>>> Reported-by: Andre Tomt <andre@tomt.net>
>>>>>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>> Cc: stable@vger.kernel.org # v5.5
>>>>>> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
>>>>>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>>>> 
>>>>> Yep
>>>>> 
>>>>> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>>>> 
>>>> Thanks.
>>>> 
>>>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
>>>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
>>>> 
>>>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
>>> 
>>> Not really, this was broken for other configurations besides AMD
>> 
>> Agreed, but the bug seems to have been inconsequential until now?
> 
> I imagine it would get you on ARM or other archs, IIRC.

That's certainly plausible, but I haven't received explicit bug reports
in this area. (I'm not at all saying that such bugs categorically do
not exist).

In any event, practical matters: the posted patch applies back to v5.4,
but fails to apply starting with v5.3.

I think we can leave the "Cc: stable # v5.5"; and I'm open to requests
to backport this simple fix onto earlier stable kernels (back to v4.4),
which can be handled case-by-case. 'Salright?

--
Chuck Lever




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance
  2020-02-13 14:33             ` Chuck Lever
@ 2020-02-13 14:56               ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 14:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma

On Thu, Feb 13, 2020 at 09:33:23AM -0500, Chuck Lever wrote:
> 
> 
> > On Feb 12, 2020, at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote:
> >> 
> >> 
> >>> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>> 
> >>> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote:
> >>>> 
> >>>>> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>> 
> >>>>> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote:
> >>>>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
> >>>>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
> >>>>>> concatenate sg entries, it will return a different nents value than
> >>>>>> it was passed.
> >>>>>> 
> >>>>>> The bug was exposed by recent changes to the AMD IOMMU driver, which
> >>>>>> enabled sg entry concatenation.
> >>>>>> 
> >>>>>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
> >>>>>> new memory registration API") and reviewing other kernel ULPs, it's
> >>>>>> not clear that the frwr_map() logic was ever correct for this case.
> >>>>>> 
> >>>>>> Reported-by: Andre Tomt <andre@tomt.net>
> >>>>>> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>>>> Cc: stable@vger.kernel.org # v5.5
> >>>>>> net/sunrpc/xprtrdma/frwr_ops.c |   13 +++++++------
> >>>>>> 1 file changed, 7 insertions(+), 6 deletions(-)
> >>>>> 
> >>>>> Yep
> >>>>> 
> >>>>> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
> >>>> 
> >>>> Thanks.
> >>>> 
> >>>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit
> >>>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5".
> >>>> 
> >>>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
> >>> 
> >>> Not really, this was broken for other configurations besides AMD
> >> 
> >> Agreed, but the bug seems to have been inconsequential until now?
> > 
> > I imagine it would get you on ARM or other archs, IIRC.
> 
> That's certainly plausible, but I haven't received explicit bug reports
> in this area. (I'm not at all saying that such bugs categorically do
> not exist).

Usually I encourage people to put the fixes line to the commit that is
being fixed, pointing at some other commit that happens to expose the
bug is not the best. 
 
> In any event, practical matters: the posted patch applies back to v5.4,
> but fails to apply starting with v5.3.
> 
> I think we can leave the "Cc: stable # v5.5"; and I'm open to requests
> to backport this simple fix onto earlier stable kernels (back to v4.4),
> which can be handled case-by-case. 'Salright?

I'd just put Cc: stable, the stable folks will reject it on earlier
versions because of conflicts and we can leave it.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 16:12 [PATCH v3 0/2] Fix NFS/RDMA operation with Ryzen IOMMU Chuck Lever
2020-02-12 16:12 ` [PATCH v3 1/2] xprtrdma: Fix DMA scatter-gather list mapping imbalance Chuck Lever
2020-02-12 18:26   ` Jason Gunthorpe
2020-02-12 18:38     ` Chuck Lever
2020-02-12 19:05       ` Jason Gunthorpe
2020-02-12 19:09         ` Chuck Lever
2020-02-12 19:30           ` Jason Gunthorpe
2020-02-13 14:33             ` Chuck Lever
2020-02-13 14:56               ` Jason Gunthorpe
2020-02-12 16:12 ` [PATCH v3 2/2] xprtrdma: Enhance MR-related trace points Chuck Lever

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git