* [PATCH] staging: lustre: o2iblnd: Stop MLX5 triggering a dump_cqe
@ 2018-03-16 23:40 ` Doug Oucharek
  0 siblings, 0 replies; 4+ messages in thread
From: Doug Oucharek @ 2018-03-16 23:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Drokin, Oleg, Dilger, Andreas, James Simmons
  Cc: Linux Kernel Mailing List, Lustre Development List

We have found that MLX5 will trigger a dump_cqe if we don't
invalidate the rkey on a newly allocated MR for FastReg usage.

This fix just tags the MR as invalid on its creation if we are
using FastReg and that will force it to do an invalidate of the
rkey on first usage.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8752
Reviewed-on: https://review.whamcloud.com/24306
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Doug Oucharek <dougso@me.com>
---
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 7ae2955..00ebc61 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1384,7 +1384,10 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, struct kib_fmr_po
			goto out_middle;
		}

-		frd->frd_valid = true;
+		/* There appears to be a bug in MLX5 code where you must
+		 * invalidate the rkey of a new FastReg pool before first
+		 * using it. Thus, I am marking the FRD invalid here. */
+		frd->frd_valid = false;

		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
		fpo->fast_reg.fpo_pool_size++;
-- 
1.8.3.1
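
The effect described in the commit message can be illustrated with a small user-space model. This is only a sketch and not the o2iblnd code: `struct model_frd`, `model_frd_init()` and `model_first_map_wrs()` are hypothetical names, and counting work requests is an assumed way of showing that a descriptor created as invalid gets an rkey invalidate ahead of its first registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a FastReg descriptor; only the flag matters. */
struct model_frd {
	bool valid;	/* plays the role of frd_valid in the patch */
};

/*
 * Model of pool allocation. Passing true mimics the code before the
 * patch, passing false mimics the code after it.
 */
static void model_frd_init(struct model_frd *frd, bool valid_at_creation)
{
	assert(frd != NULL);
	frd->valid = valid_at_creation;
}

/*
 * Model of the first map: returns how many work requests would be
 * queued (an assumption made for illustration). An invalid descriptor
 * gets a local invalidate of its rkey before the registration request.
 */
static int model_first_map_wrs(const struct model_frd *frd)
{
	int nwrs = 0;

	assert(frd != NULL);
	if (!frd->valid)
		nwrs++;		/* invalidate the rkey first */
	nwrs++;			/* the FastReg registration itself */
	return nwrs;
}
```

In this model a descriptor initialised with `true` yields one request on first use, while one initialised with `false` yields two, the extra one being the invalidate that the patch relies on.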


* Re: [PATCH] staging: lustre: o2iblnd: Stop MLX5 triggering a dump_cqe
  2018-03-16 23:40 ` [lustre-devel] " Doug Oucharek
@ 2018-03-19  8:14   ` Dan Carpenter
  -1 siblings, 0 replies; 4+ messages in thread
From: Dan Carpenter @ 2018-03-19  8:14 UTC (permalink / raw)
  To: Doug Oucharek
  Cc: Greg Kroah-Hartman, devel, Drokin, Oleg, Dilger, Andreas,
	James Simmons, Linux Kernel Mailing List,
	Lustre Development List

I don't really understand this patch...

On Fri, Mar 16, 2018 at 04:40:21PM -0700, Doug Oucharek wrote:
> We have found that MLX5 will trigger a dump_cqe if we don't
> invalidate the rkey on a newly allocated MR for FastReg usage.
> 
> This fix just tags the MR as invalid on its creation if we are
> using FastReg and that will force it to do an invalidate of the
> rkey on first usage.

This paragraph makes the change seem like a limited workaround for a
bug in the MLX5 code.  Why can't the MLX5 code be fixed instead?

Looking at the patch, it doesn't seem like a limited solution at all.
Now frd->frd_valid is *always* set to false.  Why don't we instead just
delete ->frd_valid along with the newly dead code?

regards,
dan carpenter


