* [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
@ 2017-04-24 22:15 Bart Van Assche
       [not found] ` <8992bd28-667f-94b1-e582-106e6b41aa4b-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2017-04-24 22:15 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Max Gurtovoy, Sagi Grimberg, Leon Romanovsky, Israel Rukshin,
	Laurence Oberman, linux-rdma-u79uwXL29TY76Z2rM5mHXA

ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
than what fits into a single MR. .map_mr_sg() must not attempt to
map more SG-list elements than what fits into a single MR.
Hence make sure that mlx5_ib_sg_to_klms() does not write outside
the MR klms[] array.

Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d9c6c0ea750b..99beacfc4716 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1777,7 +1777,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
 	mr->ndescs = sg_nents;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
-		if (unlikely(i > mr->max_descs))
+		if (unlikely(i >= mr->max_descs))
 			break;
 		klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
 		klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
-- 
2.12.2
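
The effect of the one-character change can be checked outside the kernel.
What follows is only a userspace sketch, not driver code: slots_written()
and enum bound_check are hypothetical names, and the loop counts the slots
that would be written instead of storing anything, so the pre-patch
comparison can be examined without an out-of-bounds access.

```c
#include <assert.h>
#include <stddef.h>

/* Which comparison guards the loop body (hypothetical names). */
enum bound_check {
	CHECK_GT,	/* pre-patch:  break if i >  max_descs */
	CHECK_GE	/* post-patch: break if i >= max_descs */
};

/*
 * Count how many descriptor slots a loop over sg_nents elements would
 * write. Nothing is stored, so the pre-patch case can be examined
 * without touching memory out of bounds.
 */
static size_t slots_written(size_t sg_nents, size_t max_descs,
			    enum bound_check check)
{
	size_t i;

	for (i = 0; i < sg_nents; i++) {
		if (check == CHECK_GT ? i > max_descs : i >= max_descs)
			break;
		/* the driver would fill klms[i] here */
	}
	return i;
}

/* Self-check with a 4-slot array and a 10-element SG-list. */
void demo_bound_check(void)
{
	assert(slots_written(10, 4, CHECK_GT) == 5);	/* one slot too many */
	assert(slots_written(10, 4, CHECK_GE) == 4);	/* stays in bounds */
	assert(slots_written(3, 4, CHECK_GT) == 3);	/* short lists unaffected */
	assert(slots_written(3, 4, CHECK_GE) == 3);
}
```

With "i > max_descs" the loop still runs for i == max_descs, which is one
element past the end of an array that has max_descs slots. Lists that fit
into the array behave the same with either comparison.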

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found] ` <8992bd28-667f-94b1-e582-106e6b41aa4b-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2017-04-24 22:39   ` Laurence Oberman
       [not found]     ` <1726285260.1422143.1493073573791.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-25 17:58   ` Leon Romanovsky
  2017-04-26 14:45   ` Sagi Grimberg
  2 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-24 22:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Doug Ledford, Max Gurtovoy, Sagi Grimberg, Leon Romanovsky,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Monday, April 24, 2017 6:15:28 PM
> Subject: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> than what fits into a single MR. .map_mr_sg() must not attempt to
> map more SG-list elements than what fits into a single MR.
> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> the MR klms[] array.
> 
> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index d9c6c0ea750b..99beacfc4716 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1777,7 +1777,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>  	mr->ndescs = sg_nents;
>  
>  	for_each_sg(sgl, sg, sg_nents, i) {
> -		if (unlikely(i > mr->max_descs))
> +		if (unlikely(i >= mr->max_descs))
>  			break;
>  		klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
>  		klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
> --
> 2.12.2
> 
> 

Thanks Bart as always.
Will get this tested this week,

Regards
Laurence

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]     ` <1726285260.1422143.1493073573791.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-24 22:46       ` Bart Van Assche
       [not found]         ` <1493073989.3394.24.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2017-04-24 22:46 UTC (permalink / raw)
  To: loberman-H+wXaHxf7aLQT0dZR+AlfA
  Cc: maxg-VPRAkNaXOzVWk0Htik3J/w, israelr-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, sagi-NQWnxTmZq1alnMjI0IkVqw

On Mon, 2017-04-24 at 18:39 -0400, Laurence Oberman wrote:
> Will get this tested this week,

Thanks Laurence. BTW, if you want to test this patch with the SRP protocol
you will also have to revert commit d6c58dc40fec ("IB/SRP: Avoid using
IB_MR_TYPE_SG_GAPS"). The code path touched by this patch is only relevant
for IB_MR_TYPE_SG_GAPS memory regions, and the SRP initiator driver
currently does not use that MR type. Reverting the aforementioned commit
will make the SRP initiator driver use that MR type.

Please also apply Sagi's "mlx5: Fix mlx5_ib_map_mr_sg mr length" patch
before starting any tests.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]         ` <1493073989.3394.24.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2017-04-24 22:59           ` Laurence Oberman
  0 siblings, 0 replies; 26+ messages in thread
From: Laurence Oberman @ 2017-04-24 22:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: maxg-VPRAkNaXOzVWk0Htik3J/w, israelr-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, sagi-NQWnxTmZq1alnMjI0IkVqw



----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> Cc: maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
> sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org
> Sent: Monday, April 24, 2017 6:46:30 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> On Mon, 2017-04-24 at 18:39 -0400, Laurence Oberman wrote:
> > Will get this tested this week,
> 
> Thanks Laurence. BTW, if you want to test this patch with the SRP protocol
> you will also have to revert commit d6c58dc40fec ("IB/SRP: Avoid using
> IB_MR_TYPE_SG_GAPS"). The code path touched by this patch is only relevant
> for IB_MR_TYPE_SG_GAPS memory regions, and the SRP initiator driver
> currently does not use that MR type. Reverting the aforementioned commit
> will make the SRP initiator driver use that MR type.
> 
> Please also apply Sagi's "mlx5: Fix mlx5_ib_map_mr_sg mr length" patch
> before starting any tests.
> 
> Thanks,
> 
> Bart.
> 

Understood
Regards
Laurence

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found] ` <8992bd28-667f-94b1-e582-106e6b41aa4b-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2017-04-24 22:39   ` Laurence Oberman
@ 2017-04-25 17:58   ` Leon Romanovsky
       [not found]     ` <20170425175849.GS14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-04-26 14:45   ` Sagi Grimberg
  2 siblings, 1 reply; 26+ messages in thread
From: Leon Romanovsky @ 2017-04-25 17:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Doug Ledford, Max Gurtovoy, Sagi Grimberg, Israel Rukshin,
	Laurence Oberman, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> than what fits into a single MR. .map_mr_sg() must not attempt to
> map more SG-list elements than what fits into a single MR.
> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> the MR klms[] array.
>
> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Bart,

Thanks a lot, it indeed looks right.
Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Thanks


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]     ` <20170425175849.GS14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-04-25 20:37       ` Laurence Oberman
       [not found]         ` <438230391.2090966.1493152655709.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-25 20:37 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Bart Van Assche, Doug Ledford, Max Gurtovoy, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Tuesday, April 25, 2017 1:58:49 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > than what fits into a single MR. .map_mr_sg() must not attempt to
> > map more SG-list elements than what fits into a single MR.
> > Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > the MR klms[] array.
> >
> > Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > ---
> >  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> 
> Bart,
> 
> Thanks a lot, it indeed looks right.
> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Thanks
> 


Hello Bart, Leon, Max and Israel.

I cloned off Bart's tree.

git clone https://github.com/bvanassche/linux
cd linux
git checkout block-scsi-for-next

I checked all patches were in for this test.

a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length

Built and tested the kernel.

However this issue is not resolved :(

[ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
[ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
[ 2708.121342] 00000000 00000000 00000000 00000000
[ 2708.147104] 00000000 00000000 00000000 00000000
[ 2708.172633] 00000000 00000000 00000000 00000000
[ 2708.198702] 00000000 0f007806 2500002a 14a527d0
[ 2732.434127] scsi host1: ib_srp: reconnect succeeded
[ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30

[root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
[ 2746.443240] 00000000 00000000 00000000 00000000
[ 2746.469323] 00000000 00000000 00000000 00000000
[ 2746.495310] 00000000 00000000 00000000 00000000
[ 2746.521407] 00000000 0f007806 25000032 003c7ad0
[ 2752.445899] scsi host1: ib_srp: reconnect succeeded
[ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
[ 2763.297826] 00000000 00000000 00000000 00000000
[ 2763.323352] 00000000 00000000 00000000 00000000
[ 2763.348722] 00000000 00000000 00000000 00000000
[ 2763.374681] 00000000 0f007806 2500003a 00084bd0

[root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP port-1:1 / host1.
[ 2769.415956] scsi host1: ib_srp: reconnect succeeded
[ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
[ 2780.093520] 00000000 00000000 00000000 00000000
[ 2780.120067] 00000000 00000000 00000000 00000000
[ 2780.145575] 00000000 00000000 00000000 00000000
[ 2780.171153] 00000000 0f007806 25000042 000833d0
[ 2785.923399] scsi host1: ib_srp: reconnect succeeded
[ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
[ 2796.495257] 00000000 00000000 00000000 00000000
[ 2796.521506] 00000000 00000000 00000000 00000000
[ 2796.547640] 00000000 00000000 00000000 00000000
[ 2796.573120] 00000000 0f007806 2500004a 00083bd0
[ 2802.562578] scsi host1: ib_srp: reconnect succeeded
[ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0

Regards
Laurence

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]         ` <438230391.2090966.1493152655709.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26  3:39           ` Bart Van Assche
       [not found]             ` <1493177952.3503.1.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2017-04-26  6:16           ` Leon Romanovsky
  2017-04-26  8:31           ` Max Gurtovoy
  2 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2017-04-26  3:39 UTC (permalink / raw)
  To: leonro-VPRAkNaXOzVWk0Htik3J/w, loberman-H+wXaHxf7aLQT0dZR+AlfA
  Cc: maxg-VPRAkNaXOzVWk0Htik3J/w, israelr-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, sagi-NQWnxTmZq1alnMjI0IkVqw

On Tue, 2017-04-25 at 16:37 -0400, Laurence Oberman wrote:
> Hello Bart, Leon, Max and Israel.
> 
> I cloned off Bart's tree.
> 
> git clone https://github.com/bvanassche/linux
> cd linux
> git checkout block-scsi-for-next
> 
> I checked all patches were in for this test.
> 
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length
> 
> Built and tested the kernel.
> 
> However this issue is not resolved :(
> 
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [ 2708.121342] 00000000 00000000 00000000 00000000
> [ 2708.147104] 00000000 00000000 00000000 00000000
> [ 2708.172633] 00000000 00000000 00000000 00000000
> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30

Hello Laurence,

Thank you for having run this test. Please note, however, that a flush error
reported at the initiator side does not necessarily mean that there is a bug
at the initiator side. If, for example, the target system initiates a
disconnect, that also triggers this kind of flush error. What kind of SRP
target system was used in this test? Were the clocks of the initiator and
target systems synchronized? Are the logs of the target system available? If
so, can you check whether anything interesting can be found in the target
log around the time the initiator reported the flush error?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]         ` <438230391.2090966.1493152655709.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-26  3:39           ` Bart Van Assche
@ 2017-04-26  6:16           ` Leon Romanovsky
       [not found]             ` <20170426061640.GV14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-04-26  8:31           ` Max Gurtovoy
  2 siblings, 1 reply; 26+ messages in thread
From: Leon Romanovsky @ 2017-04-26  6:16 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Bart Van Assche, Doug Ledford, Max Gurtovoy, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 3330 bytes --]

On Tue, Apr 25, 2017 at 04:37:35PM -0400, Laurence Oberman wrote:
>
>
> ----- Original Message -----
> > From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> > "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Tuesday, April 25, 2017 1:58:49 PM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> >
> > On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > > ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > > than what fits into a single MR. .map_mr_sg() must not attempt to
> > > map more SG-list elements than what fits into a single MR.
> > > Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > > the MR klms[] array.
> > >
> > > Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > > Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > > Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > > ---
> > >  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> >
> > Bart,
> >
> > Thanks a lot, it indeed looks right.
> > Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > Thanks
> >
>
>
> Hello Bart, Leon, Max and Israel.
>
> I cloned off Bart's tree.
>
> git clone https://github.com/bvanassche/linux
> cd linux
> git checkout block-scsi-for-next
>
> I checked all patches were in for this test.
>
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length
>
> Built and tested the kernel.
>
> However this issue is not resolved :(
>
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [ 2708.121342] 00000000 00000000 00000000 00000000
> [ 2708.147104] 00000000 00000000 00000000 00000000
> [ 2708.172633] 00000000 00000000 00000000 00000000
> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0

Parsed version:
	hw_error_syndrome                : 0xf
	hw_syndrome_type                 : 0x0
	vendor_error_syndrome            : 0x78
	syndrome                         : MEMORY_WINDOW_BIND_ERROR (0x6)
	s_wqe_opcode                     : UMR (0x25)
	opcode                           : REQUESTOR_ERROR (0xd)
	cqe_format                       : NO_INLINE_DATA (0x0)
	owner                            : 0x0

Description:
	umr.klm_octoword_count > mkey.mtt_octoword_count

Sagi, Max,
Any idea where it can be?

Thanks
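
The parsed values above can be reproduced from the last row of the quoted
CQE dump (00000000 0f007806 2500002a 14a527d0). The sketch below is a
userspace illustration only: the bit positions are inferred by lining the
parsed values up against that row, and the positions of cqe_format and
owner inside the final byte are an assumption, since both are zero here.
The struct and function names are hypothetical and not taken from the
driver.

```c
#include <assert.h>
#include <stdint.h>

/* Only the fields that appear in the parsed output above. */
struct err_cqe_fields {
	uint8_t hw_error_syndrome;
	uint8_t hw_syndrome_type;
	uint8_t vendor_error_syndrome;
	uint8_t syndrome;
	uint8_t s_wqe_opcode;
	uint8_t opcode;
	uint8_t cqe_format;
	uint8_t owner;
};

/*
 * Decode the second, third and fourth 32-bit words of the last dump
 * row, taken exactly as printed (most significant byte first).
 */
struct err_cqe_fields decode_err_cqe_row(uint32_t w1, uint32_t w2, uint32_t w3)
{
	struct err_cqe_fields f;
	uint8_t op_own = w3 & 0xff;	/* last byte of the CQE */

	f.hw_error_syndrome     = (w1 >> 24) & 0xff;
	f.hw_syndrome_type      = (w1 >> 16) & 0xff;
	f.vendor_error_syndrome = (w1 >> 8) & 0xff;
	f.syndrome              = w1 & 0xff;
	f.s_wqe_opcode          = (w2 >> 24) & 0xff;
	f.opcode                = op_own >> 4;		/* assumed: high nibble */
	f.cqe_format            = (op_own >> 2) & 0x3;	/* assumed: bits 3:2 */
	f.owner                 = op_own & 0x1;		/* assumed: bit 0 */
	return f;
}

/* Check the decoder against the values reported in this thread. */
void demo_decode(void)
{
	struct err_cqe_fields f =
		decode_err_cqe_row(0x0f007806, 0x2500002a, 0x14a527d0);

	assert(f.hw_error_syndrome == 0xf);
	assert(f.hw_syndrome_type == 0x0);
	assert(f.vendor_error_syndrome == 0x78);
	assert(f.syndrome == 0x6);	/* MEMORY_WINDOW_BIND_ERROR */
	assert(f.s_wqe_opcode == 0x25);	/* UMR */
	assert(f.opcode == 0xd);	/* REQUESTOR_ERROR */
	assert(f.cqe_format == 0x0);
	assert(f.owner == 0x0);
}
```

The other dump rows reported in this thread share the 0f007806 word, so
under the same assumptions they decode to the same syndrome fields.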


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]         ` <438230391.2090966.1493152655709.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-26  3:39           ` Bart Van Assche
  2017-04-26  6:16           ` Leon Romanovsky
@ 2017-04-26  8:31           ` Max Gurtovoy
       [not found]             ` <896e9a9e-43b6-7a21-e41b-861e4f795436-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2 siblings, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2017-04-26  8:31 UTC (permalink / raw)
  To: Laurence Oberman, Leon Romanovsky
  Cc: Bart Van Assche, Doug Ledford, Sagi Grimberg, Israel Rukshin,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 4/25/2017 11:37 PM, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Sent: Tuesday, April 25, 2017 1:58:49 PM
>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>>
>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
>>> than what fits into a single MR. .map_mr_sg() must not attempt to
>>> map more SG-list elements than what fits into a single MR.
>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
>>> the MR klms[] array.
>>>
>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
>>> ---
>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>
>> Bart,
>>
>> Thanks a lot, it indeed looks right.
>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Thanks
>>
>
>
> Hello Bart, Leon, Max and Israel.
>
> I cloned off Bart's tree.
>
> git clone https://github.com/bvanassche/linux
> cd linux
> git checkout block-scsi-for-next
>
> I checked all patches were in for this test.
>
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length

Hi,
copying Sagi's request from a different thread:

"
Can you please enable srp_add_one debug:

echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control

In addition apply the following:
-- 
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d9c6c0ea750b..040fbc387e4f 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
         int add_size;
         int ret;

+       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
+
         add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);

         mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);

"

Max.
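
For readers unfamiliar with the check in the debug patch above, here is a
rough userspace imitation of its behaviour. It assumes that WARN_ON_ONCE()
evaluates to its condition on every call but only reports the first time
the condition is true. struct warn_once_state and
warn_if_too_many_descs() are hypothetical names and this is not the
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Per-call-site state: was a report already printed, and how often did it trip. */
struct warn_once_state {
	bool reported;
	unsigned int hits;
};

/*
 * Returns true whenever ndescs exceeds max_len, but prints a message
 * only for the first violation seen through this state object.
 */
bool warn_if_too_many_descs(struct warn_once_state *st, int ndescs, int max_len)
{
	assert(st != NULL);

	if (ndescs <= max_len)
		return false;

	st->hits++;
	if (!st->reported) {
		st->reported = true;
		fprintf(stderr, "ndescs %d exceeds limit %d\n", ndescs, max_len);
	}
	return true;
}
```

A request that stays within the limit leaves the state untouched. Repeated
violations keep returning true, but the message is printed only once.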

>
> Built and tested the kernel.
>
> However this issue is not resolved :(
>
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [ 2708.121342] 00000000 00000000 00000000 00000000
> [ 2708.147104] 00000000 00000000 00000000 00000000
> [ 2708.172633] 00000000 00000000 00000000 00000000
> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30
>
> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> [ 2746.443240] 00000000 00000000 00000000 00000000
> [ 2746.469323] 00000000 00000000 00000000 00000000
> [ 2746.495310] 00000000 00000000 00000000 00000000
> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> [ 2763.297826] 00000000 00000000 00000000 00000000
> [ 2763.323352] 00000000 00000000 00000000 00000000
> [ 2763.348722] 00000000 00000000 00000000 00000000
> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
>
> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> [ 2780.093520] 00000000 00000000 00000000 00000000
> [ 2780.120067] 00000000 00000000 00000000 00000000
> [ 2780.145575] 00000000 00000000 00000000 00000000
> [ 2780.171153] 00000000 0f007806 25000042 000833d0
> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> [ 2796.495257] 00000000 00000000 00000000 00000000
> [ 2796.521506] 00000000 00000000 00000000 00000000
> [ 2796.547640] 00000000 00000000 00000000 00000000
> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
>
> Regards
> Laurence
>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]             ` <20170426061640.GV14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-04-26 10:30               ` Max Gurtovoy
  2017-05-03  8:18               ` Sagi Grimberg
  1 sibling, 0 replies; 26+ messages in thread
From: Max Gurtovoy @ 2017-04-26 10:30 UTC (permalink / raw)
  To: Leon Romanovsky, Laurence Oberman
  Cc: Bart Van Assche, Doug Ledford, Sagi Grimberg, Israel Rukshin,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 4/26/2017 9:16 AM, Leon Romanovsky wrote:
> On Tue, Apr 25, 2017 at 04:37:35PM -0400, Laurence Oberman wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>>>
>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
>>>> map more SG-list elements than what fits into a single MR.
>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
>>>> the MR klms[] array.
>>>>
>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
>>>> ---
>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>
>>> Bart,
>>>
>>> Thanks a lot, it indeed looks right.
>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>
>>> Thanks
>>>
>>
>>
>> Hello Bart, Leon, Max and Israel.
>>
>> I cloned off Barts tree.
>>
>> git clone https://github.com/bvanassche/linux
>> cd linux
>> git checkout block-scsi-for-next
>>
>> I checked all patches were in for this test.
>>
>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
>>
>> Built and tested the kernel.
>>
>> However this issue is not resolved :(
>>
>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
>> [ 2708.121342] 00000000 00000000 00000000 00000000
>> [ 2708.147104] 00000000 00000000 00000000 00000000
>> [ 2708.172633] 00000000 00000000 00000000 00000000
>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
>
> Parsed version:
> 	hw_error_syndrome                : 0xf
> 	hw_syndrome_type                 : 0x0
> 	vendor_error_syndrome            : 0x78
> 	syndrome                         : MEMORY_WINDOW_BIND_ERROR (0x6)
> 	s_wqe_opcode                     : UMR (0x25)
> 	opcode                           : REQUESTOR_ERROR (0xd)
> 	cqe_format                       : NO_INLINE_DATA (0x0)
> 	owner                            : 0x0
>
> Description:
> 	umr.klm_octoword_count > mkey.mtt_octoword_count
>
> Sagi, Max,
> Any idea where can it be?
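
For readers following the parsed output above, here is a minimal sketch of how the last row of the dump (00000000 0f007806 2500002a 14a527d0) maps onto those fields. The struct and function names are hypothetical, and the bit positions are inferred by lining the dump up with the parsed values quoted above, so treat them as an assumption rather than a reference layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields found in the last 16 bytes of the dump. */
struct err_cqe_tail {
	uint8_t  hw_error_syndrome;
	uint8_t  hw_syndrome_type;
	uint8_t  vendor_error_syndrome;
	uint8_t  syndrome;
	uint8_t  s_wqe_opcode;
	uint32_t qpn;
	uint16_t wqe_counter;
	uint8_t  opcode;
	uint8_t  cqe_format;
	uint8_t  owner;
};

/* row[] holds the four 32-bit words of the last dump line, exactly as printed. */
static struct err_cqe_tail decode_err_cqe_tail(const uint32_t row[4])
{
	struct err_cqe_tail t;

	assert(row != NULL);
	/* Second word: 0f 00 78 06 */
	t.hw_error_syndrome     = (row[1] >> 24) & 0xff;
	t.hw_syndrome_type      = (row[1] >> 16) & 0xff;
	t.vendor_error_syndrome = (row[1] >> 8) & 0xff;
	t.syndrome              = row[1] & 0xff;
	/* Third word: opcode of the failing send WQE, then a 24-bit QP number */
	t.s_wqe_opcode          = (row[2] >> 24) & 0xff;
	t.qpn                   = row[2] & 0xffffff;
	/* Fourth word: WQE counter, signature byte (skipped), opcode/format/owner byte */
	t.wqe_counter           = (row[3] >> 16) & 0xffff;
	t.opcode                = (row[3] >> 4) & 0xf;
	t.cqe_format            = (row[3] >> 2) & 0x3;
	t.owner                 = row[3] & 0x1;
	return t;
}
```

Feeding it the row from the first dump yields syndrome 0x6, vendor_error_syndrome 0x78, s_wqe_opcode 0x25 and opcode 0xd, which agrees with the parsed version above.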

Sagi,
I see this code in drivers/infiniband/hw/mlx5/mr.c:

"
...
else if (mr_type == IB_MR_TYPE_SG_GAPS) {
                 mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;

                 err = mlx5_alloc_priv_descs(pd->device, mr,
                                             ndescs, sizeof(struct mlx5_klm));
                 if (err)
                         goto err_free_in;
                 mr->desc_size = sizeof(struct mlx5_klm);
                 mr->max_descs = ndescs;
"

while in the past it was:
"
} else if (mr_type == IB_MR_INDIRECT_REG) {
                 MLX5_SET(mkc, mkc, translations_octword_size,
                          ALIGN(max_num_sg + 1, 4));
                 mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS | MLX5_PERM_UMR_EN;
                 mr->max_descs = ndescs;

"

In the INDIRECT_REG case the translation size was computed from
max_num_sg + 1.

Maybe this is the issue?

Max.
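
To make the "+ 1" observation concrete, here is a small userspace sketch of the rounding done by ALIGN(max_num_sg + 1, 4) in the older snippet. The helper names are hypothetical, align_pow2() only mirrors what the kernel's ALIGN() does for power-of-two alignments, and the "plain" variant is there for comparison only. Whether the missing extra entry is really what triggers the klm_octoword_count error is not established in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two (like the kernel's ALIGN()). */
static uint32_t align_pow2(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Size as computed in the older INDIRECT_REG snippet: ALIGN(max_num_sg + 1, 4). */
static uint32_t octword_size_plus_one(uint32_t max_num_sg)
{
	return align_pow2(max_num_sg + 1, 4);
}

/* Hypothetical variant without the extra entry, for comparison only. */
static uint32_t octword_size_plain(uint32_t max_num_sg)
{
	return align_pow2(max_num_sg, 4);
}
```

The two variants differ only when max_num_sg is already a multiple of 4: for 256 entries the first gives 260 and the second 256, while for 3 entries both give 4.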

>
> Thanks
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]             ` <1493177952.3503.1.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2017-04-26 11:46               ` Laurence Oberman
       [not found]                 ` <1801288254.2280763.1493207193850.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 11:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: leonro-VPRAkNaXOzVWk0Htik3J/w, maxg-VPRAkNaXOzVWk0Htik3J/w,
	israelr-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, sagi-NQWnxTmZq1alnMjI0IkVqw



----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> Cc: maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org
> Sent: Tuesday, April 25, 2017 11:39:12 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> On Tue, 2017-04-25 at 16:37 -0400, Laurence Oberman wrote:
> > Hello Bart, Leon, Max and Israel.
> > 
> > I cloned off Barts tree.
> > 
> > git clone https://github.com/bvanassche/linux
> > cd linux
> > git checkout block-scsi-for-next
> > 
> > I checked all patches were in for this test.
> > 
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > 
> > Built and tested the kernel.
> > 
> > However this issue is not resolved :(
> > 
> > [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817edca86b0
> > [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > [ 2708.121342] 00000000 00000000 00000000 00000000
> > [ 2708.147104] 00000000 00000000 00000000 00000000
> > [ 2708.172633] 00000000 00000000 00000000 00000000
> > [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9c30
> 
> Hello Laurence,
> 
> Thank you for having run this test. But are you aware that if a flush error
> is reported at the initiator side that does not necessarily mean that there
> is a bug at the initiator side? If e.g. the target system would initiate a
> disconnect that would also trigger this kind of flush errors. What kind of
> SRP target system was used in this test? Were the clocks of initiator and
> target system synchronized? Are the logs of the target system available? If
> so, can you have a look whether anything interesting can be found in the
> target log around the time the initiator reported the flush error?
> 
> Thanks,
> 
> Bart.

Hi Bart

It's the same target that has been stable for all other tests.
This is the same issue I originally reported, which led us to revert SG_GAPS.
Remember that once I reverted it, we were stable again.

This happens on the initiator first

[root@localhost ~]# [  512.375904] mlx5_0:dump_cqe:262:(pid 4653): dump error cqe
[  512.376648] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817c596f770
[  512.454276] 00000000 00000000 00000000 00000000
[  512.478734] 00000000 00000000 00000000 00000000
[  512.504170] 00000000 00000000 00000000 00000000
[  512.529457] 00000000 0f007806 2500002a 0548e2d0
[  532.128455] scsi host2: ib_srp: reconnect succeeded
[  532.232126] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf2bb3bf0
[  532.780107] mlx5_0:dump_cqe:262:(pid 511): dump error cqe
[  532.811863] 00000000 00000000 00000000 00000000
[  532.837984] 00000000 00000000 00000000 00000000
[  532.863955] 00000000 00000000 00000000 00000000
[  532.889885] 00000000 0f007806 25000032 00683bd0

Only afterwards do I see the target complain

[root@fedstorage ~]# [  537.105985] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-48.
[  537.152767] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-47.
[  537.200585] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-46.
[  537.247864] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-45.
[  537.296822] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-44.
[  537.345001] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-43.
[  537.394146] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-42.
[  537.442148] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-41.
[  537.490011] ib_srpt sending response for ioctx 0xffff8800951ed800 failed with status 5
[  539.774018] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  539.887987] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.001241] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.111455] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.224780] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.340522] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.453736] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[  540.567043] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)

Thanks
Laurence


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]             ` <896e9a9e-43b6-7a21-e41b-861e4f795436-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-04-26 11:47               ` Laurence Oberman
       [not found]                 ` <288883138.2280971.1493207257218.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 11:47 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 4:31:57 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> >> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> >> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> >> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> >> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Sent: Tuesday, April 25, 2017 1:58:49 PM
> >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> >> overflows the klms[] array
> >>
> >> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> >>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> >>> than what fits into a single MR. .map_mr_sg() must not attempt to
> >>> map more SG-list elements than what fits into a single MR.
> >>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> >>> the MR klms[] array.
> >>>
> >>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> >>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> >>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> >>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> >>> ---
> >>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>
> >> Bart,
> >>
> >> Thanks a lot, it indeed looks right.
> >> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Thanks
> >>
> >
> >
> > Hello Bart, Leon, Max and Israel.
> >
> > I cloned off Barts tree.
> >
> > git clone https://github.com/bvanassche/linux
> > cd linux
> > git checkout block-scsi-for-next
> >
> > I checked all patches were in for this test.
> >
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> 
> Hi,
> copying Sagi's request from different thread:
> 
> "
> Can you please enable srp_add_one debug:
> 
> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> 
> In addition apply the following:
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> b/drivers/infiniband/hw/mlx5/mr.c
> index d9c6c0ea750b..040fbc387e4f 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
>          int add_size;
>          int ret;
> 
> +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> +
>          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> 
>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> 
> "
> 
> Max.
> 
> >
> > Built and tested the kernel.
> >
> > However this issue is not resolved :(
> >
> > [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817edca86b0
> > [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > [ 2708.121342] 00000000 00000000 00000000 00000000
> > [ 2708.147104] 00000000 00000000 00000000 00000000
> > [ 2708.172633] 00000000 00000000 00000000 00000000
> > [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9c30
> >
> > [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump
> > error cqe
> > [ 2746.443240] 00000000 00000000 00000000 00000000
> > [ 2746.469323] 00000000 00000000 00000000 00000000
> > [ 2746.495310] 00000000 00000000 00000000 00000000
> > [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> > [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> > [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9cf0
> > [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> > [ 2763.297826] 00000000 00000000 00000000 00000000
> > [ 2763.323352] 00000000 00000000 00000000 00000000
> > [ 2763.348722] 00000000 00000000 00000000 00000000
> > [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> >
> > [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> > port-1:1 / host1.
> > [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> > [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9cf0
> > [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > [ 2780.093520] 00000000 00000000 00000000 00000000
> > [ 2780.120067] 00000000 00000000 00000000 00000000
> > [ 2780.145575] 00000000 00000000 00000000 00000000
> > [ 2780.171153] 00000000 0f007806 25000042 000833d0
> > [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> > [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9cf0
> > [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > [ 2796.495257] 00000000 00000000 00000000 00000000
> > [ 2796.521506] 00000000 00000000 00000000 00000000
> > [ 2796.547640] 00000000 00000000 00000000 00000000
> > [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> > [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> > [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817ed0a9cf0
> >
> > Regards
> > Laurence
> >
> 
Doing this now
Thanks
Laurence

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                 ` <288883138.2280971.1493207257218.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26 12:18                   ` Laurence Oberman
       [not found]                     ` <497950649.2287440.1493209093092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 12:18 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 7:47:37 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> ----- Original Message -----
> > From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
> > <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Wednesday, April 26, 2017 4:31:57 AM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > overflows the klms[] array
> > 
> > 
> > 
> > On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> > >
> > >
> > > ----- Original Message -----
> > >> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > >> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> > >> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> > >> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > >> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >> Sent: Tuesday, April 25, 2017 1:58:49 PM
> > >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > >> overflows the klms[] array
> > >>
> > >> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > >>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > >>> than what fits into a single MR. .map_mr_sg() must not attempt to
> > >>> map more SG-list elements than what fits into a single MR.
> > >>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > >>> the MR klms[] array.
> > >>>
> > >>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > >>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > >>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > >>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > >>> ---
> > >>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> > >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>
> > >>
> > >> Bart,
> > >>
> > >> Thanks a lot, it indeed looks right.
> > >> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>
> > >> Thanks
> > >>
> > >
> > >
> > > Hello Bart, Leon, Max and Israel.
> > >
> > > I cloned off Barts tree.
> > >
> > > git clone https://github.com/bvanassche/linux
> > > cd linux
> > > git checkout block-scsi-for-next
> > >
> > > I checked all patches were in for this test.
> > >
> > > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > 
> > Hi,
> > copying Sagi's request from different thread:
> > 
> > "
> > Can you please enable srp_add_one debug:
> > 
> > echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> > 
> > In addition apply the following:
> > --
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > b/drivers/infiniband/hw/mlx5/mr.c
> > index d9c6c0ea750b..040fbc387e4f 100644
> > --- a/drivers/infiniband/hw/mlx5/mr.c
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
> >          int add_size;
> >          int ret;
> > 
> > +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> > +
> >          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> > 
> >          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> > 
> > "
> > 
> > Max.
> > 
> > >
> > > Built and tested the kernel.
> > >
> > > However this issue is not resolved :(
> > >
> > > [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817edca86b0
> > > [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > > [ 2708.121342] 00000000 00000000 00000000 00000000
> > > [ 2708.147104] 00000000 00000000 00000000 00000000
> > > [ 2708.172633] 00000000 00000000 00000000 00000000
> > > [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > > [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > > [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817ed0a9c30
> > >
> > > [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump
> > > error cqe
> > > [ 2746.443240] 00000000 00000000 00000000 00000000
> > > [ 2746.469323] 00000000 00000000 00000000 00000000
> > > [ 2746.495310] 00000000 00000000 00000000 00000000
> > > [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> > > [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> > > [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817ed0a9cf0
> > > [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> > > [ 2763.297826] 00000000 00000000 00000000 00000000
> > > [ 2763.323352] 00000000 00000000 00000000 00000000
> > > [ 2763.348722] 00000000 00000000 00000000 00000000
> > > [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> > >
> > > [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> > > port-1:1 / host1.
> > > [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> > > [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817ed0a9cf0
> > > [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > [ 2780.093520] 00000000 00000000 00000000 00000000
> > > [ 2780.120067] 00000000 00000000 00000000 00000000
> > > [ 2780.145575] 00000000 00000000 00000000 00000000
> > > [ 2780.171153] 00000000 0f007806 25000042 000833d0
> > > [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> > > [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817ed0a9cf0
> > > [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > [ 2796.495257] 00000000 00000000 00000000 00000000
> > > [ 2796.521506] 00000000 00000000 00000000 00000000
> > > [ 2796.547640] 00000000 00000000 00000000 00000000
> > > [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> > > [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> > > [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817ed0a9cf0
> > >
> > > Regards
> > > Laurence
> > >
> > 
> Doing this now
> Thanks
> Laurence

Max,

The patch is not correct; it fails to build:

drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has no member named 'attr'
  WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
                              ^
./include/asm-generic/bug.h:117:27: note: in definition of macro 'WARN_ON_ONCE'
  int __ret_warn_once = !!(condition);   \

I think you meant to give me

WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);

Can you confirm?

Thanks
Laurence
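
A note on the build error above: in mainline kernels of this period struct ib_device embeds its attributes in a member named attrs (of type struct ib_device_attr), so the debug check was presumably meant to read device->attrs.max_fast_reg_page_list_len. The following is a userspace sketch using hypothetical mock types, not the kernel headers, that shows the intended comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device_attr (only the field used here). */
struct mock_ib_device_attr {
	unsigned int max_fast_reg_page_list_len;
};

/* Hypothetical stand-in for struct ib_device. */
struct mock_ib_device {
	struct mock_ib_device_attr attrs;	/* note: "attrs", not "attr" */
};

/*
 * The condition the debug WARN_ON_ONCE() is meant to test. ndescs is an int
 * in the quoted function, so it is converted to unsigned for the comparison,
 * which is what C does implicitly in the original expression.
 */
static bool ndescs_exceeds_limit(const struct mock_ib_device *device, int ndescs)
{
	assert(device != NULL);
	return (unsigned int)ndescs > device->attrs.max_fast_reg_page_list_len;
}
```

With a limit of 256, a request for 256 descriptors passes and 257 would trip the warning.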

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                     ` <497950649.2287440.1493209093092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26 12:20                       ` Laurence Oberman
  2017-04-26 12:25                       ` Max Gurtovoy
  1 sibling, 0 replies; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 12:20 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 8:18:13 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin"
> > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Wednesday, April 26, 2017 7:47:37 AM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > overflows the klms[] array
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
> > > <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> > > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Wednesday, April 26, 2017 4:31:57 AM
> > > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > > overflows the klms[] array
> > > 
> > > 
> > > 
> > > On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> > > >
> > > >
> > > > ----- Original Message -----
> > > >> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > >> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> > > >> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> > > >> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > > >> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >> Sent: Tuesday, April 25, 2017 1:58:49 PM
> > > >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > > >> overflows the klms[] array
> > > >>
> > > >> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > > >>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > > >>> than what fits into a single MR. .map_mr_sg() must not attempt to
> > > >>> map more SG-list elements than what fits into a single MR.
> > > >>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > > >>> the MR klms[] array.
> > > >>>
> > > >>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > > >>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > >>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > > >>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > > >>> ---
> > > >>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> > > >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>
> > > >>
> > > >> Bart,
> > > >>
> > > >> Thanks a lot, it indeed looks right.
> > > >> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>
> > > >> Thanks
> > > >>
> > > >
> > > >
> > > > Hello Bart, Leon, Max and Israel.
> > > >
> > > > I cloned off Barts tree.
> > > >
> > > > git clone https://github.com/bvanassche/linux
> > > > cd linux
> > > > git checkout block-scsi-for-next
> > > >
> > > > I checked all patches were in for this test.
> > > >
> > > > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > > > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[]
> > > > array
> > > > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > > 
> > > Hi,
> > > copying Sagi's request from different thread:
> > > 
> > > "
> > > Can you please enable srp_add_one debug:
> > > 
> > > echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> > > 
> > > In addition apply the following:
> > > --
> > > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > > b/drivers/infiniband/hw/mlx5/mr.c
> > > index d9c6c0ea750b..040fbc387e4f 100644
> > > --- a/drivers/infiniband/hw/mlx5/mr.c
> > > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
> > >          int add_size;
> > >          int ret;
> > > 
> > > +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> > > +
> > >          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN,
> > >          0);
> > > 
> > >          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> > > 
> > > "
> > > 
> > > Max.
> > > 
> > > >
> > > > Built and tested the kernel.
> > > >
> > > > However this issue is not resolved :(
> > > >
> > > > [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817edca86b0
> > > > [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > > > [ 2708.121342] 00000000 00000000 00000000 00000000
> > > > [ 2708.147104] 00000000 00000000 00000000 00000000
> > > > [ 2708.172633] 00000000 00000000 00000000 00000000
> > > > [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > > > [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > > > [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817ed0a9c30
> > > >
> > > > [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877):
> > > > dump
> > > > error cqe
> > > > [ 2746.443240] 00000000 00000000 00000000 00000000
> > > > [ 2746.469323] 00000000 00000000 00000000 00000000
> > > > [ 2746.495310] 00000000 00000000 00000000 00000000
> > > > [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> > > > [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> > > > [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817ed0a9cf0
> > > > [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> > > > [ 2763.297826] 00000000 00000000 00000000 00000000
> > > > [ 2763.323352] 00000000 00000000 00000000 00000000
> > > > [ 2763.348722] 00000000 00000000 00000000 00000000
> > > > [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> > > >
> > > > [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> > > > port-1:1 / host1.
> > > > [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> > > > [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817ed0a9cf0
> > > > [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > > [ 2780.093520] 00000000 00000000 00000000 00000000
> > > > [ 2780.120067] 00000000 00000000 00000000 00000000
> > > > [ 2780.145575] 00000000 00000000 00000000 00000000
> > > > [ 2780.171153] 00000000 0f007806 25000042 000833d0
> > > > [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> > > > [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817ed0a9cf0
> > > > [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > > [ 2796.495257] 00000000 00000000 00000000 00000000
> > > > [ 2796.521506] 00000000 00000000 00000000 00000000
> > > > [ 2796.547640] 00000000 00000000 00000000 00000000
> > > > [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> > > > [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> > > > [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > > for
> > > > CQE ffff8817ed0a9cf0
> > > >
> > > > Regards
> > > > Laurence
> > > >
> > > 
> > Doing this now
> > Thanks
> > Laurence
> 
> Max
> 
> The Patch is not correct.
> 
> drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
> drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has no
> member named 'attr'
>   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
>                               ^
> ./include/asm-generic/bug.h:117:27: note: in definition of macro
> 'WARN_ON_ONCE'
>   int __ret_warn_once = !!(condition);   \
> 
> I think you meant to give me
> 
> WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
> 
> Can you confirm
> 
> Thanks
> Laurence


Oops, rather this:

WARN_ON_ONCE(ndescs > device->ib_device_attr.max_fast_reg_page_list_len);

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                     ` <497950649.2287440.1493209093092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-26 12:20                       ` Laurence Oberman
@ 2017-04-26 12:25                       ` Max Gurtovoy
       [not found]                         ` <16ea1371-84a5-c055-5b0c-fdc6d355276a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2017-04-26 12:25 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 4/26/2017 3:18 PM, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Sent: Wednesday, April 26, 2017 7:47:37 AM
>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>>
>>
>>
>> ----- Original Message -----
>>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
>>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
>>> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Sent: Wednesday, April 26, 2017 4:31:57 AM
>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>> overflows the klms[] array
>>>
>>>
>>>
>>> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
>>>>> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
>>>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
>>>>> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>>>> overflows the klms[] array
>>>>>
>>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
>>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
>>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
>>>>>> map more SG-list elements than what fits into a single MR.
>>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
>>>>>> the MR klms[] array.
>>>>>>
>>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
>>>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>>>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
>>>>>> ---
>>>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>
>>>>> Bart,
>>>>>
>>>>> Thanks a lot, it indeed looks right.
>>>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>> Hello Bart, Leon, Max and Israel.
>>>>
>>>> I cloned off Barts tree.
>>>>
>>>> git clone https://github.com/bvanassche/linux
>>>> cd linux
>>>> git checkout block-scsi-for-next
>>>>
>>>> I checked all patches were in for this test.
>>>>
>>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
>>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
>>>
>>> Hi,
>>> copying Sagi's request from different thread:
>>>
>>> "
>>> Can you please enable srp_add_one debug:
>>>
>>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
>>>
>>> In addition apply the following:
>>> --
>>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
>>> b/drivers/infiniband/hw/mlx5/mr.c
>>> index d9c6c0ea750b..040fbc387e4f 100644
>>> --- a/drivers/infiniband/hw/mlx5/mr.c
>>> +++ b/drivers/infiniband/hw/mlx5/mr.c
>>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
>>>          int add_size;
>>>          int ret;
>>>
>>> +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
>>> +
>>>          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
>>>
>>>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
>>>
>>> "
>>>
>>> Max.
>>>
>>>>
>>>> Built and tested the kernel.
>>>>
>>>> However this issue is not resolved :(
>>>>
>>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817edca86b0
>>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
>>>> [ 2708.121342] 00000000 00000000 00000000 00000000
>>>> [ 2708.147104] 00000000 00000000 00000000 00000000
>>>> [ 2708.172633] 00000000 00000000 00000000 00000000
>>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
>>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
>>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817ed0a9c30
>>>>
>>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump
>>>> error cqe
>>>> [ 2746.443240] 00000000 00000000 00000000 00000000
>>>> [ 2746.469323] 00000000 00000000 00000000 00000000
>>>> [ 2746.495310] 00000000 00000000 00000000 00000000
>>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
>>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
>>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817ed0a9cf0
>>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
>>>> [ 2763.297826] 00000000 00000000 00000000 00000000
>>>> [ 2763.323352] 00000000 00000000 00000000 00000000
>>>> [ 2763.348722] 00000000 00000000 00000000 00000000
>>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
>>>>
>>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
>>>> port-1:1 / host1.
>>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
>>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817ed0a9cf0
>>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
>>>> [ 2780.093520] 00000000 00000000 00000000 00000000
>>>> [ 2780.120067] 00000000 00000000 00000000 00000000
>>>> [ 2780.145575] 00000000 00000000 00000000 00000000
>>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0
>>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
>>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817ed0a9cf0
>>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
>>>> [ 2796.495257] 00000000 00000000 00000000 00000000
>>>> [ 2796.521506] 00000000 00000000 00000000 00000000
>>>> [ 2796.547640] 00000000 00000000 00000000 00000000
>>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
>>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
>>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>>> CQE ffff8817ed0a9cf0
>>>>
>>>> Regards
>>>> Laurence
>>>>
>>>
>> Doing this now
>> Thanks
>> Laurence
>
> Max
>
> The Patch is not correct.
>
> drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
> drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has no member named 'attr'
>   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
>                               ^
> ./include/asm-generic/bug.h:117:27: note: in definition of macro 'WARN_ON_ONCE'
>   int __ret_warn_once = !!(condition);   \
>
> I think you meant to give me
>
> WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
>
> Can you confirm

Hi Laurence,
it should be device->attrs.max_fast_reg_page_list_len.

Please check this one, which might solve the issue (apply it on top of everything else):


diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b8f9382..063d116 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
                 mr->max_descs = ndescs;
         } else if (mr_type == IB_MR_TYPE_SG_GAPS) {
                 mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;
-
+               MLX5_SET(mkc, mkc, translations_octword_size, ALIGN(max_num_sg + 1, 4));
                 err = mlx5_alloc_priv_descs(pd->device, mr,
                                             ndescs, sizeof(struct mlx5_klm));
                 if (err)
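
For readers following along, the arithmetic in the added MLX5_SET() line can be sketched in plain userspace C. This is only an illustration under stated assumptions: align_up() is a stand-in for the kernel's ALIGN() macro (round up to a power-of-two multiple), and klm_translation_octwords() is a hypothetical helper name that does not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of a, where a must be a power of two.
 * Stand-in for the kernel's ALIGN() macro; not the kernel code itself.
 */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: the value the diff above would program into
 * translations_octword_size for a given max_num_sg, i.e.
 * ALIGN(max_num_sg + 1, 4).
 */
static uint32_t klm_translation_octwords(uint32_t max_num_sg)
{
	return align_up(max_num_sg + 1, 4);
}
```

With these definitions, klm_translation_octwords(256) evaluates to 260: one extra entry is added and the total is then rounded up to a multiple of four.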

thanks,
Max.

>
> Thanks
> Laurence
>

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                         ` <16ea1371-84a5-c055-5b0c-fdc6d355276a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-04-26 13:28                           ` Laurence Oberman
       [not found]                             ` <2122831810.2341766.1493213317484.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 13:28 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 8:25:30 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> On 4/26/2017 3:18 PM, Laurence Oberman wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> >> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> >> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
> >> Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> >> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Sent: Wednesday, April 26, 2017 7:47:37 AM
> >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> >> overflows the klms[] array
> >>
> >>
> >>
> >> ----- Original Message -----
> >>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
> >>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> >>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> >>> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> >>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> Sent: Wednesday, April 26, 2017 4:31:57 AM
> >>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> >>> overflows the klms[] array
> >>>
> >>>
> >>>
> >>> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> >>>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> >>>>> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> >>>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> >>>>> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
> >>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> >>>>> overflows the klms[] array
> >>>>>
> >>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> >>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> >>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
> >>>>>> map more SG-list elements than what fits into a single MR.
> >>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> >>>>>> the MR klms[] array.
> >>>>>>
> >>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> >>>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> >>>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> >>>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> >>>>>> ---
> >>>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>
> >>>>> Bart,
> >>>>>
> >>>>> Thanks a lot, it indeed looks right.
> >>>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>>
> >>>> Hello Bart, Leon, Max and Israel.
> >>>>
> >>>> I cloned off Barts tree.
> >>>>
> >>>> git clone https://github.com/bvanassche/linux
> >>>> cd linux
> >>>> git checkout block-scsi-for-next
> >>>>
> >>>> I checked all patches were in for this test.
> >>>>
> >>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> >>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> >>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> >>>
> >>> Hi,
> >>> copying Sagi's request from different thread:
> >>>
> >>> "
> >>> Can you please enable srp_add_one debug:
> >>>
> >>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> >>>
> >>> In addition apply the following:
> >>> --
> >>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> >>> b/drivers/infiniband/hw/mlx5/mr.c
> >>> index d9c6c0ea750b..040fbc387e4f 100644
> >>> --- a/drivers/infiniband/hw/mlx5/mr.c
> >>> +++ b/drivers/infiniband/hw/mlx5/mr.c
> >>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
> >>>          int add_size;
> >>>          int ret;
> >>>
> >>> +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> >>> +
> >>>          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN,
> >>>          0);
> >>>
> >>>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> >>>
> >>> "
> >>>
> >>> Max.
> >>>
> >>>>
> >>>> Built and tested the kernel.
> >>>>
> >>>> However this issue is not resolved :(
> >>>>
> >>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817edca86b0
> >>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> >>>> [ 2708.121342] 00000000 00000000 00000000 00000000
> >>>> [ 2708.147104] 00000000 00000000 00000000 00000000
> >>>> [ 2708.172633] 00000000 00000000 00000000 00000000
> >>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> >>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817ed0a9c30
> >>>>
> >>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump
> >>>> error cqe
> >>>> [ 2746.443240] 00000000 00000000 00000000 00000000
> >>>> [ 2746.469323] 00000000 00000000 00000000 00000000
> >>>> [ 2746.495310] 00000000 00000000 00000000 00000000
> >>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> >>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817ed0a9cf0
> >>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> >>>> [ 2763.297826] 00000000 00000000 00000000 00000000
> >>>> [ 2763.323352] 00000000 00000000 00000000 00000000
> >>>> [ 2763.348722] 00000000 00000000 00000000 00000000
> >>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> >>>>
> >>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> >>>> port-1:1 / host1.
> >>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817ed0a9cf0
> >>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> >>>> [ 2780.093520] 00000000 00000000 00000000 00000000
> >>>> [ 2780.120067] 00000000 00000000 00000000 00000000
> >>>> [ 2780.145575] 00000000 00000000 00000000 00000000
> >>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0
> >>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817ed0a9cf0
> >>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> >>>> [ 2796.495257] 00000000 00000000 00000000 00000000
> >>>> [ 2796.521506] 00000000 00000000 00000000 00000000
> >>>> [ 2796.547640] 00000000 00000000 00000000 00000000
> >>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> >>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >>>> CQE ffff8817ed0a9cf0
> >>>>
> >>>> Regards
> >>>> Laurence
> >>>>
> >>>
> >> Doing this now
> >> Thanks
> >> Laurence
> >
> > Max
> >
> > The Patch is not correct.
> >
> > drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
> > drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has no
> > member named 'attr'
> >   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> >                               ^
> > ./include/asm-generic/bug.h:117:27: note: in definition of macro
> > 'WARN_ON_ONCE'
> >   int __ret_warn_once = !!(condition);   \
> >
> > I think you meant to give me
> >
> > WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
> >
> > Can you confirm
> 
> Hi Laurence,
> should be device->attrs.max_fast_reg_page_list_len.
> 
> please check this one that might solve the issue (on top of everything):
> 
> 
> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> b/drivers/infiniband/hw/mlx5/mr.c
> index b8f9382..063d116 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
>                  mr->max_descs = ndescs;
>          } else if (mr_type == IB_MR_TYPE_SG_GAPS) {
>                  mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;
> -
> +               MLX5_SET(mkc, mkc, translations_octword_size,
> ALIGN(max_num_sg + 1, 4));
>                  err = mlx5_alloc_priv_descs(pd->device, mr,
>                                              ndescs, sizeof(struct
> mlx5_klm));
>                  if (err)
> 
> thanks,
> Max.
> 
> >
> > Thanks
> > Laurence
> >

Hello Max

I applied the corrected WARN_ON_ONCE patch and the patch above, with everything else unchanged from Bart's tree.

It still fails.

For a baseline I can revert
a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS

and then test again to make sure we are starting from a good place.

Initiator log

[  280.481951] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff8817d9a881b8
[  301.149106] scsi host1: ib_srp: reconnect succeeded
[  301.280635] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed32f2f0
[  334.596420] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817c592c970
[  334.599689] mlx5_1:dump_cqe:262:(pid 20): dump error cqe
[  334.599691] 00000000 00000000 00000000 00000000
[  334.599692] 00000000 00000000 00000000 00000000
[  334.599692] 00000000 00000000 00000000 00000000
[  334.599693] 00000000 0f007806 2500002d 067b48d0
[  334.599697] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff8817c6e30078
[  336.117248] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
[  336.145840] 00000000 00000000 00000000 00000000
[  336.171830] 00000000 00000000 00000000 00000000
[  336.197688] 00000000 00000000 00000000 00000000
[  336.223720] 00000000 0f007806 25000032 005408d0
[  339.712706] fast_io_fail_tmo expired for SRP port-1:1 / host1.
[  341.453634] scsi host1: ib_srp: reconnect succeeded
[  341.481600] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
[  341.482145] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ecaf6970
[  341.559359] 00000000 00000000 00000000 00000000
[  341.585397] 00000000 00000000 00000000 00000000
[  341.610948] 00000000 00000000 00000000 00000000
[  341.637515] 00000000 0f007806 2500003d 000046d0
[  342.297598] sd 1:0:0:9: rejecting I/O to offline device
[  342.297936] sd 1:0:0:9: [sdg] tag#28 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.297941] sd 1:0:0:9: [sdg] tag#28 CDB: Write(10) 2a 00 00 00 40 00 00 40 00 00
[  342.297943] blk_update_request: recoverable transport error, dev sdg, sector 16384
[  342.297951] sd 1:0:0:20: [sdar] tag#5 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.297952] sd 1:0:0:20: [sdar] tag#15 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.297956] sd 1:0:0:20: [sdar] tag#5 CDB: Write(10) 2a 00 00 03 c0 00 00 40 00 00
[  342.297956] sd 1:0:0:20: [sdar] tag#15 CDB: Write(10) 2a 00 00 2c c0 00 00 40 00 00
[  342.297958] blk_update_request: recoverable transport error, dev sdar, sector 245760
[  342.297959] blk_update_request: recoverable transport error, dev sdar, sector 2932736
[  342.298119] device-mapper: multipath: Failing path 8:96.
[  342.298266] sd 1:0:0:9: [sdg] tag#29 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298268] sd 1:0:0:9: [sdg] tag#29 CDB: Write(10) 2a 00 00 00 c0 00 00 40 00 00
[  342.298269] blk_update_request: recoverable transport error, dev sdg, sector 49152
[  342.298300] device-mapper: multipath: Failing path 66:176.
[  342.298486] sd 1:0:0:20: [sdar] tag#16 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298488] sd 1:0:0:20: [sdar] tag#6 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298489] sd 1:0:0:20: [sdar] tag#16 CDB: Write(10) 2a 00 00 2d 40 00 00 40 00 00
[  342.298490] sd 1:0:0:20: [sdar] tag#6 CDB: Write(10) 2a 00 00 04 40 00 00 40 00 00
[  342.298491] blk_update_request: recoverable transport error, dev sdar, sector 2965504
[  342.298492] blk_update_request: recoverable transport error, dev sdar, sector 278528
[  342.298582] sd 1:0:0:9: [sdg] tag#30 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298584] sd 1:0:0:9: [sdg] tag#30 CDB: Write(10) 2a 00 00 01 40 00 00 40 00 00
[  342.298585] blk_update_request: recoverable transport error, dev sdg, sector 81920
[  342.298889] sd 1:0:0:9: [sdg] tag#31 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298890] sd 1:0:0:9: [sdg] tag#31 CDB: Write(10) 2a 00 00 01 c0 00 00 40 00 00
[  342.298891] blk_update_request: recoverable transport error, dev sdg, sector 114688
[  342.298981] sd 1:0:0:20: [sdar] tag#7 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.298983] sd 1:0:0:20: [sdar] tag#7 CDB: Write(10) 2a 00 00 04 c0 00 00 40 00 00
[  342.298985] blk_update_request: recoverable transport error, dev sdar, sector 311296
[  342.299004] sd 1:0:0:20: [sdar] tag#17 FAILED Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.299007] sd 1:0:0:20: [sdar] tag#17 CDB: Write(10) 2a 00 00 34 c0 00 00 40 00 00
[  342.299009] blk_update_request: recoverable transport error, dev sdar, sector 3457024
[  342.356353] device-mapper: multipath: Failing path 8:64.
[  342.356489] device-mapper: multipath: Failing path 8:128.
[  342.356628] device-mapper: multipath: Failing path 8:160.
[  342.356699] device-mapper: multipath: Failing path 8:176.
[  342.356767] device-mapper: multipath: Failing path 8:240.
[  342.356834] device-mapper: multipath: Failing path 8:208.
[  342.356900] device-mapper: multipath: Failing path 65:16.
[  342.356967] device-mapper: multipath: Failing path 65:64.
[  342.357035] device-mapper: multipath: Failing path 65:96.
[  342.357103] device-mapper: multipath: Failing path 65:128.
[  342.357169] device-mapper: multipath: Failing path 65:176.
[  342.357237] device-mapper: multipath: Failing path 65:208.
[  342.357303] device-mapper: multipath: Failing path 65:224.
[  342.357371] device-mapper: multipath: Failing path 66:0.
[  342.357454] device-mapper: multipath: Failing path 66:32.
[  342.357521] device-mapper: multipath: Failing path 66:48.
[  342.357647] device-mapper: multipath: Failing path 66:80.
[  342.357714] device-mapper: multipath: Failing path 66:112.
[  342.357781] device-mapper: multipath: Failing path 66:144.
[  342.357936] device-mapper: multipath: Failing path 66:208.
[  342.358019] device-mapper: multipath: Failing path 66:240.
[  342.358115] device-mapper: multipath: Failing path 67:16.
[  342.358183] device-mapper: multipath: Failing path 67:48.
[  342.358264] device-mapper: multipath: Failing path 67:80.
[  342.358359] device-mapper: multipath: Failing path 67:128.
[  342.358442] device-mapper: multipath: Failing path 67:160.
[  342.358594] device-mapper: multipath: Failing path 67:224.
[  342.358671] device-mapper: multipath: Failing path 67:208.
[  350.157728] scsi host2: ib_srp: reconnect succeeded
[  350.189605] mlx5_1:dump_cqe:262:(pid 4756): dump error cqe
[  350.193180] mlx5_1:dump_cqe:262:(pid 1275): dump error cqe
[  350.193182] 00000000 00000000 00000000 00000000
[  350.193182] 00000000 00000000 00000000 00000000
[  350.193183] 00000000 00000000 00000000 00000000
[  350.193183] 00000000 0f007806 25000035 04f569d0
[  350.193187] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff8817c6e30078
[  350.412637] 00000000 00000000 00000000 00000000
[  350.436431] 00000000 00000000 00000000 00000000
[  350.461871] 00000000 00000000 00000000 00000000
[  350.487549] 00000000 0f007806 25000032 000843d0
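
The last line of each error CQE dump above can be decoded by hand. What follows is only a minimal sketch. It assumes the dump prints the 64-byte CQE as sixteen big-endian 32-bit words and that the tail of the CQE follows the struct mlx5_err_cqe layout (vendor_err_synd, syndrome, s_wqe_opcode_qpn, wqe_counter, signature, op_own). Both points are assumptions about the driver, not something stated in this thread, and the struct and function names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields recovered from the tail of an error CQE dump (hypothetical type). */
struct err_cqe_fields {
	uint8_t  vendor_err_synd; /* vendor-specific syndrome byte */
	uint8_t  syndrome;        /* generic error syndrome */
	uint8_t  wqe_opcode;      /* opcode of the failed send WQE */
	uint32_t qpn;             /* 24-bit QP number */
	uint16_t wqe_counter;     /* index of the failed WQE */
	uint8_t  cqe_opcode;      /* upper nibble of op_own */
};

/*
 * w[] holds the last three words of the fourth dump line, already in host
 * order as printed, e.g. { 0x0f007806, 0x25000032, 0x000843d0 }.
 * The byte offsets used here are an assumption about the CQE layout.
 */
static struct err_cqe_fields decode_err_cqe_tail(const uint32_t w[3])
{
	struct err_cqe_fields f;

	assert(w != NULL);

	f.vendor_err_synd = (uint8_t)((w[0] >> 8) & 0xff);
	f.syndrome        = (uint8_t)(w[0] & 0xff);
	f.wqe_opcode      = (uint8_t)(w[1] >> 24);
	f.qpn             = w[1] & 0x00ffffffu;
	f.wqe_counter     = (uint16_t)(w[2] >> 16);
	f.cqe_opcode      = (uint8_t)((w[2] & 0xff) >> 4);
	return f;
}
```

Feeding it the words 0f007806 25000032 000843d0 from the dump at 350.487549 yields syndrome 0x06, which lines up with the "(6)" in the ib_srp FAST REG failure message, and a WQE opcode of 0x25. If the layout assumption holds, 0x25 should be the UMR opcode, which would mean the failing work request is the memory registration itself.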

Target Log

These events happened after the first failures on the initiator

[ 1111.029847] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-49.
[ 1111.078815] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-48.
[ 1111.127420] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-47.
[ 1111.175801] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-46.
[ 1111.223725] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-45.
[ 1111.271957] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-44.
[ 1111.319494] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-43.
[ 1111.365795] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-42.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                             ` <2122831810.2341766.1493213317484.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26 13:50                               ` Laurence Oberman
       [not found]                                 ` <1879402127.2348907.1493214625254.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 13:50 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 9:28:37 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> ----- Original Message -----
> > From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin"
> > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Wednesday, April 26, 2017 8:25:30 AM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > overflows the klms[] array
> > 
> > 
> > 
> > On 4/26/2017 3:18 PM, Laurence Oberman wrote:
> > >
> > >
> > > ----- Original Message -----
> > >> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > >> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > >> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > >> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
> > >> Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > >> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >> Sent: Wednesday, April 26, 2017 7:47:37 AM
> > >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > >> overflows the klms[] array
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
> > >>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > >>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> > >>> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > >>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >>> Sent: Wednesday, April 26, 2017 4:31:57 AM
> > >>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > >>> overflows the klms[] array
> > >>>
> > >>>
> > >>>
> > >>> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> > >>>>
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > >>>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> > >>>>> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> > >>>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > >>>>> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
> > >>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > >>>>> overflows the klms[] array
> > >>>>>
> > >>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > >>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > >>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
> > >>>>>> map more SG-list elements than what fits into a single MR.
> > >>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > >>>>>> the MR klms[] array.
> > >>>>>>
> > >>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > >>>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > >>>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > >>>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > >>>>>> ---
> > >>>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> > >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>>
> > >>>>>
> > >>>>> Bart,
> > >>>>>
> > >>>>> Thanks a lot, it indeed looks right.
> > >>>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>
> > >>>>
> > >>>> Hello Bart, Leon, Max and Israel.
> > >>>>
> > >>>> I cloned off Bart's tree.
> > >>>>
> > >>>> git clone https://github.com/bvanassche/linux
> > >>>> cd linux
> > >>>> git checkout block-scsi-for-next
> > >>>>
> > >>>> I checked all patches were in for this test.
> > >>>>
> > >>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > >>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[]
> > >>>> array
> > >>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > >>>
> > >>> Hi,
> > >>> copying Sagi's request from different thread:
> > >>>
> > >>> "
> > >>> Can you please enable srp_add_one debug:
> > >>>
> > >>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> > >>>
> > >>> In addition apply the following:
> > >>> --
> > >>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > >>> b/drivers/infiniband/hw/mlx5/mr.c
> > >>> index d9c6c0ea750b..040fbc387e4f 100644
> > >>> --- a/drivers/infiniband/hw/mlx5/mr.c
> > >>> +++ b/drivers/infiniband/hw/mlx5/mr.c
> > >>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
> > >>>          int add_size;
> > >>>          int ret;
> > >>>
> > >>> +       WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> > >>> +
> > >>>          add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN,
> > >>>          0);
> > >>>
> > >>>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> > >>>
> > >>> "
> > >>>
> > >>> Max.
> > >>>
> > >>>>
> > >>>> Built and tested the kernel.
> > >>>>
> > >>>> However this issue is not resolved :(
> > >>>>
> > >>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817edca86b0
> > >>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > >>>> [ 2708.121342] 00000000 00000000 00000000 00000000
> > >>>> [ 2708.147104] 00000000 00000000 00000000 00000000
> > >>>> [ 2708.172633] 00000000 00000000 00000000 00000000
> > >>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > >>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > >>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817ed0a9c30
> > >>>>
> > >>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877):
> > >>>> dump
> > >>>> error cqe
> > >>>> [ 2746.443240] 00000000 00000000 00000000 00000000
> > >>>> [ 2746.469323] 00000000 00000000 00000000 00000000
> > >>>> [ 2746.495310] 00000000 00000000 00000000 00000000
> > >>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> > >>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> > >>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817ed0a9cf0
> > >>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> > >>>> [ 2763.297826] 00000000 00000000 00000000 00000000
> > >>>> [ 2763.323352] 00000000 00000000 00000000 00000000
> > >>>> [ 2763.348722] 00000000 00000000 00000000 00000000
> > >>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> > >>>>
> > >>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> > >>>> port-1:1 / host1.
> > >>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> > >>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817ed0a9cf0
> > >>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > >>>> [ 2780.093520] 00000000 00000000 00000000 00000000
> > >>>> [ 2780.120067] 00000000 00000000 00000000 00000000
> > >>>> [ 2780.145575] 00000000 00000000 00000000 00000000
> > >>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0
> > >>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> > >>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817ed0a9cf0
> > >>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > >>>> [ 2796.495257] 00000000 00000000 00000000 00000000
> > >>>> [ 2796.521506] 00000000 00000000 00000000 00000000
> > >>>> [ 2796.547640] 00000000 00000000 00000000 00000000
> > >>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> > >>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> > >>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>>> for
> > >>>> CQE ffff8817ed0a9cf0
> > >>>>
> > >>>> Regards
> > >>>> Laurence
> > >>>>
> > >>>
> > >> Doing this now
> > >> Thanks
> > >> Laurence
> > >
> > > Max
> > >
> > > The Patch is not correct.
> > >
> > > drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
> > > drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has no
> > > member named 'attr'
> > >   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> > >                               ^
> > > ./include/asm-generic/bug.h:117:27: note: in definition of macro
> > > 'WARN_ON_ONCE'
> > >   int __ret_warn_once = !!(condition);   \
> > >
> > > I think you meant to give me
> > >
> > > WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
> > >
> > > Can you confirm
> > 
> > Hi Laurence,
> > should be device->attrs.max_fast_reg_page_list_len.
> > 
> > please check this one that might solve the issue (on top of everything):
> > 
> > 
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > b/drivers/infiniband/hw/mlx5/mr.c
> > index b8f9382..063d116 100644
> > --- a/drivers/infiniband/hw/mlx5/mr.c
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
> >                  mr->max_descs = ndescs;
> >          } else if (mr_type == IB_MR_TYPE_SG_GAPS) {
> >                  mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;
> > -
> > +               MLX5_SET(mkc, mkc, translations_octword_size,
> > ALIGN(max_num_sg + 1, 4));
> >                  err = mlx5_alloc_priv_descs(pd->device, mr,
> >                                              ndescs, sizeof(struct
> > mlx5_klm));
> >                  if (err)
> > 
> > thanks,
> > Max.
> > 
> > >
> > > Thanks
> > > Laurence
> > >
> > 
> 
> Hello Max
> 
> I have the corrected WARN_ON_ONCE patch and the above patch, as well as the
> rest as it was from Bart's tree.
> 
> Still fails.
> 
> For a baseline I can revert
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> 
> Then test again to make sure we are starting from a good place.
> 
> Initiator log
> 
> [  280.481951] scsi host1: ib_srp: failed FAST REG status memory management
> operation error (6) for CQE ffff8817d9a881b8
> [  301.149106] scsi host1: ib_srp: reconnect succeeded
> [  301.280635] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817ed32f2f0
> [  334.596420] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817c592c970
> [  334.599689] mlx5_1:dump_cqe:262:(pid 20): dump error cqe
> [  334.599691] 00000000 00000000 00000000 00000000
> [  334.599692] 00000000 00000000 00000000 00000000
> [  334.599692] 00000000 00000000 00000000 00000000
> [  334.599693] 00000000 0f007806 2500002d 067b48d0
> [  334.599697] scsi host2: ib_srp: failed FAST REG status memory management
> operation error (6) for CQE ffff8817c6e30078
> [  336.117248] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
> [  336.145840] 00000000 00000000 00000000 00000000
> [  336.171830] 00000000 00000000 00000000 00000000
> [  336.197688] 00000000 00000000 00000000 00000000
> [  336.223720] 00000000 0f007806 25000032 005408d0
> [  339.712706] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> [  341.453634] scsi host1: ib_srp: reconnect succeeded
> [  341.481600] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
> [  341.482145] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817ecaf6970
> [  341.559359] 00000000 00000000 00000000 00000000
> [  341.585397] 00000000 00000000 00000000 00000000
> [  341.610948] 00000000 00000000 00000000 00000000
> [  341.637515] 00000000 0f007806 2500003d 000046d0
> [  342.297598] sd 1:0:0:9: rejecting I/O to offline device
> [  342.297936] sd 1:0:0:9: [sdg] tag#28 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.297941] sd 1:0:0:9: [sdg] tag#28 CDB: Write(10) 2a 00 00 00 40 00 00
> 40 00 00
> [  342.297943] blk_update_request: recoverable transport error, dev sdg,
> sector 16384
> [  342.297951] sd 1:0:0:20: [sdar] tag#5 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.297952] sd 1:0:0:20: [sdar] tag#15 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.297956] sd 1:0:0:20: [sdar] tag#5 CDB: Write(10) 2a 00 00 03 c0 00 00
> 40 00 00
> [  342.297956] sd 1:0:0:20: [sdar] tag#15 CDB: Write(10) 2a 00 00 2c c0 00 00
> 40 00 00
> [  342.297958] blk_update_request: recoverable transport error, dev sdar,
> sector 245760
> [  342.297959] blk_update_request: recoverable transport error, dev sdar,
> sector 2932736
> [  342.298119] device-mapper: multipath: Failing path 8:96.
> [  342.298266] sd 1:0:0:9: [sdg] tag#29 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298268] sd 1:0:0:9: [sdg] tag#29 CDB: Write(10) 2a 00 00 00 c0 00 00
> 40 00 00
> [  342.298269] blk_update_request: recoverable transport error, dev sdg,
> sector 49152
> [  342.298300] device-mapper: multipath: Failing path 66:176.
> [  342.298486] sd 1:0:0:20: [sdar] tag#16 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298488] sd 1:0:0:20: [sdar] tag#6 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298489] sd 1:0:0:20: [sdar] tag#16 CDB: Write(10) 2a 00 00 2d 40 00 00
> 40 00 00
> [  342.298490] sd 1:0:0:20: [sdar] tag#6 CDB: Write(10) 2a 00 00 04 40 00 00
> 40 00 00
> [  342.298491] blk_update_request: recoverable transport error, dev sdar,
> sector 2965504
> [  342.298492] blk_update_request: recoverable transport error, dev sdar,
> sector 278528
> [  342.298582] sd 1:0:0:9: [sdg] tag#30 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298584] sd 1:0:0:9: [sdg] tag#30 CDB: Write(10) 2a 00 00 01 40 00 00
> 40 00 00
> [  342.298585] blk_update_request: recoverable transport error, dev sdg,
> sector 81920
> [  342.298889] sd 1:0:0:9: [sdg] tag#31 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298890] sd 1:0:0:9: [sdg] tag#31 CDB: Write(10) 2a 00 00 01 c0 00 00
> 40 00 00
> [  342.298891] blk_update_request: recoverable transport error, dev sdg,
> sector 114688
> [  342.298981] sd 1:0:0:20: [sdar] tag#7 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.298983] sd 1:0:0:20: [sdar] tag#7 CDB: Write(10) 2a 00 00 04 c0 00 00
> 40 00 00
> [  342.298985] blk_update_request: recoverable transport error, dev sdar,
> sector 311296
> [  342.299004] sd 1:0:0:20: [sdar] tag#17 FAILED Result:
> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> [  342.299007] sd 1:0:0:20: [sdar] tag#17 CDB: Write(10) 2a 00 00 34 c0 00 00
> 40 00 00
> [  342.299009] blk_update_request: recoverable transport error, dev sdar,
> sector 3457024
> [  342.356353] device-mapper: multipath: Failing path 8:64.
> [  342.356489] device-mapper: multipath: Failing path 8:128.
> [  342.356628] device-mapper: multipath: Failing path 8:160.
> [  342.356699] device-mapper: multipath: Failing path 8:176.
> [  342.356767] device-mapper: multipath: Failing path 8:240.
> [  342.356834] device-mapper: multipath: Failing path 8:208.
> [  342.356900] device-mapper: multipath: Failing path 65:16.
> [  342.356967] device-mapper: multipath: Failing path 65:64.
> [  342.357035] device-mapper: multipath: Failing path 65:96.
> [  342.357103] device-mapper: multipath: Failing path 65:128.
> [  342.357169] device-mapper: multipath: Failing path 65:176.
> [  342.357237] device-mapper: multipath: Failing path 65:208.
> [  342.357303] device-mapper: multipath: Failing path 65:224.
> [  342.357371] device-mapper: multipath: Failing path 66:0.
> [  342.357454] device-mapper: multipath: Failing path 66:32.
> [  342.357521] device-mapper: multipath: Failing path 66:48.
> [  342.357647] device-mapper: multipath: Failing path 66:80.
> [  342.357714] device-mapper: multipath: Failing path 66:112.
> [  342.357781] device-mapper: multipath: Failing path 66:144.
> [  342.357936] device-mapper: multipath: Failing path 66:208.
> [  342.358019] device-mapper: multipath: Failing path 66:240.
> [  342.358115] device-mapper: multipath: Failing path 67:16.
> [  342.358183] device-mapper: multipath: Failing path 67:48.
> [  342.358264] device-mapper: multipath: Failing path 67:80.
> [  342.358359] device-mapper: multipath: Failing path 67:128.
> [  342.358442] device-mapper: multipath: Failing path 67:160.
> [  342.358594] device-mapper: multipath: Failing path 67:224.
> [  342.358671] device-mapper: multipath: Failing path 67:208.
> [  350.157728] scsi host2: ib_srp: reconnect succeeded
> [  350.189605] mlx5_1:dump_cqe:262:(pid 4756): dump error cqe
> [  350.193180] mlx5_1:dump_cqe:262:(pid 1275): dump error cqe
> [  350.193182] 00000000 00000000 00000000 00000000
> [  350.193182] 00000000 00000000 00000000 00000000
> [  350.193183] 00000000 00000000 00000000 00000000
> [  350.193183] 00000000 0f007806 25000035 04f569d0
> [  350.193187] scsi host2: ib_srp: failed FAST REG status memory management
> operation error (6) for CQE ffff8817c6e30078
> [  350.412637] 00000000 00000000 00000000 00000000
> [  350.436431] 00000000 00000000 00000000 00000000
> [  350.461871] 00000000 00000000 00000000 00000000
> [  350.487549] 00000000 0f007806 25000032 000843d0
> 
> Target Log
> 
> These events happened after the first failures on the initiator
> 
> [ 1111.029847] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-49.
> [ 1111.078815] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-48.
> [ 1111.127420] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-47.
> [ 1111.175801] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-46.
> [ 1111.223725] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-45.
> [ 1111.271957] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-44.
> [ 1111.319494] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-43.
> [ 1111.365795] ib_srpt Received CM TimeWait exit for ch
> 0x4f6e72000390fe7c7cfe900300726ed3-42.
> 
> 

Max

These are the parameters all my tests run with.
Same as always.

[root@localhost modprobe.d]# cat ib_srp.conf
options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048 

I don't set prefer_fr, so it defaults to Y.

[root@localhost parameters]# cat prefer_fr
Y

I have no settings for mlx5_core, all defaults.

Thanks
Laurence
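
For reference, the ALIGN(max_num_sg + 1, 4) expression in Max's proposed change quoted above rounds its first argument up to the next multiple of 4. A minimal userspace sketch of that arithmetic follows. The helper name align_up is made up for illustration, and the input values in the note below are illustrative only: how ib_srp derives the max_num_sg it passes at MR allocation time from the module parameters listed above is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of a, where a must be a power of two.
 * This mirrors the usual round-up idiom; it is a sketch, not the kernel macro.
 */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}
```

With an illustrative max_num_sg of 255, align_up(255 + 1, 4) stays at 256. With 256 it becomes align_up(257, 4) = 260, so the extra "+ 1" costs a full group of four entries whenever max_num_sg is already a multiple of 4.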


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found] ` <8992bd28-667f-94b1-e582-106e6b41aa4b-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2017-04-24 22:39   ` Laurence Oberman
  2017-04-25 17:58   ` Leon Romanovsky
@ 2017-04-26 14:45   ` Sagi Grimberg
  2 siblings, 0 replies; 26+ messages in thread
From: Sagi Grimberg @ 2017-04-26 14:45 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: Max Gurtovoy, Leon Romanovsky, Israel Rukshin, Laurence Oberman,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Looks good Bart,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
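
For reference, the off-by-one that the patch addresses can be shown in isolation. The following is a minimal userspace sketch, not the driver code: struct klm_sketch and fill_klms_sketch() are hypothetical stand-ins for the driver's KLM descriptor and mlx5_ib_sg_to_klms(), and the address and length arrays replace the SG list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a KLM descriptor; not the driver's struct. */
struct klm_sketch {
	uint64_t va;
	uint32_t bcount;
};

/*
 * Copy at most max_descs entries into klms[] and return how many
 * entries were actually written.
 */
static size_t fill_klms_sketch(struct klm_sketch *klms, size_t max_descs,
			       const uint64_t *addrs, const uint32_t *lens,
			       size_t nents)
{
	size_t i;

	assert(klms != NULL && addrs != NULL && lens != NULL);

	for (i = 0; i < nents; i++) {
		/*
		 * With '>' instead of '>=', i == max_descs would pass the
		 * check and the stores below would hit klms[max_descs],
		 * one element past the end of the array.
		 */
		if (i >= max_descs)
			break;
		klms[i].va = addrs[i];
		klms[i].bcount = lens[i];
	}
	return i;
}
```

Calling it with four input entries and max_descs = 2 writes klms[0] and klms[1] only and returns 2, so the caller can tell that the list was truncated.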

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                 ` <1801288254.2280763.1493207193850.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26 15:05                   ` Bart Van Assche
  0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2017-04-26 15:05 UTC (permalink / raw)
  To: loberman-H+wXaHxf7aLQT0dZR+AlfA
  Cc: maxg-VPRAkNaXOzVWk0Htik3J/w, israelr-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, sagi-NQWnxTmZq1alnMjI0IkVqw

On Wed, 2017-04-26 at 07:46 -0400, Laurence Oberman wrote:
> It's the same target that is stable for all other tests.
> This is the same issue I originally reported when we then reverted the SG+GAPS.
> Remember when I reverted that we were stable again.
> 
> This happens on the initiator first
> 
> [...]
> 
> Only afterwards do I see the target complain
> 
> [...]

Thanks Laurence. I think this confirms that we have to continue analyzing the
initiator side further.

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                                 ` <1879402127.2348907.1493214625254.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-26 15:10                                   ` Laurence Oberman
       [not found]                                     ` <1477402175.2378198.1493219418826.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-04-26 15:10 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, April 26, 2017 9:50:25 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin"
> > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Wednesday, April 26, 2017 9:28:37 AM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > overflows the klms[] array
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > > <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
> > > Rukshin"
> > > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Wednesday, April 26, 2017 8:25:30 AM
> > > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > > overflows the klms[] array
> > > 
> > > 
> > > 
> > > On 4/26/2017 3:18 PM, Laurence Oberman wrote:
> > > >
> > > >
> > > > ----- Original Message -----
> > > >> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > >> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > > >> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > > >> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
> > > >> Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > > >> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >> Sent: Wednesday, April 26, 2017 7:47:37 AM
> > > >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > > >> overflows the klms[] array
> > > >>
> > > >>
> > > >>
> > > >> ----- Original Message -----
> > > >>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
> > > >>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > > >>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
> > > >>> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > > >>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >>> Sent: Wednesday, April 26, 2017 4:31:57 AM
> > > >>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > > >>> overflows the klms[] array
> > > >>>
> > > >>>
> > > >>>
> > > >>> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
> > > >>>>
> > > >>>>
> > > >>>> ----- Original Message -----
> > > >>>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > >>>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> > > >>>>> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
> > > >>>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > > >>>>> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
> > > >>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that
> > > >>>>> mlx5_ib_sg_to_klms()
> > > >>>>> overflows the klms[] array
> > > >>>>>
> > > >>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> > > >>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> > > >>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
> > > >>>>>> map more SG-list elements than what fits into a single MR.
> > > >>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> > > >>>>>> the MR klms[] array.
> > > >>>>>>
> > > >>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> > > >>>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > >>>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > > >>>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > > >>>>>> ---
> > > >>>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
> > > >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>>>>
> > > >>>>>
> > > >>>>> Bart,
> > > >>>>>
> > > >>>>> Thanks a lot, it indeed looks right.
> > > >>>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > >>>>>
> > > >>>>> Thanks
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> Hello Bart, Leon, Max and Israel.
> > > >>>>
> > > >>>> I cloned off Bart's tree.
> > > >>>>
> > > >>>> git clone https://github.com/bvanassche/linux
> > > >>>> cd linux
> > > >>>> git checkout block-scsi-for-next
> > > >>>>
> > > >>>> I checked all patches were in for this test.
> > > >>>>
> > > >>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > > >>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[]
> > > >>>> array
> > > >>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > > >>>
> > > >>> Hi,
> > > >>> copying Sagi's request from different thread:
> > > >>>
> > > >>> "
> > > >>> Can you please enable srp_add_one debug:
> > > >>>
> > > >>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
> > > >>>
> > > >>> In addition apply the following:
> > > >>> --
> > > >>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > > >>> b/drivers/infiniband/hw/mlx5/mr.c
> > > >>> index d9c6c0ea750b..040fbc387e4f 100644
> > > >>> --- a/drivers/infiniband/hw/mlx5/mr.c
> > > >>> +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > >>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
> > > >>>          int add_size;
> > > >>>          int ret;
> > > >>>
> > > >>> +       WARN_ON_ONCE(ndescs >
> > > >>> device->attr.max_fast_reg_page_list_len);
> > > >>> +
> > > >>>          add_size = max_t(int, MLX5_UMR_ALIGN -
> > > >>>          ARCH_KMALLOC_MINALIGN,
> > > >>>          0);
> > > >>>
> > > >>>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
> > > >>>
> > > >>> "
> > > >>>
> > > >>> Max.
> > > >>>
> > > >>>>
> > > >>>> Built and tested the kernel.
> > > >>>>
> > > >>>> However this issue is not resolved :(
> > > >>>>
> > > >>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817edca86b0
> > > >>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> > > >>>> [ 2708.121342] 00000000 00000000 00000000 00000000
> > > >>>> [ 2708.147104] 00000000 00000000 00000000 00000000
> > > >>>> [ 2708.172633] 00000000 00000000 00000000 00000000
> > > >>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> > > >>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> > > >>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817ed0a9c30
> > > >>>>
> > > >>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877):
> > > >>>> dump
> > > >>>> error cqe
> > > >>>> [ 2746.443240] 00000000 00000000 00000000 00000000
> > > >>>> [ 2746.469323] 00000000 00000000 00000000 00000000
> > > >>>> [ 2746.495310] 00000000 00000000 00000000 00000000
> > > >>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
> > > >>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
> > > >>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817ed0a9cf0
> > > >>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
> > > >>>> [ 2763.297826] 00000000 00000000 00000000 00000000
> > > >>>> [ 2763.323352] 00000000 00000000 00000000 00000000
> > > >>>> [ 2763.348722] 00000000 00000000 00000000 00000000
> > > >>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
> > > >>>>
> > > >>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
> > > >>>> port-1:1 / host1.
> > > >>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
> > > >>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817ed0a9cf0
> > > >>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > >>>> [ 2780.093520] 00000000 00000000 00000000 00000000
> > > >>>> [ 2780.120067] 00000000 00000000 00000000 00000000
> > > >>>> [ 2780.145575] 00000000 00000000 00000000 00000000
> > > >>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0
> > > >>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
> > > >>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817ed0a9cf0
> > > >>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
> > > >>>> [ 2796.495257] 00000000 00000000 00000000 00000000
> > > >>>> [ 2796.521506] 00000000 00000000 00000000 00000000
> > > >>>> [ 2796.547640] 00000000 00000000 00000000 00000000
> > > >>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
> > > >>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
> > > >>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > > >>>> for
> > > >>>> CQE ffff8817ed0a9cf0
> > > >>>>
> > > >>>> Regards
> > > >>>> Laurence
> > > >>>>
> > > >>>
> > > >> Doing this now
> > > >> Thanks
> > > >> Laurence
> > > >
> > > > Max
> > > >
> > > > The Patch is not correct.
> > > >
> > > > drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
> > > > drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has
> > > > no
> > > > member named 'attr'
> > > >   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
> > > >                               ^
> > > > ./include/asm-generic/bug.h:117:27: note: in definition of macro
> > > > 'WARN_ON_ONCE'
> > > >   int __ret_warn_once = !!(condition);   \
> > > >
> > > > I think you meant to give me
> > > >
> > > > WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
> > > >
> > > > Can you confirm
> > > 
> > > Hi Laurence,
> > > should be device->attrs.max_fast_reg_page_list_len.
> > > 
> > > please check this one that might solve the issue (on top of everything):
> > > 
> > > 
> > > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > > b/drivers/infiniband/hw/mlx5/mr.c
> > > index b8f9382..063d116 100644
> > > --- a/drivers/infiniband/hw/mlx5/mr.c
> > > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > @@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
> > >                  mr->max_descs = ndescs;
> > >          } else if (mr_type == IB_MR_TYPE_SG_GAPS) {
> > >                  mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;
> > > -
> > > +               MLX5_SET(mkc, mkc, translations_octword_size,
> > > ALIGN(max_num_sg + 1, 4));
> > >                  err = mlx5_alloc_priv_descs(pd->device, mr,
> > >                                              ndescs, sizeof(struct
> > > mlx5_klm));
> > >                  if (err)
> > > 
> > > thanks,
> > > Max.
> > > 
> > > >
> > > > Thanks
> > > > Laurence
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > Hello Max
> > 
> > I have the corrected WARN_ON_ONCE patch and the above patch, as well as the
> > rest as it was from Bart's tree.
> > 
> > Still fails.
> > 
> > For a baseline I can revert
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > 
> > Then test again to make sure we are starting from a good place.
> > 
> > Initiator log
> > 
> > [  280.481951] scsi host1: ib_srp: failed FAST REG status memory management
> > operation error (6) for CQE ffff8817d9a881b8
> > [  301.149106] scsi host1: ib_srp: reconnect succeeded
> > [  301.280635] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE
> > ffff8817ed32f2f0
> > [  334.596420] scsi host2: ib_srp: failed RECV status WR flushed (5) for
> > CQE
> > ffff8817c592c970
> > [  334.599689] mlx5_1:dump_cqe:262:(pid 20): dump error cqe
> > [  334.599691] 00000000 00000000 00000000 00000000
> > [  334.599692] 00000000 00000000 00000000 00000000
> > [  334.599692] 00000000 00000000 00000000 00000000
> > [  334.599693] 00000000 0f007806 2500002d 067b48d0
> > [  334.599697] scsi host2: ib_srp: failed FAST REG status memory management
> > operation error (6) for CQE ffff8817c6e30078
> > [  336.117248] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
> > [  336.145840] 00000000 00000000 00000000 00000000
> > [  336.171830] 00000000 00000000 00000000 00000000
> > [  336.197688] 00000000 00000000 00000000 00000000
> > [  336.223720] 00000000 0f007806 25000032 005408d0
> > [  339.712706] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> > [  341.453634] scsi host1: ib_srp: reconnect succeeded
> > [  341.481600] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
> > [  341.482145] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE
> > ffff8817ecaf6970
> > [  341.559359] 00000000 00000000 00000000 00000000
> > [  341.585397] 00000000 00000000 00000000 00000000
> > [  341.610948] 00000000 00000000 00000000 00000000
> > [  341.637515] 00000000 0f007806 2500003d 000046d0
> > [  342.297598] sd 1:0:0:9: rejecting I/O to offline device
> > [  342.297936] sd 1:0:0:9: [sdg] tag#28 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.297941] sd 1:0:0:9: [sdg] tag#28 CDB: Write(10) 2a 00 00 00 40 00 00
> > 40 00 00
> > [  342.297943] blk_update_request: recoverable transport error, dev sdg,
> > sector 16384
> > [  342.297951] sd 1:0:0:20: [sdar] tag#5 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.297952] sd 1:0:0:20: [sdar] tag#15 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.297956] sd 1:0:0:20: [sdar] tag#5 CDB: Write(10) 2a 00 00 03 c0 00
> > 00
> > 40 00 00
> > [  342.297956] sd 1:0:0:20: [sdar] tag#15 CDB: Write(10) 2a 00 00 2c c0 00
> > 00
> > 40 00 00
> > [  342.297958] blk_update_request: recoverable transport error, dev sdar,
> > sector 245760
> > [  342.297959] blk_update_request: recoverable transport error, dev sdar,
> > sector 2932736
> > [  342.298119] device-mapper: multipath: Failing path 8:96.
> > [  342.298266] sd 1:0:0:9: [sdg] tag#29 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298268] sd 1:0:0:9: [sdg] tag#29 CDB: Write(10) 2a 00 00 00 c0 00 00
> > 40 00 00
> > [  342.298269] blk_update_request: recoverable transport error, dev sdg,
> > sector 49152
> > [  342.298300] device-mapper: multipath: Failing path 66:176.
> > [  342.298486] sd 1:0:0:20: [sdar] tag#16 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298488] sd 1:0:0:20: [sdar] tag#6 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298489] sd 1:0:0:20: [sdar] tag#16 CDB: Write(10) 2a 00 00 2d 40 00
> > 00
> > 40 00 00
> > [  342.298490] sd 1:0:0:20: [sdar] tag#6 CDB: Write(10) 2a 00 00 04 40 00
> > 00
> > 40 00 00
> > [  342.298491] blk_update_request: recoverable transport error, dev sdar,
> > sector 2965504
> > [  342.298492] blk_update_request: recoverable transport error, dev sdar,
> > sector 278528
> > [  342.298582] sd 1:0:0:9: [sdg] tag#30 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298584] sd 1:0:0:9: [sdg] tag#30 CDB: Write(10) 2a 00 00 01 40 00 00
> > 40 00 00
> > [  342.298585] blk_update_request: recoverable transport error, dev sdg,
> > sector 81920
> > [  342.298889] sd 1:0:0:9: [sdg] tag#31 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298890] sd 1:0:0:9: [sdg] tag#31 CDB: Write(10) 2a 00 00 01 c0 00 00
> > 40 00 00
> > [  342.298891] blk_update_request: recoverable transport error, dev sdg,
> > sector 114688
> > [  342.298981] sd 1:0:0:20: [sdar] tag#7 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.298983] sd 1:0:0:20: [sdar] tag#7 CDB: Write(10) 2a 00 00 04 c0 00
> > 00
> > 40 00 00
> > [  342.298985] blk_update_request: recoverable transport error, dev sdar,
> > sector 311296
> > [  342.299004] sd 1:0:0:20: [sdar] tag#17 FAILED Result:
> > hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
> > [  342.299007] sd 1:0:0:20: [sdar] tag#17 CDB: Write(10) 2a 00 00 34 c0 00
> > 00
> > 40 00 00
> > [  342.299009] blk_update_request: recoverable transport error, dev sdar,
> > sector 3457024
> > [  342.356353] device-mapper: multipath: Failing path 8:64.
> > [  342.356489] device-mapper: multipath: Failing path 8:128.
> > [  342.356628] device-mapper: multipath: Failing path 8:160.
> > [  342.356699] device-mapper: multipath: Failing path 8:176.
> > [  342.356767] device-mapper: multipath: Failing path 8:240.
> > [  342.356834] device-mapper: multipath: Failing path 8:208.
> > [  342.356900] device-mapper: multipath: Failing path 65:16.
> > [  342.356967] device-mapper: multipath: Failing path 65:64.
> > [  342.357035] device-mapper: multipath: Failing path 65:96.
> > [  342.357103] device-mapper: multipath: Failing path 65:128.
> > [  342.357169] device-mapper: multipath: Failing path 65:176.
> > [  342.357237] device-mapper: multipath: Failing path 65:208.
> > [  342.357303] device-mapper: multipath: Failing path 65:224.
> > [  342.357371] device-mapper: multipath: Failing path 66:0.
> > [  342.357454] device-mapper: multipath: Failing path 66:32.
> > [  342.357521] device-mapper: multipath: Failing path 66:48.
> > [  342.357647] device-mapper: multipath: Failing path 66:80.
> > [  342.357714] device-mapper: multipath: Failing path 66:112.
> > [  342.357781] device-mapper: multipath: Failing path 66:144.
> > [  342.357936] device-mapper: multipath: Failing path 66:208.
> > [  342.358019] device-mapper: multipath: Failing path 66:240.
> > [  342.358115] device-mapper: multipath: Failing path 67:16.
> > [  342.358183] device-mapper: multipath: Failing path 67:48.
> > [  342.358264] device-mapper: multipath: Failing path 67:80.
> > [  342.358359] device-mapper: multipath: Failing path 67:128.
> > [  342.358442] device-mapper: multipath: Failing path 67:160.
> > [  342.358594] device-mapper: multipath: Failing path 67:224.
> > [  342.358671] device-mapper: multipath: Failing path 67:208.
> > [  350.157728] scsi host2: ib_srp: reconnect succeeded
> > [  350.189605] mlx5_1:dump_cqe:262:(pid 4756): dump error cqe
> > [  350.193180] mlx5_1:dump_cqe:262:(pid 1275): dump error cqe
> > [  350.193182] 00000000 00000000 00000000 00000000
> > [  350.193182] 00000000 00000000 00000000 00000000
> > [  350.193183] 00000000 00000000 00000000 00000000
> > [  350.193183] 00000000 0f007806 25000035 04f569d0
> > [  350.193187] scsi host2: ib_srp: failed FAST REG status memory management
> > operation error (6) for CQE ffff8817c6e30078
> > [  350.412637] 00000000 00000000 00000000 00000000
> > [  350.436431] 00000000 00000000 00000000 00000000
> > [  350.461871] 00000000 00000000 00000000 00000000
> > [  350.487549] 00000000 0f007806 25000032 000843d0
> > 
> > Target Log
> > 
> > These events happened after the first failures on the initiator
> > 
> > [ 1111.029847] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-49.
> > [ 1111.078815] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-48.
> > [ 1111.127420] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-47.
> > [ 1111.175801] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-46.
> > [ 1111.223725] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-45.
> > [ 1111.271957] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-44.
> > [ 1111.319494] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-43.
> > [ 1111.365795] ib_srpt Received CM TimeWait exit for ch
> > 0x4f6e72000390fe7c7cfe900300726ed3-42.
> > 
> > 
> 
> Max
> 
> These are the parameters all my tests run with.
> Same as always.
> 
> [root@localhost modprobe.d]# cat ib_srp.conf
> options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048
> 
> I don't set prefer_fr so it defaults to Y
> 
> [root@localhost parameters]# cat prefer_fr
> Y
> 
> I have no settings for mlx5_core, all defaults.
> 
> Thanks
> Laurence
> 
> 

Max,

With a83e404 ("IB/srp: Reenable IB_MR_TYPE_SG_GAPS") reverted on the same source tree and everything else still applied, I am stable.
So clearly we still have issues with IB_MR_TYPE_SG_GAPS.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                                     ` <1477402175.2378198.1493219418826.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-05-02 23:28                                       ` Max Gurtovoy
  0 siblings, 0 replies; 26+ messages in thread
From: Max Gurtovoy @ 2017-05-02 23:28 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 4/26/2017 6:10 PM, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Sent: Wednesday, April 26, 2017 9:50:25 AM
>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>>
>>
>>
>> ----- Original Message -----
>>> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
>>> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin"
>>> <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Sent: Wednesday, April 26, 2017 9:28:37 AM
>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>> overflows the klms[] array
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
>>>> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>>>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
>>>> Rukshin"
>>>> <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>>>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Sent: Wednesday, April 26, 2017 8:25:30 AM
>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>>> overflows the klms[] array
>>>>
>>>>
>>>>
>>>> On 4/26/2017 3:18 PM, Laurence Oberman wrote:
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>>>> To: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
>>>>>> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>>>>>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel
>>>>>> Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>>>>>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>> Sent: Wednesday, April 26, 2017 7:47:37 AM
>>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>>>>> overflows the klms[] array
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Leon Romanovsky"
>>>>>>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
>>>>>>> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Sagi Grimberg"
>>>>>>> <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
>>>>>>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>>> Sent: Wednesday, April 26, 2017 4:31:57 AM
>>>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
>>>>>>> overflows the klms[] array
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/25/2017 11:37 PM, Laurence Oberman wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>>> To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>>>>>>> Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
>>>>>>>>> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
>>>>>>>>> "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
>>>>>>>>> <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM
>>>>>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that
>>>>>>>>> mlx5_ib_sg_to_klms()
>>>>>>>>> overflows the klms[] array
>>>>>>>>>
>>>>>>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
>>>>>>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
>>>>>>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to
>>>>>>>>>> map more SG-list elements than what fits into a single MR.
>>>>>>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
>>>>>>>>>> the MR klms[] array.
>>>>>>>>>>
>>>>>>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
>>>>>>>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>>>>>>>> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>>>> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>>>>>>>>>> Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>>>> Cc: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>>>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bart,
>>>>>>>>>
>>>>>>>>> Thanks a lot, it indeed looks right.
>>>>>>>>> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello Bart, Leon, Max and Israel.
>>>>>>>>
>>>>>>>> I cloned off Bart's tree.
>>>>>>>>
>>>>>>>> git clone https://github.com/bvanassche/linux
>>>>>>>> cd linux
>>>>>>>> git checkout block-scsi-for-next
>>>>>>>>
>>>>>>>> I checked all patches were in for this test.
>>>>>>>>
>>>>>>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
>>>>>>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[]
>>>>>>>> array
>>>>>>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
>>>>>>>
>>>>>>> Hi,
>>>>>>> copying Sagi's request from different thread:
>>>>>>>
>>>>>>> "
>>>>>>> Can you please enable srp_add_one debug:
>>>>>>>
>>>>>>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control
>>>>>>>
>>>>>>> In addition apply the following:
>>>>>>> --
>>>>>>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
>>>>>>> b/drivers/infiniband/hw/mlx5/mr.c
>>>>>>> index d9c6c0ea750b..040fbc387e4f 100644
>>>>>>> --- a/drivers/infiniband/hw/mlx5/mr.c
>>>>>>> +++ b/drivers/infiniband/hw/mlx5/mr.c
>>>>>>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device,
>>>>>>>          int add_size;
>>>>>>>          int ret;
>>>>>>>
>>>>>>> +       WARN_ON_ONCE(ndescs >
>>>>>>> device->attr.max_fast_reg_page_list_len);
>>>>>>> +
>>>>>>>          add_size = max_t(int, MLX5_UMR_ALIGN -
>>>>>>>          ARCH_KMALLOC_MINALIGN,
>>>>>>>          0);
>>>>>>>
>>>>>>>          mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
>>>>>>>
>>>>>>> "
>>>>>>>
>>>>>>> Max.
>>>>>>>
>>>>>>>>
>>>>>>>> Built and tested the kernel.
>>>>>>>>
>>>>>>>> However this issue is not resolved :(
>>>>>>>>
>>>>>>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817edca86b0
>>>>>>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
>>>>>>>> [ 2708.121342] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2708.147104] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2708.172633] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
>>>>>>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
>>>>>>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817ed0a9c30
>>>>>>>>
>>>>>>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877):
>>>>>>>> dump
>>>>>>>> error cqe
>>>>>>>> [ 2746.443240] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2746.469323] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2746.495310] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0
>>>>>>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded
>>>>>>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817ed0a9cf0
>>>>>>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
>>>>>>>> [ 2763.297826] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2763.323352] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2763.348722] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0
>>>>>>>>
>>>>>>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP
>>>>>>>> port-1:1 / host1.
>>>>>>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded
>>>>>>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817ed0a9cf0
>>>>>>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
>>>>>>>> [ 2780.093520] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2780.120067] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2780.145575] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0
>>>>>>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded
>>>>>>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817ed0a9cf0
>>>>>>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
>>>>>>>> [ 2796.495257] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2796.521506] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2796.547640] 00000000 00000000 00000000 00000000
>>>>>>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0
>>>>>>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded
>>>>>>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>>>>> for
>>>>>>>> CQE ffff8817ed0a9cf0
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Laurence
>>>>>>>>
>>>>>>>
>>>>>> Doing this now
>>>>>> Thanks
>>>>>> Laurence
>>>>>
>>>>> Max
>>>>>
>>>>> The Patch is not correct.
>>>>>
>>>>> drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs':
>>>>> drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has
>>>>> no
>>>>> member named 'attr'
>>>>>   WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len);
>>>>>                               ^
>>>>> ./include/asm-generic/bug.h:117:27: note: in definition of macro
>>>>> 'WARN_ON_ONCE'
>>>>>   int __ret_warn_once = !!(condition);   \
>>>>>
>>>>> I think you meant to give me
>>>>>
>>>>> WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len);
>>>>>
>>>>> Can you confirm
>>>>
>>>> Hi Laurence,
>>>> should be device->attrs.max_fast_reg_page_list_len.
>>>>
>>>> please check this one that might solve the issue (on top of everything):
>>>>
>>>>
>>>> diff --git a/drivers/infiniband/hw/mlx5/mr.c
>>>> b/drivers/infiniband/hw/mlx5/mr.c
>>>> index b8f9382..063d116 100644
>>>> --- a/drivers/infiniband/hw/mlx5/mr.c
>>>> +++ b/drivers/infiniband/hw/mlx5/mr.c
>>>> @@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
>>>>                  mr->max_descs = ndescs;
>>>>          } else if (mr_type == IB_MR_TYPE_SG_GAPS) {
>>>>                  mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;
>>>> -
>>>> +               MLX5_SET(mkc, mkc, translations_octword_size,
>>>> ALIGN(max_num_sg + 1, 4));
>>>>                  err = mlx5_alloc_priv_descs(pd->device, mr,
>>>>                                              ndescs, sizeof(struct
>>>> mlx5_klm));
>>>>                  if (err)
>>>>
>>>> thanks,
>>>> Max.
>>>>
>>>>>
>>>>> Thanks
>>>>> Laurence
>>>>>
>>>>
>>>
>>> Hello Max
>>>
>>> I have the corrected WARN_ON_ONCE patch and the above patch, as well as the
>>> rest as it was from Bart's tree.
>>>
>>> Still fails.
>>>
>>> For a baseline I can revert
>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
>>>
>>> Then test again to make sure we are starting from a good place.
>>>
>>> Initiator log
>>>
>>> [  280.481951] scsi host1: ib_srp: failed FAST REG status memory management
>>> operation error (6) for CQE ffff8817d9a881b8
>>> [  301.149106] scsi host1: ib_srp: reconnect succeeded
>>> [  301.280635] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>> CQE
>>> ffff8817ed32f2f0
>>> [  334.596420] scsi host2: ib_srp: failed RECV status WR flushed (5) for
>>> CQE
>>> ffff8817c592c970
>>> [  334.599689] mlx5_1:dump_cqe:262:(pid 20): dump error cqe
>>> [  334.599691] 00000000 00000000 00000000 00000000
>>> [  334.599692] 00000000 00000000 00000000 00000000
>>> [  334.599692] 00000000 00000000 00000000 00000000
>>> [  334.599693] 00000000 0f007806 2500002d 067b48d0
>>> [  334.599697] scsi host2: ib_srp: failed FAST REG status memory management
>>> operation error (6) for CQE ffff8817c6e30078
>>> [  336.117248] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
>>> [  336.145840] 00000000 00000000 00000000 00000000
>>> [  336.171830] 00000000 00000000 00000000 00000000
>>> [  336.197688] 00000000 00000000 00000000 00000000
>>> [  336.223720] 00000000 0f007806 25000032 005408d0
>>> [  339.712706] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>> [  341.453634] scsi host1: ib_srp: reconnect succeeded
>>> [  341.481600] mlx5_0:dump_cqe:262:(pid 130): dump error cqe
>>> [  341.482145] scsi host1: ib_srp: failed RECV status WR flushed (5) for
>>> CQE
>>> ffff8817ecaf6970
>>> [  341.559359] 00000000 00000000 00000000 00000000
>>> [  341.585397] 00000000 00000000 00000000 00000000
>>> [  341.610948] 00000000 00000000 00000000 00000000
>>> [  341.637515] 00000000 0f007806 2500003d 000046d0
>>> [  342.297598] sd 1:0:0:9: rejecting I/O to offline device
>>> [  342.297936] sd 1:0:0:9: [sdg] tag#28 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.297941] sd 1:0:0:9: [sdg] tag#28 CDB: Write(10) 2a 00 00 00 40 00 00
>>> 40 00 00
>>> [  342.297943] blk_update_request: recoverable transport error, dev sdg,
>>> sector 16384
>>> [  342.297951] sd 1:0:0:20: [sdar] tag#5 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.297952] sd 1:0:0:20: [sdar] tag#15 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.297956] sd 1:0:0:20: [sdar] tag#5 CDB: Write(10) 2a 00 00 03 c0 00
>>> 00
>>> 40 00 00
>>> [  342.297956] sd 1:0:0:20: [sdar] tag#15 CDB: Write(10) 2a 00 00 2c c0 00
>>> 00
>>> 40 00 00
>>> [  342.297958] blk_update_request: recoverable transport error, dev sdar,
>>> sector 245760
>>> [  342.297959] blk_update_request: recoverable transport error, dev sdar,
>>> sector 2932736
>>> [  342.298119] device-mapper: multipath: Failing path 8:96.
>>> [  342.298266] sd 1:0:0:9: [sdg] tag#29 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298268] sd 1:0:0:9: [sdg] tag#29 CDB: Write(10) 2a 00 00 00 c0 00 00
>>> 40 00 00
>>> [  342.298269] blk_update_request: recoverable transport error, dev sdg,
>>> sector 49152
>>> [  342.298300] device-mapper: multipath: Failing path 66:176.
>>> [  342.298486] sd 1:0:0:20: [sdar] tag#16 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298488] sd 1:0:0:20: [sdar] tag#6 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298489] sd 1:0:0:20: [sdar] tag#16 CDB: Write(10) 2a 00 00 2d 40 00
>>> 00
>>> 40 00 00
>>> [  342.298490] sd 1:0:0:20: [sdar] tag#6 CDB: Write(10) 2a 00 00 04 40 00
>>> 00
>>> 40 00 00
>>> [  342.298491] blk_update_request: recoverable transport error, dev sdar,
>>> sector 2965504
>>> [  342.298492] blk_update_request: recoverable transport error, dev sdar,
>>> sector 278528
>>> [  342.298582] sd 1:0:0:9: [sdg] tag#30 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298584] sd 1:0:0:9: [sdg] tag#30 CDB: Write(10) 2a 00 00 01 40 00 00
>>> 40 00 00
>>> [  342.298585] blk_update_request: recoverable transport error, dev sdg,
>>> sector 81920
>>> [  342.298889] sd 1:0:0:9: [sdg] tag#31 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298890] sd 1:0:0:9: [sdg] tag#31 CDB: Write(10) 2a 00 00 01 c0 00 00
>>> 40 00 00
>>> [  342.298891] blk_update_request: recoverable transport error, dev sdg,
>>> sector 114688
>>> [  342.298981] sd 1:0:0:20: [sdar] tag#7 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.298983] sd 1:0:0:20: [sdar] tag#7 CDB: Write(10) 2a 00 00 04 c0 00
>>> 00
>>> 40 00 00
>>> [  342.298985] blk_update_request: recoverable transport error, dev sdar,
>>> sector 311296
>>> [  342.299004] sd 1:0:0:20: [sdar] tag#17 FAILED Result:
>>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>>> [  342.299007] sd 1:0:0:20: [sdar] tag#17 CDB: Write(10) 2a 00 00 34 c0 00
>>> 00
>>> 40 00 00
>>> [  342.299009] blk_update_request: recoverable transport error, dev sdar,
>>> sector 3457024
>>> [  342.356353] device-mapper: multipath: Failing path 8:64.
>>> [  342.356489] device-mapper: multipath: Failing path 8:128.
>>> [  342.356628] device-mapper: multipath: Failing path 8:160.
>>> [  342.356699] device-mapper: multipath: Failing path 8:176.
>>> [  342.356767] device-mapper: multipath: Failing path 8:240.
>>> [  342.356834] device-mapper: multipath: Failing path 8:208.
>>> [  342.356900] device-mapper: multipath: Failing path 65:16.
>>> [  342.356967] device-mapper: multipath: Failing path 65:64.
>>> [  342.357035] device-mapper: multipath: Failing path 65:96.
>>> [  342.357103] device-mapper: multipath: Failing path 65:128.
>>> [  342.357169] device-mapper: multipath: Failing path 65:176.
>>> [  342.357237] device-mapper: multipath: Failing path 65:208.
>>> [  342.357303] device-mapper: multipath: Failing path 65:224.
>>> [  342.357371] device-mapper: multipath: Failing path 66:0.
>>> [  342.357454] device-mapper: multipath: Failing path 66:32.
>>> [  342.357521] device-mapper: multipath: Failing path 66:48.
>>> [  342.357647] device-mapper: multipath: Failing path 66:80.
>>> [  342.357714] device-mapper: multipath: Failing path 66:112.
>>> [  342.357781] device-mapper: multipath: Failing path 66:144.
>>> [  342.357936] device-mapper: multipath: Failing path 66:208.
>>> [  342.358019] device-mapper: multipath: Failing path 66:240.
>>> [  342.358115] device-mapper: multipath: Failing path 67:16.
>>> [  342.358183] device-mapper: multipath: Failing path 67:48.
>>> [  342.358264] device-mapper: multipath: Failing path 67:80.
>>> [  342.358359] device-mapper: multipath: Failing path 67:128.
>>> [  342.358442] device-mapper: multipath: Failing path 67:160.
>>> [  342.358594] device-mapper: multipath: Failing path 67:224.
>>> [  342.358671] device-mapper: multipath: Failing path 67:208.
>>> [  350.157728] scsi host2: ib_srp: reconnect succeeded
>>> [  350.189605] mlx5_1:dump_cqe:262:(pid 4756): dump error cqe
>>> [  350.193180] mlx5_1:dump_cqe:262:(pid 1275): dump error cqe
>>> [  350.193182] 00000000 00000000 00000000 00000000
>>> [  350.193182] 00000000 00000000 00000000 00000000
>>> [  350.193183] 00000000 00000000 00000000 00000000
>>> [  350.193183] 00000000 0f007806 25000035 04f569d0
>>> [  350.193187] scsi host2: ib_srp: failed FAST REG status memory management
>>> operation error (6) for CQE ffff8817c6e30078
>>> [  350.412637] 00000000 00000000 00000000 00000000
>>> [  350.436431] 00000000 00000000 00000000 00000000
>>> [  350.461871] 00000000 00000000 00000000 00000000
>>> [  350.487549] 00000000 0f007806 25000032 000843d0
>>>
>>> Target Log
>>>
>>> These events happened after the first failures on the initiator
>>>
>>> [ 1111.029847] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-49.
>>> [ 1111.078815] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-48.
>>> [ 1111.127420] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-47.
>>> [ 1111.175801] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-46.
>>> [ 1111.223725] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-45.
>>> [ 1111.271957] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-44.
>>> [ 1111.319494] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-43.
>>> [ 1111.365795] ib_srpt Received CM TimeWait exit for ch
>>> 0x4f6e72000390fe7c7cfe900300726ed3-42.
>>>
>>>
>>
>> Max
>>
>> These are the parameters all my tests run with.
>> Same as always.
>>
>> [root@localhost modprobe.d]# cat ib_srp.conf
>> options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048
>>
>> I don't set prefer_fr so it defaults to Y
>>
>> [root@localhost parameters]# cat prefer_fr
>> Y
>>
>> I have no settings for mlx5_core, all defaults.
>>
>> Thanks
>> Laurence
>>
>>
>
> Max,
>
> After reverting a83e404 (IB/srp: Reenable IB_MR_TYPE_SG_GAPS) on the same source tree, with all else applied, I am stable.
> So clearly we still have issues with IB_MR_TYPE_SG_GAPS.
>
> Thanks
> Laurence
>

Hi Laurence,
I would like to see the prints that Sagi asked for from the srp_add_one
function:

    echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control

and also the prints from srp_create_target:

    echo "func srp_create_target +p" > /sys/kernel/debug/dynamic_debug/control

Another patch that can help is:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cee4626..53a67fd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3387,6 +3387,10 @@ static ssize_t srp_create_target(struct device *dev,
                              sizeof (struct srp_indirect_buf) +
                              target->cmd_sg_cnt * sizeof (struct srp_direct_buf);

+       pr_info("sg_tablesize %u mr_pool_size %u mr_per_cmd %u indirect_size %u max_iu_len %u max_sectors %u\n",
+               target->sg_tablesize, target->mr_pool_size, target->mr_per_cmd, target->indirect_size,
+               target->max_iu_len, target->scsi_host->max_sectors);
+
         INIT_WORK(&target->tl_err_work, srp_tl_err_work);
         INIT_WORK(&target->remove_work, srp_remove_work);
         spin_lock_init(&target->lock);


Please also add the SG_GAPS Reenable commit and let's repro it again.
BTW, how many channels are open?
Can you load the ib_srp module with the ch_count param changed from 4 to
#num_cpus, so we can see whether it reproduces again?

thanks,
Max.



* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]             ` <20170426061640.GV14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-04-26 10:30               ` Max Gurtovoy
@ 2017-05-03  8:18               ` Sagi Grimberg
       [not found]                 ` <bcd56de8-0f17-f2bb-b079-bf22c1b92ca2-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-05-03  8:18 UTC (permalink / raw)
  To: Leon Romanovsky, Laurence Oberman
  Cc: Bart Van Assche, Doug Ledford, Max Gurtovoy, Israel Rukshin,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


>> Hello Bart, Leon, Max and Israel.
>>
>> I cloned off Bart's tree.
>>
>> git clone https://github.com/bvanassche/linux
>> cd linux
>> git checkout block-scsi-for-next
>>
>> I checked all patches were in for this test.
>>
>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length
>>
>> Built and tested the kernel.
>>
>> However this issue is not resolved :(
>>
>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
>> [ 2708.121342] 00000000 00000000 00000000 00000000
>> [ 2708.147104] 00000000 00000000 00000000 00000000
>> [ 2708.172633] 00000000 00000000 00000000 00000000
>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
>
> Parsed version:
> 	hw_error_syndrome                : 0xf
> 	hw_syndrome_type                 : 0x0
> 	vendor_error_syndrome            : 0x78
> 	syndrome                         : MEMORY_WINDOW_BIND_ERROR (0x6)
> 	s_wqe_opcode                     : UMR (0x25)
> 	opcode                           : REQUESTOR_ERROR (0xd)
> 	cqe_format                       : NO_INLINE_DATA (0x0)
> 	owner                            : 0x0
>
> Description:
> 	umr.klm_octoword_count > mkey.mtt_octoword_count
>
> Sagi, Max,
> Any idea where can it be?
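
For reference, the parsed fields above can be recovered by hand from the last row of the dump. The following is a minimal userspace sketch, not the driver's code. The byte positions are an assumption, inferred only by matching the dump against the parsed output quoted above, and `struct err_cqe_fields` and `decode_err_cqe_tail()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields listed in the parsed output. */
struct err_cqe_fields {
    uint8_t hw_error_syndrome;
    uint8_t hw_syndrome_type;
    uint8_t vendor_error_syndrome;
    uint8_t syndrome;
    uint8_t s_wqe_opcode;
    uint8_t opcode;
    uint8_t cqe_format;
    uint8_t owner;
};

/*
 * Decode the last 16-byte row of a "dump error cqe" print. w[] holds the
 * four 32-bit words exactly as printed (most significant byte first).
 * Field positions are assumptions chosen to reproduce the parsed values.
 */
static struct err_cqe_fields decode_err_cqe_tail(const uint32_t w[4])
{
    struct err_cqe_fields f;
    uint8_t op_own;

    assert(w != NULL);
    op_own = (uint8_t)(w[3] & 0xff);        /* last byte of the CQE */

    f.hw_error_syndrome     = (uint8_t)(w[1] >> 24);
    f.hw_syndrome_type      = (uint8_t)(w[1] >> 16);
    f.vendor_error_syndrome = (uint8_t)(w[1] >> 8);
    f.syndrome              = (uint8_t)(w[1]);
    f.s_wqe_opcode          = (uint8_t)(w[2] >> 24);
    f.opcode                = (uint8_t)(op_own >> 4);
    f.cqe_format            = (uint8_t)((op_own >> 2) & 0x3);
    f.owner                 = (uint8_t)(op_own & 0x1);
    return f;
}
```

Feeding it the row `00000000 0f007806 2500002a 14a527d0` yields syndrome 0x6 and s_wqe_opcode 0x25, matching the parsed version.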

Laurence, Max,

We need to make sure that we never overflow the number of mapping
elements.

Looking at the code, it seems that some of it was reworked by
Artemy for ODP.

Laurence, can you retest with the patch below:
--
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a2638e339..76f3857ecd53 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3224,22 +3224,19 @@ static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
                              struct mlx5_ib_mr *mr,
                              u32 key, int access)
  {
-       int ndescs = ALIGN(mr->ndescs, 8) >> 1;
+       int size = mr->ndescs * mr->desc_size;

         memset(seg, 0, sizeof(*seg));

         if (mr->access_mode == MLX5_MKC_ACCESS_MODE_MTT)
                 seg->log2_page_size = ilog2(mr->ibmr.page_size);
-       else if (mr->access_mode == MLX5_MKC_ACCESS_MODE_KLMS)
-               /* KLMs take twice the size of MTTs */
-               ndescs *= 2;

         seg->flags = get_umr_flags(access) | mr->access_mode;
         seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
         seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
         seg->start_addr = cpu_to_be64(mr->ibmr.iova);
         seg->len = cpu_to_be64(mr->ibmr.length);
-       seg->xlt_oct_size = cpu_to_be32(ndescs);
+       seg->xlt_oct_size = cpu_to_be32(get_xlt_octo(size));
  }

  static void set_linv_mkey_seg(struct mlx5_mkey_seg *seg)
--
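
To make the difference between the removed and the added computation concrete, here is a small userspace sketch. It assumes a 16-byte octoword, 8-byte MTT and 16-byte KLM descriptors, and that get_xlt_octo() rounds the byte count up to a multiple of 64 before dividing by the octoword size. These constants are assumptions that are not stated in this thread and should be checked against the driver; `align_up()`, `old_xlt_oct_size()` and `new_xlt_oct_size()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, not taken from this thread. */
#define OCTOWORD_BYTES  16u
#define XLT_ALIGN_BYTES 64u
#define MTT_DESC_BYTES  8u
#define KLM_DESC_BYTES  16u

/* Round x up to a multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Removed lines: ALIGN(ndescs, 8) >> 1, doubled for KLMs. */
static uint32_t old_xlt_oct_size(uint32_t ndescs, int is_klm)
{
    uint32_t n = align_up(ndescs, 8) >> 1;

    return is_klm ? n * 2 : n;
}

/* Added lines: get_xlt_octo(ndescs * desc_size), under the assumptions above. */
static uint32_t new_xlt_oct_size(uint32_t ndescs, uint32_t desc_size)
{
    return align_up(ndescs * desc_size, XLT_ALIGN_BYTES) / OCTOWORD_BYTES;
}
```

Under these assumptions the two agree for MTTs, while for KLMs the old code rounds up to a multiple of 8 descriptors and the new code to a multiple of 4: 260 KLMs give 264 octowords before the patch and 260 after it.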


* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                 ` <bcd56de8-0f17-f2bb-b079-bf22c1b92ca2-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-05-03 14:15                   ` Laurence Oberman
       [not found]                     ` <501334895.4531615.1493820950718.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-05-03 14:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Max Gurtovoy,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> To: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy"
> <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, May 3, 2017 4:18:38 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> >> Hello Bart, Leon, Max and Israel.
> >>
> >> I cloned off Barts tree.
> >>
> >> git clone https://github.com/bvanassche/linux
> >> cd linux
> >> git checkout block-scsi-for-next
> >>
> >> I checked all patches were in for this test.
> >>
> >> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> >> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> >> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> >>
> >> Built and tested the kernel.
> >>
> >> However this issue is not resolved :(
> >>
> >> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> >> CQE ffff8817edca86b0
> >> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> >> [ 2708.121342] 00000000 00000000 00000000 00000000
> >> [ 2708.147104] 00000000 00000000 00000000 00000000
> >> [ 2708.172633] 00000000 00000000 00000000 00000000
> >> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> >
> > Parsed version:
> > 	hw_error_syndrome                : 0xf
> > 	hw_syndrome_type                 : 0x0
> > 	vendor_error_syndrome            : 0x78
> > 	syndrome                         : MEMORY_WINDOW_BIND_ERROR (0x6)
> > 	s_wqe_opcode                     : UMR (0x25)
> > 	opcode                           : REQUESTOR_ERROR (0xd)
> > 	cqe_format                       : NO_INLINE_DATA (0x0)
> > 	owner                            : 0x0
> >
> > Description:
> > 	umr.klm_octoword_count > mkey.mtt_octoword_count
> >
> > Sagi, Max,
> > Any idea where can it be?
> 
> Laurence, Max,
> 
> We need to make sure that we never overflow number of mapping
> elements.
> 
> Looking at the code, it seems that some of it was reworked by
> Artemy for ODP.
> 
> Laurence, can you try and retest the below patch:
> --
> diff --git a/drivers/infiniband/hw/mlx5/qp.c
> b/drivers/infiniband/hw/mlx5/qp.c
> index ad8a2638e339..76f3857ecd53 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -3224,22 +3224,19 @@ static void set_reg_mkey_seg(struct
> mlx5_mkey_seg *seg,
>                               struct mlx5_ib_mr *mr,
>                               u32 key, int access)
>   {
> -       int ndescs = ALIGN(mr->ndescs, 8) >> 1;
> +       int size = mr->ndescs * mr->desc_size;
> 
>          memset(seg, 0, sizeof(*seg));
> 
>          if (mr->access_mode == MLX5_MKC_ACCESS_MODE_MTT)
>                  seg->log2_page_size = ilog2(mr->ibmr.page_size);
> -       else if (mr->access_mode == MLX5_MKC_ACCESS_MODE_KLMS)
> -               /* KLMs take twice the size of MTTs */
> -               ndescs *= 2;
> 
>          seg->flags = get_umr_flags(access) | mr->access_mode;
>          seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
>          seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
>          seg->start_addr = cpu_to_be64(mr->ibmr.iova);
>          seg->len = cpu_to_be64(mr->ibmr.length);
> -       seg->xlt_oct_size = cpu_to_be32(ndescs);
> +       seg->xlt_oct_size = cpu_to_be32(get_xlt_octo(size));
>   }
> 
>   static void set_linv_mkey_seg(struct mlx5_mkey_seg *seg)
> --
> 

Hello Sagi,

I tested against Bart's tree again, with all of these in:

a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length

I added your most recent patch on top.

Same behavior.
[  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817de9c57b0
[  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
[  579.369877] 00000000 00000000 00000000 00000000
[  579.369877] 00000000 00000000 00000000 00000000
[  579.369878] 00000000 00000000 00000000 00000000
[  579.369878] 00000000 0f007806 2500002b 1c528dd0
[  579.369883] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88179a460af8
[  594.814222] scsi host1: ib_srp: reconnect succeeded
[  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a6b0
[  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
[  595.525995] 00000000 00000000 00000000 00000000
[  595.552125] 00000000 00000000 00000000 00000000
[  595.578204] 00000000 00000000 00000000 00000000
[  595.603670] 00000000 0f007806 25000033 002d77d0
[  610.821911] scsi host1: ib_srp: reconnect succeeded
[  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a170
[  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
[  611.543083] 00000000 00000000 00000000 00000000
[  611.568670] 00000000 00000000 00000000 00000000
[  611.594064] 00000000 00000000 00000000 00000000
[  611.620142] 00000000 0f007806 2500003b 003161d0

I will capture the function traces with your patch applied and the additional logging asked for by Max.
Thanks
Laurence


* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                     ` <501334895.4531615.1493820950718.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-05-03 14:58                       ` Sagi Grimberg
       [not found]                         ` <374fcc74-4b84-610b-b55e-d385563bef6f-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-05-03 14:58 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Max Gurtovoy,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA


> Hello Sagi
> Against Bart's tree again
>
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
>
> Above are all in
> Added your most recent patch above
>
> Same behavior.
> [  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817de9c57b0
> [  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
> [  579.369877] 00000000 00000000 00000000 00000000
> [  579.369877] 00000000 00000000 00000000 00000000
> [  579.369878] 00000000 00000000 00000000 00000000
> [  579.369878] 00000000 0f007806 2500002b 1c528dd0
> [  579.369883] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88179a460af8
> [  594.814222] scsi host1: ib_srp: reconnect succeeded
> [  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a6b0
> [  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
> [  595.525995] 00000000 00000000 00000000 00000000
> [  595.552125] 00000000 00000000 00000000 00000000
> [  595.578204] 00000000 00000000 00000000 00000000
> [  595.603670] 00000000 0f007806 25000033 002d77d0
> ^C[  610.821911] scsi host1: ib_srp: reconnect succeeded
> [  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a170
> [  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
> [  611.543083] 00000000 00000000 00000000 00000000
> [  611.568670] 00000000 00000000 00000000 00000000
> [  611.594064] 00000000 00000000 00000000 00000000
> [  611.620142] 00000000 0f007806 2500003b 003161d0
>
> I will capture the function traces with your patch applied and the additional logging asked for by Max.

Thanks, that would be helpful.

Can you try the following patch, just to see if there is an off-by-one case:

--
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b8f9382a8b7d..3d6ef7bce7d9 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
  {
         struct mlx5_ib_dev *dev = to_mdev(pd->device);
         int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
-       int ndescs = ALIGN(max_num_sg, 4);
+       int ndescs = ALIGN(max_num_sg + 1, 4);
         struct mlx5_ib_mr *mr;
         void *mkc;
         u32 *in;
--

It's not a fix, but if it works it can give us a clue...
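
The effect of the `+ 1` on the descriptor count can be checked with a few values. This is a minimal userspace sketch in which `align_up()` stands in for the kernel's ALIGN() macro; `ndescs_before()` and `ndescs_after()` are hypothetical names for the two sides of the diff.

```c
#include <assert.h>

/* Round x up to a multiple of a, where a is a power of two. */
static int align_up(int x, int a)
{
    assert(a > 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Original line: ndescs = ALIGN(max_num_sg, 4) */
static int ndescs_before(int max_num_sg)
{
    return align_up(max_num_sg, 4);
}

/* Test patch: ndescs = ALIGN(max_num_sg + 1, 4) */
static int ndescs_after(int max_num_sg)
{
    return align_up(max_num_sg + 1, 4);
}
```

The extra descriptor only changes the result when max_num_sg is already a multiple of 4 (256 becomes 260, while 255 and 257 are unchanged), which is exactly the case where the original rounding leaves no spare entry.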


* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                         ` <374fcc74-4b84-610b-b55e-d385563bef6f-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-05-05 16:31                           ` Laurence Oberman
       [not found]                             ` <1072634318.5542006.1494001866306.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Laurence Oberman @ 2017-05-05 16:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Max Gurtovoy,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, May 3, 2017 10:58:43 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> > Hello Sagi
> > Against Bart's tree again
> >
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> >
> > Above are all in
> > Added your most recent patch above
> >
> > Same behavior.
> > [  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817de9c57b0
> > [  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
> > [  579.369877] 00000000 00000000 00000000 00000000
> > [  579.369877] 00000000 00000000 00000000 00000000
> > [  579.369878] 00000000 00000000 00000000 00000000
> > [  579.369878] 00000000 0f007806 2500002b 1c528dd0
> > [  579.369883] scsi host1: ib_srp: failed FAST REG status memory management
> > operation error (6) for CQE ffff88179a460af8
> > [  594.814222] scsi host1: ib_srp: reconnect succeeded
> > [  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817e1d4a6b0
> > [  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
> > [  595.525995] 00000000 00000000 00000000 00000000
> > [  595.552125] 00000000 00000000 00000000 00000000
> > [  595.578204] 00000000 00000000 00000000 00000000
> > [  595.603670] 00000000 0f007806 25000033 002d77d0
> > ^C[  610.821911] scsi host1: ib_srp: reconnect succeeded
> > [  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817e1d4a170
> > [  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
> > [  611.543083] 00000000 00000000 00000000 00000000
> > [  611.568670] 00000000 00000000 00000000 00000000
> > [  611.594064] 00000000 00000000 00000000 00000000
> > [  611.620142] 00000000 0f007806 2500003b 003161d0
> >
> > I will capture the function traces with your patch applied and the
> > additional logging asked for by Max.
> 
> Thanks, that would be helpful,
> 
> Can you try the following patch, just to see if there is an off by 1 case:
> 
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> b/drivers/infiniband/hw/mlx5/mr.c
> index b8f9382a8b7d..3d6ef7bce7d9 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
>   {
>          struct mlx5_ib_dev *dev = to_mdev(pd->device);
>          int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
> -       int ndescs = ALIGN(max_num_sg, 4);
> +       int ndescs = ALIGN(max_num_sg + 1, 4);
>          struct mlx5_ib_mr *mr;
>          void *mkc;
>          u32 *in;
> --
> 
> It's not a fix, but if it works it can give us a clue...
> 

Sorry, I have been delayed this week; I will get this done this weekend.
Thanks
Laurence


* Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
       [not found]                             ` <1072634318.5542006.1494001866306.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-05-10 14:06                               ` Laurence Oberman
  0 siblings, 0 replies; 26+ messages in thread
From: Laurence Oberman @ 2017-05-10 14:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Leon Romanovsky, Bart Van Assche, Doug Ledford, Max Gurtovoy,
	Israel Rukshin, linux-rdma-u79uwXL29TY76Z2rM5mHXA



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Friday, May 5, 2017 12:31:06 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> 
> ----- Original Message -----
> > From: "Sagi Grimberg" <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Cc: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Bart Van Assche"
> > <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Israel Rukshin"
> > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Wednesday, May 3, 2017 10:58:43 AM
> > Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms()
> > overflows the klms[] array
> > 
> > 
> > > Hello Sagi
> > > Against Bart's tree again
> > >
> > > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> > >
> > > Above are all in
> > > Added your most recent patch above
> > >
> > > Same behavior.
> > > [  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817de9c57b0
> > > [  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
> > > [  579.369877] 00000000 00000000 00000000 00000000
> > > [  579.369877] 00000000 00000000 00000000 00000000
> > > [  579.369878] 00000000 00000000 00000000 00000000
> > > [  579.369878] 00000000 0f007806 2500002b 1c528dd0
> > > [  579.369883] scsi host1: ib_srp: failed FAST REG status memory
> > > management
> > > operation error (6) for CQE ffff88179a460af8
> > > [  594.814222] scsi host1: ib_srp: reconnect succeeded
> > > [  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817e1d4a6b0
> > > [  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
> > > [  595.525995] 00000000 00000000 00000000 00000000
> > > [  595.552125] 00000000 00000000 00000000 00000000
> > > [  595.578204] 00000000 00000000 00000000 00000000
> > > [  595.603670] 00000000 0f007806 25000033 002d77d0
> > > ^C[  610.821911] scsi host1: ib_srp: reconnect succeeded
> > > [  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > > CQE ffff8817e1d4a170
> > > [  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
> > > [  611.543083] 00000000 00000000 00000000 00000000
> > > [  611.568670] 00000000 00000000 00000000 00000000
> > > [  611.594064] 00000000 00000000 00000000 00000000
> > > [  611.620142] 00000000 0f007806 2500003b 003161d0
> > >
> > > I will capture the function traces with your patch applied and the
> > > additional logging asked for by Max.
> > 
> > Thanks, that would be helpful,
> > 
> > Can you try the following patch, just to see if there is an off by 1 case:
> > 
> > --
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > b/drivers/infiniband/hw/mlx5/mr.c
> > index b8f9382a8b7d..3d6ef7bce7d9 100644
> > --- a/drivers/infiniband/hw/mlx5/mr.c
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
> >   {
> >          struct mlx5_ib_dev *dev = to_mdev(pd->device);
> >          int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
> > -       int ndescs = ALIGN(max_num_sg, 4);
> > +       int ndescs = ALIGN(max_num_sg + 1, 4);
> >          struct mlx5_ib_mr *mr;
> >          void *mkc;
> >          u32 *in;
> > --
> > 
> > It's not a fix, but if it works it can give us a clue...
> > 
> 
> Sorry, been delayed this week, will get this done this weekend.
> Thanks
> Laurence
> 

Sagi, Max 

With the patch below applied against Bart's tree, we still see the dump_cqe issue.

Is what is in the patch everything you wanted applied?
Please check that I did not miss anything before I start the tracing.

May  9 17:16:00 localhost kernel: scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed75c830
May  9 17:16:00 localhost kernel: mlx5_1:dump_cqe:262:(pid 14567): dump error cqe
May  9 17:16:00 localhost kernel: 00000000 00000000 00000000 00000000
May  9 17:16:00 localhost kernel: 00000000 00000000 00000000 00000000
May  9 17:16:00 localhost kernel: 00000000 00000000 00000000 00000000
May  9 17:16:00 localhost kernel: 00000000 0f007806 2500002a 0b670bd0
May  9 17:16:00 localhost kernel: scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff8817972ac278
May  9 17:16:16 localhost kernel: scsi host2: ib_srp: reconnect succeeded
May  9 17:16:16 localhost kernel: scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817d819b130


diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 99beacf..cf899b4 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1525,7 +1525,8 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 {
        struct mlx5_ib_dev *dev = to_mdev(pd->device);
        int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
-       int ndescs = ALIGN(max_num_sg, 4);
+       //int ndescs = ALIGN(max_num_sg, 4);
+       int ndescs = ALIGN(max_num_sg + 1, 4);
        struct mlx5_ib_mr *mr;
        void *mkc;
        u32 *in;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a263..cb726a5 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3224,22 +3224,19 @@ static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
                             struct mlx5_ib_mr *mr,
                             u32 key, int access)
 {
-       int ndescs = ALIGN(mr->ndescs, 8) >> 1;
+        int size = mr->ndescs * mr->desc_size;
 
        memset(seg, 0, sizeof(*seg));
 
        if (mr->access_mode == MLX5_MKC_ACCESS_MODE_MTT)
                seg->log2_page_size = ilog2(mr->ibmr.page_size);
-       else if (mr->access_mode == MLX5_MKC_ACCESS_MODE_KLMS)
-               /* KLMs take twice the size of MTTs */
-               ndescs *= 2;
 
        seg->flags = get_umr_flags(access) | mr->access_mode;
        seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
        seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
        seg->start_addr = cpu_to_be64(mr->ibmr.iova);
        seg->len = cpu_to_be64(mr->ibmr.length);
-       seg->xlt_oct_size = cpu_to_be32(ndescs);
+        seg->xlt_oct_size = cpu_to_be32(get_xlt_octo(size));
 }

I will see about capturing traces, but since I am writing to a RAM disk on the target, I will likely get a flood of trace data.

Thanks
Laurence


end of thread, other threads:[~2017-05-10 14:06 UTC | newest]

Thread overview: 26+ messages
2017-04-24 22:15 [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array Bart Van Assche
     [not found] ` <8992bd28-667f-94b1-e582-106e6b41aa4b-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-04-24 22:39   ` Laurence Oberman
     [not found]     ` <1726285260.1422143.1493073573791.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-24 22:46       ` Bart Van Assche
     [not found]         ` <1493073989.3394.24.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-04-24 22:59           ` Laurence Oberman
2017-04-25 17:58   ` Leon Romanovsky
     [not found]     ` <20170425175849.GS14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-04-25 20:37       ` Laurence Oberman
     [not found]         ` <438230391.2090966.1493152655709.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26  3:39           ` Bart Van Assche
     [not found]             ` <1493177952.3503.1.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-04-26 11:46               ` Laurence Oberman
     [not found]                 ` <1801288254.2280763.1493207193850.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26 15:05                   ` Bart Van Assche
2017-04-26  6:16           ` Leon Romanovsky
     [not found]             ` <20170426061640.GV14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-04-26 10:30               ` Max Gurtovoy
2017-05-03  8:18               ` Sagi Grimberg
     [not found]                 ` <bcd56de8-0f17-f2bb-b079-bf22c1b92ca2-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-05-03 14:15                   ` Laurence Oberman
     [not found]                     ` <501334895.4531615.1493820950718.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-05-03 14:58                       ` Sagi Grimberg
     [not found]                         ` <374fcc74-4b84-610b-b55e-d385563bef6f-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-05-05 16:31                           ` Laurence Oberman
     [not found]                             ` <1072634318.5542006.1494001866306.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-05-10 14:06                               ` Laurence Oberman
2017-04-26  8:31           ` Max Gurtovoy
     [not found]             ` <896e9a9e-43b6-7a21-e41b-861e4f795436-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-04-26 11:47               ` Laurence Oberman
     [not found]                 ` <288883138.2280971.1493207257218.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26 12:18                   ` Laurence Oberman
     [not found]                     ` <497950649.2287440.1493209093092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26 12:20                       ` Laurence Oberman
2017-04-26 12:25                       ` Max Gurtovoy
     [not found]                         ` <16ea1371-84a5-c055-5b0c-fdc6d355276a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-04-26 13:28                           ` Laurence Oberman
     [not found]                             ` <2122831810.2341766.1493213317484.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26 13:50                               ` Laurence Oberman
     [not found]                                 ` <1879402127.2348907.1493214625254.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-26 15:10                                   ` Laurence Oberman
     [not found]                                     ` <1477402175.2378198.1493219418826.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-05-02 23:28                                       ` Max Gurtovoy
2017-04-26 14:45   ` Sagi Grimberg
