From mboxrd@z Thu Jan 1 00:00:00 1970
From: Max Gurtovoy
Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
Date: Wed, 15 Feb 2017 18:18:02 +0200
Message-ID: <0514bb01-95cf-c10a-b883-494f149845f3@mellanox.com>
References: <20170214185636.29250-1-bart.vanassche@sandisk.com> <20170214185636.29250-2-bart.vanassche@sandisk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Sagi Grimberg, Bart Van Assche, Doug Ledford
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Israel Rukshin, Leon Romanovsky, Mark Bloch, Yuval Shaia, "# 4 . 7+"
List-Id: linux-rdma@vger.kernel.org

On 2/15/2017 5:38 PM, Sagi Grimberg wrote:
>
>> Tests have shown that the following error message is reported when
>> using SG-GAPS registration with an mlx5 adapter:
>>
>> scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
>> ffff880bd4270eb0
>> 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000 00000000
>> 00000000 0f007806 2500002a ad9fafd1
>> scsi host1: ib_srp: reconnect succeeded
>> mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
>> 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000 00000000
>> 00000000 0f007806 25000032 00105dd0
>> scsi host1: ib_srp: failed FAST REG status memory management operation
>> error (6) for CQE ffff880b92860138
>>
>> Hence avoid using SG-GAPS memory registrations. Additionally,
>> always configure the blk_queue_virt_boundary() to avoid to trigger
>> a mapping failure when using adapters that support SG-GAPS (e.g.
>> mlx5).
>
> Hi Guys,
>
> Sorry for addressing this late, but has this failure been investigated?
>
> Max, Israel, what does this error syndrome map to?

Sagi, this syndrome says that the number of klms to write is bigger than
the number of mtts.
Artemy started investigating it and proposed a solution that was tested by
Laurence. Let's see if your fix will help.

>
> Looking at mlx5_ib_sg_to_klms, I think the mr->length is incorrectly
> incremented. Does the following change fix the problem?
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> b/drivers/infiniband/hw/mlx5/mr.c
> index 8f608debe141..c21c9eee37f6 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1832,7 +1832,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>                 klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
>                 klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
>                 klms[i].key = cpu_to_be32(lkey);
> -               mr->ibmr.length += sg_dma_len(sg);
> +               mr->ibmr.length += sg_dma_len(sg) - sg_offset;
>
>                 sg_offset = 0;
>         }
> --
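
To make the accounting in the quoted change concrete, here is a small
userspace sketch. It is not the driver code: struct fake_sg, struct totals
and map_sg() are hypothetical names that only model the loop arithmetic
(a byte count per entry that subtracts sg_offset, and a running length).
Whether this mismatch is what triggers the reported syndrome is not
established in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct fake_sg {
	uint64_t dma_addr;
	uint32_t dma_len;
};

/* Running sums collected while walking the list. */
struct totals {
	uint64_t bcount_sum; /* bytes described by the per-entry bcount values */
	uint64_t length_old; /* length accumulated as before the quoted change */
	uint64_t length_new; /* length accumulated as in the quoted change */
};

/*
 * Model of the accumulation only. sg_offset applies to the first entry
 * and is cleared afterwards, as in the quoted hunk.
 */
static struct totals map_sg(const struct fake_sg *sg, int nents,
			    uint32_t sg_offset)
{
	struct totals t = { 0, 0, 0 };

	for (int i = 0; i < nents; i++) {
		/* An offset larger than the entry would underflow below. */
		assert(sg_offset <= sg[i].dma_len);

		t.bcount_sum += sg[i].dma_len - sg_offset;
		t.length_old += sg[i].dma_len;
		t.length_new += sg[i].dma_len - sg_offset;

		sg_offset = 0;
	}
	return t;
}
```

For three entries of 4096, 4096 and 512 bytes and an sg_offset of 512,
bcount_sum and length_new both come to 8192, while length_old comes to
8704. The old accumulation overstates the length by exactly sg_offset.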