All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags
@ 2015-11-10 10:44 Sagi Grimberg
       [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 10:44 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Instead of each ULP being aware of iWARP/IB protocol in order
to determine the rdma_read access flags, have it accessible
as an attribute in the ib_device.

Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
unconditionally.

This patchset goes on top of Christoph's device attributes merge into
struct ib_device.

Sagi Grimberg (3):
  IB/core: Expose a device attribute for rdma_read access flags
  svcrdma: Use device rdma_read_access_flags
  RDS_IW: Use device rdma_read_access_flags

 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  2 ++
 drivers/infiniband/hw/cxgb4/provider.c       |  2 ++
 drivers/infiniband/hw/mlx4/main.c            |  1 +
 drivers/infiniband/hw/mlx5/main.c            |  1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |  1 +
 drivers/infiniband/hw/nes/nes_verbs.c        |  2 ++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  1 +
 drivers/infiniband/hw/qib/qib_verbs.c        |  1 +
 drivers/staging/rdma/hfi1/verbs.c            |  2 ++
 include/rdma/ib_verbs.h                      |  1 +
 net/rds/iw_send.c                            | 17 ++++++++++-------
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c      |  2 +-
 12 files changed, 25 insertions(+), 8 deletions(-)

-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 10:44   ` Sagi Grimberg
       [not found]     ` <1447152255-28231-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 10:44   ` [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags Sagi Grimberg
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 10:44 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Different devices may require different access permissions
for rdma reads e.g. for Infiniband devices, local write access
suffices while iWARP devices require remote write permissions as
well.

This situation generates extra logic for ULPs that need to be aware
of the underlying device to determine rdma read access. Instead,
add a device attribute for ULPs to use without being aware of the
underlying device.

Also, set rdma_read_access_flags in the relevant device drivers:
mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c  | 2 ++
 drivers/infiniband/hw/cxgb4/provider.c       | 2 ++
 drivers/infiniband/hw/mlx4/main.c            | 1 +
 drivers/infiniband/hw/mlx5/main.c            | 1 +
 drivers/infiniband/hw/mthca/mthca_provider.c | 1 +
 drivers/infiniband/hw/nes/nes_verbs.c        | 2 ++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   | 1 +
 drivers/infiniband/hw/qib/qib_verbs.c        | 1 +
 drivers/staging/rdma/hfi1/verbs.c            | 2 ++
 include/rdma/ib_verbs.h                      | 1 +
 10 files changed, 14 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index db4c08c53ae3..483ae3c8a13b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1467,6 +1467,8 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.max_pd = dev->attr.max_pds;
 	dev->ibdev.local_ca_ack_delay = 0;
 	dev->ibdev.max_fast_reg_page_list_len = T3_MAX_FASTREG_DEPTH;
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE |
+					IB_ACCESS_REMOTE_WRITE;
 
 	ret = ib_register_device(&dev->ibdev, NULL);
 	if (ret)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 84e9b5db0341..001db1cc737c 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -564,6 +564,8 @@ int c4iw_register_device(struct c4iw_dev *dev)
 	dev->ibdev.max_pd = T4_MAX_NUM_PD;
 	dev->ibdev.local_ca_ack_delay = 0;
 	dev->ibdev.max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl);
+	dev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE |
+					    IB_ACCESS_REMOTE_WRITE;
 
 	ret = ib_register_device(&dev->ibdev, NULL);
 	if (ret)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 46305dc27f9f..82af524dfb4e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -524,6 +524,7 @@ static int mlx4_ib_init_device_flags(struct ib_device *ibdev)
 	ibdev->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
 	ibdev->hca_core_clock = dev->dev->caps.hca_core_clock * 1000UL;
 	ibdev->timestamp_mask = 0xFFFFFFFFFFFFULL;
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
 	if (!mlx4_is_slave(dev->dev))
 		err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5b733221ca5f..b29f79a03494 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -287,6 +287,7 @@ static int mlx5_ib_init_device_flags(struct ib_device *ibdev)
 	ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
 					   ibdev->max_mcast_grp;
 	ibdev->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	if (MLX5_CAP_GEN(mdev, pg))
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 28d7a8bee3f7..ea1882bbe7a5 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -110,6 +110,7 @@ static int mthca_init_device_flags(struct ib_device *ibdev)
 	ibdev->max_mcast_qp_attach = MTHCA_QP_PER_MGM;
 	ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
 					   ibdev->max_mcast_grp;
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 	/*
 	 * If Sinai memory key optimization is being used, then only
 	 * the 8-bit key portion will change.  For other HCAs, the
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 2ef911953e38..adcea173f8dc 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3868,6 +3868,8 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
 	nesibdev->ibdev.max_mw = nesibdev->max_mr;
 	nesibdev->ibdev.max_pd = nesibdev->max_pd;
 	nesibdev->ibdev.max_sge_rd = 1;
+	nesibdev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
+
 	switch (nesdev->nesadapter->max_irrq_wr) {
 		case 0:
 			nesibdev->ibdev.max_qp_rd_atom = 2;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 00ddcf9294e4..235bc90bf2c6 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	dev->ibdev.local_ca_ack_delay = dev->attr.local_ca_ack_delay;
 	dev->ibdev.max_fast_reg_page_list_len = dev->attr.max_pages_per_frmr;
 	dev->ibdev.max_pkeys = 1;
+	dev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
 	return ib_register_device(&dev->ibdev, NULL);
 }
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 9e1af0b9857a..be84307611c4 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -2257,6 +2257,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	ibdev->max_mcast_qp_attach = ib_qib_max_mcast_qp_attached;
 	ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
 		ibdev->max_mcast_grp;
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
 	snprintf(ibdev->node_desc, sizeof(ibdev->node_desc),
 		 "Intel Infiniband HCA %s", init_utsname()->nodename);
diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
index 664548a62e4d..d3e0dd27a8e2 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -2062,6 +2062,8 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
 	ibdev->max_mcast_qp_attach = hfi1_max_mcast_qp_attached;
 	ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
 		ibdev->max_mcast_grp;
+	ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE |
+					IB_ACCESS_REMOTE_WRITE;
 
 	strncpy(ibdev->node_desc, init_utsname()->nodename,
 		sizeof(ibdev->node_desc));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 861d92f1a0b0..2abbdd976c18 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1851,6 +1851,7 @@ struct ib_device {
 	struct ib_odp_caps	odp_caps;
 	uint64_t		timestamp_mask;
 	uint64_t		hca_core_clock; /* in KHZ */
+	int			rdma_read_access_flags;
 
 	/**
 	 * The following mandatory functions are used only at device
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 10:44   ` [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags Sagi Grimberg
@ 2015-11-10 10:44   ` Sagi Grimberg
       [not found]     ` <1447152255-28231-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 10:44   ` [PATCH RFC 3/3] RDS_IW: " Sagi Grimberg
  2015-11-10 11:31   ` [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags Christoph Hellwig
  3 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 10:44 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Instead of hard-coding remote access (which is not secured
issue in IB).

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 303f194970f9..692c0142c7b6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
 	read = min_t(int, nents << PAGE_SHIFT, rs_length);
 
 	frmr->direction = DMA_FROM_DEVICE;
-	frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
+	frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
 	frmr->sg_nents = nents;
 
 	for (pno = 0; pno < nents; pno++) {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags
       [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 10:44   ` [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags Sagi Grimberg
  2015-11-10 10:44   ` [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags Sagi Grimberg
@ 2015-11-10 10:44   ` Sagi Grimberg
       [not found]     ` <1447152255-28231-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 11:31   ` [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags Christoph Hellwig
  3 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 10:44 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Instead of hard-coding remote access (which is not secured
issue in IB).

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/rds/iw_send.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index e20bd503f4bd..47d684541342 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -763,7 +763,8 @@ out:
 
 static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
 				 struct scatterlist *sg,
-				 int sg_nents)
+				 int sg_nents,
+				 int access_flags)
 {
 	int n;
 
@@ -776,7 +777,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
 	send->s_reg_wr.wr.num_sge = 0;
 	send->s_reg_wr.mr = send->s_mr;
 	send->s_reg_wr.key = send->s_mr->rkey;
-	send->s_reg_wr.access = IB_ACCESS_REMOTE_WRITE;
+	send->s_reg_wr.access = access_flags;
 
 	ib_update_fast_reg_key(send->s_mr, send->s_remap_count++);
 
@@ -786,6 +787,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
 int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 {
 	struct rds_iw_connection *ic = conn->c_transport_data;
+	struct ib_device *device = ic->i_cm_id->device;
 	struct rds_iw_send_work *send = NULL;
 	struct rds_iw_send_work *first;
 	struct rds_iw_send_work *prev;
@@ -803,11 +805,11 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 	int num_sge;
 	int sg_nents;
 
-	rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
+	rds_iwdev = ib_get_client_data(device, &rds_iw_client);
 
 	/* map the message the first time we see it */
 	if (!op->op_mapped) {
-		op->op_count = ib_dma_map_sg(ic->i_cm_id->device,
+		op->op_count = ib_dma_map_sg(device,
 					     op->op_sg, op->op_nents, (op->op_write) ?
 					     DMA_TO_DEVICE : DMA_FROM_DEVICE);
 		rdsdebug("ic %p mapping op %p: %d\n", ic, op, op->op_count);
@@ -896,12 +898,12 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 
 		for (j = 0; j < send->s_rdma_wr.wr.num_sge &&
 		     scat != &op->op_sg[op->op_count]; j++) {
-			len = ib_sg_dma_len(ic->i_cm_id->device, scat);
+			len = ib_sg_dma_len(device, scat);
 
 			if (send->s_rdma_wr.wr.opcode == IB_WR_RDMA_READ_WITH_INV)
 				sg_nents++;
 			else {
-				send->s_sge[j].addr = ib_sg_dma_address(ic->i_cm_id->device, scat);
+				send->s_sge[j].addr = ib_sg_dma_address(device, scat);
 				send->s_sge[j].length = len;
 				send->s_sge[j].lkey = rds_iw_local_dma_lkey(ic);
 			}
@@ -947,7 +949,8 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 	 */
 	if (!op->op_write) {
 		ret = rds_iw_build_send_reg(&ic->i_sends[fr_pos],
-					    &op->op_sg[0], sg_nents);
+					    &op->op_sg[0], sg_nents,
+					    device->rdma_read_access_flags);
 		if (ret) {
 			printk(KERN_WARNING "RDS/IW: failed to reg send mem\n");
 			goto out;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]     ` <1447152255-28231-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 11:25       ` Christoph Hellwig
       [not found]         ` <20151110112537.GA8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 11:51       ` Yann Droneaud
  2015-11-10 18:01       ` Jason Gunthorpe
  2 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 11:25 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote:
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

>From all I can tell nes also is a iWarp driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags
       [not found]     ` <1447152255-28231-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 11:26       ` Christoph Hellwig
       [not found]         ` <20151110112638.GB8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 11:26 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Looks reasonable, although currently this code is only used for iWarp
anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags
       [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-11-10 10:44   ` [PATCH RFC 3/3] RDS_IW: " Sagi Grimberg
@ 2015-11-10 11:31   ` Christoph Hellwig
       [not found]     ` <20151110113141.GA23093-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  3 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 11:31 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 12:44:12PM +0200, Sagi Grimberg wrote:
> Instead of each ULP being aware of iWARP/IB protocol in order
> to determine the rdma_read access flags, have it accessible
> as an attribute in the ib_device.
> 
> Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
> unconditionally.
> 
> This patchset goes on top of Christoph's device attributes merge into
> struct ib_device.

FYI, I've updated the git branch to be based on current linus' tree
which required a few bit to be fixed.  I'd also like to note that while
everyone but Or seemed to be generally fine with it I'd really prefer
and actualy revivewed-by or acked-by tag.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags
       [not found]         ` <20151110112638.GB8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 11:35           ` Sagi Grimberg
  0 siblings, 0 replies; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 11:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> Looks reasonable, although currently this code is only used for iWarp
> anyway.

I know... I'm hoping this will change at some point, and when it does,
it will get it right hopefully.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]         ` <20151110112537.GA8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 11:36           ` Sagi Grimberg
  0 siblings, 0 replies; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 11:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


>  From all I can tell nes also is a iWarp driver.

It is.. I don't know why I treated it as IB :)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]     ` <1447152255-28231-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 11:38       ` Christoph Hellwig
       [not found]         ` <20151110113839.GA32523-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 11:41       ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 11:38 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>  	read = min_t(int, nents << PAGE_SHIFT, rs_length);
>  
>  	frmr->direction = DMA_FROM_DEVICE;
> -	frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
> +	frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
>  	frmr->sg_nents = nents;

I think you can simply drop the access_flags member and use
rdma_read_access_flags directly later in the function.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags
       [not found]     ` <20151110113141.GA23093-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 11:40       ` Sagi Grimberg
  2015-11-10 16:18       ` Steve Wise
  1 sibling, 0 replies; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 11:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> FYI, I've updated the git branch to be based on current linus' tree
> which required a few bit to be fixed.  I'd also like to note that while
> everyone but Or seemed to be generally fine with it I'd really prefer
> and actualy revivewed-by or acked-by tag.

You can add:

Tested-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]     ` <1447152255-28231-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 11:38       ` Christoph Hellwig
@ 2015-11-10 11:41       ` Christoph Hellwig
       [not found]         ` <20151110114145.GA2810-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 11:41 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact tat iWarp only allow a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]         ` <20151110113839.GA32523-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 11:42           ` Sagi Grimberg
  2015-11-10 17:03           ` Chuck Lever
  1 sibling, 0 replies; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 11:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 10/11/2015 13:38, Christoph Hellwig wrote:
> On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>   	read = min_t(int, nents << PAGE_SHIFT, rs_length);
>>
>>   	frmr->direction = DMA_FROM_DEVICE;
>> -	frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
>> +	frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
>>   	frmr->sg_nents = nents;
>
> I think you can simply drop the access_flags member and use
> rdma_read_access_flags directly later in the function.
>

I will.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]         ` <20151110114145.GA2810-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 11:46           ` Sagi Grimberg
       [not found]             ` <5641D920.5000409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 11:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 10/11/2015 13:41, Christoph Hellwig wrote:
> Oh, and while we're at it.  Can someone explain why we're even
> using rdma_read_chunk_frmr for IB?  It seems to work around the
> fact tat iWarp only allow a single RDMA READ SGE, but it's used
> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> wrong.

I think Steve can answer it better than I can. I think that it is
just to have a single code path for both IB and iWARP. I agree that
the condition seems wrong and for small transfers rdma_read_chunk_frmr
is really a performance loss.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]     ` <1447152255-28231-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 11:25       ` Christoph Hellwig
@ 2015-11-10 11:51       ` Yann Droneaud
       [not found]         ` <1447156270.7089.3.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  2015-11-10 18:01       ` Jason Gunthorpe
  2 siblings, 1 reply; 51+ messages in thread
From: Yann Droneaud @ 2015-11-10 11:51 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit :
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE
> 

Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]             ` <5641D920.5000409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 12:04               ` Christoph Hellwig
       [not found]                 ` <20151110120432.GA8230-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 16:12               ` Steve Wise
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 12:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
> 
> 
> On 10/11/2015 13:41, Christoph Hellwig wrote:
> >Oh, and while we're at it.  Can someone explain why we're even
> >using rdma_read_chunk_frmr for IB?  It seems to work around the
> >fact tat iWarp only allow a single RDMA READ SGE, but it's used
> >whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> >wrong.
> 
> I think Steve can answer it better than I can. I think that it is
> just to have a single code path for both IB and iWARP. I agree that
> the condition seems wrong and for small transfers rdma_read_chunk_frmr
> is really a performance loss.

Well, the code path already exists, but only is used fi
IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
that demonstrates how I think svcrdma should setup the reads.  Note
that this also allows to entirely remove it's allphys MR.

Note that as a followon this would also allow to remove the
non-READ_W_INV code path from rdma_read_chunk_frmr as a future
step.


diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index f869807..35fa638 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -147,7 +147,6 @@ struct svcxprt_rdma {
 	struct ib_qp         *sc_qp;
 	struct ib_cq         *sc_rq_cq;
 	struct ib_cq         *sc_sq_cq;
-	struct ib_mr         *sc_phys_mr;	/* MR for server memory */
 	int		     (*sc_reader)(struct svcxprt_rdma *,
 					  struct svc_rqst *,
 					  struct svc_rdma_op_ctxt *,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ff4f01e..a410cb6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -160,7 +160,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 			goto err;
 		atomic_inc(&xprt->sc_dma_used);
 
-		/* The lkey here is either a local dma lkey or a dma_mr lkey */
+		/* The lkey here is the local dma lkey */
 		ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
 		ctxt->sge[pno].length = len;
 		ctxt->count++;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 9f3eb89..2780da4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -887,8 +887,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_cq_init_attr cq_attr = {};
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device *dev;
-	int uninitialized_var(dma_mr_acc);
-	int need_dma_mr = 0;
 	int ret = 0;
 	int i;
 
@@ -986,68 +984,41 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	}
 	newxprt->sc_qp = newxprt->sc_cm_id->qp;
 
-	/*
-	 * Use the most secure set of MR resources based on the
-	 * transport type and available memory management features in
-	 * the device. Here's the table implemented below:
-	 *
-	 *		Fast	Global	DMA	Remote WR
-	 *		Reg	LKEY	MR	Access
-	 *		Sup'd	Sup'd	Needed	Needed
-	 *
-	 * IWARP	N	N	Y	Y
-	 *		N	Y	Y	Y
-	 *		Y	N	Y	N
-	 *		Y	Y	N	-
-	 *
-	 * IB		N	N	Y	N
-	 *		N	Y	N	-
-	 *		Y	N	Y	N
-	 *		Y	Y	N	-
-	 *
-	 * NB:	iWARP requires remote write access for the data sink
-	 *	of an RDMA_READ. IB does not.
-	 */
-	newxprt->sc_reader = rdma_read_chunk_lcl;
-	if (dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
-		newxprt->sc_frmr_pg_list_len =
-			dev->max_fast_reg_page_list_len;
-		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
-		newxprt->sc_reader = rdma_read_chunk_frmr;
-	}
+	if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {
+		/*
+		 * iWARP requires remote write access for the data sink, and
+		 * only supports a single SGE for RDMA_READ requests, so we'll
+		 * have to use a memory registration for each RDMA_READ.
+		 */
+		if (!(dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) {
+			/*
+			 * We're an iWarp device but don't support FRs.
+			 * Tought luck, better exit early as we have little
+			 * chance of doing something useful.
+			 */
+			goto errout;
+		}
 
-	/*
-	 * Determine if a DMA MR is required and if so, what privs are required
-	 */
-	if (!rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num) &&
-	    !rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num))
+		newxprt->sc_frmr_pg_list_len = dev->max_fast_reg_page_list_len;
+		newxprt->sc_dev_caps =
+			 SVCRDMA_DEVCAP_FAST_REG |
+			 SVCRDMA_DEVCAP_READ_W_INV;
+		newxprt->sc_reader = rdma_read_chunk_frmr;
+	} else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
+		/*
+		 * For IB or RoCE life is easy, no unsafe write access is
+		 * required and multiple SGEs are supported, so we don't need
+		 * to use MRs.
+		 */
+		newxprt->sc_reader = rdma_read_chunk_lcl;
+	} else {
+		/*
+		 * Neither iWarp nor IB-ish, we're out of luck.
+		 */
 		goto errout;
-
-	if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) ||
-	    !(dev->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
-		need_dma_mr = 1;
-		dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
-		if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num) &&
-		    !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG))
-			dma_mr_acc |= IB_ACCESS_REMOTE_WRITE;
 	}
 
-	if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num))
-		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
-
-	/* Create the DMA MR if needed, otherwise, use the DMA LKEY */
-	if (need_dma_mr) {
-		/* Register all of physical memory */
-		newxprt->sc_phys_mr =
-			ib_get_dma_mr(newxprt->sc_pd, dma_mr_acc);
-		if (IS_ERR(newxprt->sc_phys_mr)) {
-			dprintk("svcrdma: Failed to create DMA MR ret=%d\n",
-				ret);
-			goto errout;
-		}
-		newxprt->sc_dma_lkey = newxprt->sc_phys_mr->lkey;
-	} else
-		newxprt->sc_dma_lkey = dev->local_dma_lkey;
+	newxprt->sc_dma_lkey = dev->local_dma_lkey;
 
 	/* Post receive buffers */
 	for (i = 0; i < newxprt->sc_max_requests; i++) {
@@ -1203,9 +1174,6 @@ static void __svc_rdma_free(struct work_struct *work)
 	if (rdma->sc_rq_cq && !IS_ERR(rdma->sc_rq_cq))
 		ib_destroy_cq(rdma->sc_rq_cq);
 
-	if (rdma->sc_phys_mr && !IS_ERR(rdma->sc_phys_mr))
-		ib_dereg_mr(rdma->sc_phys_mr);
-
 	if (rdma->sc_pd && !IS_ERR(rdma->sc_pd))
 		ib_dealloc_pd(rdma->sc_pd);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]         ` <1447156270.7089.3.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 12:28           ` Sagi Grimberg
       [not found]             ` <5641E2D1.8070209-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-11-10 12:36           ` Tom Talpey
  2015-11-10 18:06           ` Jason Gunthorpe
  2 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 12:28 UTC (permalink / raw)
  To: Yann Droneaud, Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Yann,

>
> Why were those hw providers not modified to
> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
> set it for them ?

Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and
iWARP providers executing the memory registration will add
IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that
changing the access rights under the user legs is something we'd want
to do. is it?

Given that for IB memory registration is not even needed for rdma_read,
I'd like to move away from code like:

if (rdma_protocol_iwarp())
	register memory for rdma_read
else
	go ahead with ib_sge for rdma_read

I've maid some initial experimentation on trying to abstract this
aspect from ULPs but it's far from being complete and needs more work.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]         ` <1447156270.7089.3.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  2015-11-10 12:28           ` Sagi Grimberg
@ 2015-11-10 12:36           ` Tom Talpey
       [not found]             ` <5641E4C9.7000206-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
  2015-11-10 18:06           ` Jason Gunthorpe
  2 siblings, 1 reply; 51+ messages in thread
From: Tom Talpey @ 2015-11-10 12:36 UTC (permalink / raw)
  To: Yann Droneaud, Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 6:51 AM, Yann Droneaud wrote:
> Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit :
>> Also, set rdma_read_access_flags in the relevant device drivers:
>> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
>> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE
>>
>
> Why were those hw providers not modified to
> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
> set it for them ?

Because the Linux verbs don't tell the driver whether the registration
is for an RDMA Read sink buffer.

Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute
which the upper layer can use to convey this information, I've mentioned
it here before.
https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx

When this approach is used, the upper layer doesn't have to be aware
at all of the lower layer's details.

Tom.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]             ` <5641E2D1.8070209-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-10 12:39               ` Sagi Grimberg
  0 siblings, 0 replies; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 12:39 UTC (permalink / raw)
  To: Sagi Grimberg, Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 10/11/2015 14:28, Sagi Grimberg wrote:
> Hi Yann,
>
>>
>> Why were those hw providers not modified to
>> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
>> set it for them ?
>
> Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and
> iWARP providers executing the memory registration will add
> IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that
> changing the access rights under the user legs is something we'd want
> to do. is it?

Oh, forgot that the providers do not know how this key will be used.
Thanks Tom for reminding me.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]             ` <5641E4C9.7000206-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 12:42               ` Sagi Grimberg
       [not found]                 ` <5641E644.7080101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-10 12:42 UTC (permalink / raw)
  To: Tom Talpey, Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


> Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute
> which the upper layer can use to convey this information, I've mentioned
> it here before.
> https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx
>

Thanks for the tip Tom.

>
> When this approach is used, the upper layer doesn't have to be aware
> at all of the lower layer's details.

Yea, we could do that too. Any preferences from other people?
I'm pretty indifferent on which way to go...

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                 ` <5641E644.7080101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-11-10 13:06                   ` Christoph Hellwig
       [not found]                     ` <20151110130648.GA10682-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 13:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Tom Talpey, Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 02:42:44PM +0200, Sagi Grimberg wrote:
> >When this approach is used, the upper layer doesn't have to be aware
> >at all of the lower layer's details.
> 
> Yea, we could do that too. Any preferences from other people?
> I'm pretty indifferent on which way to go...

Yes, that's the way to go.  I though we had agree on doing that as
one of the next step of the MR API cleanups, but decided to postpone
it past the first iteration for some reason that made it non-trivial.

I have to say I still don't like the Windows API either, though as it
still keeps the criptic {local,remote}_{read,write} flags.

Basically what we want is two-dimension selection:

	/*
	 * If %true this is the target of RDMA READ/WRITE operations
	 * from the remote system.  If %false this is the sink / source
	 * for RDMA READ/WRITE operations issued by the local system.
	 */
	enum ib_mr_scope { IB_MR_REMOTE, IB_MR_LOCAL };

	/*
	 * Operation we're registering for.  Can be OR'ed together
	 * when allowing RDMA READs and WRITEs on a single MR.
	 */
	enum ib_mr_dir { IB_MR_RDMA_READ = 1 << 0, IB_MR_RDMA_WRITE = 1 << 1 };

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                     ` <20151110130648.GA10682-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 13:41                       ` Christoph Hellwig
       [not found]                         ` <20151110134147.GA12814-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-10 13:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Tom Talpey, Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA

FYI, this is the API I'd aim for (only SRP and no HW driver converted
yet):

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0e21367..7ea695c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1484,14 +1484,15 @@ EXPORT_SYMBOL(ib_check_mr_status);
 int ib_map_mr_sg(struct ib_mr *mr,
 		 struct scatterlist *sg,
 		 int sg_nents,
-		 unsigned int page_size)
+		 unsigned int page_size,
+		 unsigned int flags)
 {
 	if (unlikely(!mr->device->map_mr_sg))
 		return -ENOSYS;
 
 	mr->page_size = page_size;
 
-	return mr->device->map_mr_sg(mr, sg, sg_nents);
+	return mr->device->map_mr_sg(mr, sg, sg_nents, flags);
 }
 EXPORT_SYMBOL(ib_map_mr_sg);
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 62b6cba..d77a5b4 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1314,7 +1314,6 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_send_wr *bad_wr;
-	struct ib_reg_wr wr;
 	struct srp_fr_desc *desc;
 	u32 rkey;
 	int n, err;
@@ -1342,20 +1341,17 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	ib_update_fast_reg_key(desc->mr, rkey);
 
 	n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
-			 dev->mr_page_size);
+			 dev->mr_page_size,
+			 /*
+			  * XXX: add a bool write argument to this function,
+			  * so that we only need to open up the required
+			  * permissions.
+			  */
+			 IB_MR_REMOTE | IB_MR_RDMA_READ | IB_MR_RDMA_WRITE);
 	if (unlikely(n < 0))
 		return n;
 
-	wr.wr.next = NULL;
-	wr.wr.opcode = IB_WR_REG_MR;
-	wr.wr.wr_id = FAST_REG_WR_ID_MASK;
-	wr.wr.num_sge = 0;
-	wr.wr.send_flags = 0;
-	wr.mr = desc->mr;
-	wr.key = desc->mr->rkey;
-	wr.access = (IB_ACCESS_LOCAL_WRITE |
-		     IB_ACCESS_REMOTE_READ |
-		     IB_ACCESS_REMOTE_WRITE);
+	desc->mr->wr.wr_id = FAST_REG_WR_ID_MASK;
 
 	*state->fr.next++ = desc;
 	state->nmdesc++;
@@ -1363,7 +1359,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	srp_map_desc(state, desc->mr->iova,
 		     desc->mr->length, desc->mr->rkey);
 
-	err = ib_post_send(ch->qp, &wr.wr, &bad_wr);
+	err = ib_post_send(ch->qp, &desc->mr->wr, &bad_wr);
 	if (unlikely(err))
 		return err;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 83d6ee8..b168b3a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1105,18 +1105,6 @@ static inline struct ib_ud_wr *ud_wr(struct ib_send_wr *wr)
 	return container_of(wr, struct ib_ud_wr, wr);
 }
 
-struct ib_reg_wr {
-	struct ib_send_wr	wr;
-	struct ib_mr		*mr;
-	u32			key;
-	int			access;
-};
-
-static inline struct ib_reg_wr *reg_wr(struct ib_send_wr *wr)
-{
-	return container_of(wr, struct ib_reg_wr, wr);
-}
-
 struct ib_bind_mw_wr {
 	struct ib_send_wr	wr;
 	struct ib_mw		*mw;
@@ -1314,7 +1302,18 @@ struct ib_qp {
 	enum ib_qp_type		qp_type;
 };
 
+enum ib_mr_flags {
+	/* scope: either remote or local */
+	IB_MR_REMOTE,
+	IB_MR_LOCAL,
+
+	/* direction: one or both can be ORed into the scope above */
+	IB_MR_RDMA_READ		= (1 << 10),
+	IB_MR_RDMA_WRITE	= (1 << 11)
+};
+
 struct ib_mr {
+	struct ib_send_wr  wr;
 	struct ib_device  *device;
 	struct ib_pd	  *pd;
 	struct ib_uobject *uobject;
@@ -1326,6 +1325,11 @@ struct ib_mr {
 	atomic_t	   usecnt; /* count number of MWs */
 };
 
+static inline struct ib_mr *wr_to_mr(struct ib_send_wr *wr)
+{
+	return container_of(wr, struct ib_mr, wr);
+}
+
 struct ib_mw {
 	struct ib_device	*device;
 	struct ib_pd		*pd;
@@ -1706,7 +1710,8 @@ struct ib_device {
 					       u32 max_num_sg);
 	int                        (*map_mr_sg)(struct ib_mr *mr,
 						struct scatterlist *sg,
-						int sg_nents);
+						int sg_nents,
+						unsigned int flags);
 	int                        (*rereg_phys_mr)(struct ib_mr *mr,
 						    int mr_rereg_mask,
 						    struct ib_pd *pd,
@@ -3022,17 +3027,19 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
 int ib_map_mr_sg(struct ib_mr *mr,
 		 struct scatterlist *sg,
 		 int sg_nents,
-		 unsigned int page_size);
+		 unsigned int page_size,
+		 unsigned int flags);
 
 static inline int
 ib_map_mr_sg_zbva(struct ib_mr *mr,
 		  struct scatterlist *sg,
 		  int sg_nents,
-		  unsigned int page_size)
+		  unsigned int page_size,
+		  unsigned int flags)
 {
 	int n;
 
-	n = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+	n = ib_map_mr_sg(mr, sg, sg_nents, page_size, flags);
 	mr->iova = 0;
 
 	return n;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                 ` <20151110120432.GA8230-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 14:34                   ` Chuck Lever
       [not found]                     ` <F60E8BA6-D728-44E4-9331-0C23AB96E634-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2015-11-10 18:25                   ` Jason Gunthorpe
  1 sibling, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2015-11-10 14:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA


> On Nov 10, 2015, at 7:04 AM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> 
>> On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
>> 
>> 
>>> On 10/11/2015 13:41, Christoph Hellwig wrote:
>>> Oh, and while we're at it.  Can someone explain why we're even
>>> using rdma_read_chunk_frmr for IB?  It seems to work around the
>>> fact tat iWarp only allow a single RDMA READ SGE, but it's used
>>> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
>>> wrong.
>> 
>> I think Steve can answer it better than I can. I think that it is
>> just to have a single code path for both IB and iWARP. I agree that
>> the condition seems wrong and for small transfers rdma_read_chunk_frmr
>> is really a performance loss.
> 
> Well, the code path already exists, but only is used fi
> IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
> that demonstrates how I think svcrdma should setup the reads.  Note
> that this also allows to entirely remove it's allphys MR.

Two comments:

1. RFCs and changes in sunrpc must be copied to linux-nfs.

2. Is there a reason IB is not using the allphys MR for RDMA Read?
That would be much more efficient. On the NFS server that MR isn't
exposed, thus it is safe to use.


> Note that as a followon this would also allow to remove the
> non-READ_W_INV code path from rdma_read_chunk_frmr as a future
> step.
> 
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index f869807..35fa638 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -147,7 +147,6 @@ struct svcxprt_rdma {
>    struct ib_qp         *sc_qp;
>    struct ib_cq         *sc_rq_cq;
>    struct ib_cq         *sc_sq_cq;
> -    struct ib_mr         *sc_phys_mr;    /* MR for server memory */
>    int             (*sc_reader)(struct svcxprt_rdma *,
>                      struct svc_rqst *,
>                      struct svc_rdma_op_ctxt *,
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ff4f01e..a410cb6 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -160,7 +160,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>            goto err;
>        atomic_inc(&xprt->sc_dma_used);
> 
> -        /* The lkey here is either a local dma lkey or a dma_mr lkey */
> +        /* The lkey here is the local dma lkey */
>        ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
>        ctxt->sge[pno].length = len;
>        ctxt->count++;
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 9f3eb89..2780da4 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -887,8 +887,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>    struct ib_cq_init_attr cq_attr = {};
>    struct ib_qp_init_attr qp_attr;
>    struct ib_device *dev;
> -    int uninitialized_var(dma_mr_acc);
> -    int need_dma_mr = 0;
>    int ret = 0;
>    int i;
> 
> @@ -986,68 +984,41 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>    }
>    newxprt->sc_qp = newxprt->sc_cm_id->qp;
> 
> -    /*
> -     * Use the most secure set of MR resources based on the
> -     * transport type and available memory management features in
> -     * the device. Here's the table implemented below:
> -     *
> -     *        Fast    Global    DMA    Remote WR
> -     *        Reg    LKEY    MR    Access
> -     *        Sup'd    Sup'd    Needed    Needed
> -     *
> -     * IWARP    N    N    Y    Y
> -     *        N    Y    Y    Y
> -     *        Y    N    Y    N
> -     *        Y    Y    N    -
> -     *
> -     * IB        N    N    Y    N
> -     *        N    Y    N    -
> -     *        Y    N    Y    N
> -     *        Y    Y    N    -
> -     *
> -     * NB:    iWARP requires remote write access for the data sink
> -     *    of an RDMA_READ. IB does not.
> -     */
> -    newxprt->sc_reader = rdma_read_chunk_lcl;
> -    if (dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
> -        newxprt->sc_frmr_pg_list_len =
> -            dev->max_fast_reg_page_list_len;
> -        newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
> -        newxprt->sc_reader = rdma_read_chunk_frmr;
> -    }
> +    if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {
> +        /*
> +         * iWARP requires remote write access for the data sink, and
> +         * only supports a single SGE for RDMA_READ requests, so we'll
> +         * have to use a memory registration for each RDMA_READ.
> +         */
> +        if (!(dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) {
> +            /*
> +             * We're an iWarp device but don't support FRs.
> +             * Tought luck, better exit early as we have little
> +             * chance of doing something useful.
> +             */
> +            goto errout;
> +        }
> 
> -    /*
> -     * Determine if a DMA MR is required and if so, what privs are required
> -     */
> -    if (!rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num) &&
> -        !rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num))
> +        newxprt->sc_frmr_pg_list_len = dev->max_fast_reg_page_list_len;
> +        newxprt->sc_dev_caps =
> +             SVCRDMA_DEVCAP_FAST_REG |
> +             SVCRDMA_DEVCAP_READ_W_INV;
> +        newxprt->sc_reader = rdma_read_chunk_frmr;
> +    } else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
> +        /*
> +         * For IB or RoCE life is easy, no unsafe write access is
> +         * required and multiple SGEs are supported, so we don't need
> +         * to use MRs.
> +         */
> +        newxprt->sc_reader = rdma_read_chunk_lcl;
> +    } else {
> +        /*
> +         * Neither iWarp nor IB-ish, we're out of luck.
> +         */
>        goto errout;
> -
> -    if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) ||
> -        !(dev->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
> -        need_dma_mr = 1;
> -        dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
> -        if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num) &&
> -            !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG))
> -            dma_mr_acc |= IB_ACCESS_REMOTE_WRITE;
>    }
> 
> -    if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num))
> -        newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
> -
> -    /* Create the DMA MR if needed, otherwise, use the DMA LKEY */
> -    if (need_dma_mr) {
> -        /* Register all of physical memory */
> -        newxprt->sc_phys_mr =
> -            ib_get_dma_mr(newxprt->sc_pd, dma_mr_acc);
> -        if (IS_ERR(newxprt->sc_phys_mr)) {
> -            dprintk("svcrdma: Failed to create DMA MR ret=%d\n",
> -                ret);
> -            goto errout;
> -        }
> -        newxprt->sc_dma_lkey = newxprt->sc_phys_mr->lkey;
> -    } else
> -        newxprt->sc_dma_lkey = dev->local_dma_lkey;
> +    newxprt->sc_dma_lkey = dev->local_dma_lkey;
> 
>    /* Post receive buffers */
>    for (i = 0; i < newxprt->sc_max_requests; i++) {
> @@ -1203,9 +1174,6 @@ static void __svc_rdma_free(struct work_struct *work)
>    if (rdma->sc_rq_cq && !IS_ERR(rdma->sc_rq_cq))
>        ib_destroy_cq(rdma->sc_rq_cq);
> 
> -    if (rdma->sc_phys_mr && !IS_ERR(rdma->sc_phys_mr))
> -        ib_dereg_mr(rdma->sc_phys_mr);
> -
>    if (rdma->sc_pd && !IS_ERR(rdma->sc_pd))
>        ib_dealloc_pd(rdma->sc_pd);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                     ` <F60E8BA6-D728-44E4-9331-0C23AB96E634-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 15:16                       ` Tom Talpey
  0 siblings, 0 replies; 51+ messages in thread
From: Tom Talpey @ 2015-11-10 15:16 UTC (permalink / raw)
  To: Chuck Lever, Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 9:34 AM, Chuck Lever wrote:
> 2. Is there a reason IB is not using the allphys MR for RDMA Read?
> That would be much more efficient. On the NFS server that MR isn't
> exposed, thus it is safe to use.

True, but only if the layout of the buffer's physical pages do not
exceed the local RDMA Read scatter limit. If the size is large then
the privileged MR will require additional RDMA Read operations, which
can be more expensive. It's certainly a valid optimization, if the
buffer "fits".

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]             ` <5641D920.5000409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 12:04               ` Christoph Hellwig
@ 2015-11-10 16:12               ` Steve Wise
  1 sibling, 0 replies; 51+ messages in thread
From: Steve Wise @ 2015-11-10 16:12 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 5:46 AM, Sagi Grimberg wrote:
>
>
> On 10/11/2015 13:41, Christoph Hellwig wrote:
>> Oh, and while we're at it.  Can someone explain why we're even
>> using rdma_read_chunk_frmr for IB?  It seems to work around the
>> fact tat iWarp only allow a single RDMA READ SGE, but it's used
>> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
>> wrong.
>
> I think Steve can answer it better than I can. I think that it is
> just to have a single code path for both IB and iWARP. I agree that
> the condition seems wrong and for small transfers rdma_read_chunk_frmr
> is really a performance loss.

This was probably just an oversight/mistake.  The focus was on enabling 
iWARP at the time.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags
       [not found]     ` <20151110113141.GA23093-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 11:40       ` Sagi Grimberg
@ 2015-11-10 16:18       ` Steve Wise
  1 sibling, 0 replies; 51+ messages in thread
From: Steve Wise @ 2015-11-10 16:18 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 5:31 AM, Christoph Hellwig wrote:
> On Tue, Nov 10, 2015 at 12:44:12PM +0200, Sagi Grimberg wrote:
>> Instead of each ULP being aware of iWARP/IB protocol in order
>> to determine the rdma_read access flags, have it accessible
>> as an attribute in the ib_device.
>>
>> Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
>> unconditionally.
>>
>> This patchset goes on top of Christoph's device attributes merge into
>> struct ib_device.
> FYI, I've updated the git branch to be based on current linus' tree
> which required a few bit to be fixed.  I'd also like to note that while
> everyone but Or seemed to be generally fine with it I'd really prefer
> and actualy revivewed-by or acked-by tag.
> --

Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]         ` <20151110113839.GA32523-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 11:42           ` Sagi Grimberg
@ 2015-11-10 17:03           ` Chuck Lever
  1 sibling, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2015-11-10 17:03 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg; +Cc: Linux RDMA Mailing List


> On Nov 10, 2015, at 6:38 AM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> 
> On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>> 	read = min_t(int, nents << PAGE_SHIFT, rs_length);
>> 
>> 	frmr->direction = DMA_FROM_DEVICE;
>> -	frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
>> +	frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
>> 	frmr->sg_nents = nents;
> 
> I think you can simply drop the access_flags member and use
> rdma_read_access_flags directly later in the function.

> Oh, and while we're at it.  Can someone explain why we're even
> using rdma_read_chunk_frmr for IB?  It seems to work around the
> fact tat iWarp only allow a single RDMA READ SGE, but it's used
> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> wrong.

Christoph: I agree on both points.

Sagi: the overall approach looks fine to me. Will wait for you
to repost with Christoph’s first suggestion addressed.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]     ` <1447152255-28231-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-11-10 11:25       ` Christoph Hellwig
  2015-11-10 11:51       ` Yann Droneaud
@ 2015-11-10 18:01       ` Jason Gunthorpe
       [not found]         ` <20151110180156.GE12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2 siblings, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 18:01 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote:
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

IB devices *should* be using a local_dma_lkey instead of a memory
registration for the RDMA READ target memory.

iWarp devices *must* use a memory registration instead.

Having an API that encourages ULPs to do useless MR work on IB does
not seem like a great idea, but if the ULP needs to create a MR anyhow
(SG limits or whatever), then it makes some sense.. But not in absence
of the 'need a mr' general check..

Finally, please don't add rdma_read_access_flags - we went to a lot of
trouble to add the cap stuff, and this stuff is already covered by it.
No need to change every driver.

I'd suggest something like

  unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd)
  {
     if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device)))
           return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
     return IB_ACCESS_LOCAL_WRITE;
  }

  bool rdma_cap_need_rdma_read_mr(const struct ib_pd *pd) ...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]         ` <1447156270.7089.3.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  2015-11-10 12:28           ` Sagi Grimberg
  2015-11-10 12:36           ` Tom Talpey
@ 2015-11-10 18:06           ` Jason Gunthorpe
  2 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 18:06 UTC (permalink / raw)
  To: Yann Droneaud; +Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 12:51:10PM +0100, Yann Droneaud wrote:
> Why were those hw providers not modified to
> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
> set it for them ?

iWarp hardware simply cannot do it it all.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                 ` <20151110120432.GA8230-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 14:34                   ` Chuck Lever
@ 2015-11-10 18:25                   ` Jason Gunthorpe
       [not found]                     ` <20151110182546.GI12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 18:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 04:04:32AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
> > 
> > 
> > On 10/11/2015 13:41, Christoph Hellwig wrote:
> > >Oh, and while we're at it.  Can someone explain why we're even
> > >using rdma_read_chunk_frmr for IB?  It seems to work around the
> > >fact tat iWarp only allow a single RDMA READ SGE, but it's used
> > >whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> > >wrong.
> > 
> > I think Steve can answer it better than I can. I think that it is
> > just to have a single code path for both IB and iWARP. I agree that
> > the condition seems wrong and for small transfers rdma_read_chunk_frmr
> > is really a performance loss.
> 
> Well, the code path already exists, but only is used fi
> IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
> that demonstrates how I think svcrdma should setup the reads.  Note
> that this also allows to entirely remove it's allphys MR.
> 
> Note that as a followon this would also allow to remove the
> non-READ_W_INV code path from rdma_read_chunk_frmr as a future
> step.

I like this, my only comment is we should have a rdma_cap for this
behavior, rdma_cap_needs_rdma_read_mr(pd) or something?

> +	if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {

Use here

> +		/*
> +		 * iWARP requires remote write access for the data sink, and
> +		 * only supports a single SGE for RDMA_READ requests, so we'll
> +		 * have to use a memory registration for each RDMA_READ.
> +		 */
> +		if (!(dev->device_cap_flags &
> IB_DEVICE_MEM_MGT_EXTENSIONS)) {

Lets enforce this in the core, if rdma_cap_needs_rdma_read_mr is set
the the device must also set IB_DEVICE_MEM_MGT_EXTENSIONS, check at
device creation time.

> +	} else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
> +		/*
> +		 * For IB or RoCE life is easy, no unsafe write access is
> +		 * required and multiple SGEs are supported, so we don't need
> +		 * to use MRs.
> +		 */
> +		newxprt->sc_reader = rdma_read_chunk_lcl;
> +	} else {
> +		/*
> +		 * Neither iWarp nor IB-ish, we're out of luck.
> +		 */
>  		goto errout;

No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is okay
to use.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                         ` <20151110134147.GA12814-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-10 18:36                           ` Jason Gunthorpe
       [not found]                             ` <20151110183627.GJ12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-11-11  9:07                           ` Sagi Grimberg
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 18:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Tom Talpey, Yann Droneaud,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 05:41:47AM -0800, Christoph Hellwig wrote:
> FYI, this is the API I'd aim for (only SRP and no HW driver converted
> yet):

>  	n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
> -			 dev->mr_page_size);
> +			 dev->mr_page_size,
> +			 /*
> +			  * XXX: add a bool write argument to this function,
> +			  * so that we only need to open up the required
> +			  * permissions.
> +			  */
> +			 IB_MR_REMOTE | IB_MR_RDMA_READ |
> IB_MR_RDMA_WRITE);

I would call it IB_RDMA_LKEY and IB_RDMA_RKEY. We have other places in
the API where lkey/rkey is used and it makes a lot more sense to think
about a MR as being either a lkey or rkey usable MR - since this is
effectively what we are doing here with these ops.

IB_MR_RKEY | IB_MR_RDMA_READ == The resulting key can be used
in a wr.rdma.rkey field

IB_MR_LKEY | IB_MR_SEND == The key can be used in wr.sg_list[].lkey

etc

It is an error to use a IB_MR_LKEY in a rkey field, etc..

> +enum ib_mr_flags {
> +	/* scope: either remote or local */
> +	IB_MR_REMOTE,
> +	IB_MR_LOCAL,
> +
> +	/* direction: one or both can be ORed into the scope above */
> +	IB_MR_RDMA_READ		= (1 << 10),
> +	IB_MR_RDMA_WRITE	= (1 << 11)

Don't forget SEND too.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                     ` <20151110182546.GI12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-11-10 21:00                       ` Tom Talpey
       [not found]                         ` <56425AFB.30202-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
  2015-11-11  8:02                       ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: Tom Talpey @ 2015-11-10 21:00 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 1:25 PM, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2015 at 04:04:32AM -0800, Christoph Hellwig wrote:
>> On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
>>>
>>>
>>> On 10/11/2015 13:41, Christoph Hellwig wrote:
>>>> Oh, and while we're at it.  Can someone explain why we're even
>>>> using rdma_read_chunk_frmr for IB?  It seems to work around the
>>>> fact tat iWarp only allow a single RDMA READ SGE, but it's used
>>>> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
>>>> wrong.
>>>
>>> I think Steve can answer it better than I can. I think that it is
>>> just to have a single code path for both IB and iWARP. I agree that
>>> the condition seems wrong and for small transfers rdma_read_chunk_frmr
>>> is really a performance loss.
>>
>> Well, the code path already exists, but only is used fi
>> IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
>> that demonstrates how I think svcrdma should setup the reads.  Note
>> that this also allows to entirely remove it's allphys MR.
>>
>> Note that as a followon this would also allow to remove the
>> non-READ_W_INV code path from rdma_read_chunk_frmr as a future
>> step.
>
> I like this, my only comment is we should have a rdma_cap for this
> behavior, rdma_cap_needs_rdma_read_mr(pd) or something?

Windows NDKPI has this, it's the oh-so-succinctly-named flag
NDK_ADAPTER_FLAG_RDMA_READ_SINK_NOT_REQUIRED. The ULP is free
to ignore it and pass the NDK_MR_FLAG_RDMA_READ_SINK flag anyway,
the provider is expected to ignore it if not needed.

>
>> +	if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {
>
> Use here
>
>> +		/*
>> +		 * iWARP requires remote write access for the data sink, and
>> +		 * only supports a single SGE for RDMA_READ requests, so we'll
>> +		 * have to use a memory registration for each RDMA_READ.
>> +		 */
>> +		if (!(dev->device_cap_flags &
>> IB_DEVICE_MEM_MGT_EXTENSIONS)) {
>
> Lets enforce this in the core, if rdma_cap_needs_rdma_read_mr is set
> the the device must also set IB_DEVICE_MEM_MGT_EXTENSIONS, check at
> device creation time.
>
>> +	} else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
>> +		/*
>> +		 * For IB or RoCE life is easy, no unsafe write access is
>> +		 * required and multiple SGEs are supported, so we don't need
>> +		 * to use MRs.
>> +		 */
>> +		newxprt->sc_reader = rdma_read_chunk_lcl;
>> +	} else {
>> +		/*
>> +		 * Neither iWarp nor IB-ish, we're out of luck.
>> +		 */
>>   		goto errout;
>
> No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is okay
> to use.

Hmm, agreed, but it must still be acceptable to perform a registration
instead of using the local_dma_lkey. As mentioned earlier, there are
scatter limits and other implications for all-physical addressing that
an upper layer may choose to avoid.

Tom.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                         ` <56425AFB.30202-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 21:17                           ` Jason Gunthorpe
       [not found]                             ` <20151110211716.GA21631-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 21:17 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:

> Hmm, agreed, but it must still be acceptable to perform a registration
> instead of using the local_dma_lkey. As mentioned earlier, there are
> scatter limits and other implications for all-physical addressing that
> an upper layer may choose to avoid.

It is always acceptable to use a lkey MR instead of the local dma
lkey, but ULPs should prefer to use the local dma lkey if possible,
for performance reasons.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                             ` <20151110211716.GA21631-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-11-10 21:30                               ` Tom Talpey
       [not found]                                 ` <564261EF.4000008-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
  2015-11-11  9:25                               ` Sagi Grimberg
  1 sibling, 1 reply; 51+ messages in thread
From: Tom Talpey @ 2015-11-10 21:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 4:17 PM, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:
>
>> Hmm, agreed, but it must still be acceptable to perform a registration
>> instead of using the local_dma_lkey. As mentioned earlier, there are
>> scatter limits and other implications for all-physical addressing that
>> an upper layer may choose to avoid.
>
> It is always acceptable to use a lkey MR instead of the local dma
> lkey, but ULPs should prefer to use the local dma lkey if possible,
> for performance reasons.

Sure, the key words are "if possible". For example, a single 1MB
RDMA Read is unlikely to be possible with the dma lkey, assuming
worst-case physical page scatter it would need 256 SGEs. But 1MB
can be registered easily.

In any case, my point was that the ULP gets to choose, so, it looks
like we agree.

Tom.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                 ` <564261EF.4000008-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 23:01                                   ` Chuck Lever
       [not found]                                     ` <54CF0111-E9C4-4196-BF92-7E134A6A329A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2015-11-10 23:01 UTC (permalink / raw)
  To: Tom Talpey, Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg, Linux RDMA Mailing List


> On Nov 10, 2015, at 4:30 PM, Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org> wrote:
> 
> On 11/10/2015 4:17 PM, Jason Gunthorpe wrote:
>> On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:
>> 
>>> Hmm, agreed, but it must still be acceptable to perform a registration
>>> instead of using the local_dma_lkey. As mentioned earlier, there are
>>> scatter limits and other implications for all-physical addressing that
>>> an upper layer may choose to avoid.
>> 
>> It is always acceptable to use a lkey MR instead of the local dma
>> lkey, but ULPs should prefer to use the local dma lkey if possible,
>> for performance reasons.
> 
> Sure, the key words are "if possible". For example, a single 1MB
> RDMA Read is unlikely to be possible with the dma lkey, assuming
> worst-case physical page scatter it would need 256 SGEs. But 1MB
> can be registered easily.
> 
> In any case, my point was that the ULP gets to choose, so, it looks
> like we agree.

I’d like to see our NFS server use the local DMA lkey where it
makes sense, to avoid the cost of registering and invalidating
memory.

I have to agree with Tom that once the device’s s/g limit is
exceeded, the server has to post an RDMA Read WR every few
pages, and appears to get expensive for large NFS requests.

The current mechanism of statically choosing either FRMR or
local DMA depending on the device is probably not adequate.
Maybe we could post all the Read WRs via a single chain? Or
stick with FRMR when the amount of data to read is significant.

I’ve also tested Christoph’s patch. The logic currently in
rdma_read_chunk_lcl does not seem up to the task. Once the
device’s s/g limit is exceeded, the server starts throwing
local length exceptions, and the client workload hangs.

None of this is impossible to fix, but there is some work to
do here.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                     ` <54CF0111-E9C4-4196-BF92-7E134A6A329A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-11-10 23:55                                       ` Jason Gunthorpe
       [not found]                                         ` <20151110235551.GA12795-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-11-11  9:28                                       ` Sagi Grimberg
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-10 23:55 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Tom Talpey, Christoph Hellwig, Sagi Grimberg, Linux RDMA Mailing List

On Tue, Nov 10, 2015 at 06:01:01PM -0500, Chuck Lever wrote:
> The current mechanism of statically choosing either FRMR or
> local DMA depending on the device is probably not adequate.
> Maybe we could post all the Read WRs via a single chain? Or
> stick with FRMR when the amount of data to read is significant.

Yes, those are the right ways to go..

IMHO, the break point for when it makes sense to switch from a RDMA
READ chain to a MR is going to be a RDMA core tunable. The performance
curve won't have much to do with the ULP.

But certainly, if a requests fits in the SG list then it should just
use the local dma lkey directly, and consider allocating an
appropriate S/G list size when creating the QP.

The special thing about iwarp becomes simply defeating the use of the
local dma lkey for RDMA READ.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                     ` <20151110182546.GI12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-11-10 21:00                       ` Tom Talpey
@ 2015-11-11  8:02                       ` Christoph Hellwig
       [not found]                         ` <20151111080225.GA31508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11  8:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 11:25:46AM -0700, Jason Gunthorpe wrote:
> I like this, my only comment is we should have a rdma_cap for this
> behavior, rdma_cap_needs_rdma_read_mr(pd) or something?

Yes, that's better than checking the protocol.

> > +		if (!(dev->device_cap_flags &
> > IB_DEVICE_MEM_MGT_EXTENSIONS)) {
> 
> Lets enforce this in the core, if rdma_cap_needs_rdma_read_mr is set
> the the device must also set IB_DEVICE_MEM_MGT_EXTENSIONS, check at
> device creation time.

The iWarp verbs spec requires them to be supported, so that should not
be an issue.

> > +	} else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
> > +		/*
> > +		 * For IB or RoCE life is easy, no unsafe write access is
> > +		 * required and multiple SGEs are supported, so we don't need
> > +		 * to use MRs.
> > +		 */
> > +		newxprt->sc_reader = rdma_read_chunk_lcl;
> > +	} else {
> > +		/*
> > +		 * Neither iWarp nor IB-ish, we're out of luck.
> > +		 */
> >  		goto errout;
> 
> No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is okay
> to use.

What would happen if someone tried to use NFS on usnic without this?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                             ` <20151110183627.GJ12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-11-11  8:03                               ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11  8:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg, Tom Talpey, Yann Droneaud,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 11:36:27AM -0700, Jason Gunthorpe wrote:
> >  	n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
> > -			 dev->mr_page_size);
> > +			 dev->mr_page_size,
> > +			 /*
> > +			  * XXX: add a bool write argument to this function,
> > +			  * so that we only need to open up the required
> > +			  * permissions.
> > +			  */
> > +			 IB_MR_REMOTE | IB_MR_RDMA_READ |
> > IB_MR_RDMA_WRITE);
> 
> I would call it IB_RDMA_LKEY and IB_RDMA_RKEY. We have other places in
> the API where lkey/rkey is used and it makes a lot more sense to think
> about a MR as being either a lkey or rkey usable MR - since this is
> effectively what we are doing here with these ops.

Hmm, I really hate these suport short names, but if there is consensus I
can fix it up.

> > +enum ib_mr_flags {
> > +	/* scope: either remote or local */
> > +	IB_MR_REMOTE,
> > +	IB_MR_LOCAL,
> > +
> > +	/* direction: one or both can be ORed into the scope above */
> > +	IB_MR_RDMA_READ		= (1 << 10),
> > +	IB_MR_RDMA_WRITE	= (1 << 11)
> 
> Don't forget SEND too.

I don't think we're ever using that in the kernel, but it's an easy
addition for completeless.  Especially once we start exposing these
flags to the drivers.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                         ` <20151110235551.GA12795-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-11-11  8:07                                           ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11  8:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, Tom Talpey, Christoph Hellwig, Sagi Grimberg,
	Linux RDMA Mailing List

On Tue, Nov 10, 2015 at 04:55:51PM -0700, Jason Gunthorpe wrote:
> IMHO, the break point for when it makes sense to switch from a RDMA
> READ chain to a MR is going to be a RDMA core tunable. The performance
> curve won't have much to do with the ULP.

core/device, a lot of it depends on when we'd exceed the nr_sge limit
as well.  But yes, it needs to move out ot the ULP and into the core,
the ULP has no business here.

As said a few times before we need a core API that takes a struct
scatterlist and builds RDMA READ/WRITE requests from it in an efficient
way.  I have started talking to Sagi and preparing some build blocks
for it, but getting all corner cases right will take a while.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]         ` <20151110180156.GE12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-11-11  8:08           ` Christoph Hellwig
       [not found]             ` <20151111080837.GA14662-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11  8:08 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 10, 2015 at 11:01:56AM -0700, Jason Gunthorpe wrote:
> No need to change every driver.
> 
> I'd suggest something like
> 
>   unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd)
>   {
>      if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device)))
>            return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
>      return IB_ACCESS_LOCAL_WRITE;
>   }

Yes, this looks good!
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                         ` <20151110134147.GA12814-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-10 18:36                           ` Jason Gunthorpe
@ 2015-11-11  9:07                           ` Sagi Grimberg
       [not found]                             ` <56430542.5090008-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-11  9:07 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: Tom Talpey, Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 10/11/2015 15:41, Christoph Hellwig wrote:
> FYI, this is the API I'd aim for (only SRP and no HW driver converted
> yet):

This looks fine, although personally I find scope and direction flags
more confusing than access_flags (but maybe it's just me).

I think that the real issue here is the server/target side which needs
extra logic for rdma_read and memory registration. If we can put this
logic in the core and give ULPs an API that looks something like:
- ib_rdma_read()
- ib_rdma_write()

then maybe we don't need to change our flags?

The only reason why I posted this series was a pre-step for this type
of API so we can avoid the is_iwarp() condition in the hot path.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]             ` <20151111080837.GA14662-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-11  9:09               ` Sagi Grimberg
       [not found]                 ` <564305AF.3020903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-11  9:09 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 11/11/2015 10:08, Christoph Hellwig wrote:
> On Tue, Nov 10, 2015 at 11:01:56AM -0700, Jason Gunthorpe wrote:
>> No need to change every driver.
>>
>> I'd suggest something like
>>
>>    unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd)
>>    {
>>       if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device)))
>>             return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
>>       return IB_ACCESS_LOCAL_WRITE;
>>    }
>
> Yes, this looks good!

It does, but it doesn't look like something we'd want to check for each
IO...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                             ` <20151110211716.GA21631-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-11-10 21:30                               ` Tom Talpey
@ 2015-11-11  9:25                               ` Sagi Grimberg
       [not found]                                 ` <56430972.8080703-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-11  9:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Tom Talpey
  Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Jason,

> It is always acceptable to use a lkey MR instead of the local dma
> lkey, but ULPs should prefer to use the local dma lkey if possible,
> for performance reasons.

I don't necessarily agree with this statement (at least with the second
part of it), the world is not always perfect.

For RDMA READs, a HCA will perform the memory scatters when on the RX
path, when receiving the read responses containing the data. This means
that the HCA needs to perform a lookup of the relevant scatter entries
upon each read response. Due to that, modern HCAs keep a dedicate cache
for this type of RX-path lookup (which is limited in size naturally).

So, having *small* sgls for rdma_read is much (much) more friendly to
the HCA caches which means that for large transfers, even for IB, it
might be better to register memory than having endless sgls.
I was able to see that under some workloads.

This is yet another non-trivial consideration that ULPs need to be aware
of... We really are better off putting this stuff in the core...

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                     ` <54CF0111-E9C4-4196-BF92-7E134A6A329A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2015-11-10 23:55                                       ` Jason Gunthorpe
@ 2015-11-11  9:28                                       ` Sagi Grimberg
       [not found]                                         ` <56430A20.6040600-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-11  9:28 UTC (permalink / raw)
  To: Chuck Lever, Tom Talpey, Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg, Linux RDMA Mailing List



>
> I’d like to see our NFS server use the local DMA lkey where it
> makes sense, to avoid the cost of registering and invalidating
> memory.
>
> I have to agree with Tom that once the device’s s/g limit is
> exceeded, the server has to post an RDMA Read WR every few
> pages, and appears to get expensive for large NFS requests.
>
> The current mechanism of statically choosing either FRMR or
> local DMA depending on the device is probably not adequate.
> Maybe we could post all the Read WRs via a single chain? Or
> stick with FRMR when the amount of data to read is significant.
>
> I’ve also tested Christoph’s patch. The logic currently in
> rdma_read_chunk_lcl does not seem up to the task. Once the
> device’s s/g limit is exceeded, the server starts throwing
> local length exceptions, and the client workload hangs.

This is probably because this code path wasn't reached/tested for
a long time as it's hard to find a device that doesn't support FRWR
for a long time now...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                             ` <56430542.5090008-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-11 10:25                               ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11 10:25 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg, Tom Talpey, Yann Droneaud,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 11, 2015 at 11:07:14AM +0200, Sagi Grimberg wrote:
> 
> 
> On 10/11/2015 15:41, Christoph Hellwig wrote:
> >FYI, this is the API I'd aim for (only SRP and no HW driver converted
> >yet):
> 
> This looks fine, although personally I find scope and direction flags
> more confusing than access_flags (but maybe it's just me).

Looking at the drivers the current flags seem to confuse programmes
hard, given that the choises seem to be random values that just happen
to work:

rkeys:
iser:		lwrite | rwrite | rread
srp:		lwrite | rwrite | rread
rds:		lwrite | rread | rwrite
rds:		rwrite
xprdtrdma (wr):	rwrite | lwrite
xprdtrdma (rd):	rread

lkeys:
isert:		lwrite
svcrdma:	lwrite | rwrite

moving to a model where the consumer clearly says what they intend
to do with the MR seem much better than that.

> I think that the real issue here is the server/target side which needs
> extra logic for rdma_read and memory registration. If we can put this
> logic in the core and give ULPs an API that looks something like:
> - ib_rdma_read()
> - ib_rdma_write()
> 
> then maybe we don't need to change our flags?

The current flags don't make any sense for someone trying to use
the API.  They're pointless leakage of badly designed protocol specs.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags
       [not found]                 ` <564305AF.3020903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-11 10:26                   ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2015-11-11 10:26 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jason Gunthorpe, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 11, 2015 at 11:09:03AM +0200, Sagi Grimberg wrote:
> It does, but it doesn't look like something we'd want to check for each
> IO...

Both potential callers already have a access flags variable in object
that's assigned to at setup time so I don't see a problem here.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                         ` <56430A20.6040600-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-11 16:19                                           ` Chuck Lever
       [not found]                                             ` <D4228973-6DFA-4181-9848-41BFE8E75200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2015-11-11 16:19 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Tom Talpey, Jason Gunthorpe, Christoph Hellwig, Linux RDMA Mailing List

Hi Sagi-


> On Nov 11, 2015, at 4:28 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> 
>> I’d like to see our NFS server use the local DMA lkey where it
>> makes sense, to avoid the cost of registering and invalidating
>> memory.
>> 
>> I have to agree with Tom that once the device’s s/g limit is
>> exceeded, the server has to post an RDMA Read WR every few
>> pages, and appears to get expensive for large NFS requests.
>> 
>> The current mechanism of statically choosing either FRMR or
>> local DMA depending on the device is probably not adequate.
>> Maybe we could post all the Read WRs via a single chain? Or
>> stick with FRMR when the amount of data to read is significant.
>> 
>> I’ve also tested Christoph’s patch. The logic currently in
>> rdma_read_chunk_lcl does not seem up to the task. Once the
>> device’s s/g limit is exceeded, the server starts throwing
>> local length exceptions, and the client workload hangs.
> 
> This is probably because this code path wasn't reached/tested for
> a long time as it's hard to find a device that doesn't support FRWR
> for a long time now…

An alternate explanation is that the provider is not setting
device->max_sge_rd properly. rdma_read_chunks_lcl() seems to
be the only thing in my copy of the kernel tree that relies on
that value.

I’ve reproduced the local length errors with CX-2 and CX-3
Pro, which both set that field to 32. If I artificially
set that field to 30, I don’t see any issue.

Is commit 18ebd40773bf correct?


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                             ` <D4228973-6DFA-4181-9848-41BFE8E75200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-11-11 16:29                                               ` Sagi Grimberg
       [not found]                                                 ` <56436CDB.6090305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Sagi Grimberg @ 2015-11-11 16:29 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Tom Talpey, Jason Gunthorpe, Christoph Hellwig, Linux RDMA Mailing List



> An alternate explanation is that the provider is not setting
> device->max_sge_rd properly. rdma_read_chunks_lcl() seems to
> be the only thing in my copy of the kernel tree that relies on
> that value.
>
> I’ve reproduced the local length errors with CX-2 and CX-3
> Pro, which both set that field to 32. If I artificially
> set that field to 30, I don’t see any issue.
>
> Is commit 18ebd40773bf correct?

See my latest patchset "Handle mlx4 max_sge_rd correctly" (v2).

It fixes exactly this.
Now you will be able to add your "Tested-by:" tag  ;)

It on Doug to take it...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                         ` <20151111080225.GA31508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-11 17:23                           ` Jason Gunthorpe
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-11 17:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 11, 2015 at 12:02:25AM -0800, Christoph Hellwig wrote:
> > No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is okay
> > to use.
> 
> What would happen if someone tried to use NFS on usnic without this?

Honestly, I have no idea how usnic fits into the kernel side, it
simply doesn't provide the kapi. It should fail much earlier, like
when creating a PD, IMHO.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                 ` <56430972.8080703-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-11 17:39                                   ` Jason Gunthorpe
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2015-11-11 17:39 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Tom Talpey, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 11, 2015 at 11:25:06AM +0200, Sagi Grimberg wrote:
> For RDMA READs, a HCA will perform the memory scatters when on the RX
> path, when receiving the read responses containing the data. This means
> that the HCA needs to perform a lookup of the relevant scatter entries
> upon each read response. Due to that, modern HCAs keep a dedicate cache
> for this type of RX-path lookup (which is limited in size naturally).

There is always a memory scatter, and MR setup overheads and table
reads are not zero. There is nothing architectural in the verbs or IBA
that would require a HCA to run slower for SGL than MR...

As you say, another choice the ULP should not be making.. Even
selecting the optimal SGL list len is not something the ULP should
do, in that case.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags
       [not found]                                                 ` <56436CDB.6090305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-11 20:50                                                   ` Chuck Lever
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2015-11-11 20:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Tom Talpey, Jason Gunthorpe, Christoph Hellwig, Linux RDMA Mailing List


> On Nov 11, 2015, at 11:29 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> 
> 
> 
>> An alternate explanation is that the provider is not setting
>> device->max_sge_rd properly. rdma_read_chunks_lcl() seems to
>> be the only thing in my copy of the kernel tree that relies on
>> that value.
>> 
>> I’ve reproduced the local length errors with CX-2 and CX-3
>> Pro, which both set that field to 32. If I artificially
>> set that field to 30, I don’t see any issue.
>> 
>> Is commit 18ebd40773bf correct?
> 
> See my latest patchset "Handle mlx4 max_sge_rd correctly" (v2).
> 
> It fixes exactly this.
> Now you will be able to add your "Tested-by:" tag  ;)

Confirmed that the current value (32) does not work, and
that the proposed replacement (30) does work, in an NFS
server that uses the _lcl path with mlx4.


> It on Doug to take it...

Noted that the NFS server’s _lcl path is the only
kernel consumer of this value.

Because the current NFS server does not use the _lcl
path with mlx4 devices, maybe this fix is not needed in
stable, unless the max_sge_rd field is visible to user
space.

However no future changes to the NFS server’s reader
selection logic should be made until this fix, or one
like it, is merged. And I recommend adding a
"Fixes: 18ebd40773bf" tag.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2015-11-11 20:50 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-10 10:44 [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags Sagi Grimberg
     [not found] ` <1447152255-28231-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 10:44   ` [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags Sagi Grimberg
     [not found]     ` <1447152255-28231-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 11:25       ` Christoph Hellwig
     [not found]         ` <20151110112537.GA8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 11:36           ` Sagi Grimberg
2015-11-10 11:51       ` Yann Droneaud
     [not found]         ` <1447156270.7089.3.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-11-10 12:28           ` Sagi Grimberg
     [not found]             ` <5641E2D1.8070209-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-10 12:39               ` Sagi Grimberg
2015-11-10 12:36           ` Tom Talpey
     [not found]             ` <5641E4C9.7000206-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-10 12:42               ` Sagi Grimberg
     [not found]                 ` <5641E644.7080101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 13:06                   ` Christoph Hellwig
     [not found]                     ` <20151110130648.GA10682-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 13:41                       ` Christoph Hellwig
     [not found]                         ` <20151110134147.GA12814-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 18:36                           ` Jason Gunthorpe
     [not found]                             ` <20151110183627.GJ12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-11  8:03                               ` Christoph Hellwig
2015-11-11  9:07                           ` Sagi Grimberg
     [not found]                             ` <56430542.5090008-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-11 10:25                               ` Christoph Hellwig
2015-11-10 18:06           ` Jason Gunthorpe
2015-11-10 18:01       ` Jason Gunthorpe
     [not found]         ` <20151110180156.GE12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-11  8:08           ` Christoph Hellwig
     [not found]             ` <20151111080837.GA14662-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-11  9:09               ` Sagi Grimberg
     [not found]                 ` <564305AF.3020903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-11 10:26                   ` Christoph Hellwig
2015-11-10 10:44   ` [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags Sagi Grimberg
     [not found]     ` <1447152255-28231-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 11:38       ` Christoph Hellwig
     [not found]         ` <20151110113839.GA32523-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 11:42           ` Sagi Grimberg
2015-11-10 17:03           ` Chuck Lever
2015-11-10 11:41       ` Christoph Hellwig
     [not found]         ` <20151110114145.GA2810-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 11:46           ` Sagi Grimberg
     [not found]             ` <5641D920.5000409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 12:04               ` Christoph Hellwig
     [not found]                 ` <20151110120432.GA8230-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 14:34                   ` Chuck Lever
     [not found]                     ` <F60E8BA6-D728-44E4-9331-0C23AB96E634-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-10 15:16                       ` Tom Talpey
2015-11-10 18:25                   ` Jason Gunthorpe
     [not found]                     ` <20151110182546.GI12667-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-10 21:00                       ` Tom Talpey
     [not found]                         ` <56425AFB.30202-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-10 21:17                           ` Jason Gunthorpe
     [not found]                             ` <20151110211716.GA21631-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-10 21:30                               ` Tom Talpey
     [not found]                                 ` <564261EF.4000008-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-10 23:01                                   ` Chuck Lever
     [not found]                                     ` <54CF0111-E9C4-4196-BF92-7E134A6A329A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-10 23:55                                       ` Jason Gunthorpe
     [not found]                                         ` <20151110235551.GA12795-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-11  8:07                                           ` Christoph Hellwig
2015-11-11  9:28                                       ` Sagi Grimberg
     [not found]                                         ` <56430A20.6040600-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-11 16:19                                           ` Chuck Lever
     [not found]                                             ` <D4228973-6DFA-4181-9848-41BFE8E75200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-11 16:29                                               ` Sagi Grimberg
     [not found]                                                 ` <56436CDB.6090305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-11 20:50                                                   ` Chuck Lever
2015-11-11  9:25                               ` Sagi Grimberg
     [not found]                                 ` <56430972.8080703-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-11 17:39                                   ` Jason Gunthorpe
2015-11-11  8:02                       ` Christoph Hellwig
     [not found]                         ` <20151111080225.GA31508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-11 17:23                           ` Jason Gunthorpe
2015-11-10 16:12               ` Steve Wise
2015-11-10 10:44   ` [PATCH RFC 3/3] RDS_IW: " Sagi Grimberg
     [not found]     ` <1447152255-28231-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-11-10 11:26       ` Christoph Hellwig
     [not found]         ` <20151110112638.GB8991-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 11:35           ` Sagi Grimberg
2015-11-10 11:31   ` [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags Christoph Hellwig
     [not found]     ` <20151110113141.GA23093-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-10 11:40       ` Sagi Grimberg
2015-11-10 16:18       ` Steve Wise

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.