linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/25] Shared PD and MR
@ 2019-07-16 18:11 Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 01/25] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Shamir Rabinovitch
                   ` (25 more replies)
  0 siblings, 26 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

Following patch-set introduce the shared object feature.

A shared object feature allows one process to create HW objects (currently
PD and MR) so that a second process can import.

Patch-set is logically splits to 4 parts as the following:
- patches 1 to 8 and 18 are preparation steps.
- patches 9 to 14 are the implementation of import PD
- patches 15 to 17 are the implementation of the verb
- patches 19 to 14 are the implementation of import MR

Shamir Rabinovitch (17):
  RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
  RDMA/uverbs: Delete the macro uobj_put_obj_read
  RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
  IB/{core,hw}: ib_pd should not have ib_uobject pointer
  IB/core: ib_uobject need HW object reference count
  IB/uverbs: Helper function to initialize ufile member of
    uverbs_attr_bundle
  IB/uverbs: Add context import lock/unlock helper
  IB/uverbs: ufile must be freed only when not used anymore
  IB/verbs: Prototype of HW object clone callback
  IB/mlx4: Add implementation of clone_pd callback
  IB/mlx5: Add implementation of clone_pd callback
  RDMA/rxe: Add implementation of clone_pd callback
  IB/uverbs: Add clone reference counting to ib_pd
  IB/uverbs: Add PD import verb
  IB/mlx4: Enable import from FD verb
  IB/mlx5: Enable import from FD verb
  RDMA/rxe: Enable import from FD verb

Yuval Shaia (8):
  IB/core: Install clone ib_pd in device ops
  IB/core: ib_mr should not have ib_uobject pointer
  IB/core: Install clone ib_mr in device ops
  IB/mlx4: Add implementation of clone_pd callback
  IB/mlx5: Add implementation of clone_pd callback
  RDMA/rxe: Add implementation of clone_pd callback
  IB/uverbs: Add clone reference counting to ib_mr
  IB/uverbs: Add MR import verb

 drivers/infiniband/core/device.c              |   2 +
 drivers/infiniband/core/nldev.c               | 127 ++++-
 drivers/infiniband/core/rdma_core.c           |  52 +-
 drivers/infiniband/core/uverbs.h              |  24 +
 drivers/infiniband/core/uverbs_cmd.c          | 505 +++++++++++++++---
 drivers/infiniband/core/uverbs_main.c         |   5 +
 drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
 drivers/infiniband/core/verbs.c               |   4 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
 drivers/infiniband/hw/mlx4/main.c             |  18 +-
 drivers/infiniband/hw/mlx5/main.c             |  34 +-
 drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
 include/rdma/ib_verbs.h                       |  43 +-
 include/rdma/uverbs_std_types.h               |  11 +-
 include/uapi/rdma/ib_user_verbs.h             |  15 +
 include/uapi/rdma/rdma_netlink.h              |   3 +
 17 files changed, 740 insertions(+), 113 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/25] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 02/25] RDMA/uverbs: Delete the macro uobj_put_obj_read Shamir Rabinovitch
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The uobject pointer will be removed from ib_x (ib_pd, ib_mr,...) objects.
uobj_get_obj_read and uobj_put_obj_read macros were constructed with the
assumption that ib_x can figure the uobject in mind.

Fix this wrong assumption.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 125 ++++++++++++++++++++-------
 include/rdma/uverbs_std_types.h      |   8 +-
 2 files changed, 98 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7ddd0e5bc6b3..b0dc301918bf 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -37,6 +37,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/list.h>
 
 #include <linux/uaccess.h>
 
@@ -711,6 +712,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_reg_mr      cmd;
 	struct ib_uverbs_reg_mr_resp resp;
 	struct ib_uobject           *uobj;
+	struct ib_uobject           *pduobj;
 	struct ib_pd                *pd;
 	struct ib_mr                *mr;
 	int                          ret;
@@ -731,7 +733,8 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(uobj))
 		return PTR_ERR(uobj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_free;
@@ -799,6 +802,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_pd		    *old_pd;
 	int                          ret;
 	struct ib_uobject	    *uobj;
+	struct ib_uobject	    *pduobj = NULL;
 
 	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
 	if (ret)
@@ -831,7 +835,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 
 	if (cmd.flags & IB_MR_REREG_PD) {
 		pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle,
-				       attrs);
+				       attrs, &pduobj);
 		if (!pd) {
 			ret = -EINVAL;
 			goto put_uobjs;
@@ -885,6 +889,7 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_alloc_mw      cmd;
 	struct ib_uverbs_alloc_mw_resp resp;
 	struct ib_uobject             *uobj;
+	struct ib_uobject             *pduobj;
 	struct ib_pd                  *pd;
 	struct ib_mw                  *mw;
 	int                            ret;
@@ -898,7 +903,8 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(uobj))
 		return PTR_ERR(uobj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_free;
@@ -1117,6 +1123,7 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_resize_cq	cmd;
 	struct ib_uverbs_resize_cq_resp	resp = {};
+	struct ib_uobject		*cquobj;
 	struct ib_cq			*cq;
 	int				ret = -EINVAL;
 
@@ -1124,7 +1131,8 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1177,6 +1185,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_poll_cq_resp  resp;
 	u8 __user                     *header_ptr;
 	u8 __user                     *data_ptr;
+	struct ib_uobject	      *cquobj;
 	struct ib_cq                  *cq;
 	struct ib_wc                   wc;
 	int                            ret;
@@ -1185,7 +1194,8 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1226,6 +1236,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_req_notify_cq cmd;
+	struct ib_uobject	      *cquobj;
 	struct ib_cq                  *cq;
 	int ret;
 
@@ -1233,7 +1244,8 @@ static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1276,16 +1288,21 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 {
 	struct ib_uqp_object		*obj;
 	struct ib_device		*device;
+	struct ib_uobject		*pd_uobj = ERR_PTR(-ENOENT);
 	struct ib_pd			*pd = NULL;
 	struct ib_xrcd			*xrcd = NULL;
 	struct ib_uobject		*xrcd_uobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*scq_uobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*rcq_uobj = ERR_PTR(-ENOENT);
 	struct ib_cq			*scq = NULL, *rcq = NULL;
+	struct ib_uobject		*srq_uobj = ERR_PTR(-ENOENT);
 	struct ib_srq			*srq = NULL;
 	struct ib_qp			*qp;
 	struct ib_qp_init_attr		attr = {};
 	struct ib_uverbs_ex_create_qp_resp resp;
 	int				ret;
 	struct ib_rwq_ind_table *ind_tbl = NULL;
+	struct ib_uobject	*ind_tbl_uobj = ERR_PTR(-ENOENT);
 	bool has_sq = true;
 	struct ib_device *ib_dev;
 
@@ -1303,7 +1320,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	if (cmd->comp_mask & IB_UVERBS_CREATE_QP_MASK_IND_TABLE) {
 		ind_tbl = uobj_get_obj_read(rwq_ind_table,
 					    UVERBS_OBJECT_RWQ_IND_TBL,
-					    cmd->rwq_ind_tbl_handle, attrs);
+					    cmd->rwq_ind_tbl_handle, attrs,
+					    &ind_tbl_uobj);
 		if (!ind_tbl) {
 			ret = -EINVAL;
 			goto err_put;
@@ -1342,7 +1360,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 		} else {
 			if (cmd->is_srq) {
 				srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ,
-							cmd->srq_handle, attrs);
+							cmd->srq_handle, attrs,
+							&srq_uobj);
 				if (!srq || srq->srq_type == IB_SRQT_XRC) {
 					ret = -EINVAL;
 					goto err_put;
@@ -1353,7 +1372,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 				if (cmd->recv_cq_handle != cmd->send_cq_handle) {
 					rcq = uobj_get_obj_read(
 						cq, UVERBS_OBJECT_CQ,
-						cmd->recv_cq_handle, attrs);
+						cmd->recv_cq_handle, attrs,
+						&rcq_uobj);
 					if (!rcq) {
 						ret = -EINVAL;
 						goto err_put;
@@ -1364,11 +1384,12 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 
 		if (has_sq)
 			scq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
-						cmd->send_cq_handle, attrs);
+						cmd->send_cq_handle, attrs,
+						&scq_uobj);
 		if (!ind_tbl)
 			rcq = rcq ?: scq;
 		pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle,
-				       attrs);
+				       attrs, &pd_uobj);
 		if (!pd || (!scq && has_sq)) {
 			ret = -EINVAL;
 			goto err_put;
@@ -1663,6 +1684,7 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_query_qp      cmd;
 	struct ib_uverbs_query_qp_resp resp;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	struct ib_qp_attr              *attr;
 	struct ib_qp_init_attr         *init_attr;
@@ -1679,7 +1701,8 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 		goto out;
 	}
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -1776,6 +1799,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 		     struct ib_uverbs_ex_modify_qp *cmd)
 {
 	struct ib_qp_attr *attr;
+	struct ib_uobject *qpuobj;
 	struct ib_qp *qp;
 	int ret;
 
@@ -1784,7 +1808,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 		return -ENOMEM;
 
 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd->base.qp_handle,
-			       attrs);
+			       attrs, &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2018,6 +2042,7 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_send_wr       *user_wr;
 	struct ib_send_wr              *wr = NULL, *last, *next;
 	const struct ib_send_wr	       *bad_wr;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	int                             i, sg_ind;
 	int				is_ud;
@@ -2045,7 +2070,8 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 	if (!user_wr)
 		return -ENOMEM;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2068,6 +2094,7 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 
 		if (is_ud) {
 			struct ib_ud_wr *ud;
+			struct ib_uobject *uobj;
 
 			if (user_wr->opcode != IB_WR_SEND &&
 			    user_wr->opcode != IB_WR_SEND_WITH_IMM) {
@@ -2083,7 +2110,8 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 			}
 
 			ud->ah = uobj_get_obj_read(ah, UVERBS_OBJECT_AH,
-						   user_wr->wr.ud.ah, attrs);
+						   user_wr->wr.ud.ah, attrs,
+						   &uobj);
 			if (!ud->ah) {
 				kfree(ud);
 				ret = -EINVAL;
@@ -2308,6 +2336,7 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_post_recv_resp resp;
 	struct ib_recv_wr              *wr, *next;
 	const struct ib_recv_wr	       *bad_wr;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	int ret, ret2;
 	struct uverbs_req_iter iter;
@@ -2321,7 +2350,8 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(wr))
 		return PTR_ERR(wr);
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2358,6 +2388,7 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_post_srq_recv_resp resp;
 	struct ib_recv_wr                  *wr, *next;
 	const struct ib_recv_wr		   *bad_wr;
+	struct ib_uobject		   *srquobj;
 	struct ib_srq                      *srq;
 	int ret, ret2;
 	struct uverbs_req_iter iter;
@@ -2371,7 +2402,8 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(wr))
 		return PTR_ERR(wr);
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq) {
 		ret = -EINVAL;
 		goto out;
@@ -2408,6 +2440,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_create_ah	 cmd;
 	struct ib_uverbs_create_ah_resp	 resp;
 	struct ib_uobject		*uobj;
+	struct ib_uobject		*pduobj;
 	struct ib_pd			*pd;
 	struct ib_ah			*ah;
 	struct rdma_ah_attr		attr = {};
@@ -2427,7 +2460,8 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 		goto err;
 	}
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err;
@@ -2497,6 +2531,7 @@ static int ib_uverbs_destroy_ah(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_attach_mcast cmd;
+	struct ib_uobject	     *qpuobj;
 	struct ib_qp                 *qp;
 	struct ib_uqp_object         *obj;
 	struct ib_uverbs_mcast_entry *mcast;
@@ -2506,7 +2541,8 @@ static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp)
 		return -EINVAL;
 
@@ -2546,6 +2582,7 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_detach_mcast cmd;
 	struct ib_uqp_object         *obj;
+	struct ib_uobject	     *qpuobj;
 	struct ib_qp                 *qp;
 	struct ib_uverbs_mcast_entry *mcast;
 	int                           ret;
@@ -2555,7 +2592,8 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp)
 		return -EINVAL;
 
@@ -2665,6 +2703,8 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 				       union ib_flow_spec *ib_spec,
 				       struct ib_uflow_resources *uflow_res)
 {
+	struct ib_uobject *uobj;
+
 	ib_spec->type = kern_spec->type;
 	switch (ib_spec->type) {
 	case IB_FLOW_SPEC_ACTION_TAG:
@@ -2689,7 +2729,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		ib_spec->action.act = uobj_get_obj_read(flow_action,
 							UVERBS_OBJECT_FLOW_ACTION,
 							kern_spec->action.handle,
-							attrs);
+							attrs, &uobj);
 		if (!ib_spec->action.act)
 			return -EINVAL;
 		ib_spec->action.size =
@@ -2707,7 +2747,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 			uobj_get_obj_read(counters,
 					  UVERBS_OBJECT_COUNTERS,
 					  kern_spec->flow_count.handle,
-					  attrs);
+					  attrs, &uobj);
 		if (!ib_spec->flow_count.counters)
 			return -EINVAL;
 		ib_spec->flow_count.size =
@@ -2909,6 +2949,8 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_ex_create_wq_resp resp = {};
 	struct ib_uwq_object           *obj;
 	int err = 0;
+	struct ib_uobject *cquobj;
+	struct ib_uobject *pduobj;
 	struct ib_cq *cq;
 	struct ib_pd *pd;
 	struct ib_wq *wq;
@@ -2927,13 +2969,15 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		err = -EINVAL;
 		goto err_uobj;
 	}
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq) {
 		err = -EINVAL;
 		goto err_put_pd;
@@ -3025,6 +3069,7 @@ static int ib_uverbs_ex_destroy_wq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_ex_modify_wq cmd;
+	struct ib_uobject *wquobj;
 	struct ib_wq *wq;
 	struct ib_wq_attr wq_attr = {};
 	int ret;
@@ -3039,7 +3084,8 @@ static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 	if (cmd.attr_mask > (IB_WQ_STATE | IB_WQ_CUR_STATE | IB_WQ_FLAGS))
 		return -EINVAL;
 
-	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, attrs);
+	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, attrs,
+			       &wquobj);
 	if (!wq)
 		return -EINVAL;
 
@@ -3104,8 +3150,11 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 
 	for (num_read_wqs = 0; num_read_wqs < num_wq_handles;
 			num_read_wqs++) {
+		struct ib_uobject *uobj;
+
 		wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ,
-				       wqs_handles[num_read_wqs], attrs);
+				       wqs_handles[num_read_wqs], attrs,
+				       &uobj);
 		if (!wq) {
 			err = -EINVAL;
 			goto put_wqs;
@@ -3166,6 +3215,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 err_free:
 	kfree(wqs_handles);
 	kfree(wqs);
+
 	return err;
 }
 
@@ -3193,6 +3243,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 	struct ib_flow			  *flow_id;
 	struct ib_uverbs_flow_attr	  *kern_flow_attr;
 	struct ib_flow_attr		  *flow_attr;
+	struct ib_uobject		  *qpuobj;
 	struct ib_qp			  *qp;
 	struct ib_uflow_resources	  *uflow_res;
 	struct ib_uverbs_flow_spec_hdr	  *kern_spec;
@@ -3256,7 +3307,8 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 		goto err_free_attr;
 	}
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		err = -EINVAL;
 		goto err_uobj;
@@ -3371,6 +3423,8 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 {
 	struct ib_uverbs_create_srq_resp resp;
 	struct ib_usrq_object           *obj;
+	struct ib_uobject		*cquobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*pduobj;
 	struct ib_pd                    *pd;
 	struct ib_srq                   *srq;
 	struct ib_uobject               *uninitialized_var(xrcd_uobj);
@@ -3406,14 +3460,15 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 
 	if (ib_srq_has_cq(cmd->srq_type)) {
 		attr.ext.cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
-						cmd->cq_handle, attrs);
+						cmd->cq_handle, attrs, &cquobj);
 		if (!attr.ext.cq) {
 			ret = -EINVAL;
 			goto err_put_xrcd;
 		}
 	}
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_put_cq;
@@ -3542,6 +3597,7 @@ static int ib_uverbs_create_xsrq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_modify_srq cmd;
+	struct ib_uobject	   *srquobj;
 	struct ib_srq              *srq;
 	struct ib_srq_attr          attr;
 	int                         ret;
@@ -3550,7 +3606,8 @@ static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq)
 		return -EINVAL;
 
@@ -3570,6 +3627,7 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_query_srq      cmd;
 	struct ib_uverbs_query_srq_resp resp;
 	struct ib_srq_attr              attr;
+	struct ib_uobject		*srquobj;
 	struct ib_srq                   *srq;
 	int                             ret;
 
@@ -3577,7 +3635,8 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq)
 		return -EINVAL;
 
@@ -3689,6 +3748,7 @@ static int ib_uverbs_ex_query_device(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_ex_modify_cq cmd;
+	struct ib_uobject *cquobj;
 	struct ib_cq *cq;
 	int ret;
 
@@ -3702,7 +3762,8 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 	if (cmd.attr_mask > IB_CQ_MODERATE)
 		return -EOPNOTSUPP;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index 05eabfd5d0d3..578e5c28bc1c 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -64,9 +64,11 @@ static inline void *_uobj_get_obj_read(struct ib_uobject *uobj)
 		return NULL;
 	return uobj->object;
 }
-#define uobj_get_obj_read(_object, _type, _id, _attrs)                         \
-	((struct ib_##_object *)_uobj_get_obj_read(                            \
-		uobj_get_read(_type, _id, _attrs)))
+#define uobj_get_obj_read(_object, _type, _id, _attrs, _uobj)                  \
+({                                                                             \
+	*_uobj = uobj_get_read(_type, _id, _attrs);                            \
+	((struct ib_##_object *)_uobj_get_obj_read(*_uobj));                   \
+})
 
 #define uobj_get_write(_type, _id, _attrs)                                     \
 	rdma_lookup_get_uobject(uobj_get_type(_attrs, _type), (_attrs)->ufile, \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/25] RDMA/uverbs: Delete the macro uobj_put_obj_read
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 01/25] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Shamir Rabinovitch
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The macro uobj_put_obj_read assumes an existence of a member with the name
uobject on each HW object. Since this member is about to be delete we
shouldn't use this macro, instead use uobj_put_read and delete
uobj_put_obj_read.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 92 ++++++++++++++--------------
 include/rdma/uverbs_std_types.h      |  3 -
 2 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b0dc301918bf..204a93305f1c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -778,7 +778,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 	return uobj_alloc_commit(uobj, attrs);
 
@@ -786,7 +786,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
 
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err_free:
 	uobj_alloc_abort(uobj, attrs);
@@ -864,7 +864,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 
 put_uobj_pd:
 	if (cmd.flags & IB_MR_REREG_PD)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pduobj);
 
 put_uobjs:
 	uobj_put_write(uobj);
@@ -936,13 +936,13 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(uobj, attrs);
 
 err_copy:
 	uverbs_dealloc_mw(mw);
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 err_free:
 	uobj_alloc_abort(uobj, attrs);
 	return ret;
@@ -1144,7 +1144,7 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 
 	ret = uverbs_response(attrs, &resp, sizeof(resp));
 out:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return ret;
 }
@@ -1229,7 +1229,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 		ret = uverbs_output_written(attrs, UVERBS_ATTR_CORE_OUT);
 
 out_put:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 	return ret;
 }
 
@@ -1252,7 +1252,7 @@ static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 	ib_req_notify_cq(cq, cmd.solicited_only ?
 			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
 
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return 0;
 }
@@ -1505,15 +1505,15 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	}
 
 	if (pd)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pd_uobj);
 	if (scq)
-		uobj_put_obj_read(scq);
+		uobj_put_read(scq_uobj);
 	if (rcq && rcq != scq)
-		uobj_put_obj_read(rcq);
+		uobj_put_read(rcq_uobj);
 	if (srq)
-		uobj_put_obj_read(srq);
+		uobj_put_read(srq_uobj);
 	if (ind_tbl)
-		uobj_put_obj_read(ind_tbl);
+		uobj_put_read(ind_tbl_uobj);
 
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 err_cb:
@@ -1523,15 +1523,15 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	if (!IS_ERR(xrcd_uobj))
 		uobj_put_read(xrcd_uobj);
 	if (pd)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pd_uobj);
 	if (scq)
-		uobj_put_obj_read(scq);
+		uobj_put_read(scq_uobj);
 	if (rcq && rcq != scq)
-		uobj_put_obj_read(rcq);
+		uobj_put_read(rcq_uobj);
 	if (srq)
-		uobj_put_obj_read(srq);
+		uobj_put_read(srq_uobj);
 	if (ind_tbl)
-		uobj_put_obj_read(ind_tbl);
+		uobj_put_read(ind_tbl_uobj);
 
 	uobj_alloc_abort(&obj->uevent.uobject, attrs);
 	return ret;
@@ -1710,7 +1710,7 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 
 	ret = ib_query_qp(qp, attr, cmd.attr_mask, init_attr);
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	if (ret)
 		goto out;
@@ -1948,7 +1948,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 				      &attrs->driver_udata);
 
 release_qp:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 out:
 	kfree(attr);
 
@@ -2216,11 +2216,11 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 		ret = ret2;
 
 out_put:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	while (wr) {
 		if (is_ud && ud_wr(wr)->ah)
-			uobj_put_obj_read(ud_wr(wr)->ah);
+			uobj_put_read(ud_wr(wr)->ah->uobject);
 		next = wr->next;
 		kfree(wr);
 		wr = next;
@@ -2360,7 +2360,7 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	resp.bad_wr = 0;
 	ret = qp->device->ops.post_recv(qp->real_qp, wr, &bad_wr);
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	if (ret) {
 		for (next = wr; next; next = next->next) {
 			++resp.bad_wr;
@@ -2412,7 +2412,7 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	resp.bad_wr = 0;
 	ret = srq->device->ops.post_srq_recv(srq, wr, &bad_wr);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	if (ret)
 		for (next = wr; next; next = next->next) {
@@ -2501,7 +2501,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(uobj, attrs);
 
 err_copy:
@@ -2509,7 +2509,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 			     uverbs_get_cleared_udata(attrs));
 
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err:
 	uobj_alloc_abort(uobj, attrs);
@@ -2573,7 +2573,7 @@ static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 
 out_put:
 	mutex_unlock(&obj->mcast_lock);
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	return ret;
 }
@@ -2618,7 +2618,7 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 
 out_put:
 	mutex_unlock(&obj->mcast_lock);
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	return ret;
 }
 
@@ -2737,7 +2737,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		flow_resources_add(uflow_res,
 				   IB_FLOW_SPEC_ACTION_HANDLE,
 				   ib_spec->action.act);
-		uobj_put_obj_read(ib_spec->action.act);
+		uobj_put_read(uobj);
 		break;
 	case IB_FLOW_SPEC_ACTION_COUNT:
 		if (kern_spec->flow_count.size !=
@@ -2755,7 +2755,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		flow_resources_add(uflow_res,
 				   IB_FLOW_SPEC_ACTION_COUNT,
 				   ib_spec->flow_count.counters);
-		uobj_put_obj_read(ib_spec->flow_count.counters);
+		uobj_put_read(uobj);
 		break;
 	default:
 		return -EINVAL;
@@ -3022,16 +3022,16 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	if (err)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
-	uobj_put_obj_read(cq);
+	uobj_put_read(pduobj);
+	uobj_put_read(cquobj);
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 
 err_copy:
 	ib_destroy_wq(wq, uverbs_get_cleared_udata(attrs));
 err_put_cq:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 err_put_pd:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 err_uobj:
 	uobj_alloc_abort(&obj->uevent.uobject, attrs);
 
@@ -3097,7 +3097,7 @@ static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 	}
 	ret = wq->device->ops.modify_wq(wq, &wq_attr, cmd.attr_mask,
 					&attrs->driver_udata);
-	uobj_put_obj_read(wq);
+	uobj_put_read(wquobj);
 	return ret;
 }
 
@@ -3201,7 +3201,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 	kfree(wqs_handles);
 
 	for (j = 0; j < num_read_wqs; j++)
-		uobj_put_obj_read(wqs[j]);
+		uobj_put_read(wqs[j]->uobject);
 
 	return uobj_alloc_commit(uobj, attrs);
 
@@ -3211,7 +3211,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 	uobj_alloc_abort(uobj, attrs);
 put_wqs:
 	for (j = 0; j < num_read_wqs; j++)
-		uobj_put_obj_read(wqs[j]);
+		uobj_put_read(wqs[j]->uobject);
 err_free:
 	kfree(wqs_handles);
 	kfree(wqs);
@@ -3380,7 +3380,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 	if (err)
 		goto err_copy;
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	kfree(flow_attr);
 	if (cmd.flow_attr.num_of_specs)
 		kfree(kern_flow_attr);
@@ -3393,7 +3393,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 err_free_flow_attr:
 	kfree(flow_attr);
 err_put:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 err_uobj:
 	uobj_alloc_abort(uobj, attrs);
 err_free_attr:
@@ -3532,9 +3532,9 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 		uobj_put_read(xrcd_uobj);
 
 	if (ib_srq_has_cq(cmd->srq_type))
-		uobj_put_obj_read(attr.ext.cq);
+		uobj_put_read(cquobj);
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 
 err_copy:
@@ -3543,11 +3543,11 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 err_free:
 	kfree(srq);
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err_put_cq:
 	if (ib_srq_has_cq(cmd->srq_type))
-		uobj_put_obj_read(attr.ext.cq);
+		uobj_put_read(cquobj);
 
 err_put_xrcd:
 	if (cmd->srq_type == IB_SRQT_XRC) {
@@ -3617,7 +3617,7 @@ static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 	ret = srq->device->ops.modify_srq(srq, &attr, cmd.attr_mask,
 					  &attrs->driver_udata);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	return ret;
 }
@@ -3642,7 +3642,7 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 
 	ret = ib_query_srq(srq, &attr);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	if (ret)
 		return ret;
@@ -3769,7 +3769,7 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 
 	ret = rdma_set_cq_moderation(cq, cmd.attr.cq_count, cmd.attr.cq_period);
 
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return ret;
 }
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index 578e5c28bc1c..02641a3241e3 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -98,9 +98,6 @@ static inline void uobj_put_read(struct ib_uobject *uobj)
 	rdma_lookup_put_uobject(uobj, UVERBS_LOOKUP_READ);
 }
 
-#define uobj_put_obj_read(_obj)					\
-	uobj_put_read((_obj)->uobject)
-
 static inline void uobj_put_write(struct ib_uobject *uobj)
 {
 	rdma_lookup_put_uobject(uobj, UVERBS_LOOKUP_WRITE);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 01/25] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 02/25] RDMA/uverbs: Delete the macro uobj_put_obj_read Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-18 16:05   ` Leon Romanovsky
  2019-07-16 18:11 ` [PATCH 04/25] IB/{core,hw}: ib_pd should not have ib_uobject pointer Shamir Rabinovitch
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

In shared object model ib_pd can belong to 1 or more ib_ucontext.
Fix the nldev code so it could report multiple context ids.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/nldev.c  | 127 +++++++++++++++++++++++++++++--
 include/uapi/rdma/rdma_netlink.h |   3 +
 2 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 783e465e7c41..5167228dea1c 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -41,6 +41,7 @@
 #include "core_priv.h"
 #include "cma_priv.h"
 #include "restrack.h"
+#include "uverbs.h"
 
 /*
  * Sort array elements by the netlink attribute name
@@ -141,6 +142,8 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID]	= { .type = NLA_U32 },
 	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
 	[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]	= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_CTX]		= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_CTX_ENTRY]		= { .type = NLA_NESTED },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -611,11 +614,84 @@ static int fill_res_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
 err:	return -EMSGSIZE;
 }
 
+struct context_id {
+	struct list_head list;
+	u32 id;
+};
+
+static void pd_context(struct ib_pd *pd, struct list_head *list, int *count)
+{
+	struct ib_device *device = pd->device;
+	struct rdma_restrack_entry *res;
+	struct rdma_restrack_root *rt;
+	struct ib_uverbs_file *ufile;
+	struct ib_ucontext *ucontext;
+	struct ib_uobject *uobj;
+	unsigned long flags;
+	unsigned long id;
+	bool found;
+
+	rt = &device->res[RDMA_RESTRACK_CTX];
+
+	xa_lock(&rt->xa);
+
+	xa_for_each(&rt->xa, id, res) {
+		if (!rdma_is_visible_in_pid_ns(res))
+			continue;
+
+		if (!rdma_restrack_get(res))
+			continue;
+
+		xa_unlock(&rt->xa);
+
+		ucontext = container_of(res, struct ib_ucontext, res);
+		ufile = ucontext->ufile;
+		found = false;
+
+		/* See locking requirements in struct ib_uverbs_file */
+		down_read(&ufile->hw_destroy_rwsem);
+		spin_lock_irqsave(&ufile->uobjects_lock, flags);
+
+		list_for_each_entry(uobj, &ufile->uobjects, list) {
+			if (uobj->object == pd) {
+				found = true;
+				goto found;
+			}
+		}
+
+found:		spin_unlock_irqrestore(&ufile->uobjects_lock, flags);
+		up_read(&ufile->hw_destroy_rwsem);
+
+		if (found) {
+			struct context_id *ctx_id =
+				kmalloc(sizeof(*ctx_id), GFP_KERNEL);
+
+			if (WARN_ON_ONCE(!ctx_id))
+				goto next;
+
+			ctx_id->id = ucontext->res.id;
+			list_add(&ctx_id->list, list);
+			(*count)++;
+		}
+
+next:		rdma_restrack_put(res);
+		xa_lock(&rt->xa);
+	}
+
+	xa_unlock(&rt->xa);
+}
+
 static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 			     struct rdma_restrack_entry *res, uint32_t port)
 {
 	struct ib_pd *pd = container_of(res, struct ib_pd, res);
 	struct ib_device *dev = pd->device;
+	struct nlattr *table_attr = NULL;
+	struct nlattr *entry_attr = NULL;
+	struct context_id *ctx_id;
+	struct context_id *tmp;
+	LIST_HEAD(pd_context_ids);
+	int ctx_count = 0;
 
 	if (has_cap_net_admin) {
 		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,
@@ -633,10 +709,38 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_PDN, res->id))
 		goto err;
 
-	if (!rdma_is_kernel_res(res) &&
-	    nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
-			pd->uobject->context->res.id))
-		goto err;
+	if (!rdma_is_kernel_res(res)) {
+		pd_context(pd, &pd_context_ids, &ctx_count);
+		if (ctx_count == 1) {
+			/* user pd, not shared */
+			ctx_id = list_first_entry(&pd_context_ids,
+						  struct context_id, list);
+			if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
+					ctx_id->id))
+				goto err;
+		} else if (ctx_count > 1) {
+			/* user pd, shared */
+			table_attr = nla_nest_start(msg,
+					RDMA_NLDEV_ATTR_RES_CTX);
+			if (!table_attr)
+				goto err;
+
+			list_for_each_entry(ctx_id, &pd_context_ids, list) {
+				entry_attr = nla_nest_start(msg,
+						RDMA_NLDEV_ATTR_RES_CTX_ENTRY);
+				if (!entry_attr)
+					goto err;
+				if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
+						ctx_id->id))
+					goto err;
+				nla_nest_end(msg, entry_attr);
+				entry_attr = NULL;
+			}
+
+			nla_nest_end(msg, table_attr);
+			table_attr = NULL;
+		}
+	}
 
 	if (fill_res_name_pid(msg, res))
 		goto err;
@@ -644,9 +748,22 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 	if (fill_res_entry(dev, msg, res))
 		goto err;
 
+	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
+		kfree(ctx_id);
+
 	return 0;
 
-err:	return -EMSGSIZE;
+err:
+	if (entry_attr)
+		nla_nest_end(msg, entry_attr);
+
+	if (table_attr)
+		nla_nest_end(msg, table_attr);
+
+	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
+		kfree(ctx_id);
+
+	return -EMSGSIZE;
 }
 
 static int fill_stat_counter_mode(struct sk_buff *msg,
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 8e277783fa96..7fbbfb07f071 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -525,6 +525,9 @@ enum rdma_nldev_attr {
 	 */
 	RDMA_NLDEV_ATTR_DEV_DIM,                /* u8 */
 
+	RDMA_NLDEV_ATTR_RES_CTX,		/* nested table */
+	RDMA_NLDEV_ATTR_RES_CTX_ENTRY,		/* nested table */
+
 	/*
 	 * Always the end
 	 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/25] IB/{core,hw}: ib_pd should not have ib_uobject pointer
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (2 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 05/25] IB/core: ib_uobject need HW object reference count Shamir Rabinovitch
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

As a preparation step to shared PD, where ib_pd object will be pointed
by one or more ib_uobjects, remove ib_uobject pointer from ib_pd struct.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c       | 1 -
 drivers/infiniband/core/verbs.c            | 1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 1 -
 drivers/infiniband/hw/mlx5/main.c          | 1 -
 drivers/infiniband/hw/mthca/mthca_qp.c     | 3 ++-
 include/rdma/ib_verbs.h                    | 1 -
 6 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 204a93305f1c..d1f0c04f0ae8 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -432,7 +432,6 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	}
 
 	pd->device  = ib_dev;
-	pd->uobject = uobj;
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->res.type = RDMA_RESTRACK_PD;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 92349bf37589..56341f514914 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -263,7 +263,6 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		return ERR_PTR(-ENOMEM);
 
 	pd->device = device;
-	pd->uobject = NULL;
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->flags = flags;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 1a2c7dad2a0d..247f8939914c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -760,7 +760,6 @@ static int hns_roce_v1_rsv_lp_qp(struct hns_roce_dev *hr_dev)
 
 	free_mr->mr_free_pd = to_hr_pd(pd);
 	free_mr->mr_free_pd->ibpd.device  = &hr_dev->ib_dev;
-	free_mr->mr_free_pd->ibpd.uobject = NULL;
 	free_mr->mr_free_pd->ibpd.__internal_mr = NULL;
 	atomic_set(&free_mr->mr_free_pd->ibpd.usecnt, 0);
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c2a5780cb394..34378acb28d4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4937,7 +4937,6 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 		return -ENOMEM;
 
 	devr->p0->device  = ibdev;
-	devr->p0->uobject = NULL;
 	atomic_set(&devr->p0->usecnt, 0);
 
 	ret = mlx5_ib_alloc_pd(devr->p0, NULL);
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index d04c245359eb..b1a9169e3af6 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -956,7 +956,8 @@ static int mthca_max_data_size(struct mthca_dev *dev, struct mthca_qp *qp, int d
 static inline int mthca_max_inline_data(struct mthca_pd *pd, int max_data_size)
 {
 	/* We don't support inline data for kernel QPs (yet). */
-	return pd->ibpd.uobject ? max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;
+	return !rdma_is_kernel_res(&pd->ibpd.res) ?
+		max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;
 }
 
 static void mthca_adjust_qp_caps(struct mthca_dev *dev,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c5f8a9f17063..0900952d9b1f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1457,7 +1457,6 @@ struct ib_pd {
 	u32			local_dma_lkey;
 	u32			flags;
 	struct ib_device       *device;
-	struct ib_uobject      *uobject;
 	atomic_t          	usecnt; /* count all resources */
 
 	u32			unsafe_global_rkey;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/25] IB/core: ib_uobject need HW object reference count
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (3 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 04/25] IB/{core,hw}: ib_pd should not have ib_uobject pointer Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 06/25] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Shamir Rabinovitch
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

This new refcnt will points to the refcnt member of the HW object and will
behaves as expected by refcnt, i.e. will be increased and decreased as a
result of usage changes and will destroy the object when reaches to zero.
For a non-shared object refcnt will remain NULL.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/rdma_core.c | 23 +++++++++++++++++++++--
 include/rdma/ib_verbs.h             |  7 +++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..651625f632d7 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -516,7 +516,26 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
 	const struct uverbs_obj_idr_type *idr_type =
 		container_of(uobj->uapi_object->type_attrs,
 			     struct uverbs_obj_idr_type, type);
-	int ret = idr_type->destroy_object(uobj, why, attrs);
+	static DEFINE_MUTEX(lock);
+	int ret, count;
+
+	mutex_lock(&lock);
+
+	if (uobj->refcnt) {
+		count = atomic_dec_return(uobj->refcnt);
+		WARN_ON(count < 0); /* use after free! */
+		if (count) {
+			mutex_unlock(&lock);
+			goto skip;
+		}
+	}
+
+	ret = idr_type->destroy_object(uobj, why, attrs);
+
+	if (ret)
+		atomic_inc(uobj->refcnt);
+
+	mutex_unlock(&lock);
 
 	/*
 	 * We can only fail gracefully if the user requested to destroy the
@@ -525,7 +544,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
 	 */
 	if (ib_is_destroy_retryable(ret, why, uobj))
 		return ret;
-
+skip:
 	if (why == RDMA_REMOVE_ABORT)
 		return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0900952d9b1f..d19c0394b05a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1444,6 +1444,13 @@ struct ib_uobject {
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 
 	const struct uverbs_api_object *uapi_object;
+
+	/*
+	 * ib_X HW object sharing support
+	 * - NULL for HW objects that are not shareable
+	 * - Pointer to ib_X reference counter for shareable HW objects
+	 */
+	atomic_t	       *refcnt;		/* ib_X object ref count */
 };
 
 struct ib_udata {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/25] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (4 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 05/25] IB/core: ib_uobject need HW object reference count Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 07/25] IB/uverbs: Add context import lock/unlock helper Shamir Rabinovitch
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Helper function to initialize ufile member of structure uverbs_attr_bundle.

This helper produce structure uverbs_attr_bundle that is safe for the below
operations on the given ufile:

- uobj_get_[read|write]
- uobj_alloc_commit
- uobj_alloc_abort

The last 2 ops are more complicated. Abort for example triggers the
below which pass the uverbs_attr_bundle driver_udata down to the
drivers level:

uobj_alloc_abort
 rdma_alloc_abort_uobject
  uverbs_destroy_uobject
   uobj->uapi_object->type_class->destroy_hw
    destroy_hw_idr_uobject
     idr_type->destroy_object
      uverbs_free_pd
       ib_dealloc_pd_user
        pd->device->ops.dealloc_pd
	 mlx4_ib_dealloc_pd

For more information about the potential issues, please see
commit f89adedaf3fe ("RDMA/uverbs: Initialize udata struct
on destroy flows")

Cc: Gal Pressman <galpress@amazon.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index d1f0c04f0ae8..4f42f9732dca 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3773,6 +3773,24 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 	return ret;
 }
 
+/**
+ * uverbs_init_attrs_ufile - Helper function to create minimal
+ * uverbs_attr_bundle out of ib_uverbs_file that is suitable
+ * for the below operations:
+ *
+ * - uobj_get_[read|write]
+ * - uobj_alloc_commit
+ * - uobj_alloc_abort
+ */
+static void uverbs_init_attrs_ufile(struct uverbs_attr_bundle *attrs_bundle,
+				    struct ib_uverbs_file *ufile)
+{
+	*attrs_bundle = (struct uverbs_attr_bundle) {
+		.ufile = ufile,
+		.context = ufile->ucontext,
+	};
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/25] IB/uverbs: Add context import lock/unlock helper
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (5 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 06/25] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore Shamir Rabinovitch
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The lock/unlock helpers will be used in every import verb.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs.h      |  2 +
 drivers/infiniband/core/uverbs_cmd.c  | 73 +++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  1 +
 3 files changed, 76 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 1e5aeb39f774..cf76336cb460 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -163,6 +163,8 @@ struct ib_uverbs_file {
 	struct page *disassociate_page;
 
 	struct xarray		idr;
+
+	struct file	       *filp;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 4f42f9732dca..21f0a1a986f4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -43,6 +43,7 @@
 
 #include <rdma/uverbs_types.h>
 #include <rdma/uverbs_std_types.h>
+#include <rdma/uverbs_ioctl.h>
 #include "rdma_core.h"
 
 #include "uverbs.h"
@@ -3791,6 +3792,78 @@ static void uverbs_init_attrs_ufile(struct uverbs_attr_bundle *attrs_bundle,
 	};
 }
 
+/* ib_uverbs_import_lock - Function which gathers code that is
+ *	common in the import verbs.
+ *
+ *	This function guarntee that both source and destination files are
+ *	protected from race with vfs close. The current file is protected
+ *	from such race because verb is executed in a system-call context.
+ *	The other file is protected by 'fget'. This function also ensures
+ *	that ib_uobject identified by the type & handle is locked for read.
+ *
+ *	Callers of this helper must also call ib_uverbs_import_unlock
+ *	to undo any locking performed by this helper.
+ */
+static int ib_uverbs_import_lock(struct uverbs_attr_bundle *attrs,
+				 int fd, u16 type, u32 handle,
+				 struct ib_uobject **uobj,
+				 struct file **filep,
+				 struct ib_uverbs_file **ufile)
+{
+	struct ib_uverbs_file *file = attrs->ufile;
+	struct ib_uverbs_device *dev = file->device;
+	struct uverbs_attr_bundle fd_attrs;
+	struct ib_uverbs_device *fd_dev;
+	int ret = 0;
+
+	*filep = fget(fd);
+	if (!*filep)
+		return -EINVAL;
+
+	/* check uverbs ops exist */
+	if ((*filep)->f_op != file->filp->f_op) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	*ufile = (*filep)->private_data;
+	fd_dev = (*ufile)->device;
+
+	/* check that both files belong to same ib_device */
+	if (dev != fd_dev) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	uverbs_init_attrs_ufile(&fd_attrs, *ufile);
+
+	*uobj = uobj_get_read(type, handle, &fd_attrs);
+	if (IS_ERR(*uobj)) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	/* verify ib_object is shareable */
+	if (!(*uobj)->refcnt) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	return 0;
+uobj:
+	uobj_put_read(*uobj);
+file:
+	fput(*filep);
+	return ret;
+}
+
+static void ib_uverbs_import_unlock(struct ib_uobject *uobj,
+				    struct file *filep)
+{
+	uobj_put_read(uobj);
+	fput(filep);
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 11c13c1381cf..fa88a586284c 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -1091,6 +1091,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	mutex_init(&file->umap_lock);
 	INIT_LIST_HEAD(&file->umaps);
 
+	file->filp = filp;
 	filp->private_data = file;
 	list_add_tail(&file->list, &dev->uverbs_file_list);
 	mutex_unlock(&dev->lists_mutex);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (6 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 07/25] IB/uverbs: Add context import lock/unlock helper Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-17 11:53   ` Jason Gunthorpe
  2019-07-16 18:11 ` [PATCH 09/25] IB/verbs: Prototype of HW object clone callback Shamir Rabinovitch
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

ufile (&ucontext) with the process who own them must not be released
when there are other ufile (&ucontext) that depens at them.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/rdma_core.c   | 29 +++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs.h      | 22 ++++++++++++++++++++
 drivers/infiniband/core/uverbs_cmd.c  | 16 +++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  4 ++++
 4 files changed, 71 insertions(+)

diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 651625f632d7..c81ff8e28fc6 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -841,6 +841,33 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	ufile->ucontext = NULL;
 }
 
+static void __uverbs_ufile_refcount(struct ib_uverbs_file *ufile)
+{
+	int wait;
+
+	if (ufile->parent) {
+		pr_debug("%s: release parent ufile. ufile %p parent %p\n",
+			 __func__, ufile, ufile->parent);
+		if (atomic_dec_and_test(&ufile->parent->refcount))
+			complete(&ufile->parent->context_released);
+	}
+
+	if (!atomic_dec_and_test(&ufile->refcount)) {
+wait:
+		wait = wait_for_completion_interruptible_timeout(
+			&ufile->context_released, 3*HZ);
+		if (wait == -ERESTARTSYS) {
+			WARN_ONCE(1,
+			"signal while waiting for context release! ufile %p\n",
+				ufile);
+		} else if (wait == 0) {
+			pr_debug("%s: timeout while waiting for context release! ufile %p\n",
+				 __func__, ufile);
+			goto wait;
+		}
+	}
+}
+
 static int __uverbs_cleanup_ufile(struct ib_uverbs_file *ufile,
 				  enum rdma_remove_reason reason)
 {
@@ -923,6 +950,8 @@ void uverbs_destroy_ufile_hw(struct ib_uverbs_file *ufile,
 	if (!list_empty(&ufile->uobjects))
 		__uverbs_cleanup_ufile(ufile, reason);
 
+	__uverbs_ufile_refcount(ufile);
+
 	ufile_destroy_ucontext(ufile, reason);
 
 done:
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index cf76336cb460..c7e711188567 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -165,6 +165,28 @@ struct ib_uverbs_file {
 	struct xarray		idr;
 
 	struct file	       *filp;
+
+	/* context sharing support */
+
+	/*
+	 * refcount=1 : zero external context depend on this context
+	 * refcount>1 : (refcount-1) external context depend on this context
+	 */
+	atomic_t		refcount;
+
+	/*
+	 * parent =NULL : this context has not imported any objects
+	 * parent!=NULL : this context imported objects from parent context
+	 */
+	struct ib_uverbs_file  *parent;
+
+	/*
+	 * context_released completion is signalled by the last
+	 * user of this context in case of shared context.
+	 * any pre matured release of the context will be delayed
+	 * till the last user of this context is gone.
+	 */
+	struct completion	context_released;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 21f0a1a986f4..8d015587946b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3816,6 +3816,10 @@ static int ib_uverbs_import_lock(struct uverbs_attr_bundle *attrs,
 	struct ib_uverbs_device *fd_dev;
 	int ret = 0;
 
+	/* shared context cannot import */
+	if (atomic_read(&file->refcount) > 1)
+		return -EINVAL;
+
 	*filep = fget(fd);
 	if (!*filep)
 		return -EINVAL;
@@ -3829,6 +3833,18 @@ static int ib_uverbs_import_lock(struct uverbs_attr_bundle *attrs,
 	*ufile = (*filep)->private_data;
 	fd_dev = (*ufile)->device;
 
+	if (file->parent) {
+		/* multiple import must use same parent context! */
+		if (file->parent != *ufile) {
+			ret = -EINVAL;
+			goto file;
+		}
+	} else {
+		/* first import must update parent refcount */
+		file->parent = *ufile;
+		atomic_inc(&(*ufile)->refcount);
+	}
+
 	/* check that both files belong to same ib_device */
 	if (dev != fd_dev) {
 		ret = -EINVAL;
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index fa88a586284c..3c28034a7a9a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -1081,6 +1081,10 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 		goto err;
 	}
 
+	/* shared context support */
+	atomic_set(&file->refcount, 1);
+	init_completion(&file->context_released);
+
 	file->device	 = dev;
 	kref_init(&file->ref);
 	mutex_init(&file->ucontext_lock);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/25] IB/verbs: Prototype of HW object clone callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (7 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 10/25] IB/core: Install clone ib_pd in device ops Shamir Rabinovitch
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Define prototype for clone callback. The clone callback is used
by the driver layer to supply the uverbs a way to clone IB HW
object driver data to rdma-core user space provider. The clone
callback is used when new IB HW object is created and every time
it is imported to some ib_ucontext. Drivers that wish to enable
share of some IB HW object (ib_pd, ib_mr, etc..) must supply valid
clone callback for that type.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 include/rdma/ib_verbs.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d19c0394b05a..6a3be5629e55 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2213,6 +2213,18 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+/*
+ * Prototype for IB HW object clone callback
+ *
+ * Define prototype for clone callback. The clone callback is used
+ * by the driver layer to supply the uverbs a way to clone IB HW
+ * object driver data to rdma-core user space provider. The clone
+ * callback is used when new IB HW object is created and every time
+ * it is imported to some ib_ucontext.
+ */
+#define clone_callback(ib_type)		\
+	int (*clone_##ib_type)(struct ib_udata *udata, struct ib_type *obj)
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2523,6 +2535,9 @@ struct ib_device_ops {
 	 */
 	int (*counter_update_stats)(struct rdma_counter *counter);
 
+	/* Object sharing callbacks */
+	clone_callback(ib_pd);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
@@ -2530,6 +2545,17 @@ struct ib_device_ops {
 	DECLARE_RDMA_OBJ_SIZE(ib_ucontext);
 };
 
+/* Implementation of trivial clone callback */
+#define trivial_clone_callback(ib_type)					\
+static inline int trivial_clone_##ib_type(struct ib_udata *udata,	\
+					  struct ib_type *obj)		\
+{									\
+	return 0;							\
+}
+
+/* Shared IB HW object support */
+trivial_clone_callback(ib_pd);
+
 struct ib_core_device {
 	/* device must be the first element in structure until,
 	 * union of ib_core_device and device exists in ib_device.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/25] IB/core: Install clone ib_pd in device ops
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (8 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 09/25] IB/verbs: Prototype of HW object clone callback Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 11/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

Install clone_ib_pd in the device ops.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 7f4affe8a10d..d4996e0f2976 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2572,6 +2572,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, set_vf_guid);
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
+	SET_DEVICE_OP(dev_ops, clone_ib_pd);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_cq);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/25] IB/mlx4: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (9 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 10/25] IB/core: Install clone ib_pd in device ops Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 12/25] IB/mlx5: " Shamir Rabinovitch
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Copy mlx4 ib_pd to user-space.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx4/main.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8790101facb7..8adc569ce4fd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1178,6 +1178,13 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 	}
 }
 
+static int mlx4_ib_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
+{
+	struct mlx4_ib_pd *pd = to_mpd(ibpd);
+
+	return udata ? ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32)) : 0;
+}
+
 static int mlx4_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct mlx4_ib_pd *pd = to_mpd(ibpd);
@@ -1188,10 +1195,12 @@ static int mlx4_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 	if (err)
 		return err;
 
-	if (udata && ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32))) {
+	err = mlx4_ib_clone_pd(udata, ibpd);
+	if (err) {
 		mlx4_pd_free(to_mdev(ibdev)->dev, pd->pdn);
 		return -EFAULT;
 	}
+
 	return 0;
 }
 
@@ -2564,6 +2573,9 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 	.rereg_user_mr = mlx4_ib_rereg_user_mr,
 	.resize_cq = mlx4_ib_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = mlx4_ib_clone_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx4_ib_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, mlx4_ib_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/25] IB/mlx5: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (10 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 11/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 13/25] RDMA/rxe: " Shamir Rabinovitch
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Copy mlx5 ib_pd to user-space.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx5/main.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 34378acb28d4..c8eeeea1ef6d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2464,11 +2464,24 @@ int mlx5_ib_dealloc_dm(struct ib_dm *ibdm, struct uverbs_attr_bundle *attrs)
 	return 0;
 }
 
+static int mlx5_ib_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
+{
+	struct mlx5_ib_pd *pd = to_mpd(ibpd);
+	struct mlx5_ib_alloc_pd_resp resp;
+	int ret = 0;
+
+	if (udata) {
+		resp.pdn = pd->pdn;
+		ret = ib_copy_to_udata(udata, &resp, sizeof(resp));
+	}
+
+	return ret;
+}
+
 static int mlx5_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct mlx5_ib_pd *pd = to_mpd(ibpd);
 	struct ib_device *ibdev = ibpd->device;
-	struct mlx5_ib_alloc_pd_resp resp;
 	int err;
 	u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(alloc_pd_in)]   = {};
@@ -2486,12 +2499,11 @@ static int mlx5_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 
 	pd->pdn = MLX5_GET(alloc_pd_out, out, pd);
 	pd->uid = uid;
-	if (udata) {
-		resp.pdn = pd->pdn;
-		if (ib_copy_to_udata(udata, &resp, sizeof(resp))) {
-			mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid);
-			return -EFAULT;
-		}
+
+	err = mlx5_ib_clone_pd(udata, ibpd);
+	if (err) {
+		mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid);
+		return err;
 	}
 
 	return 0;
@@ -6301,6 +6313,9 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
 	.resize_cq = mlx5_ib_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = mlx5_ib_clone_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx5_ib_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, mlx5_ib_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/25] RDMA/rxe: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (11 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 12/25] IB/mlx5: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 14/25] IB/uverbs: Add clone reference counting to ib_pd Shamir Rabinovitch
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Copy rxe ib_pd to user-space.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4ebdfcf4d33e..6a23bb533e18 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1148,6 +1148,9 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.req_notify_cq = rxe_req_notify_cq,
 	.resize_cq = rxe_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = trivial_clone_ib_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, rxe_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 14/25] IB/uverbs: Add clone reference counting to ib_pd
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (12 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 13/25] RDMA/rxe: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 15/25] IB/uverbs: Add PD import verb Shamir Rabinovitch
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

To keep track of shared object life time add ref count to ib_pd

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 3 +++
 include/rdma/ib_verbs.h              | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 8d015587946b..0dd3ec98c739 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -436,12 +436,15 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->res.type = RDMA_RESTRACK_PD;
+	/* number of uobj using this ib_pd */
+	atomic_set(&pd->refcnt, 1);
 
 	ret = ib_dev->ops.alloc_pd(pd, &attrs->driver_udata);
 	if (ret)
 		goto err_alloc;
 
 	uobj->object = pd;
+	uobj->refcnt = &pd->refcnt;
 	memset(&resp, 0, sizeof resp);
 	resp.pd_handle = uobj->id;
 	rdma_restrack_uadd(&pd->res);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6a3be5629e55..320c4b31ef68 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1473,6 +1473,9 @@ struct ib_pd {
 	 */
 	struct ib_mr	       *__internal_mr;
 	struct rdma_restrack_entry res;
+
+	/* number of uobj using this ib_pd */
+	atomic_t	      refcnt;
 };
 
 struct ib_xrcd {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 15/25] IB/uverbs: Add PD import verb
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (13 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 14/25] IB/uverbs: Add clone reference counting to ib_pd Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-17 11:44   ` Jason Gunthorpe
  2019-07-16 18:11 ` [PATCH 16/25] IB/mlx4: Enable import from FD verb Shamir Rabinovitch
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Implementation of new verb which enable to import ib_pd object.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 95 ++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_verbs.h    | 14 ++++
 2 files changed, 109 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 0dd3ec98c739..9c73c7b10f8d 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3883,6 +3883,96 @@ static void ib_uverbs_import_unlock(struct ib_uobject *uobj,
 	fput(filep);
 }
 
+static void ib_uverbs_clone_ib_pd(union ib_uverbs_import_fr_fd_resp *resp,
+				  struct ib_pd *pd,
+				  __u32 handle)
+{
+	resp->alloc_pd.pd_handle = handle;
+}
+
+static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
+{
+	union ib_uverbs_import_fr_fd_resp resp = {0};
+	struct ib_uobject *src_uobj = NULL;
+	struct ib_uobject *dst_uobj = NULL;
+	struct ib_uverbs_import_fr_fd cmd;
+	struct ib_uverbs_file *src_file;
+	struct ib_device *ibdev;
+	struct ib_pd *pd;
+	struct file *filep;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	ret = ib_uverbs_import_lock(attrs, cmd.fd, UVERBS_OBJECT_PD,
+				    cmd.handle, &src_uobj, &filep,
+				    &src_file);
+	if (ret)
+		return ret;
+
+	pd = src_uobj->object;
+
+	dst_uobj = uobj_alloc(UVERBS_OBJECT_PD, attrs, &ibdev);
+	if (IS_ERR(dst_uobj)) {
+		ret = -ENOMEM;
+		goto uobj;
+	}
+
+	if (!ibdev->ops.clone_ib_pd) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	dst_uobj->object = src_uobj->object;
+	dst_uobj->refcnt = src_uobj->refcnt;
+
+	if (WARN_ON(!atomic_inc_not_zero(src_uobj->refcnt))) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	ret = ibdev->ops.clone_ib_pd(&attrs->driver_udata, pd);
+	if (ret)
+		goto uobj;
+
+	ib_uverbs_clone_ib_pd(&resp, dst_uobj->object, dst_uobj->id);
+
+	ret = uverbs_response(attrs, &resp, sizeof(resp));
+	if (ret)
+		goto uobj;
+
+	ret = uobj_alloc_commit(dst_uobj, attrs);
+
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	return ret;
+
+uobj:
+	if (!IS_ERR_OR_NULL(dst_uobj))
+		uobj_alloc_abort(dst_uobj, attrs);
+
+	return ret;
+}
+
+static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uverbs_import_fr_fd cmd;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	switch (cmd.type) {
+	case UVERBS_OBJECT_PD:
+		return ib_uverbs_import_ib_pd(attrs);
+	}
+
+	return -EINVAL;
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
@@ -4015,6 +4105,11 @@ const struct uapi_definition uverbs_def_write_intf[] = {
 			UAPI_DEF_WRITE_IO(struct ib_uverbs_query_port,
 					  struct ib_uverbs_query_port_resp),
 			UAPI_DEF_METHOD_NEEDS_FN(query_port)),
+		DECLARE_UVERBS_WRITE(
+			IB_USER_VERBS_CMD_IMPORT_FR_FD,
+			ib_uverbs_import_fr_fd,
+			UAPI_DEF_WRITE_IO(struct ib_uverbs_import_fr_fd,
+					  union  ib_uverbs_import_fr_fd_resp)),
 		DECLARE_UVERBS_WRITE_EX(
 			IB_USER_VERBS_EX_CMD_QUERY_DEVICE,
 			ib_uverbs_ex_query_device,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 0474c7400268..4274c40bcff8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -88,6 +88,7 @@ enum ib_uverbs_write_cmds {
 	IB_USER_VERBS_CMD_CLOSE_XRCD,
 	IB_USER_VERBS_CMD_CREATE_XSRQ,
 	IB_USER_VERBS_CMD_OPEN_QP,
+	IB_USER_VERBS_CMD_IMPORT_FR_FD,
 };
 
 enum {
@@ -1299,6 +1300,19 @@ struct ib_uverbs_ex_modify_cq {
 	__u32 reserved;
 };
 
+/* object sharing support */
+struct ib_uverbs_import_fr_fd {
+	__u64 response;
+	__u32 fd;
+	__u32 handle;
+	__u16 type;
+	__u8  reserved[6];
+};
+
+union ib_uverbs_import_fr_fd_resp {
+	struct ib_uverbs_alloc_pd_resp alloc_pd;
+};
+
 #define IB_DEVICE_NAME_MAX 64
 
 #endif /* IB_USER_VERBS_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 16/25] IB/mlx4: Enable import from FD verb
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (14 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 15/25] IB/uverbs: Add PD import verb Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 17/25] IB/mlx5: " Shamir Rabinovitch
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on import_fr_fd bit in uverbs command mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx4/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8adc569ce4fd..53fa102dc455 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2692,7 +2692,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_CREATE_XSRQ)		|
-		(1ull << IB_USER_VERBS_CMD_OPEN_QP);
+		(1ull << IB_USER_VERBS_CMD_OPEN_QP)		|
+		(1ull << IB_USER_VERBS_CMD_IMPORT_FR_FD);
 
 	ib_set_device_ops(&ibdev->ib_dev, &mlx4_ib_dev_ops);
 	ibdev->ib_dev.uverbs_ex_cmd_mask |=
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 17/25] IB/mlx5: Enable import from FD verb
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (15 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 16/25] IB/mlx4: Enable import from FD verb Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 18/25] RDMA/rxe: " Shamir Rabinovitch
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on import_fr_fd bit in uverbs command mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx5/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c8eeeea1ef6d..f16820eba83e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6386,7 +6386,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_CREATE_XSRQ)		|
-		(1ull << IB_USER_VERBS_CMD_OPEN_QP);
+		(1ull << IB_USER_VERBS_CMD_OPEN_QP)		|
+		(1ull << IB_USER_VERBS_CMD_IMPORT_FR_FD);
 	dev->ib_dev.uverbs_ex_cmd_mask =
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE)	|
 		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ)	|
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 18/25] RDMA/rxe: Enable import from FD verb
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (16 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 17/25] IB/mlx5: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 19/25] IB/core: ib_mr should not have ib_uobject pointer Shamir Rabinovitch
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on import_fr_fd bit in uverbs command mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 6a23bb533e18..e96dc0edf367 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1208,6 +1208,7 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	    | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_AH)
 	    | BIT_ULL(IB_USER_VERBS_CMD_ATTACH_MCAST)
 	    | BIT_ULL(IB_USER_VERBS_CMD_DETACH_MCAST)
+	    | BIT_ULL(IB_USER_VERBS_CMD_IMPORT_FR_FD)
 	    ;
 
 	ib_set_device_ops(dev, &rxe_dev_ops);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 19/25] IB/core: ib_mr should not have ib_uobject pointer
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (17 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 18/25] RDMA/rxe: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 20/25] IB/core: Install clone ib_mr in device ops Shamir Rabinovitch
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

As a preparation step to shared MR, where ib_mr object will be pointed
by one or more ib_uobjects, remove ib_uobject pointer from ib_mr struct.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c          | 1 -
 drivers/infiniband/core/uverbs_std_types_mr.c | 1 -
 drivers/infiniband/core/verbs.c               | 3 ---
 include/rdma/ib_verbs.h                       | 3 ++-
 4 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9c73c7b10f8d..dc1ffcf44ee5 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -765,7 +765,6 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	mr->type    = IB_MR_TYPE_USER;
 	mr->dm	    = NULL;
 	mr->sig_attrs = NULL;
-	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
 	mr->res.type = RDMA_RESTRACK_MR;
 	rdma_restrack_uadd(&mr->res);
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index c1286a52dc84..5219af8960a3 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -130,7 +130,6 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	mr->pd      = pd;
 	mr->type    = IB_MR_TYPE_DM;
 	mr->dm      = dm;
-	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
 	atomic_inc(&dm->usecnt);
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 56341f514914..854d71bd2bf9 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -299,7 +299,6 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		mr->device	= pd->device;
 		mr->pd		= pd;
 		mr->type        = IB_MR_TYPE_DMA;
-		mr->uobject	= NULL;
 		mr->need_inval	= false;
 
 		pd->__internal_mr = mr;
@@ -2035,7 +2034,6 @@ struct ib_mr *ib_alloc_mr_user(struct ib_pd *pd, enum ib_mr_type mr_type,
 		mr->device  = pd->device;
 		mr->pd      = pd;
 		mr->dm      = NULL;
-		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
 		mr->need_inval = false;
 		mr->res.type = RDMA_RESTRACK_MR;
@@ -2088,7 +2086,6 @@ struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
 	mr->device = pd->device;
 	mr->pd = pd;
 	mr->dm = NULL;
-	mr->uobject = NULL;
 	atomic_inc(&pd->usecnt);
 	mr->need_inval = false;
 	mr->res.type = RDMA_RESTRACK_MR;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 320c4b31ef68..0ff2732794c9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1733,7 +1733,6 @@ struct ib_mr {
 	enum ib_mr_type	   type;
 	bool		   need_inval;
 	union {
-		struct ib_uobject	*uobject;	/* user */
 		struct list_head	qp_entry;	/* FR */
 	};
 
@@ -2540,6 +2539,7 @@ struct ib_device_ops {
 
 	/* Object sharing callbacks */
 	clone_callback(ib_pd);
+	clone_callback(ib_mr);
 
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
@@ -2558,6 +2558,7 @@ static inline int trivial_clone_##ib_type(struct ib_udata *udata,	\
 
 /* Shared IB HW object support */
 trivial_clone_callback(ib_pd);
+trivial_clone_callback(ib_mr);
 
 struct ib_core_device {
 	/* device must be the first element in structure until,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 20/25] IB/core: Install clone ib_mr in device ops
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (18 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 19/25] IB/core: ib_mr should not have ib_uobject pointer Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 21/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

Install clone_ib_mr in the device ops.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index d4996e0f2976..66e3d91d56c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2573,6 +2573,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
 	SET_DEVICE_OP(dev_ops, clone_ib_pd);
+	SET_DEVICE_OP(dev_ops, clone_ib_mr);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_cq);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 21/25] IB/mlx4: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (19 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 20/25] IB/core: Install clone ib_mr in device ops Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 22/25] IB/mlx5: " Shamir Rabinovitch
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

No special handling in for object cloning, utilize the trivial
implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/hw/mlx4/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 53fa102dc455..5c807e686eed 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2575,6 +2575,7 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = mlx4_ib_clone_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx4_ib_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 22/25] IB/mlx5: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (20 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 21/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 23/25] RDMA/rxe: " Shamir Rabinovitch
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

No special handling in for object cloning, utilize the trivial
implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/hw/mlx5/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f16820eba83e..37b7b99deb7e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6315,6 +6315,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = mlx5_ib_clone_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx5_ib_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 23/25] RDMA/rxe: Add implementation of clone_pd callback
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (21 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 22/25] IB/mlx5: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:11 ` [PATCH 24/25] IB/uverbs: Add clone reference counting to ib_mr Shamir Rabinovitch
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

No special handling in for object cloning, utilize the trivial
implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index e96dc0edf367..178534b7c1ec 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1150,6 +1150,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = trivial_clone_ib_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 24/25] IB/uverbs: Add clone reference counting to ib_mr
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (22 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 23/25] RDMA/rxe: " Shamir Rabinovitch
@ 2019-07-16 18:11 ` Shamir Rabinovitch
  2019-07-16 18:12 ` [PATCH 25/25] IB/uverbs: Add MR import verb Shamir Rabinovitch
  2019-07-17  5:09 ` [PATCH 00/25] Shared PD and MR Christoph Hellwig
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:11 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

To keep track of shared object life time add ref count to ib_mr

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 2 ++
 include/rdma/ib_verbs.h              | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index dc1ffcf44ee5..229fc8e48391 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -768,8 +768,10 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	atomic_inc(&pd->usecnt);
 	mr->res.type = RDMA_RESTRACK_MR;
 	rdma_restrack_uadd(&mr->res);
+	atomic_set(&mr->refcnt, 1);
 
 	uobj->object = mr;
+	uobj->refcnt = &mr->refcnt;
 
 	memset(&resp, 0, sizeof resp);
 	resp.lkey      = mr->lkey;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0ff2732794c9..01be29a71d90 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1742,6 +1742,9 @@ struct ib_mr {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+
+	/* number of uobj using this ib_pd */
+	atomic_t	      refcnt;
 };
 
 struct ib_mw {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 25/25] IB/uverbs: Add MR import verb
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (23 preceding siblings ...)
  2019-07-16 18:11 ` [PATCH 24/25] IB/uverbs: Add clone reference counting to ib_mr Shamir Rabinovitch
@ 2019-07-16 18:12 ` Shamir Rabinovitch
  2019-07-17  5:09 ` [PATCH 00/25] Shared PD and MR Christoph Hellwig
  25 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-16 18:12 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

From: Yuval Shaia <yuval.shaia@oracle.com>

Implementation of new verb which enable to import ib_mr object.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 79 ++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_verbs.h    |  1 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 229fc8e48391..e1da443f05ff 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3891,6 +3891,15 @@ static void ib_uverbs_clone_ib_pd(union ib_uverbs_import_fr_fd_resp *resp,
 	resp->alloc_pd.pd_handle = handle;
 }
 
+static void ib_uverbs_clone_ib_mr(union ib_uverbs_import_fr_fd_resp *resp,
+				  struct ib_mr *mr,
+				  __u32 handle)
+{
+	resp->reg_mr.lkey = mr->lkey;
+	resp->reg_mr.rkey = mr->rkey;
+	resp->reg_mr.mr_handle = handle;
+}
+
 static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
 {
 	union ib_uverbs_import_fr_fd_resp resp = {0};
@@ -3957,6 +3966,74 @@ static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
 	return ret;
 }
 
+static int ib_uverbs_import_ib_mr(struct uverbs_attr_bundle *attrs)
+{
+	union ib_uverbs_import_fr_fd_resp resp = {0};
+	struct ib_uobject *src_uobj = NULL;
+	struct ib_uobject *dst_uobj = NULL;
+	struct ib_uverbs_import_fr_fd cmd;
+	struct ib_uverbs_file *src_file;
+	struct ib_device *ibdev;
+	struct ib_mr *mr;
+	struct file *filep;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	ret = ib_uverbs_import_lock(attrs, cmd.fd, UVERBS_OBJECT_MR,
+				    cmd.handle, &src_uobj, &filep,
+				    &src_file);
+	if (ret)
+		return ret;
+
+	mr = src_uobj->object;
+
+	dst_uobj = uobj_alloc(UVERBS_OBJECT_MR, attrs, &ibdev);
+	if (IS_ERR(dst_uobj)) {
+		ret = -ENOMEM;
+		goto uobj;
+	}
+
+	if (!ibdev->ops.clone_ib_mr) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	dst_uobj->object = src_uobj->object;
+	dst_uobj->refcnt = src_uobj->refcnt;
+
+	if (WARN_ON(!atomic_inc_not_zero(src_uobj->refcnt))) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	ret = ibdev->ops.clone_ib_mr(&attrs->driver_udata, mr);
+	if (ret)
+		goto uobj;
+
+	ib_uverbs_clone_ib_mr(&resp, dst_uobj->object, dst_uobj->id);
+
+	ret = uverbs_response(attrs, &resp, sizeof(resp));
+	if (ret)
+		goto uobj;
+
+	ret = uobj_alloc_commit(dst_uobj, attrs);
+
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	return ret;
+
+uobj:
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	if (!IS_ERR_OR_NULL(dst_uobj))
+		uobj_alloc_abort(dst_uobj, attrs);
+
+	return ret;
+}
+
 static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_import_fr_fd cmd;
@@ -3969,6 +4046,8 @@ static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
 	switch (cmd.type) {
 	case UVERBS_OBJECT_PD:
 		return ib_uverbs_import_ib_pd(attrs);
+	case UVERBS_OBJECT_MR:
+		return ib_uverbs_import_ib_mr(attrs);
 	}
 
 	return -EINVAL;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 4274c40bcff8..cfd1a734d5f3 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -1311,6 +1311,7 @@ struct ib_uverbs_import_fr_fd {
 
 union ib_uverbs_import_fr_fd_resp {
 	struct ib_uverbs_alloc_pd_resp alloc_pd;
+	struct ib_uverbs_reg_mr_resp reg_mr;
 };
 
 #define IB_DEVICE_NAME_MAX 64
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
                   ` (24 preceding siblings ...)
  2019-07-16 18:12 ` [PATCH 25/25] IB/uverbs: Add MR import verb Shamir Rabinovitch
@ 2019-07-17  5:09 ` Christoph Hellwig
  2019-07-17 11:09   ` Shamir Rabinovitch
  25 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-07-17  5:09 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> Following patch-set introduce the shared object feature.
> 
> A shared object feature allows one process to create HW objects (currently
> PD and MR) so that a second process can import.

That sounds like a major complication, so you'd better also explain
the use case very well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17  5:09 ` [PATCH 00/25] Shared PD and MR Christoph Hellwig
@ 2019-07-17 11:09   ` Shamir Rabinovitch
  2019-07-17 11:55     ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-17 11:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > Following patch-set introduce the shared object feature.
> >
> > A shared object feature allows one process to create HW objects (currently
> > PD and MR) so that a second process can import.
>
> That sounds like a major complication, so you'd better also explain
> the use case very well.

The main use case was that there is a server that has giant shared
memory that is shared across many processes (lots of mtts).
Each process needs the same memory registration (lots of mrs that
register same memory).
In such scenario, the HCA runs out of mtts.
To solve this problem, an single memory registration is shared across
all the process in that server saving hca mtts.
Future expansion of this can be done on top of the new infrastructure
to any HW object that is suitable for sharing (mainly those that are
not saving important data in the user context).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 15/25] IB/uverbs: Add PD import verb
  2019-07-16 18:11 ` [PATCH 15/25] IB/uverbs: Add PD import verb Shamir Rabinovitch
@ 2019-07-17 11:44   ` Jason Gunthorpe
  2019-07-17 20:15     ` Shamir Rabinovitch
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-17 11:44 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: dledford, leon, monis, parav, danielj, kamalheib1, markz, swise,
	shamir.rabinovitch, johannes.berg, willy, michaelgur, markb,
	yuval.shaia, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Tue, Jul 16, 2019 at 09:11:50PM +0300, Shamir Rabinovitch wrote:
>  /*
>   * Describe the input structs for write(). Some write methods have an input
>   * only struct, most have an input and output. If the struct has an output then
> @@ -4015,6 +4105,11 @@ const struct uapi_definition uverbs_def_write_intf[] = {
>  			UAPI_DEF_WRITE_IO(struct ib_uverbs_query_port,
>  					  struct ib_uverbs_query_port_resp),
>  			UAPI_DEF_METHOD_NEEDS_FN(query_port)),
> +		DECLARE_UVERBS_WRITE(
> +			IB_USER_VERBS_CMD_IMPORT_FR_FD,
> +			ib_uverbs_import_fr_fd,
> +			UAPI_DEF_WRITE_IO(struct ib_uverbs_import_fr_fd,
> +					  union
> ib_uverbs_import_fr_fd_resp)),

New non-ioctl interfaces are not allowed now

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-16 18:11 ` [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore Shamir Rabinovitch
@ 2019-07-17 11:53   ` Jason Gunthorpe
  2019-07-17 19:25     ` Shamir Rabinovitch
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-17 11:53 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: dledford, leon, monis, parav, danielj, kamalheib1, markz, swise,
	shamir.rabinovitch, johannes.berg, willy, michaelgur, markb,
	yuval.shaia, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> 
> ufile (&ucontext) with the process who own them must not be released
> when there are other ufile (&ucontext) that depens at them.

We already have a kref, why do we need more? Especially wrongly done
refcounts with atomics?

Trying to sequence the destroy of the ucontext seems inherently wrong
to me. If the driver has to link the PD/MR to data in the ucontext it
can't support sharing.

> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
>  drivers/infiniband/core/rdma_core.c   | 29 +++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs.h      | 22 ++++++++++++++++++++
>  drivers/infiniband/core/uverbs_cmd.c  | 16 +++++++++++++++
>  drivers/infiniband/core/uverbs_main.c |  4 ++++
>  4 files changed, 71 insertions(+)
> 
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index 651625f632d7..c81ff8e28fc6 100644
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -841,6 +841,33 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>  	ufile->ucontext = NULL;
>  }
>  
> +static void __uverbs_ufile_refcount(struct ib_uverbs_file *ufile)
> +{
> +	int wait;
> +
> +	if (ufile->parent) {
> +		pr_debug("%s: release parent ufile. ufile %p parent %p\n",
> +			 __func__, ufile, ufile->parent);
> +		if (atomic_dec_and_test(&ufile->parent->refcount))
> +			complete(&ufile->parent->context_released);
> +	}
> +
> +	if (!atomic_dec_and_test(&ufile->refcount)) {
> +wait:
> +		wait = wait_for_completion_interruptible_timeout(
> +			&ufile->context_released, 3*HZ);
> +		if (wait == -ERESTARTSYS) {
> +			WARN_ONCE(1,
> +			"signal while waiting for context release! ufile %p\n",
> +				ufile);

????

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17 11:09   ` Shamir Rabinovitch
@ 2019-07-17 11:55     ` Jason Gunthorpe
  2019-07-17 13:35       ` Shamir Rabinovitch
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-17 11:55 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: Christoph Hellwig, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, shamir.rabinovitch, johannes.berg,
	willy, michaelgur, markb, yuval.shaia, dan.carpenter, bvanassche,
	maxg, israelr, galpress, denisd, yuvalav, dennis.dalessandro,
	will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 02:09:50PM +0300, Shamir Rabinovitch wrote:
> On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > > Following patch-set introduce the shared object feature.
> > >
> > > A shared object feature allows one process to create HW objects (currently
> > > PD and MR) so that a second process can import.
> >
> > That sounds like a major complication, so you'd better also explain
> > the use case very well.
> 
> The main use case was that there is a server that has giant shared
> memory that is shared across many processes (lots of mtts).
> Each process needs the same memory registration (lots of mrs that
> register same memory).
> In such scenario, the HCA runs out of mtts.
> To solve this problem, an single memory registration is shared across
> all the process in that server saving hca mtts.

Well, why not just share the entire uverbs FD then? Once the PD is
shared all security is lost anyhow..

This is not the model that was explained to me last year

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17 11:55     ` Jason Gunthorpe
@ 2019-07-17 13:35       ` Shamir Rabinovitch
  2019-07-17 23:55         ` Ira Weiny
  2019-07-18 12:16         ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-17 13:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, shamir.rabinovitch, johannes.berg,
	willy, michaelgur, markb, yuval.shaia, dan.carpenter, bvanassche,
	maxg, israelr, galpress, denisd, yuvalav, dennis.dalessandro,
	will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 2:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jul 17, 2019 at 02:09:50PM +0300, Shamir Rabinovitch wrote:
> > On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > > > Following patch-set introduce the shared object feature.
> > > >
> > > > A shared object feature allows one process to create HW objects (currently
> > > > PD and MR) so that a second process can import.
> > >
> > > That sounds like a major complication, so you'd better also explain
> > > the use case very well.
> >
> > The main use case was that there is a server that has giant shared
> > memory that is shared across many processes (lots of mtts).
> > Each process needs the same memory registration (lots of mrs that
> > register same memory).
> > In such scenario, the HCA runs out of mtts.
> > To solve this problem, an single memory registration is shared across
> > all the process in that server saving hca mtts.
>
> Well, why not just share the entire uverbs FD then? Once the PD is
> shared all security is lost anyhow..
>
> This is not the model that was explained to me last year
>
> Jason

We do share the whole uvrbs FD (context) with the second process and
let that process to instantiate the PD & MR from the shared FD.
The instantiation include creating new uobject in the second process
context that points to the same ib_x HW objects.
The second process does not own the shared context.
It just use it to get access to the shared ib_x objects and then it
mark those & shared FD as shared.

What was the expectation from "import_from_xxx" ?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 11:53   ` Jason Gunthorpe
@ 2019-07-17 19:25     ` Shamir Rabinovitch
  2019-07-17 19:33       ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-17 19:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shamir Rabinovitch, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza,
	linux-rdma

On Wed, Jul 17, 2019 at 08:53:54AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote:
> > From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > 
> > ufile (&ucontext) with the process who own them must not be released
> > when there are other ufile (&ucontext) that depens at them.
> 
> We already have a kref, why do we need more? Especially wrongly done
> refcounts with atomics?

Yes. Will fix in v2.

> 
> Trying to sequence the destroy of the ucontext seems inherently wrong
> to me. If the driver has to link the PD/MR to data in the ucontext it
> can't support sharing.

The issue we try to solve here is this:

[process 1]                     [process 2]
- alloc mr & point mr to        -
  context 1                     
- share context                 -
-                               - import mr
- exit                          -
-- ufile_destroy_ucontext       -
--- ib_mr is not destroyed      -
--- context 1 is destroyed      -
-                               - exit
-                               -- ufile_destroy_ucontext
-                               --- driver dereg_mr is called
-                               ---- ib_umem_release on umem from
                                     previously destroyed context 1

If I recall correctly, you suggested the shere and shree concept.

We also talked with Mellanox architecture team and they suggested
that the shrere will be bullet proof process that *only* create and
share objects.

The whole thing directly links to the next step we talked about 
which is sharing objects via file system rathen then via FD.

> 
> > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
> >  drivers/infiniband/core/rdma_core.c   | 29 +++++++++++++++++++++++++++
> >  drivers/infiniband/core/uverbs.h      | 22 ++++++++++++++++++++
> >  drivers/infiniband/core/uverbs_cmd.c  | 16 +++++++++++++++
> >  drivers/infiniband/core/uverbs_main.c |  4 ++++
> >  4 files changed, 71 insertions(+)
> > 
> > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> > index 651625f632d7..c81ff8e28fc6 100644
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -841,6 +841,33 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
> >  	ufile->ucontext = NULL;
> >  }
> >  
> > +static void __uverbs_ufile_refcount(struct ib_uverbs_file *ufile)
> > +{
> > +	int wait;
> > +
> > +	if (ufile->parent) {
> > +		pr_debug("%s: release parent ufile. ufile %p parent %p\n",
> > +			 __func__, ufile, ufile->parent);
> > +		if (atomic_dec_and_test(&ufile->parent->refcount))
> > +			complete(&ufile->parent->context_released);
> > +	}
> > +
> > +	if (!atomic_dec_and_test(&ufile->refcount)) {
> > +wait:
> > +		wait = wait_for_completion_interruptible_timeout(
> > +			&ufile->context_released, 3*HZ);
> > +		if (wait == -ERESTARTSYS) {
> > +			WARN_ONCE(1,
> > +			"signal while waiting for context release! ufile %p\n",
> > +				ufile);
> 
> ????
> 
> Jason

I copied the behaviour I saw in the rest of the kernel as for what to do
when wait_for_completion_interruptible_timeout exit due to interrupt.

From the above reason I think we need to delay the shrere process exit
so it will not close the context prematurely *unless* it receive signal.

In that case I'd expect that process to soot some clear warning and do 
whatevere id need to do for the given signal.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 19:25     ` Shamir Rabinovitch
@ 2019-07-17 19:33       ` Jason Gunthorpe
  2019-07-17 20:31         ` Yuval Shaia
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-17 19:33 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: dledford, leon, monis, parav, danielj, kamalheib1, markz, swise,
	johannes.berg, willy, michaelgur, markb, yuval.shaia,
	dan.carpenter, bvanassche, maxg, israelr, galpress, denisd,
	yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 10:25:25PM +0300, Shamir Rabinovitch wrote:
> On Wed, Jul 17, 2019 at 08:53:54AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote:
> > > From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > > 
> > > ufile (&ucontext) with the process who own them must not be released
> > > when there are other ufile (&ucontext) that depens at them.
> > 
> > We already have a kref, why do we need more? Especially wrongly done
> > refcounts with atomics?
> 
> Yes. Will fix in v2.
> 
> > 
> > Trying to sequence the destroy of the ucontext seems inherently wrong
> > to me. If the driver has to link the PD/MR to data in the ucontext it
> > can't support sharing.
> 
> The issue we try to solve here is this:
> 
> [process 1]                     [process 2]
> - alloc mr & point mr to        -
>   context 1                     
> - share context                 -
> -                               - import mr
> - exit                          -
> -                               - exit
> -                               -- ufile_destroy_ucontext
> -                               --- driver dereg_mr is called
> -                               ---- ib_umem_release on umem from
>                                      previously destroyed context 1

Like I said, drivers that require the creating ucontext as part of the
PD and MR cannot support sharing.

> > > +	int wait;
> > > +
> > > +	if (ufile->parent) {
> > > +		pr_debug("%s: release parent ufile. ufile %p parent %p\n",
> > > +			 __func__, ufile, ufile->parent);
> > > +		if (atomic_dec_and_test(&ufile->parent->refcount))
> > > +			complete(&ufile->parent->context_released);
> > > +	}
> > > +
> > > +	if (!atomic_dec_and_test(&ufile->refcount)) {
> > > +wait:
> > > +		wait = wait_for_completion_interruptible_timeout(
> > > +			&ufile->context_released, 3*HZ);
> > > +		if (wait == -ERESTARTSYS) {
> > > +			WARN_ONCE(1,
> > > +			"signal while waiting for context release! ufile %p\n",
> > > +				ufile);
> > 
> > ????
> > 
> > Jason
> 
> I copied the behaviour I saw in the rest of the kernel as for what to do
> when wait_for_completion_interruptible_timeout exit due to interrupt.

It doesn't really make sense here, we can't block release() like this

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 15/25] IB/uverbs: Add PD import verb
  2019-07-17 11:44   ` Jason Gunthorpe
@ 2019-07-17 20:15     ` Shamir Rabinovitch
  0 siblings, 0 replies; 46+ messages in thread
From: Shamir Rabinovitch @ 2019-07-17 20:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shamir Rabinovitch, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza,
	linux-rdma

On Wed, Jul 17, 2019 at 08:44:32AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 16, 2019 at 09:11:50PM +0300, Shamir Rabinovitch wrote:
> >  /*
> >   * Describe the input structs for write(). Some write methods have an input
> >   * only struct, most have an input and output. If the struct has an output then
> > @@ -4015,6 +4105,11 @@ const struct uapi_definition uverbs_def_write_intf[] = {
> >  			UAPI_DEF_WRITE_IO(struct ib_uverbs_query_port,
> >  					  struct ib_uverbs_query_port_resp),
> >  			UAPI_DEF_METHOD_NEEDS_FN(query_port)),
> > +		DECLARE_UVERBS_WRITE(
> > +			IB_USER_VERBS_CMD_IMPORT_FR_FD,
> > +			ib_uverbs_import_fr_fd,
> > +			UAPI_DEF_WRITE_IO(struct ib_uverbs_import_fr_fd,
> > +					  union
> > ib_uverbs_import_fr_fd_resp)),
> 
> New non-ioctl interfaces are not allowed now
> 
> Jason

Oh. Can you suggest commit(s) that can be used as ref for new ioctl in rdma & rdma-core ?

Thanks

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 19:33       ` Jason Gunthorpe
@ 2019-07-17 20:31         ` Yuval Shaia
  2019-07-17 20:45           ` Matthew Wilcox
  0 siblings, 1 reply; 46+ messages in thread
From: Yuval Shaia @ 2019-07-17 20:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shamir Rabinovitch, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, johannes.berg, willy, michaelgur,
	markb, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 17, 2019 at 10:25:25PM +0300, Shamir Rabinovitch wrote:
> > On Wed, Jul 17, 2019 at 08:53:54AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote:
> > > > From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > > > 
> > > > ufile (&ucontext) with the process who own them must not be released
> > > > when there are other ufile (&ucontext) that depens at them.
> > > 
> > > We already have a kref, why do we need more? Especially wrongly done
> > > refcounts with atomics?
> > 
> > Yes. Will fix in v2.
> > 
> > > 
> > > Trying to sequence the destroy of the ucontext seems inherently wrong
> > > to me. If the driver has to link the PD/MR to data in the ucontext it
> > > can't support sharing.
> > 
> > The issue we try to solve here is this:
> > 
> > [process 1]                     [process 2]
> > - alloc mr & point mr to        -
> >   context 1                     
> > - share context                 -
> > -                               - import mr
> > - exit                          -
> > -                               - exit
> > -                               -- ufile_destroy_ucontext
> > -                               --- driver dereg_mr is called
> > -                               ---- ib_umem_release on umem from
> >                                      previously destroyed context 1
> 
> Like I said, drivers that require the creating ucontext as part of the
> PD and MR cannot support sharing.

Even if we can make sure the process that creates the MR stays alive until
all reference to this MR completes?

> 
> > > > +	int wait;
> > > > +
> > > > +	if (ufile->parent) {
> > > > +		pr_debug("%s: release parent ufile. ufile %p parent %p\n",
> > > > +			 __func__, ufile, ufile->parent);
> > > > +		if (atomic_dec_and_test(&ufile->parent->refcount))
> > > > +			complete(&ufile->parent->context_released);
> > > > +	}
> > > > +
> > > > +	if (!atomic_dec_and_test(&ufile->refcount)) {
> > > > +wait:
> > > > +		wait = wait_for_completion_interruptible_timeout(
> > > > +			&ufile->context_released, 3*HZ);
> > > > +		if (wait == -ERESTARTSYS) {
> > > > +			WARN_ONCE(1,
> > > > +			"signal while waiting for context release! ufile %p\n",
> > > > +				ufile);
> > > 
> > > ????
> > > 
> > > Jason
> > 
> > I copied the behaviour I saw in the rest of the kernel as for what to do
> > when wait_for_completion_interruptible_timeout exit due to interrupt.
> 
> It doesn't really make sense here, we can't block release() like this
> 
> Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 20:31         ` Yuval Shaia
@ 2019-07-17 20:45           ` Matthew Wilcox
  2019-07-17 21:36             ` Yuval Shaia
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Wilcox @ 2019-07-17 20:45 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Jason Gunthorpe, Shamir Rabinovitch, dledford, leon, monis,
	parav, danielj, kamalheib1, markz, swise, johannes.berg,
	michaelgur, markb, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza,
	linux-rdma

On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > Like I said, drivers that require the creating ucontext as part of the
> > PD and MR cannot support sharing.
> 
> Even if we can make sure the process that creates the MR stays alive until
> all reference to this MR completes?

The kernel can't rely on userspace to do that.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 20:45           ` Matthew Wilcox
@ 2019-07-17 21:36             ` Yuval Shaia
  2019-07-17 23:51               ` Ira Weiny
  2019-07-18 12:17               ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Yuval Shaia @ 2019-07-17 21:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason Gunthorpe, Shamir Rabinovitch, dledford, leon, monis,
	parav, danielj, kamalheib1, markz, swise, johannes.berg,
	michaelgur, markb, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza,
	linux-rdma, Santosh Shilimkar

On Wed, Jul 17, 2019 at 01:45:05PM -0700, Matthew Wilcox wrote:
> On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> > On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > > Like I said, drivers that require the creating ucontext as part of the
> > > PD and MR cannot support sharing.
> > 
> > Even if we can make sure the process that creates the MR stays alive until
> > all reference to this MR completes?
> 
> The kernel can't rely on userspace to do that.

ok, how about this: we know that for MR to be shared the memory behinds it
should also be shared.

In this case, i know it sounds horrifying but do we care that the process
that originally created this MR exits? i.e. how about just before the
process leaves this world we will find some other ucontext to hold these
memory mappings that driver holds?
Or how about moving this mapping from ucontext pointed by ib_mr directly to
ib_mr?

Yuval

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 21:36             ` Yuval Shaia
@ 2019-07-17 23:51               ` Ira Weiny
  2019-07-18 12:17               ` Jason Gunthorpe
  1 sibling, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2019-07-17 23:51 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Matthew Wilcox, Jason Gunthorpe, Shamir Rabinovitch, dledford,
	leon, monis, parav, danielj, kamalheib1, markz, swise,
	johannes.berg, michaelgur, markb, dan.carpenter, bvanassche,
	maxg, israelr, galpress, denisd, yuvalav, dennis.dalessandro,
	will, ereza, linux-rdma, Santosh Shilimkar

On Thu, Jul 18, 2019 at 12:36:37AM +0300, Yuval Shaia wrote:
> On Wed, Jul 17, 2019 at 01:45:05PM -0700, Matthew Wilcox wrote:
> > On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> > > On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > > > Like I said, drivers that require the creating ucontext as part of the
> > > > PD and MR cannot support sharing.
> > > 
> > > Even if we can make sure the process that creates the MR stays alive until
> > > all reference to this MR completes?
> > 
> > The kernel can't rely on userspace to do that.
> 
> ok, how about this: we know that for MR to be shared the memory behinds it
> should also be shared.
> 
> In this case, i know it sounds horrifying but do we care that the process
> that originally created this MR exits? i.e. how about just before the
> process leaves this world we will find some other ucontext to hold these
> memory mappings that driver holds?

Could you create a better cover letter for this.

What is the use case for all this?
What protections are in place for accesses between processes?
Why is it not sufficient to share the entire IB context with another process
using SCM_RIGHTS?


I've been trying to create a reliable method for an admin to identify processes
which may have pinned file backed pages (specifically with FS DAX).  The best
idea I have involves using the struct file of the ucontext to track the file
pins through procfs.  Jason and I discussed it here:

https://lkml.org/lkml/2019/6/7/548

Therefore, sharing to another ucontext _might_ work.  But only if the file pin
information is properly duplicated to the new ucontext.  This complicates (at a
minimum) having the generic GUP code handle the tracking of this information...

Generically, this whole thread scares me because now we are proposing something
even more obscure than just the sharing of a file descriptor to other processes
which point to the same IB context.  (Or at least that is what I infer from
reading the very short cover letter.)  But if there is a good use case ...  we
need to starting thinking about something generic to track these "memory maps"
which are not mmaps.

Ira


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17 13:35       ` Shamir Rabinovitch
@ 2019-07-17 23:55         ` Ira Weiny
  2019-08-01  4:05           ` Yuval Shaia
  2019-07-18 12:16         ` Jason Gunthorpe
  1 sibling, 1 reply; 46+ messages in thread
From: Ira Weiny @ 2019-07-17 23:55 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: Jason Gunthorpe, Christoph Hellwig, dledford, leon, monis, parav,
	danielj, kamalheib1, markz, swise, shamir.rabinovitch,
	johannes.berg, willy, michaelgur, markb, yuval.shaia,
	dan.carpenter, bvanassche, maxg, israelr, galpress, denisd,
	yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 04:35:30PM +0300, Shamir Rabinovitch wrote:
> On Wed, Jul 17, 2019 at 2:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jul 17, 2019 at 02:09:50PM +0300, Shamir Rabinovitch wrote:
> > > On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > > > > Following patch-set introduce the shared object feature.
> > > > >
> > > > > A shared object feature allows one process to create HW objects (currently
> > > > > PD and MR) so that a second process can import.
> > > >
> > > > That sounds like a major complication, so you'd better also explain
> > > > the use case very well.
> > >
> > > The main use case was that there is a server that has giant shared
> > > memory that is shared across many processes (lots of mtts).
> > > Each process needs the same memory registration (lots of mrs that
> > > register same memory).
> > > In such scenario, the HCA runs out of mtts.
> > > To solve this problem, an single memory registration is shared across
> > > all the process in that server saving hca mtts.
> >
> > Well, why not just share the entire uverbs FD then? Once the PD is
> > shared all security is lost anyhow..
> >
> > This is not the model that was explained to me last year
> >
> > Jason
> 
> We do share the whole uvrbs FD (context) with the second process and
> let that process to instantiate the PD & MR from the shared FD.

Then the first (both) process(es) should have access to the MR right?

> The instantiation include creating new uobject in the second process
> context that points to the same ib_x HW objects.
> The second process does not own the shared context.
> It just use it to get access to the shared ib_x objects and then it
> mark those & shared FD as shared.

I'm not following this?

> 
> What was the expectation from "import_from_xxx" ?

... and I don't understand this question.

Ira


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17 13:35       ` Shamir Rabinovitch
  2019-07-17 23:55         ` Ira Weiny
@ 2019-07-18 12:16         ` Jason Gunthorpe
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-18 12:16 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: Christoph Hellwig, dledford, leon, monis, parav, danielj,
	kamalheib1, markz, swise, shamir.rabinovitch, johannes.berg,
	willy, michaelgur, markb, yuval.shaia, dan.carpenter, bvanassche,
	maxg, israelr, galpress, denisd, yuvalav, dennis.dalessandro,
	will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 04:35:30PM +0300, Shamir Rabinovitch wrote:
> On Wed, Jul 17, 2019 at 2:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jul 17, 2019 at 02:09:50PM +0300, Shamir Rabinovitch wrote:
> > > On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > > > > Following patch-set introduce the shared object feature.
> > > > >
> > > > > A shared object feature allows one process to create HW objects (currently
> > > > > PD and MR) so that a second process can import.
> > > >
> > > > That sounds like a major complication, so you'd better also explain
> > > > the use case very well.
> > >
> > > The main use case was that there is a server that has giant shared
> > > memory that is shared across many processes (lots of mtts).
> > > Each process needs the same memory registration (lots of mrs that
> > > register same memory).
> > > In such scenario, the HCA runs out of mtts.
> > > To solve this problem, an single memory registration is shared across
> > > all the process in that server saving hca mtts.
> >
> > Well, why not just share the entire uverbs FD then? Once the PD is
> > shared all security is lost anyhow..
> >
> > This is not the model that was explained to me last year
> >
> > Jason
> 
> We do share the whole uvrbs FD (context) with the second process and
> let that process to instantiate the PD & MR from the shared FD.
> The instantiation include creating new uobject in the second process
> context that points to the same ib_x HW objects.
> The second process does not own the shared context.
> It just use it to get access to the shared ib_x objects and then it
> mark those & shared FD as shared.
> 
> What was the expectation from "import_from_xxx" ?

None of this discussion makes any sense to me, or matches the use
model that I thought this was targetting

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-17 21:36             ` Yuval Shaia
  2019-07-17 23:51               ` Ira Weiny
@ 2019-07-18 12:17               ` Jason Gunthorpe
  2019-07-18 20:45                 ` Yuval Shaia
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-18 12:17 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Matthew Wilcox, Shamir Rabinovitch, dledford, leon, monis, parav,
	danielj, kamalheib1, markz, swise, johannes.berg, michaelgur,
	markb, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma,
	Santosh Shilimkar

On Thu, Jul 18, 2019 at 12:36:37AM +0300, Yuval Shaia wrote:
> On Wed, Jul 17, 2019 at 01:45:05PM -0700, Matthew Wilcox wrote:
> > On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> > > On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > > > Like I said, drivers that require the creating ucontext as part of the
> > > > PD and MR cannot support sharing.
> > > 
> > > Even if we can make sure the process that creates the MR stays alive until
> > > all reference to this MR completes?
> > 
> > The kernel can't rely on userspace to do that.
> 
> ok, how about this: we know that for MR to be shared the memory behinds it
> should also be shared.
> 
> In this case, i know it sounds horrifying but do we care that the process
> that originally created this MR exits? i.e. how about just before the
> process leaves this world we will find some other ucontext to hold these
> memory mappings that driver holds?
> Or how about moving this mapping from ucontext pointed by ib_mr directly to
> ib_mr?

What are you worrying about? My point is we don't need to *anything*
if the driver objects for PD and MR don't rely on the ucontext. This
appears to be the normal case.

MRs already work fine if they outlive the creating process.

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
  2019-07-16 18:11 ` [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Shamir Rabinovitch
@ 2019-07-18 16:05   ` Leon Romanovsky
  0 siblings, 0 replies; 46+ messages in thread
From: Leon Romanovsky @ 2019-07-18 16:05 UTC (permalink / raw)
  To: Shamir Rabinovitch
  Cc: dledford, jgg, monis, parav, danielj, kamalheib1, markz, swise,
	shamir.rabinovitch, johannes.berg, willy, michaelgur, markb,
	yuval.shaia, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

On Tue, Jul 16, 2019 at 09:11:38PM +0300, Shamir Rabinovitch wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
>
> In shared object model ib_pd can belong to 1 or more ib_ucontext.
> Fix the nldev code so it could report multiple context ids.
>
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
> ---
>  drivers/infiniband/core/nldev.c  | 127 +++++++++++++++++++++++++++++--
>  include/uapi/rdma/rdma_netlink.h |   3 +
>  2 files changed, 125 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 783e465e7c41..5167228dea1c 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -41,6 +41,7 @@
>  #include "core_priv.h"
>  #include "cma_priv.h"
>  #include "restrack.h"
> +#include "uverbs.h"
>
>  /*
>   * Sort array elements by the netlink attribute name
> @@ -141,6 +142,8 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
>  	[RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID]	= { .type = NLA_U32 },
>  	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
>  	[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]	= { .type = NLA_U8 },
> +	[RDMA_NLDEV_ATTR_RES_CTX]		= { .type = NLA_NESTED },
> +	[RDMA_NLDEV_ATTR_RES_CTX_ENTRY]		= { .type = NLA_NESTED },
>  };
>
>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
> @@ -611,11 +614,84 @@ static int fill_res_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
>  err:	return -EMSGSIZE;
>  }
>
> +struct context_id {
> +	struct list_head list;
> +	u32 id;
> +};
> +
> +static void pd_context(struct ib_pd *pd, struct list_head *list, int *count)
> +{
> +	struct ib_device *device = pd->device;
> +	struct rdma_restrack_entry *res;
> +	struct rdma_restrack_root *rt;
> +	struct ib_uverbs_file *ufile;
> +	struct ib_ucontext *ucontext;
> +	struct ib_uobject *uobj;
> +	unsigned long flags;
> +	unsigned long id;
> +	bool found;
> +
> +	rt = &device->res[RDMA_RESTRACK_CTX];
> +
> +	xa_lock(&rt->xa);
> +
> +	xa_for_each(&rt->xa, id, res) {
> +		if (!rdma_is_visible_in_pid_ns(res))
> +			continue;
> +
> +		if (!rdma_restrack_get(res))
> +			continue;
> +
> +		xa_unlock(&rt->xa);
> +
> +		ucontext = container_of(res, struct ib_ucontext, res);
> +		ufile = ucontext->ufile;
> +		found = false;
> +
> +		/* See locking requirements in struct ib_uverbs_file */
> +		down_read(&ufile->hw_destroy_rwsem);
> +		spin_lock_irqsave(&ufile->uobjects_lock, flags);
> +
> +		list_for_each_entry(uobj, &ufile->uobjects, list) {
> +			if (uobj->object == pd) {
> +				found = true;
> +				goto found;
> +			}
> +		}
> +
> +found:		spin_unlock_irqrestore(&ufile->uobjects_lock, flags);
> +		up_read(&ufile->hw_destroy_rwsem);
> +
> +		if (found) {
> +			struct context_id *ctx_id =
> +				kmalloc(sizeof(*ctx_id), GFP_KERNEL);
> +
> +			if (WARN_ON_ONCE(!ctx_id))
> +				goto next;

No WARN_ON on kmalloc failure.

> +
> +			ctx_id->id = ucontext->res.id;
> +			list_add(&ctx_id->list, list);
> +			(*count)++;
> +		}
> +
> +next:		rdma_restrack_put(res);
> +		xa_lock(&rt->xa);
> +	}
> +
> +	xa_unlock(&rt->xa);
> +}
> +
>  static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
>  			     struct rdma_restrack_entry *res, uint32_t port)
>  {
>  	struct ib_pd *pd = container_of(res, struct ib_pd, res);
>  	struct ib_device *dev = pd->device;
> +	struct nlattr *table_attr = NULL;
> +	struct nlattr *entry_attr = NULL;
> +	struct context_id *ctx_id;
> +	struct context_id *tmp;
> +	LIST_HEAD(pd_context_ids);
> +	int ctx_count = 0;
>
>  	if (has_cap_net_admin) {
>  		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,
> @@ -633,10 +709,38 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
>  	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_PDN, res->id))
>  		goto err;
>
> -	if (!rdma_is_kernel_res(res) &&
> -	    nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
> -			pd->uobject->context->res.id))
> -		goto err;
> +	if (!rdma_is_kernel_res(res)) {
> +		pd_context(pd, &pd_context_ids, &ctx_count);
> +		if (ctx_count == 1) {

Only suggestion, maybe the better approach will be to print first CTXN always
and other CTXNs as nested after that? It will give an ability to see at
least first PD with old rdmatool. In your case, old rdmatool won't print
CTXNs at all.

Thanks

> +			/* user pd, not shared */
> +			ctx_id = list_first_entry(&pd_context_ids,
> +						  struct context_id, list);
> +			if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
> +					ctx_id->id))
> +				goto err;
> +		} else if (ctx_count > 1) {
> +			/* user pd, shared */
> +			table_attr = nla_nest_start(msg,
> +					RDMA_NLDEV_ATTR_RES_CTX);
> +			if (!table_attr)
> +				goto err;
> +
> +			list_for_each_entry(ctx_id, &pd_context_ids, list) {
> +				entry_attr = nla_nest_start(msg,
> +						RDMA_NLDEV_ATTR_RES_CTX_ENTRY);
> +				if (!entry_attr)
> +					goto err;
> +				if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
> +						ctx_id->id))
> +					goto err;
> +				nla_nest_end(msg, entry_attr);
> +				entry_attr = NULL;
> +			}
> +
> +			nla_nest_end(msg, table_attr);
> +			table_attr = NULL;
> +		}
> +	}
>
>  	if (fill_res_name_pid(msg, res))
>  		goto err;
> @@ -644,9 +748,22 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
>  	if (fill_res_entry(dev, msg, res))
>  		goto err;
>
> +	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
> +		kfree(ctx_id);
> +
>  	return 0;
>
> -err:	return -EMSGSIZE;
> +err:
> +	if (entry_attr)
> +		nla_nest_end(msg, entry_attr);
> +
> +	if (table_attr)
> +		nla_nest_end(msg, table_attr);
> +
> +	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
> +		kfree(ctx_id);
> +
> +	return -EMSGSIZE;
>  }
>
>  static int fill_stat_counter_mode(struct sk_buff *msg,
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index 8e277783fa96..7fbbfb07f071 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -525,6 +525,9 @@ enum rdma_nldev_attr {
>  	 */
>  	RDMA_NLDEV_ATTR_DEV_DIM,                /* u8 */
>
> +	RDMA_NLDEV_ATTR_RES_CTX,		/* nested table */
> +	RDMA_NLDEV_ATTR_RES_CTX_ENTRY,		/* nested table */
> +
>  	/*
>  	 * Always the end
>  	 */
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-18 12:17               ` Jason Gunthorpe
@ 2019-07-18 20:45                 ` Yuval Shaia
  2019-07-19 11:46                   ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Yuval Shaia @ 2019-07-18 20:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Wilcox, Shamir Rabinovitch, dledford, leon, monis, parav,
	danielj, kamalheib1, markz, swise, johannes.berg, michaelgur,
	markb, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma,
	Santosh Shilimkar

On Thu, Jul 18, 2019 at 09:17:47AM -0300, Jason Gunthorpe wrote:
> On Thu, Jul 18, 2019 at 12:36:37AM +0300, Yuval Shaia wrote:
> > On Wed, Jul 17, 2019 at 01:45:05PM -0700, Matthew Wilcox wrote:
> > > On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> > > > On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > > > > Like I said, drivers that require the creating ucontext as part of the
> > > > > PD and MR cannot support sharing.
> > > > 
> > > > Even if we can make sure the process that creates the MR stays alive until
> > > > all reference to this MR completes?
> > > 
> > > The kernel can't rely on userspace to do that.
> > 
> > ok, how about this: we know that for MR to be shared the memory behinds it
> > should also be shared.
> > 
> > In this case, i know it sounds horrifying but do we care that the process
> > that originally created this MR exits? i.e. how about just before the
> > process leaves this world we will find some other ucontext to hold these
> > memory mappings that driver holds?
> > Or how about moving this mapping from ucontext pointed by ib_mr directly to
> > ib_mr?
> 
> What are you worrying about? My point is we don't need to *anything*
> if the driver objects for PD and MR don't rely on the ucontext. This
> appears to be the normal case.

but we saw that mlx4 (and i think also 5) do use the ucontext, i think to
undo umem_get stuff.

> 
> MRs already work fine if they outlive the creating process.

You mean if we leave the creating process alive?

> 
> Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore
  2019-07-18 20:45                 ` Yuval Shaia
@ 2019-07-19 11:46                   ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2019-07-19 11:46 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Matthew Wilcox, Shamir Rabinovitch, dledford, leon, monis, parav,
	danielj, kamalheib1, markz, swise, johannes.berg, michaelgur,
	markb, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, linux-rdma,
	Santosh Shilimkar

On Thu, Jul 18, 2019 at 11:45:52PM +0300, Yuval Shaia wrote:
> On Thu, Jul 18, 2019 at 09:17:47AM -0300, Jason Gunthorpe wrote:
> > On Thu, Jul 18, 2019 at 12:36:37AM +0300, Yuval Shaia wrote:
> > > On Wed, Jul 17, 2019 at 01:45:05PM -0700, Matthew Wilcox wrote:
> > > > On Wed, Jul 17, 2019 at 11:31:12PM +0300, Yuval Shaia wrote:
> > > > > On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> > > > > > Like I said, drivers that require the creating ucontext as part of the
> > > > > > PD and MR cannot support sharing.
> > > > > 
> > > > > Even if we can make sure the process that creates the MR stays alive until
> > > > > all reference to this MR completes?
> > > > 
> > > > The kernel can't rely on userspace to do that.
> > > 
> > > ok, how about this: we know that for MR to be shared the memory behinds it
> > > should also be shared.
> > > 
> > > In this case, i know it sounds horrifying but do we care that the process
> > > that originally created this MR exits? i.e. how about just before the
> > > process leaves this world we will find some other ucontext to hold these
> > > memory mappings that driver holds?
> > > Or how about moving this mapping from ucontext pointed by ib_mr directly to
> > > ib_mr?
> > 
> > What are you worrying about? My point is we don't need to *anything*
> > if the driver objects for PD and MR don't rely on the ucontext. This
> > appears to be the normal case.
> 
> but we saw that mlx4 (and i think also 5) do use the ucontext, i think to
> undo umem_get stuff.

It is unnecessary, remove it.

> > MRs already work fine if they outlive the creating process.
> 
> You mean if we leave the creating process alive?

No, let it die, it is fine

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/25] Shared PD and MR
  2019-07-17 23:55         ` Ira Weiny
@ 2019-08-01  4:05           ` Yuval Shaia
  0 siblings, 0 replies; 46+ messages in thread
From: Yuval Shaia @ 2019-08-01  4:05 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Shamir Rabinovitch, Jason Gunthorpe, Christoph Hellwig, dledford,
	leon, monis, parav, danielj, kamalheib1, markz, swise,
	shamir.rabinovitch, johannes.berg, willy, michaelgur, markb,
	dan.carpenter, bvanassche, maxg, israelr, galpress, denisd,
	yuvalav, dennis.dalessandro, will, ereza, linux-rdma

On Wed, Jul 17, 2019 at 04:55:26PM -0700, Ira Weiny wrote:
> On Wed, Jul 17, 2019 at 04:35:30PM +0300, Shamir Rabinovitch wrote:
> > On Wed, Jul 17, 2019 at 2:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 02:09:50PM +0300, Shamir Rabinovitch wrote:
> > > > On Wed, Jul 17, 2019 at 8:09 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2019 at 09:11:35PM +0300, Shamir Rabinovitch wrote:
> > > > > > Following patch-set introduce the shared object feature.
> > > > > >
> > > > > > A shared object feature allows one process to create HW objects (currently
> > > > > > PD and MR) so that a second process can import.
> > > > >
> > > > > That sounds like a major complication, so you'd better also explain
> > > > > the use case very well.
> > > >
> > > > The main use case was that there is a server that has giant shared
> > > > memory that is shared across many processes (lots of mtts).
> > > > Each process needs the same memory registration (lots of mrs that
> > > > register same memory).
> > > > In such scenario, the HCA runs out of mtts.
> > > > To solve this problem, an single memory registration is shared across
> > > > all the process in that server saving hca mtts.
> > >
> > > Well, why not just share the entire uverbs FD then? Once the PD is
> > > shared all security is lost anyhow..
> > >
> > > This is not the model that was explained to me last year
> > >
> > > Jason
> > 
> > We do share the whole uvrbs FD (context) with the second process and
> > let that process to instantiate the PD & MR from the shared FD.
> 
> Then the first (both) process(es) should have access to the MR right?

Yes.
(Please note that since we maintain refcount then the ib_mr object will be
destroyed only when the two will destroy it).

> 
> > The instantiation include creating new uobject in the second process
> > context that points to the same ib_x HW objects.
> > The second process does not own the shared context.
> > It just use it to get access to the shared ib_x objects and then it
> > mark those & shared FD as shared.
> 
> I'm not following this?

Shamir, correct me if i'm wrong here:
So there is one ib_mr object and two uobjects that "points" to it.

> 
> > 
> > What was the expectation from "import_from_xxx" ?
> 
> ... and I don't understand this question.

Shamir?

> 
> Ira
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2019-08-01  4:07 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-16 18:11 [PATCH 00/25] Shared PD and MR Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 01/25] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 02/25] RDMA/uverbs: Delete the macro uobj_put_obj_read Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 03/25] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Shamir Rabinovitch
2019-07-18 16:05   ` Leon Romanovsky
2019-07-16 18:11 ` [PATCH 04/25] IB/{core,hw}: ib_pd should not have ib_uobject pointer Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 05/25] IB/core: ib_uobject need HW object reference count Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 06/25] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 07/25] IB/uverbs: Add context import lock/unlock helper Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 08/25] IB/uverbs: ufile must be freed only when not used anymore Shamir Rabinovitch
2019-07-17 11:53   ` Jason Gunthorpe
2019-07-17 19:25     ` Shamir Rabinovitch
2019-07-17 19:33       ` Jason Gunthorpe
2019-07-17 20:31         ` Yuval Shaia
2019-07-17 20:45           ` Matthew Wilcox
2019-07-17 21:36             ` Yuval Shaia
2019-07-17 23:51               ` Ira Weiny
2019-07-18 12:17               ` Jason Gunthorpe
2019-07-18 20:45                 ` Yuval Shaia
2019-07-19 11:46                   ` Jason Gunthorpe
2019-07-16 18:11 ` [PATCH 09/25] IB/verbs: Prototype of HW object clone callback Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 10/25] IB/core: Install clone ib_pd in device ops Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 11/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 12/25] IB/mlx5: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 13/25] RDMA/rxe: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 14/25] IB/uverbs: Add clone reference counting to ib_pd Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 15/25] IB/uverbs: Add PD import verb Shamir Rabinovitch
2019-07-17 11:44   ` Jason Gunthorpe
2019-07-17 20:15     ` Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 16/25] IB/mlx4: Enable import from FD verb Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 17/25] IB/mlx5: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 18/25] RDMA/rxe: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 19/25] IB/core: ib_mr should not have ib_uobject pointer Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 20/25] IB/core: Install clone ib_mr in device ops Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 21/25] IB/mlx4: Add implementation of clone_pd callback Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 22/25] IB/mlx5: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 23/25] RDMA/rxe: " Shamir Rabinovitch
2019-07-16 18:11 ` [PATCH 24/25] IB/uverbs: Add clone reference counting to ib_mr Shamir Rabinovitch
2019-07-16 18:12 ` [PATCH 25/25] IB/uverbs: Add MR import verb Shamir Rabinovitch
2019-07-17  5:09 ` [PATCH 00/25] Shared PD and MR Christoph Hellwig
2019-07-17 11:09   ` Shamir Rabinovitch
2019-07-17 11:55     ` Jason Gunthorpe
2019-07-17 13:35       ` Shamir Rabinovitch
2019-07-17 23:55         ` Ira Weiny
2019-08-01  4:05           ` Yuval Shaia
2019-07-18 12:16         ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).