linux-rdma.vger.kernel.org archive mirror
* [PATCH v1 00/24] Shared PD and MR
@ 2019-08-21 14:21 Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Yuval Shaia
                   ` (25 more replies)
  0 siblings, 26 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

This patch-set introduces the shared object feature.

The shared object feature allows one process to create HW objects
(currently PD and MR) so that a second process can import them.

The patch-set is logically split into 4 parts, as follows:
- patches 1 to 7 and 18 are preparation steps.
- patches 8 to 14 are the implementation of PD import
- patches 15 to 17 are the implementation of the import verb in the drivers
- patches 19 to 24 are the implementation of MR import

v0 -> v1:
	* Delete the patch "IB/uverbs: ufile must be freed only when not
	  used anymore". The process can die; the ucontext remains until
	  the last reference to it is closed.
	* Rebase to latest for-next branch

Shamir Rabinovitch (16):
  RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
  RDMA/uverbs: Delete the macro uobj_put_obj_read
  RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
  IB/{core,hw}: ib_pd should not have ib_uobject pointer
  IB/core: ib_uobject need HW object reference count
  IB/uverbs: Helper function to initialize ufile member of
    uverbs_attr_bundle
  IB/uverbs: Add context import lock/unlock helper
  IB/verbs: Prototype of HW object clone callback
  IB/mlx4: Add implementation of clone_pd callback
  IB/mlx5: Add implementation of clone_pd callback
  RDMA/rxe: Add implementation of clone_pd callback
  IB/uverbs: Add clone reference counting to ib_pd
  IB/uverbs: Add PD import verb
  IB/mlx4: Enable import from FD verb
  IB/mlx5: Enable import from FD verb
  RDMA/rxe: Enable import from FD verb

Yuval Shaia (8):
  IB/core: Install clone ib_pd in device ops
  IB/core: ib_mr should not have ib_uobject pointer
  IB/core: Install clone ib_mr in device ops
  IB/mlx4: Add implementation of clone_pd callback
  IB/mlx5: Add implementation of clone_pd callback
  RDMA/rxe: Add implementation of clone_pd callback
  IB/uverbs: Add clone reference counting to ib_mr
  IB/uverbs: Add MR import verb

 drivers/infiniband/core/device.c              |   2 +
 drivers/infiniband/core/nldev.c               | 127 ++++-
 drivers/infiniband/core/rdma_core.c           |  23 +-
 drivers/infiniband/core/uverbs.h              |   2 +
 drivers/infiniband/core/uverbs_cmd.c          | 489 +++++++++++++++---
 drivers/infiniband/core/uverbs_main.c         |   1 +
 drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
 drivers/infiniband/core/verbs.c               |   4 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
 drivers/infiniband/hw/mlx4/main.c             |  18 +-
 drivers/infiniband/hw/mlx5/main.c             |  34 +-
 drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
 include/rdma/ib_verbs.h                       |  43 +-
 include/rdma/uverbs_std_types.h               |  11 +-
 include/uapi/rdma/ib_user_verbs.h             |  15 +
 include/uapi/rdma/rdma_netlink.h              |   3 +
 17 files changed, 669 insertions(+), 113 deletions(-)

-- 
2.20.1



* [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 02/24] RDMA/uverbs: Delete the macro uobj_put_obj_read Yuval Shaia
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The uobject pointer will be removed from the ib_x (ib_pd, ib_mr, ...)
objects. The uobj_get_obj_read and uobj_put_obj_read macros were written
on the assumption that the ib_uobject can be reached from the ib_x object.

Drop this assumption: have uobj_get_obj_read return the ib_uobject
through an explicit output argument.
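
For reference, a simplified version of the new call-site pattern used in
the hunks below (pduobj is the new output argument):

	struct ib_uobject *pduobj;

	/* the macro now also returns the ib_uobject via &pduobj */
	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
			       &pduobj);
	if (!pd)
		return -EINVAL;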

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 125 ++++++++++++++++++++-------
 include/rdma/uverbs_std_types.h      |   8 +-
 2 files changed, 98 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7ddd0e5bc6b3..b0dc301918bf 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -37,6 +37,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/list.h>
 
 #include <linux/uaccess.h>
 
@@ -711,6 +712,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_reg_mr      cmd;
 	struct ib_uverbs_reg_mr_resp resp;
 	struct ib_uobject           *uobj;
+	struct ib_uobject           *pduobj;
 	struct ib_pd                *pd;
 	struct ib_mr                *mr;
 	int                          ret;
@@ -731,7 +733,8 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(uobj))
 		return PTR_ERR(uobj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_free;
@@ -799,6 +802,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_pd		    *old_pd;
 	int                          ret;
 	struct ib_uobject	    *uobj;
+	struct ib_uobject	    *pduobj = NULL;
 
 	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
 	if (ret)
@@ -831,7 +835,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 
 	if (cmd.flags & IB_MR_REREG_PD) {
 		pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle,
-				       attrs);
+				       attrs, &pduobj);
 		if (!pd) {
 			ret = -EINVAL;
 			goto put_uobjs;
@@ -885,6 +889,7 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_alloc_mw      cmd;
 	struct ib_uverbs_alloc_mw_resp resp;
 	struct ib_uobject             *uobj;
+	struct ib_uobject             *pduobj;
 	struct ib_pd                  *pd;
 	struct ib_mw                  *mw;
 	int                            ret;
@@ -898,7 +903,8 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(uobj))
 		return PTR_ERR(uobj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_free;
@@ -1117,6 +1123,7 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_resize_cq	cmd;
 	struct ib_uverbs_resize_cq_resp	resp = {};
+	struct ib_uobject		*cquobj;
 	struct ib_cq			*cq;
 	int				ret = -EINVAL;
 
@@ -1124,7 +1131,8 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1177,6 +1185,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_poll_cq_resp  resp;
 	u8 __user                     *header_ptr;
 	u8 __user                     *data_ptr;
+	struct ib_uobject	      *cquobj;
 	struct ib_cq                  *cq;
 	struct ib_wc                   wc;
 	int                            ret;
@@ -1185,7 +1194,8 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1226,6 +1236,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_req_notify_cq cmd;
+	struct ib_uobject	      *cquobj;
 	struct ib_cq                  *cq;
 	int ret;
 
@@ -1233,7 +1244,8 @@ static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
@@ -1276,16 +1288,21 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 {
 	struct ib_uqp_object		*obj;
 	struct ib_device		*device;
+	struct ib_uobject		*pd_uobj = ERR_PTR(-ENOENT);
 	struct ib_pd			*pd = NULL;
 	struct ib_xrcd			*xrcd = NULL;
 	struct ib_uobject		*xrcd_uobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*scq_uobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*rcq_uobj = ERR_PTR(-ENOENT);
 	struct ib_cq			*scq = NULL, *rcq = NULL;
+	struct ib_uobject		*srq_uobj = ERR_PTR(-ENOENT);
 	struct ib_srq			*srq = NULL;
 	struct ib_qp			*qp;
 	struct ib_qp_init_attr		attr = {};
 	struct ib_uverbs_ex_create_qp_resp resp;
 	int				ret;
 	struct ib_rwq_ind_table *ind_tbl = NULL;
+	struct ib_uobject	*ind_tbl_uobj = ERR_PTR(-ENOENT);
 	bool has_sq = true;
 	struct ib_device *ib_dev;
 
@@ -1303,7 +1320,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	if (cmd->comp_mask & IB_UVERBS_CREATE_QP_MASK_IND_TABLE) {
 		ind_tbl = uobj_get_obj_read(rwq_ind_table,
 					    UVERBS_OBJECT_RWQ_IND_TBL,
-					    cmd->rwq_ind_tbl_handle, attrs);
+					    cmd->rwq_ind_tbl_handle, attrs,
+					    &ind_tbl_uobj);
 		if (!ind_tbl) {
 			ret = -EINVAL;
 			goto err_put;
@@ -1342,7 +1360,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 		} else {
 			if (cmd->is_srq) {
 				srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ,
-							cmd->srq_handle, attrs);
+							cmd->srq_handle, attrs,
+							&srq_uobj);
 				if (!srq || srq->srq_type == IB_SRQT_XRC) {
 					ret = -EINVAL;
 					goto err_put;
@@ -1353,7 +1372,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 				if (cmd->recv_cq_handle != cmd->send_cq_handle) {
 					rcq = uobj_get_obj_read(
 						cq, UVERBS_OBJECT_CQ,
-						cmd->recv_cq_handle, attrs);
+						cmd->recv_cq_handle, attrs,
+						&rcq_uobj);
 					if (!rcq) {
 						ret = -EINVAL;
 						goto err_put;
@@ -1364,11 +1384,12 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 
 		if (has_sq)
 			scq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
-						cmd->send_cq_handle, attrs);
+						cmd->send_cq_handle, attrs,
+						&scq_uobj);
 		if (!ind_tbl)
 			rcq = rcq ?: scq;
 		pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle,
-				       attrs);
+				       attrs, &pd_uobj);
 		if (!pd || (!scq && has_sq)) {
 			ret = -EINVAL;
 			goto err_put;
@@ -1663,6 +1684,7 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_query_qp      cmd;
 	struct ib_uverbs_query_qp_resp resp;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	struct ib_qp_attr              *attr;
 	struct ib_qp_init_attr         *init_attr;
@@ -1679,7 +1701,8 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 		goto out;
 	}
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -1776,6 +1799,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 		     struct ib_uverbs_ex_modify_qp *cmd)
 {
 	struct ib_qp_attr *attr;
+	struct ib_uobject *qpuobj;
 	struct ib_qp *qp;
 	int ret;
 
@@ -1784,7 +1808,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 		return -ENOMEM;
 
 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd->base.qp_handle,
-			       attrs);
+			       attrs, &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2018,6 +2042,7 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_send_wr       *user_wr;
 	struct ib_send_wr              *wr = NULL, *last, *next;
 	const struct ib_send_wr	       *bad_wr;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	int                             i, sg_ind;
 	int				is_ud;
@@ -2045,7 +2070,8 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 	if (!user_wr)
 		return -ENOMEM;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2068,6 +2094,7 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 
 		if (is_ud) {
 			struct ib_ud_wr *ud;
+			struct ib_uobject *uobj;
 
 			if (user_wr->opcode != IB_WR_SEND &&
 			    user_wr->opcode != IB_WR_SEND_WITH_IMM) {
@@ -2083,7 +2110,8 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 			}
 
 			ud->ah = uobj_get_obj_read(ah, UVERBS_OBJECT_AH,
-						   user_wr->wr.ud.ah, attrs);
+						   user_wr->wr.ud.ah, attrs,
+						   &uobj);
 			if (!ud->ah) {
 				kfree(ud);
 				ret = -EINVAL;
@@ -2308,6 +2336,7 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_post_recv_resp resp;
 	struct ib_recv_wr              *wr, *next;
 	const struct ib_recv_wr	       *bad_wr;
+	struct ib_uobject	       *qpuobj;
 	struct ib_qp                   *qp;
 	int ret, ret2;
 	struct uverbs_req_iter iter;
@@ -2321,7 +2350,8 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(wr))
 		return PTR_ERR(wr);
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
@@ -2358,6 +2388,7 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_post_srq_recv_resp resp;
 	struct ib_recv_wr                  *wr, *next;
 	const struct ib_recv_wr		   *bad_wr;
+	struct ib_uobject		   *srquobj;
 	struct ib_srq                      *srq;
 	int ret, ret2;
 	struct uverbs_req_iter iter;
@@ -2371,7 +2402,8 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(wr))
 		return PTR_ERR(wr);
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq) {
 		ret = -EINVAL;
 		goto out;
@@ -2408,6 +2440,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_create_ah	 cmd;
 	struct ib_uverbs_create_ah_resp	 resp;
 	struct ib_uobject		*uobj;
+	struct ib_uobject		*pduobj;
 	struct ib_pd			*pd;
 	struct ib_ah			*ah;
 	struct rdma_ah_attr		attr = {};
@@ -2427,7 +2460,8 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 		goto err;
 	}
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err;
@@ -2497,6 +2531,7 @@ static int ib_uverbs_destroy_ah(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_attach_mcast cmd;
+	struct ib_uobject	     *qpuobj;
 	struct ib_qp                 *qp;
 	struct ib_uqp_object         *obj;
 	struct ib_uverbs_mcast_entry *mcast;
@@ -2506,7 +2541,8 @@ static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp)
 		return -EINVAL;
 
@@ -2546,6 +2582,7 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_detach_mcast cmd;
 	struct ib_uqp_object         *obj;
+	struct ib_uobject	     *qpuobj;
 	struct ib_qp                 *qp;
 	struct ib_uverbs_mcast_entry *mcast;
 	int                           ret;
@@ -2555,7 +2592,8 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp)
 		return -EINVAL;
 
@@ -2665,6 +2703,8 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 				       union ib_flow_spec *ib_spec,
 				       struct ib_uflow_resources *uflow_res)
 {
+	struct ib_uobject *uobj;
+
 	ib_spec->type = kern_spec->type;
 	switch (ib_spec->type) {
 	case IB_FLOW_SPEC_ACTION_TAG:
@@ -2689,7 +2729,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		ib_spec->action.act = uobj_get_obj_read(flow_action,
 							UVERBS_OBJECT_FLOW_ACTION,
 							kern_spec->action.handle,
-							attrs);
+							attrs, &uobj);
 		if (!ib_spec->action.act)
 			return -EINVAL;
 		ib_spec->action.size =
@@ -2707,7 +2747,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 			uobj_get_obj_read(counters,
 					  UVERBS_OBJECT_COUNTERS,
 					  kern_spec->flow_count.handle,
-					  attrs);
+					  attrs, &uobj);
 		if (!ib_spec->flow_count.counters)
 			return -EINVAL;
 		ib_spec->flow_count.size =
@@ -2909,6 +2949,8 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_ex_create_wq_resp resp = {};
 	struct ib_uwq_object           *obj;
 	int err = 0;
+	struct ib_uobject *cquobj;
+	struct ib_uobject *pduobj;
 	struct ib_cq *cq;
 	struct ib_pd *pd;
 	struct ib_wq *wq;
@@ -2927,13 +2969,15 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		err = -EINVAL;
 		goto err_uobj;
 	}
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq) {
 		err = -EINVAL;
 		goto err_put_pd;
@@ -3025,6 +3069,7 @@ static int ib_uverbs_ex_destroy_wq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_ex_modify_wq cmd;
+	struct ib_uobject *wquobj;
 	struct ib_wq *wq;
 	struct ib_wq_attr wq_attr = {};
 	int ret;
@@ -3039,7 +3084,8 @@ static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 	if (cmd.attr_mask > (IB_WQ_STATE | IB_WQ_CUR_STATE | IB_WQ_FLAGS))
 		return -EINVAL;
 
-	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, attrs);
+	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, attrs,
+			       &wquobj);
 	if (!wq)
 		return -EINVAL;
 
@@ -3104,8 +3150,11 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 
 	for (num_read_wqs = 0; num_read_wqs < num_wq_handles;
 			num_read_wqs++) {
+		struct ib_uobject *uobj;
+
 		wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ,
-				       wqs_handles[num_read_wqs], attrs);
+				       wqs_handles[num_read_wqs], attrs,
+				       &uobj);
 		if (!wq) {
 			err = -EINVAL;
 			goto put_wqs;
@@ -3166,6 +3215,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 err_free:
 	kfree(wqs_handles);
 	kfree(wqs);
+
 	return err;
 }
 
@@ -3193,6 +3243,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 	struct ib_flow			  *flow_id;
 	struct ib_uverbs_flow_attr	  *kern_flow_attr;
 	struct ib_flow_attr		  *flow_attr;
+	struct ib_uobject		  *qpuobj;
 	struct ib_qp			  *qp;
 	struct ib_uflow_resources	  *uflow_res;
 	struct ib_uverbs_flow_spec_hdr	  *kern_spec;
@@ -3256,7 +3307,8 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 		goto err_free_attr;
 	}
 
-	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
+	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs,
+			       &qpuobj);
 	if (!qp) {
 		err = -EINVAL;
 		goto err_uobj;
@@ -3371,6 +3423,8 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 {
 	struct ib_uverbs_create_srq_resp resp;
 	struct ib_usrq_object           *obj;
+	struct ib_uobject		*cquobj = ERR_PTR(-ENOENT);
+	struct ib_uobject		*pduobj;
 	struct ib_pd                    *pd;
 	struct ib_srq                   *srq;
 	struct ib_uobject               *uninitialized_var(xrcd_uobj);
@@ -3406,14 +3460,15 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 
 	if (ib_srq_has_cq(cmd->srq_type)) {
 		attr.ext.cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
-						cmd->cq_handle, attrs);
+						cmd->cq_handle, attrs, &cquobj);
 		if (!attr.ext.cq) {
 			ret = -EINVAL;
 			goto err_put_xrcd;
 		}
 	}
 
-	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, attrs);
+	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, attrs,
+			       &pduobj);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err_put_cq;
@@ -3542,6 +3597,7 @@ static int ib_uverbs_create_xsrq(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_modify_srq cmd;
+	struct ib_uobject	   *srquobj;
 	struct ib_srq              *srq;
 	struct ib_srq_attr          attr;
 	int                         ret;
@@ -3550,7 +3606,8 @@ static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq)
 		return -EINVAL;
 
@@ -3570,6 +3627,7 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 	struct ib_uverbs_query_srq      cmd;
 	struct ib_uverbs_query_srq_resp resp;
 	struct ib_srq_attr              attr;
+	struct ib_uobject		*srquobj;
 	struct ib_srq                   *srq;
 	int                             ret;
 
@@ -3577,7 +3635,8 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
-	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
+	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs,
+				&srquobj);
 	if (!srq)
 		return -EINVAL;
 
@@ -3689,6 +3748,7 @@ static int ib_uverbs_ex_query_device(struct uverbs_attr_bundle *attrs)
 static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_ex_modify_cq cmd;
+	struct ib_uobject *cquobj;
 	struct ib_cq *cq;
 	int ret;
 
@@ -3702,7 +3762,8 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 	if (cmd.attr_mask > IB_CQ_MODERATE)
 		return -EOPNOTSUPP;
 
-	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
+	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs,
+			       &cquobj);
 	if (!cq)
 		return -EINVAL;
 
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index 05eabfd5d0d3..578e5c28bc1c 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -64,9 +64,11 @@ static inline void *_uobj_get_obj_read(struct ib_uobject *uobj)
 		return NULL;
 	return uobj->object;
 }
-#define uobj_get_obj_read(_object, _type, _id, _attrs)                         \
-	((struct ib_##_object *)_uobj_get_obj_read(                            \
-		uobj_get_read(_type, _id, _attrs)))
+#define uobj_get_obj_read(_object, _type, _id, _attrs, _uobj)                  \
+({                                                                             \
+	*_uobj = uobj_get_read(_type, _id, _attrs);                            \
+	((struct ib_##_object *)_uobj_get_obj_read(*_uobj));                   \
+})
 
 #define uobj_get_write(_type, _id, _attrs)                                     \
 	rdma_lookup_get_uobject(uobj_get_type(_attrs, _type), (_attrs)->ufile, \
-- 
2.20.1



* [PATCH v1 02/24] RDMA/uverbs: Delete the macro uobj_put_obj_read
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 03/24] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Yuval Shaia
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The macro uobj_put_obj_read assumes the existence of a member named
uobject on each HW object. Since this member is about to be deleted, stop
using the macro: call uobj_put_read directly and delete uobj_put_obj_read.
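
The resulting substitution, taken from the hunks below, follows this
pattern (pduobj is the ib_uobject returned by uobj_get_obj_read in the
previous patch):

	/* before: relied on pd->uobject being reachable from the ib_pd */
	uobj_put_obj_read(pd);

	/* after: put the ib_uobject that uobj_get_obj_read returned */
	uobj_put_read(pduobj);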

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 92 ++++++++++++++--------------
 include/rdma/uverbs_std_types.h      |  3 -
 2 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b0dc301918bf..204a93305f1c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -778,7 +778,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 	return uobj_alloc_commit(uobj, attrs);
 
@@ -786,7 +786,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
 
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err_free:
 	uobj_alloc_abort(uobj, attrs);
@@ -864,7 +864,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 
 put_uobj_pd:
 	if (cmd.flags & IB_MR_REREG_PD)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pduobj);
 
 put_uobjs:
 	uobj_put_write(uobj);
@@ -936,13 +936,13 @@ static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(uobj, attrs);
 
 err_copy:
 	uverbs_dealloc_mw(mw);
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 err_free:
 	uobj_alloc_abort(uobj, attrs);
 	return ret;
@@ -1144,7 +1144,7 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 
 	ret = uverbs_response(attrs, &resp, sizeof(resp));
 out:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return ret;
 }
@@ -1229,7 +1229,7 @@ static int ib_uverbs_poll_cq(struct uverbs_attr_bundle *attrs)
 		ret = uverbs_output_written(attrs, UVERBS_ATTR_CORE_OUT);
 
 out_put:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 	return ret;
 }
 
@@ -1252,7 +1252,7 @@ static int ib_uverbs_req_notify_cq(struct uverbs_attr_bundle *attrs)
 	ib_req_notify_cq(cq, cmd.solicited_only ?
 			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
 
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return 0;
 }
@@ -1505,15 +1505,15 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	}
 
 	if (pd)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pd_uobj);
 	if (scq)
-		uobj_put_obj_read(scq);
+		uobj_put_read(scq_uobj);
 	if (rcq && rcq != scq)
-		uobj_put_obj_read(rcq);
+		uobj_put_read(rcq_uobj);
 	if (srq)
-		uobj_put_obj_read(srq);
+		uobj_put_read(srq_uobj);
 	if (ind_tbl)
-		uobj_put_obj_read(ind_tbl);
+		uobj_put_read(ind_tbl_uobj);
 
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 err_cb:
@@ -1523,15 +1523,15 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	if (!IS_ERR(xrcd_uobj))
 		uobj_put_read(xrcd_uobj);
 	if (pd)
-		uobj_put_obj_read(pd);
+		uobj_put_read(pd_uobj);
 	if (scq)
-		uobj_put_obj_read(scq);
+		uobj_put_read(scq_uobj);
 	if (rcq && rcq != scq)
-		uobj_put_obj_read(rcq);
+		uobj_put_read(rcq_uobj);
 	if (srq)
-		uobj_put_obj_read(srq);
+		uobj_put_read(srq_uobj);
 	if (ind_tbl)
-		uobj_put_obj_read(ind_tbl);
+		uobj_put_read(ind_tbl_uobj);
 
 	uobj_alloc_abort(&obj->uevent.uobject, attrs);
 	return ret;
@@ -1710,7 +1710,7 @@ static int ib_uverbs_query_qp(struct uverbs_attr_bundle *attrs)
 
 	ret = ib_query_qp(qp, attr, cmd.attr_mask, init_attr);
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	if (ret)
 		goto out;
@@ -1948,7 +1948,7 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
 				      &attrs->driver_udata);
 
 release_qp:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 out:
 	kfree(attr);
 
@@ -2216,11 +2216,11 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 		ret = ret2;
 
 out_put:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	while (wr) {
 		if (is_ud && ud_wr(wr)->ah)
-			uobj_put_obj_read(ud_wr(wr)->ah);
+			uobj_put_read(ud_wr(wr)->ah->uobject);
 		next = wr->next;
 		kfree(wr);
 		wr = next;
@@ -2360,7 +2360,7 @@ static int ib_uverbs_post_recv(struct uverbs_attr_bundle *attrs)
 	resp.bad_wr = 0;
 	ret = qp->device->ops.post_recv(qp->real_qp, wr, &bad_wr);
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	if (ret) {
 		for (next = wr; next; next = next->next) {
 			++resp.bad_wr;
@@ -2412,7 +2412,7 @@ static int ib_uverbs_post_srq_recv(struct uverbs_attr_bundle *attrs)
 	resp.bad_wr = 0;
 	ret = srq->device->ops.post_srq_recv(srq, wr, &bad_wr);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	if (ret)
 		for (next = wr; next; next = next->next) {
@@ -2501,7 +2501,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(uobj, attrs);
 
 err_copy:
@@ -2509,7 +2509,7 @@ static int ib_uverbs_create_ah(struct uverbs_attr_bundle *attrs)
 			     uverbs_get_cleared_udata(attrs));
 
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err:
 	uobj_alloc_abort(uobj, attrs);
@@ -2573,7 +2573,7 @@ static int ib_uverbs_attach_mcast(struct uverbs_attr_bundle *attrs)
 
 out_put:
 	mutex_unlock(&obj->mcast_lock);
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 
 	return ret;
 }
@@ -2618,7 +2618,7 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 
 out_put:
 	mutex_unlock(&obj->mcast_lock);
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	return ret;
 }
 
@@ -2737,7 +2737,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		flow_resources_add(uflow_res,
 				   IB_FLOW_SPEC_ACTION_HANDLE,
 				   ib_spec->action.act);
-		uobj_put_obj_read(ib_spec->action.act);
+		uobj_put_read(uobj);
 		break;
 	case IB_FLOW_SPEC_ACTION_COUNT:
 		if (kern_spec->flow_count.size !=
@@ -2755,7 +2755,7 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 		flow_resources_add(uflow_res,
 				   IB_FLOW_SPEC_ACTION_COUNT,
 				   ib_spec->flow_count.counters);
-		uobj_put_obj_read(ib_spec->flow_count.counters);
+		uobj_put_read(uobj);
 		break;
 	default:
 		return -EINVAL;
@@ -3022,16 +3022,16 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
 	if (err)
 		goto err_copy;
 
-	uobj_put_obj_read(pd);
-	uobj_put_obj_read(cq);
+	uobj_put_read(pduobj);
+	uobj_put_read(cquobj);
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 
 err_copy:
 	ib_destroy_wq(wq, uverbs_get_cleared_udata(attrs));
 err_put_cq:
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 err_put_pd:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 err_uobj:
 	uobj_alloc_abort(&obj->uevent.uobject, attrs);
 
@@ -3097,7 +3097,7 @@ static int ib_uverbs_ex_modify_wq(struct uverbs_attr_bundle *attrs)
 	}
 	ret = wq->device->ops.modify_wq(wq, &wq_attr, cmd.attr_mask,
 					&attrs->driver_udata);
-	uobj_put_obj_read(wq);
+	uobj_put_read(wquobj);
 	return ret;
 }
 
@@ -3201,7 +3201,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 	kfree(wqs_handles);
 
 	for (j = 0; j < num_read_wqs; j++)
-		uobj_put_obj_read(wqs[j]);
+		uobj_put_read(wqs[j]->uobject);
 
 	return uobj_alloc_commit(uobj, attrs);
 
@@ -3211,7 +3211,7 @@ static int ib_uverbs_ex_create_rwq_ind_table(struct uverbs_attr_bundle *attrs)
 	uobj_alloc_abort(uobj, attrs);
 put_wqs:
 	for (j = 0; j < num_read_wqs; j++)
-		uobj_put_obj_read(wqs[j]);
+		uobj_put_read(wqs[j]->uobject);
 err_free:
 	kfree(wqs_handles);
 	kfree(wqs);
@@ -3380,7 +3380,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 	if (err)
 		goto err_copy;
 
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 	kfree(flow_attr);
 	if (cmd.flow_attr.num_of_specs)
 		kfree(kern_flow_attr);
@@ -3393,7 +3393,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 err_free_flow_attr:
 	kfree(flow_attr);
 err_put:
-	uobj_put_obj_read(qp);
+	uobj_put_read(qpuobj);
 err_uobj:
 	uobj_alloc_abort(uobj, attrs);
 err_free_attr:
@@ -3532,9 +3532,9 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 		uobj_put_read(xrcd_uobj);
 
 	if (ib_srq_has_cq(cmd->srq_type))
-		uobj_put_obj_read(attr.ext.cq);
+		uobj_put_read(cquobj);
 
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 	return uobj_alloc_commit(&obj->uevent.uobject, attrs);
 
 err_copy:
@@ -3543,11 +3543,11 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
 err_free:
 	kfree(srq);
 err_put:
-	uobj_put_obj_read(pd);
+	uobj_put_read(pduobj);
 
 err_put_cq:
 	if (ib_srq_has_cq(cmd->srq_type))
-		uobj_put_obj_read(attr.ext.cq);
+		uobj_put_read(cquobj);
 
 err_put_xrcd:
 	if (cmd->srq_type == IB_SRQT_XRC) {
@@ -3617,7 +3617,7 @@ static int ib_uverbs_modify_srq(struct uverbs_attr_bundle *attrs)
 	ret = srq->device->ops.modify_srq(srq, &attr, cmd.attr_mask,
 					  &attrs->driver_udata);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	return ret;
 }
@@ -3642,7 +3642,7 @@ static int ib_uverbs_query_srq(struct uverbs_attr_bundle *attrs)
 
 	ret = ib_query_srq(srq, &attr);
 
-	uobj_put_obj_read(srq);
+	uobj_put_read(srquobj);
 
 	if (ret)
 		return ret;
@@ -3769,7 +3769,7 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 
 	ret = rdma_set_cq_moderation(cq, cmd.attr.cq_count, cmd.attr.cq_period);
 
-	uobj_put_obj_read(cq);
+	uobj_put_read(cquobj);
 
 	return ret;
 }
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index 578e5c28bc1c..02641a3241e3 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -98,9 +98,6 @@ static inline void uobj_put_read(struct ib_uobject *uobj)
 	rdma_lookup_put_uobject(uobj, UVERBS_LOOKUP_READ);
 }
 
-#define uobj_put_obj_read(_obj)					\
-	uobj_put_read((_obj)->uobject)
-
 static inline void uobj_put_write(struct ib_uobject *uobj)
 {
 	rdma_lookup_put_uobject(uobj, UVERBS_LOOKUP_WRITE);
-- 
2.20.1



* [PATCH v1 03/24] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 02/24] RDMA/uverbs: Delete the macro uobj_put_obj_read Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 04/24] IB/{core,hw}: ib_pd should not have ib_uobject pointer Yuval Shaia
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

In the shared object model an ib_pd can belong to one or more
ib_ucontexts. Fix the nldev code so that it can report multiple context
ids.
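
With this change a shared PD is reported with a nested list of context
ids. A sketch of the resulting attribute layout, derived from
fill_res_pd_entry() below:

	RDMA_NLDEV_ATTR_RES_PDN
	RDMA_NLDEV_ATTR_RES_CTX			/* nested table */
	  RDMA_NLDEV_ATTR_RES_CTX_ENTRY		/* one entry per ucontext */
	    RDMA_NLDEV_ATTR_RES_CTXN		/* context id */
	  RDMA_NLDEV_ATTR_RES_CTX_ENTRY
	    RDMA_NLDEV_ATTR_RES_CTXN

A PD that belongs to a single context is still reported with a plain
RDMA_NLDEV_ATTR_RES_CTXN, as before.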

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/nldev.c  | 127 +++++++++++++++++++++++++++++--
 include/uapi/rdma/rdma_netlink.h |   3 +
 2 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index e287b71a1cfd..7ad23a6607f7 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -41,6 +41,7 @@
 #include "core_priv.h"
 #include "cma_priv.h"
 #include "restrack.h"
+#include "uverbs.h"
 
 /*
  * Sort array elements by the netlink attribute name
@@ -141,6 +142,8 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID]	= { .type = NLA_U32 },
 	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
 	[RDMA_NLDEV_SYS_ATTR_NETNS_MODE]	= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_CTX]		= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_CTX_ENTRY]		= { .type = NLA_NESTED },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -611,11 +614,84 @@ static int fill_res_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
 err:	return -EMSGSIZE;
 }
 
+struct context_id {
+	struct list_head list;
+	u32 id;
+};
+
+static void pd_context(struct ib_pd *pd, struct list_head *list, int *count)
+{
+	struct ib_device *device = pd->device;
+	struct rdma_restrack_entry *res;
+	struct rdma_restrack_root *rt;
+	struct ib_uverbs_file *ufile;
+	struct ib_ucontext *ucontext;
+	struct ib_uobject *uobj;
+	unsigned long flags;
+	unsigned long id;
+	bool found;
+
+	rt = &device->res[RDMA_RESTRACK_CTX];
+
+	xa_lock(&rt->xa);
+
+	xa_for_each(&rt->xa, id, res) {
+		if (!rdma_is_visible_in_pid_ns(res))
+			continue;
+
+		if (!rdma_restrack_get(res))
+			continue;
+
+		xa_unlock(&rt->xa);
+
+		ucontext = container_of(res, struct ib_ucontext, res);
+		ufile = ucontext->ufile;
+		found = false;
+
+		/* See locking requirements in struct ib_uverbs_file */
+		down_read(&ufile->hw_destroy_rwsem);
+		spin_lock_irqsave(&ufile->uobjects_lock, flags);
+
+		list_for_each_entry(uobj, &ufile->uobjects, list) {
+			if (uobj->object == pd) {
+				found = true;
+				goto found;
+			}
+		}
+
+found:		spin_unlock_irqrestore(&ufile->uobjects_lock, flags);
+		up_read(&ufile->hw_destroy_rwsem);
+
+		if (found) {
+			struct context_id *ctx_id =
+				kmalloc(sizeof(*ctx_id), GFP_KERNEL);
+
+			if (WARN_ON_ONCE(!ctx_id))
+				goto next;
+
+			ctx_id->id = ucontext->res.id;
+			list_add(&ctx_id->list, list);
+			(*count)++;
+		}
+
+next:		rdma_restrack_put(res);
+		xa_lock(&rt->xa);
+	}
+
+	xa_unlock(&rt->xa);
+}
+
 static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 			     struct rdma_restrack_entry *res, uint32_t port)
 {
 	struct ib_pd *pd = container_of(res, struct ib_pd, res);
 	struct ib_device *dev = pd->device;
+	struct nlattr *table_attr = NULL;
+	struct nlattr *entry_attr = NULL;
+	struct context_id *ctx_id;
+	struct context_id *tmp;
+	LIST_HEAD(pd_context_ids);
+	int ctx_count = 0;
 
 	if (has_cap_net_admin) {
 		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,
@@ -633,10 +709,38 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_PDN, res->id))
 		goto err;
 
-	if (!rdma_is_kernel_res(res) &&
-	    nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
-			pd->uobject->context->res.id))
-		goto err;
+	if (!rdma_is_kernel_res(res)) {
+		pd_context(pd, &pd_context_ids, &ctx_count);
+		if (ctx_count == 1) {
+			/* user pd, not shared */
+			ctx_id = list_first_entry(&pd_context_ids,
+						  struct context_id, list);
+			if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
+					ctx_id->id))
+				goto err;
+		} else if (ctx_count > 1) {
+			/* user pd, shared */
+			table_attr = nla_nest_start(msg,
+					RDMA_NLDEV_ATTR_RES_CTX);
+			if (!table_attr)
+				goto err;
+
+			list_for_each_entry(ctx_id, &pd_context_ids, list) {
+				entry_attr = nla_nest_start(msg,
+						RDMA_NLDEV_ATTR_RES_CTX_ENTRY);
+				if (!entry_attr)
+					goto err;
+				if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CTXN,
+						ctx_id->id))
+					goto err;
+				nla_nest_end(msg, entry_attr);
+				entry_attr = NULL;
+			}
+
+			nla_nest_end(msg, table_attr);
+			table_attr = NULL;
+		}
+	}
 
 	if (fill_res_name_pid(msg, res))
 		goto err;
@@ -644,9 +748,22 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 	if (fill_res_entry(dev, msg, res))
 		goto err;
 
+	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
+		kfree(ctx_id);
+
 	return 0;
 
-err:	return -EMSGSIZE;
+err:
+	if (entry_attr)
+		nla_nest_end(msg, entry_attr);
+
+	if (table_attr)
+		nla_nest_end(msg, table_attr);
+
+	list_for_each_entry_safe(ctx_id, tmp, &pd_context_ids, list)
+		kfree(ctx_id);
+
+	return -EMSGSIZE;
 }
 
 static int fill_stat_counter_mode(struct sk_buff *msg,
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 8e277783fa96..7fbbfb07f071 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -525,6 +525,9 @@ enum rdma_nldev_attr {
 	 */
 	RDMA_NLDEV_ATTR_DEV_DIM,                /* u8 */
 
+	RDMA_NLDEV_ATTR_RES_CTX,		/* nested table */
+	RDMA_NLDEV_ATTR_RES_CTX_ENTRY,		/* nested table */
+
 	/*
 	 * Always the end
 	 */
-- 
2.20.1



* [PATCH v1 04/24] IB/{core,hw}: ib_pd should not have ib_uobject pointer
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (2 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 03/24] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count Yuval Shaia
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

As a preparation step for shared PD, where an ib_pd object can be pointed
to by one or more ib_uobjects, remove the ib_uobject pointer from the
ib_pd struct.
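
Call sites that used the pointer to distinguish user objects from kernel
objects switch to the restrack helper instead, as in the mthca hunk
below:

	/* before */
	return pd->ibpd.uobject ? max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;

	/* after */
	return !rdma_is_kernel_res(&pd->ibpd.res) ?
		max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;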

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c       | 1 -
 drivers/infiniband/core/verbs.c            | 1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 1 -
 drivers/infiniband/hw/mlx5/main.c          | 1 -
 drivers/infiniband/hw/mthca/mthca_qp.c     | 3 ++-
 include/rdma/ib_verbs.h                    | 1 -
 6 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 204a93305f1c..d1f0c04f0ae8 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -432,7 +432,6 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	}
 
 	pd->device  = ib_dev;
-	pd->uobject = uobj;
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->res.type = RDMA_RESTRACK_PD;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f974b6854224..1d0215c1a504 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -263,7 +263,6 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		return ERR_PTR(-ENOMEM);
 
 	pd->device = device;
-	pd->uobject = NULL;
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->flags = flags;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 0ff5f9617639..bd4a09b2ec1e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -760,7 +760,6 @@ static int hns_roce_v1_rsv_lp_qp(struct hns_roce_dev *hr_dev)
 
 	free_mr->mr_free_pd = to_hr_pd(pd);
 	free_mr->mr_free_pd->ibpd.device  = &hr_dev->ib_dev;
-	free_mr->mr_free_pd->ibpd.uobject = NULL;
 	free_mr->mr_free_pd->ibpd.__internal_mr = NULL;
 	atomic_set(&free_mr->mr_free_pd->ibpd.usecnt, 0);
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 98e566acb746..93db6d4c7da4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4937,7 +4937,6 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 		return -ENOMEM;
 
 	devr->p0->device  = ibdev;
-	devr->p0->uobject = NULL;
 	atomic_set(&devr->p0->usecnt, 0);
 
 	ret = mlx5_ib_alloc_pd(devr->p0, NULL);
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index d04c245359eb..b1a9169e3af6 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -956,7 +956,8 @@ static int mthca_max_data_size(struct mthca_dev *dev, struct mthca_qp *qp, int d
 static inline int mthca_max_inline_data(struct mthca_pd *pd, int max_data_size)
 {
 	/* We don't support inline data for kernel QPs (yet). */
-	return pd->ibpd.uobject ? max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;
+	return !rdma_is_kernel_res(&pd->ibpd.res) ?
+		max_data_size - MTHCA_INLINE_HEADER_SIZE : 0;
 }
 
 static void mthca_adjust_qp_caps(struct mthca_dev *dev,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 391499008a22..7b429b2e7cf6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1509,7 +1509,6 @@ struct ib_pd {
 	u32			local_dma_lkey;
 	u32			flags;
 	struct ib_device       *device;
-	struct ib_uobject      *uobject;
 	atomic_t          	usecnt; /* count all resources */
 
 	u32			unsafe_global_rkey;
-- 
2.20.1



* [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (3 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 04/24] IB/{core,hw}: ib_pd should not have ib_uobject pointer Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:53   ` Jason Gunthorpe
  2019-08-21 14:21 ` [PATCH v1 06/24] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Yuval Shaia
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The new refcnt points to the refcnt member of the HW object and behaves
as a reference count is expected to: it is increased and decreased as
usage changes, and the object is destroyed when it reaches zero. For a
non-shared object refcnt remains NULL.
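
In terms of the destroy path changed below, the intent is roughly:

	/*
	 * Sketch of destroy_hw_idr_uobject() with a shared object:
	 *
	 *   if (uobj->refcnt && atomic_dec_return(uobj->refcnt) > 0)
	 *           skip the HW destroy, other ib_uobjects still use it;
	 *   else
	 *           destroy the HW object as before (and re-take the
	 *           reference if the destroy fails).
	 */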

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/rdma_core.c | 23 +++++++++++++++++++++--
 include/rdma/ib_verbs.h             |  7 +++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..651625f632d7 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -516,7 +516,26 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
 	const struct uverbs_obj_idr_type *idr_type =
 		container_of(uobj->uapi_object->type_attrs,
 			     struct uverbs_obj_idr_type, type);
-	int ret = idr_type->destroy_object(uobj, why, attrs);
+	static DEFINE_MUTEX(lock);
+	int ret, count;
+
+	mutex_lock(&lock);
+
+	if (uobj->refcnt) {
+		count = atomic_dec_return(uobj->refcnt);
+		WARN_ON(count < 0); /* use after free! */
+		if (count) {
+			mutex_unlock(&lock);
+			goto skip;
+		}
+	}
+
+	ret = idr_type->destroy_object(uobj, why, attrs);
+
+	if (ret)
+		atomic_inc(uobj->refcnt);
+
+	mutex_unlock(&lock);
 
 	/*
 	 * We can only fail gracefully if the user requested to destroy the
@@ -525,7 +544,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
 	 */
 	if (ib_is_destroy_retryable(ret, why, uobj))
 		return ret;
-
+skip:
 	if (why == RDMA_REMOVE_ABORT)
 		return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7b429b2e7cf6..7e69866fc419 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1496,6 +1496,13 @@ struct ib_uobject {
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 
 	const struct uverbs_api_object *uapi_object;
+
+	/*
+	 * ib_X HW object sharing support
+	 * - NULL for HW objects that are not shareable
+	 * - Pointer to ib_X reference counter for shareable HW objects
+	 */
+	atomic_t	       *refcnt;		/* ib_X object ref count */
 };
 
 struct ib_udata {
-- 
2.20.1



* [PATCH v1 06/24] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (4 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper Yuval Shaia
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Add a helper function to initialize the ufile member of struct
uverbs_attr_bundle.

The helper produces a uverbs_attr_bundle that is safe for the operations
below on the given ufile:

- uobj_get_[read|write]
- uobj_alloc_commit
- uobj_alloc_abort

The last two ops are more complicated. Abort, for example, triggers the
call chain below, which passes the uverbs_attr_bundle driver_udata down
to the driver level:

uobj_alloc_abort
 rdma_alloc_abort_uobject
  uverbs_destroy_uobject
   uobj->uapi_object->type_class->destroy_hw
    destroy_hw_idr_uobject
     idr_type->destroy_object
      uverbs_free_pd
       ib_dealloc_pd_user
        pd->device->ops.dealloc_pd
	 mlx4_ib_dealloc_pd

For more information about the potential issues, please see
commit f89adedaf3fe ("RDMA/uverbs: Initialize udata struct
on destroy flows")
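
A minimal usage sketch (peer_ufile and pd_handle are illustrative names;
the next patch in the series uses the helper in exactly this way to look
up objects on another process's ufile):

	struct uverbs_attr_bundle fd_attrs;
	struct ib_uobject *uobj;

	uverbs_init_attrs_ufile(&fd_attrs, peer_ufile);
	uobj = uobj_get_read(UVERBS_OBJECT_PD, pd_handle, &fd_attrs);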

Cc: Gal Pressman <galpress@amazon.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index d1f0c04f0ae8..4f42f9732dca 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3773,6 +3773,24 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
 	return ret;
 }
 
+/**
+ * uverbs_init_attrs_ufile - Helper function to create minimal
+ * uverbs_attr_bundle out of ib_uverbs_file that is suitable
+ * for the below operations:
+ *
+ * - uobj_get_[read|write]
+ * - uobj_alloc_commit
+ * - uobj_alloc_abort
+ */
+static void uverbs_init_attrs_ufile(struct uverbs_attr_bundle *attrs_bundle,
+				    struct ib_uverbs_file *ufile)
+{
+	*attrs_bundle = (struct uverbs_attr_bundle) {
+		.ufile = ufile,
+		.context = ufile->ucontext,
+	};
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
-- 
2.20.1



* [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (5 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 06/24] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:57   ` Jason Gunthorpe
  2019-08-21 14:21 ` [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback Yuval Shaia
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

The lock/unlock helpers will be used in every import verb.
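
A sketch of how an import verb is expected to use the pair (the actual
import verbs are added later in the series; attrs, cmd.fd and
cmd.pd_handle are illustrative names):

	struct ib_uverbs_file *src_ufile;
	struct ib_uobject *src_uobj;
	struct file *src_filp;
	int ret;

	ret = ib_uverbs_import_lock(attrs, cmd.fd, UVERBS_OBJECT_PD,
				    cmd.pd_handle, &src_uobj, &src_filp,
				    &src_ufile);
	if (ret)
		return ret;

	/* ... clone the PD and install it in the importing context ... */

	ib_uverbs_import_unlock(src_uobj, src_filp);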

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs.h      |  2 +
 drivers/infiniband/core/uverbs_cmd.c  | 73 +++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  1 +
 3 files changed, 76 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 1e5aeb39f774..cf76336cb460 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -163,6 +163,8 @@ struct ib_uverbs_file {
 	struct page *disassociate_page;
 
 	struct xarray		idr;
+
+	struct file	       *filp;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 4f42f9732dca..21f0a1a986f4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -43,6 +43,7 @@
 
 #include <rdma/uverbs_types.h>
 #include <rdma/uverbs_std_types.h>
+#include <rdma/uverbs_ioctl.h>
 #include "rdma_core.h"
 
 #include "uverbs.h"
@@ -3791,6 +3792,78 @@ static void uverbs_init_attrs_ufile(struct uverbs_attr_bundle *attrs_bundle,
 	};
 }
 
+/* ib_uverbs_import_lock - Function which gathers code that is
+ *	common in the import verbs.
+ *
+ *	This function guarantees that both source and destination files are
+ *	protected from race with vfs close. The current file is protected
+ *	from such race because verb is executed in a system-call context.
+ *	The other file is protected by 'fget'. This function also ensures
+ *	that ib_uobject identified by the type & handle is locked for read.
+ *
+ *	Callers of this helper must also call ib_uverbs_import_unlock
+ *	to undo any locking performed by this helper.
+ */
+static int ib_uverbs_import_lock(struct uverbs_attr_bundle *attrs,
+				 int fd, u16 type, u32 handle,
+				 struct ib_uobject **uobj,
+				 struct file **filep,
+				 struct ib_uverbs_file **ufile)
+{
+	struct ib_uverbs_file *file = attrs->ufile;
+	struct ib_uverbs_device *dev = file->device;
+	struct uverbs_attr_bundle fd_attrs;
+	struct ib_uverbs_device *fd_dev;
+	int ret = 0;
+
+	*filep = fget(fd);
+	if (!*filep)
+		return -EINVAL;
+
+	/* check uverbs ops exist */
+	if ((*filep)->f_op != file->filp->f_op) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	*ufile = (*filep)->private_data;
+	fd_dev = (*ufile)->device;
+
+	/* check that both files belong to same ib_device */
+	if (dev != fd_dev) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	uverbs_init_attrs_ufile(&fd_attrs, *ufile);
+
+	*uobj = uobj_get_read(type, handle, &fd_attrs);
+	if (IS_ERR(*uobj)) {
+		ret = -EINVAL;
+		goto file;
+	}
+
+	/* verify ib_object is shareable */
+	if (!(*uobj)->refcnt) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	return 0;
+uobj:
+	uobj_put_read(*uobj);
+file:
+	fput(*filep);
+	return ret;
+}
+
+static void ib_uverbs_import_unlock(struct ib_uobject *uobj,
+				    struct file *filep)
+{
+	uobj_put_read(uobj);
+	fput(filep);
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 02b57240176c..e42a9b5c38b2 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -1095,6 +1095,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	mutex_init(&file->umap_lock);
 	INIT_LIST_HEAD(&file->umaps);
 
+	file->filp = filp;
 	filp->private_data = file;
 	list_add_tail(&file->list, &dev->uverbs_file_list);
 	mutex_unlock(&dev->lists_mutex);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (6 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:59   ` Jason Gunthorpe
  2019-08-21 14:21 ` [PATCH v1 09/24] IB/core: Install clone ib_pd in device ops Yuval Shaia
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Define a prototype for the clone callback. The clone callback is used
by the driver layer to give uverbs a way to clone IB HW object driver
data to the rdma-core user-space provider. The clone callback is
invoked when a new IB HW object is created and every time the object
is imported into some ib_ucontext. Drivers that wish to enable sharing
of an IB HW object type (ib_pd, ib_mr, etc.) must supply a valid clone
callback for that type.
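
As a rough illustration only (the driver name, the drv_pd struct and
the to_drv_pd() helper are hypothetical), a driver that wants to share
PDs is expected to provide a callback of this shape and install it in
its ib_device_ops:

	static int drv_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
	{
		struct drv_pd *pd = to_drv_pd(ibpd);

		/* copy the driver-specific PD data (e.g. the PD number) */
		return udata ?
			ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32)) : 0;
	}

	static const struct ib_device_ops drv_dev_ops = {
		/* ... other callbacks ... */
		.clone_ib_pd = drv_clone_pd,
	};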

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 include/rdma/ib_verbs.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7e69866fc419..542b3cb2d943 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2265,6 +2265,18 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+/*
+ * Prototype for IB HW object clone callback
+ *
+ * Define a prototype for the clone callback. The clone callback is
+ * used by the driver layer to give uverbs a way to clone IB HW
+ * object driver data to the rdma-core user-space provider. It is
+ * invoked when a new IB HW object is created and every time the
+ * object is imported into some ib_ucontext.
+ */
+#define clone_callback(ib_type)		\
+	int (*clone_##ib_type)(struct ib_udata *udata, struct ib_type *obj)
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2575,6 +2587,9 @@ struct ib_device_ops {
 	 */
 	int (*counter_update_stats)(struct rdma_counter *counter);
 
+	/* Object sharing callbacks */
+	clone_callback(ib_pd);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
@@ -2582,6 +2597,17 @@ struct ib_device_ops {
 	DECLARE_RDMA_OBJ_SIZE(ib_ucontext);
 };
 
+/* Implementation of trivial clone callback */
+#define trivial_clone_callback(ib_type)					\
+static inline int trivial_clone_##ib_type(struct ib_udata *udata,	\
+					  struct ib_type *obj)		\
+{									\
+	return 0;							\
+}
+
+/* Shared IB HW object support */
+trivial_clone_callback(ib_pd);
+
 struct ib_core_device {
 	/* device must be the first element in structure until,
 	 * union of ib_core_device and device exists in ib_device.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 09/24] IB/core: Install clone ib_pd in device ops
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (7 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

Install clone_ib_pd in the device ops.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8892862fb759..8d90e429441f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2626,6 +2626,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, set_vf_guid);
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
+	SET_DEVICE_OP(dev_ops, clone_ib_pd);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_cq);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (8 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 09/24] IB/core: Install clone ib_pd in device ops Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:59   ` Jason Gunthorpe
  2019-08-21 14:21 ` [PATCH v1 11/24] IB/mlx5: " Yuval Shaia
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Copy the mlx4-specific ib_pd data (the pdn) to user-space. The copy is
factored out of mlx4_ib_alloc_pd so the new clone_pd callback is
reused there as well.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx4/main.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8d2f1e38b891..6baf52d988ed 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1179,6 +1179,13 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 	}
 }
 
+static int mlx4_ib_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
+{
+	struct mlx4_ib_pd *pd = to_mpd(ibpd);
+
+	return udata ? ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32)) : 0;
+}
+
 static int mlx4_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct mlx4_ib_pd *pd = to_mpd(ibpd);
@@ -1189,10 +1196,12 @@ static int mlx4_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 	if (err)
 		return err;
 
-	if (udata && ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32))) {
+	err = mlx4_ib_clone_pd(udata, ibpd);
+	if (err) {
 		mlx4_pd_free(to_mdev(ibdev)->dev, pd->pdn);
 		return -EFAULT;
 	}
+
 	return 0;
 }
 
@@ -2565,6 +2574,9 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 	.rereg_user_mr = mlx4_ib_rereg_user_mr,
 	.resize_cq = mlx4_ib_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = mlx4_ib_clone_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx4_ib_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, mlx4_ib_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 11/24] IB/mlx5: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (9 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 12/24] RDMA/rxe: " Yuval Shaia
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Copy the mlx5-specific ib_pd data (the pdn) to user-space. The copy is
factored out of mlx5_ib_alloc_pd so the new clone_pd callback is
reused there as well.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx5/main.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 93db6d4c7da4..63beee644b46 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2464,11 +2464,24 @@ int mlx5_ib_dealloc_dm(struct ib_dm *ibdm, struct uverbs_attr_bundle *attrs)
 	return 0;
 }
 
+static int mlx5_ib_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
+{
+	struct mlx5_ib_pd *pd = to_mpd(ibpd);
+	struct mlx5_ib_alloc_pd_resp resp;
+	int ret = 0;
+
+	if (udata) {
+		resp.pdn = pd->pdn;
+		ret = ib_copy_to_udata(udata, &resp, sizeof(resp));
+	}
+
+	return ret;
+}
+
 static int mlx5_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct mlx5_ib_pd *pd = to_mpd(ibpd);
 	struct ib_device *ibdev = ibpd->device;
-	struct mlx5_ib_alloc_pd_resp resp;
 	int err;
 	u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(alloc_pd_in)]   = {};
@@ -2486,12 +2499,11 @@ static int mlx5_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 
 	pd->pdn = MLX5_GET(alloc_pd_out, out, pd);
 	pd->uid = uid;
-	if (udata) {
-		resp.pdn = pd->pdn;
-		if (ib_copy_to_udata(udata, &resp, sizeof(resp))) {
-			mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid);
-			return -EFAULT;
-		}
+
+	err = mlx5_ib_clone_pd(udata, ibpd);
+	if (err) {
+		mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid);
+		return err;
 	}
 
 	return 0;
@@ -6338,6 +6350,9 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
 	.resize_cq = mlx5_ib_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = mlx5_ib_clone_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx5_ib_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, mlx5_ib_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 12/24] RDMA/rxe: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (10 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 11/24] IB/mlx5: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 13/24] IB/uverbs: Add clone reference counting to ib_pd Yuval Shaia
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

There is no rxe-specific ib_pd data to copy to user-space, so wire up
the trivial clone callback.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 623129f27f5a..415ac9d3b504 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1148,6 +1148,9 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.req_notify_cq = rxe_req_notify_cq,
 	.resize_cq = rxe_resize_cq,
 
+	/* Object sharing callbacks */
+	.clone_ib_pd = trivial_clone_ib_pd,
+
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
 	INIT_RDMA_OBJ_SIZE(ib_pd, rxe_pd, ibpd),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 13/24] IB/uverbs: Add clone reference counting to ib_pd
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (11 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 12/24] RDMA/rxe: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 14/24] IB/uverbs: Add PD import verb Yuval Shaia
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

To keep track of the shared object's lifetime, add a reference count
to ib_pd.
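
The intended counting scheme, sketched here for clarity (the increment
is done by the import verb added later in the series, the decrement by
the uobject destroy path):

	/* alloc_pd: the creating uobject holds the first reference */
	atomic_set(&pd->refcnt, 1);
	uobj->refcnt = &pd->refcnt;

	/* import: every importing uobject takes another reference */
	WARN_ON(!atomic_inc_not_zero(uobj->refcnt));

	/* destroy: only the last uobject releases the HW object */
	if (atomic_dec_return(uobj->refcnt) == 0)
		; /* call the driver's dealloc for the ib_pd */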

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 3 +++
 include/rdma/ib_verbs.h              | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 21f0a1a986f4..5800d7e0992a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -436,12 +436,15 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->res.type = RDMA_RESTRACK_PD;
+	/* number of uobj using this ib_pd */
+	atomic_set(&pd->refcnt, 1);
 
 	ret = ib_dev->ops.alloc_pd(pd, &attrs->driver_udata);
 	if (ret)
 		goto err_alloc;
 
 	uobj->object = pd;
+	uobj->refcnt = &pd->refcnt;
 	memset(&resp, 0, sizeof resp);
 	resp.pd_handle = uobj->id;
 	rdma_restrack_uadd(&pd->res);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 542b3cb2d943..e3ce38aa89b6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1525,6 +1525,9 @@ struct ib_pd {
 	 */
 	struct ib_mr	       *__internal_mr;
 	struct rdma_restrack_entry res;
+
+	/* number of uobj using this ib_pd */
+	atomic_t	      refcnt;
 };
 
 struct ib_xrcd {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 14/24] IB/uverbs: Add PD import verb
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (12 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 13/24] IB/uverbs: Add clone reference counting to ib_pd Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 15:00   ` Jason Gunthorpe
  2019-08-21 14:21 ` [PATCH v1 15/24] IB/mlx4: Enable import from FD verb Yuval Shaia
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Implement a new verb that enables importing an ib_pd object.
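
From user-space the command is expected to be driven roughly as below
(raw write() ABI shown only as a sketch; exporter_uverbs_fd and
exporter_pd_handle are placeholders, and how the exporting context's
FD reaches the importer - e.g. via SCM_RIGHTS - is outside the scope
of this patch):

	union ib_uverbs_import_fr_fd_resp resp;
	struct ib_uverbs_import_fr_fd cmd = {
		.response = (__u64)(uintptr_t)&resp,
		.fd       = exporter_uverbs_fd,  /* uverbs cmd FD of the exporter */
		.handle   = exporter_pd_handle,  /* PD handle valid in that context */
		.type     = UVERBS_OBJECT_PD,
	};

	/* issue IB_USER_VERBS_CMD_IMPORT_FR_FD with cmd; on success
	 * resp.alloc_pd.pd_handle is the PD handle in the importing
	 * context
	 */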

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 95 ++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_verbs.h    | 14 ++++
 2 files changed, 109 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5800d7e0992a..08f39adb3a9d 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3867,6 +3867,96 @@ static void ib_uverbs_import_unlock(struct ib_uobject *uobj,
 	fput(filep);
 }
 
+static void ib_uverbs_clone_ib_pd(union ib_uverbs_import_fr_fd_resp *resp,
+				  struct ib_pd *pd,
+				  __u32 handle)
+{
+	resp->alloc_pd.pd_handle = handle;
+}
+
+static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
+{
+	union ib_uverbs_import_fr_fd_resp resp = {0};
+	struct ib_uobject *src_uobj = NULL;
+	struct ib_uobject *dst_uobj = NULL;
+	struct ib_uverbs_import_fr_fd cmd;
+	struct ib_uverbs_file *src_file;
+	struct ib_device *ibdev;
+	struct ib_pd *pd;
+	struct file *filep;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	ret = ib_uverbs_import_lock(attrs, cmd.fd, UVERBS_OBJECT_PD,
+				    cmd.handle, &src_uobj, &filep,
+				    &src_file);
+	if (ret)
+		return ret;
+
+	pd = src_uobj->object;
+
+	dst_uobj = uobj_alloc(UVERBS_OBJECT_PD, attrs, &ibdev);
+	if (IS_ERR(dst_uobj)) {
+		ret = -ENOMEM;
+		goto uobj;
+	}
+
+	if (!ibdev->ops.clone_ib_pd) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	dst_uobj->object = src_uobj->object;
+	dst_uobj->refcnt = src_uobj->refcnt;
+
+	if (WARN_ON(!atomic_inc_not_zero(src_uobj->refcnt))) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	ret = ibdev->ops.clone_ib_pd(&attrs->driver_udata, pd);
+	if (ret)
+		goto uobj;
+
+	ib_uverbs_clone_ib_pd(&resp, dst_uobj->object, dst_uobj->id);
+
+	ret = uverbs_response(attrs, &resp, sizeof(resp));
+	if (ret)
+		goto uobj;
+
+	ret = uobj_alloc_commit(dst_uobj, attrs);
+
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	return ret;
+
+uobj:
+	ib_uverbs_import_unlock(src_uobj, filep);
+	if (!IS_ERR_OR_NULL(dst_uobj))
+		uobj_alloc_abort(dst_uobj, attrs);
+	return ret;
+}
+
+static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uverbs_import_fr_fd cmd;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	switch (cmd.type) {
+	case UVERBS_OBJECT_PD:
+		return ib_uverbs_import_ib_pd(attrs);
+	}
+
+	return -EINVAL;
+}
+
 /*
  * Describe the input structs for write(). Some write methods have an input
  * only struct, most have an input and output. If the struct has an output then
@@ -3999,6 +4089,11 @@ const struct uapi_definition uverbs_def_write_intf[] = {
 			UAPI_DEF_WRITE_IO(struct ib_uverbs_query_port,
 					  struct ib_uverbs_query_port_resp),
 			UAPI_DEF_METHOD_NEEDS_FN(query_port)),
+		DECLARE_UVERBS_WRITE(
+			IB_USER_VERBS_CMD_IMPORT_FR_FD,
+			ib_uverbs_import_fr_fd,
+			UAPI_DEF_WRITE_IO(struct ib_uverbs_import_fr_fd,
+					  union  ib_uverbs_import_fr_fd_resp)),
 		DECLARE_UVERBS_WRITE_EX(
 			IB_USER_VERBS_EX_CMD_QUERY_DEVICE,
 			ib_uverbs_ex_query_device,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 0474c7400268..4274c40bcff8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -88,6 +88,7 @@ enum ib_uverbs_write_cmds {
 	IB_USER_VERBS_CMD_CLOSE_XRCD,
 	IB_USER_VERBS_CMD_CREATE_XSRQ,
 	IB_USER_VERBS_CMD_OPEN_QP,
+	IB_USER_VERBS_CMD_IMPORT_FR_FD,
 };
 
 enum {
@@ -1299,6 +1300,19 @@ struct ib_uverbs_ex_modify_cq {
 	__u32 reserved;
 };
 
+/* object sharing support */
+struct ib_uverbs_import_fr_fd {
+	__u64 response;
+	__u32 fd;
+	__u32 handle;
+	__u16 type;
+	__u8  reserved[6];
+};
+
+union ib_uverbs_import_fr_fd_resp {
+	struct ib_uverbs_alloc_pd_resp alloc_pd;
+};
+
 #define IB_DEVICE_NAME_MAX 64
 
 #endif /* IB_USER_VERBS_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 15/24] IB/mlx4: Enable import from FD verb
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (13 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 14/24] IB/uverbs: Add PD import verb Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 16/24] IB/mlx5: " Yuval Shaia
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on the IB_USER_VERBS_CMD_IMPORT_FR_FD bit in the uverbs command
mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx4/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 6baf52d988ed..d0eef7e2d5a3 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2693,7 +2693,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_CREATE_XSRQ)		|
-		(1ull << IB_USER_VERBS_CMD_OPEN_QP);
+		(1ull << IB_USER_VERBS_CMD_OPEN_QP)		|
+		(1ull << IB_USER_VERBS_CMD_IMPORT_FR_FD);
 
 	ib_set_device_ops(&ibdev->ib_dev, &mlx4_ib_dev_ops);
 	ibdev->ib_dev.uverbs_ex_cmd_mask |=
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 16/24] IB/mlx5: Enable import from FD verb
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (14 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 15/24] IB/mlx4: Enable import from FD verb Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 17/24] RDMA/rxe: " Yuval Shaia
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on the IB_USER_VERBS_CMD_IMPORT_FR_FD bit in the uverbs command
mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/hw/mlx5/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 63beee644b46..89ba3330d2ef 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6423,7 +6423,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_CREATE_XSRQ)		|
-		(1ull << IB_USER_VERBS_CMD_OPEN_QP);
+		(1ull << IB_USER_VERBS_CMD_OPEN_QP)		|
+		(1ull << IB_USER_VERBS_CMD_IMPORT_FR_FD);
 	dev->ib_dev.uverbs_ex_cmd_mask =
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE)	|
 		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ)	|
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 17/24] RDMA/rxe: Enable import from FD verb
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (15 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 16/24] IB/mlx5: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 18/24] IB/core: ib_mr should not have ib_uobject pointer Yuval Shaia
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma
  Cc: Shamir Rabinovitch

From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

Turn on the IB_USER_VERBS_CMD_IMPORT_FR_FD bit in the uverbs command
mask.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 415ac9d3b504..3c4a0b89f28f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1208,6 +1208,7 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	    | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_AH)
 	    | BIT_ULL(IB_USER_VERBS_CMD_ATTACH_MCAST)
 	    | BIT_ULL(IB_USER_VERBS_CMD_DETACH_MCAST)
+	    | BIT_ULL(IB_USER_VERBS_CMD_IMPORT_FR_FD)
 	    ;
 
 	ib_set_device_ops(dev, &rxe_dev_ops);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 18/24] IB/core: ib_mr should not have ib_uobject pointer
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (16 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 17/24] RDMA/rxe: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 19/24] IB/core: Install clone ib_mr in device ops Yuval Shaia
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

As a preparation step for shared MR, where an ib_mr object may be
pointed to by more than one ib_uobject, remove the ib_uobject pointer
from the ib_mr struct.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c          | 1 -
 drivers/infiniband/core/uverbs_std_types_mr.c | 1 -
 drivers/infiniband/core/verbs.c               | 3 ---
 include/rdma/ib_verbs.h                       | 3 ++-
 4 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 08f39adb3a9d..25eab9dc9cad 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -765,7 +765,6 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	mr->type    = IB_MR_TYPE_USER;
 	mr->dm	    = NULL;
 	mr->sig_attrs = NULL;
-	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
 	mr->res.type = RDMA_RESTRACK_MR;
 	rdma_restrack_uadd(&mr->res);
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index c1286a52dc84..5219af8960a3 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -130,7 +130,6 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	mr->pd      = pd;
 	mr->type    = IB_MR_TYPE_DM;
 	mr->dm      = dm;
-	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
 	atomic_inc(&dm->usecnt);
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d0215c1a504..a7722d54869e 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -299,7 +299,6 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		mr->device	= pd->device;
 		mr->pd		= pd;
 		mr->type        = IB_MR_TYPE_DMA;
-		mr->uobject	= NULL;
 		mr->need_inval	= false;
 
 		pd->__internal_mr = mr;
@@ -2035,7 +2034,6 @@ struct ib_mr *ib_alloc_mr_user(struct ib_pd *pd, enum ib_mr_type mr_type,
 		mr->device  = pd->device;
 		mr->pd      = pd;
 		mr->dm      = NULL;
-		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
 		mr->need_inval = false;
 		mr->res.type = RDMA_RESTRACK_MR;
@@ -2088,7 +2086,6 @@ struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
 	mr->device = pd->device;
 	mr->pd = pd;
 	mr->dm = NULL;
-	mr->uobject = NULL;
 	atomic_inc(&pd->usecnt);
 	mr->need_inval = false;
 	mr->res.type = RDMA_RESTRACK_MR;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e3ce38aa89b6..877932305ea7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1785,7 +1785,6 @@ struct ib_mr {
 	enum ib_mr_type	   type;
 	bool		   need_inval;
 	union {
-		struct ib_uobject	*uobject;	/* user */
 		struct list_head	qp_entry;	/* FR */
 	};
 
@@ -2592,6 +2591,7 @@ struct ib_device_ops {
 
 	/* Object sharing callbacks */
 	clone_callback(ib_pd);
+	clone_callback(ib_mr);
 
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
@@ -2610,6 +2610,7 @@ static inline int trivial_clone_##ib_type(struct ib_udata *udata,	\
 
 /* Shared IB HW object support */
 trivial_clone_callback(ib_pd);
+trivial_clone_callback(ib_mr);
 
 struct ib_core_device {
 	/* device must be the first element in structure until,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 19/24] IB/core: Install clone ib_mr in device ops
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (17 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 18/24] IB/core: ib_mr should not have ib_uobject pointer Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 20/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

Install clone_ib_mr in the device ops.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8d90e429441f..e2e3ed4a3d0a 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2627,6 +2627,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
 	SET_DEVICE_OP(dev_ops, clone_ib_pd);
+	SET_DEVICE_OP(dev_ops, clone_ib_mr);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_cq);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 20/24] IB/mlx4: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (18 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 19/24] IB/core: Install clone ib_mr in device ops Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 21/24] IB/mlx5: " Yuval Shaia
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

No special handling is needed for object cloning, so utilize the
trivial implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/hw/mlx4/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index d0eef7e2d5a3..3192284a85ca 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2576,6 +2576,7 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = mlx4_ib_clone_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx4_ib_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 21/24] IB/mlx5: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (19 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 20/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 22/24] RDMA/rxe: " Yuval Shaia
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

No special handling is needed for object cloning, so utilize the
trivial implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/hw/mlx5/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 89ba3330d2ef..79eef45167bd 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6352,6 +6352,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = mlx5_ib_clone_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mlx5_ib_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 22/24] RDMA/rxe: Add implementation of clone_pd callback
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (20 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 21/24] IB/mlx5: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 23/24] IB/uverbs: Add clone reference counting to ib_mr Yuval Shaia
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

No special handling is needed for object cloning, so utilize the
trivial implementation.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 3c4a0b89f28f..ec2cade76372 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1150,6 +1150,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 
 	/* Object sharing callbacks */
 	.clone_ib_pd = trivial_clone_ib_pd,
+	.clone_ib_mr = trivial_clone_ib_mr,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 23/24] IB/uverbs: Add clone reference counting to ib_mr
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (21 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 22/24] RDMA/rxe: " Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:21 ` [PATCH v1 24/24] IB/uverbs: Add MR import verb Yuval Shaia
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

To keep track of the shared object's lifetime, add a reference count
to ib_mr.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 2 ++
 include/rdma/ib_verbs.h              | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 25eab9dc9cad..92f04e7360b6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -768,8 +768,10 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	atomic_inc(&pd->usecnt);
 	mr->res.type = RDMA_RESTRACK_MR;
 	rdma_restrack_uadd(&mr->res);
+	atomic_set(&mr->refcnt, 1);
 
 	uobj->object = mr;
+	uobj->refcnt = &mr->refcnt;
 
 	memset(&resp, 0, sizeof resp);
 	resp.lkey      = mr->lkey;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 877932305ea7..80fd78b3b654 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1794,6 +1794,9 @@ struct ib_mr {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+
+	/* number of uobj using this ib_mr */
+	atomic_t	      refcnt;
 };
 
 struct ib_mw {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 24/24] IB/uverbs: Add MR import verb
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (22 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 23/24] IB/uverbs: Add clone reference counting to ib_mr Yuval Shaia
@ 2019-08-21 14:21 ` Yuval Shaia
  2019-08-21 14:50 ` [PATCH v1 00/24] Shared PD and MR Jason Gunthorpe
  2019-08-21 23:37 ` Ira Weiny
  25 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-21 14:21 UTC (permalink / raw)
  To: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, yuval.shaia, dan.carpenter, bvanassche, maxg, israelr,
	galpress, denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

Implement a new verb that enables importing an ib_mr object.
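
The flow mirrors the PD import verb; the user-visible difference is the
response, which reuses the reg_mr layout so that the importer learns
the keys and the new handle of the shared MR (sketch only):

	/* after a successful import with .type = UVERBS_OBJECT_MR */
	new_mr_handle = resp.reg_mr.mr_handle;
	lkey          = resp.reg_mr.lkey;
	rkey          = resp.reg_mr.rkey;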

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 79 ++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_verbs.h    |  1 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 92f04e7360b6..5aaa6a7e129f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3875,6 +3875,15 @@ static void ib_uverbs_clone_ib_pd(union ib_uverbs_import_fr_fd_resp *resp,
 	resp->alloc_pd.pd_handle = handle;
 }
 
+static void ib_uverbs_clone_ib_mr(union ib_uverbs_import_fr_fd_resp *resp,
+				  struct ib_mr *mr,
+				  __u32 handle)
+{
+	resp->reg_mr.lkey = mr->lkey;
+	resp->reg_mr.rkey = mr->rkey;
+	resp->reg_mr.mr_handle = handle;
+}
+
 static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
 {
 	union ib_uverbs_import_fr_fd_resp resp = {0};
@@ -3941,6 +3950,74 @@ static int ib_uverbs_import_ib_pd(struct uverbs_attr_bundle *attrs)
 	return ret;
 }
 
+static int ib_uverbs_import_ib_mr(struct uverbs_attr_bundle *attrs)
+{
+	union ib_uverbs_import_fr_fd_resp resp = {0};
+	struct ib_uobject *src_uobj = NULL;
+	struct ib_uobject *dst_uobj = NULL;
+	struct ib_uverbs_import_fr_fd cmd;
+	struct ib_uverbs_file *src_file;
+	struct ib_device *ibdev;
+	struct ib_mr *mr;
+	struct file *filep;
+	int ret;
+
+	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
+	if (ret)
+		return ret;
+
+	ret = ib_uverbs_import_lock(attrs, cmd.fd, UVERBS_OBJECT_MR,
+				    cmd.handle, &src_uobj, &filep,
+				    &src_file);
+	if (ret)
+		return ret;
+
+	mr = src_uobj->object;
+
+	dst_uobj = uobj_alloc(UVERBS_OBJECT_MR, attrs, &ibdev);
+	if (IS_ERR(dst_uobj)) {
+		ret = -ENOMEM;
+		goto uobj;
+	}
+
+	if (!ibdev->ops.clone_ib_mr) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	dst_uobj->object = src_uobj->object;
+	dst_uobj->refcnt = src_uobj->refcnt;
+
+	if (WARN_ON(!atomic_inc_not_zero(src_uobj->refcnt))) {
+		ret = -EINVAL;
+		goto uobj;
+	}
+
+	ret = ibdev->ops.clone_ib_mr(&attrs->driver_udata, mr);
+	if (ret)
+		goto uobj;
+
+	ib_uverbs_clone_ib_mr(&resp, dst_uobj->object, dst_uobj->id);
+
+	ret = uverbs_response(attrs, &resp, sizeof(resp));
+	if (ret)
+		goto uobj;
+
+	ret = uobj_alloc_commit(dst_uobj, attrs);
+
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	return ret;
+
+uobj:
+	ib_uverbs_import_unlock(src_uobj, filep);
+
+	if (!IS_ERR_OR_NULL(dst_uobj))
+		uobj_alloc_abort(dst_uobj, attrs);
+
+	return ret;
+}
+
 static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
 {
 	struct ib_uverbs_import_fr_fd cmd;
@@ -3953,6 +4030,8 @@ static int ib_uverbs_import_fr_fd(struct uverbs_attr_bundle *attrs)
 	switch (cmd.type) {
 	case UVERBS_OBJECT_PD:
 		return ib_uverbs_import_ib_pd(attrs);
+	case UVERBS_OBJECT_MR:
+		return ib_uverbs_import_ib_mr(attrs);
 	}
 
 	return -EINVAL;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 4274c40bcff8..cfd1a734d5f3 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -1311,6 +1311,7 @@ struct ib_uverbs_import_fr_fd {
 
 union ib_uverbs_import_fr_fd_resp {
 	struct ib_uverbs_alloc_pd_resp alloc_pd;
+	struct ib_uverbs_reg_mr_resp reg_mr;
 };
 
 #define IB_DEVICE_NAME_MAX 64
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (23 preceding siblings ...)
  2019-08-21 14:21 ` [PATCH v1 24/24] IB/uverbs: Add MR import verb Yuval Shaia
@ 2019-08-21 14:50 ` Jason Gunthorpe
  2019-08-22  8:50   ` Yuval Shaia
  2019-08-21 23:37 ` Ira Weiny
  25 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 14:50 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma

On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> Following patch-set introduce the shared object feature.
> 
> A shared object feature allows one process to create HW objects (currently
> PD and MR) so that a second process can import.
> 
> Patch-set is logically splits to 4 parts as the following:
> - patches 1 to 7 and 18 are preparation steps.
> - patches 8 to 14 are the implementation of import PD
> - patches 15 to 17 are the implementation of the verb
> - patches 19 to 24 are the implementation of import MR

This is way too big. 10-14 patches at most in a series.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count
  2019-08-21 14:21 ` [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count Yuval Shaia
@ 2019-08-21 14:53   ` Jason Gunthorpe
  2019-08-27 16:28     ` Yuval Shaia
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 14:53 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 05:21:06PM +0300, Yuval Shaia wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> 
> This new refcnt will points to the refcnt member of the HW object and will
> behaves as expected by refcnt, i.e. will be increased and decreased as a
> result of usage changes and will destroy the object when reaches to zero.
> For a non-shared object refcnt will remain NULL.
> 
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
>  drivers/infiniband/core/rdma_core.c | 23 +++++++++++++++++++++--
>  include/rdma/ib_verbs.h             |  7 +++++++
>  2 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..651625f632d7 100644
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -516,7 +516,26 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
>  	const struct uverbs_obj_idr_type *idr_type =
>  		container_of(uobj->uapi_object->type_attrs,
>  			     struct uverbs_obj_idr_type, type);
> -	int ret = idr_type->destroy_object(uobj, why, attrs);
> +	static DEFINE_MUTEX(lock);
> +	int ret, count;
> +
> +	mutex_lock(&lock);
> +
> +	if (uobj->refcnt) {
> +		count = atomic_dec_return(uobj->refcnt);
> +		WARN_ON(count < 0); /* use after free! */

Use a proper refcount_t

> +		if (count) {
> +			mutex_unlock(&lock);
> +			goto skip;
> +		}
> +	}
> +
> +	ret = idr_type->destroy_object(uobj, why, attrs);
> +
> +	if (ret)
> +		atomic_inc(uobj->refcnt);
> +
> +	mutex_unlock(&lock);
>  
>  	/*
>  	 * We can only fail gracefully if the user requested to destroy the
> @@ -525,7 +544,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
>  	 */
>  	if (ib_is_destroy_retryable(ret, why, uobj))
>  		return ret;
> -
> +skip:

This doesn't seem to properly define who owns the rdmacg count - it
should belong to the HW object not to the uobject.

> +
> +	/*
> +	 * ib_X HW object sharing support
> +	 * - NULL for HW objects that are not shareable
> +	 * - Pointer to ib_X reference counter for shareable HW objects
> +	 */
> +	atomic_t	       *refcnt;		/* ib_X object ref count */

Gross, shouldn't this actually be in the hw object?

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper
  2019-08-21 14:21 ` [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper Yuval Shaia
@ 2019-08-21 14:57   ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 14:57 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 05:21:08PM +0300, Yuval Shaia wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> 
> The lock/unlock helpers will be used in every import verb.
> 
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
>  drivers/infiniband/core/uverbs.h      |  2 +
>  drivers/infiniband/core/uverbs_cmd.c  | 73 +++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_main.c |  1 +
>  3 files changed, 76 insertions(+)
> 
> diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
> index 1e5aeb39f774..cf76336cb460 100644
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -163,6 +163,8 @@ struct ib_uverbs_file {
>  	struct page *disassociate_page;
>  
>  	struct xarray		idr;
> +
> +	struct file	       *filp;
>  };
>  
>  struct ib_uverbs_event {
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 4f42f9732dca..21f0a1a986f4 100644
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -43,6 +43,7 @@
>  
>  #include <rdma/uverbs_types.h>
>  #include <rdma/uverbs_std_types.h>
> +#include <rdma/uverbs_ioctl.h>
>  #include "rdma_core.h"
>  
>  #include "uverbs.h"
> @@ -3791,6 +3792,78 @@ static void uverbs_init_attrs_ufile(struct uverbs_attr_bundle *attrs_bundle,
>  	};
>  }
>  
> +/* ib_uverbs_import_lock - Function which gathers code that is
> + *	common in the import verbs.
> + *
> + *	This function guarntee that both source and destination files are
> + *	protected from race with vfs close. The current file is protected
> + *	from such race because verb is executed in a system-call context.
> + *	The other file is protected by 'fget'. This function also ensures
> + *	that ib_uobject identified by the type & handle is locked for read.
> + *
> + *	Callers of this helper must also call ib_uverbs_import_unlock
> + *	to undo any locking performed by this helper.
> + */
> +static int ib_uverbs_import_lock(struct uverbs_attr_bundle *attrs,

This isn't a lock, it is a get

> +				 int fd, u16 type, u32 handle,
> +				 struct ib_uobject **uobj,
> +				 struct file **filep,
> +				 struct ib_uverbs_file **ufile)
> +{
> +	struct ib_uverbs_file *file = attrs->ufile;
> +	struct ib_uverbs_device *dev = file->device;
> +	struct uverbs_attr_bundle fd_attrs;
> +	struct ib_uverbs_device *fd_dev;
> +	int ret = 0;
> +
> +	*filep = fget(fd);
> +	if (!*filep)
> +		return -EINVAL;
> +
> +	/* check uverbs ops exist */
> +	if ((*filep)->f_op != file->filp->f_op) {
> +		ret = -EINVAL;
> +		goto file;
> +	}
> +
> +	*ufile = (*filep)->private_data;
> +	fd_dev = (*ufile)->device;

Most likely all of this should be in the ioctl code as part of some 

> +	/* check that both files belong to same ib_device */
> +	if (dev != fd_dev) {
> +		ret = -EINVAL;
> +		goto file;
> +	}
> +
> +	uverbs_init_attrs_ufile(&fd_attrs, *ufile);

This makes no sense here

> +	*uobj = uobj_get_read(type, handle, &fd_attrs);
> +	if (IS_ERR(*uobj)) {
> +		ret = -EINVAL;
> +		goto file;
> +	}
> +
> +	/* verify ib_object is shareable */
> +	if (!(*uobj)->refcnt) {
> +		ret = -EINVAL;
> +		goto uobj;
> +	}
> +
> +	return 0;
> +uobj:
> +	uobj_put_read(*uobj);
> +file:
> +	fput(*filep);
> +	return ret;
> +}

Most likely most of this belongs to the ioctl path as some new input
attr type 'foreign context'

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback
  2019-08-21 14:21 ` [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback Yuval Shaia
@ 2019-08-21 14:59   ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 14:59 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 05:21:09PM +0300, Yuval Shaia wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> 
> Define prototype for clone callback. The clone callback is used
> by the driver layer to supply the uverbs a way to clone IB HW
> object driver data to rdma-core user space provider. The clone
> callback is used when new IB HW object is created and every time
> it is imported to some ib_ucontext. Drivers that wish to enable
> share of some IB HW object (ib_pd, ib_mr, etc..) must supply valid
> clone callback for that type.
> 
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
>  include/rdma/ib_verbs.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 7e69866fc419..542b3cb2d943 100644
> +++ b/include/rdma/ib_verbs.h
> @@ -2265,6 +2265,18 @@ struct iw_cm_conn_param;
>  
>  #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
>  
> +/*
> + * Prototype for IB HW object clone callback
> + *
> + * Define prototype for clone callback. The clone callback is used
> + * by the driver layer to supply the uverbs a way to clone IB HW
> + * object driver data to rdma-core user space provider. The clone
> + * callback is used when new IB HW object is created and every time
> + * it is imported to some ib_ucontext.
> + */
> +#define clone_callback(ib_type)		\
> +	int (*clone_##ib_type)(struct ib_udata *udata, struct ib_type *obj)

Don't like the idea of clone at all.

If the userspace driver needs to learn information about a HW object
it just imported, like some HW specific PDN, then the correct verb for
that is QUERY not clone.

And we already have a wide ranging infrastructure for drivers to add
their own driver specific query interfaces.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback
  2019-08-21 14:21 ` [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
@ 2019-08-21 14:59   ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 14:59 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 05:21:11PM +0300, Yuval Shaia wrote:
> From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> 
> Copy mlx4 ib_pd to user-space.
> 
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
>  drivers/infiniband/hw/mlx4/main.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index 8d2f1e38b891..6baf52d988ed 100644
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -1179,6 +1179,13 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
>  	}
>  }
>  
> +static int mlx4_ib_clone_pd(struct ib_udata *udata, struct ib_pd *ibpd)
> +{
> +	struct mlx4_ib_pd *pd = to_mpd(ibpd);
> +
> +	return udata ? ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32)) : 0;
> +}

And here it is, clone is just query.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 14/24] IB/uverbs: Add PD import verb
  2019-08-21 14:21 ` [PATCH v1 14/24] IB/uverbs: Add PD import verb Yuval Shaia
@ 2019-08-21 15:00   ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 15:00 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 05:21:15PM +0300, Yuval Shaia wrote:

>  /*
>   * Describe the input structs for write(). Some write methods have an input
>   * only struct, most have an input and output. If the struct has an output then
> @@ -3999,6 +4089,11 @@ const struct uapi_definition uverbs_def_write_intf[] = {
>  			UAPI_DEF_WRITE_IO(struct ib_uverbs_query_port,
>  					  struct ib_uverbs_query_port_resp),
>  			UAPI_DEF_METHOD_NEEDS_FN(query_port)),
> +		DECLARE_UVERBS_WRITE(
> +			IB_USER_VERBS_CMD_IMPORT_FR_FD,
> +			ib_uverbs_import_fr_fd,
> +			UAPI_DEF_WRITE_IO(struct ib_uverbs_import_fr_fd,
> +					  union  ib_uverbs_import_fr_fd_resp)),

I'm sure I said this already, no new write() verbs. New things must
use ioctl.

Since this looks 100% driver specific it should be a set of driver
specific ioctls.

Jason
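
For reference, a driver-specific query over the ioctl uAPI would look
roughly like the sketch below. The method and attribute enum values are
hypothetical additions to the mlx4 uAPI header; the macros and helpers are
from the existing uverbs ioctl infrastructure as I recall them, so treat
the exact details as approximate rather than authoritative:

/* Sketch: an mlx4-specific PD query method. MLX4_IB_METHOD_PD_QUERY and
 * the two attribute IDs are invented; to_mpd() and mlx4_ib_pd.pdn exist
 * in the mlx4 driver today. */
static int UVERBS_HANDLER(MLX4_IB_METHOD_PD_QUERY)(
	struct uverbs_attr_bundle *attrs)
{
	struct ib_pd *ibpd =
		uverbs_attr_get_obj(attrs, MLX4_IB_ATTR_QUERY_PD_HANDLE);
	struct mlx4_ib_pd *pd = to_mpd(ibpd);

	return uverbs_copy_to(attrs, MLX4_IB_ATTR_QUERY_PD_RESP_PDN,
			      &pd->pdn, sizeof(pd->pdn));
}

DECLARE_UVERBS_NAMED_METHOD(
	MLX4_IB_METHOD_PD_QUERY,
	UVERBS_ATTR_IDR(MLX4_IB_ATTR_QUERY_PD_HANDLE,
			UVERBS_OBJECT_PD,
			UVERBS_ACCESS_READ,
			UA_MANDATORY),
	UVERBS_ATTR_PTR_OUT(MLX4_IB_ATTR_QUERY_PD_RESP_PDN,
			    UVERBS_ATTR_TYPE(u32),
			    UA_MANDATORY));
/* Wiring this method into the driver's uapi definition list is omitted. */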

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
                   ` (24 preceding siblings ...)
  2019-08-21 14:50 ` [PATCH v1 00/24] Shared PD and MR Jason Gunthorpe
@ 2019-08-21 23:37 ` Ira Weiny
  2019-08-22  8:41   ` Yuval Shaia
  25 siblings, 1 reply; 49+ messages in thread
From: Ira Weiny @ 2019-08-21 23:37 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, shamir.rabinovitch, johannes.berg, willy, michaelgur,
	markb, dan.carpenter, bvanassche, maxg, israelr, galpress,
	denisd, yuvalav, dennis.dalessandro, will, ereza, jgg,
	linux-rdma

On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> Following patch-set introduce the shared object feature.
> 
> A shared object feature allows one process to create HW objects (currently
> PD and MR) so that a second process can import.

For something this fundamental I think the cover letter should be more
detailed than this.  Questions I have without digging into the code:

What is the use case?

What is the "key" that allows a MR to be shared among 2 processes?  Do you
introduce some PD identifier?  And then some {PDID, lkey} tuple is used to ID
the MR?

I assume you have to share the PD first and then any MR in the shared PD can be
shared?  If so how does the MR get shared?

Again I'm concerned with how this will interact with the RDMA and file system
interaction we have been trying to fix.

Why is SCM_RIGHTS on the rdma context FD not sufficient to share the entire
context, PD, and all MR's?

Ira

> 
> Patch-set is logically splits to 4 parts as the following:
> - patches 1 to 7 and 18 are preparation steps.
> - patches 8 to 14 are the implementation of import PD
> - patches 15 to 17 are the implementation of the verb
> - patches 19 to 24 are the implementation of import MR
> 
> v0 -> v1:
> 	* Delete the patch "IB/uverbs: ufile must be freed only when not
> 	  used anymore". The process can die, the ucontext remains until
> 	  last reference to it is closed.
> 	* Rebase to latest for-next branch
> 
> Shamir Rabinovitch (16):
>   RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
>   RDMA/uverbs: Delete the macro uobj_put_obj_read
>   RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
>   IB/{core,hw}: ib_pd should not have ib_uobject pointer
>   IB/core: ib_uobject need HW object reference count
>   IB/uverbs: Helper function to initialize ufile member of
>     uverbs_attr_bundle
>   IB/uverbs: Add context import lock/unlock helper
>   IB/verbs: Prototype of HW object clone callback
>   IB/mlx4: Add implementation of clone_pd callback
>   IB/mlx5: Add implementation of clone_pd callback
>   RDMA/rxe: Add implementation of clone_pd callback
>   IB/uverbs: Add clone reference counting to ib_pd
>   IB/uverbs: Add PD import verb
>   IB/mlx4: Enable import from FD verb
>   IB/mlx5: Enable import from FD verb
>   RDMA/rxe: Enable import from FD verb
> 
> Yuval Shaia (8):
>   IB/core: Install clone ib_pd in device ops
>   IB/core: ib_mr should not have ib_uobject pointer
>   IB/core: Install clone ib_mr in device ops
>   IB/mlx4: Add implementation of clone_pd callback
>   IB/mlx5: Add implementation of clone_pd callback
>   RDMA/rxe: Add implementation of clone_pd callback
>   IB/uverbs: Add clone reference counting to ib_mr
>   IB/uverbs: Add MR import verb
> 
>  drivers/infiniband/core/device.c              |   2 +
>  drivers/infiniband/core/nldev.c               | 127 ++++-
>  drivers/infiniband/core/rdma_core.c           |  23 +-
>  drivers/infiniband/core/uverbs.h              |   2 +
>  drivers/infiniband/core/uverbs_cmd.c          | 489 +++++++++++++++---
>  drivers/infiniband/core/uverbs_main.c         |   1 +
>  drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
>  drivers/infiniband/core/verbs.c               |   4 -
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
>  drivers/infiniband/hw/mlx4/main.c             |  18 +-
>  drivers/infiniband/hw/mlx5/main.c             |  34 +-
>  drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
>  include/rdma/ib_verbs.h                       |  43 +-
>  include/rdma/uverbs_std_types.h               |  11 +-
>  include/uapi/rdma/ib_user_verbs.h             |  15 +
>  include/uapi/rdma/rdma_netlink.h              |   3 +
>  17 files changed, 669 insertions(+), 113 deletions(-)
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-21 23:37 ` Ira Weiny
@ 2019-08-22  8:41   ` Yuval Shaia
  2019-08-22 14:15     ` Doug Ledford
  2019-08-22 16:58     ` Ira Weiny
  0 siblings, 2 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-22  8:41 UTC (permalink / raw)
  To: Ira Weiny
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 04:37:37PM -0700, Ira Weiny wrote:
> On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > Following patch-set introduce the shared object feature.
> > 
> > A shared object feature allows one process to create HW objects (currently
> > PD and MR) so that a second process can import.

Hi Ira,

> 
> For something this fundamental I think the cover letter should be more
> detailed than this.  Questions I have without digging into the code:
> 
> What is the use case?

I have only one use case but I didn't add it to the commit log so as not to
limit the usage of this feature. But you are right, the cover letter is the
right place for such things; I will add it for v2.

Anyway, here is our use case: Consider a server with a huge amount of
memory where hundreds or even thousands of processes are using it to serve
client requests. In this case the HCA will have to manage hundreds or
thousands of MRs. A better design might be for one process to create one
(or several) MR(s) which will be shared with the other processes. This will
reduce the number of address translation entries and cache misses
dramatically.

> 
> What is the "key" that allows a MR to be shared among 2 processes?  Do you
> introduce some PD identifier?  And then some {PDID, lkey} tuple is used to ID
> the MR?
> 
> I assume you have to share the PD first and then any MR in the shared PD can be
> shared?  If so how does the MR get shared?

Sorry, I'm not following.
I think the term 'share' is somewhat of a mistake; what actually happens is
that a process 'imports' objects into its context. And yes, the workflow is
first to import the PD and then to import the MR.

> 
> Again I'm concerned with how this will interact with the RDMA and file system
> interaction we have been trying to fix.

I'm not aware of this file-system issue; can you point me to some
discussion on that so I can see how this patch-set affects it?

> 
> Why is SCM_RIGHTS on the rdma context FD not sufficient to share the entire
> context, PD, and all MR's?

Well, SCM_RIGHTS is great; one process can share the IB context with
another. But it is not enough, because:
- What API can the second process use to get its hands on one of the PDs or
  MRs from this context?
- What mechanism takes care of the destruction of such objects (SCM_RIGHTS
  takes care of the ref counting of the context, but I'm referring to the
  PD and MR objects)?

The entire purpose of this patch set is to address these two questions.
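
For readers unfamiliar with it, passing the uverbs context FD itself via
SCM_RIGHTS is plain socket code; a minimal user-space sketch follows (the
helper name send_fd() is made up for illustration, the calls are standard
POSIX):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one open file descriptor (e.g. the uverbs context FD) over a
 * connected AF_UNIX socket. The receiver gets its own FD referring to
 * the same open file description. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

The importer receives the FD with recvmsg() and a matching SCM_RIGHTS
control message; what this series adds on top is the uverbs command that
turns objects reachable through that shared context into handles in the
importer's own context.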

Yuval

> 
> Ira
> 
> > 
> > Patch-set is logically splits to 4 parts as the following:
> > - patches 1 to 7 and 18 are preparation steps.
> > - patches 8 to 14 are the implementation of import PD
> > - patches 15 to 17 are the implementation of the verb
> > - patches 19 to 24 are the implementation of import MR
> > 
> > v0 -> v1:
> > 	* Delete the patch "IB/uverbs: ufile must be freed only when not
> > 	  used anymore". The process can die, the ucontext remains until
> > 	  last reference to it is closed.
> > 	* Rebase to latest for-next branch
> > 
> > Shamir Rabinovitch (16):
> >   RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
> >   RDMA/uverbs: Delete the macro uobj_put_obj_read
> >   RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
> >   IB/{core,hw}: ib_pd should not have ib_uobject pointer
> >   IB/core: ib_uobject need HW object reference count
> >   IB/uverbs: Helper function to initialize ufile member of
> >     uverbs_attr_bundle
> >   IB/uverbs: Add context import lock/unlock helper
> >   IB/verbs: Prototype of HW object clone callback
> >   IB/mlx4: Add implementation of clone_pd callback
> >   IB/mlx5: Add implementation of clone_pd callback
> >   RDMA/rxe: Add implementation of clone_pd callback
> >   IB/uverbs: Add clone reference counting to ib_pd
> >   IB/uverbs: Add PD import verb
> >   IB/mlx4: Enable import from FD verb
> >   IB/mlx5: Enable import from FD verb
> >   RDMA/rxe: Enable import from FD verb
> > 
> > Yuval Shaia (8):
> >   IB/core: Install clone ib_pd in device ops
> >   IB/core: ib_mr should not have ib_uobject pointer
> >   IB/core: Install clone ib_mr in device ops
> >   IB/mlx4: Add implementation of clone_pd callback
> >   IB/mlx5: Add implementation of clone_pd callback
> >   RDMA/rxe: Add implementation of clone_pd callback
> >   IB/uverbs: Add clone reference counting to ib_mr
> >   IB/uverbs: Add MR import verb
> > 
> >  drivers/infiniband/core/device.c              |   2 +
> >  drivers/infiniband/core/nldev.c               | 127 ++++-
> >  drivers/infiniband/core/rdma_core.c           |  23 +-
> >  drivers/infiniband/core/uverbs.h              |   2 +
> >  drivers/infiniband/core/uverbs_cmd.c          | 489 +++++++++++++++---
> >  drivers/infiniband/core/uverbs_main.c         |   1 +
> >  drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
> >  drivers/infiniband/core/verbs.c               |   4 -
> >  drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
> >  drivers/infiniband/hw/mlx4/main.c             |  18 +-
> >  drivers/infiniband/hw/mlx5/main.c             |  34 +-
> >  drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
> >  drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
> >  include/rdma/ib_verbs.h                       |  43 +-
> >  include/rdma/uverbs_std_types.h               |  11 +-
> >  include/uapi/rdma/ib_user_verbs.h             |  15 +
> >  include/uapi/rdma/rdma_netlink.h              |   3 +
> >  17 files changed, 669 insertions(+), 113 deletions(-)
> > 
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-21 14:50 ` [PATCH v1 00/24] Shared PD and MR Jason Gunthorpe
@ 2019-08-22  8:50   ` Yuval Shaia
  0 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-22  8:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Shamir Rabinovitch
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, johannes.berg, willy,
	Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 02:50:16PM +0000, Jason Gunthorpe wrote:
> On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > Following patch-set introduce the shared object feature.
> > 
> > A shared object feature allows one process to create HW objects (currently
> > PD and MR) so that a second process can import.
> > 
> > Patch-set is logically splits to 4 parts as the following:
> > - patches 1 to 7 and 18 are preparation steps.
> > - patches 8 to 14 are the implementation of import PD
> > - patches 15 to 17 are the implementation of the verb
> > - patches 19 to 24 are the implementation of import MR
> 
> This is way too big. 10-14 patches at most in a series.

I agree with you.
Actually I had an offline discussion with Shamir about that.
Shamir's view here is that he wanted to split things into smaller pieces
to ease maintenance (git bisect etc.) and code review.

So we have two options now. One is to split this patch-set into two
separate patch-sets: one dealing with the preparation (infrastructure and
cleanups) and a second with the actual feature. The other option is to
merge some patches, e.g. the patches that install the hook in the
providers' code could be merged.

So as not to break Shamir's work, I tend to go with the first option.

Shamir, what do you think?

Yuval

> 
> Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22  8:41   ` Yuval Shaia
@ 2019-08-22 14:15     ` Doug Ledford
  2019-08-26  9:35       ` Yuval Shaia
  2019-08-22 16:58     ` Ira Weiny
  1 sibling, 1 reply; 49+ messages in thread
From: Doug Ledford @ 2019-08-22 14:15 UTC (permalink / raw)
  To: Yuval Shaia, Ira Weiny
  Cc: jgg, leon, monis, parav, danielj, kamalheib1, markz,
	johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

On Thu, 2019-08-22 at 11:41 +0300, Yuval Shaia wrote:
> On Wed, Aug 21, 2019 at 04:37:37PM -0700, Ira Weiny wrote:
> > On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > > Following patch-set introduce the shared object feature.
> > > 
> > > A shared object feature allows one process to create HW objects
> > > (currently
> > > PD and MR) so that a second process can import.
> 
> Hi Ira,
> 
> > For something this fundamental I think the cover letter should be
> > more
> > detailed than this.  Questions I have without digging into the code:
> > 
> > What is the use case?
> 
> I have only one use case but i didn't added it to commit log just not
> to
> limit the usage of this feature but you are right, cover letter is
> great
> for such things, will add it for v2.
> 
> Anyway, here is our use case: Consider a case of server with huge
> amount
> of memory and some hundreds or even thousands processes are using it
> to
> serves clients requests. In this case the HCA will have to manage
> hundreds
> or thousands MRs. A better design maybe would be that one process will
> create one (or several) MR(s) which will be shared with the other
> processes. This will reduce the number of address translation entries
> and
> cache miss dramatically.

Unless I'm misreading you here, it will be at the expense of pretty much
all inter-process memory security.  You're talking about one process
creating some large MRs just to cover the overall memory in the machine,
then sharing that among processes, and all using it to reduce the MR
workload of the card.  This sounds like going back to the days of MS-DOS.
It also sounds like a programming error in one process could expose
potentially all processes data buffers across all processes sharing this
PD and MR.

I get the idea, and the problem you are trying to solve, but I'm not
sure that going down this path is wise.

Maybe....maybe if you limit a queue pair to send/recv only and no
rdma_{read,write}, then this wouldn't be quite as bad.  But even then
I'm still very leery of this "feature".

> 
> > What is the "key" that allows a MR to be shared among 2
> > processes?  Do you
> > introduce some PD identifier?  And then some {PDID, lkey} tuple is
> > used to ID
> > the MR?
> > 
> > I assume you have to share the PD first and then any MR in the
> > shared PD can be
> > shared?  If so how does the MR get shared?
> 
> Sorry, i'm not following.
> I think the term 'share' is somehow mistake, it is actually a process
> 'imports' objects into it's context. And yes, the workflow is first to
> import the PD and then import the MR.
> 
> > Again I'm concerned with how this will interact with the RDMA and
> > file system
> > interaction we have been trying to fix.
> 
> I'm not aware of this file-system thing, can you point me to some
> discussion on that so i'll see how this patch-set affect it.
> 
> > Why is SCM_RIGHTS on the rdma context FD not sufficient to share the
> > entire
> > context, PD, and all MR's?
> 
> Well, this SCM_RIGHTS is great, one can share the IB context with
> another.
> But it is not enough, because:
> - What API the second process can use to get his hands on one of the
> PDs or
>   MRs from this context?
> - What mechanism takes care of the destruction of such objects
> (SCM_RIGHTS
>   takes care for the ref counting of the context but i'm referring to
> the
>   PDs and MRs objects)?
> 
> The entire purpose of this patch set is to address these two
> questions.
> 
> Yuval
> 
> > Ira
> > 
> > > Patch-set is logically splits to 4 parts as the following:
> > > - patches 1 to 7 and 18 are preparation steps.
> > > - patches 8 to 14 are the implementation of import PD
> > > - patches 15 to 17 are the implementation of the verb
> > > - patches 19 to 24 are the implementation of import MR
> > > 
> > > v0 -> v1:
> > > 	* Delete the patch "IB/uverbs: ufile must be freed only when not
> > > 	  used anymore". The process can die, the ucontext remains until
> > > 	  last reference to it is closed.
> > > 	* Rebase to latest for-next branch
> > > 
> > > Shamir Rabinovitch (16):
> > >   RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
> > >   RDMA/uverbs: Delete the macro uobj_put_obj_read
> > >   RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
> > >   IB/{core,hw}: ib_pd should not have ib_uobject pointer
> > >   IB/core: ib_uobject need HW object reference count
> > >   IB/uverbs: Helper function to initialize ufile member of
> > >     uverbs_attr_bundle
> > >   IB/uverbs: Add context import lock/unlock helper
> > >   IB/verbs: Prototype of HW object clone callback
> > >   IB/mlx4: Add implementation of clone_pd callback
> > >   IB/mlx5: Add implementation of clone_pd callback
> > >   RDMA/rxe: Add implementation of clone_pd callback
> > >   IB/uverbs: Add clone reference counting to ib_pd
> > >   IB/uverbs: Add PD import verb
> > >   IB/mlx4: Enable import from FD verb
> > >   IB/mlx5: Enable import from FD verb
> > >   RDMA/rxe: Enable import from FD verb
> > > 
> > > Yuval Shaia (8):
> > >   IB/core: Install clone ib_pd in device ops
> > >   IB/core: ib_mr should not have ib_uobject pointer
> > >   IB/core: Install clone ib_mr in device ops
> > >   IB/mlx4: Add implementation of clone_pd callback
> > >   IB/mlx5: Add implementation of clone_pd callback
> > >   RDMA/rxe: Add implementation of clone_pd callback
> > >   IB/uverbs: Add clone reference counting to ib_mr
> > >   IB/uverbs: Add MR import verb
> > > 
> > >  drivers/infiniband/core/device.c              |   2 +
> > >  drivers/infiniband/core/nldev.c               | 127 ++++-
> > >  drivers/infiniband/core/rdma_core.c           |  23 +-
> > >  drivers/infiniband/core/uverbs.h              |   2 +
> > >  drivers/infiniband/core/uverbs_cmd.c          | 489
> > > +++++++++++++++---
> > >  drivers/infiniband/core/uverbs_main.c         |   1 +
> > >  drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
> > >  drivers/infiniband/core/verbs.c               |   4 -
> > >  drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
> > >  drivers/infiniband/hw/mlx4/main.c             |  18 +-
> > >  drivers/infiniband/hw/mlx5/main.c             |  34 +-
> > >  drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
> > >  drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
> > >  include/rdma/ib_verbs.h                       |  43 +-
> > >  include/rdma/uverbs_std_types.h               |  11 +-
> > >  include/uapi/rdma/ib_user_verbs.h             |  15 +
> > >  include/uapi/rdma/rdma_netlink.h              |   3 +
> > >  17 files changed, 669 insertions(+), 113 deletions(-)
> > > 
> > > -- 
> > > 2.20.1
> > > 

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22  8:41   ` Yuval Shaia
  2019-08-22 14:15     ` Doug Ledford
@ 2019-08-22 16:58     ` Ira Weiny
  2019-08-22 17:03       ` Jason Gunthorpe
                         ` (3 more replies)
  1 sibling, 4 replies; 49+ messages in thread
From: Ira Weiny @ 2019-08-22 16:58 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

On Thu, Aug 22, 2019 at 11:41:03AM +0300, Yuval Shaia wrote:
> On Wed, Aug 21, 2019 at 04:37:37PM -0700, Ira Weiny wrote:
> > On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > > Following patch-set introduce the shared object feature.
> > > 
> > > A shared object feature allows one process to create HW objects (currently
> > > PD and MR) so that a second process can import.
> 
> Hi Ira,
> 
> > 
> > For something this fundamental I think the cover letter should be more
> > detailed than this.  Questions I have without digging into the code:
> > 
> > What is the use case?
> 
> I have only one use case but i didn't added it to commit log just not to
> limit the usage of this feature but you are right, cover letter is great
> for such things, will add it for v2.
> 
> Anyway, here is our use case: Consider a case of server with huge amount
> of memory and some hundreds or even thousands processes are using it to
> serves clients requests. In this case the HCA will have to manage hundreds
> or thousands MRs. A better design maybe would be that one process will
> create one (or several) MR(s) which will be shared with the other
> processes. This will reduce the number of address translation entries and
> cache miss dramatically.

I think Doug covered my concerns in this area.

> 
> > 
> > What is the "key" that allows a MR to be shared among 2 processes?  Do you
> > introduce some PD identifier?  And then some {PDID, lkey} tuple is used to ID
> > the MR?
> > 
> > I assume you have to share the PD first and then any MR in the shared PD can be
> > shared?  If so how does the MR get shared?
> 
> Sorry, i'm not following.
> I think the term 'share' is somehow mistake, it is actually a process
> 'imports' objects into it's context. And yes, the workflow is first to
> import the PD and then import the MR.

Ok, fair enough, but the title of the thread is "Shared PD and MR" so I used
the term Share.  And I expect that any random process can't import objects to
which the owning process does not allow them to, right?

I mean you can't just have any process grab a PD and MR and start using it.  So
I assume there is some "sharing" by the originating process.

> 
> > 
> > Again I'm concerned with how this will interact with the RDMA and file system
> > interaction we have been trying to fix.
> 
> I'm not aware of this file-system thing, can you point me to some
> discussion on that so i'll see how this patch-set affect it.


https://lkml.org/lkml/2019/6/6/1101
https://lkml.org/lkml/2019/8/9/1043
https://lwn.net/Articles/796000/

There are many more articles, patch sets, discussion threads...  This work has
been going on much longer than I have been working on it.

> 
> > 
> > Why is SCM_RIGHTS on the rdma context FD not sufficient to share the entire
> > context, PD, and all MR's?
> 
> Well, this SCM_RIGHTS is great, one can share the IB context with another.
> But it is not enough, because:
> - What API the second process can use to get his hands on one of the PDs or
>   MRs from this context?

MRs can be passed by {PD,key} through any number of mechanisms.  All you need
is an ID for them.  Maybe this is clear in the code.  If so, sorry about that.

> - What mechanism takes care of the destruction of such objects (SCM_RIGHTS
>   takes care for the ref counting of the context but i'm referring to the
>   PDs and MRs objects)?

This is inherent in the lifetime of the uverbs file object, to which the
cloned FDs (one in each process) hold a reference.

Add to your list "how does destruction of a MR in 1 process get communicated to
the other?"  Does the 2nd process just get failed WR's?

> 
> The entire purpose of this patch set is to address these two questions.

Fair enough but the cover letter should spell out the above and how this series
fixes that problem.

I have some of the same concerns as Doug WRT memory sharing.  FWIW I'm not sure
that what SCM_RIGHTS is doing is safe or correct.

For that work I'm really starting to think SCM_RIGHTS transfers should be
blocked.  It just seems wrong that Process B gets references to Process A's
mm_struct and holds the memory Process A allocated.  This seems wrong for any
type of memory, file system or not.  That said I'm assuming that this is all
within a single user so admins can at least determine who is pinning down all
this memory.

Ira


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 16:58     ` Ira Weiny
@ 2019-08-22 17:03       ` Jason Gunthorpe
  2019-08-22 20:10         ` Weiny, Ira
  2019-08-26 10:29         ` Yuval Shaia
  2019-08-26  9:51       ` Yuval Shaia
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-22 17:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Yuval Shaia, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:

> Add to your list "how does destruction of a MR in 1 process get communicated to
> the other?"  Does the 2nd process just get failed WR's?

IMHO an object that has been shared can no longer be asynchronously
destroyed. That is the whole point. An lkey/rkey # alone is inherently
unsafe without also holding a refcount on the MR.

> I have some of the same concerns as Doug WRT memory sharing.  FWIW I'm not sure
> that what SCM_RIGHTS is doing is safe or correct.
> 
> For that work I'm really starting to think SCM_RIGHTS transfers should be
> blocked.  

That isn't possible; SCM_RIGHTS is just one special case, fork(),
exec(), etc. all cause the same situation. Any solution that blocks
those is a total non-starter.

> It just seems wrong that Process B gets references to Process A's
> mm_struct and holds the memory Process A allocated.  

Except for ODP, a MR doesn't reference the mm_struct. It references
the pages. It is not unlike a memfd.

Jason
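
As an aside, the memfd analogy can be made concrete in a few lines of user
space code (the helper name is invented; memfd_create() is a real Linux
call, available in glibc >= 2.27):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* The pages backing a memfd belong to no particular process; whoever
 * holds an FD (or a mapping) keeps them alive, which is roughly how a
 * pinned MR relates to the process that registered it. */
static int make_shared_buffer(size_t len, void **addr)
{
	int fd = memfd_create("shared-mr-buf", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, len) < 0)
		return -1;
	*addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return *addr == MAP_FAILED ? -1 : fd;
}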

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 17:03       ` Jason Gunthorpe
@ 2019-08-22 20:10         ` Weiny, Ira
  2019-08-23 11:57           ` Jason Gunthorpe
  2019-08-26 10:29         ` Yuval Shaia
  1 sibling, 1 reply; 49+ messages in thread
From: Weiny, Ira @ 2019-08-22 20:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yuval Shaia, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, Berg, Johannes,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, Dalessandro, Dennis, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

> On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> 
> > Add to your list "how does destruction of a MR in 1 process get
> > communicated to the other?"  Does the 2nd process just get failed WR's?
> 
> IHMO a object that has been shared can no longer be asynchronously destroyed.
> That is the whole point. A lkey/rkey # alone is inherently unsafe without also
> holding a refcount on the MR.
> 
> > I have some of the same concerns as Doug WRT memory sharing.  FWIW I'm
> > not sure that what SCM_RIGHTS is doing is safe or correct.
> >
> > For that work I'm really starting to think SCM_RIGHTS transfers should
> > be blocked.
> 
> That isn't possible, SCM_RIGHTS is just some special case, fork(), exec(), etc all
> cause the same situation. Any solution that blocks those is a total non-starter.

Right, except in the case of fork(), exec() all of the file system references
which may be pinned also get copied.  With SCM_RIGHTS they may not...  And my
concern here is, if I understand this mechanism, it would introduce another
avenue where the file pin is shared _without_ the file lease (or with a
potentially zombie'ed lease).[1]

[1] https://lkml.org/lkml/2019/8/16/994

> 
> > It just seems wrong that Process B gets references to Process A's
> > mm_struct and holds the memory Process A allocated.
> 
> Except for ODP, a MR doesn't reference the mm_struct. It references the pages.
> It is not unlike a memfd.

I'm thinking of the owner_mm...  It is not like it is holding the entire
process address space, I know that.  But it is holding onto memory which
Process A allocated.

Ira

> 
> Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 20:10         ` Weiny, Ira
@ 2019-08-23 11:57           ` Jason Gunthorpe
  2019-08-23 21:33             ` Weiny, Ira
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-23 11:57 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Yuval Shaia, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, Berg, Johannes,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, Dalessandro, Dennis, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Thu, Aug 22, 2019 at 08:10:09PM +0000, Weiny, Ira wrote:
> > On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> > 
> > > Add to your list "how does destruction of a MR in 1 process get
> > > communicated to the other?"  Does the 2nd process just get failed WR's?
> > 
> > IHMO a object that has been shared can no longer be asynchronously destroyed.
> > That is the whole point. A lkey/rkey # alone is inherently unsafe without also
> > holding a refcount on the MR.
> > 
> > > I have some of the same concerns as Doug WRT memory sharing.  FWIW I'm
> > > not sure that what SCM_RIGHTS is doing is safe or correct.
> > >
> > > For that work I'm really starting to think SCM_RIGHTS transfers should
> > > be blocked.
> > 
> > That isn't possible, SCM_RIGHTS is just some special case, fork(), exec(), etc all
> > cause the same situation. Any solution that blocks those is a total non-starter.
> 
> Right, except in the case of fork(), exec() all of the file system
> references which may be pinned also get copied.  

And what happens when one child of the fork closes the reference, or
exec with CLOEXEC causes it to not be inherited?

It completely breaks the unix model to tie something to a process, not
to an FD.

> > Except for ODP, a MR doesn't reference the mm_struct. It references the pages.
> > It is not unlike a memfd.
> 
> I'm thinking of the owner_mm...  It is not like it is holding the
> entire process address space I know that.  But it is holding onto
> memory which Process A allocated.

It only holds the mm for some statistics accounting; it is really just
holding pages outside the mm.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v1 00/24] Shared PD and MR
  2019-08-23 11:57           ` Jason Gunthorpe
@ 2019-08-23 21:33             ` Weiny, Ira
  2019-08-26 10:58               ` Yuval Shaia
  0 siblings, 1 reply; 49+ messages in thread
From: Weiny, Ira @ 2019-08-23 21:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yuval Shaia, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, Berg, Johannes,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, Dalessandro, Dennis, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

> Subject: Re: [PATCH v1 00/24] Shared PD and MR
> 
> On Thu, Aug 22, 2019 at 08:10:09PM +0000, Weiny, Ira wrote:
> > > On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> > >
> > > > Add to your list "how does destruction of a MR in 1 process get
> > > > communicated to the other?"  Does the 2nd process just get failed
> WR's?
> > >
> > > IHMO a object that has been shared can no longer be asynchronously
> destroyed.
> > > That is the whole point. A lkey/rkey # alone is inherently unsafe
> > > without also holding a refcount on the MR.
> > >
> > > > I have some of the same concerns as Doug WRT memory sharing.
> FWIW
> > > > I'm not sure that what SCM_RIGHTS is doing is safe or correct.
> > > >
> > > > For that work I'm really starting to think SCM_RIGHTS transfers
> > > > should be blocked.
> > >
> > > That isn't possible, SCM_RIGHTS is just some special case, fork(),
> > > exec(), etc all cause the same situation. Any solution that blocks those is a
> total non-starter.
> >
> > Right, except in the case of fork(), exec() all of the file system
> > references which may be pinned also get copied.
> 
> And what happens one one child of the fork closes the reference, or exec with
> CLOEXEC causes it to no inherent?

Dave Chinner is suggesting the close will hang.  Exec with CLOEXEC would
probably not hang, because the RDMA close would release the pin, allowing the
close of the data file to finish...  At least as far as my testing has shown.

> 
> It completely breaks the unix model to tie something to a process not to a
> FD.

But that is just it.  Dave is advocating that the FDs must get transferred.
It has nothing to do with a process.

I'm somewhat confused at this point because in this thread I was advocating
that the RDMA context FD is what needs to get "shared" between the processes.
Is that what you are advocating as well?  Does this patch set do that?

> 
> > > Except for ODP, a MR doesn't reference the mm_struct. It references the
> pages.
> > > It is not unlike a memfd.
> >
> > I'm thinking of the owner_mm...  It is not like it is holding the
> > entire process address space I know that.  But it is holding onto
> > memory which Process A allocated.
> 
> It only hold the mm for some statistics accounting, it is really just holding
> pages outside the mm.

But those pages aren't necessarily mapped in Process B, and if they are mapped
in Process A then you are sending data to Process A, not "B"...  That is one
twisted way to look at it anyway...

Ira


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 14:15     ` Doug Ledford
@ 2019-08-26  9:35       ` Yuval Shaia
  0 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26  9:35 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Ira Weiny, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

On Thu, Aug 22, 2019 at 10:15:11AM -0400, Doug Ledford wrote:
> On Thu, 2019-08-22 at 11:41 +0300, Yuval Shaia wrote:
> > On Wed, Aug 21, 2019 at 04:37:37PM -0700, Ira Weiny wrote:
> > > On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > > > Following patch-set introduce the shared object feature.
> > > > 
> > > > A shared object feature allows one process to create HW objects
> > > > (currently
> > > > PD and MR) so that a second process can import.
> > 
> > Hi Ira,
> > 
> > > For something this fundamental I think the cover letter should be
> > > more
> > > detailed than this.  Questions I have without digging into the code:
> > > 
> > > What is the use case?
> > 
> > I have only one use case but i didn't added it to commit log just not
> > to
> > limit the usage of this feature but you are right, cover letter is
> > great
> > for such things, will add it for v2.
> > 
> > Anyway, here is our use case: Consider a case of server with huge
> > amount
> > of memory and some hundreds or even thousands processes are using it
> > to
> > serves clients requests. In this case the HCA will have to manage
> > hundreds
> > or thousands MRs. A better design maybe would be that one process will
> > create one (or several) MR(s) which will be shared with the other
> > processes. This will reduce the number of address translation entries
> > and
> > cache miss dramatically.
> 
> Unless I'm misreading you here, it will be at the expense of pretty much
> all inter-process memory security.  You're talking about one process

Isn't it already there with the use of Linux shared memory?

> creating some large MRs just to cover the overall memory in the machine,
> then sharing that among processes, and all using it to reduce the MR
> workload of the card.  This sounds like going back to the days of MSDos.

Well, too many MRs can lead to a serious bottleneck; we are currently dealing
with such an issue when many VMs are trying to re-register their MRs at once.
Since it is out of the scope of $subject I will not expand on it here, just
mentioning it because *it is* an issue and reducing the number of MRs could
help.

> It also sounds like a programming error in one process could expose
> potentially all processes data buffers across all processes sharing this
> PD and MR.

Again, this is already possible with shared memory, and some designs rely
on that.

> 
> I get the idea, and the problem you are trying to solve, but I'm not
> sure that going down this path is wise.
> 
> Maybe....maybe if you limit a queue pair to send/recv only and no
> rdma_{read,write}, then this wouldn't be quite as bad.  But even then
> I'm still very leary of this "feature".

How about if all the processes are considered one unit of trust? Anyway,
the same thing could be done in a multi-threaded application or when one
process forks child processes.

> 
> > 
> > > What is the "key" that allows a MR to be shared among 2
> > > processes?  Do you
> > > introduce some PD identifier?  And then some {PDID, lkey} tuple is
> > > used to ID
> > > the MR?
> > > 
> > > I assume you have to share the PD first and then any MR in the
> > > shared PD can be
> > > shared?  If so how does the MR get shared?
> > 
> > Sorry, i'm not following.
> > I think the term 'share' is somehow mistake, it is actually a process
> > 'imports' objects into it's context. And yes, the workflow is first to
> > import the PD and then import the MR.
> > 
> > > Again I'm concerned with how this will interact with the RDMA and
> > > file system
> > > interaction we have been trying to fix.
> > 
> > I'm not aware of this file-system thing, can you point me to some
> > discussion on that so i'll see how this patch-set affect it.
> > 
> > > Why is SCM_RIGHTS on the rdma context FD not sufficient to share the
> > > entire
> > > context, PD, and all MR's?
> > 
> > Well, this SCM_RIGHTS is great, one can share the IB context with
> > another.
> > But it is not enough, because:
> > - What API the second process can use to get his hands on one of the
> > PDs or
> >   MRs from this context?
> > - What mechanism takes care of the destruction of such objects
> > (SCM_RIGHTS
> >   takes care for the ref counting of the context but i'm referring to
> > the
> >   PDs and MRs objects)?
> > 
> > The entire purpose of this patch set is to address these two
> > questions.
> > 
> > Yuval
> > 
> > > Ira
> > > 
> > > > Patch-set is logically splits to 4 parts as the following:
> > > > - patches 1 to 7 and 18 are preparation steps.
> > > > - patches 8 to 14 are the implementation of import PD
> > > > - patches 15 to 17 are the implementation of the verb
> > > > - patches 19 to 24 are the implementation of import MR
> > > > 
> > > > v0 -> v1:
> > > > 	* Delete the patch "IB/uverbs: ufile must be freed only when not
> > > > 	  used anymore". The process can die, the ucontext remains until
> > > > 	  last reference to it is closed.
> > > > 	* Rebase to latest for-next branch
> > > > 
> > > > Shamir Rabinovitch (16):
> > > >   RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
> > > >   RDMA/uverbs: Delete the macro uobj_put_obj_read
> > > >   RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
> > > >   IB/{core,hw}: ib_pd should not have ib_uobject pointer
> > > >   IB/core: ib_uobject need HW object reference count
> > > >   IB/uverbs: Helper function to initialize ufile member of
> > > >     uverbs_attr_bundle
> > > >   IB/uverbs: Add context import lock/unlock helper
> > > >   IB/verbs: Prototype of HW object clone callback
> > > >   IB/mlx4: Add implementation of clone_pd callback
> > > >   IB/mlx5: Add implementation of clone_pd callback
> > > >   RDMA/rxe: Add implementation of clone_pd callback
> > > >   IB/uverbs: Add clone reference counting to ib_pd
> > > >   IB/uverbs: Add PD import verb
> > > >   IB/mlx4: Enable import from FD verb
> > > >   IB/mlx5: Enable import from FD verb
> > > >   RDMA/rxe: Enable import from FD verb
> > > > 
> > > > Yuval Shaia (8):
> > > >   IB/core: Install clone ib_pd in device ops
> > > >   IB/core: ib_mr should not have ib_uobject pointer
> > > >   IB/core: Install clone ib_mr in device ops
> > > >   IB/mlx4: Add implementation of clone_pd callback
> > > >   IB/mlx5: Add implementation of clone_pd callback
> > > >   RDMA/rxe: Add implementation of clone_pd callback
> > > >   IB/uverbs: Add clone reference counting to ib_mr
> > > >   IB/uverbs: Add MR import verb
> > > > 
> > > >  drivers/infiniband/core/device.c              |   2 +
> > > >  drivers/infiniband/core/nldev.c               | 127 ++++-
> > > >  drivers/infiniband/core/rdma_core.c           |  23 +-
> > > >  drivers/infiniband/core/uverbs.h              |   2 +
> > > >  drivers/infiniband/core/uverbs_cmd.c          | 489
> > > > +++++++++++++++---
> > > >  drivers/infiniband/core/uverbs_main.c         |   1 +
> > > >  drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
> > > >  drivers/infiniband/core/verbs.c               |   4 -
> > > >  drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
> > > >  drivers/infiniband/hw/mlx4/main.c             |  18 +-
> > > >  drivers/infiniband/hw/mlx5/main.c             |  34 +-
> > > >  drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
> > > >  drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
> > > >  include/rdma/ib_verbs.h                       |  43 +-
> > > >  include/rdma/uverbs_std_types.h               |  11 +-
> > > >  include/uapi/rdma/ib_user_verbs.h             |  15 +
> > > >  include/uapi/rdma/rdma_netlink.h              |   3 +
> > > >  17 files changed, 669 insertions(+), 113 deletions(-)
> > > > 
> > > > -- 
> > > > 2.20.1
> > > > 
> 
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 16:58     ` Ira Weiny
  2019-08-22 17:03       ` Jason Gunthorpe
@ 2019-08-26  9:51       ` Yuval Shaia
  2019-08-26 10:04       ` Yuval Shaia
  2019-08-26 10:10       ` Yuval Shaia
  3 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26  9:51 UTC (permalink / raw)
  To: Ira Weiny
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

> > 
> > > 
> > > What is the "key" that allows a MR to be shared among 2 processes?  Do you
> > > introduce some PD identifier?  And then some {PDID, lkey} tuple is used to ID
> > > the MR?
> > > 
> > > I assume you have to share the PD first and then any MR in the shared PD can be
> > > shared?  If so how does the MR get shared?
> > 
> > Sorry, i'm not following.
> > I think the term 'share' is somehow mistake, it is actually a process
> > 'imports' objects into it's context. And yes, the workflow is first to
> > import the PD and then import the MR.
> 
> Ok fair enough but the title of the thread is "Sharing PD and MR" so I used the

You are right, my bad, will change the cover letter and some commit
messages accordingly.

> term Share.  And I expect that any random process can't import objects to which
> the owning process does not allow them to right?
> 
> I mean you can't just have any process grab a PD and MR and start using it.  So
> I assume there is some "sharing" by the originating process.

Any process that connects to the socket that the SCM_RIGHTS message is
relayed on. I guess that if this mechanism exists then importing the actual
objects is just a supplemental service.
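
The importer side of that socket handshake is equally small (the helper
name recv_fd() is invented, the calls are standard POSIX); it is the point
where the import verbs of this series would take over:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one file descriptor sent with SCM_RIGHTS over a connected
 * AF_UNIX socket; returns the new local FD, or -1 on failure. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;
	int fd = -1;

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));

	return fd;
}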

> 
> > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 16:58     ` Ira Weiny
  2019-08-22 17:03       ` Jason Gunthorpe
  2019-08-26  9:51       ` Yuval Shaia
@ 2019-08-26 10:04       ` Yuval Shaia
  2019-08-26 10:10       ` Yuval Shaia
  3 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26 10:04 UTC (permalink / raw)
  To: Ira Weiny
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

> 
> > - What mechanism takes care of the destruction of such objects (SCM_RIGHTS
> >   takes care for the ref counting of the context but i'm referring to the
> >   PDs and MRs objects)?
> 
> This is inherent in the lifetime of the uverbs file object to which cloned FDs
> (one in each process) have a reference to.
> 
> Add to your list "how does destruction of a MR in 1 process get communicated to
> the other?"  Does the 2nd process just get failed WR's?

I meant the opposite, i.e. when two processes are sharing an object, the
fact that one decides to destroy it cannot affect the other, so a ref count
needs to be maintained so that the object is disposed of only once all
references have asked for destruction.
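
A minimal sketch of that rule in kernel terms, assuming the shared HW
object carries a refcount_t (the field name shared_cnt and the
hw_destroy_pd() helper are invented for illustration; this is not the
patch's actual code, which keeps the counter behind uobj->refcnt):

#include <linux/refcount.h>
#include <rdma/ib_verbs.h>

void hw_destroy_pd(struct ib_pd *pd, struct ib_udata *udata); /* hypothetical */

/* Sketch: the creator starts the count at 1, every importer takes a
 * reference, and only the last destroyer tears down the HW object. */
struct shared_pd {
	struct ib_pd	*pd;
	refcount_t	 shared_cnt;
};

static void shared_pd_get(struct shared_pd *spd)
{
	refcount_inc(&spd->shared_cnt);		/* called on import */
}

static void shared_pd_put(struct shared_pd *spd, struct ib_udata *udata)
{
	if (!refcount_dec_and_test(&spd->shared_cnt))
		return;				/* other importers remain */

	hw_destroy_pd(spd->pd, udata);		/* hypothetical driver call */
}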

> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 16:58     ` Ira Weiny
                         ` (2 preceding siblings ...)
  2019-08-26 10:04       ` Yuval Shaia
@ 2019-08-26 10:10       ` Yuval Shaia
  3 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26 10:10 UTC (permalink / raw)
  To: Ira Weiny
  Cc: dledford, jgg, leon, monis, parav, danielj, kamalheib1, markz,
	swise, johannes.berg, willy, michaelgur, markb, dan.carpenter,
	bvanassche, maxg, israelr, galpress, denisd, yuvalav,
	dennis.dalessandro, will, ereza, jgg, linux-rdma,
	Shamir Rabinovitch

> 
> > 
> > > 
> > > Why is SCM_RIGHTS on the rdma context FD not sufficient to share the entire
> > > context, PD, and all MR's?
> > 
> > Well, this SCM_RIGHTS is great, one can share the IB context with another.
> > But it is not enough, because:
> > - What API the second process can use to get his hands on one of the PDs or
> >   MRs from this context?
> 
> MRs can be passed by {PD,key} through any number of mechanisms.  All you need
> is an ID for them.  Maybe this is clear in the code.  If so sorry about that.

So, given an ID of a PD, what is the function it can use to get the pointer
to the ibv_pd object?

> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-22 17:03       ` Jason Gunthorpe
  2019-08-22 20:10         ` Weiny, Ira
@ 2019-08-26 10:29         ` Yuval Shaia
  2019-08-26 12:26           ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26 10:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ira Weiny, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Thu, Aug 22, 2019 at 05:03:15PM +0000, Jason Gunthorpe wrote:
> On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> 
> > Add to your list "how does destruction of a MR in 1 process get communicated to
> > the other?"  Does the 2nd process just get failed WR's?
> 
> IHMO a object that has been shared can no longer be asynchronously
> destroyed. That is the whole point. A lkey/rkey # alone is inherently
> unsafe without also holding a refcount on the MR.

You meant to say "can no longer be synchronously destroyed", right?

> 
> > I have some of the same concerns as Doug WRT memory sharing.  FWIW I'm not sure
> > that what SCM_RIGHTS is doing is safe or correct.
> > 
> > For that work I'm really starting to think SCM_RIGHTS transfers should be
> > blocked.  
> 
> That isn't possible, SCM_RIGHTS is just some special case, fork(),
> exec(), etc all cause the same situation. Any solution that blocks
> those is a total non-starter.
> 
> > It just seems wrong that Process B gets references to Process A's
> > mm_struct and holds the memory Process A allocated.  
> 
> Except for ODP, a MR doesn't reference the mm_struct. It references
> the pages. It is not unlike a memfd.
> 
> Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-23 21:33             ` Weiny, Ira
@ 2019-08-26 10:58               ` Yuval Shaia
  0 siblings, 0 replies; 49+ messages in thread
From: Yuval Shaia @ 2019-08-26 10:58 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Jason Gunthorpe, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, Berg, Johannes,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, Dalessandro, Dennis, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Fri, Aug 23, 2019 at 09:33:06PM +0000, Weiny, Ira wrote:
> > Subject: Re: [PATCH v1 00/24] Shared PD and MR
> > 
> > On Thu, Aug 22, 2019 at 08:10:09PM +0000, Weiny, Ira wrote:
> > > > On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> > > >
> > > > > Add to your list "how does destruction of a MR in 1 process get
> > > > > communicated to the other?"  Does the 2nd process just get failed
> > WR's?
> > > >
> > > > IHMO a object that has been shared can no longer be asynchronously
> > destroyed.
> > > > That is the whole point. A lkey/rkey # alone is inherently unsafe
> > > > without also holding a refcount on the MR.
> > > >
> > > > > I have some of the same concerns as Doug WRT memory sharing.
> > FWIW
> > > > > I'm not sure that what SCM_RIGHTS is doing is safe or correct.
> > > > >
> > > > > For that work I'm really starting to think SCM_RIGHTS transfers
> > > > > should be blocked.
> > > >
> > > > That isn't possible, SCM_RIGHTS is just some special case, fork(),
> > > > exec(), etc all cause the same situation. Any solution that blocks those is a
> > total non-starter.
> > >
> > > Right, except in the case of fork(), exec() all of the file system
> > > references which may be pinned also get copied.
> > 
> > And what happens one one child of the fork closes the reference, or exec with
> > CLOEXEC causes it to no inherent?
> 
> Dave Chinner is suggesting the close will hang.  Exec with CLOEXEC would probably not because the RDMA close would release the pin allowing the close of the data file to finish...  At least as far as my testing has shown.
> 
> > 
> > It completely breaks the unix model to tie something to a process not to a
> > FD.
> 
> But that is just it.  Dave is advocating that the FD's must get transferred.  It has nothing to do with a process.
> 
> I'm somewhat confused at this point because in this thread I was advocating that the RDMA context FD is what needs to get "shared" between the processes.  Is that what you are advocating as well?  Does this patch set do that?

The IB context sharing mechanism already exists. This patch-set proposes a
way of importing and maintaining the IBV objects of such a shared IB context.

> 
> > 
> > > > Except for ODP, a MR doesn't reference the mm_struct. It references the
> > pages.
> > > > It is not unlike a memfd.
> > >
> > > I'm thinking of the owner_mm...  It is not like it is holding the
> > > entire process address space I know that.  But it is holding onto
> > > memory which Process A allocated.
> > 
> > It only hold the mm for some statistics accounting, it is really just holding
> > pages outside the mm.
> 
> But those pages aren't necessarily mapped in Process B.  and if they are mapped in Process A then you are sending data to Process A not "B"...  That is one twisted way to look at it anyway...
> 
> Ira
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 00/24] Shared PD and MR
  2019-08-26 10:29         ` Yuval Shaia
@ 2019-08-26 12:26           ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-26 12:26 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Ira Weiny, dledford, leon, Moni Shoua, Parav Pandit,
	Daniel Jurgens, kamalheib1, Mark Zhang, swise, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Mon, Aug 26, 2019 at 01:29:27PM +0300, Yuval Shaia wrote:
> On Thu, Aug 22, 2019 at 05:03:15PM +0000, Jason Gunthorpe wrote:
> > On Thu, Aug 22, 2019 at 09:58:42AM -0700, Ira Weiny wrote:
> > 
> > > Add to your list "how does destruction of a MR in 1 process get communicated to
> > > the other?"  Does the 2nd process just get failed WR's?
> > 
> > IHMO a object that has been shared can no longer be asynchronously
> > destroyed. That is the whole point. A lkey/rkey # alone is inherently
> > unsafe without also holding a refcount on the MR.
> 
> You meant to say "can no longer be synchronously destroyed", right?

No, I mean that another process cannot just rip the rkey out from a
process that is using it, asynchronously.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count
  2019-08-21 14:53   ` Jason Gunthorpe
@ 2019-08-27 16:28     ` Yuval Shaia
  2019-08-27 18:18       ` Jason Gunthorpe
  0 siblings, 1 reply; 49+ messages in thread
From: Yuval Shaia @ 2019-08-27 16:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Wed, Aug 21, 2019 at 02:53:29PM +0000, Jason Gunthorpe wrote:
> On Wed, Aug 21, 2019 at 05:21:06PM +0300, Yuval Shaia wrote:
> > From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > 
> > This new refcnt will points to the refcnt member of the HW object and will
> > behaves as expected by refcnt, i.e. will be increased and decreased as a
> > result of usage changes and will destroy the object when reaches to zero.
> > For a non-shared object refcnt will remain NULL.
> > 
> > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
> >  drivers/infiniband/core/rdma_core.c | 23 +++++++++++++++++++++--
> >  include/rdma/ib_verbs.h             |  7 +++++++
> >  2 files changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..651625f632d7 100644
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -516,7 +516,26 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
> >  	const struct uverbs_obj_idr_type *idr_type =
> >  		container_of(uobj->uapi_object->type_attrs,
> >  			     struct uverbs_obj_idr_type, type);
> > -	int ret = idr_type->destroy_object(uobj, why, attrs);
> > +	static DEFINE_MUTEX(lock);
> > +	int ret, count;
> > +
> > +	mutex_lock(&lock);
> > +
> > +	if (uobj->refcnt) {
> > +		count = atomic_dec_return(uobj->refcnt);
> > +		WARN_ON(count < 0); /* use after free! */
> 
> Use a proper refcount_t

uobj->refcnt points to the HW object's refcnt (e.g. ib_pd.refcnt)

> 
> > +		if (count) {
> > +			mutex_unlock(&lock);
> > +			goto skip;
> > +		}
> > +	}
> > +
> > +	ret = idr_type->destroy_object(uobj, why, attrs);
> > +
> > +	if (ret)
> > +		atomic_inc(uobj->refcnt);
> > +
> > +	mutex_unlock(&lock);
> >  
> >  	/*
> >  	 * We can only fail gracefully if the user requested to destroy the
> > @@ -525,7 +544,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
> >  	 */
> >  	if (ib_is_destroy_retryable(ret, why, uobj))
> >  		return ret;
> > -
> > +skip:
> 
> This doesn't seem to properly define who owns the rdmacg count - it
> should belong to the HW object not to the uobject.
> 
> > +
> > +	/*
> > +	 * ib_X HW object sharing support
> > +	 * - NULL for HW objects that are not shareable
> > +	 * - Pointer to ib_X reference counter for shareable HW objects
> > +	 */
> > +	atomic_t	       *refcnt;		/* ib_X object ref count */
> 
> Gross, shouldn't this actually be in the hw object?

It belongs to the HW object; this one just *points* to it, and it is
defined here since we want to maintain it when destroy_hw_idr_uobject is
called.

The actual assignment is done only for objects that *support* import; it is
set in the function ib_uverbs_import_ib_pd (patch "IB/uverbs: Add PD import
verb").

Hope it makes sense.

> 
> Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread
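
For reference, a rough sketch of what the destroy path might look like with a proper refcount_t, as suggested above. The types are simplified stand-ins for the real uobject/rdma_core.c code, and the failed-destroy path (re-taking the reference when the driver refuses to destroy) is elided.

#include <linux/refcount.h>

struct demo_uobject {
	/*
	 * NULL for objects that are not shared; otherwise points at the
	 * reference counter embedded in the HW object (e.g. a PD).
	 */
	refcount_t *hw_refcnt;
};

static int demo_destroy_hw_uobject(struct demo_uobject *uobj,
				   int (*destroy_object)(struct demo_uobject *uobj))
{
	/*
	 * Shared object: only the uobject that drops the last reference
	 * actually destroys the HW object; every other importer simply
	 * detaches. refcount_dec_and_test() saturates and warns on
	 * underflow, so no open-coded WARN_ON(count < 0) is needed.
	 */
	if (uobj->hw_refcnt && !refcount_dec_and_test(uobj->hw_refcnt))
		return 0;

	return destroy_object(uobj);
}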

* Re: [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count
  2019-08-27 16:28     ` Yuval Shaia
@ 2019-08-27 18:18       ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2019-08-27 18:18 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford, leon, Moni Shoua, Parav Pandit, Daniel Jurgens,
	kamalheib1, Mark Zhang, swise, shamir.rabinovitch, johannes.berg,
	willy, Michael Guralnik, Mark Bloch, dan.carpenter, bvanassche,
	Max Gurtovoy, Israel Rukshin, galpress, Denis Drozdov,
	Yuval Avnery, dennis.dalessandro, will, Erez Alfasi, linux-rdma,
	Shamir Rabinovitch

On Tue, Aug 27, 2019 at 07:28:14PM +0300, Yuval Shaia wrote:
> On Wed, Aug 21, 2019 at 02:53:29PM +0000, Jason Gunthorpe wrote:
> > On Wed, Aug 21, 2019 at 05:21:06PM +0300, Yuval Shaia wrote:
> > > From: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > > 
> > > This new refcnt will point to the refcnt member of the HW object and will
> > > behave as expected of a refcnt, i.e. it will be increased and decreased as
> > > usage changes and will destroy the object when it reaches zero.
> > > For a non-shared object, refcnt will remain NULL.
> > > 
> > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> > > Signed-off-by: Shamir Rabinovitch <srabinov7@gmail.com>
> > >  drivers/infiniband/core/rdma_core.c | 23 +++++++++++++++++++++--
> > >  include/rdma/ib_verbs.h             |  7 +++++++
> > >  2 files changed, 28 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> > > index ccf4d069c25c..651625f632d7 100644
> > > +++ b/drivers/infiniband/core/rdma_core.c
> > > @@ -516,7 +516,26 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
> > >  	const struct uverbs_obj_idr_type *idr_type =
> > >  		container_of(uobj->uapi_object->type_attrs,
> > >  			     struct uverbs_obj_idr_type, type);
> > > -	int ret = idr_type->destroy_object(uobj, why, attrs);
> > > +	static DEFINE_MUTEX(lock);
> > > +	int ret, count;
> > > +
> > > +	mutex_lock(&lock);
> > > +
> > > +	if (uobj->refcnt) {
> > > +		count = atomic_dec_return(uobj->refcnt);
> > > +		WARN_ON(count < 0); /* use after free! */
> > 
> > Use a proper refcount_t
> 
> uobj->refcnt points to the HW object's refcnt (e.g. ib_pd.refcnt)

Hurm. That refcount is kind of broken/racy as is; I'm not clear if it
can be used for this. More changes would probably be needed.

It would be more understandable to start with a dedicated refcount and
then have a patch to consolidate.

> > > +
> > > +	/*
> > > +	 * ib_X HW object sharing support
> > > +	 * - NULL for HW objects that are not shareable
> > > +	 * - Pointer to ib_X reference counter for shareable HW objects
> > > +	 */
> > > +	atomic_t	       *refcnt;		/* ib_X object ref count */
> > 
> > Gross, shouldn't this actually be in the hw object?
> 
> It belongs to the HW object; this one just *points* to it, and it is
> defined here since we want to maintain it when destroy_hw_idr_uobject is
> called.

I mean, you should just extract it from the hw_object directly,
somehow. I.e., have a get_refcount accessor in the object definition.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread
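
A sketch of that accessor idea: the object type definition exposes where the HW object's refcount lives, so the core can fetch it directly instead of caching a pointer in the uobject. Everything below (demo_obj_type, get_refcount, demo_pd) is hypothetical; no such callback exists in the uverbs object machinery.

#include <linux/refcount.h>

struct demo_pd {
	refcount_t usecnt;
	/* ... driver specific PD state ... */
};

struct demo_obj_type {
	int (*destroy_object)(void *hw_obj);
	/* Return the HW object's refcount, or NULL if not shareable. */
	refcount_t *(*get_refcount)(void *hw_obj);
};

static refcount_t *demo_pd_get_refcount(void *hw_obj)
{
	struct demo_pd *pd = hw_obj;

	return &pd->usecnt;
}

static const struct demo_obj_type demo_pd_type = {
	.get_refcount = demo_pd_get_refcount,
};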

end of thread, other threads:[~2019-08-27 18:18 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 02/24] RDMA/uverbs: Delete the macro uobj_put_obj_read Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 03/24] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 04/24] IB/{core,hw}: ib_pd should not have ib_uobject pointer Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count Yuval Shaia
2019-08-21 14:53   ` Jason Gunthorpe
2019-08-27 16:28     ` Yuval Shaia
2019-08-27 18:18       ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 06/24] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper Yuval Shaia
2019-08-21 14:57   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback Yuval Shaia
2019-08-21 14:59   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 09/24] IB/core: Install clone ib_pd in device ops Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
2019-08-21 14:59   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 11/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 12/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 13/24] IB/uverbs: Add clone reference counting to ib_pd Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 14/24] IB/uverbs: Add PD import verb Yuval Shaia
2019-08-21 15:00   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 15/24] IB/mlx4: Enable import from FD verb Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 16/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 17/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 18/24] IB/core: ib_mr should not have ib_uobject pointer Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 19/24] IB/core: Install clone ib_mr in device ops Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 20/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 21/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 22/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 23/24] IB/uverbs: Add clone reference counting to ib_mr Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 24/24] IB/uverbs: Add MR import verb Yuval Shaia
2019-08-21 14:50 ` [PATCH v1 00/24] Shared PD and MR Jason Gunthorpe
2019-08-22  8:50   ` Yuval Shaia
2019-08-21 23:37 ` Ira Weiny
2019-08-22  8:41   ` Yuval Shaia
2019-08-22 14:15     ` Doug Ledford
2019-08-26  9:35       ` Yuval Shaia
2019-08-22 16:58     ` Ira Weiny
2019-08-22 17:03       ` Jason Gunthorpe
2019-08-22 20:10         ` Weiny, Ira
2019-08-23 11:57           ` Jason Gunthorpe
2019-08-23 21:33             ` Weiny, Ira
2019-08-26 10:58               ` Yuval Shaia
2019-08-26 10:29         ` Yuval Shaia
2019-08-26 12:26           ` Jason Gunthorpe
2019-08-26  9:51       ` Yuval Shaia
2019-08-26 10:04       ` Yuval Shaia
2019-08-26 10:10       ` Yuval Shaia
