* [PATCH 0/5] RDMA: reg_remote_mr
From: Joel Nider @ 2019-01-29 13:26 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

As discussed at LPC'18, there is a need to be able to register a memory
region (MR) on behalf of another process. One example is post-copy
container migration, in which CRIU is responsible for setting up the
migration, but the memory contents belong to the migrating process. In
this case, we want all RDMA READ requests to be served directly from
the address space of the migrating process (not by CRIU). This patchset
implements a new uverbs command which allows an application to register
a memory region in the address space of another process.
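
For illustration, the sketch below shows how a migration tool might
drive the new verb from user space. Note that ibv_reg_remote_mr() is a
hypothetical wrapper name used only for this sketch; the series adds
the kernel-side ioctl method (UVERBS_METHOD_REG_REMOTE_MR, patch 5),
not this exact user-space API.

    #include <err.h>
    #include <infiniband/verbs.h>

    /* Hypothetical sketch: ibv_reg_remote_mr() is an assumed wrapper
     * around the new UVERBS_METHOD_REG_REMOTE_MR ioctl, not an API
     * defined by this series. */
    static struct ibv_mr *register_for_migration(struct ibv_pd *pd,
                                                 pid_t migrating_pid,
                                                 void *addr, size_t len)
    {
            /* The MR covers [addr, addr + len) in the *migrating*
             * process's address space, so RDMA READs are served from
             * its pages directly, without staging through CRIU. */
            struct ibv_mr *mr = ibv_reg_remote_mr(pd, migrating_pid,
                                                  addr, len,
                                                  IBV_ACCESS_REMOTE_READ);
            if (!mr)
                    err(1, "reg_remote_mr");
            return mr;
    }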

Joel Nider (5):
  mm: add get_user_pages_remote_longterm function
  RDMA/uverbs: add owner parameter to reg_user_mr
  RDMA/uverbs: add owner parameter to ib_umem_get
  RDMA/uverbs: add owner parameter to ib_umem_odp_get
  RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR

 drivers/infiniband/core/umem.c                |  26 ++++--
 drivers/infiniband/core/umem_odp.c            |  50 +++++-----
 drivers/infiniband/core/uverbs_cmd.c          |   2 +-
 drivers/infiniband/core/uverbs_std_types_mr.c | 129 +++++++++++++++++++++++++-
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      |  11 ++-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h      |   1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |   6 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h        |   1 +
 drivers/infiniband/hw/cxgb4/mem.c             |   6 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c       |   2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c       |   2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       |   4 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c       |   2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |   2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |   3 +-
 drivers/infiniband/hw/mlx4/cq.c               |   2 +-
 drivers/infiniband/hw/mlx4/doorbell.c         |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h          |   1 +
 drivers/infiniband/hw/mlx4/mr.c               |   3 +-
 drivers/infiniband/hw/mlx4/qp.c               |   2 +-
 drivers/infiniband/hw/mlx4/srq.c              |   2 +-
 drivers/infiniband/hw/mlx5/cq.c               |   4 +-
 drivers/infiniband/hw/mlx5/devx.c             |   2 +-
 drivers/infiniband/hw/mlx5/doorbell.c         |   2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   4 +-
 drivers/infiniband/hw/mlx5/mr.c               |  17 ++--
 drivers/infiniband/hw/mlx5/odp.c              |  11 ++-
 drivers/infiniband/hw/mlx5/qp.c               |   4 +-
 drivers/infiniband/hw/mlx5/srq.c              |   2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |   5 +-
 drivers/infiniband/hw/nes/nes_verbs.c         |   5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |   6 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h   |   3 +-
 drivers/infiniband/hw/qedr/verbs.c            |   8 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |   1 +
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h  |   1 +
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |   2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |   2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |   5 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |   2 +-
 drivers/infiniband/sw/rdmavt/mr.c             |   3 +-
 drivers/infiniband/sw/rdmavt/mr.h             |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c            |   3 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |   4 +-
 include/linux/mm.h                            |  28 +++++-
 include/rdma/ib_umem.h                        |   3 +-
 include/rdma/ib_umem_odp.h                    |   6 +-
 include/rdma/ib_verbs.h                       |   9 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  13 +++
 mm/gup.c                                      |  15 ++-
 50 files changed, 327 insertions(+), 103 deletions(-)

-- 
2.7.4



* [PATCH 1/5] mm: add get_user_pages_remote_longterm function
From: Joel Nider @ 2019-01-29 13:26 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

In order to support the RDMA reg_remote_mr function, we must be able
to pin memory pages of a remote process for an indefinite amount of
time. Here, 'remote' simply means a process that is different from the
caller. Functions for getting longterm pages (get_user_pages_longterm)
and remote pages (get_user_pages_remote) already exist - this new
function combines the functionality of both of them.
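
For reference, a condensed sketch of the intended calling convention
(an assumed caller, not code from this patch; flags and cleanup are
abbreviated):

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Sketch: pin pages of another process for long-lived DMA. */
    static long pin_remote_pages(struct task_struct *owner_task,
                                 struct mm_struct *owner_mm,
                                 unsigned long start, unsigned long npages,
                                 struct page **page_list)
    {
            long ret;

            down_read(&owner_mm->mmap_sem);
            /* Same semantics as get_user_pages_longterm(), but against
             * a caller-supplied task/mm instead of current. */
            ret = get_user_pages_remote_longterm(owner_task, owner_mm,
                                                 start, npages, FOLL_WRITE,
                                                 page_list, NULL);
            up_read(&owner_mm->mmap_sem);
            return ret;     /* pages pinned, or -errno */
    }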

Signed-off-by: Joel Nider <joeln@il.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
---
 include/linux/mm.h | 28 +++++++++++++++++++++++++---
 mm/gup.c           | 15 ++++++++++-----
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb640..1f5c72472 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1537,9 +1537,21 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 #ifdef CONFIG_FS_DAX
-long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
-			    unsigned int gup_flags, struct page **pages,
-			    struct vm_area_struct **vmas);
+long get_user_pages_remote_longterm(struct task_struct *tsk,
+			    struct mm_struct *mm, unsigned long start,
+			    unsigned long nr_pages, unsigned int gup_flags,
+			    struct page **pages, struct vm_area_struct **vmas);
+
+static inline long get_user_pages_longterm(unsigned long start,
+					   unsigned long nr_pages,
+					   unsigned int gup_flags,
+					   struct page **pages,
+					   struct vm_area_struct **vmas)
+{
+	return get_user_pages_remote_longterm(current, current->mm, start,
+					      nr_pages, gup_flags, pages,
+					      vmas);
+}
 #else
 static inline long get_user_pages_longterm(unsigned long start,
 		unsigned long nr_pages, unsigned int gup_flags,
@@ -1547,6 +1559,16 @@ static inline long get_user_pages_longterm(unsigned long start,
 {
 	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
 }
+
+static inline long get_user_pages_remote_longterm(struct task_struct *tsk,
+		struct mm_struct *mm, unsigned long start,
+		unsigned long nr_pages, unsigned int gup_flags,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	return get_user_pages_remote(tsk, mm, start, nr_pages,
+		gup_flags, pages, vmas);
+}
+
 #endif /* CONFIG_FS_DAX */
 
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
diff --git a/mm/gup.c b/mm/gup.c
index 05acd7e..bcfe5a6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1139,9 +1139,11 @@ EXPORT_SYMBOL(get_user_pages);
  * "longterm" == userspace controlled elevated page count lifetime.
  * Contrast this to iov_iter_get_pages() usages which are transient.
  */
-long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
-		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas_arg)
+long get_user_pages_remote_longterm(struct task_struct *tsk,
+			    struct mm_struct *mm, unsigned long start,
+			    unsigned long nr_pages, unsigned int gup_flags,
+			    struct page **pages,
+			    struct vm_area_struct **vmas_arg)
 {
 	struct vm_area_struct **vmas = vmas_arg;
 	struct vm_area_struct *vma_prev = NULL;
@@ -1157,7 +1159,9 @@ long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
 			return -ENOMEM;
 	}
 
-	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+	rc = __get_user_pages_locked(tsk, mm, start, nr_pages,
+				     pages, vmas, NULL,
+				     gup_flags | FOLL_TOUCH | FOLL_REMOTE);
 
 	for (i = 0; i < rc; i++) {
 		struct vm_area_struct *vma = vmas[i];
@@ -1187,7 +1191,8 @@ long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
 		kfree(vmas);
 	return rc;
 }
-EXPORT_SYMBOL(get_user_pages_longterm);
+EXPORT_SYMBOL(get_user_pages_remote_longterm);
+
 #endif /* CONFIG_FS_DAX */
 
 /**
-- 
2.7.4



* [PATCH 2/5] RDMA/uverbs: add owner parameter to reg_user_mr
From: Joel Nider @ 2019-01-29 13:26 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

Add a new parameter 'owner' to the reg_user_mr function. The owner
parameter specifies the process that owns the memory for which the
MR is being registered. Until now, the function assumed that the
process calling the function is also the owner. This patch relaxes
that assumption, and allows the caller and the memory owner to be
different processes, which is required for the reg_remote_mr verb
implemented in the following patches. No functional changes in
these files.
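
To summarize the convention (a sketch with an assumed helper, not code
from this patch): a NULL owner keeps today's behaviour, while a
non-NULL struct pid names another process.

    #include <rdma/ib_verbs.h>

    /* Sketch: the updated reg_user_mr call convention. */
    static struct ib_mr *reg_mr_for(struct ib_pd *pd, u64 start,
                                    u64 length, u64 hca_va,
                                    int access_flags, struct pid *owner,
                                    struct ib_udata *udata)
    {
            /* owner == NULL: the caller owns the memory (old behaviour).
             * owner != NULL: the named process owns the memory. */
            return pd->device->ops.reg_user_mr(pd, start, length, hca_va,
                                               access_flags, owner, udata);
    }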

Signed-off-by: Joel Nider <joeln@il.ibm.com>
---
 drivers/infiniband/core/uverbs_cmd.c         | 2 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.c     | 1 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.h     | 1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.c  | 3 ++-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h       | 1 +
 drivers/infiniband/hw/cxgb4/mem.c            | 3 ++-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c    | 1 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h         | 1 +
 drivers/infiniband/hw/mlx4/mr.c              | 1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         | 4 +++-
 drivers/infiniband/hw/mlx5/mr.c              | 2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c | 3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c        | 2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 3 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  | 3 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 1 +
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 1 +
 drivers/infiniband/sw/rdmavt/mr.c            | 1 +
 drivers/infiniband/sw/rdmavt/mr.h            | 1 +
 drivers/infiniband/sw/rxe/rxe_verbs.c        | 4 +++-
 include/rdma/ib_verbs.h                      | 1 +
 21 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6b12cc5..034d595 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -724,7 +724,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	}
 
 	mr = pd->device->ops.reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
-					 cmd.access_flags,
+					 cmd.access_flags, NULL,
 					 &attrs->driver_udata);
 	if (IS_ERR(mr)) {
 		ret = PTR_ERR(mr);
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 1e2515e..0828f27 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3557,6 +3557,7 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig,
 /* uverbs */
 struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 				  u64 virt_addr, int mr_access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata)
 {
 	struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
index c4af726..5af76f6 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
@@ -215,6 +215,7 @@ struct ib_mw *bnxt_re_alloc_mw(struct ib_pd *ib_pd, enum ib_mw_type type,
 int bnxt_re_dealloc_mw(struct ib_mw *mw);
 struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int mr_access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata);
 struct ib_ucontext *bnxt_re_alloc_ucontext(struct ib_device *ibdev,
 					   struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b34b1a1..54d8b38 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -519,7 +519,8 @@ static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc)
 }
 
 static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				      u64 virt, int acc, struct ib_udata *udata)
+				      u64 virt, int acc, struct pid *owner,
+				      struct ib_udata *udata)
 {
 	__be64 *pages;
 	int shift, n, len;
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index f0fcead..8d382fe 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -1059,6 +1059,7 @@ struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 			    struct ib_udata *udata);
 struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
 					   u64 length, u64 virt, int acc,
+					   struct pid *owner,
 					   struct ib_udata *udata);
 struct ib_mr *c4iw_get_dma_mr(struct ib_pd *pd, int acc);
 int c4iw_dereg_mr(struct ib_mr *ib_mr);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 7b76e6f..ec9b0b4 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -499,7 +499,8 @@ struct ib_mr *c4iw_get_dma_mr(struct ib_pd *pd, int acc)
 }
 
 struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-			       u64 virt, int acc, struct ib_udata *udata)
+			       u64 virt, int acc, struct pid *owner,
+			       struct ib_udata *udata)
 {
 	__be64 *pages;
 	int shift, n, len;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 0b675b0..fc2e6c8 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1827,6 +1827,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 				       u64 length,
 				       u64 virt,
 				       int acc,
+				       struct pid *owner,
 				       struct ib_udata *udata)
 {
 	struct i40iw_pd *iwpd = to_iwpd(pd);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e491f3e..80bb83c 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -731,6 +731,7 @@ int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt,
 			   struct ib_umem *umem);
 struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata);
 int mlx4_ib_dereg_mr(struct ib_mr *mr);
 struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index c7c85c2..76fa83c 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -403,6 +403,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_ucontext *context, u64 start,
 
 struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(pd->device);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b06d3b1..4060461 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1085,6 +1085,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc);
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
 		      enum ib_uverbs_advise_mr_advice advice,
@@ -1098,7 +1099,8 @@ int mlx5_ib_dealloc_mw(struct ib_mw *mw);
 int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 		       int page_shift, int flags);
 struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
-					     int access_flags);
+					     int access_flags,
+					     struct pid *owner);
 void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *mr);
 int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 			  u64 length, u64 virt_addr, int access_flags,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index fd6ea1f7..6add486 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  */
 
-
 #include <linux/kref.h>
 #include <linux/random.h>
 #include <linux/debugfs.h>
@@ -1313,6 +1312,7 @@ struct ib_mr *mlx5_ib_reg_dm_mr(struct ib_pd *pd, struct ib_dm *dm,
 
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
+				  struct pid *owner,
 				  struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 82cb6b7..77e678e 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -904,7 +904,8 @@ static struct ib_mr *mthca_get_dma_mr(struct ib_pd *pd, int acc)
 }
 
 static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				       u64 virt, int acc, struct ib_udata *udata)
+				       u64 virt, int acc, struct pid *owner,
+				       struct ib_udata *udata)
 {
 	struct mthca_dev *dev = to_mdev(pd->device);
 	struct scatterlist *sg;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 4e7f08e..e07cb02 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2097,7 +2097,7 @@ static struct ib_mr *nes_get_dma_mr(struct ib_pd *pd, int acc)
  * nes_reg_user_mr
  */
 static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-		u64 virt, int acc, struct ib_udata *udata)
+		u64 virt, int acc, struct pid *owner, struct ib_udata *udata)
 {
 	u64 iova_start;
 	__le64 *pbl;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 287c332..01d076a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -900,7 +900,8 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 }
 
 struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
-				 u64 usr_addr, int acc, struct ib_udata *udata)
+				 u64 usr_addr, int acc, struct pid *owner,
+				 struct ib_udata *udata)
 {
 	int status = -ENOMEM;
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b69cfdce..75ea82f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -106,7 +106,8 @@ int ocrdma_post_srq_recv(struct ib_srq *, const struct ib_recv_wr *,
 int ocrdma_dereg_mr(struct ib_mr *);
 struct ib_mr *ocrdma_get_dma_mr(struct ib_pd *, int acc);
 struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
-				 u64 virt, int acc, struct ib_udata *);
+				 u64 virt, int acc, struct pid *owner,
+				 struct ib_udata *);
 struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_num_sg);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 1d4abef..2c8fe13 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -638,6 +638,7 @@ int usnic_ib_destroy_cq(struct ib_cq *cq)
 
 struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
 					u64 virt_addr, int access_flags,
+					struct pid *owner,
 					struct ib_udata *udata)
 {
 	struct usnic_ib_mr *mr;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index e331442..4eb42c9 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -68,6 +68,7 @@ struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
 int usnic_ib_destroy_cq(struct ib_cq *cq);
 struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
 				u64 virt_addr, int access_flags,
+				struct pid *owner,
 				struct ib_udata *udata);
 int usnic_ib_dereg_mr(struct ib_mr *ibmr);
 struct ib_ucontext *usnic_ib_alloc_ucontext(struct ib_device *ibdev,
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 49c9541..2bc95c9 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -377,6 +377,7 @@ struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
  */
 struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			      u64 virt_addr, int mr_access_flags,
+			      struct pid *owner,
 			      struct ib_udata *udata)
 {
 	struct rvt_mr *mr;
diff --git a/drivers/infiniband/sw/rdmavt/mr.h b/drivers/infiniband/sw/rdmavt/mr.h
index 132800e..8e6add0 100644
--- a/drivers/infiniband/sw/rdmavt/mr.h
+++ b/drivers/infiniband/sw/rdmavt/mr.h
@@ -77,6 +77,7 @@ void rvt_mr_exit(struct rvt_dev_info *rdi);
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc);
 struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			      u64 virt_addr, int mr_access_flags,
+			      struct pid *owner,
 			      struct ib_udata *udata);
 int rvt_dereg_mr(struct ib_mr *ibmr);
 struct ib_mr *rvt_alloc_mr(struct ib_pd *pd,
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index b20e6e0..188e038 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -982,7 +982,9 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 				     u64 start,
 				     u64 length,
 				     u64 iova,
-				     int access, struct ib_udata *udata)
+				     int access,
+				     struct pid *owner,
+				     struct ib_udata *udata)
 {
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibpd->device);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a3ceed3..3432404 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2408,6 +2408,7 @@ struct ib_device_ops {
 	struct ib_mr *(*get_dma_mr)(struct ib_pd *pd, int mr_access_flags);
 	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
 				     u64 virt_addr, int mr_access_flags,
+				     struct pid *owner,
 				     struct ib_udata *udata);
 	int (*rereg_user_mr)(struct ib_mr *mr, int flags, u64 start, u64 length,
 			     u64 virt_addr, int mr_access_flags,
-- 
2.7.4



* [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get
From: Joel Nider @ 2019-01-29 13:26 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

ib_umem_get is a core function used by drivers that support RDMA.
The 'owner' parameter signifies the process that owns the memory.
Until now, it was assumed that the owning process was the current
process. This adds the flexibility to specify a process other than
the current process. All drivers that call this function are also
updated, and backwards compatibility is preserved: when the 'owner'
parameter is NULL, the current process is assumed to be the owner.
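
As an illustration, a sketch of what a caller of the remote path must
do (an assumed helper resembling what patch 5 needs; reference and
error handling are abbreviated, and the ODP case keeps the pid in
per_mm as of patch 4):

    #include <linux/err.h>
    #include <linux/pid.h>
    #include <rdma/ib_umem.h>

    /* Sketch: resolve a numeric PID from userspace to a struct pid in
     * the caller's PID namespace, then pin the owner's memory. */
    static struct ib_umem *get_remote_umem(struct ib_ucontext *ctx,
                                           unsigned long addr, size_t size,
                                           int access, pid_t owner_nr)
    {
            struct pid *owner = find_get_pid(owner_nr);
            struct ib_umem *umem;

            if (!owner)
                    return ERR_PTR(-ESRCH); /* no such process */
            umem = ib_umem_get(ctx, addr, size, access, 0 /* dmasync */,
                               owner);
            put_pid(owner); /* non-ODP case: pid used only transiently */
            return umem;
    }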

Signed-off-by: Joel Nider <joeln@il.ibm.com>
---
 drivers/infiniband/core/umem.c                | 26 ++++++++++++++++++++------
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 10 +++++-----
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |  3 ++-
 drivers/infiniband/hw/cxgb4/mem.c             |  3 ++-
 drivers/infiniband/hw/hns/hns_roce_cq.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       |  4 ++--
 drivers/infiniband/hw/hns/hns_roce_qp.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
 drivers/infiniband/hw/mlx4/cq.c               |  2 +-
 drivers/infiniband/hw/mlx4/doorbell.c         |  2 +-
 drivers/infiniband/hw/mlx4/mr.c               |  2 +-
 drivers/infiniband/hw/mlx4/qp.c               |  2 +-
 drivers/infiniband/hw/mlx4/srq.c              |  2 +-
 drivers/infiniband/hw/mlx5/cq.c               |  4 ++--
 drivers/infiniband/hw/mlx5/devx.c             |  2 +-
 drivers/infiniband/hw/mlx5/doorbell.c         |  2 +-
 drivers/infiniband/hw/mlx5/mr.c               | 15 ++++++++-------
 drivers/infiniband/hw/mlx5/odp.c              |  5 +++--
 drivers/infiniband/hw/mlx5/qp.c               |  4 ++--
 drivers/infiniband/hw/mlx5/srq.c              |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
 drivers/infiniband/hw/nes/nes_verbs.c         |  3 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  3 ++-
 drivers/infiniband/hw/qedr/verbs.c            |  8 +++++---
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  5 +++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c             |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c            |  3 ++-
 include/rdma/ib_umem.h                        |  3 ++-
 33 files changed, 80 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c6144df..9646cee 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -71,15 +71,21 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
  *
  * If access flags indicate ODP memory, avoid pinning. Instead, stores
  * the mm for future page fault handling in conjunction with MMU notifiers.
+ * If the process doing the pinning is the same as the process that owns
+ * the memory being pinned, 'owner' should be NULL. Otherwise, 'owner' should
+ * be the process ID of the owning process. The process ID must be in the
+ * same PID namespace as the calling userspace context.
  *
- * @context: userspace context to pin memory for
+ * @context: userspace context that is pinning the memory
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
  * @dmasync: flush in-flight DMA when the memory region is written
+ * @owner: the ID of the process that owns the memory being pinned
  */
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-			    size_t size, int access, int dmasync)
+			    size_t size, int access, int dmasync,
+			    struct pid *owner)
 {
 	struct ib_umem *umem;
 	struct page **page_list;
@@ -94,6 +100,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	unsigned long dma_attrs = 0;
 	struct scatterlist *sg, *sg_list_start;
 	unsigned int gup_flags = FOLL_WRITE;
+	struct task_struct *owner_task = current;
 
 	if (dmasync)
 		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
@@ -120,12 +127,18 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 			return ERR_PTR(-ENOMEM);
 	}
 
+	if (owner) {
+		rcu_read_lock();
+		owner_task = pid_task(owner, PIDTYPE_PID);
+		rcu_read_unlock();
+	}
+
 	umem->context    = context;
 	umem->length     = size;
 	umem->address    = addr;
 	umem->page_shift = PAGE_SHIFT;
 	umem->writable   = ib_access_writable(access);
-	umem->owning_mm = mm = current->mm;
+	umem->owning_mm = mm = owner_task->mm;
 	mmgrab(mm);
 
 	if (access & IB_ACCESS_ON_DEMAND) {
@@ -183,10 +196,11 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	while (npages) {
 		down_read(&mm->mmap_sem);
-		ret = get_user_pages_longterm(cur_base,
+		ret = get_user_pages_remote_longterm(owner_task,
+				     mm, cur_base,
 				     min_t(unsigned long, npages,
-					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags, page_list, vma_list);
+				     PAGE_SIZE / sizeof(struct page *)),
+				     gup_flags, page_list, vma_list, NULL);
 		if (ret < 0) {
 			up_read(&mm->mmap_sem);
 			goto umem_release;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 0828f27..96eca44 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -896,7 +896,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 		bytes += (qplib_qp->sq.max_wqe * sizeof(struct sq_psn_search));
 	bytes = PAGE_ALIGN(bytes);
 	umem = ib_umem_get(context, ureq.qpsva, bytes,
-			   IB_ACCESS_LOCAL_WRITE, 1);
+			   IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(umem))
 		return PTR_ERR(umem);
 
@@ -909,7 +909,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 		bytes = (qplib_qp->rq.max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
 		bytes = PAGE_ALIGN(bytes);
 		umem = ib_umem_get(context, ureq.qprva, bytes,
-				   IB_ACCESS_LOCAL_WRITE, 1);
+				   IB_ACCESS_LOCAL_WRITE, 1, NULL);
 		if (IS_ERR(umem))
 			goto rqfail;
 		qp->rumem = umem;
@@ -1371,7 +1371,7 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,
 	bytes = (qplib_srq->max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
 	bytes = PAGE_ALIGN(bytes);
 	umem = ib_umem_get(context, ureq.srqva, bytes,
-			   IB_ACCESS_LOCAL_WRITE, 1);
+			   IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(umem))
 		return PTR_ERR(umem);
 
@@ -2624,7 +2624,7 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
 
 		cq->umem = ib_umem_get(context, req.cq_va,
 				       entries * sizeof(struct cq_base),
-				       IB_ACCESS_LOCAL_WRITE, 1);
+				       IB_ACCESS_LOCAL_WRITE, 1, NULL);
 		if (IS_ERR(cq->umem)) {
 			rc = PTR_ERR(cq->umem);
 			goto fail;
@@ -3591,7 +3591,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	mr->ib_mr.rkey = mr->qplib_mr.rkey;
 
 	umem = ib_umem_get(ib_pd->uobject->context, start, length,
-			   mr_access_flags, 0);
+			   mr_access_flags, 0, NULL);
 	if (IS_ERR(umem)) {
 		dev_err(rdev_to_dev(rdev), "Failed to get umem");
 		rc = -EFAULT;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 54d8b38..fd94576 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -541,7 +541,8 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->rhp = rhp;
 
-	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0,
+				owner);
 	if (IS_ERR(mhp->umem)) {
 		err = PTR_ERR(mhp->umem);
 		kfree(mhp);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index ec9b0b4..fa3ebbc 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -538,7 +538,8 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->rhp = rhp;
 
-	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0,
+				owner);
 	if (IS_ERR(mhp->umem))
 		goto err_free_skb;
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 3a485f5..6fc6f6a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -224,7 +224,7 @@ static int hns_roce_ib_get_cq_umem(struct hns_roce_dev *hr_dev,
 	u32 npages;
 
 	*umem = ib_umem_get(context, buf_addr, cqe * hr_dev->caps.cq_entry_sz,
-			    IB_ACCESS_LOCAL_WRITE, 1);
+			    IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(*umem))
 		return PTR_ERR(*umem);
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_db.c b/drivers/infiniband/hw/hns/hns_roce_db.c
index e2f93c1..e125502 100644
--- a/drivers/infiniband/hw/hns/hns_roce_db.c
+++ b/drivers/infiniband/hw/hns/hns_roce_db.c
@@ -29,7 +29,7 @@ int hns_roce_db_map_user(struct hns_roce_ucontext *context, unsigned long virt,
 	refcount_set(&page->refcount, 1);
 	page->user_virt = (virt & PAGE_MASK);
 	page->umem = ib_umem_get(&context->ibucontext, virt & PAGE_MASK,
-				 PAGE_SIZE, 0, 0);
+				 PAGE_SIZE, 0, 0, NULL);
 	if (IS_ERR(page->umem)) {
 		ret = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index ee5991b..4f023b8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -1111,7 +1111,7 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-ENOMEM);
 
 	mr->umem = ib_umem_get(pd->uobject->context, start, length,
-			       access_flags, 0);
+			       access_flags, 0, owner);
 	if (IS_ERR(mr->umem)) {
 		ret = PTR_ERR(mr->umem);
 		goto err_free;
@@ -1221,7 +1221,7 @@ int hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start, u64 length,
 		ib_umem_release(mr->umem);
 
 		mr->umem = ib_umem_get(ibmr->uobject->context, start, length,
-				       mr_access_flags, 0);
+				       mr_access_flags, 0, NULL);
 		if (IS_ERR(mr->umem)) {
 			ret = PTR_ERR(mr->umem);
 			mr->umem = NULL;
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 54031c5..3e1dfdf 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -614,7 +614,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 
 		hr_qp->umem = ib_umem_get(ib_pd->uobject->context,
 					  ucmd.buf_addr, hr_qp->buff_size, 0,
-					  0);
+					  0, NULL);
 		if (IS_ERR(hr_qp->umem)) {
 			dev_err(dev, "ib_umem_get error for create qp\n");
 			ret = PTR_ERR(hr_qp->umem);
diff --git a/drivers/infiniband/hw/hns/hns_roce_srq.c b/drivers/infiniband/hw/hns/hns_roce_srq.c
index 960b194..bcfc092 100644
--- a/drivers/infiniband/hw/hns/hns_roce_srq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_srq.c
@@ -253,7 +253,7 @@ struct ib_srq *hns_roce_create_srq(struct ib_pd *pd,
 		}
 
 		srq->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr,
-					srq_buf_size, 0, 0);
+					srq_buf_size, 0, 0, NULL);
 		if (IS_ERR(srq->umem)) {
 			ret = PTR_ERR(srq->umem);
 			goto err_srq;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index fc2e6c8..e34ac01 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1853,7 +1853,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 
 	if (length > I40IW_MAX_MR_SIZE)
 		return ERR_PTR(-EINVAL);
-	region = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	region = ib_umem_get(pd->uobject->context, start, length, acc, 0, NULL);
 	if (IS_ERR(region))
 		return (struct ib_mr *)region;
 
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 4351234..a66a1ef 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -144,7 +144,7 @@ static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_ucontext *cont
 	int n;
 
 	*umem = ib_umem_get(context, buf_addr, cqe * cqe_size,
-			    IB_ACCESS_LOCAL_WRITE, 1);
+			    IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(*umem))
 		return PTR_ERR(*umem);
 
diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c
index c517409..6abc3b7 100644
--- a/drivers/infiniband/hw/mlx4/doorbell.c
+++ b/drivers/infiniband/hw/mlx4/doorbell.c
@@ -62,7 +62,7 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt,
 	page->user_virt = (virt & PAGE_MASK);
 	page->refcnt    = 0;
 	page->umem      = ib_umem_get(&context->ibucontext, virt & PAGE_MASK,
-				      PAGE_SIZE, 0, 0);
+				      PAGE_SIZE, 0, 0, NULL);
 	if (IS_ERR(page->umem)) {
 		err = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 76fa83c..4a474698 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -398,7 +398,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_ucontext *context, u64 start,
 		up_read(&current->mm->mmap_sem);
 	}
 
-	return ib_umem_get(context, start, length, access_flags, 0);
+	return ib_umem_get(context, start, length, access_flags, 0, NULL);
 }
 
 struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 971e9a9..4a984da 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1017,7 +1017,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 
 		qp->umem = ib_umem_get(pd->uobject->context,
 				(src == MLX4_IB_QP_SRC) ? ucmd.qp.buf_addr :
-				ucmd.wq.buf_addr, qp->buf_size, 0, 0);
+				ucmd.wq.buf_addr, qp->buf_size, 0, 0, NULL);
 		if (IS_ERR(qp->umem)) {
 			err = PTR_ERR(qp->umem);
 			goto err;
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 4456f1b..2919d62 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -114,7 +114,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
 		}
 
 		srq->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr,
-					buf_size, 0, 0);
+					buf_size, 0, 0, NULL);
 		if (IS_ERR(srq->umem)) {
 			err = PTR_ERR(srq->umem);
 			goto err_srq;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 90f1b0b..847f4ab 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -709,7 +709,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 
 	cq->buf.umem = ib_umem_get(context, ucmd.buf_addr,
 				   entries * ucmd.cqe_size,
-				   IB_ACCESS_LOCAL_WRITE, 1);
+				   IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(cq->buf.umem)) {
 		err = PTR_ERR(cq->buf.umem);
 		return err;
@@ -1126,7 +1126,7 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 
 	umem = ib_umem_get(context, ucmd.buf_addr,
 			   (size_t)ucmd.cqe_size * entries,
-			   IB_ACCESS_LOCAL_WRITE, 1);
+			   IB_ACCESS_LOCAL_WRITE, 1, NULL);
 	if (IS_ERR(umem)) {
 		err = PTR_ERR(umem);
 		return err;
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 5a588f3..7dafdc3 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -1195,7 +1195,7 @@ static int devx_umem_get(struct mlx5_ib_dev *dev, struct ib_ucontext *ucontext,
 	if (err)
 		return err;
 
-	obj->umem = ib_umem_get(ucontext, addr, size, access, 0);
+	obj->umem = ib_umem_get(ucontext, addr, size, access, 0, NULL);
 	if (IS_ERR(obj->umem))
 		return PTR_ERR(obj->umem);
 
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index a0e4e6d..8527574 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -64,7 +64,7 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
 	page->user_virt = (virt & PAGE_MASK);
 	page->refcnt    = 0;
 	page->umem      = ib_umem_get(&context->ibucontext, virt & PAGE_MASK,
-				      PAGE_SIZE, 0, 0);
+				      PAGE_SIZE, 0, 0, NULL);
 	if (IS_ERR(page->umem)) {
 		err = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6add486..8eb606a 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -849,7 +849,7 @@ static int mr_cache_max_order(struct mlx5_ib_dev *dev)
 static int mr_umem_get(struct ib_pd *pd, u64 start, u64 length,
 		       int access_flags, struct ib_umem **umem,
 		       int *npages, int *page_shift, int *ncont,
-		       int *order)
+		       int *order, struct pid *owner)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct ib_umem *u;
@@ -857,7 +857,8 @@ static int mr_umem_get(struct ib_pd *pd, u64 start, u64 length,
 
 	*umem = NULL;
 
-	u = ib_umem_get(pd->uobject->context, start, length, access_flags, 0);
+	u = ib_umem_get(pd->uobject->context, start, length, access_flags, 0,
+			owner);
 	err = PTR_ERR_OR_ZERO(u);
 	if (err) {
 		mlx5_ib_dbg(dev, "umem get failed (%d)\n", err);
@@ -1328,8 +1329,8 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	mlx5_ib_dbg(dev, "start 0x%llx, virt_addr 0x%llx, length 0x%llx, access_flags 0x%x\n",
-		    start, virt_addr, length, access_flags);
+	mlx5_ib_dbg(dev, "start=0x%llx, virt_addr=0x%llx, length=0x%llx, access_flags=0x%x owner=%i\n",
+			start, virt_addr, length, access_flags, pid_vnr(owner));
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	if (!start && length == U64_MAX) {
@@ -1337,7 +1338,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		    !(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
 			return ERR_PTR(-EINVAL);
 
-		mr = mlx5_ib_alloc_implicit_mr(to_mpd(pd), access_flags);
+		mr = mlx5_ib_alloc_implicit_mr(to_mpd(pd), access_flags, owner);
 		if (IS_ERR(mr))
 			return ERR_CAST(mr);
 		return &mr->ibmr;
@@ -1345,7 +1346,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 #endif
 
 	err = mr_umem_get(pd, start, length, access_flags, &umem, &npages,
-			   &page_shift, &ncont, &order);
+			   &page_shift, &ncont, &order, owner);
 
 	if (err < 0)
 		return ERR_PTR(err);
@@ -1496,7 +1497,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 		ib_umem_release(mr->umem);
 		mr->umem = NULL;
 		err = mr_umem_get(pd, addr, len, access_flags, &mr->umem,
-				  &npages, &page_shift, &ncont, &order);
+				  &npages, &page_shift, &ncont, &order, NULL);
 		if (err)
 			goto err;
 	}
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 01e0f62..c317e18 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -492,13 +492,14 @@ static struct ib_umem_odp *implicit_mr_get_data(struct mlx5_ib_mr *mr,
 }
 
 struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
-					     int access_flags)
+					     int access_flags,
+					     struct pid *owner)
 {
 	struct ib_ucontext *ctx = pd->ibpd.uobject->context;
 	struct mlx5_ib_mr *imr;
 	struct ib_umem *umem;
 
-	umem = ib_umem_get(ctx, 0, 0, IB_ACCESS_ON_DEMAND, 0);
+	umem = ib_umem_get(ctx, 0, 0, IB_ACCESS_ON_DEMAND, 0, owner);
 	if (IS_ERR(umem))
 		return ERR_CAST(umem);
 
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dd2ae64..f2b72e7 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -654,7 +654,7 @@ static int mlx5_ib_umem_get(struct mlx5_ib_dev *dev,
 {
 	int err;
 
-	*umem = ib_umem_get(pd->uobject->context, addr, size, 0, 0);
+	*umem = ib_umem_get(pd->uobject->context, addr, size, 0, 0, NULL);
 	if (IS_ERR(*umem)) {
 		mlx5_ib_dbg(dev, "umem_get failed\n");
 		return PTR_ERR(*umem);
@@ -710,7 +710,7 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 
 	context = to_mucontext(pd->uobject->context);
 	rwq->umem = ib_umem_get(pd->uobject->context, ucmd->buf_addr,
-			       rwq->buf_size, 0, 0);
+			       rwq->buf_size, 0, 0, NULL);
 	if (IS_ERR(rwq->umem)) {
 		mlx5_ib_dbg(dev, "umem_get failed\n");
 		err = PTR_ERR(rwq->umem);
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 4e8d180..d51f6f3 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -80,7 +80,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	srq->wq_sig = !!(ucmd.flags & MLX5_SRQ_FLAG_SIGNATURE);
 
 	srq->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr, buf_size,
-				0, 0);
+				0, 0, NULL);
 	if (IS_ERR(srq->umem)) {
 		mlx5_ib_dbg(dev, "failed umem get, size %d\n", buf_size);
 		err = PTR_ERR(srq->umem);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 77e678e..5cc9f3c 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -933,7 +933,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-ENOMEM);
 
 	mr->umem = ib_umem_get(pd->uobject->context, start, length, acc,
-			       ucmd.mr_attrs & MTHCA_MR_DMASYNC);
+			       ucmd.mr_attrs & MTHCA_MR_DMASYNC, owner);
 
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index e07cb02..0e3295e 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2134,7 +2134,8 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	u8 stag_key;
 	int first_page = 1;
 
-	region = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	region = ib_umem_get(pd->uobject->context, start, length, acc, 0,
+			owner);
 	if (IS_ERR(region)) {
 		return (struct ib_mr *)region;
 	}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 01d076a..e817d45 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -917,7 +917,8 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
 		return ERR_PTR(status);
-	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc, 0);
+	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc, 0,
+			owner);
 	if (IS_ERR(mr->umem)) {
 		status = -EFAULT;
 		goto umem_err;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index e1ccf32..15b87d0 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -748,7 +748,8 @@ static inline int qedr_init_user_queue(struct ib_ucontext *ib_ctx,
 
 	q->buf_addr = buf_addr;
 	q->buf_len = buf_len;
-	q->umem = ib_umem_get(ib_ctx, q->buf_addr, q->buf_len, access, dmasync);
+	q->umem = ib_umem_get(ib_ctx, q->buf_addr, q->buf_len, access, dmasync,
+			NULL);
 	if (IS_ERR(q->umem)) {
 		DP_ERR(dev, "create user queue: failed ib_umem_get, got %ld\n",
 		       PTR_ERR(q->umem));
@@ -1359,7 +1360,7 @@ static int qedr_init_srq_user_params(struct ib_ucontext *ib_ctx,
 
 	srq->prod_umem = ib_umem_get(ib_ctx, ureq->prod_pair_addr,
 				     sizeof(struct rdma_srq_producers),
-				     access, dmasync);
+				     access, dmasync, NULL);
 	if (IS_ERR(srq->prod_umem)) {
 		qedr_free_pbl(srq->dev, &srq->usrq.pbl_info, srq->usrq.pbl_tbl);
 		ib_umem_release(srq->usrq.umem);
@@ -2719,7 +2720,8 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 
 	mr->type = QEDR_MR_USER;
 
-	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc, 0);
+	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc, 0,
+			NULL);
 	if (IS_ERR(mr->umem)) {
 		rc = -EFAULT;
 		goto err0;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
index 0f004c7..4ade730 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -142,7 +142,7 @@ struct ib_cq *pvrdma_create_cq(struct ib_device *ibdev,
 		}
 
 		cq->umem = ib_umem_get(context, ucmd.buf_addr, ucmd.buf_size,
-				       IB_ACCESS_LOCAL_WRITE, 1);
+				       IB_ACCESS_LOCAL_WRITE, 1, NULL);
 		if (IS_ERR(cq->umem)) {
 			ret = PTR_ERR(cq->umem);
 			goto err_cq;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
index fa96fa4..f5466ad 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
@@ -127,7 +127,7 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	}
 
 	umem = ib_umem_get(pd->uobject->context, start,
-			   length, access_flags, 0);
+			   length, access_flags, 0, NULL);
 	if (IS_ERR(umem)) {
 		dev_warn(&dev->pdev->dev,
 			 "could not get umem for mem region\n");
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
index 3acf74c..ae8c826 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
@@ -264,7 +264,8 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
 				/* set qp->sq.wqe_cnt, shift, buf_size.. */
 				qp->rumem = ib_umem_get(pd->uobject->context,
 							ucmd.rbuf_addr,
-							ucmd.rbuf_size, 0, 0);
+							ucmd.rbuf_size, 0, 0,
+							NULL);
 				if (IS_ERR(qp->rumem)) {
 					ret = PTR_ERR(qp->rumem);
 					goto err_qp;
@@ -277,7 +278,7 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
 
 			qp->sumem = ib_umem_get(pd->uobject->context,
 						ucmd.sbuf_addr,
-						ucmd.sbuf_size, 0, 0);
+						ucmd.sbuf_size, 0, 0, NULL);
 			if (IS_ERR(qp->sumem)) {
 				if (!is_srq)
 					ib_umem_release(qp->rumem);
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
index 06ba7c7..d235fcd 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
@@ -155,7 +155,7 @@ struct ib_srq *pvrdma_create_srq(struct ib_pd *pd,
 
 	srq->umem = ib_umem_get(pd->uobject->context,
 				ucmd.buf_addr,
-				ucmd.buf_size, 0, 0);
+				ucmd.buf_size, 0, 0, NULL);
 	if (IS_ERR(srq->umem)) {
 		ret = PTR_ERR(srq->umem);
 		goto err_srq;
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 2bc95c9..f4cdcfe 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -390,7 +390,7 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-EINVAL);
 
 	umem = ib_umem_get(pd->uobject->context, start, length,
-			   mr_access_flags, 0);
+			   mr_access_flags, 0, NULL);
 	if (IS_ERR(umem))
 		return (void *)umem;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 9d3916b..f91346a 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -171,7 +171,8 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start,
 	void			*vaddr;
 	int err;
 
-	umem = ib_umem_get(pd->ibpd.uobject->context, start, length, access, 0);
+	umem = ib_umem_get(pd->ibpd.uobject->context, start, length,
+		access, 0, NULL);
 	if (IS_ERR(umem)) {
 		pr_warn("err %d from rxe_umem_get\n",
 			(int)PTR_ERR(umem));
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 5d3755e..4951dcb 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -81,7 +81,8 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 #ifdef CONFIG_INFINIBAND_USER_MEM
 
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-			    size_t size, int access, int dmasync);
+			    size_t size, int access, int dmasync,
+			    struct pid *owner);
 void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
-- 
2.7.4



* [PATCH 4/5] RDMA/uverbs: add owner parameter to ib_umem_odp_get
From: Joel Nider @ 2019-01-29 13:26 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

Propagate the new 'owner' parameter to several internal core functions,
as well as to the ib_umem_odp_get() kernel interface function. The mm
of the address space that owns the memory region is saved in the per_mm
struct, which is then used by ib_umem_odp_map_dma_pages() when
resolving a page fault from ODP.
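
A condensed sketch of the fault-side lookup this enables (compare the
ib_umem_odp_map_dma_pages() hunk below; this helper is assumed, not
part of the patch):

    #include <linux/sched/mm.h>
    #include <linux/sched/task.h>
    #include <rdma/ib_umem_odp.h>

    /* Sketch: the owning task/mm come from per_mm, not from current,
     * so a page fault can be resolved on behalf of a remote owner. */
    static int resolve_odp_owner(struct ib_umem_odp *umem_odp,
                                 struct task_struct **taskp,
                                 struct mm_struct **mmp)
    {
            struct task_struct *task =
                    get_pid_task(umem_odp->per_mm->tgid, PIDTYPE_PID);
            struct mm_struct *mm = umem_odp->per_mm->mm;

            if (!task || !mmget_not_zero(mm)) {
                    if (task)
                            put_task_struct(task);
                    return -EINVAL; /* owner has already exited */
            }
            *taskp = task;  /* caller must put_task_struct() */
            *mmp = mm;      /* caller must mmput() */
            return 0;
    }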

Signed-off-by: Joel Nider <joeln@il.ibm.com>
---
 drivers/infiniband/core/umem.c     |  4 +--
 drivers/infiniband/core/umem_odp.c | 50 ++++++++++++++++++--------------------
 drivers/infiniband/hw/mlx5/odp.c   |  6 ++++-
 include/rdma/ib_umem_odp.h         |  6 +++--
 4 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9646cee..77874e5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -142,7 +142,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	mmgrab(mm);
 
 	if (access & IB_ACCESS_ON_DEMAND) {
-		ret = ib_umem_odp_get(to_ib_umem_odp(umem), access);
+		ret = ib_umem_odp_get(to_ib_umem_odp(umem), access, owner);
 		if (ret)
 			goto umem_kfree;
 		return umem;
@@ -200,7 +200,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				     mm, cur_base,
 				     min_t(unsigned long, npages,
 				     PAGE_SIZE / sizeof(struct page *)),
-				     gup_flags, page_list, vma_list, NULL);
+				     gup_flags, page_list, vma_list);
 		if (ret < 0) {
 			up_read(&mm->mmap_sem);
 			goto umem_release;
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index a4ec430..49826070 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -227,7 +227,8 @@ static void remove_umem_from_per_mm(struct ib_umem_odp *umem_odp)
 }
 
 static struct ib_ucontext_per_mm *alloc_per_mm(struct ib_ucontext *ctx,
-					       struct mm_struct *mm)
+					       struct mm_struct *mm,
+					       struct pid *owner)
 {
 	struct ib_ucontext_per_mm *per_mm;
 	int ret;
@@ -241,12 +242,8 @@ static struct ib_ucontext_per_mm *alloc_per_mm(struct ib_ucontext *ctx,
 	per_mm->umem_tree = RB_ROOT_CACHED;
 	init_rwsem(&per_mm->umem_rwsem);
 	per_mm->active = ctx->invalidate_range;
-
-	rcu_read_lock();
-	per_mm->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
-	rcu_read_unlock();
-
-	WARN_ON(mm != current->mm);
+	per_mm->tgid = owner;
+	mmgrab(per_mm->mm);
 
 	per_mm->mn.ops = &ib_umem_notifiers;
 	ret = mmu_notifier_register(&per_mm->mn, per_mm->mm);
@@ -265,7 +262,7 @@ static struct ib_ucontext_per_mm *alloc_per_mm(struct ib_ucontext *ctx,
 	return ERR_PTR(ret);
 }
 
-static int get_per_mm(struct ib_umem_odp *umem_odp)
+static int get_per_mm(struct ib_umem_odp *umem_odp, struct pid *owner)
 {
 	struct ib_ucontext *ctx = umem_odp->umem.context;
 	struct ib_ucontext_per_mm *per_mm;
@@ -280,7 +277,7 @@ static int get_per_mm(struct ib_umem_odp *umem_odp)
 			goto found;
 	}
 
-	per_mm = alloc_per_mm(ctx, umem_odp->umem.owning_mm);
+	per_mm = alloc_per_mm(ctx, umem_odp->umem.owning_mm, owner);
 	if (IS_ERR(per_mm)) {
 		mutex_unlock(&ctx->per_mm_list_lock);
 		return PTR_ERR(per_mm);
@@ -333,7 +330,8 @@ void put_per_mm(struct ib_umem_odp *umem_odp)
 }
 
 struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
-				      unsigned long addr, size_t size)
+				      unsigned long addr, size_t size,
+				      struct mm_struct *owner_mm)
 {
 	struct ib_ucontext *ctx = per_mm->context;
 	struct ib_umem_odp *odp_data;
@@ -345,12 +343,14 @@ struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
 	if (!odp_data)
 		return ERR_PTR(-ENOMEM);
 	umem = &odp_data->umem;
+
 	umem->context    = ctx;
 	umem->length     = size;
 	umem->address    = addr;
 	umem->page_shift = PAGE_SHIFT;
 	umem->writable   = 1;
 	umem->is_odp = 1;
+	umem->owning_mm = owner_mm;
 	odp_data->per_mm = per_mm;
 
 	mutex_init(&odp_data->umem_mutex);
@@ -389,13 +389,9 @@ struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
 }
 EXPORT_SYMBOL(ib_alloc_odp_umem);
 
-int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access)
+int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access, struct pid *owner)
 {
 	struct ib_umem *umem = &umem_odp->umem;
-	/*
-	 * NOTE: This must called in a process context where umem->owning_mm
-	 * == current->mm
-	 */
 	struct mm_struct *mm = umem->owning_mm;
 	int ret_val;
 
@@ -437,7 +433,7 @@ int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access)
 		}
 	}
 
-	ret_val = get_per_mm(umem_odp);
+	ret_val = get_per_mm(umem_odp, owner);
 	if (ret_val)
 		goto out_dma_list;
 	add_umem_to_per_mm(umem_odp);
@@ -574,8 +570,8 @@ static int ib_umem_odp_map_dma_single_page(
  *        the return value.
  * @access_mask: bit mask of the requested access permissions for the given
  *               range.
- * @current_seq: the MMU notifiers sequance value for synchronization with
- *               invalidations. the sequance number is read from
+ * @current_seq: the MMU notifiers sequence value for synchronization with
+ *               invalidations. the sequence number is read from
  *               umem_odp->notifiers_seq before calling this function
  */
 int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
@@ -584,7 +580,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 {
 	struct ib_umem *umem = &umem_odp->umem;
 	struct task_struct *owning_process  = NULL;
-	struct mm_struct *owning_mm = umem_odp->umem.owning_mm;
+	struct mm_struct *owning_mm;
 	struct page       **local_page_list = NULL;
 	u64 page_mask, off;
 	int j, k, ret = 0, start_idx, npages = 0, page_shift;
@@ -609,12 +605,13 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 	bcnt += off; /* Charge for the first page offset as well. */
 
 	/*
-	 * owning_process is allowed to be NULL, this means somehow the mm is
-	 * existing beyond the lifetime of the originating process.. Presumably
+	 * owning_process may be NULL, because the mm can
+	 * exist independently of the originating process.
 	 * mmget_not_zero will fail in this case.
 	 */
 	owning_process = get_pid_task(umem_odp->per_mm->tgid, PIDTYPE_PID);
-	if (WARN_ON(!mmget_not_zero(umem_odp->umem.owning_mm))) {
+	owning_mm = umem_odp->per_mm->mm;
+	if (WARN_ON(!mmget_not_zero(owning_mm))) {
 		ret = -EINVAL;
 		goto out_put_task;
 	}
@@ -632,15 +629,16 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 
 		down_read(&owning_mm->mmap_sem);
 		/*
-		 * Note: this might result in redundent page getting. We can
+		 * Note: this might result in redundant page getting. We can
 		 * avoid this by checking dma_list to be 0 before calling
-		 * get_user_pages. However, this make the code much more
+		 * get_user_pages. However, this makes the code much more
 		 * complex (and doesn't gain us much performance in most use
 		 * cases).
 		 */
-		npages = get_user_pages_remote(owning_process, owning_mm,
+		npages = get_user_pages_remote_longterm(owning_process,
+				owning_mm,
 				user_virt, gup_num_pages,
-				flags, local_page_list, NULL, NULL);
+				flags, local_page_list, NULL);
 		up_read(&owning_mm->mmap_sem);
 
 		if (npages < 0) {
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index c317e18..1abc917 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -439,8 +439,12 @@ static struct ib_umem_odp *implicit_mr_get_data(struct mlx5_ib_mr *mr,
 		if (nentries)
 			nentries++;
 	} else {
+		struct mm_struct *owner_mm = current->mm;
+
+		if (mr->umem->owning_mm)
+			owner_mm = mr->umem->owning_mm;
 		odp = ib_alloc_odp_umem(odp_mr->per_mm, addr,
-					MLX5_IMR_MTT_SIZE);
+					MLX5_IMR_MTT_SIZE, owner_mm);
 		if (IS_ERR(odp)) {
 			mutex_unlock(&odp_mr->umem_mutex);
 			return ERR_CAST(odp);
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 0b1446f..28099e6 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -102,9 +102,11 @@ struct ib_ucontext_per_mm {
 	struct rcu_head rcu;
 };
 
-int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access);
+int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access,
+		     struct pid *owner);
 struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
-				      unsigned long addr, size_t size);
+				      unsigned long addr, size_t size,
+				      struct mm_struct *owner_mm);
 void ib_umem_odp_release(struct ib_umem_odp *umem_odp);
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR
  2019-01-29 13:26 [PATCH 0/5] RDMA: reg_remote_mr Joel Nider
                   ` (3 preceding siblings ...)
  2019-01-29 13:26 ` [PATCH 4/5] RDMA/uverbs: add owner parameter to ib_umem_odp_get Joel Nider
@ 2019-01-29 13:26 ` Joel Nider
  2019-01-29 17:04   ` Jason Gunthorpe
  2019-01-29 16:44 ` [PATCH 0/5] RDMA: reg_remote_mr Steve Wise
  5 siblings, 1 reply; 14+ messages in thread
From: Joel Nider @ 2019-01-29 13:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, Joel Nider,
	linux-mm, linux-rdma, linux-kernel

Add a new handler for the new uverb reg_remote_mr. Its purpose is to
register a memory region in a different address space (i.e. a different
process) than that of the caller.

The main use case which motivated this change is post-copy container
migration. When a migration manager (i.e. CRIU) starts a migration, it
must have an open connection for handling any page faults that occur
in the container after restoration on the target machine. Even though
CRIU establishes and maintains the connection, ultimately the memory
is copied from the container being migrated (i.e. a remote address
space). This container must remain passive -- meaning it cannot have
any knowledge of the RDMA connection; therefore the migration manager
must have the ability to register a remote memory region. This remote
memory region will serve as the source for any memory pages that must
be copied (on-demand or otherwise) during the migration.
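
As a sketch of the intended flow from the migration manager's side (the
ibv_reg_remote_mr() wrapper name is hypothetical - the real entry point
is the uverbs ioctl() method added below):

	/* register 'length' bytes at 'addr' in the address space of
	 * 'container_pid'; the container itself never sees the RDMA
	 * connection */
	struct ibv_mr *mr = ibv_reg_remote_mr(pd, addr, length,
					      IBV_ACCESS_REMOTE_READ |
					      IBV_ACCESS_ON_DEMAND,
					      container_pid);

RDMA READs against mr->rkey are then served directly from the
container's memory, faulting pages in via ODP as needed.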

Signed-off-by: Joel Nider <joeln@il.ibm.com>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 129 +++++++++++++++++++++++++-
 include/rdma/ib_verbs.h                       |   8 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  13 +++
 3 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 4d4be0c..bf7b4b2 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -150,6 +150,99 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	return ret;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_REG_REMOTE_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct pid *owner_pid;
+	struct ib_reg_remote_mr_attr attr = {};
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs,
+					UVERBS_ATTR_REG_REMOTE_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_REMOTE_MR_PD_HANDLE);
+
+	struct ib_mr *mr;
+	int ret;
+
+	ret = uverbs_copy_from(&attr.start, attrs,
+				UVERBS_ATTR_REG_REMOTE_MR_START);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&attr.length, attrs,
+				UVERBS_ATTR_REG_REMOTE_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&attr.hca_va, attrs,
+				UVERBS_ATTR_REG_REMOTE_MR_HCA_VA);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&attr.owner, attrs,
+				UVERBS_ATTR_REG_REMOTE_MR_OWNER);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&attr.access_flags, attrs,
+				 UVERBS_ATTR_REG_REMOTE_MR_ACCESS_FLAGS,
+				 IB_ACCESS_SUPPORTED);
+	if (ret)
+		return ret;
+
+	/* ensure the offsets are identical */
+	if ((attr.start & ~PAGE_MASK) != (attr.hca_va & ~PAGE_MASK))
+		return -EINVAL;
+
+	ret = ib_check_mr_access(attr.access_flags);
+	if (ret)
+		return ret;
+
+	if (attr.access_flags & IB_ACCESS_ON_DEMAND) {
+		if (!(pd->device->attrs.device_cap_flags &
+		      IB_DEVICE_ON_DEMAND_PAGING)) {
+			pr_debug("ODP support not available\n");
+			ret = -EINVAL;
+			return ret;
+		}
+	}
+
+	/* get the owner's pid struct before something happens to it */
+	owner_pid = find_get_pid(attr.owner);
+	mr = pd->device->ops.reg_user_mr(pd, attr.start, attr.length,
+		attr.hca_va, attr.access_flags, owner_pid, NULL);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device  = pd->device;
+	mr->pd      = pd;
+	mr->dm	    = NULL;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+	mr->res.type = RDMA_RESTRACK_MR;
+	mr->res.task = get_pid_task(owner_pid, PIDTYPE_PID);
+	rdma_restrack_kadd(&mr->res);
+
+	uobj->object = mr;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_REMOTE_MR_RESP_LKEY,
+		   &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		goto err_dereg;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_REMOTE_MR_RESP_RKEY,
+			&mr->rkey, sizeof(mr->rkey));
+	if (ret)
+		goto err_dereg;
+
+	return 0;
+
+err_dereg:
+	ib_dereg_mr(mr);
+
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -203,12 +296,46 @@ DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 			UVERBS_ACCESS_DESTROY,
 			UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_REMOTE_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_REMOTE_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_REMOTE_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_START,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_HCA_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_REMOTE_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_OWNER,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_REMOTE_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_REMOTE_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+);
+
 DECLARE_UVERBS_NAMED_OBJECT(
 	UVERBS_OBJECT_MR,
 	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
 	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
 	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
-	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR));
+	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_REMOTE_MR),
+);
 
 const struct uapi_definition uverbs_def_obj_mr[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_MR,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3432404..dcf5edc 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -334,6 +334,14 @@ struct ib_dm_alloc_attr {
 	u32	flags;
 };
 
+struct ib_reg_remote_mr_attr {
+	u64      start;
+	u64      length;
+	u64      hca_va;
+	u32      access_flags;
+	u32      owner;
+};
+
 struct ib_device_attr {
 	u64			fw_ver;
 	__be64			sys_image_guid;
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 64f0e3a..4e62cd4 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -150,10 +150,23 @@ enum uverbs_attrs_reg_dm_mr_cmd_attr_ids {
 	UVERBS_ATTR_REG_DM_MR_RESP_RKEY,
 };
 
+enum uverbs_attrs_reg_remote_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_REMOTE_MR_HANDLE,
+	UVERBS_ATTR_REG_REMOTE_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_REMOTE_MR_START,
+	UVERBS_ATTR_REG_REMOTE_MR_LENGTH,
+	UVERBS_ATTR_REG_REMOTE_MR_HCA_VA,
+	UVERBS_ATTR_REG_REMOTE_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_REMOTE_MR_OWNER,
+	UVERBS_ATTR_REG_REMOTE_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_REMOTE_MR_RESP_RKEY,
+};
+
 enum uverbs_methods_mr {
 	UVERBS_METHOD_DM_MR_REG,
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_METHOD_ADVISE_MR,
+	UVERBS_METHOD_REG_REMOTE_MR,
 };
 
 enum uverbs_attrs_mr_destroy_ids {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] RDMA: reg_remote_mr
  2019-01-29 13:26 [PATCH 0/5] RDMA: reg_remote_mr Joel Nider
                   ` (4 preceding siblings ...)
  2019-01-29 13:26 ` [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR Joel Nider
@ 2019-01-29 16:44 ` Steve Wise
  2019-01-29 18:34   ` Ira Weiny
  2019-01-30  8:22   ` Joel Nider
  5 siblings, 2 replies; 14+ messages in thread
From: Steve Wise @ 2019-01-29 16:44 UTC (permalink / raw)
  To: Joel Nider, Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, linux-mm,
	linux-rdma, linux-kernel


On 1/29/2019 7:26 AM, Joel Nider wrote:
> As discussed at LPC'18, there is a need to be able to register a memory
> region (MR) on behalf of another process. One example is the case of
> post-copy container migration, in which CRIU is responsible for setting
> up the migration, but the contents of the memory are from the migrating
> process. In this case, we want all RDMA READ requests to be served by
> the address space of the migration process directly (not by CRIU). This
> patchset implements a new uverbs command which allows an application to
> register a memory region in the address space of another process.

Hey Joel,

Dumb question:

Doesn't this open a security hole by allowing any process to register
memory in any other process?

Steve.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get
  2019-01-29 13:26 ` [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get Joel Nider
@ 2019-01-29 16:56   ` Jason Gunthorpe
  2019-01-29 18:29   ` Ira Weiny
  1 sibling, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2019-01-29 16:56 UTC (permalink / raw)
  To: Joel Nider
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, linux-mm,
	linux-rdma, linux-kernel

On Tue, Jan 29, 2019 at 03:26:24PM +0200, Joel Nider wrote:
> ib_umem_get is a core function used by drivers that support RDMA.
> The 'owner' parameter signifies the process that owns the memory.
> Until now, it was assumed that the owning process was the current
> process. This adds the flexibility to specify a process other than
> the current process. All drivers that call this function are also
> updated, but the default behaviour is to keep backwards
> compatibility by assuming the current process is the owner when
> the 'owner' parameter is NULL.
> 
> Signed-off-by: Joel Nider <joeln@il.ibm.com>
>  drivers/infiniband/core/umem.c                | 26 ++++++++++++++++++++------
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 10 +++++-----
>  drivers/infiniband/hw/cxgb3/iwch_provider.c   |  3 ++-
>  drivers/infiniband/hw/cxgb4/mem.c             |  3 ++-
>  drivers/infiniband/hw/hns/hns_roce_cq.c       |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_db.c       |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_mr.c       |  4 ++--
>  drivers/infiniband/hw/hns/hns_roce_qp.c       |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_srq.c      |  2 +-
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
>  drivers/infiniband/hw/mlx4/cq.c               |  2 +-
>  drivers/infiniband/hw/mlx4/doorbell.c         |  2 +-
>  drivers/infiniband/hw/mlx4/mr.c               |  2 +-
>  drivers/infiniband/hw/mlx4/qp.c               |  2 +-
>  drivers/infiniband/hw/mlx4/srq.c              |  2 +-
>  drivers/infiniband/hw/mlx5/cq.c               |  4 ++--
>  drivers/infiniband/hw/mlx5/devx.c             |  2 +-
>  drivers/infiniband/hw/mlx5/doorbell.c         |  2 +-
>  drivers/infiniband/hw/mlx5/mr.c               | 15 ++++++++-------
>  drivers/infiniband/hw/mlx5/odp.c              |  5 +++--
>  drivers/infiniband/hw/mlx5/qp.c               |  4 ++--
>  drivers/infiniband/hw/mlx5/srq.c              |  2 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
>  drivers/infiniband/hw/nes/nes_verbs.c         |  3 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  3 ++-
>  drivers/infiniband/hw/qedr/verbs.c            |  8 +++++---
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  5 +++--
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
>  drivers/infiniband/sw/rdmavt/mr.c             |  2 +-
>  drivers/infiniband/sw/rxe/rxe_mr.c            |  3 ++-
>  include/rdma/ib_umem.h                        |  3 ++-
>  33 files changed, 80 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index c6144df..9646cee 100644
> +++ b/drivers/infiniband/core/umem.c
> @@ -71,15 +71,21 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>   *
>   * If access flags indicate ODP memory, avoid pinning. Instead, stores
>   * the mm for future page fault handling in conjunction with MMU notifiers.
> + * If the process doing the pinning is the same as the process that owns
> + * the memory being pinned, 'owner' should be NULL. Otherwise, 'owner' should
> + * be the process ID of the owning process. The process ID must be in the
> + * same PID namespace as the calling userspace context.
>   *
> - * @context: userspace context to pin memory for
> + * @context: userspace context that is pinning the memory
>   * @addr: userspace virtual address to start at
>   * @size: length of region to pin
>   * @access: IB_ACCESS_xxx flags for memory being pinned
>   * @dmasync: flush in-flight DMA when the memory region is written
> + * @owner: the ID of the process that owns the memory being pinned
>   */
>  struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> -			    size_t size, int access, int dmasync)
> +			    size_t size, int access, int dmasync,
> +			    struct pid *owner)

You need to rebase this patch on rdma's for-next tree, the signature is
different.

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR
  2019-01-29 13:26 ` [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR Joel Nider
@ 2019-01-29 17:04   ` Jason Gunthorpe
  2019-01-30  8:34     ` Joel Nider
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2019-01-29 17:04 UTC (permalink / raw)
  To: Joel Nider
  Cc: Leon Romanovsky, Doug Ledford, Mike Rapoport, linux-mm,
	linux-rdma, linux-kernel

On Tue, Jan 29, 2019 at 03:26:26PM +0200, Joel Nider wrote:
> Add a new handler for the new uverb reg_remote_mr. Its purpose is to
> register a memory region in a different address space (i.e. a different
> process) than that of the caller.
> 
> The main use case which motivated this change is post-copy container
> migration. When a migration manager (i.e. CRIU) starts a migration, it
> must have an open connection for handling any page faults that occur
> in the container after restoration on the target machine. Even though
> CRIU establishes and maintains the connection, ultimately the memory
> is copied from the container being migrated (i.e. a remote address
> space). This container must remain passive -- meaning it cannot have
> any knowledge of the RDMA connection; therefore the migration manager
> must have the ability to register a remote memory region. This remote
> memory region will serve as the source for any memory pages that must
> be copied (on-demand or otherwise) during the migration.
> 
> Signed-off-by: Joel Nider <joeln@il.ibm.com>
>  drivers/infiniband/core/uverbs_std_types_mr.c | 129 +++++++++++++++++++++++++-
>  include/rdma/ib_verbs.h                       |   8 ++
>  include/uapi/rdma/ib_user_ioctl_cmds.h        |  13 +++
>  3 files changed, 149 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
> index 4d4be0c..bf7b4b2 100644
> +++ b/drivers/infiniband/core/uverbs_std_types_mr.c
> @@ -150,6 +150,99 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
>  	return ret;
>  }
>  
> +static int UVERBS_HANDLER(UVERBS_METHOD_REG_REMOTE_MR)(
> +	struct uverbs_attr_bundle *attrs)
> +{

I think this should just be REG_MR with an optional remote PID
argument

> +	struct pid *owner_pid;
> +	struct ib_reg_remote_mr_attr attr = {};
> +	struct ib_uobject *uobj =
> +		uverbs_attr_get_uobject(attrs,
> +					UVERBS_ATTR_REG_REMOTE_MR_HANDLE);
> +	struct ib_pd *pd =
> +		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_REMOTE_MR_PD_HANDLE);
> +
> +	struct ib_mr *mr;
> +	int ret;
> +
> +	ret = uverbs_copy_from(&attr.start, attrs,
> +				UVERBS_ATTR_REG_REMOTE_MR_START);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_copy_from(&attr.length, attrs,
> +				UVERBS_ATTR_REG_REMOTE_MR_LENGTH);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_copy_from(&attr.hca_va, attrs,
> +				UVERBS_ATTR_REG_REMOTE_MR_HCA_VA);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_copy_from(&attr.owner, attrs,
> +				UVERBS_ATTR_REG_REMOTE_MR_OWNER);
> +	if (ret)
> +		return ret;

Maybe these should use the const version - it is becoming the intended
way to pass small integers. Then we can do sensible things like use
uintptr_t to store pointer values and size_t to store sizes, and the
code will automatically bounds-check the user input.
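
Something along these lines, say (untested sketch reusing the attr ids
from this patch):

	uintptr_t start;
	size_t length;

	ret = uverbs_get_const(&start, attrs, UVERBS_ATTR_REG_REMOTE_MR_START);
	if (ret)
		return ret;

	/* user value gets range-checked against size_t by the macro */
	ret = uverbs_get_const(&length, attrs, UVERBS_ATTR_REG_REMOTE_MR_LENGTH);
	if (ret)
		return ret;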

> +	ret = uverbs_get_flags32(&attr.access_flags, attrs,
> +				 UVERBS_ATTR_REG_REMOTE_MR_ACCESS_FLAGS,
> +				 IB_ACCESS_SUPPORTED);
> +	if (ret)
> +		return ret;
> +
> +	/* ensure the offsets are identical */
> +	if ((attr.start & ~PAGE_MASK) != (attr.hca_va & ~PAGE_MASK))
> +		return -EINVAL;
> +
> +	ret = ib_check_mr_access(attr.access_flags);
> +	if (ret)
> +		return ret;
> +
> +	if (attr.access_flags & IB_ACCESS_ON_DEMAND) {
> +		if (!(pd->device->attrs.device_cap_flags &
> +		      IB_DEVICE_ON_DEMAND_PAGING)) {
> +			pr_debug("ODP support not available\n");
> +			ret = -EINVAL;
> +			return ret;
> +		}
> +	}
> +
> +	/* get the owner's pid struct before something happens to it */
> +	owner_pid = find_get_pid(attr.owner);

security? Match what ptrace does?
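
For illustration, a ptrace-style gate right here could look something
like this (untested sketch; unwinding of the owner_pid reference on the
error paths is elided):

	struct task_struct *owner_task;

	owner_task = get_pid_task(owner_pid, PIDTYPE_PID);
	if (!owner_task)
		return -ESRCH;

	/* refuse unless the caller could ptrace-attach to the owner */
	if (!ptrace_may_access(owner_task, PTRACE_MODE_ATTACH_REALCREDS)) {
		put_task_struct(owner_task);
		return -EPERM;
	}
	put_task_struct(owner_task);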

> +	mr = pd->device->ops.reg_user_mr(pd, attr.start, attr.length,
> +		attr.hca_va, attr.access_flags, owner_pid, NULL);
> +	if (IS_ERR(mr))
> +		return PTR_ERR(mr);
> +
> +	mr->device  = pd->device;
> +	mr->pd      = pd;
> +	mr->dm	    = NULL;
> +	mr->uobject = uobj;
> +	atomic_inc(&pd->usecnt);
> +	mr->res.type = RDMA_RESTRACK_MR;
> +	mr->res.task = get_pid_task(owner_pid, PIDTYPE_PID);
> +	rdma_restrack_kadd(&mr->res);
> +
> +	uobj->object = mr;
> +
> +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_REMOTE_MR_RESP_LKEY,
> +		   &mr->lkey, sizeof(mr->lkey));
> +	if (ret)
> +		goto err_dereg;
> +
> +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_REMOTE_MR_RESP_RKEY,
> +			&mr->rkey, sizeof(mr->rkey));
> +	if (ret)
> +		goto err_dereg;
> +
> +	return 0;
> +
> +err_dereg:
> +	ib_dereg_mr(mr);
> +
> +	return ret;
> +}
> +
>  DECLARE_UVERBS_NAMED_METHOD(
>  	UVERBS_METHOD_ADVISE_MR,
>  	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
> @@ -203,12 +296,46 @@ DECLARE_UVERBS_NAMED_METHOD_DESTROY(
>  			UVERBS_ACCESS_DESTROY,
>  			UA_MANDATORY));
>  
> +DECLARE_UVERBS_NAMED_METHOD(
> +	UVERBS_METHOD_REG_REMOTE_MR,
> +	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_REMOTE_MR_HANDLE,
> +			UVERBS_OBJECT_MR,
> +			UVERBS_ACCESS_NEW,
> +			UA_MANDATORY),
> +	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_REMOTE_MR_PD_HANDLE,
> +			UVERBS_OBJECT_PD,
> +			UVERBS_ACCESS_READ,
> +			UA_MANDATORY),
> +	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_START,
> +			   UVERBS_ATTR_TYPE(u64),
> +			   UA_MANDATORY),
> +	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_LENGTH,
> +			   UVERBS_ATTR_TYPE(u64),
> +			   UA_MANDATORY),
> +	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_HCA_VA,
> +			   UVERBS_ATTR_TYPE(u64),
> +			   UA_MANDATORY),
> +	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_REMOTE_MR_ACCESS_FLAGS,
> +			     enum ib_access_flags),
> +	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_REMOTE_MR_OWNER,
> +			   UVERBS_ATTR_TYPE(u32),
> +			   UA_MANDATORY),
> +	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_REMOTE_MR_RESP_LKEY,
> +			    UVERBS_ATTR_TYPE(u32),
> +			    UA_MANDATORY),
> +	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_REMOTE_MR_RESP_RKEY,
> +			    UVERBS_ATTR_TYPE(u32),
> +			    UA_MANDATORY),
> +);
> +
>  DECLARE_UVERBS_NAMED_OBJECT(
>  	UVERBS_OBJECT_MR,
>  	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
>  	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
>  	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> -	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR));
> +	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> +	&UVERBS_METHOD(UVERBS_METHOD_REG_REMOTE_MR),
> +);

I'm kind of surprised this compiles with the trailing comma?

>  const struct uapi_definition uverbs_def_obj_mr[] = {
>  	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_MR,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 3432404..dcf5edc 100644
> +++ b/include/rdma/ib_verbs.h
> @@ -334,6 +334,14 @@ struct ib_dm_alloc_attr {
>  	u32	flags;
>  };
>  
> +struct ib_reg_remote_mr_attr {
> +	u64      start;
> +	u64      length;
> +	u64      hca_va;
> +	u32      access_flags;
> +	u32      owner;
> +};

Why? Why here?

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get
  2019-01-29 13:26 ` [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get Joel Nider
  2019-01-29 16:56   ` Jason Gunthorpe
@ 2019-01-29 18:29   ` Ira Weiny
  1 sibling, 0 replies; 14+ messages in thread
From: Ira Weiny @ 2019-01-29 18:29 UTC (permalink / raw)
  To: Joel Nider
  Cc: Jason Gunthorpe, Leon Romanovsky, Doug Ledford, Mike Rapoport,
	linux-mm, linux-rdma, linux-kernel

On Tue, Jan 29, 2019 at 03:26:24PM +0200, Joel Nider wrote:
> ib_umem_get is a core function used by drivers that support RDMA.
> The 'owner' parameter signifies the process that owns the memory.
> Until now, it was assumed that the owning process was the current
> process. This adds the flexibility to specify a process other than
> the current process. All drivers that call this function are also
> updated, but the default behaviour is to keep backwards
> compatibility by assuming the current process is the owner when
> the 'owner' parameter is NULL.
> 
> Signed-off-by: Joel Nider <joeln@il.ibm.com>
> ---

[snip]

> @@ -183,10 +196,11 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  
>  	while (npages) {
>  		down_read(&mm->mmap_sem);
> -		ret = get_user_pages_longterm(cur_base,
> +		ret = get_user_pages_remote_longterm(owner_task,
> +				     mm, cur_base,
>  				     min_t(unsigned long, npages,
> -					   PAGE_SIZE / sizeof (struct page *)),
> -				     gup_flags, page_list, vma_list);
> +				     PAGE_SIZE / sizeof(struct page *)),
> +				     gup_flags, page_list, vma_list, NULL);

qib was recently converted to get_user_pages_longterm.  So qib would need to
be updated as well.

Ira


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] RDMA: reg_remote_mr
  2019-01-29 16:44 ` [PATCH 0/5] RDMA: reg_remote_mr Steve Wise
@ 2019-01-29 18:34   ` Ira Weiny
  2019-01-30  8:22   ` Joel Nider
  1 sibling, 0 replies; 14+ messages in thread
From: Ira Weiny @ 2019-01-29 18:34 UTC (permalink / raw)
  To: Steve Wise
  Cc: Joel Nider, Jason Gunthorpe, Leon Romanovsky, Doug Ledford,
	Mike Rapoport, linux-mm, linux-rdma, linux-kernel

On Tue, Jan 29, 2019 at 10:44:48AM -0600, Steve Wise wrote:
> 
> On 1/29/2019 7:26 AM, Joel Nider wrote:
> > As discussed at LPC'18, there is a need to be able to register a memory
> > region (MR) on behalf of another process. One example is the case of
> > post-copy container migration, in which CRIU is responsible for setting
> > up the migration, but the contents of the memory are from the migrating
> > process. In this case, we want all RDMA READ requests to be served by
> > the address space of the migration process directly (not by CRIU). This
> > patchset implements a new uverbs command which allows an application to
> > register a memory region in the address space of another process.
> 
> Hey Joel,
> 
> Dumb question:
> 
> Doesn't this open a security hole by allowing any process to register
> memory in any other process?

I agree; changing all MRs to use FOLL_REMOTE seems wrong.
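
If the remote path stays, it could at least be confined to the genuinely
remote case, e.g. (sketch):

	if (owner_mm != current->mm)
		gup_flags |= FOLL_REMOTE;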

Ira

> 
> Steve.
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] RDMA: reg_remote_mr
  2019-01-29 16:44 ` [PATCH 0/5] RDMA: reg_remote_mr Steve Wise
  2019-01-29 18:34   ` Ira Weiny
@ 2019-01-30  8:22   ` Joel Nider
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Nider @ 2019-01-30  8:22 UTC (permalink / raw)
  To: Steve Wise
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, linux-kernel,
	linux-mm, linux-rdma, Mike Rapoport

Steve Wise <swise@opengridcomputing.com> wrote on 01/29/2019 06:44:48 PM:

> 
> On 1/29/2019 7:26 AM, Joel Nider wrote:
> > As discussed at LPC'18, there is a need to be able to register a memory
> > region (MR) on behalf of another process. One example is the case of
> > post-copy container migration, in which CRIU is responsible for setting
> > up the migration, but the contents of the memory are from the migrating
> > process. In this case, we want all RDMA READ requests to be served by
> > the address space of the migration process directly (not by CRIU). This
> > patchset implements a new uverbs command which allows an application to
> > register a memory region in the address space of another process.
> 
> Hey Joel,
> 
> Dumb question:
> 
> Doesn't this open a security hole by allowing any process to register
> memory in any other process?

Not a dumb question - there is a security problem. Jason just suggested
I look at how ptrace solves the problem, so that's my best option at the
moment. Still, I figured it was a good idea to let everyone take a look
at what I have so far, and start to get feedback.

> Steve.
> 
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR
  2019-01-29 17:04   ` Jason Gunthorpe
@ 2019-01-30  8:34     ` Joel Nider
  2019-01-30 21:23       ` Jason Gunthorpe
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Nider @ 2019-01-30  8:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Leon Romanovsky, linux-kernel, linux-mm,
	linux-rdma, linux-rdma-owner, Mike Rapoport

linux-rdma-owner@vger.kernel.org wrote on 01/29/2019 07:04:06 PM:

> On Tue, Jan 29, 2019 at 03:26:26PM +0200, Joel Nider wrote:
> > Add a new handler for the new uverb reg_remote_mr. Its purpose is to
> > register a memory region in a different address space (i.e. a different
> > process) than that of the caller.
> > 
> > The main use case which motivated this change is post-copy container
> > migration. When a migration manager (i.e. CRIU) starts a migration, it
> > must have an open connection for handling any page faults that occur
> > in the container after restoration on the target machine. Even though
> > CRIU establishes and maintains the connection, ultimately the memory
> > is copied from the container being migrated (i.e. a remote address
> > space). This container must remain passive -- meaning it cannot have
> > any knowledge of the RDMA connection; therefore the migration manager
> > must have the ability to register a remote memory region. This remote
> > memory region will serve as the source for any memory pages that must
> > be copied (on-demand or otherwise) during the migration.
> > 
> > Signed-off-by: Joel Nider <joeln@il.ibm.com>
> >  drivers/infiniband/core/uverbs_std_types_mr.c | 129 +++++++++++++++++++++++++-
> >  include/rdma/ib_verbs.h                       |   8 ++
> >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  13 +++
> >  3 files changed, 149 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
> > index 4d4be0c..bf7b4b2 100644
> > +++ b/drivers/infiniband/core/uverbs_std_types_mr.c
> > @@ -150,6 +150,99 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
> >     return ret;
> >  }
> > 
> > +static int UVERBS_HANDLER(UVERBS_METHOD_REG_REMOTE_MR)(
> > +   struct uverbs_attr_bundle *attrs)
> > +{
> 
> I think this should just be REG_MR with an optional remote PID
> argument

Maybe I missed something.  Isn't REG_MR only implemented as a write() 
command? In our earlier conversation you told me all new commands must be 
implemented as ioctl() commands.


> >  DECLARE_UVERBS_NAMED_OBJECT(
> >     UVERBS_OBJECT_MR,
> >     UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> >     &UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
> >     &UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> > -   &UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR));
> > +   &UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> > +   &UVERBS_METHOD(UVERBS_METHOD_REG_REMOTE_MR),
> > +);
> 
> I'm kind of surprised this compiles with the trailing comma?

Personally, I think it is nicer with the trailing comma. It serves no
syntactic purpose, but when adding a new entry you don't have to touch
the previous line, which makes the diff cleaner. If this is against
standard practice I will remove the comma.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR
  2019-01-30  8:34     ` Joel Nider
@ 2019-01-30 21:23       ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2019-01-30 21:23 UTC (permalink / raw)
  To: Joel Nider
  Cc: Doug Ledford, Leon Romanovsky, linux-kernel, linux-mm,
	linux-rdma, linux-rdma-owner, Mike Rapoport

On Wed, Jan 30, 2019 at 10:34:02AM +0200, Joel Nider wrote:
> linux-rdma-owner@vger.kernel.org wrote on 01/29/2019 07:04:06 PM:
> 
> > On Tue, Jan 29, 2019 at 03:26:26PM +0200, Joel Nider wrote:
> > > Add a new handler for the new uverb reg_remote_mr. Its purpose is to
> > > register a memory region in a different address space (i.e. a different
> > > process) than that of the caller.
> > > 
> > > The main use case which motivated this change is post-copy container
> > > migration. When a migration manager (i.e. CRIU) starts a migration, it
> > > must have an open connection for handling any page faults that occur
> > > in the container after restoration on the target machine. Even though
> > > CRIU establishes and maintains the connection, ultimately the memory
> > > is copied from the container being migrated (i.e. a remote address
> > > space). This container must remain passive -- meaning it cannot have
> > > any knowledge of the RDMA connection; therefore the migration manager
> > > must have the ability to register a remote memory region. This remote
> > > memory region will serve as the source for any memory pages that must
> > > be copied (on-demand or otherwise) during the migration.
> > > 
> > > Signed-off-by: Joel Nider <joeln@il.ibm.com>
> > >  drivers/infiniband/core/uverbs_std_types_mr.c | 129 +++++++++++++++++++++++++-
> > >  include/rdma/ib_verbs.h                       |   8 ++
> > >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  13 +++
> > >  3 files changed, 149 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
> > > index 4d4be0c..bf7b4b2 100644
> > > +++ b/drivers/infiniband/core/uverbs_std_types_mr.c
> > > @@ -150,6 +150,99 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
> > >     return ret;
> > >  }
> > > 
> > > +static int UVERBS_HANDLER(UVERBS_METHOD_REG_REMOTE_MR)(
> > > +   struct uverbs_attr_bundle *attrs)
> > > +{
> > 
> > I think this should just be REG_MR with an optional remote PID
> > argument
> 
> Maybe I missed something.  Isn't REG_MR only implemented as a write() 
> command? In our earlier conversation you told me all new commands must be 
> implemented as ioctl() commands.

Yes - but we are also converting old write() commands into ioctl()
when they need new functionality. So in this case it should convert
reg_mr to ioctl() and then add an optional remote PID argument.
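
In declaration terms that would presumably be something like (sketch;
the UVERBS_ATTR_REG_MR_OWNER id is hypothetical):

	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_MR_OWNER,
			   UVERBS_ATTR_TYPE(u32),
			   UA_OPTIONAL),

with the handler falling back to current when the attribute is absent,
guarded by uverbs_attr_is_valid().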
> 
> > >  DECLARE_UVERBS_NAMED_OBJECT(
> > >     UVERBS_OBJECT_MR,
> > >     UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> > >     &UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
> > >     &UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> > > -   &UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR));
> > > +   &UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> > > +   &UVERBS_METHOD(UVERBS_METHOD_REG_REMOTE_MR),
> > > +);
> > 
> > I'm kind of surprised this compiles with the trailing comma?
> Personally, I think it is nicer with the trailing comma. It serves no
> syntactic purpose, but when adding a new entry you don't have to touch
> the previous line, which makes the diff cleaner. If this is against
> standard practice I will remove the comma.

Well, it is just that this is a macro call, and you usually can't have
a trailing comma in a function-like macro call - at least I thought that
was the case. Without some study I'm not sure what it expands to, or
whether that expansion is even OK.
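
(For what it's worth, an empty trailing argument to a variadic macro is
legal C99, and when the expansion lands in an array initializer the
extra comma is legal there too - a minimal standalone demonstration:

	#define DECLARE_LIST(name, ...) static int name[] = { __VA_ARGS__ };

	DECLARE_LIST(vals, 1, 2, 3,)	/* expands to { 1, 2, 3, } */

whether the uverbs macros stay well-formed with that empty argument is
a separate question.)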

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2019-01-29 13:26 [PATCH 0/5] RDMA: reg_remote_mr Joel Nider
2019-01-29 13:26 ` [PATCH 1/5] mm: add get_user_pages_remote_longterm function Joel Nider
2019-01-29 13:26 ` [PATCH 2/5] RDMA/uverbs: add owner parameter to reg_user_mr Joel Nider
2019-01-29 13:26 ` [PATCH 3/5] RDMA/uverbs: add owner parameter to ib_umem_get Joel Nider
2019-01-29 16:56   ` Jason Gunthorpe
2019-01-29 18:29   ` Ira Weiny
2019-01-29 13:26 ` [PATCH 4/5] RDMA/uverbs: add owner parameter to ib_umem_odp_get Joel Nider
2019-01-29 13:26 ` [PATCH 5/5] RDMA/uverbs: add UVERBS_METHOD_REG_REMOTE_MR Joel Nider
2019-01-29 17:04   ` Jason Gunthorpe
2019-01-30  8:34     ` Joel Nider
2019-01-30 21:23       ` Jason Gunthorpe
2019-01-29 16:44 ` [PATCH 0/5] RDMA: reg_remote_mr Steve Wise
2019-01-29 18:34   ` Ira Weiny
2019-01-30  8:22   ` Joel Nider
