* [PATCH rdma-next V1 00/10] ODP Fixes and Improvements
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Doug,

Please find below a patch set from Artemy that fixes and extends ODP
(On-Demand Paging) support.

The series proceeds in the following steps:

1. Code simplification across all IB drivers.
2. Three fixes to the existing ODP code.
3. Generic infrastructure for regions consisting of physically
   contiguous chunks of arbitrary order. Building on this
   infrastructure, ODP MRs allocated with MAP_HUGETLB receive
   dedicated handling.
4. ODP support for Memory Windows (MW). Memory windows give an
   application more flexible control over access to its memory. The
   operation of associating an MW with an MR is called binding; when
   an MW is bound to an ODP MR, the bind may trigger a page fault,
   which must be handled properly.

Thanks

Changes from v0:
 * Remove the temporary pg_shift variable from the i40iw driver, as
   suggested by Shiraz

Artemy Kovalyov (10):
  IB: Replace ib_umem page_size by page_shift
  IB/mlx5: Fix function updating xlt emergency path
  IB/mlx5: Fix UMR size calculation
  IB/mlx5: Fix implicit MR GC
  IB/mlx5: Decrease verbosity level of ODP errors
  IB/umem: Add contiguous ODP support
  IB/mlx5: Add contiguous ODP support
  IB/umem: Add support to huge ODP
  IB/mlx5: Extract page fault code
  IB/mlx5: Add ODP support to MW

 drivers/infiniband/core/umem.c                 |  17 +-
 drivers/infiniband/core/umem_odp.c             |  81 ++++--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c       |  12 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |   4 +-
 drivers/infiniband/hw/cxgb4/mem.c              |   4 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c        |   3 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c        |   9 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c        |   3 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c      |  10 +-
 drivers/infiniband/hw/mlx4/cq.c                |   2 +-
 drivers/infiniband/hw/mlx4/mr.c                |   6 +-
 drivers/infiniband/hw/mlx4/qp.c                |   2 +-
 drivers/infiniband/hw/mlx4/srq.c               |   2 +-
 drivers/infiniband/hw/mlx5/mem.c               |  13 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   1 +
 drivers/infiniband/hw/mlx5/mr.c                |   6 +-
 drivers/infiniband/hw/mlx5/odp.c               | 344 +++++++++++++++----------
 drivers/infiniband/hw/mthca/mthca_provider.c   |   5 +-
 drivers/infiniband/hw/nes/nes_verbs.c          |   4 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |  15 +-
 drivers/infiniband/hw/qedr/verbs.c             |   8 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |   2 +-
 drivers/infiniband/sw/rdmavt/mr.c              |   8 +-
 drivers/infiniband/sw/rxe/rxe_mr.c             |   8 +-
 include/rdma/ib_umem.h                         |   8 +-
 include/rdma/ib_umem_odp.h                     |   6 +-
 include/rdma/ib_verbs.h                        |   1 +
 27 files changed, 338 insertions(+), 246 deletions(-)

--
2.12.0


* [PATCH rdma-next V1 01/10] IB: Replace ib_umem page_size by page_shift
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov,
	Selvin Xavier, Steve Wise, Lijun Ou, Shiraz Saleem,
	Adit Ranadive, Dennis Dalessandro, Ram Amrani

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The size of the pages is held by struct ib_umem in the page_size field.

It is better to store it as an exponent: the page size is by nature
always a power of two, and it is used as a factor, a divisor, or an
argument to ilog2().

Converting page_size into page_shift makes the code portable and
avoids the following error when compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
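
As a minimal standalone sketch (not taken from the patch itself) of
why the exponent form is both cheaper and more portable, compare the
two ways of turning a u64 range into a page count:

  u64 addr, start;
  int n;

  /* Before: a u64 divided by a non-constant page size. On 32-bit ARM,
   * gcc emits a call to __aeabi_uldivmod for this, which the kernel
   * does not provide -- hence the link error above. */
  n = (addr - start) / umem->page_size;

  /* After: the page size is always a power of two, so a shift does
   * the same job portably, and the exponent is available directly
   * instead of being recovered via ilog2(). */
  n = (addr - start) >> umem->page_shift;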

CC: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
CC: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Ram Amrani <Ram.Amrani-74tsMCuadCbQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Change from v0:
 * Remove the temporary pg_shift variable from the i40iw driver.
---
 drivers/infiniband/core/umem.c                 | 15 ++++++---------
 drivers/infiniband/core/umem_odp.c             | 12 ++++++------
 drivers/infiniband/hw/bnxt_re/ib_verbs.c       | 12 ++++++------
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |  4 ++--
 drivers/infiniband/hw/cxgb4/mem.c              |  4 ++--
 drivers/infiniband/hw/hns/hns_roce_cq.c        |  3 +--
 drivers/infiniband/hw/hns/hns_roce_mr.c        |  9 +++++----
 drivers/infiniband/hw/hns/hns_roce_qp.c        |  3 +--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c      | 10 +++++-----
 drivers/infiniband/hw/mlx4/cq.c                |  2 +-
 drivers/infiniband/hw/mlx4/mr.c                |  6 +++---
 drivers/infiniband/hw/mlx4/qp.c                |  2 +-
 drivers/infiniband/hw/mlx4/srq.c               |  2 +-
 drivers/infiniband/hw/mlx5/mem.c               |  4 ++--
 drivers/infiniband/hw/mlx5/odp.c               |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c   |  5 ++---
 drivers/infiniband/hw/nes/nes_verbs.c          |  4 ++--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    | 15 ++++++---------
 drivers/infiniband/hw/qedr/verbs.c             |  8 ++++----
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c              |  8 ++++----
 drivers/infiniband/sw/rxe/rxe_mr.c             |  8 +++-----
 include/rdma/ib_umem.h                         |  4 ++--
 23 files changed, 67 insertions(+), 77 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 27f155d2df8d..6b87c051ffd4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -115,11 +115,11 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (!umem)
 		return ERR_PTR(-ENOMEM);

-	umem->context   = context;
-	umem->length    = size;
-	umem->address   = addr;
-	umem->page_size = PAGE_SIZE;
-	umem->pid       = get_task_pid(current, PIDTYPE_PID);
+	umem->context    = context;
+	umem->length     = size;
+	umem->address    = addr;
+	umem->page_shift = PAGE_SHIFT;
+	umem->pid	 = get_task_pid(current, PIDTYPE_PID);
 	/*
 	 * We ask for writable memory if any of the following
 	 * access flags are set.  "Local write" and "remote write"
@@ -315,7 +315,6 @@ EXPORT_SYMBOL(ib_umem_release);

 int ib_umem_page_count(struct ib_umem *umem)
 {
-	int shift;
 	int i;
 	int n;
 	struct scatterlist *sg;
@@ -323,11 +322,9 @@ int ib_umem_page_count(struct ib_umem *umem)
 	if (umem->odp_data)
 		return ib_umem_num_pages(umem);

-	shift = ilog2(umem->page_size);
-
 	n = 0;
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
-		n += sg_dma_len(sg) >> shift;
+		n += sg_dma_len(sg) >> umem->page_shift;

 	return n;
 }
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index cb2742b548bb..8ee30163497d 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -254,11 +254,11 @@ struct ib_umem *ib_alloc_odp_umem(struct ib_ucontext *context,
 	if (!umem)
 		return ERR_PTR(-ENOMEM);

-	umem->context   = context;
-	umem->length    = size;
-	umem->address   = addr;
-	umem->page_size = PAGE_SIZE;
-	umem->writable  = 1;
+	umem->context    = context;
+	umem->length     = size;
+	umem->address    = addr;
+	umem->page_shift = PAGE_SHIFT;
+	umem->writable   = 1;

 	odp_data = kzalloc(sizeof(*odp_data), GFP_KERNEL);
 	if (!odp_data) {
@@ -707,7 +707,7 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem *umem, u64 virt,
 	 * invalidations, so we must make sure we free each page only
 	 * once. */
 	mutex_lock(&umem->odp_data->umem_mutex);
-	for (addr = virt; addr < bound; addr += (u64)umem->page_size) {
+	for (addr = virt; addr < bound; addr += BIT(umem->page_shift)) {
 		idx = (addr - ib_umem_start(umem)) / PAGE_SIZE;
 		if (umem->odp_data->page_list[idx]) {
 			struct page *page = umem->odp_data->page_list[idx];
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 33af2e3de399..a4edb44f03bb 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3016,7 +3016,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	struct bnxt_re_mr *mr;
 	struct ib_umem *umem;
 	u64 *pbl_tbl, *pbl_tbl_orig;
-	int i, umem_pgs, pages, page_shift, rc;
+	int i, umem_pgs, pages, rc;
 	struct scatterlist *sg;
 	int entry;

@@ -3062,22 +3062,22 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	}
 	pbl_tbl_orig = pbl_tbl;

-	page_shift = ilog2(umem->page_size);
 	if (umem->hugetlb) {
 		dev_err(rdev_to_dev(rdev), "umem hugetlb not supported!");
 		rc = -EFAULT;
 		goto fail;
 	}
-	if (umem->page_size != PAGE_SIZE) {
-		dev_err(rdev_to_dev(rdev), "umem page size unsupported!");
+
+	if (umem->page_shift != PAGE_SHIFT) {
+		dev_err(rdev_to_dev(rdev), "umem page shift unsupported!");
 		rc = -EFAULT;
 		goto fail;
 	}
 	/* Map umem buf ptrs to the PBL */
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
-		pages = sg_dma_len(sg) >> page_shift;
+		pages = sg_dma_len(sg) >> umem->page_shift;
 		for (i = 0; i < pages; i++, pbl_tbl++)
-			*pbl_tbl = sg_dma_address(sg) + (i << page_shift);
+			*pbl_tbl = sg_dma_address(sg) + (i << umem->page_shift);
 	}
 	rc = bnxt_qplib_reg_mr(&rdev->qplib_res, &mr->qplib_mr, pbl_tbl_orig,
 			       umem_pgs, false);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 86ecd3ea6a4b..0a7568aea009 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -584,7 +584,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(err);
 	}

-	shift = ffs(mhp->umem->page_size) - 1;
+	shift = mhp->umem->page_shift;

 	n = mhp->umem->nmap;

@@ -604,7 +604,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			len = sg_dma_len(sg) >> shift;
 			for (k = 0; k < len; ++k) {
 				pages[i++] = cpu_to_be64(sg_dma_address(sg) +
-					mhp->umem->page_size * k);
+							 (k << shift));
 				if (i == PAGE_SIZE / sizeof *pages) {
 					err = iwch_write_pbl(mhp, pages, i, n);
 					if (err)
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 410408f886c1..89b9acd37109 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -517,7 +517,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(err);
 	}

-	shift = ffs(mhp->umem->page_size) - 1;
+	shift = mhp->umem->page_shift;

 	n = mhp->umem->nmap;
 	err = alloc_pbl(mhp, n);
@@ -536,7 +536,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		len = sg_dma_len(sg) >> shift;
 		for (k = 0; k < len; ++k) {
 			pages[i++] = cpu_to_be64(sg_dma_address(sg) +
-				mhp->umem->page_size * k);
+						 (k << shift));
 			if (i == PAGE_SIZE / sizeof *pages) {
 				err = write_pbl(&mhp->rhp->rdev,
 				      pages,
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 589496c8fb9e..b89fd711019e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -219,8 +219,7 @@ static int hns_roce_ib_get_cq_umem(struct hns_roce_dev *hr_dev,
 		return PTR_ERR(*umem);

 	ret = hns_roce_mtt_init(hr_dev, ib_umem_page_count(*umem),
-				ilog2((unsigned int)(*umem)->page_size),
-				&buf->hr_mtt);
+				(*umem)->page_shift, &buf->hr_mtt);
 	if (ret)
 		goto err_buf;

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 4139abee3b54..024a81529281 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -503,7 +503,8 @@ int hns_roce_ib_umem_write_mtt(struct hns_roce_dev *hr_dev,
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
 		len = sg_dma_len(sg) >> mtt->page_shift;
 		for (k = 0; k < len; ++k) {
-			pages[i++] = sg_dma_address(sg) + umem->page_size * k;
+			pages[i++] = sg_dma_address(sg) +
+				(k << umem->page_shift);
 			if (i == PAGE_SIZE / sizeof(u64)) {
 				ret = hns_roce_write_mtt(hr_dev, mtt, n, i,
 							 pages);
@@ -563,9 +564,9 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	}

 	n = ib_umem_page_count(mr->umem);
-	if (mr->umem->page_size != HNS_ROCE_HEM_PAGE_SIZE) {
-		dev_err(dev, "Just support 4K page size but is 0x%x now!\n",
-			mr->umem->page_size);
+	if (mr->umem->page_shift != HNS_ROCE_HEM_PAGE_SHIFT) {
+		dev_err(dev, "Just support 4K page size but is 0x%lx now!\n",
+			BIT(mr->umem->page_shift));
 		ret = -EINVAL;
 		goto err_umem;
 	}
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 3f44f2f91f03..054c52699090 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -437,8 +437,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 		}

 		ret = hns_roce_mtt_init(hr_dev, ib_umem_page_count(hr_qp->umem),
-				    ilog2((unsigned int)hr_qp->umem->page_size),
-				    &hr_qp->mtt);
+					hr_qp->umem->page_shift, &hr_qp->mtt);
 		if (ret) {
 			dev_err(dev, "hns_roce_mtt_init error for create qp\n");
 			goto err_buf;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 9b2849979756..378c75759be4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1345,7 +1345,7 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr,
 {
 	struct ib_umem *region = iwmr->region;
 	struct i40iw_pbl *iwpbl = &iwmr->iwpbl;
-	int chunk_pages, entry, pg_shift, i;
+	int chunk_pages, entry, i;
 	struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc;
 	struct i40iw_pble_info *pinfo;
 	struct scatterlist *sg;
@@ -1354,14 +1354,14 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr,

 	pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf;

-	pg_shift = ffs(region->page_size) - 1;
 	for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) {
-		chunk_pages = sg_dma_len(sg) >> pg_shift;
+		chunk_pages = sg_dma_len(sg) >> region->page_shift;
 		if ((iwmr->type == IW_MEMREG_TYPE_QP) &&
 		    !iwpbl->qp_mr.sq_page)
 			iwpbl->qp_mr.sq_page = sg_page(sg);
 		for (i = 0; i < chunk_pages; i++) {
-			pg_addr = sg_dma_address(sg) + region->page_size * i;
+			pg_addr = sg_dma_address(sg) +
+				(i << region->page_shift);

 			if ((entry + i) == 0)
 				*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
@@ -1847,7 +1847,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	iwmr->ibmr.device = pd->device;
 	ucontext = to_ucontext(pd->uobject->context);

-	iwmr->page_size = region->page_size;
+	iwmr->page_size = PAGE_SIZE;
 	iwmr->page_msk = PAGE_MASK;

 	if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM))
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 6a0fec357dae..4f5a143fc0a7 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -147,7 +147,7 @@ static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_ucontext *cont
 		return PTR_ERR(*umem);

 	err = mlx4_mtt_init(dev->dev, ib_umem_page_count(*umem),
-			    ilog2((*umem)->page_size), &buf->mtt);
+			    (*umem)->page_shift, &buf->mtt);
 	if (err)
 		goto err_buf;

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 433bcdbdd680..e6f77f63da75 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -107,7 +107,7 @@ int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt,
 		len = sg_dma_len(sg) >> mtt->page_shift;
 		for (k = 0; k < len; ++k) {
 			pages[i++] = sg_dma_address(sg) +
-				umem->page_size * k;
+				(k << umem->page_shift);
 			/*
 			 * Be friendly to mlx4_write_mtt() and
 			 * pass it chunks of appropriate size.
@@ -155,7 +155,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	}

 	n = ib_umem_page_count(mr->umem);
-	shift = ilog2(mr->umem->page_size);
+	shift = mr->umem->page_shift;

 	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
 			    convert_access(access_flags), n, shift, &mr->mmr);
@@ -239,7 +239,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 			goto release_mpt_entry;
 		}
 		n = ib_umem_page_count(mmr->umem);
-		shift = ilog2(mmr->umem->page_size);
+		shift = mmr->umem->page_shift;

 		err = mlx4_mr_rereg_mem_write(dev->dev, &mmr->mmr,
 					      virt_addr, length, n, shift,
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index c34eebc7db65..8f382318f888 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -745,7 +745,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 		}

 		err = mlx4_mtt_init(dev->dev, ib_umem_page_count(qp->umem),
-				    ilog2(qp->umem->page_size), &qp->mtt);
+				    qp->umem->page_shift, &qp->mtt);
 		if (err)
 			goto err_buf;

diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 7dd3f267f06b..e32dd58937a8 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -122,7 +122,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
 		}

 		err = mlx4_mtt_init(dev->dev, ib_umem_page_count(srq->umem),
-				    ilog2(srq->umem->page_size), &srq->mtt);
+				    srq->umem->page_shift, &srq->mtt);
 		if (err)
 			goto err_buf;

diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index 778d8a18925f..a0c2af964249 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -59,7 +59,7 @@ void mlx5_ib_cont_pages(struct ib_umem *umem, u64 addr,
 	u64 pfn;
 	struct scatterlist *sg;
 	int entry;
-	unsigned long page_shift = ilog2(umem->page_size);
+	unsigned long page_shift = umem->page_shift;

 	/* With ODP we must always match OS page size. */
 	if (umem->odp_data) {
@@ -156,7 +156,7 @@ void __mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, struct ib_umem *umem,
 			    int page_shift, size_t offset, size_t num_pages,
 			    __be64 *pas, int access_flags)
 {
-	unsigned long umem_page_shift = ilog2(umem->page_size);
+	unsigned long umem_page_shift = umem->page_shift;
 	int shift = page_shift - umem_page_shift;
 	int mask = (1 << shift) - 1;
 	int i, k, idx;
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index d7b12f0750e2..3bfa3a9c3be0 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -206,7 +206,7 @@ void mlx5_ib_invalidate_range(struct ib_umem *umem, unsigned long start,
 	 * but they will write 0s as well, so no difference in the end result.
 	 */

-	for (addr = start; addr < end; addr += (u64)umem->page_size) {
+	for (addr = start; addr < end; addr += BIT(umem->page_shift)) {
 		idx = (addr - ib_umem_start(umem)) / PAGE_SIZE;
 		/*
 		 * Strive to write the MTTs in chunks, but avoid overwriting
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 22d0e6ee5af6..e1b8940558d2 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -937,7 +937,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err;
 	}

-	shift = ffs(mr->umem->page_size) - 1;
+	shift = mr->umem->page_shift;
 	n = mr->umem->nmap;

 	mr->mtt = mthca_alloc_mtt(dev, n);
@@ -959,8 +959,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	for_each_sg(mr->umem->sg_head.sgl, sg, mr->umem->nmap, entry) {
 		len = sg_dma_len(sg) >> shift;
 		for (k = 0; k < len; ++k) {
-			pages[i++] = sg_dma_address(sg) +
-				mr->umem->page_size * k;
+			pages[i++] = sg_dma_address(sg) + (k << shift);
 			/*
 			 * Be friendly to write_mtt and pass it chunks
 			 * of appropriate size.
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index ccf0a4cffe9c..11f7c308c7ad 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2165,9 +2165,9 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	}

 	nes_debug(NES_DBG_MR, "User base = 0x%lX, Virt base = 0x%lX, length = %u,"
-			" offset = %u, page size = %u.\n",
+			" offset = %u, page size = %lu.\n",
 			(unsigned long int)start, (unsigned long int)virt, (u32)length,
-			ib_umem_offset(region), region->page_size);
+			ib_umem_offset(region), BIT(region->page_shift));

 	skip_pages = ((u32)ib_umem_offset(region)) >> 12;

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index bc9fb144e57b..e6c65852797d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -914,21 +914,18 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 	pbe = (struct ocrdma_pbe *)pbl_tbl->va;
 	pbe_cnt = 0;

-	shift = ilog2(umem->page_size);
+	shift = umem->page_shift;

 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
 		pages = sg_dma_len(sg) >> shift;
 		for (pg_cnt = 0; pg_cnt < pages; pg_cnt++) {
 			/* store the page address in pbe */
 			pbe->pa_lo =
-			    cpu_to_le32(sg_dma_address
-					(sg) +
-					(umem->page_size * pg_cnt));
+			    cpu_to_le32(sg_dma_address(sg) +
+					(pg_cnt << shift));
 			pbe->pa_hi =
-			    cpu_to_le32(upper_32_bits
-					((sg_dma_address
-					  (sg) +
-					  umem->page_size * pg_cnt)));
+			    cpu_to_le32(upper_32_bits(sg_dma_address(sg) +
+					 (pg_cnt << shift)));
 			pbe_cnt += 1;
 			total_num_pbes += 1;
 			pbe++;
@@ -978,7 +975,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	if (status)
 		goto umem_err;

-	mr->hwmr.pbe_size = mr->umem->page_size;
+	mr->hwmr.pbe_size = BIT(mr->umem->page_shift);
 	mr->hwmr.fbo = ib_umem_offset(mr->umem);
 	mr->hwmr.va = usr_addr;
 	mr->hwmr.len = len;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 6b3bb32803bd..e741cc662606 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -680,16 +680,16 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,

 	pbe_cnt = 0;

-	shift = ilog2(umem->page_size);
+	shift = umem->page_shift;

 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
 		pages = sg_dma_len(sg) >> shift;
 		for (pg_cnt = 0; pg_cnt < pages; pg_cnt++) {
 			/* store the page address in pbe */
 			pbe->lo = cpu_to_le32(sg_dma_address(sg) +
-					      umem->page_size * pg_cnt);
+					      (pg_cnt << shift));
 			addr = upper_32_bits(sg_dma_address(sg) +
-					     umem->page_size * pg_cnt);
+					     (pg_cnt << shift));
 			pbe->hi = cpu_to_le32(addr);
 			pbe_cnt++;
 			total_num_pbes++;
@@ -2189,7 +2189,7 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr->hw_mr.pbl_ptr = mr->info.pbl_table[0].pa;
 	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
 	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
-	mr->hw_mr.page_size_log = ilog2(mr->umem->page_size);
+	mr->hw_mr.page_size_log = mr->umem->page_shift;
 	mr->hw_mr.fbo = ib_umem_offset(mr->umem);
 	mr->hw_mr.length = len;
 	mr->hw_mr.vaddr = usr_addr;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
index 948b5ccd2a70..6ef4df6c8c4a 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
@@ -194,7 +194,7 @@ int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir,
 		len = sg_dma_len(sg) >> PAGE_SHIFT;
 		for (j = 0; j < len; j++) {
 			dma_addr_t addr = sg_dma_address(sg) +
-					  umem->page_size * j;
+					  (j << umem->page_shift);

 			ret = pvrdma_page_dir_insert_dma(pdir, i, addr);
 			if (ret)
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index ae30b6838d79..4e989e1caa40 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -405,8 +405,7 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	mr->mr.access_flags = mr_access_flags;
 	mr->umem = umem;

-	if (is_power_of_2(umem->page_size))
-		mr->mr.page_shift = ilog2(umem->page_size);
+	mr->mr.page_shift = umem->page_shift;
 	m = 0;
 	n = 0;
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
@@ -418,8 +417,9 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			goto bail_inval;
 		}
 		mr->mr.map[m]->segs[n].vaddr = vaddr;
-		mr->mr.map[m]->segs[n].length = umem->page_size;
-		trace_rvt_mr_user_seg(&mr->mr, m, n, vaddr, umem->page_size);
+		mr->mr.map[m]->segs[n].length = BIT(umem->page_shift);
+		trace_rvt_mr_user_seg(&mr->mr, m, n, vaddr,
+				      BIT(umem->page_shift));
 		n++;
 		if (n == RVT_SEGSZ) {
 			m++;
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 37eea7441ca4..251058b3403c 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -191,10 +191,8 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 		goto err1;
 	}

-	WARN_ON_ONCE(!is_power_of_2(umem->page_size));
-
-	mem->page_shift		= ilog2(umem->page_size);
-	mem->page_mask		= umem->page_size - 1;
+	mem->page_shift		= umem->page_shift;
+	mem->page_mask		= BIT(umem->page_shift) - 1;

 	num_buf			= 0;
 	map			= mem->map;
@@ -210,7 +208,7 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 			}

 			buf->addr = (uintptr_t)vaddr;
-			buf->size = umem->page_size;
+			buf->size = BIT(umem->page_shift);
 			num_buf++;
 			buf++;

diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 2d83cfd7e6ce..7f4af1e1ae64 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -44,7 +44,7 @@ struct ib_umem {
 	struct ib_ucontext     *context;
 	size_t			length;
 	unsigned long		address;
-	int			page_size;
+	int			page_shift;
 	int                     writable;
 	int                     hugetlb;
 	struct work_struct	work;
@@ -60,7 +60,7 @@ struct ib_umem {
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
-	return umem->address & ((unsigned long)umem->page_size - 1);
+	return umem->address & (BIT(umem->page_shift) - 1);
 }

 /* Returns the first page of an ODP umem. */
--
2.12.0


* [PATCH rdma-next V1 02/10] IB/mlx5: Fix function updating xlt emergency path
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On the memory-shortage path we fall back to a spare (emergency)
buffer, but mlx5_ib_update_xlt() can be called from ib_uverbs_reg_mr()
before mr->ibmr.uobject is initialized, so dereferencing it to reach
the user context oopses. Take the user context through the PD instead.

How to test this:
1. Trigger memory exhaustion so that __get_free_pages(GFP_KERNEL, 4) fails.
2. Register an MR.
3. Verify that no kernel oops occurs.
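
A sketch of the two pointer chains involved (names are from the diff
below; the timing claim is this patch's premise): during
ib_uverbs_reg_mr() the MR's uobject is attached only after the
driver's reg_user_mr() callback returns, while the PD was already
resolved from a live user object before the call.

  /* mr->ibmr.uobject is still NULL at this point -> NULL dereference */
  uctx = to_mucontext(mr->ibmr.uobject->context);

  /* the PD's uobject is already valid, so go through it instead */
  uctx = to_mucontext(mr->ibmr.pd->uobject->context);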

Fixes: 7d0cc6edcc70 ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b8f9382a8b7d..1f09e11fa694 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1009,7 +1009,7 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 	}
 
 	if (!xlt) {
-		uctx = to_mucontext(mr->ibmr.uobject->context);
+		uctx = to_mucontext(mr->ibmr.pd->uobject->context);
 		mlx5_ib_warn(dev, "Using XLT emergency buffer\n");
 		size = PAGE_SIZE;
 		xlt = (void *)uctx->upd_xlt_page;
-- 
2.12.0


* [PATCH rdma-next V1 03/10] IB/mlx5: Fix UMR size calculation
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Translation table updates for a large UMR may require multiple
post-send operations. The last operation may be shorter than the
others, but the current code posts all of them with the same length.
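
A minimal sketch of the clamping this adds (standalone pseudocode
shaped after the driver loop, not a verbatim excerpt):

  /* pages_to_map XLT entries are posted in chunks of pages_iter;
   * every chunk except possibly the last is full-sized, so the tail
   * must be clamped rather than posted as pages_iter. */
  for (mapped = 0; mapped < pages_to_map; mapped += pages_iter) {
          int npages = min_t(int, pages_iter, pages_to_map - mapped);
          /* ... populate and post npages entries ... */
  }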

Fixes: 7d0cc6edcc70 ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1f09e11fa694..9a74260e9899 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1045,8 +1045,9 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 	for (pages_mapped = 0;
 	     pages_mapped < pages_to_map && !err;
 	     pages_mapped += pages_iter, idx += pages_iter) {
+		npages = min_t(int, pages_iter, pages_to_map - pages_mapped);
 		dma_sync_single_for_cpu(ddev, dma, size, DMA_TO_DEVICE);
-		npages = populate_xlt(mr, idx, pages_iter, xlt,
+		npages = populate_xlt(mr, idx, npages, xlt,
 				      page_shift, size, flags);
 
 		dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
-- 
2.12.0


* [PATCH rdma-next V1 04/10] IB/mlx5: Fix implicit MR GC
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When an implicit MR's leaf MKey becomes unused, i.e. when its last
page is released by an MMU invalidation, the leaf is marked as "dying"
and scheduled for release by the garbage collector.
Currently, a subsequent page fault may clear the "dying" flag and
resurrect the leaf. Instead, treat a leaf MKey as non-existent once it
has been scheduled for removal by the GC.
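
Restated as code, the lookup rule this patch enforces (mirroring
check_parent() in the diff below):

  /* A leaf already marked as dying by the GC must look non-existent
   * to concurrent page faults, so they allocate a fresh leaf instead
   * of resurrecting one that is being torn down. */
  static int check_parent(struct ib_umem_odp *odp,
                          struct mlx5_ib_mr *parent)
  {
          struct mlx5_ib_mr *mr = odp->private;

          return mr && mr->parent == parent && !odp->dying;
  }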

Fixes: 81713d3788d2 ('IB/mlx5: Add implicit MR support')
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/odp.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 3bfa3a9c3be0..b506321f5cb7 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -57,7 +57,7 @@ static int check_parent(struct ib_umem_odp *odp,
 {
 	struct mlx5_ib_mr *mr = odp->private;
 
-	return mr && mr->parent == parent;
+	return mr && mr->parent == parent && !odp->dying;
 }
 
 static struct ib_umem_odp *odp_next(struct ib_umem_odp *odp)
@@ -158,13 +158,6 @@ static void mr_leaf_free_action(struct work_struct *work)
 	mr->parent = NULL;
 	synchronize_srcu(&mr->dev->mr_srcu);
 
-	if (!READ_ONCE(odp->dying)) {
-		mr->parent = imr;
-		if (atomic_dec_and_test(&imr->num_leaf_free))
-			wake_up(&imr->q_leaf_free);
-		return;
-	}
-
 	ib_umem_release(odp->umem);
 	if (imr->live)
 		mlx5_ib_update_xlt(imr, idx, 1, 0,
@@ -436,8 +429,6 @@ static struct ib_umem_odp *implicit_mr_get_data(struct mlx5_ib_mr *mr,
 		nentries++;
 	}
 
-	odp->dying = 0;
-
 	/* Return first odp if region not covered by single one */
 	if (likely(!result))
 		result = odp;
-- 
2.12.0


* [PATCH rdma-next V1 05/10] IB/mlx5: Decrease verbosity level of ODP errors
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Decrease the verbosity of ODP error-flow messages to debug level.
Remove one redundant print, since a debug-level message already exists
in that flow.

Fixes: d9aaed838765 ('{net,IB}/mlx5: Refactor page fault handling')
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/odp.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index b506321f5cb7..0d52b72ff99b 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -988,9 +988,6 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
 		resume_with_error = 0;
 		goto resolve_page_fault;
 	} else if (ret < 0 || total_wqe_bytes > bytes_mapped) {
-		if (ret != -ENOENT)
-			mlx5_ib_err(dev, "PAGE FAULT error: %d. QP 0x%x. type: 0x%x\n",
-				    ret, pfault->wqe.wq_num, pfault->type);
 		goto resolve_page_fault;
 	}
 
@@ -1050,8 +1047,8 @@ static void mlx5_ib_mr_rdma_pfault_handler(struct mlx5_ib_dev *dev,
 	} else if (ret < 0 || pages_in_range(address, length) > ret) {
 		mlx5_ib_page_fault_resume(dev, pfault, 1);
 		if (ret != -ENOENT)
-			mlx5_ib_warn(dev, "PAGE FAULT error %d. QP 0x%x, type: 0x%x\n",
-				     ret, pfault->token, pfault->type);
+			mlx5_ib_dbg(dev, "PAGE FAULT error %d. QP 0x%x, type: 0x%x\n",
+				    ret, pfault->token, pfault->type);
 		return;
 	}
 
@@ -1072,8 +1069,8 @@ static void mlx5_ib_mr_rdma_pfault_handler(struct mlx5_ib_dev *dev,
 						    prefetch_len,
 						    &bytes_committed, NULL);
 		if (ret < 0 && ret != -EAGAIN) {
-			mlx5_ib_warn(dev, "Prefetch failed. ret: %d, QP 0x%x, address: 0x%.16llx, length = 0x%.16x\n",
-				     ret, pfault->token, address, prefetch_len);
+			mlx5_ib_dbg(dev, "Prefetch failed. ret: %d, QP 0x%x, address: 0x%.16llx, length = 0x%.16x\n",
+				    ret, pfault->token, address, prefetch_len);
 		}
 	}
 }
-- 
2.12.0


* [PATCH rdma-next V1 06/10] IB/umem: Add contiguous ODP support
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages, for instance) to improve performance.
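
A simplified sketch of the contiguity check this adds to the mapping
loop (assuming 4 KiB OS pages and, say, a 2 MiB chunk, i.e.
umem->page_shift == 21; see the full version in the diff below):

  /* get_user_pages() still returns 4 KiB pages, so within each chunk
   * only the head page is DMA-mapped; the remaining pages are only
   * checked to be physically consecutive, then put back. */
  if (user_virt & ~page_mask) {           /* not a chunk head */
          p += PAGE_SIZE;
          if (page_to_phys(local_page_list[j]) != p) {
                  ret = -EFAULT;          /* chunk not contiguous */
                  break;
          }
          put_page(local_page_list[j]);
          continue;
  }
  ret = ib_umem_odp_map_dma_single_page(umem, k, local_page_list[j],
                                        access_mask, current_seq);
  p = page_to_phys(local_page_list[j]);
  k++;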

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/umem_odp.c | 50 +++++++++++++++++++++++---------------
 include/rdma/ib_umem.h             |  4 +--
 2 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 8ee30163497d..73053c8a9e3b 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -504,7 +504,6 @@ void ib_umem_odp_release(struct ib_umem *umem)
 static int ib_umem_odp_map_dma_single_page(
 		struct ib_umem *umem,
 		int page_index,
-		u64 base_virt_addr,
 		struct page *page,
 		u64 access_mask,
 		unsigned long current_seq)
@@ -527,7 +526,7 @@ static int ib_umem_odp_map_dma_single_page(
 	if (!(umem->odp_data->dma_list[page_index])) {
 		dma_addr = ib_dma_map_page(dev,
 					   page,
-					   0, PAGE_SIZE,
+					   0, BIT(umem->page_shift),
 					   DMA_BIDIRECTIONAL);
 		if (ib_dma_mapping_error(dev, dma_addr)) {
 			ret = -EFAULT;
@@ -555,8 +554,9 @@ static int ib_umem_odp_map_dma_single_page(
 	if (remove_existing_mapping && umem->context->invalidate_range) {
 		invalidate_page_trampoline(
 			umem,
-			base_virt_addr + (page_index * PAGE_SIZE),
-			base_virt_addr + ((page_index+1)*PAGE_SIZE),
+			ib_umem_start(umem) + (page_index >> umem->page_shift),
+			ib_umem_start(umem) + ((page_index + 1) >>
+					       umem->page_shift),
 			NULL);
 		ret = -EAGAIN;
 	}
@@ -595,10 +595,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 	struct task_struct *owning_process  = NULL;
 	struct mm_struct   *owning_mm       = NULL;
 	struct page       **local_page_list = NULL;
-	u64 off;
-	int j, k, ret = 0, start_idx, npages = 0;
-	u64 base_virt_addr;
+	u64 page_mask, off;
+	int j, k, ret = 0, start_idx, npages = 0, page_shift;
 	unsigned int flags = 0;
+	phys_addr_t p = 0;
 
 	if (access_mask == 0)
 		return -EINVAL;
@@ -611,9 +611,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 	if (!local_page_list)
 		return -ENOMEM;
 
-	off = user_virt & (~PAGE_MASK);
-	user_virt = user_virt & PAGE_MASK;
-	base_virt_addr = user_virt;
+	page_shift = umem->page_shift;
+	page_mask = ~(BIT(page_shift) - 1);
+	off = user_virt & (~page_mask);
+	user_virt = user_virt & page_mask;
 	bcnt += off; /* Charge for the first page offset as well. */
 
 	owning_process = get_pid_task(umem->context->tgid, PIDTYPE_PID);
@@ -631,13 +632,13 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 	if (access_mask & ODP_WRITE_ALLOWED_BIT)
 		flags |= FOLL_WRITE;
 
-	start_idx = (user_virt - ib_umem_start(umem)) >> PAGE_SHIFT;
+	start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
 	k = start_idx;
 
 	while (bcnt > 0) {
-		const size_t gup_num_pages =
-			min_t(size_t, ALIGN(bcnt, PAGE_SIZE) / PAGE_SIZE,
-			      PAGE_SIZE / sizeof(struct page *));
+		const size_t gup_num_pages = min_t(size_t,
+				(bcnt + BIT(page_shift) - 1) >> page_shift,
+				PAGE_SIZE / sizeof(struct page *));
 
 		down_read(&owning_mm->mmap_sem);
 		/*
@@ -656,14 +657,25 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 			break;
 
 		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
-		user_virt += npages << PAGE_SHIFT;
 		mutex_lock(&umem->odp_data->umem_mutex);
-		for (j = 0; j < npages; ++j) {
+		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
+			if (user_virt & ~page_mask) {
+				p += PAGE_SIZE;
+				if (page_to_phys(local_page_list[j]) != p) {
+					ret = -EFAULT;
+					break;
+				}
+				put_page(local_page_list[j]);
+				continue;
+			}
+
 			ret = ib_umem_odp_map_dma_single_page(
-				umem, k, base_virt_addr, local_page_list[j],
-				access_mask, current_seq);
+					umem, k, local_page_list[j],
+					access_mask, current_seq);
 			if (ret < 0)
 				break;
+
+			p = page_to_phys(local_page_list[j]);
 			k++;
 		}
 		mutex_unlock(&umem->odp_data->umem_mutex);
@@ -708,7 +720,7 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem *umem, u64 virt,
 	 * once. */
 	mutex_lock(&umem->odp_data->umem_mutex);
 	for (addr = virt; addr < bound; addr += BIT(umem->page_shift)) {
-		idx = (addr - ib_umem_start(umem)) / PAGE_SIZE;
+		idx = (addr - ib_umem_start(umem)) >> umem->page_shift;
 		if (umem->odp_data->page_list[idx]) {
 			struct page *page = umem->odp_data->page_list[idx];
 			dma_addr_t dma = umem->odp_data->dma_list[idx];
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 7f4af1e1ae64..23159dd5be18 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -72,12 +72,12 @@ static inline unsigned long ib_umem_start(struct ib_umem *umem)
 /* Returns the address of the page after the last one of an ODP umem. */
 static inline unsigned long ib_umem_end(struct ib_umem *umem)
 {
-	return PAGE_ALIGN(umem->address + umem->length);
+	return ALIGN(umem->address + umem->length, BIT(umem->page_shift));
 }
 
 static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 {
-	return (ib_umem_end(umem) - ib_umem_start(umem)) >> PAGE_SHIFT;
+	return (ib_umem_end(umem) - ib_umem_start(umem)) >> umem->page_shift;
 }
 
 #ifdef CONFIG_INFINIBAND_USER_MEM
-- 
2.12.0


* [PATCH rdma-next V1 07/10] IB/mlx5: Add contiguous ODP support
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages, for instance) to improve performance.
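
For scale, a worked example of the new mlx5_ib_cont_pages() accounting
in the diff below (the numbers assume 4 KiB OS pages and 2 MiB huge
pages; they are illustrative, not from the patch):

  /* An 8 MiB ODP region backed by 2 MiB huge pages:
   *   page_shift = 21
   *   *ncont = 4                       translation entries, one per chunk
   *   *count = 4 << (21 - 12) = 2048   OS pages covered
   *   *order = ilog2(roundup_pow_of_two(4)) = 2
   */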

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mem.c |  9 ++++-----
 drivers/infiniband/hw/mlx5/odp.c | 28 +++++++++++++++-------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index a0c2af964249..914f212e7ef6 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -61,13 +61,12 @@ void mlx5_ib_cont_pages(struct ib_umem *umem, u64 addr,
 	int entry;
 	unsigned long page_shift = umem->page_shift;
 
-	/* With ODP we must always match OS page size. */
 	if (umem->odp_data) {
-		*count = ib_umem_page_count(umem);
-		*shift = PAGE_SHIFT;
-		*ncont = *count;
+		*ncont = ib_umem_page_count(umem);
+		*count = *ncont << (page_shift - PAGE_SHIFT);
+		*shift = page_shift;
 		if (order)
-			*order = ilog2(roundup_pow_of_two(*count));
+			*order = ilog2(roundup_pow_of_two(*ncont));
 
 		return;
 	}
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 0d52b72ff99b..eddabd6e6596 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -200,7 +200,7 @@ void mlx5_ib_invalidate_range(struct ib_umem *umem, unsigned long start,
 	 */
 
 	for (addr = start; addr < end; addr += BIT(umem->page_shift)) {
-		idx = (addr - ib_umem_start(umem)) / PAGE_SIZE;
+		idx = (addr - ib_umem_start(umem)) >> umem->page_shift;
 		/*
 		 * Strive to write the MTTs in chunks, but avoid overwriting
 		 * non-existing MTTs. The huristic here can be improved to
@@ -218,8 +218,7 @@ void mlx5_ib_invalidate_range(struct ib_umem *umem, unsigned long start,
 
 			if (in_block && umr_offset == 0) {
 				mlx5_ib_update_xlt(mr, blk_start_idx,
-						   idx - blk_start_idx,
-						   PAGE_SHIFT,
+						   idx - blk_start_idx, 0,
 						   MLX5_IB_UPD_XLT_ZAP |
 						   MLX5_IB_UPD_XLT_ATOMIC);
 				in_block = 0;
@@ -228,8 +227,7 @@ void mlx5_ib_invalidate_range(struct ib_umem *umem, unsigned long start,
 	}
 	if (in_block)
 		mlx5_ib_update_xlt(mr, blk_start_idx,
-				   idx - blk_start_idx + 1,
-				   PAGE_SHIFT,
+				   idx - blk_start_idx + 1, 0,
 				   MLX5_IB_UPD_XLT_ZAP |
 				   MLX5_IB_UPD_XLT_ATOMIC);
 	/*
@@ -516,7 +514,7 @@ void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *imr)
 /*
  * Handle a single data segment in a page-fault WQE or RDMA region.
  *
- * Returns number of pages retrieved on success. The caller may continue to
+ * Returns number of OS pages retrieved on success. The caller may continue to
  * the next data segment.
  * Can return the following error codes:
  * -EAGAIN to designate a temporary error. The caller will abort handling the
@@ -531,13 +529,14 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 {
 	int srcu_key;
 	unsigned int current_seq = 0;
-	u64 start_idx;
+	u64 start_idx, page_mask;
 	int npages = 0, ret = 0;
 	struct mlx5_ib_mr *mr;
 	u64 access_mask = ODP_READ_ALLOWED_BIT;
 	struct ib_umem_odp *odp;
 	int implicit = 0;
 	size_t size;
+	int page_shift;
 
 	srcu_key = srcu_read_lock(&dev->mr_srcu);
 	mr = mlx5_ib_odp_find_mr_lkey(dev, key);
@@ -583,6 +582,9 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 		odp = mr->umem->odp_data;
 	}
 
+	page_shift = mr->umem->page_shift;
+	page_mask = ~(BIT(page_shift) - 1);
+
 next_mr:
 	current_seq = READ_ONCE(odp->notifiers_seq);
 	/*
@@ -592,7 +594,7 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 	smp_rmb();
 
 	size = min_t(size_t, bcnt, ib_umem_end(odp->umem) - io_virt);
-	start_idx = (io_virt - (mr->mmkey.iova & PAGE_MASK)) >> PAGE_SHIFT;
+	start_idx = (io_virt - (mr->mmkey.iova & page_mask)) >> page_shift;
 
 	if (mr->umem->writable)
 		access_mask |= ODP_WRITE_ALLOWED_BIT;
@@ -614,7 +616,7 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 			 * checks this.
 			 */
 			ret = mlx5_ib_update_xlt(mr, start_idx, np,
-						 PAGE_SHIFT,
+						 page_shift,
 						 MLX5_IB_UPD_XLT_ATOMIC);
 		} else {
 			ret = -EAGAIN;
@@ -625,14 +627,14 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 				mlx5_ib_err(dev, "Failed to update mkey page tables\n");
 			goto srcu_unlock;
 		}
-
 		if (bytes_mapped) {
-			u32 new_mappings = np * PAGE_SIZE -
-				(io_virt - round_down(io_virt, PAGE_SIZE));
+			u32 new_mappings = (np << page_shift) -
+				(io_virt - round_down(io_virt,
+						      1 << page_shift));
 			*bytes_mapped += min_t(u32, new_mappings, size);
 		}
 
-		npages += np;
+		npages += np << (page_shift - PAGE_SHIFT);
 	}
 
 	bcnt -= size;
-- 
2.12.0


* [PATCH rdma-next V1 08/10] IB/umem: Add support to huge ODP
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add the IB_ACCESS_HUGETLB flag for ib_reg_mr().
A hugetlb region registered with this flag uses a single translation
entry per huge page.
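
A hypothetical caller-side sketch (the mmap() flags are standard;
forwarding IB_ACCESS_HUGETLB into the registration path is the part
this series adds):

  /* Back the buffer with huge pages ... */
  void *buf = mmap(NULL, 8UL << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

  /* ... then register it as an ODP MR with the new flag;
   * ib_umem_odp_get() verifies the VMA is hugetlb-backed and takes
   * umem->page_shift from its hstate. */
  int access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_ON_DEMAND |
               IB_ACCESS_HUGETLB;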

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/umem.c     |  2 +-
 drivers/infiniband/core/umem_odp.c | 19 +++++++++++++++++--
 include/rdma/ib_umem_odp.h         |  6 ++++--
 include/rdma/ib_verbs.h            |  1 +
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 6b87c051ffd4..3dbf811d3c51 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -133,7 +133,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	if (access & IB_ACCESS_ON_DEMAND) {
 		put_pid(umem->pid);
-		ret = ib_umem_odp_get(context, umem);
+		ret = ib_umem_odp_get(context, umem, access);
 		if (ret) {
 			kfree(umem);
 			return ERR_PTR(ret);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 73053c8a9e3b..0780b1afefa9 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/vmalloc.h>
+#include <linux/hugetlb.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -306,7 +307,8 @@ struct ib_umem *ib_alloc_odp_umem(struct ib_ucontext *context,
 }
 EXPORT_SYMBOL(ib_alloc_odp_umem);
 
-int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem)
+int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem,
+		    int access)
 {
 	int ret_val;
 	struct pid *our_pid;
@@ -315,6 +317,20 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem)
 	if (!mm)
 		return -EINVAL;
 
+	if (access & IB_ACCESS_HUGETLB) {
+		struct vm_area_struct *vma;
+		struct hstate *h;
+
+		vma = find_vma(mm, ib_umem_start(umem));
+		if (!vma || !is_vm_hugetlb_page(vma))
+			return -EINVAL;
+		h = hstate_vma(vma);
+		umem->page_shift = huge_page_shift(h);
+		umem->hugetlb = 1;
+	} else {
+		umem->hugetlb = 0;
+	}
+
 	/* Prevent creating ODP MRs in child processes */
 	rcu_read_lock();
 	our_pid = get_task_pid(current->group_leader, PIDTYPE_PID);
@@ -325,7 +341,6 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem)
 		goto out_mm;
 	}
 
-	umem->hugetlb = 0;
 	umem->odp_data = kzalloc(sizeof(*umem->odp_data), GFP_KERNEL);
 	if (!umem->odp_data) {
 		ret_val = -ENOMEM;
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 542cd8b3414c..fb67554aabd6 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -84,7 +84,8 @@ struct ib_umem_odp {
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 
-int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem);
+int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem,
+		    int access);
 struct ib_umem *ib_alloc_odp_umem(struct ib_ucontext *context,
 				  unsigned long addr,
 				  size_t size);
@@ -154,7 +155,8 @@ static inline int ib_umem_mmu_notifier_retry(struct ib_umem *item,
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 static inline int ib_umem_odp_get(struct ib_ucontext *context,
-				  struct ib_umem *umem)
+				  struct ib_umem *umem,
+				  int access)
 {
 	return -EINVAL;
 }
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0f1813c13687..b60288beb3fe 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1336,6 +1336,7 @@ enum ib_access_flags {
 	IB_ACCESS_MW_BIND	= (1<<4),
 	IB_ZERO_BASED		= (1<<5),
 	IB_ACCESS_ON_DEMAND     = (1<<6),
+	IB_ACCESS_HUGETLB	= (1<<7),
 };
 
 /*
-- 
2.12.0


* [PATCH rdma-next V1 09/10] IB/mlx5: Extract page fault code
From: Leon Romanovsky @ 2017-04-05  6:23 UTC
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

To make the page fault handling code more flexible, split the
pagefault_single_data_segment() function: keep MR resolution in
pagefault_single_data_segment() and move the actual fault handling
into pagefault_mr().
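
In outline, the resulting structure is as follows (a self-contained
sketch whose names mirror the patch; the bodies are placeholders, and
mlx5 details such as SRCU locking and the implicit-MR leaf walk are
elided):

#include <stdio.h>
#include <stddef.h>

struct mr { int dummy; };

/* stands in for mlx5_ib_odp_find_mr_lkey(): mkey -> MR resolution */
static struct mr *find_mr(unsigned int key)
{
	static struct mr the_mr;

	return key ? &the_mr : NULL;
}

/* stands in for pagefault_mr(): the actual fault handling; returns
 * the number of pages "mapped" or a negative error
 */
static int pagefault_mr(struct mr *mr, unsigned long io_virt, size_t bcnt,
			unsigned int *bytes_mapped)
{
	(void)mr;
	(void)io_virt;
	*bytes_mapped += bcnt;
	return (int)(bcnt >> 12);
}

/* MR resolution stays here; the fault work is delegated */
static int pagefault_single_data_segment(unsigned int key,
					 unsigned long io_virt, size_t bcnt,
					 unsigned int *bytes_committed,
					 unsigned int *bytes_mapped)
{
	struct mr *mr = find_mr(key);

	if (!mr)
		return -14;	/* -EFAULT */

	io_virt += *bytes_committed;	/* skip already-committed bytes */
	bcnt -= *bytes_committed;
	*bytes_committed = 0;

	return pagefault_mr(mr, io_virt, bcnt, bytes_mapped);
}

int main(void)
{
	unsigned int committed = 0, mapped = 0;

	printf("%d pages\n",
	       pagefault_single_data_segment(1, 0x1000, 8192,
					     &committed, &mapped));
	return 0;
}

One benefit of the split is that pagefault_mr() can be called directly
once an MR is already in hand, which the next patch uses when resolving
memory windows.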

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/odp.c | 203 ++++++++++++++++++++-------------------
 1 file changed, 104 insertions(+), 99 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index eddabd6e6596..842e1dbb50b8 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -511,81 +511,38 @@ void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *imr)
 	wait_event(imr->q_leaf_free, !atomic_read(&imr->num_leaf_free));
 }
 
-/*
- * Handle a single data segment in a page-fault WQE or RDMA region.
- *
- * Returns number of OS pages retrieved on success. The caller may continue to
- * the next data segment.
- * Can return the following error codes:
- * -EAGAIN to designate a temporary error. The caller will abort handling the
- *  page fault and resolve it.
- * -EFAULT when there's an error mapping the requested pages. The caller will
- *  abort the page fault handling.
- */
-static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
-					 u32 key, u64 io_virt, size_t bcnt,
-					 u32 *bytes_committed,
-					 u32 *bytes_mapped)
+static int pagefault_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
+			u64 io_virt, size_t bcnt, u32 *bytes_mapped)
 {
-	int srcu_key;
-	unsigned int current_seq = 0;
-	u64 start_idx, page_mask;
-	int npages = 0, ret = 0;
-	struct mlx5_ib_mr *mr;
 	u64 access_mask = ODP_READ_ALLOWED_BIT;
+	int npages = 0, page_shift, np;
+	u64 start_idx, page_mask;
 	struct ib_umem_odp *odp;
-	int implicit = 0;
+	int current_seq;
 	size_t size;
-	int page_shift;
-
-	srcu_key = srcu_read_lock(&dev->mr_srcu);
-	mr = mlx5_ib_odp_find_mr_lkey(dev, key);
-	/*
-	 * If we didn't find the MR, it means the MR was closed while we were
-	 * handling the ODP event. In this case we return -EFAULT so that the
-	 * QP will be closed.
-	 */
-	if (!mr || !mr->ibmr.pd) {
-		mlx5_ib_dbg(dev, "Failed to find relevant mr for lkey=0x%06x, probably the MR was destroyed\n",
-			    key);
-		ret = -EFAULT;
-		goto srcu_unlock;
-	}
-	if (!mr->umem->odp_data) {
-		mlx5_ib_dbg(dev, "skipping non ODP MR (lkey=0x%06x) in page fault handler.\n",
-			    key);
-		if (bytes_mapped)
-			*bytes_mapped +=
-				(bcnt - *bytes_committed);
-		goto srcu_unlock;
-	}
-
-	/*
-	 * Avoid branches - this code will perform correctly
-	 * in all iterations (in iteration 2 and above,
-	 * bytes_committed == 0).
-	 */
-	io_virt += *bytes_committed;
-	bcnt -= *bytes_committed;
+	int ret;
 
 	if (!mr->umem->odp_data->page_list) {
 		odp = implicit_mr_get_data(mr, io_virt, bcnt);
 
-		if (IS_ERR(odp)) {
-			ret = PTR_ERR(odp);
-			goto srcu_unlock;
-		}
+		if (IS_ERR(odp))
+			return PTR_ERR(odp);
 		mr = odp->private;
-		implicit = 1;
 
 	} else {
 		odp = mr->umem->odp_data;
 	}
 
+next_mr:
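+	/* fault at most up to the end of the current umem (the current leaf, for implicit MRs) */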
+	size = min_t(size_t, bcnt, ib_umem_end(odp->umem) - io_virt);
+
 	page_shift = mr->umem->page_shift;
 	page_mask = ~(BIT(page_shift) - 1);
+	start_idx = (io_virt - (mr->mmkey.iova & page_mask)) >> page_shift;
+
+	if (mr->umem->writable)
+		access_mask |= ODP_WRITE_ALLOWED_BIT;
 
-next_mr:
 	current_seq = READ_ONCE(odp->notifiers_seq);
 	/*
 	 * Ensure the sequence number is valid for some time before we call
@@ -593,51 +550,43 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 	 */
 	smp_rmb();
 
-	size = min_t(size_t, bcnt, ib_umem_end(odp->umem) - io_virt);
-	start_idx = (io_virt - (mr->mmkey.iova & page_mask)) >> page_shift;
-
-	if (mr->umem->writable)
-		access_mask |= ODP_WRITE_ALLOWED_BIT;
-
 	ret = ib_umem_odp_map_dma_pages(mr->umem, io_virt, size,
 					access_mask, current_seq);
 
 	if (ret < 0)
-		goto srcu_unlock;
+		goto out;
 
-	if (ret > 0) {
-		int np = ret;
-
-		mutex_lock(&odp->umem_mutex);
-		if (!ib_umem_mmu_notifier_retry(mr->umem, current_seq)) {
-			/*
-			 * No need to check whether the MTTs really belong to
-			 * this MR, since ib_umem_odp_map_dma_pages already
-			 * checks this.
-			 */
-			ret = mlx5_ib_update_xlt(mr, start_idx, np,
-						 page_shift,
-						 MLX5_IB_UPD_XLT_ATOMIC);
-		} else {
-			ret = -EAGAIN;
-		}
-		mutex_unlock(&odp->umem_mutex);
-		if (ret < 0) {
-			if (ret != -EAGAIN)
-				mlx5_ib_err(dev, "Failed to update mkey page tables\n");
-			goto srcu_unlock;
-		}
-		if (bytes_mapped) {
-			u32 new_mappings = (np << page_shift) -
-				(io_virt - round_down(io_virt,
-						      1 << page_shift));
-			*bytes_mapped += min_t(u32, new_mappings, size);
-		}
+	np = ret;
 
-		npages += np << (page_shift - PAGE_SHIFT);
+	mutex_lock(&odp->umem_mutex);
+	if (!ib_umem_mmu_notifier_retry(mr->umem, current_seq)) {
+		/*
+		 * No need to check whether the MTTs really belong to
+		 * this MR, since ib_umem_odp_map_dma_pages already
+		 * checks this.
+		 */
+		ret = mlx5_ib_update_xlt(mr, start_idx, np,
+					 page_shift, MLX5_IB_UPD_XLT_ATOMIC);
+	} else {
+		ret = -EAGAIN;
 	}
+	mutex_unlock(&odp->umem_mutex);
 
+	if (ret < 0) {
+		if (ret != -EAGAIN)
+			mlx5_ib_err(dev, "Failed to update mkey page tables\n");
+		goto out;
+	}
+
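+	/* report only the bytes newly mapped from io_virt onward, capped at size */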
+	if (bytes_mapped) {
+		u32 new_mappings = (np << page_shift) -
+			(io_virt - round_down(io_virt, 1 << page_shift));
+		*bytes_mapped += min_t(u32, new_mappings, size);
+	}
+
+	npages += np << (page_shift - PAGE_SHIFT);
 	bcnt -= size;
+
 	if (unlikely(bcnt)) {
 		struct ib_umem_odp *next;
 
@@ -646,17 +595,18 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 		if (unlikely(!next || next->umem->address != io_virt)) {
 			mlx5_ib_dbg(dev, "next implicit leaf removed at 0x%llx. got %p\n",
 				    io_virt, next);
-			ret = -EAGAIN;
-			goto srcu_unlock_no_wait;
+			return -EAGAIN;
 		}
 		odp = next;
 		mr = odp->private;
 		goto next_mr;
 	}
 
-srcu_unlock:
+	return npages;
+
+out:
 	if (ret == -EAGAIN) {
-		if (implicit || !odp->dying) {
+		if (mr->parent || !odp->dying) {
 			unsigned long timeout =
 				msecs_to_jiffies(MMU_NOTIFIER_TIMEOUT);
 
@@ -672,7 +622,62 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 		}
 	}
 
-srcu_unlock_no_wait:
+	return ret;
+}
+
+/*
+ * Handle a single data segment in a page-fault WQE or RDMA region.
+ *
+ * Returns number of OS pages retrieved on success. The caller may continue to
+ * the next data segment.
+ * Can return the following error codes:
+ * -EAGAIN to designate a temporary error. The caller will abort handling the
+ *  page fault and resolve it.
+ * -EFAULT when there's an error mapping the requested pages. The caller will
+ *  abort the page fault handling.
+ */
+static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
+					 u32 key, u64 io_virt, size_t bcnt,
+					 u32 *bytes_committed,
+					 u32 *bytes_mapped)
+{
+	int npages = 0, srcu_key;
+	int ret = 0;
+	struct mlx5_ib_mr *mr;
+
+	srcu_key = srcu_read_lock(&dev->mr_srcu);
+	mr = mlx5_ib_odp_find_mr_lkey(dev, key);
+	/*
+	 * If we didn't find the MR, it means the MR was closed while we were
+	 * handling the ODP event. In this case we return -EFAULT so that the
+	 * QP will be closed.
+	 */
+	if (!mr || !mr->ibmr.pd) {
+		mlx5_ib_dbg(dev, "Failed to find relevant mr for lkey=0x%06x, probably the MR was destroyed\n",
+			    key);
+		ret = -EFAULT;
+		goto srcu_unlock;
+	}
+	if (!mr->umem->odp_data) {
+		mlx5_ib_dbg(dev, "skipping non ODP MR (lkey=0x%06x) in page fault handler.\n",
+			    key);
+		if (bytes_mapped)
+			*bytes_mapped +=
+				(bcnt - *bytes_committed);
+		goto srcu_unlock;
+	}
+
+	/*
+	 * Avoid branches - this code will perform correctly
+	 * in all iterations (in iteration 2 and above,
+	 * bytes_committed == 0).
+	 */
+	io_virt += *bytes_committed;
+	bcnt -= *bytes_committed;
+
+	npages = pagefault_mr(dev, mr, io_virt, bcnt, bytes_mapped);
+
+srcu_unlock:
 	srcu_read_unlock(&dev->mr_srcu, srcu_key);
 	*bytes_committed = 0;
 	return ret ? ret : npages;
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH rdma-next V1 10/10] IB/mlx5: Add ODP support to MW
       [not found] ` <20170405062359.26623-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-04-05  6:23   ` [PATCH rdma-next V1 09/10] IB/mlx5: Extract page fault code Leon Romanovsky
@ 2017-04-05  6:23   ` Leon Romanovsky
  2017-04-25 19:41   ` [PATCH rdma-next V1 00/10] ODP Fixes and Improvements Doug Ledford
  10 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2017-04-05  6:23 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Internally, an MW is implemented as a KLM MKey and is filled in by
user-space UMR post-sends. Handle page faults triggered by operations
on these MKeys.
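
Because a bound MW is itself an indirect mkey, the fault handler now
walks KLM entries down to the underlying MRs. The patch does this
iteratively, with an explicit stack of pf_frame structures instead of
recursion; reduced to a self-contained toy in C (the key scheme below
is invented purely for illustration):

#include <stdio.h>
#include <stdlib.h>

struct pf_frame {
	struct pf_frame *next;
	int key;
	int depth;
};

/* invented indirection: single-digit keys are "MRs", larger keys
 * "contain" their decimal digits the way an MW's KLM list points
 * at other mkeys
 */
static int is_leaf(int key)
{
	return key < 10;
}

static int walk(int key)
{
	struct pf_frame *head = NULL, *frame;
	int ret = 0, depth = 0;

next_key:
	if (is_leaf(key)) {
		printf("fault on leaf key %d at depth %d\n", key, depth);
	} else {
		int rest;

		/* push one frame per "KLM entry" of the indirect key */
		for (rest = key; rest; rest /= 10) {
			frame = calloc(1, sizeof(*frame));
			if (!frame) {
				ret = -1;
				goto out;
			}
			frame->key = rest % 10;
			frame->depth = depth + 1;
			frame->next = head;
			head = frame;
		}
	}

	/* pop the next pending segment, if any, and continue */
	if (head) {
		frame = head;
		head = frame->next;
		key = frame->key;
		depth = frame->depth;
		free(frame);
		goto next_key;
	}

out:
	while (head) {	/* drain the stack on the error path */
		frame = head;
		head = frame->next;
		free(frame);
	}
	return ret;
}

int main(void)
{
	return walk(123) < 0;
}

As in the patch, the stack is popped at a next_* label, the depth is
tracked per frame (bounded there by the device's max_indirection
capability), and anything left on the stack is freed on the error path.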

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   1 +
 drivers/infiniband/hw/mlx5/mr.c      |   1 +
 drivers/infiniband/hw/mlx5/odp.c     | 161 +++++++++++++++++++++++++----------
 3 files changed, 120 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 3cd064b5f0bf..9f519404ad7a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -513,6 +513,7 @@ struct mlx5_ib_mr {
 struct mlx5_ib_mw {
 	struct ib_mw		ibmw;
 	struct mlx5_core_mkey	mmkey;
+	int			ndescs;
 };
 
 struct mlx5_ib_umr_context {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 9a74260e9899..93c0e82aa491 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1688,6 +1688,7 @@ struct ib_mw *mlx5_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 
 	mw->mmkey.type = MLX5_MKEY_MW;
 	mw->ibmw.rkey = mw->mmkey.key;
+	mw->ndescs = ndescs;
 
 	resp.response_length = min(offsetof(typeof(resp), response_length) +
 				   sizeof(resp.response_length), udata->outlen);
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 842e1dbb50b8..ae0746754008 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -288,24 +288,6 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 	return;
 }
 
-static struct mlx5_ib_mr *mlx5_ib_odp_find_mr_lkey(struct mlx5_ib_dev *dev,
-						   u32 key)
-{
-	u32 base_key = mlx5_base_mkey(key);
-	struct mlx5_core_mkey *mmkey = __mlx5_mr_lookup(dev->mdev, base_key);
-	struct mlx5_ib_mr *mr;
-
-	if (!mmkey || mmkey->key != key || mmkey->type != MLX5_MKEY_MR)
-		return NULL;
-
-	mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
-
-	if (!mr->live)
-		return NULL;
-
-	return container_of(mmkey, struct mlx5_ib_mr, mmkey);
-}
-
 static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
 				      struct mlx5_pagefault *pfault,
 				      int error)
@@ -625,6 +607,14 @@ static int pagefault_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
 	return ret;
 }
 
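+/*
+ * Pending data segment of an indirect (KLM) mkey walk; kept on an
+ * explicit stack to avoid recursion in the page fault handler.
+ */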
+struct pf_frame {
+	struct pf_frame *next;
+	u32 key;
+	u64 io_virt;
+	size_t bcnt;
+	int depth;
+};
+
 /*
  * Handle a single data segment in a page-fault WQE or RDMA region.
  *
@@ -641,43 +631,128 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 					 u32 *bytes_committed,
 					 u32 *bytes_mapped)
 {
-	int npages = 0, srcu_key;
-	int ret = 0;
+	int npages = 0, srcu_key, ret, i, outlen, cur_outlen = 0, depth = 0;
+	struct pf_frame *head = NULL, *frame;
+	struct mlx5_core_mkey *mmkey;
+	struct mlx5_ib_mw *mw;
 	struct mlx5_ib_mr *mr;
+	struct mlx5_klm *pklm;
+	u32 *out = NULL;
+	size_t offset;
 
 	srcu_key = srcu_read_lock(&dev->mr_srcu);
-	mr = mlx5_ib_odp_find_mr_lkey(dev, key);
-	/*
-	 * If we didn't find the MR, it means the MR was closed while we were
-	 * handling the ODP event. In this case we return -EFAULT so that the
-	 * QP will be closed.
-	 */
-	if (!mr || !mr->ibmr.pd) {
-		mlx5_ib_dbg(dev, "Failed to find relevant mr for lkey=0x%06x, probably the MR was destroyed\n",
-			    key);
+
+	io_virt += *bytes_committed;
+	bcnt -= *bytes_committed;
+
+next_mr:
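+	/* resolve the current key; MW (KLM) keys push child frames below */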
+	mmkey = __mlx5_mr_lookup(dev->mdev, mlx5_base_mkey(key));
+	if (!mmkey || mmkey->key != key) {
+		mlx5_ib_dbg(dev, "failed to find mkey %x\n", key);
 		ret = -EFAULT;
 		goto srcu_unlock;
 	}
-	if (!mr->umem->odp_data) {
-		mlx5_ib_dbg(dev, "skipping non ODP MR (lkey=0x%06x) in page fault handler.\n",
-			    key);
-		if (bytes_mapped)
-			*bytes_mapped +=
-				(bcnt - *bytes_committed);
+
+	switch (mmkey->type) {
+	case MLX5_MKEY_MR:
+		mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
+		if (!mr->live || !mr->ibmr.pd) {
+			mlx5_ib_dbg(dev, "got dead MR\n");
+			ret = -EFAULT;
+			goto srcu_unlock;
+		}
+
+		ret = pagefault_mr(dev, mr, io_virt, bcnt, bytes_mapped);
+		if (ret < 0)
+			goto srcu_unlock;
+
+		npages += ret;
+		ret = 0;
+		break;
+
+	case MLX5_MKEY_MW:
+		mw = container_of(mmkey, struct mlx5_ib_mw, mmkey);
+
+		if (depth >= MLX5_CAP_GEN(dev->mdev, max_indirection)) {
+			mlx5_ib_dbg(dev, "indirection level exceeded\n");
+			ret = -EFAULT;
+			goto srcu_unlock;
+		}
+
+		outlen = MLX5_ST_SZ_BYTES(query_mkey_out) +
+			sizeof(*pklm) * (mw->ndescs - 2);
+
+		if (outlen > cur_outlen) {
+			kfree(out);
+			out = kzalloc(outlen, GFP_KERNEL);
+			if (!out) {
+				ret = -ENOMEM;
+				goto srcu_unlock;
+			}
+			cur_outlen = outlen;
+		}
+
+		pklm = (struct mlx5_klm *)MLX5_ADDR_OF(query_mkey_out, out,
+						       bsf0_klm0_pas_mtt0_1);
+
+		ret = mlx5_core_query_mkey(dev->mdev, &mw->mmkey, out, outlen);
+		if (ret)
+			goto srcu_unlock;
+
+		offset = io_virt - MLX5_GET64(query_mkey_out, out,
+					      memory_key_mkey_entry.start_addr);
+
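+		/* walk the KLM list, skipping entries entirely before io_virt */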
+		for (i = 0; bcnt && i < mw->ndescs; i++, pklm++) {
+			if (offset >= be32_to_cpu(pklm->bcount)) {
+				offset -= be32_to_cpu(pklm->bcount);
+				continue;
+			}
+
+			frame = kzalloc(sizeof(*frame), GFP_KERNEL);
+			if (!frame) {
+				ret = -ENOMEM;
+				goto srcu_unlock;
+			}
+
+			frame->key = be32_to_cpu(pklm->key);
+			frame->io_virt = be64_to_cpu(pklm->va) + offset;
+			frame->bcnt = min_t(size_t, bcnt,
+					    be32_to_cpu(pklm->bcount) - offset);
+			frame->depth = depth + 1;
+			frame->next = head;
+			head = frame;
+
+			bcnt -= frame->bcnt;
+		}
+		break;
+
+	default:
+		mlx5_ib_dbg(dev, "wrong mkey type %d\n", mmkey->type);
+		ret = -EFAULT;
 		goto srcu_unlock;
 	}
 
-	/*
-	 * Avoid branches - this code will perform correctly
-	 * in all iterations (in iteration 2 and above,
-	 * bytes_committed == 0).
-	 */
-	io_virt += *bytes_committed;
-	bcnt -= *bytes_committed;
+	if (head) {
+		frame = head;
+		head = frame->next;
 
-	npages = pagefault_mr(dev, mr, io_virt, bcnt, bytes_mapped);
+		key = frame->key;
+		io_virt = frame->io_virt;
+		bcnt = frame->bcnt;
+		depth = frame->depth;
+		kfree(frame);
+
+		goto next_mr;
+	}
 
 srcu_unlock:
+	while (head) {
+		frame = head;
+		head = frame->next;
+		kfree(frame);
+	}
+	kfree(out);
+
 	srcu_read_unlock(&dev->mr_srcu, srcu_key);
 	*bytes_committed = 0;
 	return ret ? ret : npages;
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH rdma-next V1 01/10] IB: Replace ib_umem page_size by page_shift
       [not found]     ` <20170405062359.26623-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-04-05  9:48       ` Amrani, Ram
  2017-04-05 16:38       ` Saleem, Shiraz
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Amrani, Ram @ 2017-04-05  9:48 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov,
	Selvin Xavier, Steve Wise, Lijun Ou, Shiraz Saleem,
	Adit Ranadive, Dennis Dalessandro

> From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The page size is held by struct ib_umem in the page_size field.
>
> It is better to store it as an exponent, because a page size is by
> nature a power of two and is used as a factor, divisor or ilog2 argument.
>
> Converting page_size to page_shift makes the code portable and avoids
> the following error when compiling on ARM:
> 
>   ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
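
For illustration (hypothetical values, not from the patch), the
exponent form turns the multiplications and divisions into shifts:

#include <stdio.h>

int main(void)
{
	unsigned long base = 0x100000;
	unsigned int page_shift = 12;		/* 4 KiB pages */

	/* was: base + page_size * 3 */
	unsigned long addr = base + (3UL << page_shift);
	/* was: len / page_size -- the division that pulls in
	 * __aeabi_uldivmod on 32-bit ARM
	 */
	unsigned long npages = 0x4000UL >> page_shift;

	printf("addr=%#lx npages=%lu\n", addr, npages);
	return 0;
}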
> 
> CC: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> CC: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Ram Amrani <Ram.Amrani-74tsMCuadCbQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> Change from v0:
>  * Remove temp variable (pg_shift) variable from i40iw driver.
> ---

...

> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index 6b3bb32803bd..e741cc662606 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -680,16 +680,16 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
> 
>  	pbe_cnt = 0;
> 
> -	shift = ilog2(umem->page_size);
> +	shift = umem->page_shift;
> 
>  	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
>  		pages = sg_dma_len(sg) >> shift;
>  		for (pg_cnt = 0; pg_cnt < pages; pg_cnt++) {
>  			/* store the page address in pbe */
>  			pbe->lo = cpu_to_le32(sg_dma_address(sg) +
> -					      umem->page_size * pg_cnt);
> +					      (pg_cnt << shift));
>  			addr = upper_32_bits(sg_dma_address(sg) +
> -					     umem->page_size * pg_cnt);
> +					     (pg_cnt << shift));
>  			pbe->hi = cpu_to_le32(addr);
>  			pbe_cnt++;
>  			total_num_pbes++;
> @@ -2189,7 +2189,7 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
>  	mr->hw_mr.pbl_ptr = mr->info.pbl_table[0].pa;
>  	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
>  	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
> -	mr->hw_mr.page_size_log = ilog2(mr->umem->page_size);
> +	mr->hw_mr.page_size_log = mr->umem->page_shift;
>  	mr->hw_mr.fbo = ib_umem_offset(mr->umem);
>  	mr->hw_mr.length = len;
>  	mr->hw_mr.vaddr = usr_addr;

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH rdma-next V1 01/10] IB: Replace ib_umem page_size by page_shift
       [not found]     ` <20170405062359.26623-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-04-05  9:48       ` Amrani, Ram
@ 2017-04-05 16:38       ` Saleem, Shiraz
  2017-04-05 17:18       ` Selvin Xavier
  2017-04-05 17:30       ` Adit Ranadive
  3 siblings, 0 replies; 18+ messages in thread
From: Saleem, Shiraz @ 2017-04-05 16:38 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov,
	Selvin Xavier, Steve Wise, Lijun Ou, Adit Ranadive, Dalessandro,
	Dennis, Ram Amrani

> From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The page size is held by struct ib_umem in the page_size field.
>
> It is better to store it as an exponent, because a page size is by nature a
> power of two and is used as a factor, divisor or ilog2 argument.
>
> Converting page_size to page_shift makes the code portable and avoids the
> following error when compiling on ARM:
> 
>   ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
> 
> CC: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> CC: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Ram Amrani <Ram.Amrani-74tsMCuadCbQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> Change from v0:
>  * Remove temp variable (pg_shift) variable from i40iw driver.
> ---
> diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> index 9b2849979756..378c75759be4 100644
> --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> @@ -1345,7 +1345,7 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr
> *iwmr,  {
>  	struct ib_umem *region = iwmr->region;
>  	struct i40iw_pbl *iwpbl = &iwmr->iwpbl;
> -	int chunk_pages, entry, pg_shift, i;
> +	int chunk_pages, entry, i;
>  	struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc;
>  	struct i40iw_pble_info *pinfo;
>  	struct scatterlist *sg;
> @@ -1354,14 +1354,14 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr
> *iwmr,
> 
>  	pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf;
> 
> -	pg_shift = ffs(region->page_size) - 1;
>  	for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) {
> -		chunk_pages = sg_dma_len(sg) >> pg_shift;
> +		chunk_pages = sg_dma_len(sg) >> region->page_shift;
>  		if ((iwmr->type == IW_MEMREG_TYPE_QP) &&
>  		    !iwpbl->qp_mr.sq_page)
>  			iwpbl->qp_mr.sq_page = sg_page(sg);
>  		for (i = 0; i < chunk_pages; i++) {
> -			pg_addr = sg_dma_address(sg) + region->page_size * i;
> +			pg_addr = sg_dma_address(sg) +
> +				(i << region->page_shift);
> 
>  			if ((entry + i) == 0)
>  				*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
> @@ -1847,7 +1847,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
>  	iwmr->ibmr.device = pd->device;
>  	ucontext = to_ucontext(pd->uobject->context);
> 
> -	iwmr->page_size = region->page_size;
> +	iwmr->page_size = PAGE_SIZE;
>  	iwmr->page_msk = PAGE_MASK;
> 

Thank you!

Acked-by: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH rdma-next V1 08/10] IB/umem: Add support to huge ODP
       [not found]     ` <20170405062359.26623-9-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-04-05 16:45       ` Shiraz Saleem
       [not found]         ` <20170405164539.GA9232-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Shiraz Saleem @ 2017-04-05 16:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

On Wed, Apr 05, 2017 at 09:23:57AM +0300, Leon Romanovsky wrote:
> From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add an IB_ACCESS_HUGETLB ib_reg_mr flag. A hugetlb region registered
> with this flag will use a single translation entry per huge page.
> 
> Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/core/umem.c     |  2 +-
>  drivers/infiniband/core/umem_odp.c | 19 +++++++++++++++++--
>  include/rdma/ib_umem_odp.h         |  6 ++++--
>  include/rdma/ib_verbs.h            |  1 +
>  4 files changed, 23 insertions(+), 5 deletions(-)
> 
> @@ -315,6 +317,20 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem)
>  	if (!mm)
>  		return -EINVAL;
>  
> +	if (access & IB_ACCESS_HUGETLB) {
> +		struct vm_area_struct *vma;
> +		struct hstate *h;
> +
> +		vma = find_vma(mm, ib_umem_start(umem));
> +		if (!vma || !is_vm_hugetlb_page(vma))
> +			return -EINVAL;
> +		h = hstate_vma(vma);
> +		umem->page_shift = huge_page_shift(h);
> +		umem->hugetlb = 1;

A user memory buffer could span multiple VMAs, right? So shouldn't we
check all the VMAs of the umem before setting the hugetlb flag?

> +	} else {
> +		umem->hugetlb = 0;
> +	}
> +
>  	/* Prevent creating ODP MRs in child processes */
>  	rcu_read_lock();
>  	our_pid = get_task_pid(current->group_leader, PIDTYPE_PID);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH rdma-next V1 01/10] IB: Replace ib_umem page_size by page_shift
       [not found]     ` <20170405062359.26623-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-04-05  9:48       ` Amrani, Ram
  2017-04-05 16:38       ` Saleem, Shiraz
@ 2017-04-05 17:18       ` Selvin Xavier
  2017-04-05 17:30       ` Adit Ranadive
  3 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2017-04-05 17:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov,
	Steve Wise, Lijun Ou, Shiraz Saleem, Adit Ranadive,
	Dennis Dalessandro, Ram Amrani

Acked-by: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
On Wed, Apr 5, 2017 at 11:53 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> The page size is held by struct ib_umem in the page_size field.
>
> It is better to store it as an exponent, because a page size is by
> nature a power of two and is used as a factor, divisor or ilog2 argument.
>
> Converting page_size to page_shift makes the code portable and avoids
> the following error when compiling on ARM:
>
>   ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
>
> CC: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> CC: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Ram Amrani <Ram.Amrani-74tsMCuadCbQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> Change from v0:
>  * Remove temp variable (pg_shift) variable from i40iw driver.
> ---
>  drivers/infiniband/core/umem.c                 | 15 ++++++---------
>  drivers/infiniband/core/umem_odp.c             | 12 ++++++------
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c       | 12 ++++++------
>  drivers/infiniband/hw/cxgb3/iwch_provider.c    |  4 ++--
>  drivers/infiniband/hw/cxgb4/mem.c              |  4 ++--
>  drivers/infiniband/hw/hns/hns_roce_cq.c        |  3 +--
>  drivers/infiniband/hw/hns/hns_roce_mr.c        |  9 +++++----
>  drivers/infiniband/hw/hns/hns_roce_qp.c        |  3 +--
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c      | 10 +++++-----
>  drivers/infiniband/hw/mlx4/cq.c                |  2 +-
>  drivers/infiniband/hw/mlx4/mr.c                |  6 +++---
>  drivers/infiniband/hw/mlx4/qp.c                |  2 +-
>  drivers/infiniband/hw/mlx4/srq.c               |  2 +-
>  drivers/infiniband/hw/mlx5/mem.c               |  4 ++--
>  drivers/infiniband/hw/mlx5/odp.c               |  2 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c   |  5 ++---
>  drivers/infiniband/hw/nes/nes_verbs.c          |  4 ++--
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    | 15 ++++++---------
>  drivers/infiniband/hw/qedr/verbs.c             |  8 ++++----
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |  2 +-
>  drivers/infiniband/sw/rdmavt/mr.c              |  8 ++++----
>  drivers/infiniband/sw/rxe/rxe_mr.c             |  8 +++-----
>  include/rdma/ib_umem.h                         |  4 ++--
>  23 files changed, 67 insertions(+), 77 deletions(-)

For the changes in both ocrdma and bnxt_re modules -

Acked-by: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH rdma-next V1 01/10] IB: Replace ib_umem page_size by page_shift
       [not found]     ` <20170405062359.26623-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                         ` (2 preceding siblings ...)
  2017-04-05 17:18       ` Selvin Xavier
@ 2017-04-05 17:30       ` Adit Ranadive
  3 siblings, 0 replies; 18+ messages in thread
From: Adit Ranadive @ 2017-04-05 17:30 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov,
	Selvin Xavier, Steve Wise, Lijun Ou, Shiraz Saleem,
	Dennis Dalessandro, Ram Amrani

On Tue Apr 04 2017 23:23:50 GMT-0700 (PDT), Leon Romanovsky wrote:
> From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The page size is held by struct ib_umem in the page_size field.
>
> It is better to store it as an exponent, because a page size is by
> nature a power of two and is used as a factor, divisor or ilog2 argument.
>
> Converting page_size to page_shift makes the code portable and avoids
> the following error when compiling on ARM:
> 
>   ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
> 
> CC: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> CC: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Ram Amrani <Ram.Amrani-74tsMCuadCbQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> Change from v0:
>  * Remove temp variable (pg_shift) variable from i40iw driver.
> ---
>  drivers/infiniband/core/umem.c                 | 15 ++++++---------
>  drivers/infiniband/core/umem_odp.c             | 12 ++++++------
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c       | 12 ++++++------
>  drivers/infiniband/hw/cxgb3/iwch_provider.c    |  4 ++--
>  drivers/infiniband/hw/cxgb4/mem.c              |  4 ++--
>  drivers/infiniband/hw/hns/hns_roce_cq.c        |  3 +--
>  drivers/infiniband/hw/hns/hns_roce_mr.c        |  9 +++++----
>  drivers/infiniband/hw/hns/hns_roce_qp.c        |  3 +--
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c      | 10 +++++-----
>  drivers/infiniband/hw/mlx4/cq.c                |  2 +-
>  drivers/infiniband/hw/mlx4/mr.c                |  6 +++---
>  drivers/infiniband/hw/mlx4/qp.c                |  2 +-
>  drivers/infiniband/hw/mlx4/srq.c               |  2 +-
>  drivers/infiniband/hw/mlx5/mem.c               |  4 ++--
>  drivers/infiniband/hw/mlx5/odp.c               |  2 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c   |  5 ++---
>  drivers/infiniband/hw/nes/nes_verbs.c          |  4 ++--
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    | 15 ++++++---------
>  drivers/infiniband/hw/qedr/verbs.c             |  8 ++++----
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |  2 +-
>  drivers/infiniband/sw/rdmavt/mr.c              |  8 ++++----
>  drivers/infiniband/sw/rxe/rxe_mr.c             |  8 +++-----
>  include/rdma/ib_umem.h                         |  4 ++--
>  23 files changed, 67 insertions(+), 77 deletions(-)

...

> 
> diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
> index 948b5ccd2a70..6ef4df6c8c4a 100644
> --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
> +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
> @@ -194,7 +194,7 @@ int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir,
>  		len = sg_dma_len(sg) >> PAGE_SHIFT;
>  		for (j = 0; j < len; j++) {
>  			dma_addr_t addr = sg_dma_address(sg) +
> -					  umem->page_size * j;
> +					  (j << umem->page_shift);
> 
>  			ret = pvrdma_page_dir_insert_dma(pdir, i, addr);
>  			if (ret)

Acked-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH rdma-next V1 08/10] IB/umem: Add support to huge ODP
       [not found]         ` <20170405164539.GA9232-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org>
@ 2017-04-05 17:33           ` Leon Romanovsky
  0 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2017-04-05 17:33 UTC (permalink / raw)
  To: Shiraz Saleem
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

On Wed, Apr 05, 2017 at 11:45:39AM -0500, Shiraz Saleem wrote:
> On Wed, Apr 05, 2017 at 09:23:57AM +0300, Leon Romanovsky wrote:
> > From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > Add an IB_ACCESS_HUGETLB ib_reg_mr flag. A hugetlb region registered
> > with this flag will use a single translation entry per huge page.
> >
> > Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> >  drivers/infiniband/core/umem.c     |  2 +-
> >  drivers/infiniband/core/umem_odp.c | 19 +++++++++++++++++--
> >  include/rdma/ib_umem_odp.h         |  6 ++++--
> >  include/rdma/ib_verbs.h            |  1 +
> >  4 files changed, 23 insertions(+), 5 deletions(-)
> >
> > @@ -315,6 +317,20 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem)
> >  	if (!mm)
> >  		return -EINVAL;
> >
> > +	if (access & IB_ACCESS_HUGETLB) {
> > +		struct vm_area_struct *vma;
> > +		struct hstate *h;
> > +
> > +		vma = find_vma(mm, ib_umem_start(umem));
> > +		if (!vma || !is_vm_hugetlb_page(vma))
> > +			return -EINVAL;
> > +		h = hstate_vma(vma);
> > +		umem->page_shift = huge_page_shift(h);
> > +		umem->hugetlb = 1;
>
> A user memory buffer could span multiple VMAs, right? So shouldn't we
> check all the VMAs of the umem before setting the hugetlb flag?

It depends on what you want to achieve. The current implementation uses
a best-effort strategy and assumes that the user mapped the region with
MAP_HUGETLB beforehand. In that case, all of the VMAs have the hugetlb
property. If only part of the umem region is hugetlb-backed, the
application and ODP will continue to work, but memory usage won't be as
efficient.
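
For reference, the intended usage from user space looks roughly like
the sketch below. It assumes an ODP-capable device and that libibverbs
mirrors the new kernel flag as IBV_ACCESS_HUGETLB; error handling is
trimmed:

#include <infiniband/verbs.h>
#include <sys/mman.h>
#include <stddef.h>

static struct ibv_mr *reg_huge_odp_mr(struct ibv_pd *pd, size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* every VMA of buf is hugetlb-backed, so the first-VMA check in
	 * ib_umem_odp_get() picks up the huge page shift
	 */
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_ON_DEMAND |
			  IBV_ACCESS_HUGETLB);
}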

Thanks

>
> > +	} else {
> > +		umem->hugetlb = 0;
> > +	}
> > +
> >  	/* Prevent creating ODP MRs in child processes */
> >  	rcu_read_lock();
> >  	our_pid = get_task_pid(current->group_leader, PIDTYPE_PID);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH rdma-next V1 00/10] ODP Fixes and Improvements
       [not found] ` <20170405062359.26623-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-04-05  6:23   ` [PATCH rdma-next V1 10/10] IB/mlx5: Add ODP support to MW Leon Romanovsky
@ 2017-04-25 19:41   ` Doug Ledford
  10 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2017-04-25 19:41 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 2017-04-05 at 09:23 +0300, Leon Romanovsky wrote:
> Hi Doug,
> 
> Please find the following patch set from Artemy. This patch set fixes
> and extends ODP support.
> 
> This patch set has the following steps:
> 
> 1. Code simplification for all IB drivers.
> 2. Three fixes to existing ODP code.
> 3. Adds generic infrastructure for regions consisting of physically
>    contiguous chunks of arbitrary order. Utilizing this infrastructure
>    added specific treatment to ODP MRs allocated with MAP_HUGETLB.
> 4. Adds ODP support to Memory Windows (MW). Memory windows allow the
>    application to have more flexible control over access to its memory.
>    The operation of associating an MW with an MR is called binding. When
>    an MW is bound to an ODP MR it may cause a page fault which should be
>    properly handled.
> 
> Thanks

Series applied, thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


^ permalink raw reply	[flat|nested] 18+ messages in thread
