* [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers
@ 2020-09-04 22:41 Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 01/17] RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary Jason Gunthorpe
                   ` (17 more replies)
  0 siblings, 18 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Adit Ranadive, Ariel Elior, Potnuri Bharat Teja, David S. Miller,
	Devesh Sharma, Doug Ledford, Faisal Latif, Gal Pressman,
	GR-everest-linux-l2, Wei Hu(Xavier),
	Jakub Kicinski, Leon Romanovsky, linux-rdma, Weihang Li,
	Michal Kalderon, Naresh Kumar PBS, netdev, Lijun Ou,
	VMware PV-Drivers, Selvin Xavier, Yossi Leybovich, Somnath Kotur,
	Sriharsha Basavapatna, Yishai Hadas
  Cc: Firas JahJah, Henry Orosco, Leon Romanovsky, Michael J. Ruhl,
	Michal Kalderon, Miguel Ojeda, Shiraz Saleem

Most RDMA drivers rely on a linear table of DMA addresses organized in
some device-specific page size.

For a while now the core code has had the rdma_for_each_block() SG
iterator to help break a umem into DMA blocks for use in the device lists.

Improve on this by adding rdma_umem_for_each_dma_block(),
ib_umem_dma_offset() and ib_umem_num_dma_blocks().

Replace open-coded loops, or calls to fixed PAGE_SIZE APIs, in most of the
drivers with one of the above APIs.

Get rid of the really weird and duplicative ib_umem_page_count().

Fix two problems with ib_umem_find_best_pgsz(), and several problems
related to computing the wrong DMA list length if IOVA != umem->address.

At this point many of the drivers have a clear path to call
ib_umem_find_best_pgsz() and replace hardcoded PAGE_SIZE or PAGE_SHIFT
values when constructing their DMA lists.
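
As a rough sketch of the conversion pattern (illustrative only, not
taken verbatim from any driver in this series; dev_pgsz_bitmap, iova and
pbl are made-up names):

  struct ib_block_iter biter;
  unsigned long pg_sz;
  u64 *pbl;
  int i = 0;

  /* Pick the largest HW page size that fits this umem and IOVA */
  pg_sz = ib_umem_find_best_pgsz(umem, dev_pgsz_bitmap, iova);
  if (!pg_sz)
          return -EINVAL;

  /* Size the device list with the same page size */
  pbl = kcalloc(ib_umem_num_dma_blocks(umem, pg_sz), sizeof(*pbl),
                GFP_KERNEL);
  if (!pbl)
          return -ENOMEM;

  /* Emit one aligned DMA address per block */
  rdma_umem_for_each_dma_block(umem, &biter, pg_sz)
          pbl[i++] = rdma_block_iter_dma_address(&biter);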

This is the first series in an effort to modernize the umem usage in all
the DMA drivers.

v1: https://lore.kernel.org/r/0-v1-00f59ce24f1f+19f50-umem_1_jgg@nvidia.com
v2:
 - Fix ib_umem_find_best_pgsz() to use IOVA not umem->addr
 - Fix ib_umem_num_dma_blocks() to use IOVA not umem->addr
 - Two new patches to remove wrong open coded versions of
   ib_umem_num_dma_blocks() from EFA and i40iw
 - Redo the mlx4 ib_umem_num_dma_blocks() to do less and be safer
   until the whole thing can be moved to ib_umem_find_best_pgsz()
 - Two new patches to delete calls to ib_umem_offset() in qedr and
   ocrdma

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (17):
  RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page
    boundary
  RDMA/umem: Prevent small pages from being returned by
    ib_umem_find_best_pgsz()
  RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()
  RDMA/umem: Add rdma_umem_for_each_dma_block()
  RDMA/umem: Replace for_each_sg_dma_page with
    rdma_umem_for_each_dma_block
  RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
  RDMA/efa: Use ib_umem_num_dma_pages()
  RDMA/i40iw: Use ib_umem_num_dma_pages()
  RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding
  RDMA/qedr: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages()
  RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding
  RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/pvrdma: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/mlx4: Use ib_umem_num_dma_blocks()
  RDMA/qedr: Remove fbo and zbva from the MR
  RDMA/ocrdma: Remove fbo from MR

 .clang-format                                 |  1 +
 drivers/infiniband/core/umem.c                | 45 +++++++-----
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 72 +++++++------------
 drivers/infiniband/hw/cxgb4/mem.c             |  8 +--
 drivers/infiniband/hw/efa/efa_verbs.c         |  9 ++-
 drivers/infiniband/hw/hns/hns_roce_alloc.c    |  3 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       | 49 +++++--------
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     | 13 +---
 drivers/infiniband/hw/mlx4/cq.c               |  1 -
 drivers/infiniband/hw/mlx4/mr.c               |  5 +-
 drivers/infiniband/hw/mlx4/qp.c               |  2 -
 drivers/infiniband/hw/mlx4/srq.c              |  5 +-
 drivers/infiniband/hw/mlx5/mem.c              |  4 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  8 +--
 drivers/infiniband/hw/ocrdma/ocrdma.h         |  1 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c      |  5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   | 25 +++----
 drivers/infiniband/hw/qedr/verbs.c            | 52 +++++---------
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_misc.c    |  9 ++-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  6 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c    | 12 +---
 include/linux/qed/qed_rdma_if.h               |  2 -
 include/rdma/ib_umem.h                        | 37 ++++++++--
 include/rdma/ib_verbs.h                       | 24 -------
 27 files changed, 170 insertions(+), 234 deletions(-)

-- 
2.28.0



* [PATCH v2 01/17] RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 02/17] RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz() Jason Gunthorpe
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma; +Cc: Leon Romanovsky, Shiraz Saleem

It is possible for a single SGL to span an aligned boundary, e.g. if the
SGL is

  61440 -> 90112

Then the length is 28672, which currently limits the block size to
32K. With a 32K page size the two covering blocks will be:

  32768->65536 and 65536->98304

However, the correct answer is a 128K block size which will span the whole
28672 bytes in a single block.

Instead of limiting based on length, figure out which high IOVA bits don't
change between the start and end addresses. Those bits determine the
highest useful page size.
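
Checking the example with the new computation (a worked illustration,
not part of the patch):

  virt              = 61440  (0x0f000)
  virt + length - 1 = 90111  (0x15fff)
  0x0f000 ^ 0x15fff = 0x1afff
  bits_per(0x1afff) = 17

so the high-bits mask no longer caps the candidate page size at 32K;
2^17 = 128K stays in the candidate set and a single 128K block can cover
the whole range.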

Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/core/umem.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 831bff8d52e547..09539dd764ec05 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -156,8 +156,13 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 		return 0;
 
 	va = virt;
-	/* max page size not to exceed MR length */
-	mask = roundup_pow_of_two(umem->length);
+	/* The best result is the smallest page size that results in the minimum
+	 * number of required pages. Compute the largest page size that could
+	 * work based on VA address bits that don't change.
+	 */
+	mask = pgsz_bitmap &
+	       GENMASK(BITS_PER_LONG - 1,
+		       bits_per((umem->length - 1 + virt) ^ virt));
 	/* offset into first SGL */
 	pgoff = umem->address & ~PAGE_MASK;
 
-- 
2.28.0



* [PATCH v2 02/17] RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 01/17] RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 03/17] RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz() Jason Gunthorpe
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma; +Cc: Leon Romanovsky, Shiraz Saleem

rdma_for_each_block() makes assumptions about how the SGL is constructed
that don't work if the block size is below the page size used to build the
SGL.

The rules for umem SGL construction require that the SGs all be PAGE_SIZE
aligned and that the actual byte offset of the VA range is not encoded
inside the SGL using offset and length. So rdma_for_each_block() has no
idea where the actual starting/ending point is and cannot compute the
first/last block boundary when the starting address falls inside an SG.

Fixing the SGL construction turns out to be really hard, and will be the
subject of other patches. For now block smaller pages.
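
As an illustration of the problem (assuming a 4K PAGE_SIZE; this case is
not taken from the patch): if the VA range starts 0x800 bytes into its
first page, the SGL only records the 4K aligned page with no
offset/length, so an iterator asked for 2K blocks cannot tell whether
the first block of the range begins at offset 0x0 or 0x800 within that
page.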

Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/core/umem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 09539dd764ec05..1d0599997d0fb5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -151,6 +151,12 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 	dma_addr_t mask;
 	int i;
 
+	/* rdma_for_each_block() has a bug if the page size is smaller than the
+	 * page size used to build the umem. For now prevent smaller page sizes
+	 * from being returned.
+	 */
+	pgsz_bitmap &= GENMASK(BITS_PER_LONG - 1, PAGE_SHIFT);
+
 	/* At minimum, drivers must support PAGE_SIZE or smaller */
 	if (WARN_ON(!(pgsz_bitmap & GENMASK(PAGE_SHIFT, 0))))
 		return 0;
-- 
2.28.0



* [PATCH v2 03/17] RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 01/17] RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 02/17] RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 04/17] RDMA/umem: Add rdma_umem_for_each_dma_block() Jason Gunthorpe
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma

The calculation in rdma_find_pg_bit() is fairly complicated, and the
function is never called anywhere else. Inline a simpler version into
ib_umem_find_best_pgsz().
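
The replacement logic in the hunk below can be read with a small worked
example (the numbers are illustrative): the accumulated mask has a 1 in
every bit where the IOVA and the DMA addresses disagree. If the lowest
set bit of the mask is bit 16, count_trailing_zeros(mask) is 16,
GENMASK(16, 0) keeps page sizes up to 64K in pgsz_bitmap, and
rounddown_pow_of_two() returns the largest supported size that remains.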

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/core/umem.c | 11 ++++++++---
 include/rdma/ib_verbs.h        | 24 ------------------------
 2 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1d0599997d0fb5..fb7630e7aac3a7 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -39,6 +39,7 @@
 #include <linux/export.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
+#include <linux/count_zeros.h>
 #include <rdma/ib_umem_odp.h>
 
 #include "uverbs.h"
@@ -146,7 +147,6 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 				     unsigned long virt)
 {
 	struct scatterlist *sg;
-	unsigned int best_pg_bit;
 	unsigned long va, pgoff;
 	dma_addr_t mask;
 	int i;
@@ -186,9 +186,14 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 			mask |= va;
 		pgoff = 0;
 	}
-	best_pg_bit = rdma_find_pg_bit(mask, pgsz_bitmap);
 
-	return BIT_ULL(best_pg_bit);
+	/* The mask accumulates 1's in each position where the VA and physical
+	 * address differ, thus the length of trailing 0 is the largest page
+	 * size that can pass the VA through to the physical.
+	 */
+	if (mask)
+		pgsz_bitmap &= GENMASK(count_trailing_zeros(mask), 0);
+	return rounddown_pow_of_two(pgsz_bitmap);
 }
 EXPORT_SYMBOL(ib_umem_find_best_pgsz);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c868609a4ffaed..5dcbbb77cadb4f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3340,30 +3340,6 @@ static inline bool rdma_cap_read_inv(struct ib_device *dev, u32 port_num)
 	return rdma_protocol_iwarp(dev, port_num);
 }
 
-/**
- * rdma_find_pg_bit - Find page bit given address and HW supported page sizes
- *
- * @addr: address
- * @pgsz_bitmap: bitmap of HW supported page sizes
- */
-static inline unsigned int rdma_find_pg_bit(unsigned long addr,
-					    unsigned long pgsz_bitmap)
-{
-	unsigned long align;
-	unsigned long pgsz;
-
-	align = addr & -addr;
-
-	/* Find page bit such that addr is aligned to the highest supported
-	 * HW page size
-	 */
-	pgsz = pgsz_bitmap & ~(-align << 1);
-	if (!pgsz)
-		return __ffs(pgsz_bitmap);
-
-	return __fls(pgsz);
-}
-
 /**
  * rdma_core_cap_opa_port - Return whether the RDMA Port is OPA or not.
  * @device: Device
-- 
2.28.0



* [PATCH v2 04/17] RDMA/umem: Add rdma_umem_for_each_dma_block()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 03/17] RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 05/17] RDMA/umem: Replace for_each_sg_dma_page with rdma_umem_for_each_dma_block Jason Gunthorpe
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Devesh Sharma, Doug Ledford, Faisal Latif, Gal Pressman,
	Wei Hu(Xavier),
	linux-rdma, Weihang Li, Naresh Kumar PBS, Lijun Ou,
	Selvin Xavier, Yossi Leybovich, Somnath Kotur,
	Sriharsha Basavapatna
  Cc: Miguel Ojeda, Shiraz Saleem

This helper does the same as rdma_for_each_block(), except it works on a
umem. This simplifies most of the call sites.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 .clang-format                              |  1 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.c   |  2 +-
 drivers/infiniband/hw/efa/efa_verbs.c      |  3 +--
 drivers/infiniband/hw/hns/hns_roce_alloc.c |  3 +--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c  |  3 +--
 include/rdma/ib_umem.h                     | 20 ++++++++++++++++++++
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/.clang-format b/.clang-format
index a0a96088c74f49..311ef2c61a1bdf 100644
--- a/.clang-format
+++ b/.clang-format
@@ -415,6 +415,7 @@ ForEachMacros:
   - 'rbtree_postorder_for_each_entry_safe'
   - 'rdma_for_each_block'
   - 'rdma_for_each_port'
+  - 'rdma_umem_for_each_dma_block'
   - 'resource_list_for_each_entry'
   - 'resource_list_for_each_entry_safe'
   - 'rhl_for_each_entry_rcu'
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 5ee272d27aaade..9e26e651730cb3 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3783,7 +3783,7 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig,
 	u64 page_size =  BIT_ULL(page_shift);
 	struct ib_block_iter biter;
 
-	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap, page_size)
+	rdma_umem_for_each_dma_block(umem, &biter, page_size)
 		*pbl_tbl++ = rdma_block_iter_dma_address(&biter);
 
 	return pbl_tbl - pbl_tbl_orig;
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index de9a22f0fcc218..d85c63a5021a70 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1142,8 +1142,7 @@ static int umem_to_page_list(struct efa_dev *dev,
 	ibdev_dbg(&dev->ibdev, "hp_cnt[%u], pages_in_hp[%u]\n",
 		  hp_cnt, pages_in_hp);
 
-	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap,
-			    BIT(hp_shift))
+	rdma_umem_for_each_dma_block(umem, &biter, BIT(hp_shift))
 		page_list[hp_idx++] = rdma_block_iter_dma_address(&biter);
 
 	return 0;
diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index a522cb2d29eabc..a6b23dec1adcf6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -268,8 +268,7 @@ int hns_roce_get_umem_bufs(struct hns_roce_dev *hr_dev, dma_addr_t *bufs,
 	}
 
 	/* convert system page cnt to hw page cnt */
-	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap,
-			    1 << page_shift) {
+	rdma_umem_for_each_dma_block(umem, &biter, 1 << page_shift) {
 		addr = rdma_block_iter_dma_address(&biter);
 		if (idx >= start) {
 			bufs[total++] = addr;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index b51339328a51ef..beb611b157bc8d 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1320,8 +1320,7 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr,
 	if (iwmr->type == IW_MEMREG_TYPE_QP)
 		iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl);
 
-	rdma_for_each_block(region->sg_head.sgl, &biter, region->nmap,
-			    iwmr->page_size) {
+	rdma_umem_for_each_dma_block(region, &biter, iwmr->page_size) {
 		*pbl = rdma_block_iter_dma_address(&biter);
 		pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx);
 	}
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 07a764eb692eed..b880512ba95f16 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -40,6 +40,26 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 	       PAGE_SHIFT;
 }
 
+static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
+						struct ib_umem *umem,
+						unsigned long pgsz)
+{
+	__rdma_block_iter_start(biter, umem->sg_head.sgl, umem->nmap, pgsz);
+}
+
+/**
+ * rdma_umem_for_each_dma_block - iterate over contiguous DMA blocks of the umem
+ * @umem: umem to iterate over
+ * @pgsz: Page size to split the list into
+ *
+ * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The
+ * returned DMA blocks will be aligned to pgsz and span the range:
+ * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz)
+ */
+#define rdma_umem_for_each_dma_block(umem, biter, pgsz)                        \
+	for (__rdma_umem_block_iter_start(biter, umem, pgsz);                  \
+	     __rdma_block_iter_next(biter);)
+
 #ifdef CONFIG_INFINIBAND_USER_MEM
 
 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
-- 
2.28.0



* [PATCH v2 05/17] RDMA/umem: Replace for_each_sg_dma_page with rdma_umem_for_each_dma_block
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 04/17] RDMA/umem: Add rdma_umem_for_each_dma_block() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks() Jason Gunthorpe
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Adit Ranadive, Potnuri Bharat Teja, Devesh Sharma, Doug Ledford,
	linux-rdma, VMware PV-Drivers, Selvin Xavier

Generally drivers should be using this core helper to split up the umem
into DMA pages.

These drivers are all probably wrong in some way to pass PAGE_SIZE in as
the HW page size. Either the driver doesn't support other page sizes and
it should use 4096, or the driver does support other page sizes and should
use ib_umem_find_best_pgsz() to select the best HW page size from the HW
supported set.

The only case it could be correct is if the HW has a global setting for
PAGE_SIZE set at driver initialization time.
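
For a driver that really does support several HW page sizes, the
conversion would look roughly like this (an illustrative sketch only;
the supported-size bitmap is made up and none of the hunks below go
this far):

  struct ib_block_iter biter;
  unsigned long pg_sz;

  /* e.g. HW that supports 4K and 2M pages */
  pg_sz = ib_umem_find_best_pgsz(umem, SZ_4K | SZ_2M, virt_addr);
  if (!pg_sz)
          return -EINVAL;

  /* pages[] and i are the driver's existing page list and index */
  rdma_umem_for_each_dma_block(umem, &biter, pg_sz)
          pages[i++] = rdma_block_iter_dma_address(&biter);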

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/cxgb4/mem.c              | 6 +++---
 drivers/infiniband/hw/mthca/mthca_provider.c   | 6 +++---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    | 7 +++----
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c | 9 ++++-----
 4 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 73936c3341b77c..82afdb1987eff6 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -510,7 +510,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	__be64 *pages;
 	int shift, n, i;
 	int err = -ENOMEM;
-	struct sg_dma_page_iter sg_iter;
+	struct ib_block_iter biter;
 	struct c4iw_dev *rhp;
 	struct c4iw_pd *php;
 	struct c4iw_mr *mhp;
@@ -561,8 +561,8 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	i = n = 0;
 
-	for_each_sg_dma_page(mhp->umem->sg_head.sgl, &sg_iter, mhp->umem->nmap, 0) {
-		pages[i++] = cpu_to_be64(sg_page_iter_dma_address(&sg_iter));
+	rdma_umem_for_each_dma_block(mhp->umem, &biter, 1 << shift) {
+		pages[i++] = cpu_to_be64(rdma_block_iter_dma_address(&biter));
 		if (i == PAGE_SIZE / sizeof(*pages)) {
 			err = write_pbl(&mhp->rhp->rdev, pages,
 					mhp->attr.pbl_addr + (n << 3), i,
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 9fa2f9164a47b6..317e67ad915fe8 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -846,7 +846,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				       u64 virt, int acc, struct ib_udata *udata)
 {
 	struct mthca_dev *dev = to_mdev(pd->device);
-	struct sg_dma_page_iter sg_iter;
+	struct ib_block_iter biter;
 	struct mthca_ucontext *context = rdma_udata_to_drv_context(
 		udata, struct mthca_ucontext, ibucontext);
 	struct mthca_mr *mr;
@@ -895,8 +895,8 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	write_mtt_size = min(mthca_write_mtt_size(dev), (int) (PAGE_SIZE / sizeof *pages));
 
-	for_each_sg_dma_page(mr->umem->sg_head.sgl, &sg_iter, mr->umem->nmap, 0) {
-		pages[i++] = sg_page_iter_dma_address(&sg_iter);
+	rdma_umem_for_each_dma_block(mr->umem, &biter, PAGE_SIZE) {
+		pages[i++] = rdma_block_iter_dma_address(&biter);
 
 		/*
 		 * Be friendly to write_mtt and pass it chunks
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index c1751c9a0f625c..933b297de2ba86 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -814,9 +814,8 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 			    u32 num_pbes)
 {
 	struct ocrdma_pbe *pbe;
-	struct sg_dma_page_iter sg_iter;
+	struct ib_block_iter biter;
 	struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table;
-	struct ib_umem *umem = mr->umem;
 	int pbe_cnt, total_num_pbes = 0;
 	u64 pg_addr;
 
@@ -826,9 +825,9 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 	pbe = (struct ocrdma_pbe *)pbl_tbl->va;
 	pbe_cnt = 0;
 
-	for_each_sg_dma_page (umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
+	rdma_umem_for_each_dma_block (mr->umem, &biter, PAGE_SIZE) {
 		/* store the page address in pbe */
-		pg_addr = sg_page_iter_dma_address(&sg_iter);
+		pg_addr = rdma_block_iter_dma_address(&biter);
 		pbe->pa_lo = cpu_to_le32(pg_addr);
 		pbe->pa_hi = cpu_to_le32(upper_32_bits(pg_addr));
 		pbe_cnt += 1;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
index 7944c58ded0e59..ba43ad07898c2b 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c
@@ -182,17 +182,16 @@ int pvrdma_page_dir_insert_dma(struct pvrdma_page_dir *pdir, u64 idx,
 int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir,
 				struct ib_umem *umem, u64 offset)
 {
+	struct ib_block_iter biter;
 	u64 i = offset;
 	int ret = 0;
-	struct sg_dma_page_iter sg_iter;
 
 	if (offset >= pdir->npages)
 		return -EINVAL;
 
-	for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
-		dma_addr_t addr = sg_page_iter_dma_address(&sg_iter);
-
-		ret = pvrdma_page_dir_insert_dma(pdir, i, addr);
+	rdma_umem_for_each_dma_block (umem, &biter, PAGE_SIZE) {
+		ret = pvrdma_page_dir_insert_dma(
+			pdir, i, rdma_block_iter_dma_address(&biter));
 		if (ret)
 			goto exit;
 
-- 
2.28.0



* [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 05/17] RDMA/umem: Replace for_each_sg_dma_page with rdma_umem_for_each_dma_block Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-07 12:16   ` Gal Pressman
  2020-09-11 13:21   ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages() Jason Gunthorpe
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Adit Ranadive, Potnuri Bharat Teja, Doug Ledford,
	Leon Romanovsky, linux-rdma, VMware PV-Drivers

ib_umem_num_pages() should only be used by things working with the SGL in
CPU pages directly.

Drivers building DMA lists should use the new ib_umem_num_dma_blocks()
which returns the number of blocks rdma_umem_for_each_dma_block() will
return.

Making this general for DMA drivers requires a different implementation.
Computing the DMA block count based on umem->address only works if the
requested page size is < PAGE_SIZE and/or the IOVA == umem->address.

Instead the number of DMA pages should be computed in the IOVA address
space, not umem->address. Thus the IOVA has to be stored inside the umem
so it can be used for these calculations.

For now set it to umem->address by default and fix it up if
ib_umem_find_best_pgsz() was called. This allows drivers to be converted
to ib_umem_num_dma_blocks() safely.
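
A worked example of why the IOVA matters (the numbers are illustrative):
with a 64K block size, umem->address = 0x5000, IOVA = 0x1f000 and
length = 0x2000, the blocks must cover IOVA 0x10000 to 0x30000, so
ib_umem_num_dma_blocks() correctly returns 2; computing the same thing
from umem->address (0x0 to 0x10000) would return 1 and the DMA list
would come out one entry short.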

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/core/umem.c               |  7 ++++++-
 drivers/infiniband/hw/cxgb4/mem.c            |  2 +-
 drivers/infiniband/hw/mlx5/mem.c             |  4 ++--
 drivers/infiniband/hw/mthca/mthca_provider.c |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c |  2 +-
 include/rdma/ib_umem.h                       | 15 ++++++++++++---
 6 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index fb7630e7aac3a7..b57dbb14de8378 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -161,7 +161,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 	if (WARN_ON(!(pgsz_bitmap & GENMASK(PAGE_SHIFT, 0))))
 		return 0;
 
-	va = virt;
+	umem->iova = va = virt;
 	/* The best result is the smallest page size that results in the minimum
 	 * number of required pages. Compute the largest page size that could
 	 * work based on VA address bits that don't change.
@@ -240,6 +240,11 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 	umem->ibdev      = device;
 	umem->length     = size;
 	umem->address    = addr;
+	/*
+	 * Drivers should call ib_umem_find_best_pgsz() to set the iova
+	 * correctly.
+	 */
+	umem->iova = addr;
 	umem->writable   = ib_access_writable(access);
 	umem->owning_mm = mm = current->mm;
 	mmgrab(mm);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 82afdb1987eff6..22c8f5745047db 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -548,7 +548,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	shift = PAGE_SHIFT;
 
-	n = ib_umem_num_pages(mhp->umem);
+	n = ib_umem_num_dma_blocks(mhp->umem, 1 << shift);
 	err = alloc_pbl(mhp, n);
 	if (err)
 		goto err_umem_release;
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index c19ec9fd8a63c3..13de3d2edd34e3 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -169,8 +169,8 @@ void mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, struct ib_umem *umem,
 			  int page_shift, __be64 *pas, int access_flags)
 {
 	return __mlx5_ib_populate_pas(dev, umem, page_shift, 0,
-				      ib_umem_num_pages(umem), pas,
-				      access_flags);
+				      ib_umem_num_dma_blocks(umem, PAGE_SIZE),
+				      pas, access_flags);
 }
 int mlx5_ib_get_buf_offset(u64 addr, int page_shift, u32 *offset)
 {
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 317e67ad915fe8..b785fb9a2634ff 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -877,7 +877,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err;
 	}
 
-	n = ib_umem_num_pages(mr->umem);
+	n = ib_umem_num_dma_blocks(mr->umem, PAGE_SIZE);
 
 	mr->mtt = mthca_alloc_mtt(dev, n);
 	if (IS_ERR(mr->mtt)) {
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
index 91f0957e611543..e80848bfb3bdbf 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
@@ -133,7 +133,7 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_CAST(umem);
 	}
 
-	npages = ib_umem_num_pages(umem);
+	npages = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 	if (npages < 0 || npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
 		dev_warn(&dev->pdev->dev, "overflow %d pages in mem region\n",
 			 npages);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index b880512ba95f16..0f1ab3d8f77dea 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -17,6 +17,7 @@ struct ib_umem_odp;
 struct ib_umem {
 	struct ib_device       *ibdev;
 	struct mm_struct       *owning_mm;
+	u64 iova;
 	size_t			length;
 	unsigned long		address;
 	u32 writable : 1;
@@ -33,11 +34,17 @@ static inline int ib_umem_offset(struct ib_umem *umem)
 	return umem->address & ~PAGE_MASK;
 }
 
+static inline size_t ib_umem_num_dma_blocks(struct ib_umem *umem,
+					    unsigned long pgsz)
+{
+	return (ALIGN(umem->iova + umem->length, pgsz) -
+		ALIGN_DOWN(umem->iova, pgsz)) /
+	       pgsz;
+}
+
 static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 {
-	return (ALIGN(umem->address + umem->length, PAGE_SIZE) -
-		ALIGN_DOWN(umem->address, PAGE_SIZE)) >>
-	       PAGE_SHIFT;
+	return ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 }
 
 static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
@@ -55,6 +62,8 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
  * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The
  * returned DMA blocks will be aligned to pgsz and span the range:
  * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz)
+ *
+ * Performs exactly ib_umem_num_dma_blocks() iterations.
  */
 #define rdma_umem_for_each_dma_block(umem, biter, pgsz)                        \
 	for (__rdma_umem_block_iter_start(biter, umem, pgsz);                  \
-- 
2.28.0



* [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-07 12:19   ` Gal Pressman
  2020-09-04 22:41 ` [PATCH v2 08/17] RDMA/i40iw: " Jason Gunthorpe
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: linux-rdma
  Cc: Doug Ledford, Firas JahJah, Gal Pressman, Shiraz Saleem, Yossi Leybovich

If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.

Fixes: 40ddb3f02083 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/efa/efa_verbs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index d85c63a5021a70..72da0faa7ebf97 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/vmalloc.h>
+#include <linux/log2.h>
 
 #include <rdma/ib_addr.h>
 #include <rdma/ib_umem.h>
@@ -1538,9 +1539,8 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 		goto err_unmap;
 	}
 
-	params.page_shift = __ffs(pg_sz);
-	params.page_num = DIV_ROUND_UP(length + (start & (pg_sz - 1)),
-				       pg_sz);
+	params.page_shift = order_base_2(pg_sz);
+	params.page_num = ib_umem_num_dma_blocks(mr->umem, pg_sz);
 
 	ibdev_dbg(&dev->ibdev,
 		  "start %#llx length %#llx params.page_shift %u params.page_num %u\n",
-- 
2.28.0



* [PATCH v2 08/17] RDMA/i40iw: Use ib_umem_num_dma_pages()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 09/17] RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding Jason Gunthorpe
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Faisal Latif, linux-rdma
  Cc: Doug Ledford, Henry Orosco, Michael J. Ruhl, Shiraz Saleem

If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.

Fixes: eb52c0333f06 ("RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/i40iw/i40iw_verbs.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index beb611b157bc8d..ebfece162f98a4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1743,15 +1743,12 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	struct i40iw_mr *iwmr;
 	struct ib_umem *region;
 	struct i40iw_mem_reg_req req;
-	u64 pbl_depth = 0;
 	u32 stag = 0;
 	u16 access;
-	u64 region_length;
 	bool use_pbles = false;
 	unsigned long flags;
 	int err = -ENOSYS;
 	int ret;
-	int pg_shift;
 
 	if (!udata)
 		return ERR_PTR(-EOPNOTSUPP);
@@ -1786,18 +1783,13 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	if (req.reg_type == IW_MEMREG_TYPE_MEM)
 		iwmr->page_size = ib_umem_find_best_pgsz(region, SZ_4K | SZ_2M,
 							 virt);
-
-	region_length = region->length + (start & (iwmr->page_size - 1));
-	pg_shift = ffs(iwmr->page_size) - 1;
-	pbl_depth = region_length >> pg_shift;
-	pbl_depth += (region_length & (iwmr->page_size - 1)) ? 1 : 0;
 	iwmr->length = region->length;
 
 	iwpbl->user_base = virt;
 	palloc = &iwpbl->pble_alloc;
 
 	iwmr->type = req.reg_type;
-	iwmr->page_cnt = (u32)pbl_depth;
+	iwmr->page_cnt = ib_umem_num_dma_blocks(region, iwmr->page_size);
 
 	switch (req.reg_type) {
 	case IW_MEMREG_TYPE_QP:
-- 
2.28.0



* [PATCH v2 09/17] RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 08/17] RDMA/i40iw: " Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 10/17] RDMA/qedr: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count() Jason Gunthorpe
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Ariel Elior, Doug Ledford, linux-rdma, Michal Kalderon; +Cc: Michal Kalderon

This loop is splitting the DMA SGL into pg_shift-sized pages; use the core
code for this directly.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/verbs.c | 41 ++++++++++++------------------
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b49bef94637e50..cbb49168d9f7ed 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -600,11 +600,9 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 			       struct qedr_pbl_info *pbl_info, u32 pg_shift)
 {
 	int pbe_cnt, total_num_pbes = 0;
-	u32 fw_pg_cnt, fw_pg_per_umem_pg;
 	struct qedr_pbl *pbl_tbl;
-	struct sg_dma_page_iter sg_iter;
+	struct ib_block_iter biter;
 	struct regpair *pbe;
-	u64 pg_addr;
 
 	if (!pbl_info->num_pbes)
 		return;
@@ -625,32 +623,25 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 
 	pbe_cnt = 0;
 
-	fw_pg_per_umem_pg = BIT(PAGE_SHIFT - pg_shift);
+	rdma_umem_for_each_dma_block (umem, &biter, BIT(pg_shift)) {
+		u64 pg_addr = rdma_block_iter_dma_address(&biter);
 
-	for_each_sg_dma_page (umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
-		pg_addr = sg_page_iter_dma_address(&sg_iter);
-		for (fw_pg_cnt = 0; fw_pg_cnt < fw_pg_per_umem_pg;) {
-			pbe->lo = cpu_to_le32(pg_addr);
-			pbe->hi = cpu_to_le32(upper_32_bits(pg_addr));
+		pbe->lo = cpu_to_le32(pg_addr);
+		pbe->hi = cpu_to_le32(upper_32_bits(pg_addr));
 
-			pg_addr += BIT(pg_shift);
-			pbe_cnt++;
-			total_num_pbes++;
-			pbe++;
+		pbe_cnt++;
+		total_num_pbes++;
+		pbe++;
 
-			if (total_num_pbes == pbl_info->num_pbes)
-				return;
+		if (total_num_pbes == pbl_info->num_pbes)
+			return;
 
-			/* If the given pbl is full storing the pbes,
-			 * move to next pbl.
-			 */
-			if (pbe_cnt == (pbl_info->pbl_size / sizeof(u64))) {
-				pbl_tbl++;
-				pbe = (struct regpair *)pbl_tbl->va;
-				pbe_cnt = 0;
-			}
-
-			fw_pg_cnt++;
+		/* If the given pbl is full storing the pbes, move to next pbl.
+		 */
+		if (pbe_cnt == (pbl_info->pbl_size / sizeof(u64))) {
+			pbl_tbl++;
+			pbe = (struct regpair *)pbl_tbl->va;
+			pbe_cnt = 0;
 		}
 	}
 }
-- 
2.28.0



* [PATCH v2 10/17] RDMA/qedr: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 09/17] RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 11/17] RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages() Jason Gunthorpe
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Ariel Elior, Doug Ledford, linux-rdma, Michal Kalderon; +Cc: Michal Kalderon

The length of the list populated by qedr_populate_pbls() should be
calculated using ib_umem_num_dma_blocks() with the same size/shift passed
to qedr_populate_pbls().

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/verbs.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index cbb49168d9f7ed..278b48443aedba 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -783,9 +783,7 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
 		return PTR_ERR(q->umem);
 	}
 
-	fw_pages = ib_umem_page_count(q->umem) <<
-	    (PAGE_SHIFT - FW_PAGE_SHIFT);
-
+	fw_pages = ib_umem_num_dma_blocks(q->umem, 1 << FW_PAGE_SHIFT);
 	rc = qedr_prepare_pbl_tbl(dev, &q->pbl_info, fw_pages, 0);
 	if (rc)
 		goto err0;
@@ -2852,7 +2850,8 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 		goto err0;
 	}
 
-	rc = init_mr_info(dev, &mr->info, ib_umem_page_count(mr->umem), 1);
+	rc = init_mr_info(dev, &mr->info,
+			  ib_umem_num_dma_blocks(mr->umem, PAGE_SIZE), 1);
 	if (rc)
 		goto err1;
 
-- 
2.28.0



* [PATCH v2 11/17] RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 10/17] RDMA/qedr: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 12/17] RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding Jason Gunthorpe
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Devesh Sharma, Doug Ledford, linux-rdma, Naresh Kumar PBS,
	Somnath Kotur, Sriharsha Basavapatna
  Cc: Selvin Xavier, Shiraz Saleem

ib_umem_page_count() returns the number of 4k entries required for a DMA
map, but bnxt_re already computes a variable page size. The correct API to
determine the size of the page table array is ib_umem_num_dma_blocks().

Fix the overallocation of the page array in fill_umem_pbl_tbl() when
working with larger page sizes by using the right function. Lightly
re-organize this function to make it clearer.

Replace the other calls to ib_umem_num_pages().
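
To illustrate the overallocation being fixed (an example, not taken from
the hunks below): a 4M MR mapped with a 2M page size needs a 2 entry
PBL, but ib_umem_page_count() would size the array for 1024 4K entries,
while ib_umem_num_dma_blocks(umem, SZ_2M) returns 2.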

Fixes: d85582517e91 ("RDMA/bnxt_re: Use core helpers to get aligned DMA address")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 70 ++++++++----------------
 1 file changed, 24 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 9e26e651730cb3..9dbf9ab5a4c8db 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -939,7 +939,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 
 	qp->sumem = umem;
 	qplib_qp->sq.sg_info.sghead = umem->sg_head.sgl;
-	qplib_qp->sq.sg_info.npages = ib_umem_num_pages(umem);
+	qplib_qp->sq.sg_info.npages = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 	qplib_qp->sq.sg_info.nmap = umem->nmap;
 	qplib_qp->sq.sg_info.pgsize = PAGE_SIZE;
 	qplib_qp->sq.sg_info.pgshft = PAGE_SHIFT;
@@ -954,7 +954,8 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 			goto rqfail;
 		qp->rumem = umem;
 		qplib_qp->rq.sg_info.sghead = umem->sg_head.sgl;
-		qplib_qp->rq.sg_info.npages = ib_umem_num_pages(umem);
+		qplib_qp->rq.sg_info.npages =
+			ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 		qplib_qp->rq.sg_info.nmap = umem->nmap;
 		qplib_qp->rq.sg_info.pgsize = PAGE_SIZE;
 		qplib_qp->rq.sg_info.pgshft = PAGE_SHIFT;
@@ -1609,7 +1610,7 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,
 
 	srq->umem = umem;
 	qplib_srq->sg_info.sghead = umem->sg_head.sgl;
-	qplib_srq->sg_info.npages = ib_umem_num_pages(umem);
+	qplib_srq->sg_info.npages = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 	qplib_srq->sg_info.nmap = umem->nmap;
 	qplib_srq->sg_info.pgsize = PAGE_SIZE;
 	qplib_srq->sg_info.pgshft = PAGE_SHIFT;
@@ -2861,7 +2862,8 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 			goto fail;
 		}
 		cq->qplib_cq.sg_info.sghead = cq->umem->sg_head.sgl;
-		cq->qplib_cq.sg_info.npages = ib_umem_num_pages(cq->umem);
+		cq->qplib_cq.sg_info.npages =
+			ib_umem_num_dma_blocks(cq->umem, PAGE_SIZE);
 		cq->qplib_cq.sg_info.nmap = cq->umem->nmap;
 		cq->qplib_cq.dpi = &uctx->dpi;
 	} else {
@@ -3759,23 +3761,6 @@ int bnxt_re_dealloc_mw(struct ib_mw *ib_mw)
 	return rc;
 }
 
-static int bnxt_re_page_size_ok(int page_shift)
-{
-	switch (page_shift) {
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_4K:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_8K:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_64K:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_2M:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_256K:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_1M:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_4M:
-	case CMDQ_REGISTER_MR_LOG2_PBL_PG_SIZE_PG_1G:
-		return 1;
-	default:
-		return 0;
-	}
-}
-
 static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig,
 			     int page_shift)
 {
@@ -3799,7 +3784,8 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	struct bnxt_re_mr *mr;
 	struct ib_umem *umem;
 	u64 *pbl_tbl = NULL;
-	int umem_pgs, page_shift, rc;
+	unsigned long page_size;
+	int umem_pgs, rc;
 
 	if (length > BNXT_RE_MAX_MR_SIZE) {
 		ibdev_err(&rdev->ibdev, "MR Size: %lld > Max supported:%lld\n",
@@ -3833,42 +3819,34 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	mr->ib_umem = umem;
 
 	mr->qplib_mr.va = virt_addr;
-	umem_pgs = ib_umem_page_count(umem);
-	if (!umem_pgs) {
-		ibdev_err(&rdev->ibdev, "umem is invalid!");
-		rc = -EINVAL;
-		goto free_umem;
-	}
-	mr->qplib_mr.total_size = length;
-
-	pbl_tbl = kcalloc(umem_pgs, sizeof(u64 *), GFP_KERNEL);
-	if (!pbl_tbl) {
-		rc = -ENOMEM;
-		goto free_umem;
-	}
-
-	page_shift = __ffs(ib_umem_find_best_pgsz(umem,
-				BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_2M,
-				virt_addr));
-
-	if (!bnxt_re_page_size_ok(page_shift)) {
+	page_size = ib_umem_find_best_pgsz(
+		umem, BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_2M, virt_addr);
+	if (!page_size) {
 		ibdev_err(&rdev->ibdev, "umem page size unsupported!");
 		rc = -EFAULT;
-		goto fail;
+		goto free_umem;
 	}
+	mr->qplib_mr.total_size = length;
 
-	if (page_shift == BNXT_RE_PAGE_SHIFT_4K &&
+	if (page_size == BNXT_RE_PAGE_SIZE_4K &&
 	    length > BNXT_RE_MAX_MR_SIZE_LOW) {
 		ibdev_err(&rdev->ibdev, "Requested MR Sz:%llu Max sup:%llu",
 			  length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
 		rc = -EINVAL;
-		goto fail;
+		goto free_umem;
+	}
+
+	umem_pgs = ib_umem_num_dma_blocks(umem, page_size);
+	pbl_tbl = kcalloc(umem_pgs, sizeof(u64 *), GFP_KERNEL);
+	if (!pbl_tbl) {
+		rc = -ENOMEM;
+		goto free_umem;
 	}
 
 	/* Map umem buf ptrs to the PBL */
-	umem_pgs = fill_umem_pbl_tbl(umem, pbl_tbl, page_shift);
+	umem_pgs = fill_umem_pbl_tbl(umem, pbl_tbl, order_base_2(page_size));
 	rc = bnxt_qplib_reg_mr(&rdev->qplib_res, &mr->qplib_mr, pbl_tbl,
-			       umem_pgs, false, 1 << page_shift);
+			       umem_pgs, false, page_size);
 	if (rc) {
 		ibdev_err(&rdev->ibdev, "Failed to register user MR");
 		goto fail;
-- 
2.28.0



* [PATCH v2 12/17] RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 11/17] RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-07  8:11   ` liweihang
  2020-09-04 22:41 ` [PATCH v2 13/17] RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count() Jason Gunthorpe
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Doug Ledford, Wei Hu(Xavier), linux-rdma, Weihang Li, Lijun Ou

mtr_umem_page_count() does the same thing; replace it with the core code.

Also, ib_umem_find_best_pgsz() should always be called to check that the
umem meets the page_size requirement. If there is a limited set of
page_sizes that work, the pgsz_bitmap should be set to that set. A return
of 0 is a failure and the umem cannot be used.

Lightly tidy the control flow to implement this flow properly.
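
As a worked note on the bitmap construction in the hunk below (assuming
the usual 4K PAGE_SIZE): for fixed_page the bitmap is exactly
1 << page_shift, so only that one size is acceptable; otherwise
GENMASK(page_shift, PAGE_SHIFT) permits every page size from PAGE_SIZE
up to 1 << page_shift, e.g. page_shift = 16 allows 4K through 64K, and a
0 return from ib_umem_find_best_pgsz() means none of them fit.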

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/hns/hns_roce_mr.c | 49 ++++++++++---------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index e5df3884b41dda..16699f6bb03a51 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -707,19 +707,6 @@ static inline size_t mtr_bufs_size(struct hns_roce_buf_attr *attr)
 	return size;
 }
 
-static inline int mtr_umem_page_count(struct ib_umem *umem,
-				      unsigned int page_shift)
-{
-	int count = ib_umem_page_count(umem);
-
-	if (page_shift >= PAGE_SHIFT)
-		count >>= page_shift - PAGE_SHIFT;
-	else
-		count <<= PAGE_SHIFT - page_shift;
-
-	return count;
-}
-
 static inline size_t mtr_kmem_direct_size(bool is_direct, size_t alloc_size,
 					  unsigned int page_shift)
 {
@@ -767,12 +754,10 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
 			  struct ib_udata *udata, unsigned long user_addr)
 {
 	struct ib_device *ibdev = &hr_dev->ib_dev;
-	unsigned int max_pg_shift = buf_attr->page_shift;
-	unsigned int best_pg_shift = 0;
+	unsigned int best_pg_shift;
 	int all_pg_count = 0;
 	size_t direct_size;
 	size_t total_size;
-	unsigned long tmp;
 	int ret = 0;
 
 	total_size = mtr_bufs_size(buf_attr);
@@ -782,6 +767,9 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
 	}
 
 	if (udata) {
+		unsigned long pgsz_bitmap;
+		unsigned long page_size;
+
 		mtr->kmem = NULL;
 		mtr->umem = ib_umem_get(ibdev, user_addr, total_size,
 					buf_attr->user_access);
@@ -790,15 +778,17 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
 				  PTR_ERR(mtr->umem));
 			return -ENOMEM;
 		}
-		if (buf_attr->fixed_page) {
-			best_pg_shift = max_pg_shift;
-		} else {
-			tmp = GENMASK(max_pg_shift, 0);
-			ret = ib_umem_find_best_pgsz(mtr->umem, tmp, user_addr);
-			best_pg_shift = (ret <= PAGE_SIZE) ?
-					PAGE_SHIFT : ilog2(ret);
-		}
-		all_pg_count = mtr_umem_page_count(mtr->umem, best_pg_shift);
+		if (buf_attr->fixed_page)
+			pgsz_bitmap = 1 << buf_attr->page_shift;
+		else
+			pgsz_bitmap = GENMASK(buf_attr->page_shift, PAGE_SHIFT);
+
+		page_size = ib_umem_find_best_pgsz(mtr->umem, pgsz_bitmap,
+						   user_addr);
+		if (!page_size)
+			return -EINVAL;
+		best_pg_shift = order_base_2(page_size);
+		all_pg_count = ib_umem_num_dma_blocks(mtr->umem, page_size);
 		ret = 0;
 	} else {
 		mtr->umem = NULL;
@@ -808,16 +798,15 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
 			return -ENOMEM;
 		}
 		direct_size = mtr_kmem_direct_size(is_direct, total_size,
-						   max_pg_shift);
+						   buf_attr->page_shift);
 		ret = hns_roce_buf_alloc(hr_dev, total_size, direct_size,
-					 mtr->kmem, max_pg_shift);
+					 mtr->kmem, buf_attr->page_shift);
 		if (ret) {
 			ibdev_err(ibdev, "Failed to alloc kmem, ret %d\n", ret);
 			goto err_alloc_mem;
-		} else {
-			best_pg_shift = max_pg_shift;
-			all_pg_count = mtr->kmem->npages;
 		}
+		best_pg_shift = buf_attr->page_shift;
+		all_pg_count = mtr->kmem->npages;
 	}
 
 	/* must bigger than minimum hardware page shift */
-- 
2.28.0



* [PATCH v2 13/17] RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (11 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 12/17] RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 14/17] RDMA/pvrdma: " Jason Gunthorpe
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Devesh Sharma, Doug Ledford, linux-rdma, Selvin Xavier

This driver always uses a DMA array made up of PAGE_SIZE elements, so just
use ib_umem_num_dma_blocks().

Since rdma_umem_for_each_dma_block() always iterates exactly
ib_umem_num_dma_blocks() times, there is no need for the early exit check
in build_user_pbes(); delete it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 933b297de2ba86..1fb8da6d613674 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -810,13 +810,12 @@ static int ocrdma_build_pbl_tbl(struct ocrdma_dev *dev, struct ocrdma_hw_mr *mr)
 	return status;
 }
 
-static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
-			    u32 num_pbes)
+static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr)
 {
 	struct ocrdma_pbe *pbe;
 	struct ib_block_iter biter;
 	struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table;
-	int pbe_cnt, total_num_pbes = 0;
+	int pbe_cnt;
 	u64 pg_addr;
 
 	if (!mr->hwmr.num_pbes)
@@ -831,13 +830,8 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 		pbe->pa_lo = cpu_to_le32(pg_addr);
 		pbe->pa_hi = cpu_to_le32(upper_32_bits(pg_addr));
 		pbe_cnt += 1;
-		total_num_pbes += 1;
 		pbe++;
 
-		/* if done building pbes, issue the mbx cmd. */
-		if (total_num_pbes == num_pbes)
-			return;
-
 		/* if the given pbl is full storing the pbes,
 		 * move to next pbl.
 		 */
@@ -856,7 +850,6 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
 	struct ocrdma_mr *mr;
 	struct ocrdma_pd *pd;
-	u32 num_pbes;
 
 	pd = get_ocrdma_pd(ibpd);
 
@@ -871,8 +864,8 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 		status = -EFAULT;
 		goto umem_err;
 	}
-	num_pbes = ib_umem_page_count(mr->umem);
-	status = ocrdma_get_pbl_info(dev, mr, num_pbes);
+	status = ocrdma_get_pbl_info(
+		dev, mr, ib_umem_num_dma_blocks(mr->umem, PAGE_SIZE));
 	if (status)
 		goto umem_err;
 
@@ -888,7 +881,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	status = ocrdma_build_pbl_tbl(dev, &mr->hwmr);
 	if (status)
 		goto umem_err;
-	build_user_pbes(dev, mr, num_pbes);
+	build_user_pbes(dev, mr);
 	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, acc);
 	if (status)
 		goto mbx_err;
-- 
2.28.0



* [PATCH v2 14/17] RDMA/pvrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (12 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 13/17] RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 15/17] RDMA/mlx4: Use ib_umem_num_dma_blocks() Jason Gunthorpe
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Adit Ranadive, Doug Ledford, linux-rdma, VMware PV-Drivers

This driver always uses PAGE_SIZE.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  | 2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  | 6 ++++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
index 01cd122a8b692f..ad7dfababf1fe1 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -142,7 +142,7 @@ int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 			goto err_cq;
 		}
 
-		npages = ib_umem_page_count(cq->umem);
+		npages = ib_umem_num_dma_blocks(cq->umem, PAGE_SIZE);
 	} else {
 		/* One extra page for shared ring state */
 		npages = 1 + (entries * sizeof(struct pvrdma_cqe) +
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
index 9a8f2a9507be07..8a385acf6f0c42 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
@@ -298,9 +298,11 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
 				goto err_qp;
 			}
 
-			qp->npages_send = ib_umem_page_count(qp->sumem);
+			qp->npages_send =
+				ib_umem_num_dma_blocks(qp->sumem, PAGE_SIZE);
 			if (!is_srq)
-				qp->npages_recv = ib_umem_page_count(qp->rumem);
+				qp->npages_recv = ib_umem_num_dma_blocks(
+					qp->rumem, PAGE_SIZE);
 			else
 				qp->npages_recv = 0;
 			qp->npages = qp->npages_send + qp->npages_recv;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
index f60a8e81bddddb..6fd843cff21e70 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c
@@ -152,7 +152,7 @@ int pvrdma_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
 		goto err_srq;
 	}
 
-	srq->npages = ib_umem_page_count(srq->umem);
+	srq->npages = ib_umem_num_dma_blocks(srq->umem, PAGE_SIZE);
 
 	if (srq->npages < 0 || srq->npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
 		dev_warn(&dev->pdev->dev,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 15/17] RDMA/mlx4: Use ib_umem_num_dma_blocks()
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (13 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 14/17] RDMA/pvrdma: " Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-04 22:41 ` [PATCH v2 16/17] RDMA/qedr: Remove fbo and zbva from the MR Jason Gunthorpe
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma, Yishai Hadas

For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size(), compute the
count with ib_umem_num_dma_blocks() inside the function itself; the value
the callers passed in was just some weird static default.

All other places are just using it with PAGE_SIZE, so switch them to
ib_umem_num_dma_blocks().

As these are the last call sites, remove ib_umem_page_count().
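
Roughly, the resulting flow is (a sketch using the real names but with
simplified context, not the diff itself):

	/* callers no longer pre-compute n with ib_umem_page_count(): */
	shift = mlx4_ib_umem_calc_optimal_mtt_size(umem, start_va, &n);

	/* ...because the function itself now starts from the PAGE_SIZE count: */
	*num_of_mtts = ib_umem_num_dma_blocks(umem, PAGE_SIZE);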

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/core/umem.c   | 12 ------------
 drivers/infiniband/hw/mlx4/cq.c  |  1 -
 drivers/infiniband/hw/mlx4/mr.c  |  5 +++--
 drivers/infiniband/hw/mlx4/qp.c  |  2 --
 drivers/infiniband/hw/mlx4/srq.c |  5 +++--
 include/rdma/ib_umem.h           |  2 --
 6 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index b57dbb14de8378..c1ab6a4f2bc386 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -350,18 +350,6 @@ void ib_umem_release(struct ib_umem *umem)
 }
 EXPORT_SYMBOL(ib_umem_release);
 
-int ib_umem_page_count(struct ib_umem *umem)
-{
-	int i, n = 0;
-	struct scatterlist *sg;
-
-	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
-		n += sg_dma_len(sg) >> PAGE_SHIFT;
-
-	return n;
-}
-EXPORT_SYMBOL(ib_umem_page_count);
-
 /*
  * Copy from the given ib_umem's pages to the given buffer.
  *
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 8a3436994f8097..f62afc13d34885 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -149,7 +149,6 @@ static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_udata *udata,
 	if (IS_ERR(*umem))
 		return PTR_ERR(*umem);
 
-	n = ib_umem_page_count(*umem);
 	shift = mlx4_ib_umem_calc_optimal_mtt_size(*umem, 0, &n);
 	err = mlx4_mtt_init(dev->dev, n, shift, &buf->mtt);
 
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 1d5ef0de12c950..bfb779b5eeb3d2 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -271,6 +271,8 @@ int mlx4_ib_umem_calc_optimal_mtt_size(struct ib_umem *umem, u64 start_va,
 	u64 total_len = 0;
 	int i;
 
+	*num_of_mtts = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
+
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
 		/*
 		 * Initialization - save the first chunk start as the
@@ -421,7 +423,6 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err_free;
 	}
 
-	n = ib_umem_page_count(mr->umem);
 	shift = mlx4_ib_umem_calc_optimal_mtt_size(mr->umem, start, &n);
 
 	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
@@ -511,7 +512,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 			mmr->umem = NULL;
 			goto release_mpt_entry;
 		}
-		n = ib_umem_page_count(mmr->umem);
+		n = ib_umem_num_dma_blocks(mmr->umem, PAGE_SIZE);
 		shift = PAGE_SHIFT;
 
 		err = mlx4_mr_rereg_mem_write(dev->dev, &mmr->mmr,
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 2975f350b9fd10..31839f95d44af9 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -922,7 +922,6 @@ static int create_rq(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 		goto err;
 	}
 
-	n = ib_umem_page_count(qp->umem);
 	shift = mlx4_ib_umem_calc_optimal_mtt_size(qp->umem, 0, &n);
 	err = mlx4_mtt_init(dev->dev, n, shift, &qp->mtt);
 
@@ -1117,7 +1116,6 @@ static int create_qp_common(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 			goto err;
 		}
 
-		n = ib_umem_page_count(qp->umem);
 		shift = mlx4_ib_umem_calc_optimal_mtt_size(qp->umem, 0, &n);
 		err = mlx4_mtt_init(dev->dev, n, shift, &qp->mtt);
 
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 8f9d5035142d33..108b2d0118d064 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -115,8 +115,9 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
 		if (IS_ERR(srq->umem))
 			return PTR_ERR(srq->umem);
 
-		err = mlx4_mtt_init(dev->dev, ib_umem_page_count(srq->umem),
-				    PAGE_SHIFT, &srq->mtt);
+		err = mlx4_mtt_init(
+			dev->dev, ib_umem_num_dma_blocks(srq->umem, PAGE_SIZE),
+			PAGE_SHIFT, &srq->mtt);
 		if (err)
 			goto err_buf;
 
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 0f1ab3d8f77dea..fa556da3337c86 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -74,7 +74,6 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 			    size_t size, int access);
 void ib_umem_release(struct ib_umem *umem);
-int ib_umem_page_count(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
 unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
@@ -92,7 +91,6 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
 	return ERR_PTR(-EINVAL);
 }
 static inline void ib_umem_release(struct ib_umem *umem) { }
-static inline int ib_umem_page_count(struct ib_umem *umem) { return 0; }
 static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      		    size_t length) {
 	return -EINVAL;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 16/17] RDMA/qedr: Remove fbo and zbva from the MR
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (14 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 15/17] RDMA/mlx4: Use ib_umem_num_dma_blocks() Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-06  8:01   ` [EXT] " Michal Kalderon
  2020-09-04 22:41 ` [PATCH v2 17/17] RDMA/ocrdma: Remove fbo from MR Jason Gunthorpe
  2020-09-09 18:38 ` [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
  17 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Ariel Elior, David S. Miller, Doug Ledford, GR-everest-linux-l2,
	Jakub Kicinski, linux-rdma, Michal Kalderon, netdev

zbva is always false, so fbo is never read.

A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already
supports this.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/qedr/verbs.c         |  4 ----
 drivers/net/ethernet/qlogic/qed/qed_rdma.c | 12 ++----------
 include/linux/qed/qed_rdma_if.h            |  2 --
 3 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 278b48443aedba..cca69b4ed354ea 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2878,10 +2878,8 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
 	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
 	mr->hw_mr.page_size_log = PAGE_SHIFT;
-	mr->hw_mr.fbo = ib_umem_offset(mr->umem);
 	mr->hw_mr.length = len;
 	mr->hw_mr.vaddr = usr_addr;
-	mr->hw_mr.zbva = false;
 	mr->hw_mr.phy_mr = false;
 	mr->hw_mr.dma_mr = false;
 
@@ -2974,10 +2972,8 @@ static struct qedr_mr *__qedr_alloc_mr(struct ib_pd *ibpd,
 	mr->hw_mr.pbl_ptr = 0;
 	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
 	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
-	mr->hw_mr.fbo = 0;
 	mr->hw_mr.length = 0;
 	mr->hw_mr.vaddr = 0;
-	mr->hw_mr.zbva = false;
 	mr->hw_mr.phy_mr = true;
 	mr->hw_mr.dma_mr = false;
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a4bcde522cdf9d..baa4c36608ea91 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -1520,7 +1520,7 @@ qed_rdma_register_tid(void *rdma_cxt,
 		  params->pbl_two_level);
 
 	SET_FIELD(flags, RDMA_REGISTER_TID_RAMROD_DATA_ZERO_BASED,
-		  params->zbva);
+		  false);
 
 	SET_FIELD(flags, RDMA_REGISTER_TID_RAMROD_DATA_PHY_MR, params->phy_mr);
 
@@ -1582,15 +1582,7 @@ qed_rdma_register_tid(void *rdma_cxt,
 	p_ramrod->pd = cpu_to_le16(params->pd);
 	p_ramrod->length_hi = (u8)(params->length >> 32);
 	p_ramrod->length_lo = DMA_LO_LE(params->length);
-	if (params->zbva) {
-		/* Lower 32 bits of the registered MR address.
-		 * In case of zero based MR, will hold FBO
-		 */
-		p_ramrod->va.hi = 0;
-		p_ramrod->va.lo = cpu_to_le32(params->fbo);
-	} else {
-		DMA_REGPAIR_LE(p_ramrod->va, params->vaddr);
-	}
+	DMA_REGPAIR_LE(p_ramrod->va, params->vaddr);
 	DMA_REGPAIR_LE(p_ramrod->pbl_base, params->pbl_ptr);
 
 	/* DIF */
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index f464d85e88a410..aeb242cefebfa8 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -242,10 +242,8 @@ struct qed_rdma_register_tid_in_params {
 	bool pbl_two_level;
 	u8 pbl_page_size_log;
 	u8 page_size_log;
-	u32 fbo;
 	u64 length;
 	u64 vaddr;
-	bool zbva;
 	bool phy_mr;
 	bool dma_mr;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 17/17] RDMA/ocrdma: Remove fbo from MR
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (15 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 16/17] RDMA/qedr: Remove fbo and zbva from the MR Jason Gunthorpe
@ 2020-09-04 22:41 ` Jason Gunthorpe
  2020-09-06  7:21   ` Leon Romanovsky
  2020-09-09 18:38 ` [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
  17 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-04 22:41 UTC (permalink / raw)
  To: Devesh Sharma, Doug Ledford, linux-rdma, Selvin Xavier

The fbo is always the same value as the IOVA masked by the page size; just
use that clearer calculation directly.

It is unclear whether the ocrdma hardware can actually support a true fbo;
if so, it could use a different algorithm to compute the best page size.
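
For illustration (made-up numbers, not part of the patch), with pbe_size set
to PAGE_SIZE the computation is just the page offset bits of the VA:

	/* Hypothetical values; ocrdma sets pbe_size = PAGE_SIZE */
	u64 va = 0x7f32a4c13200;	/* MR virtual address (IOVA) */
	u32 pbe_size = 4096;
	u64 fbo = va & (pbe_size - 1);	/* == 0x200, the first byte offset */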

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       | 1 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c    | 5 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 1 -
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index fcfe0e82197a24..5eb61c1100900d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -185,7 +185,6 @@ struct ocrdma_hw_mr {
 	u32 num_pbes;
 	u32 pbl_size;
 	u32 pbe_size;
-	u64 fbo;
 	u64 va;
 };
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index e07bf0b2209a4c..18ed658f8dba10 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1962,6 +1962,7 @@ static int ocrdma_mbx_reg_mr(struct ocrdma_dev *dev, struct ocrdma_hw_mr *hwmr,
 	int i;
 	struct ocrdma_reg_nsmr *cmd;
 	struct ocrdma_reg_nsmr_rsp *rsp;
+	u64 fbo = hwmr->va & (hwmr->pbe_size - 1);
 
 	cmd = ocrdma_init_emb_mqe(OCRDMA_CMD_REGISTER_NSMR, sizeof(*cmd));
 	if (!cmd)
@@ -1987,8 +1988,8 @@ static int ocrdma_mbx_reg_mr(struct ocrdma_dev *dev, struct ocrdma_hw_mr *hwmr,
 					OCRDMA_REG_NSMR_HPAGE_SIZE_SHIFT;
 	cmd->totlen_low = hwmr->len;
 	cmd->totlen_high = upper_32_bits(hwmr->len);
-	cmd->fbo_low = (u32) (hwmr->fbo & 0xffffffff);
-	cmd->fbo_high = (u32) upper_32_bits(hwmr->fbo);
+	cmd->fbo_low = (u32) (fbo & 0xffffffff);
+	cmd->fbo_high = (u32) upper_32_bits(fbo);
 	cmd->va_loaddr = (u32) hwmr->va;
 	cmd->va_hiaddr = (u32) upper_32_bits(hwmr->va);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 1fb8da6d613674..3b98a3b3e2272d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -870,7 +870,6 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 		goto umem_err;
 
 	mr->hwmr.pbe_size = PAGE_SIZE;
-	mr->hwmr.fbo = ib_umem_offset(mr->umem);
 	mr->hwmr.va = usr_addr;
 	mr->hwmr.len = len;
 	mr->hwmr.remote_wr = (acc & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 17/17] RDMA/ocrdma: Remove fbo from MR
  2020-09-04 22:41 ` [PATCH v2 17/17] RDMA/ocrdma: Remove fbo from MR Jason Gunthorpe
@ 2020-09-06  7:21   ` Leon Romanovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2020-09-06  7:21 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Devesh Sharma, Doug Ledford, linux-rdma, Selvin Xavier

On Fri, Sep 04, 2020 at 07:41:58PM -0300, Jason Gunthorpe wrote:
> The fbo is always the same value as the IOVA masked by the page size; just
> use that clearer calculation directly.
>
> It is unclear whether the ocrdma hardware can actually support a true fbo;
> if so, it could use a different algorithm to compute the best page size.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/hw/ocrdma/ocrdma.h       | 1 -
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c    | 5 +++--
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 1 -
>  3 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
> index fcfe0e82197a24..5eb61c1100900d 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma.h
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
> @@ -185,7 +185,6 @@ struct ocrdma_hw_mr {
>  	u32 num_pbes;
>  	u32 pbl_size;
>  	u32 pbe_size;
> -	u64 fbo;
>  	u64 va;
>  };
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> index e07bf0b2209a4c..18ed658f8dba10 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> @@ -1962,6 +1962,7 @@ static int ocrdma_mbx_reg_mr(struct ocrdma_dev *dev, struct ocrdma_hw_mr *hwmr,
>  	int i;
>  	struct ocrdma_reg_nsmr *cmd;
>  	struct ocrdma_reg_nsmr_rsp *rsp;
> +	u64 fbo = hwmr->va & (hwmr->pbe_size - 1);
>
>  	cmd = ocrdma_init_emb_mqe(OCRDMA_CMD_REGISTER_NSMR, sizeof(*cmd));
>  	if (!cmd)
> @@ -1987,8 +1988,8 @@ static int ocrdma_mbx_reg_mr(struct ocrdma_dev *dev, struct ocrdma_hw_mr *hwmr,
>  					OCRDMA_REG_NSMR_HPAGE_SIZE_SHIFT;
>  	cmd->totlen_low = hwmr->len;
>  	cmd->totlen_high = upper_32_bits(hwmr->len);
> -	cmd->fbo_low = (u32) (hwmr->fbo & 0xffffffff);
> -	cmd->fbo_high = (u32) upper_32_bits(hwmr->fbo);
> +	cmd->fbo_low = (u32) (fbo & 0xffffffff);

lower_32_bits(fbo)

> +	cmd->fbo_high = (u32) upper_32_bits(fbo);

u32 casting is not necessary.

>  	cmd->va_loaddr = (u32) hwmr->va;
>  	cmd->va_hiaddr = (u32) upper_32_bits(hwmr->va);
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index 1fb8da6d613674..3b98a3b3e2272d 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -870,7 +870,6 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
>  		goto umem_err;
>
>  	mr->hwmr.pbe_size = PAGE_SIZE;
> -	mr->hwmr.fbo = ib_umem_offset(mr->umem);
>  	mr->hwmr.va = usr_addr;
>  	mr->hwmr.len = len;
>  	mr->hwmr.remote_wr = (acc & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [EXT] [PATCH v2 16/17] RDMA/qedr: Remove fbo and zbva from the MR
  2020-09-04 22:41 ` [PATCH v2 16/17] RDMA/qedr: Remove fbo and zbva from the MR Jason Gunthorpe
@ 2020-09-06  8:01   ` Michal Kalderon
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Kalderon @ 2020-09-06  8:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Ariel Elior, David S. Miller, Doug Ledford,
	GR-everest-linux-l2, Jakub Kicinski, linux-rdma, netdev

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, September 5, 2020 1:42 AM
> zbva is always false, so fbo is never read.
> 
> A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already
> supports this.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/hw/qedr/verbs.c         |  4 ----
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c | 12 ++----------
>  include/linux/qed/qed_rdma_if.h            |  2 --
>  3 files changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index 278b48443aedba..cca69b4ed354ea 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2878,10 +2878,8 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd
> *ibpd, u64 start, u64 len,
>  	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
>  	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
>  	mr->hw_mr.page_size_log = PAGE_SHIFT;
> -	mr->hw_mr.fbo = ib_umem_offset(mr->umem);
>  	mr->hw_mr.length = len;
>  	mr->hw_mr.vaddr = usr_addr;
> -	mr->hw_mr.zbva = false;
>  	mr->hw_mr.phy_mr = false;
>  	mr->hw_mr.dma_mr = false;
> 
> @@ -2974,10 +2972,8 @@ static struct qedr_mr *__qedr_alloc_mr(struct
> ib_pd *ibpd,
>  	mr->hw_mr.pbl_ptr = 0;
>  	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
>  	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
> -	mr->hw_mr.fbo = 0;
>  	mr->hw_mr.length = 0;
>  	mr->hw_mr.vaddr = 0;
> -	mr->hw_mr.zbva = false;
>  	mr->hw_mr.phy_mr = true;
>  	mr->hw_mr.dma_mr = false;
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> index a4bcde522cdf9d..baa4c36608ea91 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> @@ -1520,7 +1520,7 @@ qed_rdma_register_tid(void *rdma_cxt,
>  		  params->pbl_two_level);
> 
>  	SET_FIELD(flags,
> RDMA_REGISTER_TID_RAMROD_DATA_ZERO_BASED,
> -		  params->zbva);
> +		  false);
> 
>  	SET_FIELD(flags, RDMA_REGISTER_TID_RAMROD_DATA_PHY_MR,
> params->phy_mr);
> 
> @@ -1582,15 +1582,7 @@ qed_rdma_register_tid(void *rdma_cxt,
>  	p_ramrod->pd = cpu_to_le16(params->pd);
>  	p_ramrod->length_hi = (u8)(params->length >> 32);
>  	p_ramrod->length_lo = DMA_LO_LE(params->length);
> -	if (params->zbva) {
> -		/* Lower 32 bits of the registered MR address.
> -		 * In case of zero based MR, will hold FBO
> -		 */
> -		p_ramrod->va.hi = 0;
> -		p_ramrod->va.lo = cpu_to_le32(params->fbo);
> -	} else {
> -		DMA_REGPAIR_LE(p_ramrod->va, params->vaddr);
> -	}
> +	DMA_REGPAIR_LE(p_ramrod->va, params->vaddr);
>  	DMA_REGPAIR_LE(p_ramrod->pbl_base, params->pbl_ptr);
> 
>  	/* DIF */
> diff --git a/include/linux/qed/qed_rdma_if.h
> b/include/linux/qed/qed_rdma_if.h index f464d85e88a410..aeb242cefebfa8
> 100644
> --- a/include/linux/qed/qed_rdma_if.h
> +++ b/include/linux/qed/qed_rdma_if.h
> @@ -242,10 +242,8 @@ struct qed_rdma_register_tid_in_params {
>  	bool pbl_two_level;
>  	u8 pbl_page_size_log;
>  	u8 page_size_log;
> -	u32 fbo;
>  	u64 length;
>  	u64 vaddr;
> -	bool zbva;
>  	bool phy_mr;
>  	bool dma_mr;
> 
> --
> 2.28.0

Thanks, 

Acked-by: Michal Kalderon <michal.kalderon@marvell.com>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 12/17] RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding
  2020-09-04 22:41 ` [PATCH v2 12/17] RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding Jason Gunthorpe
@ 2020-09-07  8:11   ` liweihang
  0 siblings, 0 replies; 28+ messages in thread
From: liweihang @ 2020-09-07  8:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Doug Ledford, Huwei (Xavier), linux-rdma, oulijun

On 2020/9/5 6:42, Jason Gunthorpe wrote:
> mtr_umem_page_count() does the same thing, replace it with the core code.
> 
> Also, ib_umem_find_best_pgsz() should always be called to check that the
> umem meets the page_size requirement. If there is a limited set of
> page_sizes that work it the pgsz_bitmap should be set to that set. 0 is a
> failure and the umem cannot be used.
> 
> Lightly tidy the control flow to implement this flow properly.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_mr.c | 49 ++++++++++---------------
>  1 file changed, 19 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> index e5df3884b41dda..16699f6bb03a51 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> @@ -707,19 +707,6 @@ static inline size_t mtr_bufs_size(struct hns_roce_buf_attr *attr)
>  	return size;
>  }
>  
> -static inline int mtr_umem_page_count(struct ib_umem *umem,
> -				      unsigned int page_shift)
> -{
> -	int count = ib_umem_page_count(umem);
> -
> -	if (page_shift >= PAGE_SHIFT)
> -		count >>= page_shift - PAGE_SHIFT;
> -	else
> -		count <<= PAGE_SHIFT - page_shift;
> -
> -	return count;
> -}
> -
>  static inline size_t mtr_kmem_direct_size(bool is_direct, size_t alloc_size,
>  					  unsigned int page_shift)
>  {
> @@ -767,12 +754,10 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
>  			  struct ib_udata *udata, unsigned long user_addr)
>  {
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
> -	unsigned int max_pg_shift = buf_attr->page_shift;
> -	unsigned int best_pg_shift = 0;
> +	unsigned int best_pg_shift;
>  	int all_pg_count = 0;
>  	size_t direct_size;
>  	size_t total_size;
> -	unsigned long tmp;
>  	int ret = 0;
>  
>  	total_size = mtr_bufs_size(buf_attr);
> @@ -782,6 +767,9 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
>  	}
>  
>  	if (udata) {
> +		unsigned long pgsz_bitmap;
> +		unsigned long page_size;
> +
>  		mtr->kmem = NULL;
>  		mtr->umem = ib_umem_get(ibdev, user_addr, total_size,
>  					buf_attr->user_access);
> @@ -790,15 +778,17 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
>  				  PTR_ERR(mtr->umem));
>  			return -ENOMEM;
>  		}
> -		if (buf_attr->fixed_page) {
> -			best_pg_shift = max_pg_shift;
> -		} else {
> -			tmp = GENMASK(max_pg_shift, 0);
> -			ret = ib_umem_find_best_pgsz(mtr->umem, tmp, user_addr);
> -			best_pg_shift = (ret <= PAGE_SIZE) ?
> -					PAGE_SHIFT : ilog2(ret);
> -		}
> -		all_pg_count = mtr_umem_page_count(mtr->umem, best_pg_shift);
> +		if (buf_attr->fixed_page)
> +			pgsz_bitmap = 1 << buf_attr->page_shift;
> +		else
> +			pgsz_bitmap = GENMASK(buf_attr->page_shift, PAGE_SHIFT);
> +
> +		page_size = ib_umem_find_best_pgsz(mtr->umem, pgsz_bitmap,
> +						   user_addr);
> +		if (!page_size)
> +			return -EINVAL;
> +		best_pg_shift = order_base_2(page_size);
> +		all_pg_count = ib_umem_num_dma_blocks(mtr->umem, page_size);
>  		ret = 0;
>  	} else {
>  		mtr->umem = NULL;
> @@ -808,16 +798,15 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
>  			return -ENOMEM;
>  		}
>  		direct_size = mtr_kmem_direct_size(is_direct, total_size,
> -						   max_pg_shift);
> +						   buf_attr->page_shift);
>  		ret = hns_roce_buf_alloc(hr_dev, total_size, direct_size,
> -					 mtr->kmem, max_pg_shift);
> +					 mtr->kmem, buf_attr->page_shift);
>  		if (ret) {
>  			ibdev_err(ibdev, "Failed to alloc kmem, ret %d\n", ret);
>  			goto err_alloc_mem;
> -		} else {
> -			best_pg_shift = max_pg_shift;
> -			all_pg_count = mtr->kmem->npages;
>  		}
> +		best_pg_shift = buf_attr->page_shift;
> +		all_pg_count = mtr->kmem->npages;
>  	}
>  
>  	/* must bigger than minimum hardware page shift */
> 

Thanks

Acked-by: Weihang Li <liweihang@huawei.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
  2020-09-04 22:41 ` [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks() Jason Gunthorpe
@ 2020-09-07 12:16   ` Gal Pressman
  2020-09-11 13:21   ` Jason Gunthorpe
  1 sibling, 0 replies; 28+ messages in thread
From: Gal Pressman @ 2020-09-07 12:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Adit Ranadive, Potnuri Bharat Teja,
	Doug Ledford, Leon Romanovsky, linux-rdma, VMware PV-Drivers

On 05/09/2020 1:41, Jason Gunthorpe wrote:
> ib_num_pages() should only be used by things working with the SGL in CPU

Nit: ib_umem_num_pages().

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages()
  2020-09-04 22:41 ` [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages() Jason Gunthorpe
@ 2020-09-07 12:19   ` Gal Pressman
  2020-09-08 13:48     ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Gal Pressman @ 2020-09-07 12:19 UTC (permalink / raw)
  To: Jason Gunthorpe, linux-rdma
  Cc: Doug Ledford, Firas JahJah, Shiraz Saleem, Yossi Leybovich

On 05/09/2020 1:41, Jason Gunthorpe wrote:
> If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
> not correct. 'start' should be 'virt'. Change it to use the core code for
> page_num and the canonical calculation of page_shift.

Should I submit a fix for stable changing start to virt?

> Fixes: 40ddb3f02083 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/hw/efa/efa_verbs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
> index d85c63a5021a70..72da0faa7ebf97 100644
> --- a/drivers/infiniband/hw/efa/efa_verbs.c
> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> @@ -4,6 +4,7 @@
>   */
>  
>  #include <linux/vmalloc.h>
> +#include <linux/log2.h>
>  
>  #include <rdma/ib_addr.h>
>  #include <rdma/ib_umem.h>
> @@ -1538,9 +1539,8 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
>  		goto err_unmap;
>  	}
>  
> -	params.page_shift = __ffs(pg_sz);
> -	params.page_num = DIV_ROUND_UP(length + (start & (pg_sz - 1)),
> -				       pg_sz);
> +	params.page_shift = order_base_2(pg_sz);

Not related to this patch, but indeed looks better :).

> +	params.page_num = ib_umem_num_dma_blocks(mr->umem, pg_sz);

Thanks,
Tested-by: Gal Pressman <galpress@amazon.com>
Acked-by: Gal Pressman <galpress@amazon.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages()
  2020-09-07 12:19   ` Gal Pressman
@ 2020-09-08 13:48     ` Jason Gunthorpe
  2020-09-09  8:18       ` Gal Pressman
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-08 13:48 UTC (permalink / raw)
  To: Gal Pressman
  Cc: linux-rdma, Doug Ledford, Firas JahJah, Shiraz Saleem, Yossi Leybovich

On Mon, Sep 07, 2020 at 03:19:54PM +0300, Gal Pressman wrote:
> On 05/09/2020 1:41, Jason Gunthorpe wrote:
> > If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
> > not correct. 'start' should be 'virt'. Change it to use the core code for
> > page_num and the canonical calculation of page_shift.
> 
> Should I submit a fix for stable changing start to virt?

I suspect EFA users never use ibv_reg_mr_iova() so won't have an
actual bug?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages()
  2020-09-08 13:48     ` Jason Gunthorpe
@ 2020-09-09  8:18       ` Gal Pressman
  2020-09-09 11:14         ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Gal Pressman @ 2020-09-09  8:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, Doug Ledford, Firas JahJah, Shiraz Saleem, Yossi Leybovich

On 08/09/2020 16:48, Jason Gunthorpe wrote:
> On Mon, Sep 07, 2020 at 03:19:54PM +0300, Gal Pressman wrote:
>> On 05/09/2020 1:41, Jason Gunthorpe wrote:
>>> If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
>>> not correct. 'start' should be 'virt'. Change it to use the core code for
>>> page_num and the canonical calculation of page_shift.
>>
>> Should I submit a fix for stable changing start to virt?
> 
> I suspect EFA users never use ibv_reg_mr_iova() so won't have an
> actual bug?

That's still a driver bug though, regardless of the userspace, so I'd rather fix it.
Should I submit a patch to for-rc? It would conflict with the for-next one.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/17] RDMA/efa: Use ib_umem_num_dma_pages()
  2020-09-09  8:18       ` Gal Pressman
@ 2020-09-09 11:14         ` Jason Gunthorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-09 11:14 UTC (permalink / raw)
  To: Gal Pressman
  Cc: linux-rdma, Doug Ledford, Firas JahJah, Shiraz Saleem, Yossi Leybovich

On Wed, Sep 09, 2020 at 11:18:49AM +0300, Gal Pressman wrote:
> On 08/09/2020 16:48, Jason Gunthorpe wrote:
> > On Mon, Sep 07, 2020 at 03:19:54PM +0300, Gal Pressman wrote:
> >> On 05/09/2020 1:41, Jason Gunthorpe wrote:
> >>> If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
> >>> not correct. 'start' should be 'virt'. Change it to use the core code for
> >>> page_num and the canonical calculation of page_shift.
> >>
> >> Should I submit a fix for stable changing start to virt?
> > 
> > I suspect EFA users never use ibv_reg_mr_iova() so won't have an
> > actual bug?
> 
> That's still a driver bug though, regardless of the userspace so I'd rather fix it.
> Should I submit a patch to for-rc? It would conflict with the for-next one.

If you care enough, then propose the parts of this series for
backporting to stable once they are merged to Linus's tree.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers
  2020-09-04 22:41 [PATCH v2 00/17] RDMA: Improve use of umem in DMA drivers Jason Gunthorpe
                   ` (16 preceding siblings ...)
  2020-09-04 22:41 ` [PATCH v2 17/17] RDMA/ocrdma: Remove fbo from MR Jason Gunthorpe
@ 2020-09-09 18:38 ` Jason Gunthorpe
  17 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-09 18:38 UTC (permalink / raw)
  To: Adit Ranadive, Ariel Elior, Potnuri Bharat Teja, David S. Miller,
	Devesh Sharma, Doug Ledford, Faisal Latif, Gal Pressman,
	GR-everest-linux-l2, Wei Hu(Xavier),
	Jakub Kicinski, Leon Romanovsky, linux-rdma, Weihang Li,
	Michal Kalderon, Naresh Kumar PBS, netdev, Lijun Ou,
	VMware PV-Drivers, Selvin Xavier, Yossi Leybovich, Somnath Kotur,
	Sriharsha Basavapatna, Yishai Hadas
  Cc: Firas JahJah, Henry Orosco, Leon Romanovsky, Michael J. Ruhl,
	Michal Kalderon, Miguel Ojeda, Shiraz Saleem

On Fri, Sep 04, 2020 at 07:41:41PM -0300, Jason Gunthorpe wrote:
> Most RDMA drivers rely on a linear table of DMA addresses organized in
> some device specific page size.
> 
> For a while now the core code has had the rdma_for_each_block() SG
> iterator to help break a umem into DMA blocks for use in the device lists.
> 
> Improve on this by adding rdma_umem_for_each_dma_block(),
> ib_umem_dma_offset() and ib_umem_num_dma_blocks().
> 
> Replace open codings, or calls to fixed PAGE_SIZE APIs, in most of the
> drivers with one of the above APIs.
> 
> Get rid of the really weird and duplicative ib_umem_page_count().
> 
> Fix two problems with ib_umem_find_best_pgsz(), and several problems
> related to computing the wrong DMA list length if IOVA != umem->address.
> 
> At this point many of the driver have a clear path to call
> ib_umem_find_best_pgsz() and replace hardcoded PAGE_SIZE or PAGE_SHIFT
> values when constructing their DMA lists.
> 
> This is the first series in an effort to modernize the umem usage in all
> the DMA drivers.
> 
> v1: https://lore.kernel.org/r/0-v1-00f59ce24f1f+19f50-umem_1_jgg@nvidia.com
> v2:
>  - Fix ib_umem_find_best_pgsz() to use IOVA not umem->addr
>  - Fix ib_umem_num_dma_blocks() to use IOVA not umem->addr
>  - Two new patches to remove wrong open coded versions of
>    ib_umem_num_dma_blocks() from EFA and i40iw
>  - Redo the mlx4 ib_umem_num_dma_blocks() to do less and be safer
>    until the whole thing can be moved to ib_umem_find_best_pgsz()
>  - Two new patches to delete calls to ib_umem_offset() in qedr and
>    ocrdma
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason Gunthorpe (17):
>   RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page
>     boundary
>   RDMA/umem: Prevent small pages from being returned by
>     ib_umem_find_best_pgsz()
>   RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()
>   RDMA/umem: Add rdma_umem_for_each_dma_block()
>   RDMA/umem: Replace for_each_sg_dma_page with
>     rdma_umem_for_each_dma_block
>   RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
>   RDMA/efa: Use ib_umem_num_dma_pages()
>   RDMA/i40iw: Use ib_umem_num_dma_pages()
>   RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding
>   RDMA/qedr: Use ib_umem_num_dma_blocks() instead of
>     ib_umem_page_count()
>   RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages()
>   RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding
>   RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of
>     ib_umem_page_count()
>   RDMA/pvrdma: Use ib_umem_num_dma_blocks() instead of
>     ib_umem_page_count()
>   RDMA/mlx4: Use ib_umem_num_dma_blocks()
>   RDMA/qedr: Remove fbo and zbva from the MR
>   RDMA/ocrdma: Remove fbo from MR

Applied to for-next with Leon's note. Thanks everyone

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
  2020-09-04 22:41 ` [PATCH v2 06/17] RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks() Jason Gunthorpe
  2020-09-07 12:16   ` Gal Pressman
@ 2020-09-11 13:21   ` Jason Gunthorpe
  1 sibling, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 13:21 UTC (permalink / raw)
  To: Adit Ranadive, Potnuri Bharat Teja, Doug Ledford,
	Leon Romanovsky, linux-rdma, VMware PV-Drivers

On Fri, Sep 04, 2020 at 07:41:47PM -0300, Jason Gunthorpe wrote:
> @@ -33,11 +34,17 @@ static inline int ib_umem_offset(struct ib_umem *umem)
>  	return umem->address & ~PAGE_MASK;
>  }
>  
> +static inline size_t ib_umem_num_dma_blocks(struct ib_umem *umem,
> +					    unsigned long pgsz)
> +{
> +	return (ALIGN(umem->iova + umem->length, pgsz) -
> +		ALIGN_DOWN(umem->iova, pgsz)) /
> +	       pgsz;
> +}

0-day says this triggers a __udivdi3 error because iova is 64 bit;
I'll change this to:

 static inline size_t ib_umem_num_dma_blocks(struct ib_umem *umem,
                                            unsigned long pgsz)
 {
-       return (ALIGN(umem->iova + umem->length, pgsz) -
-               ALIGN_DOWN(umem->iova, pgsz)) /
+       return (size_t)((ALIGN(umem->iova + umem->length, pgsz) -
+                        ALIGN_DOWN(umem->iova, pgsz))) /
               pgsz;
 }
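
As a quick sanity check with made-up numbers (illustrative only):

	/* iova = 0x11200, length = 0x2e00, pgsz = 0x1000 (hypothetical) */
	/* ALIGN(0x11200 + 0x2e00, 0x1000) = 0x14000                     */
	/* ALIGN_DOWN(0x11200, 0x1000)     = 0x11000                     */
	/* (0x14000 - 0x11000) / 0x1000    = 3 DMA blocks                */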
 

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread
