* [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

This patch series uses the doorbell overflow recovery mechanism
introduced in
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
for RDMA (RoCE and iWARP).

The first six patches modify the core code to provide helper
functions for managing mmap_xa entries: inserting, getting and freeing
them. The code is based on the efa driver's implementation.
There is still an open discussion on whether we should take
this even further and make the entire mmap handling generic. Until a
decision is made, I only created the database API and modified
the efa, qedr and siw drivers to use it. The functions are integrated
with the umap mechanism.
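
For reference, here is a minimal sketch of how a driver is expected to
use the database API (the drv_* names are made up for this example;
only the rdma_user_mmap_* calls are part of this series):

    struct drv_user_mmap_entry {
            struct rdma_user_mmap_entry rdma_entry;
            u64 address;
            u8 mmap_flag;
    };

    /* Allocate a wrapper entry, insert it into the ucontext mmap_xa and
     * return the key that user space will later pass as the mmap() offset.
     */
    static int drv_mmap_entry_setup(struct ib_ucontext *uctx, u64 address,
                                    size_t length, u8 mmap_flag, u64 *key)
    {
            struct drv_user_mmap_entry *entry;
            int err;

            entry = kzalloc(sizeof(*entry), GFP_KERNEL);
            if (!entry)
                    return -ENOMEM;

            entry->address = address;
            entry->mmap_flag = mmap_flag;

            err = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
                                              length);
            if (err) {
                    kfree(entry);
                    return err;
            }

            *key = rdma_user_mmap_get_key(&entry->rdma_entry);
            return 0;
    }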

The doorbell recovery code is based on the common code.

rdma-core pull request #493 was closed for now; it will be reopened
once the kernel series is accepted.

This series applies on top of the wip/jgg-for-next branch rather than
for-next, since that branch already contains the series:
RDMA/qedr: Fix memory leaks and synchronization
https://www.spinics.net/lists/linux-rdma/msg85242.html

SIW driver was reviewed, tested and signed-off by Bernard Metzler.

Changes from v11:
- Linking the umap_priv and the mmap entry should only be done for entries
  mapped with rdma_user_mmap_io since for cpu pages, the mm performs the
  reference counting. Therefore only rdma_user_mmap_io sets the
  vm_private_data and now accepts a new parameter from drivers if they use
  the rdma_user_mmap api.

- QEDR - the dpi hw resource used for mapping the doorbell to the user is
  only freed in mmap_free, once we know there are no more mappings open
  to it.

- QEDR - modify cpu page address to be void * instead of u64, use a union
  for io and cpu address.

- Add an invalidate flag to a mmap_xa entry to disable future mappings to
  the offset.

- In the uverbs_user_mmap_disassociate function, fix the location that drops
  the mmap_xa entry reference, and set the pointer to NULL.

- Instead of storing the key in the drivers, store the rdma_user_mmap_entry
  pointer allocated by the driver; this gets rid of INVALID_KEY.
  This change affected EFA / SIW / QEDR.

- Separated the code that links the umap + mmap_entry to a different patch.

- Store NULL in priv->entry after reference is dropped in umap_close to avoid
  double put.

- Fixed some comments and function error flows.

Changes from v10:
- SIW - remove keys only when user context exists.
- EFA - fix debug print in error path of mmap.

Changes from V9:
- EFA changes (requested by Gal)
  - Entries should be removed and deleted in destroy_qp flow only after
    they are unmapped.
  - Fix mmap entries remove to really be in reverse order :-)
  - Always refer to the rdma_user_mmap_entry as rdma_entry for consistency.
  - In case of errors in __efa_mmap, print an error message only once.
- SIW changes (requested by Bernard)
  - No need to check if the key is invalid before removing it. There are
    already sanity checks inside the remove function.

Changes from V8:
- CORE changes
  - Fix race between getting an entry and deleting it. Increase
    the refcount under the lock only if it is not zero.  Erase all entries
    with __xa_erase instead of xa_erase and take the lock outside the loop.
  - Fix comment when erasing all the xa_entries of a single mmap_entry.
  - Take comment out of loop
  - Change length field in driver structures to be size_t instead of u64
    suggested by Bernard Metzler
  - Change do..while(true) to while(true)
- COMMON driver changes
  - Change mmap length to be size_t instead of u64.
  - In mmap, call put_entry if there is a length error.
- EFA changes:
  - Reverse mmap entries remove order.
  - Give meaningful label names in create_qp error flows.
  - In error flow undo change that frees pages based only on key and
    make sure rq_size > 0 first.
  - Fix xmas tree alignment, move ucontext initialization to declaration
    line.
- SIW changes:
  - Changes received from Bernard Metzler
	- make the siw_user_mmap_entry.address a void *, which
	  naturally fits with remap_vmalloc_range. also avoids
	  other casting during resource address assignment.
	- do not kfree SQ/RQ/CQ/SRQ in preparation of mmap.
	  Those resources are always further needed ;)
	- Fix check for correct mmap range in siw_mmap().
	  - entry->length is the object length. We have to
	    expand to PAGE_ALIGN(entry->length), since mmap
	    comes with complete page(s) containing the
	    object.
	  - put mmap_entry if that check fails. Otherwise
	    entry object ref counting screws up, and later
	    crashes during context close.
	- simplify siw_mmap_free() - it must just free
	  the entry.
  - Change length to size_t instead of u64

Changes from V7:
- Remove license text, SPDX id should suffice.
- Fix some comments text.
- Add comment regarding vm_ops being set in ib_uverbs_mmap.
- Allocate the rdma_user_mmap_entry in the driver and not in the
  ib_core_uverbs. This led to defining three new structures per driver
  and separating the fields between the driver private structures and
  the common rdma_user_mmap_entry. Freeing the entry was also moved
  to the drivers.
- Fix bug found by Gal Pressman. Call mmap_free only once per entry.
- Add a mutex around the mmap_xa insert to ensure threads won't interfere
  while the xa lock is released when inserting an entry into the range.
- Modify the insert algorithm to be more elegant using the
  xas_next_entry instead of foreach.
- Remove the rdma_user_mmap_entries_remove_free function; now that umap
  and mmap_xa are integrated we should not have any entries in the mmap_xa
  when the ucontext is released. Replace the function with a WARN_ON(!xa_empty).
- Rdma_umap_open needs to reset the vm_private_data before initializing it.
- Decrease rdma_user_mmap_entry reference count on mmap disassociate.
- Remove WARN_ON(!kref_read) this is checked when kref debug is on.
- Remove some redundant defines from ib_verbs.h.
- Better error handling for efa create qp flow.
- Add a function that wraps the entry allocation and rdma_user_mmap_entry_insert
  which is used in all places that need to add an entry to the xarray.
- Remove rq_entry_inserted field in efa create qp flow.
- Add mmap_free to siw and free the memory only on mmap free and not before.

Changes from V6:
- Modified series description to be closer to what the series is now.
- Create a new file for the new rdma_user_mmap functions. The file
  is called ib_core_uverbs.c. It contains user-related functions that
  are called by the hw drivers, as a step toward eventually enabling
  ib_uverbs to be optional.
- Modify SIW driver to use new mmap api.
- When calculating the number of pages, round the length up to PAGE_SIZE.
- Integrate the mmap_xa and umap mechanism so that the entries in
  mmap_xa now have a reference count and can be removed. Previously
  entries existed until context was destroyed. This modified the
  algorithm for allocating a free page range.
- Modify algorithm for inserting an entry into the mmap_xa.
- Rdma_umap_priv is now also used for all mmaps done using the
  mmap_xa helpers.
- Move remove_free header to core_priv.
- Rdma_user_mmap_entry now has a kref that is increased on mmap
  and umap_open and decreased on umap_close.
- Modify efa + qedr to remove the entry from xa_map. This will
  decrease the refcnt and free memory only if refcnt is zero.
- Rdma_user_mmap_io was slightly modified so that drivers not using the
  mmap_xa API can continue using it.
- Modify page allocation for user to use GFP_USER instead of GFP_KERNEL

Changes from V5:
- Swap the order of the driver dealloc_ucontext and mmap_entries_remove calls.
- No need to verify the key after using the key to load an entry from
  the mmap_xa.
- Change mmap_free api to pass an 'entry' object.
- Add documentation for mmap_free and for newly exported functions.
- Fix some extra/missing line breaks.

Changes from V4:
- Add common mmap database and cookie helper functions.

Changes from V3:
- Remove casts from void to u8. Pointer arithmetic can be done on void pointers.
- rebase to tip of rdma-next

Changes from V2:
- Don't use a long-lived kmap. Instead use a user-triggered mmap for the
  doorbell recovery entries.
- Modify dpi_addr to be denoted with __iomem and avoid redundant
  casts

Changes from V1:
- call kmap to map virtual address into kernel space
- modify db_rec_delete to be void
- remove some cpu_to_le16 conversions that were added in a previous patch;
  they are correct but not related to the overflow recovery mechanism and
  will be submitted as part of a different patch


Michal Kalderon (8):
  RDMA/core: Move core content from ib_uverbs to ib_core
  RDMA/core: Create mmap database and cookie helper functions
  RDMA: Connect between the mmap entry and the umap_priv structure
  RDMA/efa: Use the common mmap_xa helpers
  RDMA/siw: Use the common mmap_xa helpers
  RDMA/qedr: Use the common mmap API
  RDMA/qedr: Add doorbell overflow recovery support
  RDMA/qedr: Add iWARP doorbell recovery support

 drivers/infiniband/core/Makefile          |   2 +-
 drivers/infiniband/core/core_priv.h       |  14 +
 drivers/infiniband/core/device.c          |   1 +
 drivers/infiniband/core/ib_core_uverbs.c  | 299 ++++++++++++++++
 drivers/infiniband/core/rdma_core.c       |   1 +
 drivers/infiniband/core/uverbs_cmd.c      |   2 +
 drivers/infiniband/core/uverbs_main.c     |  88 +----
 drivers/infiniband/hw/efa/efa.h           |  18 +-
 drivers/infiniband/hw/efa/efa_main.c      |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c     | 329 ++++++++----------
 drivers/infiniband/hw/hns/hns_roce_main.c |   6 +-
 drivers/infiniband/hw/mlx4/main.c         |   9 +-
 drivers/infiniband/hw/mlx5/main.c         |   8 +-
 drivers/infiniband/hw/qedr/main.c         |   1 +
 drivers/infiniband/hw/qedr/qedr.h         |  49 ++-
 drivers/infiniband/hw/qedr/verbs.c        | 551 ++++++++++++++++++++++--------
 drivers/infiniband/hw/qedr/verbs.h        |   3 +-
 drivers/infiniband/sw/siw/siw.h           |  20 +-
 drivers/infiniband/sw/siw/siw_main.c      |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c     | 207 +++++------
 drivers/infiniband/sw/siw/siw_verbs.h     |   1 +
 include/rdma/ib_verbs.h                   |  47 ++-
 include/uapi/rdma/qedr-abi.h              |  25 ++
 23 files changed, 1140 insertions(+), 543 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

-- 
2.14.5



* [PATCH v12 rdma-next 1/8] RDMA/core: Move core content from ib_uverbs to ib_core
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Move umap-related functionality that is called by the drivers
to a new file that will be linked into ib_core.
This is a first step toward later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/Makefile         |  2 +-
 drivers/infiniband/core/core_priv.h      |  9 ++++
 drivers/infiniband/core/ib_core_uverbs.c | 73 +++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c    | 74 ++------------------------------
 4 files changed, 86 insertions(+), 72 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 09881bd5f12d..9a8871e21545 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -11,7 +11,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				nldev.o restrack.o counters.o
+				nldev.o restrack.o counters.o ib_core_uverbs.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3a8b0911c3bc..0252da9560f4 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -387,4 +387,13 @@ int ib_device_set_netns_put(struct sk_buff *skb,
 
 int rdma_nl_net_init(struct rdma_dev_net *rnet);
 void rdma_nl_net_exit(struct rdma_dev_net *rnet);
+
+struct rdma_umap_priv {
+	struct vm_area_struct *vma;
+	struct list_head list;
+};
+
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
new file mode 100644
index 000000000000..b74d2a2fb342
--- /dev/null
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2019 Marvell. All rights reserved.
+ */
+#include <linux/xarray.h>
+#include "uverbs.h"
+#include "core_priv.h"
+
+/*
+ * Each time we map IO memory into user space this keeps track of the mapping.
+ * When the device is hot-unplugged we 'zap' the mmaps in user space to point
+ * to the zero page and allow the hot unplug to proceed.
+ *
+ * This is necessary for cases like PCI physical hot unplug as the actual BAR
+ * memory may vanish after this and access to it from userspace could MCE.
+ *
+ * RDMA drivers supporting disassociation must have their user space designed
+ * to cope in some way with their IO pages going to the zero page.
+ */
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma)
+{
+	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
+
+	priv->vma = vma;
+	vma->vm_private_data = priv;
+	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
+
+	mutex_lock(&ufile->umap_lock);
+	list_add(&priv->list, &ufile->umaps);
+	mutex_unlock(&ufile->umap_lock);
+}
+EXPORT_SYMBOL(rdma_umap_priv_init);
+
+/*
+ * Map IO memory into a process. This is to be called by drivers as part of
+ * their mmap() functions if they wish to send something like PCI-E BAR memory
+ * to userspace.
+ */
+int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
+		      unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	struct rdma_umap_priv *priv;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start != size)
+		return -EINVAL;
+
+	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
+	if (WARN_ON(!vma->vm_file ||
+		    vma->vm_file->private_data != ufile))
+		return -EINVAL;
+	lockdep_assert_held(&ufile->device->disassociate_srcu);
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	vma->vm_page_prot = prot;
+	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
+		kfree(priv);
+		return -EAGAIN;
+	}
+
+	rdma_umap_priv_init(priv, vma);
+	return 0;
+}
+EXPORT_SYMBOL(rdma_user_mmap_io);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index db98111b47f4..b1f5334ff907 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -772,6 +772,8 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	return (ret) ? : count;
 }
 
+static const struct vm_operations_struct rdma_umap_ops;
+
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct ib_uverbs_file *file = filp->private_data;
@@ -785,45 +787,13 @@ static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 		ret = PTR_ERR(ucontext);
 		goto out;
 	}
-
+	vma->vm_ops = &rdma_umap_ops;
 	ret = ucontext->device->ops.mmap(ucontext, vma);
 out:
 	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
 	return ret;
 }
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
- *
- * This is necessary for cases like PCI physical hot unplug as the actual BAR
- * memory may vanish after this and access to it from userspace could MCE.
- *
- * RDMA drivers supporting disassociation must have their user space designed
- * to cope in some way with their IO pages going to the zero page.
- */
-struct rdma_umap_priv {
-	struct vm_area_struct *vma;
-	struct list_head list;
-};
-
-static const struct vm_operations_struct rdma_umap_ops;
-
-static void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-				struct vm_area_struct *vma)
-{
-	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
-
-	priv->vma = vma;
-	vma->vm_private_data = priv;
-	vma->vm_ops = &rdma_umap_ops;
-
-	mutex_lock(&ufile->umap_lock);
-	list_add(&priv->list, &ufile->umaps);
-	mutex_unlock(&ufile->umap_lock);
-}
-
 /*
  * The VMA has been dup'd, initialize the vm_private_data with a new tracking
  * struct
@@ -931,44 +901,6 @@ static const struct vm_operations_struct rdma_umap_ops = {
 	.fault = rdma_umap_fault,
 };
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
- */
-int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
-		      unsigned long pfn, unsigned long size, pgprot_t prot)
-{
-	struct ib_uverbs_file *ufile = ucontext->ufile;
-	struct rdma_umap_priv *priv;
-
-	if (!(vma->vm_flags & VM_SHARED))
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start != size)
-		return -EINVAL;
-
-	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
-	if (WARN_ON(!vma->vm_file ||
-		    vma->vm_file->private_data != ufile))
-		return -EINVAL;
-	lockdep_assert_held(&ufile->device->disassociate_srcu);
-
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
-		return -ENOMEM;
-
-	vma->vm_page_prot = prot;
-	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
-		kfree(priv);
-		return -EAGAIN;
-	}
-
-	rdma_umap_priv_init(priv, vma);
-	return 0;
-}
-EXPORT_SYMBOL(rdma_user_mmap_io);
-
 void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 {
 	struct rdma_umap_priv *priv, *next_priv;
-- 
2.14.5



* [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Create common APIs for adding entries to the mmap_xa, searching for an
entry and freeing one.

Most of the code was copied from the efa driver almost as is, just
renaming the functions to be generic and not efa-specific.
Moving this code to the core allows it to be managed differently,
so that entries can now be removed and deleted when the driver and user
are done with them. This also enabled changing the insert algorithm in
comparison to what was done in efa.
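
For illustration (not part of this patch), a driver's mmap() handler
would look up the entry by the key, map it and drop the reference
roughly as follows; drv_user_mmap_entry is a hypothetical driver
wrapper that embeds rdma_user_mmap_entry and keeps the physical
address of the pages:

    static int drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
    {
            size_t length = vma->vm_end - vma->vm_start;
            u64 key = (u64)vma->vm_pgoff << PAGE_SHIFT;
            struct rdma_user_mmap_entry *rdma_entry;
            struct drv_user_mmap_entry *entry;
            int err;

            rdma_entry = rdma_user_mmap_entry_get(uctx, key, vma);
            if (!rdma_entry)
                    return -EINVAL;
            entry = container_of(rdma_entry, struct drv_user_mmap_entry,
                                 rdma_entry);

            /* e.g. kernel pages that were allocated for this user */
            err = remap_pfn_range(vma, vma->vm_start,
                                  entry->address >> PAGE_SHIFT,
                                  length, vma->vm_page_prot);

            /* Drop the reference taken by _get(); for CPU pages the mm
             * performs its own reference counting on the mapping.
             */
            rdma_user_mmap_entry_put(uctx, rdma_entry);
            return err;
    }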

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/device.c         |   1 +
 drivers/infiniband/core/ib_core_uverbs.c | 201 +++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.c      |   1 +
 drivers/infiniband/core/uverbs_cmd.c     |   2 +
 include/rdma/ib_verbs.h                  |  34 ++++++
 5 files changed, 239 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a667636f74bf..bf3a683057bc 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2629,6 +2629,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, map_phys_fmr);
 	SET_DEVICE_OP(dev_ops, mmap);
+	SET_DEVICE_OP(dev_ops, mmap_free);
 	SET_DEVICE_OP(dev_ops, modify_ah);
 	SET_DEVICE_OP(dev_ops, modify_cq);
 	SET_DEVICE_OP(dev_ops, modify_device);
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b74d2a2fb342..1ffc89fd5d94 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -71,3 +71,204 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 	return 0;
 }
 EXPORT_SYMBOL(rdma_user_mmap_io);
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @key: the key received from rdma_user_mmap_entry_insert which
+ *     is provided by user as the address to map.
+ * @vma: the vma related to the current mmap call.
+ *
+ * This function is called when a user tries to mmap a key it
+ * initially received from the driver. The key was created by
+ * the function rdma_user_mmap_entry_insert.
+ * This function increases the refcnt of the entry so that it won't
+ * be deleted from the xa in the meantime.
+ *
+ * Return an entry if exists or NULL if there is no match.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
+			 struct vm_area_struct *vma)
+{
+	struct rdma_user_mmap_entry *entry;
+	u64 mmap_page;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return NULL;
+
+	xa_lock(&ucontext->mmap_xa);
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+
+	/* if refcount is zero, entry is already being deleted */
+	if (!entry || entry->invalid || !kref_get_unless_zero(&entry->ref))
+		goto err;
+
+	xa_unlock(&ucontext->mmap_xa);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] returned\n",
+		  key, entry->npages);
+
+	return entry;
+
+err:
+	xa_unlock(&ucontext->mmap_xa);
+	return NULL;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_get);
+
+void rdma_user_mmap_entry_free(struct kref *kref)
+{
+	struct rdma_user_mmap_entry *entry =
+		container_of(kref, struct rdma_user_mmap_entry, ref);
+	struct ib_ucontext *ucontext = entry->ucontext;
+	unsigned long i;
+
+	/* need to erase all entries occupied by this single entry */
+	xa_lock(&ucontext->mmap_xa);
+	for (i = 0; i < entry->npages; i++)
+		__xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
+	xa_unlock(&ucontext->mmap_xa);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] removed\n",
+		  rdma_user_mmap_get_key(entry),
+		  entry->npages);
+
+	if (ucontext->device->ops.mmap_free)
+		ucontext->device->ops.mmap_free(entry);
+}
+
+/**
+ * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
+ *
+ * @ucontext: associated user context.
+ * @entry: an entry in the mmap_xa.
+ *
+ * This function is called when the mapping is closed if it was
+ * an io mapping or when the driver is done with the entry for
+ * some other reason.
+ * Should be called after rdma_user_mmap_entry_get was called
+ * and entry is no longer needed. This function will erase the
+ * entry and free it if its refcnt reaches zero.
+ */
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry)
+{
+	kref_put(&entry->ref, rdma_user_mmap_entry_free);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_put);
+
+/**
+ * rdma_user_mmap_entry_remove() - Drop reference to entry and
+ *				   mark it as invalid.
+ *
+ * @ucontext: associated user context.
+ * @entry: the entry to insert into the mmap_xa
+ */
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
+				 struct rdma_user_mmap_entry *entry)
+{
+	if (!entry)
+		return;
+
+	entry->invalid = true;
+	kref_put(&entry->ref, rdma_user_mmap_entry_free);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
+
+/**
+ * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @entry: the entry to insert into the mmap_xa
+ * @length: length of the address that will be mmapped
+ *
+ * This function should be called by drivers that use the rdma_user_mmap
+ * interface for handling user mmapped addresses. The database is handled in
+ * the core and helper functions are provided to insert entries into the
+ * database and extract entries when the user calls mmap with the given key.
+ * The function allocates a unique key that should be provided to user, the user
+ * will use the key to retrieve information such as address to
+ * be mapped and how.
+ *
+ * Return: 0 on success and -ENOMEM on failure
+ */
+int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				size_t length)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	XA_STATE(xas, &ucontext->mmap_xa, 0);
+	u32 xa_first, xa_last, npages;
+	int err, i;
+
+	if (!entry)
+		return -EINVAL;
+
+	kref_init(&entry->ref);
+	entry->ucontext = ucontext;
+
+	/* We want the whole allocation to be done without interruption
+	 * from a different thread. The allocation requires finding a
+	 * free range and storing. During the xa_insert the lock could be
+	 * released, we don't want another thread taking the gap.
+	 */
+	mutex_lock(&ufile->umap_lock);
+
+	xa_lock(&ucontext->mmap_xa);
+
+	/* We want to find an empty range */
+	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
+	entry->npages = npages;
+	while (true) {
+		/* First find an empty index */
+		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART)
+			goto err_unlock;
+
+		xa_first = xas.xa_index;
+
+		/* Is there enough room to have the range? */
+		if (check_add_overflow(xa_first, npages, &xa_last))
+			goto err_unlock;
+
+		/* Now look for the next present entry. If such doesn't
+		 * exist, we found an empty range and can proceed
+		 */
+		xas_next_entry(&xas, xa_last - 1);
+		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
+			break;
+		/* o/w look for the next free entry */
+	}
+
+	for (i = xa_first; i < xa_last; i++) {
+		err = __xa_insert(&ucontext->mmap_xa, i, entry, GFP_KERNEL);
+		if (err)
+			goto err_undo;
+	}
+
+	entry->mmap_page = xa_first;
+	xa_unlock(&ucontext->mmap_xa);
+
+	mutex_unlock(&ufile->umap_lock);
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] inserted\n",
+		  rdma_user_mmap_get_key(entry), npages);
+
+	return 0;
+
+err_undo:
+	for (; i > xa_first; i--)
+		__xa_erase(&ucontext->mmap_xa, i - 1);
+
+err_unlock:
+	xa_unlock(&ucontext->mmap_xa);
+	mutex_unlock(&ufile->umap_lock);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..6c72773faf29 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	rdma_restrack_del(&ucontext->res);
 
 	ib_dev->ops.dealloc_ucontext(ucontext);
+	WARN_ON(!xa_empty(&ucontext->mmap_xa));
 	kfree(ucontext);
 
 	ufile->ucontext = NULL;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 14a80fd9f464..06ed32c8662f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -252,6 +252,8 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
 	ucontext->closing = false;
 	ucontext->cleanup_retryable = false;
 
+	xa_init_flags(&ucontext->mmap_xa, XA_FLAGS_ALLOC);
+
 	ret = get_unused_fd_flags(O_CLOEXEC);
 	if (ret < 0)
 		goto err_free;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6a47ba85c54c..8a87c9d442bc 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1471,6 +1471,7 @@ struct ib_ucontext {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+	struct xarray mmap_xa;
 };
 
 struct ib_uobject {
@@ -2251,6 +2252,20 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+struct rdma_user_mmap_entry {
+	struct kref ref;
+	struct ib_ucontext *ucontext;
+	u32 npages;
+	u32 mmap_page;
+	bool invalid;
+};
+
+static inline u64
+rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+{
+	return (u64)entry->mmap_page << PAGE_SHIFT;
+}
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2363,6 +2378,13 @@ struct ib_device_ops {
 			      struct ib_udata *udata);
 	void (*dealloc_ucontext)(struct ib_ucontext *context);
 	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
+	/**
+	 * This will be called once refcount of an entry in mmap_xa reaches
+	 * zero. The type of the memory that was mapped may differ between
+	 * entries and is opaque to the rdma_user_mmap interface.
+	 * Therefore needs to be implemented by the driver in mmap_free.
+	 */
+	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
 	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
 	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
 	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
@@ -2801,6 +2823,18 @@ static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
 	return -EINVAL;
 }
 #endif
+int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				size_t length);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
+			 struct vm_area_struct *vma);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
+
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
+				 struct rdma_user_mmap_entry *entry);
 
 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
 {
-- 
2.14.5



* [PATCH v12 rdma-next 3/8] RDMA: Connect between the mmap entry and the umap_priv structure
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

The rdma_user_mmap_io interface gave drivers a common way to
correctly map hw resources and zap the mappings once the ucontext is
destroyed, enabling the drivers to safely free the hw resources.
However, this meant the drivers had to delay freeing a resource
until the ucontext destroy phase to ensure it was no longer mapped.
The new common mechanism for handling user/driver address mappings
can notify the driver once all umap_priv mappings of an entry have
been removed, so the hw resource can be freed as soon as the driver
is done with it instead of waiting for ucontext destroy.

Since not all drivers use the mechanism, NULL can be passed to the
rdma_user_mmap_io interface to continue working as before.
Drivers that use the mmap_xa interface can pass the entry being
mapped to rdma_user_mmap_io so that the two are linked together.
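
For illustration, a driver that uses the mmap_xa helpers would link
the entry to the mapping from within its mmap() handler roughly like
this (a sketch reusing the hypothetical names from the earlier
examples; phys_addr and length are assumed to come from the looked-up
entry):

    err = rdma_user_mmap_io(uctx, vma, phys_addr >> PAGE_SHIFT, length,
                            pgprot_noncached(vma->vm_page_prot),
                            rdma_entry);

    /* rdma_user_mmap_io() takes its own reference on the entry through
     * the umap_priv and drops it when the vma is closed or zapped, so
     * the reference taken by rdma_user_mmap_entry_get() can be released
     * here.
     */
    rdma_user_mmap_entry_put(uctx, rdma_entry);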

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/core_priv.h       |  7 ++++-
 drivers/infiniband/core/ib_core_uverbs.c  | 47 +++++++++++++++++++++++--------
 drivers/infiniband/core/uverbs_main.c     | 14 ++++++++-
 drivers/infiniband/hw/efa/efa_verbs.c     |  6 ++--
 drivers/infiniband/hw/hns/hns_roce_main.c |  6 ++--
 drivers/infiniband/hw/mlx4/main.c         |  9 ++++--
 drivers/infiniband/hw/mlx5/main.c         |  8 ++++--
 include/rdma/ib_verbs.h                   | 13 ++-------
 8 files changed, 76 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 0252da9560f4..355c59d2eaa3 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -391,9 +391,14 @@ void rdma_nl_net_exit(struct rdma_dev_net *rnet);
 struct rdma_umap_priv {
 	struct vm_area_struct *vma;
 	struct list_head list;
+	struct rdma_user_mmap_entry *entry;
 };
 
 void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma);
+			 struct vm_area_struct *vma,
+			 struct rdma_user_mmap_entry *entry);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index 1ffc89fd5d94..88d9d47fb8ad 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -8,23 +8,36 @@
 #include "uverbs.h"
 #include "core_priv.h"
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
+/**
+ * rdma_umap_priv_init() - Initialize the private data of a vma
+ *
+ * @vma: The vm area struct that needs private data
+ * @entry: entry into the mmap_xa that needs to be linked with
+ *       this vma
+ *
+ * Each time we map IO memory into user space this keeps track
+ * of the mapping. When the device is hot-unplugged we 'zap' the
+ * mmaps in user space to point to the zero page and allow the
+ * hot unplug to proceed.
  *
  * This is necessary for cases like PCI physical hot unplug as the actual BAR
  * memory may vanish after this and access to it from userspace could MCE.
  *
  * RDMA drivers supporting disassociation must have their user space designed
  * to cope in some way with their IO pages going to the zero page.
+ *
  */
 void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma)
+			 struct vm_area_struct *vma,
+			 struct rdma_user_mmap_entry *entry)
 {
 	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
 
 	priv->vma = vma;
+	if (entry) {
+		kref_get(&entry->ref);
+		priv->entry = entry;
+	}
 	vma->vm_private_data = priv;
 	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
 
@@ -34,13 +47,25 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
 }
 EXPORT_SYMBOL(rdma_umap_priv_init);
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
+/**
+ * rdma_user_mmap_io() - Map IO memory into a process.
+ *
+ * @ucontext: associated user context
+ * @vma: the vma related to the current mmap call.
+ * @pfn: pfn to map
+ * @size: size to map
+ * @prot: pgprot to use in remap call
+ *
+ * This is to be called by drivers as part of their mmap()
+ * functions if they wish to send something like PCI-E BAR
+ * memory to userspace.
+ *
+ * Return -EINVAL on wrong flags or size, -EAGAIN on failure to
+ * map. 0 on success.
  */
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
-		      unsigned long pfn, unsigned long size, pgprot_t prot)
+		      unsigned long pfn, unsigned long size, pgprot_t prot,
+		      struct rdma_user_mmap_entry *entry)
 {
 	struct ib_uverbs_file *ufile = ucontext->ufile;
 	struct rdma_umap_priv *priv;
@@ -67,7 +92,7 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		return -EAGAIN;
 	}
 
-	rdma_umap_priv_init(priv, vma);
+	rdma_umap_priv_init(priv, vma, entry);
 	return 0;
 }
 EXPORT_SYMBOL(rdma_user_mmap_io);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index b1f5334ff907..dbe9bd3d389a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -819,7 +819,7 @@ static void rdma_umap_open(struct vm_area_struct *vma)
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		goto out_unlock;
-	rdma_umap_priv_init(priv, vma);
+	rdma_umap_priv_init(priv, vma, opriv->entry);
 
 	up_read(&ufile->hw_destroy_rwsem);
 	return;
@@ -844,6 +844,11 @@ static void rdma_umap_close(struct vm_area_struct *vma)
 	if (!priv)
 		return;
 
+	if (priv->entry) {
+		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
+		priv->entry = NULL;
+	}
+
 	/*
 	 * The vma holds a reference on the struct file that created it, which
 	 * in turn means that the ib_uverbs_file is guaranteed to exist at
@@ -946,6 +951,13 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 
 			if (vma->vm_mm != mm)
 				continue;
+
+			if (priv->entry) {
+				rdma_user_mmap_entry_put(ufile->ucontext,
+							 priv->entry);
+				priv->entry = NULL;
+			}
+
 			list_del_init(&priv->list);
 
 			zap_vma_ptes(vma, vma->vm_start,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 4edae89e8e3c..37e3d62bbb51 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1612,11 +1612,13 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 	switch (entry->mmap_flag) {
 	case EFA_MMAP_IO_NC:
 		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
-					pgprot_noncached(vma->vm_page_prot));
+					pgprot_noncached(vma->vm_page_prot),
+					NULL);
 		break;
 	case EFA_MMAP_IO_WC:
 		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
-					pgprot_writecombine(vma->vm_page_prot));
+					pgprot_writecombine(vma->vm_page_prot),
+					NULL);
 		break;
 	case EFA_MMAP_DMA_PAGE:
 		for (va = vma->vm_start; va < vma->vm_end;
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index b5d196c119ee..803dc6f4b496 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -359,7 +359,8 @@ static int hns_roce_mmap(struct ib_ucontext *context,
 		return rdma_user_mmap_io(context, vma,
 					 to_hr_ucontext(context)->uar.pfn,
 					 PAGE_SIZE,
-					 pgprot_noncached(vma->vm_page_prot));
+					 pgprot_noncached(vma->vm_page_prot),
+					 NULL);
 
 	/* vm_pgoff: 1 -- TPTR */
 	case 1:
@@ -372,7 +373,8 @@ static int hns_roce_mmap(struct ib_ucontext *context,
 		return rdma_user_mmap_io(context, vma,
 					 hr_dev->tptr_dma_addr >> PAGE_SHIFT,
 					 hr_dev->tptr_size,
-					 vma->vm_page_prot);
+					 vma->vm_page_prot,
+					 NULL);
 
 	default:
 		return -EINVAL;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8d2f1e38b891..f89b129b7e3a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1146,7 +1146,8 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 		return rdma_user_mmap_io(context, vma,
 					 to_mucontext(context)->uar.pfn,
 					 PAGE_SIZE,
-					 pgprot_noncached(vma->vm_page_prot));
+					 pgprot_noncached(vma->vm_page_prot),
+					 NULL);
 
 	case 1:
 		if (dev->dev->caps.bf_reg_size == 0)
@@ -1155,7 +1156,8 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 			context, vma,
 			to_mucontext(context)->uar.pfn +
 				dev->dev->caps.num_uars,
-			PAGE_SIZE, pgprot_writecombine(vma->vm_page_prot));
+			PAGE_SIZE, pgprot_writecombine(vma->vm_page_prot),
+			NULL);
 
 	case 3: {
 		struct mlx4_clock_params params;
@@ -1171,7 +1173,8 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 					    params.bar) +
 			 params.offset) >>
 				PAGE_SHIFT,
-			PAGE_SIZE, pgprot_noncached(vma->vm_page_prot));
+			PAGE_SIZE, pgprot_noncached(vma->vm_page_prot),
+			NULL);
 	}
 
 	default:
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b95c2b05f682..eff96303f086 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2168,7 +2168,7 @@ static int uar_mmap(struct mlx5_ib_dev *dev, enum mlx5_ib_mmap_cmd cmd,
 	mlx5_ib_dbg(dev, "uar idx 0x%lx, pfn %pa\n", idx, &pfn);
 
 	err = rdma_user_mmap_io(&context->ibucontext, vma, pfn, PAGE_SIZE,
-				prot);
+				prot, NULL);
 	if (err) {
 		mlx5_ib_err(dev,
 			    "rdma_user_mmap_io failed with error=%d, mmap_cmd=%s\n",
@@ -2210,7 +2210,8 @@ static int dm_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 	      PAGE_SHIFT) +
 	      page_idx;
 	return rdma_user_mmap_io(context, vma, pfn, map_size,
-				 pgprot_writecombine(vma->vm_page_prot));
+				 pgprot_writecombine(vma->vm_page_prot),
+				 NULL);
 }
 
 static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
@@ -2248,7 +2249,8 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vm
 			PAGE_SHIFT;
 		return rdma_user_mmap_io(&context->ibucontext, vma, pfn,
 					 PAGE_SIZE,
-					 pgprot_noncached(vma->vm_page_prot));
+					 pgprot_noncached(vma->vm_page_prot),
+					 NULL);
 	case MLX5_IB_MMAP_CLOCK_INFO:
 		return mlx5_ib_mmap_clock_info_page(dev, vma, context);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8a87c9d442bc..456d888be411 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2811,18 +2811,9 @@ void  ib_set_client_data(struct ib_device *device, struct ib_client *client,
 void ib_set_device_ops(struct ib_device *device,
 		       const struct ib_device_ops *ops);
 
-#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
-		      unsigned long pfn, unsigned long size, pgprot_t prot);
-#else
-static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
-				    struct vm_area_struct *vma,
-				    unsigned long pfn, unsigned long size,
-				    pgprot_t prot)
-{
-	return -EINVAL;
-}
-#endif
+		      unsigned long pfn, unsigned long size, pgprot_t prot,
+		      struct rdma_user_mmap_entry *entry);
 int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 				struct rdma_user_mmap_entry *entry,
 				size_t length);
-- 
2.14.5



* [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Remove the functions related to managing the mmap_xa database.
This code was copied to ib_core; use the common APIs instead.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/efa/efa.h       |  18 +-
 drivers/infiniband/hw/efa/efa_main.c  |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c | 327 +++++++++++++++-------------------
 3 files changed, 164 insertions(+), 182 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 2283e432693e..482d8acaad2c 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -71,8 +71,6 @@ struct efa_dev {
 
 struct efa_ucontext {
 	struct ib_ucontext ibucontext;
-	struct xarray mmap_xa;
-	u32 mmap_xa_page;
 	u16 uarn;
 };
 
@@ -91,6 +89,7 @@ struct efa_cq {
 	struct efa_ucontext *ucontext;
 	dma_addr_t dma_addr;
 	void *cpu_addr;
+	struct rdma_user_mmap_entry *mmap_entry;
 	size_t size;
 	u16 cq_idx;
 };
@@ -101,6 +100,13 @@ struct efa_qp {
 	void *rq_cpu_addr;
 	size_t rq_size;
 	enum ib_qp_state state;
+
+	/* Used for saving mmap_xa entries */
+	struct rdma_user_mmap_entry *sq_db_mmap_entry;
+	struct rdma_user_mmap_entry *llq_desc_mmap_entry;
+	struct rdma_user_mmap_entry *rq_db_mmap_entry;
+	struct rdma_user_mmap_entry *rq_mmap_entry;
+
 	u32 qp_handle;
 	u32 max_send_wr;
 	u32 max_recv_wr;
@@ -116,6 +122,13 @@ struct efa_ah {
 	u8 id[EFA_GID_SIZE];
 };
 
+struct efa_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	size_t length;
+	u8 mmap_flag;
+};
+
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata);
@@ -147,6 +160,7 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata);
 void efa_dealloc_ucontext(struct ib_ucontext *ibucontext);
 int efa_mmap(struct ib_ucontext *ibucontext,
 	     struct vm_area_struct *vma);
+void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 int efa_create_ah(struct ib_ah *ibah,
 		  struct rdma_ah_attr *ah_attr,
 		  u32 flags,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 83858f7e83d0..0e3050d01b75 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -217,6 +217,7 @@ static const struct ib_device_ops efa_dev_ops = {
 	.get_link_layer = efa_port_link_layer,
 	.get_port_immutable = efa_get_port_immutable,
 	.mmap = efa_mmap,
+	.mmap_free = efa_mmap_free,
 	.modify_qp = efa_modify_qp,
 	.query_device = efa_query_device,
 	.query_gid = efa_query_gid,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 37e3d62bbb51..f39e24df29a5 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -13,10 +13,6 @@
 
 #include "efa.h"
 
-#define EFA_MMAP_FLAG_SHIFT 56
-#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
-#define EFA_MMAP_INVALID U64_MAX
-
 enum {
 	EFA_MMAP_DMA_PAGE = 0,
 	EFA_MMAP_IO_WC,
@@ -27,20 +23,6 @@ enum {
 	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
 	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
 
-struct efa_mmap_entry {
-	void  *obj;
-	u64 address;
-	u64 length;
-	u32 mmap_page;
-	u8 mmap_flag;
-};
-
-static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
-{
-	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
-	       ((u64)efa->mmap_page << PAGE_SHIFT);
-}
-
 #define EFA_DEFINE_STATS(op) \
 	op(EFA_TX_BYTES, "tx_bytes") \
 	op(EFA_TX_PKTS, "tx_pkts") \
@@ -147,6 +129,12 @@ static inline struct efa_ah *to_eah(struct ib_ah *ibah)
 	return container_of(ibah, struct efa_ah, ibah);
 }
 
+static inline struct efa_user_mmap_entry *
+to_emmap(struct rdma_user_mmap_entry *rdma_entry)
+{
+	return container_of(rdma_entry, struct efa_user_mmap_entry, rdma_entry);
+}
+
 #define field_avail(x, fld, sz) (offsetof(typeof(x), fld) + \
 				 FIELD_SIZEOF(typeof(x), fld) <= (sz))
 
@@ -172,106 +160,6 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
 	return addr;
 }
 
-/*
- * This is only called when the ucontext is destroyed and there can be no
- * concurrent query via mmap or allocate on the xarray, thus we can be sure no
- * other thread is using the entry pointer. We also know that all the BAR
- * pages have either been zap'd or munmaped at this point.  Normal pages are
- * refcounted and will be freed at the proper time.
- */
-static void mmap_entries_remove_free(struct efa_dev *dev,
-				     struct efa_ucontext *ucontext)
-{
-	struct efa_mmap_entry *entry;
-	unsigned long mmap_page;
-
-	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
-		xa_erase(&ucontext->mmap_xa, mmap_page);
-
-		ibdev_dbg(
-			&dev->ibdev,
-			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-			entry->obj, get_mmap_key(entry), entry->address,
-			entry->length);
-		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
-			/* DMA mapping is already gone, now free the pages */
-			free_pages_exact(phys_to_virt(entry->address),
-					 entry->length);
-		kfree(entry);
-	}
-}
-
-static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
-					     struct efa_ucontext *ucontext,
-					     u64 key, u64 len)
-{
-	struct efa_mmap_entry *entry;
-	u64 mmap_page;
-
-	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
-	if (mmap_page > U32_MAX)
-		return NULL;
-
-	entry = xa_load(&ucontext->mmap_xa, mmap_page);
-	if (!entry || get_mmap_key(entry) != key || entry->length != len)
-		return NULL;
-
-	ibdev_dbg(&dev->ibdev,
-		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-		  entry->obj, key, entry->address, entry->length);
-
-	return entry;
-}
-
-/*
- * Note this locking scheme cannot support removal of entries, except during
- * ucontext destruction when the core code guarentees no concurrency.
- */
-static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
-			     void *obj, u64 address, u64 length, u8 mmap_flag)
-{
-	struct efa_mmap_entry *entry;
-	u32 next_mmap_page;
-	int err;
-
-	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-	if (!entry)
-		return EFA_MMAP_INVALID;
-
-	entry->obj = obj;
-	entry->address = address;
-	entry->length = length;
-	entry->mmap_flag = mmap_flag;
-
-	xa_lock(&ucontext->mmap_xa);
-	if (check_add_overflow(ucontext->mmap_xa_page,
-			       (u32)(length >> PAGE_SHIFT),
-			       &next_mmap_page))
-		goto err_unlock;
-
-	entry->mmap_page = ucontext->mmap_xa_page;
-	ucontext->mmap_xa_page = next_mmap_page;
-	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
-			  GFP_KERNEL);
-	if (err)
-		goto err_unlock;
-
-	xa_unlock(&ucontext->mmap_xa);
-
-	ibdev_dbg(
-		&dev->ibdev,
-		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
-		entry->obj, entry->address, entry->length, get_mmap_key(entry));
-
-	return get_mmap_key(entry);
-
-err_unlock:
-	xa_unlock(&ucontext->mmap_xa);
-	kfree(entry);
-	return EFA_MMAP_INVALID;
-
-}
-
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata)
@@ -485,8 +373,19 @@ static int efa_destroy_qp_handle(struct efa_dev *dev, u32 qp_handle)
 	return efa_com_destroy_qp(&dev->edev, &params);
 }
 
+static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *uctx,
+					    struct efa_qp *qp)
+{
+	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_mmap_entry);
+	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_db_mmap_entry);
+	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->llq_desc_mmap_entry);
+	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->sq_db_mmap_entry);
+}
+
 int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
+		struct efa_ucontext, ibucontext);
 	struct efa_dev *dev = to_edev(ibqp->pd->device);
 	struct efa_qp *qp = to_eqp(ibqp);
 	int err;
@@ -505,61 +404,102 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 				 DMA_TO_DEVICE);
 	}
 
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
 	kfree(qp);
 	return 0;
 }
 
+static struct rdma_user_mmap_entry*
+efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+			   u64 address, size_t length,
+			   u8 mmap_flag, u64 *key)
+{
+	struct efa_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	int err;
+
+	if (!entry)
+		return NULL;
+
+	entry->address = address;
+	entry->length = length;
+	entry->mmap_flag = mmap_flag;
+
+	err = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
+					  length);
+	if (err) {
+		kfree(entry);
+		return NULL;
+	}
+	*key = rdma_user_mmap_get_key(&entry->rdma_entry);
+
+	return &entry->rdma_entry;
+}
+
 static int qp_mmap_entries_setup(struct efa_qp *qp,
 				 struct efa_dev *dev,
 				 struct efa_ucontext *ucontext,
 				 struct efa_com_create_qp_params *params,
 				 struct efa_ibv_create_qp_resp *resp)
 {
-	/*
-	 * Once an entry is inserted it might be mmapped, hence cannot be
-	 * cleaned up until dealloc_ucontext.
-	 */
-	resp->sq_db_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->db_bar_addr + resp->sq_db_offset,
-				  PAGE_SIZE, EFA_MMAP_IO_NC);
-	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
+	size_t length;
+	u64 address;
+
+	address = dev->db_bar_addr + resp->sq_db_offset;
+	qp->sq_db_mmap_entry =
+		efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					   address,
+					   PAGE_SIZE, EFA_MMAP_IO_NC,
+					   &resp->sq_db_mmap_key);
+	if (!qp->sq_db_mmap_entry)
 		return -ENOMEM;
 
 	resp->sq_db_offset &= ~PAGE_MASK;
 
-	resp->llq_desc_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->mem_bar_addr + resp->llq_desc_offset,
-				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
-					     (resp->llq_desc_offset & ~PAGE_MASK)),
-				  EFA_MMAP_IO_WC);
-	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+	address = dev->mem_bar_addr + resp->llq_desc_offset;
+	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
+			    (resp->llq_desc_offset & ~PAGE_MASK));
+
+	qp->llq_desc_mmap_entry =
+		efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					   address, length,
+					   EFA_MMAP_IO_WC,
+					   &resp->llq_desc_mmap_key);
+	if (!qp->llq_desc_mmap_entry)
+		goto err_remove_mmap;
 
 	resp->llq_desc_offset &= ~PAGE_MASK;
 
 	if (qp->rq_size) {
-		resp->rq_db_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  dev->db_bar_addr + resp->rq_db_offset,
-					  PAGE_SIZE, EFA_MMAP_IO_NC);
-		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = dev->db_bar_addr + resp->rq_db_offset;
+
+		qp->rq_db_mmap_entry =
+			efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						   address, PAGE_SIZE,
+						   EFA_MMAP_IO_NC,
+						   &resp->rq_db_mmap_key);
+		if (!qp->rq_db_mmap_entry)
+			goto err_remove_mmap;
 
 		resp->rq_db_offset &= ~PAGE_MASK;
 
-		resp->rq_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  virt_to_phys(qp->rq_cpu_addr),
-					  qp->rq_size, EFA_MMAP_DMA_PAGE);
-		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = virt_to_phys(qp->rq_cpu_addr);
+		qp->rq_mmap_entry =
+			efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						   address, qp->rq_size,
+						   EFA_MMAP_DMA_PAGE,
+						   &resp->rq_mmap_key);
+		if (!qp->rq_mmap_entry)
+			goto err_remove_mmap;
 
 		resp->rq_mmap_size = qp->rq_size;
 	}
 
 	return 0;
+
+err_remove_mmap:
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
+
+	return -ENOMEM;
 }
 
 static int efa_qp_validate_cap(struct efa_dev *dev,
@@ -634,7 +574,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	struct efa_dev *dev = to_edev(ibpd->device);
 	struct efa_ibv_create_qp_resp resp = {};
 	struct efa_ibv_create_qp cmd = {};
-	bool rq_entry_inserted = false;
 	struct efa_ucontext *ucontext;
 	struct efa_qp *qp;
 	int err;
@@ -742,7 +681,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	if (err)
 		goto err_destroy_qp;
 
-	rq_entry_inserted = true;
 	qp->qp_handle = create_qp_resp.qp_handle;
 	qp->ibqp.qp_num = create_qp_resp.qp_num;
 	qp->ibqp.qp_type = init_attr->qp_type;
@@ -759,7 +697,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 			ibdev_dbg(&dev->ibdev,
 				  "Failed to copy udata for qp[%u]\n",
 				  create_qp_resp.qp_num);
-			goto err_destroy_qp;
+			goto err_remove_mmap_entries;
 		}
 	}
 
@@ -767,13 +705,16 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 
 	return &qp->ibqp;
 
+err_remove_mmap_entries:
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
 err_destroy_qp:
 	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
 err_free_mapped:
 	if (qp->rq_size) {
 		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
 				 DMA_TO_DEVICE);
-		if (!rq_entry_inserted)
+
+		if (!qp->rq_mmap_entry)
 			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
 	}
 err_free_qp:
@@ -887,6 +828,9 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx)
 
 void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
+			struct efa_ucontext, ibucontext);
+
 	struct efa_dev *dev = to_edev(ibcq->device);
 	struct efa_cq *cq = to_ecq(ibcq);
 
@@ -897,16 +841,19 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    cq->mmap_entry);
 }
 
 static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 				 struct efa_ibv_create_cq_resp *resp)
 {
 	resp->q_mmap_size = cq->size;
-	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
-					     virt_to_phys(cq->cpu_addr),
-					     cq->size, EFA_MMAP_DMA_PAGE);
-	if (resp->q_mmap_key == EFA_MMAP_INVALID)
+	cq->mmap_entry = efa_user_mmap_entry_insert(&cq->ucontext->ibucontext,
+						    virt_to_phys(cq->cpu_addr),
+						    cq->size, EFA_MMAP_DMA_PAGE,
+						    &resp->q_mmap_key);
+	if (!cq->mmap_entry)
 		return -ENOMEM;
 
 	return 0;
@@ -924,7 +871,6 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct efa_dev *dev = to_edev(ibdev);
 	struct efa_ibv_create_cq cmd = {};
 	struct efa_cq *cq = to_ecq(ibcq);
-	bool cq_entry_inserted = false;
 	int entries = attr->cqe;
 	int err;
 
@@ -1013,15 +959,13 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		goto err_destroy_cq;
 	}
 
-	cq_entry_inserted = true;
-
 	if (udata->outlen) {
 		err = ib_copy_to_udata(udata, &resp,
 				       min(sizeof(resp), udata->outlen));
 		if (err) {
 			ibdev_dbg(ibdev,
 				  "Failed to copy udata for create_cq\n");
-			goto err_destroy_cq;
+			goto err_remove_mmap;
 		}
 	}
 
@@ -1030,13 +974,16 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
+err_remove_mmap:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, cq->mmap_entry);
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
-	if (!cq_entry_inserted)
+	if (!cq->mmap_entry)
 		free_pages_exact(cq->cpu_addr, cq->size);
+
 err_out:
 	atomic64_inc(&dev->stats.sw_stats.create_cq_err);
 	return err;
@@ -1556,7 +1503,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
 		goto err_out;
 
 	ucontext->uarn = result.uarn;
-	xa_init(&ucontext->mmap_xa);
 
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
@@ -1585,40 +1531,60 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
 
-	mmap_entries_remove_free(dev, ucontext);
 	efa_dealloc_uar(dev, ucontext->uarn);
 }
 
+void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
+{
+	struct efa_user_mmap_entry *entry = to_emmap(rdma_entry);
+
+	/* DMA mapping is already gone, now free the pages */
+	if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
+		free_pages_exact(phys_to_virt(entry->address),
+				 entry->length);
+	kfree(entry);
+}
+
 static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
-		      struct vm_area_struct *vma, u64 key, u64 length)
+		      struct vm_area_struct *vma, u64 key, size_t length)
 {
-	struct efa_mmap_entry *entry;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct efa_user_mmap_entry *entry;
 	unsigned long va;
+	int err = 0;
 	u64 pfn;
-	int err;
 
-	entry = mmap_entry_get(dev, ucontext, key, length);
-	if (!entry) {
+	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
+					      vma);
+	if (!rdma_entry) {
 		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
 			  key);
 		return -EINVAL;
 	}
+	entry = to_emmap(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(&dev->ibdev,
+			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
+			  key, length, entry->length);
+		err = -EINVAL;
+		goto out;
+	}
 
 	ibdev_dbg(&dev->ibdev,
-		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
-		  entry->address, length, entry->mmap_flag);
+		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
+		  entry->address, entry->length, entry->mmap_flag);
 
 	pfn = entry->address >> PAGE_SHIFT;
 	switch (entry->mmap_flag) {
 	case EFA_MMAP_IO_NC:
 		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
 					pgprot_noncached(vma->vm_page_prot),
-					NULL);
+					rdma_entry);
 		break;
 	case EFA_MMAP_IO_WC:
 		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
 					pgprot_writecombine(vma->vm_page_prot),
-					NULL);
+					rdma_entry);
 		break;
 	case EFA_MMAP_DMA_PAGE:
 		for (va = vma->vm_start; va < vma->vm_end;
@@ -1633,14 +1599,15 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 	}
 
 	if (err) {
-		ibdev_dbg(
-			&dev->ibdev,
-			"Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
-			entry->address, length, entry->mmap_flag, err);
-		return err;
+		ibdev_dbg(&dev->ibdev,
+			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
+			  entry->address, length, entry->mmap_flag, err);
 	}
 
-	return 0;
+out:
+	rdma_user_mmap_entry_put(&ucontext->ibucontext, rdma_entry);
+
+	return err;
 }
 
 int efa_mmap(struct ib_ucontext *ibucontext,
@@ -1648,16 +1615,16 @@ int efa_mmap(struct ib_ucontext *ibucontext,
 {
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
-	u64 length = vma->vm_end - vma->vm_start;
+	size_t length = vma->vm_end - vma->vm_start;
 	u64 key = vma->vm_pgoff << PAGE_SHIFT;
 
 	ibdev_dbg(&dev->ibdev,
-		  "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
+		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
 		  vma->vm_start, vma->vm_end, length, key);
 
 	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
 		ibdev_dbg(&dev->ibdev,
-			  "length[%#llx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
 			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v12 rdma-next 5/8] RDMA/siw: Use the common mmap_xa helpers
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (3 preceding siblings ...)
  2019-10-30  9:44 ` [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-10-30  9:44 ` Michal Kalderon
  2019-10-30  9:44 ` [PATCH v12 rdma-next 6/8] RDMA/qedr: Use the common mmap API Michal Kalderon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Remove the functions related to managing the mmap_xa database.
This code is now common in ib_core. Use the common APIs instead.
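
For orientation, the lifecycle every converted driver follows is roughly
the sketch below. The drv_* names and fields are placeholders; only the
rdma_user_mmap_* calls (with the signatures used in this series) are real:

	/* Driver-private wrapper embedding the core entry */
	struct drv_user_mmap_entry {
		struct rdma_user_mmap_entry rdma_entry;
		void *address;
		size_t length;
	};

	/* At object creation: register the buffer and hand the key to user */
	static int drv_export_buf(struct ib_ucontext *uctx,
				  struct drv_user_mmap_entry *entry, u64 *key)
	{
		int rc;

		rc = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
						 entry->length);
		if (rc)
			return rc;

		/* Returned in the uresp; userspace passes it as mmap offset */
		*key = rdma_user_mmap_get_key(&entry->rdma_entry);
		return 0;
	}

	/* At object destruction: drop the driver reference. The core invokes
	 * the device's .mmap_free hook once no user mapping remains.
	 */
	rdma_user_mmap_entry_remove(uctx, &entry->rdma_entry);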

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
 drivers/infiniband/sw/siw/siw.h       |  20 +++-
 drivers/infiniband/sw/siw/siw_main.c  |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c | 207 ++++++++++++++++++----------------
 drivers/infiniband/sw/siw/siw_verbs.h |   1 +
 4 files changed, 128 insertions(+), 101 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index dba4535494ab..824b11d5b879 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -220,7 +220,7 @@ struct siw_cq {
 	u32 cq_get;
 	u32 num_cqe;
 	bool kernel_verbs;
-	u32 xa_cq_index; /* mmap information for CQE array */
+	struct rdma_user_mmap_entry *cq_entry; /* mmap info for CQE array */
 	u32 id; /* For debugging only */
 };
 
@@ -263,7 +263,7 @@ struct siw_srq {
 	u32 rq_put;
 	u32 rq_get;
 	u32 num_rqe; /* max # of wqe's allowed */
-	u32 xa_srq_index; /* mmap information for SRQ array */
+	struct rdma_user_mmap_entry *srq_entry; /* mmap info for SRQ array */
 	char armed; /* inform user if limit hit */
 	char kernel_verbs; /* '1' if kernel client */
 };
@@ -477,8 +477,8 @@ struct siw_qp {
 		u8 layer : 4, etype : 4;
 		u8 ecode;
 	} term_info;
-	u32 xa_sq_index; /* mmap information for SQE array */
-	u32 xa_rq_index; /* mmap information for RQE array */
+	struct rdma_user_mmap_entry *sq_entry; /* mmap info for SQE array */
+	struct rdma_user_mmap_entry *rq_entry; /* mmap info for RQE array */
 	struct rcu_head rcu;
 };
 
@@ -503,6 +503,12 @@ struct iwarp_msg_info {
 	int (*rx_data)(struct siw_qp *qp);
 };
 
+struct siw_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	void *address;
+	size_t length;
+};
+
 /* Global siw parameters. Currently set in siw_main.c */
 extern const bool zcopy_tx;
 extern const bool try_gso;
@@ -607,6 +613,12 @@ static inline struct siw_mr *to_siw_mr(struct ib_mr *base_mr)
 	return container_of(base_mr, struct siw_mr, base_mr);
 }
 
+static inline struct siw_user_mmap_entry *
+to_siw_mmap_entry(struct rdma_user_mmap_entry *rdma_mmap)
+{
+	return container_of(rdma_mmap, struct siw_user_mmap_entry, rdma_entry);
+}
+
 static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev, int id)
 {
 	struct siw_qp *qp;
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index d1a1b7aa7d83..4b44d5f075fd 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -299,6 +299,7 @@ static const struct ib_device_ops siw_device_ops = {
 	.iw_rem_ref = siw_qp_put_ref,
 	.map_mr_sg = siw_map_mr_sg,
 	.mmap = siw_mmap,
+	.mmap_free = siw_mmap_free,
 	.modify_qp = siw_verbs_modify_qp,
 	.modify_srq = siw_modify_srq,
 	.poll_cq = siw_poll_cq,
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 869e02b69a01..71c7a61cbfe3 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -34,44 +34,20 @@ static char ib_qp_state_to_string[IB_QPS_ERR + 1][sizeof("RESET")] = {
 	[IB_QPS_ERR] = "ERR"
 };
 
-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr, u32 size)
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct siw_uobj *uobj;
-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
-	u32 key;
+	struct siw_user_mmap_entry *entry = to_siw_mmap_entry(rdma_entry);
 
-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
-	if (!uobj)
-		return SIW_INVAL_UOBJ_KEY;
-
-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit, &uctx->uobj_nextkey,
-			    GFP_KERNEL) < 0) {
-		kfree(uobj);
-		return SIW_INVAL_UOBJ_KEY;
-	}
-	uobj->size = PAGE_ALIGN(size);
-	uobj->addr = vaddr;
-
-	return key;
-}
-
-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
-				     unsigned long off, u32 size)
-{
-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
-
-	if (uobj && uobj->size == size)
-		return uobj;
-
-	return NULL;
+	kfree(entry);
 }
 
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
-	struct siw_uobj *uobj;
-	unsigned long off = vma->vm_pgoff;
-	int size = vma->vm_end - vma->vm_start;
+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+	size_t size = vma->vm_end - vma->vm_start;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct siw_user_mmap_entry *entry;
 	int rv = -EINVAL;
 
 	/*
@@ -79,18 +55,32 @@ int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 	 */
 	if (vma->vm_start & (PAGE_SIZE - 1)) {
 		pr_warn("siw: mmap not page aligned\n");
-		goto out;
+		return -EINVAL;
 	}
-	uobj = siw_get_uobj(uctx, off, size);
-	if (!uobj) {
-		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %u\n",
+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
+					      vma);
+	if (!rdma_entry) {
+		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %#zx\n",
 			off, size);
+		return -EINVAL;
+	}
+	entry = to_siw_mmap_entry(rdma_entry);
+	if (PAGE_ALIGN(entry->length) != size) {
+		siw_dbg(&uctx->sdev->base_dev,
+			"key[%#lx] does not have valid length[%#zx] expected[%#zx]\n",
+			off, size, entry->length);
+		rv = -EINVAL;
+		goto out;
+	}
+
+	rv = remap_vmalloc_range(vma, entry->address, 0);
+	if (rv) {
+		pr_warn("remap_vmalloc_range failed: %lu, %zu\n", off, size);
 		goto out;
 	}
-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
-	if (rv)
-		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off, size);
 out:
+	rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
+
 	return rv;
 }
 
@@ -105,7 +95,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
+
 	ctx->uobj_nextkey = 0;
 	ctx->sdev = sdev;
 
@@ -135,19 +125,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 void siw_dealloc_ucontext(struct ib_ucontext *base_ctx)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
-	void *entry;
-	unsigned long index;
 
-	/*
-	 * Make sure all user mmap objects are gone. Since QP, CQ
-	 * and SRQ destroy routines destroy related objects, nothing
-	 * should be found here.
-	 */
-	xa_for_each(&uctx->xa, index, entry) {
-		kfree(xa_erase(&uctx->xa, index));
-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
-	}
-	xa_destroy(&uctx->xa);
 	atomic_dec(&uctx->sdev->num_ctx);
 }
 
@@ -293,6 +271,34 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
 	siw_qp_put(to_siw_qp(base_qp));
 }
 
+static struct rdma_user_mmap_entry *
+siw_mmap_entry_insert(struct siw_ucontext *uctx,
+		      void *address, size_t length,
+		      u64 *key)
+{
+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	int rv;
+
+	*key = SIW_INVAL_UOBJ_KEY;
+	if (!entry)
+		return NULL;
+
+	entry->address = address;
+	entry->length = length;
+
+	rv = rdma_user_mmap_entry_insert(&uctx->base_ucontext,
+					 &entry->rdma_entry,
+					 length);
+	if (rv) {
+		kfree(entry);
+		return NULL;
+	}
+
+	*key = rdma_user_mmap_get_key(&entry->rdma_entry);
+
+	return &entry->rdma_entry;
+}
+
 /*
  * siw_create_qp()
  *
@@ -317,6 +323,7 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	struct siw_cq *scq = NULL, *rcq = NULL;
 	unsigned long flags;
 	int num_sqe, num_rqe, rv = 0;
+	size_t length;
 
 	siw_dbg(base_dev, "create new QP\n");
 
@@ -380,8 +387,6 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	spin_lock_init(&qp->orq_lock);
 
 	qp->kernel_verbs = !udata;
-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
 
 	rv = siw_qp_add(sdev, qp);
 	if (rv)
@@ -458,22 +463,27 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 		uresp.qp_id = qp_id(qp);
 
 		if (qp->sendq) {
-			qp->xa_sq_index =
-				siw_create_uobj(uctx, qp->sendq,
-					num_sqe * sizeof(struct siw_sqe));
+			length = num_sqe * sizeof(struct siw_sqe);
+			qp->sq_entry =
+				siw_mmap_entry_insert(uctx, qp->sendq,
+						      length, &uresp.sq_key);
+			if (!qp->sq_entry) {
+				rv = -ENOMEM;
+				goto err_out_xa;
+			}
 		}
+
 		if (qp->recvq) {
-			qp->xa_rq_index =
-				 siw_create_uobj(uctx, qp->recvq,
-					num_rqe * sizeof(struct siw_rqe));
-		}
-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
-			goto err_out_xa;
+			length = num_rqe * sizeof(struct siw_rqe);
+			qp->rq_entry =
+				siw_mmap_entry_insert(uctx, qp->recvq,
+						      length, &uresp.rq_key);
+			if (!qp->rq_entry) {
+				uresp.sq_key = SIW_INVAL_UOBJ_KEY;
+				rv = -ENOMEM;
+				goto err_out_xa;
+			}
 		}
-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
 
 		if (udata->outlen < sizeof(uresp)) {
 			rv = -EINVAL;
@@ -501,11 +511,12 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	kfree(siw_base_qp);
 
 	if (qp) {
-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
-
+		if (uctx) {
+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
+						    qp->sq_entry);
+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
+						    qp->rq_entry);
+		}
 		vfree(qp->sendq);
 		vfree(qp->recvq);
 		kfree(qp);
@@ -619,10 +630,10 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
 	qp->rx_stream.rx_suspend = 1;
 
-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
+	if (uctx) {
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_entry);
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_entry);
+	}
 
 	down_write(&qp->state_lock);
 
@@ -993,8 +1004,8 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
 
 	siw_cq_flush(cq);
 
-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_entry);
 
 	atomic_dec(&sdev->num_cq);
 
@@ -1031,7 +1042,6 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 	size = roundup_pow_of_two(size);
 	cq->base_cq.cqe = size;
 	cq->num_cqe = size;
-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
 
 	if (!udata) {
 		cq->kernel_verbs = 1;
@@ -1057,16 +1067,17 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
+		size_t length = size * sizeof(struct siw_cqe) +
+			sizeof(struct siw_cq_ctrl);
 
-		cq->xa_cq_index =
-			siw_create_uobj(ctx, cq->queue,
-					size * sizeof(struct siw_cqe) +
-						sizeof(struct siw_cq_ctrl));
-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
+		cq->cq_entry =
+			siw_mmap_entry_insert(ctx, cq->queue,
+					      length, &uresp.cq_key);
+		if (!cq->cq_entry) {
 			rv = -ENOMEM;
 			goto err_out;
 		}
-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
+
 		uresp.cq_id = cq->id;
 		uresp.num_cqe = size;
 
@@ -1087,8 +1098,9 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+		if (ctx)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    cq->cq_entry);
 		vfree(cq->queue);
 	}
 	atomic_dec(&sdev->num_cq);
@@ -1492,7 +1504,6 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	srq->max_sge = attrs->max_sge;
 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
 	srq->limit = attrs->srq_limit;
 	if (srq->limit)
 		srq->armed = 1;
@@ -1511,15 +1522,16 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	if (udata) {
 		struct siw_uresp_create_srq uresp = {};
+		size_t length = srq->num_rqe * sizeof(struct siw_rqe);
 
-		srq->xa_srq_index = siw_create_uobj(
-			ctx, srq->recvq, srq->num_rqe * sizeof(struct siw_rqe));
-
-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
+		srq->srq_entry =
+			siw_mmap_entry_insert(ctx, srq->recvq,
+					      length, &uresp.srq_key);
+		if (!srq->srq_entry) {
 			rv = -ENOMEM;
 			goto err_out;
 		}
-		uresp.srq_key = srq->xa_srq_index;
+
 		uresp.num_rqe = srq->num_rqe;
 
 		if (udata->outlen < sizeof(uresp)) {
@@ -1538,8 +1550,9 @@ int siw_create_srq(struct ib_srq *base_srq,
 
 err_out:
 	if (srq->recvq) {
-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
+		if (ctx)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    srq->srq_entry);
 		vfree(srq->recvq);
 	}
 	atomic_dec(&sdev->num_srq);
@@ -1625,9 +1638,9 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata)
 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
 					  base_ucontext);
 
-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
-
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+					    srq->srq_entry);
 	vfree(srq->recvq);
 	atomic_dec(&sdev->num_srq);
 }
diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
index 1910869281cb..1a731989fad6 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.h
+++ b/drivers/infiniband/sw/siw/siw_verbs.h
@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata);
 int siw_post_srq_recv(struct ib_srq *base_srq, const struct ib_recv_wr *wr,
 		      const struct ib_recv_wr **bad_wr);
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma);
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
 void siw_cq_event(struct siw_cq *cq, enum ib_event_type type);
 void siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v12 rdma-next 6/8] RDMA/qedr: Use the common mmap API
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (4 preceding siblings ...)
  2019-10-30  9:44 ` [PATCH v12 rdma-next 5/8] RDMA/siw: " Michal Kalderon
@ 2019-10-30  9:44 ` Michal Kalderon
  2019-10-30  9:44 ` [PATCH v12 rdma-next 7/8] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Remove all functions related to mmap handling from qedr and use the
common API instead.
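
For reference, the fault-in path a converted .mmap handler follows is
roughly the sketch below; the drv_* names, the io_address field and the
write-combining protection are illustrative, and error logging is
trimmed:

	static int drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
	{
		size_t length = vma->vm_end - vma->vm_start;
		u64 key = vma->vm_pgoff << PAGE_SHIFT;
		struct rdma_user_mmap_entry *rdma_entry;
		struct drv_user_mmap_entry *entry;
		int rc;

		/* Takes a reference and ties the entry to this VMA for umap */
		rdma_entry = rdma_user_mmap_entry_get(uctx, key, vma);
		if (!rdma_entry)
			return -EINVAL;

		entry = container_of(rdma_entry, struct drv_user_mmap_entry,
				     rdma_entry);

		/* Passing the entry lets disassociate find the mapping later */
		rc = rdma_user_mmap_io(uctx, vma, entry->io_address >> PAGE_SHIFT,
				       length,
				       pgprot_writecombine(vma->vm_page_prot),
				       rdma_entry);

		rdma_user_mmap_entry_put(uctx, rdma_entry);
		return rc;
	}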

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/main.c  |   1 +
 drivers/infiniband/hw/qedr/qedr.h  |  30 +++---
 drivers/infiniband/hw/qedr/verbs.c | 201 ++++++++++++++++++-------------------
 drivers/infiniband/hw/qedr/verbs.h |   3 +-
 4 files changed, 118 insertions(+), 117 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 432dff95a7aa..f7879c5626dc 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -212,6 +212,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.get_link_layer = qedr_link_layer,
 	.map_mr_sg = qedr_map_mr_sg,
 	.mmap = qedr_mmap,
+	.mmap_free = qedr_mmap_free,
 	.modify_port = qedr_modify_port,
 	.modify_qp = qedr_modify_qp,
 	.modify_srq = qedr_modify_srq,
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index f89dafc0dbb8..8486fbfe5032 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -231,14 +231,10 @@ struct qedr_ucontext {
 	struct qedr_dev *dev;
 	struct qedr_pd *pd;
 	void __iomem *dpi_addr;
+	struct rdma_user_mmap_entry *db_mmap_entry;
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
-
-	struct list_head mm_head;
-
-	/* Lock to protect mm list */
-	struct mutex mm_list_lock;
 };
 
 union db_prod64 {
@@ -301,14 +297,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-struct qedr_mm {
-	struct {
-		u64 phy_addr;
-		unsigned long len;
-	} key;
-	struct list_head entry;
-};
-
 union db_prod32 {
 	struct rdma_pwm_val16_data data;
 	u32 raw;
@@ -492,6 +480,15 @@ struct qedr_mr {
 	u32 npages;
 };
 
+struct qedr_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	struct qedr_dev *dev;
+	u64 io_address;
+	size_t length;
+	u16 dpi;
+	u8 mmap_flag;
+};
+
 #define SET_FIELD2(value, name, flag) ((value) |= ((flag) << (name ## _SHIFT)))
 
 #define QEDR_RESP_IMM	(RDMA_CQE_RESPONDER_IMM_FLG_MASK << \
@@ -590,4 +587,11 @@ static inline struct qedr_srq *get_qedr_srq(struct ib_srq *ibsrq)
 {
 	return container_of(ibsrq, struct qedr_srq, ibsrq);
 }
+
+static inline struct qedr_user_mmap_entry *
+get_qedr_mmap_entry(struct rdma_user_mmap_entry *rdma_entry)
+{
+	return container_of(rdma_entry, struct qedr_user_mmap_entry,
+			    rdma_entry);
+}
 #endif
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 0da977b203e9..87b1ae9c9d4a 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -59,6 +59,10 @@
 
 #define DB_ADDR_SHIFT(addr)		((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
 
+enum {
+	QEDR_USER_MMAP_IO_WC = 0,
+};
+
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
 					size_t len)
 {
@@ -257,60 +261,6 @@ int qedr_modify_port(struct ib_device *ibdev, u8 port, int mask,
 	return 0;
 }
 
-static int qedr_add_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			 unsigned long len)
-{
-	struct qedr_mm *mm;
-
-	mm = kzalloc(sizeof(*mm), GFP_KERNEL);
-	if (!mm)
-		return -ENOMEM;
-
-	mm->key.phy_addr = phy_addr;
-	/* This function might be called with a length which is not a multiple
-	 * of PAGE_SIZE, while the mapping is PAGE_SIZE grained and the kernel
-	 * forces this granularity by increasing the requested size if needed.
-	 * When qedr_mmap is called, it will search the list with the updated
-	 * length as a key. To prevent search failures, the length is rounded up
-	 * in advance to PAGE_SIZE.
-	 */
-	mm->key.len = roundup(len, PAGE_SIZE);
-	INIT_LIST_HEAD(&mm->entry);
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_add(&mm->entry, &uctx->mm_head);
-	mutex_unlock(&uctx->mm_list_lock);
-
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "added (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-		 (unsigned long long)mm->key.phy_addr,
-		 (unsigned long)mm->key.len, uctx);
-
-	return 0;
-}
-
-static bool qedr_search_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			     unsigned long len)
-{
-	bool found = false;
-	struct qedr_mm *mm;
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_for_each_entry(mm, &uctx->mm_head, entry) {
-		if (len != mm->key.len || phy_addr != mm->key.phy_addr)
-			continue;
-
-		found = true;
-		break;
-	}
-	mutex_unlock(&uctx->mm_list_lock);
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "searched for (addr=0x%llx,len=0x%lx) for ctx=%p, result=%d\n",
-		 mm->key.phy_addr, mm->key.len, uctx, found);
-
-	return found;
-}
-
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 {
 	struct ib_device *ibdev = uctx->device;
@@ -319,6 +269,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	struct qedr_alloc_ucontext_resp uresp = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
+	struct qedr_user_mmap_entry *entry;
 
 	if (!udata)
 		return -EFAULT;
@@ -335,13 +286,29 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	ctx->dpi_addr = oparams.dpi_addr;
 	ctx->dpi_phys_addr = oparams.dpi_phys_addr;
 	ctx->dpi_size = oparams.dpi_size;
-	INIT_LIST_HEAD(&ctx->mm_head);
-	mutex_init(&ctx->mm_list_lock);
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	entry->io_address = ctx->dpi_phys_addr;
+	entry->length = ctx->dpi_size;
+	entry->mmap_flag = QEDR_USER_MMAP_IO_WC;
+	entry->dpi = ctx->dpi;
+	entry->dev = dev;
+	rc = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
+					 ctx->dpi_size);
+	if (rc) {
+		kfree(entry);
+		goto err;
+	}
+	ctx->db_mmap_entry = &entry->rdma_entry;
 
 	uresp.dpm_enabled = dev->user_dpm_enabled;
 	uresp.wids_enabled = 1;
 	uresp.wid_count = oparams.wid_count;
-	uresp.db_pa = ctx->dpi_phys_addr;
+	uresp.db_pa = rdma_user_mmap_get_key(ctx->db_mmap_entry);
 	uresp.db_size = ctx->dpi_size;
 	uresp.max_send_wr = dev->attr.max_sqe;
 	uresp.max_recv_wr = dev->attr.max_rqe;
@@ -353,82 +320,110 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
-		return rc;
+		goto err;
 
 	ctx->dev = dev;
 
-	rc = qedr_add_mmap(ctx, ctx->dpi_phys_addr, ctx->dpi_size);
-	if (rc)
-		return rc;
-
 	DP_DEBUG(dev, QEDR_MSG_INIT, "Allocating user context %p\n",
 		 &ctx->ibucontext);
 	return 0;
+
+err:
+	if (!ctx->db_mmap_entry)
+		dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
+	else
+		rdma_user_mmap_entry_remove(uctx, ctx->db_mmap_entry);
+
+	return rc;
 }
 
 void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 {
 	struct qedr_ucontext *uctx = get_qedr_ucontext(ibctx);
-	struct qedr_mm *mm, *tmp;
 
 	DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
 		 uctx);
-	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
-
-	list_for_each_entry_safe(mm, tmp, &uctx->mm_head, entry) {
-		DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-			 "deleted (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-			 mm->key.phy_addr, mm->key.len, uctx);
-		list_del(&mm->entry);
-		kfree(mm);
-	}
+
+	rdma_user_mmap_entry_remove(ibctx, uctx->db_mmap_entry);
 }
 
-int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
-	struct qedr_dev *dev = get_qedr_dev(context->device);
-	unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
-	unsigned long len = (vma->vm_end - vma->vm_start);
-	unsigned long dpi_start;
+	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
+	struct qedr_dev *dev = entry->dev;
 
-	dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
+	if (entry->mmap_flag == QEDR_USER_MMAP_IO_WC)
+		dev->ops->rdma_remove_user(dev->rdma_ctx, entry->dpi);
 
-	DP_DEBUG(dev, QEDR_MSG_INIT,
-		 "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
-		 (void *)vma->vm_start, (void *)vma->vm_end,
-		 (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
+	kfree(entry);
+}
 
-	if ((vma->vm_start & (PAGE_SIZE - 1)) || (len & (PAGE_SIZE - 1))) {
-		DP_ERR(dev,
-		       "failed mmap, addresses must be page aligned: start=0x%pK, end=0x%pK\n",
-		       (void *)vma->vm_start, (void *)vma->vm_end);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
+{
+	struct ib_device *dev = ucontext->device;
+	size_t length = vma->vm_end - vma->vm_start;
+	u64 key = vma->vm_pgoff << PAGE_SHIFT;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct qedr_user_mmap_entry *entry;
+	int rc = 0;
+	u64 pfn;
+
+	ibdev_dbg(dev,
+		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
+		  vma->vm_start, vma->vm_end, length, key);
+
+	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
+		ibdev_dbg(dev,
+			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
 
-	if (!qedr_search_mmap(ucontext, phys_addr, len)) {
-		DP_ERR(dev, "failed mmap, vm_pgoff=0x%lx is not authorized\n",
-		       vma->vm_pgoff);
-		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC) {
+		ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
+		return -EPERM;
 	}
+	vma->vm_flags &= ~VM_MAYEXEC;
 
-	if (phys_addr < dpi_start ||
-	    ((phys_addr + len) > (dpi_start + ucontext->dpi_size))) {
-		DP_ERR(dev,
-		       "failed mmap, pages are outside of dpi; page address=0x%pK, dpi_start=0x%pK, dpi_size=0x%x\n",
-		       (void *)phys_addr, (void *)dpi_start,
-		       ucontext->dpi_size);
+	rdma_entry = rdma_user_mmap_entry_get(ucontext, key, vma);
+	if (!rdma_entry) {
+		ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
+			  key);
 		return -EINVAL;
 	}
+	entry = get_qedr_mmap_entry(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(dev,
+			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
+			  key, length, entry->length);
+		rc = -EINVAL;
+		goto out;
+	}
 
-	if (vma->vm_flags & VM_READ) {
-		DP_ERR(dev, "failed mmap, cannot map doorbell bar for read\n");
-		return -EINVAL;
+	ibdev_dbg(dev,
+		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
+		  entry->io_address, length, entry->mmap_flag);
+
+	switch (entry->mmap_flag) {
+	case QEDR_USER_MMAP_IO_WC:
+		pfn = entry->io_address >> PAGE_SHIFT;
+		rc = rdma_user_mmap_io(ucontext, vma, pfn, length,
+				       pgprot_writecombine(vma->vm_page_prot),
+				       rdma_entry);
+		break;
+	default:
+		rc = -EINVAL;
 	}
 
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, len,
-				  vma->vm_page_prot);
+	if (rc)
+		ibdev_dbg(dev,
+			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
+			  entry->io_address, length, entry->mmap_flag, rc);
+
+out:
+	rdma_user_mmap_entry_put(ucontext, rdma_entry);
+
+	return rc;
 }
 
 int qedr_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 9aaa90283d6e..3606e97e95da 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -46,7 +46,8 @@ int qedr_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
 void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
 
-int qedr_mmap(struct ib_ucontext *, struct vm_area_struct *vma);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v12 rdma-next 7/8] RDMA/qedr: Add doorbell overflow recovery support
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (5 preceding siblings ...)
  2019-10-30  9:44 ` [PATCH v12 rdma-next 6/8] RDMA/qedr: Use the common mmap API Michal Kalderon
@ 2019-10-30  9:44 ` Michal Kalderon
  2019-10-30  9:44 ` [PATCH v12 rdma-next 8/8] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

Use the doorbell recovery mechanism to register RDMA-related doorbells
so that they can be restored when a doorbell overflow attention occurs.
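
Stripped of the surrounding error handling, the registration pattern this
patch adds boils down to the calls below; qedr_db_recovery_add()/_del()
are thin wrappers introduced here around the qed common db_recovery ops,
and the CQ/QP fields shown are the ones the patch wires up:

	/* Kernel-owned doorbell: register its address and shadow data */
	rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
				  DB_REC_WIDTH_64B, DB_REC_KERNEL);

	/* User-owned doorbell: the shadow lives in the qedr_user_db_rec page
	 * that is mmapped to the application, so qed can replay the last
	 * value userspace wrote.
	 */
	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
				  &qp->usq.db_rec_data->db_data,
				  DB_REC_WIDTH_32B, DB_REC_USER);

	/* On teardown, unregister before the doorbell or its data go away */
	qedr_db_recovery_del(dev, qp->usq.db_addr,
			     &qp->usq.db_rec_data->db_data);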

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  |  11 +-
 drivers/infiniband/hw/qedr/verbs.c | 319 +++++++++++++++++++++++++++++++------
 include/uapi/rdma/qedr-abi.h       |  25 +++
 3 files changed, 305 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 8486fbfe5032..3ea5b7e8dbc6 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -235,6 +235,7 @@ struct qedr_ucontext {
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
+	bool db_rec;
 };
 
 union db_prod64 {
@@ -262,6 +263,11 @@ struct qedr_userq {
 	struct qedr_pbl *pbl_tbl;
 	u64 buf_addr;
 	size_t buf_len;
+
+	/* doorbell recovery */
+	void __iomem *db_addr;
+	struct qedr_user_db_rec *db_rec_data;
+	struct rdma_user_mmap_entry *db_mmap_entry;
 };
 
 struct qedr_cq {
@@ -483,7 +489,10 @@ struct qedr_mr {
 struct qedr_user_mmap_entry {
 	struct rdma_user_mmap_entry rdma_entry;
 	struct qedr_dev *dev;
-	u64 io_address;
+	union {
+		u64 io_address;
+		void *address;
+	};
 	size_t length;
 	u16 dpi;
 	u8 mmap_flag;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 87b1ae9c9d4a..29686e5b8e55 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -61,6 +61,7 @@
 
 enum {
 	QEDR_USER_MMAP_IO_WC = 0,
+	QEDR_USER_MMAP_PHYS_PAGE,
 };
 
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
@@ -267,6 +268,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	int rc;
 	struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
 	struct qedr_alloc_ucontext_resp uresp = {};
+	struct qedr_alloc_ucontext_req ureq = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
 	struct qedr_user_mmap_entry *entry;
@@ -274,6 +276,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	if (!udata)
 		return -EFAULT;
 
+	if (udata->inlen) {
+		rc = ib_copy_from_udata(&ureq, udata,
+					min(sizeof(ureq), udata->inlen));
+		if (rc) {
+			DP_ERR(dev, "Problem copying data from user space\n");
+			return -EFAULT;
+		}
+
+		ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
+	}
+
 	rc = dev->ops->rdma_add_user(dev->rdma_ctx, &oparams);
 	if (rc) {
 		DP_ERR(dev,
@@ -352,7 +365,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
 	struct qedr_dev *dev = entry->dev;
 
-	if (entry->mmap_flag == QEDR_USER_MMAP_IO_WC)
+	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
+		free_page((unsigned long)entry->address);
+	else if (entry->mmap_flag == QEDR_USER_MMAP_IO_WC)
 		dev->ops->rdma_remove_user(dev->rdma_ctx, entry->dpi);
 
 	kfree(entry);
@@ -411,6 +426,10 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 				       pgprot_writecombine(vma->vm_page_prot),
 				       rdma_entry);
 		break;
+	case QEDR_USER_MMAP_PHYS_PAGE:
+		rc = vm_insert_page(vma, vma->vm_start,
+				    virt_to_page(entry->address));
+		break;
 	default:
 		rc = -EINVAL;
 	}
@@ -653,16 +672,48 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 	}
 }
 
+static int qedr_db_recovery_add(struct qedr_dev *dev,
+				void __iomem *db_addr,
+				void *db_data,
+				enum qed_db_rec_width db_width,
+				enum qed_db_rec_space db_space)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return 0;
+	}
+
+	return dev->ops->common->db_recovery_add(dev->cdev, db_addr, db_data,
+						 db_width, db_space);
+}
+
+static void qedr_db_recovery_del(struct qedr_dev *dev,
+				 void __iomem *db_addr,
+				 void *db_data)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return;
+	}
+
+	/* Ignore return code as there is not much we can do about it. Error
+	 * log will be printed inside.
+	 */
+	dev->ops->common->db_recovery_del(dev->cdev, db_addr, db_data);
+}
+
 static int qedr_copy_cq_uresp(struct qedr_dev *dev,
-			      struct qedr_cq *cq, struct ib_udata *udata)
+			      struct qedr_cq *cq, struct ib_udata *udata,
+			      u32 db_offset)
 {
 	struct qedr_create_cq_uresp uresp;
 	int rc;
 
 	memset(&uresp, 0, sizeof(uresp));
 
-	uresp.db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+	uresp.db_offset = db_offset;
 	uresp.icid = cq->icid;
+	uresp.db_rec_addr = rdma_user_mmap_get_key(cq->q.db_mmap_entry);
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
@@ -690,10 +741,58 @@ static inline int qedr_align_cq_entries(int entries)
 	return aligned_size / QEDR_CQE_SIZE;
 }
 
+static int qedr_init_user_db_rec(struct ib_udata *udata,
+				 struct qedr_dev *dev, struct qedr_userq *q,
+				 bool requires_db_rec)
+{
+	struct qedr_ucontext *uctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	struct qedr_user_mmap_entry *entry;
+	int rc;
+
+	/* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
+	if (requires_db_rec == 0 || !uctx->db_rec)
+		return 0;
+
+	/* Allocate a page for doorbell recovery, add to mmap */
+	q->db_rec_data = (void *)get_zeroed_page(GFP_USER);
+	if (!q->db_rec_data) {
+		DP_ERR(dev, "get_free_page failed\n");
+		return -ENOMEM;
+	}
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto err_free_db_data;
+
+	entry->address = q->db_rec_data;
+	entry->length = PAGE_SIZE;
+	entry->mmap_flag = QEDR_USER_MMAP_PHYS_PAGE;
+	rc = rdma_user_mmap_entry_insert(&uctx->ibucontext,
+					 &entry->rdma_entry,
+					 PAGE_SIZE);
+	if (rc)
+		goto err_free_entry;
+
+	q->db_mmap_entry = &entry->rdma_entry;
+
+	return 0;
+
+err_free_entry:
+	kfree(entry);
+
+err_free_db_data:
+	free_page((unsigned long)q->db_rec_data);
+	q->db_rec_data = NULL;
+	return -ENOMEM;
+}
+
 static inline int qedr_init_user_queue(struct ib_udata *udata,
 				       struct qedr_dev *dev,
 				       struct qedr_userq *q, u64 buf_addr,
-				       size_t buf_len, int access, int dmasync,
+				       size_t buf_len, bool requires_db_rec,
+				       int access, int dmasync,
 				       int alloc_and_init)
 {
 	u32 fw_pages;
@@ -731,7 +830,8 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
 		}
 	}
 
-	return 0;
+	/* mmap the user address used to store doorbell data for recovery */
+	return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
 
 err0:
 	ib_umem_release(q->umem);
@@ -817,6 +917,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
 	int chain_entries;
+	u32 db_offset;
 	int page_cnt;
 	u64 pbl_ptr;
 	u16 icid;
@@ -836,8 +937,12 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	chain_entries = qedr_align_cq_entries(entries);
 	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
 
+	/* calc db offset. user will add DPI base, kernel will add db addr */
+	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create cq: problem copying data from user space\n");
 			goto err0;
@@ -852,8 +957,9 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->cq_type = QEDR_CQ_TYPE_USER;
 
 		rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
-					  ureq.len, IB_ACCESS_LOCAL_WRITE, 1,
-					  1);
+					  ureq.len, true,
+					  IB_ACCESS_LOCAL_WRITE,
+					  1, 1);
 		if (rc)
 			goto err0;
 
@@ -861,6 +967,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		page_cnt = cq->q.pbl_info.num_pbes;
 
 		cq->ibcq.cqe = chain_entries;
+		cq->q.db_addr = ctx->dpi_addr + db_offset;
 	} else {
 		cq->cq_type = QEDR_CQ_TYPE_KERNEL;
 
@@ -872,7 +979,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 						   sizeof(union rdma_cqe),
 						   &cq->pbl, NULL);
 		if (rc)
-			goto err1;
+			goto err0;
 
 		page_cnt = qed_chain_get_page_cnt(&cq->pbl);
 		pbl_ptr = qed_chain_get_pbl_phys(&cq->pbl);
@@ -884,21 +991,28 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	rc = dev->ops->rdma_create_cq(dev->rdma_ctx, &params, &icid);
 	if (rc)
-		goto err2;
+		goto err1;
 
 	cq->icid = icid;
 	cq->sig = QEDR_CQ_MAGIC_NUMBER;
 	spin_lock_init(&cq->cq_lock);
 
 	if (udata) {
-		rc = qedr_copy_cq_uresp(dev, cq, udata);
+		rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
 		if (rc)
-			goto err3;
+			goto err2;
+
+		rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+					  &cq->q.db_rec_data->db_data,
+					  DB_REC_WIDTH_64B,
+					  DB_REC_USER);
+		if (rc)
+			goto err2;
+
 	} else {
 		/* Generate doorbell address. */
-		cq->db_addr = dev->db_addr +
-		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 		cq->db.data.icid = cq->icid;
+		cq->db_addr = dev->db_addr + db_offset;
 		cq->db.data.params = DB_AGG_CMD_SET <<
 		    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
 
@@ -908,6 +1022,11 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->latest_cqe = NULL;
 		consume_cqe(cq);
 		cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+
+		rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+					  DB_REC_WIDTH_64B, DB_REC_KERNEL);
+		if (rc)
+			goto err2;
 	}
 
 	DP_DEBUG(dev, QEDR_MSG_CQ,
@@ -916,18 +1035,20 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
-err3:
+err2:
 	destroy_iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &destroy_iparams,
 				  &destroy_oparams);
-err2:
-	if (udata)
-		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
-	else
-		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
 err1:
-	if (udata)
+	if (udata) {
+		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+		if (ctx)
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_mmap_entry);
+	} else {
+		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
+	}
 err0:
 	return -EINVAL;
 }
@@ -947,6 +1068,8 @@ int qedr_resize_cq(struct ib_cq *ibcq, int new_cnt, struct ib_udata *udata)
 
 void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct qedr_ucontext *ctx = rdma_udata_to_drv_context(udata,
+		struct qedr_ucontext, ibucontext);
 	struct qedr_dev *dev = get_qedr_dev(ibcq->device);
 	struct qed_rdma_destroy_cq_out_params oparams;
 	struct qed_rdma_destroy_cq_in_params iparams;
@@ -958,8 +1081,10 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	cq->destroyed = 1;
 
 	/* GSIs CQs are handled by driver, so they don't exist in the FW */
-	if (cq->cq_type == QEDR_CQ_TYPE_GSI)
+	if (cq->cq_type == QEDR_CQ_TYPE_GSI) {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 		return;
+	}
 
 	iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &iparams, &oparams);
@@ -968,6 +1093,15 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	if (udata) {
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+
+		if (cq->q.db_rec_data) {
+			qedr_db_recovery_del(dev, cq->q.db_addr,
+					     &cq->q.db_rec_data->db_data);
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_mmap_entry);
+		}
+	} else {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 	}
 
 	/* We don't want the IRQ handler to handle a non-existing CQ so we
@@ -1132,8 +1266,8 @@ static int qedr_copy_srq_uresp(struct qedr_dev *dev,
 }
 
 static void qedr_copy_rq_uresp(struct qedr_dev *dev,
-			       struct qedr_create_qp_uresp *uresp,
-			       struct qedr_qp *qp)
+			      struct qedr_create_qp_uresp *uresp,
+			      struct qedr_qp *qp)
 {
 	/* iWARP requires two doorbells per RQ. */
 	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
@@ -1146,6 +1280,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
 	}
 
 	uresp->rq_icid = qp->icid;
+	uresp->rq_db_rec_addr = rdma_user_mmap_get_key(qp->urq.db_mmap_entry);
 }
 
 static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1159,22 +1294,24 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
 		uresp->sq_icid = qp->icid;
 	else
 		uresp->sq_icid = qp->icid + 1;
+
+	uresp->sq_db_rec_addr = rdma_user_mmap_get_key(qp->usq.db_mmap_entry);
 }
 
 static int qedr_copy_qp_uresp(struct qedr_dev *dev,
-			      struct qedr_qp *qp, struct ib_udata *udata)
+			      struct qedr_qp *qp, struct ib_udata *udata,
+			      struct qedr_create_qp_uresp *uresp)
 {
-	struct qedr_create_qp_uresp uresp;
 	int rc;
 
-	memset(&uresp, 0, sizeof(uresp));
-	qedr_copy_sq_uresp(dev, &uresp, qp);
-	qedr_copy_rq_uresp(dev, &uresp, qp);
+	memset(uresp, 0, sizeof(*uresp));
+	qedr_copy_sq_uresp(dev, uresp, qp);
+	qedr_copy_rq_uresp(dev, uresp, qp);
 
-	uresp.atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
-	uresp.qp_id = qp->qp_id;
+	uresp->atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
+	uresp->qp_id = qp->qp_id;
 
-	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	rc = qedr_ib_copy_to_udata(udata, uresp, sizeof(*uresp));
 	if (rc)
 		DP_ERR(dev,
 		       "create qp: failed a copy to user space with qp icid=0x%x.\n",
@@ -1222,16 +1359,35 @@ static void qedr_set_common_qp_params(struct qedr_dev *dev,
 		 qp->sq.max_sges, qp->sq_cq->icid);
 }
 
-static void qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid + 1;
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	if (!qp->srq) {
 		qp->rq.db = dev->db_addr +
 			    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD);
 		qp->rq.db_data.data.icid = qp->icid;
+
+		rc = qedr_db_recovery_add(dev, qp->rq.db,
+					  &qp->rq.db_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_KERNEL);
+		if (rc)
+			qedr_db_recovery_del(dev, qp->sq.db,
+					     &qp->sq.db_data);
 	}
+
+	return rc;
 }
 
 static int qedr_check_srq_params(struct qedr_dev *dev,
@@ -1285,7 +1441,7 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
 	int rc;
 
 	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
-				  ureq->srq_len, access, dmasync, 1);
+				  ureq->srq_len, false, access, dmasync, 1);
 	if (rc)
 		return rc;
 
@@ -1381,7 +1537,8 @@ int qedr_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
 	hw_srq->max_sges = init_attr->attr.max_sge;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create srq: problem copying data from user space\n");
 			goto err0;
@@ -1570,7 +1727,9 @@ qedr_iwarp_populate_user_qp(struct qedr_dev *dev,
 			   &qp->urq.pbl_info, FW_PAGE_SHIFT);
 }
 
-static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
+static void qedr_cleanup_user(struct qedr_dev *dev,
+			      struct qedr_ucontext *ctx,
+			      struct qedr_qp *qp)
 {
 	ib_umem_release(qp->usq.umem);
 	qp->usq.umem = NULL;
@@ -1585,6 +1744,20 @@ static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
 		kfree(qp->usq.pbl_tbl);
 		kfree(qp->urq.pbl_tbl);
 	}
+
+	if (qp->usq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->usq.db_addr,
+				     &qp->usq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->usq.db_mmap_entry);
+	}
+
+	if (qp->urq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->urq.db_addr,
+				     &qp->urq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->urq.db_mmap_entry);
+	}
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1596,13 +1769,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	struct qed_rdma_create_qp_in_params in_params;
 	struct qed_rdma_create_qp_out_params out_params;
 	struct qedr_pd *pd = get_qedr_pd(ibpd);
+	struct qedr_create_qp_uresp uresp;
+	struct qedr_ucontext *ctx = NULL;
 	struct qedr_create_qp_ureq ureq;
 	int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
 	int rc = -EINVAL;
 
 	qp->create_type = QEDR_QP_CREATE_USER;
 	memset(&ureq, 0, sizeof(ureq));
-	rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+	rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen));
 	if (rc) {
 		DP_ERR(dev, "Problem copying data from user space\n");
 		return rc;
@@ -1610,14 +1785,16 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 
 	/* SQ - read access only (0), dma sync not required (0) */
 	rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
-				  ureq.sq_len, 0, 0, alloc_and_init);
+				  ureq.sq_len, true, 0, 0,
+				  alloc_and_init);
 	if (rc)
 		return rc;
 
 	if (!qp->srq) {
 		/* RQ - read access only (0), dma sync not required (0) */
 		rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
-					  ureq.rq_len, 0, 0, alloc_and_init);
+					  ureq.rq_len, true,
+					  0, 0, alloc_and_init);
 		if (rc)
 			return rc;
 	}
@@ -1647,29 +1824,56 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	rc = qedr_copy_qp_uresp(dev, qp, udata);
+	rc = qedr_copy_qp_uresp(dev, qp, udata, &uresp);
+	if (rc)
+		goto err;
+
+	/* db offset was calculated in copy_qp_uresp, now set in the user q */
+	ctx = pd->uctx;
+	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
+	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+
+	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
+				  &qp->usq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
 	if (rc)
 		goto err;
 
+	rc = qedr_db_recovery_add(dev, qp->urq.db_addr,
+				  &qp->urq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
+	if (rc)
+		goto err;
 	qedr_qp_user_print(dev, qp);
 
-	return 0;
+	return rc;
 err:
 	rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
 	if (rc)
 		DP_ERR(dev, "create qp: fatal fault. rc=%d", rc);
 
 err1:
-	qedr_cleanup_user(dev, qp);
+	qedr_cleanup_user(dev, ctx, qp);
 	return rc;
 }
 
-static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 	    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid;
 
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	qp->rq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD);
 	qp->rq.db_data.data.icid = qp->icid;
@@ -1677,6 +1881,13 @@ static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 			   DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_FLAGS);
 	qp->rq.iwarp_db2_data.data.icid = qp->icid;
 	qp->rq.iwarp_db2_data.data.value = DQ_TCM_IWARP_POST_RQ_CF_CMD;
+
+	rc = qedr_db_recovery_add(dev, qp->rq.db,
+				  &qp->rq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+
+	return rc;
 }
 
 static int
@@ -1724,8 +1935,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_roce_db_info(dev, qp);
-	return rc;
+	return qedr_set_roce_db_info(dev, qp);
 }
 
 static int
@@ -1783,8 +1993,7 @@ qedr_iwarp_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_iwarp_db_info(dev, qp);
-	return rc;
+	return qedr_set_iwarp_db_info(dev, qp);
 
 err:
 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -1799,6 +2008,15 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	dev->ops->common->chain_free(dev->cdev, &qp->rq.pbl);
 	kfree(qp->rqe_wr_id);
+
+	/* GSI qp is not registered to db mechanism so no need to delete */
+	if (qp->qp_type == IB_QPT_GSI)
+		return;
+
+	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
+
+	if (!qp->srq)
+		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
@@ -2439,7 +2657,10 @@ int qedr_query_qp(struct ib_qp *ibqp,
 static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 				  struct ib_udata *udata)
 {
-	int rc = 0;
+	struct qedr_ucontext *ctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	int rc;
 
 	if (qp->qp_type != IB_QPT_GSI) {
 		rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -2448,7 +2669,7 @@ static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 	}
 
 	if (qp->create_type == QEDR_QP_CREATE_USER)
-		qedr_cleanup_user(dev, qp);
+		qedr_cleanup_user(dev, ctx, qp);
 	else
 		qedr_cleanup_kernel(dev, qp);
 
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 7a10b3a325fa..c022ee26089b 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -38,6 +38,15 @@
 #define QEDR_ABI_VERSION		(8)
 
 /* user kernel communication data structures. */
+enum qedr_alloc_ucontext_flags {
+	QEDR_ALLOC_UCTX_RESERVED	= 1 << 0,
+	QEDR_ALLOC_UCTX_DB_REC		= 1 << 1
+};
+
+struct qedr_alloc_ucontext_req {
+	__u32 context_flags;
+	__u32 reserved;
+};
 
 struct qedr_alloc_ucontext_resp {
 	__aligned_u64 db_pa;
@@ -74,6 +83,7 @@ struct qedr_create_cq_uresp {
 	__u32 db_offset;
 	__u16 icid;
 	__u16 reserved;
+	__aligned_u64 db_rec_addr;
 };
 
 struct qedr_create_qp_ureq {
@@ -109,6 +119,13 @@ struct qedr_create_qp_uresp {
 
 	__u32 rq_db2_offset;
 	__u32 reserved;
+
+	/* address of SQ doorbell recovery user entry */
+	__aligned_u64 sq_db_rec_addr;
+
+	/* address of RQ doorbell recovery user entry */
+	__aligned_u64 rq_db_rec_addr;
+
 };
 
 struct qedr_create_srq_ureq {
@@ -128,4 +145,12 @@ struct qedr_create_srq_uresp {
 	__u32 reserved1;
 };
 
+/* doorbell recovery entry allocated and populated by userspace doorbelling
+ * entities and mapped to kernel. Kernel uses this to register doorbell
+ * information with doorbell drop recovery mechanism.
+ */
+struct qedr_user_db_rec {
+	__aligned_u64 db_data; /* doorbell data */
+};
+
 #endif /* __QEDR_USER_H__ */
-- 
2.14.5



* [PATCH v12 rdma-next 8/8] RDMA/qedr: Add iWARP doorbell recovery support
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (6 preceding siblings ...)
  2019-10-30  9:44 ` [PATCH v12 rdma-next 7/8] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
@ 2019-10-30  9:44 ` Michal Kalderon
  2019-10-30 15:33 ` [PATCH v12 rdma-next 5/8] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
  2019-11-05 20:41 ` [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Jason Gunthorpe
  9 siblings, 0 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-10-30  9:44 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, dledford, jgg, galpress, yishaih, bmt
  Cc: linux-rdma

This patch adds the iWARP-specific doorbells to the doorbell
recovery mechanism.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  | 12 +++++++-----
 drivers/infiniband/hw/qedr/verbs.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 3ea5b7e8dbc6..c7a94ac5cc15 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -238,6 +238,11 @@ struct qedr_ucontext {
 	bool db_rec;
 };
 
+union db_prod32 {
+	struct rdma_pwm_val16_data data;
+	u32 raw;
+};
+
 union db_prod64 {
 	struct rdma_pwm_val32_data data;
 	u64 raw;
@@ -268,6 +273,8 @@ struct qedr_userq {
 	void __iomem *db_addr;
 	struct qedr_user_db_rec *db_rec_data;
 	struct rdma_user_mmap_entry *db_mmap_entry;
+	void __iomem *db_rec_db2_addr;
+	union db_prod32 db_rec_db2_data;
 };
 
 struct qedr_cq {
@@ -303,11 +310,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-union db_prod32 {
-	struct rdma_pwm_val16_data data;
-	u32 raw;
-};
-
 struct qedr_qp_hwq_info {
 	/* WQE Elements */
 	struct qed_chain pbl;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 29686e5b8e55..a593593aa658 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1758,6 +1758,10 @@ static void qedr_cleanup_user(struct qedr_dev *dev,
 		rdma_user_mmap_entry_remove(&ctx->ibucontext,
 					    qp->urq.db_mmap_entry);
 	}
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1))
+		qedr_db_recovery_del(dev, qp->urq.db_rec_db2_addr,
+				     &qp->urq.db_rec_db2_data);
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1833,6 +1837,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
 	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
 
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		qp->urq.db_rec_db2_addr = ctx->dpi_addr + uresp.rq_db2_offset;
+
+		/* calculate the db_rec_db2 data since it is constant so no
+		 *  need to reflect from user
+		 */
+		qp->urq.db_rec_db2_data.data.icid = cpu_to_le16(qp->icid);
+		qp->urq.db_rec_db2_data.data.value =
+			cpu_to_le16(DQ_TCM_IWARP_POST_RQ_CF_CMD);
+	}
+
 	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
 				  &qp->usq.db_rec_data->db_data,
 				  DB_REC_WIDTH_32B,
@@ -1846,6 +1861,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 				  DB_REC_USER);
 	if (rc)
 		goto err;
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		rc = qedr_db_recovery_add(dev, qp->urq.db_rec_db2_addr,
+					  &qp->urq.db_rec_db2_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_USER);
+		if (rc)
+			goto err;
+	}
 	qedr_qp_user_print(dev, qp);
 
 	return rc;
@@ -1886,7 +1910,13 @@ static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 				  &qp->rq.db_data,
 				  DB_REC_WIDTH_32B,
 				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
 
+	rc = qedr_db_recovery_add(dev, qp->rq.iwarp_db2,
+				  &qp->rq.iwarp_db2_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
 	return rc;
 }
 
@@ -2015,8 +2045,13 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
 
-	if (!qp->srq)
+	if (!qp->srq) {
 		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
+
+		if (rdma_protocol_iwarp(&dev->ibdev, 1))
+			qedr_db_recovery_del(dev, qp->rq.iwarp_db2,
+					     &qp->rq.iwarp_db2_data);
+	}
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
-- 
2.14.5



* Re: [PATCH v12 rdma-next 5/8] RDMA/siw: Use the common mmap_xa helpers
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (7 preceding siblings ...)
  2019-10-30  9:44 ` [PATCH v12 rdma-next 8/8] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
@ 2019-10-30 15:33 ` Bernard Metzler
  2019-11-05 20:41 ` [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Jason Gunthorpe
  9 siblings, 0 replies; 20+ messages in thread
From: Bernard Metzler @ 2019-10-30 15:33 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: ariel.elior, dledford, jgg, galpress, yishaih, linux-rdma

-----"Michal Kalderon" <michal.kalderon@marvell.com> wrote: -----

>To: <michal.kalderon@marvell.com>, <ariel.elior@marvell.com>,
><dledford@redhat.com>, <jgg@ziepe.ca>, <galpress@amazon.com>,
><yishaih@mellanox.com>, <bmt@zurich.ibm.com>
>From: "Michal Kalderon" <michal.kalderon@marvell.com>
>Date: 10/30/2019 10:47AM
>Cc: <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] [PATCH v12 rdma-next 5/8] RDMA/siw: Use the
>common mmap_xa helpers
>
>Remove the functions related to managing the mmap_xa database.
>This code is now common in ib_core. Use the common API's instead.
>
>Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
>---
> drivers/infiniband/sw/siw/siw.h       |  20 +++-
> drivers/infiniband/sw/siw/siw_main.c  |   1 +
> drivers/infiniband/sw/siw/siw_verbs.c | 207
>++++++++++++++++++----------------
> drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> 4 files changed, 128 insertions(+), 101 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw.h
>b/drivers/infiniband/sw/siw/siw.h
>index dba4535494ab..824b11d5b879 100644
>--- a/drivers/infiniband/sw/siw/siw.h
>+++ b/drivers/infiniband/sw/siw/siw.h
>@@ -220,7 +220,7 @@ struct siw_cq {
> 	u32 cq_get;
> 	u32 num_cqe;
> 	bool kernel_verbs;
>-	u32 xa_cq_index; /* mmap information for CQE array */
>+	struct rdma_user_mmap_entry *cq_entry; /* mmap info for CQE array
>*/
> 	u32 id; /* For debugging only */
> };
> 
>@@ -263,7 +263,7 @@ struct siw_srq {
> 	u32 rq_put;
> 	u32 rq_get;
> 	u32 num_rqe; /* max # of wqe's allowed */
>-	u32 xa_srq_index; /* mmap information for SRQ array */
>+	struct rdma_user_mmap_entry *srq_entry; /* mmap info for SRQ array
>*/
> 	char armed; /* inform user if limit hit */
> 	char kernel_verbs; /* '1' if kernel client */
> };
>@@ -477,8 +477,8 @@ struct siw_qp {
> 		u8 layer : 4, etype : 4;
> 		u8 ecode;
> 	} term_info;
>-	u32 xa_sq_index; /* mmap information for SQE array */
>-	u32 xa_rq_index; /* mmap information for RQE array */
>+	struct rdma_user_mmap_entry *sq_entry; /* mmap info for SQE array
>*/
>+	struct rdma_user_mmap_entry *rq_entry; /* mmap info for RQE array
>*/
> 	struct rcu_head rcu;
> };
> 
>@@ -503,6 +503,12 @@ struct iwarp_msg_info {
> 	int (*rx_data)(struct siw_qp *qp);
> };
> 
>+struct siw_user_mmap_entry {
>+	struct rdma_user_mmap_entry rdma_entry;
>+	void *address;
>+	size_t length;
>+};
>+
> /* Global siw parameters. Currently set in siw_main.c */
> extern const bool zcopy_tx;
> extern const bool try_gso;
>@@ -607,6 +613,12 @@ static inline struct siw_mr *to_siw_mr(struct
>ib_mr *base_mr)
> 	return container_of(base_mr, struct siw_mr, base_mr);
> }
> 
>+static inline struct siw_user_mmap_entry *
>+to_siw_mmap_entry(struct rdma_user_mmap_entry *rdma_mmap)
>+{
>+	return container_of(rdma_mmap, struct siw_user_mmap_entry,
>rdma_entry);
>+}
>+
> static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev,
>int id)
> {
> 	struct siw_qp *qp;
>diff --git a/drivers/infiniband/sw/siw/siw_main.c
>b/drivers/infiniband/sw/siw/siw_main.c
>index d1a1b7aa7d83..4b44d5f075fd 100644
>--- a/drivers/infiniband/sw/siw/siw_main.c
>+++ b/drivers/infiniband/sw/siw/siw_main.c
>@@ -299,6 +299,7 @@ static const struct ib_device_ops siw_device_ops
>= {
> 	.iw_rem_ref = siw_qp_put_ref,
> 	.map_mr_sg = siw_map_mr_sg,
> 	.mmap = siw_mmap,
>+	.mmap_free = siw_mmap_free,
> 	.modify_qp = siw_verbs_modify_qp,
> 	.modify_srq = siw_modify_srq,
> 	.poll_cq = siw_poll_cq,
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>b/drivers/infiniband/sw/siw/siw_verbs.c
>index 869e02b69a01..71c7a61cbfe3 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.c
>+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>@@ -34,44 +34,20 @@ static char ib_qp_state_to_string[IB_QPS_ERR +
>1][sizeof("RESET")] = {
> 	[IB_QPS_ERR] = "ERR"
> };
> 
>-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr,
>u32 size)
>+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
> {
>-	struct siw_uobj *uobj;
>-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
>-	u32 key;
>+	struct siw_user_mmap_entry *entry = to_siw_mmap_entry(rdma_entry);
> 
>-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
>-	if (!uobj)
>-		return SIW_INVAL_UOBJ_KEY;
>-
>-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit,
>&uctx->uobj_nextkey,
>-			    GFP_KERNEL) < 0) {
>-		kfree(uobj);
>-		return SIW_INVAL_UOBJ_KEY;
>-	}
>-	uobj->size = PAGE_ALIGN(size);
>-	uobj->addr = vaddr;
>-
>-	return key;
>-}
>-
>-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
>-				     unsigned long off, u32 size)
>-{
>-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
>-
>-	if (uobj && uobj->size == size)
>-		return uobj;
>-
>-	return NULL;
>+	kfree(entry);
> }
> 
> int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
> {
> 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
>-	struct siw_uobj *uobj;
>-	unsigned long off = vma->vm_pgoff;
>-	int size = vma->vm_end - vma->vm_start;
>+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
>+	size_t size = vma->vm_end - vma->vm_start;
>+	struct rdma_user_mmap_entry *rdma_entry;
>+	struct siw_user_mmap_entry *entry;
> 	int rv = -EINVAL;
> 
> 	/*
>@@ -79,18 +55,32 @@ int siw_mmap(struct ib_ucontext *ctx, struct
>vm_area_struct *vma)
> 	 */
> 	if (vma->vm_start & (PAGE_SIZE - 1)) {
> 		pr_warn("siw: mmap not page aligned\n");
>-		goto out;
>+		return -EINVAL;
> 	}
>-	uobj = siw_get_uobj(uctx, off, size);
>-	if (!uobj) {
>-		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %u\n",
>+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
>+					      vma);
>+	if (!rdma_entry) {
>+		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %#zx\n",
> 			off, size);
>+		return -EINVAL;
>+	}
>+	entry = to_siw_mmap_entry(rdma_entry);
>+	if (PAGE_ALIGN(entry->length) != size) {
>+		siw_dbg(&uctx->sdev->base_dev,
>+			"key[%#lx] does not have valid length[%#zx] expected[%#zx]\n",
>+			off, size, entry->length);
>+		rv = -EINVAL;
>+		goto out;
>+	}
>+
>+	rv = remap_vmalloc_range(vma, entry->address, 0);
>+	if (rv) {
>+		pr_warn("remap_vmalloc_range failed: %lu, %zu\n", off, size);
> 		goto out;
> 	}
>-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
>-	if (rv)
>-		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off, size);
> out:
>+	rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
>+
> 	return rv;
> }
> 
>@@ -105,7 +95,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>*base_ctx, struct ib_udata *udata)
> 		rv = -ENOMEM;
> 		goto err_out;
> 	}
>-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
>+
> 	ctx->uobj_nextkey = 0;
> 	ctx->sdev = sdev;
> 
>@@ -135,19 +125,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>*base_ctx, struct ib_udata *udata)
> void siw_dealloc_ucontext(struct ib_ucontext *base_ctx)
> {
> 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
>-	void *entry;
>-	unsigned long index;
> 
>-	/*
>-	 * Make sure all user mmap objects are gone. Since QP, CQ
>-	 * and SRQ destroy routines destroy related objects, nothing
>-	 * should be found here.
>-	 */
>-	xa_for_each(&uctx->xa, index, entry) {
>-		kfree(xa_erase(&uctx->xa, index));
>-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
>-	}
>-	xa_destroy(&uctx->xa);
> 	atomic_dec(&uctx->sdev->num_ctx);
> }
> 
>@@ -293,6 +271,34 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
> 	siw_qp_put(to_siw_qp(base_qp));
> }
> 
>+static struct rdma_user_mmap_entry *
>+siw_mmap_entry_insert(struct siw_ucontext *uctx,
>+		      void *address, size_t length,
>+		      u64 *key)
>+{
>+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry),
>GFP_KERNEL);
>+	int rv;
>+
>+	*key = SIW_INVAL_UOBJ_KEY;
>+	if (!entry)
>+		return NULL;
>+
>+	entry->address = address;
>+	entry->length = length;
>+
>+	rv = rdma_user_mmap_entry_insert(&uctx->base_ucontext,
>+					 &entry->rdma_entry,
>+					 length);
>+	if (rv) {
>+		kfree(entry);
>+		return NULL;
>+	}
>+
>+	*key = rdma_user_mmap_get_key(&entry->rdma_entry);
>+
>+	return &entry->rdma_entry;
>+}
>+
> /*
>  * siw_create_qp()
>  *
>@@ -317,6 +323,7 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 	struct siw_cq *scq = NULL, *rcq = NULL;
> 	unsigned long flags;
> 	int num_sqe, num_rqe, rv = 0;
>+	size_t length;
> 
> 	siw_dbg(base_dev, "create new QP\n");
> 
>@@ -380,8 +387,6 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 	spin_lock_init(&qp->orq_lock);
> 
> 	qp->kernel_verbs = !udata;
>-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
>-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
> 
> 	rv = siw_qp_add(sdev, qp);
> 	if (rv)
>@@ -458,22 +463,27 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 		uresp.qp_id = qp_id(qp);
> 
> 		if (qp->sendq) {
>-			qp->xa_sq_index =
>-				siw_create_uobj(uctx, qp->sendq,
>-					num_sqe * sizeof(struct siw_sqe));
>+			length = num_sqe * sizeof(struct siw_sqe);
>+			qp->sq_entry =
>+				siw_mmap_entry_insert(uctx, qp->sendq,
>+						      length, &uresp.sq_key);
>+			if (!qp->sq_entry) {
>+				rv = -ENOMEM;
>+				goto err_out_xa;
>+			}
> 		}
>+
> 		if (qp->recvq) {
>-			qp->xa_rq_index =
>-				 siw_create_uobj(uctx, qp->recvq,
>-					num_rqe * sizeof(struct siw_rqe));
>-		}
>-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
>-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
>-			rv = -ENOMEM;
>-			goto err_out_xa;
>+			length = num_rqe * sizeof(struct siw_rqe);
>+			qp->rq_entry =
>+				siw_mmap_entry_insert(uctx, qp->recvq,
>+						      length, &uresp.rq_key);
>+			if (!qp->rq_entry) {
>+				uresp.sq_key = SIW_INVAL_UOBJ_KEY;
>+				rv = -ENOMEM;
>+				goto err_out_xa;
>+			}
> 		}
>-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
>-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
> 
> 		if (udata->outlen < sizeof(uresp)) {
> 			rv = -EINVAL;
>@@ -501,11 +511,12 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 	kfree(siw_base_qp);
> 
> 	if (qp) {
>-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>-
>+		if (uctx) {
>+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>+						    qp->sq_entry);
>+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>+						    qp->rq_entry);
>+		}
> 		vfree(qp->sendq);
> 		vfree(qp->recvq);
> 		kfree(qp);
>@@ -619,10 +630,10 @@ int siw_destroy_qp(struct ib_qp *base_qp,
>struct ib_udata *udata)
> 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
> 	qp->rx_stream.rx_suspend = 1;
> 
>-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>+	if (uctx) {
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_entry);
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_entry);
>+	}
> 
> 	down_write(&qp->state_lock);
> 
>@@ -993,8 +1004,8 @@ void siw_destroy_cq(struct ib_cq *base_cq,
>struct ib_udata *udata)
> 
> 	siw_cq_flush(cq);
> 
>-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>+	if (ctx)
>+		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_entry);
> 
> 	atomic_dec(&sdev->num_cq);
> 
>@@ -1031,7 +1042,6 @@ int siw_create_cq(struct ib_cq *base_cq, const
>struct ib_cq_init_attr *attr,
> 	size = roundup_pow_of_two(size);
> 	cq->base_cq.cqe = size;
> 	cq->num_cqe = size;
>-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
> 
> 	if (!udata) {
> 		cq->kernel_verbs = 1;
>@@ -1057,16 +1067,17 @@ int siw_create_cq(struct ib_cq *base_cq,
>const struct ib_cq_init_attr *attr,
> 		struct siw_ucontext *ctx =
> 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 						  base_ucontext);
>+		size_t length = size * sizeof(struct siw_cqe) +
>+			sizeof(struct siw_cq_ctrl);
> 
>-		cq->xa_cq_index =
>-			siw_create_uobj(ctx, cq->queue,
>-					size * sizeof(struct siw_cqe) +
>-						sizeof(struct siw_cq_ctrl));
>-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
>+		cq->cq_entry =
>+			siw_mmap_entry_insert(ctx, cq->queue,
>+					      length, &uresp.cq_key);
>+		if (!cq->cq_entry) {
> 			rv = -ENOMEM;
> 			goto err_out;
> 		}
>-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
>+
> 		uresp.cq_id = cq->id;
> 		uresp.num_cqe = size;
> 
>@@ -1087,8 +1098,9 @@ int siw_create_cq(struct ib_cq *base_cq, const
>struct ib_cq_init_attr *attr,
> 		struct siw_ucontext *ctx =
> 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 						  base_ucontext);
>-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>+		if (ctx)
>+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+						    cq->cq_entry);
> 		vfree(cq->queue);
> 	}
> 	atomic_dec(&sdev->num_cq);
>@@ -1492,7 +1504,6 @@ int siw_create_srq(struct ib_srq *base_srq,
> 	}
> 	srq->max_sge = attrs->max_sge;
> 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
>-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
> 	srq->limit = attrs->srq_limit;
> 	if (srq->limit)
> 		srq->armed = 1;
>@@ -1511,15 +1522,16 @@ int siw_create_srq(struct ib_srq *base_srq,
> 	}
> 	if (udata) {
> 		struct siw_uresp_create_srq uresp = {};
>+		size_t length = srq->num_rqe * sizeof(struct siw_rqe);
> 
>-		srq->xa_srq_index = siw_create_uobj(
>-			ctx, srq->recvq, srq->num_rqe * sizeof(struct siw_rqe));
>-
>-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
>+		srq->srq_entry =
>+			siw_mmap_entry_insert(ctx, srq->recvq,
>+					      length, &uresp.srq_key);
>+		if (!srq->srq_entry) {
> 			rv = -ENOMEM;
> 			goto err_out;
> 		}
>-		uresp.srq_key = srq->xa_srq_index;
>+
> 		uresp.num_rqe = srq->num_rqe;
> 
> 		if (udata->outlen < sizeof(uresp)) {
>@@ -1538,8 +1550,9 @@ int siw_create_srq(struct ib_srq *base_srq,
> 
> err_out:
> 	if (srq->recvq) {
>-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>+		if (ctx)
>+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+						    srq->srq_entry);
> 		vfree(srq->recvq);
> 	}
> 	atomic_dec(&sdev->num_srq);
>@@ -1625,9 +1638,9 @@ void siw_destroy_srq(struct ib_srq *base_srq,
>struct ib_udata *udata)
> 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 					  base_ucontext);
> 
>-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>-
>+	if (ctx)
>+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+					    srq->srq_entry);
> 	vfree(srq->recvq);
> 	atomic_dec(&sdev->num_srq);
> }
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.h
>b/drivers/infiniband/sw/siw/siw_verbs.h
>index 1910869281cb..1a731989fad6 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.h
>+++ b/drivers/infiniband/sw/siw/siw_verbs.h
>@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq,
>struct ib_udata *udata);
> int siw_post_srq_recv(struct ib_srq *base_srq, const struct
>ib_recv_wr *wr,
> 		      const struct ib_recv_wr **bad_wr);
> int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma);
>+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
> void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
> void siw_cq_event(struct siw_cq *cq, enum ib_event_type type);
> void siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
>-- 
>2.14.5
>
>

Hi Michal,

For the siw part, this patch works fine and looks good.


Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>



* Re: [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions
  2019-10-30  9:44 ` [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
@ 2019-10-31 12:35   ` Yishai Hadas
  2019-11-01 10:19     ` [EXT] " Michal Kalderon
  2019-11-05 19:44   ` Jason Gunthorpe
  1 sibling, 1 reply; 20+ messages in thread
From: Yishai Hadas @ 2019-10-31 12:35 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, dledford, jgg, galpress, yishaih, bmt, linux-rdma

On 10/30/2019 11:44 AM, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
> 
> Most of the code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
> The fact that this code moved to core enabled managing it differently,
> so that now entries can be removed and deleted when driver+user are
> done with them. This enabled changing the insert algorithm in
> comparison to what was done in efa.
> 
> Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> ---
>   drivers/infiniband/core/device.c         |   1 +
>   drivers/infiniband/core/ib_core_uverbs.c | 201 +++++++++++++++++++++++++++++++
>   drivers/infiniband/core/rdma_core.c      |   1 +
>   drivers/infiniband/core/uverbs_cmd.c     |   2 +
>   include/rdma/ib_verbs.h                  |  34 ++++++
>   5 files changed, 239 insertions(+)
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index a667636f74bf..bf3a683057bc 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -2629,6 +2629,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>   	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>   	SET_DEVICE_OP(dev_ops, map_phys_fmr);
>   	SET_DEVICE_OP(dev_ops, mmap);
> +	SET_DEVICE_OP(dev_ops, mmap_free);
>   	SET_DEVICE_OP(dev_ops, modify_ah);
>   	SET_DEVICE_OP(dev_ops, modify_cq);
>   	SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
> index b74d2a2fb342..1ffc89fd5d94 100644
> --- a/drivers/infiniband/core/ib_core_uverbs.c
> +++ b/drivers/infiniband/core/ib_core_uverbs.c
> @@ -71,3 +71,204 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>   	return 0;
>   }
>   EXPORT_SYMBOL(rdma_user_mmap_io);
> +
> +/**
> + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @key: the key received from rdma_user_mmap_entry_insert which
> + *     is provided by user as the address to map.
> + * @vma: the vma related to the current mmap call.
> + *
> + * This function is called when a user tries to mmap a key it
> + * initially received from the driver. The key was created by
> + * the function rdma_user_mmap_entry_insert.
> + * This function increases the refcnt of the entry so that it won't
> + * be deleted from the xa in the meantime.
> + *
> + * Return an entry if exists or NULL if there is no match.
> + */
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> +			 struct vm_area_struct *vma)

Where is @vma used in this function? I would expect this API to
return the entry pointed to by @key without any relation to @vma;
wasn't that the plan?

> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	xa_lock(&ucontext->mmap_xa);
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +
> +	/* if refcount is zero, entry is already being deleted */
> +	if (!entry || entry->invalid || !kref_get_unless_zero(&entry->ref))
> +		goto err;
> +
> +	xa_unlock(&ucontext->mmap_xa);
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: key[%#llx] npages[%#x] returned\n",
> +		  key, entry->npages);
> +
> +	return entry;
> +
> +err:
> +	xa_unlock(&ucontext->mmap_xa);
> +	return NULL;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> +
> +void rdma_user_mmap_entry_free(struct kref *kref)
> +{
> +	struct rdma_user_mmap_entry *entry =
> +		container_of(kref, struct rdma_user_mmap_entry, ref);
> +	struct ib_ucontext *ucontext = entry->ucontext;
> +	unsigned long i;
> +
> +	/* need to erase all entries occupied by this single entry */
> +	xa_lock(&ucontext->mmap_xa);
> +	for (i = 0; i < entry->npages; i++)
> +		__xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
> +	xa_unlock(&ucontext->mmap_xa);
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: key[%#llx] npages[%#x] removed\n",
> +		  rdma_user_mmap_get_key(entry),
> +		  entry->npages);
> +
> +	if (ucontext->device->ops.mmap_free)
> +		ucontext->device->ops.mmap_free(entry);
> +}
> +
> +/**
> + * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
> + *
> + * @ucontext: associated user context.
> + * @entry: an entry in the mmap_xa.
> + *
> + * This function is called when the mapping is closed if it was
> + * an io mapping or when the driver is done with the entry for
> + * some other reason.
> + * Should be called after rdma_user_mmap_entry_get was called
> + * and entry is no longer needed. This function will erase the
> + * entry and free it if its refcnt reaches zero.
> + */
> +void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
> +			      struct rdma_user_mmap_entry *entry)
> +{
> +	kref_put(&entry->ref, rdma_user_mmap_entry_free);
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_put);
> +
> +/**
> + * rdma_user_mmap_entry_remove() - Drop reference to entry and
> + *				   mark it as invalid.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + */
> +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> +				 struct rdma_user_mmap_entry *entry)
> +{
> +	if (!entry)
> +		return;
> +
> +	entry->invalid = true;
> +	kref_put(&entry->ref, rdma_user_mmap_entry_free);
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
> +
> +/**
> + * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + * @length: length of the address that will be mmapped
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user calls mmap with the given key.
> + * The function allocates a unique key that should be provided to user, the user
> + * will use the key to retrieve information such as address to
> + * be mapped and how.
> + *
> + * Return: 0 on success and -ENOMEM on failure
> + */
> +int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> +				struct rdma_user_mmap_entry *entry,
> +				size_t length)
> +{
> +	struct ib_uverbs_file *ufile = ucontext->ufile;
> +	XA_STATE(xas, &ucontext->mmap_xa, 0);
> +	u32 xa_first, xa_last, npages;
> +	int err, i;
> +
> +	if (!entry)
> +		return -EINVAL;
> +
> +	kref_init(&entry->ref);
> +	entry->ucontext = ucontext;
> +
> +	/* We want the whole allocation to be done without interruption
> +	 * from a different thread. The allocation requires finding a
> +	 * free range and storing. During the xa_insert the lock could be
> +	 * released, we don't want another thread taking the gap.
> +	 */
> +	mutex_lock(&ufile->umap_lock);
> +
> +	xa_lock(&ucontext->mmap_xa);
> +
> +	/* We want to find an empty range */
> +	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
> +	entry->npages = npages;
> +	while (true) {
> +		/* First find an empty index */
> +		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
> +		if (xas.xa_node == XAS_RESTART)
> +			goto err_unlock;
> +
> +		xa_first = xas.xa_index;
> +
> +		/* Is there enough room to have the range? */
> +		if (check_add_overflow(xa_first, npages, &xa_last))
> +			goto err_unlock;
> +
> +		/* Now look for the next present entry. If such doesn't
> +		 * exist, we found an empty range and can proceed
> +		 */
> +		xas_next_entry(&xas, xa_last - 1);
> +		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
> +			break;
> +		/* o/w look for the next free entry */
> +	}
> +
> +	for (i = xa_first; i < xa_last; i++) {
> +		err = __xa_insert(&ucontext->mmap_xa, i, entry, GFP_KERNEL);
> +		if (err)
> +			goto err_undo;
> +	}
> +
> +	entry->mmap_page = xa_first;
> +	xa_unlock(&ucontext->mmap_xa);
> +
> +	mutex_unlock(&ufile->umap_lock);
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: key[%#llx] npages[%#x] inserted\n",
> +		  rdma_user_mmap_get_key(entry), npages);
> +
> +	return 0;
> +
> +err_undo:
> +	for (; i > xa_first; i--)
> +		__xa_erase(&ucontext->mmap_xa, i - 1);
> +
> +err_unlock:
> +	xa_unlock(&ucontext->mmap_xa);
> +	mutex_unlock(&ufile->umap_lock);
> +	return -ENOMEM;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..6c72773faf29 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>   	rdma_restrack_del(&ucontext->res);
>   
>   	ib_dev->ops.dealloc_ucontext(ucontext);
> +	WARN_ON(!xa_empty(&ucontext->mmap_xa));
>   	kfree(ucontext);
>   
>   	ufile->ucontext = NULL;
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 14a80fd9f464..06ed32c8662f 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -252,6 +252,8 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
>   	ucontext->closing = false;
>   	ucontext->cleanup_retryable = false;
>   
> +	xa_init_flags(&ucontext->mmap_xa, XA_FLAGS_ALLOC);
> +
>   	ret = get_unused_fd_flags(O_CLOEXEC);
>   	if (ret < 0)
>   		goto err_free;
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 6a47ba85c54c..8a87c9d442bc 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1471,6 +1471,7 @@ struct ib_ucontext {
>   	 * Implementation details of the RDMA core, don't use in drivers:
>   	 */
>   	struct rdma_restrack_entry res;
> +	struct xarray mmap_xa;
>   };
>   
>   struct ib_uobject {
> @@ -2251,6 +2252,20 @@ struct iw_cm_conn_param;
>   
>   #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
>   
> +struct rdma_user_mmap_entry {
> +	struct kref ref;
> +	struct ib_ucontext *ucontext;
> +	u32 npages;
> +	u32 mmap_page;
> +	bool invalid;
> +};
> +
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}
> +
>   /**
>    * struct ib_device_ops - InfiniBand device operations
>    * This structure defines all the InfiniBand device operations, providers will
> @@ -2363,6 +2378,13 @@ struct ib_device_ops {
>   			      struct ib_udata *udata);
>   	void (*dealloc_ucontext)(struct ib_ucontext *context);
>   	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> +	/**
> +	 * This will be called once refcount of an entry in mmap_xa reaches
> +	 * zero. The type of the memory that was mapped may differ between
> +	 * entries and is opaque to the rdma_user_mmap interface.
> +	 * Therefore needs to be implemented by the driver in mmap_free.
> +	 */
> +	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
>   	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
>   	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
>   	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> @@ -2801,6 +2823,18 @@ static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
>   	return -EINVAL;
>   }
>   #endif
> +int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> +				struct rdma_user_mmap_entry *entry,
> +				size_t length);
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> +			 struct vm_area_struct *vma);
> +
> +void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
> +			      struct rdma_user_mmap_entry *entry);
> +
> +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> +				 struct rdma_user_mmap_entry *entry);
>   
>   static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
>   {
> 



* RE: [EXT] Re: [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions
  2019-10-31 12:35   ` Yishai Hadas
@ 2019-11-01 10:19     ` Michal Kalderon
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-11-01 10:19 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Ariel Elior, dledford, jgg, galpress, yishaih, bmt, linux-rdma

> From: Yishai Hadas <yishaih@dev.mellanox.co.il>
> Sent: Thursday, October 31, 2019 2:36 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 10/30/2019 11:44 AM, Michal Kalderon wrote:
> > Create some common API's for adding entries to a xa_mmap.
> > Searching for an entry and freeing one.
> >
> > Most of the code was copied from the efa driver almost as is, just
> > renamed function to be generic and not efa specific.
> > The fact that this code moved to core enabled managing it differently,
> > so that now entries can be removed and deleted when driver+user are
> > done with them. This enabled changing the insert algorithm in
> > comparison to what was done in efa.
> >
> > Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > ---
> >   drivers/infiniband/core/device.c         |   1 +
> >   drivers/infiniband/core/ib_core_uverbs.c | 201
> +++++++++++++++++++++++++++++++
> >   drivers/infiniband/core/rdma_core.c      |   1 +
> >   drivers/infiniband/core/uverbs_cmd.c     |   2 +
> >   include/rdma/ib_verbs.h                  |  34 ++++++
> >   5 files changed, 239 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index a667636f74bf..bf3a683057bc 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2629,6 +2629,7 @@ void ib_set_device_ops(struct ib_device *dev,
> const struct ib_device_ops *ops)
> >   	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> >   	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> >   	SET_DEVICE_OP(dev_ops, mmap);
> > +	SET_DEVICE_OP(dev_ops, mmap_free);
> >   	SET_DEVICE_OP(dev_ops, modify_ah);
> >   	SET_DEVICE_OP(dev_ops, modify_cq);
> >   	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > a/drivers/infiniband/core/ib_core_uverbs.c
> > b/drivers/infiniband/core/ib_core_uverbs.c
> > index b74d2a2fb342..1ffc89fd5d94 100644
> > --- a/drivers/infiniband/core/ib_core_uverbs.c
> > +++ b/drivers/infiniband/core/ib_core_uverbs.c
> > @@ -71,3 +71,204 @@ int rdma_user_mmap_io(struct ib_ucontext
> *ucontext, struct vm_area_struct *vma,
> >   	return 0;
> >   }
> >   EXPORT_SYMBOL(rdma_user_mmap_io);
> > +
> > +/**
> > + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @key: the key received from rdma_user_mmap_entry_insert which
> > + *     is provided by user as the address to map.
> > + * @vma: the vma related to the current mmap call.
> > + *
> > + * This function is called when a user tries to mmap a key it
> > + * initially received from the driver. The key was created by
> > + * the function rdma_user_mmap_entry_insert.
> > + * This function increases the refcnt of the entry so that it won't
> > + * be deleted from the xa in the meantime.
> > + *
> > + * Return an entry if exists or NULL if there is no match.
> > + */
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> > +			 struct vm_area_struct *vma)
> 
> Where @vma is used in this function ? I would expect that this API will return
> the entry pointed by @key without any relation to the @vma, wasn't that
> the plan ?
Yes, it seems this is left over from the first version of the series, which also
tried making the remap common.
The parameter can be removed.

> 
> > +{
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	xa_lock(&ucontext->mmap_xa);
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +
> > +	/* if refcount is zero, entry is already being deleted */
> > +	if (!entry || entry->invalid || !kref_get_unless_zero(&entry->ref))
> > +		goto err;
> > +
> > +	xa_unlock(&ucontext->mmap_xa);
> > +
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: key[%#llx] npages[%#x] returned\n",
> > +		  key, entry->npages);
> > +
> > +	return entry;
> > +
> > +err:
> > +	xa_unlock(&ucontext->mmap_xa);
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> > +
> > +void rdma_user_mmap_entry_free(struct kref *kref) {
> > +	struct rdma_user_mmap_entry *entry =
> > +		container_of(kref, struct rdma_user_mmap_entry, ref);
> > +	struct ib_ucontext *ucontext = entry->ucontext;
> > +	unsigned long i;
> > +
> > +	/* need to erase all entries occupied by this single entry */
> > +	xa_lock(&ucontext->mmap_xa);
> > +	for (i = 0; i < entry->npages; i++)
> > +		__xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
> > +	xa_unlock(&ucontext->mmap_xa);
> > +
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: key[%#llx] npages[%#x] removed\n",
> > +		  rdma_user_mmap_get_key(entry),
> > +		  entry->npages);
> > +
> > +	if (ucontext->device->ops.mmap_free)
> > +		ucontext->device->ops.mmap_free(entry);
> > +}
> > +
> > +/**
> > + * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
> > + *
> > + * @ucontext: associated user context.
> > + * @entry: an entry in the mmap_xa.
> > + *
> > + * This function is called when the mapping is closed if it was
> > + * an io mapping or when the driver is done with the entry for
> > + * some other reason.
> > + * Should be called after rdma_user_mmap_entry_get was called
> > + * and entry is no longer needed. This function will erase the
> > + * entry and free it if its refcnt reaches zero.
> > + */
> > +void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
> > +			      struct rdma_user_mmap_entry *entry) {
> > +	kref_put(&entry->ref, rdma_user_mmap_entry_free); }
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_put);
> > +
> > +/**
> > + * rdma_user_mmap_entry_remove() - Drop reference to entry and
> > + *				   mark it as invalid.
> > + *
> > + * @ucontext: associated user context.
> > + * @entry: the entry to insert into the mmap_xa  */ void
> > +rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> > +				 struct rdma_user_mmap_entry *entry) {
> > +	if (!entry)
> > +		return;
> > +
> > +	entry->invalid = true;
> > +	kref_put(&entry->ref, rdma_user_mmap_entry_free); }
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
> > +
> > +/**
> > + * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @entry: the entry to insert into the mmap_xa
> > + * @length: length of the address that will be mmapped
> > + *
> > + * This function should be called by drivers that use the
> > +rdma_user_mmap
> > + * interface for handling user mmapped addresses. The database is
> > +handled in
> > + * the core and helper functions are provided to insert entries into
> > +the
> > + * database and extract entries when the user calls mmap with the given
> key.
> > + * The function allocates a unique key that should be provided to
> > +user, the user
> > + * will use the key to retrieve information such as address to
> > + * be mapped and how.
> > + *
> > + * Return: 0 on success and -ENOMEM on failure  */ int
> > +rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > +				struct rdma_user_mmap_entry *entry,
> > +				size_t length)
> > +{
> > +	struct ib_uverbs_file *ufile = ucontext->ufile;
> > +	XA_STATE(xas, &ucontext->mmap_xa, 0);
> > +	u32 xa_first, xa_last, npages;
> > +	int err, i;
> > +
> > +	if (!entry)
> > +		return -EINVAL;
> > +
> > +	kref_init(&entry->ref);
> > +	entry->ucontext = ucontext;
> > +
> > +	/* We want the whole allocation to be done without interruption
> > +	 * from a different thread. The allocation requires finding a
> > +	 * free range and storing. During the xa_insert the lock could be
> > +	 * released, we don't want another thread taking the gap.
> > +	 */
> > +	mutex_lock(&ufile->umap_lock);
> > +
> > +	xa_lock(&ucontext->mmap_xa);
> > +
> > +	/* We want to find an empty range */
> > +	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
> > +	entry->npages = npages;
> > +	while (true) {
> > +		/* First find an empty index */
> > +		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
> > +		if (xas.xa_node == XAS_RESTART)
> > +			goto err_unlock;
> > +
> > +		xa_first = xas.xa_index;
> > +
> > +		/* Is there enough room to have the range? */
> > +		if (check_add_overflow(xa_first, npages, &xa_last))
> > +			goto err_unlock;
> > +
> > +		/* Now look for the next present entry. If such doesn't
> > +		 * exist, we found an empty range and can proceed
> > +		 */
> > +		xas_next_entry(&xas, xa_last - 1);
> > +		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
> > +			break;
> > +		/* o/w look for the next free entry */
> > +	}
> > +
> > +	for (i = xa_first; i < xa_last; i++) {
> > +		err = __xa_insert(&ucontext->mmap_xa, i, entry,
> GFP_KERNEL);
> > +		if (err)
> > +			goto err_undo;
> > +	}
> > +
> > +	entry->mmap_page = xa_first;
> > +	xa_unlock(&ucontext->mmap_xa);
> > +
> > +	mutex_unlock(&ufile->umap_lock);
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: key[%#llx] npages[%#x] inserted\n",
> > +		  rdma_user_mmap_get_key(entry), npages);
> > +
> > +	return 0;
> > +
> > +err_undo:
> > +	for (; i > xa_first; i--)
> > +		__xa_erase(&ucontext->mmap_xa, i - 1);
> > +
> > +err_unlock:
> > +	xa_unlock(&ucontext->mmap_xa);
> > +	mutex_unlock(&ufile->umap_lock);
> > +	return -ENOMEM;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
> > diff --git a/drivers/infiniband/core/rdma_core.c
> > b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..6c72773faf29 100644
> > --- a/drivers/infiniband/core/rdma_core.c
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> ib_uverbs_file *ufile,
> >   	rdma_restrack_del(&ucontext->res);
> >
> >   	ib_dev->ops.dealloc_ucontext(ucontext);
> > +	WARN_ON(!xa_empty(&ucontext->mmap_xa));
> >   	kfree(ucontext);
> >
> >   	ufile->ucontext = NULL;
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index 14a80fd9f464..06ed32c8662f 100644
> > --- a/drivers/infiniband/core/uverbs_cmd.c
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -252,6 +252,8 @@ static int ib_uverbs_get_context(struct
> uverbs_attr_bundle *attrs)
> >   	ucontext->closing = false;
> >   	ucontext->cleanup_retryable = false;
> >
> > +	xa_init_flags(&ucontext->mmap_xa, XA_FLAGS_ALLOC);
> > +
> >   	ret = get_unused_fd_flags(O_CLOEXEC);
> >   	if (ret < 0)
> >   		goto err_free;
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> > 6a47ba85c54c..8a87c9d442bc 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -1471,6 +1471,7 @@ struct ib_ucontext {
> >   	 * Implementation details of the RDMA core, don't use in drivers:
> >   	 */
> >   	struct rdma_restrack_entry res;
> > +	struct xarray mmap_xa;
> >   };
> >
> >   struct ib_uobject {
> > @@ -2251,6 +2252,20 @@ struct iw_cm_conn_param;
> >
> >   #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
> >
> > +struct rdma_user_mmap_entry {
> > +	struct kref ref;
> > +	struct ib_ucontext *ucontext;
> > +	u32 npages;
> > +	u32 mmap_page;
> > +	bool invalid;
> > +};
> > +
> > +static inline u64
> > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry
> *entry) {
> > +	return (u64)entry->mmap_page << PAGE_SHIFT; }
> > +
> >   /**
> >    * struct ib_device_ops - InfiniBand device operations
> >    * This structure defines all the InfiniBand device operations,
> > providers will @@ -2363,6 +2378,13 @@ struct ib_device_ops {
> >   			      struct ib_udata *udata);
> >   	void (*dealloc_ucontext)(struct ib_ucontext *context);
> >   	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct
> > *vma);
> > +	/**
> > +	 * This will be called once refcount of an entry in mmap_xa reaches
> > +	 * zero. The type of the memory that was mapped may differ
> between
> > +	 * entries and is opaque to the rdma_user_mmap interface.
> > +	 * Therefore needs to be implemented by the driver in mmap_free.
> > +	 */
> > +	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
> >   	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
> >   	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> >   	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata); @@
> > -2801,6 +2823,18 @@ static inline int rdma_user_mmap_io(struct
> ib_ucontext *ucontext,
> >   	return -EINVAL;
> >   }
> >   #endif
> > +int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > +				struct rdma_user_mmap_entry *entry,
> > +				size_t length);
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> > +			 struct vm_area_struct *vma);
> > +
> > +void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
> > +			      struct rdma_user_mmap_entry *entry);
> > +
> > +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> > +				 struct rdma_user_mmap_entry *entry);
> >
> >   static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata,
> size_t len)
> >   {
> >



* Re: [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions
  2019-10-30  9:44 ` [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
  2019-10-31 12:35   ` Yishai Hadas
@ 2019-11-05 19:44   ` Jason Gunthorpe
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-05 19:44 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: ariel.elior, dledford, galpress, yishaih, bmt, linux-rdma

On Wed, Oct 30, 2019 at 11:44:11AM +0200, Michal Kalderon wrote:
> diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
> index b74d2a2fb342..1ffc89fd5d94 100644
> +++ b/drivers/infiniband/core/ib_core_uverbs.c
> @@ -71,3 +71,204 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  	return 0;
>  }
>  EXPORT_SYMBOL(rdma_user_mmap_io);
> +
> +/**
> + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @key: the key received from rdma_user_mmap_entry_insert which
> + *     is provided by user as the address to map.
> + * @vma: the vma related to the current mmap call.
> + *
> + * This function is called when a user tries to mmap a key it
> + * initially received from the driver. The key was created by
> + * the function rdma_user_mmap_entry_insert.
> + * This function increases the refcnt of the entry so that it won't
> + * be deleted from the xa in the meantime.
> + *
> + * Return an entry if exists or NULL if there is no match.
> + */
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> +			 struct vm_area_struct *vma)

This should just accept 'key' (but call it pgoff for clarity); since
every caller has a vma, a static inline wrapper should take the vma.
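
As a rough illustration only (the names below are illustrative, not from
the posted patch), the split could look something like:

struct rdma_user_mmap_entry *
rdma_user_mmap_entry_get_pgoff(struct ib_ucontext *ucontext,
			       unsigned long pgoff);

/* convenience wrapper for the common case where the caller has a vma */
static inline struct rdma_user_mmap_entry *
rdma_user_mmap_entry_get(struct ib_ucontext *ucontext,
			 struct vm_area_struct *vma)
{
	return rdma_user_mmap_entry_get_pgoff(ucontext, vma->vm_pgoff);
}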

> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;

It isn't even really 'key' here once it has been shifted; at that point
it is called 'offset'.

> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	xa_lock(&ucontext->mmap_xa);
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);

Since each xarray index in the range stores the same pointer, this needs
to check that mmap_page actually refers to the right entry; attempting
to directly mmap the 2nd page should fail, as we don't have the rest of
the infrastructure to make that work.
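
One possible way to enforce that, as an untested sketch reusing the field
names from this patch:

	/* only allow mapping from the first page of the entry */
	if (!entry || entry->invalid || entry->mmap_page != mmap_page ||
	    !kref_get_unless_zero(&entry->ref))
		goto err;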


> +/**
> + * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
> + *
> + * @ucontext: associated user context.
> + * @entry: an entry in the mmap_xa.
> + *
> + * This function is called when the mapping is closed if it was
> + * an io mapping or when the driver is done with the entry for
> + * some other reason.
> + * Should be called after rdma_user_mmap_entry_get was called
> + * and entry is no longer needed. This function will erase the
> + * entry and free it if its refcnt reaches zero.
> + */
> +void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
> +			      struct rdma_user_mmap_entry *entry)
> +{

ucontext is not needed

> +	kref_put(&entry->ref, rdma_user_mmap_entry_free);
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_put);
> +
> +/**
> + * rdma_user_mmap_entry_remove() - Drop reference to entry and
> + *				   mark it as invalid.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + */
> +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> +				 struct rdma_user_mmap_entry *entry)
> +{

ucontext is not needed

> +/**
> + * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + * @length: length of the address that will be mmapped
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user calls mmap with the given key.
> + * The function allocates a unique key that should be provided to user, the user
> + * will use the key to retrieve information such as address to
> + * be mapped and how.
> + *
> + * Return: 0 on success and -ENOMEM on failure
> + */
> +int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> +				struct rdma_user_mmap_entry *entry,
> +				size_t length)
> +{
> +	struct ib_uverbs_file *ufile = ucontext->ufile;
> +	XA_STATE(xas, &ucontext->mmap_xa, 0);
> +	u32 xa_first, xa_last, npages;
> +	int err, i;

'i' should be u32

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 6a47ba85c54c..8a87c9d442bc 100644
> +++ b/include/rdma/ib_verbs.h
> @@ -1471,6 +1471,7 @@ struct ib_ucontext {
>  	 * Implementation details of the RDMA core, don't use in drivers:
>  	 */
>  	struct rdma_restrack_entry res;
> +	struct xarray mmap_xa;
>  };
>  
>  struct ib_uobject {
> @@ -2251,6 +2252,20 @@ struct iw_cm_conn_param;
>  
>  #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
>  
> +struct rdma_user_mmap_entry {
> +	struct kref ref;
> +	struct ib_ucontext *ucontext;
> +	u32 npages;
> +	u32 mmap_page;
> +	bool invalid;

These names are confusing; let's use:

       unsigned long start_pgoff;
       size_t npages;
       bool driver_removed;

> +};
> +
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}

This is an offset, not a key; in fact, let's not use the word 'key' at
all. Use either 'pgoff' or 'offset'.
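
For example, as a sketch only (using the start_pgoff name suggested
above, not the code that was posted):

static inline u64
rdma_user_mmap_get_offset(const struct rdma_user_mmap_entry *entry)
{
	return (u64)entry->start_pgoff << PAGE_SHIFT;
}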

There are also a number of things to fix up in comments, grammar,
indentation and style.

Jason


* Re: [PATCH v12 rdma-next 3/8] RDMA: Connect between the mmap entry and the umap_priv structure
  2019-10-30  9:44 ` [PATCH v12 rdma-next 3/8] RDMA: Connect between the mmap entry and the umap_priv structure Michal Kalderon
@ 2019-11-05 19:47   ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-05 19:47 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: ariel.elior, dledford, galpress, yishaih, bmt, linux-rdma

On Wed, Oct 30, 2019 at 11:44:12AM +0200, Michal Kalderon wrote:

> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index b1f5334ff907..dbe9bd3d389a 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -819,7 +819,7 @@ static void rdma_umap_open(struct vm_area_struct *vma)
>  	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>  	if (!priv)
>  		goto out_unlock;
> -	rdma_umap_priv_init(priv, vma);
> +	rdma_umap_priv_init(priv, vma, opriv->entry);
>  
>  	up_read(&ufile->hw_destroy_rwsem);
>  	return;
> @@ -844,6 +844,11 @@ static void rdma_umap_close(struct vm_area_struct *vma)
>  	if (!priv)
>  		return;
>  
> +	if (priv->entry) {
> +		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
> +		priv->entry = NULL;
> +	}
> +

This should be done inside the lock, otherwise it can race with
uverbs_user_mmap_disassociate(). The assignment of NULL is not needed,
as we free it immediately after.
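
Roughly something like this, as a sketch only (assuming rdma_umap_close()
already takes ufile->umap_lock to unlink priv from the list):

	mutex_lock(&ufile->umap_lock);
	if (priv->entry)
		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
	list_del(&priv->list);
	mutex_unlock(&ufile->umap_lock);
	kfree(priv);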


> @@ -946,6 +951,13 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
>  
>  			if (vma->vm_mm != mm)
>  				continue;
> +
> +			if (priv->entry) {
> +				rdma_user_mmap_entry_put(ufile->ucontext,
> +							 priv->entry);
> +				priv->entry = NULL;
> +			}
> +
>  			list_del_init(&priv->list);
>  
>  			zap_vma_ptes(vma, vma->vm_start,

The zap needs to be before the entry_put so that the pages are
actually removed before the driver goes to free them.
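
I.e. roughly, keeping the names from the code quoted above (sketch only):

			list_del_init(&priv->list);

			zap_vma_ptes(vma, vma->vm_start,
				     vma->vm_end - vma->vm_start);

			if (priv->entry) {
				rdma_user_mmap_entry_put(ufile->ucontext,
							 priv->entry);
				priv->entry = NULL;
			}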

Jason


* Re: [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers
  2019-10-30  9:44 ` [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-11-05 19:54   ` Jason Gunthorpe
  2019-11-06  8:20     ` Gal Pressman
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-05 19:54 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: ariel.elior, dledford, galpress, yishaih, bmt, linux-rdma

On Wed, Oct 30, 2019 at 11:44:13AM +0200, Michal Kalderon wrote:

> @@ -116,6 +122,13 @@ struct efa_ah {
>  	u8 id[EFA_GID_SIZE];
>  };
>  
> +struct efa_user_mmap_entry {
> +	struct rdma_user_mmap_entry rdma_entry;
> +	u64 address;
> +	size_t length;
> +	u8 mmap_flag;
> +};

There is no reason to move this struct out of efa_verbs.c

length is redundant with rdma_entry.npages
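
i.e. when the driver needs the byte length it can derive it, roughly:

	size_t length = entry->rdma_entry.npages * PAGE_SIZE;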

>  static int qp_mmap_entries_setup(struct efa_qp *qp,
>  				 struct efa_dev *dev,
>  				 struct efa_ucontext *ucontext,
>  				 struct efa_com_create_qp_params *params,
>  				 struct efa_ibv_create_qp_resp *resp)
>  {
> -	/*
> -	 * Once an entry is inserted it might be mmapped, hence cannot be
> -	 * cleaned up until dealloc_ucontext.
> -	 */
> -	resp->sq_db_mmap_key =
> -		mmap_entry_insert(dev, ucontext, qp,
> -				  dev->db_bar_addr + resp->sq_db_offset,
> -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> +	size_t length;
> +	u64 address;
> +
> +	address = dev->db_bar_addr + resp->sq_db_offset;
> +	qp->sq_db_mmap_entry =
> +		efa_user_mmap_entry_insert(&ucontext->ibucontext,
> +					   address,
> +					   PAGE_SIZE, EFA_MMAP_IO_NC,
> +					   &resp->sq_db_mmap_key);

I'm still confused about how this is OK for the lifetime: 'sq_db_offset'
comes from the device, so does the device prevent re-use of the same
db_offset until the ucontext is closed? If so, that deserves a comment
here.

>  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
> -		      struct vm_area_struct *vma, u64 key, u64 length)
> +		      struct vm_area_struct *vma, u64 key, size_t length)
>  {
> -	struct efa_mmap_entry *entry;
> +	struct rdma_user_mmap_entry *rdma_entry;
> +	struct efa_user_mmap_entry *entry;
>  	unsigned long va;
> +	int err = 0;
>  	u64 pfn;
> -	int err;
>  
> -	entry = mmap_entry_get(dev, ucontext, key, length);
> -	if (!entry) {
> +	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
> +					      vma);
> +	if (!rdma_entry) {
>  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
>  			  key);
>  		return -EINVAL;
>  	}
> +	entry = to_emmap(rdma_entry);
> +	if (entry->length != length) {
> +		ibdev_dbg(&dev->ibdev,
> +			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
> +			  key, length, entry->length);
> +		err = -EINVAL;
> +		goto out;
> +	}

Should be in common code

Same with the VM_SHARED check (which is only needed for the
vm_insert_page flow).

Also, the driver should not be messing with VM_EXEC, as that is known to
cause problems.
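
A sketch of what such a common helper could look like (this is
essentially what the consolidated diff below adds as
rdma_user_mmap_entry_get()):

struct rdma_user_mmap_entry *
rdma_user_mmap_entry_get(struct ib_ucontext *ucontext,
			 struct vm_area_struct *vma)
{
	struct rdma_user_mmap_entry *entry;

	/* vm_insert_page() based mappings require VM_SHARED */
	if (!(vma->vm_flags & VM_SHARED))
		return NULL;
	entry = rdma_user_mmap_entry_get_pgoff(ucontext, vma->vm_pgoff);
	if (!entry)
		return NULL;
	/* The VMA must cover exactly the pages the entry was inserted with */
	if (entry->npages * PAGE_SIZE != vma->vm_end - vma->vm_start) {
		rdma_user_mmap_entry_put(entry);
		return NULL;
	}
	return entry;
}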

> @@ -1648,16 +1615,16 @@ int efa_mmap(struct ib_ucontext *ibucontext,
>  {
>  	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
>  	struct efa_dev *dev = to_edev(ibucontext->device);
> -	u64 length = vma->vm_end - vma->vm_start;
> +	size_t length = vma->vm_end - vma->vm_start;
>  	u64 key = vma->vm_pgoff << PAGE_SHIFT;
>  
>  	ibdev_dbg(&dev->ibdev,
> -		  "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
> +		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
>  		  vma->vm_start, vma->vm_end, length, key);
>  
>  	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {

vm_end - vm_start is always page aligned

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
  2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (8 preceding siblings ...)
  2019-10-30 15:33 ` [PATCH v12 rdma-next 5/8] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
@ 2019-11-05 20:41 ` Jason Gunthorpe
  2019-11-06 13:04   ` Michal Kalderon
  9 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-05 20:41 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: ariel.elior, dledford, galpress, yishaih, bmt, linux-rdma

On Wed, Oct 30, 2019 at 11:44:09AM +0200, Michal Kalderon wrote:
> This patch series uses the doorbell overflow recovery mechanism
> introduced in
> commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
> for rdma ( RoCE and iWARP )
> 
> The first six patches modify the core code to contain helper
> functions for managing mmap_xa inserting, getting and freeing
> entries. The code was based on the code from efa driver.
> There is still an open discussion on whether we should take
> this even further and make the entire mmap generic. Until a
> decision is made, I only created the database API and modified
> the efa, qedr, siw driver to use it. The functions are integrated
> with the umap mechanism.
> 
> The doorbell recovery code is based on the common code.
> 
> rdma-core pull request #493 was closed for now, once kernel series is
> accepted will be reopend.
> 
> This series applies over the wip/jgg-for-next branch and not the
> for-next since it contains the series:
> RDMA/qedr: Fix memory leaks and synchronization
> https://www.spinics.net/lists/linux-rdma/msg85242.html
> 
> SIW driver was reviewed, tested and signed-off by Bernard Metzler.

Since we are on v12 now, let us get this done. I added this diff to
the series. Mostly just renaming, indenting, the notes I gave already
(to all drivers) and probably a few other things I forgot

Here is the series with everything reflowed hopefully properly:

https://github.com/jgunthorpe/linux/commits/rdma_mmap

Please let me know if I messed it up, otherwise I'll apply this in a
few days.

Thanks,
Jason

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 355c59d2eaa35b..6be180eb8cb329 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -398,7 +398,4 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
 			 struct vm_area_struct *vma,
 			 struct rdma_user_mmap_entry *entry);
 
-void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-			      struct rdma_user_mmap_entry *entry);
-
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index 88d9d47fb8adaa..6238842fd06402 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -11,14 +11,14 @@
 /**
  * rdma_umap_priv_init() - Initialize the private data of a vma
  *
+ * @priv: The already allocated private data
  * @vma: The vm area struct that needs private data
  * @entry: entry into the mmap_xa that needs to be linked with
  *       this vma
  *
- * Each time we map IO memory into user space this keeps track
- * of the mapping. When the device is hot-unplugged we 'zap' the
- * mmaps in user space to point to the zero page and allow the
- * hot unplug to proceed.
+ * Each time we map IO memory into user space this keeps track of the
+ * mapping. When the device is hot-unplugged we 'zap' the mmaps in user space
+ * to point to the zero page and allow the hot unplug to proceed.
  *
  * This is necessary for cases like PCI physical hot unplug as the actual BAR
  * memory may vanish after this and access to it from userspace could MCE.
@@ -48,20 +48,21 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
 EXPORT_SYMBOL(rdma_umap_priv_init);
 
 /**
- * rdma_user_mmap_io() - Map IO memory into a process.
+ * rdma_user_mmap_io() - Map IO memory into a process
  *
  * @ucontext: associated user context
- * @vma: the vma related to the current mmap call.
+ * @vma: the vma related to the current mmap call
  * @pfn: pfn to map
  * @size: size to map
  * @prot: pgprot to use in remap call
+ * @entry: mmap_entry retrieved from rdma_user_mmap_entry_get(), or NULL
+ *         if mmap_entry is not used by the driver
  *
- * This is to be called by drivers as part of their mmap()
- * functions if they wish to send something like PCI-E BAR
- * memory to userspace.
+ * This is to be called by drivers as part of their mmap() functions if they
+ * wish to send something like PCI-E BAR memory to userspace.
  *
- * Return -EINVAL on wrong flags or size, -EAGAIN on failure to
- * map. 0 on success.
+ * Return -EINVAL on wrong flags or size, -EAGAIN on failure to map. 0 on
+ * success.
  */
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot,
@@ -98,45 +99,46 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 EXPORT_SYMBOL(rdma_user_mmap_io);
 
 /**
- * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ * rdma_user_mmap_entry_get_pgoff() - Get an entry from the mmap_xa
  *
- * @ucontext: associated user context.
- * @key: the key received from rdma_user_mmap_entry_insert which
- *     is provided by user as the address to map.
- * @vma: the vma related to the current mmap call.
+ * @ucontext: associated user context
+ * @pgoff: The mmap offset >> PAGE_SHIFT
  *
- * This function is called when a user tries to mmap a key it
- * initially received from the driver. The key was created by
- * the function rdma_user_mmap_entry_insert.
- * This function increases the refcnt of the entry so that it won't
- * be deleted from the xa in the meantime.
+ * This function is called when a user tries to mmap with an offset (returned
+ * by rdma_user_mmap_get_offset()) it initially received from the driver. The
+ * rdma_user_mmap_entry was created by the function
+ * rdma_user_mmap_entry_insert().  This function increases the refcnt of the
+ * entry so that it won't be deleted from the xarray in the meantime.
  *
- * Return an entry if exists or NULL if there is no match.
+ * Return a reference to the entry if it exists or NULL if there is no
+ * match. rdma_user_mmap_entry_put() must be called to put the reference.
  */
 struct rdma_user_mmap_entry *
-rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
-			 struct vm_area_struct *vma)
+rdma_user_mmap_entry_get_pgoff(struct ib_ucontext *ucontext,
+			       unsigned long pgoff)
 {
 	struct rdma_user_mmap_entry *entry;
-	u64 mmap_page;
 
-	mmap_page = key >> PAGE_SHIFT;
-	if (mmap_page > U32_MAX)
+	if (pgoff > U32_MAX)
 		return NULL;
 
 	xa_lock(&ucontext->mmap_xa);
 
-	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	entry = xa_load(&ucontext->mmap_xa, pgoff);
 
-	/* if refcount is zero, entry is already being deleted */
-	if (!entry || entry->invalid || !kref_get_unless_zero(&entry->ref))
+	/*
+	 * If refcount is zero, entry is already being deleted, driver_removed
+	 * indicates that no further mmaps are possible and we are waiting for
+	 * the active VMAs to be closed.
+	 */
+	if (!entry || entry->start_pgoff != pgoff || entry->driver_removed ||
+	    !kref_get_unless_zero(&entry->ref))
 		goto err;
 
 	xa_unlock(&ucontext->mmap_xa);
 
-	ibdev_dbg(ucontext->device,
-		  "mmap: key[%#llx] npages[%#x] returned\n",
-		  key, entry->npages);
+	ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#zx] returned\n",
+		  pgoff, entry->npages);
 
 	return entry;
 
@@ -144,25 +146,54 @@ rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
 	xa_unlock(&ucontext->mmap_xa);
 	return NULL;
 }
+EXPORT_SYMBOL(rdma_user_mmap_entry_get_pgoff);
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa
+ *
+ * @ucontext: associated user context
+ * @vma: the vma being mmap'd into
+ *
+ * This function is like rdma_user_mmap_entry_get_pgoff() except that it also
+ * checks that the VMA is correct.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext,
+			 struct vm_area_struct *vma)
+{
+	struct rdma_user_mmap_entry *entry;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return NULL;
+	entry = rdma_user_mmap_entry_get_pgoff(ucontext, vma->vm_pgoff);
+	if (!entry)
+		return NULL;
+	if (entry->npages * PAGE_SIZE != vma->vm_end - vma->vm_start) {
+		rdma_user_mmap_entry_put(entry);
+		return NULL;
+	}
+	return entry;
+}
 EXPORT_SYMBOL(rdma_user_mmap_entry_get);
 
-void rdma_user_mmap_entry_free(struct kref *kref)
+static void rdma_user_mmap_entry_free(struct kref *kref)
 {
 	struct rdma_user_mmap_entry *entry =
 		container_of(kref, struct rdma_user_mmap_entry, ref);
 	struct ib_ucontext *ucontext = entry->ucontext;
 	unsigned long i;
 
-	/* need to erase all entries occupied by this single entry */
+	/*
+	 * Erase all entries occupied by this single entry; this is deferred
+	 * until all VMAs are closed so that the mmap offsets remain unique.
+	 */
 	xa_lock(&ucontext->mmap_xa);
 	for (i = 0; i < entry->npages; i++)
-		__xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
+		__xa_erase(&ucontext->mmap_xa, entry->start_pgoff + i);
 	xa_unlock(&ucontext->mmap_xa);
 
-	ibdev_dbg(ucontext->device,
-		  "mmap: key[%#llx] npages[%#x] removed\n",
-		  rdma_user_mmap_get_key(entry),
-		  entry->npages);
+	ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#zx] removed\n",
+		  entry->start_pgoff, entry->npages);
 
 	if (ucontext->device->ops.mmap_free)
 		ucontext->device->ops.mmap_free(entry);
@@ -171,8 +202,7 @@ void rdma_user_mmap_entry_free(struct kref *kref)
 /**
  * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
  *
- * @ucontext: associated user context.
- * @entry: an entry in the mmap_xa.
+ * @entry: an entry in the mmap_xa
  *
  * This function is called when the mapping is closed if it was
  * an io mapping or when the driver is done with the entry for
@@ -181,8 +211,7 @@ void rdma_user_mmap_entry_free(struct kref *kref)
  * and entry is no longer needed. This function will erase the
  * entry and free it if its refcnt reaches zero.
  */
-void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-			      struct rdma_user_mmap_entry *entry)
+void rdma_user_mmap_entry_put(struct rdma_user_mmap_entry *entry)
 {
 	kref_put(&entry->ref, rdma_user_mmap_entry_free);
 }
@@ -190,36 +219,38 @@ EXPORT_SYMBOL(rdma_user_mmap_entry_put);
 
 /**
  * rdma_user_mmap_entry_remove() - Drop reference to entry and
- *				   mark it as invalid.
+ *				   mark it as unmmapable
  *
- * @ucontext: associated user context.
  * @entry: the entry to insert into the mmap_xa
+ *
+ * Drivers can call this to prevent userspace from creating more mappings for
+ * entry, however existing mmaps continue to exist and ops->mmap_free() will
+ * not be called until all user mmaps are destroyed.
  */
-void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
-				 struct rdma_user_mmap_entry *entry)
+void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
 {
 	if (!entry)
 		return;
 
-	entry->invalid = true;
+	entry->driver_removed = true;
 	kref_put(&entry->ref, rdma_user_mmap_entry_free);
 }
 EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
 
 /**
- * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
+ * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa
  *
  * @ucontext: associated user context.
  * @entry: the entry to insert into the mmap_xa
  * @length: length of the address that will be mmapped
  *
  * This function should be called by drivers that use the rdma_user_mmap
- * interface for handling user mmapped addresses. The database is handled in
- * the core and helper functions are provided to insert entries into the
- * database and extract entries when the user calls mmap with the given key.
- * The function allocates a unique key that should be provided to user, the user
- * will use the key to retrieve information such as address to
- * be mapped and how.
+ * interface for implementing their mmap syscall A database of mmap offsets is
+ * handled in the core and helper functions are provided to insert entries
+ * into the database and extract entries when the user calls mmap with the
+ * given offset.  The function allocates a unique page offset that should be
+ * provided to user, the user will use the iffset to retrieve information such
+ * as address to be mapped and how.
  *
  * Return: 0 on success and -ENOMEM on failure
  */
@@ -230,7 +261,8 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 	struct ib_uverbs_file *ufile = ucontext->ufile;
 	XA_STATE(xas, &ucontext->mmap_xa, 0);
 	u32 xa_first, xa_last, npages;
-	int err, i;
+	int err;
+	u32 i;
 
 	if (!entry)
 		return -EINVAL;
@@ -238,10 +270,11 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 	kref_init(&entry->ref);
 	entry->ucontext = ucontext;
 
-	/* We want the whole allocation to be done without interruption
-	 * from a different thread. The allocation requires finding a
-	 * free range and storing. During the xa_insert the lock could be
-	 * released, we don't want another thread taking the gap.
+	/*
+	 * We want the whole allocation to be done without interruption from a
+	 * different thread. The allocation requires finding a free range and
+	 * storing. During the xa_insert the lock could be released, possibly
+	 * allowing another thread to choose the same range.
 	 */
 	mutex_lock(&ufile->umap_lock);
 
@@ -262,13 +295,13 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 		if (check_add_overflow(xa_first, npages, &xa_last))
 			goto err_unlock;
 
-		/* Now look for the next present entry. If such doesn't
-		 * exist, we found an empty range and can proceed
+		/*
+		 * Now look for the next present entry. If an entry doesn't
+		 * exist, we found an empty range and can proceed.
 		 */
 		xas_next_entry(&xas, xa_last - 1);
 		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
 			break;
-		/* o/w look for the next free entry */
 	}
 
 	for (i = xa_first; i < xa_last; i++) {
@@ -277,13 +310,16 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 			goto err_undo;
 	}
 
-	entry->mmap_page = xa_first;
+	/*
+	 * Internally the kernel uses a page offset, in libc this is a byte
+	 * offset. Drivers should not return pgoff to userspace.
+	 */
+	entry->start_pgoff = xa_first;
 	xa_unlock(&ucontext->mmap_xa);
-
 	mutex_unlock(&ufile->umap_lock);
-	ibdev_dbg(ucontext->device,
-		  "mmap: key[%#llx] npages[%#x] inserted\n",
-		  rdma_user_mmap_get_key(entry), npages);
+
+	ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#x] inserted\n",
+		  entry->start_pgoff, npages);
 
 	return 0;
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index dbe9bd3d389a28..9aa7ffc1d12a93 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -844,17 +844,15 @@ static void rdma_umap_close(struct vm_area_struct *vma)
 	if (!priv)
 		return;
 
-	if (priv->entry) {
-		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
-		priv->entry = NULL;
-	}
-
 	/*
 	 * The vma holds a reference on the struct file that created it, which
 	 * in turn means that the ib_uverbs_file is guaranteed to exist at
 	 * this point.
 	 */
 	mutex_lock(&ufile->umap_lock);
+	if (priv->entry)
+		rdma_user_mmap_entry_put(priv->entry);
+
 	list_del(&priv->list);
 	mutex_unlock(&ufile->umap_lock);
 	kfree(priv);
@@ -951,17 +949,15 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 
 			if (vma->vm_mm != mm)
 				continue;
-
-			if (priv->entry) {
-				rdma_user_mmap_entry_put(ufile->ucontext,
-							 priv->entry);
-				priv->entry = NULL;
-			}
-
 			list_del_init(&priv->list);
 
 			zap_vma_ptes(vma, vma->vm_start,
 				     vma->vm_end - vma->vm_start);
+
+			if (priv->entry) {
+				rdma_user_mmap_entry_put(priv->entry);
+				priv->entry = NULL;
+			}
 		}
 		mutex_unlock(&ufile->umap_lock);
 	skip_mm:
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 482d8acaad2c1d..2bda07078b9743 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -122,13 +122,6 @@ struct efa_ah {
 	u8 id[EFA_GID_SIZE];
 };
 
-struct efa_user_mmap_entry {
-	struct rdma_user_mmap_entry rdma_entry;
-	u64 address;
-	size_t length;
-	u8 mmap_flag;
-};
-
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index f39e24df29a55e..b242ea7a3bc868 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -23,6 +23,12 @@ enum {
 	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
 	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
 
+struct efa_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	u8 mmap_flag;
+};
+
 #define EFA_DEFINE_STATS(op) \
 	op(EFA_TX_BYTES, "tx_bytes") \
 	op(EFA_TX_PKTS, "tx_pkts") \
@@ -376,10 +382,10 @@ static int efa_destroy_qp_handle(struct efa_dev *dev, u32 qp_handle)
 static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *uctx,
 					    struct efa_qp *qp)
 {
-	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_mmap_entry);
-	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_db_mmap_entry);
-	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->llq_desc_mmap_entry);
-	rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->sq_db_mmap_entry);
+	rdma_user_mmap_entry_remove(qp->rq_mmap_entry);
+	rdma_user_mmap_entry_remove(qp->rq_db_mmap_entry);
+	rdma_user_mmap_entry_remove(qp->llq_desc_mmap_entry);
+	rdma_user_mmap_entry_remove(qp->sq_db_mmap_entry);
 }
 
 int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
@@ -412,7 +418,7 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 static struct rdma_user_mmap_entry*
 efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 			   u64 address, size_t length,
-			   u8 mmap_flag, u64 *key)
+			   u8 mmap_flag, u64 *offset)
 {
 	struct efa_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
 	int err;
@@ -421,7 +427,6 @@ efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 		return NULL;
 
 	entry->address = address;
-	entry->length = length;
 	entry->mmap_flag = mmap_flag;
 
 	err = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
@@ -430,7 +435,7 @@ efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 		kfree(entry);
 		return NULL;
 	}
-	*key = rdma_user_mmap_get_key(&entry->rdma_entry);
+	*offset = rdma_user_mmap_get_offset(&entry->rdma_entry);
 
 	return &entry->rdma_entry;
 }
@@ -828,9 +833,6 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx)
 
 void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
-	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
-			struct efa_ucontext, ibucontext);
-
 	struct efa_dev *dev = to_edev(ibcq->device);
 	struct efa_cq *cq = to_ecq(ibcq);
 
@@ -841,8 +843,7 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
-	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
-				    cq->mmap_entry);
+	rdma_user_mmap_entry_remove(cq->mmap_entry);
 }
 
 static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
@@ -975,7 +976,7 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return 0;
 
 err_remove_mmap:
-	rdma_user_mmap_entry_remove(&ucontext->ibucontext, cq->mmap_entry);
+	rdma_user_mmap_entry_remove(cq->mmap_entry);
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
@@ -1541,12 +1542,12 @@ void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 	/* DMA mapping is already gone, now free the pages */
 	if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
 		free_pages_exact(phys_to_virt(entry->address),
-				 entry->length);
+				 entry->rdma_entry.npages * PAGE_SIZE);
 	kfree(entry);
 }
 
 static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
-		      struct vm_area_struct *vma, u64 key, size_t length)
+		      struct vm_area_struct *vma)
 {
 	struct rdma_user_mmap_entry *rdma_entry;
 	struct efa_user_mmap_entry *entry;
@@ -1554,35 +1555,31 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 	int err = 0;
 	u64 pfn;
 
-	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
-					      vma);
+	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, vma);
 	if (!rdma_entry) {
-		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
-			  key);
+		ibdev_dbg(&dev->ibdev,
+			  "pgoff[%#lx] does not have valid entry\n",
+			  vma->vm_pgoff);
 		return -EINVAL;
 	}
 	entry = to_emmap(rdma_entry);
-	if (entry->length != length) {
-		ibdev_dbg(&dev->ibdev,
-			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
-			  key, length, entry->length);
-		err = -EINVAL;
-		goto out;
-	}
 
 	ibdev_dbg(&dev->ibdev,
 		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
-		  entry->address, entry->length, entry->mmap_flag);
+		  entry->address, rdma_entry->npages * PAGE_SIZE,
+		  entry->mmap_flag);
 
 	pfn = entry->address >> PAGE_SHIFT;
 	switch (entry->mmap_flag) {
 	case EFA_MMAP_IO_NC:
-		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
+		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn,
+					entry->rdma_entry.npages * PAGE_SIZE,
 					pgprot_noncached(vma->vm_page_prot),
 					rdma_entry);
 		break;
 	case EFA_MMAP_IO_WC:
-		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length,
+		err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn,
+					entry->rdma_entry.npages * PAGE_SIZE,
 					pgprot_writecombine(vma->vm_page_prot),
 					rdma_entry);
 		break;
@@ -1599,14 +1596,14 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 	}
 
 	if (err) {
-		ibdev_dbg(&dev->ibdev,
-			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
-			  entry->address, length, entry->mmap_flag, err);
+		ibdev_dbg(
+			&dev->ibdev,
+			"Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
+			entry->address, rdma_entry->npages * PAGE_SIZE,
+			entry->mmap_flag, err);
 	}
 
-out:
-	rdma_user_mmap_entry_put(&ucontext->ibucontext, rdma_entry);
-
+	rdma_user_mmap_entry_put(rdma_entry);
 	return err;
 }
 
@@ -1616,25 +1613,12 @@ int efa_mmap(struct ib_ucontext *ibucontext,
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
 	size_t length = vma->vm_end - vma->vm_start;
-	u64 key = vma->vm_pgoff << PAGE_SHIFT;
 
 	ibdev_dbg(&dev->ibdev,
-		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
-		  vma->vm_start, vma->vm_end, length, key);
-
-	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
-		ibdev_dbg(&dev->ibdev,
-			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
-			  length, PAGE_SIZE, vma->vm_flags);
-		return -EINVAL;
-	}
-
-	if (vma->vm_flags & VM_EXEC) {
-		ibdev_dbg(&dev->ibdev, "Mapping executable pages is not permitted\n");
-		return -EPERM;
-	}
+		  "start %#lx, end %#lx, length = %#zx, pgoff = %#lx\n",
+		  vma->vm_start, vma->vm_end, length, vma->vm_pgoff);
 
-	return __efa_mmap(dev, ucontext, vma, key, length);
+	return __efa_mmap(dev, ucontext, vma);
 }
 
 static int efa_ah_destroy(struct efa_dev *dev, struct efa_ah *ah)
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b67248c9d92c29..9a674e65c4f783 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -321,7 +321,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	uresp.dpm_enabled = dev->user_dpm_enabled;
 	uresp.wids_enabled = 1;
 	uresp.wid_count = oparams.wid_count;
-	uresp.db_pa = rdma_user_mmap_get_key(ctx->db_mmap_entry);
+	uresp.db_pa = rdma_user_mmap_get_offset(ctx->db_mmap_entry);
 	uresp.db_size = ctx->dpi_size;
 	uresp.max_send_wr = dev->attr.max_sqe;
 	uresp.max_recv_wr = dev->attr.max_rqe;
@@ -345,7 +345,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	if (!ctx->db_mmap_entry)
 		dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
 	else
-		rdma_user_mmap_entry_remove(uctx, ctx->db_mmap_entry);
+		rdma_user_mmap_entry_remove(ctx->db_mmap_entry);
 
 	return rc;
 }
@@ -357,7 +357,7 @@ void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 	DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
 		 uctx);
 
-	rdma_user_mmap_entry_remove(ibctx, uctx->db_mmap_entry);
+	rdma_user_mmap_entry_remove(uctx->db_mmap_entry);
 }
 
 void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
@@ -377,44 +377,22 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 {
 	struct ib_device *dev = ucontext->device;
 	size_t length = vma->vm_end - vma->vm_start;
-	u64 key = vma->vm_pgoff << PAGE_SHIFT;
 	struct rdma_user_mmap_entry *rdma_entry;
 	struct qedr_user_mmap_entry *entry;
 	int rc = 0;
 	u64 pfn;
 
 	ibdev_dbg(dev,
-		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
-		  vma->vm_start, vma->vm_end, length, key);
+		  "start %#lx, end %#lx, length = %#zx, pgoff = %#lx\n",
+		  vma->vm_start, vma->vm_end, length, vma->vm_pgoff);
 
-	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
-		ibdev_dbg(dev,
-			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
-			  length, PAGE_SIZE, vma->vm_flags);
-		return -EINVAL;
-	}
-
-	if (vma->vm_flags & VM_EXEC) {
-		ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
-		return -EPERM;
-	}
-	vma->vm_flags &= ~VM_MAYEXEC;
-
-	rdma_entry = rdma_user_mmap_entry_get(ucontext, key, vma);
+	rdma_entry = rdma_user_mmap_entry_get(ucontext, vma);
 	if (!rdma_entry) {
-		ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
-			  key);
+		ibdev_dbg(dev, "pgoff[%#lx] does not have valid entry\n",
+			  vma->vm_pgoff);
 		return -EINVAL;
 	}
 	entry = get_qedr_mmap_entry(rdma_entry);
-	if (entry->length != length) {
-		ibdev_dbg(dev,
-			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
-			  key, length, entry->length);
-		rc = -EINVAL;
-		goto out;
-	}
-
 	ibdev_dbg(dev,
 		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
 		  entry->io_address, length, entry->mmap_flag);
@@ -439,9 +417,7 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
 			  entry->io_address, length, entry->mmap_flag, rc);
 
-out:
-	rdma_user_mmap_entry_put(ucontext, rdma_entry);
-
+	rdma_user_mmap_entry_put(rdma_entry);
 	return rc;
 }
 
@@ -713,7 +689,7 @@ static int qedr_copy_cq_uresp(struct qedr_dev *dev,
 
 	uresp.db_offset = db_offset;
 	uresp.icid = cq->icid;
-	uresp.db_rec_addr = rdma_user_mmap_get_key(cq->q.db_mmap_entry);
+	uresp.db_rec_addr = rdma_user_mmap_get_offset(cq->q.db_mmap_entry);
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
@@ -758,7 +734,7 @@ static int qedr_init_user_db_rec(struct ib_udata *udata,
 	/* Allocate a page for doorbell recovery, add to mmap */
 	q->db_rec_data = (void *)get_zeroed_page(GFP_USER);
 	if (!q->db_rec_data) {
-		DP_ERR(dev, "get_free_page failed\n");
+		DP_ERR(dev, "get_zeroed_page failed\n");
 		return -ENOMEM;
 	}
 
@@ -1044,8 +1020,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
 		if (ctx)
-			rdma_user_mmap_entry_remove(&ctx->ibucontext,
-						    cq->q.db_mmap_entry);
+			rdma_user_mmap_entry_remove(cq->q.db_mmap_entry);
 	} else {
 		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
 	}
@@ -1068,8 +1043,6 @@ int qedr_resize_cq(struct ib_cq *ibcq, int new_cnt, struct ib_udata *udata)
 
 void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
-	struct qedr_ucontext *ctx = rdma_udata_to_drv_context(udata,
-		struct qedr_ucontext, ibucontext);
 	struct qedr_dev *dev = get_qedr_dev(ibcq->device);
 	struct qed_rdma_destroy_cq_out_params oparams;
 	struct qed_rdma_destroy_cq_in_params iparams;
@@ -1097,8 +1070,7 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 		if (cq->q.db_rec_data) {
 			qedr_db_recovery_del(dev, cq->q.db_addr,
 					     &cq->q.db_rec_data->db_data);
-			rdma_user_mmap_entry_remove(&ctx->ibucontext,
-						    cq->q.db_mmap_entry);
+			rdma_user_mmap_entry_remove(cq->q.db_mmap_entry);
 		}
 	} else {
 		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
@@ -1280,7 +1252,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
 	}
 
 	uresp->rq_icid = qp->icid;
-	uresp->rq_db_rec_addr = rdma_user_mmap_get_key(qp->urq.db_mmap_entry);
+	uresp->rq_db_rec_addr = rdma_user_mmap_get_offset(qp->urq.db_mmap_entry);
 }
 
 static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1295,7 +1267,8 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
 	else
 		uresp->sq_icid = qp->icid + 1;
 
-	uresp->sq_db_rec_addr = rdma_user_mmap_get_key(qp->usq.db_mmap_entry);
+	uresp->sq_db_rec_addr =
+		rdma_user_mmap_get_offset(qp->usq.db_mmap_entry);
 }
 
 static int qedr_copy_qp_uresp(struct qedr_dev *dev,
@@ -1747,15 +1720,13 @@ static void qedr_cleanup_user(struct qedr_dev *dev,
 	if (qp->usq.db_rec_data) {
 		qedr_db_recovery_del(dev, qp->usq.db_addr,
 				     &qp->usq.db_rec_data->db_data);
-		rdma_user_mmap_entry_remove(&ctx->ibucontext,
-					    qp->usq.db_mmap_entry);
+		rdma_user_mmap_entry_remove(qp->usq.db_mmap_entry);
 	}
 
 	if (qp->urq.db_rec_data) {
 		qedr_db_recovery_del(dev, qp->urq.db_addr,
 				     &qp->urq.db_rec_data->db_data);
-		rdma_user_mmap_entry_remove(&ctx->ibucontext,
-					    qp->urq.db_mmap_entry);
+		rdma_user_mmap_entry_remove(qp->urq.db_mmap_entry);
 	}
 
 	if (rdma_protocol_iwarp(&dev->ibdev, 1))
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index ecc338e8f0b09d..f851afb5632e65 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -507,7 +507,6 @@ struct iwarp_msg_info {
 struct siw_user_mmap_entry {
 	struct rdma_user_mmap_entry rdma_entry;
 	void *address;
-	size_t length;
 };
 
 /* Global siw parameters. Currently set in siw_main.c */
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index cc4e7aebc9bc09..725985ed8af3be 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -44,7 +44,6 @@ void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
-	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
 	size_t size = vma->vm_end - vma->vm_start;
 	struct rdma_user_mmap_entry *rdma_entry;
 	struct siw_user_mmap_entry *entry;
@@ -57,29 +56,22 @@ int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 		pr_warn("siw: mmap not page aligned\n");
 		return -EINVAL;
 	}
-	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
-					      vma);
+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, vma);
 	if (!rdma_entry) {
 		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %#zx\n",
-			off, size);
+			vma->vm_pgoff, size);
 		return -EINVAL;
 	}
 	entry = to_siw_mmap_entry(rdma_entry);
-	if (PAGE_ALIGN(entry->length) != size) {
-		siw_dbg(&uctx->sdev->base_dev,
-			"key[%#lx] does not have valid length[%#zx] expected[%#zx]\n",
-			off, size, entry->length);
-		rv = -EINVAL;
-		goto out;
-	}
 
 	rv = remap_vmalloc_range(vma, entry->address, 0);
 	if (rv) {
-		pr_warn("remap_vmalloc_range failed: %lu, %zu\n", off, size);
+		pr_warn("remap_vmalloc_range failed: %lu, %zu\n", vma->vm_pgoff,
+			size);
 		goto out;
 	}
 out:
-	rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
+	rdma_user_mmap_entry_put(rdma_entry);
 
 	return rv;
 }
@@ -274,17 +266,16 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
 static struct rdma_user_mmap_entry *
 siw_mmap_entry_insert(struct siw_ucontext *uctx,
 		      void *address, size_t length,
-		      u64 *key)
+		      u64 *offset)
 {
 	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
 	int rv;
 
-	*key = SIW_INVAL_UOBJ_KEY;
+	*offset = SIW_INVAL_UOBJ_KEY;
 	if (!entry)
 		return NULL;
 
 	entry->address = address;
-	entry->length = length;
 
 	rv = rdma_user_mmap_entry_insert(&uctx->base_ucontext,
 					 &entry->rdma_entry,
@@ -294,7 +285,7 @@ siw_mmap_entry_insert(struct siw_ucontext *uctx,
 		return NULL;
 	}
 
-	*key = rdma_user_mmap_get_key(&entry->rdma_entry);
+	*offset = rdma_user_mmap_get_offset(&entry->rdma_entry);
 
 	return &entry->rdma_entry;
 }
@@ -512,10 +503,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 
 	if (qp) {
 		if (uctx) {
-			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
-						    qp->sq_entry);
-			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
-						    qp->rq_entry);
+			rdma_user_mmap_entry_remove(qp->sq_entry);
+			rdma_user_mmap_entry_remove(qp->rq_entry);
 		}
 		vfree(qp->sendq);
 		vfree(qp->recvq);
@@ -631,8 +620,8 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
 	qp->rx_stream.rx_suspend = 1;
 
 	if (uctx) {
-		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_entry);
-		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_entry);
+		rdma_user_mmap_entry_remove(qp->sq_entry);
+		rdma_user_mmap_entry_remove(qp->rq_entry);
 	}
 
 	down_write(&qp->state_lock);
@@ -1104,7 +1093,7 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
 	siw_cq_flush(cq);
 
 	if (ctx)
-		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_entry);
+		rdma_user_mmap_entry_remove(cq->cq_entry);
 
 	atomic_dec(&sdev->num_cq);
 
@@ -1198,8 +1187,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
 		if (ctx)
-			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
-						    cq->cq_entry);
+			rdma_user_mmap_entry_remove(cq->cq_entry);
 		vfree(cq->queue);
 	}
 	atomic_dec(&sdev->num_cq);
@@ -1650,8 +1638,7 @@ int siw_create_srq(struct ib_srq *base_srq,
 err_out:
 	if (srq->recvq) {
 		if (ctx)
-			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
-						    srq->srq_entry);
+			rdma_user_mmap_entry_remove(srq->srq_entry);
 		vfree(srq->recvq);
 	}
 	atomic_dec(&sdev->num_srq);
@@ -1738,8 +1725,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata)
 					  base_ucontext);
 
 	if (ctx)
-		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
-					    srq->srq_entry);
+		rdma_user_mmap_entry_remove(srq->srq_entry);
 	vfree(srq->recvq);
 	atomic_dec(&sdev->num_srq);
 }
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 13de7c1b796d7d..416e72ea80d913 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2262,15 +2262,16 @@ struct iw_cm_conn_param;
 struct rdma_user_mmap_entry {
 	struct kref ref;
 	struct ib_ucontext *ucontext;
-	u32 npages;
-	u32 mmap_page;
-	bool invalid;
+	unsigned long start_pgoff;
+	size_t npages;
+	bool driver_removed;
 };
 
+/* Return the offset (in bytes) the user should pass to libc's mmap() */
 static inline u64
-rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+rdma_user_mmap_get_offset(const struct rdma_user_mmap_entry *entry)
 {
-	return (u64)entry->mmap_page << PAGE_SHIFT;
+	return (u64)entry->start_pgoff << PAGE_SHIFT;
 }
 
 /**
@@ -2832,14 +2833,14 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 				struct rdma_user_mmap_entry *entry,
 				size_t length);
 struct rdma_user_mmap_entry *
-rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
+rdma_user_mmap_entry_get_pgoff(struct ib_ucontext *ucontext,
+			       unsigned long pgoff);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext,
 			 struct vm_area_struct *vma);
+void rdma_user_mmap_entry_put(struct rdma_user_mmap_entry *entry);
 
-void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-			      struct rdma_user_mmap_entry *entry);
-
-void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
-				 struct rdma_user_mmap_entry *entry);
+void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry);
 
 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
 {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers
  2019-11-05 19:54   ` Jason Gunthorpe
@ 2019-11-06  8:20     ` Gal Pressman
  0 siblings, 0 replies; 20+ messages in thread
From: Gal Pressman @ 2019-11-06  8:20 UTC (permalink / raw)
  To: Jason Gunthorpe, Michal Kalderon
  Cc: ariel.elior, dledford, yishaih, bmt, linux-rdma, sleybo

On 05/11/2019 21:54, Jason Gunthorpe wrote:
>>  static int qp_mmap_entries_setup(struct efa_qp *qp,
>>  				 struct efa_dev *dev,
>>  				 struct efa_ucontext *ucontext,
>>  				 struct efa_com_create_qp_params *params,
>>  				 struct efa_ibv_create_qp_resp *resp)
>>  {
>> -	/*
>> -	 * Once an entry is inserted it might be mmapped, hence cannot be
>> -	 * cleaned up until dealloc_ucontext.
>> -	 */
>> -	resp->sq_db_mmap_key =
>> -		mmap_entry_insert(dev, ucontext, qp,
>> -				  dev->db_bar_addr + resp->sq_db_offset,
>> -				  PAGE_SIZE, EFA_MMAP_IO_NC);
>> -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
>> +	size_t length;
>> +	u64 address;
>> +
>> +	address = dev->db_bar_addr + resp->sq_db_offset;
>> +	qp->sq_db_mmap_entry =
>> +		efa_user_mmap_entry_insert(&ucontext->ibucontext,
>> +					   address,
>> +					   PAGE_SIZE, EFA_MMAP_IO_NC,
>> +					   &resp->sq_db_mmap_key);
> 
> I'm still confused how this is OK for the lifetime, 'sq_db_offset'
> comes from the device, does the device prevent re-use of the same
> db_offset until the ucontext is closed? If so that deserves a comment
> in here.

The device prevents reuse of the DB offset until the UAR is deallocated (during
ucontext close). Same applies for the RQ DB and LLQ offset.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
  2019-11-05 20:41 ` [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Jason Gunthorpe
@ 2019-11-06 13:04   ` Michal Kalderon
  2019-11-06 15:10     ` Jason Gunthorpe
  2019-11-06 17:09     ` Jason Gunthorpe
  0 siblings, 2 replies; 20+ messages in thread
From: Michal Kalderon @ 2019-11-06 13:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Ariel Elior, dledford, galpress, yishaih, bmt, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> 
> On Wed, Oct 30, 2019 at 11:44:09AM +0200, Michal Kalderon wrote:
> > This patch series uses the doorbell overflow recovery mechanism
> > introduced in commit 36907cd5cd72 ("qed: Add doorbell overflow
> > recovery mechanism") for rdma ( RoCE and iWARP )
> >
> > The first six patches modify the core code to contain helper functions
> > for managing mmap_xa inserting, getting and freeing entries. The code
> > was based on the code from efa driver.
> > There is still an open discussion on whether we should take this even
> > further and make the entire mmap generic. Until a decision is made, I
> > only created the database API and modified the efa, qedr, siw driver
> > to use it. The functions are integrated with the umap mechanism.
> >
> > The doorbell recovery code is based on the common code.
> >
> > rdma-core pull request #493 was closed for now, once kernel series is
> > accepted will be reopend.
> >
> > This series applies over the wip/jgg-for-next branch and not the
> > for-next since it contains the series:
> > RDMA/qedr: Fix memory leaks and synchronization
> > https://www.spinics.net/lists/linux-rdma/msg85242.html
> >
> > SIW driver was reviewed, tested and signed-off by Bernard Metzler.
> 
> Since we are on v12 now, let us get this done. I added this diff to the series.
> Mostly just renaming, indenting, the notes I gave already (to all drivers) and
> probably a few other things I forgot
> 
> Here is the series with everything reflowed hopefully properly:
> 
> https://github.com/jgunthorpe/linux/commits/rdma_mmap
> 
> Please let me know if I messed it up, otherwise I'll apply this in a few days.
Thanks Jason. One small comment: the length field was removed from siw + efa
but not from qedr_user_mmap_entry; I can also send a patch afterwards to fix this.
And one small, insignificant typo below.

Other than that all fixes look good, thanks.
Tested-by: Michal Kalderon <michal.kalderon@marvell.com>


> 
> Thanks,
> Jason
> 
> diff --git a/drivers/infiniband/core/ib_core_uverbs.c
> b/drivers/infiniband/core/ib_core_uverbs.c
> index 88d9d47fb8adaa..6238842fd06402 100644
> --- a/drivers/infiniband/core/ib_core_uverbs.c
> +++ b/drivers/infiniband/core/ib_core_uverbs.c
> @@ -11,14 +11,14 @@
> 
>  /**
> - * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
> + * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa
>   *
>   * @ucontext: associated user context.
>   * @entry: the entry to insert into the mmap_xa
>   * @length: length of the address that will be mmapped
>   *
>   * This function should be called by drivers that use the rdma_user_mmap
> - * interface for handling user mmapped addresses. The database is handled
> in
> - * the core and helper functions are provided to insert entries into the
> - * database and extract entries when the user calls mmap with the given
> key.
> - * The function allocates a unique key that should be provided to user, the
> user
> - * will use the key to retrieve information such as address to
> - * be mapped and how.
> + * interface for implementing their mmap syscall A database of mmap
> + offsets is
> + * handled in the core and helper functions are provided to insert
> + entries
> + * into the database and extract entries when the user calls mmap with
> + the
> + * given offset.  The function allocates a unique page offset that
> + should be
> + * provided to user, the user will use the iffset to retrieve

Typo - should be offset

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
  2019-11-06 13:04   ` Michal Kalderon
@ 2019-11-06 15:10     ` Jason Gunthorpe
  2019-11-06 17:09     ` Jason Gunthorpe
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-06 15:10 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: Ariel Elior, dledford, galpress, yishaih, bmt, linux-rdma

> > Please let me know if I messed it up, otherwise I'll apply this in
> > a few days.
>
> Thanks Jason. One small comment: the length field was removed from siw + efa
> but not from qedr_user_mmap_entry; I can also send a patch afterwards to fix this.
> And one small, insignificant typo below.

qedr looked to me like it was using non-page-size lengths, so I left
it.

> > diff --git a/drivers/infiniband/core/ib_core_uverbs.c
> > b/drivers/infiniband/core/ib_core_uverbs.c
> > index 88d9d47fb8adaa..6238842fd06402 100644
> > +++ b/drivers/infiniband/core/ib_core_uverbs.c
> > @@ -11,14 +11,14 @@
> > 
> >  /**
> > - * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
> > + * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa
> >   *
> >   * @ucontext: associated user context.
> >   * @entry: the entry to insert into the mmap_xa
> >   * @length: length of the address that will be mmapped
> >   *
> >   * This function should be called by drivers that use the rdma_user_mmap
> > - * interface for handling user mmapped addresses. The database is handled
> > in
> > - * the core and helper functions are provided to insert entries into the
> > - * database and extract entries when the user calls mmap with the given
> > key.
> > - * The function allocates a unique key that should be provided to user, the
> > user
> > - * will use the key to retrieve information such as address to
> > - * be mapped and how.
> > + * interface for implementing their mmap syscall A database of mmap
> > + offsets is
> > + * handled in the core and helper functions are provided to insert
> > + entries
> > + * into the database and extract entries when the user calls mmap with
> > + the
> > + * given offset.  The function allocates a unique page offset that
> > + should be
> > + * provided to user, the user will use the iffset to retrieve
> 
> Typo - should be offset

Thanks, I'll fix that

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
  2019-11-06 13:04   ` Michal Kalderon
  2019-11-06 15:10     ` Jason Gunthorpe
@ 2019-11-06 17:09     ` Jason Gunthorpe
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2019-11-06 17:09 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: Ariel Elior, dledford, galpress, yishaih, bmt, linux-rdma

On Wed, Nov 06, 2019 at 01:04:58PM +0000, Michal Kalderon wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > 
> > On Wed, Oct 30, 2019 at 11:44:09AM +0200, Michal Kalderon wrote:
> > > This patch series uses the doorbell overflow recovery mechanism
> > > introduced in commit 36907cd5cd72 ("qed: Add doorbell overflow
> > > recovery mechanism") for rdma ( RoCE and iWARP )
> > >
> > > The first six patches modify the core code to contain helper functions
> > > for managing mmap_xa inserting, getting and freeing entries. The code
> > > was based on the code from efa driver.
> > > There is still an open discussion on whether we should take this even
> > > further and make the entire mmap generic. Until a decision is made, I
> > > only created the database API and modified the efa, qedr, siw driver
> > > to use it. The functions are integrated with the umap mechanism.
> > >
> > > The doorbell recovery code is based on the common code.
> > >
> > > rdma-core pull request #493 was closed for now, once kernel series is
> > > accepted will be reopend.
> > >
> > > This series applies over the wip/jgg-for-next branch and not the
> > > for-next since it contains the series:
> > > RDMA/qedr: Fix memory leaks and synchronization
> > > https://www.spinics.net/lists/linux-rdma/msg85242.html
> > >
> > > SIW driver was reviewed, tested and signed-off by Bernard Metzler.
> > 
> > Since we are on v12 now, let us get this done. I added this diff to the series.
> > Mostly just renaming, indenting, the notes I gave already (to all drivers) and
> > probably a few other things I forgot
> > 
> > Here is the series with everything reflowed hopefully properly:
> > 
> > https://github.com/jgunthorpe/linux/commits/rdma_mmap
> > 
> > Please let me know if I messed it up, otherwise I'll apply this in a few days.
> Thanks Jason. One small comment: the length field was removed from siw + efa
> but not from qedr_user_mmap_entry; I can also send a patch afterwards to fix this.
> And one small, insignificant typo below.
> 
> Other than that all fixes look good, thanks.
> Tested-by: Michal Kalderon <michal.kalderon@marvell.com>

Okay, applied to for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-11-06 17:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-30  9:44 [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
2019-10-30  9:44 ` [PATCH v12 rdma-next 1/8] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
2019-10-30  9:44 ` [PATCH v12 rdma-next 2/8] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
2019-10-31 12:35   ` Yishai Hadas
2019-11-01 10:19     ` [EXT] " Michal Kalderon
2019-11-05 19:44   ` Jason Gunthorpe
2019-10-30  9:44 ` [PATCH v12 rdma-next 3/8] RDMA: Connect between the mmap entry and the umap_priv structure Michal Kalderon
2019-11-05 19:47   ` Jason Gunthorpe
2019-10-30  9:44 ` [PATCH v12 rdma-next 4/8] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
2019-11-05 19:54   ` Jason Gunthorpe
2019-11-06  8:20     ` Gal Pressman
2019-10-30  9:44 ` [PATCH v12 rdma-next 5/8] RDMA/siw: " Michal Kalderon
2019-10-30  9:44 ` [PATCH v12 rdma-next 6/8] RDMA/qedr: Use the common mmap API Michal Kalderon
2019-10-30  9:44 ` [PATCH v12 rdma-next 7/8] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
2019-10-30  9:44 ` [PATCH v12 rdma-next 8/8] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
2019-10-30 15:33 ` [PATCH v12 rdma-next 5/8] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
2019-11-05 20:41 ` [PATCH v12 rdma-next 0/8] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Jason Gunthorpe
2019-11-06 13:04   ` Michal Kalderon
2019-11-06 15:10     ` Jason Gunthorpe
2019-11-06 17:09     ` Jason Gunthorpe
