* [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
@ 2019-09-05 10:01 Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

This patch series uses the doorbell overflow recovery mechanism
introduced in
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
for RDMA (RoCE and iWARP).

The first five patches modify the core code to provide helper
functions for managing mmap_xa entries: inserting, getting and
freeing them. The code is based on the efa driver. There is still an
open discussion on whether we should take this even further and make
the entire mmap flow generic. Until a decision is made, I only created
the database API and modified the efa, qedr and siw drivers to use it.
The functions are integrated with the umap mechanism; a condensed
usage sketch follows below.
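
As a rough illustration (this is not part of the patches; the drv_*
names are made up and error handling is trimmed), a driver allocates
its own entry that embeds the common struct rdma_user_mmap_entry,
inserts it to obtain a key, and hands that key back to user space via
udata:

	struct drv_user_mmap_entry {
		struct rdma_user_mmap_entry rdma_entry;
		u64 address;	/* driver-specific payload */
	};

	static u64 drv_mmap_entry_add(struct ib_ucontext *uctx, u64 address,
				      size_t length)
	{
		struct drv_user_mmap_entry *entry;
		u64 key;

		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
		if (!entry)
			return RDMA_USER_MMAP_INVALID;

		entry->address = address;
		/* Takes the initial reference on the entry */
		key = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
						  length);
		if (key == RDMA_USER_MMAP_INVALID)
			kfree(entry);

		return key;	/* user space uses this key as the mmap offset */
	}

When the resource is torn down the driver calls
rdma_user_mmap_entry_remove(uctx, key), which drops that reference;
once user space has also unmapped, the core invokes the driver's
->mmap_free() callback to free the entry.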

The doorbell recovery code is based on the common code.

rdma-core pull request #493

The EFA driver changes were reviewed, tested and signed off by Gal Pressman.
The SIW driver changes were reviewed, tested and signed off by Bernard Metzler.
Thanks Gal and Bernard!

Changes from v10:
- SIW - remove keys only when user context exists.
- EFA - fix debug print in error path of mmap.

Changes from V9:
- EFA changes (requested by Gal)
  - Entries should be removed and deleted in destroy_qp flow only after
    they are unmapped.
  - Fix mmap entries remove to really be in reverse order :-)
  - Always refer to the rdma_user_mmap_entry as rdma_entry for consistency.
  - In case of errors in __efa_mmap, print an error message only once.
- SIW changes (requested by Bernard)
  - No need to check if the key is invalid before removing it. There are
    already sanity checks inside the remove function.

Changes from V8:
- CORE changes
  - Fix race between getting an entry and deleting it. Increase
    the refcount under the lock only if it is not zero.  Erase all entries
    with __xa_erase instead of xa_erase and take the lock outside the loop.
  - Fix comment when erasing all the xa_entries of a single mmap_entry.
  - Take comment out of loop
  - Change length field in driver structures to be size_t instead of u64
    suggested by Bernard Metzler
  - Change do..while(true) to while(true)
- COMMON driver changes
  - Change mmap length to be size_t instead of u64.
  - In mmap, call put_entry if there is a length error.
- EFA changes:
  - Reverse mmap entries remove order.
  - Give meaningful label names in create_qp error flows.
  - In the error flow, undo the change that frees pages based only on the
    key, and make sure rq_size > 0 first.
  - Fix xmas tree alignment, move ucontext initialization to declaration
    line.
- SIW changes:
  - Changes received from Bernard Metzler
	- make the siw_user_mmap_entry.address a void *, which
	  naturally fits with remap_vmalloc_range. also avoids
	  other casting during resource address assignment.
	- do not kfree SQ/RQ/CQ/SRQ in preparation of mmap.
	  Those resources are always further needed ;)
	- Fix check for correct mmap range in siw_mmap().
	  - entry->length is the object length. We have to
	    expand to PAGE_ALIGN(entry->length), since mmap
	    comes with complete page(s) containing the
	    object.
	  - put mmap_entry if that check fails. Otherwise
	    entry object ref counting screws up, and later
	    crashes during context close.
	- simplify siw_mmap_free() - it must just free
	  the entry.
  - Change length to size_t instead of u64

Changes from V7:
- Remove license text, SPDX id should suffice.
- Fix some comments text.
- Add comment regarding vm_ops being set in ib_uverbs_mmap.
- Allocate the rdma_user_mmap_entry in the driver and not in the
  ib_core_uverbs. This led to defining three new structures per driver
  and separating the fields between the driver private structures and
  the common rdma_user_mmap_entry. Freeing the entry was also moved
  to the drivers.
- Fix bug found by Gal Pressman. Call mmap_free only once per entry.
- Add a mutex around xa_mmap insert to ensure threads won't interfere
  while the xa lock is released when inserting an entry into the range.
- Modify the insert algorithm to be more elegant using the
  xas_next_entry instead of foreach.
- Remove the rdma_user_mmap_entries_remove_free function; now that umap
  and mmap_xa are integrated we should not have any entries in the mmap_xa
  when ucontext is released. Replace the function with a WARN_ON(!xa_empty).
- Rdma_umap_open needs to reset the vm_private_data before initializing it.
- Decrease rdma_user_mmap_entry reference count on mmap disassociate.
- Remove WARN_ON(!kref_read) this is checked when kref debug is on.
- Remove some redundant defines from ib_verbs.h.
- Better error handling for efa create qp flow.
- Add a function that wraps the entry allocation and rdma_user_mmap_entry_insert
  which is used in all places that need to add an entry to the xarray.
- Remove rq_entry_inserted field in efa create qp flow.
- Add mmap_free to siw and free the memory only on mmap free and not before.

Changes from V6:
- Modified series description to be closer to what the series is now.
- Create a new file for the new rdma_user_mmap functions. The file
  is called ib_uverbs_core. It should contain user-related functions
  that are called by the hw drivers, to eventually enable ib_uverbs to
  be optional.
- Modify SIW driver to use new mmap api.
- When calculating number of pages, need to round it up to PAGE_SIZE.
- Integrate the mmap_xa and umap mechanism so that the entries in
  mmap_xa now have a reference count and can be removed. Previously
  entries existed until context was destroyed. This modified the
  algorithm for allocating a free page range.
- Modify algorithm for inserting an entry into the mmap_xa.
- Rdma_umap_priv is now also used for all mmaps done using the
  mmap_xa helpers.
- Move remove_free header to core_priv.
- Rdma_user_mmap_entry now has a kref that is increase on mmap
  and umap_open and decreased on umap_close.
- Modify efa + qedr to remove the entry from xa_map. This will
  decrease the refcnt and free memory only if refcnt is zero.
- Rdma_user_mmap_io slightly modified to enable drivers not using
  the xa_mmap API to continue using it.
- Modify page allocation for user to use GFP_USER instead of GFP_KERNEL

Changes from V5:
- Swap the order of the driver dealloc_ucontext and mmap_entries_remove calls.
- No need to verify the key after using the key to load an entry from
  the mmap_xa.
- Change mmap_free api to pass an 'entry' object.
- Add documentation for mmap_free and for newly exported functions.
- Fix some extra/missing line breaks.

Changes from V4:
- Add common mmap database and cookie helper functions.

Changes from V3:
- Remove casts from void to u8. Pointer arithmetic can be done on void
  pointers.
- Rebase to tip of rdma-next.

Changes from V2:
- Don't use long-lived kmap. Instead, use user-triggered mmap for the
  doorbell recovery entries.
- Modify dpi_addr to be denoted with __iomem and avoid redundant
  casts

Changes from V1:
- Call kmap to map the virtual address into kernel space.
- Modify db_rec_delete to be void.
- Remove some cpu_to_le16 conversions that were added in a previous patch;
  they are correct but not related to the overflow recovery mechanism and
  will be submitted as part of a different patch.


Michal Kalderon (7):
  RDMA/core: Move core content from ib_uverbs to ib_core
  RDMA/core: Create mmap database and cookie helper functions
  RDMA/efa: Use the common mmap_xa helpers
  RDMA/siw: Use the common mmap_xa helpers
  RDMA/qedr: Use the common mmap API
  RDMA/qedr: Add doorbell overflow recovery support
  RDMA/qedr: Add iWARP doorbell recovery support

 drivers/infiniband/core/Makefile         |   2 +-
 drivers/infiniband/core/core_priv.h      |  16 +
 drivers/infiniband/core/device.c         |   1 +
 drivers/infiniband/core/ib_core_uverbs.c | 353 ++++++++++++++++++++
 drivers/infiniband/core/rdma_core.c      |   1 +
 drivers/infiniband/core/uverbs_cmd.c     |   1 +
 drivers/infiniband/core/uverbs_main.c    |  97 +-----
 drivers/infiniband/hw/efa/efa.h          |  18 +-
 drivers/infiniband/hw/efa/efa_main.c     |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c    | 354 ++++++++++----------
 drivers/infiniband/hw/qedr/main.c        |   1 +
 drivers/infiniband/hw/qedr/qedr.h        |  45 ++-
 drivers/infiniband/hw/qedr/verbs.c       | 544 ++++++++++++++++++++++---------
 drivers/infiniband/hw/qedr/verbs.h       |   3 +-
 drivers/infiniband/sw/siw/siw.h          |  20 +-
 drivers/infiniband/sw/siw/siw_main.c     |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c    | 216 ++++++------
 drivers/infiniband/sw/siw/siw_verbs.h    |   1 +
 include/rdma/ib_verbs.h                  |  37 ++-
 include/uapi/rdma/qedr-abi.h             |  25 ++
 20 files changed, 1192 insertions(+), 545 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

-- 
2.14.5



* [PATCH v11 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Move the umap-related functionality that is called by the drivers
to a new file that will be linked into ib_core.
This is a first step towards later making ib_uverbs optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.
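
As background (not part of this patch; the drv_* names are
illustrative), rdma_user_mmap_io() is the helper being moved here. A
driver typically calls it from its ->mmap() callback to expose
something like a doorbell BAR to user space:

	static int drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
	{
		struct drv_dev *dev = to_drv_dev(uctx->device);	/* hypothetical */
		unsigned long size = vma->vm_end - vma->vm_start;

		/* Map the doorbell BAR pages uncached into the process */
		return rdma_user_mmap_io(uctx, vma,
					 dev->db_bar_addr >> PAGE_SHIFT, size,
					 pgprot_noncached(vma->vm_page_prot));
	}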

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/Makefile         |  2 +-
 drivers/infiniband/core/core_priv.h      |  9 ++++
 drivers/infiniband/core/ib_core_uverbs.c | 73 +++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c    | 74 ++------------------------------
 4 files changed, 86 insertions(+), 72 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 09881bd5f12d..9a8871e21545 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -11,7 +11,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				nldev.o restrack.o counters.o
+				nldev.o restrack.o counters.o ib_core_uverbs.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3a8b0911c3bc..0252da9560f4 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -387,4 +387,13 @@ int ib_device_set_netns_put(struct sk_buff *skb,
 
 int rdma_nl_net_init(struct rdma_dev_net *rnet);
 void rdma_nl_net_exit(struct rdma_dev_net *rnet);
+
+struct rdma_umap_priv {
+	struct vm_area_struct *vma;
+	struct list_head list;
+};
+
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
new file mode 100644
index 000000000000..b74d2a2fb342
--- /dev/null
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2019 Marvell. All rights reserved.
+ */
+#include <linux/xarray.h>
+#include "uverbs.h"
+#include "core_priv.h"
+
+/*
+ * Each time we map IO memory into user space this keeps track of the mapping.
+ * When the device is hot-unplugged we 'zap' the mmaps in user space to point
+ * to the zero page and allow the hot unplug to proceed.
+ *
+ * This is necessary for cases like PCI physical hot unplug as the actual BAR
+ * memory may vanish after this and access to it from userspace could MCE.
+ *
+ * RDMA drivers supporting disassociation must have their user space designed
+ * to cope in some way with their IO pages going to the zero page.
+ */
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma)
+{
+	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
+
+	priv->vma = vma;
+	vma->vm_private_data = priv;
+	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
+
+	mutex_lock(&ufile->umap_lock);
+	list_add(&priv->list, &ufile->umaps);
+	mutex_unlock(&ufile->umap_lock);
+}
+EXPORT_SYMBOL(rdma_umap_priv_init);
+
+/*
+ * Map IO memory into a process. This is to be called by drivers as part of
+ * their mmap() functions if they wish to send something like PCI-E BAR memory
+ * to userspace.
+ */
+int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
+		      unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	struct rdma_umap_priv *priv;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start != size)
+		return -EINVAL;
+
+	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
+	if (WARN_ON(!vma->vm_file ||
+		    vma->vm_file->private_data != ufile))
+		return -EINVAL;
+	lockdep_assert_held(&ufile->device->disassociate_srcu);
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	vma->vm_page_prot = prot;
+	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
+		kfree(priv);
+		return -EAGAIN;
+	}
+
+	rdma_umap_priv_init(priv, vma);
+	return 0;
+}
+EXPORT_SYMBOL(rdma_user_mmap_io);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 02b57240176c..180a5e0f70e4 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -772,6 +772,8 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	return (ret) ? : count;
 }
 
+static const struct vm_operations_struct rdma_umap_ops;
+
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct ib_uverbs_file *file = filp->private_data;
@@ -785,45 +787,13 @@ static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 		ret = PTR_ERR(ucontext);
 		goto out;
 	}
-
+	vma->vm_ops = &rdma_umap_ops;
 	ret = ucontext->device->ops.mmap(ucontext, vma);
 out:
 	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
 	return ret;
 }
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
- *
- * This is necessary for cases like PCI physical hot unplug as the actual BAR
- * memory may vanish after this and access to it from userspace could MCE.
- *
- * RDMA drivers supporting disassociation must have their user space designed
- * to cope in some way with their IO pages going to the zero page.
- */
-struct rdma_umap_priv {
-	struct vm_area_struct *vma;
-	struct list_head list;
-};
-
-static const struct vm_operations_struct rdma_umap_ops;
-
-static void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-				struct vm_area_struct *vma)
-{
-	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
-
-	priv->vma = vma;
-	vma->vm_private_data = priv;
-	vma->vm_ops = &rdma_umap_ops;
-
-	mutex_lock(&ufile->umap_lock);
-	list_add(&priv->list, &ufile->umaps);
-	mutex_unlock(&ufile->umap_lock);
-}
-
 /*
  * The VMA has been dup'd, initialize the vm_private_data with a new tracking
  * struct
@@ -931,44 +901,6 @@ static const struct vm_operations_struct rdma_umap_ops = {
 	.fault = rdma_umap_fault,
 };
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
- */
-int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
-		      unsigned long pfn, unsigned long size, pgprot_t prot)
-{
-	struct ib_uverbs_file *ufile = ucontext->ufile;
-	struct rdma_umap_priv *priv;
-
-	if (!(vma->vm_flags & VM_SHARED))
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start != size)
-		return -EINVAL;
-
-	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
-	if (WARN_ON(!vma->vm_file ||
-		    vma->vm_file->private_data != ufile))
-		return -EINVAL;
-	lockdep_assert_held(&ufile->device->disassociate_srcu);
-
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
-		return -ENOMEM;
-
-	vma->vm_page_prot = prot;
-	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
-		kfree(priv);
-		return -EAGAIN;
-	}
-
-	rdma_umap_priv_init(priv, vma);
-	return 0;
-}
-EXPORT_SYMBOL(rdma_user_mmap_io);
-
 void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 {
 	struct rdma_umap_priv *priv, *next_priv;
-- 
2.14.5



* [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-19 17:14   ` Jason Gunthorpe
                     ` (2 more replies)
  2019-09-05 10:01 ` [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Create some common APIs for adding entries to an mmap_xa, searching
for an entry and freeing one.

Most of the code was copied from the efa driver almost as is, just
renaming functions to be generic and not efa-specific.
In addition to the original code, the mmap_xa entries are now linked
to a umap_priv object and reference counted according to umap operations.
Moving this code to the core makes it possible to manage it differently,
so that entries can now be removed and deleted when the driver and user
are done with them. This enabled changing the insert algorithm in
comparison to what was done in efa. A sketch of the resulting
driver-side mmap flow follows below.
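
A minimal sketch of that flow (driver names are hypothetical, error
handling trimmed): user space passes the key it received at create
time as the mmap offset, the driver looks the entry up (which takes a
reference and links it to the vma's umap_priv), maps it, and drops the
reference only on failure; on success the reference is dropped when
the mapping is closed:

	/* drv_user_mmap_entry embeds struct rdma_user_mmap_entry (driver private) */
	static int drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
	{
		size_t length = vma->vm_end - vma->vm_start;
		u64 key = vma->vm_pgoff << PAGE_SHIFT;
		struct rdma_user_mmap_entry *rdma_entry;
		struct drv_user_mmap_entry *entry;
		int err;

		rdma_entry = rdma_user_mmap_entry_get(uctx, key, length, vma);
		if (!rdma_entry)
			return -EINVAL;
		entry = container_of(rdma_entry, struct drv_user_mmap_entry,
				     rdma_entry);

		err = rdma_user_mmap_io(uctx, vma, entry->address >> PAGE_SHIFT,
					length,
					pgprot_noncached(vma->vm_page_prot));
		if (err)
			rdma_user_mmap_entry_put(uctx, rdma_entry);

		return err;
	}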

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/core_priv.h      |  11 +-
 drivers/infiniband/core/device.c         |   1 +
 drivers/infiniband/core/ib_core_uverbs.c | 312 +++++++++++++++++++++++++++++--
 drivers/infiniband/core/rdma_core.c      |   1 +
 drivers/infiniband/core/uverbs_cmd.c     |   1 +
 drivers/infiniband/core/uverbs_main.c    |  23 ++-
 include/rdma/ib_verbs.h                  |  37 +++-
 7 files changed, 350 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 0252da9560f4..20beb5de6996 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -391,9 +391,16 @@ void rdma_nl_net_exit(struct rdma_dev_net *rnet);
 struct rdma_umap_priv {
 	struct vm_area_struct *vma;
 	struct list_head list;
+	struct rdma_user_mmap_entry *entry;
 };
 
-void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma);
+int rdma_umap_priv_init(struct vm_area_struct *vma,
+			struct rdma_user_mmap_entry *entry);
+
+void rdma_umap_priv_delete(struct ib_uverbs_file *ufile,
+			   struct rdma_umap_priv *priv);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 99c4a55545cf..8506844ae132 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2629,6 +2629,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, map_phys_fmr);
 	SET_DEVICE_OP(dev_ops, mmap);
+	SET_DEVICE_OP(dev_ops, mmap_free);
 	SET_DEVICE_OP(dev_ops, modify_ah);
 	SET_DEVICE_OP(dev_ops, modify_cq);
 	SET_DEVICE_OP(dev_ops, modify_device);
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b74d2a2fb342..40f58bcb6fa7 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -8,42 +8,99 @@
 #include "uverbs.h"
 #include "core_priv.h"
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
+/**
+ * rdma_umap_priv_init() - Initialize the private data of a vma
+ *
+ * @vma: The vm area struct that needs private data
+ * @entry: entry into the mmap_xa that needs to be linked with
+ *       this vma
+ *
+ * Each time we map IO memory into user space this keeps track
+ * of the mapping. When the device is hot-unplugged we 'zap' the
+ * mmaps in user space to point to the zero page and allow the
+ * hot unplug to proceed.
  *
  * This is necessary for cases like PCI physical hot unplug as the actual BAR
  * memory may vanish after this and access to it from userspace could MCE.
  *
  * RDMA drivers supporting disassociation must have their user space designed
  * to cope in some way with their IO pages going to the zero page.
+ *
+ * We extended the umap list usage to track all memory that was mapped by
+ * user space and not only the IO memory. This will occur for drivers that use
+ * the mmap_xa database and helper functions
+ *
+ * Return 0 on success or -ENOMEM if out of memory
  */
-void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma)
+int rdma_umap_priv_init(struct vm_area_struct *vma,
+			struct rdma_user_mmap_entry *entry)
 {
 	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
+	struct rdma_umap_priv *priv;
+
+	/* If the xa_mmap is used, private data will already be initialized.
+	 * this is required for the cases that rdma_user_mmap_io is called
+	 * from drivers that don't use the xa_mmap database
+	 */
+	if (vma->vm_private_data)
+		return 0;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
 
 	priv->vma = vma;
+	priv->entry = entry;
 	vma->vm_private_data = priv;
 	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
 
 	mutex_lock(&ufile->umap_lock);
 	list_add(&priv->list, &ufile->umaps);
 	mutex_unlock(&ufile->umap_lock);
+
+	return 0;
 }
 EXPORT_SYMBOL(rdma_umap_priv_init);
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
+/**
+ * rdma_umap_priv_delete() - Delete an entry from the umaps list
+ *
+ * @ufile: associated user file:
+ * @priv:  private data allocated and stored in
+ *      rdma_umap_priv_init
+ *
+ */
+void rdma_umap_priv_delete(struct ib_uverbs_file *ufile,
+			   struct rdma_umap_priv *priv)
+{
+	mutex_lock(&ufile->umap_lock);
+	list_del(&priv->list);
+	mutex_unlock(&ufile->umap_lock);
+	kfree(priv);
+}
+EXPORT_SYMBOL(rdma_umap_priv_delete);
+
+/**
+ * rdma_user_mmap_io() - Map IO memory into a process.
+ *
+ * @ucontext: associated user context
+ * @vma: the vma related to the current mmap call.
+ * @pfn: pfn to map
+ * @size: size to map
+ * @prot: pgprot to use in remap call
+ *
+ * This is to be called by drivers as part of their mmap()
+ * functions if they wish to send something like PCI-E BAR
+ * memory to userspace.
+ *
+ * Return -EINVAL on wrong flags or size, -EAGAIN on failure to
+ * map. 0 on success.
  */
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot)
 {
 	struct ib_uverbs_file *ufile = ucontext->ufile;
-	struct rdma_umap_priv *priv;
+	int ret;
 
 	if (!(vma->vm_flags & VM_SHARED))
 		return -EINVAL;
@@ -57,17 +114,240 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		return -EINVAL;
 	lockdep_assert_held(&ufile->device->disassociate_srcu);
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
-		return -ENOMEM;
+	ret = rdma_umap_priv_init(vma, NULL);
+	if (ret)
+		return ret;
 
 	vma->vm_page_prot = prot;
 	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
-		kfree(priv);
+		rdma_umap_priv_delete(ufile, vma->vm_private_data);
 		return -EAGAIN;
 	}
 
-	rdma_umap_priv_init(priv, vma);
 	return 0;
 }
 EXPORT_SYMBOL(rdma_user_mmap_io);
+
+static inline u64
+rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+{
+	return (u64)entry->mmap_page << PAGE_SHIFT;
+}
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @key: the key received from rdma_user_mmap_entry_insert which
+ *     is provided by user as the address to map.
+ * @len: the length the user wants to map.
+ * @vma: the vma related to the current mmap call.
+ *
+ * This function is called when a user tries to mmap a key it
+ * initially received from the driver. The key was created by
+ * the function rdma_user_mmap_entry_insert. The function should
+ * be called only once per mmap. It initializes the vma and
+ * increases the entries ref-count. Once the memory is unmapped
+ * the ref-count will decrease. When the refcount reaches zero
+ * the entry will be deleted.
+ *
+ * Return an entry if exists or NULL if there is no match.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
+			 struct vm_area_struct *vma)
+{
+	struct rdma_user_mmap_entry *entry;
+	u64 mmap_page;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return NULL;
+
+	xa_lock(&ucontext->mmap_xa);
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	if (!entry)
+		goto err;
+
+	/* if refcount is zero, entry is already being deleted */
+	if (!kref_get_unless_zero(&entry->ref))
+		goto err;
+
+	xa_unlock(&ucontext->mmap_xa);
+	rdma_umap_priv_init(vma, entry);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] returned\n",
+		  key, entry->npages);
+
+	return entry;
+
+err:
+	xa_unlock(&ucontext->mmap_xa);
+	return NULL;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_get);
+
+void rdma_user_mmap_entry_free(struct kref *kref)
+{
+	struct rdma_user_mmap_entry *entry =
+		container_of(kref, struct rdma_user_mmap_entry, ref);
+	struct ib_ucontext *ucontext = entry->ucontext;
+	unsigned long i;
+
+	/* need to erase all entries occupied by this single entry */
+	xa_lock(&ucontext->mmap_xa);
+	for (i = 0; i < entry->npages; i++)
+		__xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
+	xa_unlock(&ucontext->mmap_xa);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] removed\n",
+		  rdma_user_mmap_get_key(entry),
+		  entry->npages);
+
+	if (ucontext->device->ops.mmap_free)
+		ucontext->device->ops.mmap_free(entry);
+}
+
+/**
+ * rdma_user_mmap_entry_put() - drop reference to the mmap entry
+ *
+ * @ucontext: associated user context.
+ * @entry: an entry in the mmap_xa.
+ *
+ * This function is called when the mapping is closed or when
+ * the driver is done with the entry for some other reason.
+ * Should be called after rdma_user_mmap_entry_get was called
+ * and entry is no longer needed. This function will erase the
+ * entry and free it if its refcnt reaches zero.
+ */
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry)
+{
+	kref_put(&entry->ref, rdma_user_mmap_entry_free);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_put);
+
+/**
+ * rdma_user_mmap_entry_remove() - Remove a key's entry from the mmap_xa
+ *
+ * @ucontext: associated user context.
+ * @key: the key to be deleted
+ *
+ * This function will find if there is an entry matching the key and if so
+ * decrease its refcnt, which will in turn delete the entry if
+ * its refcount reaches zero.
+ */
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext, u64 key)
+{
+	struct rdma_user_mmap_entry *entry;
+	u32 mmap_page;
+
+	if (key == RDMA_USER_MMAP_INVALID)
+		return;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return;
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	if (!entry)
+		return;
+
+	rdma_user_mmap_entry_put(ucontext, entry);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
+
+/**
+ * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @entry: the entry to insert into the mmap_xa
+ * @length: length of the address that will be mmapped
+ *
+ * This function should be called by drivers that use the rdma_user_mmap
+ * interface for handling user mmapped addresses. The database is handled in
+ * the core and helper functions are provided to insert entries into the
+ * database and extract entries when the user call mmap with the given key.
+ * The function returns a unique key that should be provided to user, the user
+ * will use the key to retrieve information such as address to
+ * be mapped and how.
+ *
+ * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
+ */
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				size_t length)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	XA_STATE(xas, &ucontext->mmap_xa, 0);
+	u32 xa_first, xa_last, npages;
+	int err, i;
+
+	if (!entry)
+		return RDMA_USER_MMAP_INVALID;
+
+	kref_init(&entry->ref);
+	entry->ucontext = ucontext;
+
+	/* We want the whole allocation to be done without interruption
+	 * from a different thread. The allocation requires finding a
+	 * free range and storing. During the xa_insert the lock could be
+	 * released, we don't want another thread taking the gap.
+	 */
+	mutex_lock(&ufile->umap_lock);
+
+	xa_lock(&ucontext->mmap_xa);
+
+	/* We want to find an empty range */
+	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
+	entry->npages = npages;
+	while (true) {
+		/* First find an empty index */
+		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART)
+			goto err_unlock;
+
+		xa_first = xas.xa_index;
+
+		/* Is there enough room to have the range? */
+		if (check_add_overflow(xa_first, npages, &xa_last))
+			goto err_unlock;
+
+		/* Now look for the next present entry. If such doesn't
+		 * exist, we found an empty range and can proceed
+		 */
+		xas_next_entry(&xas, xa_last - 1);
+		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
+			break;
+		/* o/w look for the next free entry */
+	}
+
+	for (i = xa_first; i < xa_last; i++) {
+		err = __xa_insert(&ucontext->mmap_xa, i, entry, GFP_KERNEL);
+		if (err)
+			goto err_undo;
+	}
+
+	entry->mmap_page = xa_first;
+	xa_unlock(&ucontext->mmap_xa);
+
+	mutex_unlock(&ufile->umap_lock);
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] inserted\n",
+		  rdma_user_mmap_get_key(entry), npages);
+
+	return rdma_user_mmap_get_key(entry);
+
+err_undo:
+	for (; i > xa_first; i--)
+		__xa_erase(&ucontext->mmap_xa, i - 1);
+
+err_unlock:
+	xa_unlock(&ucontext->mmap_xa);
+	mutex_unlock(&ufile->umap_lock);
+	return RDMA_USER_MMAP_INVALID;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..6c72773faf29 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	rdma_restrack_del(&ucontext->res);
 
 	ib_dev->ops.dealloc_ucontext(ucontext);
+	WARN_ON(!xa_empty(&ucontext->mmap_xa));
 	kfree(ucontext);
 
 	ufile->ucontext = NULL;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 8f4fd4fac159..8af0e32df122 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
 
 	mutex_init(&ucontext->per_mm_list_lock);
 	INIT_LIST_HEAD(&ucontext->per_mm_list);
+	xa_init_flags(&ucontext->mmap_xa, XA_FLAGS_ALLOC);
 
 	ret = get_unused_fd_flags(O_CLOEXEC);
 	if (ret < 0)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 180a5e0f70e4..d550db513be8 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -802,7 +802,7 @@ static void rdma_umap_open(struct vm_area_struct *vma)
 {
 	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
 	struct rdma_umap_priv *opriv = vma->vm_private_data;
-	struct rdma_umap_priv *priv;
+	int ret;
 
 	if (!opriv)
 		return;
@@ -816,10 +816,14 @@ static void rdma_umap_open(struct vm_area_struct *vma)
 	if (!ufile->ucontext)
 		goto out_unlock;
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
+	if (opriv->entry)
+		kref_get(&opriv->entry->ref);
+
+	/* We want to re-initialize the private data */
+	vma->vm_private_data = NULL;
+	ret = rdma_umap_priv_init(vma, opriv->entry);
+	if (ret)
 		goto out_unlock;
-	rdma_umap_priv_init(priv, vma);
 
 	up_read(&ufile->hw_destroy_rwsem);
 	return;
@@ -844,15 +848,15 @@ static void rdma_umap_close(struct vm_area_struct *vma)
 	if (!priv)
 		return;
 
+	if (priv->entry)
+		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
+
 	/*
 	 * The vma holds a reference on the struct file that created it, which
 	 * in turn means that the ib_uverbs_file is guaranteed to exist at
 	 * this point.
 	 */
-	mutex_lock(&ufile->umap_lock);
-	list_del(&priv->list);
-	mutex_unlock(&ufile->umap_lock);
-	kfree(priv);
+	rdma_umap_priv_delete(ufile, priv);
 }
 
 /*
@@ -917,6 +921,9 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 
 			priv = list_first_entry(&ufile->umaps,
 						struct rdma_umap_priv, list);
+			if (priv->entry)
+				rdma_user_mmap_entry_put(ufile->ucontext,
+							 priv->entry);
 			mm = priv->vma->vm_mm;
 			ret = mmget_not_zero(mm);
 			if (!ret) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index de5bc352f473..0fbcd6a93056 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1474,6 +1474,7 @@ struct ib_ucontext {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+	struct xarray mmap_xa;
 };
 
 struct ib_uobject {
@@ -2254,6 +2255,14 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+#define RDMA_USER_MMAP_INVALID U64_MAX
+struct rdma_user_mmap_entry {
+	struct kref ref;
+	struct ib_ucontext *ucontext;
+	u32 npages;
+	u32 mmap_page;
+};
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2366,6 +2375,13 @@ struct ib_device_ops {
 			      struct ib_udata *udata);
 	void (*dealloc_ucontext)(struct ib_ucontext *context);
 	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
+	/**
+	 * This will be called once refcount of an entry in mmap_xa reaches
+	 * zero. The type of the memory that was mapped may differ between
+	 * entries and is opaque to the rdma_user_mmap interface.
+	 * Therefore needs to be implemented by the driver in mmap_free.
+	 */
+	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
 	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
 	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
 	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
@@ -2792,18 +2808,19 @@ void  ib_set_client_data(struct ib_device *device, struct ib_client *client,
 void ib_set_device_ops(struct ib_device *device,
 		       const struct ib_device_ops *ops);
 
-#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot);
-#else
-static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
-				    struct vm_area_struct *vma,
-				    unsigned long pfn, unsigned long size,
-				    pgprot_t prot)
-{
-	return -EINVAL;
-}
-#endif
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				size_t length);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
+			 struct vm_area_struct *vma);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
+
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext, u64 key);
 
 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
 {
-- 
2.14.5



* [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-19 17:37   ` Jason Gunthorpe
  2019-09-05 10:01 ` [PATCH v11 rdma-next 4/7] RDMA/siw: " Michal Kalderon
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Remove the functions related to managing the mmap_xa database.
This code was copied into ib_core. Use the common APIs instead.
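
The conversion follows the embedding pattern: the efa-private entry now
wraps the common struct rdma_user_mmap_entry, and per-entry cleanup
moves into the new ->mmap_free() callback (condensed from the diff
below):

	struct efa_user_mmap_entry {
		struct rdma_user_mmap_entry rdma_entry;
		u64 address;
		size_t length;
		u8 mmap_flag;
	};

	void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
	{
		struct efa_user_mmap_entry *entry = to_emmap(rdma_entry);

		/* DMA mapping is already gone, now free the pages */
		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
			free_pages_exact(phys_to_virt(entry->address),
					 entry->length);
		kfree(entry);
	}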

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
---
 drivers/infiniband/hw/efa/efa.h       |  18 +-
 drivers/infiniband/hw/efa/efa_main.c  |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c | 354 +++++++++++++++++-----------------
 3 files changed, 193 insertions(+), 180 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 2283e432693e..acfe9ef8a8d2 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -71,8 +71,6 @@ struct efa_dev {
 
 struct efa_ucontext {
 	struct ib_ucontext ibucontext;
-	struct xarray mmap_xa;
-	u32 mmap_xa_page;
 	u16 uarn;
 };
 
@@ -91,6 +89,7 @@ struct efa_cq {
 	struct efa_ucontext *ucontext;
 	dma_addr_t dma_addr;
 	void *cpu_addr;
+	u64 mmap_key;
 	size_t size;
 	u16 cq_idx;
 };
@@ -101,6 +100,13 @@ struct efa_qp {
 	void *rq_cpu_addr;
 	size_t rq_size;
 	enum ib_qp_state state;
+
+	/* Used for saving mmap_xa entries */
+	u64 sq_db_mmap_key;
+	u64 llq_desc_mmap_key;
+	u64 rq_db_mmap_key;
+	u64 rq_mmap_key;
+
 	u32 qp_handle;
 	u32 max_send_wr;
 	u32 max_recv_wr;
@@ -116,6 +122,13 @@ struct efa_ah {
 	u8 id[EFA_GID_SIZE];
 };
 
+struct efa_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	size_t length;
+	u8 mmap_flag;
+};
+
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata);
@@ -147,6 +160,7 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata);
 void efa_dealloc_ucontext(struct ib_ucontext *ibucontext);
 int efa_mmap(struct ib_ucontext *ibucontext,
 	     struct vm_area_struct *vma);
+void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 int efa_create_ah(struct ib_ah *ibah,
 		  struct rdma_ah_attr *ah_attr,
 		  u32 flags,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 83858f7e83d0..0e3050d01b75 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -217,6 +217,7 @@ static const struct ib_device_ops efa_dev_ops = {
 	.get_link_layer = efa_port_link_layer,
 	.get_port_immutable = efa_get_port_immutable,
 	.mmap = efa_mmap,
+	.mmap_free = efa_mmap_free,
 	.modify_qp = efa_modify_qp,
 	.query_device = efa_query_device,
 	.query_gid = efa_query_gid,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 4edae89e8e3c..58838e75136f 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -13,10 +13,6 @@
 
 #include "efa.h"
 
-#define EFA_MMAP_FLAG_SHIFT 56
-#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
-#define EFA_MMAP_INVALID U64_MAX
-
 enum {
 	EFA_MMAP_DMA_PAGE = 0,
 	EFA_MMAP_IO_WC,
@@ -27,20 +23,6 @@ enum {
 	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
 	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
 
-struct efa_mmap_entry {
-	void  *obj;
-	u64 address;
-	u64 length;
-	u32 mmap_page;
-	u8 mmap_flag;
-};
-
-static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
-{
-	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
-	       ((u64)efa->mmap_page << PAGE_SHIFT);
-}
-
 #define EFA_DEFINE_STATS(op) \
 	op(EFA_TX_BYTES, "tx_bytes") \
 	op(EFA_TX_PKTS, "tx_pkts") \
@@ -147,6 +129,12 @@ static inline struct efa_ah *to_eah(struct ib_ah *ibah)
 	return container_of(ibah, struct efa_ah, ibah);
 }
 
+static inline struct efa_user_mmap_entry *
+to_emmap(struct rdma_user_mmap_entry *rdma_entry)
+{
+	return container_of(rdma_entry, struct efa_user_mmap_entry, rdma_entry);
+}
+
 #define field_avail(x, fld, sz) (offsetof(typeof(x), fld) + \
 				 FIELD_SIZEOF(typeof(x), fld) <= (sz))
 
@@ -172,106 +160,6 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
 	return addr;
 }
 
-/*
- * This is only called when the ucontext is destroyed and there can be no
- * concurrent query via mmap or allocate on the xarray, thus we can be sure no
- * other thread is using the entry pointer. We also know that all the BAR
- * pages have either been zap'd or munmaped at this point.  Normal pages are
- * refcounted and will be freed at the proper time.
- */
-static void mmap_entries_remove_free(struct efa_dev *dev,
-				     struct efa_ucontext *ucontext)
-{
-	struct efa_mmap_entry *entry;
-	unsigned long mmap_page;
-
-	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
-		xa_erase(&ucontext->mmap_xa, mmap_page);
-
-		ibdev_dbg(
-			&dev->ibdev,
-			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-			entry->obj, get_mmap_key(entry), entry->address,
-			entry->length);
-		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
-			/* DMA mapping is already gone, now free the pages */
-			free_pages_exact(phys_to_virt(entry->address),
-					 entry->length);
-		kfree(entry);
-	}
-}
-
-static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
-					     struct efa_ucontext *ucontext,
-					     u64 key, u64 len)
-{
-	struct efa_mmap_entry *entry;
-	u64 mmap_page;
-
-	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
-	if (mmap_page > U32_MAX)
-		return NULL;
-
-	entry = xa_load(&ucontext->mmap_xa, mmap_page);
-	if (!entry || get_mmap_key(entry) != key || entry->length != len)
-		return NULL;
-
-	ibdev_dbg(&dev->ibdev,
-		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-		  entry->obj, key, entry->address, entry->length);
-
-	return entry;
-}
-
-/*
- * Note this locking scheme cannot support removal of entries, except during
- * ucontext destruction when the core code guarentees no concurrency.
- */
-static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
-			     void *obj, u64 address, u64 length, u8 mmap_flag)
-{
-	struct efa_mmap_entry *entry;
-	u32 next_mmap_page;
-	int err;
-
-	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-	if (!entry)
-		return EFA_MMAP_INVALID;
-
-	entry->obj = obj;
-	entry->address = address;
-	entry->length = length;
-	entry->mmap_flag = mmap_flag;
-
-	xa_lock(&ucontext->mmap_xa);
-	if (check_add_overflow(ucontext->mmap_xa_page,
-			       (u32)(length >> PAGE_SHIFT),
-			       &next_mmap_page))
-		goto err_unlock;
-
-	entry->mmap_page = ucontext->mmap_xa_page;
-	ucontext->mmap_xa_page = next_mmap_page;
-	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
-			  GFP_KERNEL);
-	if (err)
-		goto err_unlock;
-
-	xa_unlock(&ucontext->mmap_xa);
-
-	ibdev_dbg(
-		&dev->ibdev,
-		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
-		entry->obj, entry->address, entry->length, get_mmap_key(entry));
-
-	return get_mmap_key(entry);
-
-err_unlock:
-	xa_unlock(&ucontext->mmap_xa);
-	kfree(entry);
-	return EFA_MMAP_INVALID;
-
-}
-
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata)
@@ -485,8 +373,20 @@ static int efa_destroy_qp_handle(struct efa_dev *dev, u32 qp_handle)
 	return efa_com_destroy_qp(&dev->edev, &params);
 }
 
+static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *ucontext,
+					    struct efa_qp *qp)
+{
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    qp->llq_desc_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);
+}
+
 int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
+		struct efa_ucontext, ibucontext);
 	struct efa_dev *dev = to_edev(ibqp->pd->device);
 	struct efa_qp *qp = to_eqp(ibqp);
 	int err;
@@ -505,61 +405,119 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 				 DMA_TO_DEVICE);
 	}
 
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
 	kfree(qp);
 	return 0;
 }
 
+static int efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				      u64 address, size_t length,
+				      u8 mmap_flag, u64 *key)
+{
+	struct efa_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->address = address;
+	entry->length = length;
+	entry->mmap_flag = mmap_flag;
+
+	*key = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
+					   length);
+	if (*key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void efa_qp_init_keys(struct efa_qp *qp)
+{
+	qp->sq_db_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->llq_desc_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_db_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_mmap_key = RDMA_USER_MMAP_INVALID;
+}
+
 static int qp_mmap_entries_setup(struct efa_qp *qp,
 				 struct efa_dev *dev,
 				 struct efa_ucontext *ucontext,
 				 struct efa_com_create_qp_params *params,
 				 struct efa_ibv_create_qp_resp *resp)
 {
-	/*
-	 * Once an entry is inserted it might be mmapped, hence cannot be
-	 * cleaned up until dealloc_ucontext.
-	 */
-	resp->sq_db_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->db_bar_addr + resp->sq_db_offset,
-				  PAGE_SIZE, EFA_MMAP_IO_NC);
-	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+	size_t length;
+	u64 address;
+	int err;
 
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 dev->db_bar_addr +
+					 resp->sq_db_offset,
+					 PAGE_SIZE, EFA_MMAP_IO_NC,
+					 &qp->sq_db_mmap_key);
+	if (err)
+		return err;
+
+	resp->sq_db_mmap_key = qp->sq_db_mmap_key;
 	resp->sq_db_offset &= ~PAGE_MASK;
 
-	resp->llq_desc_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->mem_bar_addr + resp->llq_desc_offset,
-				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
-					     (resp->llq_desc_offset & ~PAGE_MASK)),
-				  EFA_MMAP_IO_WC);
-	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+	address = dev->mem_bar_addr + resp->llq_desc_offset;
+	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
+			    (resp->llq_desc_offset & ~PAGE_MASK));
+
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 address,
+					 length,
+					 EFA_MMAP_IO_WC,
+					 &qp->llq_desc_mmap_key);
+	if (err)
+		goto err_remove_sq_db;
 
+	resp->llq_desc_mmap_key = qp->llq_desc_mmap_key;
 	resp->llq_desc_offset &= ~PAGE_MASK;
 
 	if (qp->rq_size) {
-		resp->rq_db_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  dev->db_bar_addr + resp->rq_db_offset,
-					  PAGE_SIZE, EFA_MMAP_IO_NC);
-		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = dev->db_bar_addr + resp->rq_db_offset;
+
+		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						 address, PAGE_SIZE,
+						 EFA_MMAP_IO_NC,
+						 &qp->rq_db_mmap_key);
+		if (err)
+			goto err_remove_llq_desc;
 
+		resp->rq_db_mmap_key = qp->rq_db_mmap_key;
 		resp->rq_db_offset &= ~PAGE_MASK;
 
-		resp->rq_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  virt_to_phys(qp->rq_cpu_addr),
-					  qp->rq_size, EFA_MMAP_DMA_PAGE);
-		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = virt_to_phys(qp->rq_cpu_addr);
+		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						 address, qp->rq_size,
+						 EFA_MMAP_DMA_PAGE,
+						 &qp->rq_mmap_key);
+		if (err)
+			goto err_remove_rq_db;
 
+		resp->rq_mmap_key = qp->rq_mmap_key;
 		resp->rq_mmap_size = qp->rq_size;
 	}
 
 	return 0;
+
+err_remove_rq_db:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);
+
+err_remove_llq_desc:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    qp->llq_desc_mmap_key);
+
+err_remove_sq_db:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);
+
+	/* If any error occurred, we init the keys back to invalid */
+	efa_qp_init_keys(qp);
+
+	return err;
 }
 
 static int efa_qp_validate_cap(struct efa_dev *dev,
@@ -634,7 +592,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	struct efa_dev *dev = to_edev(ibpd->device);
 	struct efa_ibv_create_qp_resp resp = {};
 	struct efa_ibv_create_qp cmd = {};
-	bool rq_entry_inserted = false;
 	struct efa_ucontext *ucontext;
 	struct efa_qp *qp;
 	int err;
@@ -687,6 +644,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 		goto err_out;
 	}
 
+	efa_qp_init_keys(qp);
 	create_qp_params.uarn = ucontext->uarn;
 	create_qp_params.pd = to_epd(ibpd)->pdn;
 
@@ -742,7 +700,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	if (err)
 		goto err_destroy_qp;
 
-	rq_entry_inserted = true;
 	qp->qp_handle = create_qp_resp.qp_handle;
 	qp->ibqp.qp_num = create_qp_resp.qp_num;
 	qp->ibqp.qp_type = init_attr->qp_type;
@@ -759,7 +716,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 			ibdev_dbg(&dev->ibdev,
 				  "Failed to copy udata for qp[%u]\n",
 				  create_qp_resp.qp_num);
-			goto err_destroy_qp;
+			goto err_remove_mmap_entries;
 		}
 	}
 
@@ -767,13 +724,16 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 
 	return &qp->ibqp;
 
+err_remove_mmap_entries:
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
 err_destroy_qp:
 	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
 err_free_mapped:
 	if (qp->rq_size) {
 		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
 				 DMA_TO_DEVICE);
-		if (!rq_entry_inserted)
+
+		if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
 			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
 	}
 err_free_qp:
@@ -887,6 +847,9 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx)
 
 void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
+			struct efa_ucontext, ibucontext);
+
 	struct efa_dev *dev = to_edev(ibcq->device);
 	struct efa_cq *cq = to_ecq(ibcq);
 
@@ -897,17 +860,28 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    cq->mmap_key);
 }
 
 static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 				 struct efa_ibv_create_cq_resp *resp)
 {
+	struct efa_ucontext *ucontext = cq->ucontext;
+	int err;
+
 	resp->q_mmap_size = cq->size;
-	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
-					     virt_to_phys(cq->cpu_addr),
-					     cq->size, EFA_MMAP_DMA_PAGE);
-	if (resp->q_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 virt_to_phys(cq->cpu_addr),
+					 cq->size, EFA_MMAP_DMA_PAGE,
+					 &cq->mmap_key);
+	if (err) {
+		cq->mmap_key = RDMA_USER_MMAP_INVALID;
+		return err;
+	}
+
+	resp->q_mmap_key = cq->mmap_key;
 
 	return 0;
 }
@@ -924,7 +898,6 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct efa_dev *dev = to_edev(ibdev);
 	struct efa_ibv_create_cq cmd = {};
 	struct efa_cq *cq = to_ecq(ibcq);
-	bool cq_entry_inserted = false;
 	int entries = attr->cqe;
 	int err;
 
@@ -1013,15 +986,13 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		goto err_destroy_cq;
 	}
 
-	cq_entry_inserted = true;
-
 	if (udata->outlen) {
 		err = ib_copy_to_udata(udata, &resp,
 				       min(sizeof(resp), udata->outlen));
 		if (err) {
 			ibdev_dbg(ibdev,
 				  "Failed to copy udata for create_cq\n");
-			goto err_destroy_cq;
+			goto err_remove_mmap;
 		}
 	}
 
@@ -1030,13 +1001,16 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
+err_remove_mmap:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, cq->mmap_key);
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
-	if (!cq_entry_inserted)
+	if (cq->mmap_key == RDMA_USER_MMAP_INVALID)
 		free_pages_exact(cq->cpu_addr, cq->size);
+
 err_out:
 	atomic64_inc(&dev->stats.sw_stats.create_cq_err);
 	return err;
@@ -1556,7 +1530,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
 		goto err_out;
 
 	ucontext->uarn = result.uarn;
-	xa_init(&ucontext->mmap_xa);
 
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
@@ -1585,28 +1558,48 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
 
-	mmap_entries_remove_free(dev, ucontext);
 	efa_dealloc_uar(dev, ucontext->uarn);
 }
 
+void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
+{
+	struct efa_user_mmap_entry *entry = to_emmap(rdma_entry);
+
+	/* DMA mapping is already gone, now free the pages */
+	if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
+		free_pages_exact(phys_to_virt(entry->address),
+				 entry->length);
+	kfree(entry);
+}
+
 static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
-		      struct vm_area_struct *vma, u64 key, u64 length)
+		      struct vm_area_struct *vma, u64 key, size_t length)
 {
-	struct efa_mmap_entry *entry;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct efa_user_mmap_entry *entry;
 	unsigned long va;
 	u64 pfn;
 	int err;
 
-	entry = mmap_entry_get(dev, ucontext, key, length);
-	if (!entry) {
+	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
+					      length, vma);
+	if (!rdma_entry) {
 		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
 			  key);
 		return -EINVAL;
 	}
+	entry = to_emmap(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(&dev->ibdev,
+			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
+			  key, length, entry->length);
+		err = -EINVAL;
+		goto err;
+	}
 
 	ibdev_dbg(&dev->ibdev,
-		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
-		  entry->address, length, entry->mmap_flag);
+		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
+		  entry->address, entry->length, entry->mmap_flag);
 
 	pfn = entry->address >> PAGE_SHIFT;
 	switch (entry->mmap_flag) {
@@ -1631,14 +1624,19 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 	}
 
 	if (err) {
-		ibdev_dbg(
-			&dev->ibdev,
-			"Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
-			entry->address, length, entry->mmap_flag, err);
-		return err;
+		ibdev_dbg(&dev->ibdev,
+			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
+			  entry->address, length, entry->mmap_flag, err);
+		goto err;
 	}
 
 	return 0;
+
+err:
+	rdma_user_mmap_entry_put(&ucontext->ibucontext,
+				 rdma_entry);
+
+	return err;
 }
 
 int efa_mmap(struct ib_ucontext *ibucontext,
@@ -1646,16 +1644,16 @@ int efa_mmap(struct ib_ucontext *ibucontext,
 {
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
-	u64 length = vma->vm_end - vma->vm_start;
+	size_t length = vma->vm_end - vma->vm_start;
 	u64 key = vma->vm_pgoff << PAGE_SHIFT;
 
 	ibdev_dbg(&dev->ibdev,
-		  "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
+		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
 		  vma->vm_start, vma->vm_end, length, key);
 
 	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
 		ibdev_dbg(&dev->ibdev,
-			  "length[%#llx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
 			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v11 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (2 preceding siblings ...)
  2019-09-05 10:01 ` [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Remove the functions related to managing the mmap_xa database.
This code is now common in ib_core. Use the common APIs instead.
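
For reference, the driver side of the common API boils down to the lifecycle
sketched below (the foo_* names are illustrative only; the real siw code is in
the hunks that follow):

	struct foo_user_mmap_entry {
		struct rdma_user_mmap_entry rdma_entry;	/* must be embedded */
		void *address;		/* vmalloc'ed SQ/RQ/CQ/SRQ memory */
		size_t length;
	};

	/* create path: wrap the object and publish it in the ucontext xarray */
	static int foo_mmap_entry_add(struct ib_ucontext *uctx, void *addr,
				      size_t len, u64 *key)
	{
		struct foo_user_mmap_entry *e = kzalloc(sizeof(*e), GFP_KERNEL);

		if (!e)
			return -ENOMEM;
		e->address = addr;
		e->length = len;
		*key = rdma_user_mmap_entry_insert(uctx, &e->rdma_entry, len);
		if (*key == RDMA_USER_MMAP_INVALID) {
			kfree(e);
			return -ENOMEM;
		}
		return 0;	/* *key is reported to userspace as the mmap offset */
	}

mmap() then looks the entry up with rdma_user_mmap_entry_get() and drops the
reference with rdma_user_mmap_entry_put() on error, destroy paths call
rdma_user_mmap_entry_remove(), and the wrapper itself is only kfree()d from
the new ->mmap_free() callback once the last reference is gone.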

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
 drivers/infiniband/sw/siw/siw.h       |  20 +++-
 drivers/infiniband/sw/siw/siw_main.c  |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c | 216 ++++++++++++++++++----------------
 drivers/infiniband/sw/siw/siw_verbs.h |   1 +
 4 files changed, 131 insertions(+), 107 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 77b1aabf6ff3..c20b4ee1cb4b 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -220,7 +220,7 @@ struct siw_cq {
 	u32 cq_get;
 	u32 num_cqe;
 	bool kernel_verbs;
-	u32 xa_cq_index; /* mmap information for CQE array */
+	u64 cq_key; /* mmap information for CQE array */
 	u32 id; /* For debugging only */
 };
 
@@ -263,7 +263,7 @@ struct siw_srq {
 	u32 rq_put;
 	u32 rq_get;
 	u32 num_rqe; /* max # of wqe's allowed */
-	u32 xa_srq_index; /* mmap information for SRQ array */
+	u64 srq_key; /* mmap information for SRQ array */
 	char armed; /* inform user if limit hit */
 	char kernel_verbs; /* '1' if kernel client */
 };
@@ -477,8 +477,8 @@ struct siw_qp {
 		u8 layer : 4, etype : 4;
 		u8 ecode;
 	} term_info;
-	u32 xa_sq_index; /* mmap information for SQE array */
-	u32 xa_rq_index; /* mmap information for RQE array */
+	u64 sq_key; /* mmap information for SQE array */
+	u64 rq_key; /* mmap information for RQE array */
 	struct rcu_head rcu;
 };
 
@@ -503,6 +503,12 @@ struct iwarp_msg_info {
 	int (*rx_data)(struct siw_qp *qp);
 };
 
+struct siw_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	void *address;
+	size_t length;
+};
+
 /* Global siw parameters. Currently set in siw_main.c */
 extern const bool zcopy_tx;
 extern const bool try_gso;
@@ -607,6 +613,12 @@ static inline struct siw_mr *to_siw_mr(struct ib_mr *base_mr)
 	return container_of(base_mr, struct siw_mr, base_mr);
 }
 
+static inline struct siw_user_mmap_entry *
+to_siw_mmap_entry(struct rdma_user_mmap_entry *rdma_mmap)
+{
+	return container_of(rdma_mmap, struct siw_user_mmap_entry, rdma_entry);
+}
+
 static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev, int id)
 {
 	struct siw_qp *qp;
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 05a92f997f60..46c0bb59489d 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -298,6 +298,7 @@ static const struct ib_device_ops siw_device_ops = {
 	.iw_rem_ref = siw_qp_put_ref,
 	.map_mr_sg = siw_map_mr_sg,
 	.mmap = siw_mmap,
+	.mmap_free = siw_mmap_free,
 	.modify_qp = siw_verbs_modify_qp,
 	.modify_srq = siw_modify_srq,
 	.poll_cq = siw_poll_cq,
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 03176a3d1e18..bbe7bb962d63 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -34,44 +34,20 @@ static char ib_qp_state_to_string[IB_QPS_ERR + 1][sizeof("RESET")] = {
 	[IB_QPS_ERR] = "ERR"
 };
 
-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr, u32 size)
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct siw_uobj *uobj;
-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
-	u32 key;
+	struct siw_user_mmap_entry *entry = to_siw_mmap_entry(rdma_entry);
 
-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
-	if (!uobj)
-		return SIW_INVAL_UOBJ_KEY;
-
-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit, &uctx->uobj_nextkey,
-			    GFP_KERNEL) < 0) {
-		kfree(uobj);
-		return SIW_INVAL_UOBJ_KEY;
-	}
-	uobj->size = PAGE_ALIGN(size);
-	uobj->addr = vaddr;
-
-	return key;
-}
-
-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
-				     unsigned long off, u32 size)
-{
-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
-
-	if (uobj && uobj->size == size)
-		return uobj;
-
-	return NULL;
+	kfree(entry);
 }
 
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
-	struct siw_uobj *uobj;
-	unsigned long off = vma->vm_pgoff;
-	int size = vma->vm_end - vma->vm_start;
+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+	size_t size = vma->vm_end - vma->vm_start;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct siw_user_mmap_entry *entry;
 	int rv = -EINVAL;
 
 	/*
@@ -79,18 +55,30 @@ int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 	 */
 	if (vma->vm_start & (PAGE_SIZE - 1)) {
 		pr_warn("siw: mmap not page aligned\n");
-		goto out;
+		return -EINVAL;
 	}
-	uobj = siw_get_uobj(uctx, off, size);
-	if (!uobj) {
-		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %u\n",
+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
+					      size, vma);
+	if (!rdma_entry) {
+		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %#zx\n",
 			off, size);
-		goto out;
+		return -EINVAL;
 	}
-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
-	if (rv)
-		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off, size);
-out:
+	entry = to_siw_mmap_entry(rdma_entry);
+	if (PAGE_ALIGN(entry->length) != size) {
+		siw_dbg(&uctx->sdev->base_dev,
+			"key[%#lx] does not have valid length[%#zx] expected[%#zx]\n",
+			off, size, entry->length);
+		rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
+		return -EINVAL;
+	}
+
+	rv = remap_vmalloc_range(vma, entry->address, 0);
+	if (rv) {
+		pr_warn("remap_vmalloc_range failed: %lu, %zu\n", off, size);
+		rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
+	}
+
 	return rv;
 }
 
@@ -105,7 +93,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
+
 	ctx->uobj_nextkey = 0;
 	ctx->sdev = sdev;
 
@@ -135,19 +123,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 void siw_dealloc_ucontext(struct ib_ucontext *base_ctx)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
-	void *entry;
-	unsigned long index;
 
-	/*
-	 * Make sure all user mmap objects are gone. Since QP, CQ
-	 * and SRQ destroy routines destroy related objects, nothing
-	 * should be found here.
-	 */
-	xa_for_each(&uctx->xa, index, entry) {
-		kfree(xa_erase(&uctx->xa, index));
-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
-	}
-	xa_destroy(&uctx->xa);
 	atomic_dec(&uctx->sdev->num_ctx);
 }
 
@@ -293,6 +269,28 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
 	siw_qp_put(to_siw_qp(base_qp));
 }
 
+static int siw_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				      void *address, size_t length,
+				      u64 *key)
+{
+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->address = address;
+	entry->length = length;
+
+	*key = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
+					   length);
+	if (*key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /*
  * siw_create_qp()
  *
@@ -317,6 +315,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	struct siw_cq *scq = NULL, *rcq = NULL;
 	unsigned long flags;
 	int num_sqe, num_rqe, rv = 0;
+	size_t length;
+	u64 key;
 
 	siw_dbg(base_dev, "create new QP\n");
 
@@ -380,8 +380,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	spin_lock_init(&qp->orq_lock);
 
 	qp->kernel_verbs = !udata;
-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
+	qp->sq_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_key = RDMA_USER_MMAP_INVALID;
 
 	rv = siw_qp_add(sdev, qp);
 	if (rv)
@@ -459,22 +459,29 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 		uresp.qp_id = qp_id(qp);
 
 		if (qp->sendq) {
-			qp->xa_sq_index =
-				siw_create_uobj(uctx, qp->sendq,
-					num_sqe * sizeof(struct siw_sqe));
+			length = num_sqe * sizeof(struct siw_sqe);
+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
+							qp->sendq, length,
+							&key);
+			if (rv)
+				goto err_out_xa;
+
+			qp->sq_key = key;
 		}
+
 		if (qp->recvq) {
-			qp->xa_rq_index =
-				 siw_create_uobj(uctx, qp->recvq,
-					num_rqe * sizeof(struct siw_rqe));
-		}
-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
-			goto err_out_xa;
+			length = num_rqe * sizeof(struct siw_rqe);
+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
+							qp->recvq, length,
+							&key);
+			if (rv)
+				goto err_out_xa;
+
+			qp->rq_key = key;
 		}
-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
+
+		uresp.sq_key = qp->sq_key;
+		uresp.rq_key = qp->rq_key;
 
 		if (udata->outlen < sizeof(uresp)) {
 			rv = -EINVAL;
@@ -502,11 +509,12 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	kfree(siw_base_qp);
 
 	if (qp) {
-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
-
+		if (uctx) {
+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
+						    qp->sq_key);
+			rdma_user_mmap_entry_remove(&uctx->base_ucontext,
+						    qp->rq_key);
+		}
 		vfree(qp->sendq);
 		vfree(qp->recvq);
 		kfree(qp);
@@ -620,10 +628,10 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
 	qp->rx_stream.rx_suspend = 1;
 
-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
+	if (uctx) {
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_key);
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_key);
+	}
 
 	down_write(&qp->state_lock);
 
@@ -993,8 +1001,8 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
 
 	siw_cq_flush(cq);
 
-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_key);
 
 	atomic_dec(&sdev->num_cq);
 
@@ -1031,7 +1039,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 	size = roundup_pow_of_two(size);
 	cq->base_cq.cqe = size;
 	cq->num_cqe = size;
-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
+	cq->cq_key = RDMA_USER_MMAP_INVALID;
 
 	if (!udata) {
 		cq->kernel_verbs = 1;
@@ -1057,16 +1065,16 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
+		size_t length = size * sizeof(struct siw_cqe) +
+			sizeof(struct siw_cq_ctrl);
 
-		cq->xa_cq_index =
-			siw_create_uobj(ctx, cq->queue,
-					size * sizeof(struct siw_cqe) +
-						sizeof(struct siw_cq_ctrl));
-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
+						cq->queue, length,
+						&cq->cq_key);
+		if (rv)
 			goto err_out;
-		}
-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
+
+		uresp.cq_key = cq->cq_key;
 		uresp.cq_id = cq->id;
 		uresp.num_cqe = size;
 
@@ -1087,8 +1095,9 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+		if (ctx)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    cq->cq_key);
 		vfree(cq->queue);
 	}
 	atomic_dec(&sdev->num_cq);
@@ -1490,7 +1499,7 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	srq->max_sge = attrs->max_sge;
 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
+	srq->srq_key = RDMA_USER_MMAP_INVALID;
 	srq->limit = attrs->srq_limit;
 	if (srq->limit)
 		srq->armed = 1;
@@ -1509,15 +1518,15 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	if (udata) {
 		struct siw_uresp_create_srq uresp = {};
+		size_t length = srq->num_rqe * sizeof(struct siw_rqe);
 
-		srq->xa_srq_index = siw_create_uobj(
-			ctx, srq->recvq, srq->num_rqe * sizeof(struct siw_rqe));
-
-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
+						srq->recvq, length,
+						&srq->srq_key);
+		if (rv)
 			goto err_out;
-		}
-		uresp.srq_key = srq->xa_srq_index;
+
+		uresp.srq_key = srq->srq_key;
 		uresp.num_rqe = srq->num_rqe;
 
 		if (udata->outlen < sizeof(uresp)) {
@@ -1536,8 +1545,9 @@ int siw_create_srq(struct ib_srq *base_srq,
 
 err_out:
 	if (srq->recvq) {
-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
+		if (ctx)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    srq->srq_key);
 		vfree(srq->recvq);
 	}
 	atomic_dec(&sdev->num_srq);
@@ -1623,9 +1633,9 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata)
 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
 					  base_ucontext);
 
-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
-
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+					    srq->srq_key);
 	vfree(srq->recvq);
 	atomic_dec(&sdev->num_srq);
 }
diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
index 1910869281cb..1a731989fad6 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.h
+++ b/drivers/infiniband/sw/siw/siw_verbs.h
@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata);
 int siw_post_srq_recv(struct ib_srq *base_srq, const struct ib_recv_wr *wr,
 		      const struct ib_recv_wr **bad_wr);
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma);
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
 void siw_cq_event(struct siw_cq *cq, enum ib_event_type type);
 void siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (3 preceding siblings ...)
  2019-09-05 10:01 ` [PATCH v11 rdma-next 4/7] RDMA/siw: " Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-19 17:55   ` Jason Gunthorpe
  2019-09-05 10:01 ` [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
  2019-09-05 10:01 ` [PATCH v11 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
  6 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Remove all the functions related to mmap from qedr and use the
common API instead.
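
From the library point of view, the visible change is that db_pa now carries
an opaque key produced by rdma_user_mmap_entry_insert() instead of the
doorbell bar physical address; it is still passed verbatim as the mmap
offset. A simplified userspace sketch (cmd_fd and the surrounding error
handling are illustrative, not the actual rdma-core code):

	#include <stdint.h>
	#include <sys/mman.h>

	/* cmd_fd is the uverbs command fd; db_pa_key and db_size come from
	 * struct qedr_alloc_ucontext_resp
	 */
	static void *map_doorbell_bar(int cmd_fd, uint64_t db_pa_key,
				      uint32_t db_size)
	{
		return mmap(NULL, db_size, PROT_WRITE, MAP_SHARED, cmd_fd,
			    db_pa_key);
	}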

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/main.c  |   1 +
 drivers/infiniband/hw/qedr/qedr.h  |  28 +++---
 drivers/infiniband/hw/qedr/verbs.c | 196 ++++++++++++++++++-------------------
 drivers/infiniband/hw/qedr/verbs.h |   3 +-
 4 files changed, 111 insertions(+), 117 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 5136b835e1ba..aab269602284 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -212,6 +212,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.get_link_layer = qedr_link_layer,
 	.map_mr_sg = qedr_map_mr_sg,
 	.mmap = qedr_mmap,
+	.mmap_free = qedr_mmap_free,
 	.modify_port = qedr_modify_port,
 	.modify_qp = qedr_modify_qp,
 	.modify_srq = qedr_modify_srq,
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 0cfd849b13d6..f273906fca19 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -230,14 +230,10 @@ struct qedr_ucontext {
 	struct qedr_dev *dev;
 	struct qedr_pd *pd;
 	void __iomem *dpi_addr;
+	u64 db_key;
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
-
-	struct list_head mm_head;
-
-	/* Lock to protect mm list */
-	struct mutex mm_list_lock;
 };
 
 union db_prod64 {
@@ -300,14 +296,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-struct qedr_mm {
-	struct {
-		u64 phy_addr;
-		unsigned long len;
-	} key;
-	struct list_head entry;
-};
-
 union db_prod32 {
 	struct rdma_pwm_val16_data data;
 	u32 raw;
@@ -476,6 +464,13 @@ struct qedr_mr {
 	u32 npages;
 };
 
+struct qedr_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	size_t length;
+	u8 mmap_flag;
+};
+
 #define SET_FIELD2(value, name, flag) ((value) |= ((flag) << (name ## _SHIFT)))
 
 #define QEDR_RESP_IMM	(RDMA_CQE_RESPONDER_IMM_FLG_MASK << \
@@ -574,4 +569,11 @@ static inline struct qedr_srq *get_qedr_srq(struct ib_srq *ibsrq)
 {
 	return container_of(ibsrq, struct qedr_srq, ibsrq);
 }
+
+static inline struct qedr_user_mmap_entry *
+get_qedr_mmap_entry(struct rdma_user_mmap_entry *rdma_entry)
+{
+	return container_of(rdma_entry, struct qedr_user_mmap_entry,
+			    rdma_entry);
+}
 #endif
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 6f3ce86019b7..b796ce2c78bc 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -58,6 +58,10 @@
 
 #define DB_ADDR_SHIFT(addr)		((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
 
+enum {
+	QEDR_USER_MMAP_IO_WC = 0,
+};
+
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
 					size_t len)
 {
@@ -256,60 +260,6 @@ int qedr_modify_port(struct ib_device *ibdev, u8 port, int mask,
 	return 0;
 }
 
-static int qedr_add_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			 unsigned long len)
-{
-	struct qedr_mm *mm;
-
-	mm = kzalloc(sizeof(*mm), GFP_KERNEL);
-	if (!mm)
-		return -ENOMEM;
-
-	mm->key.phy_addr = phy_addr;
-	/* This function might be called with a length which is not a multiple
-	 * of PAGE_SIZE, while the mapping is PAGE_SIZE grained and the kernel
-	 * forces this granularity by increasing the requested size if needed.
-	 * When qedr_mmap is called, it will search the list with the updated
-	 * length as a key. To prevent search failures, the length is rounded up
-	 * in advance to PAGE_SIZE.
-	 */
-	mm->key.len = roundup(len, PAGE_SIZE);
-	INIT_LIST_HEAD(&mm->entry);
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_add(&mm->entry, &uctx->mm_head);
-	mutex_unlock(&uctx->mm_list_lock);
-
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "added (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-		 (unsigned long long)mm->key.phy_addr,
-		 (unsigned long)mm->key.len, uctx);
-
-	return 0;
-}
-
-static bool qedr_search_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			     unsigned long len)
-{
-	bool found = false;
-	struct qedr_mm *mm;
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_for_each_entry(mm, &uctx->mm_head, entry) {
-		if (len != mm->key.len || phy_addr != mm->key.phy_addr)
-			continue;
-
-		found = true;
-		break;
-	}
-	mutex_unlock(&uctx->mm_list_lock);
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "searched for (addr=0x%llx,len=0x%lx) for ctx=%p, result=%d\n",
-		 mm->key.phy_addr, mm->key.len, uctx, found);
-
-	return found;
-}
-
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 {
 	struct ib_device *ibdev = uctx->device;
@@ -318,6 +268,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	struct qedr_alloc_ucontext_resp uresp = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
+	struct qedr_user_mmap_entry *entry;
 
 	if (!udata)
 		return -EFAULT;
@@ -334,13 +285,27 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	ctx->dpi_addr = oparams.dpi_addr;
 	ctx->dpi_phys_addr = oparams.dpi_phys_addr;
 	ctx->dpi_size = oparams.dpi_size;
-	INIT_LIST_HEAD(&ctx->mm_head);
-	mutex_init(&ctx->mm_list_lock);
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	entry->address = ctx->dpi_phys_addr;
+	entry->length = ctx->dpi_size;
+	entry->mmap_flag = QEDR_USER_MMAP_IO_WC;
+	ctx->db_key = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
+						  ctx->dpi_size);
+	if (ctx->db_key == RDMA_USER_MMAP_INVALID) {
+		rc = -ENOMEM;
+		kfree(entry);
+		goto err;
+	}
 
 	uresp.dpm_enabled = dev->user_dpm_enabled;
 	uresp.wids_enabled = 1;
 	uresp.wid_count = oparams.wid_count;
-	uresp.db_pa = ctx->dpi_phys_addr;
+	uresp.db_pa = ctx->db_key;
 	uresp.db_size = ctx->dpi_size;
 	uresp.max_send_wr = dev->attr.max_sqe;
 	uresp.max_recv_wr = dev->attr.max_rqe;
@@ -352,82 +317,107 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
-		return rc;
+		goto err1;
 
 	ctx->dev = dev;
 
-	rc = qedr_add_mmap(ctx, ctx->dpi_phys_addr, ctx->dpi_size);
-	if (rc)
-		return rc;
-
 	DP_DEBUG(dev, QEDR_MSG_INIT, "Allocating user context %p\n",
 		 &ctx->ibucontext);
 	return 0;
+
+err1:
+	rdma_user_mmap_entry_remove(uctx, ctx->db_key);
+err:
+	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
+	return rc;
 }
 
 void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 {
 	struct qedr_ucontext *uctx = get_qedr_ucontext(ibctx);
-	struct qedr_mm *mm, *tmp;
 
 	DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
 		 uctx);
-	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
 
-	list_for_each_entry_safe(mm, tmp, &uctx->mm_head, entry) {
-		DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-			 "deleted (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-			 mm->key.phy_addr, mm->key.len, uctx);
-		list_del(&mm->entry);
-		kfree(mm);
-	}
+	rdma_user_mmap_entry_remove(ibctx, uctx->db_key);
+	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
 }
 
-int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
-	struct qedr_dev *dev = get_qedr_dev(context->device);
-	unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
-	unsigned long len = (vma->vm_end - vma->vm_start);
-	unsigned long dpi_start;
+	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
 
-	dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
-
-	DP_DEBUG(dev, QEDR_MSG_INIT,
-		 "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
-		 (void *)vma->vm_start, (void *)vma->vm_end,
-		 (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
+	kfree(entry);
+}
 
-	if ((vma->vm_start & (PAGE_SIZE - 1)) || (len & (PAGE_SIZE - 1))) {
-		DP_ERR(dev,
-		       "failed mmap, addresses must be page aligned: start=0x%pK, end=0x%pK\n",
-		       (void *)vma->vm_start, (void *)vma->vm_end);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
+{
+	struct ib_device *dev = ucontext->device;
+	size_t length = vma->vm_end - vma->vm_start;
+	u64 key = vma->vm_pgoff << PAGE_SHIFT;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct qedr_user_mmap_entry *entry;
+	u64 pfn;
+	int err;
+
+	ibdev_dbg(dev,
+		  "start %#lx, end %#lx, length = %#zx, key = %#llx\n",
+		  vma->vm_start, vma->vm_end, length, key);
+
+	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
+		ibdev_dbg(dev,
+			  "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
 
-	if (!qedr_search_mmap(ucontext, phys_addr, len)) {
-		DP_ERR(dev, "failed mmap, vm_pgoff=0x%lx is not authorized\n",
-		       vma->vm_pgoff);
-		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC) {
+		ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
+		return -EPERM;
 	}
+	vma->vm_flags &= ~VM_MAYEXEC;
 
-	if (phys_addr < dpi_start ||
-	    ((phys_addr + len) > (dpi_start + ucontext->dpi_size))) {
-		DP_ERR(dev,
-		       "failed mmap, pages are outside of dpi; page address=0x%pK, dpi_start=0x%pK, dpi_size=0x%x\n",
-		       (void *)phys_addr, (void *)dpi_start,
-		       ucontext->dpi_size);
+	rdma_entry = rdma_user_mmap_entry_get(ucontext, key, length, vma);
+	if (!rdma_entry) {
+		ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
+			  key);
 		return -EINVAL;
 	}
+	entry = get_qedr_mmap_entry(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(dev,
+			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
+			  key, length, entry->length);
+		err = -EINVAL;
+		goto err;
+	}
 
-	if (vma->vm_flags & VM_READ) {
-		DP_ERR(dev, "failed mmap, cannot map doorbell bar for read\n");
-		return -EINVAL;
+	ibdev_dbg(dev,
+		  "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n",
+		  entry->address, length, entry->mmap_flag);
+
+	pfn = entry->address >> PAGE_SHIFT;
+	switch (entry->mmap_flag) {
+	case QEDR_USER_MMAP_IO_WC:
+		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
+					pgprot_writecombine(vma->vm_page_prot));
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+	if (err) {
+		ibdev_dbg(dev,
+			  "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
+			  entry->address, length, entry->mmap_flag, err);
+		goto err;
 	}
 
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, len,
-				  vma->vm_page_prot);
+	return 0;
+
+err:
+	rdma_user_mmap_entry_put(ucontext, rdma_entry);
+	return err;
 }
 
 int qedr_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 9aaa90283d6e..3606e97e95da 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -46,7 +46,8 @@ int qedr_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
 void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
 
-int qedr_mmap(struct ib_ucontext *, struct vm_area_struct *vma);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (4 preceding siblings ...)
  2019-09-05 10:01 ` [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  2019-09-19 18:02   ` Jason Gunthorpe
  2019-09-05 10:01 ` [PATCH v11 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
  6 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

Use the doorbell recovery mechanism to register RDMA-related doorbells
that will be restored in case there is a doorbell overflow attention.
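
For the user queues this relies on a small shared page per queue: the kernel
allocates it, exports it through the new *_db_rec_addr mmap keys, and the
provider library stores every doorbell value in it so the kernel can replay
the last value when an overflow attention fires. A rough userspace-side
sketch (illustrative only; cmd_fd is a placeholder and the exact ordering and
barriers are up to the provider library):

	#include <stdint.h>
	#include <sys/mman.h>

	/* mirrors struct qedr_user_db_rec from the uapi header below */
	struct db_rec_page {
		uint64_t db_data;	/* last doorbell value, replayed by the kernel */
	};

	static struct db_rec_page *map_db_rec(int cmd_fd, uint64_t db_rec_key,
					      size_t page_size)
	{
		/* db_rec_key comes from e.g. uresp.rq_db_rec_addr */
		return mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
			    cmd_fd, db_rec_key);
	}

	static void ring_db(struct db_rec_page *rec, volatile uint64_t *db_addr,
			    uint64_t val)
	{
		rec->db_data = val;	/* recovery copy first ... */
		*db_addr = val;		/* ... then the real doorbell (64-bit CQ
					 * doorbell shown; SQ/RQ doorbells are 32-bit)
					 */
	}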

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  |   7 +
 drivers/infiniband/hw/qedr/verbs.c | 315 +++++++++++++++++++++++++++++++------
 include/uapi/rdma/qedr-abi.h       |  25 +++
 3 files changed, 299 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index f273906fca19..04f6f4fbe276 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -234,6 +234,7 @@ struct qedr_ucontext {
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
+	bool db_rec;
 };
 
 union db_prod64 {
@@ -261,6 +262,12 @@ struct qedr_userq {
 	struct qedr_pbl *pbl_tbl;
 	u64 buf_addr;
 	size_t buf_len;
+
+	/* doorbell recovery */
+	void __iomem *db_addr;
+	struct qedr_user_db_rec *db_rec_data;
+	u64 db_rec_phys;
+	u64 db_rec_key;
 };
 
 struct qedr_cq {
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b796ce2c78bc..922c203ca0ea 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -60,6 +60,7 @@
 
 enum {
 	QEDR_USER_MMAP_IO_WC = 0,
+	QEDR_USER_MMAP_PHYS_PAGE,
 };
 
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
@@ -266,6 +267,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	int rc;
 	struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
 	struct qedr_alloc_ucontext_resp uresp = {};
+	struct qedr_alloc_ucontext_req ureq = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
 	struct qedr_user_mmap_entry *entry;
@@ -273,6 +275,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	if (!udata)
 		return -EFAULT;
 
+	if (udata->inlen) {
+		rc = ib_copy_from_udata(&ureq, udata,
+					min(sizeof(ureq), udata->inlen));
+		if (rc) {
+			DP_ERR(dev, "Problem copying data from user space\n");
+			return -EFAULT;
+		}
+
+		ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
+	}
+
 	rc = dev->ops->rdma_add_user(dev->rdma_ctx, &oparams);
 	if (rc) {
 		DP_ERR(dev,
@@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
 	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
 
+	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
+		free_page((unsigned long)phys_to_virt(entry->address));
+
 	kfree(entry);
 }
 
@@ -402,6 +418,9 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
 					pgprot_writecombine(vma->vm_page_prot));
 		break;
+	case QEDR_USER_MMAP_PHYS_PAGE:
+		err = vm_insert_page(vma, vma->vm_start, pfn_to_page(pfn));
+		break;
 	default:
 		err = -EINVAL;
 	}
@@ -647,16 +666,48 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 	}
 }
 
+static int qedr_db_recovery_add(struct qedr_dev *dev,
+				void __iomem *db_addr,
+				void *db_data,
+				enum qed_db_rec_width db_width,
+				enum qed_db_rec_space db_space)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return 0;
+	}
+
+	return dev->ops->common->db_recovery_add(dev->cdev, db_addr, db_data,
+						 db_width, db_space);
+}
+
+static void qedr_db_recovery_del(struct qedr_dev *dev,
+				 void __iomem *db_addr,
+				 void *db_data)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return;
+	}
+
+	/* Ignore return code as there is not much we can do about it. Error
+	 * log will be printed inside.
+	 */
+	dev->ops->common->db_recovery_del(dev->cdev, db_addr, db_data);
+}
+
 static int qedr_copy_cq_uresp(struct qedr_dev *dev,
-			      struct qedr_cq *cq, struct ib_udata *udata)
+			      struct qedr_cq *cq, struct ib_udata *udata,
+			      u32 db_offset)
 {
 	struct qedr_create_cq_uresp uresp;
 	int rc;
 
 	memset(&uresp, 0, sizeof(uresp));
 
-	uresp.db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+	uresp.db_offset = db_offset;
 	uresp.icid = cq->icid;
+	uresp.db_rec_addr = cq->q.db_rec_key;
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
@@ -684,10 +735,56 @@ static inline int qedr_align_cq_entries(int entries)
 	return aligned_size / QEDR_CQE_SIZE;
 }
 
+static int qedr_init_user_db_rec(struct ib_udata *udata,
+				 struct qedr_dev *dev, struct qedr_userq *q,
+				 bool requires_db_rec)
+{
+	struct qedr_ucontext *uctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	struct qedr_user_mmap_entry *entry;
+
+	/* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
+	if (requires_db_rec == 0 || !uctx->db_rec)
+		return 0;
+
+	/* Allocate a page for doorbell recovery, add to mmap */
+	q->db_rec_data = (void *)get_zeroed_page(GFP_USER);
+	if (!q->db_rec_data) {
+		DP_ERR(dev,
+		       "get_free_page failed\n");
+		return -ENOMEM;
+	}
+
+	q->db_rec_phys = virt_to_phys(q->db_rec_data);
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto err_nomem;
+
+	entry->address = q->db_rec_phys;
+	entry->length = PAGE_SIZE;
+	entry->mmap_flag = QEDR_USER_MMAP_PHYS_PAGE;
+	q->db_rec_key = rdma_user_mmap_entry_insert(&uctx->ibucontext,
+						    &entry->rdma_entry,
+						    PAGE_SIZE);
+	if (q->db_rec_key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		goto err_nomem;
+	}
+
+	return 0;
+
+err_nomem:
+	free_page((unsigned long)q->db_rec_data);
+	q->db_rec_data = NULL;
+	return -ENOMEM;
+}
+
 static inline int qedr_init_user_queue(struct ib_udata *udata,
 				       struct qedr_dev *dev,
 				       struct qedr_userq *q, u64 buf_addr,
-				       size_t buf_len, int access, int dmasync,
+				       size_t buf_len, bool requires_db_rec,
+				       int access, int dmasync,
 				       int alloc_and_init)
 {
 	u32 fw_pages;
@@ -725,7 +822,8 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
 		}
 	}
 
-	return 0;
+	/* mmap the user address used to store doorbell data for recovery */
+	return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
 
 err0:
 	ib_umem_release(q->umem);
@@ -811,6 +909,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
 	int chain_entries;
+	u32 db_offset;
 	int page_cnt;
 	u64 pbl_ptr;
 	u16 icid;
@@ -830,8 +929,12 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	chain_entries = qedr_align_cq_entries(entries);
 	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
 
+	/* calc db offset. user will add DPI base, kernel will add db addr */
+	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create cq: problem copying data from user space\n");
 			goto err0;
@@ -846,8 +949,9 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->cq_type = QEDR_CQ_TYPE_USER;
 
 		rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
-					  ureq.len, IB_ACCESS_LOCAL_WRITE, 1,
-					  1);
+					  ureq.len, true,
+					  IB_ACCESS_LOCAL_WRITE,
+					  1, 1);
 		if (rc)
 			goto err0;
 
@@ -855,6 +959,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		page_cnt = cq->q.pbl_info.num_pbes;
 
 		cq->ibcq.cqe = chain_entries;
+		cq->q.db_addr = ctx->dpi_addr + db_offset;
 	} else {
 		cq->cq_type = QEDR_CQ_TYPE_KERNEL;
 
@@ -866,7 +971,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 						   sizeof(union rdma_cqe),
 						   &cq->pbl, NULL);
 		if (rc)
-			goto err1;
+			goto err0;
 
 		page_cnt = qed_chain_get_page_cnt(&cq->pbl);
 		pbl_ptr = qed_chain_get_pbl_phys(&cq->pbl);
@@ -878,21 +983,28 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	rc = dev->ops->rdma_create_cq(dev->rdma_ctx, &params, &icid);
 	if (rc)
-		goto err2;
+		goto err1;
 
 	cq->icid = icid;
 	cq->sig = QEDR_CQ_MAGIC_NUMBER;
 	spin_lock_init(&cq->cq_lock);
 
 	if (udata) {
-		rc = qedr_copy_cq_uresp(dev, cq, udata);
+		rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
 		if (rc)
-			goto err3;
+			goto err2;
+
+		rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+					  &cq->q.db_rec_data->db_data,
+					  DB_REC_WIDTH_64B,
+					  DB_REC_USER);
+		if (rc)
+			goto err2;
+
 	} else {
 		/* Generate doorbell address. */
-		cq->db_addr = dev->db_addr +
-		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 		cq->db.data.icid = cq->icid;
+		cq->db_addr = dev->db_addr + db_offset;
 		cq->db.data.params = DB_AGG_CMD_SET <<
 		    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
 
@@ -902,6 +1014,11 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->latest_cqe = NULL;
 		consume_cqe(cq);
 		cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+
+		rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+					  DB_REC_WIDTH_64B, DB_REC_KERNEL);
+		if (rc)
+			goto err2;
 	}
 
 	DP_DEBUG(dev, QEDR_MSG_CQ,
@@ -910,18 +1027,20 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
-err3:
+err2:
 	destroy_iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &destroy_iparams,
 				  &destroy_oparams);
-err2:
-	if (udata)
-		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
-	else
-		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
 err1:
-	if (udata)
+	if (udata) {
+		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+		if (cq->q.db_rec_data)
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_rec_key);
+	} else {
+		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
+	}
 err0:
 	return -EINVAL;
 }
@@ -941,6 +1060,8 @@ int qedr_resize_cq(struct ib_cq *ibcq, int new_cnt, struct ib_udata *udata)
 
 void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct qedr_ucontext *ctx = rdma_udata_to_drv_context(udata,
+		struct qedr_ucontext, ibucontext);
 	struct qedr_dev *dev = get_qedr_dev(ibcq->device);
 	struct qed_rdma_destroy_cq_out_params oparams;
 	struct qed_rdma_destroy_cq_in_params iparams;
@@ -952,8 +1073,10 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	cq->destroyed = 1;
 
 	/* GSIs CQs are handled by driver, so they don't exist in the FW */
-	if (cq->cq_type == QEDR_CQ_TYPE_GSI)
+	if (cq->cq_type == QEDR_CQ_TYPE_GSI) {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 		return;
+	}
 
 	iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &iparams, &oparams);
@@ -962,6 +1085,15 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	if (udata) {
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+
+		if (cq->q.db_rec_data) {
+			qedr_db_recovery_del(dev, cq->q.db_addr,
+					     &cq->q.db_rec_data->db_data);
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_rec_key);
+		}
+	} else {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 	}
 
 	/* We don't want the IRQ handler to handle a non-existing CQ so we
@@ -1126,8 +1258,8 @@ static int qedr_copy_srq_uresp(struct qedr_dev *dev,
 }
 
 static void qedr_copy_rq_uresp(struct qedr_dev *dev,
-			       struct qedr_create_qp_uresp *uresp,
-			       struct qedr_qp *qp)
+			      struct qedr_create_qp_uresp *uresp,
+			      struct qedr_qp *qp)
 {
 	/* iWARP requires two doorbells per RQ. */
 	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
@@ -1140,6 +1272,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
 	}
 
 	uresp->rq_icid = qp->icid;
+	uresp->rq_db_rec_addr = qp->urq.db_rec_key;
 }
 
 static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1153,22 +1286,24 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
 		uresp->sq_icid = qp->icid;
 	else
 		uresp->sq_icid = qp->icid + 1;
+
+	uresp->sq_db_rec_addr = qp->usq.db_rec_key;
 }
 
 static int qedr_copy_qp_uresp(struct qedr_dev *dev,
-			      struct qedr_qp *qp, struct ib_udata *udata)
+			      struct qedr_qp *qp, struct ib_udata *udata,
+			      struct qedr_create_qp_uresp *uresp)
 {
-	struct qedr_create_qp_uresp uresp;
 	int rc;
 
-	memset(&uresp, 0, sizeof(uresp));
-	qedr_copy_sq_uresp(dev, &uresp, qp);
-	qedr_copy_rq_uresp(dev, &uresp, qp);
+	memset(uresp, 0, sizeof(*uresp));
+	qedr_copy_sq_uresp(dev, uresp, qp);
+	qedr_copy_rq_uresp(dev, uresp, qp);
 
-	uresp.atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
-	uresp.qp_id = qp->qp_id;
+	uresp->atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
+	uresp->qp_id = qp->qp_id;
 
-	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	rc = qedr_ib_copy_to_udata(udata, uresp, sizeof(*uresp));
 	if (rc)
 		DP_ERR(dev,
 		       "create qp: failed a copy to user space with qp icid=0x%x.\n",
@@ -1212,16 +1347,35 @@ static void qedr_set_common_qp_params(struct qedr_dev *dev,
 		 qp->sq.max_sges, qp->sq_cq->icid);
 }
 
-static void qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid + 1;
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	if (!qp->srq) {
 		qp->rq.db = dev->db_addr +
 			    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD);
 		qp->rq.db_data.data.icid = qp->icid;
+
+		rc = qedr_db_recovery_add(dev, qp->rq.db,
+					  &qp->rq.db_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_KERNEL);
+		if (rc)
+			qedr_db_recovery_del(dev, qp->sq.db,
+					     &qp->sq.db_data);
 	}
+
+	return rc;
 }
 
 static int qedr_check_srq_params(struct qedr_dev *dev,
@@ -1275,7 +1429,7 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
 	int rc;
 
 	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
-				  ureq->srq_len, access, dmasync, 1);
+				  ureq->srq_len, false, access, dmasync, 1);
 	if (rc)
 		return rc;
 
@@ -1371,7 +1525,8 @@ int qedr_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
 	hw_srq->max_sges = init_attr->attr.max_sge;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create srq: problem copying data from user space\n");
 			goto err0;
@@ -1560,13 +1715,29 @@ qedr_iwarp_populate_user_qp(struct qedr_dev *dev,
 			   &qp->urq.pbl_info, FW_PAGE_SHIFT);
 }
 
-static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
+static void qedr_cleanup_user(struct qedr_dev *dev,
+			      struct qedr_ucontext *ctx,
+			      struct qedr_qp *qp)
 {
 	ib_umem_release(qp->usq.umem);
 	qp->usq.umem = NULL;
 
 	ib_umem_release(qp->urq.umem);
 	qp->urq.umem = NULL;
+
+	if (qp->usq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->usq.db_addr,
+				     &qp->usq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->usq.db_rec_key);
+	}
+
+	if (qp->urq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->urq.db_addr,
+				     &qp->urq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->urq.db_rec_key);
+	}
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1578,12 +1749,14 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	struct qed_rdma_create_qp_in_params in_params;
 	struct qed_rdma_create_qp_out_params out_params;
 	struct qedr_pd *pd = get_qedr_pd(ibpd);
+	struct qedr_create_qp_uresp uresp;
+	struct qedr_ucontext *ctx = NULL;
 	struct qedr_create_qp_ureq ureq;
 	int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
 	int rc = -EINVAL;
 
 	memset(&ureq, 0, sizeof(ureq));
-	rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+	rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen));
 	if (rc) {
 		DP_ERR(dev, "Problem copying data from user space\n");
 		return rc;
@@ -1591,14 +1764,16 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 
 	/* SQ - read access only (0), dma sync not required (0) */
 	rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
-				  ureq.sq_len, 0, 0, alloc_and_init);
+				  ureq.sq_len, true, 0, 0,
+				  alloc_and_init);
 	if (rc)
 		return rc;
 
 	if (!qp->srq) {
 		/* RQ - read access only (0), dma sync not required (0) */
 		rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
-					  ureq.rq_len, 0, 0, alloc_and_init);
+					  ureq.rq_len, true,
+					  0, 0, alloc_and_init);
 		if (rc)
 			return rc;
 	}
@@ -1628,29 +1803,56 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	rc = qedr_copy_qp_uresp(dev, qp, udata);
+	rc = qedr_copy_qp_uresp(dev, qp, udata, &uresp);
+	if (rc)
+		goto err;
+
+	/* db offset was calculated in copy_qp_uresp, now set in the user q */
+	ctx = pd->uctx;
+	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
+	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+
+	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
+				  &qp->usq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
 	if (rc)
 		goto err;
 
+	rc = qedr_db_recovery_add(dev, qp->urq.db_addr,
+				  &qp->urq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
+	if (rc)
+		goto err;
 	qedr_qp_user_print(dev, qp);
 
-	return 0;
+	return rc;
 err:
 	rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
 	if (rc)
 		DP_ERR(dev, "create qp: fatal fault. rc=%d", rc);
 
 err1:
-	qedr_cleanup_user(dev, qp);
+	qedr_cleanup_user(dev, ctx, qp);
 	return rc;
 }
 
-static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 	    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid;
 
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	qp->rq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD);
 	qp->rq.db_data.data.icid = qp->icid;
@@ -1658,6 +1860,13 @@ static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 			   DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_FLAGS);
 	qp->rq.iwarp_db2_data.data.icid = qp->icid;
 	qp->rq.iwarp_db2_data.data.value = DQ_TCM_IWARP_POST_RQ_CF_CMD;
+
+	rc = qedr_db_recovery_add(dev, qp->rq.db,
+				  &qp->rq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+
+	return rc;
 }
 
 static int
@@ -1705,8 +1914,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_roce_db_info(dev, qp);
-	return rc;
+	return qedr_set_roce_db_info(dev, qp);
 }
 
 static int
@@ -1764,8 +1972,7 @@ qedr_iwarp_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_iwarp_db_info(dev, qp);
-	return rc;
+	return qedr_set_iwarp_db_info(dev, qp);
 
 err:
 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -1780,6 +1987,15 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	dev->ops->common->chain_free(dev->cdev, &qp->rq.pbl);
 	kfree(qp->rqe_wr_id);
+
+	/* GSI qp is not registered to db mechanism so no need to delete */
+	if (qp->qp_type == IB_QPT_GSI)
+		return;
+
+	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
+
+	if (!qp->srq)
+		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
@@ -2419,7 +2635,10 @@ int qedr_query_qp(struct ib_qp *ibqp,
 static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 				  struct ib_udata *udata)
 {
-	int rc = 0;
+	struct qedr_ucontext *ctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	int rc;
 
 	if (qp->qp_type != IB_QPT_GSI) {
 		rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -2428,7 +2647,7 @@ static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 	}
 
 	if (udata)
-		qedr_cleanup_user(dev, qp);
+		qedr_cleanup_user(dev, ctx, qp);
 	else
 		qedr_cleanup_kernel(dev, qp);
 
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 7a10b3a325fa..c022ee26089b 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -38,6 +38,15 @@
 #define QEDR_ABI_VERSION		(8)
 
 /* user kernel communication data structures. */
+enum qedr_alloc_ucontext_flags {
+	QEDR_ALLOC_UCTX_RESERVED	= 1 << 0,
+	QEDR_ALLOC_UCTX_DB_REC		= 1 << 1
+};
+
+struct qedr_alloc_ucontext_req {
+	__u32 context_flags;
+	__u32 reserved;
+};
 
 struct qedr_alloc_ucontext_resp {
 	__aligned_u64 db_pa;
@@ -74,6 +83,7 @@ struct qedr_create_cq_uresp {
 	__u32 db_offset;
 	__u16 icid;
 	__u16 reserved;
+	__aligned_u64 db_rec_addr;
 };
 
 struct qedr_create_qp_ureq {
@@ -109,6 +119,13 @@ struct qedr_create_qp_uresp {
 
 	__u32 rq_db2_offset;
 	__u32 reserved;
+
+	/* address of SQ doorbell recovery user entry */
+	__aligned_u64 sq_db_rec_addr;
+
+	/* address of RQ doorbell recovery user entry */
+	__aligned_u64 rq_db_rec_addr;
+
 };
 
 struct qedr_create_srq_ureq {
@@ -128,4 +145,12 @@ struct qedr_create_srq_uresp {
 	__u32 reserved1;
 };
 
+/* doorbell recovery entry allocated and populated by userspace doorbelling
+ * entities and mapped to kernel. Kernel uses this to register doorbell
+ * information with doorbell drop recovery mechanism.
+ */
+struct qedr_user_db_rec {
+	__aligned_u64 db_data; /* doorbell data */
+};
+
 #endif /* __QEDR_USER_H__ */
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v11 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell recovery support
  2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (5 preceding siblings ...)
  2019-09-05 10:01 ` [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
@ 2019-09-05 10:01 ` Michal Kalderon
  6 siblings, 0 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-05 10:01 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon; +Cc: linux-rdma

This patch adds the iWARP-specific doorbells to the doorbell
recovery mechanism.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  | 12 +++++++-----
 drivers/infiniband/hw/qedr/verbs.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 04f6f4fbe276..7661d767815c 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -237,6 +237,11 @@ struct qedr_ucontext {
 	bool db_rec;
 };
 
+union db_prod32 {
+	struct rdma_pwm_val16_data data;
+	u32 raw;
+};
+
 union db_prod64 {
 	struct rdma_pwm_val32_data data;
 	u64 raw;
@@ -268,6 +273,8 @@ struct qedr_userq {
 	struct qedr_user_db_rec *db_rec_data;
 	u64 db_rec_phys;
 	u64 db_rec_key;
+	void __iomem *db_rec_db2_addr;
+	union db_prod32 db_rec_db2_data;
 };
 
 struct qedr_cq {
@@ -303,11 +310,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-union db_prod32 {
-	struct rdma_pwm_val16_data data;
-	u32 raw;
-};
-
 struct qedr_qp_hwq_info {
 	/* WQE Elements */
 	struct qed_chain pbl;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 922c203ca0ea..524dd3682d8d 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1738,6 +1738,10 @@ static void qedr_cleanup_user(struct qedr_dev *dev,
 		rdma_user_mmap_entry_remove(&ctx->ibucontext,
 					    qp->urq.db_rec_key);
 	}
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1))
+		qedr_db_recovery_del(dev, qp->urq.db_rec_db2_addr,
+				     &qp->urq.db_rec_db2_data);
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1812,6 +1816,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
 	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
 
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		qp->urq.db_rec_db2_addr = ctx->dpi_addr + uresp.rq_db2_offset;
+
+		/* calculate the db_rec_db2 data since it is constant so no
+		 *  need to reflect from user
+		 */
+		qp->urq.db_rec_db2_data.data.icid = cpu_to_le16(qp->icid);
+		qp->urq.db_rec_db2_data.data.value =
+			cpu_to_le16(DQ_TCM_IWARP_POST_RQ_CF_CMD);
+	}
+
 	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
 				  &qp->usq.db_rec_data->db_data,
 				  DB_REC_WIDTH_32B,
@@ -1825,6 +1840,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 				  DB_REC_USER);
 	if (rc)
 		goto err;
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		rc = qedr_db_recovery_add(dev, qp->urq.db_rec_db2_addr,
+					  &qp->urq.db_rec_db2_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_USER);
+		if (rc)
+			goto err;
+	}
 	qedr_qp_user_print(dev, qp);
 
 	return rc;
@@ -1865,7 +1889,13 @@ static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 				  &qp->rq.db_data,
 				  DB_REC_WIDTH_32B,
 				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
 
+	rc = qedr_db_recovery_add(dev, qp->rq.iwarp_db2,
+				  &qp->rq.iwarp_db2_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
 	return rc;
 }
 
@@ -1994,8 +2024,13 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
 
-	if (!qp->srq)
+	if (!qp->srq) {
 		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
+
+		if (rdma_protocol_iwarp(&dev->ibdev, 1))
+			qedr_db_recovery_del(dev, qp->rq.iwarp_db2,
+					     &qp->rq.iwarp_db2_data);
+	}
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
@ 2019-09-19 17:14   ` Jason Gunthorpe
  2019-09-19 17:48   ` Jason Gunthorpe
  2019-09-19 18:07   ` Jason Gunthorpe
  2 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 17:14 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:12PM +0300, Michal Kalderon wrote:

>  int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  		      unsigned long pfn, unsigned long size, pgprot_t prot)
>  {
>  	struct ib_uverbs_file *ufile = ucontext->ufile;
> -	struct rdma_umap_priv *priv;
> +	int ret;
>  
>  	if (!(vma->vm_flags & VM_SHARED))
>  		return -EINVAL;
> @@ -57,17 +114,240 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  		return -EINVAL;
>  	lockdep_assert_held(&ufile->device->disassociate_srcu);
>  
> -	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> -	if (!priv)
> -		return -ENOMEM;
> +	ret = rdma_umap_priv_init(vma, NULL);
> +	if (ret)
> +		return ret;
>  
>  	vma->vm_page_prot = prot;
>  	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
> -		kfree(priv);
> +		rdma_umap_priv_delete(ufile, vma->vm_private_data);
>  		return -EAGAIN;

This leaves vma->vm_private_data pointing at the priv after it has been
freed. That does not seem like a good idea - can it be used after
free? It looks like no, but it still would not be a bad idea to null it
here.
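
For example, a minimal sketch of that error path with the extra NULL
(based on the hunk quoted above):

	vma->vm_page_prot = prot;
	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
		rdma_umap_priv_delete(ufile, vma->vm_private_data);
		/* don't leave a dangling pointer to the freed priv */
		vma->vm_private_data = NULL;
		return -EAGAIN;
	}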

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-09-05 10:01 ` [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-09-19 17:37   ` Jason Gunthorpe
  2019-09-23  9:15     ` Michal Kalderon
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:13PM +0300, Michal Kalderon wrote:
>  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
> -		      struct vm_area_struct *vma, u64 key, u64 length)
> +		      struct vm_area_struct *vma, u64 key, size_t length)
>  {
> -	struct efa_mmap_entry *entry;
> +	struct rdma_user_mmap_entry *rdma_entry;
> +	struct efa_user_mmap_entry *entry;
>  	unsigned long va;
>  	u64 pfn;
>  	int err;
>  
> -	entry = mmap_entry_get(dev, ucontext, key, length);
> -	if (!entry) {
> +	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
> +					      length, vma);
> +	if (!rdma_entry) {

This allocates memory and assigns it to vma->vm_private_data

>  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
>  			  key);
>  		return -EINVAL;
>  	}
> +	entry = to_emmap(rdma_entry);
> +	if (entry->length != length) {
> +		ibdev_dbg(&dev->ibdev,
> +			  "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n",
> +			  key, length, entry->length);
> +		err = -EINVAL;
> +		goto err;

Now we take an error..

> +err:
> +	rdma_user_mmap_entry_put(&ucontext->ibucontext,
> +				 rdma_entry);

And this leaks the struct rdma_umap_priv

I said this already, but it looks wrong that rdma_umap_priv_init() is
testing vm_private_data. rdma_user_mmap_io() should accept the struct
rdma_user_mmap_entry pointer so it can directly and always create the
priv, instead of trying to pass allocated memory through vm_private_data.

The only place that should set vm_private_data is rdma_user_mmap_io(),
and once it succeeds the driver must return success back through
file_operations->mmap().

This hidden detail should also be noted in the comment for
rdma_user_mmap_io().
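
In other words, a hypothetical prototype of the suggested change (not the
code in this series) would be:

	int rdma_user_mmap_io(struct ib_ucontext *ucontext,
			      struct vm_area_struct *vma, unsigned long pfn,
			      unsigned long size, pgprot_t prot,
			      struct rdma_user_mmap_entry *entry);

so the core allocates and owns the priv itself, and only the core ever
touches vm_private_data.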

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
  2019-09-19 17:14   ` Jason Gunthorpe
@ 2019-09-19 17:48   ` Jason Gunthorpe
  2019-09-19 18:07   ` Jason Gunthorpe
  2 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 17:48 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:12PM +0300, Michal Kalderon wrote:

> @@ -917,6 +921,9 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
>  
>  			priv = list_first_entry(&ufile->umaps,
>  						struct rdma_umap_priv, list);
> +			if (priv->entry)
> +				rdma_user_mmap_entry_put(ufile->ucontext,
> +							 priv->entry);

This is in the wrong place: as the comment says, the purpose of this
loop is only to get a single mm. The actual zap & deletion happens in
the 2nd loop below, which is where this should be.

Further, it needs to set priv->entry to NULL after putting it,
otherwise it will be double put when rdma_umap_close() is eventually
called.
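
A minimal sketch of the suggested fix in the zap loop, reusing the names
from the hunk quoted above:

	if (priv->entry) {
		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
		/* avoid a second put from rdma_umap_close() */
		priv->entry = NULL;
	}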

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-05 10:01 ` [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
@ 2019-09-19 17:55   ` Jason Gunthorpe
  2019-09-20 13:39     ` Gal Pressman
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 17:55 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:15PM +0300, Michal Kalderon wrote:

> -int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
> +void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
>  {
> -	struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
> -	struct qedr_dev *dev = get_qedr_dev(context->device);
> -	unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
> -	unsigned long len = (vma->vm_end - vma->vm_start);
> -	unsigned long dpi_start;
> +	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
>  
> -	dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
> -
> -	DP_DEBUG(dev, QEDR_MSG_INIT,
> -		 "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
> -		 (void *)vma->vm_start, (void *)vma->vm_end,
> -		 (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
> +	kfree(entry);
> +}

Huh. If you recall we did all this work with the XA and the free
callback because you said qedr was mmaping BAR pages that had some HW
lifetime associated with them, and the HW resource was not to be
reallocated until all users were gone.

I think it would be a better example of this API if you pulled the

 	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);

into qedr_mmap_free().

Then the rdma_user_mmap_entry_remove() will call it naturally as it
does entry_put() and if we are destroying the ucontext we already know
the mmaps are destroyed.

Maybe the same basic comment for EFA, not sure. Gal?
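
A very rough sketch of that idea - the mmap_flag value and the dev/dpi
fields on the entry are illustrative only, qedr would have to store them
when the entry is created:

	void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
	{
		struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);

		/* return the HW resource only once the last user is gone */
		if (entry->mmap_flag == QEDR_USER_MMAP_IO_WC)
			entry->dev->ops->rdma_remove_user(entry->dev->rdma_ctx,
							  entry->dpi);

		kfree(entry);
	}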

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-05 10:01 ` [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
@ 2019-09-19 18:02   ` Jason Gunthorpe
  2019-09-20 13:30     ` Gal Pressman
  2019-09-23  9:30     ` Michal Kalderon
  0 siblings, 2 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 18:02 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:

> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
>  {
>  	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
>  
> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> +		free_page((unsigned long)phys_to_virt(entry->address));
> +

While it isn't wrong to do it this way, we don't need this mmap_free()
stuff for normal CPU pages. Those are refcounted and qedr can simply
call free_page() during the teardown of the uobject that is using
this page. This is what other drivers already do.

I'm also not sure why qedr is using a phys_addr for a struct page
object, that seems wrong.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
  2019-09-19 17:14   ` Jason Gunthorpe
  2019-09-19 17:48   ` Jason Gunthorpe
@ 2019-09-19 18:07   ` Jason Gunthorpe
  2019-09-23  9:36     ` [EXT] " Michal Kalderon
  2 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-19 18:07 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Thu, Sep 05, 2019 at 01:01:12PM +0300, Michal Kalderon wrote:
> +/**
> + * rdma_user_mmap_entry_remove() - Remove a key's entry from the mmap_xa
> + *
> + * @ucontext: associated user context.
> + * @key: the key to be deleted
> + *
> + * This function will find if there is an entry matching the key and if so
> + * decrease its refcnt, which will in turn delete the entry if
> + * its refcount reaches zero.
> + */
> +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext, u64 key)

Since the struct rdma_user_mmap_entry already has both of these,
doesn't it make more sense to pass in the struct pointer to _remove
than store the key?

Ie replace things like ctx->db_key with a pointer and make the
rdma_user_mmap_get_key() into a header inline
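
Roughly, as a hypothetical shape (assuming the entry keeps its key, as
noted above):

	void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry);

	static inline u64
	rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
	{
		return entry->key;
	}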


> +/**
> + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + * @length: length of the address that will be mmapped
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user call mmap with the given key.
> + * The function returns a unique key that should be provided to user, the user
> + * will use the key to retrieve information such as address to
> + * be mapped and how.
> + *
> + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> +				struct rdma_user_mmap_entry *entry,
> +				size_t length)

Then similarly we don't need to return a u64 here, and the sort of ugly
RDMA_USER_MMAP_INVALID can go away.
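
I.e. something like this hypothetical prototype:

	int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
					struct rdma_user_mmap_entry *entry,
					size_t length);

with drivers reading the key out of the entry only when filling in the
user response.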

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-19 18:02   ` Jason Gunthorpe
@ 2019-09-20 13:30     ` Gal Pressman
  2019-09-20 13:38       ` Jason Gunthorpe
  2019-09-23  9:30     ` Michal Kalderon
  1 sibling, 1 reply; 41+ messages in thread
From: Gal Pressman @ 2019-09-20 13:30 UTC (permalink / raw)
  To: Jason Gunthorpe, Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, sleybo, leon, linux-rdma

On 19/09/2019 21:02, Jason Gunthorpe wrote:
> On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> 
>> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
>>  {
>>  	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
>>  
>> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
>> +		free_page((unsigned long)phys_to_virt(entry->address));
>> +
> 
> While it isn't wrong it do it this way, we don't need this mmap_free()
> stuff for normal CPU pages. Those are refcounted and qedr can simply
> call free_page() during the teardown of the uobject that is using the
> this page. This is what other drivers already do.

This is pretty much what EFA does as well.
When we allocate pages for the user (a CQ for example), we DMA map them and
later on mmap them to the user. We expect those pages to remain until the
entry is freed. How can we call free_page? Who is holding a refcount on those
pages except for the driver?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-20 13:30     ` Gal Pressman
@ 2019-09-20 13:38       ` Jason Gunthorpe
  2019-09-20 14:00         ` Gal Pressman
  2019-09-25 19:16         ` [EXT] " Michal Kalderon
  0 siblings, 2 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-20 13:38 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Michal Kalderon, mkalderon, aelior, dledford, bmt, sleybo, leon,
	linux-rdma

On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
> On 19/09/2019 21:02, Jason Gunthorpe wrote:
> > On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> > 
> >> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
> >>  {
> >>  	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
> >>  
> >> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> >> +		free_page((unsigned long)phys_to_virt(entry->address));
> >> +
> > 
> > While it isn't wrong it do it this way, we don't need this mmap_free()
> > stuff for normal CPU pages. Those are refcounted and qedr can simply
> > call free_page() during the teardown of the uobject that is using the
> > this page. This is what other drivers already do.
> 
> This is pretty much what EFA does as well.  When we allocate pages
> for the user (CQ for example), we DMA map them and later on mmap
> them to the user. We expect those pages to remain until the entry is
> freed, how can we call free_page, who is holding a refcount on those
> except for the driver?

free_page is kind of a lie, it is more like put_page (see
__free_pages). I think the difference is that it assumes the page came
from alloc_page and skips some generic stuff when freeing it.

When the driver does vm_insert_page the vma holds another refcount and
the refcount does not go to zero until that page drops out of the
vma (ie at the same time mmap_free above is called). 

Then __put_page will do the free_unref_page(), etc.

So for CPU pages it is fine to not use mmap_free so long as
vm_insert_page is used
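
A minimal sketch of the CPU-page case described above (the wrapper name
is made up): vm_insert_page() makes the vma take its own reference, so
the driver's later free_page() simply drops the driver's reference:

	static int qedr_mmap_db_rec_page(struct vm_area_struct *vma,
					 void *db_rec_page)
	{
		/* the vma takes a reference on the page here */
		return vm_insert_page(vma, vma->vm_start,
				      virt_to_page(db_rec_page));
	}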

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-19 17:55   ` Jason Gunthorpe
@ 2019-09-20 13:39     ` Gal Pressman
  2019-09-23  9:21       ` Michal Kalderon
  0 siblings, 1 reply; 41+ messages in thread
From: Gal Pressman @ 2019-09-20 13:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Michal Kalderon
  Cc: mkalderon, aelior, dledford, bmt, sleybo, leon, linux-rdma

On 19/09/2019 20:55, Jason Gunthorpe wrote:
> Huh. If you recall we did all this work with the XA and the free
> callback because you said qedr was mmaping BAR pages that had some HW
> lifetime associated with them, and the HW resource was not to be
> reallocated until all users were gone.
> 
> I think it would be a better example of this API if you pulled the
> 
>  	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> 
> Into qedr_mmap_free().
> 
> Then the rdma_user_mmap_entry_remove() will call it naturally as it
> does entry_put() and if we are destroying the ucontext we already know
> the mmaps are destroyed.
> 
> Maybe the same basic comment for EFA, not sure. Gal?

That's what EFA already does in this series, no?
We no longer remove entries on dealloc_ucontext, only when the entry is freed.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-20 13:38       ` Jason Gunthorpe
@ 2019-09-20 14:00         ` Gal Pressman
  2019-09-23  9:37           ` Michal Kalderon
  2019-09-25 19:16         ` [EXT] " Michal Kalderon
  1 sibling, 1 reply; 41+ messages in thread
From: Gal Pressman @ 2019-09-20 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michal Kalderon, mkalderon, aelior, dledford, bmt, sleybo, leon,
	linux-rdma

On 20/09/2019 16:38, Jason Gunthorpe wrote:
> On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
>> On 19/09/2019 21:02, Jason Gunthorpe wrote:
>>> On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
>>>
>>>> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
>>>>  {
>>>>  	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
>>>>  
>>>> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
>>>> +		free_page((unsigned long)phys_to_virt(entry->address));
>>>> +
>>>
>>> While it isn't wrong it do it this way, we don't need this mmap_free()
>>> stuff for normal CPU pages. Those are refcounted and qedr can simply
>>> call free_page() during the teardown of the uobject that is using the
>>> this page. This is what other drivers already do.
>>
>> This is pretty much what EFA does as well.  When we allocate pages
>> for the user (CQ for example), we DMA map them and later on mmap
>> them to the user. We expect those pages to remain until the entry is
>> freed, how can we call free_page, who is holding a refcount on those
>> except for the driver?
> 
> free_page is kind of a lie, it is more like put_page (see
> __free_pages). I think the difference is that it assumes the page came
> from alloc_page and skips some generic stuff when freeing it.
> 
> When the driver does vm_insert_page the vma holds another refcount and
> the refcount does not go to zero until that page drops out of the
> vma (ie at the same time mmap_free above is called). 
> 
> Then __put_page will do the free_unref_page(), etc.
> 
> So for CPU pages it is fine to not use mmap_free so long as
> vm_insert_page is used

Thanks, I did not know this, it simplifies things.
In that case, maybe the mmap_free callback is redundant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-09-19 17:37   ` Jason Gunthorpe
@ 2019-09-23  9:15     ` Michal Kalderon
  2019-09-23 13:28       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-23  9:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> 
> On Thu, Sep 05, 2019 at 01:01:13PM +0300, Michal Kalderon wrote:
> >  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> > -		      struct vm_area_struct *vma, u64 key, u64 length)
> > +		      struct vm_area_struct *vma, u64 key, size_t length)
> >  {
> > -	struct efa_mmap_entry *entry;
> > +	struct rdma_user_mmap_entry *rdma_entry;
> > +	struct efa_user_mmap_entry *entry;
> >  	unsigned long va;
> >  	u64 pfn;
> >  	int err;
> >
> > -	entry = mmap_entry_get(dev, ucontext, key, length);
> > -	if (!entry) {
> > +	rdma_entry = rdma_user_mmap_entry_get(&ucontext-
> >ibucontext, key,
> > +					      length, vma);
> > +	if (!rdma_entry) {
> 
> This allocates memory and assigns it to vma->vm_private
> 
> >  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid
> entry\n",
> >  			  key);
> >  		return -EINVAL;
> >  	}
> > +	entry = to_emmap(rdma_entry);
> > +	if (entry->length != length) {
> > +		ibdev_dbg(&dev->ibdev,
> > +			  "key[%#llx] does not have valid length[%#zx]
> expected[%#zx]\n",
> > +			  key, length, entry->length);
> > +		err = -EINVAL;
> > +		goto err;
> 
> Now we take an error..
> 
> > +err:
> > +	rdma_user_mmap_entry_put(&ucontext->ibucontext,
> > +				 rdma_entry);
> 
> And this leaks the struct rdma_umap_priv
> 
> I said this already, but it looks wrong that rdma_umap_priv_init() is testing
> vm_private_data. rdma_user_mmap_io should accept the struct
> rdma_user_mmap_entry pointer so it can directly and always create the priv
> instead of trying to pass allocated memory through the vm_private
> 
> The only place that should set vm_private_data is rdma_user_mmap_io()
> and once it succeeds the driver must return success back through
> file_operations->mmap()
> 
> This hidden detail should also be noted in the comment for
> rdma_user_mmap_io..

I actually misunderstood you before and thought that we want to maintain all mappings in umap.
Now I understand you meant only the io mappings, since they're not reference counted by the mm system.
I'll revert the changes and add an additional parameter to rdma_user_mmap_io.
However, this means more changes in drivers that call this function and don't use the mmap_xa.
Should I add an additional interface, or are you OK with me passing NULL in the additional parameter from
all drivers using the interface (hns, mlx4, mlx5)?
Also, who should increase the refcnt of rdma_user_mmap_entry when it is set to the vma private data?
The caller of rdma_user_mmap_io, or rdma_user_mmap_io itself?
This will definitely simplify things.


> 
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-20 13:39     ` Gal Pressman
@ 2019-09-23  9:21       ` Michal Kalderon
  2019-09-23 13:29         ` Jason Gunthorpe
  2019-09-24  8:49         ` Pressman, Gal
  0 siblings, 2 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-23  9:21 UTC (permalink / raw)
  To: Gal Pressman, Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Gal Pressman
> 
> On 19/09/2019 20:55, Jason Gunthorpe wrote:
> > Huh. If you recall we did all this work with the XA and the free
> > callback because you said qedr was mmaping BAR pages that had some HW
> > lifetime associated with them, and the HW resource was not to be
> > reallocated until all users were gone.
> >
> > I think it would be a better example of this API if you pulled the
> >
> >  	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> >
> > Into qedr_mmap_free().
> >
> > Then the rdma_user_mmap_entry_remove() will call it naturally as it
> > does entry_put() and if we are destroying the ucontext we already know
> > the mmaps are destroyed.
> >
> > Maybe the same basic comment for EFA, not sure. Gal?
> 
> That's what EFA already does in this series, no?
> We no longer remove entries on dealloc_ucontext, only when the entry is
> freed.

Actually, I think most of the discussions you had on the topic were with Gal, but
some apply to qedr as well. However, for qedr the only hw resource we allocate (the bar)
is allocated on alloc_ucontext, therefore we were safe to free it on dealloc_ucontext, as all
mappings were already zapped. That makes mmap_free a bit redundant for qedr, except for the
need to free the entry.

For EFA, it seemed the only operation delayed was freeing memory - I didn't see hw resources
being freed... Gal?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-19 18:02   ` Jason Gunthorpe
  2019-09-20 13:30     ` Gal Pressman
@ 2019-09-23  9:30     ` Michal Kalderon
  2019-09-23 13:26       ` Jason Gunthorpe
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-23  9:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, September 19, 2019 9:02 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> 
> > @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> rdma_user_mmap_entry
> > *rdma_entry)  {
> >  	struct qedr_user_mmap_entry *entry =
> > get_qedr_mmap_entry(rdma_entry);
> >
> > +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> > +		free_page((unsigned long)phys_to_virt(entry->address));
> > +
> 
> While it isn't wrong it do it this way, we don't need this mmap_free() stuff for
> normal CPU pages. Those are refcounted and qedr can simply call
> free_page() during the teardown of the uobject that is using the this page.
> This is what other drivers already do.
> 
> I'm also not sure why qedr is using a phys_addr for a struct page object,
> seems wrong.
As mentioned in a previous email, I misunderstood this part before. I'll move the free
to object teardown.
What we need here is simply a shared page between kernel and user that both have
virtual pointers to; the user writes to the page and the kernel needs to read the data.

The reason I used phys here is that entry->address is defined as u64,
as it is common whether it is an address of the bar or of a page...
Should I define a union based on the entry type, and for a page use a
struct page object?

> 
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-19 18:07   ` Jason Gunthorpe
@ 2019-09-23  9:36     ` Michal Kalderon
  2019-09-23 13:30       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-23  9:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, September 19, 2019 9:08 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 05, 2019 at 01:01:12PM +0300, Michal Kalderon wrote:
> > +/**
> > + * rdma_user_mmap_entry_remove() - Remove a key's entry from the
> > +mmap_xa
> > + *
> > + * @ucontext: associated user context.
> > + * @key: the key to be deleted
> > + *
> > + * This function will find if there is an entry matching the key and
> > +if so
> > + * decrease its refcnt, which will in turn delete the entry if
> > + * its refcount reaches zero.
> > + */
> > +void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
> u64
> > +key)
> 
> Since the struct rdma_user_mmap_entry already has both of these, doesn't
> it make more sense to pass in the struct pointer to _remove than store the
> key?
> 
> Ie replace things like ctx->db_key with a pointer and make the
> rdma_user_mmap_get_key() into a header inline
> 
> 
> > +/**
> > + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the
> mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @entry: the entry to insert into the mmap_xa
> > + * @length: length of the address that will be mmapped
> > + *
> > + * This function should be called by drivers that use the
> > +rdma_user_mmap
> > + * interface for handling user mmapped addresses. The database is
> > +handled in
> > + * the core and helper functions are provided to insert entries into
> > +the
> > + * database and extract entries when the user call mmap with the given
> key.
> > + * The function returns a unique key that should be provided to user,
> > +the user
> > + * will use the key to retrieve information such as address to
> > + * be mapped and how.
> > + *
> > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not
> added.
> > + */
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > +				struct rdma_user_mmap_entry *entry,
> > +				size_t length)
> 
> The similarly we don't need to return a u64 here and the sort of ugly
> RDMA_USER_MMAP_INVALID can go away

So you mean add a return code here? And on failure the driver will delete the
allocated xx_user_mmap_entry and not store it in xx_key?

> 
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-20 14:00         ` Gal Pressman
@ 2019-09-23  9:37           ` Michal Kalderon
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-23  9:37 UTC (permalink / raw)
  To: Gal Pressman, Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Gal Pressman
> 
> On 20/09/2019 16:38, Jason Gunthorpe wrote:
> > On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
> >> On 19/09/2019 21:02, Jason Gunthorpe wrote:
> >>> On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> >>>
> >>>> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> rdma_user_mmap_entry
> >>>> *rdma_entry)  {
> >>>>  	struct qedr_user_mmap_entry *entry =
> >>>> get_qedr_mmap_entry(rdma_entry);
> >>>>
> >>>> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> >>>> +		free_page((unsigned long)phys_to_virt(entry->address));
> >>>> +
> >>>
> >>> While it isn't wrong it do it this way, we don't need this
> >>> mmap_free() stuff for normal CPU pages. Those are refcounted and
> >>> qedr can simply call free_page() during the teardown of the uobject
> >>> that is using the this page. This is what other drivers already do.
> >>
> >> This is pretty much what EFA does as well.  When we allocate pages
> >> for the user (CQ for example), we DMA map them and later on mmap
> them
> >> to the user. We expect those pages to remain until the entry is
> >> freed, how can we call free_page, who is holding a refcount on those
> >> except for the driver?
> >
> > free_page is kind of a lie, it is more like put_page (see
> > __free_pages). I think the difference is that it assumes the page came
> > from alloc_page and skips some generic stuff when freeing it.
> >
> > When the driver does vm_insert_page the vma holds another refcount and
> > the refcount does not go to zero until that page drops out of the vma
> > (ie at the same time mmap_free above is called).
> >
> > Then __put_page will do the free_unref_page(), etc.
> >
> > So for CPU pages it is fine to not use mmap_free so long as
> > vm_insert_page is used
> 
> Thanks, I did not know this, it simplifies things.
> In that case, maybe the mmap_free callback is redundant.

We still need it to free the entry though, right?
Gal, where do you free your hardware resources?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-23  9:30     ` Michal Kalderon
@ 2019-09-23 13:26       ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-23 13:26 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Mon, Sep 23, 2019 at 09:30:54AM +0000, Michal Kalderon wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Thursday, September 19, 2019 9:02 PM
> > 
> > External Email
> > 
> > On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> > 
> > > @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> > rdma_user_mmap_entry
> > > *rdma_entry)  {
> > >  	struct qedr_user_mmap_entry *entry =
> > > get_qedr_mmap_entry(rdma_entry);
> > >
> > > +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> > > +		free_page((unsigned long)phys_to_virt(entry->address));
> > > +
> > 
> > While it isn't wrong it do it this way, we don't need this mmap_free() stuff for
> > normal CPU pages. Those are refcounted and qedr can simply call
> > free_page() during the teardown of the uobject that is using the this page.
> > This is what other drivers already do.
> > 
> > I'm also not sure why qedr is using a phys_addr for a struct page object,
> > seems wrong.
> As mentioned in previous email, I misunderstood this part before. I'll move the free
> To object teardown. 
> What we need here is simply a shared page between kernel + user that both have
> Virtual pointers to, user writes to the page, kernel needs to read the data. 
> 
> The reason I used phys here is because the entry->address is defines as u64
> As it is common whether it is an address to the bar or a page... 
> Should I define a union based on the entry type ? and for a page use
> struct page object ? 

Yes, a union with a void * would be OK
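
For example, illustrative only (not the exact struct from this series):

	struct qedr_user_mmap_entry {
		struct rdma_user_mmap_entry rdma_entry;
		union {
			u64 io_address;	/* BAR offset for doorbell mappings */
			void *address;	/* kernel VA of the shared recovery page */
		};
		size_t length;
		u8 mmap_flag;
	};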

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-09-23  9:15     ` Michal Kalderon
@ 2019-09-23 13:28       ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-23 13:28 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Mon, Sep 23, 2019 at 09:15:54AM +0000, Michal Kalderon wrote:

> > This hidden detail should also be noted in the comment for
> > rdma_user_mmap_io..
> 
> I actually misunderstood you before and thought that we want to
> maintain all mappings in umap.  Now I understand you meant only the
> io mappings, since they're not referenced counted by the mm system.
> I'll revert the changes and add an additional parameter to
> rdma_user_mmap_io -> However, this means more changes in drivers
> that call this function and don't use the mmap_xa, Should I add an
> additional interface ? or are you OK with me sending NULL to the
> additional parameter from all drivers using the interface ? (hns,
> mlx4, mlx5)

It should be OK

> Also, who should increase the refcnt of rdma_user_mmap_entry when it
> is set to the vma private data ?  The caller of rdma_user_mmap_io or
> inside rdma_user_mmap_io function ?  This will definitely simplify
> things.

I generally prefer to see refcounts incremented at the spot the pointer is
copied, so the thing that sets vm_private_data should increment the refcount.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-23  9:21       ` Michal Kalderon
@ 2019-09-23 13:29         ` Jason Gunthorpe
  2019-09-24  8:25           ` [EXT] " Michal Kalderon
  2019-09-24  8:49         ` Pressman, Gal
  1 sibling, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-23 13:29 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Gal Pressman, Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

On Mon, Sep 23, 2019 at 09:21:37AM +0000, Michal Kalderon wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Gal Pressman
> > 
> > On 19/09/2019 20:55, Jason Gunthorpe wrote:
> > > Huh. If you recall we did all this work with the XA and the free
> > > callback because you said qedr was mmaping BAR pages that had some HW
> > > lifetime associated with them, and the HW resource was not to be
> > > reallocated until all users were gone.
> > >
> > > I think it would be a better example of this API if you pulled the
> > >
> > >  	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> > >
> > > Into qedr_mmap_free().
> > >
> > > Then the rdma_user_mmap_entry_remove() will call it naturally as it
> > > does entry_put() and if we are destroying the ucontext we already know
> > > the mmaps are destroyed.
> > >
> > > Maybe the same basic comment for EFA, not sure. Gal?
> > 
> > That's what EFA already does in this series, no?
> > We no longer remove entries on dealloc_ucontext, only when the entry is
> > freed.
> 
> Actually, I think most of the discussions you had on the topic were with Gal, but
> Some apply to qedr as well, however, for qedr, the only hw resource we allocate (bar)
> is on alloc_ucontext , therefore we were safe to free it on dealloc_ucontext as all mappings
> were already zapped. Making the mmap_free a bit redundant for qedr except for the need
> to free the entry. 

I think if we are changing things to use this new interface then you
should code it consistently, so that the lifetime of the object in the
mmap_entry is entirely controlled by the mmap_entry and not also split
with the ucontext.

Having mmapable object lifetimes linked to the ucontext is a bit of a
hack, really.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [EXT] Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-23  9:36     ` [EXT] " Michal Kalderon
@ 2019-09-23 13:30       ` Jason Gunthorpe
  2019-09-24  8:31         ` Michal Kalderon
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-23 13:30 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Mon, Sep 23, 2019 at 09:36:01AM +0000, Michal Kalderon wrote:
> > > + * will use the key to retrieve information such as address to
> > > + * be mapped and how.
> > > + *
> > > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not
> > added.
> > > + */
> > > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > > +				struct rdma_user_mmap_entry *entry,
> > > +				size_t length)
> > 
> > The similarly we don't need to return a u64 here and the sort of ugly
> > RDMA_USER_MMAP_INVALID can go away
> 
> So you mean add a return code here ? and on failure driver will delete the 
> Allocated xx_user_mmap_entry and not store it in xx_key ? 

I'm not sure what this means

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-23 13:29         ` Jason Gunthorpe
@ 2019-09-24  8:25           ` Michal Kalderon
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Kalderon @ 2019-09-24  8:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gal Pressman, Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Monday, September 23, 2019 4:30 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Mon, Sep 23, 2019 at 09:21:37AM +0000, Michal Kalderon wrote:
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > owner@vger.kernel.org> On Behalf Of Gal Pressman
> > >
> > > On 19/09/2019 20:55, Jason Gunthorpe wrote:
> > > > Huh. If you recall we did all this work with the XA and the free
> > > > callback because you said qedr was mmaping BAR pages that had some
> > > > HW lifetime associated with them, and the HW resource was not to
> > > > be reallocated until all users were gone.
> > > >
> > > > I think it would be a better example of this API if you pulled the
> > > >
> > > >  	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> > > >
> > > > Into qedr_mmap_free().
> > > >
> > > > Then the rdma_user_mmap_entry_remove() will call it naturally as
> > > > it does entry_put() and if we are destroying the ucontext we
> > > > already know the mmaps are destroyed.
> > > >
> > > > Maybe the same basic comment for EFA, not sure. Gal?
> > >
> > > That's what EFA already does in this series, no?
> > > We no longer remove entries on dealloc_ucontext, only when the entry
> > > is freed.
> >
> > Actually, I think most of the discussions you had on the topic were
> > with Gal, but Some apply to qedr as well, however, for qedr, the only
> > hw resource we allocate (bar) is on alloc_ucontext , therefore we were
> > safe to free it on dealloc_ucontext as all mappings were already
> > zapped. Making the mmap_free a bit redundant for qedr except for the
> need to free the entry.
> 
> I think if we are changing things to use this new interface then you should
> consistenly code so that the lifetime of the object in the mmap_entry is
> entirely controlled by the mmap_entry and not also split to the ucontext.
> 
> Having mmapable object lifetimes linked to the ucontext is a bit of a hack
> really
> 
Ok, but the ucontext "hack" will need to remain for other drivers. 

> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-23 13:30       ` Jason Gunthorpe
@ 2019-09-24  8:31         ` Michal Kalderon
  2019-09-24 12:37           ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-24  8:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> 
> On Mon, Sep 23, 2019 at 09:36:01AM +0000, Michal Kalderon wrote:
> > > > + * will use the key to retrieve information such as address to
> > > > + * be mapped and how.
> > > > + *
> > > > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was
> not
> > > added.
> > > > + */
> > > > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > > > +				struct rdma_user_mmap_entry *entry,
> > > > +				size_t length)
> > >
> > > The similarly we don't need to return a u64 here and the sort of
> > > ugly RDMA_USER_MMAP_INVALID can go away
> >
> > So you mean add a return code here ? and on failure driver will delete
> > the Allocated xx_user_mmap_entry and not store it in xx_key ?
> 
> I'm not sure what this means
> 
Before the fix: the driver allocates an rdma_user_mmap_entry, receives a key and stores
the key; if the key received is INVALID the driver frees the allocated entry.
I wanted to make sure I understand correctly that you're requesting I change the
function to return an error code, and on success the driver will store the pointer
to the mmap_entry structure and on failure free it?
And the driver will not hold the value of the key, but instead call a helper function if it is required?
Thanks,
Michal

> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-23  9:21       ` Michal Kalderon
  2019-09-23 13:29         ` Jason Gunthorpe
@ 2019-09-24  8:49         ` Pressman, Gal
  2019-09-24  9:31           ` Michal Kalderon
  1 sibling, 1 reply; 41+ messages in thread
From: Pressman, Gal @ 2019-09-24  8:49 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Jason Gunthorpe, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma


> On 23 Sep 2019, at 18:22, Michal Kalderon <mkalderon@marvell.com> wrote:
> 
> 
>> 
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Gal Pressman
>> 
>>> On 19/09/2019 20:55, Jason Gunthorpe wrote:
>>> Huh. If you recall we did all this work with the XA and the free
>>> callback because you said qedr was mmaping BAR pages that had some HW
>>> lifetime associated with them, and the HW resource was not to be
>>> reallocated until all users were gone.
>>> 
>>> I think it would be a better example of this API if you pulled the
>>> 
>>>    dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
>>> 
>>> Into qedr_mmap_free().
>>> 
>>> Then the rdma_user_mmap_entry_remove() will call it naturally as it
>>> does entry_put() and if we are destroying the ucontext we already know
>>> the mmaps are destroyed.
>>> 
>>> Maybe the same basic comment for EFA, not sure. Gal?
>> 
>> That's what EFA already does in this series, no?
>> We no longer remove entries on dealloc_ucontext, only when the entry is
>> freed.
> 
> Actually, I think most of the discussions you had on the topic were with Gal, but
> Some apply to qedr as well, however, for qedr, the only hw resource we allocate (bar)
> is on alloc_ucontext , therefore we were safe to free it on dealloc_ucontext as all mappings
> were already zapped. Making the mmap_free a bit redundant for qedr except for the need
> to free the entry. 
> 
> For EFA, it seemed the only operation delayed was freeing memory - I didn't see hw resources
> being freed... Gal?

What do you mean by hw resources being freed? The BAR mappings are under the device’s control and are associated with the lifetime of the UAR.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-24  8:49         ` Pressman, Gal
@ 2019-09-24  9:31           ` Michal Kalderon
  2019-10-20  7:19             ` Gal Pressman
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-24  9:31 UTC (permalink / raw)
  To: Pressman, Gal
  Cc: Jason Gunthorpe, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

> From: Pressman, Gal <galpress@amazon.com>
> Sent: Tuesday, September 24, 2019 11:50 AM
> 
> 
> > On 23 Sep 2019, at 18:22, Michal Kalderon <mkalderon@marvell.com>
> wrote:
> >
> > 
> >>
> >> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >> owner@vger.kernel.org> On Behalf Of Gal Pressman
> >>
> >>> On 19/09/2019 20:55, Jason Gunthorpe wrote:
> >>> Huh. If you recall we did all this work with the XA and the free
> >>> callback because you said qedr was mmaping BAR pages that had some
> >>> HW lifetime associated with them, and the HW resource was not to be
> >>> reallocated until all users were gone.
> >>>
> >>> I think it would be a better example of this API if you pulled the
> >>>
> >>>    dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> >>>
> >>> Into qedr_mmap_free().
> >>>
> >>> Then the rdma_user_mmap_entry_remove() will call it naturally as it
> >>> does entry_put() and if we are destroying the ucontext we already
> >>> know the mmaps are destroyed.
> >>>
> >>> Maybe the same basic comment for EFA, not sure. Gal?
> >>
> >> That's what EFA already does in this series, no?
> >> We no longer remove entries on dealloc_ucontext, only when the entry
> >> is freed.
> >
> > Actually, I think most of the discussions you had on the topic were
> > with Gal, but Some apply to qedr as well, however, for qedr, the only
> > hw resource we allocate (bar) is on alloc_ucontext , therefore we were
> > safe to free it on dealloc_ucontext as all mappings were already
> > zapped. Making the mmap_free a bit redundant for qedr except for the
> need to free the entry.
> >
> > For EFA, it seemed the only operation delayed was freeing memory - I
> > didn't see hw resources being freed... Gal?
> 
> What do you mean by hw resources being freed? The BAR mappings are
> under the device’s control and are associated to the lifetime of the UAR.
The bar offset you get comes from the device - don't you release it back to the device
so it can be allocated to a different application?
In efa_com_create_qp you get offsets from the device that you use for mapping
the bar - are these unique for every call? Are they released during destroy_qp?
Before this patch series mmap_entries_remove_free only freed the DMA pages, but
following this thread, it seemed the initial intention was that only the hw resources would
be delayed, as the DMA pages are ref counted anyway. I didn't see any delay in returning
the bar offsets to the device. Thanks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [EXT] Re: [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-09-24  8:31         ` Michal Kalderon
@ 2019-09-24 12:37           ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-24 12:37 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, dledford, bmt, galpress, sleybo, leon, linux-rdma

On Tue, Sep 24, 2019 at 08:31:16AM +0000, Michal Kalderon wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> > 
> > On Mon, Sep 23, 2019 at 09:36:01AM +0000, Michal Kalderon wrote:
> > > > > + * will use the key to retrieve information such as address to
> > > > > + * be mapped and how.
> > > > > + *
> > > > > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was
> > not
> > > > added.
> > > > > + */
> > > > > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> > > > > +				struct rdma_user_mmap_entry *entry,
> > > > > +				size_t length)
> > > >
> > > > The similarly we don't need to return a u64 here and the sort of
> > > > ugly RDMA_USER_MMAP_INVALID can go away
> > >
> > > So you mean add a return code here ? and on failure driver will delete
> > > the Allocated xx_user_mmap_entry and not store it in xx_key ?
> > 
> > I'm not sure what this means
> > 
> Before fixing: driver allocated rdma_user_mmap_entry, receives a key and stores
> The key, if the key received is INVALID the driver frees the allocated entry. 
> Wanted to make sure I understand correctly that you're requesting I change the
> Function to return an error code, and on success the driver will store the pointer
> To the mmap_entry structure and on failure free it ? 
> And the driver will not hold the value of the key, instead call a
> helper function if it is required ?

Yes

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-20 13:38       ` Jason Gunthorpe
  2019-09-20 14:00         ` Gal Pressman
@ 2019-09-25 19:16         ` Michal Kalderon
  2019-09-25 19:21           ` Jason Gunthorpe
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-25 19:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Gal Pressman
  Cc: Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, September 20, 2019 4:38 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
> > On 19/09/2019 21:02, Jason Gunthorpe wrote:
> > > On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> > >
> > >> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> rdma_user_mmap_entry
> > >> *rdma_entry)  {
> > >>  	struct qedr_user_mmap_entry *entry =
> > >> get_qedr_mmap_entry(rdma_entry);
> > >>
> > >> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> > >> +		free_page((unsigned long)phys_to_virt(entry->address));
> > >> +
> > >
> > > While it isn't wrong it do it this way, we don't need this
> > > mmap_free() stuff for normal CPU pages. Those are refcounted and
> > > qedr can simply call free_page() during the teardown of the uobject
> > > that is using the this page. This is what other drivers already do.
> >
> > This is pretty much what EFA does as well.  When we allocate pages for
> > the user (CQ for example), we DMA map them and later on mmap them to
> > the user. We expect those pages to remain until the entry is freed,
> > how can we call free_page, who is holding a refcount on those except
> > for the driver?
> 
> free_page is kind of a lie, it is more like put_page (see __free_pages). I think
> the difference is that it assumes the page came from alloc_page and skips
> some generic stuff when freeing it.
> 
> When the driver does vm_insert_page the vma holds another refcount and
> the refcount does not go to zero until that page drops out of the vma (ie at
> the same time mmap_free above is called).
> 
> Then __put_page will do the free_unref_page(), etc.
> 
> So for CPU pages it is fine to not use mmap_free so long as vm_insert_page
> is used
Jason, by adding the kref to the rdma_user_mmap_entry we sort of lose the
guarantee that the entry is removed from the mmap xarray when it is removed
by the driver (removal will only decrease the refcnt).
If we call free_page during the uobject teardown, we can't be sure
the entry has been removed from the mmap_xa; this could leave an entry in the mmap_xa
that points to an invalid page.

Perhaps we could define the entry as being ref-counted or not based on the type of address, but
the original design was to keep this information opaque to the rdma_user_mmap_entry.
We could also assume that for CPU pages the flow that increases the refcnt (mmap_io)
won't be called, but this feels like bad practice.

How should I handle this?
Thanks,

> 
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-25 19:16         ` [EXT] " Michal Kalderon
@ 2019-09-25 19:21           ` Jason Gunthorpe
  2019-09-25 19:37             ` Michal Kalderon
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-25 19:21 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Gal Pressman, Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

On Wed, Sep 25, 2019 at 07:16:23PM +0000, Michal Kalderon wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Friday, September 20, 2019 4:38 PM
> > 
> > External Email
> > 
> > On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
> > > On 19/09/2019 21:02, Jason Gunthorpe wrote:
> > > > On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> > > >
> > > >> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> > rdma_user_mmap_entry
> > > >> *rdma_entry)  {
> > > >>  	struct qedr_user_mmap_entry *entry =
> > > >> get_qedr_mmap_entry(rdma_entry);
> > > >>
> > > >> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> > > >> +		free_page((unsigned long)phys_to_virt(entry->address));
> > > >> +
> > > >
> > > > While it isn't wrong it do it this way, we don't need this
> > > > mmap_free() stuff for normal CPU pages. Those are refcounted and
> > > > qedr can simply call free_page() during the teardown of the uobject
> > > > that is using the this page. This is what other drivers already do.
> > >
> > > This is pretty much what EFA does as well.  When we allocate pages for
> > > the user (CQ for example), we DMA map them and later on mmap them to
> > > the user. We expect those pages to remain until the entry is freed,
> > > how can we call free_page, who is holding a refcount on those except
> > > for the driver?
> > 
> > free_page is kind of a lie, it is more like put_page (see __free_pages). I think
> > the difference is that it assumes the page came from alloc_page and skips
> > some generic stuff when freeing it.
> > 
> > When the driver does vm_insert_page the vma holds another refcount and
> > the refcount does not go to zero until that page drops out of the vma (ie at
> > the same time mmap_free above is called).
> > 
> > Then __put_page will do the free_unref_page(), etc.
> > 
> > So for CPU pages it is fine to not use mmap_free so long as vm_insert_page
> > is used

> Jason, by adding the kref to the rdma_user_mmap_entry we sort of
> disable the option of being sure the entry is removed from the mmap
> xarray when it is removed by the driver (this will only decrease the
> refcnt).  If we call free_page during the uobject teardown, we can't
> be sure the entry is removed from the mmap_xa, this could lead to us
> having an entry in the mmap_xa that points to an invalid page.

I suppose I was expecting that when the object was no longer to be
shown to userspace the mmap_xa's were somehow neutralized too, so new
mmaps cannot be established.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-25 19:21           ` Jason Gunthorpe
@ 2019-09-25 19:37             ` Michal Kalderon
  2019-09-26 19:10               ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Kalderon @ 2019-09-25 19:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gal Pressman, Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
> 
> On Wed, Sep 25, 2019 at 07:16:23PM +0000, Michal Kalderon wrote:
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Friday, September 20, 2019 4:38 PM
> > >
> > > External Email
> > >
> > > On Fri, Sep 20, 2019 at 04:30:52PM +0300, Gal Pressman wrote:
> > > > On 19/09/2019 21:02, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 05, 2019 at 01:01:16PM +0300, Michal Kalderon wrote:
> > > > >
> > > > >> @@ -347,6 +360,9 @@ void qedr_mmap_free(struct
> > > rdma_user_mmap_entry
> > > > >> *rdma_entry)  {
> > > > >>  	struct qedr_user_mmap_entry *entry =
> > > > >> get_qedr_mmap_entry(rdma_entry);
> > > > >>
> > > > >> +	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
> > > > >> +		free_page((unsigned long)phys_to_virt(entry-
> >address));
> > > > >> +
> > > > >
> > > > > While it isn't wrong it do it this way, we don't need this
> > > > > mmap_free() stuff for normal CPU pages. Those are refcounted and
> > > > > qedr can simply call free_page() during the teardown of the
> > > > > uobject that is using the this page. This is what other drivers already
> do.
> > > >
> > > > This is pretty much what EFA does as well.  When we allocate pages
> > > > for the user (CQ for example), we DMA map them and later on mmap
> > > > them to the user. We expect those pages to remain until the entry
> > > > is freed, how can we call free_page, who is holding a refcount on
> > > > those except for the driver?
> > >
> > > free_page is kind of a lie, it is more like put_page (see
> > > __free_pages). I think the difference is that it assumes the page
> > > came from alloc_page and skips some generic stuff when freeing it.
> > >
> > > When the driver does vm_insert_page the vma holds another refcount
> > > and the refcount does not go to zero until that page drops out of
> > > the vma (ie at the same time mmap_free above is called).
> > >
> > > Then __put_page will do the free_unref_page(), etc.
> > >
> > > So for CPU pages it is fine to not use mmap_free so long as
> > > vm_insert_page is used
> 
> > Jason, by adding the kref to the rdma_user_mmap_entry we sort of
> > disable the option of being sure the entry is removed from the mmap
> > xarray when it is removed by the driver (this will only decrease the
> > refcnt).  If we call free_page during the uobject teardown, we can't
> > be sure the entry is removed from the mmap_xa, this could lead to us
> > having an entry in the mmap_xa that points to an invalid page.
> 
> I suppose I was expecting that when the object was no longer to be
> shown to userspace, the mmap_xa's were somehow neutralized too so new
> mmaps cannot be established.
Adding/removing entries is dynamic and done on qp/cq creation and destruction,
so it can happen at any time. To neutralize them would require adding some
interface that makes sure the entry is deleted, e.g. waiting for the refcnt to
reach zero before freeing the memory, or leaving it as it is now and only
freeing the memory in mmap_free.


> 
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [EXT] Re: [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-09-25 19:37             ` Michal Kalderon
@ 2019-09-26 19:10               ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-09-26 19:10 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Gal Pressman, Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma

On Wed, Sep 25, 2019 at 07:37:38PM +0000, Michal Kalderon wrote:

> > I suppose I was expecting that when the object was no longer to be
> > shown to userspace, the mmap_xa's were somehow neutralized too so new
> > mmaps cannot be established.

> Adding/removing entries is dynamic and done on qp/cq creation and
> destruction, so it can happen at any time. To neutralize them would
> require adding some interface that makes sure the entry is deleted,
> e.g. waiting for the refcnt to reach zero before freeing the memory,
> or leaving it as it is now and only freeing the memory in mmap_free.

nothing needs to wait, it just needs to make sure future mmap to that
offset will not succeed

Jason
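
For illustration, a minimal sketch of one way to "neutralize" an entry using
the helpers introduced in this series; the qedr_user_mmap_entry wrapper and the
destroy helper are hypothetical, and the helper signature may differ from the
v11 posting:

#include <rdma/ib_verbs.h>

struct qedr_user_mmap_entry {		/* hypothetical wrapper */
	struct rdma_user_mmap_entry rdma_entry;
};

static void qedr_neutralize_db_entry(struct qedr_user_mmap_entry *entry)
{
	/* erase the entry from the ucontext xarray and drop the insert
	 * reference: future mmap() lookups for this offset fail right
	 * away, while existing mappings keep the entry (and its page)
	 * alive until mmap_free runs on the final put */
	rdma_user_mmap_entry_remove(&entry->rdma_entry);
}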

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-09-24  9:31           ` Michal Kalderon
@ 2019-10-20  7:19             ` Gal Pressman
  2019-10-21 17:33               ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Gal Pressman @ 2019-10-20  7:19 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Jason Gunthorpe, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

On 24/09/2019 12:31, Michal Kalderon wrote:
>> From: Pressman, Gal <galpress@amazon.com>
>> Sent: Tuesday, September 24, 2019 11:50 AM
>>
>>
>>> On 23 Sep 2019, at 18:22, Michal Kalderon <mkalderon@marvell.com>
>> wrote:
>>>
>>>
>>>>
>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>>>> owner@vger.kernel.org> On Behalf Of Gal Pressman
>>>>
>>>>> On 19/09/2019 20:55, Jason Gunthorpe wrote:
>>>>> Huh. If you recall we did all this work with the XA and the free
>>>>> callback because you said qedr was mmaping BAR pages that had some
>>>>> HW lifetime associated with them, and the HW resource was not to be
>>>>> reallocated until all users were gone.
>>>>>
>>>>> I think it would be a better example of this API if you pulled the
>>>>>
>>>>>    dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
>>>>>
>>>>> Into qedr_mmap_free().
>>>>>
>>>>> Then the rdma_user_mmap_entry_remove() will call it naturally as it
>>>>> does entry_put() and if we are destroying the ucontext we already
>>>>> know the mmaps are destroyed.
>>>>>
>>>>> Maybe the same basic comment for EFA, not sure. Gal?
>>>>
>>>> That's what EFA already does in this series, no?
>>>> We no longer remove entries on dealloc_ucontext, only when the entry
>>>> is freed.
>>>
>>> Actually, I think most of the discussions you had on the topic were
>>> with Gal, but some apply to qedr as well. However, for qedr, the only
>>> hw resource we allocate (bar) is on alloc_ucontext, therefore we were
>>> safe to free it on dealloc_ucontext as all mappings were already
>>> zapped, making the mmap_free a bit redundant for qedr except for the
>> need to free the entry.
>>>
>>> For EFA, it seemed the only operation delayed was freeing memory - I
>>> didn't see hw resources being freed... Gal?
>>
>> What do you mean by hw resources being freed? The BAR mappings are
>> under the device’s control and are associated to the lifetime of the UAR.
> The bar offset you get is from the device -> don't you release it back to the device
> so it can be allocated to a different application?
> In efa_com_create_qp you get offsets from the device that you use for mapping
> the bar -> are these unique for every call? Are they released during destroy_qp?
> Before this patch series mmap_entries_remove_free only freed the DMA pages, but
> following this thread, it seemed the initial intention was that only the hw resources would
> be delayed, as the DMA pages are refcounted anyway.  I didn't see any delay to returning
> the bar offsets to the device. Thanks.
The BAR pages are being "freed" back to the device once the UAR is freed.
These pages' lifetime is under the control of the device so there's nothing the
driver needs to do, just make sure no one else is using them.
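
For contrast with the refcounted CPU-page case, a minimal sketch of mapping a
BAR offset handed out by the device; drv_mmap_bar_page() is hypothetical, not
the EFA code. io_remap_pfn_range() creates a pure PFN mapping with no struct
page refcounting, which is why the HW resource behind it has to stay allocated
until the mapping itself is gone:

#include <linux/mm.h>

static int drv_mmap_bar_page(struct vm_area_struct *vma, phys_addr_t bar_addr)
{
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* nothing here is refcounted; the device must not hand this BAR
	 * page to another user until this mapping no longer exists */
	return io_remap_pfn_range(vma, vma->vm_start, bar_addr >> PAGE_SHIFT,
				  PAGE_SIZE, vma->vm_page_prot);
}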

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-10-20  7:19             ` Gal Pressman
@ 2019-10-21 17:33               ` Jason Gunthorpe
  2019-10-23  6:40                 ` Gal Pressman
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-10-21 17:33 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Michal Kalderon, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

On Sun, Oct 20, 2019 at 10:19:34AM +0300, Gal Pressman wrote:
> On 24/09/2019 12:31, Michal Kalderon wrote:
> >> From: Pressman, Gal <galpress@amazon.com>
> >> Sent: Tuesday, September 24, 2019 11:50 AM
> >>
> >>
> >>> On 23 Sep 2019, at 18:22, Michal Kalderon <mkalderon@marvell.com>
> >> wrote:
> >>>
> >>>
> >>>>
> >>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >>>> owner@vger.kernel.org> On Behalf Of Gal Pressman
> >>>>
> >>>>> On 19/09/2019 20:55, Jason Gunthorpe wrote:
> >>>>> Huh. If you recall we did all this work with the XA and the free
> >>>>> callback because you said qedr was mmaping BAR pages that had some
> >>>>> HW lifetime associated with them, and the HW resource was not to be
> >>>>> reallocated until all users were gone.
> >>>>>
> >>>>> I think it would be a better example of this API if you pulled the
> >>>>>
> >>>>>    dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
> >>>>>
> >>>>> Into qedr_mmap_free().
> >>>>>
> >>>>> Then the rdma_user_mmap_entry_remove() will call it naturally as it
> >>>>> does entry_put() and if we are destroying the ucontext we already
> >>>>> know the mmaps are destroyed.
> >>>>>
> >>>>> Maybe the same basic comment for EFA, not sure. Gal?
> >>>>
> >>>> That's what EFA already does in this series, no?
> >>>> We no longer remove entries on dealloc_ucontext, only when the entry
> >>>> is freed.
> >>>
> >>> Actually, I think most of the discussions you had on the topic were
> >>> with Gal, but some apply to qedr as well. However, for qedr, the only
> >>> hw resource we allocate (bar) is on alloc_ucontext, therefore we were
> >>> safe to free it on dealloc_ucontext as all mappings were already
> >>> zapped, making the mmap_free a bit redundant for qedr except for the
> >> need to free the entry.
> >>>
> >>> For EFA, it seemed the only operation delayed was freeing memory - I
> >>> didn't see hw resources being freed... Gal?
> >>
> >> What do you mean by hw resources being freed? The BAR mappings are
> >> under the device’s control and are associated to the lifetime of the UAR.
> > The bar offset you get is from the device -> don't you release it back to the device
> > so it can be allocated to a different application?
> > In efa_com_create_qp you get offsets from the device that you use for mapping
> > the bar -> are these unique for every call? Are they released during destroy_qp?
> > Before this patch series mmap_entries_remove_free only freed the DMA pages, but
> > following this thread, it seemed the initial intention was that only the hw resources would
> > be delayed, as the DMA pages are refcounted anyway.  I didn't see any delay to returning
> > the bar offsets to the device. Thanks.
> The BAR pages are being "freed" back to the device once the UAR is freed.
> These pages' lifetime is under the control of the device so there's nothing the
> driver needs to do, just make sure no one else is using them.

What frees the UAR?

In the mlx drivers this was done during destruction of the ucontext,
but with this new mmap stuff it could be moved to the mmap_free..

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-10-21 17:33               ` Jason Gunthorpe
@ 2019-10-23  6:40                 ` Gal Pressman
  2019-10-23 14:41                   ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Gal Pressman @ 2019-10-23  6:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michal Kalderon, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

On 21/10/2019 20:33, Jason Gunthorpe wrote:
> On Sun, Oct 20, 2019 at 10:19:34AM +0300, Gal Pressman wrote:
>> On 24/09/2019 12:31, Michal Kalderon wrote:
>>>> From: Pressman, Gal <galpress@amazon.com>
>>>> Sent: Tuesday, September 24, 2019 11:50 AM
>>>>
>>>>
>>>>> On 23 Sep 2019, at 18:22, Michal Kalderon <mkalderon@marvell.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>>>>>> owner@vger.kernel.org> On Behalf Of Gal Pressman
>>>>>>
>>>>>>> On 19/09/2019 20:55, Jason Gunthorpe wrote:
>>>>>>> Huh. If you recall we did all this work with the XA and the free
>>>>>>> callback because you said qedr was mmaping BAR pages that had some
>>>>>>> HW lifetime associated with them, and the HW resource was not to be
>>>>>>> reallocated until all users were gone.
>>>>>>>
>>>>>>> I think it would be a better example of this API if you pulled the
>>>>>>>
>>>>>>>    dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
>>>>>>>
>>>>>>> Into qedr_mmap_free().
>>>>>>>
>>>>>>> Then the rdma_user_mmap_entry_remove() will call it naturally as it
>>>>>>> does entry_put() and if we are destroying the ucontext we already
>>>>>>> know the mmaps are destroyed.
>>>>>>>
>>>>>>> Maybe the same basic comment for EFA, not sure. Gal?
>>>>>>
>>>>>> That's what EFA already does in this series, no?
>>>>>> We no longer remove entries on dealloc_ucontext, only when the entry
>>>>>> is freed.
>>>>>
>>>>> Actually, I think most of the discussions you had on the topic were
>>>>> with Gal, but some apply to qedr as well. However, for qedr, the only
>>>>> hw resource we allocate (bar) is on alloc_ucontext, therefore we were
>>>>> safe to free it on dealloc_ucontext as all mappings were already
>>>>> zapped, making the mmap_free a bit redundant for qedr except for the
>>>> need to free the entry.
>>>>>
>>>>> For EFA, it seemed the only operation delayed was freeing memory - I
>>>>> didn't see hw resources being freed... Gal?
>>>>
>>>> What do you mean by hw resources being freed? The BAR mappings are
>>>> under the device’s control and are associated to the lifetime of the UAR.
>>> The bar offset you get is from the device -> don't you release it back to the device
>>> so it can be allocated to a different application?
>>> In efa_com_create_qp you get offsets from the device that you use for mapping
>>> the bar -> are these unique for every call? Are they released during destroy_qp?
>>> Before this patch series mmap_entries_remove_free only freed the DMA pages, but
>>> following this thread, it seemed the initial intention was that only the hw resources would
>>> be delayed, as the DMA pages are refcounted anyway.  I didn't see any delay to returning
>>> the bar offsets to the device. Thanks.
>> The BAR pages are being "freed" back to the device once the UAR is freed.
>> These pages' lifetime is under the control of the device so there's nothing the
>> driver needs to do, just make sure no one else is using them.
> 
> What frees the UAR?
> 
> In the mlx drivers this was done during destruction of the ucontext,
> but with this new mmap stuff it could be moved to the mmap_free..

Dealloc UAR is currently being called during dealloc_ucontext.
The mmap_free callback is per entry, how can dealloc_uar be moved there?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-10-23  6:40                 ` Gal Pressman
@ 2019-10-23 14:41                   ` Jason Gunthorpe
  2019-10-24  8:06                     ` Gal Pressman
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-10-23 14:41 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Michal Kalderon, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

On Wed, Oct 23, 2019 at 09:40:44AM +0300, Gal Pressman wrote:
> > In the mlx drivers this was done during destruction of the ucontext,
> > but with this new mmap stuff it could be moved to the mmap_free..
> 
> Dealloc UAR is currently being called during dealloc_ucontext.
> The mmap_free callback is per entry, how can dealloc_uar be moved there?

If it is some global per-context uar then sure, it should live till
dealloc ucontext.

If it is a dynamically created uar then it should have a shorter
lifetime.

Jason
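
A minimal sketch of the lifetime split being suggested here; the drv_* names
and device commands are hypothetical, not actual EFA or qedr code:

#include <rdma/ib_verbs.h>
#include <linux/slab.h>

struct drv_user_mmap_entry {
	struct rdma_user_mmap_entry rdma_entry;
	bool dynamic;		/* created per QP/CQ rather than per context */
	u32 hw_handle;
};

struct drv_ucontext {
	struct ib_ucontext ibucontext;
	u16 uarn;
};

/* hypothetical device commands */
void drv_cmd_release_resource(u32 hw_handle);
void drv_cmd_dealloc_uar(u16 uarn);

static void drv_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
{
	struct drv_user_mmap_entry *entry =
		container_of(rdma_entry, struct drv_user_mmap_entry, rdma_entry);

	/* a dynamically created resource goes away with its last mapping */
	if (entry->dynamic)
		drv_cmd_release_resource(entry->hw_handle);
	kfree(entry);
}

static void drv_dealloc_ucontext(struct ib_ucontext *ibucontext)
{
	struct drv_ucontext *uctx =
		container_of(ibucontext, struct drv_ucontext, ibucontext);

	/* the per-context UAR lives as long as the ucontext itself */
	drv_cmd_dealloc_uar(uctx->uarn);
}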

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-10-23 14:41                   ` Jason Gunthorpe
@ 2019-10-24  8:06                     ` Gal Pressman
  0 siblings, 0 replies; 41+ messages in thread
From: Gal Pressman @ 2019-10-24  8:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michal Kalderon, Ariel Elior, dledford, bmt, Leybovich, Yossi,
	leon, linux-rdma

On 23/10/2019 17:41, Jason Gunthorpe wrote:
> On Wed, Oct 23, 2019 at 09:40:44AM +0300, Gal Pressman wrote:
>>> In the mlx drivers this was done during destruction of the ucontext,
>>> but with this new mmap stuff it could be moved to the mmap_free..
>>
>> Dealloc UAR is currently being called during dealloc_ucontext.
>> The mmap_free callback is per entry, how can dealloc_uar be moved there?
> 
> If it is some global per-context uar then sure, it should live till
> dealloc ucontext.
> 
> If it is a dynamically created uar then it should have a shorter
> lifetime.

It's a UAR per ucontext. Created on alloc_ucontext and deallocated on
dealloc_ucontext.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-10-24  8:06 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-05 10:01 [PATCH v11 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
2019-09-05 10:01 ` [PATCH v11 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
2019-09-05 10:01 ` [PATCH v11 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
2019-09-19 17:14   ` Jason Gunthorpe
2019-09-19 17:48   ` Jason Gunthorpe
2019-09-19 18:07   ` Jason Gunthorpe
2019-09-23  9:36     ` [EXT] " Michal Kalderon
2019-09-23 13:30       ` Jason Gunthorpe
2019-09-24  8:31         ` Michal Kalderon
2019-09-24 12:37           ` Jason Gunthorpe
2019-09-05 10:01 ` [PATCH v11 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
2019-09-19 17:37   ` Jason Gunthorpe
2019-09-23  9:15     ` Michal Kalderon
2019-09-23 13:28       ` Jason Gunthorpe
2019-09-05 10:01 ` [PATCH v11 rdma-next 4/7] RDMA/siw: " Michal Kalderon
2019-09-05 10:01 ` [PATCH v11 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
2019-09-19 17:55   ` Jason Gunthorpe
2019-09-20 13:39     ` Gal Pressman
2019-09-23  9:21       ` Michal Kalderon
2019-09-23 13:29         ` Jason Gunthorpe
2019-09-24  8:25           ` [EXT] " Michal Kalderon
2019-09-24  8:49         ` Pressman, Gal
2019-09-24  9:31           ` Michal Kalderon
2019-10-20  7:19             ` Gal Pressman
2019-10-21 17:33               ` Jason Gunthorpe
2019-10-23  6:40                 ` Gal Pressman
2019-10-23 14:41                   ` Jason Gunthorpe
2019-10-24  8:06                     ` Gal Pressman
2019-09-05 10:01 ` [PATCH v11 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
2019-09-19 18:02   ` Jason Gunthorpe
2019-09-20 13:30     ` Gal Pressman
2019-09-20 13:38       ` Jason Gunthorpe
2019-09-20 14:00         ` Gal Pressman
2019-09-23  9:37           ` Michal Kalderon
2019-09-25 19:16         ` [EXT] " Michal Kalderon
2019-09-25 19:21           ` Jason Gunthorpe
2019-09-25 19:37             ` Michal Kalderon
2019-09-26 19:10               ` Jason Gunthorpe
2019-09-23  9:30     ` Michal Kalderon
2019-09-23 13:26       ` Jason Gunthorpe
2019-09-05 10:01 ` [PATCH v11 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).