Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
@ 2019-08-27 13:28 Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon

This patch series uses the doorbell overflow recovery mechanism
introduced in
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
for rdma ( RoCE and iWARP )

The first five patches modify the core code to contain helper
functions for managing mmap_xa inserting, getting and freeing
entries. The code was based on the code from efa driver.
There is still an open discussion on whether we should take
this even further and make the entire mmap generic. Until a
decision is made, I only created the database API and modified
the efa, qedr, siw driver to use it. The functions are integrated
with the umap mechanism.

The doorbell recovery code is based on the common code.

Efa driver was compile tested and checked only modprobe/rmmod.
SIW was compile tested and checked only modprobe/rmmod.

rdma-core pull request #493

Changes from V7:
- Remove license text, SPDX id should suffice.
- Fix some comments text.
- Add comment regarding vm_ops being set in ib_uverbs_mmap.
- Allocate the rdma_user_mmap_entry in the driver and not in the
  ib_core_uverbs. This lead to defining three new structures per driver
  and seperating the fields between the driver private structures and
  the common rdma_user_mmap_entry. Freeing the entry was also moved
  to the drivers.
- Fix bug found by Gal Pressman. Call mmap_free only once per entry.
- Add a mutex around xa_mmap insert to assure threads won't intefere
  while the xa lock is released when inserting an entry into the range.
- Modify the insert algorithm to be more elegant using the
  xas_next_entry instead of foreach.
- Remove the rdma_user_mmap_entries_remove_free function, now that umap.
  and mmap_xa are integrated we should not have any entries in the mmap_xa
  when ucontext is released. Replace the function with a WARN_ON(!xa_empty).
- Rdma_umap_open needs to reset the vm_private_data before initializing it.
- Decrease rdma_user_mmap_entry reference count on mmap disassociate.
- Remove WARN_ON(!kref_read) this is checked when kref debug is on.
- Remove some redundant defines from ib_verbs.h.
- Better error handling for efa create qp flow.
- Add a function that wraps the entry allocation and rdma_user_mmap_entry_insert
  which is used in all places that need to add an entry to the xarray.
- Remove rq_entry_inserted field in efa create qp flow.
- Add mmap_free to siw and free the memory only on mmap free and not before.

Changes from V6:
- Modified series description to be closer to what the series is now.
- Create a new file for the new rdma_user_mmap function. The file
  is called ib_uverbs_core. This file should contain functions related
  to user which are called by hw to eventually enable ib_uverbs to be
  optional.
- Modify SIW driver to use new mmap api.
- When calculating number of pages, need to round it up to PAGE_SIZE.
- Integrate the mmap_xa and umap mechanism so that the entries in
  mmap_xa now have a reference count and can be removed. Previously
  entries existed until context was destroyed. This modified the
  algorithm for allocating a free page range.
- Modify algorithm for inserting an entry into the mmap_xa.
- Rdma_umap_priv is now also used for all mmaps done using the
  mmap_xa helpers.
- Move remove_free header to core_priv.
- Rdma_user_mmap_entry now has a kref that is increase on mmap
  and umap_open and decreased on umap_close.
- Modify efa + qedr to remove the entry from xa_map. This will
  decrease the refcnt and free memory only if refcnt is zero.
- Rdma_user_mmap_io slightly modified to enable drivers not using
  the xa_mmap API to continue using it.
- Modify page allocation for user to use GFP_USER instead of GFP_KERNEL

Changes from V5:
- Switch between driver dealloc_ucontext and mmap_entries_remove call.
- No need to verify the key after using the key to load an entry from
  the mmap_xa.
- Change mmap_free api to pass an 'entry' object.
- Add documentation for mmap_free and for newly exported functions.
- Fix some extra/missing line breaks.

Changes from V4:
- Add common mmap database and cookie helper functions.

Changes from V3:
- Remove casts from void to u8. Pointer arithmetic can be done on void
- rebase to tip of rdma-next

Changes from V2:
- Don't use long-lived kmap. Instead use user-trigger mmap for the
  doorbell recovery entries.
- Modify dpi_addr to be denoted with __iomem and avoid redundant
  casts

Changes from V1:
- call kmap to map virtual address into kernel space
- modify db_rec_delete to be void
- remove some cpu_to_le16 that were added to previous patch which are
  correct but not related to the overflow recovery mechanism. Will be
  submitted as part of a different patch


Michal Kalderon (7):
  RDMA/core: Move core content from ib_uverbs to ib_core
  RDMA/core: Create mmap database and cookie helper functions
  RDMA/efa: Use the common mmap_xa helpers
  RDMA/siw: Use the common mmap_xa helpers
  RDMA/qedr: Use the common mmap API
  RDMA/qedr: Add doorbell overflow recovery support
  RDMA/qedr: Add iWARP doorbell recovery support

 drivers/infiniband/core/Makefile         |   2 +-
 drivers/infiniband/core/core_priv.h      |  16 +
 drivers/infiniband/core/device.c         |   1 +
 drivers/infiniband/core/ib_core_uverbs.c | 341 +++++++++++++++++++
 drivers/infiniband/core/rdma_core.c      |   1 +
 drivers/infiniband/core/uverbs_cmd.c     |   1 +
 drivers/infiniband/core/uverbs_main.c    |  97 ++----
 drivers/infiniband/hw/efa/efa.h          |  18 +-
 drivers/infiniband/hw/efa/efa_main.c     |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c    | 339 ++++++++++---------
 drivers/infiniband/hw/qedr/main.c        |   1 +
 drivers/infiniband/hw/qedr/qedr.h        |  45 ++-
 drivers/infiniband/hw/qedr/verbs.c       | 539 ++++++++++++++++++++++---------
 drivers/infiniband/hw/qedr/verbs.h       |   3 +-
 drivers/infiniband/sw/siw/siw.h          |  20 +-
 drivers/infiniband/sw/siw/siw_main.c     |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c    | 223 +++++++------
 drivers/infiniband/sw/siw/siw_verbs.h    |   1 +
 include/rdma/ib_verbs.h                  |  37 ++-
 include/uapi/rdma/qedr-abi.h             |  25 ++
 20 files changed, 1180 insertions(+), 532 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
@ 2019-08-27 13:28 ` Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/Makefile         |  2 +-
 drivers/infiniband/core/core_priv.h      |  9 ++++
 drivers/infiniband/core/ib_core_uverbs.c | 73 +++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c    | 74 ++------------------------------
 4 files changed, 86 insertions(+), 72 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_core_uverbs.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 09881bd5f12d..9a8871e21545 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -11,7 +11,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				nldev.o restrack.o counters.o
+				nldev.o restrack.o counters.o ib_core_uverbs.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3a8b0911c3bc..0252da9560f4 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -387,4 +387,13 @@ int ib_device_set_netns_put(struct sk_buff *skb,
 
 int rdma_nl_net_init(struct rdma_dev_net *rnet);
 void rdma_nl_net_exit(struct rdma_dev_net *rnet);
+
+struct rdma_umap_priv {
+	struct vm_area_struct *vma;
+	struct list_head list;
+};
+
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
new file mode 100644
index 000000000000..b74d2a2fb342
--- /dev/null
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2019 Marvell. All rights reserved.
+ */
+#include <linux/xarray.h>
+#include "uverbs.h"
+#include "core_priv.h"
+
+/*
+ * Each time we map IO memory into user space this keeps track of the mapping.
+ * When the device is hot-unplugged we 'zap' the mmaps in user space to point
+ * to the zero page and allow the hot unplug to proceed.
+ *
+ * This is necessary for cases like PCI physical hot unplug as the actual BAR
+ * memory may vanish after this and access to it from userspace could MCE.
+ *
+ * RDMA drivers supporting disassociation must have their user space designed
+ * to cope in some way with their IO pages going to the zero page.
+ */
+void rdma_umap_priv_init(struct rdma_umap_priv *priv,
+			 struct vm_area_struct *vma)
+{
+	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
+
+	priv->vma = vma;
+	vma->vm_private_data = priv;
+	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
+
+	mutex_lock(&ufile->umap_lock);
+	list_add(&priv->list, &ufile->umaps);
+	mutex_unlock(&ufile->umap_lock);
+}
+EXPORT_SYMBOL(rdma_umap_priv_init);
+
+/*
+ * Map IO memory into a process. This is to be called by drivers as part of
+ * their mmap() functions if they wish to send something like PCI-E BAR memory
+ * to userspace.
+ */
+int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
+		      unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	struct rdma_umap_priv *priv;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start != size)
+		return -EINVAL;
+
+	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
+	if (WARN_ON(!vma->vm_file ||
+		    vma->vm_file->private_data != ufile))
+		return -EINVAL;
+	lockdep_assert_held(&ufile->device->disassociate_srcu);
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	vma->vm_page_prot = prot;
+	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
+		kfree(priv);
+		return -EAGAIN;
+	}
+
+	rdma_umap_priv_init(priv, vma);
+	return 0;
+}
+EXPORT_SYMBOL(rdma_user_mmap_io);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 02b57240176c..180a5e0f70e4 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -772,6 +772,8 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	return (ret) ? : count;
 }
 
+static const struct vm_operations_struct rdma_umap_ops;
+
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct ib_uverbs_file *file = filp->private_data;
@@ -785,45 +787,13 @@ static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
 		ret = PTR_ERR(ucontext);
 		goto out;
 	}
-
+	vma->vm_ops = &rdma_umap_ops;
 	ret = ucontext->device->ops.mmap(ucontext, vma);
 out:
 	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
 	return ret;
 }
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
- *
- * This is necessary for cases like PCI physical hot unplug as the actual BAR
- * memory may vanish after this and access to it from userspace could MCE.
- *
- * RDMA drivers supporting disassociation must have their user space designed
- * to cope in some way with their IO pages going to the zero page.
- */
-struct rdma_umap_priv {
-	struct vm_area_struct *vma;
-	struct list_head list;
-};
-
-static const struct vm_operations_struct rdma_umap_ops;
-
-static void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-				struct vm_area_struct *vma)
-{
-	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
-
-	priv->vma = vma;
-	vma->vm_private_data = priv;
-	vma->vm_ops = &rdma_umap_ops;
-
-	mutex_lock(&ufile->umap_lock);
-	list_add(&priv->list, &ufile->umaps);
-	mutex_unlock(&ufile->umap_lock);
-}
-
 /*
  * The VMA has been dup'd, initialize the vm_private_data with a new tracking
  * struct
@@ -931,44 +901,6 @@ static const struct vm_operations_struct rdma_umap_ops = {
 	.fault = rdma_umap_fault,
 };
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
- */
-int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
-		      unsigned long pfn, unsigned long size, pgprot_t prot)
-{
-	struct ib_uverbs_file *ufile = ucontext->ufile;
-	struct rdma_umap_priv *priv;
-
-	if (!(vma->vm_flags & VM_SHARED))
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start != size)
-		return -EINVAL;
-
-	/* Driver is using this wrong, must be called by ib_uverbs_mmap */
-	if (WARN_ON(!vma->vm_file ||
-		    vma->vm_file->private_data != ufile))
-		return -EINVAL;
-	lockdep_assert_held(&ufile->device->disassociate_srcu);
-
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
-		return -ENOMEM;
-
-	vma->vm_page_prot = prot;
-	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
-		kfree(priv);
-		return -EAGAIN;
-	}
-
-	rdma_umap_priv_init(priv, vma);
-	return 0;
-}
-EXPORT_SYMBOL(rdma_user_mmap_io);
-
 void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 {
 	struct rdma_umap_priv *priv, *next_priv;
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
@ 2019-08-27 13:28 ` Michal Kalderon
  2019-08-29 11:35   ` Gal Pressman
  2019-08-27 13:28 ` [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Create some common API's for adding entries to a xa_mmap.
Searching for an entry and freeing one.

Most of the code was copied from the efa driver almost as is, just renamed
function to be generic and not efa specific.
In addition to original code, the xa_mmap entries are now linked
to a umap_priv object and reference counted according to umap operations.
The fact that this code moved to core enabled managing it differently,
so that now entries can be removed and deleted when driver+user are
done with them. This enabled changing the insert algorithm in
comparison to what was done in efa.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/core_priv.h      |  11 +-
 drivers/infiniband/core/device.c         |   1 +
 drivers/infiniband/core/ib_core_uverbs.c | 300 +++++++++++++++++++++++++++++--
 drivers/infiniband/core/rdma_core.c      |   1 +
 drivers/infiniband/core/uverbs_cmd.c     |   1 +
 drivers/infiniband/core/uverbs_main.c    |  23 ++-
 include/rdma/ib_verbs.h                  |  37 ++--
 7 files changed, 338 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 0252da9560f4..20beb5de6996 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -391,9 +391,16 @@ void rdma_nl_net_exit(struct rdma_dev_net *rnet);
 struct rdma_umap_priv {
 	struct vm_area_struct *vma;
 	struct list_head list;
+	struct rdma_user_mmap_entry *entry;
 };
 
-void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma);
+int rdma_umap_priv_init(struct vm_area_struct *vma,
+			struct rdma_user_mmap_entry *entry);
+
+void rdma_umap_priv_delete(struct ib_uverbs_file *ufile,
+			   struct rdma_umap_priv *priv);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 99c4a55545cf..8506844ae132 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2629,6 +2629,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, map_phys_fmr);
 	SET_DEVICE_OP(dev_ops, mmap);
+	SET_DEVICE_OP(dev_ops, mmap_free);
 	SET_DEVICE_OP(dev_ops, modify_ah);
 	SET_DEVICE_OP(dev_ops, modify_cq);
 	SET_DEVICE_OP(dev_ops, modify_device);
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b74d2a2fb342..b91f86a05540 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -8,42 +8,99 @@
 #include "uverbs.h"
 #include "core_priv.h"
 
-/*
- * Each time we map IO memory into user space this keeps track of the mapping.
- * When the device is hot-unplugged we 'zap' the mmaps in user space to point
- * to the zero page and allow the hot unplug to proceed.
+/**
+ * rdma_umap_priv_init() - Initialize the private data of a vma
+ *
+ * @vma: The vm area struct that needs private data
+ * @entry: entry into the mmap_xa that needs to be linked with
+ *       this vma
+ *
+ * Each time we map IO memory into user space this keeps track
+ * of the mapping. When the device is hot-unplugged we 'zap' the
+ * mmaps in user space to point to the zero page and allow the
+ * hot unplug to proceed.
  *
  * This is necessary for cases like PCI physical hot unplug as the actual BAR
  * memory may vanish after this and access to it from userspace could MCE.
  *
  * RDMA drivers supporting disassociation must have their user space designed
  * to cope in some way with their IO pages going to the zero page.
+ *
+ * We extended the umap list usage to track all memory that was mapped by
+ * user space and not only the IO memory. This will occur for drivers that use
+ * the mmap_xa database and helper functions
+ *
+ * Return 0 on success or -ENOMEM if out of memory
  */
-void rdma_umap_priv_init(struct rdma_umap_priv *priv,
-			 struct vm_area_struct *vma)
+int rdma_umap_priv_init(struct vm_area_struct *vma,
+			struct rdma_user_mmap_entry *entry)
 {
 	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
+	struct rdma_umap_priv *priv;
+
+	/* If the xa_mmap is used, private data will already be initialized.
+	 * this is required for the cases that rdma_user_mmap_io is called
+	 * from drivers that don't use the xa_mmap database
+	 */
+	if (vma->vm_private_data)
+		return 0;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
 
 	priv->vma = vma;
+	priv->entry = entry;
 	vma->vm_private_data = priv;
 	/* vm_ops is setup in ib_uverbs_mmap() to avoid module dependencies */
 
 	mutex_lock(&ufile->umap_lock);
 	list_add(&priv->list, &ufile->umaps);
 	mutex_unlock(&ufile->umap_lock);
+
+	return 0;
 }
 EXPORT_SYMBOL(rdma_umap_priv_init);
 
-/*
- * Map IO memory into a process. This is to be called by drivers as part of
- * their mmap() functions if they wish to send something like PCI-E BAR memory
- * to userspace.
+/**
+ * rdma_umap_priv_delete() - Delete an entry from the umaps list
+ *
+ * @ufile: associated user file:
+ * @priv:  private data allocated and stored in
+ *      rdma_umap_priv_init
+ *
+ */
+void rdma_umap_priv_delete(struct ib_uverbs_file *ufile,
+			   struct rdma_umap_priv *priv)
+{
+	mutex_lock(&ufile->umap_lock);
+	list_del(&priv->list);
+	mutex_unlock(&ufile->umap_lock);
+	kfree(priv);
+}
+EXPORT_SYMBOL(rdma_umap_priv_delete);
+
+/**
+ * rdma_user_mmap_io() - Map IO memory into a process.
+ *
+ * @ucontext: associated user context
+ * @vma: the vma related to the current mmap call.
+ * @pfn: pfn to map
+ * @size: size to map
+ * @prot: pgprot to use in remap call
+ *
+ * This is to be called by drivers as part of their mmap()
+ * functions if they wish to send something like PCI-E BAR
+ * memory to userspace.
+ *
+ * Return -EINVAL on wrong flags or size, -EAGAIN on failure to
+ * map. 0 on success.
  */
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot)
 {
 	struct ib_uverbs_file *ufile = ucontext->ufile;
-	struct rdma_umap_priv *priv;
+	int ret;
 
 	if (!(vma->vm_flags & VM_SHARED))
 		return -EINVAL;
@@ -57,17 +114,228 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		return -EINVAL;
 	lockdep_assert_held(&ufile->device->disassociate_srcu);
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
-		return -ENOMEM;
+	ret = rdma_umap_priv_init(vma, NULL);
+	if (ret)
+		return ret;
 
 	vma->vm_page_prot = prot;
 	if (io_remap_pfn_range(vma, vma->vm_start, pfn, size, prot)) {
-		kfree(priv);
+		rdma_umap_priv_delete(ufile, vma->vm_private_data);
 		return -EAGAIN;
 	}
 
-	rdma_umap_priv_init(priv, vma);
 	return 0;
 }
 EXPORT_SYMBOL(rdma_user_mmap_io);
+
+static inline u64
+rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+{
+	return (u64)entry->mmap_page << PAGE_SHIFT;
+}
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @key: the key received from rdma_user_mmap_entry_insert which
+ *     is provided by user as the address to map.
+ * @len: the length the user wants to map.
+ * @vma: the vma related to the current mmap call.
+ *
+ * This function is called when a user tries to mmap a key it
+ * initially received from the driver. The key was created by
+ * the function rdma_user_mmap_entry_insert. The function should
+ * be called only once per mmap. It initializes the vma and
+ * increases the entries ref-count. Once the memory is unmapped
+ * the ref-count will decrease. When the refcount reaches zero
+ * the entry will be deleted.
+ *
+ * Return an entry if exists or NULL if there is no match.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
+			 struct vm_area_struct *vma)
+{
+	struct rdma_user_mmap_entry *entry;
+	u64 mmap_page;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return NULL;
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	if (!entry)
+		return NULL;
+
+	kref_get(&entry->ref);
+	rdma_umap_priv_init(vma, entry);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] returned\n",
+		  key, entry->npages);
+
+	return entry;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_get);
+
+void rdma_user_mmap_entry_free(struct kref *kref)
+{
+	struct rdma_user_mmap_entry *entry =
+		container_of(kref, struct rdma_user_mmap_entry, ref);
+	struct ib_ucontext *ucontext = entry->ucontext;
+	unsigned long i;
+
+	/* need to erase all entries occupied... */
+	for (i = 0; i < entry->npages; i++) {
+		xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
+
+		ibdev_dbg(ucontext->device,
+			  "mmap: key[%#llx] npages[%#x] removed\n",
+			  rdma_user_mmap_get_key(entry),
+			  entry->npages);
+	}
+	if (ucontext->device->ops.mmap_free)
+		ucontext->device->ops.mmap_free(entry);
+}
+
+/**
+ * rdma_user_mmap_entry_put() - drop reference to the mmap entry
+ *
+ * @ucontext: associated user context.
+ * @entry: an entry in the mmap_xa.
+ *
+ * This function is called when the mapping is closed or when
+ * the driver is done with the entry for some other reason.
+ * Should be called after rdma_user_mmap_entry_get was called
+ * and entry is no longer needed. This function will erase the
+ * entry and free it if its refcnt reaches zero.
+ */
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry)
+{
+	kref_put(&entry->ref, rdma_user_mmap_entry_free);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_put);
+
+/**
+ * rdma_user_mmap_entry_remove() - Remove a key's entry from the mmap_xa
+ *
+ * @ucontext: associated user context.
+ * @key: the key to be deleted
+ *
+ * This function will find if there is an entry matching the key and if so
+ * decrease its refcnt, which will in turn delete the entry if
+ * its refcount reaches zero.
+ */
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext, u64 key)
+{
+	struct rdma_user_mmap_entry *entry;
+	u32 mmap_page;
+
+	if (key == RDMA_USER_MMAP_INVALID)
+		return;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return;
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	if (!entry)
+		return;
+
+	rdma_user_mmap_entry_put(ucontext, entry);
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
+
+/**
+ * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @entry: the entry to insert into the mmap_xa
+ * @length: length of the address that will be mmapped
+ *
+ * This function should be called by drivers that use the rdma_user_mmap
+ * interface for handling user mmapped addresses. The database is handled in
+ * the core and helper functions are provided to insert entries into the
+ * database and extract entries when the user call mmap with the given key.
+ * The function returns a unique key that should be provided to user, the user
+ * will use the key to retrieve information such as address to
+ * be mapped and how.
+ *
+ * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
+ */
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				u64 length)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+	XA_STATE(xas, &ucontext->mmap_xa, 0);
+	u32 xa_first, xa_last, npages;
+	int err, i;
+
+	if (!entry)
+		return RDMA_USER_MMAP_INVALID;
+
+	kref_init(&entry->ref);
+	entry->ucontext = ucontext;
+
+	/* We want the whole allocation to be done without interruption
+	 * from a different thread. The allocation requires finding a
+	 * free range and storing. During the xa_insert the lock could be
+	 * released, we don't want another thread taking the gap.
+	 */
+	mutex_lock(&ufile->umap_lock);
+
+	xa_lock(&ucontext->mmap_xa);
+
+	/* We want to find an empty range */
+	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
+	entry->npages = npages;
+	do {
+		/* First find an empty index */
+		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART)
+			goto err_unlock;
+
+		xa_first = xas.xa_index;
+
+		/* Is there enough room to have the range? */
+		if (check_add_overflow(xa_first, npages, &xa_last))
+			goto err_unlock;
+
+		/* Now look for the next present entry. If such doesn't
+		 * exist, we found an empty range and can proceed
+		 */
+		xas_next_entry(&xas, xa_last - 1);
+		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
+			break;
+		/* o/w look for the next free entry */
+	} while (true);
+
+	for (i = xa_first; i < xa_last; i++) {
+		err = __xa_insert(&ucontext->mmap_xa, i, entry, GFP_KERNEL);
+		if (err)
+			goto err_undo;
+	}
+
+	entry->mmap_page = xa_first;
+	xa_unlock(&ucontext->mmap_xa);
+
+	mutex_unlock(&ufile->umap_lock);
+	ibdev_dbg(ucontext->device,
+		  "mmap: key[%#llx] npages[%#x] inserted\n",
+		  rdma_user_mmap_get_key(entry), npages);
+
+	return rdma_user_mmap_get_key(entry);
+
+err_undo:
+	for (; i > xa_first; i--)
+		__xa_erase(&ucontext->mmap_xa, i - 1);
+
+err_unlock:
+	xa_unlock(&ucontext->mmap_xa);
+	mutex_unlock(&ufile->umap_lock);
+	return RDMA_USER_MMAP_INVALID;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..6c72773faf29 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	rdma_restrack_del(&ucontext->res);
 
 	ib_dev->ops.dealloc_ucontext(ucontext);
+	WARN_ON(!xa_empty(&ucontext->mmap_xa));
 	kfree(ucontext);
 
 	ufile->ucontext = NULL;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 8f4fd4fac159..8af0e32df122 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
 
 	mutex_init(&ucontext->per_mm_list_lock);
 	INIT_LIST_HEAD(&ucontext->per_mm_list);
+	xa_init_flags(&ucontext->mmap_xa, XA_FLAGS_ALLOC);
 
 	ret = get_unused_fd_flags(O_CLOEXEC);
 	if (ret < 0)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 180a5e0f70e4..d550db513be8 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -802,7 +802,7 @@ static void rdma_umap_open(struct vm_area_struct *vma)
 {
 	struct ib_uverbs_file *ufile = vma->vm_file->private_data;
 	struct rdma_umap_priv *opriv = vma->vm_private_data;
-	struct rdma_umap_priv *priv;
+	int ret;
 
 	if (!opriv)
 		return;
@@ -816,10 +816,14 @@ static void rdma_umap_open(struct vm_area_struct *vma)
 	if (!ufile->ucontext)
 		goto out_unlock;
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
+	if (opriv->entry)
+		kref_get(&opriv->entry->ref);
+
+	/* We want to re-initialize the private data */
+	vma->vm_private_data = NULL;
+	ret = rdma_umap_priv_init(vma, opriv->entry);
+	if (ret)
 		goto out_unlock;
-	rdma_umap_priv_init(priv, vma);
 
 	up_read(&ufile->hw_destroy_rwsem);
 	return;
@@ -844,15 +848,15 @@ static void rdma_umap_close(struct vm_area_struct *vma)
 	if (!priv)
 		return;
 
+	if (priv->entry)
+		rdma_user_mmap_entry_put(ufile->ucontext, priv->entry);
+
 	/*
 	 * The vma holds a reference on the struct file that created it, which
 	 * in turn means that the ib_uverbs_file is guaranteed to exist at
 	 * this point.
 	 */
-	mutex_lock(&ufile->umap_lock);
-	list_del(&priv->list);
-	mutex_unlock(&ufile->umap_lock);
-	kfree(priv);
+	rdma_umap_priv_delete(ufile, priv);
 }
 
 /*
@@ -917,6 +921,9 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 
 			priv = list_first_entry(&ufile->umaps,
 						struct rdma_umap_priv, list);
+			if (priv->entry)
+				rdma_user_mmap_entry_put(ufile->ucontext,
+							 priv->entry);
 			mm = priv->vma->vm_mm;
 			ret = mmget_not_zero(mm);
 			if (!ret) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index de5bc352f473..030339ed4263 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1474,6 +1474,7 @@ struct ib_ucontext {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+	struct xarray mmap_xa;
 };
 
 struct ib_uobject {
@@ -2254,6 +2255,14 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+#define RDMA_USER_MMAP_INVALID U64_MAX
+struct rdma_user_mmap_entry {
+	struct kref ref;
+	struct ib_ucontext *ucontext;
+	u32 npages;
+	u32 mmap_page;
+};
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2366,6 +2375,13 @@ struct ib_device_ops {
 			      struct ib_udata *udata);
 	void (*dealloc_ucontext)(struct ib_ucontext *context);
 	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
+	/**
+	 * This will be called once refcount of an entry in mmap_xa reaches
+	 * zero. The type of the memory that was mapped may differ between
+	 * entries and is opaque to the rdma_user_mmap interface.
+	 * Therefore needs to be implemented by the driver in mmap_free.
+	 */
+	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
 	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
 	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
 	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
@@ -2792,18 +2808,19 @@ void  ib_set_client_data(struct ib_device *device, struct ib_client *client,
 void ib_set_device_ops(struct ib_device *device,
 		       const struct ib_device_ops *ops);
 
-#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot);
-#else
-static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
-				    struct vm_area_struct *vma,
-				    unsigned long pfn, unsigned long size,
-				    pgprot_t prot)
-{
-	return -EINVAL;
-}
-#endif
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				struct rdma_user_mmap_entry *entry,
+				u64 length);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
+			 struct vm_area_struct *vma);
+
+void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
+			      struct rdma_user_mmap_entry *entry);
+
+void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext, u64 key);
 
 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
 {
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
@ 2019-08-27 13:28 ` Michal Kalderon
  2019-08-29 14:20   ` Gal Pressman
  2019-08-27 13:28 ` [PATCH v8 rdma-next 4/7] RDMA/siw: " Michal Kalderon
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Remove the functions related to managing the mmap_xa database.
This code was copied to the ib_core. Use the common API's instead.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/efa/efa.h       |  18 +-
 drivers/infiniband/hw/efa/efa_main.c  |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c | 339 +++++++++++++++++-----------------
 3 files changed, 185 insertions(+), 173 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 2283e432693e..148ff5b86746 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -71,8 +71,6 @@ struct efa_dev {
 
 struct efa_ucontext {
 	struct ib_ucontext ibucontext;
-	struct xarray mmap_xa;
-	u32 mmap_xa_page;
 	u16 uarn;
 };
 
@@ -91,6 +89,7 @@ struct efa_cq {
 	struct efa_ucontext *ucontext;
 	dma_addr_t dma_addr;
 	void *cpu_addr;
+	u64 mmap_key;
 	size_t size;
 	u16 cq_idx;
 };
@@ -101,6 +100,13 @@ struct efa_qp {
 	void *rq_cpu_addr;
 	size_t rq_size;
 	enum ib_qp_state state;
+
+	/* Used for saving mmap_xa entries */
+	u64 sq_db_mmap_key;
+	u64 llq_desc_mmap_key;
+	u64 rq_db_mmap_key;
+	u64 rq_mmap_key;
+
 	u32 qp_handle;
 	u32 max_send_wr;
 	u32 max_recv_wr;
@@ -116,6 +122,13 @@ struct efa_ah {
 	u8 id[EFA_GID_SIZE];
 };
 
+struct efa_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	u64 length;
+	u8 mmap_flag;
+};
+
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata);
@@ -147,6 +160,7 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata);
 void efa_dealloc_ucontext(struct ib_ucontext *ibucontext);
 int efa_mmap(struct ib_ucontext *ibucontext,
 	     struct vm_area_struct *vma);
+void efa_mmap_free(struct rdma_user_mmap_entry *entry);
 int efa_create_ah(struct ib_ah *ibah,
 		  struct rdma_ah_attr *ah_attr,
 		  u32 flags,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 83858f7e83d0..0e3050d01b75 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -217,6 +217,7 @@ static const struct ib_device_ops efa_dev_ops = {
 	.get_link_layer = efa_port_link_layer,
 	.get_port_immutable = efa_get_port_immutable,
 	.mmap = efa_mmap,
+	.mmap_free = efa_mmap_free,
 	.modify_qp = efa_modify_qp,
 	.query_device = efa_query_device,
 	.query_gid = efa_query_gid,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 70851bd7f801..dde276acbd00 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -13,10 +13,6 @@
 
 #include "efa.h"
 
-#define EFA_MMAP_FLAG_SHIFT 56
-#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
-#define EFA_MMAP_INVALID U64_MAX
-
 enum {
 	EFA_MMAP_DMA_PAGE = 0,
 	EFA_MMAP_IO_WC,
@@ -27,20 +23,6 @@ enum {
 	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
 	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
 
-struct efa_mmap_entry {
-	void  *obj;
-	u64 address;
-	u64 length;
-	u32 mmap_page;
-	u8 mmap_flag;
-};
-
-static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
-{
-	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
-	       ((u64)efa->mmap_page << PAGE_SHIFT);
-}
-
 #define EFA_DEFINE_STATS(op) \
 	op(EFA_TX_BYTES, "tx_bytes") \
 	op(EFA_TX_PKTS, "tx_pkts") \
@@ -147,6 +129,12 @@ static inline struct efa_ah *to_eah(struct ib_ah *ibah)
 	return container_of(ibah, struct efa_ah, ibah);
 }
 
+static inline struct efa_user_mmap_entry *
+to_emmap(struct rdma_user_mmap_entry *rdma_mmap)
+{
+	return container_of(rdma_mmap, struct efa_user_mmap_entry, rdma_entry);
+}
+
 #define field_avail(x, fld, sz) (offsetof(typeof(x), fld) + \
 				 sizeof(((typeof(x) *)0)->fld) <= (sz))
 
@@ -172,106 +160,6 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
 	return addr;
 }
 
-/*
- * This is only called when the ucontext is destroyed and there can be no
- * concurrent query via mmap or allocate on the xarray, thus we can be sure no
- * other thread is using the entry pointer. We also know that all the BAR
- * pages have either been zap'd or munmaped at this point.  Normal pages are
- * refcounted and will be freed at the proper time.
- */
-static void mmap_entries_remove_free(struct efa_dev *dev,
-				     struct efa_ucontext *ucontext)
-{
-	struct efa_mmap_entry *entry;
-	unsigned long mmap_page;
-
-	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
-		xa_erase(&ucontext->mmap_xa, mmap_page);
-
-		ibdev_dbg(
-			&dev->ibdev,
-			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-			entry->obj, get_mmap_key(entry), entry->address,
-			entry->length);
-		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
-			/* DMA mapping is already gone, now free the pages */
-			free_pages_exact(phys_to_virt(entry->address),
-					 entry->length);
-		kfree(entry);
-	}
-}
-
-static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
-					     struct efa_ucontext *ucontext,
-					     u64 key, u64 len)
-{
-	struct efa_mmap_entry *entry;
-	u64 mmap_page;
-
-	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
-	if (mmap_page > U32_MAX)
-		return NULL;
-
-	entry = xa_load(&ucontext->mmap_xa, mmap_page);
-	if (!entry || get_mmap_key(entry) != key || entry->length != len)
-		return NULL;
-
-	ibdev_dbg(&dev->ibdev,
-		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-		  entry->obj, key, entry->address, entry->length);
-
-	return entry;
-}
-
-/*
- * Note this locking scheme cannot support removal of entries, except during
- * ucontext destruction when the core code guarentees no concurrency.
- */
-static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
-			     void *obj, u64 address, u64 length, u8 mmap_flag)
-{
-	struct efa_mmap_entry *entry;
-	u32 next_mmap_page;
-	int err;
-
-	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-	if (!entry)
-		return EFA_MMAP_INVALID;
-
-	entry->obj = obj;
-	entry->address = address;
-	entry->length = length;
-	entry->mmap_flag = mmap_flag;
-
-	xa_lock(&ucontext->mmap_xa);
-	if (check_add_overflow(ucontext->mmap_xa_page,
-			       (u32)(length >> PAGE_SHIFT),
-			       &next_mmap_page))
-		goto err_unlock;
-
-	entry->mmap_page = ucontext->mmap_xa_page;
-	ucontext->mmap_xa_page = next_mmap_page;
-	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
-			  GFP_KERNEL);
-	if (err)
-		goto err_unlock;
-
-	xa_unlock(&ucontext->mmap_xa);
-
-	ibdev_dbg(
-		&dev->ibdev,
-		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
-		entry->obj, entry->address, entry->length, get_mmap_key(entry));
-
-	return get_mmap_key(entry);
-
-err_unlock:
-	xa_unlock(&ucontext->mmap_xa);
-	kfree(entry);
-	return EFA_MMAP_INVALID;
-
-}
-
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata)
@@ -485,13 +373,28 @@ static int efa_destroy_qp_handle(struct efa_dev *dev, u32 qp_handle)
 	return efa_com_destroy_qp(&dev->edev, &params);
 }
 
+static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *ucontext,
+					    struct efa_qp *qp)
+{
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    qp->llq_desc_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_mmap_key);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);
+}
+
 int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata,
+		struct efa_ucontext, ibucontext);
 	struct efa_dev *dev = to_edev(ibqp->pd->device);
 	struct efa_qp *qp = to_eqp(ibqp);
 	int err;
 
 	ibdev_dbg(&dev->ibdev, "Destroy qp[%u]\n", ibqp->qp_num);
+
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
+
 	err = efa_destroy_qp_handle(dev, qp->qp_handle);
 	if (err)
 		return err;
@@ -509,57 +412,114 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	return 0;
 }
 
+static int efa_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				      u64 address, u64 length,
+				      u8 mmap_flag, u64 *key)
+{
+	struct efa_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->address = address;
+	entry->length = length;
+	entry->mmap_flag = mmap_flag;
+
+	*key = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
+					   length);
+	if (*key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void efa_qp_init_keys(struct efa_qp *qp)
+{
+	qp->sq_db_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->llq_desc_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_db_mmap_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_mmap_key = RDMA_USER_MMAP_INVALID;
+}
+
 static int qp_mmap_entries_setup(struct efa_qp *qp,
 				 struct efa_dev *dev,
 				 struct efa_ucontext *ucontext,
 				 struct efa_com_create_qp_params *params,
 				 struct efa_ibv_create_qp_resp *resp)
 {
-	/*
-	 * Once an entry is inserted it might be mmapped, hence cannot be
-	 * cleaned up until dealloc_ucontext.
-	 */
-	resp->sq_db_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->db_bar_addr + resp->sq_db_offset,
-				  PAGE_SIZE, EFA_MMAP_IO_NC);
-	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+	u64 address;
+	u64 length;
+	int err;
+
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 dev->db_bar_addr +
+					 resp->sq_db_offset,
+					 PAGE_SIZE, EFA_MMAP_IO_NC,
+					 &qp->sq_db_mmap_key);
+	if (err)
+		return err;
 
+	resp->sq_db_mmap_key = qp->sq_db_mmap_key;
 	resp->sq_db_offset &= ~PAGE_MASK;
 
-	resp->llq_desc_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->mem_bar_addr + resp->llq_desc_offset,
-				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
-					     (resp->llq_desc_offset & ~PAGE_MASK)),
-				  EFA_MMAP_IO_WC);
-	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+	address = dev->mem_bar_addr + resp->llq_desc_offset;
+	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
+			    (resp->llq_desc_offset & ~PAGE_MASK));
+
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 address,
+					 length,
+					 EFA_MMAP_IO_WC,
+					 &qp->llq_desc_mmap_key);
+	if (err)
+		goto err1;
 
+	resp->llq_desc_mmap_key = qp->llq_desc_mmap_key;
 	resp->llq_desc_offset &= ~PAGE_MASK;
 
 	if (qp->rq_size) {
-		resp->rq_db_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  dev->db_bar_addr + resp->rq_db_offset,
-					  PAGE_SIZE, EFA_MMAP_IO_NC);
-		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = dev->db_bar_addr + resp->rq_db_offset;
 
+		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						 address, PAGE_SIZE,
+						 EFA_MMAP_IO_NC,
+						 &qp->rq_db_mmap_key);
+		if (err)
+			goto err2;
+
+		resp->rq_db_mmap_key = qp->rq_db_mmap_key;
 		resp->rq_db_offset &= ~PAGE_MASK;
 
-		resp->rq_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  virt_to_phys(qp->rq_cpu_addr),
-					  qp->rq_size, EFA_MMAP_DMA_PAGE);
-		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
-			return -ENOMEM;
+		address = virt_to_phys(qp->rq_cpu_addr);
+		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+						 address, qp->rq_size,
+						 EFA_MMAP_DMA_PAGE,
+						 &qp->rq_mmap_key);
+		if (err)
+			goto err3;
 
+		resp->rq_mmap_key = qp->rq_mmap_key;
 		resp->rq_mmap_size = qp->rq_size;
 	}
 
 	return 0;
+
+err3:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);
+
+err2:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    qp->llq_desc_mmap_key);
+
+err1:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);
+
+	/* If any error occurred, we init the keys back to invalid */
+	efa_qp_init_keys(qp);
+
+	return err;
 }
 
 static int efa_qp_validate_cap(struct efa_dev *dev,
@@ -634,7 +594,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	struct efa_dev *dev = to_edev(ibpd->device);
 	struct efa_ibv_create_qp_resp resp = {};
 	struct efa_ibv_create_qp cmd = {};
-	bool rq_entry_inserted = false;
 	struct efa_ucontext *ucontext;
 	struct efa_qp *qp;
 	int err;
@@ -687,6 +646,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 		goto err_out;
 	}
 
+	efa_qp_init_keys(qp);
 	create_qp_params.uarn = ucontext->uarn;
 	create_qp_params.pd = to_epd(ibpd)->pdn;
 
@@ -742,7 +702,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 	if (err)
 		goto err_destroy_qp;
 
-	rq_entry_inserted = true;
 	qp->qp_handle = create_qp_resp.qp_handle;
 	qp->ibqp.qp_num = create_qp_resp.qp_num;
 	qp->ibqp.qp_type = init_attr->qp_type;
@@ -759,7 +718,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 			ibdev_dbg(&dev->ibdev,
 				  "Failed to copy udata for qp[%u]\n",
 				  create_qp_resp.qp_num);
-			goto err_destroy_qp;
+			goto err_remove_mmap_entries;
 		}
 	}
 
@@ -767,15 +726,17 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 
 	return &qp->ibqp;
 
+err_remove_mmap_entries:
+	efa_qp_user_mmap_entries_remove(ucontext, qp);
 err_destroy_qp:
 	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
 err_free_mapped:
-	if (qp->rq_size) {
+	if (qp->rq_dma_addr)
 		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
 				 DMA_TO_DEVICE);
-		if (!rq_entry_inserted)
-			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
-	}
+
+	if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
+		free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
 err_free_qp:
 	kfree(qp);
 err_out:
@@ -887,6 +848,7 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx)
 
 void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct efa_ucontext *ucontext;
 	struct efa_dev *dev = to_edev(ibcq->device);
 	struct efa_cq *cq = to_ecq(ibcq);
 
@@ -894,20 +856,33 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 		  "Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
 		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
 
+	ucontext = rdma_udata_to_drv_context(udata, struct efa_ucontext,
+					     ibucontext);
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
+				    cq->mmap_key);
 }
 
 static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 				 struct efa_ibv_create_cq_resp *resp)
 {
+	struct efa_ucontext *ucontext = cq->ucontext;
+	int err;
+
 	resp->q_mmap_size = cq->size;
-	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
-					     virt_to_phys(cq->cpu_addr),
-					     cq->size, EFA_MMAP_DMA_PAGE);
-	if (resp->q_mmap_key == EFA_MMAP_INVALID)
-		return -ENOMEM;
+
+	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
+					 virt_to_phys(cq->cpu_addr),
+					 cq->size, EFA_MMAP_DMA_PAGE,
+					 &cq->mmap_key);
+	if (err) {
+		cq->mmap_key = RDMA_USER_MMAP_INVALID;
+		return err;
+	}
+
+	resp->q_mmap_key = cq->mmap_key;
 
 	return 0;
 }
@@ -924,7 +899,6 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct efa_dev *dev = to_edev(ibdev);
 	struct efa_ibv_create_cq cmd = {};
 	struct efa_cq *cq = to_ecq(ibcq);
-	bool cq_entry_inserted = false;
 	int entries = attr->cqe;
 	int err;
 
@@ -1013,15 +987,13 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		goto err_destroy_cq;
 	}
 
-	cq_entry_inserted = true;
-
 	if (udata->outlen) {
 		err = ib_copy_to_udata(udata, &resp,
 				       min(sizeof(resp), udata->outlen));
 		if (err) {
 			ibdev_dbg(ibdev,
 				  "Failed to copy udata for create_cq\n");
-			goto err_destroy_cq;
+			goto err_remove_mmap;
 		}
 	}
 
@@ -1030,13 +1002,16 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
+err_remove_mmap:
+	rdma_user_mmap_entry_remove(&ucontext->ibucontext, cq->mmap_key);
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
 	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
 			 DMA_FROM_DEVICE);
-	if (!cq_entry_inserted)
+	if (cq->mmap_key == RDMA_USER_MMAP_INVALID)
 		free_pages_exact(cq->cpu_addr, cq->size);
+
 err_out:
 	atomic64_inc(&dev->stats.sw_stats.create_cq_err);
 	return err;
@@ -1558,7 +1533,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
 		goto err_out;
 
 	ucontext->uarn = result.uarn;
-	xa_init(&ucontext->mmap_xa);
 
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
@@ -1587,28 +1561,47 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
 
-	mmap_entries_remove_free(dev, ucontext);
 	efa_dealloc_uar(dev, ucontext->uarn);
 }
 
+void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
+{
+	struct efa_user_mmap_entry *entry = to_emmap(rdma_entry);
+
+	/* DMA mapping is already gone, now free the pages */
+	if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
+		free_pages_exact(phys_to_virt(entry->address),
+				 entry->length);
+	kfree(entry);
+}
+
 static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 		      struct vm_area_struct *vma, u64 key, u64 length)
 {
-	struct efa_mmap_entry *entry;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct efa_user_mmap_entry *entry;
 	unsigned long va;
 	u64 pfn;
 	int err;
 
-	entry = mmap_entry_get(dev, ucontext, key, length);
-	if (!entry) {
+	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
+					      length, vma);
+	if (!rdma_entry) {
 		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
 			  key);
 		return -EINVAL;
 	}
+	entry = to_emmap(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(&dev->ibdev,
+			  "key[%#llx] does not have valid length[%#llx] expected[%#llx]\n",
+			  key, length, entry->length);
+		return -EINVAL;
+	}
 
 	ibdev_dbg(&dev->ibdev,
 		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
-		  entry->address, length, entry->mmap_flag);
+		  entry->address, entry->length, entry->mmap_flag);
 
 	pfn = entry->address >> PAGE_SHIFT;
 	switch (entry->mmap_flag) {
@@ -1637,6 +1630,10 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 			&dev->ibdev,
 			"Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
 			entry->address, length, entry->mmap_flag, err);
+
+		rdma_user_mmap_entry_put(&ucontext->ibucontext,
+					 rdma_entry);
+
 		return err;
 	}
 
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (2 preceding siblings ...)
  2019-08-27 13:28 ` [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-08-27 13:28 ` " Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Remove the functions related to managing the mmap_xa database.
This code is now common in ib_core. Use the common API's instead.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/sw/siw/siw.h       |  20 ++-
 drivers/infiniband/sw/siw/siw_main.c  |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c | 223 +++++++++++++++++++---------------
 drivers/infiniband/sw/siw/siw_verbs.h |   1 +
 4 files changed, 144 insertions(+), 101 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 77b1aabf6ff3..d48cd42ae43e 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -220,7 +220,7 @@ struct siw_cq {
 	u32 cq_get;
 	u32 num_cqe;
 	bool kernel_verbs;
-	u32 xa_cq_index; /* mmap information for CQE array */
+	u64 cq_key; /* mmap information for CQE array */
 	u32 id; /* For debugging only */
 };
 
@@ -263,7 +263,7 @@ struct siw_srq {
 	u32 rq_put;
 	u32 rq_get;
 	u32 num_rqe; /* max # of wqe's allowed */
-	u32 xa_srq_index; /* mmap information for SRQ array */
+	u64 srq_key; /* mmap information for SRQ array */
 	char armed; /* inform user if limit hit */
 	char kernel_verbs; /* '1' if kernel client */
 };
@@ -477,8 +477,8 @@ struct siw_qp {
 		u8 layer : 4, etype : 4;
 		u8 ecode;
 	} term_info;
-	u32 xa_sq_index; /* mmap information for SQE array */
-	u32 xa_rq_index; /* mmap information for RQE array */
+	u64 sq_key; /* mmap information for SQE array */
+	u64 rq_key; /* mmap information for RQE array */
 	struct rcu_head rcu;
 };
 
@@ -503,6 +503,12 @@ struct iwarp_msg_info {
 	int (*rx_data)(struct siw_qp *qp);
 };
 
+struct siw_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	u64 length;
+};
+
 /* Global siw parameters. Currently set in siw_main.c */
 extern const bool zcopy_tx;
 extern const bool try_gso;
@@ -607,6 +613,12 @@ static inline struct siw_mr *to_siw_mr(struct ib_mr *base_mr)
 	return container_of(base_mr, struct siw_mr, base_mr);
 }
 
+static inline struct siw_user_mmap_entry *
+to_siw_mmap_entry(struct rdma_user_mmap_entry *rdma_mmap)
+{
+	return container_of(rdma_mmap, struct siw_user_mmap_entry, rdma_entry);
+}
+
 static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev, int id)
 {
 	struct siw_qp *qp;
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 05a92f997f60..46c0bb59489d 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -298,6 +298,7 @@ static const struct ib_device_ops siw_device_ops = {
 	.iw_rem_ref = siw_qp_put_ref,
 	.map_mr_sg = siw_map_mr_sg,
 	.mmap = siw_mmap,
+	.mmap_free = siw_mmap_free,
 	.modify_qp = siw_verbs_modify_qp,
 	.modify_srq = siw_modify_srq,
 	.poll_cq = siw_poll_cq,
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 03176a3d1e18..9e049241051e 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -34,44 +34,21 @@ static char ib_qp_state_to_string[IB_QPS_ERR + 1][sizeof("RESET")] = {
 	[IB_QPS_ERR] = "ERR"
 };
 
-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr, u32 size)
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct siw_uobj *uobj;
-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
-	u32 key;
+	struct siw_user_mmap_entry *entry = to_siw_mmap_entry(rdma_entry);
 
-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
-	if (!uobj)
-		return SIW_INVAL_UOBJ_KEY;
-
-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit, &uctx->uobj_nextkey,
-			    GFP_KERNEL) < 0) {
-		kfree(uobj);
-		return SIW_INVAL_UOBJ_KEY;
-	}
-	uobj->size = PAGE_ALIGN(size);
-	uobj->addr = vaddr;
-
-	return key;
-}
-
-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
-				     unsigned long off, u32 size)
-{
-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
-
-	if (uobj && uobj->size == size)
-		return uobj;
-
-	return NULL;
+	vfree((void *)entry->address);
+	kfree(entry);
 }
 
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
-	struct siw_uobj *uobj;
-	unsigned long off = vma->vm_pgoff;
+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+	struct rdma_user_mmap_entry *rdma_entry;
 	int size = vma->vm_end - vma->vm_start;
+	struct siw_user_mmap_entry *entry;
 	int rv = -EINVAL;
 
 	/*
@@ -81,15 +58,26 @@ int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
 		pr_warn("siw: mmap not page aligned\n");
 		goto out;
 	}
-	uobj = siw_get_uobj(uctx, off, size);
-	if (!uobj) {
+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
+					      size, vma);
+	if (!rdma_entry) {
 		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %u\n",
 			off, size);
 		goto out;
 	}
-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
-	if (rv)
+	entry = to_siw_mmap_entry(rdma_entry);
+	if (entry->length != size) {
+		siw_dbg(&uctx->sdev->base_dev,
+			"key[%#lx] does not have valid length[%#x] expected[%#llx]\n",
+			off, size, entry->length);
+		return -EINVAL;
+	}
+
+	rv = remap_vmalloc_range(vma, (void *)entry->address, 0);
+	if (rv) {
 		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off, size);
+		rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
+	}
 out:
 	return rv;
 }
@@ -105,7 +93,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
+
 	ctx->uobj_nextkey = 0;
 	ctx->sdev = sdev;
 
@@ -135,19 +123,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx, struct ib_udata *udata)
 void siw_dealloc_ucontext(struct ib_ucontext *base_ctx)
 {
 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
-	void *entry;
-	unsigned long index;
 
-	/*
-	 * Make sure all user mmap objects are gone. Since QP, CQ
-	 * and SRQ destroy routines destroy related objects, nothing
-	 * should be found here.
-	 */
-	xa_for_each(&uctx->xa, index, entry) {
-		kfree(xa_erase(&uctx->xa, index));
-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
-	}
-	xa_destroy(&uctx->xa);
 	atomic_dec(&uctx->sdev->num_ctx);
 }
 
@@ -293,6 +269,28 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
 	siw_qp_put(to_siw_qp(base_qp));
 }
 
+static int siw_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+				      u64 address, u64 length,
+				      u64 *key)
+{
+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->address = address;
+	entry->length = length;
+
+	*key = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
+					   length);
+	if (*key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /*
  * siw_create_qp()
  *
@@ -317,6 +315,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	struct siw_cq *scq = NULL, *rcq = NULL;
 	unsigned long flags;
 	int num_sqe, num_rqe, rv = 0;
+	u64 length;
+	u64 key;
 
 	siw_dbg(base_dev, "create new QP\n");
 
@@ -380,8 +380,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 	spin_lock_init(&qp->orq_lock);
 
 	qp->kernel_verbs = !udata;
-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
+	qp->sq_key = RDMA_USER_MMAP_INVALID;
+	qp->rq_key = RDMA_USER_MMAP_INVALID;
 
 	rv = siw_qp_add(sdev, qp);
 	if (rv)
@@ -459,26 +459,41 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 		uresp.qp_id = qp_id(qp);
 
 		if (qp->sendq) {
-			qp->xa_sq_index =
-				siw_create_uobj(uctx, qp->sendq,
-					num_sqe * sizeof(struct siw_sqe));
+			length = num_sqe * sizeof(struct siw_sqe);
+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
+							(uintptr_t)qp->sendq,
+							length, &key);
+			if (!rv)
+				goto err_out_xa;
+
+			/* If entry was inserted successfully, qp->sendq
+			 * will be freed by siw_mmap_free
+			 */
+			qp->sendq = NULL;
+			qp->sq_key = key;
 		}
+
 		if (qp->recvq) {
-			qp->xa_rq_index =
-				 siw_create_uobj(uctx, qp->recvq,
-					num_rqe * sizeof(struct siw_rqe));
-		}
-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
-			goto err_out_xa;
+			length = num_rqe * sizeof(struct siw_rqe);
+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
+							(uintptr_t)qp->recvq,
+							length, &key);
+			if (!rv)
+				goto err_out_mmap_rem;
+
+			/* If entry was inserted successfully, qp->recvq will
+			 * be freed by siw_mmap_free
+			 */
+			qp->recvq = NULL;
+			qp->rq_key = key;
 		}
-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
+
+		uresp.sq_key = qp->sq_key;
+		uresp.rq_key = qp->rq_key;
 
 		if (udata->outlen < sizeof(uresp)) {
 			rv = -EINVAL;
-			goto err_out_xa;
+			goto err_out_mmap_rem;
 		}
 		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 		if (rv)
@@ -496,21 +511,23 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
 
 	return qp->ib_qp;
 
+err_out_mmap_rem:
+	if (udata) {
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_key);
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_key);
+	}
+
 err_out_xa:
 	xa_erase(&sdev->qp_xa, qp_id(qp));
 err_out:
 	kfree(siw_base_qp);
 
 	if (qp) {
-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
-
 		vfree(qp->sendq);
 		vfree(qp->recvq);
 		kfree(qp);
 	}
+
 	atomic_dec(&sdev->num_qp);
 
 	return ERR_PTR(rv);
@@ -620,10 +637,10 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
 	qp->rx_stream.rx_suspend = 1;
 
-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
+	if (uctx) {
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_key);
+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_key);
+	}
 
 	down_write(&qp->state_lock);
 
@@ -993,8 +1010,8 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
 
 	siw_cq_flush(cq);
 
-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_key);
 
 	atomic_dec(&sdev->num_cq);
 
@@ -1031,7 +1048,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 	size = roundup_pow_of_two(size);
 	cq->base_cq.cqe = size;
 	cq->num_cqe = size;
-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
+	cq->cq_key = RDMA_USER_MMAP_INVALID;
 
 	if (!udata) {
 		cq->kernel_verbs = 1;
@@ -1057,16 +1074,21 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
+		u64 length = size * sizeof(struct siw_cqe) +
+			     sizeof(struct siw_cq_ctrl);
 
-		cq->xa_cq_index =
-			siw_create_uobj(ctx, cq->queue,
-					size * sizeof(struct siw_cqe) +
-						sizeof(struct siw_cq_ctrl));
-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
+						(uintptr_t)cq->queue,
+						length, &cq->cq_key);
+		if (!rv)
 			goto err_out;
-		}
-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
+
+		/* If entry was inserted successfully, cq->queue will be freed
+		 * by siw_mmap_free
+		 */
+		cq->queue = NULL;
+
+		uresp.cq_key = cq->cq_key;
 		uresp.cq_id = cq->id;
 		uresp.num_cqe = size;
 
@@ -1087,8 +1109,9 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		struct siw_ucontext *ctx =
 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
 						  base_ucontext);
-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
+		if (udata)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    cq->cq_key);
 		vfree(cq->queue);
 	}
 	atomic_dec(&sdev->num_cq);
@@ -1490,7 +1513,7 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	srq->max_sge = attrs->max_sge;
 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
+	srq->srq_key = RDMA_USER_MMAP_INVALID;
 	srq->limit = attrs->srq_limit;
 	if (srq->limit)
 		srq->armed = 1;
@@ -1509,15 +1532,19 @@ int siw_create_srq(struct ib_srq *base_srq,
 	}
 	if (udata) {
 		struct siw_uresp_create_srq uresp = {};
+		u64 length = srq->num_rqe * sizeof(struct siw_rqe);
 
-		srq->xa_srq_index = siw_create_uobj(
-			ctx, srq->recvq, srq->num_rqe * sizeof(struct siw_rqe));
-
-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
-			rv = -ENOMEM;
+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
+						(uintptr_t)srq->recvq,
+						length, &srq->srq_key);
+		if (!rv)
 			goto err_out;
-		}
-		uresp.srq_key = srq->xa_srq_index;
+
+		/* If entry was inserted successfully, srq->recvq will be freed
+		 * by siw_mmap_free
+		 */
+		srq->recvq = NULL;
+		uresp.srq_key = srq->srq_key;
 		uresp.num_rqe = srq->num_rqe;
 
 		if (udata->outlen < sizeof(uresp)) {
@@ -1536,8 +1563,9 @@ int siw_create_srq(struct ib_srq *base_srq,
 
 err_out:
 	if (srq->recvq) {
-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
+		if (udata)
+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+						    srq->srq_key);
 		vfree(srq->recvq);
 	}
 	atomic_dec(&sdev->num_srq);
@@ -1623,8 +1651,9 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata)
 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
 					  base_ucontext);
 
-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
+	if (ctx)
+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
+					    srq->srq_key);
 
 	vfree(srq->recvq);
 	atomic_dec(&sdev->num_srq);
diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
index 1910869281cb..1a731989fad6 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.h
+++ b/drivers/infiniband/sw/siw/siw_verbs.h
@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata);
 int siw_post_srq_recv(struct ib_srq *base_srq, const struct ib_recv_wr *wr,
 		      const struct ib_recv_wr **bad_wr);
 int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma);
+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
 void siw_cq_event(struct siw_cq *cq, enum ib_event_type type);
 void siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 5/7] RDMA/qedr: Use the common mmap API
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (3 preceding siblings ...)
  2019-08-27 13:28 ` [PATCH v8 rdma-next 4/7] RDMA/siw: " Michal Kalderon
@ 2019-08-27 13:28 ` Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Remove all function related to mmap from qedr and use the common
API

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/main.c  |   1 +
 drivers/infiniband/hw/qedr/qedr.h  |  28 +++---
 drivers/infiniband/hw/qedr/verbs.c | 191 +++++++++++++++++--------------------
 drivers/infiniband/hw/qedr/verbs.h |   3 +-
 4 files changed, 106 insertions(+), 117 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 5136b835e1ba..aab269602284 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -212,6 +212,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.get_link_layer = qedr_link_layer,
 	.map_mr_sg = qedr_map_mr_sg,
 	.mmap = qedr_mmap,
+	.mmap_free = qedr_mmap_free,
 	.modify_port = qedr_modify_port,
 	.modify_qp = qedr_modify_qp,
 	.modify_srq = qedr_modify_srq,
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 0cfd849b13d6..31769e203372 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -230,14 +230,10 @@ struct qedr_ucontext {
 	struct qedr_dev *dev;
 	struct qedr_pd *pd;
 	void __iomem *dpi_addr;
+	u64 db_key;
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
-
-	struct list_head mm_head;
-
-	/* Lock to protect mm list */
-	struct mutex mm_list_lock;
 };
 
 union db_prod64 {
@@ -300,14 +296,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-struct qedr_mm {
-	struct {
-		u64 phy_addr;
-		unsigned long len;
-	} key;
-	struct list_head entry;
-};
-
 union db_prod32 {
 	struct rdma_pwm_val16_data data;
 	u32 raw;
@@ -476,6 +464,13 @@ struct qedr_mr {
 	u32 npages;
 };
 
+struct qedr_user_mmap_entry {
+	struct rdma_user_mmap_entry rdma_entry;
+	u64 address;
+	u64 length;
+	u8 mmap_flag;
+};
+
 #define SET_FIELD2(value, name, flag) ((value) |= ((flag) << (name ## _SHIFT)))
 
 #define QEDR_RESP_IMM	(RDMA_CQE_RESPONDER_IMM_FLG_MASK << \
@@ -574,4 +569,11 @@ static inline struct qedr_srq *get_qedr_srq(struct ib_srq *ibsrq)
 {
 	return container_of(ibsrq, struct qedr_srq, ibsrq);
 }
+
+static inline struct qedr_user_mmap_entry *
+get_qedr_mmap_entry(struct rdma_user_mmap_entry *rdma_entry)
+{
+	return container_of(rdma_entry, struct qedr_user_mmap_entry,
+			    rdma_entry);
+}
 #endif
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 6f3ce86019b7..205c0b0c3244 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -58,6 +58,10 @@
 
 #define DB_ADDR_SHIFT(addr)		((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
 
+enum {
+	QEDR_USER_MMAP_IO_WC = 0,
+};
+
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
 					size_t len)
 {
@@ -256,60 +260,6 @@ int qedr_modify_port(struct ib_device *ibdev, u8 port, int mask,
 	return 0;
 }
 
-static int qedr_add_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			 unsigned long len)
-{
-	struct qedr_mm *mm;
-
-	mm = kzalloc(sizeof(*mm), GFP_KERNEL);
-	if (!mm)
-		return -ENOMEM;
-
-	mm->key.phy_addr = phy_addr;
-	/* This function might be called with a length which is not a multiple
-	 * of PAGE_SIZE, while the mapping is PAGE_SIZE grained and the kernel
-	 * forces this granularity by increasing the requested size if needed.
-	 * When qedr_mmap is called, it will search the list with the updated
-	 * length as a key. To prevent search failures, the length is rounded up
-	 * in advance to PAGE_SIZE.
-	 */
-	mm->key.len = roundup(len, PAGE_SIZE);
-	INIT_LIST_HEAD(&mm->entry);
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_add(&mm->entry, &uctx->mm_head);
-	mutex_unlock(&uctx->mm_list_lock);
-
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "added (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-		 (unsigned long long)mm->key.phy_addr,
-		 (unsigned long)mm->key.len, uctx);
-
-	return 0;
-}
-
-static bool qedr_search_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			     unsigned long len)
-{
-	bool found = false;
-	struct qedr_mm *mm;
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_for_each_entry(mm, &uctx->mm_head, entry) {
-		if (len != mm->key.len || phy_addr != mm->key.phy_addr)
-			continue;
-
-		found = true;
-		break;
-	}
-	mutex_unlock(&uctx->mm_list_lock);
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "searched for (addr=0x%llx,len=0x%lx) for ctx=%p, result=%d\n",
-		 mm->key.phy_addr, mm->key.len, uctx, found);
-
-	return found;
-}
-
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 {
 	struct ib_device *ibdev = uctx->device;
@@ -318,6 +268,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	struct qedr_alloc_ucontext_resp uresp = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
+	struct qedr_user_mmap_entry *entry;
 
 	if (!udata)
 		return -EFAULT;
@@ -334,13 +285,27 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	ctx->dpi_addr = oparams.dpi_addr;
 	ctx->dpi_phys_addr = oparams.dpi_phys_addr;
 	ctx->dpi_size = oparams.dpi_size;
-	INIT_LIST_HEAD(&ctx->mm_head);
-	mutex_init(&ctx->mm_list_lock);
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	entry->address = ctx->dpi_phys_addr;
+	entry->length = ctx->dpi_size;
+	entry->mmap_flag = QEDR_USER_MMAP_IO_WC;
+	ctx->db_key = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry,
+						  ctx->dpi_size);
+	if (ctx->db_key == RDMA_USER_MMAP_INVALID) {
+		rc = -ENOMEM;
+		kfree(entry);
+		goto err;
+	}
 
 	uresp.dpm_enabled = dev->user_dpm_enabled;
 	uresp.wids_enabled = 1;
 	uresp.wid_count = oparams.wid_count;
-	uresp.db_pa = ctx->dpi_phys_addr;
+	uresp.db_pa = ctx->db_key;
 	uresp.db_size = ctx->dpi_size;
 	uresp.max_send_wr = dev->attr.max_sqe;
 	uresp.max_recv_wr = dev->attr.max_rqe;
@@ -352,82 +317,102 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
-		return rc;
+		goto err1;
 
 	ctx->dev = dev;
 
-	rc = qedr_add_mmap(ctx, ctx->dpi_phys_addr, ctx->dpi_size);
-	if (rc)
-		return rc;
-
 	DP_DEBUG(dev, QEDR_MSG_INIT, "Allocating user context %p\n",
 		 &ctx->ibucontext);
 	return 0;
+
+err1:
+	rdma_user_mmap_entry_remove(uctx, ctx->db_key);
+err:
+	dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi);
+	return rc;
 }
 
 void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 {
 	struct qedr_ucontext *uctx = get_qedr_ucontext(ibctx);
-	struct qedr_mm *mm, *tmp;
 
 	DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
 		 uctx);
-	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
 
-	list_for_each_entry_safe(mm, tmp, &uctx->mm_head, entry) {
-		DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-			 "deleted (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-			 mm->key.phy_addr, mm->key.len, uctx);
-		list_del(&mm->entry);
-		kfree(mm);
-	}
+	rdma_user_mmap_entry_remove(ibctx, uctx->db_key);
+	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
 }
 
-int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
-	struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
-	struct qedr_dev *dev = get_qedr_dev(context->device);
-	unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
-	unsigned long len = (vma->vm_end - vma->vm_start);
-	unsigned long dpi_start;
-
-	dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
+	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
 
-	DP_DEBUG(dev, QEDR_MSG_INIT,
-		 "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
-		 (void *)vma->vm_start, (void *)vma->vm_end,
-		 (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
+	kfree(entry);
+}
 
-	if ((vma->vm_start & (PAGE_SIZE - 1)) || (len & (PAGE_SIZE - 1))) {
-		DP_ERR(dev,
-		       "failed mmap, addresses must be page aligned: start=0x%pK, end=0x%pK\n",
-		       (void *)vma->vm_start, (void *)vma->vm_end);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
+{
+	struct ib_device *dev = ucontext->device;
+	u64 length = vma->vm_end - vma->vm_start;
+	u64 key = vma->vm_pgoff << PAGE_SHIFT;
+	struct rdma_user_mmap_entry *rdma_entry;
+	struct qedr_user_mmap_entry *entry;
+	u64 pfn;
+	int err;
+
+	ibdev_dbg(dev,
+		  "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
+		  vma->vm_start, vma->vm_end, length, key);
+
+	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
+		ibdev_dbg(dev,
+			  "length[%#llx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
 
-	if (!qedr_search_mmap(ucontext, phys_addr, len)) {
-		DP_ERR(dev, "failed mmap, vm_pgoff=0x%lx is not authorized\n",
-		       vma->vm_pgoff);
-		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC) {
+		ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
+		return -EPERM;
 	}
+	vma->vm_flags &= ~VM_MAYEXEC;
 
-	if (phys_addr < dpi_start ||
-	    ((phys_addr + len) > (dpi_start + ucontext->dpi_size))) {
-		DP_ERR(dev,
-		       "failed mmap, pages are outside of dpi; page address=0x%pK, dpi_start=0x%pK, dpi_size=0x%x\n",
-		       (void *)phys_addr, (void *)dpi_start,
-		       ucontext->dpi_size);
+	rdma_entry = rdma_user_mmap_entry_get(ucontext, key, length, vma);
+	if (!rdma_entry) {
+		ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
+			  key);
 		return -EINVAL;
 	}
-
-	if (vma->vm_flags & VM_READ) {
-		DP_ERR(dev, "failed mmap, cannot map doorbell bar for read\n");
+	entry = get_qedr_mmap_entry(rdma_entry);
+	if (entry->length != length) {
+		ibdev_dbg(dev,
+			  "key[%#llx] does not have valid length[%#llx] expected[%#llx]\n",
+			  key, length, entry->length);
 		return -EINVAL;
 	}
 
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, len,
-				  vma->vm_page_prot);
+	ibdev_dbg(dev,
+		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
+		  entry->address, length, entry->mmap_flag);
+
+	pfn = entry->address >> PAGE_SHIFT;
+	switch (entry->mmap_flag) {
+	case QEDR_USER_MMAP_IO_WC:
+		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
+					pgprot_writecombine(vma->vm_page_prot));
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+	if (err) {
+		ibdev_dbg(dev,
+			  "Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
+			  entry->address, length, entry->mmap_flag, err);
+		rdma_user_mmap_entry_put(ucontext, rdma_entry);
+	}
+
+	return err;
 }
 
 int qedr_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 9aaa90283d6e..3606e97e95da 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -46,7 +46,8 @@ int qedr_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
 void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
 
-int qedr_mmap(struct ib_ucontext *, struct vm_area_struct *vma);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
+void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
 int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (4 preceding siblings ...)
  2019-08-27 13:28 ` [PATCH v8 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
@ 2019-08-27 13:28 ` Michal Kalderon
  2019-08-27 13:28 ` [PATCH v8 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  |   7 +
 drivers/infiniband/hw/qedr/verbs.c | 315 +++++++++++++++++++++++++++++++------
 include/uapi/rdma/qedr-abi.h       |  25 +++
 3 files changed, 299 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 31769e203372..8cb12b3396d5 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -234,6 +234,7 @@ struct qedr_ucontext {
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
+	bool db_rec;
 };
 
 union db_prod64 {
@@ -261,6 +262,12 @@ struct qedr_userq {
 	struct qedr_pbl *pbl_tbl;
 	u64 buf_addr;
 	size_t buf_len;
+
+	/* doorbell recovery */
+	void __iomem *db_addr;
+	struct qedr_user_db_rec *db_rec_data;
+	u64 db_rec_phys;
+	u64 db_rec_key;
 };
 
 struct qedr_cq {
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 205c0b0c3244..205985490df1 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -60,6 +60,7 @@
 
 enum {
 	QEDR_USER_MMAP_IO_WC = 0,
+	QEDR_USER_MMAP_PHYS_PAGE,
 };
 
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
@@ -266,6 +267,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	int rc;
 	struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
 	struct qedr_alloc_ucontext_resp uresp = {};
+	struct qedr_alloc_ucontext_req ureq = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
 	struct qedr_user_mmap_entry *entry;
@@ -273,6 +275,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	if (!udata)
 		return -EFAULT;
 
+	if (udata->inlen) {
+		rc = ib_copy_from_udata(&ureq, udata,
+					min(sizeof(ureq), udata->inlen));
+		if (rc) {
+			DP_ERR(dev, "Problem copying data from user space\n");
+			return -EFAULT;
+		}
+
+		ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
+	}
+
 	rc = dev->ops->rdma_add_user(dev->rdma_ctx, &oparams);
 	if (rc) {
 		DP_ERR(dev,
@@ -347,6 +360,9 @@ void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 {
 	struct qedr_user_mmap_entry *entry = get_qedr_mmap_entry(rdma_entry);
 
+	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
+		free_page((unsigned long)phys_to_virt(entry->address));
+
 	kfree(entry);
 }
 
@@ -401,6 +417,9 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
 					pgprot_writecombine(vma->vm_page_prot));
 		break;
+	case QEDR_USER_MMAP_PHYS_PAGE:
+		err = vm_insert_page(vma, vma->vm_start, pfn_to_page(pfn));
+		break;
 	default:
 		err = -EINVAL;
 	}
@@ -642,16 +661,48 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 	}
 }
 
+static int qedr_db_recovery_add(struct qedr_dev *dev,
+				void __iomem *db_addr,
+				void *db_data,
+				enum qed_db_rec_width db_width,
+				enum qed_db_rec_space db_space)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return 0;
+	}
+
+	return dev->ops->common->db_recovery_add(dev->cdev, db_addr, db_data,
+						 db_width, db_space);
+}
+
+static void qedr_db_recovery_del(struct qedr_dev *dev,
+				 void __iomem *db_addr,
+				 void *db_data)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return;
+	}
+
+	/* Ignore return code as there is not much we can do about it. Error
+	 * log will be printed inside.
+	 */
+	dev->ops->common->db_recovery_del(dev->cdev, db_addr, db_data);
+}
+
 static int qedr_copy_cq_uresp(struct qedr_dev *dev,
-			      struct qedr_cq *cq, struct ib_udata *udata)
+			      struct qedr_cq *cq, struct ib_udata *udata,
+			      u32 db_offset)
 {
 	struct qedr_create_cq_uresp uresp;
 	int rc;
 
 	memset(&uresp, 0, sizeof(uresp));
 
-	uresp.db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+	uresp.db_offset = db_offset;
 	uresp.icid = cq->icid;
+	uresp.db_rec_addr = cq->q.db_rec_key;
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
@@ -679,10 +730,56 @@ static inline int qedr_align_cq_entries(int entries)
 	return aligned_size / QEDR_CQE_SIZE;
 }
 
+static int qedr_init_user_db_rec(struct ib_udata *udata,
+				 struct qedr_dev *dev, struct qedr_userq *q,
+				 bool requires_db_rec)
+{
+	struct qedr_ucontext *uctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	struct qedr_user_mmap_entry *entry;
+
+	/* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
+	if (requires_db_rec == 0 || !uctx->db_rec)
+		return 0;
+
+	/* Allocate a page for doorbell recovery, add to mmap ) */
+	q->db_rec_data = (void *)get_zeroed_page(GFP_USER);
+	if (!q->db_rec_data) {
+		DP_ERR(dev,
+		       "get_free_page failed\n");
+		return -ENOMEM;
+	}
+
+	q->db_rec_phys = virt_to_phys(q->db_rec_data);
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto err_nomem;
+
+	entry->address = q->db_rec_phys;
+	entry->length = PAGE_SIZE;
+	entry->mmap_flag = QEDR_USER_MMAP_PHYS_PAGE;
+	q->db_rec_key = rdma_user_mmap_entry_insert(&uctx->ibucontext,
+						    &entry->rdma_entry,
+						    PAGE_SIZE);
+	if (q->db_rec_key == RDMA_USER_MMAP_INVALID) {
+		kfree(entry);
+		goto err_nomem;
+	}
+
+	return 0;
+
+err_nomem:
+	free_page((unsigned long)q->db_rec_data);
+	q->db_rec_data = NULL;
+	return -ENOMEM;
+}
+
 static inline int qedr_init_user_queue(struct ib_udata *udata,
 				       struct qedr_dev *dev,
 				       struct qedr_userq *q, u64 buf_addr,
-				       size_t buf_len, int access, int dmasync,
+				       size_t buf_len, bool requires_db_rec,
+				       int access, int dmasync,
 				       int alloc_and_init)
 {
 	u32 fw_pages;
@@ -720,7 +817,8 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
 		}
 	}
 
-	return 0;
+	/* mmap the user address used to store doorbell data for recovery */
+	return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
 
 err0:
 	ib_umem_release(q->umem);
@@ -806,6 +904,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
 	int chain_entries;
+	u32 db_offset;
 	int page_cnt;
 	u64 pbl_ptr;
 	u16 icid;
@@ -825,8 +924,12 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	chain_entries = qedr_align_cq_entries(entries);
 	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
 
+	/* calc db offset. user will add DPI base, kernel will add db addr */
+	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create cq: problem copying data from user space\n");
 			goto err0;
@@ -841,8 +944,9 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->cq_type = QEDR_CQ_TYPE_USER;
 
 		rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
-					  ureq.len, IB_ACCESS_LOCAL_WRITE, 1,
-					  1);
+					  ureq.len, true,
+					  IB_ACCESS_LOCAL_WRITE,
+					  1, 1);
 		if (rc)
 			goto err0;
 
@@ -850,6 +954,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		page_cnt = cq->q.pbl_info.num_pbes;
 
 		cq->ibcq.cqe = chain_entries;
+		cq->q.db_addr = ctx->dpi_addr + db_offset;
 	} else {
 		cq->cq_type = QEDR_CQ_TYPE_KERNEL;
 
@@ -861,7 +966,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 						   sizeof(union rdma_cqe),
 						   &cq->pbl, NULL);
 		if (rc)
-			goto err1;
+			goto err0;
 
 		page_cnt = qed_chain_get_page_cnt(&cq->pbl);
 		pbl_ptr = qed_chain_get_pbl_phys(&cq->pbl);
@@ -873,21 +978,28 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	rc = dev->ops->rdma_create_cq(dev->rdma_ctx, &params, &icid);
 	if (rc)
-		goto err2;
+		goto err1;
 
 	cq->icid = icid;
 	cq->sig = QEDR_CQ_MAGIC_NUMBER;
 	spin_lock_init(&cq->cq_lock);
 
 	if (udata) {
-		rc = qedr_copy_cq_uresp(dev, cq, udata);
+		rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
 		if (rc)
-			goto err3;
+			goto err2;
+
+		rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+					  &cq->q.db_rec_data->db_data,
+					  DB_REC_WIDTH_64B,
+					  DB_REC_USER);
+		if (rc)
+			goto err2;
+
 	} else {
 		/* Generate doorbell address. */
-		cq->db_addr = dev->db_addr +
-		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 		cq->db.data.icid = cq->icid;
+		cq->db_addr = dev->db_addr + db_offset;
 		cq->db.data.params = DB_AGG_CMD_SET <<
 		    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
 
@@ -897,6 +1009,11 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->latest_cqe = NULL;
 		consume_cqe(cq);
 		cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+
+		rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+					  DB_REC_WIDTH_64B, DB_REC_KERNEL);
+		if (rc)
+			goto err2;
 	}
 
 	DP_DEBUG(dev, QEDR_MSG_CQ,
@@ -905,18 +1022,20 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	return 0;
 
-err3:
+err2:
 	destroy_iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &destroy_iparams,
 				  &destroy_oparams);
-err2:
-	if (udata)
-		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
-	else
-		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
 err1:
-	if (udata)
+	if (udata) {
+		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+		if (cq->q.db_rec_data)
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_rec_key);
+	} else {
+		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
+	}
 err0:
 	return -EINVAL;
 }
@@ -936,6 +1055,8 @@ int qedr_resize_cq(struct ib_cq *ibcq, int new_cnt, struct ib_udata *udata)
 
 void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 {
+	struct qedr_ucontext *ctx = rdma_udata_to_drv_context(udata,
+		struct qedr_ucontext, ibucontext);
 	struct qedr_dev *dev = get_qedr_dev(ibcq->device);
 	struct qed_rdma_destroy_cq_out_params oparams;
 	struct qed_rdma_destroy_cq_in_params iparams;
@@ -947,8 +1068,10 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	cq->destroyed = 1;
 
 	/* GSIs CQs are handled by driver, so they don't exist in the FW */
-	if (cq->cq_type == QEDR_CQ_TYPE_GSI)
+	if (cq->cq_type == QEDR_CQ_TYPE_GSI) {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 		return;
+	}
 
 	iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &iparams, &oparams);
@@ -957,6 +1080,15 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	if (udata) {
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+
+		if (cq->q.db_rec_data) {
+			qedr_db_recovery_del(dev, cq->q.db_addr,
+					     &cq->q.db_rec_data->db_data);
+			rdma_user_mmap_entry_remove(&ctx->ibucontext,
+						    cq->q.db_rec_key);
+		}
+	} else {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 	}
 
 	/* We don't want the IRQ handler to handle a non-existing CQ so we
@@ -1121,8 +1253,8 @@ static int qedr_copy_srq_uresp(struct qedr_dev *dev,
 }
 
 static void qedr_copy_rq_uresp(struct qedr_dev *dev,
-			       struct qedr_create_qp_uresp *uresp,
-			       struct qedr_qp *qp)
+			      struct qedr_create_qp_uresp *uresp,
+			      struct qedr_qp *qp)
 {
 	/* iWARP requires two doorbells per RQ. */
 	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
@@ -1135,6 +1267,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
 	}
 
 	uresp->rq_icid = qp->icid;
+	uresp->rq_db_rec_addr = qp->urq.db_rec_key;
 }
 
 static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1148,22 +1281,24 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
 		uresp->sq_icid = qp->icid;
 	else
 		uresp->sq_icid = qp->icid + 1;
+
+	uresp->sq_db_rec_addr = qp->usq.db_rec_key;
 }
 
 static int qedr_copy_qp_uresp(struct qedr_dev *dev,
-			      struct qedr_qp *qp, struct ib_udata *udata)
+			      struct qedr_qp *qp, struct ib_udata *udata,
+			      struct qedr_create_qp_uresp *uresp)
 {
-	struct qedr_create_qp_uresp uresp;
 	int rc;
 
-	memset(&uresp, 0, sizeof(uresp));
-	qedr_copy_sq_uresp(dev, &uresp, qp);
-	qedr_copy_rq_uresp(dev, &uresp, qp);
+	memset(uresp, 0, sizeof(*uresp));
+	qedr_copy_sq_uresp(dev, uresp, qp);
+	qedr_copy_rq_uresp(dev, uresp, qp);
 
-	uresp.atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
-	uresp.qp_id = qp->qp_id;
+	uresp->atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
+	uresp->qp_id = qp->qp_id;
 
-	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	rc = qedr_ib_copy_to_udata(udata, uresp, sizeof(*uresp));
 	if (rc)
 		DP_ERR(dev,
 		       "create qp: failed a copy to user space with qp icid=0x%x.\n",
@@ -1207,16 +1342,35 @@ static void qedr_set_common_qp_params(struct qedr_dev *dev,
 		 qp->sq.max_sges, qp->sq_cq->icid);
 }
 
-static void qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid + 1;
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	if (!qp->srq) {
 		qp->rq.db = dev->db_addr +
 			    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD);
 		qp->rq.db_data.data.icid = qp->icid;
+
+		rc = qedr_db_recovery_add(dev, qp->rq.db,
+					  &qp->rq.db_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_KERNEL);
+		if (rc)
+			qedr_db_recovery_del(dev, qp->sq.db,
+					     &qp->sq.db_data);
 	}
+
+	return rc;
 }
 
 static int qedr_check_srq_params(struct qedr_dev *dev,
@@ -1270,7 +1424,7 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
 	int rc;
 
 	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
-				  ureq->srq_len, access, dmasync, 1);
+				  ureq->srq_len, false, access, dmasync, 1);
 	if (rc)
 		return rc;
 
@@ -1366,7 +1520,8 @@ int qedr_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
 	hw_srq->max_sges = init_attr->attr.max_sge;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create srq: problem copying data from user space\n");
 			goto err0;
@@ -1555,13 +1710,29 @@ qedr_iwarp_populate_user_qp(struct qedr_dev *dev,
 			   &qp->urq.pbl_info, FW_PAGE_SHIFT);
 }
 
-static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
+static void qedr_cleanup_user(struct qedr_dev *dev,
+			      struct qedr_ucontext *ctx,
+			      struct qedr_qp *qp)
 {
 	ib_umem_release(qp->usq.umem);
 	qp->usq.umem = NULL;
 
 	ib_umem_release(qp->urq.umem);
 	qp->urq.umem = NULL;
+
+	if (qp->usq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->usq.db_addr,
+				     &qp->usq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->usq.db_rec_key);
+	}
+
+	if (qp->urq.db_rec_data) {
+		qedr_db_recovery_del(dev, qp->urq.db_addr,
+				     &qp->urq.db_rec_data->db_data);
+		rdma_user_mmap_entry_remove(&ctx->ibucontext,
+					    qp->urq.db_rec_key);
+	}
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1573,12 +1744,14 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	struct qed_rdma_create_qp_in_params in_params;
 	struct qed_rdma_create_qp_out_params out_params;
 	struct qedr_pd *pd = get_qedr_pd(ibpd);
+	struct qedr_create_qp_uresp uresp;
+	struct qedr_ucontext *ctx = NULL;
 	struct qedr_create_qp_ureq ureq;
 	int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
 	int rc = -EINVAL;
 
 	memset(&ureq, 0, sizeof(ureq));
-	rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+	rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen));
 	if (rc) {
 		DP_ERR(dev, "Problem copying data from user space\n");
 		return rc;
@@ -1586,14 +1759,16 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 
 	/* SQ - read access only (0), dma sync not required (0) */
 	rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
-				  ureq.sq_len, 0, 0, alloc_and_init);
+				  ureq.sq_len, true, 0, 0,
+				  alloc_and_init);
 	if (rc)
 		return rc;
 
 	if (!qp->srq) {
 		/* RQ - read access only (0), dma sync not required (0) */
 		rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
-					  ureq.rq_len, 0, 0, alloc_and_init);
+					  ureq.rq_len, true,
+					  0, 0, alloc_and_init);
 		if (rc)
 			return rc;
 	}
@@ -1623,29 +1798,56 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	rc = qedr_copy_qp_uresp(dev, qp, udata);
+	rc = qedr_copy_qp_uresp(dev, qp, udata, &uresp);
+	if (rc)
+		goto err;
+
+	/* db offset was calculated in copy_qp_uresp, now set in the user q */
+	ctx = pd->uctx;
+	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
+	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+
+	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
+				  &qp->usq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
 	if (rc)
 		goto err;
 
+	rc = qedr_db_recovery_add(dev, qp->urq.db_addr,
+				  &qp->urq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
+	if (rc)
+		goto err;
 	qedr_qp_user_print(dev, qp);
 
-	return 0;
+	return rc;
 err:
 	rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
 	if (rc)
 		DP_ERR(dev, "create qp: fatal fault. rc=%d", rc);
 
 err1:
-	qedr_cleanup_user(dev, qp);
+	qedr_cleanup_user(dev, ctx, qp);
 	return rc;
 }
 
-static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 	    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid;
 
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	qp->rq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD);
 	qp->rq.db_data.data.icid = qp->icid;
@@ -1653,6 +1855,13 @@ static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 			   DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_FLAGS);
 	qp->rq.iwarp_db2_data.data.icid = qp->icid;
 	qp->rq.iwarp_db2_data.data.value = DQ_TCM_IWARP_POST_RQ_CF_CMD;
+
+	rc = qedr_db_recovery_add(dev, qp->rq.db,
+				  &qp->rq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+
+	return rc;
 }
 
 static int
@@ -1700,8 +1909,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_roce_db_info(dev, qp);
-	return rc;
+	return qedr_set_roce_db_info(dev, qp);
 }
 
 static int
@@ -1759,8 +1967,7 @@ qedr_iwarp_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_iwarp_db_info(dev, qp);
-	return rc;
+	return qedr_set_iwarp_db_info(dev, qp);
 
 err:
 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -1775,6 +1982,15 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	dev->ops->common->chain_free(dev->cdev, &qp->rq.pbl);
 	kfree(qp->rqe_wr_id);
+
+	/* GSI qp is not registered to db mechanism so no need to delete */
+	if (qp->qp_type == IB_QPT_GSI)
+		return;
+
+	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
+
+	if (!qp->srq)
+		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
@@ -2414,7 +2630,10 @@ int qedr_query_qp(struct ib_qp *ibqp,
 static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 				  struct ib_udata *udata)
 {
-	int rc = 0;
+	struct qedr_ucontext *ctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+	int rc;
 
 	if (qp->qp_type != IB_QPT_GSI) {
 		rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -2423,7 +2642,7 @@ static int qedr_free_qp_resources(struct qedr_dev *dev, struct qedr_qp *qp,
 	}
 
 	if (udata)
-		qedr_cleanup_user(dev, qp);
+		qedr_cleanup_user(dev, ctx, qp);
 	else
 		qedr_cleanup_kernel(dev, qp);
 
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 7a10b3a325fa..c022ee26089b 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -38,6 +38,15 @@
 #define QEDR_ABI_VERSION		(8)
 
 /* user kernel communication data structures. */
+enum qedr_alloc_ucontext_flags {
+	QEDR_ALLOC_UCTX_RESERVED	= 1 << 0,
+	QEDR_ALLOC_UCTX_DB_REC		= 1 << 1
+};
+
+struct qedr_alloc_ucontext_req {
+	__u32 context_flags;
+	__u32 reserved;
+};
 
 struct qedr_alloc_ucontext_resp {
 	__aligned_u64 db_pa;
@@ -74,6 +83,7 @@ struct qedr_create_cq_uresp {
 	__u32 db_offset;
 	__u16 icid;
 	__u16 reserved;
+	__aligned_u64 db_rec_addr;
 };
 
 struct qedr_create_qp_ureq {
@@ -109,6 +119,13 @@ struct qedr_create_qp_uresp {
 
 	__u32 rq_db2_offset;
 	__u32 reserved;
+
+	/* address of SQ doorbell recovery user entry */
+	__aligned_u64 sq_db_rec_addr;
+
+	/* address of RQ doorbell recovery user entry */
+	__aligned_u64 rq_db_rec_addr;
+
 };
 
 struct qedr_create_srq_ureq {
@@ -128,4 +145,12 @@ struct qedr_create_srq_uresp {
 	__u32 reserved1;
 };
 
+/* doorbell recovery entry allocated and populated by userspace doorbelling
+ * entities and mapped to kernel. Kernel uses this to register doorbell
+ * information with doorbell drop recovery mechanism.
+ */
+struct qedr_user_db_rec {
+	__aligned_u64 db_data; /* doorbell data */
+};
+
 #endif /* __QEDR_USER_H__ */
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell recovery support
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (5 preceding siblings ...)
  2019-08-27 13:28 ` [PATCH v8 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
@ 2019-08-27 13:28 ` " Michal Kalderon
  2019-08-28 14:41 ` [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Gal Pressman
  2019-08-30 12:07 ` [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
  8 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-27 13:28 UTC (permalink / raw)
  To: mkalderon, aelior, jgg, dledford, bmt, galpress, sleybo, leon
  Cc: linux-rdma, Michal Kalderon, Ariel Elior

This patch adds the iWARP specific doorbells to the doorbell
recovery mechanism

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  | 12 +++++++-----
 drivers/infiniband/hw/qedr/verbs.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 8cb12b3396d5..07d8ba2a82b2 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -237,6 +237,11 @@ struct qedr_ucontext {
 	bool db_rec;
 };
 
+union db_prod32 {
+	struct rdma_pwm_val16_data data;
+	u32 raw;
+};
+
 union db_prod64 {
 	struct rdma_pwm_val32_data data;
 	u64 raw;
@@ -268,6 +273,8 @@ struct qedr_userq {
 	struct qedr_user_db_rec *db_rec_data;
 	u64 db_rec_phys;
 	u64 db_rec_key;
+	void __iomem *db_rec_db2_addr;
+	union db_prod32 db_rec_db2_data;
 };
 
 struct qedr_cq {
@@ -303,11 +310,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-union db_prod32 {
-	struct rdma_pwm_val16_data data;
-	u32 raw;
-};
-
 struct qedr_qp_hwq_info {
 	/* WQE Elements */
 	struct qed_chain pbl;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 205985490df1..43c9536d0f5d 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1733,6 +1733,10 @@ static void qedr_cleanup_user(struct qedr_dev *dev,
 		rdma_user_mmap_entry_remove(&ctx->ibucontext,
 					    qp->urq.db_rec_key);
 	}
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1))
+		qedr_db_recovery_del(dev, qp->urq.db_rec_db2_addr,
+				     &qp->urq.db_rec_db2_data);
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1807,6 +1811,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
 	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
 
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		qp->urq.db_rec_db2_addr = ctx->dpi_addr + uresp.rq_db2_offset;
+
+		/* calculate the db_rec_db2 data since it is constant so no
+		 *  need to reflect from user
+		 */
+		qp->urq.db_rec_db2_data.data.icid = cpu_to_le16(qp->icid);
+		qp->urq.db_rec_db2_data.data.value =
+			cpu_to_le16(DQ_TCM_IWARP_POST_RQ_CF_CMD);
+	}
+
 	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
 				  &qp->usq.db_rec_data->db_data,
 				  DB_REC_WIDTH_32B,
@@ -1820,6 +1835,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 				  DB_REC_USER);
 	if (rc)
 		goto err;
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		rc = qedr_db_recovery_add(dev, qp->urq.db_rec_db2_addr,
+					  &qp->urq.db_rec_db2_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_USER);
+		if (rc)
+			goto err;
+	}
 	qedr_qp_user_print(dev, qp);
 
 	return rc;
@@ -1860,7 +1884,13 @@ static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 				  &qp->rq.db_data,
 				  DB_REC_WIDTH_32B,
 				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
 
+	rc = qedr_db_recovery_add(dev, qp->rq.iwarp_db2,
+				  &qp->rq.iwarp_db2_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
 	return rc;
 }
 
@@ -1989,8 +2019,13 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
 
-	if (!qp->srq)
+	if (!qp->srq) {
 		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
+
+		if (rdma_protocol_iwarp(&dev->ibdev, 1))
+			qedr_db_recovery_del(dev, qp->rq.iwarp_db2,
+					     &qp->rq.iwarp_db2_data);
+	}
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
-- 
2.14.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (6 preceding siblings ...)
  2019-08-27 13:28 ` [PATCH v8 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
@ 2019-08-28 14:41 ` Gal Pressman
  2019-08-30 12:07 ` [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
  8 siblings, 0 replies; 25+ messages in thread
From: Gal Pressman @ 2019-08-28 14:41 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, jgg, dledford, bmt, sleybo, leon, linux-rdma

On 27/08/2019 16:28, Michal Kalderon wrote:
> This patch series uses the doorbell overflow recovery mechanism
> introduced in
> commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
> for rdma ( RoCE and iWARP )
> 
> The first five patches modify the core code to contain helper
> functions for managing mmap_xa inserting, getting and freeing
> entries. The code was based on the code from efa driver.
> There is still an open discussion on whether we should take
> this even further and make the entire mmap generic. Until a
> decision is made, I only created the database API and modified
> the efa, qedr, siw driver to use it. The functions are integrated
> with the umap mechanism.
> 
> The doorbell recovery code is based on the common code.
> 
> Efa driver was compile tested and checked only modprobe/rmmod.
> SIW was compile tested and checked only modprobe/rmmod.
> 
> rdma-core pull request #493
> 
> Changes from V7:
> - Remove license text, SPDX id should suffice.
> - Fix some comments text.
> - Add comment regarding vm_ops being set in ib_uverbs_mmap.
> - Allocate the rdma_user_mmap_entry in the driver and not in the
>   ib_core_uverbs. This lead to defining three new structures per driver
>   and seperating the fields between the driver private structures and
>   the common rdma_user_mmap_entry. Freeing the entry was also moved
>   to the drivers.
> - Fix bug found by Gal Pressman. Call mmap_free only once per entry.
> - Add a mutex around xa_mmap insert to assure threads won't intefere
>   while the xa lock is released when inserting an entry into the range.
> - Modify the insert algorithm to be more elegant using the
>   xas_next_entry instead of foreach.
> - Remove the rdma_user_mmap_entries_remove_free function, now that umap.
>   and mmap_xa are integrated we should not have any entries in the mmap_xa
>   when ucontext is released. Replace the function with a WARN_ON(!xa_empty).
> - Rdma_umap_open needs to reset the vm_private_data before initializing it.
> - Decrease rdma_user_mmap_entry reference count on mmap disassociate.
> - Remove WARN_ON(!kref_read) this is checked when kref debug is on.
> - Remove some redundant defines from ib_verbs.h.
> - Better error handling for efa create qp flow.
> - Add a function that wraps the entry allocation and rdma_user_mmap_entry_insert
>   which is used in all places that need to add an entry to the xarray.
> - Remove rq_entry_inserted field in efa create qp flow.
> - Add mmap_free to siw and free the memory only on mmap free and not before.
> 
> Changes from V6:
> - Modified series description to be closer to what the series is now.
> - Create a new file for the new rdma_user_mmap function. The file
>   is called ib_uverbs_core. This file should contain functions related
>   to user which are called by hw to eventually enable ib_uverbs to be
>   optional.
> - Modify SIW driver to use new mmap api.
> - When calculating number of pages, need to round it up to PAGE_SIZE.
> - Integrate the mmap_xa and umap mechanism so that the entries in
>   mmap_xa now have a reference count and can be removed. Previously
>   entries existed until context was destroyed. This modified the
>   algorithm for allocating a free page range.
> - Modify algorithm for inserting an entry into the mmap_xa.
> - Rdma_umap_priv is now also used for all mmaps done using the
>   mmap_xa helpers.
> - Move remove_free header to core_priv.
> - Rdma_user_mmap_entry now has a kref that is increase on mmap
>   and umap_open and decreased on umap_close.
> - Modify efa + qedr to remove the entry from xa_map. This will
>   decrease the refcnt and free memory only if refcnt is zero.
> - Rdma_user_mmap_io slightly modified to enable drivers not using
>   the xa_mmap API to continue using it.
> - Modify page allocation for user to use GFP_USER instead of GFP_KERNEL
> 
> Changes from V5:
> - Switch between driver dealloc_ucontext and mmap_entries_remove call.
> - No need to verify the key after using the key to load an entry from
>   the mmap_xa.
> - Change mmap_free api to pass an 'entry' object.
> - Add documentation for mmap_free and for newly exported functions.
> - Fix some extra/missing line breaks.
> 
> Changes from V4:
> - Add common mmap database and cookie helper functions.
> 
> Changes from V3:
> - Remove casts from void to u8. Pointer arithmetic can be done on void
> - rebase to tip of rdma-next
> 
> Changes from V2:
> - Don't use long-lived kmap. Instead use user-trigger mmap for the
>   doorbell recovery entries.
> - Modify dpi_addr to be denoted with __iomem and avoid redundant
>   casts
> 
> Changes from V1:
> - call kmap to map virtual address into kernel space
> - modify db_rec_delete to be void
> - remove some cpu_to_le16 that were added to previous patch which are
>   correct but not related to the overflow recovery mechanism. Will be
>   submitted as part of a different patch
> 
> 
> Michal Kalderon (7):
>   RDMA/core: Move core content from ib_uverbs to ib_core
>   RDMA/core: Create mmap database and cookie helper functions
>   RDMA/efa: Use the common mmap_xa helpers
>   RDMA/siw: Use the common mmap_xa helpers
>   RDMA/qedr: Use the common mmap API
>   RDMA/qedr: Add doorbell overflow recovery support
>   RDMA/qedr: Add iWARP doorbell recovery support

Haven't reviewed the patches yet, but tested successfully with EFA.
Tested-by: Gal Pressman <galpress@amazon.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-08-27 13:28 ` [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
@ 2019-08-29 11:35   ` Gal Pressman
  2019-08-29 12:40     ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2019-08-29 11:35 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, jgg, dledford, bmt, sleybo, leon, linux-rdma,
	Ariel Elior

On 27/08/2019 16:28, Michal Kalderon wrote:
> +/**
> + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @key: the key received from rdma_user_mmap_entry_insert which
> + *     is provided by user as the address to map.
> + * @len: the length the user wants to map.
> + * @vma: the vma related to the current mmap call.
> + *
> + * This function is called when a user tries to mmap a key it
> + * initially received from the driver. The key was created by
> + * the function rdma_user_mmap_entry_insert. The function should
> + * be called only once per mmap. It initializes the vma and
> + * increases the entries ref-count. Once the memory is unmapped
> + * the ref-count will decrease. When the refcount reaches zero
> + * the entry will be deleted.
> + *
> + * Return an entry if exists or NULL if there is no match.
> + */
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
> +			 struct vm_area_struct *vma)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +	if (!entry)
> +		return NULL;

I'm probably missing something, what happens if an insertion is done, a get is
called and right at this point (before kref_get) the entry is being removed (and
freed by the driver)?

> +
> +	kref_get(&entry->ref);
> +	rdma_umap_priv_init(vma, entry);
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: key[%#llx] npages[%#x] returned\n",
> +		  key, entry->npages);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> +
> +void rdma_user_mmap_entry_free(struct kref *kref)
> +{
> +	struct rdma_user_mmap_entry *entry =
> +		container_of(kref, struct rdma_user_mmap_entry, ref);
> +	struct ib_ucontext *ucontext = entry->ucontext;
> +	unsigned long i;
> +
> +	/* need to erase all entries occupied... */

To be clear, this erases all pages of a single entry right? The comment is
confusing.

> +	for (i = 0; i < entry->npages; i++) {
> +		xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
> +
> +		ibdev_dbg(ucontext->device,
> +			  "mmap: key[%#llx] npages[%#x] removed\n",
> +			  rdma_user_mmap_get_key(entry),
> +			  entry->npages);

This print can be outside of the loop, it doesn't change between iterations.

> +	}
> +	if (ucontext->device->ops.mmap_free)
> +		ucontext->device->ops.mmap_free(entry);
> +}
> +/**
> + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @entry: the entry to insert into the mmap_xa
> + * @length: length of the address that will be mmapped
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user call mmap with the given key.
> + * The function returns a unique key that should be provided to user, the user
> + * will use the key to retrieve information such as address to
> + * be mapped and how.
> + *
> + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> +				struct rdma_user_mmap_entry *entry,
> +				u64 length)
> +{
> +	struct ib_uverbs_file *ufile = ucontext->ufile;
> +	XA_STATE(xas, &ucontext->mmap_xa, 0);
> +	u32 xa_first, xa_last, npages;
> +	int err, i;
> +
> +	if (!entry)
> +		return RDMA_USER_MMAP_INVALID;
> +
> +	kref_init(&entry->ref);
> +	entry->ucontext = ucontext;
> +
> +	/* We want the whole allocation to be done without interruption
> +	 * from a different thread. The allocation requires finding a
> +	 * free range and storing. During the xa_insert the lock could be
> +	 * released, we don't want another thread taking the gap.
> +	 */
> +	mutex_lock(&ufile->umap_lock);
> +
> +	xa_lock(&ucontext->mmap_xa);

Doesn't the mutex replace the xa_lock?

> +
> +	/* We want to find an empty range */
> +	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
> +	entry->npages = npages;
> +	do {
> +		/* First find an empty index */
> +		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
> +		if (xas.xa_node == XAS_RESTART)
> +			goto err_unlock;
> +
> +		xa_first = xas.xa_index;
> +
> +		/* Is there enough room to have the range? */
> +		if (check_add_overflow(xa_first, npages, &xa_last))
> +			goto err_unlock;
> +
> +		/* Now look for the next present entry. If such doesn't
> +		 * exist, we found an empty range and can proceed
> +		 */
> +		xas_next_entry(&xas, xa_last - 1);
> +		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
> +			break;
> +		/* o/w look for the next free entry */
> +	} while (true);
> +
> +	for (i = xa_first; i < xa_last; i++) {
> +		err = __xa_insert(&ucontext->mmap_xa, i, entry, GFP_KERNEL);
> +		if (err)
> +			goto err_undo;
> +	}
> +
> +	entry->mmap_page = xa_first;
> +	xa_unlock(&ucontext->mmap_xa);
> +
> +	mutex_unlock(&ufile->umap_lock);
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: key[%#llx] npages[%#x] inserted\n",
> +		  rdma_user_mmap_get_key(entry), npages);
> +
> +	return rdma_user_mmap_get_key(entry);
> +
> +err_undo:
> +	for (; i > xa_first; i--)
> +		__xa_erase(&ucontext->mmap_xa, i - 1);
> +
> +err_unlock:
> +	xa_unlock(&ucontext->mmap_xa);
> +	mutex_unlock(&ufile->umap_lock);
> +	return RDMA_USER_MMAP_INVALID;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-08-29 11:35   ` Gal Pressman
@ 2019-08-29 12:40     ` Jason Gunthorpe
  2019-08-30  6:10       ` [EXT] " Michal Kalderon
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-29 12:40 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Michal Kalderon, mkalderon, aelior, dledford, bmt, sleybo, leon,
	linux-rdma, Ariel Elior

On Thu, Aug 29, 2019 at 02:35:45PM +0300, Gal Pressman wrote:
> On 27/08/2019 16:28, Michal Kalderon wrote:
> > +/**
> > + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @key: the key received from rdma_user_mmap_entry_insert which
> > + *     is provided by user as the address to map.
> > + * @len: the length the user wants to map.
> > + * @vma: the vma related to the current mmap call.
> > + *
> > + * This function is called when a user tries to mmap a key it
> > + * initially received from the driver. The key was created by
> > + * the function rdma_user_mmap_entry_insert. The function should
> > + * be called only once per mmap. It initializes the vma and
> > + * increases the entries ref-count. Once the memory is unmapped
> > + * the ref-count will decrease. When the refcount reaches zero
> > + * the entry will be deleted.
> > + *
> > + * Return an entry if exists or NULL if there is no match.
> > + */
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len,
> > +			 struct vm_area_struct *vma)
> > +{
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +	if (!entry)
> > +		return NULL;
> 
> I'm probably missing something, what happens if an insertion is done, a get is
> called and right at this point (before kref_get) the entry is being removed (and
> freed by the driver)?

Yes, things are wrong here.. It should hold xa_lock to protect entry
until the kref is obtained and this must use kref_get_unless_zero() as
the kref could be 0 while still in the xarray.

> > +	for (i = 0; i < entry->npages; i++) {
> > +		xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);

This is better to use __xa_erase and hold the xa_lock outside the loop

> > +	/* We want the whole allocation to be done without interruption
> > +	 * from a different thread. The allocation requires finding a
> > +	 * free range and storing. During the xa_insert the lock could be
> > +	 * released, we don't want another thread taking the gap.
> > +	 */
> > +	mutex_lock(&ufile->umap_lock);
> > +
> > +	xa_lock(&ucontext->mmap_xa);
> 
> Doesn't the mutex replace the xa_lock?

No, absolutely not. xarray must hold its internal lock when
required. The external lock is only about protecting the contents

I'm not sure why this needs to hold this mutex, the spinlock looks OK.

> > +
> > +	/* We want to find an empty range */
> > +	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
> > +	entry->npages = npages;
> > +	do {
> > +		/* First find an empty index */
> > +		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
> > +		if (xas.xa_node == XAS_RESTART)
> > +			goto err_unlock;
> > +
> > +		xa_first = xas.xa_index;
> > +
> > +		/* Is there enough room to have the range? */
> > +		if (check_add_overflow(xa_first, npages, &xa_last))
> > +			goto err_unlock;
> > +
> > +		/* Now look for the next present entry. If such doesn't
> > +		 * exist, we found an empty range and can proceed
> > +		 */
> > +		xas_next_entry(&xas, xa_last - 1);
> > +		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
> > +			break;
> > +		/* o/w look for the next free entry */
> > +	} while (true);

while(true) not do/while is the usual convention

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-08-27 13:28 ` [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
@ 2019-08-29 14:20   ` Gal Pressman
  2019-08-30  6:15     ` Michal Kalderon
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2019-08-29 14:20 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, jgg, dledford, bmt, sleybo, leon, linux-rdma,
	Ariel Elior

On 27/08/2019 16:28, Michal Kalderon wrote:
> +static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *ucontext,
> +					    struct efa_qp *qp)
> +{
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> +				    qp->llq_desc_mmap_key);
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_mmap_key);
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);

Please remove the entries in reverse insertion order.

> +}
> +
>  static int qp_mmap_entries_setup(struct efa_qp *qp,
>  				 struct efa_dev *dev,
>  				 struct efa_ucontext *ucontext,
>  				 struct efa_com_create_qp_params *params,
>  				 struct efa_ibv_create_qp_resp *resp)
>  {
> -	/*
> -	 * Once an entry is inserted it might be mmapped, hence cannot be
> -	 * cleaned up until dealloc_ucontext.
> -	 */
> -	resp->sq_db_mmap_key =
> -		mmap_entry_insert(dev, ucontext, qp,
> -				  dev->db_bar_addr + resp->sq_db_offset,
> -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> -		return -ENOMEM;
> +	u64 address;
> +	u64 length;
> +	int err;
> +
> +	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> +					 dev->db_bar_addr +
> +					 resp->sq_db_offset,
> +					 PAGE_SIZE, EFA_MMAP_IO_NC,
> +					 &qp->sq_db_mmap_key);
> +	if (err)
> +		return err;
>  
> +	resp->sq_db_mmap_key = qp->sq_db_mmap_key;
>  	resp->sq_db_offset &= ~PAGE_MASK;
>  
> -	resp->llq_desc_mmap_key =
> -		mmap_entry_insert(dev, ucontext, qp,
> -				  dev->mem_bar_addr + resp->llq_desc_offset,
> -				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
> -					     (resp->llq_desc_offset & ~PAGE_MASK)),
> -				  EFA_MMAP_IO_WC);
> -	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
> -		return -ENOMEM;
> +	address = dev->mem_bar_addr + resp->llq_desc_offset;
> +	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
> +			    (resp->llq_desc_offset & ~PAGE_MASK));
> +
> +	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> +					 address,
> +					 length,
> +					 EFA_MMAP_IO_WC,
> +					 &qp->llq_desc_mmap_key);
> +	if (err)
> +		goto err1;
>  
> +	resp->llq_desc_mmap_key = qp->llq_desc_mmap_key;
>  	resp->llq_desc_offset &= ~PAGE_MASK;
>  
>  	if (qp->rq_size) {
> -		resp->rq_db_mmap_key =
> -			mmap_entry_insert(dev, ucontext, qp,
> -					  dev->db_bar_addr + resp->rq_db_offset,
> -					  PAGE_SIZE, EFA_MMAP_IO_NC);
> -		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
> -			return -ENOMEM;
> +		address = dev->db_bar_addr + resp->rq_db_offset;
>  
> +		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> +						 address, PAGE_SIZE,
> +						 EFA_MMAP_IO_NC,
> +						 &qp->rq_db_mmap_key);
> +		if (err)
> +			goto err2;
> +
> +		resp->rq_db_mmap_key = qp->rq_db_mmap_key;
>  		resp->rq_db_offset &= ~PAGE_MASK;
>  
> -		resp->rq_mmap_key =
> -			mmap_entry_insert(dev, ucontext, qp,
> -					  virt_to_phys(qp->rq_cpu_addr),
> -					  qp->rq_size, EFA_MMAP_DMA_PAGE);
> -		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
> -			return -ENOMEM;
> +		address = virt_to_phys(qp->rq_cpu_addr);
> +		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> +						 address, qp->rq_size,
> +						 EFA_MMAP_DMA_PAGE,
> +						 &qp->rq_mmap_key);
> +		if (err)
> +			goto err3;
>  
> +		resp->rq_mmap_key = qp->rq_mmap_key;
>  		resp->rq_mmap_size = qp->rq_size;
>  	}
>  
>  	return 0;
> +
> +err3:
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->rq_db_mmap_key);
> +
> +err2:
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> +				    qp->llq_desc_mmap_key);
> +
> +err1:
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp->sq_db_mmap_key);

I prefer meaningful goto labels, e.g err_remove_sq_db instead of err1.

> +
> +	/* If any error occurred, we init the keys back to invalid */
> +	efa_qp_init_keys(qp);
> +
> +	return err;
>  }
>  
>  static int efa_qp_validate_cap(struct efa_dev *dev,
> @@ -634,7 +594,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>  	struct efa_dev *dev = to_edev(ibpd->device);
>  	struct efa_ibv_create_qp_resp resp = {};
>  	struct efa_ibv_create_qp cmd = {};
> -	bool rq_entry_inserted = false;
>  	struct efa_ucontext *ucontext;
>  	struct efa_qp *qp;
>  	int err;
> @@ -687,6 +646,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>  		goto err_out;
>  	}
>  
> +	efa_qp_init_keys(qp);
>  	create_qp_params.uarn = ucontext->uarn;
>  	create_qp_params.pd = to_epd(ibpd)->pdn;
>  
> @@ -742,7 +702,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>  	if (err)
>  		goto err_destroy_qp;
>  
> -	rq_entry_inserted = true;
>  	qp->qp_handle = create_qp_resp.qp_handle;
>  	qp->ibqp.qp_num = create_qp_resp.qp_num;
>  	qp->ibqp.qp_type = init_attr->qp_type;
> @@ -759,7 +718,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>  			ibdev_dbg(&dev->ibdev,
>  				  "Failed to copy udata for qp[%u]\n",
>  				  create_qp_resp.qp_num);
> -			goto err_destroy_qp;
> +			goto err_remove_mmap_entries;
>  		}
>  	}
>  
> @@ -767,15 +726,17 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>  
>  	return &qp->ibqp;
>  
> +err_remove_mmap_entries:
> +	efa_qp_user_mmap_entries_remove(ucontext, qp);
>  err_destroy_qp:
>  	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
>  err_free_mapped:
> -	if (qp->rq_size) {
> +	if (qp->rq_dma_addr)

What's the difference?

>  		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
>  				 DMA_TO_DEVICE);
> -		if (!rq_entry_inserted)
> -			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
> -	}
> +
> +	if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
> +		free_pages_exact(qp->rq_cpu_addr, qp->rq_size);

This should be inside the previous if statement, otherwise it might try to free
pages that weren't allocated.

>  err_free_qp:
>  	kfree(qp);
>  err_out:
> @@ -887,6 +848,7 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx)
>  
>  void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
>  {
> +	struct efa_ucontext *ucontext;

Reverse xmas tree.

>  	struct efa_dev *dev = to_edev(ibcq->device);
>  	struct efa_cq *cq = to_ecq(ibcq);
>  
> @@ -894,20 +856,33 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
>  		  "Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
>  		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
>  
> +	ucontext = rdma_udata_to_drv_context(udata, struct efa_ucontext,
> +					     ibucontext);
>  	efa_destroy_cq_idx(dev, cq->cq_idx);
>  	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
>  			 DMA_FROM_DEVICE);
> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> +				    cq->mmap_key);

Entry removal should be first.

>  }
>  
>  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
>  		      struct vm_area_struct *vma, u64 key, u64 length)
>  {
> -	struct efa_mmap_entry *entry;
> +	struct rdma_user_mmap_entry *rdma_entry;
> +	struct efa_user_mmap_entry *entry;
>  	unsigned long va;
>  	u64 pfn;
>  	int err;
>  
> -	entry = mmap_entry_get(dev, ucontext, key, length);
> -	if (!entry) {
> +	rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
> +					      length, vma);
> +	if (!rdma_entry) {
>  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
>  			  key);
>  		return -EINVAL;
>  	}
> +	entry = to_emmap(rdma_entry);
> +	if (entry->length != length) {
> +		ibdev_dbg(&dev->ibdev,
> +			  "key[%#llx] does not have valid length[%#llx] expected[%#llx]\n",
> +			  key, length, entry->length);

Need to put the entry.

> +		return -EINVAL;
> +	}
>  
>  	ibdev_dbg(&dev->ibdev,
>  		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
> -		  entry->address, length, entry->mmap_flag);
> +		  entry->address, entry->length, entry->mmap_flag);
>  
>  	pfn = entry->address >> PAGE_SHIFT;
>  	switch (entry->mmap_flag) {
> @@ -1637,6 +1630,10 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
>  			&dev->ibdev,
>  			"Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
>  			entry->address, length, entry->mmap_flag, err);
> +
> +		rdma_user_mmap_entry_put(&ucontext->ibucontext,
> +					 rdma_entry);
> +
>  		return err;
>  	}
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [EXT] Re: [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions
  2019-08-29 12:40     ` Jason Gunthorpe
@ 2019-08-30  6:10       ` " Michal Kalderon
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-30  6:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Gal Pressman
  Cc: Ariel Elior, dledford, bmt, sleybo, leon, linux-rdma, Ariel Elior

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, August 29, 2019 3:41 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Aug 29, 2019 at 02:35:45PM +0300, Gal Pressman wrote:
> > On 27/08/2019 16:28, Michal Kalderon wrote:
> > > +/**
> > > + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> > > + *
> > > + * @ucontext: associated user context.
> > > + * @key: the key received from rdma_user_mmap_entry_insert which
> > > + *     is provided by user as the address to map.
> > > + * @len: the length the user wants to map.
> > > + * @vma: the vma related to the current mmap call.
> > > + *
> > > + * This function is called when a user tries to mmap a key it
> > > + * initially received from the driver. The key was created by
> > > + * the function rdma_user_mmap_entry_insert. The function should
> > > + * be called only once per mmap. It initializes the vma and
> > > + * increases the entries ref-count. Once the memory is unmapped
> > > + * the ref-count will decrease. When the refcount reaches zero
> > > + * the entry will be deleted.
> > > + *
> > > + * Return an entry if exists or NULL if there is no match.
> > > + */
> > > +struct rdma_user_mmap_entry *
> > > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
> u64 len,
> > > +			 struct vm_area_struct *vma)
> > > +{
> > > +	struct rdma_user_mmap_entry *entry;
> > > +	u64 mmap_page;
> > > +
> > > +	mmap_page = key >> PAGE_SHIFT;
> > > +	if (mmap_page > U32_MAX)
> > > +		return NULL;
> > > +
> > > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > > +	if (!entry)
> > > +		return NULL;
> >
> > I'm probably missing something, what happens if an insertion is done,
> > a get is called and right at this point (before kref_get) the entry is
> > being removed (and freed by the driver)?
> 
> Yes, things are wrong here.. It should hold xa_lock to protect entry until the
> kref is obtained and this must use kref_get_unless_zero() as the kref could
> be 0 while still in the xarray.
> 
> > > +	for (i = 0; i < entry->npages; i++) {
> > > +		xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
> 
> This is better to use __xa_erase and hold the xa_lock outside the loop
Ok, will fix

> 
> > > +	/* We want the whole allocation to be done without interruption
> > > +	 * from a different thread. The allocation requires finding a
> > > +	 * free range and storing. During the xa_insert the lock could be
> > > +	 * released, we don't want another thread taking the gap.
> > > +	 */
> > > +	mutex_lock(&ufile->umap_lock);
> > > +
> > > +	xa_lock(&ucontext->mmap_xa);
> >
> > Doesn't the mutex replace the xa_lock?
> 
> No, absolutely not. xarray must hold its internal lock when required. The
> external lock is only about protecting the contents
> 
> I'm not sure why this needs to hold this mutex, the spinlock looks OK.
> 
You pointed this out in "v7" xa_insert can release the lock while allocating memory leading
To a race that another thread could squeeze into the gap in the meantime. 

> > > +
> > > +	/* We want to find an empty range */
> > > +	npages = (u32)DIV_ROUND_UP(length, PAGE_SIZE);
> > > +	entry->npages = npages;
> > > +	do {
> > > +		/* First find an empty index */
> > > +		xas_find_marked(&xas, U32_MAX, XA_FREE_MARK);
> > > +		if (xas.xa_node == XAS_RESTART)
> > > +			goto err_unlock;
> > > +
> > > +		xa_first = xas.xa_index;
> > > +
> > > +		/* Is there enough room to have the range? */
> > > +		if (check_add_overflow(xa_first, npages, &xa_last))
> > > +			goto err_unlock;
> > > +
> > > +		/* Now look for the next present entry. If such doesn't
> > > +		 * exist, we found an empty range and can proceed
> > > +		 */
> > > +		xas_next_entry(&xas, xa_last - 1);
> > > +		if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last)
> > > +			break;
> > > +		/* o/w look for the next free entry */
> > > +	} while (true);
> 
> while(true) not do/while is the usual convention
ok
> 
> Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-08-29 14:20   ` Gal Pressman
@ 2019-08-30  6:15     ` Michal Kalderon
  2019-09-01  6:56       ` Gal Pressman
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Kalderon @ 2019-08-30  6:15 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Ariel Elior, jgg, dledford, bmt, sleybo, leon, linux-rdma, Ariel Elior

> From: Gal Pressman <galpress@amazon.com>
> Sent: Thursday, August 29, 2019 5:21 PM
> 
> On 27/08/2019 16:28, Michal Kalderon wrote:
> > +static void efa_qp_user_mmap_entries_remove(struct efa_ucontext
> *ucontext,
> > +					    struct efa_qp *qp)
> > +{
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
> >sq_db_mmap_key);
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +				    qp->llq_desc_mmap_key);
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
> >rq_mmap_key);
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +qp->rq_db_mmap_key);
> 
> Please remove the entries in reverse insertion order.
I don't mind fixing, but why ? 

> 
> > +}
> > +
> >  static int qp_mmap_entries_setup(struct efa_qp *qp,
> >  				 struct efa_dev *dev,
> >  				 struct efa_ucontext *ucontext,
> >  				 struct efa_com_create_qp_params
> *params,
> >  				 struct efa_ibv_create_qp_resp *resp)  {
> > -	/*
> > -	 * Once an entry is inserted it might be mmapped, hence cannot be
> > -	 * cleaned up until dealloc_ucontext.
> > -	 */
> > -	resp->sq_db_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->db_bar_addr + resp->sq_db_offset,
> > -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> > -		return -ENOMEM;
> > +	u64 address;
> > +	u64 length;
> > +	int err;
> > +
> > +	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> > +					 dev->db_bar_addr +
> > +					 resp->sq_db_offset,
> > +					 PAGE_SIZE, EFA_MMAP_IO_NC,
> > +					 &qp->sq_db_mmap_key);
> > +	if (err)
> > +		return err;
> >
> > +	resp->sq_db_mmap_key = qp->sq_db_mmap_key;
> >  	resp->sq_db_offset &= ~PAGE_MASK;
> >
> > -	resp->llq_desc_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->mem_bar_addr + resp-
> >llq_desc_offset,
> > -				  PAGE_ALIGN(params-
> >sq_ring_size_in_bytes +
> > -					     (resp->llq_desc_offset &
> ~PAGE_MASK)),
> > -				  EFA_MMAP_IO_WC);
> > -	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
> > -		return -ENOMEM;
> > +	address = dev->mem_bar_addr + resp->llq_desc_offset;
> > +	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
> > +			    (resp->llq_desc_offset & ~PAGE_MASK));
> > +
> > +	err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> > +					 address,
> > +					 length,
> > +					 EFA_MMAP_IO_WC,
> > +					 &qp->llq_desc_mmap_key);
> > +	if (err)
> > +		goto err1;
> >
> > +	resp->llq_desc_mmap_key = qp->llq_desc_mmap_key;
> >  	resp->llq_desc_offset &= ~PAGE_MASK;
> >
> >  	if (qp->rq_size) {
> > -		resp->rq_db_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  dev->db_bar_addr + resp-
> >rq_db_offset,
> > -					  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
> > -			return -ENOMEM;
> > +		address = dev->db_bar_addr + resp->rq_db_offset;
> >
> > +		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> > +						 address, PAGE_SIZE,
> > +						 EFA_MMAP_IO_NC,
> > +						 &qp->rq_db_mmap_key);
> > +		if (err)
> > +			goto err2;
> > +
> > +		resp->rq_db_mmap_key = qp->rq_db_mmap_key;
> >  		resp->rq_db_offset &= ~PAGE_MASK;
> >
> > -		resp->rq_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  virt_to_phys(qp->rq_cpu_addr),
> > -					  qp->rq_size,
> EFA_MMAP_DMA_PAGE);
> > -		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
> > -			return -ENOMEM;
> > +		address = virt_to_phys(qp->rq_cpu_addr);
> > +		err = efa_user_mmap_entry_insert(&ucontext->ibucontext,
> > +						 address, qp->rq_size,
> > +						 EFA_MMAP_DMA_PAGE,
> > +						 &qp->rq_mmap_key);
> > +		if (err)
> > +			goto err3;
> >
> > +		resp->rq_mmap_key = qp->rq_mmap_key;
> >  		resp->rq_mmap_size = qp->rq_size;
> >  	}
> >
> >  	return 0;
> > +
> > +err3:
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +qp->rq_db_mmap_key);
> > +
> > +err2:
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +				    qp->llq_desc_mmap_key);
> > +
> > +err1:
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +qp->sq_db_mmap_key);
> 
> I prefer meaningful goto labels, e.g err_remove_sq_db instead of err1.
Ok

> 
> > +
> > +	/* If any error occurred, we init the keys back to invalid */
> > +	efa_qp_init_keys(qp);
> > +
> > +	return err;
> >  }
> >
> >  static int efa_qp_validate_cap(struct efa_dev *dev, @@ -634,7 +594,6
> > @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >  	struct efa_dev *dev = to_edev(ibpd->device);
> >  	struct efa_ibv_create_qp_resp resp = {};
> >  	struct efa_ibv_create_qp cmd = {};
> > -	bool rq_entry_inserted = false;
> >  	struct efa_ucontext *ucontext;
> >  	struct efa_qp *qp;
> >  	int err;
> > @@ -687,6 +646,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >  		goto err_out;
> >  	}
> >
> > +	efa_qp_init_keys(qp);
> >  	create_qp_params.uarn = ucontext->uarn;
> >  	create_qp_params.pd = to_epd(ibpd)->pdn;
> >
> > @@ -742,7 +702,6 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >  	if (err)
> >  		goto err_destroy_qp;
> >
> > -	rq_entry_inserted = true;
> >  	qp->qp_handle = create_qp_resp.qp_handle;
> >  	qp->ibqp.qp_num = create_qp_resp.qp_num;
> >  	qp->ibqp.qp_type = init_attr->qp_type; @@ -759,7 +718,7 @@
> struct
> > ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >  			ibdev_dbg(&dev->ibdev,
> >  				  "Failed to copy udata for qp[%u]\n",
> >  				  create_qp_resp.qp_num);
> > -			goto err_destroy_qp;
> > +			goto err_remove_mmap_entries;
> >  		}
> >  	}
> >
> > @@ -767,15 +726,17 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >
> >  	return &qp->ibqp;
> >
> > +err_remove_mmap_entries:
> > +	efa_qp_user_mmap_entries_remove(ucontext, qp);
> >  err_destroy_qp:
> >  	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
> >  err_free_mapped:
> > -	if (qp->rq_size) {
> > +	if (qp->rq_dma_addr)
> 
> What's the difference?
Seemed a better query since it now only covers the rq_dma_addr unmapping. 

> 
> >  		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr,
> qp->rq_size,
> >  				 DMA_TO_DEVICE);
> > -		if (!rq_entry_inserted)
> > -			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
> > -	}
> > +
> > +	if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
> > +		free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
> 
> This should be inside the previous if statement, otherwise it might try to free
> pages that weren't allocated.
If they weren't allocated the key will be INVALID and they won't be freed.

> 
> >  err_free_qp:
> >  	kfree(qp);
> >  err_out:
> > @@ -887,6 +848,7 @@ static int efa_destroy_cq_idx(struct efa_dev *dev,
> > int cq_idx)
> >
> >  void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)  {
> > +	struct efa_ucontext *ucontext;
> 
> Reverse xmas tree.
ok
> 
> >  	struct efa_dev *dev = to_edev(ibcq->device);
> >  	struct efa_cq *cq = to_ecq(ibcq);
> >
> > @@ -894,20 +856,33 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct
> ib_udata *udata)
> >  		  "Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
> >  		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
> >
> > +	ucontext = rdma_udata_to_drv_context(udata, struct efa_ucontext,
> > +					     ibucontext);
> >  	efa_destroy_cq_idx(dev, cq->cq_idx);
> >  	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
> >  			 DMA_FROM_DEVICE);
> > +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> > +				    cq->mmap_key);
> 
> Entry removal should be first.
Why ? removing can lead to freeing, why would we want that before unmapping ? 

> 
> >  }
> >
> >  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> >  		      struct vm_area_struct *vma, u64 key, u64 length)  {
> > -	struct efa_mmap_entry *entry;
> > +	struct rdma_user_mmap_entry *rdma_entry;
> > +	struct efa_user_mmap_entry *entry;
> >  	unsigned long va;
> >  	u64 pfn;
> >  	int err;
> >
> > -	entry = mmap_entry_get(dev, ucontext, key, length);
> > -	if (!entry) {
> > +	rdma_entry = rdma_user_mmap_entry_get(&ucontext-
> >ibucontext, key,
> > +					      length, vma);
> > +	if (!rdma_entry) {
> >  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid
> entry\n",
> >  			  key);
> >  		return -EINVAL;
> >  	}
> > +	entry = to_emmap(rdma_entry);
> > +	if (entry->length != length) {
> > +		ibdev_dbg(&dev->ibdev,
> > +			  "key[%#llx] does not have valid length[%#llx]
> expected[%#llx]\n",
> > +			  key, length, entry->length);
> 
> Need to put the entry.
Right thanks, will also fix in qedr and siw.

> 
> > +		return -EINVAL;
> > +	}
> >
> >  	ibdev_dbg(&dev->ibdev,
> >  		  "Mapping address[%#llx], length[%#llx],
> mmap_flag[%d]\n",
> > -		  entry->address, length, entry->mmap_flag);
> > +		  entry->address, entry->length, entry->mmap_flag);
> >
> >  	pfn = entry->address >> PAGE_SHIFT;
> >  	switch (entry->mmap_flag) {
> > @@ -1637,6 +1630,10 @@ static int __efa_mmap(struct efa_dev *dev,
> struct efa_ucontext *ucontext,
> >  			&dev->ibdev,
> >  			"Couldn't mmap address[%#llx] length[%#llx]
> mmap_flag[%d] err[%d]\n",
> >  			entry->address, length, entry->mmap_flag, err);
> > +
> > +		rdma_user_mmap_entry_put(&ucontext->ibucontext,
> > +					 rdma_entry);
> > +
> >  		return err;
> >  	}
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
                   ` (7 preceding siblings ...)
  2019-08-28 14:41 ` [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Gal Pressman
@ 2019-08-30 12:07 ` Bernard Metzler
  2019-08-30 12:42   ` [EXT] " Michal Kalderon
  2019-08-30 13:19   ` Bernard Metzler
  8 siblings, 2 replies; 25+ messages in thread
From: Bernard Metzler @ 2019-08-30 12:07 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: mkalderon, aelior, jgg, dledford, galpress, sleybo, leon,
	linux-rdma, Ariel Elior

-----"Michal Kalderon" <michal.kalderon@marvell.com> wrote: -----

Hi Michael,

I tried this patch. It unfortunately panics immediately
when siw gets used. I'll investigate further. Some comments in line.

Thanks
Bernard.

>To: <mkalderon@marvell.com>, <aelior@marvell.com>, <jgg@ziepe.ca>,
><dledford@redhat.com>, <bmt@zurich.ibm.com>, <galpress@amazon.com>,
><sleybo@amazon.com>, <leon@kernel.org>
>From: "Michal Kalderon" <michal.kalderon@marvell.com>
>Date: 08/27/2019 03:31PM
>Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
><michal.kalderon@marvell.com>, "Ariel Elior"
><ariel.elior@marvell.com>
>Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common
>mmap_xa helpers
>
>Remove the functions related to managing the mmap_xa database.
>This code is now common in ib_core. Use the common API's instead.
>
>Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>---
> drivers/infiniband/sw/siw/siw.h       |  20 ++-
> drivers/infiniband/sw/siw/siw_main.c  |   1 +
> drivers/infiniband/sw/siw/siw_verbs.c | 223
>+++++++++++++++++++---------------
> drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> 4 files changed, 144 insertions(+), 101 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw.h
>b/drivers/infiniband/sw/siw/siw.h
>index 77b1aabf6ff3..d48cd42ae43e 100644
>--- a/drivers/infiniband/sw/siw/siw.h
>+++ b/drivers/infiniband/sw/siw/siw.h
>@@ -220,7 +220,7 @@ struct siw_cq {
> 	u32 cq_get;
> 	u32 num_cqe;
> 	bool kernel_verbs;
>-	u32 xa_cq_index; /* mmap information for CQE array */
>+	u64 cq_key; /* mmap information for CQE array */
> 	u32 id; /* For debugging only */
> };
> 
>@@ -263,7 +263,7 @@ struct siw_srq {
> 	u32 rq_put;
> 	u32 rq_get;
> 	u32 num_rqe; /* max # of wqe's allowed */
>-	u32 xa_srq_index; /* mmap information for SRQ array */
>+	u64 srq_key; /* mmap information for SRQ array */
> 	char armed; /* inform user if limit hit */
> 	char kernel_verbs; /* '1' if kernel client */
> };
>@@ -477,8 +477,8 @@ struct siw_qp {
> 		u8 layer : 4, etype : 4;
> 		u8 ecode;
> 	} term_info;
>-	u32 xa_sq_index; /* mmap information for SQE array */
>-	u32 xa_rq_index; /* mmap information for RQE array */
>+	u64 sq_key; /* mmap information for SQE array */
>+	u64 rq_key; /* mmap information for RQE array */
> 	struct rcu_head rcu;
> };
> 
>@@ -503,6 +503,12 @@ struct iwarp_msg_info {
> 	int (*rx_data)(struct siw_qp *qp);
> };
> 
>+struct siw_user_mmap_entry {
>+	struct rdma_user_mmap_entry rdma_entry;
>+	u64 address;
>+	u64 length;

This unsigned 64 range for an mmap length seems to be quite
ambitious. I do not expect we are mmapping objects of
that size. That range would be also not supported by 32bit
architectures  - a user cannot mmap more than size_t.
So size_t might be more appropriate? This throughout
all the proposed mmap_xa helpers.

>+};
>+
> /* Global siw parameters. Currently set in siw_main.c */
> extern const bool zcopy_tx;
> extern const bool try_gso;
>@@ -607,6 +613,12 @@ static inline struct siw_mr *to_siw_mr(struct
>ib_mr *base_mr)
> 	return container_of(base_mr, struct siw_mr, base_mr);
> }
> 
>+static inline struct siw_user_mmap_entry *
>+to_siw_mmap_entry(struct rdma_user_mmap_entry *rdma_mmap)
>+{
>+	return container_of(rdma_mmap, struct siw_user_mmap_entry,
>rdma_entry);
>+}
>+
> static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev,
>int id)
> {
> 	struct siw_qp *qp;
>diff --git a/drivers/infiniband/sw/siw/siw_main.c
>b/drivers/infiniband/sw/siw/siw_main.c
>index 05a92f997f60..46c0bb59489d 100644
>--- a/drivers/infiniband/sw/siw/siw_main.c
>+++ b/drivers/infiniband/sw/siw/siw_main.c
>@@ -298,6 +298,7 @@ static const struct ib_device_ops siw_device_ops
>= {
> 	.iw_rem_ref = siw_qp_put_ref,
> 	.map_mr_sg = siw_map_mr_sg,
> 	.mmap = siw_mmap,
>+	.mmap_free = siw_mmap_free,
> 	.modify_qp = siw_verbs_modify_qp,
> 	.modify_srq = siw_modify_srq,
> 	.poll_cq = siw_poll_cq,
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>b/drivers/infiniband/sw/siw/siw_verbs.c
>index 03176a3d1e18..9e049241051e 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.c
>+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>@@ -34,44 +34,21 @@ static char ib_qp_state_to_string[IB_QPS_ERR +
>1][sizeof("RESET")] = {
> 	[IB_QPS_ERR] = "ERR"
> };
> 
>-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr,
>u32 size)
>+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
> {
>-	struct siw_uobj *uobj;
>-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
>-	u32 key;
>+	struct siw_user_mmap_entry *entry = to_siw_mmap_entry(rdma_entry);
> 
>-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
>-	if (!uobj)
>-		return SIW_INVAL_UOBJ_KEY;
>-
>-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit,
>&uctx->uobj_nextkey,
>-			    GFP_KERNEL) < 0) {
>-		kfree(uobj);
>-		return SIW_INVAL_UOBJ_KEY;
>-	}
>-	uobj->size = PAGE_ALIGN(size);
>-	uobj->addr = vaddr;
>-
>-	return key;
>-}
>-
>-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
>-				     unsigned long off, u32 size)
>-{
>-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
>-
>-	if (uobj && uobj->size == size)
>-		return uobj;
>-
>-	return NULL;
>+	vfree((void *)entry->address);
>+	kfree(entry);
> }
> 
> int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
> {
> 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
>-	struct siw_uobj *uobj;
>-	unsigned long off = vma->vm_pgoff;
>+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
>+	struct rdma_user_mmap_entry *rdma_entry;
> 	int size = vma->vm_end - vma->vm_start;
>+	struct siw_user_mmap_entry *entry;
> 	int rv = -EINVAL;
> 
> 	/*
>@@ -81,15 +58,26 @@ int siw_mmap(struct ib_ucontext *ctx, struct
>vm_area_struct *vma)
> 		pr_warn("siw: mmap not page aligned\n");
> 		goto out;
> 	}
>-	uobj = siw_get_uobj(uctx, off, size);
>-	if (!uobj) {
>+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off,
>+					      size, vma);
>+	if (!rdma_entry) {
> 		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %u\n",
> 			off, size);
> 		goto out;
> 	}
>-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
>-	if (rv)
>+	entry = to_siw_mmap_entry(rdma_entry);
>+	if (entry->length != size) {
>+		siw_dbg(&uctx->sdev->base_dev,
>+			"key[%#lx] does not have valid length[%#x] expected[%#llx]\n",
>+			off, size, entry->length);
>+		return -EINVAL;
>+	}
>+
>+	rv = remap_vmalloc_range(vma, (void *)entry->address, 0);
>+	if (rv) {
> 		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off, size);
>+		rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry);
>+	}
> out:
> 	return rv;
> }
>@@ -105,7 +93,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>*base_ctx, struct ib_udata *udata)
> 		rv = -ENOMEM;
> 		goto err_out;
> 	}
>-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
>+
> 	ctx->uobj_nextkey = 0;
> 	ctx->sdev = sdev;
> 
>@@ -135,19 +123,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>*base_ctx, struct ib_udata *udata)
> void siw_dealloc_ucontext(struct ib_ucontext *base_ctx)
> {
> 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
>-	void *entry;
>-	unsigned long index;
> 
>-	/*
>-	 * Make sure all user mmap objects are gone. Since QP, CQ
>-	 * and SRQ destroy routines destroy related objects, nothing
>-	 * should be found here.
>-	 */
>-	xa_for_each(&uctx->xa, index, entry) {
>-		kfree(xa_erase(&uctx->xa, index));
>-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
>-	}
>-	xa_destroy(&uctx->xa);
> 	atomic_dec(&uctx->sdev->num_ctx);
> }
> 
>@@ -293,6 +269,28 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
> 	siw_qp_put(to_siw_qp(base_qp));
> }
> 
>+static int siw_user_mmap_entry_insert(struct ib_ucontext *ucontext,
>+				      u64 address, u64 length,
>+				      u64 *key)
>+{
>+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry),
>GFP_KERNEL);
>+
>+	if (!entry)
>+		return -ENOMEM;
>+
>+	entry->address = address;
>+	entry->length = length;
>+
>+	*key = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry,
>+					   length);
>+	if (*key == RDMA_USER_MMAP_INVALID) {
>+		kfree(entry);
>+		return -ENOMEM;
>+	}
>+
>+	return 0;
>+}
>+
> /*
>  * siw_create_qp()
>  *
>@@ -317,6 +315,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 	struct siw_cq *scq = NULL, *rcq = NULL;
> 	unsigned long flags;
> 	int num_sqe, num_rqe, rv = 0;
>+	u64 length;
>+	u64 key;
> 
> 	siw_dbg(base_dev, "create new QP\n");
> 
>@@ -380,8 +380,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 	spin_lock_init(&qp->orq_lock);
> 
> 	qp->kernel_verbs = !udata;
>-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
>-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
>+	qp->sq_key = RDMA_USER_MMAP_INVALID;
>+	qp->rq_key = RDMA_USER_MMAP_INVALID;
> 
> 	rv = siw_qp_add(sdev, qp);
> 	if (rv)
>@@ -459,26 +459,41 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 		uresp.qp_id = qp_id(qp);
> 
> 		if (qp->sendq) {
>-			qp->xa_sq_index =
>-				siw_create_uobj(uctx, qp->sendq,
>-					num_sqe * sizeof(struct siw_sqe));
>+			length = num_sqe * sizeof(struct siw_sqe);
>+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
>+							(uintptr_t)qp->sendq,
>+							length, &key);
>+			if (!rv)

Shall be 'if (rv)'
The new 'siw_user_mmap_entry_insert()' seem to return zero
for success. Same issue afrer all other invocations of
siw_user_mmap_entry_insert() further down in that file.

>+				goto err_out_xa;
>+
>+			/* If entry was inserted successfully, qp->sendq
>+			 * will be freed by siw_mmap_free
>+			 */
>+			qp->sendq = NULL;

qp->sendq points to the SQ array. Zeroing this pointer will leave
siw with no idea where the WQE's are. It will panic de-referencing
[NULL + current position in ring buffer]. Same for RQ, SRQ and CQ.

 
>+			qp->sq_key = key;
> 		}
>+
> 		if (qp->recvq) {
>-			qp->xa_rq_index =
>-				 siw_create_uobj(uctx, qp->recvq,
>-					num_rqe * sizeof(struct siw_rqe));
>-		}
>-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
>-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
>-			rv = -ENOMEM;
>-			goto err_out_xa;
>+			length = num_rqe * sizeof(struct siw_rqe);
>+			rv = siw_user_mmap_entry_insert(&uctx->base_ucontext,
>+							(uintptr_t)qp->recvq,
>+							length, &key);
>+			if (!rv)
>+				goto err_out_mmap_rem;
>+
>+			/* If entry was inserted successfully, qp->recvq will
>+			 * be freed by siw_mmap_free
>+			 */
>+			qp->recvq = NULL;

see above

>+			qp->rq_key = key;
> 		}
>-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
>-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;

Changes here would require changes to the siw user lib

>+
>+		uresp.sq_key = qp->sq_key;
>+		uresp.rq_key = qp->rq_key;
> 
> 		if (udata->outlen < sizeof(uresp)) {
> 			rv = -EINVAL;
>-			goto err_out_xa;
>+			goto err_out_mmap_rem;
> 		}
> 		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
> 		if (rv)
>@@ -496,21 +511,23 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> 
> 	return qp->ib_qp;
> 
>+err_out_mmap_rem:
>+	if (udata) {
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_key);
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_key);
>+	}
>+
> err_out_xa:
> 	xa_erase(&sdev->qp_xa, qp_id(qp));
> err_out:
> 	kfree(siw_base_qp);
> 
> 	if (qp) {
>-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>-
> 		vfree(qp->sendq);
> 		vfree(qp->recvq);
> 		kfree(qp);
> 	}
>+
> 	atomic_dec(&sdev->num_qp);
> 
> 	return ERR_PTR(rv);
>@@ -620,10 +637,10 @@ int siw_destroy_qp(struct ib_qp *base_qp,
>struct ib_udata *udata)
> 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
> 	qp->rx_stream.rx_suspend = 1;
> 
>-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>+	if (uctx) {
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_key);
>+		rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_key);
>+	}
> 
> 	down_write(&qp->state_lock);
> 
>@@ -993,8 +1010,8 @@ void siw_destroy_cq(struct ib_cq *base_cq,
>struct ib_udata *udata)
> 
> 	siw_cq_flush(cq);
> 
>-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>+	if (ctx)
>+		rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_key);
> 
> 	atomic_dec(&sdev->num_cq);
> 
>@@ -1031,7 +1048,7 @@ int siw_create_cq(struct ib_cq *base_cq, const
>struct ib_cq_init_attr *attr,
> 	size = roundup_pow_of_two(size);
> 	cq->base_cq.cqe = size;
> 	cq->num_cqe = size;
>-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
>+	cq->cq_key = RDMA_USER_MMAP_INVALID;
> 
> 	if (!udata) {
> 		cq->kernel_verbs = 1;
>@@ -1057,16 +1074,21 @@ int siw_create_cq(struct ib_cq *base_cq,
>const struct ib_cq_init_attr *attr,
> 		struct siw_ucontext *ctx =
> 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 						  base_ucontext);
>+		u64 length = size * sizeof(struct siw_cqe) +
>+			     sizeof(struct siw_cq_ctrl);
> 
>-		cq->xa_cq_index =
>-			siw_create_uobj(ctx, cq->queue,
>-					size * sizeof(struct siw_cqe) +
>-						sizeof(struct siw_cq_ctrl));
>-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
>-			rv = -ENOMEM;
>+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
>+						(uintptr_t)cq->queue,
>+						length, &cq->cq_key);
>+		if (!rv)

if (rv)
> 			goto err_out;
>-		}
>-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
>+
>+		/* If entry was inserted successfully, cq->queue will be freed
>+		 * by siw_mmap_free
>+		 */
>+		cq->queue = NULL;

causes panic
>+
>+		uresp.cq_key = cq->cq_key;
> 		uresp.cq_id = cq->id;
> 		uresp.num_cqe = size;
> 
>@@ -1087,8 +1109,9 @@ int siw_create_cq(struct ib_cq *base_cq, const
>struct ib_cq_init_attr *attr,
> 		struct siw_ucontext *ctx =
> 			rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 						  base_ucontext);
>-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>+		if (udata)
>+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+						    cq->cq_key);
> 		vfree(cq->queue);
> 	}
> 	atomic_dec(&sdev->num_cq);
>@@ -1490,7 +1513,7 @@ int siw_create_srq(struct ib_srq *base_srq,
> 	}
> 	srq->max_sge = attrs->max_sge;
> 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
>-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
>+	srq->srq_key = RDMA_USER_MMAP_INVALID;
> 	srq->limit = attrs->srq_limit;
> 	if (srq->limit)
> 		srq->armed = 1;
>@@ -1509,15 +1532,19 @@ int siw_create_srq(struct ib_srq *base_srq,
> 	}
> 	if (udata) {
> 		struct siw_uresp_create_srq uresp = {};
>+		u64 length = srq->num_rqe * sizeof(struct siw_rqe);
> 
>-		srq->xa_srq_index = siw_create_uobj(
>-			ctx, srq->recvq, srq->num_rqe * sizeof(struct siw_rqe));
>-
>-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
>-			rv = -ENOMEM;
>+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
>+						(uintptr_t)srq->recvq,
>+						length, &srq->srq_key);
>+		if (!rv)

if (rv)

> 			goto err_out;
>-		}
>-		uresp.srq_key = srq->xa_srq_index;
>+
>+		/* If entry was inserted successfully, srq->recvq will be freed
>+		 * by siw_mmap_free
>+		 */
>+		srq->recvq = NULL;

causes panic

>+		uresp.srq_key = srq->srq_key;
> 		uresp.num_rqe = srq->num_rqe;
> 
> 		if (udata->outlen < sizeof(uresp)) {
>@@ -1536,8 +1563,9 @@ int siw_create_srq(struct ib_srq *base_srq,
> 
> err_out:
> 	if (srq->recvq) {
>-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>+		if (udata)
>+			rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+						    srq->srq_key);
> 		vfree(srq->recvq);
> 	}
> 	atomic_dec(&sdev->num_srq);
>@@ -1623,8 +1651,9 @@ void siw_destroy_srq(struct ib_srq *base_srq,
>struct ib_udata *udata)
> 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
> 					  base_ucontext);
> 
>-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>+	if (ctx)
>+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>+					    srq->srq_key);
> 
> 	vfree(srq->recvq);
> 	atomic_dec(&sdev->num_srq);
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.h
>b/drivers/infiniband/sw/siw/siw_verbs.h
>index 1910869281cb..1a731989fad6 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.h
>+++ b/drivers/infiniband/sw/siw/siw_verbs.h
>@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq,
>struct ib_udata *udata);
> int siw_post_srq_recv(struct ib_srq *base_srq, const struct
>ib_recv_wr *wr,
> 		      const struct ib_recv_wr **bad_wr);
> int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma);
>+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
> void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
> void siw_cq_event(struct siw_cq *cq, enum ib_event_type type);
> void siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
>-- 
>2.14.5
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-30 12:07 ` [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
@ 2019-08-30 12:42   ` " Michal Kalderon
  2019-08-30 13:19   ` Bernard Metzler
  1 sibling, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-08-30 12:42 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma,
	Ariel Elior

> From: Bernard Metzler <BMT@zurich.ibm.com>
> Sent: Friday, August 30, 2019 3:08 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote: -----
> 
> Hi Michael,
> 
> I tried this patch. It unfortunately panics immediately when siw gets used. I'll
> investigate further. Some comments in line.
Thanks for testing, 

> 
> Thanks
> Bernard.
> 
> >To: <mkalderon@marvell.com>, <aelior@marvell.com>, <jgg@ziepe.ca>,
> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
> <galpress@amazon.com>,
> ><sleybo@amazon.com>, <leon@kernel.org>
> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
> >Date: 08/27/2019 03:31PM
> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
> ><michal.kalderon@marvell.com>, "Ariel Elior"
> ><ariel.elior@marvell.com>
> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
> common
> >mmap_xa helpers
> >
> >Remove the functions related to managing the mmap_xa database.
> >This code is now common in ib_core. Use the common API's instead.
> >
> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> >---
> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
> > drivers/infiniband/sw/siw/siw_verbs.c | 223
> >+++++++++++++++++++---------------
> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> > 4 files changed, 144 insertions(+), 101 deletions(-)
> >
> >diff --git a/drivers/infiniband/sw/siw/siw.h
> >b/drivers/infiniband/sw/siw/siw.h
> >index 77b1aabf6ff3..d48cd42ae43e 100644
> >--- a/drivers/infiniband/sw/siw/siw.h
> >+++ b/drivers/infiniband/sw/siw/siw.h
> >@@ -220,7 +220,7 @@ struct siw_cq {
> > 	u32 cq_get;
> > 	u32 num_cqe;
> > 	bool kernel_verbs;
> >-	u32 xa_cq_index; /* mmap information for CQE array */
> >+	u64 cq_key; /* mmap information for CQE array */
> > 	u32 id; /* For debugging only */
> > };
> >
> >@@ -263,7 +263,7 @@ struct siw_srq {
> > 	u32 rq_put;
> > 	u32 rq_get;
> > 	u32 num_rqe; /* max # of wqe's allowed */
> >-	u32 xa_srq_index; /* mmap information for SRQ array */
> >+	u64 srq_key; /* mmap information for SRQ array */
> > 	char armed; /* inform user if limit hit */
> > 	char kernel_verbs; /* '1' if kernel client */  }; @@ -477,8 +477,8 @@
> >struct siw_qp {
> > 		u8 layer : 4, etype : 4;
> > 		u8 ecode;
> > 	} term_info;
> >-	u32 xa_sq_index; /* mmap information for SQE array */
> >-	u32 xa_rq_index; /* mmap information for RQE array */
> >+	u64 sq_key; /* mmap information for SQE array */
> >+	u64 rq_key; /* mmap information for RQE array */
> > 	struct rcu_head rcu;
> > };
> >
> >@@ -503,6 +503,12 @@ struct iwarp_msg_info {
> > 	int (*rx_data)(struct siw_qp *qp);
> > };
> >
> >+struct siw_user_mmap_entry {
> >+	struct rdma_user_mmap_entry rdma_entry;
> >+	u64 address;
> >+	u64 length;
> 
> This unsigned 64 range for an mmap length seems to be quite ambitious. I do
> not expect we are mmapping objects of that size. That range would be also
> not supported by 32bit architectures  - a user cannot mmap more than size_t.
> So size_t might be more appropriate? This throughout all the proposed
> mmap_xa helpers.
I will take a look but size_t is 64 bits anyway on a 64 bit system.

> 
> >+};
> >+
> > /* Global siw parameters. Currently set in siw_main.c */  extern const
> >bool zcopy_tx;  extern const bool try_gso; @@ -607,6 +613,12 @@ static
> >inline struct siw_mr *to_siw_mr(struct ib_mr *base_mr)
> > 	return container_of(base_mr, struct siw_mr, base_mr);  }
> >
> >+static inline struct siw_user_mmap_entry * to_siw_mmap_entry(struct
> >+rdma_user_mmap_entry *rdma_mmap) {
> >+	return container_of(rdma_mmap, struct siw_user_mmap_entry,
> >rdma_entry);
> >+}
> >+
> > static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev,
> >int id)  {
> > 	struct siw_qp *qp;
> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
> >b/drivers/infiniband/sw/siw/siw_main.c
> >index 05a92f997f60..46c0bb59489d 100644
> >--- a/drivers/infiniband/sw/siw/siw_main.c
> >+++ b/drivers/infiniband/sw/siw/siw_main.c
> >@@ -298,6 +298,7 @@ static const struct ib_device_ops siw_device_ops =
> >{
> > 	.iw_rem_ref = siw_qp_put_ref,
> > 	.map_mr_sg = siw_map_mr_sg,
> > 	.mmap = siw_mmap,
> >+	.mmap_free = siw_mmap_free,
> > 	.modify_qp = siw_verbs_modify_qp,
> > 	.modify_srq = siw_modify_srq,
> > 	.poll_cq = siw_poll_cq,
> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
> >b/drivers/infiniband/sw/siw/siw_verbs.c
> >index 03176a3d1e18..9e049241051e 100644
> >--- a/drivers/infiniband/sw/siw/siw_verbs.c
> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
> >@@ -34,44 +34,21 @@ static char ib_qp_state_to_string[IB_QPS_ERR +
> >1][sizeof("RESET")] = {
> > 	[IB_QPS_ERR] = "ERR"
> > };
> >
> >-static u32 siw_create_uobj(struct siw_ucontext *uctx, void *vaddr,
> >u32 size)
> >+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
> > {
> >-	struct siw_uobj *uobj;
> >-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
> >-	u32 key;
> >+	struct siw_user_mmap_entry *entry =
> to_siw_mmap_entry(rdma_entry);
> >
> >-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
> >-	if (!uobj)
> >-		return SIW_INVAL_UOBJ_KEY;
> >-
> >-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit,
> >&uctx->uobj_nextkey,
> >-			    GFP_KERNEL) < 0) {
> >-		kfree(uobj);
> >-		return SIW_INVAL_UOBJ_KEY;
> >-	}
> >-	uobj->size = PAGE_ALIGN(size);
> >-	uobj->addr = vaddr;
> >-
> >-	return key;
> >-}
> >-
> >-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
> >-				     unsigned long off, u32 size)
> >-{
> >-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
> >-
> >-	if (uobj && uobj->size == size)
> >-		return uobj;
> >-
> >-	return NULL;
> >+	vfree((void *)entry->address);
> >+	kfree(entry);
> > }
> >
> > int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)  {
> > 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
> >-	struct siw_uobj *uobj;
> >-	unsigned long off = vma->vm_pgoff;
> >+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
> >+	struct rdma_user_mmap_entry *rdma_entry;
> > 	int size = vma->vm_end - vma->vm_start;
> >+	struct siw_user_mmap_entry *entry;
> > 	int rv = -EINVAL;
> >
> > 	/*
> >@@ -81,15 +58,26 @@ int siw_mmap(struct ib_ucontext *ctx, struct
> >vm_area_struct *vma)
> > 		pr_warn("siw: mmap not page aligned\n");
> > 		goto out;
> > 	}
> >-	uobj = siw_get_uobj(uctx, off, size);
> >-	if (!uobj) {
> >+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext,
> off,
> >+					      size, vma);
> >+	if (!rdma_entry) {
> > 		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu,
> %u\n",
> > 			off, size);
> > 		goto out;
> > 	}
> >-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
> >-	if (rv)
> >+	entry = to_siw_mmap_entry(rdma_entry);
> >+	if (entry->length != size) {
> >+		siw_dbg(&uctx->sdev->base_dev,
> >+			"key[%#lx] does not have valid length[%#x]
> expected[%#llx]\n",
> >+			off, size, entry->length);
> >+		return -EINVAL;
> >+	}
> >+
> >+	rv = remap_vmalloc_range(vma, (void *)entry->address, 0);
> >+	if (rv) {
> > 		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off,
> size);
> >+		rdma_user_mmap_entry_put(&uctx->base_ucontext,
> rdma_entry);
> >+	}
> > out:
> > 	return rv;
> > }
> >@@ -105,7 +93,7 @@ int siw_alloc_ucontext(struct ib_ucontext *base_ctx,
> >struct ib_udata *udata)
> > 		rv = -ENOMEM;
> > 		goto err_out;
> > 	}
> >-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
> >+
> > 	ctx->uobj_nextkey = 0;
> > 	ctx->sdev = sdev;
> >
> >@@ -135,19 +123,7 @@ int siw_alloc_ucontext(struct ib_ucontext
> >*base_ctx, struct ib_udata *udata)  void siw_dealloc_ucontext(struct
> >ib_ucontext *base_ctx)  {
> > 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
> >-	void *entry;
> >-	unsigned long index;
> >
> >-	/*
> >-	 * Make sure all user mmap objects are gone. Since QP, CQ
> >-	 * and SRQ destroy routines destroy related objects, nothing
> >-	 * should be found here.
> >-	 */
> >-	xa_for_each(&uctx->xa, index, entry) {
> >-		kfree(xa_erase(&uctx->xa, index));
> >-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
> >-	}
> >-	xa_destroy(&uctx->xa);
> > 	atomic_dec(&uctx->sdev->num_ctx);
> > }
> >
> >@@ -293,6 +269,28 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
> > 	siw_qp_put(to_siw_qp(base_qp));
> > }
> >
> >+static int siw_user_mmap_entry_insert(struct ib_ucontext *ucontext,
> >+				      u64 address, u64 length,
> >+				      u64 *key)
> >+{
> >+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry),
> >GFP_KERNEL);
> >+
> >+	if (!entry)
> >+		return -ENOMEM;
> >+
> >+	entry->address = address;
> >+	entry->length = length;
> >+
> >+	*key = rdma_user_mmap_entry_insert(ucontext, &entry-
> >rdma_entry,
> >+					   length);
> >+	if (*key == RDMA_USER_MMAP_INVALID) {
> >+		kfree(entry);
> >+		return -ENOMEM;
> >+	}
> >+
> >+	return 0;
> >+}
> >+
> > /*
> >  * siw_create_qp()
> >  *
> >@@ -317,6 +315,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> > 	struct siw_cq *scq = NULL, *rcq = NULL;
> > 	unsigned long flags;
> > 	int num_sqe, num_rqe, rv = 0;
> >+	u64 length;
> >+	u64 key;
> >
> > 	siw_dbg(base_dev, "create new QP\n");
> >
> >@@ -380,8 +380,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> > 	spin_lock_init(&qp->orq_lock);
> >
> > 	qp->kernel_verbs = !udata;
> >-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
> >-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
> >+	qp->sq_key = RDMA_USER_MMAP_INVALID;
> >+	qp->rq_key = RDMA_USER_MMAP_INVALID;
> >
> > 	rv = siw_qp_add(sdev, qp);
> > 	if (rv)
> >@@ -459,26 +459,41 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> > 		uresp.qp_id = qp_id(qp);
> >
> > 		if (qp->sendq) {
> >-			qp->xa_sq_index =
> >-				siw_create_uobj(uctx, qp->sendq,
> >-					num_sqe * sizeof(struct siw_sqe));
> >+			length = num_sqe * sizeof(struct siw_sqe);
> >+			rv = siw_user_mmap_entry_insert(&uctx-
> >base_ucontext,
> >+							(uintptr_t)qp-
> >sendq,
> >+							length, &key);
> >+			if (!rv)
> 
> Shall be 'if (rv)'
> The new 'siw_user_mmap_entry_insert()' seem to return zero for success.
> Same issue afrer all other invocations of
> siw_user_mmap_entry_insert() further down in that file.
Perhaps this is causing the panic you're seeing ? 

> 
> >+				goto err_out_xa;
> >+
> >+			/* If entry was inserted successfully, qp->sendq
> >+			 * will be freed by siw_mmap_free
> >+			 */
> >+			qp->sendq = NULL;
> 
> qp->sendq points to the SQ array. Zeroing this pointer will leave
> siw with no idea where the WQE's are. It will panic de-referencing [NULL +
> current position in ring buffer]. Same for RQ, SRQ and CQ.
Qp->sendq is only used in kernel mode, and only set to NULL is user-space mode
Where it is allocated and mapped in user, so the user will be the one accessing the rings
And not kernel, unless I'm missing something. 
> 
> 
> >+			qp->sq_key = key;
> > 		}
> >+
> > 		if (qp->recvq) {
> >-			qp->xa_rq_index =
> >-				 siw_create_uobj(uctx, qp->recvq,
> >-					num_rqe * sizeof(struct siw_rqe));
> >-		}
> >-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
> >-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
> >-			rv = -ENOMEM;
> >-			goto err_out_xa;
> >+			length = num_rqe * sizeof(struct siw_rqe);
> >+			rv = siw_user_mmap_entry_insert(&uctx-
> >base_ucontext,
> >+							(uintptr_t)qp->recvq,
> >+							length, &key);
> >+			if (!rv)
> >+				goto err_out_mmap_rem;
> >+
> >+			/* If entry was inserted successfully, qp->recvq will
> >+			 * be freed by siw_mmap_free
> >+			 */
> >+			qp->recvq = NULL;
> 
> see above
same
> 
> >+			qp->rq_key = key;
> > 		}
> >-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
> >-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
> 
> Changes here would require changes to the siw user lib
The key received from the API is shifed by PAGE_SHIFT already, so you're left with the same interface, 
There are no changes required for siw lib.

> 
> >+
> >+		uresp.sq_key = qp->sq_key;
> >+		uresp.rq_key = qp->rq_key;
> >
> > 		if (udata->outlen < sizeof(uresp)) {
> > 			rv = -EINVAL;
> >-			goto err_out_xa;
> >+			goto err_out_mmap_rem;
> > 		}
> > 		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
> > 		if (rv)
> >@@ -496,21 +511,23 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
> >
> > 	return qp->ib_qp;
> >
> >+err_out_mmap_rem:
> >+	if (udata) {
> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
> qp->sq_key);
> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
> qp->rq_key);
> >+	}
> >+
> > err_out_xa:
> > 	xa_erase(&sdev->qp_xa, qp_id(qp));
> > err_out:
> > 	kfree(siw_base_qp);
> >
> > 	if (qp) {
> >-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
> >-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
> >-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
> >-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
> >-
> > 		vfree(qp->sendq);
> > 		vfree(qp->recvq);
> > 		kfree(qp);
> > 	}
> >+
> > 	atomic_dec(&sdev->num_qp);
> >
> > 	return ERR_PTR(rv);
> >@@ -620,10 +637,10 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct
> >ib_udata *udata)
> > 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
> > 	qp->rx_stream.rx_suspend = 1;
> >
> >-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
> >-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
> >-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
> >-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
> >+	if (uctx) {
> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
> qp->sq_key);
> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
> qp->rq_key);
> >+	}
> >
> > 	down_write(&qp->state_lock);
> >
> >@@ -993,8 +1010,8 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct
> >ib_udata *udata)
> >
> > 	siw_cq_flush(cq);
> >
> >-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
> >-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
> >+	if (ctx)
> >+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
> cq->cq_key);
> >
> > 	atomic_dec(&sdev->num_cq);
> >
> >@@ -1031,7 +1048,7 @@ int siw_create_cq(struct ib_cq *base_cq, const
> >struct ib_cq_init_attr *attr,
> > 	size = roundup_pow_of_two(size);
> > 	cq->base_cq.cqe = size;
> > 	cq->num_cqe = size;
> >-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
> >+	cq->cq_key = RDMA_USER_MMAP_INVALID;
> >
> > 	if (!udata) {
> > 		cq->kernel_verbs = 1;
> >@@ -1057,16 +1074,21 @@ int siw_create_cq(struct ib_cq *base_cq, const
> >struct ib_cq_init_attr *attr,
> > 		struct siw_ucontext *ctx =
> > 			rdma_udata_to_drv_context(udata, struct
> siw_ucontext,
> > 						  base_ucontext);
> >+		u64 length = size * sizeof(struct siw_cqe) +
> >+			     sizeof(struct siw_cq_ctrl);
> >
> >-		cq->xa_cq_index =
> >-			siw_create_uobj(ctx, cq->queue,
> >-					size * sizeof(struct siw_cqe) +
> >-						sizeof(struct siw_cq_ctrl));
> >-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
> >-			rv = -ENOMEM;
> >+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
> >+						(uintptr_t)cq->queue,
> >+						length, &cq->cq_key);
> >+		if (!rv)
> 
> if (rv)
> > 			goto err_out;
> >-		}
> >-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
> >+
> >+		/* If entry was inserted successfully, cq->queue will be freed
> >+		 * by siw_mmap_free
> >+		 */
> >+		cq->queue = NULL;
> 
> causes panic
Should be set to null only if there is udata. I will fix. 

> >+
> >+		uresp.cq_key = cq->cq_key;
> > 		uresp.cq_id = cq->id;
> > 		uresp.num_cqe = size;
> >
> >@@ -1087,8 +1109,9 @@ int siw_create_cq(struct ib_cq *base_cq, const
> >struct ib_cq_init_attr *attr,
> > 		struct siw_ucontext *ctx =
> > 			rdma_udata_to_drv_context(udata, struct
> siw_ucontext,
> > 						  base_ucontext);
> >-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
> >-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
> >+		if (udata)
> >+			rdma_user_mmap_entry_remove(&ctx-
> >base_ucontext,
> >+						    cq->cq_key);
> > 		vfree(cq->queue);
> > 	}
> > 	atomic_dec(&sdev->num_cq);
> >@@ -1490,7 +1513,7 @@ int siw_create_srq(struct ib_srq *base_srq,
> > 	}
> > 	srq->max_sge = attrs->max_sge;
> > 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
> >-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
> >+	srq->srq_key = RDMA_USER_MMAP_INVALID;
> > 	srq->limit = attrs->srq_limit;
> > 	if (srq->limit)
> > 		srq->armed = 1;
> >@@ -1509,15 +1532,19 @@ int siw_create_srq(struct ib_srq *base_srq,
> > 	}
> > 	if (udata) {
> > 		struct siw_uresp_create_srq uresp = {};
> >+		u64 length = srq->num_rqe * sizeof(struct siw_rqe);
> >
> >-		srq->xa_srq_index = siw_create_uobj(
> >-			ctx, srq->recvq, srq->num_rqe * sizeof(struct
> siw_rqe));
> >-
> >-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
> >-			rv = -ENOMEM;
> >+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
> >+						(uintptr_t)srq->recvq,
> >+						length, &srq->srq_key);
> >+		if (!rv)
> 
> if (rv)
> 
> > 			goto err_out;
> >-		}
> >-		uresp.srq_key = srq->xa_srq_index;
> >+
> >+		/* If entry was inserted successfully, srq->recvq will be freed
> >+		 * by siw_mmap_free
> >+		 */
> >+		srq->recvq = NULL;
> 
> causes panic
> 
> >+		uresp.srq_key = srq->srq_key;
> > 		uresp.num_rqe = srq->num_rqe;
> >
> > 		if (udata->outlen < sizeof(uresp)) { @@ -1536,8 +1563,9 @@
> int
> >siw_create_srq(struct ib_srq *base_srq,
> >
> > err_out:
> > 	if (srq->recvq) {
> >-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
> >-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
> >+		if (udata)
> >+			rdma_user_mmap_entry_remove(&ctx-
> >base_ucontext,
> >+						    srq->srq_key);
> > 		vfree(srq->recvq);
> > 	}
> > 	atomic_dec(&sdev->num_srq);
> >@@ -1623,8 +1651,9 @@ void siw_destroy_srq(struct ib_srq *base_srq,
> >struct ib_udata *udata)
> > 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
> > 					  base_ucontext);
> >
> >-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
> >-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
> >+	if (ctx)
> >+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
> >+					    srq->srq_key);
> >
> > 	vfree(srq->recvq);
> > 	atomic_dec(&sdev->num_srq);
> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.h
> >b/drivers/infiniband/sw/siw/siw_verbs.h
> >index 1910869281cb..1a731989fad6 100644
> >--- a/drivers/infiniband/sw/siw/siw_verbs.h
> >+++ b/drivers/infiniband/sw/siw/siw_verbs.h
> >@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct
> >ib_udata *udata);  int siw_post_srq_recv(struct ib_srq *base_srq, const
> >struct ib_recv_wr *wr,
> > 		      const struct ib_recv_wr **bad_wr);  int siw_mmap(struct
> >ib_ucontext *ctx, struct vm_area_struct *vma);
> >+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
> > void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);  void
> >siw_cq_event(struct siw_cq *cq, enum ib_event_type type);  void
> >siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
> >--
> >2.14.5
> >
> >


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-30 12:07 ` [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
  2019-08-30 12:42   ` [EXT] " Michal Kalderon
@ 2019-08-30 13:19   ` Bernard Metzler
  2019-09-02  8:00     ` Michal Kalderon
                       ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Bernard Metzler @ 2019-08-30 13:19 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

-----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Michal Kalderon" <mkalderon@marvell.com>
>Date: 08/30/2019 02:42PM
>Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>"galpress@amazon.com" <galpress@amazon.com>, "sleybo@amazon.com"
><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
>Elior" <aelior@marvell.com>
>Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw:
>Use the common mmap_xa helpers
>
>> From: Bernard Metzler <BMT@zurich.ibm.com>
>> Sent: Friday, August 30, 2019 3:08 PM
>> 
>> External Email
>> 
>>
>---------------------------------------------------------------------
>-
>> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote: -----
>> 
>> Hi Michael,
>> 
>> I tried this patch. It unfortunately panics immediately when siw
>gets used. I'll
>> investigate further. Some comments in line.
>Thanks for testing, 
>
>> 
>> Thanks
>> Bernard.
>> 
>> >To: <mkalderon@marvell.com>, <aelior@marvell.com>, <jgg@ziepe.ca>,
>> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
>> <galpress@amazon.com>,
>> ><sleybo@amazon.com>, <leon@kernel.org>
>> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
>> >Date: 08/27/2019 03:31PM
>> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
>> ><michal.kalderon@marvell.com>, "Ariel Elior"
>> ><ariel.elior@marvell.com>
>> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
>> common
>> >mmap_xa helpers
>> >
>> >Remove the functions related to managing the mmap_xa database.
>> >This code is now common in ib_core. Use the common API's instead.
>> >
>> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>> >---
>> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
>> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
>> > drivers/infiniband/sw/siw/siw_verbs.c | 223
>> >+++++++++++++++++++---------------
>> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
>> > 4 files changed, 144 insertions(+), 101 deletions(-)
>> >
>> >diff --git a/drivers/infiniband/sw/siw/siw.h
>> >b/drivers/infiniband/sw/siw/siw.h
>> >index 77b1aabf6ff3..d48cd42ae43e 100644
>> >--- a/drivers/infiniband/sw/siw/siw.h
>> >+++ b/drivers/infiniband/sw/siw/siw.h
>> >@@ -220,7 +220,7 @@ struct siw_cq {
>> > 	u32 cq_get;
>> > 	u32 num_cqe;
>> > 	bool kernel_verbs;
>> >-	u32 xa_cq_index; /* mmap information for CQE array */
>> >+	u64 cq_key; /* mmap information for CQE array */
>> > 	u32 id; /* For debugging only */
>> > };
>> >
>> >@@ -263,7 +263,7 @@ struct siw_srq {
>> > 	u32 rq_put;
>> > 	u32 rq_get;
>> > 	u32 num_rqe; /* max # of wqe's allowed */
>> >-	u32 xa_srq_index; /* mmap information for SRQ array */
>> >+	u64 srq_key; /* mmap information for SRQ array */
>> > 	char armed; /* inform user if limit hit */
>> > 	char kernel_verbs; /* '1' if kernel client */  }; @@ -477,8
>+477,8 @@
>> >struct siw_qp {
>> > 		u8 layer : 4, etype : 4;
>> > 		u8 ecode;
>> > 	} term_info;
>> >-	u32 xa_sq_index; /* mmap information for SQE array */
>> >-	u32 xa_rq_index; /* mmap information for RQE array */
>> >+	u64 sq_key; /* mmap information for SQE array */
>> >+	u64 rq_key; /* mmap information for RQE array */
>> > 	struct rcu_head rcu;
>> > };
>> >
>> >@@ -503,6 +503,12 @@ struct iwarp_msg_info {
>> > 	int (*rx_data)(struct siw_qp *qp);
>> > };
>> >
>> >+struct siw_user_mmap_entry {
>> >+	struct rdma_user_mmap_entry rdma_entry;
>> >+	u64 address;
>> >+	u64 length;
>> 
>> This unsigned 64 range for an mmap length seems to be quite
>ambitious. I do
>> not expect we are mmapping objects of that size. That range would
>be also
>> not supported by 32bit architectures  - a user cannot mmap more
>than size_t.
>> So size_t might be more appropriate? This throughout all the
>proposed
>> mmap_xa helpers.
>I will take a look but size_t is 64 bits anyway on a 64 bit system.
>

yes, but we are supposed to support 32 bit architectures
as well, where size_t is 32 bits.

>> 
>> >+};
>> >+
>> > /* Global siw parameters. Currently set in siw_main.c */  extern
>const
>> >bool zcopy_tx;  extern const bool try_gso; @@ -607,6 +613,12 @@
>static
>> >inline struct siw_mr *to_siw_mr(struct ib_mr *base_mr)
>> > 	return container_of(base_mr, struct siw_mr, base_mr);  }
>> >
>> >+static inline struct siw_user_mmap_entry *
>to_siw_mmap_entry(struct
>> >+rdma_user_mmap_entry *rdma_mmap) {
>> >+	return container_of(rdma_mmap, struct siw_user_mmap_entry,
>> >rdma_entry);
>> >+}
>> >+
>> > static inline struct siw_qp *siw_qp_id2obj(struct siw_device
>*sdev,
>> >int id)  {
>> > 	struct siw_qp *qp;
>> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
>> >b/drivers/infiniband/sw/siw/siw_main.c
>> >index 05a92f997f60..46c0bb59489d 100644
>> >--- a/drivers/infiniband/sw/siw/siw_main.c
>> >+++ b/drivers/infiniband/sw/siw/siw_main.c
>> >@@ -298,6 +298,7 @@ static const struct ib_device_ops
>siw_device_ops =
>> >{
>> > 	.iw_rem_ref = siw_qp_put_ref,
>> > 	.map_mr_sg = siw_map_mr_sg,
>> > 	.mmap = siw_mmap,
>> >+	.mmap_free = siw_mmap_free,
>> > 	.modify_qp = siw_verbs_modify_qp,
>> > 	.modify_srq = siw_modify_srq,
>> > 	.poll_cq = siw_poll_cq,
>> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>> >b/drivers/infiniband/sw/siw/siw_verbs.c
>> >index 03176a3d1e18..9e049241051e 100644
>> >--- a/drivers/infiniband/sw/siw/siw_verbs.c
>> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>> >@@ -34,44 +34,21 @@ static char ib_qp_state_to_string[IB_QPS_ERR +
>> >1][sizeof("RESET")] = {
>> > 	[IB_QPS_ERR] = "ERR"
>> > };
>> >
>> >-static u32 siw_create_uobj(struct siw_ucontext *uctx, void
>*vaddr,
>> >u32 size)
>> >+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
>> > {
>> >-	struct siw_uobj *uobj;
>> >-	struct xa_limit limit = XA_LIMIT(0, SIW_UOBJ_MAX_KEY);
>> >-	u32 key;
>> >+	struct siw_user_mmap_entry *entry =
>> to_siw_mmap_entry(rdma_entry);
>> >
>> >-	uobj = kzalloc(sizeof(*uobj), GFP_KERNEL);
>> >-	if (!uobj)
>> >-		return SIW_INVAL_UOBJ_KEY;
>> >-
>> >-	if (xa_alloc_cyclic(&uctx->xa, &key, uobj, limit,
>> >&uctx->uobj_nextkey,
>> >-			    GFP_KERNEL) < 0) {
>> >-		kfree(uobj);
>> >-		return SIW_INVAL_UOBJ_KEY;
>> >-	}
>> >-	uobj->size = PAGE_ALIGN(size);
>> >-	uobj->addr = vaddr;
>> >-
>> >-	return key;
>> >-}
>> >-
>> >-static struct siw_uobj *siw_get_uobj(struct siw_ucontext *uctx,
>> >-				     unsigned long off, u32 size)
>> >-{
>> >-	struct siw_uobj *uobj = xa_load(&uctx->xa, off);
>> >-
>> >-	if (uobj && uobj->size == size)
>> >-		return uobj;
>> >-
>> >-	return NULL;
>> >+	vfree((void *)entry->address);
>> >+	kfree(entry);
>> > }
>> >
>> > int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma)
> {
>> > 	struct siw_ucontext *uctx = to_siw_ctx(ctx);
>> >-	struct siw_uobj *uobj;
>> >-	unsigned long off = vma->vm_pgoff;
>> >+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
>> >+	struct rdma_user_mmap_entry *rdma_entry;
>> > 	int size = vma->vm_end - vma->vm_start;
>> >+	struct siw_user_mmap_entry *entry;
>> > 	int rv = -EINVAL;
>> >
>> > 	/*
>> >@@ -81,15 +58,26 @@ int siw_mmap(struct ib_ucontext *ctx, struct
>> >vm_area_struct *vma)
>> > 		pr_warn("siw: mmap not page aligned\n");
>> > 		goto out;
>> > 	}
>> >-	uobj = siw_get_uobj(uctx, off, size);
>> >-	if (!uobj) {
>> >+	rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext,
>> off,
>> >+					      size, vma);
>> >+	if (!rdma_entry) {
>> > 		siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu,
>> %u\n",
>> > 			off, size);
>> > 		goto out;
>> > 	}
>> >-	rv = remap_vmalloc_range(vma, uobj->addr, 0);
>> >-	if (rv)
>> >+	entry = to_siw_mmap_entry(rdma_entry);
>> >+	if (entry->length != size) {
>> >+		siw_dbg(&uctx->sdev->base_dev,
>> >+			"key[%#lx] does not have valid length[%#x]
>> expected[%#llx]\n",
>> >+			off, size, entry->length);
>> >+		return -EINVAL;
>> >+	}
>> >+
>> >+	rv = remap_vmalloc_range(vma, (void *)entry->address, 0);
>> >+	if (rv) {
>> > 		pr_warn("remap_vmalloc_range failed: %lu, %u\n", off,
>> size);
>> >+		rdma_user_mmap_entry_put(&uctx->base_ucontext,
>> rdma_entry);
>> >+	}
>> > out:
>> > 	return rv;
>> > }
>> >@@ -105,7 +93,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>*base_ctx,
>> >struct ib_udata *udata)
>> > 		rv = -ENOMEM;
>> > 		goto err_out;
>> > 	}
>> >-	xa_init_flags(&ctx->xa, XA_FLAGS_ALLOC);
>> >+
>> > 	ctx->uobj_nextkey = 0;
>> > 	ctx->sdev = sdev;
>> >
>> >@@ -135,19 +123,7 @@ int siw_alloc_ucontext(struct ib_ucontext
>> >*base_ctx, struct ib_udata *udata)  void
>siw_dealloc_ucontext(struct
>> >ib_ucontext *base_ctx)  {
>> > 	struct siw_ucontext *uctx = to_siw_ctx(base_ctx);
>> >-	void *entry;
>> >-	unsigned long index;
>> >
>> >-	/*
>> >-	 * Make sure all user mmap objects are gone. Since QP, CQ
>> >-	 * and SRQ destroy routines destroy related objects, nothing
>> >-	 * should be found here.
>> >-	 */
>> >-	xa_for_each(&uctx->xa, index, entry) {
>> >-		kfree(xa_erase(&uctx->xa, index));
>> >-		pr_warn("siw: dropping orphaned uobj at %lu\n", index);
>> >-	}
>> >-	xa_destroy(&uctx->xa);
>> > 	atomic_dec(&uctx->sdev->num_ctx);
>> > }
>> >
>> >@@ -293,6 +269,28 @@ void siw_qp_put_ref(struct ib_qp *base_qp)
>> > 	siw_qp_put(to_siw_qp(base_qp));
>> > }
>> >
>> >+static int siw_user_mmap_entry_insert(struct ib_ucontext
>*ucontext,
>> >+				      u64 address, u64 length,
>> >+				      u64 *key)
>> >+{
>> >+	struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry),
>> >GFP_KERNEL);
>> >+
>> >+	if (!entry)
>> >+		return -ENOMEM;
>> >+
>> >+	entry->address = address;
>> >+	entry->length = length;
>> >+
>> >+	*key = rdma_user_mmap_entry_insert(ucontext, &entry-
>> >rdma_entry,
>> >+					   length);
>> >+	if (*key == RDMA_USER_MMAP_INVALID) {
>> >+		kfree(entry);
>> >+		return -ENOMEM;
>> >+	}
>> >+
>> >+	return 0;
>> >+}
>> >+
>> > /*
>> >  * siw_create_qp()
>> >  *
>> >@@ -317,6 +315,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
>> > 	struct siw_cq *scq = NULL, *rcq = NULL;
>> > 	unsigned long flags;
>> > 	int num_sqe, num_rqe, rv = 0;
>> >+	u64 length;
>> >+	u64 key;
>> >
>> > 	siw_dbg(base_dev, "create new QP\n");
>> >
>> >@@ -380,8 +380,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
>> > 	spin_lock_init(&qp->orq_lock);
>> >
>> > 	qp->kernel_verbs = !udata;
>> >-	qp->xa_sq_index = SIW_INVAL_UOBJ_KEY;
>> >-	qp->xa_rq_index = SIW_INVAL_UOBJ_KEY;
>> >+	qp->sq_key = RDMA_USER_MMAP_INVALID;
>> >+	qp->rq_key = RDMA_USER_MMAP_INVALID;
>> >
>> > 	rv = siw_qp_add(sdev, qp);
>> > 	if (rv)
>> >@@ -459,26 +459,41 @@ struct ib_qp *siw_create_qp(struct ib_pd
>*pd,
>> > 		uresp.qp_id = qp_id(qp);
>> >
>> > 		if (qp->sendq) {
>> >-			qp->xa_sq_index =
>> >-				siw_create_uobj(uctx, qp->sendq,
>> >-					num_sqe * sizeof(struct siw_sqe));
>> >+			length = num_sqe * sizeof(struct siw_sqe);
>> >+			rv = siw_user_mmap_entry_insert(&uctx-
>> >base_ucontext,
>> >+							(uintptr_t)qp-
>> >sendq,
>> >+							length, &key);
>> >+			if (!rv)
>> 
>> Shall be 'if (rv)'
>> The new 'siw_user_mmap_entry_insert()' seem to return zero for
>success.
>> Same issue afrer all other invocations of
>> siw_user_mmap_entry_insert() further down in that file.
>Perhaps this is causing the panic you're seeing ? 

No, I had fixed that.
That's because siw accesses NULL memory when
accessing the CQ/SQ/RQ/SRQ ring.

A failing allocation must not panic ;)

>
>> 
>> >+				goto err_out_xa;
>> >+
>> >+			/* If entry was inserted successfully, qp->sendq
>> >+			 * will be freed by siw_mmap_free
>> >+			 */
>> >+			qp->sendq = NULL;
>> 
>> qp->sendq points to the SQ array. Zeroing this pointer will leave
>> siw with no idea where the WQE's are. It will panic de-referencing
>[NULL +
>> current position in ring buffer]. Same for RQ, SRQ and CQ.
>Qp->sendq is only used in kernel mode, and only set to NULL is
>user-space mode
>Where it is allocated and mapped in user, so the user will be the one
>accessing the rings
>And not kernel, unless I'm missing something. 

These pointers are pointing to the allocated work queues.
These queues/arrays are holding the actual send/recv/complete/srq
work queue elements. It is a shared array. That is why we need
mmap at all.

e.g., 

struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];


>> 
>> 
>> >+			qp->sq_key = key;
>> > 		}
>> >+
>> > 		if (qp->recvq) {
>> >-			qp->xa_rq_index =
>> >-				 siw_create_uobj(uctx, qp->recvq,
>> >-					num_rqe * sizeof(struct siw_rqe));
>> >-		}
>> >-		if (qp->xa_sq_index == SIW_INVAL_UOBJ_KEY ||
>> >-		    qp->xa_rq_index == SIW_INVAL_UOBJ_KEY) {
>> >-			rv = -ENOMEM;
>> >-			goto err_out_xa;
>> >+			length = num_rqe * sizeof(struct siw_rqe);
>> >+			rv = siw_user_mmap_entry_insert(&uctx-
>> >base_ucontext,
>> >+							(uintptr_t)qp->recvq,
>> >+							length, &key);
>> >+			if (!rv)
>> >+				goto err_out_mmap_rem;
>> >+
>> >+			/* If entry was inserted successfully, qp->recvq will
>> >+			 * be freed by siw_mmap_free
>> >+			 */
>> >+			qp->recvq = NULL;
>> 
>> see above
>same
>> 
>> >+			qp->rq_key = key;
>> > 		}
>> >-		uresp.sq_key = qp->xa_sq_index << PAGE_SHIFT;
>> >-		uresp.rq_key = qp->xa_rq_index << PAGE_SHIFT;
>> 
>> Changes here would require changes to the siw user lib
>The key received from the API is shifed by PAGE_SHIFT already, so
>you're left with the same interface, 
>There are no changes required for siw lib.
>

That's good news!

>> 
>> >+
>> >+		uresp.sq_key = qp->sq_key;
>> >+		uresp.rq_key = qp->rq_key;
>> >
>> > 		if (udata->outlen < sizeof(uresp)) {
>> > 			rv = -EINVAL;
>> >-			goto err_out_xa;
>> >+			goto err_out_mmap_rem;
>> > 		}
>> > 		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
>> > 		if (rv)
>> >@@ -496,21 +511,23 @@ struct ib_qp *siw_create_qp(struct ib_pd
>*pd,
>> >
>> > 	return qp->ib_qp;
>> >
>> >+err_out_mmap_rem:
>> >+	if (udata) {
>> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>> qp->sq_key);
>> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>> qp->rq_key);
>> >+	}
>> >+
>> > err_out_xa:
>> > 	xa_erase(&sdev->qp_xa, qp_id(qp));
>> > err_out:
>> > 	kfree(siw_base_qp);
>> >
>> > 	if (qp) {
>> >-		if (qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>> >-			kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>> >-		if (qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>> >-			kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>> >-
>> > 		vfree(qp->sendq);
>> > 		vfree(qp->recvq);
>> > 		kfree(qp);
>> > 	}
>> >+
>> > 	atomic_dec(&sdev->num_qp);
>> >
>> > 	return ERR_PTR(rv);
>> >@@ -620,10 +637,10 @@ int siw_destroy_qp(struct ib_qp *base_qp,
>struct
>> >ib_udata *udata)
>> > 	qp->attrs.flags |= SIW_QP_IN_DESTROY;
>> > 	qp->rx_stream.rx_suspend = 1;
>> >
>> >-	if (uctx && qp->xa_sq_index != SIW_INVAL_UOBJ_KEY)
>> >-		kfree(xa_erase(&uctx->xa, qp->xa_sq_index));
>> >-	if (uctx && qp->xa_rq_index != SIW_INVAL_UOBJ_KEY)
>> >-		kfree(xa_erase(&uctx->xa, qp->xa_rq_index));
>> >+	if (uctx) {
>> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>> qp->sq_key);
>> >+		rdma_user_mmap_entry_remove(&uctx->base_ucontext,
>> qp->rq_key);
>> >+	}
>> >
>> > 	down_write(&qp->state_lock);
>> >
>> >@@ -993,8 +1010,8 @@ void siw_destroy_cq(struct ib_cq *base_cq,
>struct
>> >ib_udata *udata)
>> >
>> > 	siw_cq_flush(cq);
>> >
>> >-	if (ctx && cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>> >-		kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>> >+	if (ctx)
>> >+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>> cq->cq_key);
>> >
>> > 	atomic_dec(&sdev->num_cq);
>> >
>> >@@ -1031,7 +1048,7 @@ int siw_create_cq(struct ib_cq *base_cq,
>const
>> >struct ib_cq_init_attr *attr,
>> > 	size = roundup_pow_of_two(size);
>> > 	cq->base_cq.cqe = size;
>> > 	cq->num_cqe = size;
>> >-	cq->xa_cq_index = SIW_INVAL_UOBJ_KEY;
>> >+	cq->cq_key = RDMA_USER_MMAP_INVALID;
>> >
>> > 	if (!udata) {
>> > 		cq->kernel_verbs = 1;
>> >@@ -1057,16 +1074,21 @@ int siw_create_cq(struct ib_cq *base_cq,
>const
>> >struct ib_cq_init_attr *attr,
>> > 		struct siw_ucontext *ctx =
>> > 			rdma_udata_to_drv_context(udata, struct
>> siw_ucontext,
>> > 						  base_ucontext);
>> >+		u64 length = size * sizeof(struct siw_cqe) +
>> >+			     sizeof(struct siw_cq_ctrl);
>> >
>> >-		cq->xa_cq_index =
>> >-			siw_create_uobj(ctx, cq->queue,
>> >-					size * sizeof(struct siw_cqe) +
>> >-						sizeof(struct siw_cq_ctrl));
>> >-		if (cq->xa_cq_index == SIW_INVAL_UOBJ_KEY) {
>> >-			rv = -ENOMEM;
>> >+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
>> >+						(uintptr_t)cq->queue,
>> >+						length, &cq->cq_key);
>> >+		if (!rv)
>> 
>> if (rv)
>> > 			goto err_out;
>> >-		}
>> >-		uresp.cq_key = cq->xa_cq_index << PAGE_SHIFT;
>> >+
>> >+		/* If entry was inserted successfully, cq->queue will be freed
>> >+		 * by siw_mmap_free
>> >+		 */
>> >+		cq->queue = NULL;
>> 
>> causes panic
>Should be set to null only if there is udata. I will fix. 
>

Should be never null.

>> >+
>> >+		uresp.cq_key = cq->cq_key;
>> > 		uresp.cq_id = cq->id;
>> > 		uresp.num_cqe = size;
>> >
>> >@@ -1087,8 +1109,9 @@ int siw_create_cq(struct ib_cq *base_cq,
>const
>> >struct ib_cq_init_attr *attr,
>> > 		struct siw_ucontext *ctx =
>> > 			rdma_udata_to_drv_context(udata, struct
>> siw_ucontext,
>> > 						  base_ucontext);
>> >-		if (cq->xa_cq_index != SIW_INVAL_UOBJ_KEY)
>> >-			kfree(xa_erase(&ctx->xa, cq->xa_cq_index));
>> >+		if (udata)
>> >+			rdma_user_mmap_entry_remove(&ctx-
>> >base_ucontext,
>> >+						    cq->cq_key);
>> > 		vfree(cq->queue);
>> > 	}
>> > 	atomic_dec(&sdev->num_cq);
>> >@@ -1490,7 +1513,7 @@ int siw_create_srq(struct ib_srq *base_srq,
>> > 	}
>> > 	srq->max_sge = attrs->max_sge;
>> > 	srq->num_rqe = roundup_pow_of_two(attrs->max_wr);
>> >-	srq->xa_srq_index = SIW_INVAL_UOBJ_KEY;
>> >+	srq->srq_key = RDMA_USER_MMAP_INVALID;
>> > 	srq->limit = attrs->srq_limit;
>> > 	if (srq->limit)
>> > 		srq->armed = 1;
>> >@@ -1509,15 +1532,19 @@ int siw_create_srq(struct ib_srq
>*base_srq,
>> > 	}
>> > 	if (udata) {
>> > 		struct siw_uresp_create_srq uresp = {};
>> >+		u64 length = srq->num_rqe * sizeof(struct siw_rqe);
>> >
>> >-		srq->xa_srq_index = siw_create_uobj(
>> >-			ctx, srq->recvq, srq->num_rqe * sizeof(struct
>> siw_rqe));
>> >-
>> >-		if (srq->xa_srq_index == SIW_INVAL_UOBJ_KEY) {
>> >-			rv = -ENOMEM;
>> >+		rv = siw_user_mmap_entry_insert(&ctx->base_ucontext,
>> >+						(uintptr_t)srq->recvq,
>> >+						length, &srq->srq_key);
>> >+		if (!rv)
>> 
>> if (rv)
>> 
>> > 			goto err_out;
>> >-		}
>> >-		uresp.srq_key = srq->xa_srq_index;
>> >+
>> >+		/* If entry was inserted successfully, srq->recvq will be freed
>> >+		 * by siw_mmap_free
>> >+		 */
>> >+		srq->recvq = NULL;
>> 
>> causes panic
>> 
>> >+		uresp.srq_key = srq->srq_key;
>> > 		uresp.num_rqe = srq->num_rqe;
>> >
>> > 		if (udata->outlen < sizeof(uresp)) { @@ -1536,8 +1563,9 @@
>> int
>> >siw_create_srq(struct ib_srq *base_srq,
>> >
>> > err_out:
>> > 	if (srq->recvq) {
>> >-		if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>> >-			kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>> >+		if (udata)
>> >+			rdma_user_mmap_entry_remove(&ctx-
>> >base_ucontext,
>> >+						    srq->srq_key);
>> > 		vfree(srq->recvq);
>> > 	}
>> > 	atomic_dec(&sdev->num_srq);
>> >@@ -1623,8 +1651,9 @@ void siw_destroy_srq(struct ib_srq
>*base_srq,
>> >struct ib_udata *udata)
>> > 		rdma_udata_to_drv_context(udata, struct siw_ucontext,
>> > 					  base_ucontext);
>> >
>> >-	if (ctx && srq->xa_srq_index != SIW_INVAL_UOBJ_KEY)
>> >-		kfree(xa_erase(&ctx->xa, srq->xa_srq_index));
>> >+	if (ctx)
>> >+		rdma_user_mmap_entry_remove(&ctx->base_ucontext,
>> >+					    srq->srq_key);
>> >
>> > 	vfree(srq->recvq);
>> > 	atomic_dec(&sdev->num_srq);
>> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.h
>> >b/drivers/infiniband/sw/siw/siw_verbs.h
>> >index 1910869281cb..1a731989fad6 100644
>> >--- a/drivers/infiniband/sw/siw/siw_verbs.h
>> >+++ b/drivers/infiniband/sw/siw/siw_verbs.h
>> >@@ -83,6 +83,7 @@ void siw_destroy_srq(struct ib_srq *base_srq,
>struct
>> >ib_udata *udata);  int siw_post_srq_recv(struct ib_srq *base_srq,
>const
>> >struct ib_recv_wr *wr,
>> > 		      const struct ib_recv_wr **bad_wr);  int siw_mmap(struct
>> >ib_ucontext *ctx, struct vm_area_struct *vma);
>> >+void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
>> > void siw_qp_event(struct siw_qp *qp, enum ib_event_type type);
>void
>> >siw_cq_event(struct siw_cq *cq, enum ib_event_type type);  void
>> >siw_srq_event(struct siw_srq *srq, enum ib_event_type type);
>> >--
>> >2.14.5
>> >
>> >
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-08-30  6:15     ` Michal Kalderon
@ 2019-09-01  6:56       ` Gal Pressman
  2019-09-02  7:47         ` Michal Kalderon
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2019-09-01  6:56 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: Ariel Elior, jgg, dledford, bmt, sleybo, leon, linux-rdma

On 30/08/2019 9:15, Michal Kalderon wrote:
>> From: Gal Pressman <galpress@amazon.com>
>> Sent: Thursday, August 29, 2019 5:21 PM
>>
>> On 27/08/2019 16:28, Michal Kalderon wrote:
>>> +static void efa_qp_user_mmap_entries_remove(struct efa_ucontext
>> *ucontext,
>>> +					    struct efa_qp *qp)
>>> +{
>>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
>>> sq_db_mmap_key);
>>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
>>> +				    qp->llq_desc_mmap_key);
>>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
>>> rq_mmap_key);
>>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
>>> +qp->rq_db_mmap_key);
>>
>> Please remove the entries in reverse insertion order.
> I don't mind fixing, but why ? 

So the flows will be symmetric.

>>
>>> +}
>>> +
>>> @@ -767,15 +726,17 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>>>
>>>  	return &qp->ibqp;
>>>
>>> +err_remove_mmap_entries:
>>> +	efa_qp_user_mmap_entries_remove(ucontext, qp);
>>>  err_destroy_qp:
>>>  	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
>>>  err_free_mapped:
>>> -	if (qp->rq_size) {
>>> +	if (qp->rq_dma_addr)
>>
>> What's the difference?
> Seemed a better query since it now only covers the rq_dma_addr unmapping. 
> 
>>
>>>  		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr,
>> qp->rq_size,
>>>  				 DMA_TO_DEVICE);
>>> -		if (!rq_entry_inserted)
>>> -			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
>>> -	}
>>> +
>>> +	if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
>>> +		free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
>>
>> This should be inside the previous if statement, otherwise it might try to free
>> pages that weren't allocated.
> If they weren't allocated the key will be INVALID and they won't be freed.

If the key is INVALID you call free_pages_exact, but rq_cpu_addr might have
never been allocated (if RQ is of size zero).

> 
>>
>>>  err_free_qp:
>>>  	kfree(qp);
>>>  err_out:
>>> @@ -887,6 +848,7 @@ static int efa_destroy_cq_idx(struct efa_dev *dev,
>>> int cq_idx)
>>>
>>>  void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)  {
>>> +	struct efa_ucontext *ucontext;
>>
>> Reverse xmas tree.
> ok
>>
>>>  	struct efa_dev *dev = to_edev(ibcq->device);
>>>  	struct efa_cq *cq = to_ecq(ibcq);
>>>
>>> @@ -894,20 +856,33 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct
>> ib_udata *udata)
>>>  		  "Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
>>>  		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
>>>
>>> +	ucontext = rdma_udata_to_drv_context(udata, struct efa_ucontext,
>>> +					     ibucontext);
>>>  	efa_destroy_cq_idx(dev, cq->cq_idx);
>>>  	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
>>>  			 DMA_FROM_DEVICE);
>>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
>>> +				    cq->mmap_key);
>>
>> Entry removal should be first.
> Why ? removing can lead to freeing, why would we want that before unmapping ? 

Makes sense, thanks.

>>
>>>  }
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers
  2019-09-01  6:56       ` Gal Pressman
@ 2019-09-02  7:47         ` Michal Kalderon
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-09-02  7:47 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Ariel Elior, jgg, dledford, bmt, sleybo, leon, linux-rdma

> From: Gal Pressman <galpress@amazon.com>
> Sent: Sunday, September 1, 2019 9:57 AM
> 
> On 30/08/2019 9:15, Michal Kalderon wrote:
> >> From: Gal Pressman <galpress@amazon.com>
> >> Sent: Thursday, August 29, 2019 5:21 PM
> >>
> >> On 27/08/2019 16:28, Michal Kalderon wrote:
> >>> +static void efa_qp_user_mmap_entries_remove(struct efa_ucontext
> >> *ucontext,
> >>> +					    struct efa_qp *qp)
> >>> +{
> >>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
> >>> sq_db_mmap_key);
> >>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> >>> +				    qp->llq_desc_mmap_key);
> >>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext, qp-
> >>> rq_mmap_key);
> >>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> >>> +qp->rq_db_mmap_key);
> >>
> >> Please remove the entries in reverse insertion order.
> > I don't mind fixing, but why ?
> 
> So the flows will be symmetric.
> 
> >>
> >>> +}
> >>> +
> >>> @@ -767,15 +726,17 @@ struct ib_qp *efa_create_qp(struct ib_pd
> >>> *ibpd,
> >>>
> >>>  	return &qp->ibqp;
> >>>
> >>> +err_remove_mmap_entries:
> >>> +	efa_qp_user_mmap_entries_remove(ucontext, qp);
> >>>  err_destroy_qp:
> >>>  	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
> >>>  err_free_mapped:
> >>> -	if (qp->rq_size) {
> >>> +	if (qp->rq_dma_addr)
> >>
> >> What's the difference?
> > Seemed a better query since it now only covers the rq_dma_addr
> unmapping.
> >
> >>
> >>>  		dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr,
> >> qp->rq_size,
> >>>  				 DMA_TO_DEVICE);
> >>> -		if (!rq_entry_inserted)
> >>> -			free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
> >>> -	}
> >>> +
> >>> +	if (qp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
> >>> +		free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
> >>
> >> This should be inside the previous if statement, otherwise it might
> >> try to free pages that weren't allocated.
> > If they weren't allocated the key will be INVALID and they won't be freed.
> 
> If the key is INVALID you call free_pages_exact, but rq_cpu_addr might have
> never been allocated (if RQ is of size zero).
Right, thanks
> 
> >
> >>
> >>>  err_free_qp:
> >>>  	kfree(qp);
> >>>  err_out:
> >>> @@ -887,6 +848,7 @@ static int efa_destroy_cq_idx(struct efa_dev
> >>> *dev, int cq_idx)
> >>>
> >>>  void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)  {
> >>> +	struct efa_ucontext *ucontext;
> >>
> >> Reverse xmas tree.
> > ok
> >>
> >>>  	struct efa_dev *dev = to_edev(ibcq->device);
> >>>  	struct efa_cq *cq = to_ecq(ibcq);
> >>>
> >>> @@ -894,20 +856,33 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct
> >> ib_udata *udata)
> >>>  		  "Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
> >>>  		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
> >>>
> >>> +	ucontext = rdma_udata_to_drv_context(udata, struct efa_ucontext,
> >>> +					     ibucontext);
> >>>  	efa_destroy_cq_idx(dev, cq->cq_idx);
> >>>  	dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
> >>>  			 DMA_FROM_DEVICE);
> >>> +	rdma_user_mmap_entry_remove(&ucontext->ibucontext,
> >>> +				    cq->mmap_key);
> >>
> >> Entry removal should be first.
> > Why ? removing can lead to freeing, why would we want that before
> unmapping ?
> 
> Makes sense, thanks.
> 
> >>
> >>>  }
> >>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-30 13:19   ` Bernard Metzler
@ 2019-09-02  8:00     ` Michal Kalderon
  2019-09-02 10:55     ` Bernard Metzler
  2019-09-02 11:18     ` Bernard Metzler
  2 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-09-02  8:00 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Bernard Metzler
> 
> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
> 
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Michal Kalderon" <mkalderon@marvell.com>
> >Date: 08/30/2019 02:42PM
> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
> >"galpress@amazon.com" <galpress@amazon.com>,
> "sleybo@amazon.com"
> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
> >Elior" <aelior@marvell.com>
> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw:
> >Use the common mmap_xa helpers
> >
> >> From: Bernard Metzler <BMT@zurich.ibm.com>
> >> Sent: Friday, August 30, 2019 3:08 PM
> >>
> >> External Email
> >>
> >>
> >---------------------------------------------------------------------
> >-
> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote: -----
> >>
> >> Hi Michael,
> >>
> >> I tried this patch. It unfortunately panics immediately when siw
> >gets used. I'll
> >> investigate further. Some comments in line.
> >Thanks for testing,
> >
> >>
> >> Thanks
> >> Bernard.
> >>
> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
> <jgg@ziepe.ca>,
> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
> >> <galpress@amazon.com>,
> >> ><sleybo@amazon.com>, <leon@kernel.org>
> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
> >> >Date: 08/27/2019 03:31PM
> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
> >> ><ariel.elior@marvell.com>
> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
> >> common
> >> >mmap_xa helpers
> >> >
> >> >Remove the functions related to managing the mmap_xa database.
> >> >This code is now common in ib_core. Use the common API's instead.
> >> >
> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> >> >---
> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
> >> >+++++++++++++++++++---------------
> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
> >> >
> >> >+			/* If entry was inserted successfully, qp->sendq
> >> >+			 * will be freed by siw_mmap_free
> >> >+			 */
> >> >+			qp->sendq = NULL;
> >>
> >> qp->sendq points to the SQ array. Zeroing this pointer will leave
> >> siw with no idea where the WQE's are. It will panic de-referencing
> >[NULL +
> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
> >Qp->sendq is only used in kernel mode, and only set to NULL is
> >user-space mode
> >Where it is allocated and mapped in user, so the user will be the one
> >accessing the rings And not kernel, unless I'm missing something.
> 
> These pointers are pointing to the allocated work queues.
> These queues/arrays are holding the actual send/recv/complete/srq work
> queue elements. It is a shared array. That is why we need mmap at all.
> 
> e.g.,
> 
> struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];
> 
> 
Ok got it, I'm HW oriented... so user chains and kernel are totally separated. 
Will add a flag whether to free or not and remove setting to NULL. 
Thanks,
Michal



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-30 13:19   ` Bernard Metzler
  2019-09-02  8:00     ` Michal Kalderon
@ 2019-09-02 10:55     ` Bernard Metzler
  2019-09-02 11:21       ` Michal Kalderon
  2019-09-02 11:18     ` Bernard Metzler
  2 siblings, 1 reply; 25+ messages in thread
From: Bernard Metzler @ 2019-09-02 10:55 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

-----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Michal Kalderon" <mkalderon@marvell.com>
>Date: 09/02/2019 10:01AM
>Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>"galpress@amazon.com" <galpress@amazon.com>, "sleybo@amazon.com"
><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>RDMA/siw: Use the common mmap_xa helpers
>
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Bernard Metzler
>> 
>> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
>> 
>> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >From: "Michal Kalderon" <mkalderon@marvell.com>
>> >Date: 08/30/2019 02:42PM
>> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
>> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>> >"galpress@amazon.com" <galpress@amazon.com>,
>> "sleybo@amazon.com"
>> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
>> >Elior" <aelior@marvell.com>
>> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>RDMA/siw:
>> >Use the common mmap_xa helpers
>> >
>> >> From: Bernard Metzler <BMT@zurich.ibm.com>
>> >> Sent: Friday, August 30, 2019 3:08 PM
>> >>
>> >> External Email
>> >>
>> >>
>>
>>--------------------------------------------------------------------
>-
>> >-
>> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote:
>-----
>> >>
>> >> Hi Michael,
>> >>
>> >> I tried this patch. It unfortunately panics immediately when siw
>> >gets used. I'll
>> >> investigate further. Some comments in line.
>> >Thanks for testing,
>> >
>> >>
>> >> Thanks
>> >> Bernard.
>> >>
>> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
>> <jgg@ziepe.ca>,
>> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
>> >> <galpress@amazon.com>,
>> >> ><sleybo@amazon.com>, <leon@kernel.org>
>> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
>> >> >Date: 08/27/2019 03:31PM
>> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
>> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
>> >> ><ariel.elior@marvell.com>
>> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
>> >> common
>> >> >mmap_xa helpers
>> >> >
>> >> >Remove the functions related to managing the mmap_xa database.
>> >> >This code is now common in ib_core. Use the common API's
>instead.
>> >> >
>> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>> >> >---
>> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
>> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
>> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
>> >> >+++++++++++++++++++---------------
>> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
>> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
>> >> >
>> >> >+			/* If entry was inserted successfully, qp->sendq
>> >> >+			 * will be freed by siw_mmap_free
>> >> >+			 */
>> >> >+			qp->sendq = NULL;
>> >>
>> >> qp->sendq points to the SQ array. Zeroing this pointer will
>leave
>> >> siw with no idea where the WQE's are. It will panic
>de-referencing
>> >[NULL +
>> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
>> >Qp->sendq is only used in kernel mode, and only set to NULL is
>> >user-space mode
>> >Where it is allocated and mapped in user, so the user will be the
>one
>> >accessing the rings And not kernel, unless I'm missing something.
>> 
>> These pointers are pointing to the allocated work queues.
>> These queues/arrays are holding the actual send/recv/complete/srq
>work
>> queue elements. It is a shared array. That is why we need mmap at
>all.
>> 
>> e.g.,
>> 
>> struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];
>> 
>> 
>Ok got it, I'm HW oriented... so user chains and kernel are totally
>separated. 
>Will add a flag whether to free or not and remove setting to NULL. 
>Thanks,
>Michal
>
Michael,
I even think there is an underlying design decision I am
questioning. The provider allocates the resource to be mmapped,
and there is no reason to have the mmapping service freeing
that resource at any point in time. I simply don't understand
why the mmap service wants to free the mmapped resource at all,
if it did not allocate it. The aim of the service
should be limited to provide mmapping bookkeeping.

The new mmap helper framework already defines a callback
for unmapping. The provider may use it to free the
resource, if appropriate. E.g., siw will not destroy a 
currently used CQ, only because the user land application
calls munmap on it.

Best regards,
Bernard.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-08-30 13:19   ` Bernard Metzler
  2019-09-02  8:00     ` Michal Kalderon
  2019-09-02 10:55     ` Bernard Metzler
@ 2019-09-02 11:18     ` Bernard Metzler
  2019-09-02 13:16       ` Michal Kalderon
  2019-09-02 14:17       ` Bernard Metzler
  2 siblings, 2 replies; 25+ messages in thread
From: Bernard Metzler @ 2019-09-02 11:18 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

-----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Michal Kalderon" <mkalderon@marvell.com>
>Date: 09/02/2019 10:01AM
>Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>"galpress@amazon.com" <galpress@amazon.com>, "sleybo@amazon.com"
><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>RDMA/siw: Use the common mmap_xa helpers
>
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Bernard Metzler
>> 
>> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
>> 
>> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >From: "Michal Kalderon" <mkalderon@marvell.com>
>> >Date: 08/30/2019 02:42PM
>> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
>> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>> >"galpress@amazon.com" <galpress@amazon.com>,
>> "sleybo@amazon.com"
>> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
>> >Elior" <aelior@marvell.com>
>> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>RDMA/siw:
>> >Use the common mmap_xa helpers
>> >
>> >> From: Bernard Metzler <BMT@zurich.ibm.com>
>> >> Sent: Friday, August 30, 2019 3:08 PM
>> >>
>> >> External Email
>> >>
>> >>
>>
>>--------------------------------------------------------------------
>-
>> >-
>> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote:
>-----
>> >>
>> >> Hi Michael,
>> >>
>> >> I tried this patch. It unfortunately panics immediately when siw
>> >gets used. I'll
>> >> investigate further. Some comments in line.
>> >Thanks for testing,
>> >
>> >>
>> >> Thanks
>> >> Bernard.
>> >>
>> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
>> <jgg@ziepe.ca>,
>> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
>> >> <galpress@amazon.com>,
>> >> ><sleybo@amazon.com>, <leon@kernel.org>
>> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
>> >> >Date: 08/27/2019 03:31PM
>> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
>> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
>> >> ><ariel.elior@marvell.com>
>> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
>> >> common
>> >> >mmap_xa helpers
>> >> >
>> >> >Remove the functions related to managing the mmap_xa database.
>> >> >This code is now common in ib_core. Use the common API's
>instead.
>> >> >
>> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>> >> >---
>> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
>> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
>> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
>> >> >+++++++++++++++++++---------------
>> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
>> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
>> >> >
>> >> >+			/* If entry was inserted successfully, qp->sendq
>> >> >+			 * will be freed by siw_mmap_free
>> >> >+			 */
>> >> >+			qp->sendq = NULL;
>> >>
>> >> qp->sendq points to the SQ array. Zeroing this pointer will
>leave
>> >> siw with no idea where the WQE's are. It will panic
>de-referencing
>> >[NULL +
>> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
>> >Qp->sendq is only used in kernel mode, and only set to NULL is
>> >user-space mode
>> >Where it is allocated and mapped in user, so the user will be the
>one
>> >accessing the rings And not kernel, unless I'm missing something.
>> 
>> These pointers are pointing to the allocated work queues.
>> These queues/arrays are holding the actual send/recv/complete/srq
>work
>> queue elements. It is a shared array. That is why we need mmap at
>all.
>> 
>> e.g.,
>> 
>> struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];
>> 
>> 
>Ok got it, I'm HW oriented... so user chains and kernel are totally
>separated. 
>Will add a flag whether to free or not and remove setting to NULL. 

Forget my last reply. I was under the impression the RDMA core
mmap_xa helper stuff would free the resource. Sorry about that. So
just do not free the resource in siw_mmap_free().

Best regards,
Bernard.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-09-02 10:55     ` Bernard Metzler
@ 2019-09-02 11:21       ` Michal Kalderon
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-09-02 11:21 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

> From: Bernard Metzler <BMT@zurich.ibm.com>
> Sent: Monday, September 2, 2019 1:55 PM
> 
> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
> 
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Michal Kalderon" <mkalderon@marvell.com>
> >Date: 09/02/2019 10:01AM
> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
> >"galpress@amazon.com" <galpress@amazon.com>,
> "sleybo@amazon.com"
> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
> >Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
> >RDMA/siw: Use the common mmap_xa helpers
> >
> >> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >> owner@vger.kernel.org> On Behalf Of Bernard Metzler
> >>
> >> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
> >>
> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >> >From: "Michal Kalderon" <mkalderon@marvell.com>
> >> >Date: 08/30/2019 02:42PM
> >> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
> >> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
> >> >"galpress@amazon.com" <galpress@amazon.com>,
> >> "sleybo@amazon.com"
> >> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
> >> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
> >> >Elior" <aelior@marvell.com>
> >> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
> >RDMA/siw:
> >> >Use the common mmap_xa helpers
> >> >
> >> >> From: Bernard Metzler <BMT@zurich.ibm.com>
> >> >> Sent: Friday, August 30, 2019 3:08 PM
> >> >>
> >> >> External Email
> >> >>
> >> >>
> >>
> >>--------------------------------------------------------------------
> >-
> >> >-
> >> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote:
> >-----
> >> >>
> >> >> Hi Michael,
> >> >>
> >> >> I tried this patch. It unfortunately panics immediately when siw
> >> >gets used. I'll
> >> >> investigate further. Some comments in line.
> >> >Thanks for testing,
> >> >
> >> >>
> >> >> Thanks
> >> >> Bernard.
> >> >>
> >> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
> >> <jgg@ziepe.ca>,
> >> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
> >> >> <galpress@amazon.com>,
> >> >> ><sleybo@amazon.com>, <leon@kernel.org>
> >> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
> >> >> >Date: 08/27/2019 03:31PM
> >> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
> >> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
> >> >> ><ariel.elior@marvell.com>
> >> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
> >> >> common
> >> >> >mmap_xa helpers
> >> >> >
> >> >> >Remove the functions related to managing the mmap_xa database.
> >> >> >This code is now common in ib_core. Use the common API's
> >instead.
> >> >> >
> >> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> >> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> >> >> >---
> >> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
> >> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
> >> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
> >> >> >+++++++++++++++++++---------------
> >> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> >> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
> >> >> >
> >> >> >+			/* If entry was inserted successfully, qp-
> >sendq
> >> >> >+			 * will be freed by siw_mmap_free
> >> >> >+			 */
> >> >> >+			qp->sendq = NULL;
> >> >>
> >> >> qp->sendq points to the SQ array. Zeroing this pointer will
> >leave
> >> >> siw with no idea where the WQE's are. It will panic
> >de-referencing
> >> >[NULL +
> >> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
> >> >Qp->sendq is only used in kernel mode, and only set to NULL is
> >> >user-space mode
> >> >Where it is allocated and mapped in user, so the user will be the
> >one
> >> >accessing the rings And not kernel, unless I'm missing something.
> >>
> >> These pointers are pointing to the allocated work queues.
> >> These queues/arrays are holding the actual send/recv/complete/srq
> >work
> >> queue elements. It is a shared array. That is why we need mmap at
> >all.
> >>
> >> e.g.,
> >>
> >> struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];
> >>
> >>
> >Ok got it, I'm HW oriented... so user chains and kernel are totally
> >separated.
> >Will add a flag whether to free or not and remove setting to NULL.
> >Thanks,
> >Michal
> >
> Michael,
Michal  (not Michael) thanks 😊 

> I even think there is an underlying design decision I am questioning. The
> provider allocates the resource to be mmapped, and there is no reason to
> have the mmapping service freeing that resource at any point in time. I
> simply don't understand why the mmap service wants to free the mmapped
> resource at all, if it did not allocate it. The aim of the service should be limited
> to provide mmapping bookkeeping.
> 
> The new mmap helper framework already defines a callback for unmapping.
> The provider may use it to free the resource, if appropriate. E.g., siw will not
> destroy a currently used CQ, only because the user land application calls
> munmap on it.

The function that frees the memory is the siw_mmap_free in the provider , not the 
Mapping service. 
This is called when provider calls mmap_remove and ALL maps to the address were unmapped or 
Zapped, we don't want to free the memory before the mapping was removed. 
Once the provider calls mmap_entry_insert the refcnt is set to '1' and can
Only reach zero after provider calls mmap_entry_remove. 

Thanks,
Michal

> 
> Best regards,
> Bernard.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-09-02 11:18     ` Bernard Metzler
@ 2019-09-02 13:16       ` Michal Kalderon
  2019-09-02 14:17       ` Bernard Metzler
  1 sibling, 0 replies; 25+ messages in thread
From: Michal Kalderon @ 2019-09-02 13:16 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Bernard Metzler
> 
> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
> 
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Michal Kalderon" <mkalderon@marvell.com>
> >Date: 09/02/2019 10:01AM
> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
> >"galpress@amazon.com" <galpress@amazon.com>,
> "sleybo@amazon.com"
> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
> >Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
> >RDMA/siw: Use the common mmap_xa helpers
> >
> >> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >> owner@vger.kernel.org> On Behalf Of Bernard Metzler
> >>
> >> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
> >>
> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >> >From: "Michal Kalderon" <mkalderon@marvell.com>
> >> >Date: 08/30/2019 02:42PM
> >> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
> >> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
> >> >"galpress@amazon.com" <galpress@amazon.com>,
> >> "sleybo@amazon.com"
> >> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
> >> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Ariel
> >> >Elior" <aelior@marvell.com>
> >> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
> >RDMA/siw:
> >> >Use the common mmap_xa helpers
> >> >
> >> >> From: Bernard Metzler <BMT@zurich.ibm.com>
> >> >> Sent: Friday, August 30, 2019 3:08 PM
> >> >>
> >> >> External Email
> >> >>
> >> >>
> >>
> >>--------------------------------------------------------------------
> >-
> >> >-
> >> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote:
> >-----
> >> >>
> >> >> Hi Michael,
> >> >>
> >> >> I tried this patch. It unfortunately panics immediately when siw
> >> >gets used. I'll
> >> >> investigate further. Some comments in line.
> >> >Thanks for testing,
> >> >
> >> >>
> >> >> Thanks
> >> >> Bernard.
> >> >>
> >> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
> >> <jgg@ziepe.ca>,
> >> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
> >> >> <galpress@amazon.com>,
> >> >> ><sleybo@amazon.com>, <leon@kernel.org>
> >> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
> >> >> >Date: 08/27/2019 03:31PM
> >> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
> >> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
> >> >> ><ariel.elior@marvell.com>
> >> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use the
> >> >> common
> >> >> >mmap_xa helpers
> >> >> >
> >> >> >Remove the functions related to managing the mmap_xa database.
> >> >> >This code is now common in ib_core. Use the common API's
> >instead.
> >> >> >
> >> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> >> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> >> >> >---
> >> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
> >> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
> >> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
> >> >> >+++++++++++++++++++---------------
> >> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
> >> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
> >> >> >
> >> >> >+			/* If entry was inserted successfully, qp-
> >sendq
> >> >> >+			 * will be freed by siw_mmap_free
> >> >> >+			 */
> >> >> >+			qp->sendq = NULL;
> >> >>
> >> >> qp->sendq points to the SQ array. Zeroing this pointer will
> >leave
> >> >> siw with no idea where the WQE's are. It will panic
> >de-referencing
> >> >[NULL +
> >> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
> >> >Qp->sendq is only used in kernel mode, and only set to NULL is
> >> >user-space mode
> >> >Where it is allocated and mapped in user, so the user will be the
> >one
> >> >accessing the rings And not kernel, unless I'm missing something.
> >>
> >> These pointers are pointing to the allocated work queues.
> >> These queues/arrays are holding the actual send/recv/complete/srq
> >work
> >> queue elements. It is a shared array. That is why we need mmap at
> >all.
> >>
> >> e.g.,
> >>
> >> struct siw_sqe *sqe = &qp->sendq[qp->sq_get % qp->attrs.sq_size];
> >>
> >>
> >Ok got it, I'm HW oriented... so user chains and kernel are totally
> >separated.
> >Will add a flag whether to free or not and remove setting to NULL.
> 
> Forget my last reply. I was under the impression the RDMA core mmap_xa
> helper stuff would free the resource. Sorry about that. So just do not free
> the resource in siw_mmap_free().
OK, will leave the logic exactly as it was before, just use the common mmap helper
Functions. 
thanks


> 
> Best regards,
> Bernard.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers
  2019-09-02 11:18     ` Bernard Metzler
  2019-09-02 13:16       ` Michal Kalderon
@ 2019-09-02 14:17       ` Bernard Metzler
  1 sibling, 0 replies; 25+ messages in thread
From: Bernard Metzler @ 2019-09-02 14:17 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg, dledford, galpress, sleybo, leon, linux-rdma

-----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Michal Kalderon" <mkalderon@marvell.com>
>Date: 09/02/2019 03:16PM
>Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>"galpress@amazon.com" <galpress@amazon.com>, "sleybo@amazon.com"
><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>RDMA/siw: Use the common mmap_xa helpers
>
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Bernard Metzler
>> 
>> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
>> 
>> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >From: "Michal Kalderon" <mkalderon@marvell.com>
>> >Date: 09/02/2019 10:01AM
>> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
>> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>> >"galpress@amazon.com" <galpress@amazon.com>,
>> "sleybo@amazon.com"
>> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>> >Subject: [EXTERNAL] RE: RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>> >RDMA/siw: Use the common mmap_xa helpers
>> >
>> >> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> >> owner@vger.kernel.org> On Behalf Of Bernard Metzler
>> >>
>> >> -----"Michal Kalderon" <mkalderon@marvell.com> wrote: -----
>> >>
>> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >> >From: "Michal Kalderon" <mkalderon@marvell.com>
>> >> >Date: 08/30/2019 02:42PM
>> >> >Cc: "Ariel Elior" <aelior@marvell.com>, "jgg@ziepe.ca"
>> >> ><jgg@ziepe.ca>, "dledford@redhat.com" <dledford@redhat.com>,
>> >> >"galpress@amazon.com" <galpress@amazon.com>,
>> >> "sleybo@amazon.com"
>> >> ><sleybo@amazon.com>, "leon@kernel.org" <leon@kernel.org>,
>> >> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
>"Ariel
>> >> >Elior" <aelior@marvell.com>
>> >> >Subject: [EXTERNAL] RE: [EXT] Re: [PATCH v8 rdma-next 4/7]
>> >RDMA/siw:
>> >> >Use the common mmap_xa helpers
>> >> >
>> >> >> From: Bernard Metzler <BMT@zurich.ibm.com>
>> >> >> Sent: Friday, August 30, 2019 3:08 PM
>> >> >>
>> >> >> External Email
>> >> >>
>> >> >>
>> >>
>>
>>>-------------------------------------------------------------------
>-
>> >-
>> >> >-
>> >> >> -----"Michal Kalderon" <michal.kalderon@marvell.com> wrote:
>> >-----
>> >> >>
>> >> >> Hi Michael,
>> >> >>
>> >> >> I tried this patch. It unfortunately panics immediately when
>siw
>> >> >gets used. I'll
>> >> >> investigate further. Some comments in line.
>> >> >Thanks for testing,
>> >> >
>> >> >>
>> >> >> Thanks
>> >> >> Bernard.
>> >> >>
>> >> >> >To: <mkalderon@marvell.com>, <aelior@marvell.com>,
>> >> <jgg@ziepe.ca>,
>> >> >> ><dledford@redhat.com>, <bmt@zurich.ibm.com>,
>> >> >> <galpress@amazon.com>,
>> >> >> ><sleybo@amazon.com>, <leon@kernel.org>
>> >> >> >From: "Michal Kalderon" <michal.kalderon@marvell.com>
>> >> >> >Date: 08/27/2019 03:31PM
>> >> >> >Cc: <linux-rdma@vger.kernel.org>, "Michal Kalderon"
>> >> >> ><michal.kalderon@marvell.com>, "Ariel Elior"
>> >> >> ><ariel.elior@marvell.com>
>> >> >> >Subject: [EXTERNAL] [PATCH v8 rdma-next 4/7] RDMA/siw: Use
>the
>> >> >> common
>> >> >> >mmap_xa helpers
>> >> >> >
>> >> >> >Remove the functions related to managing the mmap_xa
>database.
>> >> >> >This code is now common in ib_core. Use the common API's
>> >instead.
>> >> >> >
>> >> >> >Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
>> >> >> >Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>> >> >> >---
>> >> >> > drivers/infiniband/sw/siw/siw.h       |  20 ++-
>> >> >> > drivers/infiniband/sw/siw/siw_main.c  |   1 +
>> >> >> > drivers/infiniband/sw/siw/siw_verbs.c | 223
>> >> >> >+++++++++++++++++++---------------
>> >> >> > drivers/infiniband/sw/siw/siw_verbs.h |   1 +
>> >> >> > 4 files changed, 144 insertions(+), 101 deletions(-)
>> >> >> >
>> >> >> >+			/* If entry was inserted successfully, qp-
>> >sendq
>> >> >> >+			 * will be freed by siw_mmap_free
>> >> >> >+			 */
>> >> >> >+			qp->sendq = NULL;
>> >> >>
>> >> >> qp->sendq points to the SQ array. Zeroing this pointer will
>> >leave
>> >> >> siw with no idea where the WQE's are. It will panic
>> >de-referencing
>> >> >[NULL +
>> >> >> current position in ring buffer]. Same for RQ, SRQ and CQ.
>> >> >Qp->sendq is only used in kernel mode, and only set to NULL is
>> >> >user-space mode
>> >> >Where it is allocated and mapped in user, so the user will be
>the
>> >one
>> >> >accessing the rings And not kernel, unless I'm missing
>something.
>> >>
>> >> These pointers are pointing to the allocated work queues.
>> >> These queues/arrays are holding the actual
>send/recv/complete/srq
>> >work
>> >> queue elements. It is a shared array. That is why we need mmap
>at
>> >all.
>> >>
>> >> e.g.,
>> >>
>> >> struct siw_sqe *sqe = &qp->sendq[qp->sq_get %
>qp->attrs.sq_size];
>> >>
>> >>
>> >Ok got it, I'm HW oriented... so user chains and kernel are
>totally
>> >separated.
>> >Will add a flag whether to free or not and remove setting to NULL.
>> 
>> Forget my last reply. I was under the impression the RDMA core
>mmap_xa
>> helper stuff would free the resource. Sorry about that. So just do
>not free
>> the resource in siw_mmap_free().
>OK, will leave the logic exactly as it was before, just use the
>common mmap helper
>Functions. 
>thanks
>
Hi Michal,

Let me send a short patch to the list which should
fix it for siw. There were a few more nits:

- make the siw_user_mmap_entry.address a void *, which
  naturally fits with remap_vmalloc_range. also avoids
  other casting during resource address assignment.
    
- do not kfree SQ/RQ/CQ/SRQ in preparation of mmap.
  Those resources are always further needed ;)
    
- Fix check for correct mmap range in siw_mmap().
  - entry->length is the object length. We have to
    expand to PAGE_ALIGN(entry->length), since mmap
    comes with complete page(s) containing the
    object.
  - put mmap_entry if that check fails. Otherwise
    entry object ref counting screws up, and later
    crashes during context close.
    
- simplify siw_mmap_free() - it must just free
  the entry.


I think we shall further really change the size of
the to be mmapped object to size_t. That perfectly
reflects the user API mmap() call and is the right
type for both 64bit and 32bit architectures.

Best regards
Bernard.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27 13:28 [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 1/7] RDMA/core: Move core content from ib_uverbs to ib_core Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 2/7] RDMA/core: Create mmap database and cookie helper functions Michal Kalderon
2019-08-29 11:35   ` Gal Pressman
2019-08-29 12:40     ` Jason Gunthorpe
2019-08-30  6:10       ` [EXT] " Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 3/7] RDMA/efa: Use the common mmap_xa helpers Michal Kalderon
2019-08-29 14:20   ` Gal Pressman
2019-08-30  6:15     ` Michal Kalderon
2019-09-01  6:56       ` Gal Pressman
2019-09-02  7:47         ` Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 4/7] RDMA/siw: " Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 5/7] RDMA/qedr: Use the common mmap API Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 6/7] RDMA/qedr: Add doorbell overflow recovery support Michal Kalderon
2019-08-27 13:28 ` [PATCH v8 rdma-next 7/7] RDMA/qedr: Add iWARP doorbell " Michal Kalderon
2019-08-28 14:41 ` [PATCH v8 rdma-next 0/7] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA Gal Pressman
2019-08-30 12:07 ` [PATCH v8 rdma-next 4/7] RDMA/siw: Use the common mmap_xa helpers Bernard Metzler
2019-08-30 12:42   ` [EXT] " Michal Kalderon
2019-08-30 13:19   ` Bernard Metzler
2019-09-02  8:00     ` Michal Kalderon
2019-09-02 10:55     ` Bernard Metzler
2019-09-02 11:21       ` Michal Kalderon
2019-09-02 11:18     ` Bernard Metzler
2019-09-02 13:16       ` Michal Kalderon
2019-09-02 14:17       ` Bernard Metzler

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org linux-rdma@archiver.kernel.org
	public-inbox-index linux-rdma


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/ public-inbox