* [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment
@ 2021-01-15 10:22 Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP Weihang Li
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

The hip09 introduces the DCA (Dynamic Context Attachment) feature, which
allows many RC QPs to share WQE buffers from a memory pool. If a QP
enables DCA, its WQE buffer is not allocated when the QP is created but
when the user starts to post WRs. This reduces memory consumption when
many QPs are inactive.
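
To make the saving concrete, here is a back-of-the-envelope sketch. It assumes every QP needs a WQE buffer of the same size and ignores pool fragmentation and any memory the pool keeps cached; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes held when every QP pins a private WQE buffer from creation on. */
static size_t static_wqe_bytes(size_t num_qps, size_t wqe_buf_size)
{
	return num_qps * wqe_buf_size;
}

/* Bytes held when only QPs with outstanding WRs hold a pool buffer. */
static size_t dca_wqe_bytes(size_t active_qps, size_t wqe_buf_size)
{
	return active_qps * wqe_buf_size;
}

/* Memory saved by DCA for a given number of busy QPs. */
static size_t dca_saved_bytes(size_t num_qps, size_t active_qps,
			      size_t wqe_buf_size)
{
	assert(active_qps <= num_qps); /* busy QPs are a subset of all QPs */
	return static_wqe_bytes(num_qps, wqe_buf_size) -
	       dca_wqe_bytes(active_qps, wqe_buf_size);
}
```

Under these assumptions, 1024 QPs with 64 KiB of WQE buffer each pin 64 MiB up front, while with only 16 busy QPs the pool needs 1 MiB.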

Xi Wang (7):
  RDMA/hns: Introduce DCA for RC QP
  RDMA/hns: Add method for shrinking DCA memory pool
  RDMA/hns: Configure DCA mode for the userspace QP
  RDMA/hns: Add method for attaching WQE buffer
  RDMA/hns: Setup the configuration of WQE addressing to QPC
  RDMA/hns: Add method to detach WQE buffer
  RDMA/hns: Add method to query WQE buffer's address

 drivers/infiniband/hw/hns/Makefile          |    2 +-
 drivers/infiniband/hw/hns/hns_roce_dca.c    | 1264 +++++++++++++++++++++++++++
 drivers/infiniband/hw/hns/hns_roce_dca.h    |   68 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |   32 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  223 ++++-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |    3 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |   27 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c     |  119 ++-
 include/uapi/rdma/hns-abi.h                 |   60 ++
 9 files changed, 1751 insertions(+), 47 deletions(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.h

-- 
2.8.1



* [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-20  8:10   ` Leon Romanovsky
  2021-01-15 10:22 ` [PATCH RFC 2/7] RDMA/hns: Add method for shrinking DCA memory pool Weihang Li
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

The hip09 introduces the DCA (Dynamic Context Attachment) feature, which
allows many RC QPs to share WQE buffers from a memory pool. This reduces
memory consumption when many QPs are inactive.

If a QP enables DCA, its WQE buffer is not allocated when the QP is
created. Instead, when the user starts to post WRs, the hns driver
allocates a buffer from the memory pool and then fills in WQEs tagged
with this QP's number.

The hns ROCEE stops accessing the WQE buffer once the user has polled all
of the CQEs of a DCA QP; the driver then returns this WQE buffer to the
memory pool.
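
A minimal single-threaded model of that attach/recycle cycle, for illustration only. It assumes a fixed pool of pages, a nonzero per-QP buf_id (0 marks a free page, as HNS_DCA_INVALID_BUF_ID does later in this series), and that every posted WR produces exactly one CQE. All names are hypothetical; the real driver tracks per-page state in struct hns_dca_page_state and also handles locking and hardware synchronization, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_POOL_PAGES 8   /* hypothetical pool size, in pages */
#define EXAMPLE_FREE_BUF_ID 0u /* 0 marks a free page */

struct example_pool {
	unsigned int owner[EXAMPLE_POOL_PAGES]; /* buf_id owning each page */
};

struct example_qp {
	unsigned int buf_id;      /* nonzero, unique per QP */
	unsigned int npages;      /* pages its WQE buffer needs */
	unsigned int outstanding; /* posted WRs whose CQEs are not yet polled */
	bool attached;            /* true while it holds pages from the pool */
};

static unsigned int example_free_pages(const struct example_pool *p)
{
	unsigned int i, n = 0;

	for (i = 0; i < EXAMPLE_POOL_PAGES; i++)
		if (p->owner[i] == EXAMPLE_FREE_BUF_ID)
			n++;
	return n;
}

/* Post one WR: on first use, tag enough free pages with the QP's buf_id. */
static bool example_post(struct example_pool *p, struct example_qp *qp)
{
	unsigned int i, got = 0;

	if (!qp->attached) {
		if (example_free_pages(p) < qp->npages)
			return false; /* pool exhausted, nothing attached */
		for (i = 0; i < EXAMPLE_POOL_PAGES && got < qp->npages; i++) {
			if (p->owner[i] == EXAMPLE_FREE_BUF_ID) {
				p->owner[i] = qp->buf_id;
				got++;
			}
		}
		qp->attached = true;
	}
	qp->outstanding++;
	return true;
}

/* Poll one CQE: after the last one, return the QP's pages to the pool. */
static void example_poll_one(struct example_pool *p, struct example_qp *qp)
{
	unsigned int i;

	assert(qp->outstanding > 0);
	if (--qp->outstanding == 0) {
		for (i = 0; i < EXAMPLE_POOL_PAGES; i++)
			if (p->owner[i] == qp->buf_id)
				p->owner[i] = EXAMPLE_FREE_BUF_ID;
		qp->attached = false;
	}
}
```

Three QPs needing 4 pages each can share this 8-page pool as long as at most two of them have work outstanding at the same time.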

This patch adds a group of methods that let userspace register buffers
to a memory pool belonging to the user context. The hns kernel driver
updates the state of the pages in this pool when the user calls the
post/poll methods, and the userspace driver can get the QP's WQE buffer
address from the key and offset queried from the kernel.
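
On the userspace side, the provider has to remember which buffer it registered under which key, so that a (key, offset) pair reported by the kernel can be turned back into a virtual address. The sketch below assumes a small fixed-size table and hypothetical names (struct udca_pool, udca_lookup()); the query method itself only arrives later in this series ("Add method to query WQE buffer's address").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per buffer registered with HNS_IB_METHOD_DCA_MEM_REG. */
struct udca_mem {
	uint64_t key;  /* key passed to the kernel at registration */
	void *addr;    /* userspace virtual address of the buffer */
	uint32_t size; /* buffer length in bytes */
};

struct udca_pool {
	struct udca_mem mems[16]; /* hypothetical fixed capacity */
	unsigned int count;
};

/* Remember a registered buffer; returns -1 if the table is full. */
static int udca_pool_add(struct udca_pool *pool, uint64_t key, void *addr,
			 uint32_t size)
{
	assert(addr != NULL);
	if (pool->count >= sizeof(pool->mems) / sizeof(pool->mems[0]))
		return -1;
	pool->mems[pool->count++] = (struct udca_mem){ key, addr, size };
	return 0;
}

/* Translate a (key, offset) pair from the kernel into an address. */
static void *udca_lookup(const struct udca_pool *pool, uint64_t key,
			 uint32_t offset)
{
	unsigned int i;

	for (i = 0; i < pool->count; i++) {
		const struct udca_mem *m = &pool->mems[i];

		if (m->key == key)
			return offset < m->size ? (char *)m->addr + offset : NULL;
	}
	return NULL; /* unknown key */
}
```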

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/Makefile          |   2 +-
 drivers/infiniband/hw/hns/hns_roce_dca.c    | 381 ++++++++++++++++++++++++++++
 drivers/infiniband/hw/hns/hns_roce_dca.h    |  22 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |  10 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |  27 +-
 include/uapi/rdma/hns-abi.h                 |  23 ++
 6 files changed, 462 insertions(+), 3 deletions(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.h

diff --git a/drivers/infiniband/hw/hns/Makefile b/drivers/infiniband/hw/hns/Makefile
index e105945..9962b23 100644
--- a/drivers/infiniband/hw/hns/Makefile
+++ b/drivers/infiniband/hw/hns/Makefile
@@ -6,7 +6,7 @@
 ccflags-y :=  -I $(srctree)/drivers/net/ethernet/hisilicon/hns3
 
 hns-roce-objs := hns_roce_main.o hns_roce_cmd.o hns_roce_pd.o \
-	hns_roce_ah.o hns_roce_hem.o hns_roce_mr.o hns_roce_qp.o \
+	hns_roce_ah.o hns_roce_hem.o hns_roce_mr.o hns_roce_qp.o hns_roce_dca.o \
 	hns_roce_cq.o hns_roce_alloc.o hns_roce_db.o hns_roce_srq.o hns_roce_restrack.o
 
 ifdef CONFIG_INFINIBAND_HNS_HIP06
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
new file mode 100644
index 0000000..872e51a
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -0,0 +1,381 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2021 Hisilicon Limited. All rights reserved.
+ */
+
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/uverbs_types.h>
+#include <rdma/uverbs_ioctl.h>
+#include <rdma/uverbs_std_types.h>
+#include <rdma/ib_umem.h>
+#include "hns_roce_device.h"
+#include "hns_roce_dca.h"
+
+#define UVERBS_MODULE_NAME hns_ib
+#include <rdma/uverbs_named_ioctl.h>
+
+/* DCA memory */
+struct dca_mem {
+#define DCA_MEM_FLAGS_ALLOCED BIT(0)
+#define DCA_MEM_FLAGS_REGISTERED BIT(1)
+	u32 flags;
+	struct list_head list; /* link to mem list in dca context */
+	spinlock_t lock; /* protect the @flags and @list */
+	int page_count; /* page count in this mem obj */
+	u64 key; /* registered by caller */
+	u32 size; /* bytes in this mem object */
+	struct hns_dca_page_state *states; /* record each page's state */
+	void *pages; /* memory handle for getting dma address */
+};
+
+struct dca_mem_attr {
+	u64 key;
+	u64 addr;
+	u32 size;
+};
+
+static inline bool dca_mem_is_free(struct dca_mem *mem)
+{
+	return mem->flags == 0;
+}
+
+static inline void set_dca_mem_free(struct dca_mem *mem)
+{
+	mem->flags = 0;
+}
+
+static inline void set_dca_mem_alloced(struct dca_mem *mem)
+{
+	mem->flags |= DCA_MEM_FLAGS_ALLOCED;
+}
+
+static inline void set_dca_mem_registered(struct dca_mem *mem)
+{
+	mem->flags |= DCA_MEM_FLAGS_REGISTERED;
+}
+
+static inline void clr_dca_mem_registered(struct dca_mem *mem)
+{
+	mem->flags &= ~DCA_MEM_FLAGS_REGISTERED;
+}
+
+static void free_dca_pages(void *pages)
+{
+	ib_umem_release(pages);
+}
+
+static void *alloc_dca_pages(struct hns_roce_dev *hr_dev, struct dca_mem *mem,
+			     struct dca_mem_attr *attr)
+{
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	struct ib_umem *umem;
+
+	umem = ib_umem_get(ibdev, attr->addr, attr->size, 0);
+	if (IS_ERR(umem)) {
+		ibdev_err(ibdev, "failed to get uDCA pages, ret = %ld.\n",
+			  PTR_ERR(umem));
+		return NULL;
+	}
+
+	mem->page_count = ib_umem_num_dma_blocks(umem, HNS_HW_PAGE_SIZE);
+
+	return umem;
+}
+
+static void free_mem_states(struct hns_dca_page_state *states)
+{
+	kfree(states);
+}
+
+static void init_dca_umem_states(struct hns_dca_page_state *states, int count,
+				 struct ib_umem *umem)
+{
+	struct ib_block_iter biter;
+	dma_addr_t cur_addr;
+	dma_addr_t pre_addr;
+	int i = 0;
+
+	pre_addr = 0;
+	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap,
+			    HNS_HW_PAGE_SIZE) {
+		cur_addr = rdma_block_iter_dma_address(&biter);
+		if (i < count) {
+			if (cur_addr - pre_addr != HNS_HW_PAGE_SIZE)
+				states[i].head = 1;
+		}
+
+		pre_addr = cur_addr;
+		i++;
+	}
+}
+
+static struct hns_dca_page_state *alloc_dca_states(void *pages, int count)
+{
+	struct hns_dca_page_state *states;
+
+	states = kcalloc(count, sizeof(*states), GFP_ATOMIC);
+	if (!states)
+		return NULL;
+
+	init_dca_umem_states(states, count, pages);
+
+	return states;
+}
+
+/* user DCA is managed by ucontext */
+static inline struct hns_roce_dca_ctx *
+to_hr_dca_ctx(struct hns_roce_ucontext *uctx)
+{
+	return &uctx->dca_ctx;
+}
+
+static void unregister_dca_mem(struct hns_roce_ucontext *uctx,
+			       struct dca_mem *mem)
+{
+	struct hns_roce_dca_ctx *ctx = to_hr_dca_ctx(uctx);
+	unsigned long flags;
+	void *states, *pages;
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+
+	spin_lock(&mem->lock);
+	clr_dca_mem_registered(mem);
+	mem->page_count = 0;
+	pages = mem->pages;
+	mem->pages = NULL;
+	states = mem->states;
+	mem->states = NULL;
+	spin_unlock(&mem->lock);
+
+	ctx->free_mems--;
+	ctx->free_size -= mem->size;
+
+	ctx->total_size -= mem->size;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+	free_mem_states(states);
+	free_dca_pages(pages);
+}
+
+static int register_dca_mem(struct hns_roce_dev *hr_dev,
+			    struct hns_roce_ucontext *uctx,
+			    struct dca_mem *mem, struct dca_mem_attr *attr)
+{
+	struct hns_roce_dca_ctx *ctx = to_hr_dca_ctx(uctx);
+	void *states, *pages;
+	unsigned long flags;
+
+	pages = alloc_dca_pages(hr_dev, mem, attr);
+	if (!pages)
+		return -ENOMEM;
+
+	states = alloc_dca_states(pages, mem->page_count);
+	if (!states) {
+		free_dca_pages(pages);
+		return -ENOMEM;
+	}
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+
+	spin_lock(&mem->lock);
+	mem->pages = pages;
+	mem->states = states;
+	mem->key = attr->key;
+	mem->size = attr->size;
+	set_dca_mem_registered(mem);
+	spin_unlock(&mem->lock);
+
+	ctx->free_mems++;
+	ctx->free_size += attr->size;
+	ctx->total_size += attr->size;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+	return 0;
+}
+
+static void init_dca_context(struct hns_roce_dca_ctx *ctx)
+{
+	INIT_LIST_HEAD(&ctx->pool);
+	spin_lock_init(&ctx->pool_lock);
+	ctx->total_size = 0;
+}
+
+static void cleanup_dca_context(struct hns_roce_dev *hr_dev,
+				struct hns_roce_dca_ctx *ctx)
+{
+	struct dca_mem *mem, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
+		list_del(&mem->list);
+		set_dca_mem_free(mem);
+		spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+		free_mem_states(mem->states);
+		free_dca_pages(mem->pages);
+		kfree(mem);
+
+		spin_lock_irqsave(&ctx->pool_lock, flags);
+	}
+	ctx->total_size = 0;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+}
+
+void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
+			    struct hns_roce_ucontext *uctx)
+{
+	init_dca_context(&uctx->dca_ctx);
+}
+
+void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
+			      struct hns_roce_ucontext *uctx)
+{
+	cleanup_dca_context(hr_dev, &uctx->dca_ctx);
+}
+
+static struct dca_mem *alloc_dca_mem(struct hns_roce_dca_ctx *ctx)
+{
+	struct dca_mem *mem, *tmp, *found = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
+		spin_lock(&mem->lock);
+		if (dca_mem_is_free(mem)) {
+			found = mem;
+			set_dca_mem_alloced(mem);
+			spin_unlock(&mem->lock);
+			goto done;
+		}
+		spin_unlock(&mem->lock);
+	}
+
+done:
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+	if (found)
+		return found;
+
+	mem = kzalloc(sizeof(*mem), GFP_ATOMIC);
+	if (!mem)
+		return NULL;
+
+	spin_lock_init(&mem->lock);
+	INIT_LIST_HEAD(&mem->list);
+
+	set_dca_mem_alloced(mem);
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	list_add(&mem->list, &ctx->pool);
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+	return mem;
+}
+
+static void free_dca_mem(struct dca_mem *mem)
+{
+	/* We cannot hold the whole pool's lock while DCA is working, so the
+	 * mem object stays on the pool list until cleanup_dca_context(); just
+	 * mark it as free here so that alloc_dca_mem() can reuse it.
+	 */
+	spin_lock(&mem->lock);
+	set_dca_mem_free(mem);
+	spin_unlock(&mem->lock);
+}
+
+static inline struct hns_roce_ucontext *
+uverbs_attr_to_hr_uctx(struct uverbs_attr_bundle *attrs)
+{
+	return rdma_udata_to_drv_context(&attrs->driver_udata,
+					 struct hns_roce_ucontext, ibucontext);
+}
+
+static int UVERBS_HANDLER(HNS_IB_METHOD_DCA_MEM_REG)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_ucontext *uctx = uverbs_attr_to_hr_uctx(attrs);
+	struct hns_roce_dev *hr_dev = to_hr_dev(uctx->ibucontext.device);
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, HNS_IB_ATTR_DCA_MEM_REG_HANDLE);
+	struct dca_mem_attr init_attr = {};
+	struct dca_mem *mem;
+	int ret;
+
+	if (uverbs_copy_from(&init_attr.addr, attrs,
+			     HNS_IB_ATTR_DCA_MEM_REG_ADDR) ||
+	    uverbs_copy_from(&init_attr.size, attrs,
+			     HNS_IB_ATTR_DCA_MEM_REG_LEN) ||
+	    uverbs_copy_from(&init_attr.key, attrs,
+			     HNS_IB_ATTR_DCA_MEM_REG_KEY))
+		return -EFAULT;
+
+	mem = alloc_dca_mem(to_hr_dca_ctx(uctx));
+	if (!mem)
+		return -ENOMEM;
+
+	ret = register_dca_mem(hr_dev, uctx, mem, &init_attr);
+	if (ret) {
+		free_dca_mem(mem);
+		return ret;
+	}
+
+	uobj->object = mem;
+
+	return 0;
+}
+
+static int dca_cleanup(struct ib_uobject *uobject, enum rdma_remove_reason why,
+		       struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_ucontext *uctx = uverbs_attr_to_hr_uctx(attrs);
+	struct dca_mem *mem;
+
+	/* One DCA mem may be shared by many QPs, so it must outlive all QP
+	 * uobjects. On context close or driver removal, skip it here; the DCA
+	 * mems are released later by hns_roce_unregister_udca().
+	 */
+	if (why == RDMA_REMOVE_CLOSE || why == RDMA_REMOVE_DRIVER_REMOVE)
+		return 0;
+
+	mem = uobject->object;
+	unregister_dca_mem(uctx, mem);
+	free_dca_mem(mem);
+
+	return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	HNS_IB_METHOD_DCA_MEM_REG,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_REG_HANDLE, HNS_IB_OBJECT_DCA_MEM,
+			UVERBS_ACCESS_NEW, UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_REG_LEN, UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_REG_ADDR, UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_REG_KEY, UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+	HNS_IB_METHOD_DCA_MEM_DEREG,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_DEREG_HANDLE, HNS_IB_OBJECT_DCA_MEM,
+			UVERBS_ACCESS_DESTROY, UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(HNS_IB_OBJECT_DCA_MEM,
+			    UVERBS_TYPE_ALLOC_IDR(dca_cleanup),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_REG),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG));
+
+static bool dca_is_supported(struct ib_device *device)
+{
+	struct hns_roce_dev *dev = to_hr_dev(device);
+
+	return dev->caps.flags & HNS_ROCE_CAP_FLAG_DCA_MODE;
+}
+
+const struct uapi_definition hns_roce_dca_uapi_defs[] = {
+	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(
+		HNS_IB_OBJECT_DCA_MEM,
+		UAPI_DEF_IS_OBJ_SUPPORTED(dca_is_supported)),
+	{}
+};
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
new file mode 100644
index 0000000..cb3481f
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2021 Hisilicon Limited. All rights reserved.
+ */
+
+#ifndef __HNS_ROCE_DCA_H
+#define __HNS_ROCE_DCA_H
+
+/* DCA page state (32 bit) */
+struct hns_dca_page_state {
+	u32 buf_id : 29; /* If zero, the page can be used by any buffer. */
+	u32 lock : 1; /* @buf_id locked this page to prepare access. */
+	u32 active : 1; /* @buf_id is accessing this page. */
+	u32 head : 1; /* This page is the head in a continuous address range. */
+};
+
+void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
+			    struct hns_roce_ucontext *uctx);
+void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
+			      struct hns_roce_ucontext *uctx);
+
+#endif
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 55d5386..5524d72 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -215,6 +215,7 @@ enum {
 	HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL		= BIT(9),
 	HNS_ROCE_CAP_FLAG_ATOMIC		= BIT(10),
 	HNS_ROCE_CAP_FLAG_SDI_MODE		= BIT(14),
+	HNS_ROCE_CAP_FLAG_DCA_MODE		= BIT(15),
 	HNS_ROCE_CAP_FLAG_STASH			= BIT(17),
 };
 
@@ -266,11 +267,20 @@ struct hns_roce_uar {
 	unsigned long	logic_idx;
 };
 
+struct hns_roce_dca_ctx {
+	struct list_head pool; /* all DCA mems link to @pool */
+	spinlock_t pool_lock; /* protect @pool */
+	unsigned int free_mems; /* free mem num in pool */
+	size_t free_size; /* free mem size in pool */
+	size_t total_size; /* total size in pool */
+};
+
 struct hns_roce_ucontext {
 	struct ib_ucontext	ibucontext;
 	struct hns_roce_uar	uar;
 	struct list_head	page_list;
 	struct mutex		page_mutex;
+	struct hns_roce_dca_ctx	dca_ctx;
 };
 
 struct hns_roce_pd {
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index d9179ba..66d0d02d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -37,10 +37,12 @@
 #include <rdma/ib_addr.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/uverbs_ioctl.h>
 #include <rdma/ib_cache.h>
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
 #include "hns_roce_hem.h"
+#include "hns_roce_dca.h"
 
 /**
  * hns_get_gid_index - Get gid index.
@@ -306,15 +308,16 @@ static int hns_roce_modify_device(struct ib_device *ib_dev, int mask,
 static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
 				   struct ib_udata *udata)
 {
-	int ret;
 	struct hns_roce_ucontext *context = to_hr_ucontext(uctx);
-	struct hns_roce_ib_alloc_ucontext_resp resp = {};
 	struct hns_roce_dev *hr_dev = to_hr_dev(uctx->device);
+	struct hns_roce_ib_alloc_ucontext_resp resp = {};
+	int ret;
 
 	if (!hr_dev->active)
 		return -EAGAIN;
 
 	resp.qp_tab_size = hr_dev->caps.num_qps;
+	resp.cap_flags = (u32)hr_dev->caps.flags;
 
 	ret = hns_roce_uar_alloc(hr_dev, &context->uar);
 	if (ret)
@@ -325,6 +328,9 @@ static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
 		mutex_init(&context->page_mutex);
 	}
 
+	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_DCA_MODE)
+		hns_roce_register_udca(hr_dev, context);
+
 	resp.cqe_size = hr_dev->caps.cqe_sz;
 
 	ret = ib_copy_to_udata(udata, &resp,
@@ -335,6 +341,9 @@ static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
 	return 0;
 
 error_fail_copy_to_udata:
+	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_DCA_MODE)
+		hns_roce_unregister_udca(hr_dev, context);
+
 	hns_roce_uar_free(hr_dev, &context->uar);
 
 error_fail_uar_alloc:
@@ -344,8 +353,12 @@ static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
 static void hns_roce_dealloc_ucontext(struct ib_ucontext *ibcontext)
 {
 	struct hns_roce_ucontext *context = to_hr_ucontext(ibcontext);
+	struct hns_roce_dev *hr_dev = to_hr_dev(ibcontext->device);
 
 	hns_roce_uar_free(to_hr_dev(ibcontext->device), &context->uar);
+
+	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_DCA_MODE)
+		hns_roce_unregister_udca(hr_dev, context);
 }
 
 static int hns_roce_mmap(struct ib_ucontext *context,
@@ -414,6 +427,12 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
 	ib_unregister_device(&hr_dev->ib_dev);
 }
 
+extern const struct uapi_definition hns_roce_dca_uapi_defs[];
+static const struct uapi_definition hns_roce_uapi_defs[] = {
+	UAPI_DEF_CHAIN(hns_roce_dca_uapi_defs),
+	{}
+};
+
 static const struct ib_device_ops hns_roce_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_HNS,
@@ -515,6 +534,10 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 
 	ib_set_device_ops(ib_dev, hr_dev->hw->hns_roce_dev_ops);
 	ib_set_device_ops(ib_dev, &hns_roce_dev_ops);
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS))
+		ib_dev->driver_def = hns_roce_uapi_defs;
+
 	for (i = 0; i < hr_dev->caps.num_ports; i++) {
 		if (!hr_dev->iboe.netdevs[i])
 			continue;
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 90b739d..f59abc4 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -86,10 +86,33 @@ struct hns_roce_ib_create_qp_resp {
 struct hns_roce_ib_alloc_ucontext_resp {
 	__u32	qp_tab_size;
 	__u32	cqe_size;
+	__u32	cap_flags;
 };
 
 struct hns_roce_ib_alloc_pd_resp {
 	__u32 pdn;
 };
 
+#define UVERBS_ID_NS_MASK 0xF000
+#define UVERBS_ID_NS_SHIFT 12
+
+enum hns_ib_objects {
+	HNS_IB_OBJECT_DCA_MEM = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum hns_ib_dca_mem_methods {
+	HNS_IB_METHOD_DCA_MEM_REG = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_METHOD_DCA_MEM_DEREG,
+};
+
+enum hns_ib_dca_mem_reg_attrs {
+	HNS_IB_ATTR_DCA_MEM_REG_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_ATTR_DCA_MEM_REG_LEN,
+	HNS_IB_ATTR_DCA_MEM_REG_ADDR,
+	HNS_IB_ATTR_DCA_MEM_REG_KEY,
+};
+
+enum hns_ib_dca_mem_dereg_attrs {
+	HNS_IB_ATTR_DCA_MEM_DEREG_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
 #endif /* HNS_ABI_USER_H */
-- 
2.8.1



* [PATCH RFC 2/7] RDMA/hns: Add method for shrinking DCA memory pool
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 3/7] RDMA/hns: Configure DCA mode for the userspace QP Weihang Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

If no QP is using a DCA mem object, the userspace driver can destroy it.
So add a new method, 'HNS_IB_METHOD_DCA_MEM_SHRINK', to allow the
userspace driver to remove an object from the DCA memory pool.

Once a DCA mem object has been shrunk, the userspace driver can destroy
it with the 'HNS_IB_METHOD_DCA_MEM_DEREG' method and free the buffer it
allocated in userspace.
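
The selection rule in shrink_dca_mem() and shrink_dca_page_proc() in this patch can be modeled in isolation: do nothing unless the free size exceeds the caller's reserved size, unregister the first registered object whose pages are all free and report its key, and stop scanning once a second empty object is seen. This is a simplified sketch with hypothetical types; it omits the pool and per-object locking as well as the ctx->free_mems check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FREE_BUF_ID 0u /* like HNS_DCA_INVALID_BUF_ID */

struct example_dca_mem {
	uint64_t key;                /* key given at registration */
	size_t page_count;
	const unsigned int *buf_ids; /* per-page owner, 0 if free */
	bool registered;
};

struct example_shrink_resp {
	uint64_t free_key;  /* key of the object that was unregistered */
	uint32_t free_mems; /* empty objects seen, capped at 2 */
};

static bool example_mem_is_empty(const struct example_dca_mem *m)
{
	size_t i;

	for (i = 0; i < m->page_count; i++)
		if (m->buf_ids[i] != EXAMPLE_FREE_BUF_ID)
			return false;
	return true;
}

static struct example_shrink_resp
example_shrink(struct example_dca_mem *mems, size_t n, size_t free_size,
	       size_t reserved_size)
{
	struct example_shrink_resp resp = { 0 };
	size_t i;

	assert(mems != NULL || n == 0);
	if (free_size <= reserved_size)
		return resp; /* keep reserved_size bytes cached */

	for (i = 0; i < n; i++) {
		if (!mems[i].registered || !example_mem_is_empty(&mems[i]))
			continue;
		if (resp.free_mems == 0) {
			/* Unregister only the first empty object. */
			mems[i].registered = false;
			resp.free_key = mems[i].key;
		}
		if (++resp.free_mems > 1)
			break; /* stop after a second empty object */
	}
	return resp;
}
```

Userspace can then use the returned key to find which of its buffers to deregister with HNS_IB_METHOD_DCA_MEM_DEREG and free.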

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_dca.c | 142 ++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/hns/hns_roce_dca.h |   7 ++
 include/uapi/rdma/hns-abi.h              |   9 ++
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
index 872e51a..72273f0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.c
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -35,6 +35,11 @@ struct dca_mem_attr {
 	u32 size;
 };
 
+static inline bool dca_page_is_free(struct hns_dca_page_state *state)
+{
+	return state->buf_id == HNS_DCA_INVALID_BUF_ID;
+}
+
 static inline bool dca_mem_is_free(struct dca_mem *mem)
 {
 	return mem->flags == 0;
@@ -60,6 +65,11 @@ static inline void clr_dca_mem_registered(struct dca_mem *mem)
 	mem->flags &= ~DCA_MEM_FLAGS_REGISTERED;
 }
 
+static inline bool dca_mem_is_available(struct dca_mem *mem)
+{
+	return mem->flags == (DCA_MEM_FLAGS_ALLOCED | DCA_MEM_FLAGS_REGISTERED);
+}
+
 static void free_dca_pages(void *pages)
 {
 	ib_umem_release(pages);
@@ -123,6 +133,41 @@ static struct hns_dca_page_state *alloc_dca_states(void *pages, int count)
 	return states;
 }
 
+#define DCA_MEM_STOP_ITERATE -1
+#define DCA_MEM_NEXT_ITERATE -2
+static void travel_dca_pages(struct hns_roce_dca_ctx *ctx, void *param,
+			     int (*cb)(struct dca_mem *, int, void *))
+{
+	struct dca_mem *mem, *tmp;
+	unsigned long flags;
+	bool avail;
+	int ret;
+	int i;
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
+		spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+		spin_lock(&mem->lock);
+		avail = dca_mem_is_available(mem);
+		ret = 0;
+		for (i = 0; avail && i < mem->page_count; i++) {
+			ret = cb(mem, i, param);
+			if (ret == DCA_MEM_STOP_ITERATE ||
+			    ret == DCA_MEM_NEXT_ITERATE)
+				break;
+		}
+		spin_unlock(&mem->lock);
+		spin_lock_irqsave(&ctx->pool_lock, flags);
+
+		if (ret == DCA_MEM_STOP_ITERATE)
+			goto done;
+	}
+
+done:
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+}
+
 /* user DCA is managed by ucontext */
 static inline struct hns_roce_dca_ctx *
 to_hr_dca_ctx(struct hns_roce_ucontext *uctx)
@@ -194,6 +239,63 @@ static int register_dca_mem(struct hns_roce_dev *hr_dev,
 	return 0;
 }
 
+struct dca_mem_shrink_attr {
+	u64 shrink_key;
+	u32 shrink_mems;
+};
+
+static int shrink_dca_page_proc(struct dca_mem *mem, int index, void *param)
+{
+	struct dca_mem_shrink_attr *attr = param;
+	struct hns_dca_page_state *state;
+	int i, free_pages;
+
+	free_pages = 0;
+	for (i = 0; i < mem->page_count; i++) {
+		state = &mem->states[i];
+		if (dca_page_is_free(state))
+			free_pages++;
+	}
+
+	/* No pages are in use */
+	if (free_pages == mem->page_count) {
+		/* unregister first empty DCA mem */
+		if (!attr->shrink_mems) {
+			clr_dca_mem_registered(mem);
+			attr->shrink_key = mem->key;
+		}
+
+		attr->shrink_mems++;
+	}
+
+	if (attr->shrink_mems > 1)
+		return DCA_MEM_STOP_ITERATE;
+	else
+		return DCA_MEM_NEXT_ITERATE;
+}
+
+static int shrink_dca_mem(struct hns_roce_dev *hr_dev,
+			  struct hns_roce_ucontext *uctx, u64 reserved_size,
+			  struct hns_dca_shrink_resp *resp)
+{
+	struct hns_roce_dca_ctx *ctx = to_hr_dca_ctx(uctx);
+	struct dca_mem_shrink_attr attr = {};
+	unsigned long flags;
+	bool need_shrink;
+
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	need_shrink = ctx->free_mems > 0 && ctx->free_size > reserved_size;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+	if (!need_shrink)
+		return 0;
+
+	travel_dca_pages(ctx, &attr, shrink_dca_page_proc);
+	resp->free_mems = attr.shrink_mems;
+	resp->free_key = attr.shrink_key;
+
+	return 0;
+}
+
 static void init_dca_context(struct hns_roce_dca_ctx *ctx)
 {
 	INIT_LIST_HEAD(&ctx->pool);
@@ -361,10 +463,48 @@ DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_DEREG_HANDLE, HNS_IB_OBJECT_DCA_MEM,
 			UVERBS_ACCESS_DESTROY, UA_MANDATORY));
 
+static int UVERBS_HANDLER(HNS_IB_METHOD_DCA_MEM_SHRINK)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_ucontext *uctx = uverbs_attr_to_hr_uctx(attrs);
+	struct hns_dca_shrink_resp resp = {};
+	u64 reserved_size = 0;
+	int ret;
+
+	if (uverbs_copy_from(&reserved_size, attrs,
+			     HNS_IB_ATTR_DCA_MEM_SHRINK_RESERVED_SIZE))
+		return -EFAULT;
+
+	ret = shrink_dca_mem(to_hr_dev(uctx->ibucontext.device), uctx,
+			     reserved_size, &resp);
+	if (ret)
+		return ret;
+
+	if (uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_KEY,
+			   &resp.free_key, sizeof(resp.free_key)) ||
+	    uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_MEMS,
+			   &resp.free_mems, sizeof(resp.free_mems)))
+		return -EFAULT;
+
+	return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	HNS_IB_METHOD_DCA_MEM_SHRINK,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_SHRINK_HANDLE,
+			HNS_IB_OBJECT_DCA_MEM, UVERBS_ACCESS_WRITE,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_SHRINK_RESERVED_SIZE,
+			   UVERBS_ATTR_TYPE(u64), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_KEY,
+			    UVERBS_ATTR_TYPE(u64), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_MEMS,
+			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
 DECLARE_UVERBS_NAMED_OBJECT(HNS_IB_OBJECT_DCA_MEM,
 			    UVERBS_TYPE_ALLOC_IDR(dca_cleanup),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_REG),
-			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG));
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_SHRINK));
 
 static bool dca_is_supported(struct ib_device *device)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
index cb3481f..97caf03 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.h
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -14,6 +14,13 @@ struct hns_dca_page_state {
 	u32 head : 1; /* This page is the head in a continuous address range. */
 };
 
+struct hns_dca_shrink_resp {
+	u64 free_key; /* key of the free buffer, as registered by the user */
+	u32 free_mems; /* number of buffers not used by any QP */
+};
+
+#define HNS_DCA_INVALID_BUF_ID 0UL
+
 void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_ucontext *uctx);
 void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index f59abc4..74fc11a 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -103,6 +103,7 @@ enum hns_ib_objects {
 enum hns_ib_dca_mem_methods {
 	HNS_IB_METHOD_DCA_MEM_REG = (1U << UVERBS_ID_NS_SHIFT),
 	HNS_IB_METHOD_DCA_MEM_DEREG,
+	HNS_IB_METHOD_DCA_MEM_SHRINK,
 };
 
 enum hns_ib_dca_mem_reg_attrs {
@@ -115,4 +116,12 @@ enum hns_ib_dca_mem_reg_attrs {
 enum hns_ib_dca_mem_dereg_attrs {
 	HNS_IB_ATTR_DCA_MEM_DEREG_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
 };
+
+enum hns_ib_dca_mem_shrink_attrs {
+	HNS_IB_ATTR_DCA_MEM_SHRINK_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_ATTR_DCA_MEM_SHRINK_RESERVED_SIZE,
+	HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_KEY,
+	HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_MEMS,
+};
+
 #endif /* HNS_ABI_USER_H */
-- 
2.8.1



* [PATCH RFC 3/7] RDMA/hns: Configure DCA mode for the userspace QP
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 2/7] RDMA/hns: Add method for shrinking DCA memory pool Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 4/7] RDMA/hns: Add method for attaching WQE buffer Weihang Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

If the userspace driver sets the 'buf_addr' field of
'struct hns_roce_ib_create_qp' to NULL when creating a QP, the kernel
driver needs to set up the QP in DCA mode. So add a QP capability bit to
the response to tell the userspace driver that DCA mode has been enabled.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_dca.c    |  17 ++++
 drivers/infiniband/hw/hns/hns_roce_dca.h    |   3 +
 drivers/infiniband/hw/hns/hns_roce_device.h |   5 ++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  23 +++++-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |   2 +
 drivers/infiniband/hw/hns/hns_roce_qp.c     | 119 ++++++++++++++++++++++------
 include/uapi/rdma/hns-abi.h                 |   1 +
 7 files changed, 141 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
index 72273f0..40999a8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.c
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -386,6 +386,23 @@ static void free_dca_mem(struct dca_mem *mem)
 	spin_unlock(&mem->lock);
 }
 
+int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+{
+	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
+
+	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
+
+	return 0;
+}
+
+void hns_roce_disable_dca(struct hns_roce_dev *hr_dev,
+			  struct hns_roce_qp *hr_qp)
+{
+	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
+
+	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
+}
+
 static inline struct hns_roce_ucontext *
 uverbs_attr_to_hr_uctx(struct uverbs_attr_bundle *attrs)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
index 97caf03..419606ef 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.h
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -26,4 +26,7 @@ void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
 void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
 			      struct hns_roce_ucontext *uctx);
 
+int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
+void hns_roce_disable_dca(struct hns_roce_dev *hr_dev,
+			  struct hns_roce_qp *hr_qp);
 #endif
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 5524d72..016df5b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -365,6 +365,10 @@ struct hns_roce_mtr {
 	struct hns_roce_hem_cfg  hem_cfg; /* config for hardware addressing */
 };
 
+struct hns_roce_dca_cfg {
+	u32 buf_id;
+};
+
 struct hns_roce_mw {
 	struct ib_mw		ibmw;
 	u32			pdn;
@@ -661,6 +665,7 @@ struct hns_roce_qp {
 	struct hns_roce_wq	sq;
 
 	struct hns_roce_mtr	mtr;
+	struct hns_roce_dca_cfg	dca_cfg;
 
 	u32			buff_size;
 	struct mutex		mutex;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 4c06889..4ae76ba 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -346,6 +346,11 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 	return 0;
 }
 
+static inline bool check_qp_dca_enable(struct hns_roce_qp *hr_qp)
+{
+	return !!(hr_qp->en_flags & HNS_ROCE_QP_CAP_DCA);
+}
+
 static int check_send_valid(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_qp *hr_qp)
 {
@@ -4290,6 +4295,21 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
 	roce_set_field(qpc_mask->byte_140_raq, V2_QPC_BYTE_140_TRRL_BA_M,
 		       V2_QPC_BYTE_140_TRRL_BA_S, 0);
 
+	/* hip09 reuses the IRRL_HEAD fields of hip08 */
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP09) {
+		if (check_qp_dca_enable(hr_qp)) {
+			roce_set_bit(context->byte_196_sq_psn,
+				     V2_QPC_BYTE_196_DCA_MODE_S, 1);
+			roce_set_bit(qpc_mask->byte_196_sq_psn,
+				     V2_QPC_BYTE_196_DCA_MODE_S, 0);
+		}
+	} else {
+		/* reset IRRL_HEAD */
+		roce_set_field(qpc_mask->byte_196_sq_psn,
+			       V2_QPC_BYTE_196_IRRL_HEAD_M,
+			       V2_QPC_BYTE_196_IRRL_HEAD_S, 0);
+	}
+
 	context->irrl_ba = cpu_to_le32(irrl_ba >> 6);
 	qpc_mask->irrl_ba = 0;
 	roce_set_field(context->byte_208_irrl, V2_QPC_BYTE_208_IRRL_BA_M,
@@ -4456,9 +4476,6 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
 	roce_set_field(qpc_mask->byte_212_lsn, V2_QPC_BYTE_212_LSN_M,
 		       V2_QPC_BYTE_212_LSN_S, 0);
 
-	roce_set_field(qpc_mask->byte_196_sq_psn, V2_QPC_BYTE_196_IRRL_HEAD_M,
-		       V2_QPC_BYTE_196_IRRL_HEAD_S, 0);
-
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index bdaccf8..6c4c2b7 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -796,6 +796,8 @@ struct hns_roce_v2_qp_context {
 #define	V2_QPC_BYTE_196_IRRL_HEAD_S 0
 #define V2_QPC_BYTE_196_IRRL_HEAD_M GENMASK(7, 0)
 
+#define V2_QPC_BYTE_196_DCA_MODE_S 6
+
 #define	V2_QPC_BYTE_196_SQ_MAX_PSN_S 8
 #define V2_QPC_BYTE_196_SQ_MAX_PSN_M GENMASK(31, 8)
 
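The modify_qp_init_to_rtr() change writes the DCA mode bit into both the QP context and its mask. V2_QPC_BYTE_196_DCA_MODE_S is bit 6, which lies inside the hip08 IRRL_HEAD field (bits 7:0), so the two branches must be mutually exclusive. Below is a minimal sketch of that convention. It assumes, as elsewhere in hns_roce_hw_v2.c, that the mask is pre-filled with ones and that clearing a field's mask bits marks it for update. set_field_u32() and set_bit_u32() are hypothetical host-endian stand-ins for roce_set_field() and roce_set_bit(), which operate on __le32 words.

```c
#include <assert.h>
#include <stdint.h>

#define QPC_BYTE_196_DCA_MODE_S   6
#define QPC_BYTE_196_IRRL_HEAD_S  0
#define QPC_BYTE_196_IRRL_HEAD_M  0xffu /* GENMASK(7, 0) */

/* Hypothetical host-endian equivalent of roce_set_field(). */
static void set_field_u32(uint32_t *word, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	*word &= ~mask;
	*word |= (val << shift) & mask;
}

/* Hypothetical host-endian equivalent of roce_set_bit(). */
static void set_bit_u32(uint32_t *word, unsigned int shift, uint32_t val)
{
	set_field_u32(word, 1u << shift, shift, val);
}

/* On hip09 the DCA mode bit is set in the context and cleared in the
 * mask; on hip08 IRRL_HEAD is reset by clearing its mask bits while the
 * context value stays zero. */
static void config_byte_196(uint32_t *ctx, uint32_t *mask, int is_hip09,
			    int dca_en)
{
	if (is_hip09) {
		if (dca_en) {
			set_bit_u32(ctx, QPC_BYTE_196_DCA_MODE_S, 1);
			set_bit_u32(mask, QPC_BYTE_196_DCA_MODE_S, 0);
		}
	} else {
		set_field_u32(mask, QPC_BYTE_196_IRRL_HEAD_M,
			      QPC_BYTE_196_IRRL_HEAD_S, 0);
	}
}
```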
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index d8e2fe5..b08d111 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -39,6 +39,7 @@
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
 #include "hns_roce_hem.h"
+#include "hns_roce_dca.h"
 
 static void flush_work_handle(struct work_struct *work)
 {
@@ -553,8 +554,21 @@ static int set_user_sq_size(struct hns_roce_dev *hr_dev,
 	return 0;
 }
 
+static bool check_dca_is_enable(struct hns_roce_dev *hr_dev, bool is_user,
+				unsigned long addr)
+{
+	if (!(hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_DCA_MODE))
+		return false;
+
+	/* If the user QP's buffer addr is 0, the DCA mode should be enabled */
+	if (is_user)
+		return !addr;
+
+	return false;
+}
+
 static int set_wqe_buf_attr(struct hns_roce_dev *hr_dev,
-			    struct hns_roce_qp *hr_qp,
+			    struct hns_roce_qp *hr_qp, bool dca_en,
 			    struct hns_roce_buf_attr *buf_attr)
 {
 	int buf_size;
@@ -598,10 +612,22 @@ static int set_wqe_buf_attr(struct hns_roce_dev *hr_dev,
 	if (hr_qp->buff_size < 1)
 		return -EINVAL;
 
-	buf_attr->page_shift = HNS_HW_PAGE_SHIFT + hr_dev->caps.mtt_buf_pg_sz;
 	buf_attr->fixed_page = true;
 	buf_attr->region_count = idx;
 
+	if (dca_en) {
+		/*
+		 * When DCA is enabled, the buffer need not be allocated yet and
+		 * the page shift must be fixed to 4K.
+		 */
+		buf_attr->mtt_only = true;
+		buf_attr->page_shift = HNS_HW_PAGE_SHIFT;
+	} else {
+		buf_attr->mtt_only = false;
+		buf_attr->page_shift = HNS_HW_PAGE_SHIFT +
+				       hr_dev->caps.mtt_buf_pg_sz;
+	}
+
 	return 0;
 }
 
@@ -700,12 +726,53 @@ static void free_rq_inline_buf(struct hns_roce_qp *hr_qp)
 	kfree(hr_qp->rq_inl_buf.wqe_list);
 }
 
-static int alloc_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+static int alloc_wqe_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+			 bool dca_en, struct hns_roce_buf_attr *buf_attr,
+			 struct ib_udata *udata, unsigned long addr)
+{
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	int ret;
+
+	if (dca_en) {
+		/* DCA must be enabled after the buffer size is configured. */
+		ret = hns_roce_enable_dca(hr_dev, hr_qp);
+		if (ret) {
+			ibdev_err(ibdev, "failed to enable DCA, ret = %d.\n",
+				  ret);
+			return ret;
+		}
+
+		hr_qp->en_flags |= HNS_ROCE_QP_CAP_DCA;
+	}
+
+	ret = hns_roce_mtr_create(hr_dev, &hr_qp->mtr, buf_attr,
+				  HNS_HW_PAGE_SHIFT + hr_dev->caps.mtt_ba_pg_sz,
+				  udata, addr);
+	if (ret) {
+		ibdev_err(ibdev, "failed to create WQE mtr, ret = %d.\n", ret);
+		if (dca_en)
+			hns_roce_disable_dca(hr_dev, hr_qp);
+	}
+
+	return ret;
+}
+
+static void free_wqe_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+			 struct ib_udata *udata)
+{
+	hns_roce_mtr_destroy(hr_dev, &hr_qp->mtr);
+
+	if (hr_qp->en_flags & HNS_ROCE_QP_CAP_DCA)
+		hns_roce_disable_dca(hr_dev, hr_qp);
+}
+
+static int alloc_qp_wqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 			struct ib_qp_init_attr *init_attr,
 			struct ib_udata *udata, unsigned long addr)
 {
 	struct ib_device *ibdev = &hr_dev->ib_dev;
 	struct hns_roce_buf_attr buf_attr = {};
+	bool dca_en;
 	int ret;
 
 	if (!udata && hr_qp->rq_inl_buf.wqe_cnt) {
@@ -720,16 +787,16 @@ static int alloc_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 		hr_qp->rq_inl_buf.wqe_list = NULL;
 	}
 
-	ret = set_wqe_buf_attr(hr_dev, hr_qp, &buf_attr);
+	dca_en = check_dca_is_enable(hr_dev, !!udata, addr);
+	ret = set_wqe_buf_attr(hr_dev, hr_qp, dca_en, &buf_attr);
 	if (ret) {
-		ibdev_err(ibdev, "failed to split WQE buf, ret = %d.\n", ret);
+		ibdev_err(ibdev, "failed to set WQE attr, ret = %d.\n", ret);
 		goto err_inline;
 	}
-	ret = hns_roce_mtr_create(hr_dev, &hr_qp->mtr, &buf_attr,
-				  HNS_HW_PAGE_SHIFT + hr_dev->caps.mtt_ba_pg_sz,
-				  udata, addr);
+
+	ret = alloc_wqe_buf(hr_dev, hr_qp, dca_en, &buf_attr, udata, addr);
 	if (ret) {
-		ibdev_err(ibdev, "failed to create WQE mtr, ret = %d.\n", ret);
+		ibdev_err(ibdev, "failed to alloc WQE buf, ret = %d.\n", ret);
 		goto err_inline;
 	}
 
@@ -740,9 +807,10 @@ static int alloc_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 	return ret;
 }
 
-static void free_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+static void free_qp_wqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+			struct ib_udata *udata)
 {
-	hns_roce_mtr_destroy(hr_dev, &hr_qp->mtr);
+	free_wqe_buf(hr_dev, hr_qp, udata);
 	free_rq_inline_buf(hr_qp);
 }
 
@@ -800,7 +868,6 @@ static int alloc_qp_db(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 				goto err_out;
 			}
 			hr_qp->en_flags |= HNS_ROCE_QP_CAP_SQ_RECORD_DB;
-			resp->cap_flags |= HNS_ROCE_QP_CAP_SQ_RECORD_DB;
 		}
 
 		if (user_qp_has_rdb(hr_dev, init_attr, udata, resp)) {
@@ -813,7 +880,6 @@ static int alloc_qp_db(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 				goto err_sdb;
 			}
 			hr_qp->en_flags |= HNS_ROCE_QP_CAP_RQ_RECORD_DB;
-			resp->cap_flags |= HNS_ROCE_QP_CAP_RQ_RECORD_DB;
 		}
 	} else {
 		/* QP doorbell register address */
@@ -987,22 +1053,22 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 		}
 	}
 
-	ret = alloc_qp_db(hr_dev, hr_qp, init_attr, udata, &ucmd, &resp);
+	ret = alloc_qpn(hr_dev, hr_qp);
 	if (ret) {
-		ibdev_err(ibdev, "failed to alloc QP doorbell, ret = %d.\n",
-			  ret);
+		ibdev_err(ibdev, "failed to alloc QPN, ret = %d.\n", ret);
 		goto err_wrid;
 	}
 
-	ret = alloc_qp_buf(hr_dev, hr_qp, init_attr, udata, ucmd.buf_addr);
+	ret = alloc_qp_wqe(hr_dev, hr_qp, init_attr, udata, ucmd.buf_addr);
 	if (ret) {
 		ibdev_err(ibdev, "failed to alloc QP buffer, ret = %d.\n", ret);
-		goto err_db;
+		goto err_qpn;
 	}
 
-	ret = alloc_qpn(hr_dev, hr_qp);
+	ret = alloc_qp_db(hr_dev, hr_qp, init_attr, udata, &ucmd, &resp);
 	if (ret) {
-		ibdev_err(ibdev, "failed to alloc QPN, ret = %d.\n", ret);
+		ibdev_err(ibdev, "failed to alloc QP doorbell, ret = %d.\n",
+			  ret);
 		goto err_buf;
 	}
 
@@ -1010,7 +1076,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 	if (ret) {
 		ibdev_err(ibdev, "failed to alloc QP context, ret = %d.\n",
 			  ret);
-		goto err_qpn;
+		goto err_db;
 	}
 
 	ret = hns_roce_qp_store(hr_dev, hr_qp, init_attr);
@@ -1020,6 +1086,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 	}
 
 	if (udata) {
+		resp.cap_flags = hr_qp->en_flags;
 		ret = ib_copy_to_udata(udata, &resp,
 				       min(udata->outlen, sizeof(resp)));
 		if (ret) {
@@ -1045,12 +1112,12 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 	hns_roce_qp_remove(hr_dev, hr_qp);
 err_qpc:
 	free_qpc(hr_dev, hr_qp);
-err_qpn:
-	free_qpn(hr_dev, hr_qp);
-err_buf:
-	free_qp_buf(hr_dev, hr_qp);
 err_db:
 	free_qp_db(hr_dev, hr_qp, udata);
+err_buf:
+	free_qp_wqe(hr_dev, hr_qp, udata);
+err_qpn:
+	free_qpn(hr_dev, hr_qp);
 err_wrid:
 	free_kernel_wrid(hr_qp);
 	return ret;
@@ -1065,7 +1132,7 @@ void hns_roce_qp_destroy(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 
 	free_qpc(hr_dev, hr_qp);
 	free_qpn(hr_dev, hr_qp);
-	free_qp_buf(hr_dev, hr_qp);
+	free_qp_wqe(hr_dev, hr_qp, udata);
 	free_kernel_wrid(hr_qp);
 	free_qp_db(hr_dev, hr_qp, udata);
 
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 74fc11a..996336f 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -77,6 +77,7 @@ enum hns_roce_qp_cap_flags {
 	HNS_ROCE_QP_CAP_RQ_RECORD_DB = 1 << 0,
 	HNS_ROCE_QP_CAP_SQ_RECORD_DB = 1 << 1,
 	HNS_ROCE_QP_CAP_OWNER_DB = 1 << 2,
+	HNS_ROCE_QP_CAP_DCA = 1 << 4,
 };
 
 struct hns_roce_ib_create_qp_resp {
-- 
2.8.1


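After this patch, hns_roce_create_qp_common() allocates the QPN, then the WQE buffer, then the doorbell, then the QPC. Its error labels release these in exactly the reverse order. The toy program below sketches that goto-unwind idiom. The step names and the fail_at failure-injection parameter are hypothetical and model only the ordering, not the driver's real resources.

```c
#include <assert.h>
#include <string.h>

enum step { STEP_QPN, STEP_WQE, STEP_DB, STEP_QPC, STEP_NONE };

/* Records the order in which resources are released, e.g. "db,wqe,qpn,". */
static char freed[64];

static void release(const char *name)
{
	strcat(freed, name);
	strcat(freed, ",");
}

/* Returns 0 on success; otherwise -1 after unwinding every step that had
 * already succeeded, newest first. */
static int create_qp_sketch(enum step fail_at)
{
	freed[0] = '\0';

	if (fail_at == STEP_QPN)
		goto err_out;	/* nothing allocated yet */
	if (fail_at == STEP_WQE)
		goto err_qpn;
	if (fail_at == STEP_DB)
		goto err_buf;
	if (fail_at == STEP_QPC)
		goto err_db;
	return 0;

err_db:
	release("db");
err_buf:
	release("wqe");
err_qpn:
	release("qpn");
err_out:
	return -1;
}
```

Each label falls through to the ones below it, so a failure at one step releases every earlier step, newest first.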

* [PATCH RFC 4/7] RDMA/hns: Add method for attaching WQE buffer
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
                   ` (2 preceding siblings ...)
  2021-01-15 10:22 ` [PATCH RFC 3/7] RDMA/hns: Configure DCA mode for the userspace QP Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 5/7] RDMA/hns: Setup the configuration of WQE addressing to QPC Weihang Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

If a uQP works in DCA mode, the userspace driver needs to configure the WQE
buffer by calling the 'HNS_IB_METHOD_DCA_MEM_ATTACH' method before filling
in the WQE. This method allocates a group of pages from the DCA memory pool
and writes the WQE addressing configuration to the QPC.
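The core of the allocation is a scan for free pages. When the WQE buffer is addressed directly (hr_qp->mtr.hem_cfg.is_direct), all of its pages must come from one contiguous run; otherwise runs of a single page are enough. If fewer pages than needed are found, alloc_buf_from_dca_mem() releases the partial assignment with clear_dca_pages(), and attach_dca_mem() reports zero pages rather than an error. Below is a simplified sketch over a single memory block. It assumes each page's state is reduced to one bool; the real code also treats inactive pages as allocable and checks a per-page head bit. find_free_run() and claim_pages() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the index of the first run of `unit` consecutive free pages at
 * or after `start`, or -1 if the block has no such run. */
static int find_free_run(const bool *is_free, int page_count, int start,
			 int unit)
{
	int run = 0;
	int i;

	for (i = start; i < page_count; i++) {
		run = is_free[i] ? run + 1 : 0;
		if (run == unit)
			return i - unit + 1;
	}
	return -1;
}

/* Claims runs of `unit` pages until `want` pages are owned, marking them
 * used and recording their indexes in `owned`. Returns the number of
 * pages claimed; a short count must be rolled back by the caller. */
static int claim_pages(bool *is_free, int page_count, int want, int unit,
		       int *owned)
{
	int total = 0;
	int pos = 0;

	/* unit is either the whole buffer or 1, so it always divides want */
	assert(unit > 0 && want % unit == 0);

	while (total < want) {
		int idx = find_free_run(is_free, page_count, pos, unit);
		int i;

		if (idx < 0)
			break;
		for (i = 0; i < unit; i++) {
			is_free[idx + i] = false;
			owned[total++] = idx + i;
		}
		pos = idx + unit;
	}
	return total;
}
```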

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_dca.c    | 460 +++++++++++++++++++++++++++-
 drivers/infiniband/hw/hns/hns_roce_dca.h    |  25 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |  13 +
 include/uapi/rdma/hns-abi.h                 |  11 +
 4 files changed, 508 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
index 40999a8..f44197d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.c
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -35,11 +35,53 @@ struct dca_mem_attr {
 	u32 size;
 };
 
+static inline void set_dca_page_to_free(struct hns_dca_page_state *state)
+{
+	state->buf_id = HNS_DCA_INVALID_BUF_ID;
+	state->active = 0;
+	state->lock = 0;
+}
+
+static inline void lock_dca_page_to_attach(struct hns_dca_page_state *state,
+					   u32 buf_id)
+{
+	state->buf_id = HNS_DCA_ID_MASK & buf_id;
+	state->active = 0;
+	state->lock = 1;
+}
+
+static inline void unlock_dca_page_to_active(struct hns_dca_page_state *state,
+					     u32 buf_id)
+{
+	state->buf_id = HNS_DCA_ID_MASK & buf_id;
+	state->active = 1;
+	state->lock = 0;
+}
+
 static inline bool dca_page_is_free(struct hns_dca_page_state *state)
 {
 	return state->buf_id == HNS_DCA_INVALID_BUF_ID;
 }
 
+static inline bool dca_page_is_attached(struct hns_dca_page_state *state,
+					u32 buf_id)
+{
+	/* only the owner bits need to be matched. */
+	return (HNS_DCA_OWN_MASK & buf_id) ==
+			(HNS_DCA_OWN_MASK & state->buf_id);
+}
+
+static inline bool dca_page_is_allocated(struct hns_dca_page_state *state,
+					 u32 buf_id)
+{
+	return dca_page_is_attached(state, buf_id) && state->lock;
+}
+
+static inline bool dca_page_is_inactive(struct hns_dca_page_state *state)
+{
+	return !state->lock && !state->active;
+}
+
 static inline bool dca_mem_is_free(struct dca_mem *mem)
 {
 	return mem->flags == 0;
@@ -386,11 +428,365 @@ static void free_dca_mem(struct dca_mem *mem)
 	spin_unlock(&mem->lock);
 }
 
+static inline struct hns_roce_dca_ctx *hr_qp_to_dca_ctx(struct hns_roce_qp *qp)
+{
+	return to_hr_dca_ctx(to_hr_ucontext(qp->ibqp.pd->uobject->context));
+}
+
+struct dca_page_clear_attr {
+	u32 buf_id;
+	u32 max_pages;
+	u32 clear_pages;
+};
+
+static int clear_dca_pages_proc(struct dca_mem *mem, int index, void *param)
+{
+	struct hns_dca_page_state *state = &mem->states[index];
+	struct dca_page_clear_attr *attr = param;
+
+	if (dca_page_is_attached(state, attr->buf_id)) {
+		set_dca_page_to_free(state);
+		attr->clear_pages++;
+	}
+
+	if (attr->clear_pages >= attr->max_pages)
+		return DCA_MEM_STOP_ITERATE;
+	else
+		return 0;
+}
+
+static void clear_dca_pages(struct hns_roce_dca_ctx *ctx, u32 buf_id, u32 count)
+{
+	struct dca_page_clear_attr attr = {};
+
+	attr.buf_id = buf_id;
+	attr.max_pages = count;
+	travel_dca_pages(ctx, &attr, clear_dca_pages_proc);
+}
+
+struct dca_page_assign_attr {
+	u32 buf_id;
+	int unit;
+	int total;
+	int max;
+};
+
+static bool dca_page_is_allocable(struct hns_dca_page_state *state, bool head)
+{
+	bool is_free = dca_page_is_free(state) || dca_page_is_inactive(state);
+
+	return head ? is_free : is_free && !state->head;
+}
+
+static int assign_dca_pages_proc(struct dca_mem *mem, int index, void *param)
+{
+	struct dca_page_assign_attr *attr = param;
+	struct hns_dca_page_state *state;
+	int checked_pages = 0;
+	int start_index = 0;
+	int free_pages = 0;
+	int i;
+
+	/* Check that the contiguous free page count reaches the unit count */
+	for (i = index; free_pages < attr->unit && i < mem->page_count; i++) {
+		checked_pages++;
+		state = &mem->states[i];
+		if (dca_page_is_allocable(state, free_pages == 0)) {
+			if (free_pages == 0)
+				start_index = i;
+
+			free_pages++;
+		} else {
+			free_pages = 0;
+		}
+	}
+
+	if (free_pages < attr->unit)
+		return DCA_MEM_NEXT_ITERATE;
+
+	for (i = 0; i < free_pages; i++) {
+		state = &mem->states[start_index + i];
+		lock_dca_page_to_attach(state, attr->buf_id);
+		attr->total++;
+	}
+
+	if (attr->total >= attr->max)
+		return DCA_MEM_STOP_ITERATE;
+
+	return checked_pages;
+}
+
+static u32 assign_dca_pages(struct hns_roce_dca_ctx *ctx, u32 buf_id, u32 count,
+			    u32 unit)
+{
+	struct dca_page_assign_attr attr = {};
+
+	attr.buf_id = buf_id;
+	attr.unit = unit;
+	attr.max = count;
+	travel_dca_pages(ctx, &attr, assign_dca_pages_proc);
+	return attr.total;
+}
+
+struct dca_page_active_attr {
+	u32 buf_id;
+	u32 max_pages;
+	u32 alloc_pages;
+	u32 dirty_mems;
+};
+
+static int active_dca_pages_proc(struct dca_mem *mem, int index, void *param)
+{
+	struct dca_page_active_attr *attr = param;
+	struct hns_dca_page_state *state;
+	bool changed = false;
+	bool stop = false;
+	int i, free_pages;
+
+	free_pages = 0;
+	for (i = 0; !stop && i < mem->page_count; i++) {
+		state = &mem->states[i];
+		if (dca_page_is_free(state)) {
+			free_pages++;
+		} else if (dca_page_is_allocated(state, attr->buf_id)) {
+			free_pages++;
+			/* Change matched pages state */
+			unlock_dca_page_to_active(state, attr->buf_id);
+			changed = true;
+			attr->alloc_pages++;
+			if (attr->alloc_pages == attr->max_pages)
+				stop = true;
+		}
+	}
+
+	for (; changed && i < mem->page_count; i++)
+		if (dca_page_is_free(&mem->states[i]))
+			free_pages++;
+
+	/* A previously clean mem has now become dirty */
+	if (changed && free_pages == mem->page_count)
+		attr->dirty_mems++;
+
+	return stop ? DCA_MEM_STOP_ITERATE : DCA_MEM_NEXT_ITERATE;
+}
+
+static u32 active_dca_pages(struct hns_roce_dca_ctx *ctx, u32 buf_id, u32 count)
+{
+	struct dca_page_active_attr attr = {};
+	unsigned long flags;
+
+	attr.buf_id = buf_id;
+	attr.max_pages = count;
+	travel_dca_pages(ctx, &attr, active_dca_pages_proc);
+
+	/* Update free size */
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	ctx->free_mems -= attr.dirty_mems;
+	ctx->free_size -= attr.alloc_pages << HNS_HW_PAGE_SHIFT;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+
+	return attr.alloc_pages;
+}
+
+struct dca_get_alloced_pages_attr {
+	u32 buf_id;
+	dma_addr_t *pages;
+	u32 total;
+	u32 max;
+};
+
+static int get_alloced_umem_proc(struct dca_mem *mem, int index, void *param)
+
+{
+	struct dca_get_alloced_pages_attr *attr = param;
+	struct hns_dca_page_state *states = mem->states;
+	struct ib_umem *umem = mem->pages;
+	struct ib_block_iter biter;
+	u32 i = 0;
+
+	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap,
+			    HNS_HW_PAGE_SIZE) {
+		if (dca_page_is_allocated(&states[i], attr->buf_id)) {
+			attr->pages[attr->total++] =
+					rdma_block_iter_dma_address(&biter);
+			if (attr->total >= attr->max)
+				return DCA_MEM_STOP_ITERATE;
+		}
+		i++;
+	}
+
+	return DCA_MEM_NEXT_ITERATE;
+}
+
+static int apply_dca_cfg(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+			 struct hns_dca_attach_attr *attach_attr)
+{
+	struct hns_roce_dca_attr attr;
+
+	if (hr_dev->hw->set_dca_buf) {
+		attr.sq_offset = attach_attr->sq_offset;
+		attr.sge_offset = attach_attr->sge_offset;
+		attr.rq_offset = attach_attr->rq_offset;
+		return hr_dev->hw->set_dca_buf(hr_dev, hr_qp, &attr);
+	}
+
+	return 0;
+}
+
+static int setup_dca_buf_to_hw(struct hns_roce_dca_ctx *ctx,
+			       struct hns_roce_qp *hr_qp, u32 buf_id,
+			       struct hns_dca_attach_attr *attach_attr)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(hr_qp->ibqp.device);
+	struct dca_get_alloced_pages_attr attr = {};
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	u32 count = hr_qp->dca_cfg.npages;
+	dma_addr_t *pages;
+	int ret;
+
+	/* Alloc a tmp array to store buffer's dma address */
+	pages = kvcalloc(count, sizeof(dma_addr_t), GFP_ATOMIC);
+	if (!pages)
+		return -ENOMEM;
+
+	attr.buf_id = buf_id;
+	attr.pages = pages;
+	attr.max = count;
+
+	travel_dca_pages(ctx, &attr, get_alloced_umem_proc);
+	if (attr.total != count) {
+		ibdev_err(ibdev, "failed to get DCA page %u != %u.\n",
+			  attr.total, count);
+		ret = -ENOMEM;
+		goto done;
+	}
+
+	/* Update MTT for ROCEE addressing */
+	ret = hns_roce_mtr_map(hr_dev, &hr_qp->mtr, pages, count);
+	if (ret) {
+		ibdev_err(ibdev, "failed to map DCA pages, ret = %d.\n", ret);
+		goto done;
+	}
+
+	/* Apply the changes for WQE address */
+	ret = apply_dca_cfg(hr_dev, hr_qp, attach_attr);
+	if (ret)
+		ibdev_err(ibdev, "failed to apply DCA cfg, ret = %d.\n", ret);
+
+done:
+	/* Drop tmp array */
+	kvfree(pages);
+	return ret;
+}
+
+static u32 alloc_buf_from_dca_mem(struct hns_roce_qp *hr_qp,
+				  struct hns_roce_dca_ctx *ctx)
+{
+	u32 buf_pages, unit_pages, alloc_pages;
+	u32 buf_id;
+
+	buf_pages = hr_qp->dca_cfg.npages;
+	/* Gen new buf id */
+	buf_id = HNS_DCA_TO_BUF_ID(hr_qp->qpn, hr_qp->dca_cfg.attach_count);
+
+	/* Assign pages from free pages */
+	unit_pages = hr_qp->mtr.hem_cfg.is_direct ? buf_pages : 1;
+	alloc_pages = assign_dca_pages(ctx, buf_id, buf_pages, unit_pages);
+	if (buf_pages != alloc_pages) {
+		if (alloc_pages > 0)
+			clear_dca_pages(ctx, buf_id, alloc_pages);
+		return HNS_DCA_INVALID_BUF_ID;
+	}
+
+	return buf_id;
+}
+
+static int active_alloced_buf(struct hns_roce_qp *hr_qp,
+			      struct hns_roce_dca_ctx *ctx,
+			      struct hns_dca_attach_attr *attr, u32 buf_id)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(hr_qp->ibqp.device);
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	u32 alloc_pages = hr_qp->dca_cfg.npages;
+	u32 active_pages;
+	int ret;
+
+	ret = setup_dca_buf_to_hw(ctx, hr_qp, buf_id, attr);
+	if (ret) {
+		ibdev_err(ibdev, "failed to setup DCA buf, ret = %d.\n", ret);
+		goto active_fail;
+	}
+
+	active_pages = active_dca_pages(ctx, buf_id, alloc_pages);
+	if (active_pages != alloc_pages) {
+		ibdev_err(ibdev, "failed to activate DCA pages, %u != %u.\n",
+			  active_pages, alloc_pages);
+		ret = -ENOBUFS;
+		goto active_fail;
+	}
+
+	return 0;
+active_fail:
+	clear_dca_pages(ctx, buf_id, alloc_pages);
+	return ret;
+}
+
+static int attach_dca_mem(struct hns_roce_dev *hr_dev,
+			  struct hns_roce_qp *hr_qp,
+			  struct hns_dca_attach_attr *attr,
+			  struct hns_dca_attach_resp *resp)
+{
+	struct hns_roce_dca_ctx *ctx = hr_qp_to_dca_ctx(hr_qp);
+	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
+	u32 buf_id;
+	int ret;
+
+	resp->alloc_flags = 0;
+	spin_lock(&cfg->lock);
+	buf_id = cfg->buf_id;
+	/* Already attached */
+	if (buf_id != HNS_DCA_INVALID_BUF_ID) {
+		resp->alloc_pages = cfg->npages;
+		spin_unlock(&cfg->lock);
+		return 0;
+	}
+
+	/* Start a new attach */
+	resp->alloc_pages = 0;
+	buf_id = alloc_buf_from_dca_mem(hr_qp, ctx);
+	if (buf_id == HNS_DCA_INVALID_BUF_ID) {
+		spin_unlock(&cfg->lock);
+		/* Don't report failure; retry after the pool has grown */
+		return 0;
+	}
+
+	ret = active_alloced_buf(hr_qp, ctx, attr, buf_id);
+	if (ret) {
+		spin_unlock(&cfg->lock);
+		ibdev_err(&hr_dev->ib_dev,
+			  "failed to activate DCA buf for QP-%lu, ret = %d.\n",
+			  hr_qp->qpn, ret);
+		return ret;
+	}
+
+	/* Attach ok */
+	cfg->buf_id = buf_id;
+	cfg->attach_count++;
+	spin_unlock(&cfg->lock);
+
+	resp->alloc_flags |= HNS_IB_ATTACH_FLAGS_NEW_BUFFER;
+	resp->alloc_pages = cfg->npages;
+
+	return 0;
+}
+
 int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
 {
 	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
 
+	spin_lock_init(&cfg->lock);
 	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
+	cfg->npages = hr_qp->buff_size >> HNS_HW_PAGE_SHIFT;
 
 	return 0;
 }
@@ -517,11 +913,73 @@ DECLARE_UVERBS_NAMED_METHOD(
 			    UVERBS_ATTR_TYPE(u64), UA_MANDATORY),
 	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_MEMS,
 			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
+
+static inline struct hns_roce_qp *
+uverbs_attr_to_hr_qp(struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, 1U << UVERBS_ID_NS_SHIFT);
+
+	if (uobj_get_object_id(uobj) == UVERBS_OBJECT_QP)
+		return to_hr_qp(uobj->object);
+
+	return NULL;
+}
+
+static int UVERBS_HANDLER(HNS_IB_METHOD_DCA_MEM_ATTACH)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_qp *hr_qp = uverbs_attr_to_hr_qp(attrs);
+	struct hns_dca_attach_attr attr = {};
+	struct hns_dca_attach_resp resp = {};
+	int ret;
+
+	if (!hr_qp)
+		return -EINVAL;
+
+	if (uverbs_copy_from(&attr.sq_offset, attrs,
+			     HNS_IB_ATTR_DCA_MEM_ATTACH_SQ_OFFSET) ||
+	    uverbs_copy_from(&attr.sge_offset, attrs,
+			     HNS_IB_ATTR_DCA_MEM_ATTACH_SGE_OFFSET) ||
+	    uverbs_copy_from(&attr.rq_offset, attrs,
+			     HNS_IB_ATTR_DCA_MEM_ATTACH_RQ_OFFSET))
+		return -EFAULT;
+
+	ret = attach_dca_mem(to_hr_dev(hr_qp->ibqp.device), hr_qp, &attr,
+			     &resp);
+	if (ret)
+		return ret;
+
+	if (uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_FLAGS,
+			   &resp.alloc_flags, sizeof(resp.alloc_flags)) ||
+	    uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_PAGES,
+			   &resp.alloc_pages, sizeof(resp.alloc_pages)))
+		return -EFAULT;
+
+	return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	HNS_IB_METHOD_DCA_MEM_ATTACH,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_ATTACH_HANDLE, UVERBS_OBJECT_QP,
+			UVERBS_ACCESS_WRITE, UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_ATTACH_SQ_OFFSET,
+			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_ATTACH_SGE_OFFSET,
+			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_ATTACH_RQ_OFFSET,
+			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_FLAGS,
+			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_PAGES,
+			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_OBJECT(HNS_IB_OBJECT_DCA_MEM,
 			    UVERBS_TYPE_ALLOC_IDR(dca_cleanup),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_REG),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG),
-			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_SHRINK));
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_SHRINK),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_ATTACH));
 
 static bool dca_is_supported(struct ib_device *device)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
index 419606ef..39ac99f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.h
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -21,6 +21,31 @@ struct hns_dca_shrink_resp {
 
 #define HNS_DCA_INVALID_BUF_ID 0UL
 
+/*
+ * buffer id(29b) = tag(7b) + owner(22b)
+ * [28:22] tag  : how many times the QP's buffer has been attached.
+ * [21: 0] owner: the QP to which the page belongs.
+ */
+#define HNS_DCA_ID_MASK GENMASK(28, 0)
+#define HNS_DCA_TAG_MASK GENMASK(28, 22)
+#define HNS_DCA_OWN_MASK GENMASK(21, 0)
+
+#define HNS_DCA_BUF_ID_TO_TAG(buf_id) (((buf_id) & HNS_DCA_TAG_MASK) >> 22)
+#define HNS_DCA_BUF_ID_TO_QPN(buf_id) ((buf_id) & HNS_DCA_OWN_MASK)
+#define HNS_DCA_TO_BUF_ID(qpn, tag) (((qpn) & HNS_DCA_OWN_MASK) | \
+					(((tag) << 22) & HNS_DCA_TAG_MASK))
+
+struct hns_dca_attach_attr {
+	u32 sq_offset;
+	u32 sge_offset;
+	u32 rq_offset;
+};
+
+struct hns_dca_attach_resp {
+	u32 alloc_flags;
+	u32 alloc_pages;
+};
+
 void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_ucontext *uctx);
 void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 016df5b..d49feb9 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -366,7 +366,17 @@ struct hns_roce_mtr {
 };
 
 struct hns_roce_dca_cfg {
+	spinlock_t lock;
 	u32 buf_id;
+	u16 attach_count;
+	u32 npages;
+};
+
+/* DCA attr for setting WQE buffer */
+struct hns_roce_dca_attr {
+	u32 sq_offset;
+	u32 sge_offset;
+	u32 rq_offset;
 };
 
 struct hns_roce_mw {
@@ -940,6 +950,9 @@ struct hns_roce_hw {
 	int (*clear_hem)(struct hns_roce_dev *hr_dev,
 			 struct hns_roce_hem_table *table, int obj,
 			 int step_idx);
+	int (*set_dca_buf)(struct hns_roce_dev *hr_dev,
+			   struct hns_roce_qp *hr_qp,
+			   struct hns_roce_dca_attr *attr);
 	int (*query_qp)(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 			int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr);
 	int (*modify_qp)(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 996336f..da3effb 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -105,6 +105,7 @@ enum hns_ib_dca_mem_methods {
 	HNS_IB_METHOD_DCA_MEM_REG = (1U << UVERBS_ID_NS_SHIFT),
 	HNS_IB_METHOD_DCA_MEM_DEREG,
 	HNS_IB_METHOD_DCA_MEM_SHRINK,
+	HNS_IB_METHOD_DCA_MEM_ATTACH,
 };
 
 enum hns_ib_dca_mem_reg_attrs {
@@ -125,4 +126,14 @@ enum hns_ib_dca_mem_shrink_attrs {
 	HNS_IB_ATTR_DCA_MEM_SHRINK_OUT_FREE_MEMS,
 };
 
+#define HNS_IB_ATTACH_FLAGS_NEW_BUFFER 1U
+
+enum hns_ib_dca_mem_attach_attrs {
+	HNS_IB_ATTR_DCA_MEM_ATTACH_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_ATTR_DCA_MEM_ATTACH_SQ_OFFSET,
+	HNS_IB_ATTR_DCA_MEM_ATTACH_SGE_OFFSET,
+	HNS_IB_ATTR_DCA_MEM_ATTACH_RQ_OFFSET,
+	HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_FLAGS,
+	HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_PAGES,
+};
 #endif /* HNS_ABI_USER_H */
-- 
2.8.1


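In the attach patch above, each page in a DCA memory block moves through three states:

- free: buf_id is HNS_DCA_INVALID_BUF_ID.
- locked: reserved for a buffer ID by assign_dca_pages().
- active: set by active_dca_pages() once the hardware has been programmed.

dca_page_is_allocable() also accepts a page that is neither locked nor active. The model below is a simplified sketch. The struct layout and the page_* helper names are hypothetical; they only mirror set_dca_page_to_free(), lock_dca_page_to_attach(), unlock_dca_page_to_active() and the matching predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_BUF_ID 0u
#define ID_MASK  0x1fffffffu /* GENMASK(28, 0) */
#define OWN_MASK 0x003fffffu /* GENMASK(21, 0) */

/* Hypothetical layout; the driver's hns_dca_page_state may differ. */
struct page_state {
	uint32_t buf_id;
	unsigned int active : 1;
	unsigned int lock : 1;
};

static void page_free(struct page_state *s)
{
	s->buf_id = INVALID_BUF_ID;
	s->active = 0;
	s->lock = 0;
}

static void page_lock(struct page_state *s, uint32_t buf_id)
{
	s->buf_id = buf_id & ID_MASK;
	s->active = 0;
	s->lock = 1;
}

static void page_activate(struct page_state *s, uint32_t buf_id)
{
	s->buf_id = buf_id & ID_MASK;
	s->active = 1;
	s->lock = 0;
}

static bool page_is_free(const struct page_state *s)
{
	return s->buf_id == INVALID_BUF_ID;
}

/* A locked page whose owner bits match is reserved by a pending attach. */
static bool page_is_allocated(const struct page_state *s, uint32_t buf_id)
{
	return (s->buf_id & OWN_MASK) == (buf_id & OWN_MASK) && s->lock;
}

static bool page_is_allocable(const struct page_state *s)
{
	return page_is_free(s) || (!s->lock && !s->active);
}
```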

* [PATCH RFC 5/7] RDMA/hns: Setup the configuration of WQE addressing to QPC
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
                   ` (3 preceding siblings ...)
  2021-01-15 10:22 ` [PATCH RFC 4/7] RDMA/hns: Add method for attaching WQE buffer Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 6/7] RDMA/hns: Add method to detach WQE buffer Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 7/7] RDMA/hns: Add method to query WQE buffer's address Weihang Li
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

Add a new command to update the WQE buffer addressing configuration in the
QPC when a QP works in DCA mode.
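The new synchronous command reuses the descriptor layout of the existing posted mailbox. Only the opcode (HNS_ROCE_OPC_SYNC_MB) differs, and the token/event word is left at zero. Below is a host-side sketch of the packing done by mbox_desc_init(). It uses a hypothetical struct mbox_words in place of struct hns_roce_post_mbox and omits the cpu_to_le32() conversions, which are assumed to be identity on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-endian view of struct hns_roce_post_mbox. */
struct mbox_words {
	uint32_t in_param_l;
	uint32_t in_param_h;
	uint32_t out_param_l;
	uint32_t out_param_h;
	uint32_t cmd_tag;
	uint32_t token_event_en;
};

/* Splits the 64-bit parameters into low/high words and packs
 * cmd_tag = in_modifier << 8 | op, as mbox_desc_init() does. */
static void mbox_pack(struct mbox_words *mb, uint64_t in_param,
		      uint64_t out_param, uint32_t in_modifier, uint16_t op)
{
	mb->in_param_l = (uint32_t)in_param;
	mb->in_param_h = (uint32_t)(in_param >> 32);
	mb->out_param_l = (uint32_t)out_param;
	mb->out_param_h = (uint32_t)(out_param >> 32);
	mb->cmd_tag = in_modifier << 8 | op;
}

/* Posted mode: token in bits 15..0, event flag starting at bit 16.
 * Sync mode: the hardware ignores the word, so it is zero. */
static uint32_t mbox_token_event(int event, uint16_t token, int sync)
{
	return sync ? 0 : (uint32_t)event << 16 | token;
}
```

mbox_desc_init() accepts op_modifier but, like the code it replaces, does not write it into the descriptor.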

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 153 ++++++++++++++++++++++++++---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h |   1 +
 2 files changed, 139 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 4ae76ba..95e90f1 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2628,6 +2628,17 @@ static int hns_roce_v2_cmd_complete(struct hns_roce_dev *hr_dev)
 	return status & HNS_ROCE_HW_MB_STATUS_MASK;
 }
 
+static inline void mbox_desc_init(struct hns_roce_post_mbox *mb, u64 in_param,
+				  u64 out_param, u32 in_modifier,
+				  u8 op_modifier, u16 op)
+{
+	mb->in_param_l = cpu_to_le32(in_param);
+	mb->in_param_h = cpu_to_le32(in_param >> 32);
+	mb->out_param_l = cpu_to_le32(out_param);
+	mb->out_param_h = cpu_to_le32(out_param >> 32);
+	mb->cmd_tag = cpu_to_le32(in_modifier << 8 | op);
+}
+
 static int hns_roce_mbox_post(struct hns_roce_dev *hr_dev, u64 in_param,
 			      u64 out_param, u32 in_modifier, u8 op_modifier,
 			      u16 op, u16 token, int event)
@@ -2636,17 +2647,34 @@ static int hns_roce_mbox_post(struct hns_roce_dev *hr_dev, u64 in_param,
 	struct hns_roce_post_mbox *mb = (struct hns_roce_post_mbox *)desc.data;
 
 	hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_POST_MB, false);
-
-	mb->in_param_l = cpu_to_le32(in_param);
-	mb->in_param_h = cpu_to_le32(in_param >> 32);
-	mb->out_param_l = cpu_to_le32(out_param);
-	mb->out_param_h = cpu_to_le32(out_param >> 32);
-	mb->cmd_tag = cpu_to_le32(in_modifier << 8 | op);
+	mbox_desc_init(mb, in_param, out_param, in_modifier, op_modifier, op);
 	mb->token_event_en = cpu_to_le32(event << 16 | token);
 
 	return hns_roce_cmq_send(hr_dev, &desc, 1);
 }
 
+static int hns_roce_mbox_send(struct hns_roce_dev *hr_dev, u64 in_param,
+			      u64 out_param, u32 in_modifier, u8 op_modifier,
+			      u16 op)
+{
+	struct hns_roce_cmq_desc desc;
+	struct hns_roce_post_mbox *mb = (struct hns_roce_post_mbox *)desc.data;
+
+	hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_SYNC_MB, false);
+
+	mbox_desc_init(mb, in_param, out_param, in_modifier, op_modifier, op);
+
+	/* The hardware doesn't care about the token fields when working in
+	 * sync mode.
+	 */
+	mb->token_event_en = 0;
+
+	/* A return value of 0 from the cmdq send means the hardware has
+	 * already finished the operation defined in this mbox.
+	 */
+	return hns_roce_cmq_send(hr_dev, &desc, 1);
+}
+
 static int hns_roce_v2_post_mbox(struct hns_roce_dev *hr_dev, u64 in_param,
 				 u64 out_param, u32 in_modifier, u8 op_modifier,
 				 u16 op, u16 token, int event)
@@ -4050,15 +4078,16 @@ static void modify_qp_init_to_init(struct ib_qp *ibqp,
 static int config_qp_rq_buf(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_qp *hr_qp,
 			    struct hns_roce_v2_qp_context *context,
-			    struct hns_roce_v2_qp_context *qpc_mask)
+			    struct hns_roce_v2_qp_context *qpc_mask,
+			    struct hns_roce_dca_attr *dca_attr)
 {
 	u64 mtts[MTT_MIN_COUNT] = { 0 };
 	u64 wqe_sge_ba;
 	int count;
 
 	/* Search qp buf's mtts */
-	count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr, hr_qp->rq.offset, mtts,
-				  MTT_MIN_COUNT, &wqe_sge_ba);
+	count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr, dca_attr->rq_offset,
+				  mtts, ARRAY_SIZE(mtts), &wqe_sge_ba);
 	if (hr_qp->rq.wqe_cnt && count < 1) {
 		ibdev_err(&hr_dev->ib_dev,
 			  "failed to find RQ WQE, QPN = 0x%lx.\n", hr_qp->qpn);
@@ -4160,7 +4189,8 @@ static int config_qp_rq_buf(struct hns_roce_dev *hr_dev,
 static int config_qp_sq_buf(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_qp *hr_qp,
 			    struct hns_roce_v2_qp_context *context,
-			    struct hns_roce_v2_qp_context *qpc_mask)
+			    struct hns_roce_v2_qp_context *qpc_mask,
+			    struct hns_roce_dca_attr *dca_attr)
 {
 	struct ib_device *ibdev = &hr_dev->ib_dev;
 	u64 sge_cur_blk = 0;
@@ -4168,7 +4198,8 @@ static int config_qp_sq_buf(struct hns_roce_dev *hr_dev,
 	int count;
 
 	/* search qp buf's mtts */
-	count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr, 0, &sq_cur_blk, 1, NULL);
+	count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr, dca_attr->sq_offset,
+				  &sq_cur_blk, 1, NULL);
 	if (count < 1) {
 		ibdev_err(ibdev, "failed to find QP(0x%lx) SQ buf.\n",
 			  hr_qp->qpn);
@@ -4176,8 +4207,8 @@ static int config_qp_sq_buf(struct hns_roce_dev *hr_dev,
 	}
 	if (hr_qp->sge.sge_cnt > 0) {
 		count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr,
-					  hr_qp->sge.offset,
-					  &sge_cur_blk, 1, NULL);
+					  dca_attr->sge_offset, &sge_cur_blk, 1,
+					  NULL);
 		if (count < 1) {
 			ibdev_err(ibdev, "failed to find QP(0x%lx) SGE buf.\n",
 				  hr_qp->qpn);
@@ -4244,6 +4275,7 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
 	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
 	struct ib_device *ibdev = &hr_dev->ib_dev;
+	struct hns_roce_dca_attr dca_attr = {};
 	dma_addr_t trrl_ba;
 	dma_addr_t irrl_ba;
 	enum ib_mtu mtu;
@@ -4255,7 +4287,8 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
 	int port;
 	int ret;
 
-	ret = config_qp_rq_buf(hr_dev, hr_qp, context, qpc_mask);
+	dca_attr.rq_offset = hr_qp->rq.offset;
+	ret = config_qp_rq_buf(hr_dev, hr_qp, context, qpc_mask, &dca_attr);
 	if (ret) {
 		ibdev_err(ibdev, "failed to config rq buf, ret = %d.\n", ret);
 		return ret;
@@ -4421,6 +4454,7 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
 	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
 	struct ib_device *ibdev = &hr_dev->ib_dev;
+	struct hns_roce_dca_attr dca_attr = {};
 	int ret;
 
 	/* Not support alternate path and path migration */
@@ -4429,7 +4463,9 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
 		return -EINVAL;
 	}
 
-	ret = config_qp_sq_buf(hr_dev, hr_qp, context, qpc_mask);
+	dca_attr.sq_offset = hr_qp->sq.offset;
+	dca_attr.sge_offset = hr_qp->sge.offset;
+	ret = config_qp_sq_buf(hr_dev, hr_qp, context, qpc_mask, &dca_attr);
 	if (ret) {
 		ibdev_err(ibdev, "failed to config sq buf, ret = %d.\n", ret);
 		return ret;
@@ -4947,6 +4983,92 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
 	return ret;
 }
 
+static int init_dca_buf_attr(struct hns_roce_dev *hr_dev,
+			     struct hns_roce_qp *hr_qp,
+			     struct hns_roce_dca_attr *init_attr,
+			     struct hns_roce_dca_attr *dca_attr)
+{
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+
+	if (hr_qp->sq.wqe_cnt > 0) {
+		dca_attr->sq_offset = hr_qp->sq.offset + init_attr->sq_offset;
+		if (dca_attr->sq_offset >= hr_qp->sge.offset) {
+			ibdev_err(ibdev, "failed to check SQ offset = %u\n",
+				  init_attr->sq_offset);
+			return -EINVAL;
+		}
+	}
+
+	if (hr_qp->sge.sge_cnt > 0) {
+		dca_attr->sge_offset = hr_qp->sge.offset + init_attr->sge_offset;
+		if (dca_attr->sge_offset >= hr_qp->rq.offset) {
+			ibdev_err(ibdev, "failed to check exSGE offset = %u\n",
+				  init_attr->sge_offset);
+			return -EINVAL;
+		}
+	}
+
+	if (hr_qp->rq.wqe_cnt > 0) {
+		dca_attr->rq_offset = hr_qp->rq.offset + init_attr->rq_offset;
+		if (dca_attr->rq_offset >= hr_qp->buff_size) {
+			ibdev_err(ibdev, "failed to check RQ offset = %u\n",
+				  init_attr->rq_offset);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int hns_roce_v2_set_dca_buf(struct hns_roce_dev *hr_dev,
+				   struct hns_roce_qp *hr_qp,
+				   struct hns_roce_dca_attr *init_attr)
+{
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	struct hns_roce_v2_qp_context *qpc, *msk;
+	struct hns_roce_dca_attr dca_attr = {};
+	dma_addr_t dma_handle;
+	int qpc_sz;
+	int ret;
+
+	ret = init_dca_buf_attr(hr_dev, hr_qp, init_attr, &dca_attr);
+	if (ret) {
+		ibdev_err(ibdev, "failed to init DCA attr, ret = %d.\n", ret);
+		return ret;
+	}
+
+	qpc_sz = hr_dev->caps.qpc_sz;
+	WARN_ON(2 * qpc_sz > HNS_ROCE_MAILBOX_SIZE);
+	qpc = dma_pool_alloc(hr_dev->cmd.pool, GFP_ATOMIC, &dma_handle);
+	if (!qpc)
+		return -ENOMEM;
+
+	msk = (struct hns_roce_v2_qp_context *)((void *)qpc + qpc_sz);
+	memset(msk, 0xff, qpc_sz);
+
+	ret = config_qp_rq_buf(hr_dev, hr_qp, qpc, msk, &dca_attr);
+	if (ret) {
+		ibdev_err(ibdev, "failed to config rq qpc, ret = %d.\n", ret);
+		goto done;
+	}
+
+	ret = config_qp_sq_buf(hr_dev, hr_qp, qpc, msk, &dca_attr);
+	if (ret) {
+		ibdev_err(ibdev, "failed to config sq qpc, ret = %d.\n", ret);
+		goto done;
+	}
+
+	ret = hns_roce_mbox_send(hr_dev, dma_handle, 0, hr_qp->qpn, 0,
+				 HNS_ROCE_CMD_MODIFY_QPC);
+	if (ret)
+		ibdev_err(ibdev, "failed to modify DCA buf, ret = %d.\n", ret);
+
+done:
+	dma_pool_free(hr_dev->cmd.pool, qpc, dma_handle);
+
+	return ret;
+}
+
 static int to_ib_qp_st(enum hns_roce_v2_qp_state state)
 {
 	static const enum ib_qp_state map[] = {
@@ -6256,6 +6378,7 @@ static const struct hns_roce_hw hns_roce_hw_v2 = {
 	.write_cqc = hns_roce_v2_write_cqc,
 	.set_hem = hns_roce_v2_set_hem,
 	.clear_hem = hns_roce_v2_clear_hem,
+	.set_dca_buf = hns_roce_v2_set_dca_buf,
 	.modify_qp = hns_roce_v2_modify_qp,
 	.query_qp = hns_roce_v2_query_qp,
 	.destroy_qp = hns_roce_v2_destroy_qp,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 6c4c2b7..59861b8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -245,6 +245,7 @@ enum hns_roce_opcode_type {
 	HNS_ROCE_OPC_RESET_SCCC				= 0x850b,
 	HNS_ROCE_OPC_CFG_GMV_TBL			= 0x850f,
 	HNS_ROCE_OPC_CFG_GMV_BT				= 0x8510,
+	HNS_ROCE_OPC_SYNC_MB				= 0x8511,
 	HNS_SWITCH_PARAMETER_CFG			= 0x1033,
 };
 
-- 
2.8.1
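init_dca_buf_attr() in this patch converts per-region relative offsets into absolute offsets within the QP's WQE buffer. Its checks imply the layout is SQ, then extended SGE, then RQ, and it rejects any offset that would reach into the following region or past buff_size. Below is a small standalone sketch of that check using a simplified, hypothetical layout struct (not the driver's hns_roce_qp). One deliberate difference: the sketch compares the relative offset against the region size instead of adding first and then comparing, so an oversized u32 offset cannot wrap around and slip past the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a QP's WQE buffer: SQ, then extended
 * SGE, then RQ, each starting at a fixed byte offset. */
struct qp_layout {
	uint32_t sq_offset, sge_offset, rq_offset, buff_size;
	uint32_t sq_cnt, sge_cnt, rq_cnt;	/* 0: region unused */
};

struct dca_offsets {
	uint32_t sq, sge, rq;
};

/* Convert relative offsets to absolute ones; return false if any of them
 * would land in the following region (or past the end of the buffer). */
static bool init_dca_offsets(const struct qp_layout *l,
			     const struct dca_offsets *rel,
			     struct dca_offsets *out)
{
	/* The regions must be ordered inside the buffer. */
	assert(l->sq_offset <= l->sge_offset);
	assert(l->sge_offset <= l->rq_offset);
	assert(l->rq_offset <= l->buff_size);

	*out = (struct dca_offsets){ 0, 0, 0 };

	/* Compare against the region size first so that a huge relative
	 * offset cannot wrap around in 32-bit arithmetic. */
	if (l->sq_cnt > 0) {
		if (rel->sq >= l->sge_offset - l->sq_offset)
			return false;
		out->sq = l->sq_offset + rel->sq;
	}
	if (l->sge_cnt > 0) {
		if (rel->sge >= l->rq_offset - l->sge_offset)
			return false;
		out->sge = l->sge_offset + rel->sge;
	}
	if (l->rq_cnt > 0) {
		if (rel->rq >= l->buff_size - l->rq_offset)
			return false;
		out->rq = l->rq_offset + rel->rq;
	}
	return true;
}
```

With a 12 KiB buffer split into three 4 KiB regions, a relative SGE offset of 4096 is rejected because it would start inside the RQ region.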



* [PATCH RFC 6/7] RDMA/hns: Add method to detach WQE buffer
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
                   ` (4 preceding siblings ...)
  2021-01-15 10:22 ` [PATCH RFC 5/7] RDMA/hns: Setup the configuration of WQE addressing to QPC Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  2021-01-15 10:22 ` [PATCH RFC 7/7] RDMA/hns: Add method to query WQE buffer's address Weihang Li
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

If a uQP works in DCA mode, the userspace driver needs to release the WQE
buffer by calling the 'HNS_IB_METHOD_DCA_MEM_DETACH' method once the QP's
CI is equal to its PI. At that point the hns ROCEE no longer accesses the
WQE buffer, so the userspace driver can free it.

This method queues a delayed ageing work item (DCA_MEM_AGEING_MSES, 1000 ms)
that recycles the WQE buffer in kernel space: if the WQE buffer is indeed no
longer being accessed by the hns ROCEE, the worker marks its pages as free
in the DCA memory pool.
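The recycling step done by the ageing worker (free_buf_from_dca_mem() together with free_buffer_pages_proc()) can be modelled in plain C as below. This is a simplified sketch, not driver code. Page ownership is reduced to a bare buf_id, with 0 as a hypothetical "free" marker, and the lock/active bits and HNS_DCA_OWN_MASK handling are left out. It counts a chunk's free pages in a single pass that re-reads every page's state, which is what the second loop in free_buffer_pages_proc() is meant to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for a page that no buffer owns. */
#define DCA_FREE_ID 0u

struct page_state {
	uint32_t buf_id;	/* owning buffer, or DCA_FREE_ID */
};

struct mem_chunk {
	struct page_state *states;
	int page_count;
};

/* Free at most max_pages pages owned by buf_id inside one chunk.
 * *clean is set when this call freed something and the whole chunk is
 * now free (the patch counts such chunks in clean_mems). */
static int free_owned_pages(struct mem_chunk *mem, uint32_t buf_id,
			    int max_pages, bool *clean)
{
	int freed = 0, free_pages = 0;

	for (int i = 0; i < mem->page_count; i++) {
		struct page_state *s = &mem->states[i];

		if (freed < max_pages && s->buf_id == buf_id) {
			s->buf_id = DCA_FREE_ID;
			freed++;
		}
		/* Check the state of every page, not just the last one seen. */
		if (s->buf_id == DCA_FREE_ID)
			free_pages++;
	}
	*clean = freed > 0 && free_pages == mem->page_count;
	return freed;
}

/* Walk all chunks until the buffer's npages pages have been released. */
static int free_buf(struct mem_chunk *mems, int nmems, uint32_t buf_id,
		    int npages, int *clean_mems)
{
	int total = 0;

	*clean_mems = 0;
	for (int m = 0; m < nmems && total < npages; m++) {
		bool clean;

		total += free_owned_pages(&mems[m], buf_id, npages - total,
					  &clean);
		if (clean)
			(*clean_mems)++;
	}
	return total;
}
```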

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_dca.c    | 162 +++++++++++++++++++++++++++-
 drivers/infiniband/hw/hns/hns_roce_dca.h    |   7 +-
 drivers/infiniband/hw/hns/hns_roce_device.h |   4 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  47 ++++++++
 drivers/infiniband/hw/hns/hns_roce_qp.c     |   4 +-
 include/uapi/rdma/hns-abi.h                 |   6 ++
 6 files changed, 225 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
index f44197d..3d1e1b4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.c
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -15,6 +15,9 @@
 #define UVERBS_MODULE_NAME hns_ib
 #include <rdma/uverbs_named_ioctl.h>
 
+/* DCA mem ageing interval time */
+#define DCA_MEM_AGEING_MSES 1000
+
 /* DCA memory */
 struct dca_mem {
 #define DCA_MEM_FLAGS_ALLOCED BIT(0)
@@ -42,6 +45,12 @@ static inline void set_dca_page_to_free(struct hns_dca_page_state *state)
 	state->lock = 0;
 }
 
+static inline void set_dca_page_to_inactive(struct hns_dca_page_state *state)
+{
+	state->active = 0;
+	state->lock = 0;
+}
+
 static inline void lock_dca_page_to_attach(struct hns_dca_page_state *state,
 					   u32 buf_id)
 {
@@ -741,7 +750,10 @@ static int attach_dca_mem(struct hns_roce_dev *hr_dev,
 	u32 buf_id;
 	int ret;
 
+	/* Stop DCA mem ageing worker */
+	cancel_delayed_work(&cfg->dwork);
 	resp->alloc_flags = 0;
+
 	spin_lock(&cfg->lock);
 	buf_id = cfg->buf_id;
 	/* Already attached */
@@ -780,11 +792,128 @@ static int attach_dca_mem(struct hns_roce_dev *hr_dev,
 	return 0;
 }
 
+struct dca_page_free_buf_attr {
+	u32 buf_id;
+	u32 max_pages;
+	u32 free_pages;
+	u32 clean_mems;
+};
+
+static int free_buffer_pages_proc(struct dca_mem *mem, int index, void *param)
+{
+	struct dca_page_free_buf_attr *attr = param;
+	struct hns_dca_page_state *state;
+	bool changed = false;
+	bool stop = false;
+	int i, free_pages;
+
+	free_pages = 0;
+	for (i = 0; !stop && i < mem->page_count; i++) {
+		state = &mem->states[i];
+		/* Change matched pages state */
+		if (dca_page_is_attached(state, attr->buf_id)) {
+			set_dca_page_to_free(state);
+			changed = true;
+			attr->free_pages++;
+			if (attr->free_pages == attr->max_pages)
+				stop = true;
+		}
+
+		if (dca_page_is_free(state))
+			free_pages++;
+	}
+
+	for (; changed && i < mem->page_count; i++)
+		if (dca_page_is_free(&mem->states[i]))
+			free_pages++;
+
+	if (changed && free_pages == mem->page_count)
+		attr->clean_mems++;
+
+	return stop ? DCA_MEM_STOP_ITERATE : DCA_MEM_NEXT_ITERATE;
+}
+
+static void free_buf_from_dca_mem(struct hns_roce_dca_ctx *ctx,
+				  struct hns_roce_dca_cfg *cfg)
+{
+	struct dca_page_free_buf_attr attr = {};
+	unsigned long flags;
+	u32 buf_id;
+
+	spin_lock(&cfg->lock);
+	buf_id = cfg->buf_id;
+	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
+	spin_unlock(&cfg->lock);
+	if (buf_id == HNS_DCA_INVALID_BUF_ID)
+		return;
+
+	attr.buf_id = buf_id;
+	attr.max_pages = cfg->npages;
+	travel_dca_pages(ctx, &attr, free_buffer_pages_proc);
+
+	/* Update free size */
+	spin_lock_irqsave(&ctx->pool_lock, flags);
+	ctx->free_mems += attr.clean_mems;
+	ctx->free_size += attr.free_pages << HNS_HW_PAGE_SHIFT;
+	spin_unlock_irqrestore(&ctx->pool_lock, flags);
+}
+
+static void kick_dca_mem(struct hns_roce_dev *hr_dev,
+			 struct hns_roce_dca_cfg *cfg,
+			 struct hns_roce_ucontext *uctx)
+{
+	struct hns_roce_dca_ctx *ctx = to_hr_dca_ctx(uctx);
+
+	/* Stop ageing worker and free DCA buffer from pool */
+	cancel_delayed_work_sync(&cfg->dwork);
+	free_buf_from_dca_mem(ctx, cfg);
+}
+
+static void dca_mem_ageing_work(struct work_struct *work)
+{
+	struct hns_roce_qp *hr_qp = container_of(work, struct hns_roce_qp,
+						 dca_cfg.dwork.work);
+	struct hns_roce_dev *hr_dev = to_hr_dev(hr_qp->ibqp.device);
+	struct hns_roce_dca_ctx *ctx = hr_qp_to_dca_ctx(hr_qp);
+	bool hw_is_inactive;
+
+	hw_is_inactive = hr_dev->hw->chk_dca_buf_inactive &&
+			 hr_dev->hw->chk_dca_buf_inactive(hr_dev, hr_qp);
+	if (hw_is_inactive)
+		free_buf_from_dca_mem(ctx, &hr_qp->dca_cfg);
+}
+
+void hns_roce_dca_kick(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+{
+	struct hns_roce_ucontext *uctx;
+
+	if (hr_qp->ibqp.uobject && hr_qp->ibqp.pd->uobject) {
+		uctx = to_hr_ucontext(hr_qp->ibqp.pd->uobject->context);
+		kick_dca_mem(hr_dev, &hr_qp->dca_cfg, uctx);
+	}
+}
+
+static void detach_dca_mem(struct hns_roce_dev *hr_dev,
+			   struct hns_roce_qp *hr_qp,
+			   struct hns_dca_detach_attr *attr)
+{
+	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
+
+	/* Start an ageing worker to free buffer */
+	cancel_delayed_work(&cfg->dwork);
+	spin_lock(&cfg->lock);
+	cfg->sq_idx = attr->sq_idx;
+	queue_delayed_work(hr_dev->irq_workq, &cfg->dwork,
+			   msecs_to_jiffies(DCA_MEM_AGEING_MSES));
+	spin_unlock(&cfg->lock);
+}
+
 int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
 {
 	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
 
 	spin_lock_init(&cfg->lock);
+	INIT_DELAYED_WORK(&cfg->dwork, dca_mem_ageing_work);
 	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
 	cfg->npages = hr_qp->buff_size >> HNS_HW_PAGE_SHIFT;
 
@@ -792,10 +921,13 @@ int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
 }
 
 void hns_roce_disable_dca(struct hns_roce_dev *hr_dev,
-			  struct hns_roce_qp *hr_qp)
+			  struct hns_roce_qp *hr_qp, struct ib_udata *udata)
 {
+	struct hns_roce_ucontext *uctx = rdma_udata_to_drv_context(udata,
+					 struct hns_roce_ucontext, ibucontext);
 	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
 
+	kick_dca_mem(hr_dev, cfg, uctx);
 	cfg->buf_id = HNS_DCA_INVALID_BUF_ID;
 }
 
@@ -974,12 +1106,38 @@ DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_PAGES,
 			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
 
+static int UVERBS_HANDLER(HNS_IB_METHOD_DCA_MEM_DETACH)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_qp *hr_qp = uverbs_attr_to_hr_qp(attrs);
+	struct hns_dca_detach_attr attr = {};
+
+	if (!hr_qp)
+		return -EINVAL;
+
+	if (uverbs_copy_from(&attr.sq_idx, attrs,
+			     HNS_IB_ATTR_DCA_MEM_DETACH_SQ_INDEX))
+		return -EFAULT;
+
+	detach_dca_mem(to_hr_dev(hr_qp->ibqp.device), hr_qp, &attr);
+
+	return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	HNS_IB_METHOD_DCA_MEM_DETACH,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_DETACH_HANDLE, UVERBS_OBJECT_QP,
+			UVERBS_ACCESS_WRITE, UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_DETACH_SQ_INDEX,
+			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_OBJECT(HNS_IB_OBJECT_DCA_MEM,
 			    UVERBS_TYPE_ALLOC_IDR(dca_cleanup),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_REG),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_SHRINK),
-			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_ATTACH));
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_ATTACH),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DETACH));
 
 static bool dca_is_supported(struct ib_device *device)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
index 39ac99f..8155903 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.h
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -46,6 +46,10 @@ struct hns_dca_attach_resp {
 	u32 alloc_pages;
 };
 
+struct hns_dca_detach_attr {
+	u32 sq_idx;
+};
+
 void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_ucontext *uctx);
 void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
@@ -53,5 +57,6 @@ void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
 
 int hns_roce_enable_dca(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
 void hns_roce_disable_dca(struct hns_roce_dev *hr_dev,
-			  struct hns_roce_qp *hr_qp);
+			  struct hns_roce_qp *hr_qp, struct ib_udata *udata);
+void hns_roce_dca_kick(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
 #endif
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index d49feb9..b50586f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -370,6 +370,8 @@ struct hns_roce_dca_cfg {
 	u32 buf_id;
 	u16 attach_count;
 	u32 npages;
+	u32 sq_idx;
+	struct delayed_work dwork;
 };
 
 /* DCA attr for setting WQE buffer */
@@ -953,6 +955,8 @@ struct hns_roce_hw {
 	int (*set_dca_buf)(struct hns_roce_dev *hr_dev,
 			   struct hns_roce_qp *hr_qp,
 			   struct hns_roce_dca_attr *attr);
+	bool (*chk_dca_buf_inactive)(struct hns_roce_dev *hr_dev,
+				     struct hns_roce_qp *hr_qp);
 	int (*query_qp)(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 			int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr);
 	int (*modify_qp)(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 95e90f1..736a58d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -46,6 +46,7 @@
 #include "hns_roce_device.h"
 #include "hns_roce_cmd.h"
 #include "hns_roce_hem.h"
+#include "hns_roce_dca.h"
 #include "hns_roce_hw_v2.h"
 
 static void set_data_seg_v2(struct hns_roce_v2_wqe_data_seg *dseg,
@@ -4979,6 +4980,9 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
 			*hr_qp->rdb.db_record = 0;
 	}
 
+	if (check_qp_dca_enable(hr_qp) &&
+	    (new_state == IB_QPS_RESET || new_state == IB_QPS_ERR))
+		hns_roce_dca_kick(hr_dev, hr_qp);
 out:
 	return ret;
 }
@@ -5240,6 +5244,48 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 	return ret;
 }
 
+static bool hns_roce_v2_chk_dca_buf_inactive(struct hns_roce_dev *hr_dev,
+					     struct hns_roce_qp *hr_qp)
+{
+	struct hns_roce_dca_cfg *cfg = &hr_qp->dca_cfg;
+	struct hns_roce_v2_qp_context context = {};
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	u32 tmp, sq_idx;
+	int state;
+	int ret;
+
+	ret = hns_roce_v2_query_qpc(hr_dev, hr_qp, &context);
+	if (ret) {
+		ibdev_err(ibdev, "failed to query DCA QPC, ret = %d.\n", ret);
+		return false;
+	}
+
+	state = roce_get_field(context.byte_60_qpst_tempid,
+			       V2_QPC_BYTE_60_QP_ST_M, V2_QPC_BYTE_60_QP_ST_S);
+	if (state == HNS_ROCE_QP_ST_ERR || state == HNS_ROCE_QP_ST_RST)
+		return true;
+
+	/* If RQ is not empty, the buffer is always active until the QP stops
+	 * working.
+	 */
+	if (hr_qp->rq.wqe_cnt > 0)
+		return false;
+
+	if (hr_qp->sq.wqe_cnt > 0) {
+		tmp = (u32)roce_get_field(context.byte_220_retry_psn_msn,
+					  V2_QPC_BYTE_220_RETRY_MSG_MSN_M,
+					  V2_QPC_BYTE_220_RETRY_MSG_MSN_S);
+		sq_idx = tmp & (hr_qp->sq.wqe_cnt - 1);
+		/* If SQ-PI equals retry_msg_msn in the QPC, the QP is
+		 * inactive.
+		 */
+		if (sq_idx != cfg->sq_idx)
+			return false;
+	}
+
+	return true;
+}
+
 static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
 					 struct hns_roce_qp *hr_qp,
 					 struct ib_udata *udata)
@@ -6379,6 +6425,7 @@ static const struct hns_roce_hw hns_roce_hw_v2 = {
 	.set_hem = hns_roce_v2_set_hem,
 	.clear_hem = hns_roce_v2_clear_hem,
 	.set_dca_buf = hns_roce_v2_set_dca_buf,
+	.chk_dca_buf_inactive = hns_roce_v2_chk_dca_buf_inactive,
 	.modify_qp = hns_roce_v2_modify_qp,
 	.query_qp = hns_roce_v2_query_qp,
 	.destroy_qp = hns_roce_v2_destroy_qp,
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index b08d111..ee2ea2e3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -751,7 +751,7 @@ static int alloc_wqe_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 	if (ret) {
 		ibdev_err(ibdev, "failed to create WQE mtr, ret = %d.\n", ret);
 		if (dca_en)
-			hns_roce_disable_dca(hr_dev, hr_qp);
+			hns_roce_disable_dca(hr_dev, hr_qp, udata);
 	}
 
 	return ret;
@@ -763,7 +763,7 @@ static void free_wqe_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 	hns_roce_mtr_destroy(hr_dev, &hr_qp->mtr);
 
 	if (hr_qp->en_flags & HNS_ROCE_QP_CAP_DCA)
-		hns_roce_disable_dca(hr_dev, hr_qp);
+		hns_roce_disable_dca(hr_dev, hr_qp, udata);
 }
 
 static int alloc_qp_wqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index da3effb..e6b01de 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -106,6 +106,7 @@ enum hns_ib_dca_mem_methods {
 	HNS_IB_METHOD_DCA_MEM_DEREG,
 	HNS_IB_METHOD_DCA_MEM_SHRINK,
 	HNS_IB_METHOD_DCA_MEM_ATTACH,
+	HNS_IB_METHOD_DCA_MEM_DETACH,
 };
 
 enum hns_ib_dca_mem_reg_attrs {
@@ -136,4 +137,9 @@ enum hns_ib_dca_mem_attach_attrs {
 	HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_FLAGS,
 	HNS_IB_ATTR_DCA_MEM_ATTACH_OUT_ALLOC_PAGES,
 };
+
+enum hns_ib_dca_mem_detach_attrs {
+	HNS_IB_ATTR_DCA_MEM_DETACH_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_ATTR_DCA_MEM_DETACH_SQ_INDEX,
+};
 #endif /* HNS_ABI_USER_H */
-- 
2.8.1



* [PATCH RFC 7/7] RDMA/hns: Add method to query WQE buffer's address
  2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
                   ` (5 preceding siblings ...)
  2021-01-15 10:22 ` [PATCH RFC 6/7] RDMA/hns: Add method to detach WQE buffer Weihang Li
@ 2021-01-15 10:22 ` Weihang Li
  6 siblings, 0 replies; 16+ messages in thread
From: Weihang Li @ 2021-01-15 10:22 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

If a uQP works in DCA mode, the userspace driver needs to get the buffer's
address in the DCA memory pool by calling the 'HNS_IB_METHOD_DCA_MEM_QUERY'
method after the QP has been attached with the
'HNS_IB_METHOD_DCA_MEM_ATTACH' method.

This method returns the DCA mem object's key, the offset within it and the
number of contiguous active pages at that offset, which lets the userspace
driver compute the WQE's virtual address in the DCA memory pool.
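The lookup done by query_dca_mem() and query_dca_active_pages_proc() can be sketched as follows. It skips the first page_index active pages of the QP's buffer, then reports the chunk key, the byte offset of the next active page, and how many active pages follow it contiguously. The chunk struct and the 4 KiB page size (standing in for HNS_HW_PAGE_SHIFT) are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HW_PAGE_SHIFT 12	/* assumption: 4 KiB hardware pages */

struct chunk {
	uint64_t key;		/* hypothetical handle userspace maps to a VA */
	const uint32_t *owner;	/* owning buf_id per page, 0 = free */
	int page_count;
};

struct query_resp {
	uint64_t key;
	uint32_t offset;
	uint32_t page_count;
};

/* Skip the first start_index active pages of buf_id, then report where
 * the next run of contiguous active pages lives. False if there is none. */
static bool query_active_run(const struct chunk *chunks, int nchunks,
			     uint32_t buf_id, uint32_t start_index,
			     struct query_resp *resp)
{
	uint32_t seen = 0;

	for (int c = 0; c < nchunks; c++) {
		const struct chunk *ch = &chunks[c];

		for (int i = 0; i < ch->page_count; i++) {
			if (ch->owner[i] != buf_id)
				continue;
			if (seen++ < start_index)
				continue;
			/* First wanted page: measure the contiguous run. */
			int n = 0;
			while (i + n < ch->page_count &&
			       ch->owner[i + n] == buf_id)
				n++;
			resp->key = ch->key;
			resp->offset = (uint32_t)i << HW_PAGE_SHIFT;
			resp->page_count = (uint32_t)n;
			return true;
		}
	}
	return false;
}
```

Presumably the userspace driver calls the method repeatedly, passing the number of pages it has already mapped as the page index, until the whole WQE buffer is covered. The sketch supports that pattern, but the patch does not spell it out.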

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_dca.c | 112 ++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/hns/hns_roce_dca.h |   6 ++
 include/uapi/rdma/hns-abi.h              |  10 +++
 3 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.c b/drivers/infiniband/hw/hns/hns_roce_dca.c
index 3d1e1b4..dded481 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.c
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.c
@@ -80,6 +80,14 @@ static inline bool dca_page_is_attached(struct hns_dca_page_state *state,
 			(HNS_DCA_OWN_MASK & state->buf_id);
 }
 
+static inline bool dca_page_is_active(struct hns_dca_page_state *state,
+				      u32 buf_id)
+{
+	/* all buf id bits must be matched */
+	return (HNS_DCA_ID_MASK & buf_id) == state->buf_id &&
+		!state->lock && state->active;
+}
+
 static inline bool dca_page_is_allocated(struct hns_dca_page_state *state,
 					 u32 buf_id)
 {
@@ -792,6 +800,64 @@ static int attach_dca_mem(struct hns_roce_dev *hr_dev,
 	return 0;
 }
 
+struct dca_page_query_active_attr {
+	u32 buf_id;
+	u32 curr_index;
+	u32 start_index;
+	u32 page_index;
+	u32 page_count;
+	u64 mem_key;
+};
+
+static int query_dca_active_pages_proc(struct dca_mem *mem, int index,
+				       void *param)
+{
+	struct hns_dca_page_state *state = &mem->states[index];
+	struct dca_page_query_active_attr *attr = param;
+
+	if (!dca_page_is_active(state, attr->buf_id))
+		return 0;
+
+	if (attr->curr_index < attr->start_index) {
+		attr->curr_index++;
+		return 0;
+	} else if (attr->curr_index > attr->start_index) {
+		return DCA_MEM_STOP_ITERATE;
+	}
+
+	/* Search first page in DCA mem */
+	attr->page_index = index;
+	attr->mem_key = mem->key;
+	/* Search active pages in continuous addresses */
+	while (index < mem->page_count) {
+		state = &mem->states[index];
+		if (!dca_page_is_active(state, attr->buf_id))
+			break;
+
+		index++;
+		attr->page_count++;
+	}
+
+	return DCA_MEM_STOP_ITERATE;
+}
+
+static int query_dca_mem(struct hns_roce_qp *hr_qp, u32 page_index,
+			 struct hns_dca_query_resp *resp)
+{
+	struct hns_roce_dca_ctx *ctx = hr_qp_to_dca_ctx(hr_qp);
+	struct dca_page_query_active_attr attr = {};
+
+	attr.buf_id = hr_qp->dca_cfg.buf_id;
+	attr.start_index = page_index;
+	travel_dca_pages(ctx, &attr, query_dca_active_pages_proc);
+
+	resp->mem_key = attr.mem_key;
+	resp->mem_ofs = attr.page_index << HNS_HW_PAGE_SHIFT;
+	resp->page_count = attr.page_count;
+
+	return attr.page_count ? 0 : -ENOMEM;
+}
+
 struct dca_page_free_buf_attr {
 	u32 buf_id;
 	u32 max_pages;
@@ -1131,13 +1197,57 @@ DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_DETACH_SQ_INDEX,
 			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
 
+static int UVERBS_HANDLER(HNS_IB_METHOD_DCA_MEM_QUERY)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_qp *hr_qp = uverbs_attr_to_hr_qp(attrs);
+	struct hns_dca_query_resp resp = {};
+	u32 page_idx;
+	int ret;
+
+	if (!hr_qp)
+		return -EINVAL;
+
+	if (uverbs_copy_from(&page_idx, attrs,
+			     HNS_IB_ATTR_DCA_MEM_QUERY_PAGE_INDEX))
+		return -EFAULT;
+
+	ret = query_dca_mem(hr_qp, page_idx, &resp);
+	if (ret)
+		return ret;
+
+	if (uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_QUERY_OUT_KEY,
+			   &resp.mem_key, sizeof(resp.mem_key)) ||
+	    uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_QUERY_OUT_OFFSET,
+			   &resp.mem_ofs, sizeof(resp.mem_ofs)) ||
+	    uverbs_copy_to(attrs, HNS_IB_ATTR_DCA_MEM_QUERY_OUT_PAGE_COUNT,
+			   &resp.page_count, sizeof(resp.page_count)))
+		return -EFAULT;
+
+	return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	HNS_IB_METHOD_DCA_MEM_QUERY,
+	UVERBS_ATTR_IDR(HNS_IB_ATTR_DCA_MEM_QUERY_HANDLE, UVERBS_OBJECT_QP,
+			UVERBS_ACCESS_READ, UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(HNS_IB_ATTR_DCA_MEM_QUERY_PAGE_INDEX,
+			   UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_QUERY_OUT_KEY,
+			    UVERBS_ATTR_TYPE(u64), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_QUERY_OUT_OFFSET,
+			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(HNS_IB_ATTR_DCA_MEM_QUERY_OUT_PAGE_COUNT,
+			    UVERBS_ATTR_TYPE(u32), UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_OBJECT(HNS_IB_OBJECT_DCA_MEM,
 			    UVERBS_TYPE_ALLOC_IDR(dca_cleanup),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_REG),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DEREG),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_SHRINK),
 			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_ATTACH),
-			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DETACH));
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_DETACH),
+			    &UVERBS_METHOD(HNS_IB_METHOD_DCA_MEM_QUERY));
 
 static bool dca_is_supported(struct ib_device *device)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_dca.h b/drivers/infiniband/hw/hns/hns_roce_dca.h
index 8155903..3e9971a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_dca.h
+++ b/drivers/infiniband/hw/hns/hns_roce_dca.h
@@ -50,6 +50,12 @@ struct hns_dca_detach_attr {
 	u32 sq_idx;
 };
 
+struct hns_dca_query_resp {
+	u64 mem_key;
+	u32 mem_ofs;
+	u32 page_count;
+};
+
 void hns_roce_register_udca(struct hns_roce_dev *hr_dev,
 			    struct hns_roce_ucontext *uctx);
 void hns_roce_unregister_udca(struct hns_roce_dev *hr_dev,
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index e6b01de..4f5ac46 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -107,6 +107,7 @@ enum hns_ib_dca_mem_methods {
 	HNS_IB_METHOD_DCA_MEM_SHRINK,
 	HNS_IB_METHOD_DCA_MEM_ATTACH,
 	HNS_IB_METHOD_DCA_MEM_DETACH,
+	HNS_IB_METHOD_DCA_MEM_QUERY,
 };
 
 enum hns_ib_dca_mem_reg_attrs {
@@ -142,4 +143,13 @@ enum hns_ib_dca_mem_detach_attrs {
 	HNS_IB_ATTR_DCA_MEM_DETACH_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
 	HNS_IB_ATTR_DCA_MEM_DETACH_SQ_INDEX,
 };
+
+enum hns_ib_dca_mem_query_attrs {
+	HNS_IB_ATTR_DCA_MEM_QUERY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	HNS_IB_ATTR_DCA_MEM_QUERY_PAGE_INDEX,
+	HNS_IB_ATTR_DCA_MEM_QUERY_OUT_KEY,
+	HNS_IB_ATTR_DCA_MEM_QUERY_OUT_OFFSET,
+	HNS_IB_ATTR_DCA_MEM_QUERY_OUT_PAGE_COUNT,
+};
+
 #endif /* HNS_ABI_USER_H */
-- 
2.8.1



* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-15 10:22 ` [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP Weihang Li
@ 2021-01-20  8:10   ` Leon Romanovsky
  2021-01-21  7:01     ` liweihang
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-01-20  8:10 UTC (permalink / raw)
  To: Weihang Li; +Cc: dledford, jgg, linux-rdma, linuxarm

On Fri, Jan 15, 2021 at 06:22:12PM +0800, Weihang Li wrote:
> From: Xi Wang <wangxi11@huawei.com>
>
> The hip09 introduces the DCA(Dynamic context attachment) feature which
> supports many RC QPs to share the WQE buffer in a memory pool, this will
> reduce the memory consumption when there are too many QPs are inactive.
>
> If a QP enables DCA feature, the WQE's buffer will not be allocated when
> creating. But when the users start to post WRs, the hns driver will
> allocate a buffer from the memory pool and then fill WQEs which tagged with
> this QP's number.
>
> The hns ROCEE will stop accessing the WQE buffer when the user polled all
> of the CQEs for a DCA QP, then the driver will recycle this WQE's buffer
> to the memory pool.
>
> This patch adds a group of methods to support the user space register
> buffers to a memory pool which belongs to the user context. The hns kernel
> driver will update the pages state in this pool when the user calling the
> post/poll methods and the user driver can get the QP's WQE buffer address
> by the key and offset which queried from kernel.
>
> Signed-off-by: Xi Wang <wangxi11@huawei.com>
> Signed-off-by: Weihang Li <liweihang@huawei.com>
> ---
>  drivers/infiniband/hw/hns/Makefile          |   2 +-
>  drivers/infiniband/hw/hns/hns_roce_dca.c    | 381 ++++++++++++++++++++++++++++
>  drivers/infiniband/hw/hns/hns_roce_dca.h    |  22 ++
>  drivers/infiniband/hw/hns/hns_roce_device.h |  10 +
>  drivers/infiniband/hw/hns/hns_roce_main.c   |  27 +-
>  include/uapi/rdma/hns-abi.h                 |  23 ++
>  6 files changed, 462 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.c
>  create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.h

<...>

> +static struct dca_mem *alloc_dca_mem(struct hns_roce_dca_ctx *ctx)
> +{
> +	struct dca_mem *mem, *tmp, *found = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ctx->pool_lock, flags);
> +	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
> +		spin_lock(&mem->lock);
> +		if (dca_mem_is_free(mem)) {
> +			found = mem;
> +			set_dca_mem_alloced(mem);
> +			spin_unlock(&mem->lock);
> +			goto done;
> +		}
> +		spin_unlock(&mem->lock);
> +	}
> +
> +done:
> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
> +
> +	if (found)
> +		return found;
> +
> +	mem = kzalloc(sizeof(*mem), GFP_ATOMIC);

Should it be ATOMIC?

> +	if (!mem)
> +		return NULL;
> +
> +	spin_lock_init(&mem->lock);
> +	INIT_LIST_HEAD(&mem->list);
> +
> +	set_dca_mem_alloced(mem);
> +
> +	spin_lock_irqsave(&ctx->pool_lock, flags);
> +	list_add(&mem->list, &ctx->pool);
> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
> +	return mem;
> +}

<...>

>  /**
>   * hns_get_gid_index - Get gid index.
> @@ -306,15 +308,16 @@ static int hns_roce_modify_device(struct ib_device *ib_dev, int mask,
>  static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
>  				   struct ib_udata *udata)
>  {
> -	int ret;
>  	struct hns_roce_ucontext *context = to_hr_ucontext(uctx);
> -	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>  	struct hns_roce_dev *hr_dev = to_hr_dev(uctx->device);
> +	struct hns_roce_ib_alloc_ucontext_resp resp = {};
> +	int ret;
>
>  	if (!hr_dev->active)
>  		return -EAGAIN;
>
>  	resp.qp_tab_size = hr_dev->caps.num_qps;
> +	resp.cap_flags = (u32)hr_dev->caps.flags;

This is prone to errors, flags is u64.
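A small user-space illustration of why the `(u32)` cast is fragile. This is a sketch, not driver code: `EXAMPLE_CAP_HIGH_BIT` and `EXAMPLE_CAP_LOW_BIT` are hypothetical stand-ins for capability bits, and it relies only on the review's statement that `hr_dev->caps.flags` is 64 bits wide. Any capability bit at position 32 or above is silently dropped by the cast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits: one above bit 31, one below it. */
#define EXAMPLE_CAP_HIGH_BIT (UINT64_C(1) << 35)
#define EXAMPLE_CAP_LOW_BIT  (UINT64_C(1) << 3)

/* Mirrors the RFC: the (u32) cast silently discards bits 32..63. */
static uint32_t report_caps_u32(uint64_t caps_flags)
{
	return (uint32_t)caps_flags;
}

/* Reporting the full 64-bit value keeps every capability bit. */
static uint64_t report_caps_u64(uint64_t caps_flags)
{
	return caps_flags;
}
```

If the uapi field itself becomes 64-bit, the uapi headers' `__aligned_u64` type keeps its offset identical for 32-bit and 64-bit user space.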

<...>

> diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
> index 90b739d..f59abc4 100644
> --- a/include/uapi/rdma/hns-abi.h
> +++ b/include/uapi/rdma/hns-abi.h
> @@ -86,10 +86,33 @@ struct hns_roce_ib_create_qp_resp {
>  struct hns_roce_ib_alloc_ucontext_resp {
>  	__u32	qp_tab_size;
>  	__u32	cqe_size;
> +	__u32	cap_flags;
>  };

This struct should be padded to 64bits,
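The padding request is about the user/kernel ABI. This response struct is copied to user space, and a size that is not a multiple of 8 invites implicit padding that can differ between 32-bit and 64-bit ABIs once a 64-bit member is appended (on i386 a `u64` inside a struct is only 4-byte aligned). A minimal sketch of two ways to meet the request follows. It uses `<stdint.h>` types as stand-ins for `__u32`/`__u64`, and the struct names and the `reserved` field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* As posted in the RFC: three 32-bit fields, 12 bytes. */
struct resp_as_posted {
	uint32_t qp_tab_size;
	uint32_t cqe_size;
	uint32_t cap_flags;
};

/* Option 1: explicit trailing pad (hypothetical name), 16 bytes. */
struct resp_padded {
	uint32_t qp_tab_size;
	uint32_t cqe_size;
	uint32_t cap_flags;
	uint32_t reserved;	/* must be zero; room for later extension */
};

/* Option 2: 64-bit flags with forced 8-byte alignment, similar to what
 * __aligned_u64 does (GCC/Clang attribute syntax). */
struct resp_u64_flags {
	uint32_t qp_tab_size;
	uint32_t cqe_size;
	uint64_t cap_flags __attribute__((aligned(8)));
};

/* Compile-time checks that both layouts are multiples of 64 bits. */
_Static_assert(sizeof(struct resp_padded) % 8 == 0, "resp_padded not padded");
_Static_assert(sizeof(struct resp_u64_flags) % 8 == 0, "resp_u64_flags not padded");
```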

Thanks

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-20  8:10   ` Leon Romanovsky
@ 2021-01-21  7:01     ` liweihang
  2021-01-21  8:53       ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: liweihang @ 2021-01-21  7:01 UTC
  To: Leon Romanovsky; +Cc: dledford, jgg, linux-rdma, linuxarm

On 2021/1/20 16:10, Leon Romanovsky wrote:
> On Fri, Jan 15, 2021 at 06:22:12PM +0800, Weihang Li wrote:
>> From: Xi Wang <wangxi11@huawei.com>
>>
>> The hip09 introduces the DCA (Dynamic Context Attachment) feature, which
>> allows many RC QPs to share WQE buffers in a memory pool. This reduces
>> memory consumption when many QPs are inactive.
>>
>> If a QP enables the DCA feature, its WQE buffer is not allocated when the
>> QP is created. Instead, when the user starts to post WRs, the hns driver
>> allocates a buffer from the memory pool and then fills in WQEs tagged with
>> this QP's number.
>>
>> The hns ROCEE stops accessing the WQE buffer once the user has polled all
>> of the CQEs for a DCA QP; the driver then returns this WQE buffer to the
>> memory pool.
>>
>> This patch adds a group of methods that allow user space to register
>> buffers to a memory pool belonging to the user context. The hns kernel
>> driver updates the state of the pages in this pool when the user calls
>> the post/poll methods, and the user driver can get the QP's WQE buffer
>> address using the key and offset queried from the kernel.
>>
>> Signed-off-by: Xi Wang <wangxi11@huawei.com>
>> Signed-off-by: Weihang Li <liweihang@huawei.com>
>> ---
>>  drivers/infiniband/hw/hns/Makefile          |   2 +-
>>  drivers/infiniband/hw/hns/hns_roce_dca.c    | 381 ++++++++++++++++++++++++++++
>>  drivers/infiniband/hw/hns/hns_roce_dca.h    |  22 ++
>>  drivers/infiniband/hw/hns/hns_roce_device.h |  10 +
>>  drivers/infiniband/hw/hns/hns_roce_main.c   |  27 +-
>>  include/uapi/rdma/hns-abi.h                 |  23 ++
>>  6 files changed, 462 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.c
>>  create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.h
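The buffer lifecycle described in the commit message can be modelled in a few lines: no WQE buffer at QP creation, pages taken from a shared pool when WRs are posted, and pages returned once every CQE has been polled. This is a simplified, single-threaded sketch and not the hns driver's implementation. The fixed-size page array, `dca_attach()`, `dca_detach()`, `dca_pages_owned()` and the use of QPN 0 to mean "free" are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool of WQE buffer pages shared by the DCA QPs of one
 * user context. */
#define DCA_POOL_PAGES 8
#define DCA_PAGE_FREE  0u	/* QPN 0 means "no owner" in this sketch */

struct dca_pool {
	unsigned int owner[DCA_POOL_PAGES];	/* owning QPN or DCA_PAGE_FREE */
};

/* On post: claim npages free pages for qpn; all-or-nothing. */
static bool dca_attach(struct dca_pool *pool, unsigned int qpn, size_t npages)
{
	size_t free_pages = 0, i;

	for (i = 0; i < DCA_POOL_PAGES; i++)
		if (pool->owner[i] == DCA_PAGE_FREE)
			free_pages++;
	if (free_pages < npages)
		return false;	/* pool exhausted, nothing changed */

	for (i = 0; i < DCA_POOL_PAGES && npages > 0; i++) {
		if (pool->owner[i] == DCA_PAGE_FREE) {
			pool->owner[i] = qpn;
			npages--;
		}
	}
	return true;
}

/* After all CQEs of qpn are polled: give its pages back to the pool. */
static void dca_detach(struct dca_pool *pool, unsigned int qpn)
{
	for (size_t i = 0; i < DCA_POOL_PAGES; i++)
		if (pool->owner[i] == qpn)
			pool->owner[i] = DCA_PAGE_FREE;
}

/* Count the pages currently attached to qpn. */
static size_t dca_pages_owned(const struct dca_pool *pool, unsigned int qpn)
{
	size_t n = 0;

	for (size_t i = 0; i < DCA_POOL_PAGES; i++)
		if (pool->owner[i] == qpn)
			n++;
	return n;
}
```

The memory saving comes from the fact that an idle QP that has been detached holds no pages, so the pool only needs to be sized for the QPs that are active at the same time.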
> 
> <...>
> 
>> +static struct dca_mem *alloc_dca_mem(struct hns_roce_dca_ctx *ctx)
>> +{
>> +	struct dca_mem *mem, *tmp, *found = NULL;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&ctx->pool_lock, flags);
>> +	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
>> +		spin_lock(&mem->lock);
>> +		if (dca_mem_is_free(mem)) {
>> +			found = mem;
>> +			set_dca_mem_alloced(mem);
>> +			spin_unlock(&mem->lock);
>> +			goto done;
>> +		}
>> +		spin_unlock(&mem->lock);
>> +	}
>> +
>> +done:
>> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
>> +
>> +	if (found)
>> +		return found;
>> +
>> +	mem = kzalloc(sizeof(*mem), GFP_ATOMIC);
> 
> Should it be ATOMIC?
> 

Hi Leon,

The current DCA interfaces can be invoked by userspace through ibv_xx_cmd(),
but they are expected to also work in ib_post_xx() in the kernel in the future.
Since they may then run in spinlock context, we use GFP_ATOMIC.


>> +	if (!mem)
>> +		return NULL;
>> +
>> +	spin_lock_init(&mem->lock);
>> +	INIT_LIST_HEAD(&mem->list);
>> +
>> +	set_dca_mem_alloced(mem);
>> +
>> +	spin_lock_irqsave(&ctx->pool_lock, flags);
>> +	list_add(&mem->list, &ctx->pool);
>> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
>> +	return mem;
>> +}
> 
> <...>
> 
>>  /**
>>   * hns_get_gid_index - Get gid index.
>> @@ -306,15 +308,16 @@ static int hns_roce_modify_device(struct ib_device *ib_dev, int mask,
>>  static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
>>  				   struct ib_udata *udata)
>>  {
>> -	int ret;
>>  	struct hns_roce_ucontext *context = to_hr_ucontext(uctx);
>> -	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>>  	struct hns_roce_dev *hr_dev = to_hr_dev(uctx->device);
>> +	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>> +	int ret;
>>
>>  	if (!hr_dev->active)
>>  		return -EAGAIN;
>>
>>  	resp.qp_tab_size = hr_dev->caps.num_qps;
>> +	resp.cap_flags = (u32)hr_dev->caps.flags;
> 
> This is prone to errors, flags is u64.
> 

OK, we plan to change the type of resp.cap_flags to u64.

> <...>
> 
>> diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
>> index 90b739d..f59abc4 100644
>> --- a/include/uapi/rdma/hns-abi.h
>> +++ b/include/uapi/rdma/hns-abi.h
>> @@ -86,10 +86,33 @@ struct hns_roce_ib_create_qp_resp {
>>  struct hns_roce_ib_alloc_ucontext_resp {
>>  	__u32	qp_tab_size;
>>  	__u32	cqe_size;
>> +	__u32	cap_flags;
>>  };
> 
> This struct should be padded to 64bits,
>
> Thanks
> 
Thanks, I will fix it.

Weihang

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21  7:01     ` liweihang
@ 2021-01-21  8:53       ` Leon Romanovsky
  2021-01-21 13:33         ` liweihang
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-01-21  8:53 UTC
  To: liweihang; +Cc: dledford, jgg, linux-rdma, linuxarm

On Thu, Jan 21, 2021 at 07:01:50AM +0000, liweihang wrote:
> On 2021/1/20 16:10, Leon Romanovsky wrote:
> > On Fri, Jan 15, 2021 at 06:22:12PM +0800, Weihang Li wrote:
> > <...>
> >
> >> +static struct dca_mem *alloc_dca_mem(struct hns_roce_dca_ctx *ctx)
> >> +{
> >> +	struct dca_mem *mem, *tmp, *found = NULL;
> >> +	unsigned long flags;
> >> +
> >> +	spin_lock_irqsave(&ctx->pool_lock, flags);
> >> +	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
> >> +		spin_lock(&mem->lock);
> >> +		if (dca_mem_is_free(mem)) {
> >> +			found = mem;
> >> +			set_dca_mem_alloced(mem);
> >> +			spin_unlock(&mem->lock);
> >> +			goto done;
> >> +		}
> >> +		spin_unlock(&mem->lock);
> >> +	}
> >> +
> >> +done:
> >> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
> >> +
> >> +	if (found)
> >> +		return found;
> >> +
> >> +	mem = kzalloc(sizeof(*mem), GFP_ATOMIC);
> >
> > Should it be ATOMIC?
> >
>
> Hi Leon,
>
> The current DCA interfaces can be invoked by userspace through ibv_xx_cmd(),
> but they are expected to also work in ib_post_xx() in the kernel in the future.
> Since they may then run in spinlock context, we use GFP_ATOMIC.

Are you planning to invoke kzalloc in data path?

GFP_ATOMIC makes the allocator use a special allocation pool, which is seen as a precious
resource because allocations from it must succeed.

It is better to avoid this flag if you don't need it.

Thanks

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21  8:53       ` Leon Romanovsky
@ 2021-01-21 13:33         ` liweihang
  2021-01-21 13:34           ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: liweihang @ 2021-01-21 13:33 UTC
  To: Leon Romanovsky; +Cc: dledford, jgg, linux-rdma, linuxarm

On 2021/1/21 16:53, Leon Romanovsky wrote:
> On Thu, Jan 21, 2021 at 07:01:50AM +0000, liweihang wrote:
>> On 2021/1/20 16:10, Leon Romanovsky wrote:
>>> On Fri, Jan 15, 2021 at 06:22:12PM +0800, Weihang Li wrote:
>>> <...>
>>>
>>>> +static struct dca_mem *alloc_dca_mem(struct hns_roce_dca_ctx *ctx)
>>>> +{
>>>> +	struct dca_mem *mem, *tmp, *found = NULL;
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&ctx->pool_lock, flags);
>>>> +	list_for_each_entry_safe(mem, tmp, &ctx->pool, list) {
>>>> +		spin_lock(&mem->lock);
>>>> +		if (dca_mem_is_free(mem)) {
>>>> +			found = mem;
>>>> +			set_dca_mem_alloced(mem);
>>>> +			spin_unlock(&mem->lock);
>>>> +			goto done;
>>>> +		}
>>>> +		spin_unlock(&mem->lock);
>>>> +	}
>>>> +
>>>> +done:
>>>> +	spin_unlock_irqrestore(&ctx->pool_lock, flags);
>>>> +
>>>> +	if (found)
>>>> +		return found;
>>>> +
>>>> +	mem = kzalloc(sizeof(*mem), GFP_ATOMIC);
>>>
>>> Should it be ATOMIC?
>>>
>>
>> Hi Leon,
>>
>> The current DCA interfaces can be invoked by userspace through ibv_xx_cmd(),
>> but they are expected to also work in ib_post_xx() in the kernel in the future.
>> Since they may then run in spinlock context, we use GFP_ATOMIC.
> 
> Are you planning to invoke kzalloc in data path?
> 
> GFP_ATOMIC makes the allocator use a special allocation pool, which is seen as a precious
> resource because allocations from it must succeed.
> 
> It is better to avoid this flag if you don't need it.
> 
> Thanks

We need to allocate memory while a spinlock is held, so how about using GFP_KERNEL or
GFP_NOWAIT?

Thanks
Weihang

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21 13:33         ` liweihang
@ 2021-01-21 13:34           ` Jason Gunthorpe
  2021-01-21 13:48             ` liweihang
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-01-21 13:34 UTC
  To: liweihang; +Cc: Leon Romanovsky, dledford, linux-rdma, linuxarm

On Thu, Jan 21, 2021 at 01:33:42PM +0000, liweihang wrote:

> We need to allocate memory while a spinlock is held, so how about using GFP_KERNEL or
> GFP_NOWAIT?

You should try hard not to do that. Convert the spinlock to a mutex,
for instance.
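The mutex conversion suggested here can be sketched as a user-space analogue. This is not kernel code: `pthread_mutex_t` stands in for `struct mutex`, `calloc()` stands in for `kzalloc(..., GFP_KERNEL)`, and the simplified `struct dca_mem`, `struct dca_ctx` and `free_dca_mem()` are hypothetical. With a sleeping lock, the free-entry lookup and the fallback allocation fit in one critical section, and the RFC's per-entry spinlock can be folded into `pool_lock`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct dca_mem {
	struct dca_mem *next;
	bool alloced;			/* true while handed out to a QP */
};

struct dca_ctx {
	pthread_mutex_t pool_lock;	/* sleeping lock: may allocate under it */
	struct dca_mem *pool;		/* singly linked list of entries */
};

/* Reuse a free entry if one exists, otherwise allocate a new one.
 * Everything runs under pool_lock, so no second lock round-trip is needed,
 * and the allocation is allowed to block. */
static struct dca_mem *alloc_dca_mem(struct dca_ctx *ctx)
{
	struct dca_mem *mem;

	pthread_mutex_lock(&ctx->pool_lock);
	for (mem = ctx->pool; mem; mem = mem->next) {
		if (!mem->alloced) {
			mem->alloced = true;
			goto out;
		}
	}

	mem = calloc(1, sizeof(*mem));	/* may block, like GFP_KERNEL */
	if (mem) {
		mem->alloced = true;
		mem->next = ctx->pool;
		ctx->pool = mem;
	}
out:
	pthread_mutex_unlock(&ctx->pool_lock);
	return mem;
}

/* Mark an entry free again so later callers can reuse it. */
static void free_dca_mem(struct dca_ctx *ctx, struct dca_mem *mem)
{
	pthread_mutex_lock(&ctx->pool_lock);
	mem->alloced = false;
	pthread_mutex_unlock(&ctx->pool_lock);
}
```

The trade-off is that every caller must then be allowed to sleep, so this only works if the DCA path is never entered from atomic context.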

Jason

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21 13:34           ` Jason Gunthorpe
@ 2021-01-21 13:48             ` liweihang
  2021-01-21 13:51               ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: liweihang @ 2021-01-21 13:48 UTC
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, dledford, linux-rdma, linuxarm

On 2021/1/21 21:36, Jason Gunthorpe wrote:
> On Thu, Jan 21, 2021 at 01:33:42PM +0000, liweihang wrote:
> 
>> We need to allocate memory while a spinlock is held, so how about using GFP_KERNEL or
>> GFP_NOWAIT?
> 
> You should try hard not to do that. Convert the spinlock to a mutex,
> for instance.
> 
> Jason
> 

But what if some kernel users call ib_post_send() when holding a spinlock?

Thanks
Weihang

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21 13:48             ` liweihang
@ 2021-01-21 13:51               ` Jason Gunthorpe
  2021-01-22  9:06                 ` liweihang
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-01-21 13:51 UTC
  To: liweihang; +Cc: Leon Romanovsky, dledford, linux-rdma, linuxarm

On Thu, Jan 21, 2021 at 01:48:56PM +0000, liweihang wrote:
> On 2021/1/21 21:36, Jason Gunthorpe wrote:
> > On Thu, Jan 21, 2021 at 01:33:42PM +0000, liweihang wrote:
> > 
> >> We need to allocate memory while a spinlock is held, so how about using GFP_KERNEL or
> >> GFP_NOWAIT?
> > 
> > You should try hard not to do that. Convert the spinlock to a mutex,
> > for instance.
> > 
> > Jason
> > 
> 
> But what if some kernel users call ib_post_send() when holding a spinlock?

I doubt extensions like this would be part of kernel verbs.

Does any ULP call ib_post_send under lock? I'm not sure that is valid.

Jason

* Re: [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP
  2021-01-21 13:51               ` Jason Gunthorpe
@ 2021-01-22  9:06                 ` liweihang
  0 siblings, 0 replies; 16+ messages in thread
From: liweihang @ 2021-01-22  9:06 UTC
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, dledford, linux-rdma, linuxarm

On 2021/1/21 21:52, Jason Gunthorpe wrote:
> On Thu, Jan 21, 2021 at 01:48:56PM +0000, liweihang wrote:
>> On 2021/1/21 21:36, Jason Gunthorpe wrote:
>>> On Thu, Jan 21, 2021 at 01:33:42PM +0000, liweihang wrote:
>>>
>>>> We need to allocate memory while a spinlock is held, so how about using GFP_KERNEL or
>>>> GFP_NOWAIT?
>>>
>>> You should try hard not to do that. Convert the spinlock to a mutex,
>>> for instance.
>>>
>>> Jason
>>>
>>
>> But what if some kernel users call ib_post_send() when holding a spinlock?
> 
> I doubt extensions like this would be part of kernel verbs.
> 
> Does any ULP call ib_post_send under lock? I'm not sure that is valid.
> 
> Jason
> 

I didn't find any ULP calling ib_post_send() under a spinlock either. Anyway,
I will use GFP_NOWAIT instead of GFP_ATOMIC.

Thanks
Weihang

Thread overview: 16+ messages
2021-01-15 10:22 [PATCH RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment Weihang Li
2021-01-15 10:22 ` [PATCH RFC 1/7] RDMA/hns: Introduce DCA for RC QP Weihang Li
2021-01-20  8:10   ` Leon Romanovsky
2021-01-21  7:01     ` liweihang
2021-01-21  8:53       ` Leon Romanovsky
2021-01-21 13:33         ` liweihang
2021-01-21 13:34           ` Jason Gunthorpe
2021-01-21 13:48             ` liweihang
2021-01-21 13:51               ` Jason Gunthorpe
2021-01-22  9:06                 ` liweihang
2021-01-15 10:22 ` [PATCH RFC 2/7] RDMA/hns: Add method for shrinking DCA memory pool Weihang Li
2021-01-15 10:22 ` [PATCH RFC 3/7] RDMA/hns: Configure DCA mode for the userspace QP Weihang Li
2021-01-15 10:22 ` [PATCH RFC 4/7] RDMA/hns: Add method for attaching WQE buffer Weihang Li
2021-01-15 10:22 ` [PATCH RFC 5/7] RDMA/hns: Setup the configuration of WQE addressing to QPC Weihang Li
2021-01-15 10:22 ` [PATCH RFC 6/7] RDMA/hns: Add method to detach WQE buffer Weihang Li
2021-01-15 10:22 ` [PATCH RFC 7/7] RDMA/hns: Add method to query WQE buffer's address Weihang Li
