* [PATCH rdma-next v7 0/8] RDMA resource tracking
@ 2018-01-28  9:17 Leon Romanovsky
       [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise

Changelog:
v6->v7:
  * Added completion mechanism for rdma_restrack_put()
  * Converted init function to be void
  * Changed to use kref_get_unless_zero()
  * Forced valid to be false for not-added QPs.
  * Added centralized place to update index for skipped QPs.
v5->v6:
  * Removed SRCU
  * Added kref logic to protect resources
  * Dropped support of XRC QPs for now
v4->v5:
  * Rewrote logic to acquire kernel owners of PD and CQ objects
  * Simplified interface.
  * Added logic to return summary for all PID namespaces and not only
    init PID namespace.
  * Changed DB implementation from struct with arrays to be struct with
    hash table.
  * Moved to use SRCU per-hash table.
  * Dropped RW semaphores in favor of mutex to protect linked lists.
  * Dropped QP named annotation patch.
  * Used SPDX licenses.
v3->v4:
  * Added Steve ROB tags.
  * Updated restrack.h to be compatible with kernel-doc format
  * Fixed bug when not all QPs (more than one netlink page) were
    returned by netlink interface
  * Removed "MAX" values from object summary. In followup patches,
    I will return the object summary per-namespace and not for the init
    PID namespace only. Max values can be seen with the ibv_devinfo tool.
  * Renamed "comm" to be kern_name.
  * Fix spelling errors.
v2->v3:
  * Added support of PID namespaces.
  * Rewrote the rdma_restrack_add function to ensure that an entry won't
    appear in lists before it is valid.
  * Replaced the pid/task name caching logic with storing task_struct instead.
  * Removed reporting of task names for user space objects. Users
    are expected to read them through /proc/PID/comm.
v1->v2:
 * Rebased on latest rdma/for-next
 * Replaced the mutex lock protecting the linked lists with an RW semaphore.
   It has no impact on the current implementation, because there is only one
   reader (nldev) and it is serialized. However, it is better to be prepared
   for multiple readers from the beginning.
 * Added logic to reinitialize the next QP entry, to ensure that it exists
   and didn't vanish during fill_req_qp() work.
v0->v1:
 * Dropped RFC
 * Split into three series: one for-rc and two for-next.

-------
The original goal of this series was to provide the ability to view
connection (QP) information about running processes; however, I used this
opportunity to create a common infrastructure to track and report various
resources. The reporting part is implemented in netlink (nldev), but smart
ULPs can now create advanced usage models based on device utilization.

The current implementation relies on one lock per-object per-device, so
creation and destruction of various objects (CQ, PD, etc.) on the same or
different devices don't interfere with one another.

The data protection is performed with SRCU, whose reader-writer model
ensures that a resource won't be destroyed until readers finish their
work.

Possible future work will include:
 * Reducing number of locks in RDMA, because of SRCU.
 * Converting CMA to be based completely on resource tracking.
 * Addition of other objects and extending current to give full
   and detailed state of the RDMA kernel stack.
 * Replacing synchronize_srcu with call_srcu to make destroy flow
   non-blocking.
 * Provide reliable device reset flow, preserving resource creation ordering.
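The lifetime rule at the heart of this series — a resource grabbed with
rdma_restrack_get() cannot be destroyed until the matching
rdma_restrack_put(), and rdma_restrack_del() blocks until the reference
count drops to zero — can be modeled in user space. Below is a minimal
illustrative sketch with C11 atomics standing in for struct kref and
struct completion; the res_* names are hypothetical, not kernel API:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct res {
	atomic_int kref;    /* stands in for struct kref */
	atomic_bool done;   /* stands in for struct completion */
};

void res_init(struct res *r)
{
	atomic_init(&r->kref, 1);   /* kref_init(): owner holds one reference */
	atomic_init(&r->done, false);
}

/* models kref_get_unless_zero(): fails once teardown has started */
bool res_get(struct res *r)
{
	int v = atomic_load(&r->kref);

	while (v > 0)
		if (atomic_compare_exchange_weak(&r->kref, &v, v + 1))
			return true;
	return false;
}

/* models kref_put(): the release callback signals the completion */
void res_put(struct res *r)
{
	if (atomic_fetch_sub(&r->kref, 1) == 1)
		atomic_store(&r->done, true);   /* complete(&res->comp) */
}

/* models rdma_restrack_del(): drop the initial reference, then wait
 * until every reader that took a reference has dropped it */
void res_del(struct res *r)
{
	res_put(r);
	while (!atomic_load(&r->done))
		;   /* kernel: wait_for_completion(&res->comp) */
}
```

A reader that succeeds in res_get() is guaranteed the object stays alive
until its res_put(); once res_del() has dropped the initial reference and
the count reaches zero, further res_get() calls fail, mirroring
kref_get_unless_zero().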

	Thanks
---------------------------------------

Leon Romanovsky (8):
  RDMA/core: Print caller name instead of function name
  RDMA/core: Save kernel caller name in PD and CQ objects
  RDMA/restrack: Add general infrastructure to track RDMA resources
  RDMA/core: Add resource tracking for create and destroy QPs
  RDMA/core: Add resource tracking for create and destroy CQs
  RDMA/core: Add resource tracking for create and destroy PDs
  RDMA/nldev: Provide global resource utilization
  RDMA/nldev: Provide detailed QP information

 drivers/infiniband/core/Makefile           |   2 +-
 drivers/infiniband/core/core_priv.h        |  28 +++
 drivers/infiniband/core/cq.c               |  16 +-
 drivers/infiniband/core/device.c           |   4 +
 drivers/infiniband/core/nldev.c            | 378 +++++++++++++++++++++++++++++
 drivers/infiniband/core/restrack.c         | 166 +++++++++++++
 drivers/infiniband/core/uverbs_cmd.c       |   7 +-
 drivers/infiniband/core/uverbs_std_types.c |   3 +
 drivers/infiniband/core/verbs.c            |  18 +-
 include/rdma/ib_verbs.h                    |  34 ++-
 include/rdma/restrack.h                    | 157 ++++++++++++
 include/uapi/rdma/rdma_netlink.h           |  55 +++++
 12 files changed, 851 insertions(+), 17 deletions(-)
 create mode 100644 drivers/infiniband/core/restrack.c
 create mode 100644 include/rdma/restrack.h

--
2.16.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH rdma-next v7 1/8] RDMA/core: Print caller name instead of function name
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Reuse the name from the Kconfig to mark the caller.
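The mechanics of this macro are easy to demonstrate outside the kernel.
In the hedged sketch below, KBUILD_MODNAME is defined by hand (Kbuild
normally injects it per object via -D), and __alloc_pd()/alloc_pd() are
hypothetical stand-ins for __ib_alloc_pd()/ib_alloc_pd():

```c
#define KBUILD_MODNAME "demo_mod"  /* normally injected by Kbuild as -DKBUILD_MODNAME=... */

static const char *last_caller;

/* stand-in for __ib_alloc_pd(): records which module asked */
static int __alloc_pd(int flags, const char *caller)
{
	last_caller = caller;
	return flags;
}

/* the wrapper macro: every call site expands with its own module
 * name baked in at compile time, as ib_alloc_pd() now does */
#define alloc_pd(flags) __alloc_pd((flags), KBUILD_MODNAME)
```

Because the macro expands in the caller's translation unit, each module
gets its own name string with no runtime lookup.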

Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/rdma/ib_verbs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5e32fe781ca3..1c6e9f52f127 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2858,7 +2858,7 @@ enum ib_pd_flags {
 struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		const char *caller);
 #define ib_alloc_pd(device, flags) \
-	__ib_alloc_pd((device), (flags), __func__)
+	__ib_alloc_pd((device), (flags), KBUILD_MODNAME)
 void ib_dealloc_pd(struct ib_pd *pd);
 
 /**
-- 
2.16.1



* [PATCH rdma-next v7 2/8] RDMA/core: Save kernel caller name in PD and CQ objects
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The KBUILD_MODNAME variable contains the module name,
which is known to kernel users at compilation time,
so let's reuse it to track the owners.

Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cq.c    | 10 ++++++----
 drivers/infiniband/core/verbs.c |  4 ++--
 include/rdma/ib_verbs.h         | 13 ++++++++++---
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index c8c5a5a7f433..d99565dbd12f 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -120,20 +120,22 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 }
 
 /**
- * ib_alloc_cq - allocate a completion queue
+ * __ib_alloc_cq - allocate a completion queue
  * @dev:		device to allocate the CQ for
  * @private:		driver private data, accessible from cq->cq_context
  * @nr_cqe:		number of CQEs to allocate
  * @comp_vector:	HCA completion vectors for this CQ
  * @poll_ctx:		context to poll the CQ from.
+ * @caller:		module owner name.
  *
  * This is the proper interface to allocate a CQ for in-kernel users. A
  * CQ allocated with this interface will automatically be polled from the
  * specified context. The ULP must use wr->wr_cqe instead of wr->wr_id
  * to use this CQ abstraction.
  */
-struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
-		int nr_cqe, int comp_vector, enum ib_poll_context poll_ctx)
+struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
+			    int nr_cqe, int comp_vector,
+			    enum ib_poll_context poll_ctx, const char *caller)
 {
 	struct ib_cq_init_attr cq_attr = {
 		.cqe		= nr_cqe,
@@ -185,7 +187,7 @@ struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
 	cq->device->destroy_cq(cq);
 	return ERR_PTR(ret);
 }
-EXPORT_SYMBOL(ib_alloc_cq);
+EXPORT_SYMBOL(__ib_alloc_cq);
 
 /**
  * ib_free_cq - free a completion queue
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e9c3991a93ff..c2b347f6e8a2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1782,7 +1782,7 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 }
 EXPORT_SYMBOL(ib_detach_mcast);
 
-struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device)
+struct ib_xrcd *__ib_alloc_xrcd(struct ib_device *device, const char *caller)
 {
 	struct ib_xrcd *xrcd;
 
@@ -1800,7 +1800,7 @@ struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device)
 
 	return xrcd;
 }
-EXPORT_SYMBOL(ib_alloc_xrcd);
+EXPORT_SYMBOL(__ib_alloc_xrcd);
 
 int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1c6e9f52f127..f9cabb1b670e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3135,8 +3135,12 @@ static inline int ib_post_recv(struct ib_qp *qp,
 	return qp->device->post_recv(qp, recv_wr, bad_recv_wr);
 }
 
-struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
-		int nr_cqe, int comp_vector, enum ib_poll_context poll_ctx);
+struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
+			    int nr_cqe, int comp_vector,
+			    enum ib_poll_context poll_ctx, const char *caller);
+#define ib_alloc_cq(device, priv, nr_cqe, comp_vect, poll_ctx) \
+	__ib_alloc_cq((device), (priv), (nr_cqe), (comp_vect), (poll_ctx), KBUILD_MODNAME)
+
 void ib_free_cq(struct ib_cq *cq);
 int ib_process_cq_direct(struct ib_cq *cq, int budget);
 
@@ -3560,8 +3564,11 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid);
 /**
  * ib_alloc_xrcd - Allocates an XRC domain.
  * @device: The device on which to allocate the XRC domain.
+ * @caller: Module name for kernel consumers
  */
-struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device);
+struct ib_xrcd *__ib_alloc_xrcd(struct ib_device *device, const char *caller);
+#define ib_alloc_xrcd(device) \
+	__ib_alloc_xrcd((device), KBUILD_MODNAME)
 
 /**
  * ib_dealloc_xrcd - Deallocates an XRC domain.
-- 
2.16.1



* [PATCH rdma-next v7 3/8] RDMA/restrack: Add general infrastructure to track RDMA resources
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The RDMA subsystem has a very strict set of objects to work with,
but it completely lacks tracking facilities and gives no visibility
into resource utilization.

The following patch adds such infrastructure to keep track of RDMA
resources to help with debugging of user space applications. The primary
user of this infrastructure is the RDMA nldev netlink interface (following
patches), but it is not limited to it.

At this stage, the three main objects (PD, CQ and QP) are added,
and more will be added later.
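The core idea — one per-device hash keyed by resource type, so counting
one type is a single-bucket walk — can be sketched in plain user-space C.
This is an illustrative model only: fixed arrays and singly linked lists
stand in for the kernel hashtable API, and the track_* names are
hypothetical:

```c
enum res_type { RES_PD, RES_CQ, RES_QP, RES_MAX };

struct entry {
	enum res_type type;
	struct entry *next;
};

/* one bucket list per type: a stand-in for the per-device
 * DECLARE_HASHTABLE() keyed by resource type */
static struct entry *buckets[RES_MAX];

/* models rdma_restrack_add(): link the entry into its type bucket */
static void track_add(struct entry *e)
{
	e->next = buckets[e->type];
	buckets[e->type] = e;
}

/* models rdma_restrack_count(): walk only the bucket for one type */
static int track_count(enum res_type type)
{
	int cnt = 0;

	for (struct entry *e = buckets[type]; e; e = e->next)
		cnt++;
	return cnt;
}
```

Keying the hash by type keeps per-type queries cheap even when a device
tracks many objects of other types.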

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/Makefile    |   2 +-
 drivers/infiniband/core/core_priv.h |   1 +
 drivers/infiniband/core/device.c    |   4 +
 drivers/infiniband/core/restrack.c  | 166 ++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h             |  19 +++++
 include/rdma/restrack.h             | 157 ++++++++++++++++++++++++++++++++++
 6 files changed, 348 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/restrack.c
 create mode 100644 include/rdma/restrack.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 504b926552c6..f69833db0a32 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				security.o nldev.o
+				security.o nldev.o restrack.o
 
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index aef9aa0ac0e6..2b1372da708a 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -40,6 +40,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/opa_addr.h>
 #include <rdma/ib_mad.h>
+#include <rdma/restrack.h>
 #include "mad_priv.h"
 
 /* Total number of ports combined across all struct ib_devices's */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 2826e06311a5..e8010e73a1cf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -263,6 +263,8 @@ struct ib_device *ib_alloc_device(size_t size)
 	if (!device)
 		return NULL;
 
+	rdma_restrack_init(&device->res);
+
 	device->dev.class = &ib_class;
 	device_initialize(&device->dev);
 
@@ -596,6 +598,8 @@ void ib_unregister_device(struct ib_device *device)
 	}
 	up_read(&lists_rwsem);
 
+	rdma_restrack_clean(&device->res);
+
 	ib_device_unregister_rdmacg(device);
 	ib_device_unregister_sysfs(device);
 
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
new file mode 100644
index 000000000000..351b6940f6dc
--- /dev/null
+++ b/drivers/infiniband/core/restrack.c
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) */
+/*
+ * Copyright (c) 2017-2018 Mellanox Technologies. All rights reserved.
+ */
+
+#include <rdma/ib_verbs.h>
+#include <rdma/restrack.h>
+#include <linux/mutex.h>
+#include <linux/sched/task.h>
+#include <linux/uaccess.h>
+#include <linux/pid_namespace.h>
+
+void rdma_restrack_init(struct rdma_restrack_root *res)
+{
+	init_rwsem(&res->rwsem);
+}
+
+void rdma_restrack_clean(struct rdma_restrack_root *res)
+{
+	WARN_ON_ONCE(!hash_empty(res->hash));
+}
+
+int rdma_restrack_count(struct rdma_restrack_root *res,
+			enum rdma_restrack_type type,
+			struct pid_namespace *ns)
+{
+	struct rdma_restrack_entry *e;
+	u32 cnt = 0;
+
+	down_read(&res->rwsem);
+	hash_for_each_possible(res->hash, e, node, type) {
+		if (ns == &init_pid_ns ||
+		    (!rdma_is_kernel_res(e) &&
+		     ns == task_active_pid_ns(e->task)))
+			cnt++;
+	}
+	up_read(&res->rwsem);
+	return cnt;
+}
+EXPORT_SYMBOL(rdma_restrack_count);
+
+static void set_kern_name(struct rdma_restrack_entry *res)
+{
+	enum rdma_restrack_type type = res->type;
+	struct ib_qp *qp;
+
+	if (type != RDMA_RESTRACK_QP)
+		/* PD and CQ types already have this name embedded in */
+		return;
+
+	qp = container_of(res, struct ib_qp, res);
+	if (!qp->pd) {
+		WARN_ONCE(true, "XRC QPs are not supported\n");
+		/* Survive, despite the programmer's error */
+		res->kern_name = " ";
+		return;
+	}
+
+	res->kern_name = qp->pd->res.kern_name;
+}
+
+static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
+{
+	enum rdma_restrack_type type = res->type;
+	struct ib_device *dev;
+	struct ib_xrcd *xrcd;
+	struct ib_pd *pd;
+	struct ib_cq *cq;
+	struct ib_qp *qp;
+
+	switch (type) {
+	case RDMA_RESTRACK_PD:
+		pd = container_of(res, struct ib_pd, res);
+		dev = pd->device;
+		break;
+	case RDMA_RESTRACK_CQ:
+		cq = container_of(res, struct ib_cq, res);
+		dev = cq->device;
+		break;
+	case RDMA_RESTRACK_QP:
+		qp = container_of(res, struct ib_qp, res);
+		dev = qp->device;
+		break;
+	case RDMA_RESTRACK_XRCD:
+		xrcd = container_of(res, struct ib_xrcd, res);
+		dev = xrcd->device;
+		break;
+	default:
+		WARN_ONCE(true, "Wrong resource tracking type %u\n", type);
+		return NULL;
+	}
+
+	return dev;
+}
+
+void rdma_restrack_add(struct rdma_restrack_entry *res)
+{
+	struct ib_device *dev = res_to_dev(res);
+
+	if (!dev)
+		return;
+
+	if (!uaccess_kernel()) {
+		get_task_struct(current);
+		res->task = current;
+		res->kern_name = NULL;
+	} else {
+		set_kern_name(res);
+		res->task = NULL;
+	}
+
+	kref_init(&res->kref);
+	init_completion(&res->comp);
+	res->valid = true;
+
+	down_write(&dev->res.rwsem);
+	hash_add(dev->res.hash, &res->node, res->type);
+	up_write(&dev->res.rwsem);
+}
+EXPORT_SYMBOL(rdma_restrack_add);
+
+int __must_check rdma_restrack_get(struct rdma_restrack_entry *res)
+{
+	return kref_get_unless_zero(&res->kref);
+}
+EXPORT_SYMBOL(rdma_restrack_get);
+
+static void restrack_release(struct kref *kref)
+{
+	struct rdma_restrack_entry *res;
+
+	res = container_of(kref, struct rdma_restrack_entry, kref);
+	complete(&res->comp);
+}
+
+int rdma_restrack_put(struct rdma_restrack_entry *res)
+{
+	return kref_put(&res->kref, restrack_release);
+}
+EXPORT_SYMBOL(rdma_restrack_put);
+
+void rdma_restrack_del(struct rdma_restrack_entry *res)
+{
+	struct ib_device *dev;
+
+	if (!res->valid)
+		return;
+
+	dev = res_to_dev(res);
+	if (!dev)
+		return;
+
+	down_read(&dev->res.rwsem);
+	rdma_restrack_put(res);
+	up_read(&dev->res.rwsem);
+
+	wait_for_completion(&res->comp);
+
+	down_write(&dev->res.rwsem);
+	hash_del(&res->node);
+	res->valid = false;
+	if (res->task)
+		put_task_struct(res->task);
+	up_write(&dev->res.rwsem);
+}
+EXPORT_SYMBOL(rdma_restrack_del);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f9cabb1b670e..65af66b477cb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -63,6 +63,7 @@
 #include <linux/uaccess.h>
 #include <linux/cgroup_rdma.h>
 #include <uapi/rdma/ib_user_verbs.h>
+#include <rdma/restrack.h>
 
 #define IB_FW_VERSION_NAME_MAX	ETHTOOL_FWVERS_LEN
 
@@ -1530,6 +1531,7 @@ struct ib_pd {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct ib_mr	       *__internal_mr;
+	struct rdma_restrack_entry res;
 };
 
 struct ib_xrcd {
@@ -1539,6 +1541,10 @@ struct ib_xrcd {
 
 	struct mutex		tgt_qp_mutex;
 	struct list_head	tgt_qp_list;
+	/*
+	 * Implementation details of the RDMA core, don't use in drivers:
+	 */
+	struct rdma_restrack_entry res;
 };
 
 struct ib_ah {
@@ -1570,6 +1576,10 @@ struct ib_cq {
 		struct irq_poll		iop;
 		struct work_struct	work;
 	};
+	/*
+	 * Implementation details of the RDMA core, don't use in drivers:
+	 */
+	struct rdma_restrack_entry res;
 };
 
 struct ib_srq {
@@ -1746,6 +1756,11 @@ struct ib_qp {
 	struct ib_rwq_ind_table *rwq_ind_tbl;
 	struct ib_qp_security  *qp_sec;
 	u8			port;
+
+	/*
+	 * Implementation details of the RDMA core, don't use in drivers:
+	 */
+	struct rdma_restrack_entry     res;
 };
 
 struct ib_mr {
@@ -2352,6 +2367,10 @@ struct ib_device {
 #endif
 
 	u32                          index;
+	/*
+	 * Implementation details of the RDMA core, don't use in drivers
+	 */
+	struct rdma_restrack_root     res;
 
 	/**
 	 * The following mandatory functions are used only at device
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
new file mode 100644
index 000000000000..c2d81167c858
--- /dev/null
+++ b/include/rdma/restrack.h
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) */
+/*
+ * Copyright (c) 2017-2018 Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef _RDMA_RESTRACK_H_
+#define _RDMA_RESTRACK_H_
+
+#include <linux/typecheck.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/kref.h>
+#include <linux/completion.h>
+
+/**
+ * enum rdma_restrack_type - HW objects to track
+ */
+enum rdma_restrack_type {
+	/**
+	 * @RDMA_RESTRACK_PD: Protection domain (PD)
+	 */
+	RDMA_RESTRACK_PD,
+	/**
+	 * @RDMA_RESTRACK_CQ: Completion queue (CQ)
+	 */
+	RDMA_RESTRACK_CQ,
+	/**
+	 * @RDMA_RESTRACK_QP: Queue pair (QP)
+	 */
+	RDMA_RESTRACK_QP,
+	/**
+	 * @RDMA_RESTRACK_XRCD: XRC domain (XRCD)
+	 */
+	RDMA_RESTRACK_XRCD,
+	/**
+	 * @RDMA_RESTRACK_MAX: Last entry, used for array declarations
+	 */
+	RDMA_RESTRACK_MAX
+};
+
+#define RDMA_RESTRACK_HASH_BITS	8
+/**
+ * struct rdma_restrack_root - main resource tracking management
+ * entity, per-device
+ */
+struct rdma_restrack_root {
+	/**
+	 * @rwsem: Read/write lock to protect lists
+	 */
+	struct rw_semaphore	rwsem;
+	/**
+	 * @hash: global database for all resources per-device
+	 */
+	DECLARE_HASHTABLE(hash, RDMA_RESTRACK_HASH_BITS);
+};
+
+/**
+ * struct rdma_restrack_entry - metadata per-entry
+ */
+struct rdma_restrack_entry {
+	/**
+	 * @valid: validity indicator
+	 *
+	 * The entries are filled during rdma_restrack_add and
+	 * an attempt to free them is made during rdma_restrack_del.
+	 *
+	 * As an example for that, see mlx5 QPs with type MLX5_IB_QPT_HW_GSI
+	 */
+	bool			valid;
+	/**
+	 * @kref: Protect destroy of the resource
+	 */
+	struct kref		kref;
+	/**
+	 * @comp: Signal that all consumers of the resource have completed their work
+	 */
+	struct completion	comp;
+	/**
+	 * @task: owner of resource tracking entity
+	 *
+	 * There are two types of entities: created by user and created
+	 * by kernel.
+	 *
+	 * This is relevant for the entities created by users.
+	 * For the entities created by kernel, this pointer will be NULL.
+	 */
+	struct task_struct	*task;
+	/**
+	 * @kern_name: name of owner for the kernel created entities.
+	 */
+	const char		*kern_name;
+	/**
+	 * @node: hash table entry
+	 */
+	struct hlist_node	node;
+	/**
+	 * @type: various objects in restrack database
+	 */
+	enum rdma_restrack_type	type;
+};
+
+/**
+ * rdma_restrack_init() - initialize resource tracking
+ * @res:  resource tracking root
+ */
+void rdma_restrack_init(struct rdma_restrack_root *res);
+
+/**
+ * rdma_restrack_clean() - clean resource tracking
+ * @res:  resource tracking root
+ */
+void rdma_restrack_clean(struct rdma_restrack_root *res);
+
+/**
+ * rdma_restrack_count() - the current usage of specific object
+ * @res:  resource entry
+ * @type: actual type of object to operate
+ * @ns:   PID namespace
+ */
+int rdma_restrack_count(struct rdma_restrack_root *res,
+			enum rdma_restrack_type type,
+			struct pid_namespace *ns);
+
+/**
+ * rdma_restrack_add() - add object to the resource tracking database
+ * @res:  resource entry
+ */
+void rdma_restrack_add(struct rdma_restrack_entry *res);
+
+/**
+ * rdma_restrack_del() - delete object from the resource tracking database
+ * @res:  resource entry
+ *
+ */
+void rdma_restrack_del(struct rdma_restrack_entry *res);
+
+/**
+ * rdma_is_kernel_res() - check the owner of resource
+ * @res:  resource entry
+ */
+static inline bool rdma_is_kernel_res(struct rdma_restrack_entry *res)
+{
+	return !res->task;
+}
+
+/**
+ * rdma_restrack_get() - take a reference to protect the resource from release
+ * @res:  resource entry
+ */
+int __must_check rdma_restrack_get(struct rdma_restrack_entry *res);
+
+/**
+ * rdma_restrack_put() - release resource
+ * @res:  resource entry
+ */
+int rdma_restrack_put(struct rdma_restrack_entry *res);
+#endif /* _RDMA_RESTRACK_H_ */
-- 
2.16.1



* [PATCH rdma-next v7 4/8] RDMA/core: Add resource tracking for create and destroy QPs
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Track create and destroy operations of QP objects.

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h  | 27 +++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_cmd.c |  3 +--
 drivers/infiniband/core/verbs.c      |  6 ++----
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 2b1372da708a..3dcacf220e5e 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -301,4 +301,31 @@ struct ib_device *ib_device_get_by_index(u32 ifindex);
 /* RDMA device netlink */
 void nldev_init(void);
 void nldev_exit(void);
+
+static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
+					  struct ib_pd *pd,
+					  struct ib_qp_init_attr *attr,
+					  struct ib_udata *udata)
+{
+	struct ib_qp *qp;
+
+	qp = dev->create_qp(pd, attr, udata);
+	if (!IS_ERR(qp)) {
+		qp->device = dev;
+		qp->pd	   = pd;
+		/*
+		 * We don't track XRC QPs for now, because they don't have PD
+		 * and more importantly they are created internally by the driver,
+		 * see mlx5 create_dev_resources() as an example.
+		 */
+		if (attr->qp_type < IB_QPT_XRC_INI) {
+			qp->res.type = RDMA_RESTRACK_QP;
+			rdma_restrack_add(&qp->res);
+		} else {
+			qp->res.valid = false;
+		}
+	}
+
+	return qp;
+}
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 4ddd61d90507..825325c764a1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1514,7 +1514,7 @@ static int create_qp(struct ib_uverbs_file *file,
 	if (cmd->qp_type == IB_QPT_XRC_TGT)
 		qp = ib_create_qp(pd, &attr);
 	else
-		qp = device->create_qp(pd, &attr, uhw);
+		qp = _ib_create_qp(device, pd, &attr, uhw);
 
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
@@ -1527,7 +1527,6 @@ static int create_qp(struct ib_uverbs_file *file,
 			goto err_cb;
 
 		qp->real_qp	  = qp;
-		qp->device	  = device;
 		qp->pd		  = pd;
 		qp->send_cq	  = attr.send_cq;
 		qp->recv_cq	  = attr.recv_cq;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c2b347f6e8a2..c3628a437403 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -844,7 +844,6 @@ static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
 
 	qp->event_handler = __ib_shared_qp_event_handler;
 	qp->qp_context = qp;
-	qp->pd = NULL;
 	qp->send_cq = qp->recv_cq = NULL;
 	qp->srq = NULL;
 	qp->xrcd = qp_init_attr->xrcd;
@@ -882,7 +881,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	if (qp_init_attr->cap.max_rdma_ctxs)
 		rdma_rw_init_qp(device, qp_init_attr);
 
-	qp = device->create_qp(pd, qp_init_attr, NULL);
+	qp = _ib_create_qp(device, pd, qp_init_attr, NULL);
 	if (IS_ERR(qp))
 		return qp;
 
@@ -892,7 +891,6 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 		return ERR_PTR(ret);
 	}
 
-	qp->device     = device;
 	qp->real_qp    = qp;
 	qp->uobject    = NULL;
 	qp->qp_type    = qp_init_attr->qp_type;
@@ -922,7 +920,6 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 			atomic_inc(&qp_init_attr->srq->usecnt);
 	}
 
-	qp->pd	    = pd;
 	qp->send_cq = qp_init_attr->send_cq;
 	qp->xrcd    = NULL;
 
@@ -1538,6 +1535,7 @@ int ib_destroy_qp(struct ib_qp *qp)
 	if (!qp->uobject)
 		rdma_rw_cleanup_mrs(qp);
 
+	rdma_restrack_del(&qp->res);
 	ret = qp->device->destroy_qp(qp);
 	if (!ret) {
 		if (pd)
-- 
2.16.1



* [PATCH rdma-next v7 5/8] RDMA/core: Add resource tracking for create and destroy CQs
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Track create and destroy operations of CQ objects.

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/cq.c               | 6 ++++++
 drivers/infiniband/core/uverbs_cmd.c       | 2 ++
 drivers/infiniband/core/uverbs_std_types.c | 3 +++
 drivers/infiniband/core/verbs.c            | 3 +++
 4 files changed, 14 insertions(+)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index d99565dbd12f..bc79ca8215d7 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -159,6 +159,10 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 	if (!cq->wc)
 		goto out_destroy_cq;
 
+	cq->res.type = RDMA_RESTRACK_CQ;
+	cq->res.kern_name = caller;
+	rdma_restrack_add(&cq->res);
+
 	switch (cq->poll_ctx) {
 	case IB_POLL_DIRECT:
 		cq->comp_handler = ib_cq_completion_direct;
@@ -183,6 +187,7 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 
 out_free_wc:
 	kfree(cq->wc);
+	rdma_restrack_del(&cq->res);
 out_destroy_cq:
 	cq->device->destroy_cq(cq);
 	return ERR_PTR(ret);
@@ -214,6 +219,7 @@ void ib_free_cq(struct ib_cq *cq)
 	}
 
 	kfree(cq->wc);
+	rdma_restrack_del(&cq->res);
 	ret = cq->device->destroy_cq(cq);
 	WARN_ON_ONCE(ret);
 }
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 825325c764a1..3e95acd29de7 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1033,6 +1033,8 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 		goto err_cb;
 
 	uobj_alloc_commit(&obj->uobject);
+	cq->res.type = RDMA_RESTRACK_CQ;
+	rdma_restrack_add(&cq->res);
 
 	return obj;
 
diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index c3ee5d9b336d..b571176babbe 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -35,6 +35,7 @@
 #include <rdma/ib_verbs.h>
 #include <linux/bug.h>
 #include <linux/file.h>
+#include <rdma/restrack.h>
 #include "rdma_core.h"
 #include "uverbs.h"
 
@@ -319,6 +320,8 @@ static int uverbs_create_cq_handler(struct ib_device *ib_dev,
 	obj->uobject.object = cq;
 	obj->uobject.user_handle = user_handle;
 	atomic_set(&cq->usecnt, 0);
+	cq->res.type = RDMA_RESTRACK_CQ;
+	rdma_restrack_add(&cq->res);
 
 	ret = uverbs_copy_to(attrs, CREATE_CQ_RESP_CQE, &cq->cqe);
 	if (ret)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c3628a437403..983b49ffb8d5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1578,6 +1578,8 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
 		cq->event_handler = event_handler;
 		cq->cq_context    = cq_context;
 		atomic_set(&cq->usecnt, 0);
+		cq->res.type = RDMA_RESTRACK_CQ;
+		rdma_restrack_add(&cq->res);
 	}
 
 	return cq;
@@ -1596,6 +1598,7 @@ int ib_destroy_cq(struct ib_cq *cq)
 	if (atomic_read(&cq->usecnt))
 		return -EBUSY;
 
+	rdma_restrack_del(&cq->res);
 	return cq->device->destroy_cq(cq);
 }
 EXPORT_SYMBOL(ib_destroy_cq);
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH rdma-next v7 6/8] RDMA/core: Add resource tracking for create and destroy PDs
       [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-01-28  9:17   ` [PATCH rdma-next v7 5/8] RDMA/core: Add resource tracking for create and destroy CQs Leon Romanovsky
@ 2018-01-28  9:17   ` Leon Romanovsky
       [not found]     ` <20180128091725.13103-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2018-01-28  9:17   ` [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization Leon Romanovsky
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Track create and destroy operations of PD objects.

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 2 ++
 drivers/infiniband/core/verbs.c      | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e95acd29de7..256934d1f64f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -340,6 +340,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	uobj->object = pd;
 	memset(&resp, 0, sizeof resp);
 	resp.pd_handle = uobj->id;
+	pd->res.type = RDMA_RESTRACK_PD;
+	rdma_restrack_add(&pd->res);
 
 	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
 		ret = -EFAULT;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 983b49ffb8d5..a98a3e8412f8 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -263,6 +263,10 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 		mr_access_flags |= IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE;
 	}
 
+	pd->res.type = RDMA_RESTRACK_PD;
+	pd->res.kern_name = caller;
+	rdma_restrack_add(&pd->res);
+
 	if (mr_access_flags) {
 		struct ib_mr *mr;
 
@@ -312,6 +316,7 @@ void ib_dealloc_pd(struct ib_pd *pd)
 	   requires the caller to guarantee we can't race here. */
 	WARN_ON(atomic_read(&pd->usecnt));
 
+	rdma_restrack_del(&pd->res);
 	/* Making delalloc_pd a void return is a WIP, no driver should return
 	   an error here. */
 	ret = pd->device->dealloc_pd(pd);
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization
       [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-01-28  9:17   ` [PATCH rdma-next v7 6/8] RDMA/core: Add resource tracking for create and destroy PDs Leon Romanovsky
@ 2018-01-28  9:17   ` Leon Romanovsky
       [not found]     ` <20180128091725.13103-8-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2018-01-28  9:17   ` [PATCH rdma-next v7 8/8] RDMA/nldev: Provide detailed QP information Leon Romanovsky
  2018-01-28 21:05   ` [PATCH rdma-next v7 0/8] RDMA resource tracking Jason Gunthorpe
  8 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Export the global device utilization through the netlink
interface for rdmatool, the main user of the RDMA nldev interface.

Provide both dumpit and doit callbacks.

As an example, possible output from rdmatool for a system with
five Mellanox cards:

$ rdma res
1: mlx5_0: qp 4 cq 5 pd 3
2: mlx5_1: qp 4 cq 5 pd 3
3: mlx5_2: qp 4 cq 5 pd 3
4: mlx5_3: qp 2 cq 3 pd 2
5: mlx5_4: qp 4 cq 5 pd 3
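
For scripting on top of this summary, lines of the shape shown above can be parsed with a small sketch like the following (the field layout is an assumption based on the sample output; real rdmatool output may differ between versions):

```python
# Toy parser for "rdma res" summary lines of the form:
#   "1: mlx5_0: qp 4 cq 5 pd 3"
def parse_res_summary(lines):
    devices = {}
    for line in lines:
        # split off the index and device name, keep the counter list
        _idx, dev, rest = line.split(": ", 2)
        counters = rest.split()
        # counters alternate: name value name value ...
        devices[dev] = {counters[i]: int(counters[i + 1])
                        for i in range(0, len(counters), 2)}
    return devices

sample = [
    "1: mlx5_0: qp 4 cq 5 pd 3",
    "4: mlx5_3: qp 2 cq 3 pd 2",
]
res = parse_res_summary(sample)
```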

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/nldev.c  | 151 +++++++++++++++++++++++++++++++++++++++
 include/uapi/rdma/rdma_netlink.h |  10 +++
 2 files changed, 161 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 5d790c507c7e..7bbd88a6b6a0 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -31,6 +31,8 @@
  */
 
 #include <linux/module.h>
+#include <linux/pid.h>
+#include <linux/pid_namespace.h>
 #include <net/netlink.h>
 #include <rdma/rdma_netlink.h>
 
@@ -52,6 +54,11 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_PORT_STATE]	= { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_PORT_PHYS_STATE] = { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_DEV_NODE_TYPE] = { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_SUMMARY]	= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY]	= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME] = { .type = NLA_NUL_STRING,
+					     .len = 16 },
+	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR] = { .type = NLA_U64 },
 };
 
 static int fill_nldev_handle(struct sk_buff *msg, struct ib_device *device)
@@ -134,6 +141,65 @@ static int fill_port_info(struct sk_buff *msg,
 	return 0;
 }
 
+static int fill_res_info_entry(struct sk_buff *msg,
+			       const char *name, u64 curr)
+{
+	struct nlattr *entry_attr;
+
+	entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY);
+	if (!entry_attr)
+		return -EMSGSIZE;
+
+	if (nla_put_string(msg, RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME, name))
+		goto err;
+	if (nla_put_u64_64bit(msg,
+			      RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR, curr, 0))
+		goto err;
+
+	nla_nest_end(msg, entry_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, entry_attr);
+	return -EMSGSIZE;
+}
+
+static int fill_res_info(struct sk_buff *msg, struct ib_device *device)
+{
+	static const char *names[RDMA_RESTRACK_MAX] = {
+		[RDMA_RESTRACK_PD] = "pd",
+		[RDMA_RESTRACK_CQ] = "cq",
+		[RDMA_RESTRACK_QP] = "qp",
+	};
+
+	struct rdma_restrack_root *res = &device->res;
+	struct nlattr *table_attr;
+	int ret, i, curr;
+
+	if (fill_nldev_handle(msg, device))
+		return -EMSGSIZE;
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_SUMMARY);
+	if (!table_attr)
+		return -EMSGSIZE;
+
+	for (i = 0; i < RDMA_RESTRACK_MAX; i++) {
+		if (!names[i])
+			continue;
+		curr = rdma_restrack_count(res, i, task_active_pid_ns(current));
+		ret = fill_res_info_entry(msg, names[i], curr);
+		if (ret)
+			goto err;
+	}
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, table_attr);
+	return ret;
+}
+
 static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct netlink_ext_ack *extack)
 {
@@ -329,6 +395,87 @@ static int nldev_port_get_dumpit(struct sk_buff *skb,
 	return skb->len;
 }
 
+static int nldev_res_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct ib_device *device;
+	struct sk_buff *msg;
+	u32 index;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (ret || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(index);
+	if (!device)
+		return -EINVAL;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		goto err;
+
+	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_RES_GET),
+			0, 0);
+
+	ret = fill_res_info(msg, device);
+	if (ret)
+		goto err_free;
+
+	nlmsg_end(msg, nlh);
+	put_device(&device->dev);
+	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_free:
+	nlmsg_free(msg);
+err:
+	put_device(&device->dev);
+	return ret;
+}
+
+static int _nldev_res_get_dumpit(struct ib_device *device,
+				 struct sk_buff *skb,
+				 struct netlink_callback *cb,
+				 unsigned int idx)
+{
+	int start = cb->args[0];
+	struct nlmsghdr *nlh;
+
+	if (idx < start)
+		return 0;
+
+	nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_RES_GET),
+			0, NLM_F_MULTI);
+
+	if (fill_res_info(skb, device)) {
+		nlmsg_cancel(skb, nlh);
+		goto out;
+	}
+
+	nlmsg_end(skb, nlh);
+
+	idx++;
+
+out:
+	cb->args[0] = idx;
+	return skb->len;
+}
+
+static int nldev_res_get_dumpit(struct sk_buff *skb,
+				struct netlink_callback *cb)
+{
+	/*
+	 * There is no need to take a lock, because
+	 * we rely on ib_core's lists_rwsem
+	 */
+	return ib_enum_all_devs(_nldev_res_get_dumpit, skb, cb);
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -338,6 +485,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.doit = nldev_port_get_doit,
 		.dump = nldev_port_get_dumpit,
 	},
+	[RDMA_NLDEV_CMD_RES_GET] = {
+		.doit = nldev_res_get_doit,
+		.dump = nldev_res_get_dumpit,
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index cc002e316d09..e0f5cdc81541 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -236,6 +236,11 @@ enum rdma_nldev_command {
 	RDMA_NLDEV_CMD_PORT_NEW,
 	RDMA_NLDEV_CMD_PORT_DEL,
 
+	RDMA_NLDEV_CMD_RES_GET, /* can dump */
+	RDMA_NLDEV_CMD_RES_SET,
+	RDMA_NLDEV_CMD_RES_NEW,
+	RDMA_NLDEV_CMD_RES_DEL,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -303,6 +308,11 @@ enum rdma_nldev_attr {
 
 	RDMA_NLDEV_ATTR_DEV_NODE_TYPE,		/* u8 */
 
+	RDMA_NLDEV_ATTR_RES_SUMMARY,		/* nested table */
+	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY,	/* nested table */
+	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME,	/* string */
+	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR,	/* u64 */
+
 	RDMA_NLDEV_ATTR_MAX
 };
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH rdma-next v7 8/8] RDMA/nldev: Provide detailed QP information
       [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-01-28  9:17   ` [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization Leon Romanovsky
@ 2018-01-28  9:17   ` Leon Romanovsky
  2018-01-28 21:05   ` [PATCH rdma-next v7 0/8] RDMA resource tracking Jason Gunthorpe
  8 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-28  9:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement the RDMA nldev netlink interface to provide
detailed QP information.

Currently only the dumpit variant is implemented.

Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/nldev.c  | 227 +++++++++++++++++++++++++++++++++++++++
 include/uapi/rdma/rdma_netlink.h |  45 ++++++++
 2 files changed, 272 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 7bbd88a6b6a0..421d5fa6a817 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -59,6 +59,18 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME] = { .type = NLA_NUL_STRING,
 					     .len = 16 },
 	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR] = { .type = NLA_U64 },
+	[RDMA_NLDEV_ATTR_RES_QP]		= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_QP_ENTRY]		= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_RES_LQPN]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_RES_RQPN]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_RES_RQ_PSN]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_RES_SQ_PSN]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_RES_PATH_MIG_STATE] = { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_TYPE]		= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_STATE]		= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_RES_PID]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_RES_KERN_NAME]		= { .type = NLA_NUL_STRING,
+						    .len = TASK_COMM_LEN },
 };
 
 static int fill_nldev_handle(struct sk_buff *msg, struct ib_device *device)
@@ -200,6 +212,78 @@ static int fill_res_info(struct sk_buff *msg, struct ib_device *device)
 	return ret;
 }
 
+static int fill_res_qp_entry(struct sk_buff *msg,
+			     struct ib_qp *qp, uint32_t port)
+{
+	struct rdma_restrack_entry *res = &qp->res;
+	struct ib_qp_init_attr qp_init_attr;
+	struct nlattr *entry_attr;
+	struct ib_qp_attr qp_attr;
+	int ret;
+
+	ret = ib_query_qp(qp, &qp_attr, 0, &qp_init_attr);
+	if (ret)
+		return ret;
+
+	if (port && port != qp_attr.port_num)
+		return 0;
+
+	entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_QP_ENTRY);
+	if (!entry_attr)
+		goto out;
+
+	/* In create_qp() port is not set yet */
+	if (qp_attr.port_num &&
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, qp_attr.port_num))
+		goto err;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qp->qp_num))
+		goto err;
+	if (qp->qp_type == IB_QPT_RC || qp->qp_type == IB_QPT_UC) {
+		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_RQPN,
+				qp_attr.dest_qp_num))
+			goto err;
+		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_RQ_PSN,
+				qp_attr.rq_psn))
+			goto err;
+	}
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_SQ_PSN, qp_attr.sq_psn))
+		goto err;
+
+	if (qp->qp_type == IB_QPT_RC || qp->qp_type == IB_QPT_UC ||
+	    qp->qp_type == IB_QPT_XRC_INI || qp->qp_type == IB_QPT_XRC_TGT) {
+		if (nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_PATH_MIG_STATE,
+			       qp_attr.path_mig_state))
+			goto err;
+	}
+	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_TYPE, qp->qp_type))
+		goto err;
+	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_STATE, qp_attr.qp_state))
+		goto err;
+
+	/*
+	 * The existence of a task means that this is a user QP; the
+	 * netlink user is expected to read /proc/PID/comm to get the
+	 * task name, and res->kern_name will be NULL.
+	 */
+	if (rdma_is_kernel_res(res)) {
+		if (nla_put_string(msg, RDMA_NLDEV_ATTR_RES_KERN_NAME, res->kern_name))
+			goto err;
+	} else {
+		if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_PID, task_pid_vnr(res->task)))
+			goto err;
+	}
+
+	nla_nest_end(msg, entry_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, entry_attr);
+out:
+	return -EMSGSIZE;
+}
+
 static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct netlink_ext_ack *extack)
 {
@@ -476,6 +560,136 @@ static int nldev_res_get_dumpit(struct sk_buff *skb,
 	return ib_enum_all_devs(_nldev_res_get_dumpit, skb, cb);
 }
 
+static int nldev_res_get_qp_dumpit(struct sk_buff *skb,
+				   struct netlink_callback *cb)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct rdma_restrack_entry *res;
+	int err, ret = 0, idx = 0;
+	struct nlattr *table_attr;
+	struct ib_device *device;
+	int start = cb->args[0];
+	struct ib_qp *qp = NULL;
+	struct nlmsghdr *nlh;
+	u32 index, port = 0;
+
+	err = nlmsg_parse(cb->nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, NULL);
+	/*
+	 * Right now, we expect the device index in order to return QP
+	 * information, but it is possible to extend this code to return all
+	 * devices in one shot by checking for RDMA_NLDEV_ATTR_DEV_INDEX:
+	 * if it doesn't exist, we will iterate over all devices.
+	 *
+	 * But that is not needed for now.
+	 */
+	if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(index);
+	if (!device)
+		return -EINVAL;
+
+	/*
+	 * If no PORT_INDEX is supplied, we will return all QPs from that device
+	 */
+	if (tb[RDMA_NLDEV_ATTR_PORT_INDEX]) {
+		port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+		if (!rdma_is_port_valid(device, port)) {
+			ret = -EINVAL;
+			goto err_index;
+		}
+	}
+
+	nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_RES_QP_GET),
+			0, NLM_F_MULTI);
+
+	if (fill_nldev_handle(skb, device)) {
+		ret = -EMSGSIZE;
+		goto err;
+	}
+
+	table_attr = nla_nest_start(skb, RDMA_NLDEV_ATTR_RES_QP);
+	if (!table_attr) {
+		ret = -EMSGSIZE;
+		goto err;
+	}
+
+	down_read(&device->res.rwsem);
+	hash_for_each_possible(device->res.hash, res, node, RDMA_RESTRACK_QP) {
+		if (idx < start)
+			goto next;
+
+		if ((rdma_is_kernel_res(res) &&
+		     task_active_pid_ns(current) != &init_pid_ns) ||
+		    (!rdma_is_kernel_res(res) &&
+		     task_active_pid_ns(current) != task_active_pid_ns(res->task)))
+			/*
+			 * 1. Kernel QPs should be visible in the init namespace only
+			 * 2. Present only QPs visible in the current namespace
+			 */
+			goto next;
+
+		if (!rdma_restrack_get(res))
+			/*
+			 * The resource is being released now, but we are not
+			 * releasing the lock, so it will be freed on our
+			 * next pass, once we get the ->next pointer.
+			 */
+			goto next;
+
+		qp = container_of(res, struct ib_qp, res);
+
+		up_read(&device->res.rwsem);
+		ret = fill_res_qp_entry(skb, qp, port);
+		down_read(&device->res.rwsem);
+		/*
+		 * Put the resource back; it won't be released until
+		 * &device->res.rwsem is taken for write.
+		 */
+		rdma_restrack_put(res);
+
+		if (ret == -EMSGSIZE)
+			/*
+			 * There is a chance to optimize here.
+			 * It can be done by using list_prepare_entry
+			 * and list_for_each_entry_continue afterwards.
+			 */
+			break;
+		if (ret)
+			goto res_err;
+next:		idx++;
+	}
+	up_read(&device->res.rwsem);
+
+	nla_nest_end(skb, table_attr);
+	nlmsg_end(skb, nlh);
+	cb->args[0] = idx;
+
+	/*
+	 * No more QPs to fill, cancel the message and
+	 * return 0 to mark end of dumpit.
+	 */
+	if (!qp)
+		goto err;
+
+	put_device(&device->dev);
+	return skb->len;
+
+res_err:
+	nla_nest_cancel(skb, table_attr);
+	up_read(&device->res.rwsem);
+
+err:
+	nlmsg_cancel(skb, nlh);
+
+err_index:
+	put_device(&device->dev);
+	return ret;
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -489,6 +703,19 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.doit = nldev_res_get_doit,
 		.dump = nldev_res_get_dumpit,
 	},
+	[RDMA_NLDEV_CMD_RES_QP_GET] = {
+		.dump = nldev_res_get_qp_dumpit,
+		/*
+		 * .doit is not implemented yet for two reasons:
+		 * 1. It is not needed yet.
+		 * 2. It needs an identifier; while that is easy for
+		 * QPs (device index + port index + LQPN), it is not
+		 * the case for the rest of the resources (PD and CQ).
+		 * Since it is better to provide a similar interface for
+		 * all resources, let's wait until the other resources
+		 * are implemented too.
+		 */
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index e0f5cdc81541..23bef4015982 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -241,6 +241,11 @@ enum rdma_nldev_command {
 	RDMA_NLDEV_CMD_RES_NEW,
 	RDMA_NLDEV_CMD_RES_DEL,
 
+	RDMA_NLDEV_CMD_RES_QP_GET, /* can dump */
+	RDMA_NLDEV_CMD_RES_QP_SET,
+	RDMA_NLDEV_CMD_RES_QP_NEW,
+	RDMA_NLDEV_CMD_RES_QP_DEL,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -313,6 +318,46 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME,	/* string */
 	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR,	/* u64 */
 
+	RDMA_NLDEV_ATTR_RES_QP,			/* nested table */
+	RDMA_NLDEV_ATTR_RES_QP_ENTRY,		/* nested table */
+	/*
+	 * Local QPN
+	 */
+	RDMA_NLDEV_ATTR_RES_LQPN,		/* u32 */
+	/*
+	 * Remote QPN;
+	 * applicable for RC and UC only, see IBTA 11.2.5.3 QUERY QUEUE PAIR
+	 */
+	RDMA_NLDEV_ATTR_RES_RQPN,		/* u32 */
+	/*
+	 * Receive Queue PSN;
+	 * applicable for RC and UC only, see IBTA 11.2.5.3 QUERY QUEUE PAIR
+	 */
+	RDMA_NLDEV_ATTR_RES_RQ_PSN,		/* u32 */
+	/*
+	 * Send Queue PSN
+	 */
+	RDMA_NLDEV_ATTR_RES_SQ_PSN,		/* u32 */
+	RDMA_NLDEV_ATTR_RES_PATH_MIG_STATE,	/* u8 */
+	/*
+	 * QP types as visible to RDMA/core; the reserved QP types
+	 * are not exported through this interface.
+	 */
+	RDMA_NLDEV_ATTR_RES_TYPE,		/* u8 */
+	RDMA_NLDEV_ATTR_RES_STATE,		/* u8 */
+	/*
+	 * Process ID of the process that created the object;
+	 * for objects of kernel origin, the PID won't exist.
+	 */
+	RDMA_NLDEV_ATTR_RES_PID,		/* u32 */
+	/*
+	 * The name of the process that created the resource.
+	 * It will exist only for kernel objects.
+	 * For user-created objects, the user is supposed
+	 * to read the /proc/PID/comm file.
+	 */
+	RDMA_NLDEV_ATTR_RES_KERN_NAME,		/* string */
+
 	RDMA_NLDEV_ATTR_MAX
 };
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization
       [not found]     ` <20180128091725.13103-8-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2018-01-28 20:45       ` Jason Gunthorpe
       [not found]         ` <20180128204513.GH23869-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-28 20:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise, Leon Romanovsky

On Sun, Jan 28, 2018 at 11:17:24AM +0200, Leon Romanovsky wrote:

> @@ -52,6 +54,11 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
>  	[RDMA_NLDEV_ATTR_PORT_STATE]	= { .type = NLA_U8 },
>  	[RDMA_NLDEV_ATTR_PORT_PHYS_STATE] = { .type = NLA_U8 },
>  	[RDMA_NLDEV_ATTR_DEV_NODE_TYPE] = { .type = NLA_U8 },
> +	[RDMA_NLDEV_ATTR_RES_SUMMARY]	= { .type = NLA_NESTED },
> +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY]	= { .type = NLA_NESTED },
> +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME] = { .type = NLA_NUL_STRING,
> +					     .len = 16 },
> +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR] = { .type = NLA_U64 },

nla_policy is only used during kernel parsing, it shouldn't have
anything the kernel will not accept as input, right? ie it should omit
things that are output only?
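
For readers following along, the nested attribute layout in question can be sketched in a few lines of Python; the attribute IDs below are illustrative placeholders, not the real RDMA_NLDEV_ATTR_* values from the UAPI header:

```python
import struct

# Minimal sketch of netlink attribute (TLV) nesting:
# each attribute is a u16 length (incl. 4-byte header), u16 type,
# then the payload padded to a 4-byte boundary.
NLA_HDRLEN = 4

def nla_put(attr_type, payload):
    length = NLA_HDRLEN + len(payload)
    pad = (4 - length % 4) % 4
    return struct.pack("<HH", length, attr_type) + payload + b"\x00" * pad

def nla_nest(attr_type, children):
    # a nested attribute is just an attribute whose payload
    # is a concatenation of child attributes
    return nla_put(attr_type, b"".join(children))

# RES_SUMMARY { RES_SUMMARY_ENTRY { NAME, CURR } }
entry = nla_nest(2, [
    nla_put(3, b"qp\x00"),             # NUL-terminated string name
    nla_put(4, struct.pack("<Q", 4)),  # u64 current counter
])
summary = nla_nest(1, [entry])
```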

>  static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
>  	[RDMA_NLDEV_CMD_GET] = {
>  		.doit = nldev_get_doit,
> @@ -338,6 +485,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
>  		.doit = nldev_port_get_doit,
>  		.dump = nldev_port_get_dumpit,
>  	},
> +	[RDMA_NLDEV_CMD_RES_GET] = {
> +		.doit = nldev_res_get_doit,
> +		.dump = nldev_res_get_dumpit,
> +	},
>  };
>  
>  void __init nldev_init(void)
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index cc002e316d09..e0f5cdc81541 100644
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -236,6 +236,11 @@ enum rdma_nldev_command {
>  	RDMA_NLDEV_CMD_PORT_NEW,
>  	RDMA_NLDEV_CMD_PORT_DEL,
>  
> +	RDMA_NLDEV_CMD_RES_GET, /* can dump */
> +	RDMA_NLDEV_CMD_RES_SET,
> +	RDMA_NLDEV_CMD_RES_NEW,
> +	RDMA_NLDEV_CMD_RES_DEL,

Confused why all thse have get/set/new/del tuples and then don't
implement all of them. Is there some reason for this?

AFAIK netlink doesn't have any rules for numbering get/set/new/del, so
why can't we just add when we need?

> @@ -303,6 +308,11 @@ enum rdma_nldev_attr {
>  
>  	RDMA_NLDEV_ATTR_DEV_NODE_TYPE,		/* u8 */
>  
> +	RDMA_NLDEV_ATTR_RES_SUMMARY,		/* nested table */
> +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY,	/* nested table */
> +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME,	/* string */
> +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR,	/* u64 */

Had to look it up to figure out what CURR was..

ENTRY_NUM_OBJECTS ?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 6/8] RDMA/core: Add resource tracking for create and destroy PDs
       [not found]     ` <20180128091725.13103-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2018-01-28 20:48       ` Jason Gunthorpe
       [not found]         ` <20180128204858.GI23869-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-28 20:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise, Leon Romanovsky

On Sun, Jan 28, 2018 at 11:17:23AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Track create and destroy operations of PD objects.
> 
> Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>  drivers/infiniband/core/uverbs_cmd.c | 2 ++
>  drivers/infiniband/core/verbs.c      | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3e95acd29de7..256934d1f64f 100644
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -340,6 +340,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
>  	uobj->object = pd;
>  	memset(&resp, 0, sizeof resp);
>  	resp.pd_handle = uobj->id;
> +	pd->res.type = RDMA_RESTRACK_PD;
> +	rdma_restrack_add(&pd->res);

This pattern happens a lot, any reason not to add type to the
signature of rdma_restrack_add ?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 3/8] RDMA/restrack: Add general infrastructure to track RDMA resources
       [not found]     ` <20180128091725.13103-4-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2018-01-28 21:03       ` Jason Gunthorpe
       [not found]         ` <20180128210350.GJ23869-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-28 21:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise, Leon Romanovsky

On Sun, Jan 28, 2018 at 11:17:20AM +0200, Leon Romanovsky wrote:
> +int __must_check rdma_restrack_get(struct rdma_restrack_entry *res)
> +{
> +	return kref_get_unless_zero(&res->kref);
> +}
> +EXPORT_SYMBOL(rdma_restrack_get);
> +
> +static void restrack_release(struct kref *kref)
> +{
> +	struct rdma_restrack_entry *res;
> +
> +	res = container_of(kref, struct rdma_restrack_entry, kref);
> +	complete(&res->comp);
> +}
> +
> +int rdma_restrack_put(struct rdma_restrack_entry *res)
> +{
> +	return kref_put(&res->kref, restrack_release);
> +}
> +EXPORT_SYMBOL(rdma_restrack_put);
> +
> +void rdma_restrack_del(struct rdma_restrack_entry *res)
> +{
> +	struct ib_device *dev;
> +
> +	if (!res->valid)
> +		return;
> +
> +	dev = res_to_dev(res);
> +	if (!dev)
> +		return;
> +
> +	down_read(&dev->res.rwsem);
> +	rdma_restrack_put(res);
> +	up_read(&dev->res.rwsem);

I can't see why this lock is necessary, the underlying kref is already
atomic.

This locking seems fine, can't see any problem with it.

But I still hate the readability of the kref-as-not-a-kref approach.

Now that you've written it out, it is clear this is actually open
coding a rw_semaphore??

I think lockdep will be okay with this due to the trylock?

It saves a bit of memory compared to a kref + completion, and has
better clarity:

static inline int __must_check rdma_restrack_get(struct rdma_restrack_entry *res)
{
	return down_read_trylock(&res->rwsem);
}

static inline int rdma_restrack_put(struct rdma_restrack_entry *res)
{
	return up_read(&res->rwsem);
}

void rdma_restrack_del(struct rdma_restrack_entry *res)
{
	down_write(&res->rwsem);
	down_write(&dev->res.rwsem);
	hash_del(&res->node);
	[..]
}

No change to the read side.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-01-28  9:17   ` [PATCH rdma-next v7 8/8] RDMA/nldev: Provide detailed QP information Leon Romanovsky
@ 2018-01-28 21:05   ` Jason Gunthorpe
       [not found]     ` <20180128210520.GK23869-uk2M96/98Pc@public.gmane.org>
  8 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-28 21:05 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise

On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:

> The original goal of this series was to allow viewing connection
> (QP) information about running processes; however, I used this opportunity to
> create common infrastructure to track and report various resources. The report
> part is implemented in netlink (nldev), but smart ULPs can now create
> advanced usage models based on device utilization.
> 
> The current implementation relies on one lock per object per device, so
> creation/destruction of various objects (CQ, PD, etc.) on different or the
> same devices doesn't interfere with one another.
> 
> The data protection is performed with SRCU, whose reader-writer model
> ensures that a resource won't be destroyed until readers finish their
> work.

Well, this cover letter isn't quite right anymore.. but no matter.

My small comments aside it looks OK to me.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization
       [not found]         ` <20180128204513.GH23869-uk2M96/98Pc@public.gmane.org>
@ 2018-01-29  5:09           ` Leon Romanovsky
       [not found]             ` <20180129050922.GA1393-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-29  5:09 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise


On Sun, Jan 28, 2018 at 01:45:13PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2018 at 11:17:24AM +0200, Leon Romanovsky wrote:
>
> > @@ -52,6 +54,11 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
> >  	[RDMA_NLDEV_ATTR_PORT_STATE]	= { .type = NLA_U8 },
> >  	[RDMA_NLDEV_ATTR_PORT_PHYS_STATE] = { .type = NLA_U8 },
> >  	[RDMA_NLDEV_ATTR_DEV_NODE_TYPE] = { .type = NLA_U8 },
> > +	[RDMA_NLDEV_ATTR_RES_SUMMARY]	= { .type = NLA_NESTED },
> > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY]	= { .type = NLA_NESTED },
> > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME] = { .type = NLA_NUL_STRING,
> > +					     .len = 16 },
> > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR] = { .type = NLA_U64 },
>
> nla_policy is only used during kernel parsing, it shouldn't have
> anything the kernel will not accept as input, right? ie it should omit
> things that are output only?

The common practice for netlink code is to add an nla_policy entry for every
exposed attribute. It achieves a number of goals. First, future tools that
send those attributes as inputs will be validated against it, ensuring the
UAPI contract stays valid even though the attribute is not yet used as
input in the code. Second, once kernel developers add the implementation
that uses those attributes as inputs, they are not supposed to touch this
policy, and it will have been in sync with the output from the beginning -
easy review, easy maintenance.

>
> >  static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
> >  	[RDMA_NLDEV_CMD_GET] = {
> >  		.doit = nldev_get_doit,
> > @@ -338,6 +485,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
> >  		.doit = nldev_port_get_doit,
> >  		.dump = nldev_port_get_dumpit,
> >  	},
> > +	[RDMA_NLDEV_CMD_RES_GET] = {
> > +		.doit = nldev_res_get_doit,
> > +		.dump = nldev_res_get_dumpit,
> > +	},
> >  };
> >
> >  void __init nldev_init(void)
> > diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> > index cc002e316d09..e0f5cdc81541 100644
> > +++ b/include/uapi/rdma/rdma_netlink.h
> > @@ -236,6 +236,11 @@ enum rdma_nldev_command {
> >  	RDMA_NLDEV_CMD_PORT_NEW,
> >  	RDMA_NLDEV_CMD_PORT_DEL,
> >
> > +	RDMA_NLDEV_CMD_RES_GET, /* can dump */
> > +	RDMA_NLDEV_CMD_RES_SET,
> > +	RDMA_NLDEV_CMD_RES_NEW,
> > +	RDMA_NLDEV_CMD_RES_DEL,
>
> > Confused why all these have get/set/new/del tuples and then don't
> implement all of them. Is there some reason for this?
>
> AFAIK netlink doesn't have any rules for numbering get/set/new/del, so
> why can't we just add when we need?

Maybe it is cargo cult, but I followed devlink, which does this to match
the ip tool paradigm where every object can have those operations
(GET/SET/NEW/DEL).

The current declaration keeps all the commands for an object in one place.
Otherwise, this enum would mix commands for different objects.

>
> > @@ -303,6 +308,11 @@ enum rdma_nldev_attr {
> >
> >  	RDMA_NLDEV_ATTR_DEV_NODE_TYPE,		/* u8 */
> >
> > +	RDMA_NLDEV_ATTR_RES_SUMMARY,		/* nested table */
> > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY,	/* nested table */
> > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME,	/* string */
> > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR,	/* u64 */
>
> Had to look it up to figure out what CURR was..
>
> ENTRY_NUM_OBJECTS ?

I still haven't abandoned my idea of providing ENTRY_MAX for the objects too.

>
> Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 6/8] RDMA/core: Add resource tracking for create and destroy PDs
       [not found]         ` <20180128204858.GI23869-uk2M96/98Pc@public.gmane.org>
@ 2018-01-29  5:14           ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-29  5:14 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise


On Sun, Jan 28, 2018 at 01:48:58PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2018 at 11:17:23AM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > Track create and destroy operations of PD objects.
> >
> > Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> >  drivers/infiniband/core/uverbs_cmd.c | 2 ++
> >  drivers/infiniband/core/verbs.c      | 5 +++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> > index 3e95acd29de7..256934d1f64f 100644
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -340,6 +340,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
> >  	uobj->object = pd;
> >  	memset(&resp, 0, sizeof resp);
> >  	resp.pd_handle = uobj->id;
> > +	pd->res.type = RDMA_RESTRACK_PD;
> > +	rdma_restrack_add(&pd->res);
>
> This pattern happens a lot, any reason not to add type to the
> signature of rdma_restrack_add ?

In early versions, I used the suggested signature, but found it clumsy
and preferred this pattern, where every restrack function operates
either on the entry or on the root.

>
> Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 3/8] RDMA/restrack: Add general infrastructure to track RDMA resources
       [not found]         ` <20180128210350.GJ23869-uk2M96/98Pc@public.gmane.org>
@ 2018-01-29  5:37           ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-29  5:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise


On Sun, Jan 28, 2018 at 02:03:50PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2018 at 11:17:20AM +0200, Leon Romanovsky wrote:
> > +int __must_check rdma_restrack_get(struct rdma_restrack_entry *res)
> > +{
> > +	return kref_get_unless_zero(&res->kref);
> > +}
> > +EXPORT_SYMBOL(rdma_restrack_get);
> > +
> > +static void restrack_release(struct kref *kref)
> > +{
> > +	struct rdma_restrack_entry *res;
> > +
> > +	res = container_of(kref, struct rdma_restrack_entry, kref);
> > +	complete(&res->comp);
> > +}
> > +
> > +int rdma_restrack_put(struct rdma_restrack_entry *res)
> > +{
> > +	return kref_put(&res->kref, restrack_release);
> > +}
> > +EXPORT_SYMBOL(rdma_restrack_put);
> > +
> > +void rdma_restrack_del(struct rdma_restrack_entry *res)
> > +{
> > +	struct ib_device *dev;
> > +
> > +	if (!res->valid)
> > +		return;
> > +
> > +	dev = res_to_dev(res);
> > +	if (!dev)
> > +		return;
> > +
> > +	down_read(&dev->res.rwsem);
> > +	rdma_restrack_put(res);
> > +	up_read(&dev->res.rwsem);
>
> I can't see why this lock is necessary; the underlying kref is already
> atomic.

Just to mirror the read-side implementation.

>
> This locking seems fine, can't see any problem with it.
>
> But I still hate the readability of the kref-as-not-a-kref approach.
>

And I like it :)

> Now that you've written it out, it is clear this is actually open
> coding a rw_semaphore??
>
> I think lockdep will be okay with this due to the trylock?
>
> It saves a bit of memory compared to a kref + completion, and has
> better clarity:

Let's put the debug option aside.
It saves one "unsigned int done", and only if you didn't
enable CONFIG_RWSEM_SPIN_ON_OWNER; otherwise they are the same.

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]     ` <20180128210520.GK23869-uk2M96/98Pc@public.gmane.org>
@ 2018-01-29  5:39       ` Leon Romanovsky
  2018-01-29 20:11       ` Doug Ledford
  1 sibling, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-29  5:39 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise


On Sun, Jan 28, 2018 at 02:05:20PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
>
> > The original goal of this series was to allow viewing connection
> > (QP) information about running processes; however, I used this opportunity to
> > create common infrastructure to track and report various resources. The report
> > part is implemented in netlink (nldev), but smart ULPs can now create
> > advanced usage models based on device utilization.
> >
> > The current implementation relies on one lock per object per device, so
> > creation/destruction of various objects (CQ, PD, etc.) on different or the
> > same devices doesn't interfere with one another.
> >
> > The data protection is performed with SRCU, whose reader-writer model
> > ensures that a resource won't be destroyed until readers finish their
> > work.
>
> Well, this cover letter isn't quite right anymore.. but no matter.

So, what is next? Are we ready to merge it?

>
> My small comments aside it looks OK to me.
>
> Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization
       [not found]             ` <20180129050922.GA1393-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2018-01-29 17:59               ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-29 17:59 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise

On Mon, Jan 29, 2018 at 07:09:22AM +0200, Leon Romanovsky wrote:
> On Sun, Jan 28, 2018 at 01:45:13PM -0700, Jason Gunthorpe wrote:
> > On Sun, Jan 28, 2018 at 11:17:24AM +0200, Leon Romanovsky wrote:
> >
> > > @@ -52,6 +54,11 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
> > >  	[RDMA_NLDEV_ATTR_PORT_STATE]	= { .type = NLA_U8 },
> > >  	[RDMA_NLDEV_ATTR_PORT_PHYS_STATE] = { .type = NLA_U8 },
> > >  	[RDMA_NLDEV_ATTR_DEV_NODE_TYPE] = { .type = NLA_U8 },
> > > +	[RDMA_NLDEV_ATTR_RES_SUMMARY]	= { .type = NLA_NESTED },
> > > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY]	= { .type = NLA_NESTED },
> > > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME] = { .type = NLA_NUL_STRING,
> > > +					     .len = 16 },
> > > +	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR] = { .type = NLA_U64 },
> >
> > nla_policy is only used during kernel parsing, it shouldn't have
> > anything the kernel will not accept as input, right? ie it should omit
> > things that are output only?
> 
> The common practice for netlink code is to add an nla_policy entry for every
> exposed attribute. It achieves a number of goals. First, future tools that
> send those attributes as inputs will be validated against it, ensuring the
> UAPI contract stays valid even though the attribute is not yet used as
> input in the code. Second, once kernel developers add the implementation
> that uses those attributes as inputs, they are not supposed to touch this
> policy, and it will have been in sync with the output from the beginning -
> easy review, easy maintenance.

Okay.

> > > +	RDMA_NLDEV_CMD_RES_GET, /* can dump */
> > > +	RDMA_NLDEV_CMD_RES_SET,
> > > +	RDMA_NLDEV_CMD_RES_NEW,
> > > +	RDMA_NLDEV_CMD_RES_DEL,
> >
> > Confused why all these have get/set/new/del tuples and then don't
> > implement all of them. Is there some reason for this?
> >
> > AFAIK netlink doesn't have any rules for numbering get/set/new/del, so
> > why can't we just add when we need?
> 
> Maybe it is cargo cult, but I followed devlink, which does this to match
> the ip tool paradigm where every object can have those operations
> (GET/SET/NEW/DEL).

Yes, I think it is. I don't see this in other netlink users.

'set', 'new' and 'del' don't even make sense in this context.

Let's stop doing this and mark the others as reserved?

> > > @@ -303,6 +308,11 @@ enum rdma_nldev_attr {
> > >
> > >  	RDMA_NLDEV_ATTR_DEV_NODE_TYPE,		/* u8 */
> > >
> > > +	RDMA_NLDEV_ATTR_RES_SUMMARY,		/* nested table */
> > > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY,	/* nested table */
> > > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME,	/* string */
> > > +	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR,	/* u64 */
> >
> > Had to look it up to figure out what CURR was..
> >
> > ENTRY_NUM_OBJECTS ?
> 
> I still haven't abandoned my idea of providing ENTRY_MAX for the objects too.

Okay, but CURR is still a bit obtuse?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]     ` <20180128210520.GK23869-uk2M96/98Pc@public.gmane.org>
  2018-01-29  5:39       ` Leon Romanovsky
@ 2018-01-29 20:11       ` Doug Ledford
       [not found]         ` <1517256713.27592.241.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Doug Ledford @ 2018-01-29 20:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: RDMA mailing list, Mark Bloch, Steve Wise


On Sun, 2018-01-28 at 14:05 -0700, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
> 
> > The original goal of this series was to allow viewing connection
> > (QP) information about running processes; however, I used this opportunity to
> > create common infrastructure to track and report various resources. The report
> > part is implemented in netlink (nldev), but smart ULPs can now create
> > advanced usage models based on device utilization.
> > 
> > The current implementation relies on one lock per object per device, so
> > creation/destruction of various objects (CQ, PD, etc.) on different or the
> > same devices doesn't interfere with one another.
> > 
> > The data protection is performed with SRCU, whose reader-writer model
> > ensures that a resource won't be destroyed until readers finish their
> > work.
> 
> Well, this cover letter isn't quite right anymore.. but no matter.
> 
> My small comments aside it looks OK to me.

Likewise.  I'm happy with it at this point.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]         ` <1517256713.27592.241.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-01-30  3:34           ` Jason Gunthorpe
       [not found]             ` <20180130033436.GA17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30  3:34 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Leon Romanovsky, RDMA mailing list, Mark Bloch, Steve Wise

On Mon, Jan 29, 2018 at 03:11:53PM -0500, Doug Ledford wrote:
> On Sun, 2018-01-28 at 14:05 -0700, Jason Gunthorpe wrote:
> > On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
> > 
> > > The original goal of this series was to allow viewing connection
> > > (QP) information about running processes; however, I used this opportunity to
> > > create common infrastructure to track and report various resources. The report
> > > part is implemented in netlink (nldev), but smart ULPs can now create
> > > advanced usage models based on device utilization.
> > > 
> > > The current implementation relies on one lock per object per device, so
> > > creation/destruction of various objects (CQ, PD, etc.) on different or the
> > > same devices doesn't interfere with one another.
> > > 
> > > The data protection is performed with SRCU, whose reader-writer model
> > > ensures that a resource won't be destroyed until readers finish their
> > > work.
> > 
> > Well, this cover letter isn't quite right anymore.. but no matter.
> > 
> > My small comments aside it looks OK to me.
> 
> Likewise.  I'm happy with it at this point.

Okay, I fixed up the small things and applied the patches to for-next

Leon: Please validate I didn't screw it up. Here is the diff against
what you sent:

- Success path on the main execution flow, not under an if
- constify static structure
- Remove confusing comment about locking, ib_enum_all_devs
  obtains locks to iterate its list and rdma_restrack_count holds
  res->rwsem so everything is accounted for directly without
  trickiness
- Speeling
- Remove extra lock in rdma_restrack_del
- Restore pd = NULL in ib_create_xrc_qp. This scared me a bit, xrc is
  wonky. But ib_create_xrc_qp is only called in cases where
  rdma_restrack_add is not called, so keeping things as-they-are should
  not impact restrack. If restrack needs the pd for an XRC someday it
  should get it from qp->real_qp
- Remove SET/NEW/DEL cargo cult, please send a patch for rest?

Branch is here:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next

Still unhappy with the kref-as-not-a-kref.

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3dcacf220e5e7e..c4560d84dfaebd 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -310,21 +310,21 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	struct ib_qp *qp;
 
 	qp = dev->create_qp(pd, attr, udata);
-	if (!IS_ERR(qp)) {
-		qp->device = dev;
-		qp->pd	   = pd;
-		/*
-		 * We don't track XRC QPs for now, because they don't have PD
-		 * and more importantly they are created internaly by driver,
-		 * see mlx5 create_dev_resources() as an example.
-		 */
-		if (attr->qp_type < IB_QPT_XRC_INI) {
-			qp->res.type = RDMA_RESTRACK_QP;
-			rdma_restrack_add(&qp->res);
-		} else {
-			qp->res.valid = false;
-		}
-	}
+	if (IS_ERR(qp))
+		return qp;
+
+	qp->device = dev;
+	qp->pd = pd;
+	/*
+	 * We don't track XRC QPs for now, because they don't have PD
+	 * and more importantly they are created internaly by driver,
+	 * see mlx5 create_dev_resources() as an example.
+	 */
+	if (attr->qp_type < IB_QPT_XRC_INI) {
+		qp->res.type = RDMA_RESTRACK_QP;
+		rdma_restrack_add(&qp->res);
+	} else
+		qp->res.valid = false;
 
 	return qp;
 }
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 421d5fa6a81731..fa8655e3b3edfe 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -178,7 +178,7 @@ static int fill_res_info_entry(struct sk_buff *msg,
 
 static int fill_res_info(struct sk_buff *msg, struct ib_device *device)
 {
-	static const char *names[RDMA_RESTRACK_MAX] = {
+	static const char * const names[RDMA_RESTRACK_MAX] = {
 		[RDMA_RESTRACK_PD] = "pd",
 		[RDMA_RESTRACK_CQ] = "cq",
 		[RDMA_RESTRACK_QP] = "qp",
@@ -553,10 +553,6 @@ static int _nldev_res_get_dumpit(struct ib_device *device,
 static int nldev_res_get_dumpit(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
-	/*
-	 * There is no need to take lock, because
-	 * we are relying on ib_core's lists_rwsem
-	 */
 	return ib_enum_all_devs(_nldev_res_get_dumpit, skb, cb);
 }
 
@@ -627,8 +623,8 @@ static int nldev_res_get_qp_dumpit(struct sk_buff *skb,
 		    (!rdma_is_kernel_res(res) &&
 		     task_active_pid_ns(current) != task_active_pid_ns(res->task)))
 			/*
-			 * 1. Kernel QPs should be visible in init namsapce only
-			 * 2. Preent only QPs visible in the current namespace
+			 * 1. Kernel QPs should be visible in init namspace only
+			 * 2. Present only QPs visible in the current namespace
 			 */
 			goto next;
 
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 351b6940f6dc17..857637bf46da27 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -150,9 +150,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	if (!dev)
 		return;
 
-	down_read(&dev->res.rwsem);
 	rdma_restrack_put(res);
-	up_read(&dev->res.rwsem);
 
 	wait_for_completion(&res->comp);
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a98a3e8412f810..16ebc6372c31ab 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -849,6 +849,7 @@ static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
 
 	qp->event_handler = __ib_shared_qp_event_handler;
 	qp->qp_context = qp;
+	qp->pd = NULL;
 	qp->send_cq = qp->recv_cq = NULL;
 	qp->srq = NULL;
 	qp->xrcd = qp_init_attr->xrcd;
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 23bef401598208..17e59bec169ec0 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -237,14 +237,8 @@ enum rdma_nldev_command {
 	RDMA_NLDEV_CMD_PORT_DEL,
 
 	RDMA_NLDEV_CMD_RES_GET, /* can dump */
-	RDMA_NLDEV_CMD_RES_SET,
-	RDMA_NLDEV_CMD_RES_NEW,
-	RDMA_NLDEV_CMD_RES_DEL,
 
 	RDMA_NLDEV_CMD_RES_QP_GET, /* can dump */
-	RDMA_NLDEV_CMD_RES_QP_SET,
-	RDMA_NLDEV_CMD_RES_QP_NEW,
-	RDMA_NLDEV_CMD_RES_QP_DEL,
 
 	RDMA_NLDEV_NUM_OPS
 };

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]             ` <20180130033436.GA17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30  9:16               ` Leon Romanovsky
       [not found]                 ` <20180130091654.GD2055-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2018-01-30  9:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Mark Bloch, Steve Wise


On Mon, Jan 29, 2018 at 08:34:36PM -0700, Jason Gunthorpe wrote:
> On Mon, Jan 29, 2018 at 03:11:53PM -0500, Doug Ledford wrote:
> > On Sun, 2018-01-28 at 14:05 -0700, Jason Gunthorpe wrote:
> > > On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
> > >
> > > > The original goal of this series was to allow viewing connection
> > > > (QP) information about running processes; however, I used this opportunity to
> > > > create common infrastructure to track and report various resources. The report
> > > > part is implemented in netlink (nldev), but smart ULPs can now create
> > > > advanced usage models based on device utilization.
> > > >
> > > > The current implementation relies on one lock per object per device, so
> > > > creation/destruction of various objects (CQ, PD, etc.) on different or the
> > > > same devices doesn't interfere with one another.
> > > >
> > > > The data protection is performed with SRCU, whose reader-writer model
> > > > ensures that a resource won't be destroyed until readers finish their
> > > > work.
> > >
> > > Well, this cover letter isn't quite right anymore.. but no matter.
> > >
> > > My small comments aside it looks OK to me.
> >
> > Likewise.  I'm happy with it at this point.
>
> Okay, I fixed up the small things and applied the patches to for-next
>
> Leon: Please validate I didn't screw it up. Here is the diff against
> what you sent:
>
> - Success path on the main execution flow, not under an if
> - constify static structure
> - Remove confusing comment about locking, ib_enum_all_devs
>   obtains locks to iterate its list and rdma_restrack_count holds
>   res->rwsem so everything is accounted for directly without
>   trickiness
> - Speeling
> - Remove extra lock in rdma_restrack_del
> - Restore pd = NULL in ib_create_xrc_qp. This scared me a bit, xrc is
>   wonky. But ib_create_xrc_qp is only called in cases where
>   rdma_restrack_add is not called, so keeping things as-they-are should
>   not impact restrack. If restrack needs the pd for an XRC someday it
>   should get it from qp->real_qp
> - Remove SET/NEW/DEL cargo cult, please send a patch for rest?
>
> Branch is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
>
> Still unhappy with the kref-as-not-a-kref.

Thanks Doug and Jason for accepting it.

The "qp->pd = NULL" assignment is not needed. PD is NULL
for the XRCD case, and you are setting "qp->pd = pd" before returning
from the _ib_create_qp() call, so it is actually the same.

Everything works as expected, it passed my tests with ud/rc/xsrq
pingpongs and shutdowns during traffic.

Steve, I created the stable tag for you: rdma-next-2018-01-30

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                 ` <20180130091654.GD2055-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2018-01-30 15:21                   ` Steve Wise
  2018-01-30 15:56                     ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2018-01-30 15:21 UTC (permalink / raw)
  To: 'Leon Romanovsky', 'Jason Gunthorpe'
  Cc: 'Doug Ledford', 'RDMA mailing list',
	'Mark Bloch'

> > Branch is here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
> >
> > Still unhappy with the kref-as-not-a-kref.
> 
> Thanks Doug and Jason for accepting it.
> 
> The "qp->pd = NULL" assignment is not needed. PD is NULL
> for the XRCD and you are setting "qp->pd = pd" before returning
> from _ib_create_qp() call, so it is actually the same.
> 
> Everything works as expected, it passed my tests with ud/rc/xsrq
> pingpongs and shutdowns during traffic.
> 
> Steve, I created the stable tag for you: rdma-next-2018-01-30
> 
> Thanks

Thanks guys. Q: 4.15-rc2 kills my machine, so I've been rebasing to a later rc
to continue my development work, but the rebase fails horribly on mlx5. I
skip those failures and don't compile mlx5 because I don't need it.

Anyway, will rdma-next be rebased on 4.15.0 soon? Or does that wait until
-rc2...




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
  2018-01-30 15:21                   ` Steve Wise
@ 2018-01-30 15:56                     ` Jason Gunthorpe
       [not found]                       ` <20180130155643.GC17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 15:56 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Leon Romanovsky', 'Doug Ledford',
	'RDMA mailing list', 'Mark Bloch'

On Tue, Jan 30, 2018 at 09:21:30AM -0600, Steve Wise wrote:
> > > Branch is here:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
> > >
> > > Still unhappy with the kref-as-not-a-kref.
> > 
> > Thanks Doug and Jason for accepting it.
> > 
> > The "qp->pd = NULL" assignment is not needed. PD is NULL
> > for the XRCD and you are setting "qp->pd = pd" before returning
> > from _ib_create_qp() call, so it is actually the same.
> > 
> > Everything works as expected, it passed my tests with ud/rc/xsrq
> > pingpongs and shutdowns during traffic.
> > 
> > Steve, I created the stable tag for you: rdma-next-2018-01-30
> > 
> > Thanks
> 
> Thanks guys,  Q: 4.15-rc2 kills my machine, I've been rebasing to a later rc
> to continue my development work, but the rebase fails horribly on mlx5.  I
> skip these failures and don't compile mlx5 because I don't need it.  
> 
> Anyway, will rdma-next be rebased on 4.15.0 soon?  Or does that wait until
> -rc2...

It is annoying for all concerned but Linus has asked us not to have
needless merges. Hopefully we can build enough trust to relax on that
down the road as other trees do.

I have made a merge to v4.15 here:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                       ` <20180130155643.GC17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 16:16                         ` Steve Wise
  2018-01-30 16:33                           ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2018-01-30 16:16 UTC (permalink / raw)
  To: 'Jason Gunthorpe'
  Cc: 'Leon Romanovsky', 'Doug Ledford',
	'RDMA mailing list', 'Mark Bloch'

> > Thanks guys,  Q: 4.15-rc2 kills my machine, I've been rebasing to a later rc
> > to continue my development work, but the rebase fails horribly on mlx5.  I
> > skip these failures and don't compile mlx5 because I don't need it.
> >
> > Anyway, will rdma-next be rebased on 4.15.0 soon?  Or does that wait until
> > -rc2...
> 
> It is annoying for all concerned but Linus has asked us not to have
> needless merges. Hopefully we can build enough trust to relax on that
> down the road as other trees do.
> 
> I have made a merge to v4.15 here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

What is this a merge of exactly? I don't see the restrack stuff, for
instance.

Thanks,

Steve.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
  2018-01-30 16:16                         ` Steve Wise
@ 2018-01-30 16:33                           ` Jason Gunthorpe
       [not found]                             ` <20180130163330.GE17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 16:33 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Leon Romanovsky', 'Doug Ledford',
	'RDMA mailing list', 'Mark Bloch'

On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:

> What is this a merge of exactly? I don't see the restrack stuff, for
> instance.

Yesterday's for-next. You could merge it with the latest for-next..

I updated it.

I think we are done now, so for-next is what will be sent as the
pull-request and for-next-merged is the conflict resolution.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                             ` <20180130163330.GE17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 19:07                               ` Bart Van Assche
       [not found]                                 ` <1517339252.2589.34.camel-Sjgp3cTcYWE@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 19:07 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc, swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> 
> > What is this a merge of exactly? I don't see the restrack stuff, for
> > instance.
> 
> Yesterday's for-next. You could merge it with the latest for-next..
> 
> I updated it.
> 
> I think we are done now, so for-next is what will be sent as the
> pull-request and for-next-merged is the conflict resolution.

Hello Jason,

Although I have not yet tried to root-cause this, I want to let you know
that with your for-linus-merged branch the following error message is
reported if I try to run the srp-test software against the rdma_rxe driver:

id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory

In the kernel log I found the following:

Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)

With your for-next branch from a few days ago the same test ran fine.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                 ` <1517339252.2589.34.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-30 19:46                                   ` Jason Gunthorpe
       [not found]                                     ` <20180130194639.GJ17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 19:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > 
> > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > instance.
> > 
> > Yesterday's for-next. You could merge it with the latest for-next..
> > 
> > I updated it.
> > 
> > I think we are done now, so for-next is what will be sent as the
> > pull-request and for-next-merged is the conflict resolution.
> 
> Hello Jason,
> 
> Although I have not yet tried to root-cause this, I want to let you know
> that with your for-linus-merged branch the following error message is
> reported if I try to run the srp-test software against the rdma_rxe driver:
> 
> id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> 
> In the kernel log I found the following:
> 
> Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> 
> With your for-next branch from a few days ago the same test ran fine.

I don't have a guess for you..

The difference between for-next and merged is only the inclusion of
v4.15? Could some v4.15 non-rdma code be causing issue here?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                     ` <20180130194639.GJ17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 20:42                                       ` Bart Van Assche
       [not found]                                         ` <1517344962.2589.39.camel-Sjgp3cTcYWE@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 20:42 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	leon-DgEjT+Ai2ygdnm+yROfE0A, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2018-01-30 at 12:46 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > > 
> > > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > > instance.
> > > 
> > > Yesterday's for-next. You could merge it with the latest for-next..
> > > 
> > > I updated it.
> > > 
> > > I think we are done now, so for-next is what will be sent as the
> > > pull-request and for-next-merged is the conflict resolution.
> > 
> > Hello Jason,
> > 
> > Although I have not yet tried to root-cause this, I want to let you know
> > that with your for-linus-merged branch the following error message is
> > reported if I try to run the srp-test software against the rdma_rxe driver:
> > 
> > id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> > 
> > In the kernel log I found the following:
> > 
> > Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> > 
> > With your for-next branch from a few days ago the same test ran fine.
> 
> I don't have a guess for you..
> 
> The difference between for-next and merged is only the inclusion of
> v4.15? Could some v4.15 non-rdma code be causing issue here?

Hello Jason,

I should have mentioned that in the previous tests I ran I merged kernel
v4.15-rc9 myself into the RDMA for-next branch. So this behavior was probably
introduced by a patch that was queued recently on the RDMA for-next branch,
e.g. RDMA resource tracking.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                         ` <1517344962.2589.39.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-30 20:48                                           ` Jason Gunthorpe
       [not found]                                             ` <20180130204840.GK17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 20:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	leon-DgEjT+Ai2ygdnm+yROfE0A, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, Jan 30, 2018 at 08:42:44PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 12:46 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> > > On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > > > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > > > 
> > > > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > > > instance.
> > > > 
> > > > Yesterday's for-next. You could merge it with the latest for-next..
> > > > 
> > > > I updated it.
> > > > 
> > > > I think we are done now, so for-next is what will be sent as the
> > > > pull-request and for-next-merged is the conflict resolution.
> > > 
> > > Hello Jason,
> > > 
> > > Although I have not yet tried to root-cause this, I want to let you know
> > > that with your for-linus-merged branch the following error message is
> > > reported if I try to run the srp-test software against the rdma_rxe driver:
> > > 
> > > id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> > > 
> > > In the kernel log I found the following:
> > > 
> > > Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> > > 
> > > With your for-next branch from a few days ago the same test ran fine.
> > 
> > I don't have a guess for you..
> > 
> > The difference between for-next and merged is only the inclusion of
> > v4.15? Could some v4.15 non-rdma code be causing issue here?
> 
> Hello Jason,
> 
> I should have mentioned that in the previous tests I ran I merged kernel
> v4.15-rc9 myself into the RDMA for-next branch. So this behavior was probably
> introduced by a patch that was queued recently on the RDMA for-next branch,
> e.g. RDMA resource tracking.

Ok, I think that is the only likely thing recently..

But your print above must be caused by this line, right:

static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
                                              struct ib_pd *pd, int pool_size,
                                              int max_page_list_len)
{
        ret = -ENOMEM;
        pool = kzalloc(sizeof(struct srp_fr_pool) +
                       pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
        if (!pool)
                goto err;

Since you didn't report the ib_alloc_mr() print it can't be the other
ENOMEM case?

Hard to see how that intersects with resource tracking.. Are you
thinking memory corruption?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                             ` <20180130204840.GK17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 21:22                                               ` Bart Van Assche
       [not found]                                                 ` <1517347322.2589.58.camel-Sjgp3cTcYWE@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 21:22 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2018-01-30 at 13:48 -0700, Jason Gunthorpe wrote:
> Ok, I think that is the only likely thing recently..
> 
> But your print above must be caused by this line, right:
> 
> static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>                                               struct ib_pd *pd, int pool_size,
>                                               int max_page_list_len)
> {
>         ret = -ENOMEM;
>         pool = kzalloc(sizeof(struct srp_fr_pool) +
>                        pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
>         if (!pool)
>                 goto err;
> 
> Since you didn't report the ib_alloc_mr() print it can't be the other
> ENOMEM case?
> 
> > Hard to see how that intersects with resource tracking.. Are you
> thinking memory corruption?

Hello Jason,

I don't see any reason to suspect memory corruption. kmemleak isn't reporting
any memory leaks. Maybe memory fragmentation has increased?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                 ` <1517347322.2589.58.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-30 21:33                                                   ` Laurence Oberman
       [not found]                                                     ` <1517347999.15224.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Laurence Oberman @ 2018-01-30 21:33 UTC (permalink / raw)
  To: Bart Van Assche, jgg-uk2M96/98Pc
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2018-01-30 at 21:22 +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 13:48 -0700, Jason Gunthorpe wrote:
> > Ok, I think that is the only likely thing recently..
> > 
> > But your print above must be caused by this line, right:
> > 
> > static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
> >                                               struct ib_pd *pd, int pool_size,
> >                                               int max_page_list_len)
> > {
> >         ret = -ENOMEM;
> >         pool = kzalloc(sizeof(struct srp_fr_pool) +
> >                        pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
> >         if (!pool)
> >                 goto err;
> > 
> > Since you didn't report the ib_alloc_mr() print it can't be the
> > other
> > ENOMEM case?
> > 
> > Hard to see how that intersects with resource tracking.. Are you
> > thinking memory corruption?
> 
> Hello Jason,
> 
> I don't see any reason to suspect memory corruption. kmemleak isn't
> reporting
> any memory leaks. Maybe memory fragmentation has increased?
> 
> Thanks,
> 
> Bart.

Hi Bart, 

Can I take your tree and see if this fails for me too,
Your last tree was fine, so did not have this latest stuff.
Can I just pull to what I have

Thanks
Laurence

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                     ` <1517347999.15224.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-01-30 21:40                                                       ` Bart Van Assche
       [not found]                                                         ` <1517348412.2589.60.camel-Sjgp3cTcYWE@public.gmane.org>
  2018-01-30 21:40                                                       ` Jason Gunthorpe
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 21:40 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc, loberman-H+wXaHxf7aLQT0dZR+AlfA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> Can I take your tree and see if this fails for me too,
> Your last tree was fine, so did not have this latest stuff.
> Can I just pull to what I have

Hello Laurence,

So far I have seen this behavior only inside a VM but not yet on a system
with more memory than the VM. This issue may be specific to the memory size
of the VM. I think we should try to isolate further what caused this before
trying to reproduce it on more setups.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                     ` <1517347999.15224.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2018-01-30 21:40                                                       ` Bart Van Assche
@ 2018-01-30 21:40                                                       ` Jason Gunthorpe
  1 sibling, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 21:40 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Bart Van Assche, leon-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, Jan 30, 2018 at 04:33:19PM -0500, Laurence Oberman wrote:
> > > Since you didn't report the ib_alloc_mr() print it can't be the
> > > other
> > > ENOMEM case?
> > > 
> > > Hard to see how that interesects with resource tracking.. Are you
> > > thinking memory corruption?
> > 
> > Hello Jason,
> > 
> > I don't see any reason to suspect memory corruption. kmemleak isn't
> > reporting
> > any memory leaks. Maybe memory fragmentation has increased?
> > 
> > Thanks,
> > 
> > Bart
> 
> Hi Bart,
>
> Can I take your tree and see if this fails for me too,
> Your last tree was fine, so did not have this latest stuff.
> Can I just pull to what I have

Try what we are about to send as a PR:

git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
branch wip/for-linus-merged
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

No guess why a kzalloc will fail. How much is it asking for anyhow?
Did the size get out of control for some reason?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                         ` <1517348412.2589.60.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-30 21:42                                                           ` Jason Gunthorpe
       [not found]                                                             ` <20180130214227.GM17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 21:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: loberman-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, markb-VPRAkNaXOzVWk0Htik3J/w

On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > Can I take your tree and see if this fails for me too,
> > Your last tree was fine, so did not have this latest stuff.
> > Can I just pull to what I have
> 
> Hello Laurence,
> 
> So far I have seen this behavior only inside a VM but not yet on a system
> with more memory than the VM. This issue may be specific to the memory size
> of the VM. I think we should try to isolate further what caused this before
> trying to reproduce it on more setups.

Did you get an oops print related to a kmalloc failure?

Or am I wrong and the ENOMEM is coming from someplace else?

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                             ` <20180130214227.GM17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 21:47                                                               ` Bart Van Assche
       [not found]                                                                 ` <1517348867.2589.63.camel-Sjgp3cTcYWE@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 21:47 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, markb-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	loberman-H+wXaHxf7aLQT0dZR+AlfA, dledford-H+wXaHxf7aLQT0dZR+AlfA

On Tue, 2018-01-30 at 14:42 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > > Can I take your tree and see if this fails for me too,
> > > Your last tree was fine, so did not have this latest stuff.
> > > Can I just pull to what I have
> > 
> > Hello Laurence,
> > 
> > So far I have seen this behavior only inside a VM but not yet on a system
> > with more memory than the VM. This issue may be specific to the memory size
> > of the VM. I think we should try to isolate further what caused this before
> > trying to reproduce it on more setups.
> 
> Did you get an oops print related to a kmalloc failure?
> 
> Or am I wrong and the ENOMEM is coming from someplace else?

Hello Jason,

I just noticed the following in the system log:

Jan 30 12:53:15 ubuntu-vm kernel: ib_srp: rxe0: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count

So apparently the ib_alloc_mr() fails sometimes (but not the first few times
it is called).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                                 ` <1517348867.2589.63.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-30 22:02                                                                   ` Jason Gunthorpe
       [not found]                                                                     ` <20180130220233.GN17053-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2018-01-30 22:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, markb-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	loberman-H+wXaHxf7aLQT0dZR+AlfA, dledford-H+wXaHxf7aLQT0dZR+AlfA

On Tue, Jan 30, 2018 at 09:47:48PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 14:42 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> > > On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > > > Can I take your tree and see if this fails for me too,
> > > > Your last tree was fine, so did not have this latest stuff.
> > > > Can I just pull to what I have
> > > 
> > > Hello Laurence,
> > > 
> > > So far I have seen this behavior only inside a VM but not yet on a system
> > > with more memory than the VM. This issue may be specific to the memory size
> > > of the VM. I think we should try to isolate furhter what caused this before
> > > trying to reproduce it on more setups.
> > 
> > Did you get an oops print related to a kmalloc failure?
> > 
> > Or am I wrong and the ENOMEM is coming from someplace else?
> 
> Hello Jason,
> 
> I just noticed the following in the system log:
> 
> Jan 30 12:53:15 ubuntu-vm kernel: ib_srp: rxe0: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count
> 
> So apparently the ib_alloc_mr() fails sometimes (but not the first few times
> it is called).

Looks like the only way you can get that without hitting a kmalloc
oops print is if rxe_alloc() fails, and probably here:

	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
		goto out_put_pool;

Suggesting srp hit the max # of mrs in rxe:

	RXE_MAX_MR			= 2 * 1024,

Or maybe we are now leaking mrs someplace?

There is nothing accepted recently that mucks with this; I'm still not
seeing even a tenuous connection to any patches in the last few days.

What was accepted in the past week(s) was a bunch of srp stuff
though:

$ git diff --stat 052eac6eeb5655c52a490a49f09c55500f868558
 MAINTAINERS                                  |   3 +-
 drivers/infiniband/core/Makefile             |   2 +-
 drivers/infiniband/core/cm.c                 |   6 +-
 drivers/infiniband/core/cma.c                |   2 +-
 drivers/infiniband/core/core_priv.h          |  28 ++++
 drivers/infiniband/core/cq.c                 |  16 ++-
 drivers/infiniband/core/device.c             |   4 +
 drivers/infiniband/core/nldev.c              | 374 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/restrack.c           | 164 ++++++++++++++++++++++
 drivers/infiniband/core/user_mad.c           |   2 +-
 drivers/infiniband/core/uverbs_cmd.c         |   7 +-
 drivers/infiniband/core/uverbs_ioctl.c       |  19 ++-
 drivers/infiniband/core/uverbs_std_types.c   |   3 +
 drivers/infiniband/core/verbs.c              |  17 ++-
 drivers/infiniband/hw/mlx4/cq.c              |   4 +-
 drivers/infiniband/hw/mlx5/cq.c              |   2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   4 +-
 drivers/infiniband/hw/mlx5/qp.c              |   5 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c  |   2 +-
 drivers/infiniband/hw/mthca/mthca_user.h     | 112 ---------------
 drivers/infiniband/hw/qedr/verbs.c           |   6 +-
 drivers/infiniband/hw/qib/qib_keys.c         | 235 -------------------------------
 drivers/infiniband/sw/rxe/Kconfig            |   4 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |  16 +--
 drivers/infiniband/ulp/srp/ib_srp.c          | 723 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
 drivers/infiniband/ulp/srp/ib_srp.h          |  43 +++++-
 drivers/infiniband/ulp/srpt/ib_srpt.c        |   2 -
 include/rdma/ib_verbs.h                      |  39 ++++--
 include/rdma/restrack.h                      | 157 +++++++++++++++++++++
 include/scsi/srp.h                           |  17 +++
 include/uapi/rdma/ib_user_verbs.h            |   7 +-
 include/uapi/rdma/rdma_netlink.h             |  49 +++++++
 lib/kobject.c                                |   2 +
 33 files changed, 1511 insertions(+), 565 deletions(-)

Any chance one of the SRP patches got mishandled somehow??

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH rdma-next v7 0/8] RDMA resource tracking
       [not found]                                                                     ` <20180130220233.GN17053-uk2M96/98Pc@public.gmane.org>
@ 2018-01-30 22:10                                                                       ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2018-01-30 22:10 UTC (permalink / raw)
  To: jgg-uk2M96/98Pc
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	markb-VPRAkNaXOzVWk0Htik3J/w, loberman-H+wXaHxf7aLQT0dZR+AlfA

On Tue, 2018-01-30 at 15:02 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 09:47:48PM +0000, Bart Van Assche wrote:
> > So apparently the ib_alloc_mr() fails sometimes (but not the first few times
> > it is called).
> 
> Looks like the only way you can get that without hitting a kmalloc
> oops print is if rxe_alloc() fails, and probably here:
> 
> 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> 		goto out_put_pool;
> 
> Suggesting srp hit the max # of mrs in rxe:
> 
> 	RXE_MAX_MR			= 2 * 1024,
> 
> Or maybe we are now leaking mrs someplace?
> 
> There is nothing accepted recently that mucks with this, still not
> seeing even a tenuous connection to any patches in the last few days

Hello Jason,

Since the number of memory regions that is allocated by the ib_srp driver
depends on the max_sect parameter in the login string I will start with
checking whether there have been any recent changes of that parameter in
the test script.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-01-30 22:10 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-28  9:17 [PATCH rdma-next v7 0/8] RDMA resource tracking Leon Romanovsky
     [not found] ` <20180128091725.13103-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-28  9:17   ` [PATCH rdma-next v7 1/8] RDMA/core: Print caller name instead of function name Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 2/8] RDMA/core: Save kernel caller name in PD and CQ objects Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 3/8] RDMA/restrack: Add general infrastructure to track RDMA resources Leon Romanovsky
     [not found]     ` <20180128091725.13103-4-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-28 21:03       ` Jason Gunthorpe
     [not found]         ` <20180128210350.GJ23869-uk2M96/98Pc@public.gmane.org>
2018-01-29  5:37           ` Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 4/8] RDMA/core: Add resource tracking for create and destroy QPs Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 5/8] RDMA/core: Add resource tracking for create and destroy CQs Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 6/8] RDMA/core: Add resource tracking for create and destroy PDs Leon Romanovsky
     [not found]     ` <20180128091725.13103-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-28 20:48       ` Jason Gunthorpe
     [not found]         ` <20180128204858.GI23869-uk2M96/98Pc@public.gmane.org>
2018-01-29  5:14           ` Leon Romanovsky
2018-01-28  9:17   ` [PATCH rdma-next v7 7/8] RDMA/nldev: Provide global resource utilization Leon Romanovsky
     [not found]     ` <20180128091725.13103-8-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-28 20:45       ` Jason Gunthorpe
     [not found]         ` <20180128204513.GH23869-uk2M96/98Pc@public.gmane.org>
2018-01-29  5:09           ` Leon Romanovsky
     [not found]             ` <20180129050922.GA1393-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-29 17:59               ` Jason Gunthorpe
2018-01-28  9:17   ` [PATCH rdma-next v7 8/8] RDMA/nldev: Provide detailed QP information Leon Romanovsky
2018-01-28 21:05   ` [PATCH rdma-next v7 0/8] RDMA resource tracking Jason Gunthorpe
     [not found]     ` <20180128210520.GK23869-uk2M96/98Pc@public.gmane.org>
2018-01-29  5:39       ` Leon Romanovsky
2018-01-29 20:11       ` Doug Ledford
     [not found]         ` <1517256713.27592.241.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-30  3:34           ` Jason Gunthorpe
     [not found]             ` <20180130033436.GA17053-uk2M96/98Pc@public.gmane.org>
2018-01-30  9:16               ` Leon Romanovsky
     [not found]                 ` <20180130091654.GD2055-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-30 15:21                   ` Steve Wise
2018-01-30 15:56                     ` Jason Gunthorpe
     [not found]                       ` <20180130155643.GC17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 16:16                         ` Steve Wise
2018-01-30 16:33                           ` Jason Gunthorpe
     [not found]                             ` <20180130163330.GE17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 19:07                               ` Bart Van Assche
     [not found]                                 ` <1517339252.2589.34.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-30 19:46                                   ` Jason Gunthorpe
     [not found]                                     ` <20180130194639.GJ17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 20:42                                       ` Bart Van Assche
     [not found]                                         ` <1517344962.2589.39.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-30 20:48                                           ` Jason Gunthorpe
     [not found]                                             ` <20180130204840.GK17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 21:22                                               ` Bart Van Assche
     [not found]                                                 ` <1517347322.2589.58.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-30 21:33                                                   ` Laurence Oberman
     [not found]                                                     ` <1517347999.15224.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-30 21:40                                                       ` Bart Van Assche
     [not found]                                                         ` <1517348412.2589.60.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-30 21:42                                                           ` Jason Gunthorpe
     [not found]                                                             ` <20180130214227.GM17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 21:47                                                               ` Bart Van Assche
     [not found]                                                                 ` <1517348867.2589.63.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-30 22:02                                                                   ` Jason Gunthorpe
     [not found]                                                                     ` <20180130220233.GN17053-uk2M96/98Pc@public.gmane.org>
2018-01-30 22:10                                                                       ` Bart Van Assche
2018-01-30 21:40                                                       ` Jason Gunthorpe
