All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next v2 00/17] Statistics counter support
@ 2019-04-29  8:34 ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
 v1 -> v2:
 * Rebased to latest rdma-next
 v0 -> v1:
 * Changed wording of counter comment
 * Removed unneeded assignments
 * Added extra patch to present global counters

 * I didn't change QP type from int to be enum ib_qp_type,
   because it caused to cyclic dependency between ib_verbs.h and
   rdma_counter.h.

----------------------------------------------------

Hi,

This series from Mark provides dynamic statistics infrastructure.
He uses netlink interface to configure and retrieve those counters.

This infrastructure allows to users monitor various objects by binding
to them counters. As the beginning, we used QP object as target for
those counters, but future patches will include ODP MR information too.

Two binding modes are supported:
 - Auto: This allows a user to build automatic set of objects to a counter
   according to common criteria. For example in a per-type scheme, where in
   one process all QPs with same QP type are bound automatically to a single
   counter.
 - Manual: This allows a user to manually bind objects on a counter.

Those two modes are mutual-exclusive with separation between processes,
objects created by different processes cannot be bound to a same counter.

For objects which don't support counter binding, we will return
pre-allocated counters.

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
$ rdma statistic show
$ rdma statistic qp mode

Thanks


Mark Zhang (17):
  net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
  RDMA/restrack: Introduce statistic counter
  RDMA/restrack: Add an API to attach a task to a resource
  RDMA/restrack: Make is_visible_in_pid_ns() as an API
  RDMA/counter: Add set/clear per-port auto mode support
  RDMA/counter: Add "auto" configuration mode support
  IB/mlx5: Support set qp counter
  IB/mlx5: Add counter set id as a parameter for
    mlx5_ib_query_q_counters()
  IB/mlx5: Support statistic q counter configuration
  RDMA/nldev: Allow counter auto mode configration through RDMA netlink
  RDMA/netlink: Implement counter dumpit calback
  IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
  RDMA/core: Get sum value of all counters when perform a sysfs stat
    read
  RDMA/counter: Allow manual mode configuration support
  RDMA/nldev: Allow counter manual mode configration through RDMA
    netlink
  RDMA/nldev: Allow get counter mode through RDMA netlink
  RDMA/nldev: Allow get default counter statistics through RDMA netlink

 drivers/infiniband/core/Makefile     |   2 +-
 drivers/infiniband/core/counters.c   | 653 +++++++++++++++++++++++++++
 drivers/infiniband/core/device.c     |  14 +
 drivers/infiniband/core/nldev.c      | 524 ++++++++++++++++++++-
 drivers/infiniband/core/restrack.c   |  49 +-
 drivers/infiniband/core/restrack.h   |   3 +
 drivers/infiniband/core/sysfs.c      |  10 +-
 drivers/infiniband/core/verbs.c      |   9 +
 drivers/infiniband/hw/mlx5/main.c    |  88 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   6 +
 drivers/infiniband/hw/mlx5/qp.c      |  76 +++-
 include/linux/mlx5/mlx5_ifc.h        |   4 +-
 include/linux/mlx5/qp.h              |   1 +
 include/rdma/ib_verbs.h              |  30 ++
 include/rdma/rdma_counter.h          |  64 +++
 include/rdma/restrack.h              |   4 +
 include/uapi/rdma/rdma_netlink.h     |  52 ++-
 17 files changed, 1558 insertions(+), 31 deletions(-)
 create mode 100644 drivers/infiniband/core/counters.c
 create mode 100644 include/rdma/rdma_counter.h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 00/17] Statistics counter support
@ 2019-04-29  8:34 ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
 v1 -> v2:
 * Rebased to latest rdma-next
 v0 -> v1:
 * Changed wording of counter comment
 * Removed unneeded assignments
 * Added extra patch to present global counters

 * I didn't change QP type from int to be enum ib_qp_type,
   because it caused to cyclic dependency between ib_verbs.h and
   rdma_counter.h.

----------------------------------------------------

Hi,

This series from Mark provides dynamic statistics infrastructure.
He uses netlink interface to configure and retrieve those counters.

This infrastructure allows to users monitor various objects by binding
to them counters. As the beginning, we used QP object as target for
those counters, but future patches will include ODP MR information too.

Two binding modes are supported:
 - Auto: This allows a user to build automatic set of objects to a counter
   according to common criteria. For example in a per-type scheme, where in
   one process all QPs with same QP type are bound automatically to a single
   counter.
 - Manual: This allows a user to manually bind objects on a counter.

Those two modes are mutual-exclusive with separation between processes,
objects created by different processes cannot be bound to a same counter.

For objects which don't support counter binding, we will return
pre-allocated counters.

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
$ rdma statistic show
$ rdma statistic qp mode

Thanks


Mark Zhang (17):
  net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
  RDMA/restrack: Introduce statistic counter
  RDMA/restrack: Add an API to attach a task to a resource
  RDMA/restrack: Make is_visible_in_pid_ns() as an API
  RDMA/counter: Add set/clear per-port auto mode support
  RDMA/counter: Add "auto" configuration mode support
  IB/mlx5: Support set qp counter
  IB/mlx5: Add counter set id as a parameter for
    mlx5_ib_query_q_counters()
  IB/mlx5: Support statistic q counter configuration
  RDMA/nldev: Allow counter auto mode configration through RDMA netlink
  RDMA/netlink: Implement counter dumpit calback
  IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
  RDMA/core: Get sum value of all counters when perform a sysfs stat
    read
  RDMA/counter: Allow manual mode configuration support
  RDMA/nldev: Allow counter manual mode configration through RDMA
    netlink
  RDMA/nldev: Allow get counter mode through RDMA netlink
  RDMA/nldev: Allow get default counter statistics through RDMA netlink

 drivers/infiniband/core/Makefile     |   2 +-
 drivers/infiniband/core/counters.c   | 653 +++++++++++++++++++++++++++
 drivers/infiniband/core/device.c     |  14 +
 drivers/infiniband/core/nldev.c      | 524 ++++++++++++++++++++-
 drivers/infiniband/core/restrack.c   |  49 +-
 drivers/infiniband/core/restrack.h   |   3 +
 drivers/infiniband/core/sysfs.c      |  10 +-
 drivers/infiniband/core/verbs.c      |   9 +
 drivers/infiniband/hw/mlx5/main.c    |  88 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   6 +
 drivers/infiniband/hw/mlx5/qp.c      |  76 +++-
 include/linux/mlx5/mlx5_ifc.h        |   4 +-
 include/linux/mlx5/qp.h              |   1 +
 include/rdma/ib_verbs.h              |  30 ++
 include/rdma/rdma_counter.h          |  64 +++
 include/rdma/restrack.h              |   4 +
 include/uapi/rdma/rdma_netlink.h     |  52 ++-
 17 files changed, 1558 insertions(+), 31 deletions(-)
 create mode 100644 drivers/infiniband/core/counters.c
 create mode 100644 include/rdma/rdma_counter.h

--
2.20.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH mlx5-next v2 01/17] net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
  2019-04-29  8:34 ` Leon Romanovsky
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS
qp modification can be used to change the counter of a QP.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4b37519bd6a5..eb0cafa7bab8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -997,7 +997,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         cc_modify_allowed[0x1];
 	u8         start_pad[0x1];
 	u8         cache_line_128byte[0x1];
-	u8         reserved_at_165[0xa];
+	u8         reserved_at_165[0x4];
+	u8         rts2rts_qp_counters_set_id[0x1];
+	u8         reserved_at_16a[0x5];
 	u8         qcam_reg[0x1];
 	u8         gid_table_size[0x10];
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 02/17] RDMA/restrack: Introduce statistic counter
  2019-04-29  8:34 ` Leon Romanovsky
  (?)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Introduce statistic counter as a new resource. It allows a user
to monitor specific objects (e.g., QPs) by binding to a counter.

In some cases a user counter resource is created with task other then
"current", because its creation is done as part of rdmatool call.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/restrack.c | 22 +++++++++++++++++-----
 include/rdma/rdma_counter.h        | 18 ++++++++++++++++++
 include/rdma/restrack.h            |  4 ++++
 3 files changed, 39 insertions(+), 5 deletions(-)
 create mode 100644 include/rdma/rdma_counter.h

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 3b5ff2f7b5f8..95573f292aae 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -6,6 +6,7 @@
 #include <rdma/rdma_cm.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/restrack.h>
+#include <rdma/rdma_counter.h>
 #include <linux/mutex.h>
 #include <linux/sched/task.h>
 #include <linux/pid_namespace.h>
@@ -45,6 +46,7 @@ static const char *type2str(enum rdma_restrack_type type)
 		[RDMA_RESTRACK_CM_ID] = "CM_ID",
 		[RDMA_RESTRACK_MR] = "MR",
 		[RDMA_RESTRACK_CTX] = "CTX",
+		[RDMA_RESTRACK_COUNTER] = "COUNTER",
 	};
 
 	return names[type];
@@ -169,6 +171,8 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
 		return container_of(res, struct ib_mr, res)->device;
 	case RDMA_RESTRACK_CTX:
 		return container_of(res, struct ib_ucontext, res)->device;
+	case RDMA_RESTRACK_COUNTER:
+		return container_of(res, struct rdma_counter, res)->device;
 	default:
 		WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
 		return NULL;
@@ -203,15 +207,22 @@ static void rdma_restrack_add(struct rdma_restrack_entry *res)
 
 	kref_init(&res->kref);
 	init_completion(&res->comp);
-	if (res->type != RDMA_RESTRACK_QP)
-		ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
-				&rt->next_id, GFP_KERNEL);
-	else {
+	if (res->type == RDMA_RESTRACK_QP) {
 		/* Special case to ensure that LQPN points to right QP */
 		struct ib_qp *qp = container_of(res, struct ib_qp, res);
 
 		ret = xa_insert(&rt->xa, qp->qp_num, res, GFP_KERNEL);
 		res->id = ret ? 0 : qp->qp_num;
+	} else if (res->type == RDMA_RESTRACK_COUNTER) {
+		/* Special case to ensure that cntn points to right counter */
+		struct rdma_counter *counter;
+
+		counter = container_of(res, struct rdma_counter, res);
+		ret = xa_insert(&rt->xa, counter->id, res, GFP_KERNEL);
+		res->id = ret ? 0 : counter->id;
+	} else {
+		ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
+				      &rt->next_id, GFP_KERNEL);
 	}
 
 	if (!ret)
@@ -237,7 +248,8 @@ EXPORT_SYMBOL(rdma_restrack_kadd);
  */
 void rdma_restrack_uadd(struct rdma_restrack_entry *res)
 {
-	if (res->type != RDMA_RESTRACK_CM_ID)
+	if ((res->type != RDMA_RESTRACK_CM_ID) &&
+	    (res->type != RDMA_RESTRACK_COUNTER))
 		res->task = NULL;
 
 	if (!res->task)
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
new file mode 100644
index 000000000000..283ac1a0cdb7
--- /dev/null
+++ b/include/rdma/rdma_counter.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef _RDMA_COUNTER_H_
+#define _RDMA_COUNTER_H_
+
+#include <rdma/ib_verbs.h>
+#include <rdma/restrack.h>
+
+struct rdma_counter {
+	struct rdma_restrack_entry	res;
+	struct ib_device		*device;
+	uint32_t			id;
+	u8				port;
+};
+#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index ecf3c7702a4f..4041a4d96524 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -42,6 +42,10 @@ enum rdma_restrack_type {
 	 * @RDMA_RESTRACK_CTX: Verbs contexts (CTX)
 	 */
 	RDMA_RESTRACK_CTX,
+	/**
+	 * @RDMA_RESTRACK_COUNTER: Statistic Counter
+	 */
+	RDMA_RESTRACK_COUNTER,
 	/**
 	 * @RDMA_RESTRACK_MAX: Last entry, used for array dclarations
 	 */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 03/17] RDMA/restrack: Add an API to attach a task to a resource
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (2 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add rdma_restrack_attach_task() which is able to attach a task
other then "current" to a resource.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/restrack.c | 14 ++++++++++++++
 drivers/infiniband/core/restrack.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 95573f292aae..3714634ae296 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -194,6 +194,20 @@ void rdma_restrack_set_task(struct rdma_restrack_entry *res,
 }
 EXPORT_SYMBOL(rdma_restrack_set_task);
 
+/**
+ * rdma_restrack_attach_task() - attach the task onto this resource
+ * @res:  resource entry
+ * @task: the task to attach, the current task will be used if it is NULL.
+ */
+void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
+			       struct task_struct *task)
+{
+	if (res->task)
+		put_task_struct(res->task);
+	get_task_struct(task);
+	res->task = task;
+}
+
 static void rdma_restrack_add(struct rdma_restrack_entry *res)
 {
 	struct ib_device *dev = res_to_dev(res);
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index 09a1fbdf578e..d084e5f89849 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -25,4 +25,6 @@ struct rdma_restrack_root {
 
 int rdma_restrack_init(struct ib_device *dev);
 void rdma_restrack_clean(struct ib_device *dev);
+void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
+			       struct task_struct *task);
 #endif /* _RDMA_CORE_RESTRACK_H_ */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 04/17] RDMA/restrack: Make is_visible_in_pid_ns() as an API
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (3 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Remove is_visible_in_pid_ns() from nldev.c and make it as a restrack API,
so that it can be taken advantage by other parts like counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c    | 15 ++-------------
 drivers/infiniband/core/restrack.c | 13 +++++++++++++
 drivers/infiniband/core/restrack.h |  1 +
 3 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index bced945a456d..4fb6d4285970 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -991,17 +991,6 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 	},
 };
 
-static bool is_visible_in_pid_ns(struct rdma_restrack_entry *res)
-{
-	/*
-	 * 1. Kern resources should be visible in init name space only
-	 * 2. Present only resources visible in the current namespace
-	 */
-	if (rdma_is_kernel_res(res))
-		return task_active_pid_ns(current) == &init_pid_ns;
-	return task_active_pid_ns(current) == task_active_pid_ns(res->task);
-}
-
 static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack,
 			       enum rdma_restrack_type res_type)
@@ -1046,7 +1035,7 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto err;
 	}
 
-	if (!is_visible_in_pid_ns(res)) {
+	if (!rdma_is_visible_in_pid_ns(res)) {
 		ret = -ENOENT;
 		goto err_get;
 	}
@@ -1158,7 +1147,7 @@ static int res_get_common_dumpit(struct sk_buff *skb,
 	 * objects.
 	 */
 	xa_for_each(&rt->xa, id, res) {
-		if (!is_visible_in_pid_ns(res))
+		if (!rdma_is_visible_in_pid_ns(res))
 			continue;
 
 		if (idx < start || !rdma_restrack_get(res))
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 3714634ae296..bddff426ee0f 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -349,3 +349,16 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	}
 }
 EXPORT_SYMBOL(rdma_restrack_del);
+
+bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res)
+{
+	/*
+	 * 1. Kern resources should be visible in init
+	 *    namespace only
+	 * 2. Present only resources visible in the current
+	 *     namespace
+	 */
+	if (rdma_is_kernel_res(res))
+		return task_active_pid_ns(current) == &init_pid_ns;
+	return task_active_pid_ns(current) == task_active_pid_ns(res->task);
+}
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index d084e5f89849..7bd177cc0a61 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -27,4 +27,5 @@ int rdma_restrack_init(struct ib_device *dev);
 void rdma_restrack_clean(struct ib_device *dev);
 void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
 			       struct task_struct *task);
+bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res);
 #endif /* _RDMA_CORE_RESTRACK_H_ */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (4 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-05-22 16:56   ` Jason Gunthorpe
  -1 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add an API to support set/clear per-port auto mode.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/Makefile   |  2 +-
 drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
 drivers/infiniband/core/device.c   |  4 ++
 include/rdma/ib_verbs.h            |  2 +
 include/rdma/rdma_counter.h        | 24 ++++++++++
 include/uapi/rdma/rdma_netlink.h   | 26 ++++++++++
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/counters.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 313f2349b518..cddf748c15c9 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				nldev.o restrack.o
+				nldev.o restrack.o counters.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
new file mode 100644
index 000000000000..bda8d945a758
--- /dev/null
+++ b/drivers/infiniband/core/counters.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
+ */
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_counter.h>
+
+#include "core_priv.h"
+#include "restrack.h"
+
+#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
+
+static int __counter_set_mode(struct rdma_counter_mode *curr,
+			      enum rdma_nl_counter_mode new_mode,
+			      enum rdma_nl_counter_mask new_mask)
+{
+	if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
+	    ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
+	     (curr->mode != RDMA_COUNTER_MODE_NONE)))
+		return -EINVAL;
+
+	curr->mode = new_mode;
+	curr->mask = new_mask;
+	return 0;
+}
+
+/**
+ * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
+ *
+ * When @on is true, the @mask must be set
+ */
+int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
+			       bool on, enum rdma_nl_counter_mask mask)
+{
+	struct rdma_port_counter *port_counter;
+	int ret;
+
+	if (!rdma_is_port_valid(dev, port))
+		return -EINVAL;
+
+	port_counter = &dev->port_data[port].port_counter;
+	mutex_lock(&port_counter->lock);
+	if (on) {
+		ret = __counter_set_mode(&port_counter->mode,
+					 RDMA_COUNTER_MODE_AUTO, mask);
+	} else {
+		if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
+			ret = -EINVAL;
+			goto out;
+		}
+		ret = __counter_set_mode(&port_counter->mode,
+					 RDMA_COUNTER_MODE_NONE, 0);
+	}
+
+out:
+	mutex_unlock(&port_counter->lock);
+	return ret;
+}
+
+void rdma_counter_init(struct ib_device *dev)
+{
+	struct rdma_port_counter *port_counter;
+	u32 port;
+
+	if (!dev->ops.alloc_hw_stats)
+		return;
+
+	rdma_for_each_port(dev, port) {
+		port_counter = &dev->port_data[port].port_counter;
+		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
+		mutex_init(&port_counter->lock);
+	}
+}
+
+void rdma_counter_cleanup(struct ib_device *dev)
+{
+}
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index fcbf2d4c865d..9204b4251fc8 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -46,6 +46,7 @@
 #include <rdma/rdma_netlink.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/rdma_counter.h>
 
 #include "core_priv.h"
 #include "restrack.h"
@@ -1254,6 +1255,8 @@ int ib_register_device(struct ib_device *device, const char *name)
 		goto dev_cleanup;
 	}
 
+	rdma_counter_init(device);
+
 	ret = enable_device_and_get(device);
 	if (ret) {
 		void (*dealloc_fn)(struct ib_device *);
@@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
 		goto out;
 
 	disable_device(ib_dev);
+	rdma_counter_cleanup(ib_dev);
 
 	/* Expedite removing unregistered pointers from the hash table */
 	free_netdevs(ib_dev);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 737ef5ed3930..003d7b49ea54 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -62,6 +62,7 @@
 #include <linux/irqflags.h>
 #include <linux/preempt.h>
 #include <uapi/rdma/ib_user_verbs.h>
+#include <rdma/rdma_counter.h>
 #include <rdma/restrack.h>
 #include <uapi/rdma/rdma_user_ioctl.h>
 #include <uapi/rdma/ib_user_ioctl_verbs.h>
@@ -2213,6 +2214,7 @@ struct ib_port_data {
 	spinlock_t netdev_lock;
 	struct net_device __rcu *netdev;
 	struct hlist_node ndev_hash_link;
+	struct rdma_port_counter port_counter;
 };
 
 /* rdma netdev type - specifies protocol type */
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 283ac1a0cdb7..a8a7c1627800 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -6,8 +6,26 @@
 #ifndef _RDMA_COUNTER_H_
 #define _RDMA_COUNTER_H_
 
+#include <linux/mutex.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/restrack.h>
+#include <rdma/rdma_netlink.h>
+
+struct auto_mode_param {
+	int qp_type;
+};
+
+struct rdma_counter_mode {
+	enum rdma_nl_counter_mode mode;
+	enum rdma_nl_counter_mask mask;
+	struct auto_mode_param param;
+};
+
+struct rdma_port_counter {
+	struct rdma_counter_mode mode;
+	struct mutex lock;
+};
 
 struct rdma_counter {
 	struct rdma_restrack_entry	res;
@@ -15,4 +33,10 @@ struct rdma_counter {
 	uint32_t			id;
 	u8				port;
 };
+
+void rdma_counter_init(struct ib_device *dev);
+void rdma_counter_cleanup(struct ib_device *dev);
+int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
+			       bool on, enum rdma_nl_counter_mask mask);
+
 #endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 42a8bdc40a14..cd3cace46b27 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -484,4 +484,30 @@ enum rdma_nldev_attr {
 	 */
 	RDMA_NLDEV_ATTR_MAX
 };
+
+/*
+ * Supported counter bind modes. All modes are mutual-exclusive.
+ */
+enum rdma_nl_counter_mode {
+	RDMA_COUNTER_MODE_NONE,
+
+	/*
+	 * A qp is bound with a counter automatically during initialization
+	 * based on the auto mode (e.g., qp type, ...)
+	 */
+	RDMA_COUNTER_MODE_AUTO,
+
+	/*
+	 * Always the end
+	 */
+	RDMA_COUNTER_MODE_MAX,
+};
+
+/*
+ * Supported criteria in counter auto mode.
+ * Currently only "qp type" is supported
+ */
+enum rdma_nl_counter_mask {
+	RDMA_COUNTER_MASK_QP_TYPE = 1,
+};
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (5 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-05-22 17:11   ` Jason Gunthorpe
  2019-05-22 17:15   ` Jason Gunthorpe
  -1 siblings, 2 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

In auto mode all QPs belong to one category are bind automatically to
a single counter set. Currently only "qp type" is supported.

In this mode the qp counter is set in RST2INIT modification, and when
a qp is destroyed the counter is unbound.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c | 222 +++++++++++++++++++++++++++++
 drivers/infiniband/core/device.c   |   2 +
 drivers/infiniband/core/verbs.c    |   9 ++
 include/rdma/ib_verbs.h            |  18 +++
 include/rdma/rdma_counter.h        |   8 ++
 5 files changed, 259 insertions(+)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index bda8d945a758..665e0d43c21b 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -57,6 +57,228 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 	return ret;
 }
 
+static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
+					       enum rdma_nl_counter_mode mode)
+{
+	struct rdma_counter *counter;
+
+	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
+	if (!counter)
+		return NULL;
+
+	counter->device    = dev;
+	counter->port      = port;
+	counter->res.type  = RDMA_RESTRACK_COUNTER;
+	counter->mode.mode = mode;
+	atomic_set(&counter->usecnt, 0);
+	mutex_init(&counter->lock);
+
+	return counter;
+}
+
+static void rdma_counter_dealloc(struct rdma_counter *counter)
+{
+	rdma_restrack_del(&counter->res);
+	kfree(counter);
+}
+
+static void auto_mode_init_counter(struct rdma_counter *counter,
+				   const struct ib_qp *qp,
+				   enum rdma_nl_counter_mask new_mask)
+{
+	struct auto_mode_param *param = &counter->mode.param;
+
+	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
+	counter->mode.mask = new_mask;
+
+	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
+		param->qp_type = qp->qp_type;
+}
+
+static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
+			    enum rdma_nl_counter_mask auto_mask)
+{
+	struct auto_mode_param *param = &counter->mode.param;
+	bool match = true;
+
+	if (rdma_is_kernel_res(&counter->res) != rdma_is_kernel_res(&qp->res))
+		return false;
+
+	/* Ensure that counter belong to right PID */
+	if (!rdma_is_kernel_res(&counter->res) &&
+	    !rdma_is_kernel_res(&qp->res) &&
+	    (task_pid_vnr(counter->res.task) != current->pid))
+		return false;
+
+	if (auto_mask & RDMA_COUNTER_MASK_QP_TYPE)
+		match &= (param->qp_type == qp->qp_type);
+
+	return match;
+}
+
+static int __rdma_counter_bind_qp(struct rdma_counter *counter,
+				  struct ib_qp *qp)
+{
+	int ret;
+
+	if (qp->counter)
+		return -EINVAL;
+
+	if (!qp->device->ops.counter_bind_qp)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&counter->lock);
+	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
+static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+	struct rdma_counter *counter = qp->counter;
+	int ret;
+
+	if (!qp->device->ops.counter_unbind_qp)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&counter->lock);
+	ret = qp->device->ops.counter_unbind_qp(qp, force);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
+/**
+ * rdma_get_counter_auto_mode - Find the counter that @qp should be bound
+ *     with in auto mode
+ *
+ * Return: The counter (with ref-count increased) if found
+ */
+static struct rdma_counter *rdma_get_counter_auto_mode(struct ib_qp *qp,
+						       u8 port)
+{
+	struct rdma_port_counter *port_counter;
+	struct rdma_counter *counter = NULL;
+	struct ib_device *dev = qp->device;
+	struct rdma_restrack_entry *res;
+	struct rdma_restrack_root *rt;
+	unsigned long id = 0;
+
+	port_counter = &dev->port_data[port].port_counter;
+	rt = &dev->res[RDMA_RESTRACK_COUNTER];
+	xa_lock(&rt->xa);
+	xa_for_each(&rt->xa, id, res) {
+		if (!rdma_is_visible_in_pid_ns(res))
+			continue;
+
+		if (!rdma_restrack_get(res))
+			continue;
+
+		counter = container_of(res, struct rdma_counter, res);
+		if ((counter->device != qp->device) || (counter->port != port))
+			goto next;
+
+		if (auto_mode_match(qp, counter, port_counter->mode.mask))
+			break;
+next:
+		rdma_restrack_put(res);
+		counter = NULL;
+	}
+
+	xa_unlock(&rt->xa);
+	return counter;
+}
+
+static void rdma_counter_res_add(struct rdma_counter *counter,
+				 struct ib_qp *qp)
+{
+	if (rdma_is_kernel_res(&qp->res)) {
+		rdma_restrack_set_task(&counter->res, qp->res.kern_name);
+		rdma_restrack_kadd(&counter->res);
+	} else {
+		rdma_restrack_attach_task(&counter->res, qp->res.task);
+		rdma_restrack_uadd(&counter->res);
+	}
+}
+
+/**
+ * rdma_counter_bind_qp_auto - Check and bind the QP to a counter base on
+ *   the auto-mode rule
+ */
+int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
+{
+	struct rdma_port_counter *port_counter;
+	struct ib_device *dev = qp->device;
+	struct rdma_counter *counter;
+	int ret;
+
+	if (!rdma_is_port_valid(dev, port))
+		return -EINVAL;
+
+	port_counter = &dev->port_data[port].port_counter;
+	if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO)
+		return 0;
+
+	counter = rdma_get_counter_auto_mode(qp, port);
+	if (counter) {
+		ret = __rdma_counter_bind_qp(counter, qp);
+		if (ret) {
+			rdma_restrack_put(&counter->res);
+			return ret;
+		}
+	} else {
+		counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_AUTO);
+		if (!counter)
+			return -ENOMEM;
+
+		auto_mode_init_counter(counter, qp, port_counter->mode.mask);
+
+		ret = __rdma_counter_bind_qp(counter, qp);
+		if (ret)
+			goto err_bind;
+
+		rdma_counter_res_add(counter, qp);
+		if (!rdma_restrack_get(&counter->res)) {
+			ret = -EINVAL;
+			goto err_get;
+		}
+	}
+
+	atomic_inc(&counter->usecnt);
+	return 0;
+
+err_get:
+	__rdma_counter_unbind_qp(qp, false);
+err_bind:
+	rdma_counter_dealloc(counter);
+	return ret;
+}
+
+/**
+ * rdma_counter_unbind_qp - Unbind a qp from a counter
+ * @force:
+ *   true - Decrease the counter ref-count anyway (e.g., qp destroy)
+ */
+int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+	struct rdma_counter *counter = qp->counter;
+	int ret;
+
+	if (!counter)
+		return -EINVAL;
+
+	ret = __rdma_counter_unbind_qp(qp, force);
+	if (ret && !force)
+		return ret;
+
+	rdma_restrack_put(&counter->res);
+	if (atomic_dec_and_test(&counter->usecnt))
+		rdma_counter_dealloc(counter);
+
+	return 0;
+}
+
 void rdma_counter_init(struct ib_device *dev)
 {
 	struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 9204b4251fc8..dfaa57de871f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2349,6 +2349,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, set_vf_guid);
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
+	SET_DEVICE_OP(dev_ops, counter_bind_qp);
+	SET_DEVICE_OP(dev_ops, counter_unbind_qp);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_pd);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 7313edc9f091..060d2f071ea7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1682,6 +1682,14 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
 		}
 	}
 
+	/*
+	 * Bind this qp to a counter automatically based on the rdma counter
+	 * rules. This only set in RST2INIT with port specified
+	 */
+	if (!qp->counter && (attr_mask & IB_QP_PORT) &&
+	    ((attr_mask & IB_QP_STATE) && attr->qp_state == IB_QPS_INIT))
+		rdma_counter_bind_qp_auto(qp, attr->port_num);
+
 	ret = ib_security_modify_qp(qp, attr, attr_mask, udata);
 	if (ret)
 		goto out;
@@ -1877,6 +1885,7 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
 	if (!qp->uobject)
 		rdma_rw_cleanup_mrs(qp);
 
+	rdma_counter_unbind_qp(qp, true);
 	rdma_restrack_del(&qp->res);
 	ret = qp->device->ops.destroy_qp(qp, udata);
 	if (!ret) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 003d7b49ea54..06c77b3e42cd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1791,6 +1791,9 @@ struct ib_qp {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry     res;
+
+	/* The counter the qp is bind to */
+	struct rdma_counter    *counter;
 };
 
 struct ib_dm {
@@ -2553,6 +2556,21 @@ struct ib_device_ops {
 	 */
 	void (*dealloc_driver)(struct ib_device *dev);
 
+	/**
+	 * counter_bind_qp - Bind a QP to a counter.
+	 * @counter - The counter to be bound. If counter->id is zero then
+	 *   the driver needs to allocate a new counter and set counter->id
+	 */
+	int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp);
+	/**
+	 * counter_unbind_qp - Unbind the qp from the dynamically-allocated
+	 *   counter and bind it onto the default one. If this is the last
+	 *   bound qp, then this counter will be deallocated.
+	 * @force - If it is true then free the counter in case of any error.
+	 *   used in cases like qp destroy.
+	 */
+	int (*counter_unbind_qp)(struct ib_qp *qp, bool force);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
 	DECLARE_RDMA_OBJ_SIZE(ib_srq);
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index a8a7c1627800..159b8ba3487e 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -7,11 +7,14 @@
 #define _RDMA_COUNTER_H_
 
 #include <linux/mutex.h>
+#include <linux/pid_namespace.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/restrack.h>
 #include <rdma/rdma_netlink.h>
 
+struct ib_qp;
+
 struct auto_mode_param {
 	int qp_type;
 };
@@ -31,6 +34,9 @@ struct rdma_counter {
 	struct rdma_restrack_entry	res;
 	struct ib_device		*device;
 	uint32_t			id;
+	atomic_t			usecnt;
+	struct rdma_counter_mode	mode;
+	struct mutex			lock;
 	u8				port;
 };
 
@@ -38,5 +44,7 @@ void rdma_counter_init(struct ib_device *dev);
 void rdma_counter_cleanup(struct ib_device *dev);
 int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 			       bool on, enum rdma_nl_counter_mask mask);
+int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
+int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
 
 #endif /* _RDMA_COUNTER_H_ */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (6 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-04-29 18:22   ` Saeed Mahameed
  -1 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Support bind a qp with counter. If counter is null then bind the qp to
the default counter. Different QP state has different operation:
- RESET: Set the counter field so that it will take effective
  during RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag,
  when QP is moved to RTS state and this flag is set, then issue
  an RTS2RTS modification to update the counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 +++
 drivers/infiniband/hw/mlx5/qp.c      | 76 +++++++++++++++++++++++++++-
 include/linux/mlx5/qp.h              |  1 +
 3 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 55b8bdb402b6..447f8ad5abbd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -437,6 +437,10 @@ struct mlx5_ib_qp {
 	u32			flags_en;
 	/* storage for qp sub type when core qp type is IB_QPT_DRIVER */
 	enum ib_qp_type		qp_sub_type;
+	/* A flag to indicate if there's a new counter is configured
+	 * but not take effective
+	 */
+	u32                     counter_pending;
 };
 
 struct mlx5_ib_cq_buf {
@@ -1418,4 +1422,6 @@ void mlx5_ib_put_xlt_emergency_page(void);
 int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
 			struct mlx5_bfreg_info *bfregi, u32 bfregn,
 			bool dyn_bfreg);
+
+int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter);
 #endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index efe1f6f0c351..29e3fcd66510 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -34,6 +34,7 @@
 #include <rdma/ib_umem.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/rdma_counter.h>
 #include <linux/mlx5/fs.h>
 #include "mlx5_ib.h"
 #include "ib_rep.h"
@@ -3365,6 +3366,35 @@ static unsigned int get_tx_affinity(struct mlx5_ib_dev *dev,
 	return tx_port_affinity;
 }
 
+static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
+				    struct rdma_counter *counter)
+{
+	struct mlx5_ib_dev *dev = to_mdev(qp->device);
+	struct mlx5_ib_qp *mqp = to_mqp(qp);
+	struct mlx5_qp_context context = {};
+	struct mlx5_ib_port *mibport = NULL;
+	struct mlx5_ib_qp_base *base;
+	u32 set_id;
+
+	if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id))
+		return 0;
+
+	if (counter) {
+		set_id = counter->id;
+	} else {
+		mibport = &dev->port[mqp->port - 1];
+		set_id = mibport->cnts.set_id;
+	}
+
+	base = &mqp->trans_qp.base;
+	context.qp_counter_set_usr_page &= cpu_to_be32(0xffffff);
+	context.qp_counter_set_usr_page |= cpu_to_be32(set_id << 24);
+	return mlx5_core_qp_modify(dev->mdev,
+				   MLX5_CMD_OP_RTS2RTS_QP,
+				   MLX5_QP_OPTPAR_COUNTER_SET_ID,
+				   &context, &base->mqp);
+}
+
 static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 			       const struct ib_qp_attr *attr, int attr_mask,
 			       enum ib_qp_state cur_state,
@@ -3418,6 +3448,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 	struct mlx5_ib_port *mibport = NULL;
 	enum mlx5_qp_state mlx5_cur, mlx5_new;
 	enum mlx5_qp_optpar optpar;
+	u32 set_id = 0;
 	int mlx5_st;
 	int err;
 	u16 op;
@@ -3580,8 +3611,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 			port_num = 0;
 
 		mibport = &dev->port[port_num];
+		if (ibqp->counter)
+			set_id = ibqp->counter->id;
+		else
+			set_id = mibport->cnts.set_id;
 		context->qp_counter_set_usr_page |=
-			cpu_to_be32((u32)(mibport->cnts.set_id) << 24);
+			cpu_to_be32(set_id << 24);
 	}
 
 	if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
@@ -3609,7 +3644,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 
 		raw_qp_param.operation = op;
 		if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
-			raw_qp_param.rq_q_ctr_id = mibport->cnts.set_id;
+			raw_qp_param.rq_q_ctr_id = set_id;
 			raw_qp_param.set_mask |= MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID;
 		}
 
@@ -3686,6 +3721,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		qp->db.db[MLX5_SND_DBR] = 0;
 	}
 
+	if ((new_state == IB_QPS_RTS) && qp->counter_pending) {
+		err = __mlx5_ib_qp_set_counter(ibqp, ibqp->counter);
+		if (!err)
+			qp->counter_pending = 0;
+	}
+
 out:
 	kfree(context);
 	return err;
@@ -6347,3 +6388,34 @@ void mlx5_ib_drain_rq(struct ib_qp *qp)
 
 	handle_drain_completion(cq, &rdrain, dev);
 }
+
+/**
+ * Bind a qp to a counter. If @counter is NULL then bind the qp to
+ * the default counter
+ */
+int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter)
+{
+	struct mlx5_ib_qp *mqp = to_mqp(qp);
+	int err = 0;
+
+	mutex_lock(&mqp->mutex);
+	if (mqp->state == IB_QPS_RESET) {
+		qp->counter = counter;
+		goto out;
+	}
+
+	if (mqp->state == IB_QPS_RTS) {
+		err = __mlx5_ib_qp_set_counter(qp, counter);
+		if (!err)
+			qp->counter = counter;
+
+		goto out;
+	}
+
+	mqp->counter_pending = 1;
+	qp->counter = counter;
+
+out:
+	mutex_unlock(&mqp->mutex);
+	return err;
+}
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 0343c81d4c5f..b0b47106bc76 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -70,6 +70,7 @@ enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_CQN_RCV			= 1 << 19,
 	MLX5_QP_OPTPAR_DC_HS			= 1 << 20,
 	MLX5_QP_OPTPAR_DC_KEY			= 1 << 21,
+	MLX5_QP_OPTPAR_COUNTER_SET_ID		= 1 << 25,
 };
 
 enum mlx5_qp_state {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 08/17] IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (7 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add counter set id as a parameter so that this API can be used for
querying any q counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6135a0b285de..06da76df4aa1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5341,7 +5341,8 @@ static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
 
 static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
 				    struct mlx5_ib_port *port,
-				    struct rdma_hw_stats *stats)
+				    struct rdma_hw_stats *stats,
+				    u16 set_id)
 {
 	int outlen = MLX5_ST_SZ_BYTES(query_q_counter_out);
 	void *out;
@@ -5352,9 +5353,7 @@ static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
 	if (!out)
 		return -ENOMEM;
 
-	ret = mlx5_core_query_q_counter(mdev,
-					port->cnts.set_id, 0,
-					out, outlen);
+	ret = mlx5_core_query_q_counter(mdev, set_id, 0, out, outlen);
 	if (ret)
 		goto free;
 
@@ -5414,7 +5413,8 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 		       port->cnts.num_ext_ppcnt_counters;
 
 	/* q_counters are per IB device, query the master mdev */
-	ret = mlx5_ib_query_q_counters(dev->mdev, port, stats);
+	ret = mlx5_ib_query_q_counters(dev->mdev, port, stats,
+				       port->cnts.set_id);
 	if (ret)
 		return ret;
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 09/17] IB/mlx5: Support statistic q counter configuration
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (8 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add support for ib callbacks counter_bind_qp() and counter_unbind_qp().

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 55 +++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 06da76df4aa1..18a3e855d45b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5450,6 +5450,59 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 	return num_counters;
 }
 
+static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
+				   struct ib_qp *qp)
+{
+	struct mlx5_ib_dev *dev = to_mdev(qp->device);
+	u16 cnt_set_id = 0;
+	int err;
+
+	if (counter->id == 0) {
+		err = mlx5_cmd_alloc_q_counter(dev->mdev,
+					       &cnt_set_id,
+					       MLX5_SHARED_RESOURCE_UID);
+		if (err)
+			return err;
+		counter->id = cnt_set_id;
+	}
+
+	err = mlx5_ib_qp_set_counter(qp, counter);
+	if (err)
+		goto fail_set_counter;
+
+	return 0;
+
+fail_set_counter:
+	if (cnt_set_id != 0) {
+		mlx5_core_dealloc_q_counter(dev->mdev, cnt_set_id);
+		counter->id = 0;
+	}
+
+	return err;
+}
+
+static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+	struct mlx5_ib_dev *dev = to_mdev(qp->device);
+	struct rdma_counter *counter = qp->counter;
+	int err;
+
+	err = mlx5_ib_qp_set_counter(qp, NULL);
+	if (err && !force)
+		return err;
+
+	/*
+	 * Deallocate the counter if this is the last QP bound on it;
+	 * If @force is set then we still deallocate the q counter
+	 * no matter if there's any error in previous. used for cases
+	 * like qp destroy.
+	 */
+	if (atomic_read(&counter->usecnt) == 1)
+		return mlx5_core_dealloc_q_counter(dev->mdev, counter->id);
+
+	return 0;
+}
+
 static int mlx5_ib_rn_get_params(struct ib_device *device, u8 port_num,
 				 enum rdma_netdev_t type,
 				 struct rdma_netdev_alloc_params *params)
@@ -6306,6 +6359,8 @@ static void mlx5_ib_stage_odp_cleanup(struct mlx5_ib_dev *dev)
 static const struct ib_device_ops mlx5_ib_dev_hw_stats_ops = {
 	.alloc_hw_stats = mlx5_ib_alloc_hw_stats,
 	.get_hw_stats = mlx5_ib_get_hw_stats,
+	.counter_bind_qp = mlx5_ib_counter_bind_qp,
+	.counter_unbind_qp = mlx5_ib_counter_unbind_qp,
 };
 
 static int mlx5_ib_stage_counters_init(struct mlx5_ib_dev *dev)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 10/17] RDMA/nldev: Allow counter auto mode configration through RDMA netlink
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (9 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Provide an option to enable/disable per-port counter auto mode through
RDMA netlink. Limit it to users with ADMIN capability only.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c  | 78 ++++++++++++++++++++++++++++++++
 include/uapi/rdma/rdma_netlink.h |  8 ++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 4fb6d4285970..b46992ed3553 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -120,6 +120,9 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_DEV_PROTOCOL]		= { .type = NLA_NUL_STRING,
 				    .len = RDMA_NLDEV_ATTR_ENTRY_STRLEN },
 	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_STAT_MODE]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_STAT_RES]		= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]	= { .type = NLA_U32 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -1384,6 +1387,78 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
+static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	u32 index, port, mode, mask = 0;
+	struct ib_device *device;
+	struct sk_buff *msg;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	/* Currently only counter for QP is supported */
+	if (ret || !tb[RDMA_NLDEV_ATTR_STAT_RES] ||
+	    !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+	    !tb[RDMA_NLDEV_ATTR_PORT_INDEX] || !tb[RDMA_NLDEV_ATTR_STAT_MODE])
+		return -EINVAL;
+
+	if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto err;
+	}
+	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					 RDMA_NLDEV_CMD_STAT_SET),
+			0, 0);
+
+	mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
+	if (mode != RDMA_COUNTER_MODE_AUTO) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
+		mask = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+
+	ret = rdma_counter_set_auto_mode(device, port,
+					 mask ? true : false, mask);
+	if (ret)
+		goto err_msg;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	nlmsg_end(msg, nlh);
+	ib_device_put(device);
+	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_msg:
+	nlmsg_free(msg);
+err:
+	ib_device_put(device);
+	return ret;
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -1434,6 +1509,9 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	},
 	[RDMA_NLDEV_CMD_SYS_SET] = {
 		.doit = nldev_set_sys_set_doit,
+	},
+	[RDMA_NLDEV_CMD_STAT_SET] = {
+		.doit = nldev_stat_set_doit,
 		.flags = RDMA_NL_ADMIN_PERM,
 	},
 };
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index cd3cace46b27..673fdf9ba10f 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -267,6 +267,8 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_RES_PD_GET, /* can dump */
 
+	RDMA_NLDEV_CMD_STAT_SET,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -478,6 +480,12 @@ enum rdma_nldev_attr {
 	 * File descriptor handle of the net namespace object
 	 */
 	RDMA_NLDEV_NET_NS_FD,			/* u32 */
+	/*
+	 * Counter-specific attributes.
+	 */
+	RDMA_NLDEV_ATTR_STAT_MODE,		/* u32 */
+	RDMA_NLDEV_ATTR_STAT_RES,		/* u32 */
+	RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK,	/* u32 */
 
 	/*
 	 * Always the end
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (10 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-05-22 17:21   ` Jason Gunthorpe
  2019-05-22 17:22   ` Jason Gunthorpe
  -1 siblings, 2 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

This patch adds the ability to return all available counters
together with their properties and hwstats.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c |  28 +++++
 drivers/infiniband/core/device.c   |   2 +
 drivers/infiniband/core/nldev.c    | 173 +++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h            |  10 ++
 include/rdma/rdma_counter.h        |   3 +
 include/uapi/rdma/rdma_netlink.h   |  10 +-
 6 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 665e0d43c21b..36cd9eca1e46 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 {
 	struct rdma_counter *counter;
 
+	if (!dev->ops.counter_alloc_stats)
+		return NULL;
+
 	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
 	if (!counter)
 		return NULL;
@@ -69,16 +72,25 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 	counter->device    = dev;
 	counter->port      = port;
 	counter->res.type  = RDMA_RESTRACK_COUNTER;
+	counter->stats     = dev->ops.counter_alloc_stats(counter);
+	if (!counter->stats)
+		goto err_stats;
+
 	counter->mode.mode = mode;
 	atomic_set(&counter->usecnt, 0);
 	mutex_init(&counter->lock);
 
 	return counter;
+
+err_stats:
+	kfree(counter);
+	return NULL;
 }
 
 static void rdma_counter_dealloc(struct rdma_counter *counter)
 {
 	rdma_restrack_del(&counter->res);
+	kfree(counter->stats);
 	kfree(counter);
 }
 
@@ -279,6 +291,22 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
 	return 0;
 }
 
+int rdma_counter_query_stats(struct rdma_counter *counter)
+{
+	int ret;
+
+	struct ib_device *dev = counter->device;
+
+	if (!dev->ops.counter_update_stats)
+		return -EINVAL;
+
+	mutex_lock(&counter->lock);
+	ret = dev->ops.counter_update_stats(counter);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
 void rdma_counter_init(struct ib_device *dev)
 {
 	struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index dfaa57de871f..c56ffc61ab1e 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2351,6 +2351,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, unmap_fmr);
 	SET_DEVICE_OP(dev_ops, counter_bind_qp);
 	SET_DEVICE_OP(dev_ops, counter_unbind_qp);
+	SET_DEVICE_OP(dev_ops, counter_alloc_stats);
+	SET_DEVICE_OP(dev_ops, counter_update_stats);
 
 	SET_OBJ_SIZE(dev_ops, ib_ah);
 	SET_OBJ_SIZE(dev_ops, ib_pd);
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index b46992ed3553..1f7de88effad 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -123,6 +123,13 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_STAT_MODE]		= { .type = NLA_U32 },
 	[RDMA_NLDEV_ATTR_STAT_RES]		= { .type = NLA_U32 },
 	[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]	= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_STAT_COUNTER]		= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY]	= { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]       = { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_STAT_HWCOUNTERS]       = { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY]  = { .type = NLA_NESTED },
+	[RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME] = { .type = NLA_NUL_STRING },
+	[RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE] = { .type = NLA_U64 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -625,6 +632,160 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
 err:	return -EMSGSIZE;
 }
 
+static int fill_stat_counter_mode(struct sk_buff *msg,
+				  struct rdma_counter *counter)
+{
+	struct rdma_counter_mode *m = &counter->mode;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, m->mode))
+		return -EMSGSIZE;
+
+	if (m->mode == RDMA_COUNTER_MODE_AUTO)
+		if ((m->mask & RDMA_COUNTER_MASK_QP_TYPE) &&
+		    nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_TYPE, m->param.qp_type))
+			return -EMSGSIZE;
+
+	return 0;
+}
+
+static int fill_stat_counter_qp_entry(struct sk_buff *msg, u32 qpn)
+{
+	struct nlattr *entry_attr;
+
+	entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_QP_ENTRY);
+	if (!entry_attr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn))
+		goto err;
+
+	nla_nest_end(msg, entry_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, entry_attr);
+	return -EMSGSIZE;
+}
+
+static int fill_stat_counter_qps(struct sk_buff *msg,
+				 struct rdma_counter *counter)
+{
+	struct rdma_restrack_entry *res;
+	struct rdma_restrack_root *rt;
+	struct nlattr *table_attr;
+	struct ib_qp *qp = NULL;
+	unsigned long id = 0;
+	int ret = 0;
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_QP);
+
+	rt = &counter->device->res[RDMA_RESTRACK_QP];
+	xa_lock(&rt->xa);
+	xa_for_each(&rt->xa, id, res) {
+		if (!rdma_is_visible_in_pid_ns(res))
+			continue;
+
+		if (!rdma_restrack_get(res))
+			continue;
+
+		qp = container_of(res, struct ib_qp, res);
+		if (qp->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+			goto next;
+
+		if (!qp->counter || (qp->counter->id != counter->id))
+			goto next;
+
+		ret = fill_stat_counter_qp_entry(msg, qp->qp_num);
+		if (ret) {
+			rdma_restrack_put(res);
+			goto err;
+		}
+
+next:
+		rdma_restrack_put(res);
+	}
+
+	xa_unlock(&rt->xa);
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err:
+	xa_unlock(&rt->xa);
+	nla_nest_cancel(msg, table_attr);
+	return ret;
+}
+
+static int fill_stat_hwcounter_entry(struct sk_buff *msg,
+				     const char *name, u64 value)
+{
+	struct nlattr *entry_attr;
+
+	entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY);
+	if (!entry_attr)
+		return -EMSGSIZE;
+
+	if (nla_put_string(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME,
+			   name))
+		goto err;
+	if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE,
+			      value, RDMA_NLDEV_ATTR_PAD))
+		goto err;
+
+	nla_nest_end(msg, entry_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, entry_attr);
+	return -EMSGSIZE;
+}
+
+static int fill_stat_counter_hwcounters(struct sk_buff *msg,
+					struct rdma_counter *counter)
+{
+	struct rdma_hw_stats *st = counter->stats;
+	struct nlattr *table_attr;
+	int i;
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+	if (!table_attr)
+		return -EMSGSIZE;
+
+	for (i = 0; i < st->num_counters; i++)
+		if (fill_stat_hwcounter_entry(msg, st->names[i], st->value[i]))
+			goto err;
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, table_attr);
+	return -EMSGSIZE;
+}
+
+static int fill_res_counter_entry(struct sk_buff *msg, bool has_cap_net_admin,
+				  struct rdma_restrack_entry *res,
+				  uint32_t port)
+{
+	struct rdma_counter *counter =
+		container_of(res, struct rdma_counter, res);
+
+	if (port && port != counter->port)
+		return 0;
+
+	/* Dump it even query failed */
+	rdma_counter_query_stats(counter);
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, counter->port) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, counter->id) ||
+	    fill_res_name_pid(msg, &counter->res) ||
+	    fill_stat_counter_mode(msg, counter) ||
+	    fill_stat_counter_qps(msg, counter) ||
+	    fill_stat_counter_hwcounters(msg, counter))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct netlink_ext_ack *extack)
 {
@@ -992,6 +1153,13 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 		.entry = RDMA_NLDEV_ATTR_RES_PD_ENTRY,
 		.id = RDMA_NLDEV_ATTR_RES_PDN,
 	},
+	[RDMA_RESTRACK_COUNTER] = {
+		.fill_res_func = fill_res_counter_entry,
+		.nldev_cmd = RDMA_NLDEV_CMD_STAT_GET,
+		.nldev_attr = RDMA_NLDEV_ATTR_STAT_COUNTER,
+		.entry = RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,
+		.id = RDMA_NLDEV_ATTR_STAT_COUNTER_ID,
+	},
 };
 
 static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1228,6 +1396,7 @@ RES_GET_FUNCS(cm_id, RDMA_RESTRACK_CM_ID);
 RES_GET_FUNCS(cq, RDMA_RESTRACK_CQ);
 RES_GET_FUNCS(pd, RDMA_RESTRACK_PD);
 RES_GET_FUNCS(mr, RDMA_RESTRACK_MR);
+RES_GET_FUNCS(counter, RDMA_RESTRACK_COUNTER);
 
 static LIST_HEAD(link_ops);
 static DECLARE_RWSEM(link_ops_rwsem);
@@ -1514,6 +1683,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.doit = nldev_stat_set_doit,
 		.flags = RDMA_NL_ADMIN_PERM,
 	},
+	[RDMA_NLDEV_CMD_STAT_GET] = {
+		.doit = nldev_res_get_counter_doit,
+		.dump = nldev_res_get_counter_dumpit,
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 06c77b3e42cd..d12770f63e4e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2570,6 +2570,16 @@ struct ib_device_ops {
 	 *   used in cases like qp destroy.
 	 */
 	int (*counter_unbind_qp)(struct ib_qp *qp, bool force);
+	/**
+	 * counter_alloc_stats - Allocate a struct rdma_hw_stats and fill in
+	 * the driver initialized data.
+	 */
+	struct rdma_hw_stats *(*counter_alloc_stats)(
+		struct rdma_counter *counter);
+	/**
+	 * counter_update_stats - Query the stats value of this counter
+	 */
+	int (*counter_update_stats)(struct rdma_counter *counter);
 
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 159b8ba3487e..4bc62909a638 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -37,6 +37,7 @@ struct rdma_counter {
 	atomic_t			usecnt;
 	struct rdma_counter_mode	mode;
 	struct mutex			lock;
+	struct rdma_hw_stats		*stats;
 	u8				port;
 };
 
@@ -47,4 +48,6 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
 int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
 
+int rdma_counter_query_stats(struct rdma_counter *counter);
+
 #endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 673fdf9ba10f..33eee6fb16b2 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -269,6 +269,8 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_STAT_SET,
 
+	RDMA_NLDEV_CMD_STAT_GET, /* can dump */
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -486,7 +488,13 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_STAT_MODE,		/* u32 */
 	RDMA_NLDEV_ATTR_STAT_RES,		/* u32 */
 	RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK,	/* u32 */
-
+	RDMA_NLDEV_ATTR_STAT_COUNTER,		/* nested table */
+	RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,	/* nested table */
+	RDMA_NLDEV_ATTR_STAT_COUNTER_ID,	/* u32 */
+	RDMA_NLDEV_ATTR_STAT_HWCOUNTERS,	/* nested table */
+	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY,	/* nested table */
+	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME,	/* string */
+	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE,	/* u64 */
 	/*
 	 * Always the end
 	 */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 12/17] IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (11 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Add support for ib callback counter_alloc_stats() and
counter_update_stats().

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 18a3e855d45b..d69cfe511c37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5450,6 +5450,27 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 	return num_counters;
 }
 
+static struct rdma_hw_stats *
+mlx5_ib_counter_alloc_stats(struct rdma_counter *counter)
+{
+	struct mlx5_ib_dev *dev = to_mdev(counter->device);
+	struct mlx5_ib_port *port = &dev->port[counter->port - 1];
+
+	/* Q counters are in the beginning of all counters */
+	return rdma_alloc_hw_stats_struct(port->cnts.names,
+					  port->cnts.num_q_counters,
+					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
+
+static int mlx5_ib_counter_update_stats(struct rdma_counter *counter)
+{
+	struct mlx5_ib_dev *dev = to_mdev(counter->device);
+	struct mlx5_ib_port *port = &dev->port[counter->port - 1];
+
+	return mlx5_ib_query_q_counters(dev->mdev, port,
+					counter->stats, counter->id);
+}
+
 static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 				   struct ib_qp *qp)
 {
@@ -6361,6 +6382,8 @@ static const struct ib_device_ops mlx5_ib_dev_hw_stats_ops = {
 	.get_hw_stats = mlx5_ib_get_hw_stats,
 	.counter_bind_qp = mlx5_ib_counter_bind_qp,
 	.counter_unbind_qp = mlx5_ib_counter_unbind_qp,
+	.counter_alloc_stats = mlx5_ib_counter_alloc_stats,
+	.counter_update_stats = mlx5_ib_counter_update_stats,
 };
 
 static int mlx5_ib_stage_counters_init(struct mlx5_ib_dev *dev)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (12 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-05-22 17:10   ` Jason Gunthorpe
                     ` (2 more replies)
  -1 siblings, 3 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Since a QP can only be bound to one counter, then if it is bound to a
separate counter, for backward compatibility purpose, the statistic
value must be:
* stat of default counter
+ stat of all running allocated counters
+ stat of all deallocated counters (history stats)

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
 drivers/infiniband/core/device.c   |  8 ++-
 drivers/infiniband/core/sysfs.c    | 10 ++-
 include/rdma/rdma_counter.h        |  5 +-
 4 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 36cd9eca1e46..f598b1cdb241 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
 	return ret;
 }
 
+static void counter_history_stat_update(const struct rdma_counter *counter)
+{
+	struct ib_device *dev = counter->device;
+	struct rdma_port_counter *port_counter;
+	int i;
+
+	port_counter = &dev->port_data[counter->port].port_counter;
+	if (!port_counter->hstats)
+		return;
+
+	for (i = 0; i < counter->stats->num_counters; i++)
+		port_counter->hstats->value[i] += counter->stats->value[i];
+}
+
 static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
 {
 	struct rdma_counter *counter = qp->counter;
@@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
 		return ret;
 
 	rdma_restrack_put(&counter->res);
-	if (atomic_dec_and_test(&counter->usecnt))
+	if (atomic_dec_and_test(&counter->usecnt)) {
+		counter_history_stat_update(counter);
 		rdma_counter_dealloc(counter);
+	}
 
 	return 0;
 }
@@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
 	return ret;
 }
 
-void rdma_counter_init(struct ib_device *dev)
+static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
+					   u8 port, u32 index)
+{
+	struct rdma_restrack_entry *res;
+	struct rdma_restrack_root *rt;
+	struct rdma_counter *counter;
+	unsigned long id = 0;
+	u64 sum = 0;
+
+	rt = &dev->res[RDMA_RESTRACK_COUNTER];
+	xa_lock(&rt->xa);
+	xa_for_each(&rt->xa, id, res) {
+		if (!rdma_restrack_get(res))
+			continue;
+
+		counter = container_of(res, struct rdma_counter, res);
+		if ((counter->device != dev) || (counter->port != port))
+			goto next;
+
+		if (rdma_counter_query_stats(counter))
+			goto next;
+
+		sum += counter->stats->value[index];
+next:
+		rdma_restrack_put(res);
+	}
+
+	xa_unlock(&rt->xa);
+	return sum;
+}
+
+/**
+ * rdma_counter_get_hwstat_value() - Get the sum value of all counters on a
+ *   specific port, including the running ones and history data
+ */
+u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
+{
+	struct rdma_port_counter *port_counter;
+	u64 sum;
+
+	if (!rdma_is_port_valid(dev, port))
+		return -EINVAL;
+
+	port_counter = &dev->port_data[port].port_counter;
+	if (index >= port_counter->hstats->num_counters)
+		return -EINVAL;
+
+	sum = get_running_counters_hwstat_sum(dev, port, index);
+	sum += port_counter->hstats->value[index];
+
+	return sum;
+}
+
+int rdma_counter_init(struct ib_device *dev)
 {
 	struct rdma_port_counter *port_counter;
 	u32 port;
 
 	if (!dev->ops.alloc_hw_stats)
-		return;
+		return 0;
 
 	rdma_for_each_port(dev, port) {
 		port_counter = &dev->port_data[port].port_counter;
 		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
 		mutex_init(&port_counter->lock);
+
+		port_counter->hstats = dev->ops.alloc_hw_stats(dev, port);
+		if (!port_counter->hstats)
+			goto fail;
 	}
+
+	return 0;
+
+fail:
+	rdma_for_each_port(dev, port) {
+		port_counter = &dev->port_data[port].port_counter;
+		kfree(port_counter->hstats);
+	}
+
+	return -ENOMEM;
 }
 
 void rdma_counter_cleanup(struct ib_device *dev)
 {
+	struct rdma_port_counter *port_counter;
+	u32 port;
+
+	if (!dev->ops.alloc_hw_stats)
+		return;
+
+	rdma_for_each_port(dev, port) {
+		port_counter = &dev->port_data[port].port_counter;
+		kfree(port_counter->hstats);
+	}
 }
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c56ffc61ab1e..8ae4906a60e7 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
 		goto dev_cleanup;
 	}
 
-	rdma_counter_init(device);
+	ret = rdma_counter_init(device);
+	if (ret) {
+		dev_warn(&device->dev, "Couldn't initialize counter\n");
+		goto sysfs_cleanup;
+	}
 
 	ret = enable_device_and_get(device);
 	if (ret) {
@@ -1283,6 +1287,8 @@ int ib_register_device(struct ib_device *device, const char *name)
 
 	return 0;
 
+sysfs_cleanup:
+	ib_device_unregister_sysfs(device);
 dev_cleanup:
 	device_del(&device->dev);
 cg_cleanup:
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 2fe89754e592..8d1cf1bbb5f5 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -43,6 +43,7 @@
 #include <rdma/ib_mad.h>
 #include <rdma/ib_pma.h>
 #include <rdma/ib_cache.h>
+#include <rdma/rdma_counter.h>
 
 struct ib_port;
 
@@ -795,9 +796,12 @@ static int update_hw_stats(struct ib_device *dev, struct rdma_hw_stats *stats,
 	return 0;
 }
 
-static ssize_t print_hw_stat(struct rdma_hw_stats *stats, int index, char *buf)
+static ssize_t print_hw_stat(struct ib_device *dev, int port_num,
+			     struct rdma_hw_stats *stats, int index, char *buf)
 {
-	return sprintf(buf, "%llu\n", stats->value[index]);
+	u64 v = rdma_counter_get_hwstat_value(dev, port_num, index);
+
+	return sprintf(buf, "%llu\n", stats->value[index] + v);
 }
 
 static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
@@ -823,7 +827,7 @@ static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
 	ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
 	if (ret)
 		goto unlock;
-	ret = print_hw_stat(stats, hsa->index, buf);
+	ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
 unlock:
 	mutex_unlock(&stats->lock);
 
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 4bc62909a638..5ad86ae67cc5 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -27,6 +27,7 @@ struct rdma_counter_mode {
 
 struct rdma_port_counter {
 	struct rdma_counter_mode mode;
+	struct rdma_hw_stats *hstats;
 	struct mutex lock;
 };
 
@@ -41,13 +42,13 @@ struct rdma_counter {
 	u8				port;
 };
 
-void rdma_counter_init(struct ib_device *dev);
+int rdma_counter_init(struct ib_device *dev);
 void rdma_counter_cleanup(struct ib_device *dev);
 int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 			       bool on, enum rdma_nl_counter_mask mask);
 int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
 int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
-
 int rdma_counter_query_stats(struct rdma_counter *counter);
+u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
 
 #endif /* _RDMA_COUNTER_H_ */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 14/17] RDMA/counter: Allow manual mode configuration support
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (13 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

In manual mode a QP is bound to a counter manually. If counter is not
specified then a new one will be allocated.
Manually mode is enabled when user binds a QP, and disabled when the
last manually bound QP is unbound.
When auto-mode is turned off and there are counters left, manual mode
is enabled so that the user is able to access these counters.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c | 226 ++++++++++++++++++++++++++++-
 include/rdma/rdma_counter.h        |   7 +
 include/uapi/rdma/rdma_netlink.h   |   6 +
 3 files changed, 236 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index f598b1cdb241..6f58352a29ee 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -27,7 +27,9 @@ static int __counter_set_mode(struct rdma_counter_mode *curr,
 /**
  * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
  *
- * When @on is true, the @mask must be set
+ * When @on is true, the @mask must be set; When @on is false, it goes
+ * into manual mode if there's any counter, so that the user is able to
+ * manually access them.
  */
 int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 			       bool on, enum rdma_nl_counter_mask mask)
@@ -48,8 +50,13 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 			ret = -EINVAL;
 			goto out;
 		}
-		ret = __counter_set_mode(&port_counter->mode,
-					 RDMA_COUNTER_MODE_NONE, 0);
+
+		if (port_counter->num_counters != 0)
+			ret = __counter_set_mode(&port_counter->mode,
+						 RDMA_COUNTER_MODE_MANUAL, 0);
+		else
+			ret = __counter_set_mode(&port_counter->mode,
+						 RDMA_COUNTER_MODE_NONE, 0);
 	}
 
 out:
@@ -60,7 +67,9 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 					       enum rdma_nl_counter_mode mode)
 {
+	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
+	int ret;
 
 	if (!dev->ops.counter_alloc_stats)
 		return NULL;
@@ -76,12 +85,27 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 	if (!counter->stats)
 		goto err_stats;
 
+	port_counter = &dev->port_data[port].port_counter;
+	mutex_lock(&port_counter->lock);
+	if (mode == RDMA_COUNTER_MODE_MANUAL) {
+		ret = __counter_set_mode(&port_counter->mode,
+					 RDMA_COUNTER_MODE_MANUAL, 0);
+		if (ret)
+			goto err_mode;
+	}
+
+	port_counter->num_counters++;
+	mutex_unlock(&port_counter->lock);
+
 	counter->mode.mode = mode;
 	atomic_set(&counter->usecnt, 0);
 	mutex_init(&counter->lock);
 
 	return counter;
 
+err_mode:
+	mutex_unlock(&port_counter->lock);
+	kfree(counter->stats);
 err_stats:
 	kfree(counter);
 	return NULL;
@@ -89,6 +113,18 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 
 static void rdma_counter_dealloc(struct rdma_counter *counter)
 {
+	struct rdma_port_counter *port_counter;
+
+	port_counter = &counter->device->port_data[counter->port].port_counter;
+	mutex_lock(&port_counter->lock);
+	port_counter->num_counters--;
+	if ((port_counter->num_counters == 0) &&
+	    (port_counter->mode.mode == RDMA_COUNTER_MODE_MANUAL))
+		__counter_set_mode(&port_counter->mode, RDMA_COUNTER_MODE_NONE,
+				   0);
+
+	mutex_unlock(&port_counter->lock);
+
 	rdma_restrack_del(&counter->res);
 	kfree(counter->stats);
 	kfree(counter);
@@ -376,6 +412,190 @@ u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
 	return sum;
 }
 
+static struct ib_qp *rdma_counter_get_qp(struct ib_device *dev, u32 qp_num)
+{
+	struct rdma_restrack_entry *res = NULL;
+	struct ib_qp *qp = NULL;
+
+	res = rdma_restrack_get_byid(dev, RDMA_RESTRACK_QP, qp_num);
+	if (IS_ERR(res))
+		return NULL;
+
+	if (!rdma_is_visible_in_pid_ns(res))
+		goto err;
+
+	qp = container_of(res, struct ib_qp, res);
+	if (qp->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+		goto err;
+
+	return qp;
+
+err:
+	rdma_restrack_put(&qp->res);
+	return NULL;
+}
+
+static int rdma_counter_bind_qp_manual(struct rdma_counter *counter,
+				       struct ib_qp *qp)
+{
+	int ret;
+
+	if (qp->port != counter->port)
+		return -EINVAL;
+
+	ret = __rdma_counter_bind_qp(counter, qp);
+	if (ret)
+		return ret;
+
+	atomic_inc(&counter->usecnt);
+	return 0;
+}
+
+static struct rdma_counter *rdma_get_counter_by_id(struct ib_device *dev,
+						   u32 counter_id)
+{
+	struct rdma_restrack_entry *res;
+
+	res = rdma_restrack_get_byid(dev, RDMA_RESTRACK_COUNTER, counter_id);
+	if (IS_ERR(res))
+		return NULL;
+
+	if (!rdma_is_visible_in_pid_ns(res)) {
+		rdma_restrack_put(res);
+		return NULL;
+	}
+
+	return container_of(res, struct rdma_counter, res);
+}
+
+/**
+ * rdma_counter_bind_qpn() - Bind QP @qp_num to counter @counter_id
+ */
+int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
+			  u32 qp_num, u32 counter_id)
+{
+	struct rdma_counter *counter;
+	struct ib_qp *qp;
+	int ret;
+
+	qp = rdma_counter_get_qp(dev, qp_num);
+	if (!qp)
+		return -ENOENT;
+
+	counter = rdma_get_counter_by_id(dev, counter_id);
+	if (!counter) {
+		ret = -ENOENT;
+		goto err;
+	}
+
+	if (counter->res.task != qp->res.task) {
+		ret = -EINVAL;
+		goto err_task;
+	}
+
+	ret = rdma_counter_bind_qp_manual(counter, qp);
+	if (ret)
+		goto err_task;
+
+	rdma_restrack_put(&qp->res);
+	return 0;
+
+err_task:
+	rdma_restrack_put(&counter->res);
+err:
+	rdma_restrack_put(&qp->res);
+	return ret;
+}
+
+/**
+ * rdma_counter_bind_qpn_alloc() - Alloc a counter and bind QP @qp_num to it
+ *   The id of new counter is returned in @counter_id
+ */
+int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
+				u32 qp_num, u32 *counter_id)
+{
+	struct rdma_counter *counter;
+	struct ib_qp *qp;
+	int ret;
+
+	if (!rdma_is_port_valid(dev, port))
+		return -EINVAL;
+
+	qp = rdma_counter_get_qp(dev, qp_num);
+	if (!qp)
+		return -ENOENT;
+
+	if (rdma_is_port_valid(dev, qp->port) && (qp->port != port)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_MANUAL);
+	if (!counter) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	ret = rdma_counter_bind_qp_manual(counter, qp);
+	if (ret)
+		goto err_bind;
+
+	if (counter_id)
+		*counter_id = counter->id;
+
+	rdma_counter_res_add(counter, qp);
+
+	if (!rdma_restrack_get(&counter->res)) {
+		rdma_counter_unbind_qp(qp, false);
+		ret = -EINVAL;
+	}
+
+	rdma_restrack_put(&qp->res);
+	return ret;
+
+err_bind:
+	rdma_counter_dealloc(counter);
+err:
+	rdma_restrack_put(&qp->res);
+	return ret;
+}
+
+/**
+ * rdma_counter_unbind_qpn() - Unbind QP @qp_num from a counter
+ */
+int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
+			    u32 qp_num, u32 counter_id)
+{
+	struct rdma_port_counter *port_counter;
+	struct ib_qp *qp;
+	int ret;
+
+	if (!rdma_is_port_valid(dev, port))
+		return -EINVAL;
+
+	qp = rdma_counter_get_qp(dev, qp_num);
+	if (!qp)
+		return -ENOENT;
+
+	if (rdma_is_port_valid(dev, qp->port) && (qp->port != port)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	port_counter = &dev->port_data[port].port_counter;
+	if (!qp->counter || qp->counter->id != counter_id ||
+	    port_counter->mode.mode != RDMA_COUNTER_MODE_MANUAL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = rdma_counter_unbind_qp(qp, false);
+
+out:
+	rdma_restrack_put(&qp->res);
+	return ret;
+}
+
 int rdma_counter_init(struct ib_device *dev)
 {
 	struct rdma_port_counter *port_counter;
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 5ad86ae67cc5..b79319abbdef 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -28,6 +28,7 @@ struct rdma_counter_mode {
 struct rdma_port_counter {
 	struct rdma_counter_mode mode;
 	struct rdma_hw_stats *hstats;
+	unsigned int num_counters;
 	struct mutex lock;
 };
 
@@ -50,5 +51,11 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
 int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
 int rdma_counter_query_stats(struct rdma_counter *counter);
 u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
+int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
+			  u32 qp_num, u32 counter_id);
+int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
+				u32 qp_num, u32 *counter_id);
+int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
+			    u32 qp_num, u32 counter_id);
 
 #endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 33eee6fb16b2..5f4461286e28 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -513,6 +513,12 @@ enum rdma_nl_counter_mode {
 	 */
 	RDMA_COUNTER_MODE_AUTO,
 
+	/*
+	 * Which qp are bound with which counter is explicitly specified
+	 * by the user
+	 */
+	RDMA_COUNTER_MODE_MANUAL,
+
 	/*
 	 * Always the end
 	 */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 15/17] RDMA/nldev: Allow counter manual mode configration through RDMA netlink
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (14 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Provide an option to allow users to manually bind a qp with a counter
through RDMA netlink. Limit it to users with ADMIN capability only.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c  | 111 +++++++++++++++++++++++++++----
 include/rdma/rdma_counter.h      |   3 +
 include/uapi/rdma/rdma_netlink.h |   2 +
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 1f7de88effad..6b7bc1a3d61d 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1559,8 +1559,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack)
 {
+	u32 index, port, mode, mask = 0, qpn, cntn = 0;
 	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
-	u32 index, port, mode, mask = 0;
 	struct ib_device *device;
 	struct sk_buff *msg;
 	int ret;
@@ -1598,30 +1598,111 @@ static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			0, 0);
 
 	mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
-	if (mode != RDMA_COUNTER_MODE_AUTO) {
-		ret = -EMSGSIZE;
-		goto err_msg;
+	if (mode == RDMA_COUNTER_MODE_AUTO) {
+		if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
+			mask = nla_get_u32(
+				tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+
+		ret = rdma_counter_set_auto_mode(device, port,
+						 mask ? true : false, mask);
+		if (ret)
+			goto err_msg;
+	} else {
+		qpn = nla_get_u32(tb[RDMA_NLDEV_ATTR_RES_LQPN]);
+		if (tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]) {
+			cntn = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]);
+			ret = rdma_counter_bind_qpn(device, port, qpn, cntn);
+		} else {
+			ret = rdma_counter_bind_qpn_alloc(device, port,
+							  qpn, &cntn);
+		}
+		if (ret)
+			goto err_msg;
+
+		if (fill_nldev_handle(msg, device) ||
+		    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+		    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn) ||
+		    nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn)) {
+			ret = -EMSGSIZE;
+			goto err_fill;
+		}
+	}
+
+	nlmsg_end(msg, nlh);
+	ib_device_put(device);
+	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_fill:
+	rdma_counter_unbind_qpn(device, port, qpn, cntn);
+err_msg:
+	nlmsg_free(msg);
+err:
+	ib_device_put(device);
+	return ret;
+}
+
+static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct ib_device *device;
+	struct sk_buff *msg;
+	u32 index, port, qpn, cntn;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (ret || !tb[RDMA_NLDEV_ATTR_STAT_RES] ||
+	    !tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX] ||
+	    !tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID] ||
+	    !tb[RDMA_NLDEV_ATTR_RES_LQPN])
+		return -EINVAL;
+
+	if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port)) {
+		ret = -EINVAL;
+		goto err;
 	}
 
-	if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
-		mask = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto err;
+	}
+	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					 RDMA_NLDEV_CMD_STAT_SET),
+			0, 0);
 
-	ret = rdma_counter_set_auto_mode(device, port,
-					 mask ? true : false, mask);
+	cntn = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]);
+	qpn = nla_get_u32(tb[RDMA_NLDEV_ATTR_RES_LQPN]);
+	ret = rdma_counter_unbind_qpn(device, port, qpn, cntn);
 	if (ret)
-		goto err_msg;
+		goto err_unbind;
 
-	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode) ||
-	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+	if (fill_nldev_handle(msg, device) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn)) {
 		ret = -EMSGSIZE;
-		goto err_msg;
+		goto err_fill;
 	}
 
 	nlmsg_end(msg, nlh);
 	ib_device_put(device);
 	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
 
-err_msg:
+err_fill:
+	rdma_counter_bind_qpn(device, port, qpn, cntn);
+err_unbind:
 	nlmsg_free(msg);
 err:
 	ib_device_put(device);
@@ -1687,6 +1768,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.doit = nldev_res_get_counter_doit,
 		.dump = nldev_res_get_counter_dumpit,
 	},
+	[RDMA_NLDEV_CMD_STAT_DEL] = {
+		.doit = nldev_stat_del_doit,
+		.flags = RDMA_NL_ADMIN_PERM,
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index b79319abbdef..f485b86a318c 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -57,5 +57,8 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
 				u32 qp_num, u32 *counter_id);
 int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
 			    u32 qp_num, u32 counter_id);
+int rdma_counter_get_mode(struct ib_device *dev, u8 port,
+			  enum rdma_nl_counter_mode *mode,
+			  enum rdma_nl_counter_mask *mask);
 
 #endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 5f4461286e28..aaeeb0a8aec6 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -271,6 +271,8 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_STAT_GET, /* can dump */
 
+	RDMA_NLDEV_CMD_STAT_DEL,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 16/17] RDMA/nldev: Allow get counter mode through RDMA netlink
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (15 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  -1 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

Provide an option to get current counter mode through RDMA netlink.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c | 13 +++++
 drivers/infiniband/core/nldev.c    | 78 +++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 6f58352a29ee..a3151799f625 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -596,6 +596,19 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
 	return ret;
 }
 
+int rdma_counter_get_mode(struct ib_device *dev, u8 port,
+			  enum rdma_nl_counter_mode *mode,
+			  enum rdma_nl_counter_mask *mask)
+{
+	struct rdma_port_counter *port_counter;
+
+	port_counter = &dev->port_data[port].port_counter;
+	*mode = port_counter->mode.mode;
+	*mask = port_counter->mode.mask;
+
+	return 0;
+}
+
 int rdma_counter_init(struct ib_device *dev)
 {
 	struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 6b7bc1a3d61d..53c1d2d82a06 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1709,6 +1709,82 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return ret;
 }
 
+static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	static enum rdma_nl_counter_mode mode;
+	static enum rdma_nl_counter_mask mask;
+	struct ib_device *device;
+	struct sk_buff *msg;
+	u32 index, port;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (ret)
+		return -EINVAL;
+
+	if (!tb[RDMA_NLDEV_ATTR_STAT_MODE])
+		return nldev_res_get_counter_doit(skb, nlh, extack);
+
+	if (!tb[RDMA_NLDEV_ATTR_STAT_RES] || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+	    !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+		return -EINVAL;
+
+	if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					 RDMA_NLDEV_CMD_STAT_GET),
+			0, 0);
+
+	ret = rdma_counter_get_mode(device, port, &mode, &mask);
+	if (ret)
+		goto err_msg;
+
+	if (fill_nldev_handle(msg, device) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	if ((mode == RDMA_COUNTER_MODE_AUTO) &&
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	nlmsg_end(msg, nlh);
+	ib_device_put(device);
+	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_msg:
+	nlmsg_free(msg);
+err:
+	ib_device_put(device);
+	return ret;
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -1765,7 +1841,7 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.flags = RDMA_NL_ADMIN_PERM,
 	},
 	[RDMA_NLDEV_CMD_STAT_GET] = {
-		.doit = nldev_res_get_counter_doit,
+		.doit = nldev_stat_get_doit,
 		.dump = nldev_res_get_counter_dumpit,
 	},
 	[RDMA_NLDEV_CMD_STAT_DEL] = {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (16 preceding siblings ...)
  (?)
@ 2019-04-29  8:34 ` Leon Romanovsky
  2019-05-22 17:30   ` Jason Gunthorpe
  -1 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

From: Mark Zhang <markz@mellanox.com>

This patch adds the ability to return the hwstats of per-port default
counters (which can also be queried through sysfs nodes).

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 53c1d2d82a06..cb2dd38f49f1 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return ret;
 }
 
+static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
+					      struct nlmsghdr *nlh,
+					      struct netlink_ext_ack *extack,
+					      struct nlattr *tb[])
+{
+	struct rdma_hw_stats *stats;
+	struct nlattr *table_attr;
+	struct ib_device *device;
+	int ret, num_cnts, i;
+	struct sk_buff *msg;
+	u32 index, port;
+	u64 v;
+
+	if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					 RDMA_NLDEV_CMD_STAT_GET),
+			0, 0);
+
+	if (fill_nldev_handle(msg, device) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	stats = device->ops.alloc_hw_stats(device, port);
+	if (!stats) {
+		ret = -ENOMEM;
+		goto err_msg;
+	}
+
+	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
+	if (num_cnts < 0) {
+		ret = -EINVAL;
+		goto err_stats;
+	}
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+	if (!table_attr) {
+		ret = -EMSGSIZE;
+		goto err_stats;
+	}
+	for (i = 0; i < num_cnts; i++) {
+		v = stats->value[i] +
+			rdma_counter_get_hwstat_value(device, port, i);
+		if (fill_stat_hwcounter_entry(msg, stats->names[i], v)) {
+			ret = -EMSGSIZE;
+			goto err_table;
+		}
+	}
+	nla_nest_end(msg, table_attr);
+
+	kfree(stats);
+	nlmsg_end(msg, nlh);
+	ib_device_put(device);
+	return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_table:
+	nla_nest_cancel(msg, table_attr);
+err_stats:
+	kfree(stats);
+err_msg:
+	nlmsg_free(msg);
+err:
+	ib_device_put(device);
+	return ret;
+}
+
 static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack)
 {
@@ -1725,8 +1817,13 @@ static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (ret)
 		return -EINVAL;
 
-	if (!tb[RDMA_NLDEV_ATTR_STAT_MODE])
-		return nldev_res_get_counter_doit(skb, nlh, extack);
+	if (!tb[RDMA_NLDEV_ATTR_STAT_MODE]) {
+		if (!tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID])
+			return nldev_res_get_default_counter_doit(skb, nlh,
+								  extack, tb);
+		else
+			return nldev_res_get_counter_doit(skb, nlh, extack);
+	}
 
 	if (!tb[RDMA_NLDEV_ATTR_STAT_RES] || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
 	    !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
  2019-04-29  8:34 ` [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter Leon Romanovsky
@ 2019-04-29 18:22   ` Saeed Mahameed
  2019-04-29 18:38     ` Leon Romanovsky
  0 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2019-04-29 18:22 UTC (permalink / raw)
  To: Jason Gunthorpe, leon, dledford
  Cc: Majd Dibbiny, Mark Zhang, Leon Romanovsky, linux-rdma, netdev

On Mon, 2019-04-29 at 11:34 +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> Support bind a qp with counter. If counter is null then bind the qp
> to
> the default counter. Different QP state has different operation:
> - RESET: Set the counter field so that it will take effective
>   during RST2INIT change;
> - RTS: Issue an RTS2RTS change to update the QP counter;
> - Other: Set the counter field and mark the counter_pending flag,
>   when QP is moved to RTS state and this flag is set, then issue
>   an RTS2RTS modification to update the counter.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 +++
>  drivers/infiniband/hw/mlx5/qp.c      | 76
> +++++++++++++++++++++++++++-
>  include/linux/mlx5/qp.h              |  1 +

I don't see any reason why this patch should go to mlx5-next branch.
Just because you have one liner in include/linux/mlx5/qp.h, is not
enough reason.

>  3 files changed, 81 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 55b8bdb402b6..447f8ad5abbd 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -437,6 +437,10 @@ struct mlx5_ib_qp {
>  	u32			flags_en;
>  	/* storage for qp sub type when core qp type is IB_QPT_DRIVER
> */
>  	enum ib_qp_type		qp_sub_type;
> +	/* A flag to indicate if there's a new counter is configured
> +	 * but not take effective
> +	 */
> +	u32                     counter_pending;
>  };
>  
>  struct mlx5_ib_cq_buf {
> @@ -1418,4 +1422,6 @@ void mlx5_ib_put_xlt_emergency_page(void);
>  int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
>  			struct mlx5_bfreg_info *bfregi, u32 bfregn,
>  			bool dyn_bfreg);
> +
> +int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter
> *counter);
>  #endif /* MLX5_IB_H */
> diff --git a/drivers/infiniband/hw/mlx5/qp.c
> b/drivers/infiniband/hw/mlx5/qp.c
> index efe1f6f0c351..29e3fcd66510 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -34,6 +34,7 @@
>  #include <rdma/ib_umem.h>
>  #include <rdma/ib_cache.h>
>  #include <rdma/ib_user_verbs.h>
> +#include <rdma/rdma_counter.h>
>  #include <linux/mlx5/fs.h>
>  #include "mlx5_ib.h"
>  #include "ib_rep.h"
> @@ -3365,6 +3366,35 @@ static unsigned int get_tx_affinity(struct
> mlx5_ib_dev *dev,
>  	return tx_port_affinity;
>  }
>  
> +static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
> +				    struct rdma_counter *counter)
> +{
> +	struct mlx5_ib_dev *dev = to_mdev(qp->device);
> +	struct mlx5_ib_qp *mqp = to_mqp(qp);
> +	struct mlx5_qp_context context = {};
> +	struct mlx5_ib_port *mibport = NULL;
> +	struct mlx5_ib_qp_base *base;
> +	u32 set_id;
> +
> +	if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id))
> +		return 0;
> +
> +	if (counter) {
> +		set_id = counter->id;
> +	} else {
> +		mibport = &dev->port[mqp->port - 1];
> +		set_id = mibport->cnts.set_id;
> +	}
> +
> +	base = &mqp->trans_qp.base;
> +	context.qp_counter_set_usr_page &= cpu_to_be32(0xffffff);
> +	context.qp_counter_set_usr_page |= cpu_to_be32(set_id << 24);
> +	return mlx5_core_qp_modify(dev->mdev,
> +				   MLX5_CMD_OP_RTS2RTS_QP,
> +				   MLX5_QP_OPTPAR_COUNTER_SET_ID,
> +				   &context, &base->mqp);
> +}
> +
>  static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
>  			       const struct ib_qp_attr *attr, int
> attr_mask,
>  			       enum ib_qp_state cur_state,
> @@ -3418,6 +3448,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
>  	struct mlx5_ib_port *mibport = NULL;
>  	enum mlx5_qp_state mlx5_cur, mlx5_new;
>  	enum mlx5_qp_optpar optpar;
> +	u32 set_id = 0;
>  	int mlx5_st;
>  	int err;
>  	u16 op;
> @@ -3580,8 +3611,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
>  			port_num = 0;
>  
>  		mibport = &dev->port[port_num];
> +		if (ibqp->counter)
> +			set_id = ibqp->counter->id;
> +		else
> +			set_id = mibport->cnts.set_id;
>  		context->qp_counter_set_usr_page |=
> -			cpu_to_be32((u32)(mibport->cnts.set_id) << 24);
> +			cpu_to_be32(set_id << 24);
>  	}
>  
>  	if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state ==
> IB_QPS_INIT)
> @@ -3609,7 +3644,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
>  
>  		raw_qp_param.operation = op;
>  		if (cur_state == IB_QPS_RESET && new_state ==
> IB_QPS_INIT) {
> -			raw_qp_param.rq_q_ctr_id = mibport-
> >cnts.set_id;
> +			raw_qp_param.rq_q_ctr_id = set_id;
>  			raw_qp_param.set_mask |=
> MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID;
>  		}
>  
> @@ -3686,6 +3721,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
>  		qp->db.db[MLX5_SND_DBR] = 0;
>  	}
>  
> +	if ((new_state == IB_QPS_RTS) && qp->counter_pending) {
> +		err = __mlx5_ib_qp_set_counter(ibqp, ibqp->counter);
> +		if (!err)
> +			qp->counter_pending = 0;
> +	}
> +
>  out:
>  	kfree(context);
>  	return err;
> @@ -6347,3 +6388,34 @@ void mlx5_ib_drain_rq(struct ib_qp *qp)
>  
>  	handle_drain_completion(cq, &rdrain, dev);
>  }
> +
> +/**
> + * Bind a qp to a counter. If @counter is NULL then bind the qp to
> + * the default counter
> + */
> +int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter
> *counter)
> +{
> +	struct mlx5_ib_qp *mqp = to_mqp(qp);
> +	int err = 0;
> +
> +	mutex_lock(&mqp->mutex);
> +	if (mqp->state == IB_QPS_RESET) {
> +		qp->counter = counter;
> +		goto out;
> +	}
> +
> +	if (mqp->state == IB_QPS_RTS) {
> +		err = __mlx5_ib_qp_set_counter(qp, counter);
> +		if (!err)
> +			qp->counter = counter;
> +
> +		goto out;
> +	}
> +
> +	mqp->counter_pending = 1;
> +	qp->counter = counter;
> +
> +out:
> +	mutex_unlock(&mqp->mutex);
> +	return err;
> +}
> diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
> index 0343c81d4c5f..b0b47106bc76 100644
> --- a/include/linux/mlx5/qp.h
> +++ b/include/linux/mlx5/qp.h
> @@ -70,6 +70,7 @@ enum mlx5_qp_optpar {
>  	MLX5_QP_OPTPAR_CQN_RCV			= 1 << 19,
>  	MLX5_QP_OPTPAR_DC_HS			= 1 << 20,
>  	MLX5_QP_OPTPAR_DC_KEY			= 1 << 21,
> +	MLX5_QP_OPTPAR_COUNTER_SET_ID		= 1 << 25,
>  };
>  
>  enum mlx5_qp_state {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
  2019-04-29 18:22   ` Saeed Mahameed
@ 2019-04-29 18:38     ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-04-29 18:38 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, dledford, Majd Dibbiny, Mark Zhang, linux-rdma, netdev

On Mon, Apr 29, 2019 at 06:22:03PM +0000, Saeed Mahameed wrote:
> On Mon, 2019-04-29 at 11:34 +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Support bind a qp with counter. If counter is null then bind the qp
> > to
> > the default counter. Different QP state has different operation:
> > - RESET: Set the counter field so that it will take effective
> >   during RST2INIT change;
> > - RTS: Issue an RTS2RTS change to update the QP counter;
> > - Other: Set the counter field and mark the counter_pending flag,
> >   when QP is moved to RTS state and this flag is set, then issue
> >   an RTS2RTS modification to update the counter.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > ---
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 +++
> >  drivers/infiniband/hw/mlx5/qp.c      | 76
> > +++++++++++++++++++++++++++-
> >  include/linux/mlx5/qp.h              |  1 +
>
> I don't see any reason why this patch should go to mlx5-next branch.
> Just because you have one liner in include/linux/mlx5/qp.h, is not
> enough reason.
>

I'm changing target automatically based on get_maintainer.pl output.

If "$GET_MAINTAINER --email --nol --nos --nosubsystem --noroles $_file | grep Saeed -c"
returns Saeed, it means that such patch has all potential to create merge conflict.

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
  2019-04-29  8:34 ` [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support Leon Romanovsky
@ 2019-05-22 16:56   ` Jason Gunthorpe
  2019-05-29 10:12     ` Leon Romanovsky
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 16:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> Add an API to support set/clear per-port auto mode.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/Makefile   |  2 +-
>  drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
>  drivers/infiniband/core/device.c   |  4 ++
>  include/rdma/ib_verbs.h            |  2 +
>  include/rdma/rdma_counter.h        | 24 ++++++++++
>  include/uapi/rdma/rdma_netlink.h   | 26 ++++++++++
>  6 files changed, 134 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/infiniband/core/counters.c
> 
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 313f2349b518..cddf748c15c9 100644
> +++ b/drivers/infiniband/core/Makefile
> @@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
>  				device.o fmr_pool.o cache.o netlink.o \
>  				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
>  				multicast.o mad.o smi.o agent.o mad_rmpp.o \
> -				nldev.o restrack.o
> +				nldev.o restrack.o counters.o
>  
>  ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
>  ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> new file mode 100644
> index 000000000000..bda8d945a758
> +++ b/drivers/infiniband/core/counters.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> + */
> +#include <rdma/ib_verbs.h>
> +#include <rdma/rdma_counter.h>
> +
> +#include "core_priv.h"
> +#include "restrack.h"
> +
> +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> +
> +static int __counter_set_mode(struct rdma_counter_mode *curr,
> +			      enum rdma_nl_counter_mode new_mode,
> +			      enum rdma_nl_counter_mask new_mask)
> +{
> +	if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> +	    ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> +	     (curr->mode != RDMA_COUNTER_MODE_NONE)))
> +		return -EINVAL;
> +
> +	curr->mode = new_mode;
> +	curr->mask = new_mask;
> +	return 0;
> +}
> +
> +/**
> + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> + *
> + * When @on is true, the @mask must be set
> + */
> +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> +			       bool on, enum rdma_nl_counter_mask mask)
> +{
> +	struct rdma_port_counter *port_counter;
> +	int ret;
> +
> +	if (!rdma_is_port_valid(dev, port))
> +		return -EINVAL;
> +
> +	port_counter = &dev->port_data[port].port_counter;
> +	mutex_lock(&port_counter->lock);
> +	if (on) {
> +		ret = __counter_set_mode(&port_counter->mode,
> +					 RDMA_COUNTER_MODE_AUTO, mask);
> +	} else {
> +		if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		ret = __counter_set_mode(&port_counter->mode,
> +					 RDMA_COUNTER_MODE_NONE, 0);
> +	}
> +
> +out:
> +	mutex_unlock(&port_counter->lock);
> +	return ret;
> +}
> +
> +void rdma_counter_init(struct ib_device *dev)
> +{
> +	struct rdma_port_counter *port_counter;
> +	u32 port;
> +
> +	if (!dev->ops.alloc_hw_stats)
> +		return;
> +
> +	rdma_for_each_port(dev, port) {
> +		port_counter = &dev->port_data[port].port_counter;
> +		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> +		mutex_init(&port_counter->lock);
> +	}
> +}
> +
> +void rdma_counter_cleanup(struct ib_device *dev)
> +{
> +}

Please don't add empty functions

> @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
>  		goto out;
>  
>  	disable_device(ib_dev);
> +	rdma_counter_cleanup(ib_dev);

This is the wrong place to call this, the patch that actually adds a
body is just doing kfree's so it is properly called
'rdma_counter_release' and it belongs in ib_device_release()

And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
for allocation either.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-04-29  8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
@ 2019-05-22 17:10   ` Jason Gunthorpe
  2019-05-29 11:15     ` Leon Romanovsky
  2019-05-22 17:26   ` Jason Gunthorpe
  2019-05-29 11:17   ` Leon Romanovsky
  2 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> Since a QP can only be bound to one counter, then if it is bound to a
> separate counter, for backward compatibility purpose, the statistic
> value must be:
> * stat of default counter
> + stat of all running allocated counters
> + stat of all deallocated counters (history stats)
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
>  drivers/infiniband/core/device.c   |  8 ++-
>  drivers/infiniband/core/sysfs.c    | 10 ++-
>  include/rdma/rdma_counter.h        |  5 +-
>  4 files changed, 113 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 36cd9eca1e46..f598b1cdb241 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
>  	return ret;
>  }
>  
> +static void counter_history_stat_update(const struct rdma_counter *counter)
> +{
> +	struct ib_device *dev = counter->device;
> +	struct rdma_port_counter *port_counter;
> +	int i;
> +
> +	port_counter = &dev->port_data[counter->port].port_counter;
> +	if (!port_counter->hstats)
> +		return;
> +
> +	for (i = 0; i < counter->stats->num_counters; i++)
> +		port_counter->hstats->value[i] += counter->stats->value[i];
> +}
> +
>  static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
>  {
>  	struct rdma_counter *counter = qp->counter;
> @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
>  		return ret;
>  
>  	rdma_restrack_put(&counter->res);
> -	if (atomic_dec_and_test(&counter->usecnt))
> +	if (atomic_dec_and_test(&counter->usecnt)) {
> +		counter_history_stat_update(counter);
>  		rdma_counter_dealloc(counter);
> +	}
>  
>  	return 0;
>  }
> @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
>  	return ret;
>  }
>  
> -void rdma_counter_init(struct ib_device *dev)
> +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> +					   u8 port, u32 index)
> +{
> +	struct rdma_restrack_entry *res;
> +	struct rdma_restrack_root *rt;
> +	struct rdma_counter *counter;
> +	unsigned long id = 0;
> +	u64 sum = 0;
> +
> +	rt = &dev->res[RDMA_RESTRACK_COUNTER];
> +	xa_lock(&rt->xa);
> +	xa_for_each(&rt->xa, id, res) {
> +		if (!rdma_restrack_get(res))
> +			continue;

Why do we need to get refcounts if we are holding the xa_lock?

> +
> +		counter = container_of(res, struct rdma_counter, res);
> +		if ((counter->device != dev) || (counter->port != port))
> +			goto next;
> +
> +		if (rdma_counter_query_stats(counter))
> +			goto next;

And rdma_counter_query_stats does

+	mutex_lock(&counter->lock);

So this was never tested as it will insta-crash with lockdep.

Presumably this is why it is using xa_for_each and restrack_get - but
it needs to drop the lock after successful get.

This sort of comment applies to nearly evey place in this series that
uses xa_for_each. 

This needs to be tested with lockdep.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
  2019-04-29  8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
@ 2019-05-22 17:11   ` Jason Gunthorpe
  2019-05-22 17:15   ` Jason Gunthorpe
  1 sibling, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:11 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:42AM +0300, Leon Romanovsky wrote:

>  void rdma_counter_init(struct ib_device *dev)
>  {
>  	struct rdma_port_counter *port_counter;
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 9204b4251fc8..dfaa57de871f 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -2349,6 +2349,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, set_vf_guid);
>  	SET_DEVICE_OP(dev_ops, set_vf_link_state);
>  	SET_DEVICE_OP(dev_ops, unmap_fmr);
> +	SET_DEVICE_OP(dev_ops, counter_bind_qp);
> +	SET_DEVICE_OP(dev_ops, counter_unbind_qp);

Keep sorted

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
  2019-04-29  8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
  2019-05-22 17:11   ` Jason Gunthorpe
@ 2019-05-22 17:15   ` Jason Gunthorpe
  1 sibling, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:42AM +0300, Leon Romanovsky wrote:

> +/**
> + * rdma_counter_unbind_qp - Unbind a qp from a counter
> + * @force:
> + *   true - Decrease the counter ref-count anyway (e.g., qp destroy)
> + */
> +int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> +{
> +	struct rdma_counter *counter = qp->counter;
> +	int ret;
> +
> +	if (!counter)
> +		return -EINVAL;
> +
> +	ret = __rdma_counter_unbind_qp(qp, force);
> +	if (ret && !force)
> +		return ret;
> +
> +	rdma_restrack_put(&counter->res);
> +	if (atomic_dec_and_test(&counter->usecnt))
> +		rdma_counter_dealloc(counter);

An atomic that does kfree when it reaches zero should be implemented
with a kref.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
  2019-04-29  8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
@ 2019-05-22 17:21   ` Jason Gunthorpe
  2019-05-29 11:31     ` Leon Romanovsky
  2019-05-22 17:22   ` Jason Gunthorpe
  1 sibling, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> This patch adds the ability to return all available counters
> together with their properties and hwstats.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/counters.c |  28 +++++
>  drivers/infiniband/core/device.c   |   2 +
>  drivers/infiniband/core/nldev.c    | 173 +++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h            |  10 ++
>  include/rdma/rdma_counter.h        |   3 +
>  include/uapi/rdma/rdma_netlink.h   |  10 +-
>  6 files changed, 225 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 665e0d43c21b..36cd9eca1e46 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
>  {
>  	struct rdma_counter *counter;
>  
> +	if (!dev->ops.counter_alloc_stats)
> +		return NULL;
> +

Seems weird to add this now, why was it Ok to have counters prior to
this patch?

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
  2019-04-29  8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
  2019-05-22 17:21   ` Jason Gunthorpe
@ 2019-05-22 17:22   ` Jason Gunthorpe
  1 sibling, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> This patch adds the ability to return all available counters
> together with their properties and hwstats.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/counters.c |  28 +++++
>  drivers/infiniband/core/device.c   |   2 +
>  drivers/infiniband/core/nldev.c    | 173 +++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h            |  10 ++
>  include/rdma/rdma_counter.h        |   3 +
>  include/uapi/rdma/rdma_netlink.h   |  10 +-
>  6 files changed, 225 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 665e0d43c21b..36cd9eca1e46 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
>  {
>  	struct rdma_counter *counter;
>  
> +	if (!dev->ops.counter_alloc_stats)
> +		return NULL;
> +
>  	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
>  	if (!counter)
>  		return NULL;
> @@ -69,16 +72,25 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
>  	counter->device    = dev;
>  	counter->port      = port;
>  	counter->res.type  = RDMA_RESTRACK_COUNTER;
> +	counter->stats     = dev->ops.counter_alloc_stats(counter);
> +	if (!counter->stats)
> +		goto err_stats;
> +
>  	counter->mode.mode = mode;
>  	atomic_set(&counter->usecnt, 0);
>  	mutex_init(&counter->lock);
>  
>  	return counter;
> +
> +err_stats:
> +	kfree(counter);
> +	return NULL;
>  }
>  
>  static void rdma_counter_dealloc(struct rdma_counter *counter)
>  {
>  	rdma_restrack_del(&counter->res);
> +	kfree(counter->stats);
>  	kfree(counter);
>  }
>  
> @@ -279,6 +291,22 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
>  	return 0;
>  }
>  
> +int rdma_counter_query_stats(struct rdma_counter *counter)
> +{
> +	int ret;
> +
> +	struct ib_device *dev = counter->device;
> +

Extra blank line
Something about festive trees

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-04-29  8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
  2019-05-22 17:10   ` Jason Gunthorpe
@ 2019-05-22 17:26   ` Jason Gunthorpe
  2019-05-29 11:05     ` Leon Romanovsky
  2019-05-29 11:17   ` Leon Romanovsky
  2 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c56ffc61ab1e..8ae4906a60e7 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
>  		goto dev_cleanup;
>  	}
>  
> -	rdma_counter_init(device);
> +	ret = rdma_counter_init(device);
> +	if (ret) {
> +		dev_warn(&device->dev, "Couldn't initialize counter\n");
> +		goto sysfs_cleanup;
> +	}

Don't put this things randomly, if there is some reason it should be
after sysfs it needs a comment, otherwise if it is just allocating
memory it belongs earlier, and the unwind should be done in release.

I also think it is very strange/wrong that both sysfs and counters are
allocating the same alloc_hw_stats object

Why can't they share?

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
  2019-04-29  8:34 ` [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics " Leon Romanovsky
@ 2019-05-22 17:30   ` Jason Gunthorpe
  2019-05-29 11:54     ` Leon Romanovsky
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:30 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:53AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> This patch adds the ability to return the hwstats of per-port default
> counters (which can also be queried through sysfs nodes).
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
>  1 file changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 53c1d2d82a06..cb2dd38f49f1 100644
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	return ret;
>  }
>  
> +static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
> +					      struct nlmsghdr *nlh,
> +					      struct netlink_ext_ack *extack,
> +					      struct nlattr *tb[])
> +{
> +	struct rdma_hw_stats *stats;
> +	struct nlattr *table_attr;
> +	struct ib_device *device;
> +	int ret, num_cnts, i;
> +	struct sk_buff *msg;
> +	u32 index, port;
> +	u64 v;
> +
> +	if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
> +		return -EINVAL;
> +
> +	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
> +	device = ib_device_get_by_index(sock_net(skb->sk), index);
> +	if (!device)
> +		return -EINVAL;
> +
> +	if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> +	if (!rdma_is_port_valid(device, port)) {
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
> +			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
> +					 RDMA_NLDEV_CMD_STAT_GET),
> +			0, 0);
> +
> +	if (fill_nldev_handle(msg, device) ||
> +	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
> +		ret = -EMSGSIZE;
> +		goto err_msg;
> +	}
> +
> +	stats = device->ops.alloc_hw_stats(device, port);
> +	if (!stats) {
> +		ret = -ENOMEM;
> +		goto err_msg;
> +	}

Why do we need yet another one of these to be allocated?

> +	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);

Is '0' right here?

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 00/17] Statistics counter support
  2019-04-29  8:34 ` Leon Romanovsky
                   ` (17 preceding siblings ...)
  (?)
@ 2019-05-22 17:31 ` Jason Gunthorpe
  -1 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev

On Mon, Apr 29, 2019 at 11:34:36AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Changelog:
>  v1 -> v2:
>  * Rebased to latest rdma-next
>  v0 -> v1:
>  * Changed wording of counter comment
>  * Removed unneeded assignments
>  * Added extra patch to present global counters
> 
>  * I didn't change QP type from int to be enum ib_qp_type,
>    because it caused to cyclic dependency between ib_verbs.h and
>    rdma_counter.h.
> 
> 
> Hi,
> 
> This series from Mark provides dynamic statistics infrastructure.
> He uses netlink interface to configure and retrieve those counters.
> 
> This infrastructure allows to users monitor various objects by binding
> to them counters. As the beginning, we used QP object as target for
> those counters, but future patches will include ODP MR information too.
> 
> Two binding modes are supported:
>  - Auto: This allows a user to build automatic set of objects to a counter
>    according to common criteria. For example in a per-type scheme, where in
>    one process all QPs with same QP type are bound automatically to a single
>    counter.
>  - Manual: This allows a user to manually bind objects on a counter.
> 
> Those two modes are mutual-exclusive with separation between processes,
> objects created by different processes cannot be bound to a same counter.
> 
> For objects which don't support counter binding, we will return
> pre-allocated counters.
> 
> $ rdma statistic qp set link mlx5_2/1 auto type on
> $ rdma statistic qp set link mlx5_2/1 auto off
> $ rdma statistic qp bind link mlx5_2/1 lqpn 178
> $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
> $ rdma statistic show
> $ rdma statistic qp mode

Can you please include the command outputs?

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
  2019-05-22 16:56   ` Jason Gunthorpe
@ 2019-05-29 10:12     ` Leon Romanovsky
  2019-05-29 10:43       ` Leon Romanovsky
  0 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 10:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 22, 2019 at 01:56:08PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Add an API to support set/clear per-port auto mode.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/Makefile   |  2 +-
> >  drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
> >  drivers/infiniband/core/device.c   |  4 ++
> >  include/rdma/ib_verbs.h            |  2 +
> >  include/rdma/rdma_counter.h        | 24 ++++++++++
> >  include/uapi/rdma/rdma_netlink.h   | 26 ++++++++++
> >  6 files changed, 134 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/infiniband/core/counters.c
> >
> > diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> > index 313f2349b518..cddf748c15c9 100644
> > +++ b/drivers/infiniband/core/Makefile
> > @@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> >  				device.o fmr_pool.o cache.o netlink.o \
> >  				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
> >  				multicast.o mad.o smi.o agent.o mad_rmpp.o \
> > -				nldev.o restrack.o
> > +				nldev.o restrack.o counters.o
> >
> >  ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> >  ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > new file mode 100644
> > index 000000000000..bda8d945a758
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -0,0 +1,77 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > +/*
> > + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> > + */
> > +#include <rdma/ib_verbs.h>
> > +#include <rdma/rdma_counter.h>
> > +
> > +#include "core_priv.h"
> > +#include "restrack.h"
> > +
> > +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> > +
> > +static int __counter_set_mode(struct rdma_counter_mode *curr,
> > +			      enum rdma_nl_counter_mode new_mode,
> > +			      enum rdma_nl_counter_mask new_mask)
> > +{
> > +	if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> > +	    ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> > +	     (curr->mode != RDMA_COUNTER_MODE_NONE)))
> > +		return -EINVAL;
> > +
> > +	curr->mode = new_mode;
> > +	curr->mask = new_mask;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> > + *
> > + * When @on is true, the @mask must be set
> > + */
> > +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> > +			       bool on, enum rdma_nl_counter_mask mask)
> > +{
> > +	struct rdma_port_counter *port_counter;
> > +	int ret;
> > +
> > +	if (!rdma_is_port_valid(dev, port))
> > +		return -EINVAL;
> > +
> > +	port_counter = &dev->port_data[port].port_counter;
> > +	mutex_lock(&port_counter->lock);
> > +	if (on) {
> > +		ret = __counter_set_mode(&port_counter->mode,
> > +					 RDMA_COUNTER_MODE_AUTO, mask);
> > +	} else {
> > +		if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		ret = __counter_set_mode(&port_counter->mode,
> > +					 RDMA_COUNTER_MODE_NONE, 0);
> > +	}
> > +
> > +out:
> > +	mutex_unlock(&port_counter->lock);
> > +	return ret;
> > +}
> > +
> > +void rdma_counter_init(struct ib_device *dev)
> > +{
> > +	struct rdma_port_counter *port_counter;
> > +	u32 port;
> > +
> > +	if (!dev->ops.alloc_hw_stats)
> > +		return;
> > +
> > +	rdma_for_each_port(dev, port) {
> > +		port_counter = &dev->port_data[port].port_counter;
> > +		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> > +		mutex_init(&port_counter->lock);
> > +	}
> > +}
> > +
> > +void rdma_counter_cleanup(struct ib_device *dev)
> > +{
> > +}
>
> Please don't add empty functions

It is brought here for symmetry, the function is going to be filled in
patch "RDMA/core: Get sum value of all counters when perform a sysfs
stat read".

>
> > @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> >  		goto out;
> >
> >  	disable_device(ib_dev);
> > +	rdma_counter_cleanup(ib_dev);
>
> This is the wrong place to call this, the patch that actually adds a
> body is just doing kfree's so it is properly called
> 'rdma_counter_release' and it belongs in ib_device_release()

I'll move.

>
> And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
> for allocation either.

Not all devices implement hw_stat.

>
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
  2019-05-29 10:12     ` Leon Romanovsky
@ 2019-05-29 10:43       ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 10:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 29, 2019 at 01:12:18PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 01:56:08PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@mellanox.com>
> > >
> > > Add an API to support set/clear per-port auto mode.
> > >
> > > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  drivers/infiniband/core/Makefile   |  2 +-
> > >  drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
> > >  drivers/infiniband/core/device.c   |  4 ++
> > >  include/rdma/ib_verbs.h            |  2 +
> > >  include/rdma/rdma_counter.h        | 24 ++++++++++
> > >  include/uapi/rdma/rdma_netlink.h   | 26 ++++++++++
> > >  6 files changed, 134 insertions(+), 1 deletion(-)
> > >  create mode 100644 drivers/infiniband/core/counters.c
> > >
> > > diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> > > index 313f2349b518..cddf748c15c9 100644
> > > +++ b/drivers/infiniband/core/Makefile
> > > @@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> > >  				device.o fmr_pool.o cache.o netlink.o \
> > >  				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
> > >  				multicast.o mad.o smi.o agent.o mad_rmpp.o \
> > > -				nldev.o restrack.o
> > > +				nldev.o restrack.o counters.o
> > >
> > >  ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> > >  ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> > > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > > new file mode 100644
> > > index 000000000000..bda8d945a758
> > > +++ b/drivers/infiniband/core/counters.c
> > > @@ -0,0 +1,77 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> > > + */
> > > +#include <rdma/ib_verbs.h>
> > > +#include <rdma/rdma_counter.h>
> > > +
> > > +#include "core_priv.h"
> > > +#include "restrack.h"
> > > +
> > > +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> > > +
> > > +static int __counter_set_mode(struct rdma_counter_mode *curr,
> > > +			      enum rdma_nl_counter_mode new_mode,
> > > +			      enum rdma_nl_counter_mask new_mask)
> > > +{
> > > +	if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> > > +	    ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> > > +	     (curr->mode != RDMA_COUNTER_MODE_NONE)))
> > > +		return -EINVAL;
> > > +
> > > +	curr->mode = new_mode;
> > > +	curr->mask = new_mask;
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> > > + *
> > > + * When @on is true, the @mask must be set
> > > + */
> > > +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> > > +			       bool on, enum rdma_nl_counter_mask mask)
> > > +{
> > > +	struct rdma_port_counter *port_counter;
> > > +	int ret;
> > > +
> > > +	if (!rdma_is_port_valid(dev, port))
> > > +		return -EINVAL;
> > > +
> > > +	port_counter = &dev->port_data[port].port_counter;
> > > +	mutex_lock(&port_counter->lock);
> > > +	if (on) {
> > > +		ret = __counter_set_mode(&port_counter->mode,
> > > +					 RDMA_COUNTER_MODE_AUTO, mask);
> > > +	} else {
> > > +		if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> > > +			ret = -EINVAL;
> > > +			goto out;
> > > +		}
> > > +		ret = __counter_set_mode(&port_counter->mode,
> > > +					 RDMA_COUNTER_MODE_NONE, 0);
> > > +	}
> > > +
> > > +out:
> > > +	mutex_unlock(&port_counter->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +void rdma_counter_init(struct ib_device *dev)
> > > +{
> > > +	struct rdma_port_counter *port_counter;
> > > +	u32 port;
> > > +
> > > +	if (!dev->ops.alloc_hw_stats)
> > > +		return;
> > > +
> > > +	rdma_for_each_port(dev, port) {
> > > +		port_counter = &dev->port_data[port].port_counter;
> > > +		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> > > +		mutex_init(&port_counter->lock);
> > > +	}
> > > +}
> > > +
> > > +void rdma_counter_cleanup(struct ib_device *dev)
> > > +{
> > > +}
> >
> > Please don't add empty functions
>
> It is brought here for symmetry, the function is going to be filled in
> patch "RDMA/core: Get sum value of all counters when perform a sysfs
> stat read".
>
> >
> > > @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> > >  		goto out;
> > >
> > >  	disable_device(ib_dev);
> > > +	rdma_counter_cleanup(ib_dev);
> >
> > This is the wrong place to call this, the patch that actually adds a
> > body is just doing kfree's so it is properly called
> > 'rdma_counter_release' and it belongs in ib_device_release()
>
> I'll move.
>
> >
> > And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
> > for allocation either.
>
> Not all devices implement hw_stat.

ok, I think that I found a way to rewrite the code without need to check hw_stat.

Thanks

>
> >
> > Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-22 17:26   ` Jason Gunthorpe
@ 2019-05-29 11:05     ` Leon Romanovsky
  2019-05-29 15:44       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index c56ffc61ab1e..8ae4906a60e7 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> >  		goto dev_cleanup;
> >  	}
> >
> > -	rdma_counter_init(device);
> > +	ret = rdma_counter_init(device);
> > +	if (ret) {
> > +		dev_warn(&device->dev, "Couldn't initialize counter\n");
> > +		goto sysfs_cleanup;
> > +	}
>
> Don't put this things randomly, if there is some reason it should be
> after sysfs it needs a comment, otherwise if it is just allocating
> memory it belongs earlier, and the unwind should be done in release.
>
> I also think it is very strange/wrong that both sysfs and counters are
> allocating the same alloc_hw_stats object
>
> Why can't they share?

They can, but we wanted to separate "legacy" counters which were exposed
through sysfs and "new" counters which can be enabled/disable automatically.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-22 17:10   ` Jason Gunthorpe
@ 2019-05-29 11:15     ` Leon Romanovsky
  2019-05-29 15:41       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 22, 2019 at 02:10:42PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Since a QP can only be bound to one counter, then if it is bound to a
> > separate counter, for backward compatibility purpose, the statistic
> > value must be:
> > * stat of default counter
> > + stat of all running allocated counters
> > + stat of all deallocated counters (history stats)
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> >  drivers/infiniband/core/device.c   |  8 ++-
> >  drivers/infiniband/core/sysfs.c    | 10 ++-
> >  include/rdma/rdma_counter.h        |  5 +-
> >  4 files changed, 113 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > index 36cd9eca1e46..f598b1cdb241 100644
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> >  	return ret;
> >  }
> >
> > +static void counter_history_stat_update(const struct rdma_counter *counter)
> > +{
> > +	struct ib_device *dev = counter->device;
> > +	struct rdma_port_counter *port_counter;
> > +	int i;
> > +
> > +	port_counter = &dev->port_data[counter->port].port_counter;
> > +	if (!port_counter->hstats)
> > +		return;
> > +
> > +	for (i = 0; i < counter->stats->num_counters; i++)
> > +		port_counter->hstats->value[i] += counter->stats->value[i];
> > +}
> > +
> >  static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> >  {
> >  	struct rdma_counter *counter = qp->counter;
> > @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> >  		return ret;
> >
> >  	rdma_restrack_put(&counter->res);
> > -	if (atomic_dec_and_test(&counter->usecnt))
> > +	if (atomic_dec_and_test(&counter->usecnt)) {
> > +		counter_history_stat_update(counter);
> >  		rdma_counter_dealloc(counter);
> > +	}
> >
> >  	return 0;
> >  }
> > @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> >  	return ret;
> >  }
> >
> > -void rdma_counter_init(struct ib_device *dev)
> > +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> > +					   u8 port, u32 index)
> > +{
> > +	struct rdma_restrack_entry *res;
> > +	struct rdma_restrack_root *rt;
> > +	struct rdma_counter *counter;
> > +	unsigned long id = 0;
> > +	u64 sum = 0;
> > +
> > +	rt = &dev->res[RDMA_RESTRACK_COUNTER];
> > +	xa_lock(&rt->xa);
> > +	xa_for_each(&rt->xa, id, res) {
> > +		if (!rdma_restrack_get(res))
> > +			continue;
>
> Why do we need to get refcounts if we are holding the xa_lock?

Don't we need to protect an entry itself from disappearing?

>
> > +
> > +		counter = container_of(res, struct rdma_counter, res);
> > +		if ((counter->device != dev) || (counter->port != port))
> > +			goto next;
> > +
> > +		if (rdma_counter_query_stats(counter))
> > +			goto next;
>
> And rdma_counter_query_stats does
>
> +	mutex_lock(&counter->lock);
>
> So this was never tested as it will insta-crash with lockdep.
>
> Presumably this is why it is using xa_for_each and restrack_get - but
> it needs to drop the lock after successful get.
>
> This sort of comment applies to nearly evey place in this series that
> uses xa_for_each.
>
> This needs to be tested with lockdep.

I use LOCKDEP.

>
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-04-29  8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
  2019-05-22 17:10   ` Jason Gunthorpe
  2019-05-22 17:26   ` Jason Gunthorpe
@ 2019-05-29 11:17   ` Leon Romanovsky
  2 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Majd Dibbiny, Mark Zhang, Saeed Mahameed,
	linux-netdev

On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> Since a QP can only be bound to one counter, then if it is bound to a
> separate counter, for backward compatibility purpose, the statistic
> value must be:
> * stat of default counter
> + stat of all running allocated counters
> + stat of all deallocated counters (history stats)
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
>  drivers/infiniband/core/device.c   |  8 ++-
>  drivers/infiniband/core/sysfs.c    | 10 ++-
>  include/rdma/rdma_counter.h        |  5 +-
>  4 files changed, 113 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 36cd9eca1e46..f598b1cdb241 100644
> --- a/drivers/infiniband/core/counters.c
> +++ b/drivers/infiniband/core/counters.c
> @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
>  	return ret;
>  }
>
> +static void counter_history_stat_update(const struct rdma_counter *counter)
> +{
> +	struct ib_device *dev = counter->device;
> +	struct rdma_port_counter *port_counter;
> +	int i;
> +
> +	port_counter = &dev->port_data[counter->port].port_counter;
> +	if (!port_counter->hstats)
> +		return;
> +
> +	for (i = 0; i < counter->stats->num_counters; i++)
> +		port_counter->hstats->value[i] += counter->stats->value[i];
> +}
> +
>  static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
>  {
>  	struct rdma_counter *counter = qp->counter;
> @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
>  		return ret;
>
>  	rdma_restrack_put(&counter->res);
> -	if (atomic_dec_and_test(&counter->usecnt))
> +	if (atomic_dec_and_test(&counter->usecnt)) {
> +		counter_history_stat_update(counter);
>  		rdma_counter_dealloc(counter);
> +	}
>
>  	return 0;
>  }
> @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
>  	return ret;
>  }
>
> -void rdma_counter_init(struct ib_device *dev)
> +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> +					   u8 port, u32 index)
> +{
> +	struct rdma_restrack_entry *res;
> +	struct rdma_restrack_root *rt;
> +	struct rdma_counter *counter;
> +	unsigned long id = 0;
> +	u64 sum = 0;
> +
> +	rt = &dev->res[RDMA_RESTRACK_COUNTER];
> +	xa_lock(&rt->xa);
> +	xa_for_each(&rt->xa, id, res) {
> +		if (!rdma_restrack_get(res))
> +			continue;
> +
> +		counter = container_of(res, struct rdma_counter, res);
> +		if ((counter->device != dev) || (counter->port != port))
> +			goto next;
> +
> +		if (rdma_counter_query_stats(counter))
> +			goto next;
> +
> +		sum += counter->stats->value[index];
> +next:
> +		rdma_restrack_put(res);
> +	}
> +
> +	xa_unlock(&rt->xa);
> +	return sum;
> +}
> +
> +/**
> + * rdma_counter_get_hwstat_value() - Get the sum value of all counters on a
> + *   specific port, including the running ones and history data
> + */
> +u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
> +{
> +	struct rdma_port_counter *port_counter;
> +	u64 sum;
> +
> +	if (!rdma_is_port_valid(dev, port))
> +		return -EINVAL;
> +
> +	port_counter = &dev->port_data[port].port_counter;
> +	if (index >= port_counter->hstats->num_counters)
> +		return -EINVAL;
> +
> +	sum = get_running_counters_hwstat_sum(dev, port, index);
> +	sum += port_counter->hstats->value[index];
> +
> +	return sum;

This function is wrong, it can't return negative value (error) while
return type is declared as u64.

> +}
> +
> +int rdma_counter_init(struct ib_device *dev)
>  {
>  	struct rdma_port_counter *port_counter;
>  	u32 port;
>
>  	if (!dev->ops.alloc_hw_stats)
> -		return;
> +		return 0;
>
>  	rdma_for_each_port(dev, port) {
>  		port_counter = &dev->port_data[port].port_counter;
>  		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
>  		mutex_init(&port_counter->lock);
> +
> +		port_counter->hstats = dev->ops.alloc_hw_stats(dev, port);
> +		if (!port_counter->hstats)
> +			goto fail;
>  	}
> +
> +	return 0;
> +
> +fail:
> +	rdma_for_each_port(dev, port) {
> +		port_counter = &dev->port_data[port].port_counter;
> +		kfree(port_counter->hstats);
> +	}
> +
> +	return -ENOMEM;
>  }
>
>  void rdma_counter_cleanup(struct ib_device *dev)
>  {
> +	struct rdma_port_counter *port_counter;
> +	u32 port;
> +
> +	if (!dev->ops.alloc_hw_stats)
> +		return;
> +
> +	rdma_for_each_port(dev, port) {
> +		port_counter = &dev->port_data[port].port_counter;
> +		kfree(port_counter->hstats);
> +	}
>  }
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c56ffc61ab1e..8ae4906a60e7 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
>  		goto dev_cleanup;
>  	}
>
> -	rdma_counter_init(device);
> +	ret = rdma_counter_init(device);
> +	if (ret) {
> +		dev_warn(&device->dev, "Couldn't initialize counter\n");
> +		goto sysfs_cleanup;
> +	}
>
>  	ret = enable_device_and_get(device);
>  	if (ret) {
> @@ -1283,6 +1287,8 @@ int ib_register_device(struct ib_device *device, const char *name)
>
>  	return 0;
>
> +sysfs_cleanup:
> +	ib_device_unregister_sysfs(device);
>  dev_cleanup:
>  	device_del(&device->dev);
>  cg_cleanup:
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 2fe89754e592..8d1cf1bbb5f5 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -43,6 +43,7 @@
>  #include <rdma/ib_mad.h>
>  #include <rdma/ib_pma.h>
>  #include <rdma/ib_cache.h>
> +#include <rdma/rdma_counter.h>
>
>  struct ib_port;
>
> @@ -795,9 +796,12 @@ static int update_hw_stats(struct ib_device *dev, struct rdma_hw_stats *stats,
>  	return 0;
>  }
>
> -static ssize_t print_hw_stat(struct rdma_hw_stats *stats, int index, char *buf)
> +static ssize_t print_hw_stat(struct ib_device *dev, int port_num,
> +			     struct rdma_hw_stats *stats, int index, char *buf)
>  {
> -	return sprintf(buf, "%llu\n", stats->value[index]);
> +	u64 v = rdma_counter_get_hwstat_value(dev, port_num, index);
> +
> +	return sprintf(buf, "%llu\n", stats->value[index] + v);
>  }
>
>  static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
> @@ -823,7 +827,7 @@ static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
>  	ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
>  	if (ret)
>  		goto unlock;
> -	ret = print_hw_stat(stats, hsa->index, buf);
> +	ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
>  unlock:
>  	mutex_unlock(&stats->lock);
>
> diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
> index 4bc62909a638..5ad86ae67cc5 100644
> --- a/include/rdma/rdma_counter.h
> +++ b/include/rdma/rdma_counter.h
> @@ -27,6 +27,7 @@ struct rdma_counter_mode {
>
>  struct rdma_port_counter {
>  	struct rdma_counter_mode mode;
> +	struct rdma_hw_stats *hstats;
>  	struct mutex lock;
>  };
>
> @@ -41,13 +42,13 @@ struct rdma_counter {
>  	u8				port;
>  };
>
> -void rdma_counter_init(struct ib_device *dev);
> +int rdma_counter_init(struct ib_device *dev);
>  void rdma_counter_cleanup(struct ib_device *dev);
>  int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
>  			       bool on, enum rdma_nl_counter_mask mask);
>  int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
>  int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
> -
>  int rdma_counter_query_stats(struct rdma_counter *counter);
> +u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
>
>  #endif /* _RDMA_COUNTER_H_ */
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
  2019-05-22 17:21   ` Jason Gunthorpe
@ 2019-05-29 11:31     ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 22, 2019 at 02:21:37PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > This patch adds the ability to return all available counters
> > together with their properties and hwstats.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/counters.c |  28 +++++
> >  drivers/infiniband/core/device.c   |   2 +
> >  drivers/infiniband/core/nldev.c    | 173 +++++++++++++++++++++++++++++
> >  include/rdma/ib_verbs.h            |  10 ++
> >  include/rdma/rdma_counter.h        |   3 +
> >  include/uapi/rdma/rdma_netlink.h   |  10 +-
> >  6 files changed, 225 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > index 665e0d43c21b..36cd9eca1e46 100644
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> >  {
> >  	struct rdma_counter *counter;
> >
> > +	if (!dev->ops.counter_alloc_stats)
> > +		return NULL;
> > +
>
> Seems weird to add this now, why was it Ok to have counters prior to
> this patch?

Prior to this patch, "sysfs" counters and "netlink" counters were
independent from user perspective.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
  2019-05-22 17:30   ` Jason Gunthorpe
@ 2019-05-29 11:54     ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 22, 2019 at 02:30:11PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:53AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > This patch adds the ability to return the hwstats of per-port default
> > counters (which can also be queried through sysfs nodes).
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
> >  1 file changed, 99 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > index 53c1d2d82a06..cb2dd38f49f1 100644
> > +++ b/drivers/infiniband/core/nldev.c
> > @@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> >  	return ret;
> >  }
> >
> > +static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
> > +					      struct nlmsghdr *nlh,
> > +					      struct netlink_ext_ack *extack,
> > +					      struct nlattr *tb[])
> > +{
> > +	struct rdma_hw_stats *stats;
> > +	struct nlattr *table_attr;
> > +	struct ib_device *device;
> > +	int ret, num_cnts, i;
> > +	struct sk_buff *msg;
> > +	u32 index, port;
> > +	u64 v;
> > +
> > +	if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
> > +		return -EINVAL;
> > +
> > +	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
> > +	device = ib_device_get_by_index(sock_net(skb->sk), index);
> > +	if (!device)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
> > +		ret = -EINVAL;
> > +		goto err;
> > +	}
> > +
> > +	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> > +	if (!rdma_is_port_valid(device, port)) {
> > +		ret = -EINVAL;
> > +		goto err;
> > +	}
> > +
> > +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> > +	if (!msg) {
> > +		ret = -ENOMEM;
> > +		goto err;
> > +	}
> > +
> > +	nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
> > +			RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
> > +					 RDMA_NLDEV_CMD_STAT_GET),
> > +			0, 0);
> > +
> > +	if (fill_nldev_handle(msg, device) ||
> > +	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
> > +		ret = -EMSGSIZE;
> > +		goto err_msg;
> > +	}
> > +
> > +	stats = device->ops.alloc_hw_stats(device, port);
> > +	if (!stats) {
> > +		ret = -ENOMEM;
> > +		goto err_msg;
> > +	}
>
> Why do we need yet another one of these to be allocated?

I would say that it is bug.

>
> > +	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
>
> Is '0' right here?

I think that "index" (third parameter) is not used at all.

>
> Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-29 11:15     ` Leon Romanovsky
@ 2019-05-29 15:41       ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-29 15:41 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 29, 2019 at 02:15:44PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 02:10:42PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@mellanox.com>
> > >
> > > Since a QP can only be bound to one counter, then if it is bound to a
> > > separate counter, for backward compatibility purpose, the statistic
> > > value must be:
> > > * stat of default counter
> > > + stat of all running allocated counters
> > > + stat of all deallocated counters (history stats)
> > >
> > > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> > >  drivers/infiniband/core/device.c   |  8 ++-
> > >  drivers/infiniband/core/sysfs.c    | 10 ++-
> > >  include/rdma/rdma_counter.h        |  5 +-
> > >  4 files changed, 113 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > > index 36cd9eca1e46..f598b1cdb241 100644
> > > +++ b/drivers/infiniband/core/counters.c
> > > @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> > >  	return ret;
> > >  }
> > >
> > > +static void counter_history_stat_update(const struct rdma_counter *counter)
> > > +{
> > > +	struct ib_device *dev = counter->device;
> > > +	struct rdma_port_counter *port_counter;
> > > +	int i;
> > > +
> > > +	port_counter = &dev->port_data[counter->port].port_counter;
> > > +	if (!port_counter->hstats)
> > > +		return;
> > > +
> > > +	for (i = 0; i < counter->stats->num_counters; i++)
> > > +		port_counter->hstats->value[i] += counter->stats->value[i];
> > > +}
> > > +
> > >  static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > >  {
> > >  	struct rdma_counter *counter = qp->counter;
> > > @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > >  		return ret;
> > >
> > >  	rdma_restrack_put(&counter->res);
> > > -	if (atomic_dec_and_test(&counter->usecnt))
> > > +	if (atomic_dec_and_test(&counter->usecnt)) {
> > > +		counter_history_stat_update(counter);
> > >  		rdma_counter_dealloc(counter);
> > > +	}
> > >
> > >  	return 0;
> > >  }
> > > @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> > >  	return ret;
> > >  }
> > >
> > > -void rdma_counter_init(struct ib_device *dev)
> > > +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> > > +					   u8 port, u32 index)
> > > +{
> > > +	struct rdma_restrack_entry *res;
> > > +	struct rdma_restrack_root *rt;
> > > +	struct rdma_counter *counter;
> > > +	unsigned long id = 0;
> > > +	u64 sum = 0;
> > > +
> > > +	rt = &dev->res[RDMA_RESTRACK_COUNTER];
> > > +	xa_lock(&rt->xa);
> > > +	xa_for_each(&rt->xa, id, res) {
> > > +		if (!rdma_restrack_get(res))
> > > +			continue;
> >
> > Why do we need to get refcounts if we are holding the xa_lock?
> 
> Don't we need to protect an entry itself from disappearing?

xa_lock prevents xa_erase and xa_erase should be done before any
parallel kfree.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-29 11:05     ` Leon Romanovsky
@ 2019-05-29 15:44       ` Jason Gunthorpe
  2019-05-30  6:01         ` Mark Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2019-05-29 15:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev

On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > index c56ffc61ab1e..8ae4906a60e7 100644
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> > >  		goto dev_cleanup;
> > >  	}
> > >
> > > -	rdma_counter_init(device);
> > > +	ret = rdma_counter_init(device);
> > > +	if (ret) {
> > > +		dev_warn(&device->dev, "Couldn't initialize counter\n");
> > > +		goto sysfs_cleanup;
> > > +	}
> >
> > Don't put this things randomly, if there is some reason it should be
> > after sysfs it needs a comment, otherwise if it is just allocating
> > memory it belongs earlier, and the unwind should be done in release.
> >
> > I also think it is very strange/wrong that both sysfs and counters are
> > allocating the same alloc_hw_stats object
> >
> > Why can't they share?
> 
> They can, but we wanted to separate "legacy" counters which were exposed
> through sysfs and "new" counters which can be enabled/disable automatically.

Is there any cross contamination through the hw_stats? If no they
should just share.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-29 15:44       ` Jason Gunthorpe
@ 2019-05-30  6:01         ` Mark Zhang
  2019-05-30  7:04           ` Leon Romanovsky
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Zhang @ 2019-05-30  6:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Saeed Mahameed,
	linux-netdev

On 5/29/2019 11:44 PM, Jason Gunthorpe wrote:
> On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
>> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
>>> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
>>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>>>> index c56ffc61ab1e..8ae4906a60e7 100644
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
>>>>   		goto dev_cleanup;
>>>>   	}
>>>>
>>>> -	rdma_counter_init(device);
>>>> +	ret = rdma_counter_init(device);
>>>> +	if (ret) {
>>>> +		dev_warn(&device->dev, "Couldn't initialize counter\n");
>>>> +		goto sysfs_cleanup;
>>>> +	}
>>>
>>> Don't put this things randomly, if there is some reason it should be
>>> after sysfs it needs a comment, otherwise if it is just allocating
>>> memory it belongs earlier, and the unwind should be done in release.
>>>
>>> I also think it is very strange/wrong that both sysfs and counters are
>>> allocating the same alloc_hw_stats object
>>>
>>> Why can't they share?
>>
>> They can, but we wanted to separate "legacy" counters which were exposed
>> through sysfs and "new" counters which can be enabled/disable automatically.
> 
> Is there any cross contamination through the hw_stats? If no they
> should just share. >

sysfs hw_stats holds the port counter, while this one holds the
history value of all dynamically allocated counters. They can not share.
port_counter =
   default_counter + running_dynamic_counter + history_dynamic_counter

> Jason
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
  2019-05-30  6:01         ` Mark Zhang
@ 2019-05-30  7:04           ` Leon Romanovsky
  0 siblings, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2019-05-30  7:04 UTC (permalink / raw)
  To: Mark Zhang
  Cc: Jason Gunthorpe, Doug Ledford, RDMA mailing list, Majd Dibbiny,
	Saeed Mahameed, linux-netdev

On Thu, May 30, 2019 at 06:01:46AM +0000, Mark Zhang wrote:
> On 5/29/2019 11:44 PM, Jason Gunthorpe wrote:
> > On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
> >> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> >>> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> >>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> >>>> index c56ffc61ab1e..8ae4906a60e7 100644
> >>>> +++ b/drivers/infiniband/core/device.c
> >>>> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> >>>>   		goto dev_cleanup;
> >>>>   	}
> >>>>
> >>>> -	rdma_counter_init(device);
> >>>> +	ret = rdma_counter_init(device);
> >>>> +	if (ret) {
> >>>> +		dev_warn(&device->dev, "Couldn't initialize counter\n");
> >>>> +		goto sysfs_cleanup;
> >>>> +	}
> >>>
> >>> Don't put this things randomly, if there is some reason it should be
> >>> after sysfs it needs a comment, otherwise if it is just allocating
> >>> memory it belongs earlier, and the unwind should be done in release.
> >>>
> >>> I also think it is very strange/wrong that both sysfs and counters are
> >>> allocating the same alloc_hw_stats object
> >>>
> >>> Why can't they share?
> >>
> >> They can, but we wanted to separate "legacy" counters which were exposed
> >> through sysfs and "new" counters which can be enabled/disable automatically.
> >
> > Is there any cross contamination through the hw_stats? If no they
> > should just share. >
>
> sysfs hw_stats holds the port counter, while this one holds the
> history value of all dynamically allocated counters. They can not share.
> port_counter =
>    default_counter + running_dynamic_counter + history_dynamic_counter

I'm not saying that it is right thing to do, but nothing prevents from you
to add extra field to port_counter exactly for that.

Thanks

>
> > Jason
> >
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-05-30  7:04 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-29  8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
2019-04-29  8:34 ` Leon Romanovsky
2019-04-29  8:34 ` [PATCH mlx5-next v2 01/17] net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 02/17] RDMA/restrack: Introduce statistic counter Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 03/17] RDMA/restrack: Add an API to attach a task to a resource Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 04/17] RDMA/restrack: Make is_visible_in_pid_ns() as an API Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support Leon Romanovsky
2019-05-22 16:56   ` Jason Gunthorpe
2019-05-29 10:12     ` Leon Romanovsky
2019-05-29 10:43       ` Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
2019-05-22 17:11   ` Jason Gunthorpe
2019-05-22 17:15   ` Jason Gunthorpe
2019-04-29  8:34 ` [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter Leon Romanovsky
2019-04-29 18:22   ` Saeed Mahameed
2019-04-29 18:38     ` Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 08/17] IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters() Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 09/17] IB/mlx5: Support statistic q counter configuration Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 10/17] RDMA/nldev: Allow counter auto mode configration through RDMA netlink Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
2019-05-22 17:21   ` Jason Gunthorpe
2019-05-29 11:31     ` Leon Romanovsky
2019-05-22 17:22   ` Jason Gunthorpe
2019-04-29  8:34 ` [PATCH rdma-next v2 12/17] IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
2019-05-22 17:10   ` Jason Gunthorpe
2019-05-29 11:15     ` Leon Romanovsky
2019-05-29 15:41       ` Jason Gunthorpe
2019-05-22 17:26   ` Jason Gunthorpe
2019-05-29 11:05     ` Leon Romanovsky
2019-05-29 15:44       ` Jason Gunthorpe
2019-05-30  6:01         ` Mark Zhang
2019-05-30  7:04           ` Leon Romanovsky
2019-05-29 11:17   ` Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 14/17] RDMA/counter: Allow manual mode configuration support Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 15/17] RDMA/nldev: Allow counter manual mode configration through RDMA netlink Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 16/17] RDMA/nldev: Allow get counter mode " Leon Romanovsky
2019-04-29  8:34 ` [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics " Leon Romanovsky
2019-05-22 17:30   ` Jason Gunthorpe
2019-05-29 11:54     ` Leon Romanovsky
2019-05-22 17:31 ` [PATCH rdma-next v2 00/17] Statistics counter support Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.