* [PATCH mlx5-next v2 01/17] net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 02/17] RDMA/restrack: Introduce statistic counter Leon Romanovsky
` (16 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS
qp modification can be used to change the counter of a QP.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/linux/mlx5/mlx5_ifc.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4b37519bd6a5..eb0cafa7bab8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -997,7 +997,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 cc_modify_allowed[0x1];
u8 start_pad[0x1];
u8 cache_line_128byte[0x1];
- u8 reserved_at_165[0xa];
+ u8 reserved_at_165[0x4];
+ u8 rts2rts_qp_counters_set_id[0x1];
+ u8 reserved_at_16a[0x5];
u8 qcam_reg[0x1];
u8 gid_table_size[0x10];
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 02/17] RDMA/restrack: Introduce statistic counter
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
2019-04-29 8:34 ` [PATCH mlx5-next v2 01/17] net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 03/17] RDMA/restrack: Add an API to attach a task to a resource Leon Romanovsky
` (15 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Introduce statistic counter as a new resource. It allows a user
to monitor specific objects (e.g., QPs) by binding to a counter.
In some cases a user counter resource is created with task other then
"current", because its creation is done as part of rdmatool call.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/restrack.c | 22 +++++++++++++++++-----
include/rdma/rdma_counter.h | 18 ++++++++++++++++++
include/rdma/restrack.h | 4 ++++
3 files changed, 39 insertions(+), 5 deletions(-)
create mode 100644 include/rdma/rdma_counter.h
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 3b5ff2f7b5f8..95573f292aae 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -6,6 +6,7 @@
#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>
#include <rdma/restrack.h>
+#include <rdma/rdma_counter.h>
#include <linux/mutex.h>
#include <linux/sched/task.h>
#include <linux/pid_namespace.h>
@@ -45,6 +46,7 @@ static const char *type2str(enum rdma_restrack_type type)
[RDMA_RESTRACK_CM_ID] = "CM_ID",
[RDMA_RESTRACK_MR] = "MR",
[RDMA_RESTRACK_CTX] = "CTX",
+ [RDMA_RESTRACK_COUNTER] = "COUNTER",
};
return names[type];
@@ -169,6 +171,8 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
return container_of(res, struct ib_mr, res)->device;
case RDMA_RESTRACK_CTX:
return container_of(res, struct ib_ucontext, res)->device;
+ case RDMA_RESTRACK_COUNTER:
+ return container_of(res, struct rdma_counter, res)->device;
default:
WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
return NULL;
@@ -203,15 +207,22 @@ static void rdma_restrack_add(struct rdma_restrack_entry *res)
kref_init(&res->kref);
init_completion(&res->comp);
- if (res->type != RDMA_RESTRACK_QP)
- ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
- &rt->next_id, GFP_KERNEL);
- else {
+ if (res->type == RDMA_RESTRACK_QP) {
/* Special case to ensure that LQPN points to right QP */
struct ib_qp *qp = container_of(res, struct ib_qp, res);
ret = xa_insert(&rt->xa, qp->qp_num, res, GFP_KERNEL);
res->id = ret ? 0 : qp->qp_num;
+ } else if (res->type == RDMA_RESTRACK_COUNTER) {
+ /* Special case to ensure that cntn points to right counter */
+ struct rdma_counter *counter;
+
+ counter = container_of(res, struct rdma_counter, res);
+ ret = xa_insert(&rt->xa, counter->id, res, GFP_KERNEL);
+ res->id = ret ? 0 : counter->id;
+ } else {
+ ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
+ &rt->next_id, GFP_KERNEL);
}
if (!ret)
@@ -237,7 +248,8 @@ EXPORT_SYMBOL(rdma_restrack_kadd);
*/
void rdma_restrack_uadd(struct rdma_restrack_entry *res)
{
- if (res->type != RDMA_RESTRACK_CM_ID)
+ if ((res->type != RDMA_RESTRACK_CM_ID) &&
+ (res->type != RDMA_RESTRACK_COUNTER))
res->task = NULL;
if (!res->task)
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
new file mode 100644
index 000000000000..283ac1a0cdb7
--- /dev/null
+++ b/include/rdma/rdma_counter.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef _RDMA_COUNTER_H_
+#define _RDMA_COUNTER_H_
+
+#include <rdma/ib_verbs.h>
+#include <rdma/restrack.h>
+
+struct rdma_counter {
+ struct rdma_restrack_entry res;
+ struct ib_device *device;
+ uint32_t id;
+ u8 port;
+};
+#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index ecf3c7702a4f..4041a4d96524 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -42,6 +42,10 @@ enum rdma_restrack_type {
* @RDMA_RESTRACK_CTX: Verbs contexts (CTX)
*/
RDMA_RESTRACK_CTX,
+ /**
+ * @RDMA_RESTRACK_COUNTER: Statistic Counter
+ */
+ RDMA_RESTRACK_COUNTER,
/**
* @RDMA_RESTRACK_MAX: Last entry, used for array dclarations
*/
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 03/17] RDMA/restrack: Add an API to attach a task to a resource
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
2019-04-29 8:34 ` [PATCH mlx5-next v2 01/17] net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 02/17] RDMA/restrack: Introduce statistic counter Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 04/17] RDMA/restrack: Make is_visible_in_pid_ns() as an API Leon Romanovsky
` (14 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add rdma_restrack_attach_task() which is able to attach a task
other then "current" to a resource.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/restrack.c | 14 ++++++++++++++
drivers/infiniband/core/restrack.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 95573f292aae..3714634ae296 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -194,6 +194,20 @@ void rdma_restrack_set_task(struct rdma_restrack_entry *res,
}
EXPORT_SYMBOL(rdma_restrack_set_task);
+/**
+ * rdma_restrack_attach_task() - attach the task onto this resource
+ * @res: resource entry
+ * @task: the task to attach, the current task will be used if it is NULL.
+ */
+void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
+ struct task_struct *task)
+{
+ if (res->task)
+ put_task_struct(res->task);
+ get_task_struct(task);
+ res->task = task;
+}
+
static void rdma_restrack_add(struct rdma_restrack_entry *res)
{
struct ib_device *dev = res_to_dev(res);
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index 09a1fbdf578e..d084e5f89849 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -25,4 +25,6 @@ struct rdma_restrack_root {
int rdma_restrack_init(struct ib_device *dev);
void rdma_restrack_clean(struct ib_device *dev);
+void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
+ struct task_struct *task);
#endif /* _RDMA_CORE_RESTRACK_H_ */
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 04/17] RDMA/restrack: Make is_visible_in_pid_ns() as an API
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (2 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 03/17] RDMA/restrack: Add an API to attach a task to a resource Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support Leon Romanovsky
` (13 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Remove is_visible_in_pid_ns() from nldev.c and make it as a restrack API,
so that it can be taken advantage by other parts like counter.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/nldev.c | 15 ++-------------
drivers/infiniband/core/restrack.c | 13 +++++++++++++
drivers/infiniband/core/restrack.h | 1 +
3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index bced945a456d..4fb6d4285970 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -991,17 +991,6 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
},
};
-static bool is_visible_in_pid_ns(struct rdma_restrack_entry *res)
-{
- /*
- * 1. Kern resources should be visible in init name space only
- * 2. Present only resources visible in the current namespace
- */
- if (rdma_is_kernel_res(res))
- return task_active_pid_ns(current) == &init_pid_ns;
- return task_active_pid_ns(current) == task_active_pid_ns(res->task);
-}
-
static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack,
enum rdma_restrack_type res_type)
@@ -1046,7 +1035,7 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
goto err;
}
- if (!is_visible_in_pid_ns(res)) {
+ if (!rdma_is_visible_in_pid_ns(res)) {
ret = -ENOENT;
goto err_get;
}
@@ -1158,7 +1147,7 @@ static int res_get_common_dumpit(struct sk_buff *skb,
* objects.
*/
xa_for_each(&rt->xa, id, res) {
- if (!is_visible_in_pid_ns(res))
+ if (!rdma_is_visible_in_pid_ns(res))
continue;
if (idx < start || !rdma_restrack_get(res))
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 3714634ae296..bddff426ee0f 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -349,3 +349,16 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
}
}
EXPORT_SYMBOL(rdma_restrack_del);
+
+bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res)
+{
+ /*
+ * 1. Kern resources should be visible in init
+ * namespace only
+ * 2. Present only resources visible in the current
+ * namespace
+ */
+ if (rdma_is_kernel_res(res))
+ return task_active_pid_ns(current) == &init_pid_ns;
+ return task_active_pid_ns(current) == task_active_pid_ns(res->task);
+}
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index d084e5f89849..7bd177cc0a61 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -27,4 +27,5 @@ int rdma_restrack_init(struct ib_device *dev);
void rdma_restrack_clean(struct ib_device *dev);
void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
struct task_struct *task);
+bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res);
#endif /* _RDMA_CORE_RESTRACK_H_ */
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (3 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 04/17] RDMA/restrack: Make is_visible_in_pid_ns() as an API Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-05-22 16:56 ` Jason Gunthorpe
2019-04-29 8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
` (12 subsequent siblings)
17 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add an API to support set/clear per-port auto mode.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
drivers/infiniband/core/device.c | 4 ++
include/rdma/ib_verbs.h | 2 +
include/rdma/rdma_counter.h | 24 ++++++++++
include/uapi/rdma/rdma_netlink.h | 26 ++++++++++
6 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 drivers/infiniband/core/counters.c
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 313f2349b518..cddf748c15c9 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
multicast.o mad.o smi.o agent.o mad_rmpp.o \
- nldev.o restrack.o
+ nldev.o restrack.o counters.o
ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
new file mode 100644
index 000000000000..bda8d945a758
--- /dev/null
+++ b/drivers/infiniband/core/counters.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
+ */
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_counter.h>
+
+#include "core_priv.h"
+#include "restrack.h"
+
+#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
+
+static int __counter_set_mode(struct rdma_counter_mode *curr,
+ enum rdma_nl_counter_mode new_mode,
+ enum rdma_nl_counter_mask new_mask)
+{
+ if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
+ ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
+ (curr->mode != RDMA_COUNTER_MODE_NONE)))
+ return -EINVAL;
+
+ curr->mode = new_mode;
+ curr->mask = new_mask;
+ return 0;
+}
+
+/**
+ * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
+ *
+ * When @on is true, the @mask must be set
+ */
+int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
+ bool on, enum rdma_nl_counter_mask mask)
+{
+ struct rdma_port_counter *port_counter;
+ int ret;
+
+ if (!rdma_is_port_valid(dev, port))
+ return -EINVAL;
+
+ port_counter = &dev->port_data[port].port_counter;
+ mutex_lock(&port_counter->lock);
+ if (on) {
+ ret = __counter_set_mode(&port_counter->mode,
+ RDMA_COUNTER_MODE_AUTO, mask);
+ } else {
+ if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = __counter_set_mode(&port_counter->mode,
+ RDMA_COUNTER_MODE_NONE, 0);
+ }
+
+out:
+ mutex_unlock(&port_counter->lock);
+ return ret;
+}
+
+void rdma_counter_init(struct ib_device *dev)
+{
+ struct rdma_port_counter *port_counter;
+ u32 port;
+
+ if (!dev->ops.alloc_hw_stats)
+ return;
+
+ rdma_for_each_port(dev, port) {
+ port_counter = &dev->port_data[port].port_counter;
+ port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
+ mutex_init(&port_counter->lock);
+ }
+}
+
+void rdma_counter_cleanup(struct ib_device *dev)
+{
+}
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index fcbf2d4c865d..9204b4251fc8 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -46,6 +46,7 @@
#include <rdma/rdma_netlink.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_cache.h>
+#include <rdma/rdma_counter.h>
#include "core_priv.h"
#include "restrack.h"
@@ -1254,6 +1255,8 @@ int ib_register_device(struct ib_device *device, const char *name)
goto dev_cleanup;
}
+ rdma_counter_init(device);
+
ret = enable_device_and_get(device);
if (ret) {
void (*dealloc_fn)(struct ib_device *);
@@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
goto out;
disable_device(ib_dev);
+ rdma_counter_cleanup(ib_dev);
/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 737ef5ed3930..003d7b49ea54 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -62,6 +62,7 @@
#include <linux/irqflags.h>
#include <linux/preempt.h>
#include <uapi/rdma/ib_user_verbs.h>
+#include <rdma/rdma_counter.h>
#include <rdma/restrack.h>
#include <uapi/rdma/rdma_user_ioctl.h>
#include <uapi/rdma/ib_user_ioctl_verbs.h>
@@ -2213,6 +2214,7 @@ struct ib_port_data {
spinlock_t netdev_lock;
struct net_device __rcu *netdev;
struct hlist_node ndev_hash_link;
+ struct rdma_port_counter port_counter;
};
/* rdma netdev type - specifies protocol type */
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 283ac1a0cdb7..a8a7c1627800 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -6,8 +6,26 @@
#ifndef _RDMA_COUNTER_H_
#define _RDMA_COUNTER_H_
+#include <linux/mutex.h>
+
#include <rdma/ib_verbs.h>
#include <rdma/restrack.h>
+#include <rdma/rdma_netlink.h>
+
+struct auto_mode_param {
+ int qp_type;
+};
+
+struct rdma_counter_mode {
+ enum rdma_nl_counter_mode mode;
+ enum rdma_nl_counter_mask mask;
+ struct auto_mode_param param;
+};
+
+struct rdma_port_counter {
+ struct rdma_counter_mode mode;
+ struct mutex lock;
+};
struct rdma_counter {
struct rdma_restrack_entry res;
@@ -15,4 +33,10 @@ struct rdma_counter {
uint32_t id;
u8 port;
};
+
+void rdma_counter_init(struct ib_device *dev);
+void rdma_counter_cleanup(struct ib_device *dev);
+int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
+ bool on, enum rdma_nl_counter_mask mask);
+
#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 42a8bdc40a14..cd3cace46b27 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -484,4 +484,30 @@ enum rdma_nldev_attr {
*/
RDMA_NLDEV_ATTR_MAX
};
+
+/*
+ * Supported counter bind modes. All modes are mutual-exclusive.
+ */
+enum rdma_nl_counter_mode {
+ RDMA_COUNTER_MODE_NONE,
+
+ /*
+ * A qp is bound with a counter automatically during initialization
+ * based on the auto mode (e.g., qp type, ...)
+ */
+ RDMA_COUNTER_MODE_AUTO,
+
+ /*
+ * Always the end
+ */
+ RDMA_COUNTER_MODE_MAX,
+};
+
+/*
+ * Supported criteria in counter auto mode.
+ * Currently only "qp type" is supported
+ */
+enum rdma_nl_counter_mask {
+ RDMA_COUNTER_MASK_QP_TYPE = 1,
+};
#endif /* _UAPI_RDMA_NETLINK_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
2019-04-29 8:34 ` [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support Leon Romanovsky
@ 2019-05-22 16:56 ` Jason Gunthorpe
2019-05-29 10:12 ` Leon Romanovsky
0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 16:56 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> Add an API to support set/clear per-port auto mode.
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> drivers/infiniband/core/Makefile | 2 +-
> drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
> drivers/infiniband/core/device.c | 4 ++
> include/rdma/ib_verbs.h | 2 +
> include/rdma/rdma_counter.h | 24 ++++++++++
> include/uapi/rdma/rdma_netlink.h | 26 ++++++++++
> 6 files changed, 134 insertions(+), 1 deletion(-)
> create mode 100644 drivers/infiniband/core/counters.c
>
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 313f2349b518..cddf748c15c9 100644
> +++ b/drivers/infiniband/core/Makefile
> @@ -12,7 +12,7 @@ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> device.o fmr_pool.o cache.o netlink.o \
> roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
> multicast.o mad.o smi.o agent.o mad_rmpp.o \
> - nldev.o restrack.o
> + nldev.o restrack.o counters.o
>
> ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> new file mode 100644
> index 000000000000..bda8d945a758
> +++ b/drivers/infiniband/core/counters.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> + */
> +#include <rdma/ib_verbs.h>
> +#include <rdma/rdma_counter.h>
> +
> +#include "core_priv.h"
> +#include "restrack.h"
> +
> +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> +
> +static int __counter_set_mode(struct rdma_counter_mode *curr,
> + enum rdma_nl_counter_mode new_mode,
> + enum rdma_nl_counter_mask new_mask)
> +{
> + if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> + ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> + (curr->mode != RDMA_COUNTER_MODE_NONE)))
> + return -EINVAL;
> +
> + curr->mode = new_mode;
> + curr->mask = new_mask;
> + return 0;
> +}
> +
> +/**
> + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> + *
> + * When @on is true, the @mask must be set
> + */
> +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> + bool on, enum rdma_nl_counter_mask mask)
> +{
> + struct rdma_port_counter *port_counter;
> + int ret;
> +
> + if (!rdma_is_port_valid(dev, port))
> + return -EINVAL;
> +
> + port_counter = &dev->port_data[port].port_counter;
> + mutex_lock(&port_counter->lock);
> + if (on) {
> + ret = __counter_set_mode(&port_counter->mode,
> + RDMA_COUNTER_MODE_AUTO, mask);
> + } else {
> + if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> + ret = -EINVAL;
> + goto out;
> + }
> + ret = __counter_set_mode(&port_counter->mode,
> + RDMA_COUNTER_MODE_NONE, 0);
> + }
> +
> +out:
> + mutex_unlock(&port_counter->lock);
> + return ret;
> +}
> +
> +void rdma_counter_init(struct ib_device *dev)
> +{
> + struct rdma_port_counter *port_counter;
> + u32 port;
> +
> + if (!dev->ops.alloc_hw_stats)
> + return;
> +
> + rdma_for_each_port(dev, port) {
> + port_counter = &dev->port_data[port].port_counter;
> + port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> + mutex_init(&port_counter->lock);
> + }
> +}
> +
> +void rdma_counter_cleanup(struct ib_device *dev)
> +{
> +}
Please don't add empty functions
> @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> goto out;
>
> disable_device(ib_dev);
> + rdma_counter_cleanup(ib_dev);
This is the wrong place to call this, the patch that actually adds a
body is just doing kfree's so it is properly called
'rdma_counter_release' and it belongs in ib_device_release()
And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
for allocation either.
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
2019-05-22 16:56 ` Jason Gunthorpe
@ 2019-05-29 10:12 ` Leon Romanovsky
2019-05-29 10:43 ` Leon Romanovsky
0 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 10:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 22, 2019 at 01:56:08PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Add an API to support set/clear per-port auto mode.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > drivers/infiniband/core/Makefile | 2 +-
> > drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
> > drivers/infiniband/core/device.c | 4 ++
> > include/rdma/ib_verbs.h | 2 +
> > include/rdma/rdma_counter.h | 24 ++++++++++
> > include/uapi/rdma/rdma_netlink.h | 26 ++++++++++
> > 6 files changed, 134 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/infiniband/core/counters.c
> >
> > diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> > index 313f2349b518..cddf748c15c9 100644
> > +++ b/drivers/infiniband/core/Makefile
> > @@ -12,7 +12,7 @@ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> > device.o fmr_pool.o cache.o netlink.o \
> > roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
> > multicast.o mad.o smi.o agent.o mad_rmpp.o \
> > - nldev.o restrack.o
> > + nldev.o restrack.o counters.o
> >
> > ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> > ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > new file mode 100644
> > index 000000000000..bda8d945a758
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -0,0 +1,77 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > +/*
> > + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> > + */
> > +#include <rdma/ib_verbs.h>
> > +#include <rdma/rdma_counter.h>
> > +
> > +#include "core_priv.h"
> > +#include "restrack.h"
> > +
> > +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> > +
> > +static int __counter_set_mode(struct rdma_counter_mode *curr,
> > + enum rdma_nl_counter_mode new_mode,
> > + enum rdma_nl_counter_mask new_mask)
> > +{
> > + if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> > + ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> > + (curr->mode != RDMA_COUNTER_MODE_NONE)))
> > + return -EINVAL;
> > +
> > + curr->mode = new_mode;
> > + curr->mask = new_mask;
> > + return 0;
> > +}
> > +
> > +/**
> > + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> > + *
> > + * When @on is true, the @mask must be set
> > + */
> > +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> > + bool on, enum rdma_nl_counter_mask mask)
> > +{
> > + struct rdma_port_counter *port_counter;
> > + int ret;
> > +
> > + if (!rdma_is_port_valid(dev, port))
> > + return -EINVAL;
> > +
> > + port_counter = &dev->port_data[port].port_counter;
> > + mutex_lock(&port_counter->lock);
> > + if (on) {
> > + ret = __counter_set_mode(&port_counter->mode,
> > + RDMA_COUNTER_MODE_AUTO, mask);
> > + } else {
> > + if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > + ret = __counter_set_mode(&port_counter->mode,
> > + RDMA_COUNTER_MODE_NONE, 0);
> > + }
> > +
> > +out:
> > + mutex_unlock(&port_counter->lock);
> > + return ret;
> > +}
> > +
> > +void rdma_counter_init(struct ib_device *dev)
> > +{
> > + struct rdma_port_counter *port_counter;
> > + u32 port;
> > +
> > + if (!dev->ops.alloc_hw_stats)
> > + return;
> > +
> > + rdma_for_each_port(dev, port) {
> > + port_counter = &dev->port_data[port].port_counter;
> > + port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> > + mutex_init(&port_counter->lock);
> > + }
> > +}
> > +
> > +void rdma_counter_cleanup(struct ib_device *dev)
> > +{
> > +}
>
> Please don't add empty functions
It is brought here for symmetry, the function is going to be filled in
patch "RDMA/core: Get sum value of all counters when perform a sysfs
stat read".
>
> > @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> > goto out;
> >
> > disable_device(ib_dev);
> > + rdma_counter_cleanup(ib_dev);
>
> This is the wrong place to call this, the patch that actually adds a
> body is just doing kfree's so it is properly called
> 'rdma_counter_release' and it belongs in ib_device_release()
I'll move.
>
> And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
> for allocation either.
Not all devices implement hw_stat.
>
> Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support
2019-05-29 10:12 ` Leon Romanovsky
@ 2019-05-29 10:43 ` Leon Romanovsky
0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 10:43 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 29, 2019 at 01:12:18PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 01:56:08PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:41AM +0300, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@mellanox.com>
> > >
> > > Add an API to support set/clear per-port auto mode.
> > >
> > > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > drivers/infiniband/core/Makefile | 2 +-
> > > drivers/infiniband/core/counters.c | 77 ++++++++++++++++++++++++++++++
> > > drivers/infiniband/core/device.c | 4 ++
> > > include/rdma/ib_verbs.h | 2 +
> > > include/rdma/rdma_counter.h | 24 ++++++++++
> > > include/uapi/rdma/rdma_netlink.h | 26 ++++++++++
> > > 6 files changed, 134 insertions(+), 1 deletion(-)
> > > create mode 100644 drivers/infiniband/core/counters.c
> > >
> > > diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> > > index 313f2349b518..cddf748c15c9 100644
> > > +++ b/drivers/infiniband/core/Makefile
> > > @@ -12,7 +12,7 @@ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> > > device.o fmr_pool.o cache.o netlink.o \
> > > roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
> > > multicast.o mad.o smi.o agent.o mad_rmpp.o \
> > > - nldev.o restrack.o
> > > + nldev.o restrack.o counters.o
> > >
> > > ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> > > ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> > > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > > new file mode 100644
> > > index 000000000000..bda8d945a758
> > > +++ b/drivers/infiniband/core/counters.c
> > > @@ -0,0 +1,77 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2019 Mellanox Technologies. All rights reserved.
> > > + */
> > > +#include <rdma/ib_verbs.h>
> > > +#include <rdma/rdma_counter.h>
> > > +
> > > +#include "core_priv.h"
> > > +#include "restrack.h"
> > > +
> > > +#define ALL_AUTO_MODE_MASKS (RDMA_COUNTER_MASK_QP_TYPE)
> > > +
> > > +static int __counter_set_mode(struct rdma_counter_mode *curr,
> > > + enum rdma_nl_counter_mode new_mode,
> > > + enum rdma_nl_counter_mask new_mask)
> > > +{
> > > + if ((new_mode == RDMA_COUNTER_MODE_AUTO) &&
> > > + ((new_mask & (~ALL_AUTO_MODE_MASKS)) ||
> > > + (curr->mode != RDMA_COUNTER_MODE_NONE)))
> > > + return -EINVAL;
> > > +
> > > + curr->mode = new_mode;
> > > + curr->mask = new_mask;
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
> > > + *
> > > + * When @on is true, the @mask must be set
> > > + */
> > > +int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> > > + bool on, enum rdma_nl_counter_mask mask)
> > > +{
> > > + struct rdma_port_counter *port_counter;
> > > + int ret;
> > > +
> > > + if (!rdma_is_port_valid(dev, port))
> > > + return -EINVAL;
> > > +
> > > + port_counter = &dev->port_data[port].port_counter;
> > > + mutex_lock(&port_counter->lock);
> > > + if (on) {
> > > + ret = __counter_set_mode(&port_counter->mode,
> > > + RDMA_COUNTER_MODE_AUTO, mask);
> > > + } else {
> > > + if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO) {
> > > + ret = -EINVAL;
> > > + goto out;
> > > + }
> > > + ret = __counter_set_mode(&port_counter->mode,
> > > + RDMA_COUNTER_MODE_NONE, 0);
> > > + }
> > > +
> > > +out:
> > > + mutex_unlock(&port_counter->lock);
> > > + return ret;
> > > +}
> > > +
> > > +void rdma_counter_init(struct ib_device *dev)
> > > +{
> > > + struct rdma_port_counter *port_counter;
> > > + u32 port;
> > > +
> > > + if (!dev->ops.alloc_hw_stats)
> > > + return;
> > > +
> > > + rdma_for_each_port(dev, port) {
> > > + port_counter = &dev->port_data[port].port_counter;
> > > + port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> > > + mutex_init(&port_counter->lock);
> > > + }
> > > +}
> > > +
> > > +void rdma_counter_cleanup(struct ib_device *dev)
> > > +{
> > > +}
> >
> > Please don't add empty functions
>
> It is brought here for symmetry, the function is going to be filled in
> patch "RDMA/core: Get sum value of all counters when perform a sysfs
> stat read".
>
> >
> > > @@ -1304,6 +1307,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> > > goto out;
> > >
> > > disable_device(ib_dev);
> > > + rdma_counter_cleanup(ib_dev);
> >
> > This is the wrong place to call this, the patch that actually adds a
> > body is just doing kfree's so it is properly called
> > 'rdma_counter_release' and it belongs in ib_device_release()
>
> I'll move.
>
> >
> > And it shouldn't test hw_stats, and it shouldn't have a 'fail' stanza
> > for allocation either.
>
> Not all devices implement hw_stat.
ok, I think that I found a way to rewrite the code without need to check hw_stat.
Thanks
>
> >
> > Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (4 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 05/17] RDMA/counter: Add set/clear per-port auto mode support Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-05-22 17:11 ` Jason Gunthorpe
2019-05-22 17:15 ` Jason Gunthorpe
2019-04-29 8:34 ` [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter Leon Romanovsky
` (11 subsequent siblings)
17 siblings, 2 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
In auto mode all QPs belong to one category are bind automatically to
a single counter set. Currently only "qp type" is supported.
In this mode the qp counter is set in RST2INIT modification, and when
a qp is destroyed the counter is unbound.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/counters.c | 222 +++++++++++++++++++++++++++++
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/verbs.c | 9 ++
include/rdma/ib_verbs.h | 18 +++
include/rdma/rdma_counter.h | 8 ++
5 files changed, 259 insertions(+)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index bda8d945a758..665e0d43c21b 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -57,6 +57,228 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
return ret;
}
+static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
+ enum rdma_nl_counter_mode mode)
+{
+ struct rdma_counter *counter;
+
+ counter = kzalloc(sizeof(*counter), GFP_KERNEL);
+ if (!counter)
+ return NULL;
+
+ counter->device = dev;
+ counter->port = port;
+ counter->res.type = RDMA_RESTRACK_COUNTER;
+ counter->mode.mode = mode;
+ atomic_set(&counter->usecnt, 0);
+ mutex_init(&counter->lock);
+
+ return counter;
+}
+
+static void rdma_counter_dealloc(struct rdma_counter *counter)
+{
+ rdma_restrack_del(&counter->res);
+ kfree(counter);
+}
+
+static void auto_mode_init_counter(struct rdma_counter *counter,
+ const struct ib_qp *qp,
+ enum rdma_nl_counter_mask new_mask)
+{
+ struct auto_mode_param *param = &counter->mode.param;
+
+ counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
+ counter->mode.mask = new_mask;
+
+ if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
+ param->qp_type = qp->qp_type;
+}
+
+static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
+ enum rdma_nl_counter_mask auto_mask)
+{
+ struct auto_mode_param *param = &counter->mode.param;
+ bool match = true;
+
+ if (rdma_is_kernel_res(&counter->res) != rdma_is_kernel_res(&qp->res))
+ return false;
+
+ /* Ensure that counter belong to right PID */
+ if (!rdma_is_kernel_res(&counter->res) &&
+ !rdma_is_kernel_res(&qp->res) &&
+ (task_pid_vnr(counter->res.task) != current->pid))
+ return false;
+
+ if (auto_mask & RDMA_COUNTER_MASK_QP_TYPE)
+ match &= (param->qp_type == qp->qp_type);
+
+ return match;
+}
+
+static int __rdma_counter_bind_qp(struct rdma_counter *counter,
+ struct ib_qp *qp)
+{
+ int ret;
+
+ if (qp->counter)
+ return -EINVAL;
+
+ if (!qp->device->ops.counter_bind_qp)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&counter->lock);
+ ret = qp->device->ops.counter_bind_qp(counter, qp);
+ mutex_unlock(&counter->lock);
+
+ return ret;
+}
+
+static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+ struct rdma_counter *counter = qp->counter;
+ int ret;
+
+ if (!qp->device->ops.counter_unbind_qp)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&counter->lock);
+ ret = qp->device->ops.counter_unbind_qp(qp, force);
+ mutex_unlock(&counter->lock);
+
+ return ret;
+}
+
+/**
+ * rdma_get_counter_auto_mode - Find the counter that @qp should be bound
+ * with in auto mode
+ *
+ * Return: The counter (with ref-count increased) if found
+ */
+static struct rdma_counter *rdma_get_counter_auto_mode(struct ib_qp *qp,
+ u8 port)
+{
+ struct rdma_port_counter *port_counter;
+ struct rdma_counter *counter = NULL;
+ struct ib_device *dev = qp->device;
+ struct rdma_restrack_entry *res;
+ struct rdma_restrack_root *rt;
+ unsigned long id = 0;
+
+ port_counter = &dev->port_data[port].port_counter;
+ rt = &dev->res[RDMA_RESTRACK_COUNTER];
+ xa_lock(&rt->xa);
+ xa_for_each(&rt->xa, id, res) {
+ if (!rdma_is_visible_in_pid_ns(res))
+ continue;
+
+ if (!rdma_restrack_get(res))
+ continue;
+
+ counter = container_of(res, struct rdma_counter, res);
+ if ((counter->device != qp->device) || (counter->port != port))
+ goto next;
+
+ if (auto_mode_match(qp, counter, port_counter->mode.mask))
+ break;
+next:
+ rdma_restrack_put(res);
+ counter = NULL;
+ }
+
+ xa_unlock(&rt->xa);
+ return counter;
+}
+
+static void rdma_counter_res_add(struct rdma_counter *counter,
+ struct ib_qp *qp)
+{
+ if (rdma_is_kernel_res(&qp->res)) {
+ rdma_restrack_set_task(&counter->res, qp->res.kern_name);
+ rdma_restrack_kadd(&counter->res);
+ } else {
+ rdma_restrack_attach_task(&counter->res, qp->res.task);
+ rdma_restrack_uadd(&counter->res);
+ }
+}
+
+/**
+ * rdma_counter_bind_qp_auto - Check and bind the QP to a counter base on
+ * the auto-mode rule
+ */
+int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
+{
+ struct rdma_port_counter *port_counter;
+ struct ib_device *dev = qp->device;
+ struct rdma_counter *counter;
+ int ret;
+
+ if (!rdma_is_port_valid(dev, port))
+ return -EINVAL;
+
+ port_counter = &dev->port_data[port].port_counter;
+ if (port_counter->mode.mode != RDMA_COUNTER_MODE_AUTO)
+ return 0;
+
+ counter = rdma_get_counter_auto_mode(qp, port);
+ if (counter) {
+ ret = __rdma_counter_bind_qp(counter, qp);
+ if (ret) {
+ rdma_restrack_put(&counter->res);
+ return ret;
+ }
+ } else {
+ counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_AUTO);
+ if (!counter)
+ return -ENOMEM;
+
+ auto_mode_init_counter(counter, qp, port_counter->mode.mask);
+
+ ret = __rdma_counter_bind_qp(counter, qp);
+ if (ret)
+ goto err_bind;
+
+ rdma_counter_res_add(counter, qp);
+ if (!rdma_restrack_get(&counter->res)) {
+ ret = -EINVAL;
+ goto err_get;
+ }
+ }
+
+ atomic_inc(&counter->usecnt);
+ return 0;
+
+err_get:
+ __rdma_counter_unbind_qp(qp, false);
+err_bind:
+ rdma_counter_dealloc(counter);
+ return ret;
+}
+
+/**
+ * rdma_counter_unbind_qp - Unbind a qp from a counter
+ * @force:
+ * true - Decrease the counter ref-count anyway (e.g., qp destroy)
+ */
+int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+ struct rdma_counter *counter = qp->counter;
+ int ret;
+
+ if (!counter)
+ return -EINVAL;
+
+ ret = __rdma_counter_unbind_qp(qp, force);
+ if (ret && !force)
+ return ret;
+
+ rdma_restrack_put(&counter->res);
+ if (atomic_dec_and_test(&counter->usecnt))
+ rdma_counter_dealloc(counter);
+
+ return 0;
+}
+
void rdma_counter_init(struct ib_device *dev)
{
struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 9204b4251fc8..dfaa57de871f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2349,6 +2349,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, set_vf_guid);
SET_DEVICE_OP(dev_ops, set_vf_link_state);
SET_DEVICE_OP(dev_ops, unmap_fmr);
+ SET_DEVICE_OP(dev_ops, counter_bind_qp);
+ SET_DEVICE_OP(dev_ops, counter_unbind_qp);
SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_pd);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 7313edc9f091..060d2f071ea7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1682,6 +1682,14 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
}
}
+ /*
+ * Bind this qp to a counter automatically based on the rdma counter
+ * rules. This only set in RST2INIT with port specified
+ */
+ if (!qp->counter && (attr_mask & IB_QP_PORT) &&
+ ((attr_mask & IB_QP_STATE) && attr->qp_state == IB_QPS_INIT))
+ rdma_counter_bind_qp_auto(qp, attr->port_num);
+
ret = ib_security_modify_qp(qp, attr, attr_mask, udata);
if (ret)
goto out;
@@ -1877,6 +1885,7 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
if (!qp->uobject)
rdma_rw_cleanup_mrs(qp);
+ rdma_counter_unbind_qp(qp, true);
rdma_restrack_del(&qp->res);
ret = qp->device->ops.destroy_qp(qp, udata);
if (!ret) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 003d7b49ea54..06c77b3e42cd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1791,6 +1791,9 @@ struct ib_qp {
* Implementation details of the RDMA core, don't use in drivers:
*/
struct rdma_restrack_entry res;
+
+ /* The counter the qp is bind to */
+ struct rdma_counter *counter;
};
struct ib_dm {
@@ -2553,6 +2556,21 @@ struct ib_device_ops {
*/
void (*dealloc_driver)(struct ib_device *dev);
+ /**
+ * counter_bind_qp - Bind a QP to a counter.
+ * @counter - The counter to be bound. If counter->id is zero then
+ * the driver needs to allocate a new counter and set counter->id
+ */
+ int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp);
+ /**
+ * counter_unbind_qp - Unbind the qp from the dynamically-allocated
+ * counter and bind it onto the default one. If this is the last
+ * bound qp, then this counter will be deallocated.
+ * @force - If it is true then free the counter in case of any error.
+ * used in cases like qp destroy.
+ */
+ int (*counter_unbind_qp)(struct ib_qp *qp, bool force);
+
DECLARE_RDMA_OBJ_SIZE(ib_ah);
DECLARE_RDMA_OBJ_SIZE(ib_pd);
DECLARE_RDMA_OBJ_SIZE(ib_srq);
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index a8a7c1627800..159b8ba3487e 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -7,11 +7,14 @@
#define _RDMA_COUNTER_H_
#include <linux/mutex.h>
+#include <linux/pid_namespace.h>
#include <rdma/ib_verbs.h>
#include <rdma/restrack.h>
#include <rdma/rdma_netlink.h>
+struct ib_qp;
+
struct auto_mode_param {
int qp_type;
};
@@ -31,6 +34,9 @@ struct rdma_counter {
struct rdma_restrack_entry res;
struct ib_device *device;
uint32_t id;
+ atomic_t usecnt;
+ struct rdma_counter_mode mode;
+ struct mutex lock;
u8 port;
};
@@ -38,5 +44,7 @@ void rdma_counter_init(struct ib_device *dev);
void rdma_counter_cleanup(struct ib_device *dev);
int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
bool on, enum rdma_nl_counter_mask mask);
+int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
+int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
#endif /* _RDMA_COUNTER_H_ */
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
2019-04-29 8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
@ 2019-05-22 17:11 ` Jason Gunthorpe
2019-05-22 17:15 ` Jason Gunthorpe
1 sibling, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:11 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:42AM +0300, Leon Romanovsky wrote:
> void rdma_counter_init(struct ib_device *dev)
> {
> struct rdma_port_counter *port_counter;
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 9204b4251fc8..dfaa57de871f 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -2349,6 +2349,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> SET_DEVICE_OP(dev_ops, set_vf_guid);
> SET_DEVICE_OP(dev_ops, set_vf_link_state);
> SET_DEVICE_OP(dev_ops, unmap_fmr);
> + SET_DEVICE_OP(dev_ops, counter_bind_qp);
> + SET_DEVICE_OP(dev_ops, counter_unbind_qp);
Keep sorted
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration mode support
2019-04-29 8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
2019-05-22 17:11 ` Jason Gunthorpe
@ 2019-05-22 17:15 ` Jason Gunthorpe
1 sibling, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:15 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:42AM +0300, Leon Romanovsky wrote:
> +/**
> + * rdma_counter_unbind_qp - Unbind a qp from a counter
> + * @force:
> + * true - Decrease the counter ref-count anyway (e.g., qp destroy)
> + */
> +int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> +{
> + struct rdma_counter *counter = qp->counter;
> + int ret;
> +
> + if (!counter)
> + return -EINVAL;
> +
> + ret = __rdma_counter_unbind_qp(qp, force);
> + if (ret && !force)
> + return ret;
> +
> + rdma_restrack_put(&counter->res);
> + if (atomic_dec_and_test(&counter->usecnt))
> + rdma_counter_dealloc(counter);
An atomic that does kfree when it reaches zero should be implemented
with a kref.
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (5 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 06/17] RDMA/counter: Add "auto" configuration " Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 18:22 ` Saeed Mahameed
2019-04-29 8:34 ` [PATCH rdma-next v2 08/17] IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters() Leon Romanovsky
` (10 subsequent siblings)
17 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Support bind a qp with counter. If counter is null then bind the qp to
the default counter. Different QP state has different operation:
- RESET: Set the counter field so that it will take effective
during RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag,
when QP is moved to RTS state and this flag is set, then issue
an RTS2RTS modification to update the counter.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +++
drivers/infiniband/hw/mlx5/qp.c | 76 +++++++++++++++++++++++++++-
include/linux/mlx5/qp.h | 1 +
3 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 55b8bdb402b6..447f8ad5abbd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -437,6 +437,10 @@ struct mlx5_ib_qp {
u32 flags_en;
/* storage for qp sub type when core qp type is IB_QPT_DRIVER */
enum ib_qp_type qp_sub_type;
+ /* A flag to indicate if there's a new counter is configured
+ * but not take effective
+ */
+ u32 counter_pending;
};
struct mlx5_ib_cq_buf {
@@ -1418,4 +1422,6 @@ void mlx5_ib_put_xlt_emergency_page(void);
int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
struct mlx5_bfreg_info *bfregi, u32 bfregn,
bool dyn_bfreg);
+
+int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter);
#endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index efe1f6f0c351..29e3fcd66510 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -34,6 +34,7 @@
#include <rdma/ib_umem.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_user_verbs.h>
+#include <rdma/rdma_counter.h>
#include <linux/mlx5/fs.h>
#include "mlx5_ib.h"
#include "ib_rep.h"
@@ -3365,6 +3366,35 @@ static unsigned int get_tx_affinity(struct mlx5_ib_dev *dev,
return tx_port_affinity;
}
+static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
+ struct rdma_counter *counter)
+{
+ struct mlx5_ib_dev *dev = to_mdev(qp->device);
+ struct mlx5_ib_qp *mqp = to_mqp(qp);
+ struct mlx5_qp_context context = {};
+ struct mlx5_ib_port *mibport = NULL;
+ struct mlx5_ib_qp_base *base;
+ u32 set_id;
+
+ if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id))
+ return 0;
+
+ if (counter) {
+ set_id = counter->id;
+ } else {
+ mibport = &dev->port[mqp->port - 1];
+ set_id = mibport->cnts.set_id;
+ }
+
+ base = &mqp->trans_qp.base;
+ context.qp_counter_set_usr_page &= cpu_to_be32(0xffffff);
+ context.qp_counter_set_usr_page |= cpu_to_be32(set_id << 24);
+ return mlx5_core_qp_modify(dev->mdev,
+ MLX5_CMD_OP_RTS2RTS_QP,
+ MLX5_QP_OPTPAR_COUNTER_SET_ID,
+ &context, &base->mqp);
+}
+
static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
const struct ib_qp_attr *attr, int attr_mask,
enum ib_qp_state cur_state,
@@ -3418,6 +3448,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
struct mlx5_ib_port *mibport = NULL;
enum mlx5_qp_state mlx5_cur, mlx5_new;
enum mlx5_qp_optpar optpar;
+ u32 set_id = 0;
int mlx5_st;
int err;
u16 op;
@@ -3580,8 +3611,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
port_num = 0;
mibport = &dev->port[port_num];
+ if (ibqp->counter)
+ set_id = ibqp->counter->id;
+ else
+ set_id = mibport->cnts.set_id;
context->qp_counter_set_usr_page |=
- cpu_to_be32((u32)(mibport->cnts.set_id) << 24);
+ cpu_to_be32(set_id << 24);
}
if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
@@ -3609,7 +3644,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
raw_qp_param.operation = op;
if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
- raw_qp_param.rq_q_ctr_id = mibport->cnts.set_id;
+ raw_qp_param.rq_q_ctr_id = set_id;
raw_qp_param.set_mask |= MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID;
}
@@ -3686,6 +3721,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
qp->db.db[MLX5_SND_DBR] = 0;
}
+ if ((new_state == IB_QPS_RTS) && qp->counter_pending) {
+ err = __mlx5_ib_qp_set_counter(ibqp, ibqp->counter);
+ if (!err)
+ qp->counter_pending = 0;
+ }
+
out:
kfree(context);
return err;
@@ -6347,3 +6388,34 @@ void mlx5_ib_drain_rq(struct ib_qp *qp)
handle_drain_completion(cq, &rdrain, dev);
}
+
+/**
+ * Bind a qp to a counter. If @counter is NULL then bind the qp to
+ * the default counter
+ */
+int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter)
+{
+ struct mlx5_ib_qp *mqp = to_mqp(qp);
+ int err = 0;
+
+ mutex_lock(&mqp->mutex);
+ if (mqp->state == IB_QPS_RESET) {
+ qp->counter = counter;
+ goto out;
+ }
+
+ if (mqp->state == IB_QPS_RTS) {
+ err = __mlx5_ib_qp_set_counter(qp, counter);
+ if (!err)
+ qp->counter = counter;
+
+ goto out;
+ }
+
+ mqp->counter_pending = 1;
+ qp->counter = counter;
+
+out:
+ mutex_unlock(&mqp->mutex);
+ return err;
+}
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 0343c81d4c5f..b0b47106bc76 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -70,6 +70,7 @@ enum mlx5_qp_optpar {
MLX5_QP_OPTPAR_CQN_RCV = 1 << 19,
MLX5_QP_OPTPAR_DC_HS = 1 << 20,
MLX5_QP_OPTPAR_DC_KEY = 1 << 21,
+ MLX5_QP_OPTPAR_COUNTER_SET_ID = 1 << 25,
};
enum mlx5_qp_state {
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
2019-04-29 8:34 ` [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter Leon Romanovsky
@ 2019-04-29 18:22 ` Saeed Mahameed
2019-04-29 18:38 ` Leon Romanovsky
0 siblings, 1 reply; 40+ messages in thread
From: Saeed Mahameed @ 2019-04-29 18:22 UTC (permalink / raw)
To: Jason Gunthorpe, leon, dledford
Cc: Majd Dibbiny, Mark Zhang, Leon Romanovsky, linux-rdma, netdev
On Mon, 2019-04-29 at 11:34 +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> Support bind a qp with counter. If counter is null then bind the qp
> to
> the default counter. Different QP state has different operation:
> - RESET: Set the counter field so that it will take effective
> during RST2INIT change;
> - RTS: Issue an RTS2RTS change to update the QP counter;
> - Other: Set the counter field and mark the counter_pending flag,
> when QP is moved to RTS state and this flag is set, then issue
> an RTS2RTS modification to update the counter.
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +++
> drivers/infiniband/hw/mlx5/qp.c | 76
> +++++++++++++++++++++++++++-
> include/linux/mlx5/qp.h | 1 +
I don't see any reason why this patch should go to mlx5-next branch.
Just because you have one liner in include/linux/mlx5/qp.h, is not
enough reason.
> 3 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 55b8bdb402b6..447f8ad5abbd 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -437,6 +437,10 @@ struct mlx5_ib_qp {
> u32 flags_en;
> /* storage for qp sub type when core qp type is IB_QPT_DRIVER
> */
> enum ib_qp_type qp_sub_type;
> + /* A flag to indicate if there's a new counter is configured
> + * but not take effective
> + */
> + u32 counter_pending;
> };
>
> struct mlx5_ib_cq_buf {
> @@ -1418,4 +1422,6 @@ void mlx5_ib_put_xlt_emergency_page(void);
> int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
> struct mlx5_bfreg_info *bfregi, u32 bfregn,
> bool dyn_bfreg);
> +
> +int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter
> *counter);
> #endif /* MLX5_IB_H */
> diff --git a/drivers/infiniband/hw/mlx5/qp.c
> b/drivers/infiniband/hw/mlx5/qp.c
> index efe1f6f0c351..29e3fcd66510 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -34,6 +34,7 @@
> #include <rdma/ib_umem.h>
> #include <rdma/ib_cache.h>
> #include <rdma/ib_user_verbs.h>
> +#include <rdma/rdma_counter.h>
> #include <linux/mlx5/fs.h>
> #include "mlx5_ib.h"
> #include "ib_rep.h"
> @@ -3365,6 +3366,35 @@ static unsigned int get_tx_affinity(struct
> mlx5_ib_dev *dev,
> return tx_port_affinity;
> }
>
> +static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
> + struct rdma_counter *counter)
> +{
> + struct mlx5_ib_dev *dev = to_mdev(qp->device);
> + struct mlx5_ib_qp *mqp = to_mqp(qp);
> + struct mlx5_qp_context context = {};
> + struct mlx5_ib_port *mibport = NULL;
> + struct mlx5_ib_qp_base *base;
> + u32 set_id;
> +
> + if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id))
> + return 0;
> +
> + if (counter) {
> + set_id = counter->id;
> + } else {
> + mibport = &dev->port[mqp->port - 1];
> + set_id = mibport->cnts.set_id;
> + }
> +
> + base = &mqp->trans_qp.base;
> + context.qp_counter_set_usr_page &= cpu_to_be32(0xffffff);
> + context.qp_counter_set_usr_page |= cpu_to_be32(set_id << 24);
> + return mlx5_core_qp_modify(dev->mdev,
> + MLX5_CMD_OP_RTS2RTS_QP,
> + MLX5_QP_OPTPAR_COUNTER_SET_ID,
> + &context, &base->mqp);
> +}
> +
> static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
> const struct ib_qp_attr *attr, int
> attr_mask,
> enum ib_qp_state cur_state,
> @@ -3418,6 +3448,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
> struct mlx5_ib_port *mibport = NULL;
> enum mlx5_qp_state mlx5_cur, mlx5_new;
> enum mlx5_qp_optpar optpar;
> + u32 set_id = 0;
> int mlx5_st;
> int err;
> u16 op;
> @@ -3580,8 +3611,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
> port_num = 0;
>
> mibport = &dev->port[port_num];
> + if (ibqp->counter)
> + set_id = ibqp->counter->id;
> + else
> + set_id = mibport->cnts.set_id;
> context->qp_counter_set_usr_page |=
> - cpu_to_be32((u32)(mibport->cnts.set_id) << 24);
> + cpu_to_be32(set_id << 24);
> }
>
> if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state ==
> IB_QPS_INIT)
> @@ -3609,7 +3644,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
>
> raw_qp_param.operation = op;
> if (cur_state == IB_QPS_RESET && new_state ==
> IB_QPS_INIT) {
> - raw_qp_param.rq_q_ctr_id = mibport-
> >cnts.set_id;
> + raw_qp_param.rq_q_ctr_id = set_id;
> raw_qp_param.set_mask |=
> MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID;
> }
>
> @@ -3686,6 +3721,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp
> *ibqp,
> qp->db.db[MLX5_SND_DBR] = 0;
> }
>
> + if ((new_state == IB_QPS_RTS) && qp->counter_pending) {
> + err = __mlx5_ib_qp_set_counter(ibqp, ibqp->counter);
> + if (!err)
> + qp->counter_pending = 0;
> + }
> +
> out:
> kfree(context);
> return err;
> @@ -6347,3 +6388,34 @@ void mlx5_ib_drain_rq(struct ib_qp *qp)
>
> handle_drain_completion(cq, &rdrain, dev);
> }
> +
> +/**
> + * Bind a qp to a counter. If @counter is NULL then bind the qp to
> + * the default counter
> + */
> +int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter
> *counter)
> +{
> + struct mlx5_ib_qp *mqp = to_mqp(qp);
> + int err = 0;
> +
> + mutex_lock(&mqp->mutex);
> + if (mqp->state == IB_QPS_RESET) {
> + qp->counter = counter;
> + goto out;
> + }
> +
> + if (mqp->state == IB_QPS_RTS) {
> + err = __mlx5_ib_qp_set_counter(qp, counter);
> + if (!err)
> + qp->counter = counter;
> +
> + goto out;
> + }
> +
> + mqp->counter_pending = 1;
> + qp->counter = counter;
> +
> +out:
> + mutex_unlock(&mqp->mutex);
> + return err;
> +}
> diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
> index 0343c81d4c5f..b0b47106bc76 100644
> --- a/include/linux/mlx5/qp.h
> +++ b/include/linux/mlx5/qp.h
> @@ -70,6 +70,7 @@ enum mlx5_qp_optpar {
> MLX5_QP_OPTPAR_CQN_RCV = 1 << 19,
> MLX5_QP_OPTPAR_DC_HS = 1 << 20,
> MLX5_QP_OPTPAR_DC_KEY = 1 << 21,
> + MLX5_QP_OPTPAR_COUNTER_SET_ID = 1 << 25,
> };
>
> enum mlx5_qp_state {
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter
2019-04-29 18:22 ` Saeed Mahameed
@ 2019-04-29 18:38 ` Leon Romanovsky
0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 18:38 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Jason Gunthorpe, dledford, Majd Dibbiny, Mark Zhang, linux-rdma, netdev
On Mon, Apr 29, 2019 at 06:22:03PM +0000, Saeed Mahameed wrote:
> On Mon, 2019-04-29 at 11:34 +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Support bind a qp with counter. If counter is null then bind the qp
> > to
> > the default counter. Different QP state has different operation:
> > - RESET: Set the counter field so that it will take effective
> > during RST2INIT change;
> > - RTS: Issue an RTS2RTS change to update the QP counter;
> > - Other: Set the counter field and mark the counter_pending flag,
> > when QP is moved to RTS state and this flag is set, then issue
> > an RTS2RTS modification to update the counter.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > ---
> > drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +++
> > drivers/infiniband/hw/mlx5/qp.c | 76
> > +++++++++++++++++++++++++++-
> > include/linux/mlx5/qp.h | 1 +
>
> I don't see any reason why this patch should go to mlx5-next branch.
> Just because you have one liner in include/linux/mlx5/qp.h, is not
> enough reason.
>
I'm changing target automatically based on get_maintainer.pl output.
If "$GET_MAINTAINER --email --nol --nos --nosubsystem --noroles $_file | grep Saeed -c"
returns Saeed, it means that such patch has all potential to create merge conflict.
Thanks
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 08/17] IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (6 preceding siblings ...)
2019-04-29 8:34 ` [PATCH mlx5-next v2 07/17] IB/mlx5: Support set qp counter Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 09/17] IB/mlx5: Support statistic q counter configuration Leon Romanovsky
` (9 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add counter set id as a parameter so that this API can be used for
querying any q counter.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6135a0b285de..06da76df4aa1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5341,7 +5341,8 @@ static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
struct mlx5_ib_port *port,
- struct rdma_hw_stats *stats)
+ struct rdma_hw_stats *stats,
+ u16 set_id)
{
int outlen = MLX5_ST_SZ_BYTES(query_q_counter_out);
void *out;
@@ -5352,9 +5353,7 @@ static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
if (!out)
return -ENOMEM;
- ret = mlx5_core_query_q_counter(mdev,
- port->cnts.set_id, 0,
- out, outlen);
+ ret = mlx5_core_query_q_counter(mdev, set_id, 0, out, outlen);
if (ret)
goto free;
@@ -5414,7 +5413,8 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
port->cnts.num_ext_ppcnt_counters;
/* q_counters are per IB device, query the master mdev */
- ret = mlx5_ib_query_q_counters(dev->mdev, port, stats);
+ ret = mlx5_ib_query_q_counters(dev->mdev, port, stats,
+ port->cnts.set_id);
if (ret)
return ret;
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 09/17] IB/mlx5: Support statistic q counter configuration
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (7 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 08/17] IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters() Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 10/17] RDMA/nldev: Allow counter auto mode configration through RDMA netlink Leon Romanovsky
` (8 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add support for ib callbacks counter_bind_qp() and counter_unbind_qp().
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 55 +++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 06da76df4aa1..18a3e855d45b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5450,6 +5450,59 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
return num_counters;
}
+static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
+ struct ib_qp *qp)
+{
+ struct mlx5_ib_dev *dev = to_mdev(qp->device);
+ u16 cnt_set_id = 0;
+ int err;
+
+ if (counter->id == 0) {
+ err = mlx5_cmd_alloc_q_counter(dev->mdev,
+ &cnt_set_id,
+ MLX5_SHARED_RESOURCE_UID);
+ if (err)
+ return err;
+ counter->id = cnt_set_id;
+ }
+
+ err = mlx5_ib_qp_set_counter(qp, counter);
+ if (err)
+ goto fail_set_counter;
+
+ return 0;
+
+fail_set_counter:
+ if (cnt_set_id != 0) {
+ mlx5_core_dealloc_q_counter(dev->mdev, cnt_set_id);
+ counter->id = 0;
+ }
+
+ return err;
+}
+
+static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, bool force)
+{
+ struct mlx5_ib_dev *dev = to_mdev(qp->device);
+ struct rdma_counter *counter = qp->counter;
+ int err;
+
+ err = mlx5_ib_qp_set_counter(qp, NULL);
+ if (err && !force)
+ return err;
+
+ /*
+ * Deallocate the counter if this is the last QP bound on it;
+ * If @force is set then we still deallocate the q counter
+ * no matter if there's any error in previous. used for cases
+ * like qp destroy.
+ */
+ if (atomic_read(&counter->usecnt) == 1)
+ return mlx5_core_dealloc_q_counter(dev->mdev, counter->id);
+
+ return 0;
+}
+
static int mlx5_ib_rn_get_params(struct ib_device *device, u8 port_num,
enum rdma_netdev_t type,
struct rdma_netdev_alloc_params *params)
@@ -6306,6 +6359,8 @@ static void mlx5_ib_stage_odp_cleanup(struct mlx5_ib_dev *dev)
static const struct ib_device_ops mlx5_ib_dev_hw_stats_ops = {
.alloc_hw_stats = mlx5_ib_alloc_hw_stats,
.get_hw_stats = mlx5_ib_get_hw_stats,
+ .counter_bind_qp = mlx5_ib_counter_bind_qp,
+ .counter_unbind_qp = mlx5_ib_counter_unbind_qp,
};
static int mlx5_ib_stage_counters_init(struct mlx5_ib_dev *dev)
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 10/17] RDMA/nldev: Allow counter auto mode configration through RDMA netlink
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (8 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 09/17] IB/mlx5: Support statistic q counter configuration Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
` (7 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Provide an option to enable/disable per-port counter auto mode through
RDMA netlink. Limit it to users with ADMIN capability only.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/nldev.c | 78 ++++++++++++++++++++++++++++++++
include/uapi/rdma/rdma_netlink.h | 8 ++++
2 files changed, 86 insertions(+)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 4fb6d4285970..b46992ed3553 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -120,6 +120,9 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
[RDMA_NLDEV_ATTR_DEV_PROTOCOL] = { .type = NLA_NUL_STRING,
.len = RDMA_NLDEV_ATTR_ENTRY_STRLEN },
[RDMA_NLDEV_NET_NS_FD] = { .type = NLA_U32 },
+ [RDMA_NLDEV_ATTR_STAT_MODE] = { .type = NLA_U32 },
+ [RDMA_NLDEV_ATTR_STAT_RES] = { .type = NLA_U32 },
+ [RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK] = { .type = NLA_U32 },
};
static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -1384,6 +1387,78 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}
+static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+ u32 index, port, mode, mask = 0;
+ struct ib_device *device;
+ struct sk_buff *msg;
+ int ret;
+
+ ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+ nldev_policy, extack);
+ /* Currently only counter for QP is supported */
+ if (ret || !tb[RDMA_NLDEV_ATTR_STAT_RES] ||
+ !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+ !tb[RDMA_NLDEV_ATTR_PORT_INDEX] || !tb[RDMA_NLDEV_ATTR_STAT_MODE])
+ return -EINVAL;
+
+ if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+ return -EINVAL;
+
+ index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ device = ib_device_get_by_index(sock_net(skb->sk), index);
+ if (!device)
+ return -EINVAL;
+
+ port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+ if (!rdma_is_port_valid(device, port)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+ RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+ RDMA_NLDEV_CMD_STAT_SET),
+ 0, 0);
+
+ mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
+ if (mode != RDMA_COUNTER_MODE_AUTO) {
+ ret = -EMSGSIZE;
+ goto err_msg;
+ }
+
+ if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
+ mask = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+
+ ret = rdma_counter_set_auto_mode(device, port,
+ mask ? true : false, mask);
+ if (ret)
+ goto err_msg;
+
+ if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+ ret = -EMSGSIZE;
+ goto err_msg;
+ }
+
+ nlmsg_end(msg, nlh);
+ ib_device_put(device);
+ return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_msg:
+ nlmsg_free(msg);
+err:
+ ib_device_put(device);
+ return ret;
+}
+
static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
[RDMA_NLDEV_CMD_GET] = {
.doit = nldev_get_doit,
@@ -1434,6 +1509,9 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
},
[RDMA_NLDEV_CMD_SYS_SET] = {
.doit = nldev_set_sys_set_doit,
+ },
+ [RDMA_NLDEV_CMD_STAT_SET] = {
+ .doit = nldev_stat_set_doit,
.flags = RDMA_NL_ADMIN_PERM,
},
};
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index cd3cace46b27..673fdf9ba10f 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -267,6 +267,8 @@ enum rdma_nldev_command {
RDMA_NLDEV_CMD_RES_PD_GET, /* can dump */
+ RDMA_NLDEV_CMD_STAT_SET,
+
RDMA_NLDEV_NUM_OPS
};
@@ -478,6 +480,12 @@ enum rdma_nldev_attr {
* File descriptor handle of the net namespace object
*/
RDMA_NLDEV_NET_NS_FD, /* u32 */
+ /*
+ * Counter-specific attributes.
+ */
+ RDMA_NLDEV_ATTR_STAT_MODE, /* u32 */
+ RDMA_NLDEV_ATTR_STAT_RES, /* u32 */
+ RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, /* u32 */
/*
* Always the end
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (9 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 10/17] RDMA/nldev: Allow counter auto mode configration through RDMA netlink Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-05-22 17:21 ` Jason Gunthorpe
2019-05-22 17:22 ` Jason Gunthorpe
2019-04-29 8:34 ` [PATCH rdma-next v2 12/17] IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support Leon Romanovsky
` (6 subsequent siblings)
17 siblings, 2 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
This patch adds the ability to return all available counters
together with their properties and hwstats.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/counters.c | 28 +++++
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/nldev.c | 173 +++++++++++++++++++++++++++++
include/rdma/ib_verbs.h | 10 ++
include/rdma/rdma_counter.h | 3 +
include/uapi/rdma/rdma_netlink.h | 10 +-
6 files changed, 225 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 665e0d43c21b..36cd9eca1e46 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
{
struct rdma_counter *counter;
+ if (!dev->ops.counter_alloc_stats)
+ return NULL;
+
counter = kzalloc(sizeof(*counter), GFP_KERNEL);
if (!counter)
return NULL;
@@ -69,16 +72,25 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
counter->device = dev;
counter->port = port;
counter->res.type = RDMA_RESTRACK_COUNTER;
+ counter->stats = dev->ops.counter_alloc_stats(counter);
+ if (!counter->stats)
+ goto err_stats;
+
counter->mode.mode = mode;
atomic_set(&counter->usecnt, 0);
mutex_init(&counter->lock);
return counter;
+
+err_stats:
+ kfree(counter);
+ return NULL;
}
static void rdma_counter_dealloc(struct rdma_counter *counter)
{
rdma_restrack_del(&counter->res);
+ kfree(counter->stats);
kfree(counter);
}
@@ -279,6 +291,22 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
return 0;
}
+int rdma_counter_query_stats(struct rdma_counter *counter)
+{
+ int ret;
+
+ struct ib_device *dev = counter->device;
+
+ if (!dev->ops.counter_update_stats)
+ return -EINVAL;
+
+ mutex_lock(&counter->lock);
+ ret = dev->ops.counter_update_stats(counter);
+ mutex_unlock(&counter->lock);
+
+ return ret;
+}
+
void rdma_counter_init(struct ib_device *dev)
{
struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index dfaa57de871f..c56ffc61ab1e 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2351,6 +2351,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, unmap_fmr);
SET_DEVICE_OP(dev_ops, counter_bind_qp);
SET_DEVICE_OP(dev_ops, counter_unbind_qp);
+ SET_DEVICE_OP(dev_ops, counter_alloc_stats);
+ SET_DEVICE_OP(dev_ops, counter_update_stats);
SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_pd);
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index b46992ed3553..1f7de88effad 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -123,6 +123,13 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
[RDMA_NLDEV_ATTR_STAT_MODE] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_STAT_RES] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK] = { .type = NLA_U32 },
+ [RDMA_NLDEV_ATTR_STAT_COUNTER] = { .type = NLA_NESTED },
+ [RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY] = { .type = NLA_NESTED },
+ [RDMA_NLDEV_ATTR_STAT_COUNTER_ID] = { .type = NLA_U32 },
+ [RDMA_NLDEV_ATTR_STAT_HWCOUNTERS] = { .type = NLA_NESTED },
+ [RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY] = { .type = NLA_NESTED },
+ [RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME] = { .type = NLA_NUL_STRING },
+ [RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE] = { .type = NLA_U64 },
};
static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -625,6 +632,160 @@ static int fill_res_pd_entry(struct sk_buff *msg, bool has_cap_net_admin,
err: return -EMSGSIZE;
}
+static int fill_stat_counter_mode(struct sk_buff *msg,
+ struct rdma_counter *counter)
+{
+ struct rdma_counter_mode *m = &counter->mode;
+
+ if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, m->mode))
+ return -EMSGSIZE;
+
+ if (m->mode == RDMA_COUNTER_MODE_AUTO)
+ if ((m->mask & RDMA_COUNTER_MASK_QP_TYPE) &&
+ nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_TYPE, m->param.qp_type))
+ return -EMSGSIZE;
+
+ return 0;
+}
+
+static int fill_stat_counter_qp_entry(struct sk_buff *msg, u32 qpn)
+{
+ struct nlattr *entry_attr;
+
+ entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_QP_ENTRY);
+ if (!entry_attr)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn))
+ goto err;
+
+ nla_nest_end(msg, entry_attr);
+ return 0;
+
+err:
+ nla_nest_cancel(msg, entry_attr);
+ return -EMSGSIZE;
+}
+
+static int fill_stat_counter_qps(struct sk_buff *msg,
+ struct rdma_counter *counter)
+{
+ struct rdma_restrack_entry *res;
+ struct rdma_restrack_root *rt;
+ struct nlattr *table_attr;
+ struct ib_qp *qp = NULL;
+ unsigned long id = 0;
+ int ret = 0;
+
+ table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_RES_QP);
+
+ rt = &counter->device->res[RDMA_RESTRACK_QP];
+ xa_lock(&rt->xa);
+ xa_for_each(&rt->xa, id, res) {
+ if (!rdma_is_visible_in_pid_ns(res))
+ continue;
+
+ if (!rdma_restrack_get(res))
+ continue;
+
+ qp = container_of(res, struct ib_qp, res);
+ if (qp->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+ goto next;
+
+ if (!qp->counter || (qp->counter->id != counter->id))
+ goto next;
+
+ ret = fill_stat_counter_qp_entry(msg, qp->qp_num);
+ if (ret) {
+ rdma_restrack_put(res);
+ goto err;
+ }
+
+next:
+ rdma_restrack_put(res);
+ }
+
+ xa_unlock(&rt->xa);
+ nla_nest_end(msg, table_attr);
+ return 0;
+
+err:
+ xa_unlock(&rt->xa);
+ nla_nest_cancel(msg, table_attr);
+ return ret;
+}
+
+static int fill_stat_hwcounter_entry(struct sk_buff *msg,
+ const char *name, u64 value)
+{
+ struct nlattr *entry_attr;
+
+ entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY);
+ if (!entry_attr)
+ return -EMSGSIZE;
+
+ if (nla_put_string(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME,
+ name))
+ goto err;
+ if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE,
+ value, RDMA_NLDEV_ATTR_PAD))
+ goto err;
+
+ nla_nest_end(msg, entry_attr);
+ return 0;
+
+err:
+ nla_nest_cancel(msg, entry_attr);
+ return -EMSGSIZE;
+}
+
+static int fill_stat_counter_hwcounters(struct sk_buff *msg,
+ struct rdma_counter *counter)
+{
+ struct rdma_hw_stats *st = counter->stats;
+ struct nlattr *table_attr;
+ int i;
+
+ table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+ if (!table_attr)
+ return -EMSGSIZE;
+
+ for (i = 0; i < st->num_counters; i++)
+ if (fill_stat_hwcounter_entry(msg, st->names[i], st->value[i]))
+ goto err;
+
+ nla_nest_end(msg, table_attr);
+ return 0;
+
+err:
+ nla_nest_cancel(msg, table_attr);
+ return -EMSGSIZE;
+}
+
+static int fill_res_counter_entry(struct sk_buff *msg, bool has_cap_net_admin,
+ struct rdma_restrack_entry *res,
+ uint32_t port)
+{
+ struct rdma_counter *counter =
+ container_of(res, struct rdma_counter, res);
+
+ if (port && port != counter->port)
+ return 0;
+
+ /* Dump it even query failed */
+ rdma_counter_query_stats(counter);
+
+ if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, counter->port) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, counter->id) ||
+ fill_res_name_pid(msg, &counter->res) ||
+ fill_stat_counter_mode(msg, counter) ||
+ fill_stat_counter_qps(msg, counter) ||
+ fill_stat_counter_hwcounters(msg, counter))
+ return -EMSGSIZE;
+
+ return 0;
+}
+
static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
@@ -992,6 +1153,13 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
.entry = RDMA_NLDEV_ATTR_RES_PD_ENTRY,
.id = RDMA_NLDEV_ATTR_RES_PDN,
},
+ [RDMA_RESTRACK_COUNTER] = {
+ .fill_res_func = fill_res_counter_entry,
+ .nldev_cmd = RDMA_NLDEV_CMD_STAT_GET,
+ .nldev_attr = RDMA_NLDEV_ATTR_STAT_COUNTER,
+ .entry = RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,
+ .id = RDMA_NLDEV_ATTR_STAT_COUNTER_ID,
+ },
};
static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1228,6 +1396,7 @@ RES_GET_FUNCS(cm_id, RDMA_RESTRACK_CM_ID);
RES_GET_FUNCS(cq, RDMA_RESTRACK_CQ);
RES_GET_FUNCS(pd, RDMA_RESTRACK_PD);
RES_GET_FUNCS(mr, RDMA_RESTRACK_MR);
+RES_GET_FUNCS(counter, RDMA_RESTRACK_COUNTER);
static LIST_HEAD(link_ops);
static DECLARE_RWSEM(link_ops_rwsem);
@@ -1514,6 +1683,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
.doit = nldev_stat_set_doit,
.flags = RDMA_NL_ADMIN_PERM,
},
+ [RDMA_NLDEV_CMD_STAT_GET] = {
+ .doit = nldev_res_get_counter_doit,
+ .dump = nldev_res_get_counter_dumpit,
+ },
};
void __init nldev_init(void)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 06c77b3e42cd..d12770f63e4e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2570,6 +2570,16 @@ struct ib_device_ops {
* used in cases like qp destroy.
*/
int (*counter_unbind_qp)(struct ib_qp *qp, bool force);
+ /**
+ * counter_alloc_stats - Allocate a struct rdma_hw_stats and fill in
+ * the driver initialized data.
+ */
+ struct rdma_hw_stats *(*counter_alloc_stats)(
+ struct rdma_counter *counter);
+ /**
+ * counter_update_stats - Query the stats value of this counter
+ */
+ int (*counter_update_stats)(struct rdma_counter *counter);
DECLARE_RDMA_OBJ_SIZE(ib_ah);
DECLARE_RDMA_OBJ_SIZE(ib_pd);
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 159b8ba3487e..4bc62909a638 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -37,6 +37,7 @@ struct rdma_counter {
atomic_t usecnt;
struct rdma_counter_mode mode;
struct mutex lock;
+ struct rdma_hw_stats *stats;
u8 port;
};
@@ -47,4 +48,6 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
+int rdma_counter_query_stats(struct rdma_counter *counter);
+
#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 673fdf9ba10f..33eee6fb16b2 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -269,6 +269,8 @@ enum rdma_nldev_command {
RDMA_NLDEV_CMD_STAT_SET,
+ RDMA_NLDEV_CMD_STAT_GET, /* can dump */
+
RDMA_NLDEV_NUM_OPS
};
@@ -486,7 +488,13 @@ enum rdma_nldev_attr {
RDMA_NLDEV_ATTR_STAT_MODE, /* u32 */
RDMA_NLDEV_ATTR_STAT_RES, /* u32 */
RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, /* u32 */
-
+ RDMA_NLDEV_ATTR_STAT_COUNTER, /* nested table */
+ RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY, /* nested table */
+ RDMA_NLDEV_ATTR_STAT_COUNTER_ID, /* u32 */
+ RDMA_NLDEV_ATTR_STAT_HWCOUNTERS, /* nested table */
+ RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY, /* nested table */
+ RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME, /* string */
+ RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE, /* u64 */
/*
* Always the end
*/
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
2019-04-29 8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
@ 2019-05-22 17:21 ` Jason Gunthorpe
2019-05-29 11:31 ` Leon Romanovsky
2019-05-22 17:22 ` Jason Gunthorpe
1 sibling, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:21 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> This patch adds the ability to return all available counters
> together with their properties and hwstats.
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> drivers/infiniband/core/counters.c | 28 +++++
> drivers/infiniband/core/device.c | 2 +
> drivers/infiniband/core/nldev.c | 173 +++++++++++++++++++++++++++++
> include/rdma/ib_verbs.h | 10 ++
> include/rdma/rdma_counter.h | 3 +
> include/uapi/rdma/rdma_netlink.h | 10 +-
> 6 files changed, 225 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 665e0d43c21b..36cd9eca1e46 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> {
> struct rdma_counter *counter;
>
> + if (!dev->ops.counter_alloc_stats)
> + return NULL;
> +
Seems weird to add this now, why was it Ok to have counters prior to
this patch?
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
2019-05-22 17:21 ` Jason Gunthorpe
@ 2019-05-29 11:31 ` Leon Romanovsky
0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 22, 2019 at 02:21:37PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > This patch adds the ability to return all available counters
> > together with their properties and hwstats.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > drivers/infiniband/core/counters.c | 28 +++++
> > drivers/infiniband/core/device.c | 2 +
> > drivers/infiniband/core/nldev.c | 173 +++++++++++++++++++++++++++++
> > include/rdma/ib_verbs.h | 10 ++
> > include/rdma/rdma_counter.h | 3 +
> > include/uapi/rdma/rdma_netlink.h | 10 +-
> > 6 files changed, 225 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > index 665e0d43c21b..36cd9eca1e46 100644
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> > {
> > struct rdma_counter *counter;
> >
> > + if (!dev->ops.counter_alloc_stats)
> > + return NULL;
> > +
>
> Seems weird to add this now, why was it Ok to have counters prior to
> this patch?
Prior to this patch, "sysfs" counters and "netlink" counters were
independent from user perspective.
Thanks
>
> Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback
2019-04-29 8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
2019-05-22 17:21 ` Jason Gunthorpe
@ 2019-05-22 17:22 ` Jason Gunthorpe
1 sibling, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:22 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:47AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> This patch adds the ability to return all available counters
> together with their properties and hwstats.
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> drivers/infiniband/core/counters.c | 28 +++++
> drivers/infiniband/core/device.c | 2 +
> drivers/infiniband/core/nldev.c | 173 +++++++++++++++++++++++++++++
> include/rdma/ib_verbs.h | 10 ++
> include/rdma/rdma_counter.h | 3 +
> include/uapi/rdma/rdma_netlink.h | 10 +-
> 6 files changed, 225 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 665e0d43c21b..36cd9eca1e46 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -62,6 +62,9 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> {
> struct rdma_counter *counter;
>
> + if (!dev->ops.counter_alloc_stats)
> + return NULL;
> +
> counter = kzalloc(sizeof(*counter), GFP_KERNEL);
> if (!counter)
> return NULL;
> @@ -69,16 +72,25 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> counter->device = dev;
> counter->port = port;
> counter->res.type = RDMA_RESTRACK_COUNTER;
> + counter->stats = dev->ops.counter_alloc_stats(counter);
> + if (!counter->stats)
> + goto err_stats;
> +
> counter->mode.mode = mode;
> atomic_set(&counter->usecnt, 0);
> mutex_init(&counter->lock);
>
> return counter;
> +
> +err_stats:
> + kfree(counter);
> + return NULL;
> }
>
> static void rdma_counter_dealloc(struct rdma_counter *counter)
> {
> rdma_restrack_del(&counter->res);
> + kfree(counter->stats);
> kfree(counter);
> }
>
> @@ -279,6 +291,22 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> return 0;
> }
>
> +int rdma_counter_query_stats(struct rdma_counter *counter)
> +{
> + int ret;
> +
> + struct ib_device *dev = counter->device;
> +
Extra blank line
Something about festive trees
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 12/17] IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (10 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 11/17] RDMA/netlink: Implement counter dumpit calback Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
` (5 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Add support for ib callback counter_alloc_stats() and
counter_update_stats().
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 18a3e855d45b..d69cfe511c37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5450,6 +5450,27 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
return num_counters;
}
+static struct rdma_hw_stats *
+mlx5_ib_counter_alloc_stats(struct rdma_counter *counter)
+{
+ struct mlx5_ib_dev *dev = to_mdev(counter->device);
+ struct mlx5_ib_port *port = &dev->port[counter->port - 1];
+
+ /* Q counters are in the beginning of all counters */
+ return rdma_alloc_hw_stats_struct(port->cnts.names,
+ port->cnts.num_q_counters,
+ RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
+
+static int mlx5_ib_counter_update_stats(struct rdma_counter *counter)
+{
+ struct mlx5_ib_dev *dev = to_mdev(counter->device);
+ struct mlx5_ib_port *port = &dev->port[counter->port - 1];
+
+ return mlx5_ib_query_q_counters(dev->mdev, port,
+ counter->stats, counter->id);
+}
+
static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
struct ib_qp *qp)
{
@@ -6361,6 +6382,8 @@ static const struct ib_device_ops mlx5_ib_dev_hw_stats_ops = {
.get_hw_stats = mlx5_ib_get_hw_stats,
.counter_bind_qp = mlx5_ib_counter_bind_qp,
.counter_unbind_qp = mlx5_ib_counter_unbind_qp,
+ .counter_alloc_stats = mlx5_ib_counter_alloc_stats,
+ .counter_update_stats = mlx5_ib_counter_update_stats,
};
static int mlx5_ib_stage_counters_init(struct mlx5_ib_dev *dev)
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (11 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 12/17] IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-05-22 17:10 ` Jason Gunthorpe
` (2 more replies)
2019-04-29 8:34 ` [PATCH rdma-next v2 14/17] RDMA/counter: Allow manual mode configuration support Leon Romanovsky
` (4 subsequent siblings)
17 siblings, 3 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Since a QP can only be bound to one counter, then if it is bound to a
separate counter, for backward compatibility purpose, the statistic
value must be:
* stat of default counter
+ stat of all running allocated counters
+ stat of all deallocated counters (history stats)
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
drivers/infiniband/core/device.c | 8 ++-
drivers/infiniband/core/sysfs.c | 10 ++-
include/rdma/rdma_counter.h | 5 +-
4 files changed, 113 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 36cd9eca1e46..f598b1cdb241 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
return ret;
}
+static void counter_history_stat_update(const struct rdma_counter *counter)
+{
+ struct ib_device *dev = counter->device;
+ struct rdma_port_counter *port_counter;
+ int i;
+
+ port_counter = &dev->port_data[counter->port].port_counter;
+ if (!port_counter->hstats)
+ return;
+
+ for (i = 0; i < counter->stats->num_counters; i++)
+ port_counter->hstats->value[i] += counter->stats->value[i];
+}
+
static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
{
struct rdma_counter *counter = qp->counter;
@@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
return ret;
rdma_restrack_put(&counter->res);
- if (atomic_dec_and_test(&counter->usecnt))
+ if (atomic_dec_and_test(&counter->usecnt)) {
+ counter_history_stat_update(counter);
rdma_counter_dealloc(counter);
+ }
return 0;
}
@@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
return ret;
}
-void rdma_counter_init(struct ib_device *dev)
+static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
+ u8 port, u32 index)
+{
+ struct rdma_restrack_entry *res;
+ struct rdma_restrack_root *rt;
+ struct rdma_counter *counter;
+ unsigned long id = 0;
+ u64 sum = 0;
+
+ rt = &dev->res[RDMA_RESTRACK_COUNTER];
+ xa_lock(&rt->xa);
+ xa_for_each(&rt->xa, id, res) {
+ if (!rdma_restrack_get(res))
+ continue;
+
+ counter = container_of(res, struct rdma_counter, res);
+ if ((counter->device != dev) || (counter->port != port))
+ goto next;
+
+ if (rdma_counter_query_stats(counter))
+ goto next;
+
+ sum += counter->stats->value[index];
+next:
+ rdma_restrack_put(res);
+ }
+
+ xa_unlock(&rt->xa);
+ return sum;
+}
+
+/**
+ * rdma_counter_get_hwstat_value() - Get the sum value of all counters on a
+ * specific port, including the running ones and history data
+ */
+u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
+{
+ struct rdma_port_counter *port_counter;
+ u64 sum;
+
+ if (!rdma_is_port_valid(dev, port))
+ return -EINVAL;
+
+ port_counter = &dev->port_data[port].port_counter;
+ if (index >= port_counter->hstats->num_counters)
+ return -EINVAL;
+
+ sum = get_running_counters_hwstat_sum(dev, port, index);
+ sum += port_counter->hstats->value[index];
+
+ return sum;
+}
+
+int rdma_counter_init(struct ib_device *dev)
{
struct rdma_port_counter *port_counter;
u32 port;
if (!dev->ops.alloc_hw_stats)
- return;
+ return 0;
rdma_for_each_port(dev, port) {
port_counter = &dev->port_data[port].port_counter;
port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
mutex_init(&port_counter->lock);
+
+ port_counter->hstats = dev->ops.alloc_hw_stats(dev, port);
+ if (!port_counter->hstats)
+ goto fail;
}
+
+ return 0;
+
+fail:
+ rdma_for_each_port(dev, port) {
+ port_counter = &dev->port_data[port].port_counter;
+ kfree(port_counter->hstats);
+ }
+
+ return -ENOMEM;
}
void rdma_counter_cleanup(struct ib_device *dev)
{
+ struct rdma_port_counter *port_counter;
+ u32 port;
+
+ if (!dev->ops.alloc_hw_stats)
+ return;
+
+ rdma_for_each_port(dev, port) {
+ port_counter = &dev->port_data[port].port_counter;
+ kfree(port_counter->hstats);
+ }
}
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c56ffc61ab1e..8ae4906a60e7 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
goto dev_cleanup;
}
- rdma_counter_init(device);
+ ret = rdma_counter_init(device);
+ if (ret) {
+ dev_warn(&device->dev, "Couldn't initialize counter\n");
+ goto sysfs_cleanup;
+ }
ret = enable_device_and_get(device);
if (ret) {
@@ -1283,6 +1287,8 @@ int ib_register_device(struct ib_device *device, const char *name)
return 0;
+sysfs_cleanup:
+ ib_device_unregister_sysfs(device);
dev_cleanup:
device_del(&device->dev);
cg_cleanup:
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 2fe89754e592..8d1cf1bbb5f5 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -43,6 +43,7 @@
#include <rdma/ib_mad.h>
#include <rdma/ib_pma.h>
#include <rdma/ib_cache.h>
+#include <rdma/rdma_counter.h>
struct ib_port;
@@ -795,9 +796,12 @@ static int update_hw_stats(struct ib_device *dev, struct rdma_hw_stats *stats,
return 0;
}
-static ssize_t print_hw_stat(struct rdma_hw_stats *stats, int index, char *buf)
+static ssize_t print_hw_stat(struct ib_device *dev, int port_num,
+ struct rdma_hw_stats *stats, int index, char *buf)
{
- return sprintf(buf, "%llu\n", stats->value[index]);
+ u64 v = rdma_counter_get_hwstat_value(dev, port_num, index);
+
+ return sprintf(buf, "%llu\n", stats->value[index] + v);
}
static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
@@ -823,7 +827,7 @@ static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
if (ret)
goto unlock;
- ret = print_hw_stat(stats, hsa->index, buf);
+ ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
unlock:
mutex_unlock(&stats->lock);
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 4bc62909a638..5ad86ae67cc5 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -27,6 +27,7 @@ struct rdma_counter_mode {
struct rdma_port_counter {
struct rdma_counter_mode mode;
+ struct rdma_hw_stats *hstats;
struct mutex lock;
};
@@ -41,13 +42,13 @@ struct rdma_counter {
u8 port;
};
-void rdma_counter_init(struct ib_device *dev);
+int rdma_counter_init(struct ib_device *dev);
void rdma_counter_cleanup(struct ib_device *dev);
int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
bool on, enum rdma_nl_counter_mask mask);
int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
-
int rdma_counter_query_stats(struct rdma_counter *counter);
+u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
#endif /* _RDMA_COUNTER_H_ */
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-04-29 8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
@ 2019-05-22 17:10 ` Jason Gunthorpe
2019-05-29 11:15 ` Leon Romanovsky
2019-05-22 17:26 ` Jason Gunthorpe
2019-05-29 11:17 ` Leon Romanovsky
2 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:10 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> Since a QP can only be bound to one counter, then if it is bound to a
> separate counter, for backward compatibility purpose, the statistic
> value must be:
> * stat of default counter
> + stat of all running allocated counters
> + stat of all deallocated counters (history stats)
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> drivers/infiniband/core/device.c | 8 ++-
> drivers/infiniband/core/sysfs.c | 10 ++-
> include/rdma/rdma_counter.h | 5 +-
> 4 files changed, 113 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 36cd9eca1e46..f598b1cdb241 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> return ret;
> }
>
> +static void counter_history_stat_update(const struct rdma_counter *counter)
> +{
> + struct ib_device *dev = counter->device;
> + struct rdma_port_counter *port_counter;
> + int i;
> +
> + port_counter = &dev->port_data[counter->port].port_counter;
> + if (!port_counter->hstats)
> + return;
> +
> + for (i = 0; i < counter->stats->num_counters; i++)
> + port_counter->hstats->value[i] += counter->stats->value[i];
> +}
> +
> static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> {
> struct rdma_counter *counter = qp->counter;
> @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> return ret;
>
> rdma_restrack_put(&counter->res);
> - if (atomic_dec_and_test(&counter->usecnt))
> + if (atomic_dec_and_test(&counter->usecnt)) {
> + counter_history_stat_update(counter);
> rdma_counter_dealloc(counter);
> + }
>
> return 0;
> }
> @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> return ret;
> }
>
> -void rdma_counter_init(struct ib_device *dev)
> +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> + u8 port, u32 index)
> +{
> + struct rdma_restrack_entry *res;
> + struct rdma_restrack_root *rt;
> + struct rdma_counter *counter;
> + unsigned long id = 0;
> + u64 sum = 0;
> +
> + rt = &dev->res[RDMA_RESTRACK_COUNTER];
> + xa_lock(&rt->xa);
> + xa_for_each(&rt->xa, id, res) {
> + if (!rdma_restrack_get(res))
> + continue;
Why do we need to get refcounts if we are holding the xa_lock?
> +
> + counter = container_of(res, struct rdma_counter, res);
> + if ((counter->device != dev) || (counter->port != port))
> + goto next;
> +
> + if (rdma_counter_query_stats(counter))
> + goto next;
And rdma_counter_query_stats does
+ mutex_lock(&counter->lock);
So this was never tested as it will insta-crash with lockdep.
Presumably this is why it is using xa_for_each and restrack_get - but
it needs to drop the lock after successful get.
This sort of comment applies to nearly evey place in this series that
uses xa_for_each.
This needs to be tested with lockdep.
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-22 17:10 ` Jason Gunthorpe
@ 2019-05-29 11:15 ` Leon Romanovsky
2019-05-29 15:41 ` Jason Gunthorpe
0 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 22, 2019 at 02:10:42PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > Since a QP can only be bound to one counter, then if it is bound to a
> > separate counter, for backward compatibility purpose, the statistic
> > value must be:
> > * stat of default counter
> > + stat of all running allocated counters
> > + stat of all deallocated counters (history stats)
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> > drivers/infiniband/core/device.c | 8 ++-
> > drivers/infiniband/core/sysfs.c | 10 ++-
> > include/rdma/rdma_counter.h | 5 +-
> > 4 files changed, 113 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > index 36cd9eca1e46..f598b1cdb241 100644
> > +++ b/drivers/infiniband/core/counters.c
> > @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> > return ret;
> > }
> >
> > +static void counter_history_stat_update(const struct rdma_counter *counter)
> > +{
> > + struct ib_device *dev = counter->device;
> > + struct rdma_port_counter *port_counter;
> > + int i;
> > +
> > + port_counter = &dev->port_data[counter->port].port_counter;
> > + if (!port_counter->hstats)
> > + return;
> > +
> > + for (i = 0; i < counter->stats->num_counters; i++)
> > + port_counter->hstats->value[i] += counter->stats->value[i];
> > +}
> > +
> > static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > {
> > struct rdma_counter *counter = qp->counter;
> > @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > return ret;
> >
> > rdma_restrack_put(&counter->res);
> > - if (atomic_dec_and_test(&counter->usecnt))
> > + if (atomic_dec_and_test(&counter->usecnt)) {
> > + counter_history_stat_update(counter);
> > rdma_counter_dealloc(counter);
> > + }
> >
> > return 0;
> > }
> > @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> > return ret;
> > }
> >
> > -void rdma_counter_init(struct ib_device *dev)
> > +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> > + u8 port, u32 index)
> > +{
> > + struct rdma_restrack_entry *res;
> > + struct rdma_restrack_root *rt;
> > + struct rdma_counter *counter;
> > + unsigned long id = 0;
> > + u64 sum = 0;
> > +
> > + rt = &dev->res[RDMA_RESTRACK_COUNTER];
> > + xa_lock(&rt->xa);
> > + xa_for_each(&rt->xa, id, res) {
> > + if (!rdma_restrack_get(res))
> > + continue;
>
> Why do we need to get refcounts if we are holding the xa_lock?
Don't we need to protect an entry itself from disappearing?
>
> > +
> > + counter = container_of(res, struct rdma_counter, res);
> > + if ((counter->device != dev) || (counter->port != port))
> > + goto next;
> > +
> > + if (rdma_counter_query_stats(counter))
> > + goto next;
>
> And rdma_counter_query_stats does
>
> + mutex_lock(&counter->lock);
>
> So this was never tested as it will insta-crash with lockdep.
>
> Presumably this is why it is using xa_for_each and restrack_get - but
> it needs to drop the lock after successful get.
>
> This sort of comment applies to nearly evey place in this series that
> uses xa_for_each.
>
> This needs to be tested with lockdep.
I use LOCKDEP.
>
> Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-29 11:15 ` Leon Romanovsky
@ 2019-05-29 15:41 ` Jason Gunthorpe
0 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-29 15:41 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 29, 2019 at 02:15:44PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 02:10:42PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@mellanox.com>
> > >
> > > Since a QP can only be bound to one counter, then if it is bound to a
> > > separate counter, for backward compatibility purpose, the statistic
> > > value must be:
> > > * stat of default counter
> > > + stat of all running allocated counters
> > > + stat of all deallocated counters (history stats)
> > >
> > > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> > > drivers/infiniband/core/device.c | 8 ++-
> > > drivers/infiniband/core/sysfs.c | 10 ++-
> > > include/rdma/rdma_counter.h | 5 +-
> > > 4 files changed, 113 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> > > index 36cd9eca1e46..f598b1cdb241 100644
> > > +++ b/drivers/infiniband/core/counters.c
> > > @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> > > return ret;
> > > }
> > >
> > > +static void counter_history_stat_update(const struct rdma_counter *counter)
> > > +{
> > > + struct ib_device *dev = counter->device;
> > > + struct rdma_port_counter *port_counter;
> > > + int i;
> > > +
> > > + port_counter = &dev->port_data[counter->port].port_counter;
> > > + if (!port_counter->hstats)
> > > + return;
> > > +
> > > + for (i = 0; i < counter->stats->num_counters; i++)
> > > + port_counter->hstats->value[i] += counter->stats->value[i];
> > > +}
> > > +
> > > static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > > {
> > > struct rdma_counter *counter = qp->counter;
> > > @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> > > return ret;
> > >
> > > rdma_restrack_put(&counter->res);
> > > - if (atomic_dec_and_test(&counter->usecnt))
> > > + if (atomic_dec_and_test(&counter->usecnt)) {
> > > + counter_history_stat_update(counter);
> > > rdma_counter_dealloc(counter);
> > > + }
> > >
> > > return 0;
> > > }
> > > @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> > > return ret;
> > > }
> > >
> > > -void rdma_counter_init(struct ib_device *dev)
> > > +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> > > + u8 port, u32 index)
> > > +{
> > > + struct rdma_restrack_entry *res;
> > > + struct rdma_restrack_root *rt;
> > > + struct rdma_counter *counter;
> > > + unsigned long id = 0;
> > > + u64 sum = 0;
> > > +
> > > + rt = &dev->res[RDMA_RESTRACK_COUNTER];
> > > + xa_lock(&rt->xa);
> > > + xa_for_each(&rt->xa, id, res) {
> > > + if (!rdma_restrack_get(res))
> > > + continue;
> >
> > Why do we need to get refcounts if we are holding the xa_lock?
>
> Don't we need to protect an entry itself from disappearing?
xa_lock prevents xa_erase and xa_erase should be done before any
parallel kfree.
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-04-29 8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
2019-05-22 17:10 ` Jason Gunthorpe
@ 2019-05-22 17:26 ` Jason Gunthorpe
2019-05-29 11:05 ` Leon Romanovsky
2019-05-29 11:17 ` Leon Romanovsky
2 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:26 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c56ffc61ab1e..8ae4906a60e7 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> goto dev_cleanup;
> }
>
> - rdma_counter_init(device);
> + ret = rdma_counter_init(device);
> + if (ret) {
> + dev_warn(&device->dev, "Couldn't initialize counter\n");
> + goto sysfs_cleanup;
> + }
Don't put this things randomly, if there is some reason it should be
after sysfs it needs a comment, otherwise if it is just allocating
memory it belongs earlier, and the unwind should be done in release.
I also think it is very strange/wrong that both sysfs and counters are
allocating the same alloc_hw_stats object
Why can't they share?
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-22 17:26 ` Jason Gunthorpe
@ 2019-05-29 11:05 ` Leon Romanovsky
2019-05-29 15:44 ` Jason Gunthorpe
0 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index c56ffc61ab1e..8ae4906a60e7 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> > goto dev_cleanup;
> > }
> >
> > - rdma_counter_init(device);
> > + ret = rdma_counter_init(device);
> > + if (ret) {
> > + dev_warn(&device->dev, "Couldn't initialize counter\n");
> > + goto sysfs_cleanup;
> > + }
>
> Don't put this things randomly, if there is some reason it should be
> after sysfs it needs a comment, otherwise if it is just allocating
> memory it belongs earlier, and the unwind should be done in release.
>
> I also think it is very strange/wrong that both sysfs and counters are
> allocating the same alloc_hw_stats object
>
> Why can't they share?
They can, but we wanted to separate "legacy" counters which were exposed
through sysfs and "new" counters which can be enabled/disable automatically.
Thanks
>
> Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-29 11:05 ` Leon Romanovsky
@ 2019-05-29 15:44 ` Jason Gunthorpe
2019-05-30 6:01 ` Mark Zhang
0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-29 15:44 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > index c56ffc61ab1e..8ae4906a60e7 100644
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> > > goto dev_cleanup;
> > > }
> > >
> > > - rdma_counter_init(device);
> > > + ret = rdma_counter_init(device);
> > > + if (ret) {
> > > + dev_warn(&device->dev, "Couldn't initialize counter\n");
> > > + goto sysfs_cleanup;
> > > + }
> >
> > Don't put this things randomly, if there is some reason it should be
> > after sysfs it needs a comment, otherwise if it is just allocating
> > memory it belongs earlier, and the unwind should be done in release.
> >
> > I also think it is very strange/wrong that both sysfs and counters are
> > allocating the same alloc_hw_stats object
> >
> > Why can't they share?
>
> They can, but we wanted to separate "legacy" counters which were exposed
> through sysfs and "new" counters which can be enabled/disable automatically.
Is there any cross contamination through the hw_stats? If no they
should just share.
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-29 15:44 ` Jason Gunthorpe
@ 2019-05-30 6:01 ` Mark Zhang
2019-05-30 7:04 ` Leon Romanovsky
0 siblings, 1 reply; 40+ messages in thread
From: Mark Zhang @ 2019-05-30 6:01 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Saeed Mahameed,
linux-netdev
On 5/29/2019 11:44 PM, Jason Gunthorpe wrote:
> On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
>> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
>>> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
>>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>>>> index c56ffc61ab1e..8ae4906a60e7 100644
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
>>>> goto dev_cleanup;
>>>> }
>>>>
>>>> - rdma_counter_init(device);
>>>> + ret = rdma_counter_init(device);
>>>> + if (ret) {
>>>> + dev_warn(&device->dev, "Couldn't initialize counter\n");
>>>> + goto sysfs_cleanup;
>>>> + }
>>>
>>> Don't put this things randomly, if there is some reason it should be
>>> after sysfs it needs a comment, otherwise if it is just allocating
>>> memory it belongs earlier, and the unwind should be done in release.
>>>
>>> I also think it is very strange/wrong that both sysfs and counters are
>>> allocating the same alloc_hw_stats object
>>>
>>> Why can't they share?
>>
>> They can, but we wanted to separate "legacy" counters which were exposed
>> through sysfs and "new" counters which can be enabled/disable automatically.
>
> Is there any cross contamination through the hw_stats? If no they
> should just share. >
sysfs hw_stats holds the port counter, while this one holds the
history value of all dynamically allocated counters. They can not share.
port_counter =
default_counter + running_dynamic_counter + history_dynamic_counter
> Jason
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-05-30 6:01 ` Mark Zhang
@ 2019-05-30 7:04 ` Leon Romanovsky
0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-30 7:04 UTC (permalink / raw)
To: Mark Zhang
Cc: Jason Gunthorpe, Doug Ledford, RDMA mailing list, Majd Dibbiny,
Saeed Mahameed, linux-netdev
On Thu, May 30, 2019 at 06:01:46AM +0000, Mark Zhang wrote:
> On 5/29/2019 11:44 PM, Jason Gunthorpe wrote:
> > On Wed, May 29, 2019 at 02:05:24PM +0300, Leon Romanovsky wrote:
> >> On Wed, May 22, 2019 at 02:26:36PM -0300, Jason Gunthorpe wrote:
> >>> On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> >>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> >>>> index c56ffc61ab1e..8ae4906a60e7 100644
> >>>> +++ b/drivers/infiniband/core/device.c
> >>>> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> >>>> goto dev_cleanup;
> >>>> }
> >>>>
> >>>> - rdma_counter_init(device);
> >>>> + ret = rdma_counter_init(device);
> >>>> + if (ret) {
> >>>> + dev_warn(&device->dev, "Couldn't initialize counter\n");
> >>>> + goto sysfs_cleanup;
> >>>> + }
> >>>
> >>> Don't put this things randomly, if there is some reason it should be
> >>> after sysfs it needs a comment, otherwise if it is just allocating
> >>> memory it belongs earlier, and the unwind should be done in release.
> >>>
> >>> I also think it is very strange/wrong that both sysfs and counters are
> >>> allocating the same alloc_hw_stats object
> >>>
> >>> Why can't they share?
> >>
> >> They can, but we wanted to separate "legacy" counters which were exposed
> >> through sysfs and "new" counters which can be enabled/disable automatically.
> >
> > Is there any cross contamination through the hw_stats? If no they
> > should just share. >
>
> sysfs hw_stats holds the port counter, while this one holds the
> history value of all dynamically allocated counters. They can not share.
> port_counter =
> default_counter + running_dynamic_counter + history_dynamic_counter
I'm not saying that it is right thing to do, but nothing prevents from you
to add extra field to port_counter exactly for that.
Thanks
>
> > Jason
> >
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read
2019-04-29 8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
2019-05-22 17:10 ` Jason Gunthorpe
2019-05-22 17:26 ` Jason Gunthorpe
@ 2019-05-29 11:17 ` Leon Romanovsky
2 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:17 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: RDMA mailing list, Majd Dibbiny, Mark Zhang, Saeed Mahameed,
linux-netdev
On Mon, Apr 29, 2019 at 11:34:49AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> Since a QP can only be bound to one counter, then if it is bound to a
> separate counter, for backward compatibility purpose, the statistic
> value must be:
> * stat of default counter
> + stat of all running allocated counters
> + stat of all deallocated counters (history stats)
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> drivers/infiniband/core/counters.c | 99 +++++++++++++++++++++++++++++-
> drivers/infiniband/core/device.c | 8 ++-
> drivers/infiniband/core/sysfs.c | 10 ++-
> include/rdma/rdma_counter.h | 5 +-
> 4 files changed, 113 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 36cd9eca1e46..f598b1cdb241 100644
> --- a/drivers/infiniband/core/counters.c
> +++ b/drivers/infiniband/core/counters.c
> @@ -146,6 +146,20 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> return ret;
> }
>
> +static void counter_history_stat_update(const struct rdma_counter *counter)
> +{
> + struct ib_device *dev = counter->device;
> + struct rdma_port_counter *port_counter;
> + int i;
> +
> + port_counter = &dev->port_data[counter->port].port_counter;
> + if (!port_counter->hstats)
> + return;
> +
> + for (i = 0; i < counter->stats->num_counters; i++)
> + port_counter->hstats->value[i] += counter->stats->value[i];
> +}
> +
> static int __rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> {
> struct rdma_counter *counter = qp->counter;
> @@ -285,8 +299,10 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
> return ret;
>
> rdma_restrack_put(&counter->res);
> - if (atomic_dec_and_test(&counter->usecnt))
> + if (atomic_dec_and_test(&counter->usecnt)) {
> + counter_history_stat_update(counter);
> rdma_counter_dealloc(counter);
> + }
>
> return 0;
> }
> @@ -307,21 +323,98 @@ int rdma_counter_query_stats(struct rdma_counter *counter)
> return ret;
> }
>
> -void rdma_counter_init(struct ib_device *dev)
> +static u64 get_running_counters_hwstat_sum(struct ib_device *dev,
> + u8 port, u32 index)
> +{
> + struct rdma_restrack_entry *res;
> + struct rdma_restrack_root *rt;
> + struct rdma_counter *counter;
> + unsigned long id = 0;
> + u64 sum = 0;
> +
> + rt = &dev->res[RDMA_RESTRACK_COUNTER];
> + xa_lock(&rt->xa);
> + xa_for_each(&rt->xa, id, res) {
> + if (!rdma_restrack_get(res))
> + continue;
> +
> + counter = container_of(res, struct rdma_counter, res);
> + if ((counter->device != dev) || (counter->port != port))
> + goto next;
> +
> + if (rdma_counter_query_stats(counter))
> + goto next;
> +
> + sum += counter->stats->value[index];
> +next:
> + rdma_restrack_put(res);
> + }
> +
> + xa_unlock(&rt->xa);
> + return sum;
> +}
> +
> +/**
> + * rdma_counter_get_hwstat_value() - Get the sum value of all counters on a
> + * specific port, including the running ones and history data
> + */
> +u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
> +{
> + struct rdma_port_counter *port_counter;
> + u64 sum;
> +
> + if (!rdma_is_port_valid(dev, port))
> + return -EINVAL;
> +
> + port_counter = &dev->port_data[port].port_counter;
> + if (index >= port_counter->hstats->num_counters)
> + return -EINVAL;
> +
> + sum = get_running_counters_hwstat_sum(dev, port, index);
> + sum += port_counter->hstats->value[index];
> +
> + return sum;
This function is wrong, it can't return negative value (error) while
return type is declared as u64.
> +}
> +
> +int rdma_counter_init(struct ib_device *dev)
> {
> struct rdma_port_counter *port_counter;
> u32 port;
>
> if (!dev->ops.alloc_hw_stats)
> - return;
> + return 0;
>
> rdma_for_each_port(dev, port) {
> port_counter = &dev->port_data[port].port_counter;
> port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
> mutex_init(&port_counter->lock);
> +
> + port_counter->hstats = dev->ops.alloc_hw_stats(dev, port);
> + if (!port_counter->hstats)
> + goto fail;
> }
> +
> + return 0;
> +
> +fail:
> + rdma_for_each_port(dev, port) {
> + port_counter = &dev->port_data[port].port_counter;
> + kfree(port_counter->hstats);
> + }
> +
> + return -ENOMEM;
> }
>
> void rdma_counter_cleanup(struct ib_device *dev)
> {
> + struct rdma_port_counter *port_counter;
> + u32 port;
> +
> + if (!dev->ops.alloc_hw_stats)
> + return;
> +
> + rdma_for_each_port(dev, port) {
> + port_counter = &dev->port_data[port].port_counter;
> + kfree(port_counter->hstats);
> + }
> }
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c56ffc61ab1e..8ae4906a60e7 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1255,7 +1255,11 @@ int ib_register_device(struct ib_device *device, const char *name)
> goto dev_cleanup;
> }
>
> - rdma_counter_init(device);
> + ret = rdma_counter_init(device);
> + if (ret) {
> + dev_warn(&device->dev, "Couldn't initialize counter\n");
> + goto sysfs_cleanup;
> + }
>
> ret = enable_device_and_get(device);
> if (ret) {
> @@ -1283,6 +1287,8 @@ int ib_register_device(struct ib_device *device, const char *name)
>
> return 0;
>
> +sysfs_cleanup:
> + ib_device_unregister_sysfs(device);
> dev_cleanup:
> device_del(&device->dev);
> cg_cleanup:
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 2fe89754e592..8d1cf1bbb5f5 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -43,6 +43,7 @@
> #include <rdma/ib_mad.h>
> #include <rdma/ib_pma.h>
> #include <rdma/ib_cache.h>
> +#include <rdma/rdma_counter.h>
>
> struct ib_port;
>
> @@ -795,9 +796,12 @@ static int update_hw_stats(struct ib_device *dev, struct rdma_hw_stats *stats,
> return 0;
> }
>
> -static ssize_t print_hw_stat(struct rdma_hw_stats *stats, int index, char *buf)
> +static ssize_t print_hw_stat(struct ib_device *dev, int port_num,
> + struct rdma_hw_stats *stats, int index, char *buf)
> {
> - return sprintf(buf, "%llu\n", stats->value[index]);
> + u64 v = rdma_counter_get_hwstat_value(dev, port_num, index);
> +
> + return sprintf(buf, "%llu\n", stats->value[index] + v);
> }
>
> static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
> @@ -823,7 +827,7 @@ static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
> ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
> if (ret)
> goto unlock;
> - ret = print_hw_stat(stats, hsa->index, buf);
> + ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
> unlock:
> mutex_unlock(&stats->lock);
>
> diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
> index 4bc62909a638..5ad86ae67cc5 100644
> --- a/include/rdma/rdma_counter.h
> +++ b/include/rdma/rdma_counter.h
> @@ -27,6 +27,7 @@ struct rdma_counter_mode {
>
> struct rdma_port_counter {
> struct rdma_counter_mode mode;
> + struct rdma_hw_stats *hstats;
> struct mutex lock;
> };
>
> @@ -41,13 +42,13 @@ struct rdma_counter {
> u8 port;
> };
>
> -void rdma_counter_init(struct ib_device *dev);
> +int rdma_counter_init(struct ib_device *dev);
> void rdma_counter_cleanup(struct ib_device *dev);
> int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
> bool on, enum rdma_nl_counter_mask mask);
> int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
> int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
> -
> int rdma_counter_query_stats(struct rdma_counter *counter);
> +u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
>
> #endif /* _RDMA_COUNTER_H_ */
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 14/17] RDMA/counter: Allow manual mode configuration support
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (12 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 13/17] RDMA/core: Get sum value of all counters when perform a sysfs stat read Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 15/17] RDMA/nldev: Allow counter manual mode configration through RDMA netlink Leon Romanovsky
` (3 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
In manual mode a QP is bound to a counter manually. If counter is not
specified then a new one will be allocated.
Manually mode is enabled when user binds a QP, and disabled when the
last manually bound QP is unbound.
When auto-mode is turned off and there are counters left, manual mode
is enabled so that the user is able to access these counters.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/counters.c | 226 ++++++++++++++++++++++++++++-
include/rdma/rdma_counter.h | 7 +
include/uapi/rdma/rdma_netlink.h | 6 +
3 files changed, 236 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index f598b1cdb241..6f58352a29ee 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -27,7 +27,9 @@ static int __counter_set_mode(struct rdma_counter_mode *curr,
/**
* rdma_counter_set_auto_mode() - Turn on/off per-port auto mode
*
- * When @on is true, the @mask must be set
+ * When @on is true, the @mask must be set; When @on is false, it goes
+ * into manual mode if there's any counter, so that the user is able to
+ * manually access them.
*/
int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
bool on, enum rdma_nl_counter_mask mask)
@@ -48,8 +50,13 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
ret = -EINVAL;
goto out;
}
- ret = __counter_set_mode(&port_counter->mode,
- RDMA_COUNTER_MODE_NONE, 0);
+
+ if (port_counter->num_counters != 0)
+ ret = __counter_set_mode(&port_counter->mode,
+ RDMA_COUNTER_MODE_MANUAL, 0);
+ else
+ ret = __counter_set_mode(&port_counter->mode,
+ RDMA_COUNTER_MODE_NONE, 0);
}
out:
@@ -60,7 +67,9 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
enum rdma_nl_counter_mode mode)
{
+ struct rdma_port_counter *port_counter;
struct rdma_counter *counter;
+ int ret;
if (!dev->ops.counter_alloc_stats)
return NULL;
@@ -76,12 +85,27 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
if (!counter->stats)
goto err_stats;
+ port_counter = &dev->port_data[port].port_counter;
+ mutex_lock(&port_counter->lock);
+ if (mode == RDMA_COUNTER_MODE_MANUAL) {
+ ret = __counter_set_mode(&port_counter->mode,
+ RDMA_COUNTER_MODE_MANUAL, 0);
+ if (ret)
+ goto err_mode;
+ }
+
+ port_counter->num_counters++;
+ mutex_unlock(&port_counter->lock);
+
counter->mode.mode = mode;
atomic_set(&counter->usecnt, 0);
mutex_init(&counter->lock);
return counter;
+err_mode:
+ mutex_unlock(&port_counter->lock);
+ kfree(counter->stats);
err_stats:
kfree(counter);
return NULL;
@@ -89,6 +113,18 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
static void rdma_counter_dealloc(struct rdma_counter *counter)
{
+ struct rdma_port_counter *port_counter;
+
+ port_counter = &counter->device->port_data[counter->port].port_counter;
+ mutex_lock(&port_counter->lock);
+ port_counter->num_counters--;
+ if ((port_counter->num_counters == 0) &&
+ (port_counter->mode.mode == RDMA_COUNTER_MODE_MANUAL))
+ __counter_set_mode(&port_counter->mode, RDMA_COUNTER_MODE_NONE,
+ 0);
+
+ mutex_unlock(&port_counter->lock);
+
rdma_restrack_del(&counter->res);
kfree(counter->stats);
kfree(counter);
@@ -376,6 +412,190 @@ u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index)
return sum;
}
+static struct ib_qp *rdma_counter_get_qp(struct ib_device *dev, u32 qp_num)
+{
+ struct rdma_restrack_entry *res = NULL;
+ struct ib_qp *qp = NULL;
+
+ res = rdma_restrack_get_byid(dev, RDMA_RESTRACK_QP, qp_num);
+ if (IS_ERR(res))
+ return NULL;
+
+ if (!rdma_is_visible_in_pid_ns(res))
+ goto err;
+
+ qp = container_of(res, struct ib_qp, res);
+ if (qp->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+ goto err;
+
+ return qp;
+
+err:
+ rdma_restrack_put(&qp->res);
+ return NULL;
+}
+
+static int rdma_counter_bind_qp_manual(struct rdma_counter *counter,
+ struct ib_qp *qp)
+{
+ int ret;
+
+ if (qp->port != counter->port)
+ return -EINVAL;
+
+ ret = __rdma_counter_bind_qp(counter, qp);
+ if (ret)
+ return ret;
+
+ atomic_inc(&counter->usecnt);
+ return 0;
+}
+
+static struct rdma_counter *rdma_get_counter_by_id(struct ib_device *dev,
+ u32 counter_id)
+{
+ struct rdma_restrack_entry *res;
+
+ res = rdma_restrack_get_byid(dev, RDMA_RESTRACK_COUNTER, counter_id);
+ if (IS_ERR(res))
+ return NULL;
+
+ if (!rdma_is_visible_in_pid_ns(res)) {
+ rdma_restrack_put(res);
+ return NULL;
+ }
+
+ return container_of(res, struct rdma_counter, res);
+}
+
+/**
+ * rdma_counter_bind_qpn() - Bind QP @qp_num to counter @counter_id
+ */
+int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 counter_id)
+{
+ struct rdma_counter *counter;
+ struct ib_qp *qp;
+ int ret;
+
+ qp = rdma_counter_get_qp(dev, qp_num);
+ if (!qp)
+ return -ENOENT;
+
+ counter = rdma_get_counter_by_id(dev, counter_id);
+ if (!counter) {
+ ret = -ENOENT;
+ goto err;
+ }
+
+ if (counter->res.task != qp->res.task) {
+ ret = -EINVAL;
+ goto err_task;
+ }
+
+ ret = rdma_counter_bind_qp_manual(counter, qp);
+ if (ret)
+ goto err_task;
+
+ rdma_restrack_put(&qp->res);
+ return 0;
+
+err_task:
+ rdma_restrack_put(&counter->res);
+err:
+ rdma_restrack_put(&qp->res);
+ return ret;
+}
+
+/**
+ * rdma_counter_bind_qpn_alloc() - Alloc a counter and bind QP @qp_num to it
+ * The id of new counter is returned in @counter_id
+ */
+int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 *counter_id)
+{
+ struct rdma_counter *counter;
+ struct ib_qp *qp;
+ int ret;
+
+ if (!rdma_is_port_valid(dev, port))
+ return -EINVAL;
+
+ qp = rdma_counter_get_qp(dev, qp_num);
+ if (!qp)
+ return -ENOENT;
+
+ if (rdma_is_port_valid(dev, qp->port) && (qp->port != port)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_MANUAL);
+ if (!counter) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ ret = rdma_counter_bind_qp_manual(counter, qp);
+ if (ret)
+ goto err_bind;
+
+ if (counter_id)
+ *counter_id = counter->id;
+
+ rdma_counter_res_add(counter, qp);
+
+ if (!rdma_restrack_get(&counter->res)) {
+ rdma_counter_unbind_qp(qp, false);
+ ret = -EINVAL;
+ }
+
+ rdma_restrack_put(&qp->res);
+ return ret;
+
+err_bind:
+ rdma_counter_dealloc(counter);
+err:
+ rdma_restrack_put(&qp->res);
+ return ret;
+}
+
+/**
+ * rdma_counter_unbind_qpn() - Unbind QP @qp_num from a counter
+ */
+int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 counter_id)
+{
+ struct rdma_port_counter *port_counter;
+ struct ib_qp *qp;
+ int ret;
+
+ if (!rdma_is_port_valid(dev, port))
+ return -EINVAL;
+
+ qp = rdma_counter_get_qp(dev, qp_num);
+ if (!qp)
+ return -ENOENT;
+
+ if (rdma_is_port_valid(dev, qp->port) && (qp->port != port)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ port_counter = &dev->port_data[port].port_counter;
+ if (!qp->counter || qp->counter->id != counter_id ||
+ port_counter->mode.mode != RDMA_COUNTER_MODE_MANUAL) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = rdma_counter_unbind_qp(qp, false);
+
+out:
+ rdma_restrack_put(&qp->res);
+ return ret;
+}
+
int rdma_counter_init(struct ib_device *dev)
{
struct rdma_port_counter *port_counter;
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 5ad86ae67cc5..b79319abbdef 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -28,6 +28,7 @@ struct rdma_counter_mode {
struct rdma_port_counter {
struct rdma_counter_mode mode;
struct rdma_hw_stats *hstats;
+ unsigned int num_counters;
struct mutex lock;
};
@@ -50,5 +51,11 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port);
int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
int rdma_counter_query_stats(struct rdma_counter *counter);
u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u8 port, u32 index);
+int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 counter_id);
+int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 *counter_id);
+int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
+ u32 qp_num, u32 counter_id);
#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 33eee6fb16b2..5f4461286e28 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -513,6 +513,12 @@ enum rdma_nl_counter_mode {
*/
RDMA_COUNTER_MODE_AUTO,
+ /*
+ * Which qp are bound with which counter is explicitly specified
+ * by the user
+ */
+ RDMA_COUNTER_MODE_MANUAL,
+
/*
* Always the end
*/
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 15/17] RDMA/nldev: Allow counter manual mode configration through RDMA netlink
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (13 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 14/17] RDMA/counter: Allow manual mode configuration support Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 16/17] RDMA/nldev: Allow get counter mode " Leon Romanovsky
` (2 subsequent siblings)
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Provide an option to allow users to manually bind a qp with a counter
through RDMA netlink. Limit it to users with ADMIN capability only.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/nldev.c | 111 +++++++++++++++++++++++++++----
include/rdma/rdma_counter.h | 3 +
include/uapi/rdma/rdma_netlink.h | 2 +
3 files changed, 103 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 1f7de88effad..6b7bc1a3d61d 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1559,8 +1559,8 @@ static int nldev_set_sys_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
+ u32 index, port, mode, mask = 0, qpn, cntn = 0;
struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
- u32 index, port, mode, mask = 0;
struct ib_device *device;
struct sk_buff *msg;
int ret;
@@ -1598,30 +1598,111 @@ static int nldev_stat_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
0, 0);
mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
- if (mode != RDMA_COUNTER_MODE_AUTO) {
- ret = -EMSGSIZE;
- goto err_msg;
+ if (mode == RDMA_COUNTER_MODE_AUTO) {
+ if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
+ mask = nla_get_u32(
+ tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+
+ ret = rdma_counter_set_auto_mode(device, port,
+ mask ? true : false, mask);
+ if (ret)
+ goto err_msg;
+ } else {
+ qpn = nla_get_u32(tb[RDMA_NLDEV_ATTR_RES_LQPN]);
+ if (tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]) {
+ cntn = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]);
+ ret = rdma_counter_bind_qpn(device, port, qpn, cntn);
+ } else {
+ ret = rdma_counter_bind_qpn_alloc(device, port,
+ qpn, &cntn);
+ }
+ if (ret)
+ goto err_msg;
+
+ if (fill_nldev_handle(msg, device) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn)) {
+ ret = -EMSGSIZE;
+ goto err_fill;
+ }
+ }
+
+ nlmsg_end(msg, nlh);
+ ib_device_put(device);
+ return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_fill:
+ rdma_counter_unbind_qpn(device, port, qpn, cntn);
+err_msg:
+ nlmsg_free(msg);
+err:
+ ib_device_put(device);
+ return ret;
+}
+
+static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+ struct ib_device *device;
+ struct sk_buff *msg;
+ u32 index, port, qpn, cntn;
+ int ret;
+
+ ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+ nldev_policy, extack);
+ if (ret || !tb[RDMA_NLDEV_ATTR_STAT_RES] ||
+ !tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX] ||
+ !tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID] ||
+ !tb[RDMA_NLDEV_ATTR_RES_LQPN])
+ return -EINVAL;
+
+ if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+ return -EINVAL;
+
+ index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ device = ib_device_get_by_index(sock_net(skb->sk), index);
+ if (!device)
+ return -EINVAL;
+
+ port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+ if (!rdma_is_port_valid(device, port)) {
+ ret = -EINVAL;
+ goto err;
}
- if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
- mask = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+ RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+ RDMA_NLDEV_CMD_STAT_SET),
+ 0, 0);
- ret = rdma_counter_set_auto_mode(device, port,
- mask ? true : false, mask);
+ cntn = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]);
+ qpn = nla_get_u32(tb[RDMA_NLDEV_ATTR_RES_LQPN]);
+ ret = rdma_counter_unbind_qpn(device, port, qpn, cntn);
if (ret)
- goto err_msg;
+ goto err_unbind;
- if (nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode) ||
- nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+ if (fill_nldev_handle(msg, device) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_LQPN, qpn)) {
ret = -EMSGSIZE;
- goto err_msg;
+ goto err_fill;
}
nlmsg_end(msg, nlh);
ib_device_put(device);
return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
-err_msg:
+err_fill:
+ rdma_counter_bind_qpn(device, port, qpn, cntn);
+err_unbind:
nlmsg_free(msg);
err:
ib_device_put(device);
@@ -1687,6 +1768,10 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
.doit = nldev_res_get_counter_doit,
.dump = nldev_res_get_counter_dumpit,
},
+ [RDMA_NLDEV_CMD_STAT_DEL] = {
+ .doit = nldev_stat_del_doit,
+ .flags = RDMA_NL_ADMIN_PERM,
+ },
};
void __init nldev_init(void)
diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index b79319abbdef..f485b86a318c 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -57,5 +57,8 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
u32 qp_num, u32 *counter_id);
int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
u32 qp_num, u32 counter_id);
+int rdma_counter_get_mode(struct ib_device *dev, u8 port,
+ enum rdma_nl_counter_mode *mode,
+ enum rdma_nl_counter_mask *mask);
#endif /* _RDMA_COUNTER_H_ */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 5f4461286e28..aaeeb0a8aec6 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -271,6 +271,8 @@ enum rdma_nldev_command {
RDMA_NLDEV_CMD_STAT_GET, /* can dump */
+ RDMA_NLDEV_CMD_STAT_DEL,
+
RDMA_NLDEV_NUM_OPS
};
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 16/17] RDMA/nldev: Allow get counter mode through RDMA netlink
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (14 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 15/17] RDMA/nldev: Allow counter manual mode configration through RDMA netlink Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-04-29 8:34 ` [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics " Leon Romanovsky
2019-05-22 17:31 ` [PATCH rdma-next v2 00/17] Statistics counter support Jason Gunthorpe
17 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
Provide an option to get current counter mode through RDMA netlink.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/counters.c | 13 +++++
drivers/infiniband/core/nldev.c | 78 +++++++++++++++++++++++++++++-
2 files changed, 90 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 6f58352a29ee..a3151799f625 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -596,6 +596,19 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u8 port,
return ret;
}
+int rdma_counter_get_mode(struct ib_device *dev, u8 port,
+ enum rdma_nl_counter_mode *mode,
+ enum rdma_nl_counter_mask *mask)
+{
+ struct rdma_port_counter *port_counter;
+
+ port_counter = &dev->port_data[port].port_counter;
+ *mode = port_counter->mode.mode;
+ *mask = port_counter->mode.mask;
+
+ return 0;
+}
+
int rdma_counter_init(struct ib_device *dev)
{
struct rdma_port_counter *port_counter;
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 6b7bc1a3d61d..53c1d2d82a06 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1709,6 +1709,82 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
return ret;
}
+static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+ static enum rdma_nl_counter_mode mode;
+ static enum rdma_nl_counter_mask mask;
+ struct ib_device *device;
+ struct sk_buff *msg;
+ u32 index, port;
+ int ret;
+
+ ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+ nldev_policy, extack);
+ if (ret)
+ return -EINVAL;
+
+ if (!tb[RDMA_NLDEV_ATTR_STAT_MODE])
+ return nldev_res_get_counter_doit(skb, nlh, extack);
+
+ if (!tb[RDMA_NLDEV_ATTR_STAT_RES] || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+ !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+ return -EINVAL;
+
+ if (nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
+ return -EINVAL;
+
+ index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ device = ib_device_get_by_index(sock_net(skb->sk), index);
+ if (!device)
+ return -EINVAL;
+
+ port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+ if (!rdma_is_port_valid(device, port)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+ RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+ RDMA_NLDEV_CMD_STAT_GET),
+ 0, 0);
+
+ ret = rdma_counter_get_mode(device, port, &mode, &mask);
+ if (ret)
+ goto err_msg;
+
+ if (fill_nldev_handle(msg, device) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_MODE, mode)) {
+ ret = -EMSGSIZE;
+ goto err_msg;
+ }
+
+ if ((mode == RDMA_COUNTER_MODE_AUTO) &&
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+ ret = -EMSGSIZE;
+ goto err_msg;
+ }
+
+ nlmsg_end(msg, nlh);
+ ib_device_put(device);
+ return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_msg:
+ nlmsg_free(msg);
+err:
+ ib_device_put(device);
+ return ret;
+}
+
static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
[RDMA_NLDEV_CMD_GET] = {
.doit = nldev_get_doit,
@@ -1765,7 +1841,7 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
.flags = RDMA_NL_ADMIN_PERM,
},
[RDMA_NLDEV_CMD_STAT_GET] = {
- .doit = nldev_res_get_counter_doit,
+ .doit = nldev_stat_get_doit,
.dump = nldev_res_get_counter_dumpit,
},
[RDMA_NLDEV_CMD_STAT_DEL] = {
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (15 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 16/17] RDMA/nldev: Allow get counter mode " Leon Romanovsky
@ 2019-04-29 8:34 ` Leon Romanovsky
2019-05-22 17:30 ` Jason Gunthorpe
2019-05-22 17:31 ` [PATCH rdma-next v2 00/17] Statistics counter support Jason Gunthorpe
17 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2019-04-29 8:34 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
From: Mark Zhang <markz@mellanox.com>
This patch adds the ability to return the hwstats of per-port default
counters (which can also be queried through sysfs nodes).
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 53c1d2d82a06..cb2dd38f49f1 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
return ret;
}
+static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
+ struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack,
+ struct nlattr *tb[])
+{
+ struct rdma_hw_stats *stats;
+ struct nlattr *table_attr;
+ struct ib_device *device;
+ int ret, num_cnts, i;
+ struct sk_buff *msg;
+ u32 index, port;
+ u64 v;
+
+ if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+ return -EINVAL;
+
+ index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ device = ib_device_get_by_index(sock_net(skb->sk), index);
+ if (!device)
+ return -EINVAL;
+
+ if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+ if (!rdma_is_port_valid(device, port)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+ RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+ RDMA_NLDEV_CMD_STAT_GET),
+ 0, 0);
+
+ if (fill_nldev_handle(msg, device) ||
+ nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
+ ret = -EMSGSIZE;
+ goto err_msg;
+ }
+
+ stats = device->ops.alloc_hw_stats(device, port);
+ if (!stats) {
+ ret = -ENOMEM;
+ goto err_msg;
+ }
+
+ num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
+ if (num_cnts < 0) {
+ ret = -EINVAL;
+ goto err_stats;
+ }
+
+ table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+ if (!table_attr) {
+ ret = -EMSGSIZE;
+ goto err_stats;
+ }
+ for (i = 0; i < num_cnts; i++) {
+ v = stats->value[i] +
+ rdma_counter_get_hwstat_value(device, port, i);
+ if (fill_stat_hwcounter_entry(msg, stats->names[i], v)) {
+ ret = -EMSGSIZE;
+ goto err_table;
+ }
+ }
+ nla_nest_end(msg, table_attr);
+
+ kfree(stats);
+ nlmsg_end(msg, nlh);
+ ib_device_put(device);
+ return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_table:
+ nla_nest_cancel(msg, table_attr);
+err_stats:
+ kfree(stats);
+err_msg:
+ nlmsg_free(msg);
+err:
+ ib_device_put(device);
+ return ret;
+}
+
static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
@@ -1725,8 +1817,13 @@ static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
if (ret)
return -EINVAL;
- if (!tb[RDMA_NLDEV_ATTR_STAT_MODE])
- return nldev_res_get_counter_doit(skb, nlh, extack);
+ if (!tb[RDMA_NLDEV_ATTR_STAT_MODE]) {
+ if (!tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID])
+ return nldev_res_get_default_counter_doit(skb, nlh,
+ extack, tb);
+ else
+ return nldev_res_get_counter_doit(skb, nlh, extack);
+ }
if (!tb[RDMA_NLDEV_ATTR_STAT_RES] || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
!tb[RDMA_NLDEV_ATTR_PORT_INDEX])
--
2.20.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
2019-04-29 8:34 ` [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics " Leon Romanovsky
@ 2019-05-22 17:30 ` Jason Gunthorpe
2019-05-29 11:54 ` Leon Romanovsky
0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:30 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:53AM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> This patch adds the ability to return the hwstats of per-port default
> counters (which can also be queried through sysfs nodes).
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
> 1 file changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 53c1d2d82a06..cb2dd38f49f1 100644
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> return ret;
> }
>
> +static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
> + struct nlmsghdr *nlh,
> + struct netlink_ext_ack *extack,
> + struct nlattr *tb[])
> +{
> + struct rdma_hw_stats *stats;
> + struct nlattr *table_attr;
> + struct ib_device *device;
> + int ret, num_cnts, i;
> + struct sk_buff *msg;
> + u32 index, port;
> + u64 v;
> +
> + if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
> + return -EINVAL;
> +
> + index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
> + device = ib_device_get_by_index(sock_net(skb->sk), index);
> + if (!device)
> + return -EINVAL;
> +
> + if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> + if (!rdma_is_port_valid(device, port)) {
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> + if (!msg) {
> + ret = -ENOMEM;
> + goto err;
> + }
> +
> + nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
> + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
> + RDMA_NLDEV_CMD_STAT_GET),
> + 0, 0);
> +
> + if (fill_nldev_handle(msg, device) ||
> + nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
> + ret = -EMSGSIZE;
> + goto err_msg;
> + }
> +
> + stats = device->ops.alloc_hw_stats(device, port);
> + if (!stats) {
> + ret = -ENOMEM;
> + goto err_msg;
> + }
Why do we need yet another one of these to be allocated?
> + num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
Is '0' right here?
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics through RDMA netlink
2019-05-22 17:30 ` Jason Gunthorpe
@ 2019-05-29 11:54 ` Leon Romanovsky
0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2019-05-29 11:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
Saeed Mahameed, linux-netdev
On Wed, May 22, 2019 at 02:30:11PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 11:34:53AM +0300, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > This patch adds the ability to return the hwstats of per-port default
> > counters (which can also be queried through sysfs nodes).
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > drivers/infiniband/core/nldev.c | 101 +++++++++++++++++++++++++++++++-
> > 1 file changed, 99 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > index 53c1d2d82a06..cb2dd38f49f1 100644
> > +++ b/drivers/infiniband/core/nldev.c
> > @@ -1709,6 +1709,98 @@ static int nldev_stat_del_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> > return ret;
> > }
> >
> > +static int nldev_res_get_default_counter_doit(struct sk_buff *skb,
> > + struct nlmsghdr *nlh,
> > + struct netlink_ext_ack *extack,
> > + struct nlattr *tb[])
> > +{
> > + struct rdma_hw_stats *stats;
> > + struct nlattr *table_attr;
> > + struct ib_device *device;
> > + int ret, num_cnts, i;
> > + struct sk_buff *msg;
> > + u32 index, port;
> > + u64 v;
> > +
> > + if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
> > + return -EINVAL;
> > +
> > + index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
> > + device = ib_device_get_by_index(sock_net(skb->sk), index);
> > + if (!device)
> > + return -EINVAL;
> > +
> > + if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
> > + ret = -EINVAL;
> > + goto err;
> > + }
> > +
> > + port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> > + if (!rdma_is_port_valid(device, port)) {
> > + ret = -EINVAL;
> > + goto err;
> > + }
> > +
> > + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> > + if (!msg) {
> > + ret = -ENOMEM;
> > + goto err;
> > + }
> > +
> > + nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
> > + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
> > + RDMA_NLDEV_CMD_STAT_GET),
> > + 0, 0);
> > +
> > + if (fill_nldev_handle(msg, device) ||
> > + nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
> > + ret = -EMSGSIZE;
> > + goto err_msg;
> > + }
> > +
> > + stats = device->ops.alloc_hw_stats(device, port);
> > + if (!stats) {
> > + ret = -ENOMEM;
> > + goto err_msg;
> > + }
>
> Why do we need yet another one of these to be allocated?
I would say that it is bug.
>
> > + num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
>
> Is '0' right here?
I think that "index" (third parameter) is not used at all.
>
> Jason
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH rdma-next v2 00/17] Statistics counter support
2019-04-29 8:34 [PATCH rdma-next v2 00/17] Statistics counter support Leon Romanovsky
` (16 preceding siblings ...)
2019-04-29 8:34 ` [PATCH rdma-next v2 17/17] RDMA/nldev: Allow get default counter statistics " Leon Romanovsky
@ 2019-05-22 17:31 ` Jason Gunthorpe
17 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2019-05-22 17:31 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
Mark Zhang, Saeed Mahameed, linux-netdev
On Mon, Apr 29, 2019 at 11:34:36AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Changelog:
> v1 -> v2:
> * Rebased to latest rdma-next
> v0 -> v1:
> * Changed wording of counter comment
> * Removed unneeded assignments
> * Added extra patch to present global counters
>
> * I didn't change QP type from int to be enum ib_qp_type,
> because it caused to cyclic dependency between ib_verbs.h and
> rdma_counter.h.
>
>
> Hi,
>
> This series from Mark provides dynamic statistics infrastructure.
> He uses netlink interface to configure and retrieve those counters.
>
> This infrastructure allows to users monitor various objects by binding
> to them counters. As the beginning, we used QP object as target for
> those counters, but future patches will include ODP MR information too.
>
> Two binding modes are supported:
> - Auto: This allows a user to build automatic set of objects to a counter
> according to common criteria. For example in a per-type scheme, where in
> one process all QPs with same QP type are bound automatically to a single
> counter.
> - Manual: This allows a user to manually bind objects on a counter.
>
> Those two modes are mutual-exclusive with separation between processes,
> objects created by different processes cannot be bound to a same counter.
>
> For objects which don't support counter binding, we will return
> pre-allocated counters.
>
> $ rdma statistic qp set link mlx5_2/1 auto type on
> $ rdma statistic qp set link mlx5_2/1 auto off
> $ rdma statistic qp bind link mlx5_2/1 lqpn 178
> $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
> $ rdma statistic show
> $ rdma statistic qp mode
Can you please include the command outputs?
Jason
^ permalink raw reply [flat|nested] 40+ messages in thread