All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Dynamically extendable device counter support
@ 2016-03-15 15:54 Christoph Lameter
  2016-03-15 15:54 ` [PATCH 1/3] ib core: Make device counter infrastructure dynamic Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Christoph Lameter @ 2016-03-15 15:54 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

We currently have issues when we want to expose device counters to the
system because we have statically defined structures that define
these counters. Changing these requires a change in the ABI since
the counter structures are defined in ib_verbs.h.

Moreover it seems that these counter diverge in a device driver
specific way. Counters show up in sysfs that are not supported
by the device driver and that will always be zero.

The patches here change the ABI to avoid hardcoding counter
names to the ABI. Instead of populated counters in structure
predefined in ib_verbs.h we simply return a pointer to an array
of strings describing the pointers supported and an array of
values of these counters. That allows device drives to define
an arbitrary amount of counters as needed. Adding a counter
does not requrie changing ib_verbs.h.

IB device drivers can develop standard semantics for specific
counters that are supported by multiple device drives by using
the same names there. So software can just check if the device
supports a certain counter and then expect the same behavior
of that counter regardless of the device driver.

This patch also adds counters at a port level. A device driver
can support counters at the device level and then at the level
of each individual port supports.

This patch series modifies the Iwarp device drives to support
the new way of handling counters. It removes the
counters that these devices do not support and that have
never had a function.

The mlx4 implementatation provided provides basic counter
support based on a patch posted by Majd years ago.

The mlx5 implementation is just some sample code that
does not actually retrieve any counters. I hope that Mellanox
will fix up the last two patches to allow more counters
for mlx4 and provide the proper way of retrieving counters
for mlx5.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/3] ib core: Make device counter infrastructure dynamic
  2016-03-15 15:54 [PATCH 0/3] Dynamically extendable device counter support Christoph Lameter
@ 2016-03-15 15:54 ` Christoph Lameter
       [not found]   ` <20160315155455.173645653-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2016-03-15 15:54 ` [PATCH 2/3] mlx4: Add support for protocol statistics Christoph Lameter
  2016-03-15 15:54 ` [PATCH 3/3] mlx5: Implement new counter infrastructure Christoph Lameter
  2 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-15 15:54 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: ib_counter_infrastructure --]
[-- Type: text/plain, Size: 19505 bytes --]

The current approach in ib_verbs.h is to define a struct with
statistics fields. ib_verbs is an abstraction layer for
device drivers and the protocol_stats should be abstract. However,
in practice these are actually device specific counters (also used
in such a way in Mellanox OFED). The device stats are a union
and in practice evolve to a union of device specific stats.

ARGH!!!

This should not be defined in a concrete way in ib_verbs.h since the
API definition needs to be device neutral.

What we need is an API that can return an arbitrary number
of counters that may be general or device specific.

Here we simply define an array of strings that describe the
counters and an array of 64 bit counters that are
displayed through sysfs. This makes ib_verbs API appropriately
abstract and device drivers do not need to update ib_verbs.h
(and thus break the API) to add more counters,.

This patch is groundwork to later define the protocol stats for
Mellanox devices.

The strings being used in the device drives should only be present
if the driver actually supports those counters. If there is
a standardized name then devices should use the same name
as in other infiniband device drivers.

Note on cxgb3/4 portions:
- I was unable to test because of the lack of hardware
- The counters that are never incremented were omitted. There may be the
  impression that less counter are supported. But the other counters
  have never been supported.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Index: linux/drivers/infiniband/core/sysfs.c
===================================================================
--- linux.orig/drivers/infiniband/core/sysfs.c
+++ linux/drivers/infiniband/core/sysfs.c
@@ -58,6 +58,7 @@ struct ib_port {
 	struct attribute_group pkey_group;
 	u8                     port_num;
 	struct attribute_group *pma_table;
+	struct attribute_group *ag;
 };
 
 struct port_attribute {
@@ -80,6 +81,16 @@ struct port_table_attribute {
 	__be16			attr_id;
 };
 
+struct ib_port_stats {
+	struct port_attribute	attr;
+	u32			index;
+};
+
+struct ib_device_stats {
+	struct device_attribute	attr;
+	u32			index;
+};
+
 static ssize_t port_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
@@ -733,6 +744,151 @@ static struct attribute_group *get_count
 	return &pma_group;
 }
 
+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
+				   u8 port, char *buf)
+{
+	struct rdma_protocol_stats stats = {0};
+	ssize_t ret;
+
+	ret = dev->get_protocol_stats(dev, &stats, port);
+	if (ret)
+		return ret;
+
+	return sprintf(buf, "%llu\n", stats.value[index]);
+}
+
+static ssize_t show_device_protocol_stats(struct device *device,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+	struct ib_device_stats *ds;
+
+	ds = container_of(attr, struct ib_device_stats, attr);
+	return show_protocol_stats(dev, ds->index, 0, buf);
+}
+
+static ssize_t show_port_protocol_stats(struct ib_port *p,
+					struct port_attribute *attr,
+					char *buf)
+{
+	struct ib_port_stats *ps;
+
+	ps = container_of(attr, struct ib_port_stats, attr);
+	return	show_protocol_stats(p->ibdev, ps->index, p->port_num, buf);
+}
+
+static void free_protocol_stats(struct kobject *kobj,
+				struct attribute_group *ag)
+{
+	struct attribute **da;
+
+	sysfs_remove_group(kobj, ag);
+
+	for (da = ag->attrs; *da; da++)
+		kfree(*da);
+
+	kfree(ag->attrs);
+	kfree(ag);
+}
+
+static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
+{
+	struct ib_port_stats *ps;
+
+	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
+	if (!ps)
+		return NULL;
+
+	ps->attr.attr.name = name;
+	ps->attr.attr.mode = S_IRUGO;
+	ps->attr.show = show_port_protocol_stats;
+	ps->attr.store = NULL;
+	ps->index = index;
+
+	return &ps->attr.attr;
+}
+
+static struct attribute *alloc_stats_device(u32 index, u8 port, char *name)
+{
+	struct ib_device_stats *da;
+
+	da = kmalloc(sizeof(*da), GFP_KERNEL);
+	if (!da)
+		return NULL;
+
+	da->attr.attr.name = name;
+	da->attr.attr.mode = S_IRUGO;
+	da->attr.show = show_device_protocol_stats;
+	da->attr.store = NULL;
+	da->index = index;
+
+	return &da->attr.attr;
+}
+
+static struct attribute *alloc_stats_attribute(u32 index, u8 port, char *name)
+{
+	return (port) ? alloc_stats_port(index, port, name) :
+		alloc_stats_device(index, port, name);
+}
+
+static struct attribute_group *create_protocol_stats(struct ib_device *device,
+						     struct kobject *kobj,
+						     u8 port) {
+	struct attribute_group *ag;
+	struct rdma_protocol_stats stats = {0};
+	u32 counters;
+	u32 i;
+	int ret;
+
+	ret = device->get_protocol_stats(device, &stats, port);
+
+	if (ret || !stats.name)
+		return NULL;
+
+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	if (!ag)
+		return NULL;
+
+	ag->name = stats.dirname;
+
+	for (counters = 0; stats.name[counters]; counters++)
+		;
+
+	BUG_ON(counters > MAX_NR_PROTOCOL_STATS);
+
+	ag->attrs = kcalloc((counters + 1), sizeof(struct attribute *),
+			    GFP_KERNEL);
+	if (!ag->attrs)
+		goto nomem;
+
+	for (i = 0; i < counters; i++) {
+		struct attribute *attr;
+
+		attr = alloc_stats_attribute(i, port, stats.name[i]);
+		if (!attr)
+			goto nomem2;
+
+		ag->attrs[i] = attr;
+	}
+	ag->attrs[counters] = NULL;
+
+	ret = sysfs_create_group(kobj, ag);
+
+	if (ret)
+		goto nomem2;
+
+	return ag;
+nomem2:
+	for (i = 0; i < counters; i++)
+		kfree(ag->attrs[i]);
+
+	kfree(ag->attrs);
+nomem:
+	kfree(ag);
+	return NULL;
+}
+
 static int add_port(struct ib_device *device, int port_num,
 		    int (*port_callback)(struct ib_device *,
 					 u8, struct kobject *))
@@ -835,6 +991,12 @@ static int add_port(struct ib_device *de
 			goto err_remove_pkey;
 	}
 
+	/* If port == 0, it means we have only one port, so the device should
+	 * hold the protocol stats
+	 */
+	if (device->get_protocol_stats && port_num)
+		p->ag = create_protocol_stats(device, &p->kobj, port_num);
+
 	list_add_tail(&p->kobj.entry, &device->port_list);
 
 	kobject_uevent(&p->kobj, KOBJ_ADD);
@@ -972,120 +1134,6 @@ static struct device_attribute *ib_class
 	&dev_attr_node_desc
 };
 
-/* Show a given an attribute in the statistics group */
-static ssize_t show_protocol_stat(const struct device *device,
-			    struct device_attribute *attr, char *buf,
-			    unsigned offset)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-	union rdma_protocol_stats stats;
-	ssize_t ret;
-
-	ret = dev->get_protocol_stats(dev, &stats);
-	if (ret)
-		return ret;
-
-	return sprintf(buf, "%llu\n",
-		       (unsigned long long) ((u64 *) &stats)[offset]);
-}
-
-/* generate a read-only iwarp statistics attribute */
-#define IW_STATS_ENTRY(name)						\
-static ssize_t show_##name(struct device *device,			\
-			   struct device_attribute *attr, char *buf)	\
-{									\
-	return show_protocol_stat(device, attr, buf,			\
-				  offsetof(struct iw_protocol_stats, name) / \
-				  sizeof (u64));			\
-}									\
-static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
-
-IW_STATS_ENTRY(ipInReceives);
-IW_STATS_ENTRY(ipInHdrErrors);
-IW_STATS_ENTRY(ipInTooBigErrors);
-IW_STATS_ENTRY(ipInNoRoutes);
-IW_STATS_ENTRY(ipInAddrErrors);
-IW_STATS_ENTRY(ipInUnknownProtos);
-IW_STATS_ENTRY(ipInTruncatedPkts);
-IW_STATS_ENTRY(ipInDiscards);
-IW_STATS_ENTRY(ipInDelivers);
-IW_STATS_ENTRY(ipOutForwDatagrams);
-IW_STATS_ENTRY(ipOutRequests);
-IW_STATS_ENTRY(ipOutDiscards);
-IW_STATS_ENTRY(ipOutNoRoutes);
-IW_STATS_ENTRY(ipReasmTimeout);
-IW_STATS_ENTRY(ipReasmReqds);
-IW_STATS_ENTRY(ipReasmOKs);
-IW_STATS_ENTRY(ipReasmFails);
-IW_STATS_ENTRY(ipFragOKs);
-IW_STATS_ENTRY(ipFragFails);
-IW_STATS_ENTRY(ipFragCreates);
-IW_STATS_ENTRY(ipInMcastPkts);
-IW_STATS_ENTRY(ipOutMcastPkts);
-IW_STATS_ENTRY(ipInBcastPkts);
-IW_STATS_ENTRY(ipOutBcastPkts);
-IW_STATS_ENTRY(tcpRtoAlgorithm);
-IW_STATS_ENTRY(tcpRtoMin);
-IW_STATS_ENTRY(tcpRtoMax);
-IW_STATS_ENTRY(tcpMaxConn);
-IW_STATS_ENTRY(tcpActiveOpens);
-IW_STATS_ENTRY(tcpPassiveOpens);
-IW_STATS_ENTRY(tcpAttemptFails);
-IW_STATS_ENTRY(tcpEstabResets);
-IW_STATS_ENTRY(tcpCurrEstab);
-IW_STATS_ENTRY(tcpInSegs);
-IW_STATS_ENTRY(tcpOutSegs);
-IW_STATS_ENTRY(tcpRetransSegs);
-IW_STATS_ENTRY(tcpInErrs);
-IW_STATS_ENTRY(tcpOutRsts);
-
-static struct attribute *iw_proto_stats_attrs[] = {
-	&dev_attr_ipInReceives.attr,
-	&dev_attr_ipInHdrErrors.attr,
-	&dev_attr_ipInTooBigErrors.attr,
-	&dev_attr_ipInNoRoutes.attr,
-	&dev_attr_ipInAddrErrors.attr,
-	&dev_attr_ipInUnknownProtos.attr,
-	&dev_attr_ipInTruncatedPkts.attr,
-	&dev_attr_ipInDiscards.attr,
-	&dev_attr_ipInDelivers.attr,
-	&dev_attr_ipOutForwDatagrams.attr,
-	&dev_attr_ipOutRequests.attr,
-	&dev_attr_ipOutDiscards.attr,
-	&dev_attr_ipOutNoRoutes.attr,
-	&dev_attr_ipReasmTimeout.attr,
-	&dev_attr_ipReasmReqds.attr,
-	&dev_attr_ipReasmOKs.attr,
-	&dev_attr_ipReasmFails.attr,
-	&dev_attr_ipFragOKs.attr,
-	&dev_attr_ipFragFails.attr,
-	&dev_attr_ipFragCreates.attr,
-	&dev_attr_ipInMcastPkts.attr,
-	&dev_attr_ipOutMcastPkts.attr,
-	&dev_attr_ipInBcastPkts.attr,
-	&dev_attr_ipOutBcastPkts.attr,
-	&dev_attr_tcpRtoAlgorithm.attr,
-	&dev_attr_tcpRtoMin.attr,
-	&dev_attr_tcpRtoMax.attr,
-	&dev_attr_tcpMaxConn.attr,
-	&dev_attr_tcpActiveOpens.attr,
-	&dev_attr_tcpPassiveOpens.attr,
-	&dev_attr_tcpAttemptFails.attr,
-	&dev_attr_tcpEstabResets.attr,
-	&dev_attr_tcpCurrEstab.attr,
-	&dev_attr_tcpInSegs.attr,
-	&dev_attr_tcpOutSegs.attr,
-	&dev_attr_tcpRetransSegs.attr,
-	&dev_attr_tcpInErrs.attr,
-	&dev_attr_tcpOutRsts.attr,
-	NULL
-};
-
-static struct attribute_group iw_stats_group = {
-	.name	= "proto_stats",
-	.attrs	= iw_proto_stats_attrs,
-};
-
 static void free_port_list_attributes(struct ib_device *device)
 {
 	struct kobject *p, *t;
@@ -1093,6 +1141,8 @@ static void free_port_list_attributes(st
 	list_for_each_entry_safe(p, t, &device->port_list, entry) {
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 		list_del(&p->entry);
+		if (port->ag)
+			free_protocol_stats(&port->kobj, port->ag);
 		sysfs_remove_group(p, port->pma_table);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
@@ -1149,12 +1199,14 @@ int ib_device_register_sysfs(struct ib_d
 		}
 	}
 
-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats) {
-		ret = sysfs_create_group(&class_dev->kobj, &iw_stats_group);
-		if (ret)
-			goto err_put;
-	}
+	if (!device->get_protocol_stats)
+		return 0;
+
+	device->ag = create_protocol_stats(device, &class_dev->kobj, 0);
 
+	/* If create_protocol_stats returns NULL it might not be a failure, so
+	 * do nothing
+	 */
 	return 0;
 
 err_put:
@@ -1169,15 +1221,13 @@ err:
 
 void ib_device_unregister_sysfs(struct ib_device *device)
 {
-	/* Hold kobject until ib_dealloc_device() */
-	struct kobject *kobj_dev = kobject_get(&device->dev.kobj);
 	int i;
 
-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats)
-		sysfs_remove_group(kobj_dev, &iw_stats_group);
-
 	free_port_list_attributes(device);
 
+	if (device->ag)
+		free_protocol_stats(&device->dev.kobj, device->ag);
+
 	for (i = 0; i < ARRAY_SIZE(ib_class_attributes); ++i)
 		device_remove_file(&device->dev, ib_class_attributes[i]);
 
Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1218,12 +1218,46 @@ static ssize_t show_board(struct device
 		       iwch_dev->rdev.rnic_info.pdev->device);
 }
 
+char *names[] = {
+	"ipInReceives",
+	"ipInHdrErrors",
+	"ipInAddrErrors",
+	"ipInUnknownProtos",
+	"ipInDiscards",
+	"ipInDelivers",
+	"ipOutRequests",
+	"ipOutDiscards",
+	"ipOutNoRoutes",
+	"ipReasmTimeout",
+	"ipReasmReqds",
+	"ipReasmOKs",
+	"ipReasmFails",
+	"tcpActiveOpens",
+	"tcpPassiveOpens",
+	"tcpAttemptFails",
+	"tcpEstabResets",
+	"tcpCurrEstab",
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpInErrs",
+	"tcpOutRsts",
+	"tcpRtoMin",
+	"tcpRtoMax",
+	NULL
+};
+
 static int iwch_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct iwch_dev *dev;
 	struct tp_mib_stats m;
 	int ret;
+	u64 *p;
+
+	if (port != 0)
+		return 0;
 
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 	dev = to_iwch_dev(ibdev);
@@ -1231,45 +1265,53 @@ static int iwch_get_mib(struct ib_device
 	if (ret)
 		return -ENOSYS;
 
-	memset(stats, 0, sizeof *stats);
-	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
-				m.ipInReceive_lo;
-	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
-				  m.ipInHdrErrors_lo;
-	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
-				   m.ipInAddrErrors_lo;
-	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
-				      m.ipInUnknownProtos_lo;
-	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
-				 m.ipInDiscards_lo;
-	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
-				 m.ipInDelivers_lo;
-	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
-				  m.ipOutRequests_lo;
-	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
-				  m.ipOutDiscards_lo;
-	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
-				  m.ipOutNoRoutes_lo;
-	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
-	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
-	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
-	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
-	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
-	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
-	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
-	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
-	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
-	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
-	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
-			      m.tcpInSegs_lo;
-	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
-			       m.tcpOutSegs_lo;
-	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
-				  m.tcpRetransSeg_lo;
-	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
-			      m.tcpInErrs_lo;
-	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
-	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
+	stats->dirname = "iw_stats";
+	stats->name = names;
+
+	p = stats->value;
+	*p++ = ((u64)m.ipInReceive_hi << 32) +
+		m.ipInReceive_lo;
+	*p++ = ((u64)m.ipInHdrErrors_hi << 32) +
+		m.ipInHdrErrors_lo;
+	*p++ = ((u64)m.ipInAddrErrors_hi << 32) +
+		m.ipInAddrErrors_lo;
+	*p++ = ((u64)m.ipInUnknownProtos_hi << 32) +
+		m.ipInUnknownProtos_lo;
+	*p++ = ((u64)m.ipInDiscards_hi << 32) +
+		m.ipInDiscards_lo;
+	*p++ = ((u64)m.ipInDelivers_hi << 32) +
+		m.ipInDelivers_lo;
+	*p++ = ((u64)m.ipOutRequests_hi << 32) +
+		m.ipOutRequests_lo;
+	*p++ = ((u64)m.ipOutDiscards_hi << 32) +
+		m.ipOutDiscards_lo;
+	*p++ = ((u64)m.ipOutNoRoutes_hi << 32) +
+		m.ipOutNoRoutes_lo;
+	*p++ = (u64)m.ipReasmTimeout;
+	*p++ = (u64)m.ipReasmReqds;
+	*p++ = (u64)m.ipReasmOKs;
+	*p++ = (u64)m.ipReasmFails;
+	*p++ = (u64)m.tcpActiveOpens;
+	*p++ = (u64)m.tcpPassiveOpens;
+	*p++ = (u64)m.tcpAttemptFails;
+	*p++ = (u64)m.tcpEstabResets;
+	*p++ = (u64)m.tcpOutRsts;
+	*p++ = (u64)m.tcpCurrEstab;
+	*p++ = ((u64)m.tcpInSegs_hi << 32) +
+		m.tcpInSegs_lo;
+	*p++ = ((u64)m.tcpOutSegs_hi << 32) +
+		m.tcpOutSegs_lo;
+	*p++ = ((u64)m.tcpRetransSeg_hi << 32) +
+		m.tcpRetransSeg_lo;
+	*p++ = ((u64)m.tcpInErrs_hi << 32) +
+		m.tcpInErrs_lo;
+	*p++ = (u64)m.tcpRtoMin;
+	*p++ = (u64)m.tcpRtoMax;
+
+	/* Emsure we provided all values */
+	BUG_ON(p - stats->value !=
+	       (sizeof(names) / sizeof(char *)) - 1);
+
 	return 0;
 }
 
Index: linux/drivers/infiniband/hw/cxgb4/provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb4/provider.c
+++ linux/drivers/infiniband/hw/cxgb4/provider.c
@@ -445,18 +445,32 @@ static ssize_t show_board(struct device
 		       c4iw_dev->rdev.lldi.pdev->device);
 }
 
+static char *names[] = {
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpOutRsts",
+	NULL
+};
+
 static int c4iw_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct tp_tcp_stats v4, v6;
 	struct c4iw_dev *c4iw_dev = to_c4iw_dev(ibdev);
 
+	if (port != 0)
+		return 0;
+
 	cxgb4_get_tcp_stats(c4iw_dev->rdev.lldi.pdev, &v4, &v6);
-	memset(stats, 0, sizeof *stats);
-	stats->iw.tcpInSegs = v4.tcp_in_segs + v6.tcp_in_segs;
-	stats->iw.tcpOutSegs = v4.tcp_out_segs + v6.tcp_out_segs;
-	stats->iw.tcpRetransSegs = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
-	stats->iw.tcpOutRsts = v4.tcp_out_rsts + v6.tcp_out_rsts;
+	stats->value[0] = v4.tcp_in_segs + v6.tcp_in_segs;
+	stats->value[1] = v4.tcp_out_segs + v6.tcp_out_segs;
+	stats->value[2] = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
+	stats->value[3] = v4.tcp_out_rsts + v6.tcp_out_rsts;
+
+	stats->name = names;
+	stats->dirname = "iw_stats";
 
 	return 0;
 }
Index: linux/include/rdma/ib_verbs.h
===================================================================
--- linux.orig/include/rdma/ib_verbs.h
+++ linux/include/rdma/ib_verbs.h
@@ -394,55 +394,13 @@ enum ib_port_speed {
 	IB_SPEED_EDR	= 32
 };
 
-struct ib_protocol_stats {
-	/* TBD... */
-};
-
-struct iw_protocol_stats {
-	u64	ipInReceives;
-	u64	ipInHdrErrors;
-	u64	ipInTooBigErrors;
-	u64	ipInNoRoutes;
-	u64	ipInAddrErrors;
-	u64	ipInUnknownProtos;
-	u64	ipInTruncatedPkts;
-	u64	ipInDiscards;
-	u64	ipInDelivers;
-	u64	ipOutForwDatagrams;
-	u64	ipOutRequests;
-	u64	ipOutDiscards;
-	u64	ipOutNoRoutes;
-	u64	ipReasmTimeout;
-	u64	ipReasmReqds;
-	u64	ipReasmOKs;
-	u64	ipReasmFails;
-	u64	ipFragOKs;
-	u64	ipFragFails;
-	u64	ipFragCreates;
-	u64	ipInMcastPkts;
-	u64	ipOutMcastPkts;
-	u64	ipInBcastPkts;
-	u64	ipOutBcastPkts;
-
-	u64	tcpRtoAlgorithm;
-	u64	tcpRtoMin;
-	u64	tcpRtoMax;
-	u64	tcpMaxConn;
-	u64	tcpActiveOpens;
-	u64	tcpPassiveOpens;
-	u64	tcpAttemptFails;
-	u64	tcpEstabResets;
-	u64	tcpCurrEstab;
-	u64	tcpInSegs;
-	u64	tcpOutSegs;
-	u64	tcpRetransSegs;
-	u64	tcpInErrs;
-	u64	tcpOutRsts;
-};
-
-union rdma_protocol_stats {
-	struct ib_protocol_stats	ib;
-	struct iw_protocol_stats	iw;
+#define MAX_NR_PROTOCOL_STATS 120
+struct rdma_protocol_stats {
+	char *dirname;		/* Directory name */
+	char **name;		/* NULL terminated list of strings for the names
+				 * of the counters
+				 */
+	u64 value[MAX_NR_PROTOCOL_STATS];
 };
 
 /* Define bits for the various functionality this port needs to be supported by
@@ -1665,7 +1623,8 @@ struct ib_device {
 	struct iw_cm_verbs	     *iwcm;
 
 	int		           (*get_protocol_stats)(struct ib_device *device,
-							 union rdma_protocol_stats *stats);
+							 struct rdma_protocol_stats *stats,
+							 u8 port);
 	int		           (*query_device)(struct ib_device *device,
 						   struct ib_device_attr *device_attr,
 						   struct ib_udata *udata);
@@ -1871,6 +1830,7 @@ struct ib_device {
 	u8                           node_type;
 	u8                           phys_port_cnt;
 	struct ib_device_attr        attrs;
+	struct attribute_group	     *ag;
 
 	/**
 	 * The following mandatory functions are used only at device

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 2/3] mlx4: Add support for protocol statistics
  2016-03-15 15:54 [PATCH 0/3] Dynamically extendable device counter support Christoph Lameter
  2016-03-15 15:54 ` [PATCH 1/3] ib core: Make device counter infrastructure dynamic Christoph Lameter
@ 2016-03-15 15:54 ` Christoph Lameter
  2016-03-15 15:54 ` [PATCH 3/3] mlx5: Implement new counter infrastructure Christoph Lameter
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2016-03-15 15:54 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny, Jason Gunthorpe,
	Mark Bloch, Steve Wise, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: mlx4_protocol_stats --]
[-- Type: text/plain, Size: 5368 bytes --]

Add support for basic Infiniband CQ statistics for the mlx4 driver.

This is based on an old approach posted years ago. Maybe Mellanox
can come up with a better one but this shows how it can be implemented
in the mlx4 driver and it works for our needs.

The old patch did only show how to retrieve the counter
values for the device as a whole not for each port. An improvement
would be to add proper support for port based counters.

Signed-off-by: Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/drivers/infiniband/hw/mlx4/main.c
===================================================================
--- linux.orig/drivers/infiniband/hw/mlx4/main.c
+++ linux/drivers/infiniband/hw/mlx4/main.c
@@ -2210,6 +2210,106 @@ static int mlx4_port_immutable(struct ib
 	return 0;
 }
 
+enum mlx4_diagnostic_counters {
+	MLX4_RQ_NUM_LLE,
+	MLX4_SQ_NUM_LLE,
+	MLX4_RQ_NUM_LQPOE,
+	MLX4_SQ_NUM_LQPOE,
+	MLX4_RQ_NUM_LPE,
+	MLX4_SQ_NUM_LPE,
+	MLX4_RQ_NUM_WRFE,
+	MLX4_SQ_NUM_WRFE,
+	MLX4_SQ_NUM_MWBE,
+	MLX4_SQ_NUM_BRE,
+	MLX4_RQ_NUM_LAE,
+	MLX4_SQ_NUM_RIRE,
+	MLX4_RQ_NUM_RIRE,
+	MLX4_SQ_NUM_RAE,
+	MLX4_RQ_NUM_RAE,
+	MLX4_SQ_NUM_ROE,
+	MLX4_SQ_NUM_TREE,
+	MLX4_SQ_NUM_RREE,
+	MLX4_RQ_NUM_RNR,
+	MLX4_SQ_NUM_RNR,
+	MLX4_RQ_NUM_OOS,
+	MLX4_SQ_NUM_OOS,
+	MLX4_RQ_NUM_MCE,
+	MLX4_RQ_NUM_UDSDPRD,
+	MLX4_RQ_NUM_UCSDPRD,
+	MLX4_NUM_CQOVF,
+	MLX4_NUM_EQOVF,
+	MLX4_NUM_BADDB,
+	MLX4_COUNTERS_NUM,
+};
+
+
+static char *mlx4_counter_names[] = {
+	"rq_num_lle",
+	"sq_num_lle",
+	"rq_num_lqpoe",
+	"sq_num_lqpoe",
+	"rq_num_lpe",
+	"sq_num_lpe",
+	"rq_num_wrfe",
+	"sq_num_wrfe",
+	"sq_num_mwbe",
+	"sq_num_bre",
+	"rq_num_lae",
+	"sq_num_rire",
+	"rq_num_rire",
+	"sq_num_rae",
+	"rq_num_rae",
+	"sq_num_roe",
+	"sq_num_tree",
+	"sq_num_rree",
+	"rq_num_rnr",
+	"sq_num_rnr",
+	"rq_num_oos",
+	"sq_num_oos",
+	"rq_num_mce",
+	"rq_num_udsdprd",
+	"rq_num_ucsdprd",
+	"num_cqovf",
+	"num_eqovf",
+	"num_baddb",
+	NULL
+};
+
+static int offsets[MLX4_COUNTERS_NUM] = {
+	0x00, 0x04, 0x08, 0x0C, 0x18, 0x1C, 0x20, 0x24,
+	0x2C, 0x34, 0x38, 0x44, 0x48, 0x4C, 0x50, 0x54,
+	0x5C, 0x64, 0x68, 0x6C, 0x100, 0x104, 0x108, 0x118,
+	0x120, 0x1A0, 0x1A4, 0x1A8 };
+
+static int mlx4_ib_show_stats(struct ib_device *ibdev,
+		struct rdma_protocol_stats *stats, u8 port)
+{
+	struct mlx4_ib_dev *dev;
+	int ret;
+
+	stats->name = mlx4_counter_names;
+	stats->dirname = "ib_stats";
+
+	/*
+	 * We should be checking for the port here and obtain
+	 * port specific counters if possible
+	 *
+	 *	if (port)
+	 *		{ Do port specific things }
+	 *
+	 */
+
+	dev = to_mdev(ibdev);
+	ret = mlx4_query_diag_counters(dev->dev, 2, offsets, stats->value,
+				       MLX4_COUNTERS_NUM);
+	if (ret) {
+		pr_err("%s: failed to read ib counters\n", __func__);
+		return ret;
+	}
+
+	return 0;
+}
+
 static void *mlx4_ib_add(struct mlx4_dev *dev)
 {
 	struct mlx4_ib_dev *ibdev;
@@ -2344,6 +2444,7 @@ static void *mlx4_ib_add(struct mlx4_dev
 	ibdev->ib_dev.process_mad	= mlx4_ib_process_mad;
 	ibdev->ib_dev.get_port_immutable = mlx4_port_immutable;
 	ibdev->ib_dev.disassociate_ucontext = mlx4_ib_disassociate_ucontext;
+	ibdev->ib_dev.get_protocol_stats = mlx4_ib_show_stats;
 
 	if (!mlx4_is_slave(ibdev->dev)) {
 		ibdev->ib_dev.alloc_fmr		= mlx4_ib_fmr_alloc;
Index: linux/drivers/net/ethernet/mellanox/mlx4/fw.c
===================================================================
--- linux.orig/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ linux/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2453,6 +2453,41 @@ int mlx4_NOP(struct mlx4_dev *dev)
 			MLX4_CMD_NATIVE);
 }
 
+int mlx4_query_diag_counters(struct mlx4_dev *dev, u8 op_modifier,
+			     int *offsets, u64 *values,
+			     size_t array_len)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	u32 *outbox;
+	int ret;
+	int i;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+	outbox = mailbox->buf;
+
+	ret = mlx4_cmd_box(dev, 0, mailbox->dma, 0, op_modifier,
+			   MLX4_CMD_DIAG_RPRT, MLX4_CMD_TIME_CLASS_A,
+			   MLX4_CMD_NATIVE);
+	if (ret)
+		goto out;
+
+	for (i = 0; i < array_len; i++) {
+		if (offsets[i] > MLX4_MAILBOX_SIZE) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		MLX4_GET(values[i], outbox, offsets[i]);
+	}
+
+out:
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mlx4_query_diag_counters);
+
 int mlx4_get_phys_port_id(struct mlx4_dev *dev)
 {
 	u8 port;
Index: linux/include/linux/mlx4/device.h
===================================================================
--- linux.orig/include/linux/mlx4/device.h
+++ linux/include/linux/mlx4/device.h
@@ -1369,6 +1369,8 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev
 		    u32 *lkey, u32 *rkey);
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
+int mlx4_query_diag_counters(struct mlx4_dev *dev, u8 op_modifier,
+			     int *offsets, u64 *values, size_t array_len);
 int mlx4_test_interrupts(struct mlx4_dev *dev);
 u32 mlx4_get_eqs_per_port(struct mlx4_dev *dev, u8 port);
 bool mlx4_is_eq_vector_valid(struct mlx4_dev *dev, u8 port, int vector);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 3/3] mlx5: Implement new counter infrastructure
  2016-03-15 15:54 [PATCH 0/3] Dynamically extendable device counter support Christoph Lameter
  2016-03-15 15:54 ` [PATCH 1/3] ib core: Make device counter infrastructure dynamic Christoph Lameter
  2016-03-15 15:54 ` [PATCH 2/3] mlx4: Add support for protocol statistics Christoph Lameter
@ 2016-03-15 15:54 ` Christoph Lameter
       [not found]   ` <20160315155455.397561811-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-15 15:54 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: mlx5_sample_stats --]
[-- Type: text/plain, Size: 2651 bytes --]

This provides the proper infrastructure to add counters to mlx5 and
shows some constant integer values in sysfs. It lacks the actual
access to the hardware counters. This is a vendor specific issue
so I hope that Mellanox can add that.

Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/drivers/infiniband/hw/mlx5/main.c
===================================================================
--- linux.orig/drivers/infiniband/hw/mlx5/main.c
+++ linux/drivers/infiniband/hw/mlx5/main.c
@@ -2122,6 +2122,76 @@ static int mlx5_port_immutable(struct ib
 	return 0;
 }
 
+#define MLX5_NUM_COUNTERS 28
+
+static char *mlx5_counter_names[MLX5_NUM_COUNTERS + 1] = {
+	"rq_num_lle",
+	"sq_num_lle",
+	"rq_num_lqpoe",
+	"sq_num_lqpoe",
+	"rq_num_lpe",
+	"sq_num_lpe",
+	"rq_num_wrfe",
+	"sq_num_wrfe",
+	"sq_num_mwbe",
+	"sq_num_bre",
+	"rq_num_lae",
+	"sq_num_rire",
+	"rq_num_rire",
+	"sq_num_rae",
+	"rq_num_rae",
+	"sq_num_roe",
+	"sq_num_tree",
+	"sq_num_rree",
+	"rq_num_rnr",
+	"sq_num_rnr",
+	"rq_num_oos",
+	"sq_num_oos",
+	"rq_num_mce",
+	"rq_num_udsdprd",
+	"rq_num_ucsdprd",
+	"num_cqovf",
+	"num_eqovf",
+	"num_baddb",
+	NULL };
+
+#define MLX5_NUM_PORT_COUNTERS 1
+static char *mlx5_port_counter_names[MLX5_NUM_PORT_COUNTERS + 1] = {
+	"p1",
+	NULL
+};
+
+static int mlx5_get_protocol_stats(struct ib_device *ibdev,
+				   struct rdma_protocol_stats *stats,
+				   u8 port)
+{
+	if (port != 0) {
+		/* Port Counters */
+		stats->name = mlx5_port_counter_names;
+		stats->dirname = "stats";
+		/*
+		 * Just one value for testing. Insert the right
+		 * way to retrieve all the counters here
+		 */
+		stats->value[0] = 1;
+		return 0;
+	}
+
+
+	/* Device Counters */
+
+	/* No idea how to retrieve the actual counters. Just some values */
+	stats->value[0] = 1;
+	stats->value[1] = 2;
+	stats->value[2] = 3;
+	stats->value[3] = 4;
+
+	stats->name = mlx5_counter_names;
+	stats->dirname = "ib_stats";
+
+	return 0;
+}
+
 static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
 {
 	int err;
@@ -2266,6 +2336,7 @@ static void *mlx5_ib_add(struct mlx5_cor
 	dev->ib_dev.map_mr_sg		= mlx5_ib_map_mr_sg;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
 	dev->ib_dev.get_port_immutable  = mlx5_port_immutable;
+	dev->ib_dev.get_protocol_stats	= mlx5_get_protocol_stats;
 
 	mlx5_ib_internal_fill_odp_caps(dev);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]   ` <20160315155455.173645653-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
@ 2016-03-16 15:47     ` Dennis Dalessandro
       [not found]       ` <20160316154738.GA26530-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  2016-03-17  7:23     ` Leon Romanovsky
  2016-05-13 19:18     ` Doug Ledford
  2 siblings, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2016-03-16 15:47 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Tue, Mar 15, 2016 at 10:54:42AM -0500, Christoph Lameter wrote:
 
>+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
>+				   u8 port, char *buf)
>+{
>+	struct rdma_protocol_stats stats = {0};
>+	ssize_t ret;
>+
>+	ret = dev->get_protocol_stats(dev, &stats, port);
>+	if (ret)
>+		return ret;
>+
>+	return sprintf(buf, "%llu\n", stats.value[index]);
>+}

This works fine when there are few counters, but if there are lots? What 
about changing things to include the index in the get_protocol_stats() call 
so that we just grab the single item we are looking for?

>+static struct attribute_group *create_protocol_stats(struct ib_device 
>*device,
>+						     struct kobject *kobj,
>+						     u8 port) {
>+	struct attribute_group *ag;
>+	struct rdma_protocol_stats stats = {0};
>+	u32 counters;
>+	u32 i;
>+	int ret;
>+
>+	ret = device->get_protocol_stats(device, &stats, port);
>+
>+	if (ret || !stats.name)
>+		return NULL;
>+
>+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
>+	if (!ag)
>+		return NULL;
>+
>+	ag->name = stats.dirname;
>+
>+	for (counters = 0; stats.name[counters]; counters++)
>+		;
>+
>+	BUG_ON(counters > MAX_NR_PROTOCOL_STATS);

Should we really be bringing the machine crashing down here?

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]       ` <20160316154738.GA26530-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2016-03-16 17:17         ` Christoph Lameter
       [not found]           ` <alpine.DEB.2.20.1603161213560.15010-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-16 17:17 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Wed, 16 Mar 2016, Dennis Dalessandro wrote:

> > +	return sprintf(buf, "%llu\n", stats.value[index]);
> > +}
>
> This works fine when there are few counters, but if there are lots? What about
> changing things to include the index in the get_protocol_stats() call so that
> we just grab the single item we are looking for?

Right now there is a limiting in the source to a max of 128 counters. In
practice all users use less than 30 counters at this point. So we have
some space to go until we may need a better solution. We could also save
the stats we have retrieved so it can be used for multiple counters.

But I think this something for the future.

> > +	for (counters = 0; stats.name[counters]; counters++)
> > +		;
> > +
> > +	BUG_ON(counters > MAX_NR_PROTOCOL_STATS);
>
> Should we really be bringing the machine crashing down here?

The counter tables are defined at compile time. The BUG_ON ensures that
the driver will bug immediately when testing if we hit that limit.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]           ` <alpine.DEB.2.20.1603161213560.15010-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-03-16 17:45             ` Dennis Dalessandro
  0 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2016-03-16 17:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 16, 2016 at 12:17:23PM -0500, Christoph Lameter wrote:
>On Wed, 16 Mar 2016, Dennis Dalessandro wrote:
>
>> > +	return sprintf(buf, "%llu\n", stats.value[index]);
>> > +}
>>
>> This works fine when there are few counters, but if there are lots? What about
>> changing things to include the index in the get_protocol_stats() call so that
>> we just grab the single item we are looking for?
>
>Right now there is a limiting in the source to a max of 128 counters. In
>practice all users use less than 30 counters at this point. So we have
>some space to go until we may need a better solution. We could also save
>the stats we have retrieved so it can be used for multiple counters.
>
>But I think this something for the future.

Fair enough. The only thing I would add is it would be easier now since you 
already change the only 2 drivers which use it, and are adding it to the Mlx 
drivers for the first time. 

Just throwing the idea out. I'm not totally against the current patch.

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]   ` <20160315155455.173645653-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2016-03-16 15:47     ` Dennis Dalessandro
@ 2016-03-17  7:23     ` Leon Romanovsky
       [not found]       ` <20160317072354.GB25216-2ukJVAZIZ/Y@public.gmane.org>
  2016-05-13 19:18     ` Doug Ledford
  2 siblings, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-17  7:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Tue, Mar 15, 2016 at 10:54:42AM -0500, Christoph Lameter wrote:
> +static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
> +{
> +	struct ib_port_stats *ps;
> +
> +	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
> +	if (!ps)
> +		return NULL;

Out of curiosity, why are we returning NULL and not ERR_PTR(ENOMEM)?

> +
> +
> +static struct attribute_group *create_protocol_stats(struct ib_device *device,
> +						     struct kobject *kobj,
> +						     u8 port) {
> +	struct attribute_group *ag;
> +	struct rdma_protocol_stats stats = {0};
> +	u32 counters;
> +	u32 i;
> +	int ret;
> +
> +	ret = device->get_protocol_stats(device, &stats, port);
> +
> +	if (ret || !stats.name)

This check puzzles me, "stats" variable was declares on stack a couple
of lines before. All values in it were assigned to zero, so my
expectation that stats.name == 0 (null). Am I right?
So what do you check here?

> +		return NULL;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]       ` <20160317072354.GB25216-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-03-17  8:17         ` Leon Romanovsky
       [not found]           ` <20160317081716.GD25216-2ukJVAZIZ/Y@public.gmane.org>
  2016-03-18  1:56         ` Christoph Lameter
  1 sibling, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-17  8:17 UTC (permalink / raw)
  To: Christoph Lameter, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Thu, Mar 17, 2016 at 09:23:54AM +0200, Leon Romanovsky wrote:
> On Tue, Mar 15, 2016 at 10:54:42AM -0500, Christoph Lameter wrote:
> > +static struct attribute_group *create_protocol_stats(struct ib_device *device,
> > +						     struct kobject *kobj,
> > +						     u8 port) {
> > +	struct attribute_group *ag;
> > +	struct rdma_protocol_stats stats = {0};
> > +	u32 counters;
> > +	u32 i;
> > +	int ret;
> > +
> > +	ret = device->get_protocol_stats(device, &stats, port);
> > +
> > +	if (ret || !stats.name)
> 
> This check puzzles me, "stats" variable was declares on stack a couple
> of lines before. All values in it were assigned to zero, so my
> expectation that stats.name == 0 (null). Am I right?
> So what do you check here?

Reply to myself, after internal conversation.
The stats.name is filled by get_protocol_stats function in case of
counter was found.

I didn't check for other proposals except mlx5 but I assume that in case nothing was filled
there, this function should return NOSUPPORT.

> 
> > +		return NULL;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]       ` <20160317072354.GB25216-2ukJVAZIZ/Y@public.gmane.org>
  2016-03-17  8:17         ` Leon Romanovsky
@ 2016-03-18  1:56         ` Christoph Lameter
       [not found]           ` <alpine.DEB.2.20.1603172054540.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-18  1:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Thu, 17 Mar 2016, Leon Romanovsky wrote:

> On Tue, Mar 15, 2016 at 10:54:42AM -0500, Christoph Lameter wrote:
> > +static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
> > +{
> > +	struct ib_port_stats *ps;
> > +
> > +	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
> > +	if (!ps)
> > +		return NULL;
>
> Out of curiosity, why are we returning NULL and not ERR_PTR(ENOMEM)?

Its the convention for kmalloc. Yes, I am the maintainer of the slab
allocators but this predates my earliest activities and this convention is
widely used throughout the kernel and NULL is the only way that
kmalloc signals an error condition.

> > +	u32 i;
> > +	int ret;
> > +
> > +	ret = device->get_protocol_stats(device, &stats, port);
> > +
> > +	if (ret || !stats.name)
>
> This check puzzles me, "stats" variable was declares on stack a couple
> of lines before. All values in it were assigned to zero, so my
> expectation that stats.name == 0 (null). Am I right?

The get_protocol_stats() function modifies the stats structure.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]           ` <20160317081716.GD25216-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-03-18  1:57             ` Christoph Lameter
       [not found]               ` <alpine.DEB.2.20.1603172057030.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-18  1:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Thu, 17 Mar 2016, Leon Romanovsky wrote:

> I didn't check for other proposals except mlx5 but I assume that in case nothing was filled
> there, this function should return NOSUPPORT.

The function simply does not create the directory. So we are fine.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]           ` <alpine.DEB.2.20.1603172054540.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-03-18  6:16             ` Leon Romanovsky
       [not found]               ` <20160318061659.GH25216-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-18  6:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Thu, Mar 17, 2016 at 08:56:55PM -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2016, Leon Romanovsky wrote:
> 
> > On Tue, Mar 15, 2016 at 10:54:42AM -0500, Christoph Lameter wrote:
> > > +static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
> > > +{
> > > +	struct ib_port_stats *ps;
> > > +
> > > +	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
> > > +	if (!ps)
> > > +		return NULL;
> >
> > Out of curiosity, why are we returning NULL and not ERR_PTR(ENOMEM)?
> 
> Its the convention for kmalloc. Yes, I am the maintainer of the slab
> allocators but this predates my earliest activities and this convention is
> widely used throughout the kernel and NULL is the only way that
> kmalloc signals an error condition.

Sure, and the question was "do you need to carry this semantic on and
return NULL instead of ERR_PTR for alloc_stats_port call?"

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]               ` <alpine.DEB.2.20.1603172057030.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-03-18  6:20                 ` Leon Romanovsky
       [not found]                   ` <20160318062009.GI25216-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-18  6:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Thu, Mar 17, 2016 at 08:57:23PM -0500, Christoph Lameter wrote:
> On Thu, 17 Mar 2016, Leon Romanovsky wrote:
> 
> > I didn't check for other proposals except mlx5 but I assume that in case nothing was filled
> > there, this function should return NOSUPPORT.
> 
> The function simply does not create the directory. So we are fine.
> 

It makes the stat.name check redundant. Do you have scenario in mind
where ret == 0 and stat.name == NULL as a result of call to the
function?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                   ` <20160318062009.GI25216-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-03-18 14:33                     ` Christoph Lameter
       [not found]                       ` <alpine.DEB.2.20.1603180932310.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-18 14:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, 18 Mar 2016, Leon Romanovsky wrote:

> It makes the stat.name check redundant. Do you have scenario in mind
> where ret == 0 and stat.name == NULL as a result of call to the
> function?

This is a function provided by those writing the device drivers. Better
check that the function filled out at least one required value.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]               ` <20160318061659.GH25216-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-03-18 14:34                 ` Christoph Lameter
       [not found]                   ` <alpine.DEB.2.20.1603180933320.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-03-18 14:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, 18 Mar 2016, Leon Romanovsky wrote:

> Sure, and the question was "do you need to carry this semantic on and
> return NULL instead of ERR_PTR for alloc_stats_port call?"

Yes I think that is most convenient and its customary to do it like that
unless there are special reasons why one wants to propagate an error code.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                   ` <alpine.DEB.2.20.1603180933320.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-03-18 15:42                     ` Leon Romanovsky
  0 siblings, 0 replies; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-18 15:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, Mar 18, 2016 at 09:34:04AM -0500, Christoph Lameter wrote:
> On Fri, 18 Mar 2016, Leon Romanovsky wrote:
> 
> > Sure, and the question was "do you need to carry this semantic on and
> > return NULL instead of ERR_PTR for alloc_stats_port call?"
> 
> Yes I think that is most convenient and its customary to do it like that
> unless there are special reasons why one wants to propagate an error code.

Thanks, I'll take this suggestion into account while I'm reviewing
submitted code.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                       ` <alpine.DEB.2.20.1603180932310.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-03-18 15:51                         ` Leon Romanovsky
  0 siblings, 0 replies; 42+ messages in thread
From: Leon Romanovsky @ 2016-03-18 15:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, Mar 18, 2016 at 09:33:28AM -0500, Christoph Lameter wrote:
> On Fri, 18 Mar 2016, Leon Romanovsky wrote:
> 
> > It makes the stat.name check redundant. Do you have scenario in mind
> > where ret == 0 and stat.name == NULL as a result of call to the
> > function?
> 
> This is a function provided by those writing the device drivers. Better
> check that the function filled out at least one required value.
> 

I don't have any objections with this code either, just worried about
patch bots like we saw a couple of months before.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]   ` <20160315155455.173645653-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2016-03-16 15:47     ` Dennis Dalessandro
  2016-03-17  7:23     ` Leon Romanovsky
@ 2016-05-13 19:18     ` Doug Ledford
       [not found]       ` <057c8ac8-1d34-e7b9-c0ad-91d805c81139-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-13 19:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 5598 bytes --]

On 03/15/2016 11:54 AM, Christoph Lameter wrote:
> Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
> ===================================================================
> --- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1218,12 +1218,46 @@ static ssize_t show_board(struct device
>  		       iwch_dev->rdev.rnic_info.pdev->device);
>  }
>  
> +char *names[] = {
> +	"ipInReceives",
> +	"ipInHdrErrors",
> +	"ipInAddrErrors",
> +	"ipInUnknownProtos",
> +	"ipInDiscards",
> +	"ipInDelivers",
> +	"ipOutRequests",
> +	"ipOutDiscards",
> +	"ipOutNoRoutes",
> +	"ipReasmTimeout",
> +	"ipReasmReqds",
> +	"ipReasmOKs",
> +	"ipReasmFails",
> +	"tcpActiveOpens",
> +	"tcpPassiveOpens",
> +	"tcpAttemptFails",
> +	"tcpEstabResets",
> +	"tcpCurrEstab",
> +	"tcpInSegs",
> +	"tcpOutSegs",
> +	"tcpRetransSegs",
> +	"tcpInErrs",
> +	"tcpOutRsts",
> +	"tcpRtoMin",
> +	"tcpRtoMax",
> +	NULL
> +};
> +
>  static int iwch_get_mib(struct ib_device *ibdev,
> -			union rdma_protocol_stats *stats)
> +			struct rdma_protocol_stats *stats,
> +			u8 port)
>  {
>  	struct iwch_dev *dev;
>  	struct tp_mib_stats m;
>  	int ret;
> +	u64 *p;
> +
> +	if (port != 0)
> +		return 0;
>  
>  	PDBG("%s ibdev %p\n", __func__, ibdev);
>  	dev = to_iwch_dev(ibdev);
> @@ -1231,45 +1265,53 @@ static int iwch_get_mib(struct ib_device
>  	if (ret)
>  		return -ENOSYS;
>  
> -	memset(stats, 0, sizeof *stats);
> -	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
> -				m.ipInReceive_lo;
> -	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
> -				  m.ipInHdrErrors_lo;
> -	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
> -				   m.ipInAddrErrors_lo;
> -	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
> -				      m.ipInUnknownProtos_lo;
> -	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
> -				 m.ipInDiscards_lo;
> -	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
> -				 m.ipInDelivers_lo;
> -	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
> -				  m.ipOutRequests_lo;
> -	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
> -				  m.ipOutDiscards_lo;
> -	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
> -				  m.ipOutNoRoutes_lo;
> -	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
> -	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
> -	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
> -	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
> -	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
> -	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
> -	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
> -	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
> -	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
> -	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
> -	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
> -			      m.tcpInSegs_lo;
> -	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
> -			       m.tcpOutSegs_lo;
> -	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
> -				  m.tcpRetransSeg_lo;
> -	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
> -			      m.tcpInErrs_lo;
> -	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
> -	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
> +	stats->dirname = "iw_stats";
> +	stats->name = names;
> +
> +	p = stats->value;
> +	*p++ = ((u64)m.ipInReceive_hi << 32) +
> +		m.ipInReceive_lo;
> +	*p++ = ((u64)m.ipInHdrErrors_hi << 32) +
> +		m.ipInHdrErrors_lo;
> +	*p++ = ((u64)m.ipInAddrErrors_hi << 32) +
> +		m.ipInAddrErrors_lo;
> +	*p++ = ((u64)m.ipInUnknownProtos_hi << 32) +
> +		m.ipInUnknownProtos_lo;
> +	*p++ = ((u64)m.ipInDiscards_hi << 32) +
> +		m.ipInDiscards_lo;
> +	*p++ = ((u64)m.ipInDelivers_hi << 32) +
> +		m.ipInDelivers_lo;
> +	*p++ = ((u64)m.ipOutRequests_hi << 32) +
> +		m.ipOutRequests_lo;
> +	*p++ = ((u64)m.ipOutDiscards_hi << 32) +
> +		m.ipOutDiscards_lo;
> +	*p++ = ((u64)m.ipOutNoRoutes_hi << 32) +
> +		m.ipOutNoRoutes_lo;
> +	*p++ = (u64)m.ipReasmTimeout;
> +	*p++ = (u64)m.ipReasmReqds;
> +	*p++ = (u64)m.ipReasmOKs;
> +	*p++ = (u64)m.ipReasmFails;
> +	*p++ = (u64)m.tcpActiveOpens;
> +	*p++ = (u64)m.tcpPassiveOpens;
> +	*p++ = (u64)m.tcpAttemptFails;
> +	*p++ = (u64)m.tcpEstabResets;
> +	*p++ = (u64)m.tcpOutRsts;
> +	*p++ = (u64)m.tcpCurrEstab;
> +	*p++ = ((u64)m.tcpInSegs_hi << 32) +
> +		m.tcpInSegs_lo;
> +	*p++ = ((u64)m.tcpOutSegs_hi << 32) +
> +		m.tcpOutSegs_lo;
> +	*p++ = ((u64)m.tcpRetransSeg_hi << 32) +
> +		m.tcpRetransSeg_lo;
> +	*p++ = ((u64)m.tcpInErrs_hi << 32) +
> +		m.tcpInErrs_lo;
> +	*p++ = (u64)m.tcpRtoMin;
> +	*p++ = (u64)m.tcpRtoMax;
> +
> +	/* Emsure we provided all values */
> +	BUG_ON(p - stats->value !=
> +	       (sizeof(names) / sizeof(char *)) - 1);
> +
>  	return 0;
>  }
>  

I don't like this entire chunk.  Please handle both the cxgb3 and cxgb4
changes like you did the mlx4 changes.  Specifically, use an enum to
define the array index for names and an array of offsets so that the
textual names from the enum can be used to access the array.  The way
things are here is horribly fragile.

Secondly, do *not* use a BUG_ON in this patch.  I saw at least two of
them.  There is nothing in this patch so serious that we should crash
the kernel.  Any failure here is something we can work around and keep
running.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/3] mlx5: Implement new counter infrastructure
       [not found]   ` <20160315155455.397561811-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
@ 2016-05-13 19:18     ` Doug Ledford
       [not found]       ` <d01d42d9-08a8-968a-97f9-40188301170a-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-13 19:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 287 bytes --]

On 03/15/2016 11:54 AM, Christoph Lameter wrote:

Please either get Mellanox to provide you what you need to actually set
the counters, or drop this patch from the series.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]       ` <057c8ac8-1d34-e7b9-c0ad-91d805c81139-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-16 13:52         ` Christoph Lameter
       [not found]           ` <alpine.DEB.2.20.1605160851370.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-16 13:52 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, 13 May 2016, Doug Ledford wrote:

> I don't like this entire chunk.  Please handle both the cxgb3 and cxgb4
> changes like you did the mlx4 changes.  Specifically, use an enum to
> define the array index for names and an array of offsets so that the
> textual names from the enum can be used to access the array.  The way
> things are here is horribly fragile.

Ok but I cannot separate this out in a a distinct patch.

> Secondly, do *not* use a BUG_ON in this patch.  I saw at least two of
> them.  There is nothing in this patch so serious that we should crash
> the kernel.  Any failure here is something we can work around and keep
> running.

WARN_ON_ONCE then?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/3] mlx5: Implement new counter infrastructure
       [not found]       ` <d01d42d9-08a8-968a-97f9-40188301170a-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-16 13:52         ` Christoph Lameter
       [not found]           ` <alpine.DEB.2.20.1605160852260.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-16 13:52 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Fri, 13 May 2016, Doug Ledford wrote:

> On 03/15/2016 11:54 AM, Christoph Lameter wrote:
>
> Please either get Mellanox to provide you what you need to actually set
> the counters, or drop this patch from the series.

Mellanox did a new patch that we are still internally revising.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]           ` <alpine.DEB.2.20.1605160851370.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-16 15:43             ` Doug Ledford
       [not found]               ` <041c6da0-e022-2bd1-5f00-e569c077e154-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-16 15:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

On 05/16/2016 09:52 AM, Christoph Lameter wrote:
> On Fri, 13 May 2016, Doug Ledford wrote:
> 
>> I don't like this entire chunk.  Please handle both the cxgb3 and cxgb4
>> changes like you did the mlx4 changes.  Specifically, use an enum to
>> define the array index for names and an array of offsets so that the
>> textual names from the enum can be used to access the array.  The way
>> things are here is horribly fragile.
> 
> Ok but I cannot separate this out in a a distinct patch.

That's fine.

>> Secondly, do *not* use a BUG_ON in this patch.  I saw at least two of
>> them.  There is nothing in this patch so serious that we should crash
>> the kernel.  Any failure here is something we can work around and keep
>> running.
> 
> WARN_ON_ONCE then?
> 

Yep.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/3] mlx5: Implement new counter infrastructure
       [not found]           ` <alpine.DEB.2.20.1605160852260.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-16 15:44             ` Doug Ledford
  0 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2016-05-16 15:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 515 bytes --]

On 05/16/2016 09:52 AM, Christoph Lameter wrote:
> On Fri, 13 May 2016, Doug Ledford wrote:
> 
>> On 03/15/2016 11:54 AM, Christoph Lameter wrote:
>>
>> Please either get Mellanox to provide you what you need to actually set
>> the counters, or drop this patch from the series.
> 
> Mellanox did a new patch that we are still internally revising.
> 

Cool, I'll wait for the new series then.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]               ` <041c6da0-e022-2bd1-5f00-e569c077e154-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-16 17:04                 ` Christoph Lameter
       [not found]                   ` <alpine.DEB.2.20.1605161203010.26453-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-16 17:04 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Mon, 16 May 2016, Doug Ledford wrote:

> >> Secondly, do *not* use a BUG_ON in this patch.  I saw at least two of
> >> them.  There is nothing in this patch so serious that we should crash
> >> the kernel.  Any failure here is something we can work around and keep
> >> running.
> >
> > WARN_ON_ONCE then?
> Yep.

BUILD_BUG_ON works even better once we have the enum. Revised patch
follows:


Subject: ib core: Make device counter infrastructure dynamic v2

V1-V2:
 - Use enums for counters for cxgb3/cxgb4
 - Use BUILD_BUG_ON instead of BUG_ON

The current approach in ib_verbs.h is to define a struct with
statistics fields. ib_verbs is an abstraction layer for
device drivers and the protocol_stats should be abstract. However,
in practice these are actually device specific counters (also used
in such a way in Mellanox OFED). The device stats are a union
and in practice evolve to a union of device specific stats.

ARGH!!!

This should not be defined in a concrete way in ib_verbs.h since the
API definition needs to be device neutral.

What we need is an API that can return an arbitrary number
of counters that may be general or device specific.

Here we simply define an array of strings that describe the
counters and an array of 64 bit counters that are
displayed through sysfs. This makes ib_verbs API appropriately
abstract and device drivers do not need to update ib_verbs.h
(and thus break the API) to add more counters,.

This patch is groundwork to later define the protocol stats for
Mellanox devices.

The strings being used in the device drives should only be present
if the driver actually supports those counters. If there is
a standardized name then devices should use the same name
as in other infiniband device drivers.

Note on cxgb3/4 portions:
- I was unable to test because of the lack of hardware
- The counters that are never incremented were omitted. There may be the
  impression that less counter are supported. But the other counters
  have never been supported.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Index: linux/drivers/infiniband/core/sysfs.c
===================================================================
--- linux.orig/drivers/infiniband/core/sysfs.c
+++ linux/drivers/infiniband/core/sysfs.c
@@ -58,6 +58,7 @@ struct ib_port {
 	struct attribute_group pkey_group;
 	u8                     port_num;
 	struct attribute_group *pma_table;
+	struct attribute_group *ag;
 };

 struct port_attribute {
@@ -80,6 +81,16 @@ struct port_table_attribute {
 	__be16			attr_id;
 };

+struct ib_port_stats {
+	struct port_attribute	attr;
+	u32			index;
+};
+
+struct ib_device_stats {
+	struct device_attribute	attr;
+	u32			index;
+};
+
 static ssize_t port_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
@@ -733,6 +744,151 @@ static struct attribute_group *get_count
 	return &pma_group;
 }

+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
+				   u8 port, char *buf)
+{
+	struct rdma_protocol_stats stats = {0};
+	ssize_t ret;
+
+	ret = dev->get_protocol_stats(dev, &stats, port);
+	if (ret)
+		return ret;
+
+	return sprintf(buf, "%llu\n", stats.value[index]);
+}
+
+static ssize_t show_device_protocol_stats(struct device *device,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+	struct ib_device_stats *ds;
+
+	ds = container_of(attr, struct ib_device_stats, attr);
+	return show_protocol_stats(dev, ds->index, 0, buf);
+}
+
+static ssize_t show_port_protocol_stats(struct ib_port *p,
+					struct port_attribute *attr,
+					char *buf)
+{
+	struct ib_port_stats *ps;
+
+	ps = container_of(attr, struct ib_port_stats, attr);
+	return	show_protocol_stats(p->ibdev, ps->index, p->port_num, buf);
+}
+
+static void free_protocol_stats(struct kobject *kobj,
+				struct attribute_group *ag)
+{
+	struct attribute **da;
+
+	sysfs_remove_group(kobj, ag);
+
+	for (da = ag->attrs; *da; da++)
+		kfree(*da);
+
+	kfree(ag->attrs);
+	kfree(ag);
+}
+
+static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
+{
+	struct ib_port_stats *ps;
+
+	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
+	if (!ps)
+		return NULL;
+
+	ps->attr.attr.name = name;
+	ps->attr.attr.mode = S_IRUGO;
+	ps->attr.show = show_port_protocol_stats;
+	ps->attr.store = NULL;
+	ps->index = index;
+
+	return &ps->attr.attr;
+}
+
+static struct attribute *alloc_stats_device(u32 index, u8 port, char *name)
+{
+	struct ib_device_stats *da;
+
+	da = kmalloc(sizeof(*da), GFP_KERNEL);
+	if (!da)
+		return NULL;
+
+	da->attr.attr.name = name;
+	da->attr.attr.mode = S_IRUGO;
+	da->attr.show = show_device_protocol_stats;
+	da->attr.store = NULL;
+	da->index = index;
+
+	return &da->attr.attr;
+}
+
+static struct attribute *alloc_stats_attribute(u32 index, u8 port, char *name)
+{
+	return (port) ? alloc_stats_port(index, port, name) :
+		alloc_stats_device(index, port, name);
+}
+
+static struct attribute_group *create_protocol_stats(struct ib_device *device,
+						     struct kobject *kobj,
+						     u8 port) {
+	struct attribute_group *ag;
+	struct rdma_protocol_stats stats = {0};
+	u32 counters;
+	u32 i;
+	int ret;
+
+	ret = device->get_protocol_stats(device, &stats, port);
+
+	if (ret || !stats.name)
+		return NULL;
+
+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	if (!ag)
+		return NULL;
+
+	ag->name = stats.dirname;
+
+	for (counters = 0; stats.name[counters]; counters++)
+		;
+
+	BUG_ON(counters > MAX_NR_PROTOCOL_STATS);
+
+	ag->attrs = kcalloc((counters + 1), sizeof(struct attribute *),
+			    GFP_KERNEL);
+	if (!ag->attrs)
+		goto nomem;
+
+	for (i = 0; i < counters; i++) {
+		struct attribute *attr;
+
+		attr = alloc_stats_attribute(i, port, stats.name[i]);
+		if (!attr)
+			goto nomem2;
+
+		ag->attrs[i] = attr;
+	}
+	ag->attrs[counters] = NULL;
+
+	ret = sysfs_create_group(kobj, ag);
+
+	if (ret)
+		goto nomem2;
+
+	return ag;
+nomem2:
+	for (i = 0; i < counters; i++)
+		kfree(ag->attrs[i]);
+
+	kfree(ag->attrs);
+nomem:
+	kfree(ag);
+	return NULL;
+}
+
 static int add_port(struct ib_device *device, int port_num,
 		    int (*port_callback)(struct ib_device *,
 					 u8, struct kobject *))
@@ -835,6 +991,12 @@ static int add_port(struct ib_device *de
 			goto err_remove_pkey;
 	}

+	/* If port == 0, it means we have only one port, so the device should
+	 * hold the protocol stats
+	 */
+	if (device->get_protocol_stats && port_num)
+		p->ag = create_protocol_stats(device, &p->kobj, port_num);
+
 	list_add_tail(&p->kobj.entry, &device->port_list);

 	kobject_uevent(&p->kobj, KOBJ_ADD);
@@ -972,120 +1134,6 @@ static struct device_attribute *ib_class
 	&dev_attr_node_desc
 };

-/* Show a given an attribute in the statistics group */
-static ssize_t show_protocol_stat(const struct device *device,
-			    struct device_attribute *attr, char *buf,
-			    unsigned offset)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-	union rdma_protocol_stats stats;
-	ssize_t ret;
-
-	ret = dev->get_protocol_stats(dev, &stats);
-	if (ret)
-		return ret;
-
-	return sprintf(buf, "%llu\n",
-		       (unsigned long long) ((u64 *) &stats)[offset]);
-}
-
-/* generate a read-only iwarp statistics attribute */
-#define IW_STATS_ENTRY(name)						\
-static ssize_t show_##name(struct device *device,			\
-			   struct device_attribute *attr, char *buf)	\
-{									\
-	return show_protocol_stat(device, attr, buf,			\
-				  offsetof(struct iw_protocol_stats, name) / \
-				  sizeof (u64));			\
-}									\
-static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
-
-IW_STATS_ENTRY(ipInReceives);
-IW_STATS_ENTRY(ipInHdrErrors);
-IW_STATS_ENTRY(ipInTooBigErrors);
-IW_STATS_ENTRY(ipInNoRoutes);
-IW_STATS_ENTRY(ipInAddrErrors);
-IW_STATS_ENTRY(ipInUnknownProtos);
-IW_STATS_ENTRY(ipInTruncatedPkts);
-IW_STATS_ENTRY(ipInDiscards);
-IW_STATS_ENTRY(ipInDelivers);
-IW_STATS_ENTRY(ipOutForwDatagrams);
-IW_STATS_ENTRY(ipOutRequests);
-IW_STATS_ENTRY(ipOutDiscards);
-IW_STATS_ENTRY(ipOutNoRoutes);
-IW_STATS_ENTRY(ipReasmTimeout);
-IW_STATS_ENTRY(ipReasmReqds);
-IW_STATS_ENTRY(ipReasmOKs);
-IW_STATS_ENTRY(ipReasmFails);
-IW_STATS_ENTRY(ipFragOKs);
-IW_STATS_ENTRY(ipFragFails);
-IW_STATS_ENTRY(ipFragCreates);
-IW_STATS_ENTRY(ipInMcastPkts);
-IW_STATS_ENTRY(ipOutMcastPkts);
-IW_STATS_ENTRY(ipInBcastPkts);
-IW_STATS_ENTRY(ipOutBcastPkts);
-IW_STATS_ENTRY(tcpRtoAlgorithm);
-IW_STATS_ENTRY(tcpRtoMin);
-IW_STATS_ENTRY(tcpRtoMax);
-IW_STATS_ENTRY(tcpMaxConn);
-IW_STATS_ENTRY(tcpActiveOpens);
-IW_STATS_ENTRY(tcpPassiveOpens);
-IW_STATS_ENTRY(tcpAttemptFails);
-IW_STATS_ENTRY(tcpEstabResets);
-IW_STATS_ENTRY(tcpCurrEstab);
-IW_STATS_ENTRY(tcpInSegs);
-IW_STATS_ENTRY(tcpOutSegs);
-IW_STATS_ENTRY(tcpRetransSegs);
-IW_STATS_ENTRY(tcpInErrs);
-IW_STATS_ENTRY(tcpOutRsts);
-
-static struct attribute *iw_proto_stats_attrs[] = {
-	&dev_attr_ipInReceives.attr,
-	&dev_attr_ipInHdrErrors.attr,
-	&dev_attr_ipInTooBigErrors.attr,
-	&dev_attr_ipInNoRoutes.attr,
-	&dev_attr_ipInAddrErrors.attr,
-	&dev_attr_ipInUnknownProtos.attr,
-	&dev_attr_ipInTruncatedPkts.attr,
-	&dev_attr_ipInDiscards.attr,
-	&dev_attr_ipInDelivers.attr,
-	&dev_attr_ipOutForwDatagrams.attr,
-	&dev_attr_ipOutRequests.attr,
-	&dev_attr_ipOutDiscards.attr,
-	&dev_attr_ipOutNoRoutes.attr,
-	&dev_attr_ipReasmTimeout.attr,
-	&dev_attr_ipReasmReqds.attr,
-	&dev_attr_ipReasmOKs.attr,
-	&dev_attr_ipReasmFails.attr,
-	&dev_attr_ipFragOKs.attr,
-	&dev_attr_ipFragFails.attr,
-	&dev_attr_ipFragCreates.attr,
-	&dev_attr_ipInMcastPkts.attr,
-	&dev_attr_ipOutMcastPkts.attr,
-	&dev_attr_ipInBcastPkts.attr,
-	&dev_attr_ipOutBcastPkts.attr,
-	&dev_attr_tcpRtoAlgorithm.attr,
-	&dev_attr_tcpRtoMin.attr,
-	&dev_attr_tcpRtoMax.attr,
-	&dev_attr_tcpMaxConn.attr,
-	&dev_attr_tcpActiveOpens.attr,
-	&dev_attr_tcpPassiveOpens.attr,
-	&dev_attr_tcpAttemptFails.attr,
-	&dev_attr_tcpEstabResets.attr,
-	&dev_attr_tcpCurrEstab.attr,
-	&dev_attr_tcpInSegs.attr,
-	&dev_attr_tcpOutSegs.attr,
-	&dev_attr_tcpRetransSegs.attr,
-	&dev_attr_tcpInErrs.attr,
-	&dev_attr_tcpOutRsts.attr,
-	NULL
-};
-
-static struct attribute_group iw_stats_group = {
-	.name	= "proto_stats",
-	.attrs	= iw_proto_stats_attrs,
-};
-
 static void free_port_list_attributes(struct ib_device *device)
 {
 	struct kobject *p, *t;
@@ -1093,6 +1141,8 @@ static void free_port_list_attributes(st
 	list_for_each_entry_safe(p, t, &device->port_list, entry) {
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 		list_del(&p->entry);
+		if (port->ag)
+			free_protocol_stats(&port->kobj, port->ag);
 		sysfs_remove_group(p, port->pma_table);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
@@ -1149,12 +1199,14 @@ int ib_device_register_sysfs(struct ib_d
 		}
 	}

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats) {
-		ret = sysfs_create_group(&class_dev->kobj, &iw_stats_group);
-		if (ret)
-			goto err_put;
-	}
+	if (!device->get_protocol_stats)
+		return 0;
+
+	device->ag = create_protocol_stats(device, &class_dev->kobj, 0);

+	/* If create_protocol_stats returns NULL it might not be a failure, so
+	 * do nothing
+	 */
 	return 0;

 err_put:
@@ -1169,15 +1221,13 @@ err:

 void ib_device_unregister_sysfs(struct ib_device *device)
 {
-	/* Hold kobject until ib_dealloc_device() */
-	struct kobject *kobj_dev = kobject_get(&device->dev.kobj);
 	int i;

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats)
-		sysfs_remove_group(kobj_dev, &iw_stats_group);
-
 	free_port_list_attributes(device);

+	if (device->ag)
+		free_protocol_stats(&device->dev.kobj, device->ag);
+
 	for (i = 0; i < ARRAY_SIZE(ib_class_attributes); ++i)
 		device_remove_file(&device->dev, ib_class_attributes[i]);

Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1219,58 +1219,113 @@ static ssize_t show_board(struct device
 		       iwch_dev->rdev.rnic_info.pdev->device);
 }

+char *names[] = {
+	"ipInReceives",
+	"ipInHdrErrors",
+	"ipInAddrErrors",
+	"ipInUnknownProtos",
+	"ipInDiscards",
+	"ipInDelivers",
+	"ipOutRequests",
+	"ipOutDiscards",
+	"ipOutNoRoutes",
+	"ipReasmTimeout",
+	"ipReasmReqds",
+	"ipReasmOKs",
+	"ipReasmFails",
+	"tcpActiveOpens",
+	"tcpPassiveOpens",
+	"tcpAttemptFails",
+	"tcpEstabResets",
+	"tcpCurrEstab",
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpInErrs",
+	"tcpOutRsts",
+	"tcpRtoMin",
+	"tcpRtoMax",
+	NULL
+};
+
+enum counters {
+	IPINRECEIVES,
+	IPINHDRERRORS,
+	IPINADDRERRORS,
+	IPINUNKNOWNPROTOS,
+	IPINDISCARDS,
+	IPINDELIVERS,
+	IPOUTREQUESTS,
+	IPOUTDISCARDS,
+	IPOUTNOROUTES,
+	IPREASMTIMEOUT,
+	IPREASMREQDS,
+	IPREASMOKS,
+	IPREASMFAILS,
+	TCPACTIVEOPENS,
+	TCPPASSIVEOPENS,
+	TCPATTEMPTFAILS,
+	TCPESTABRESETS,
+	TCPCURRESTAB,
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPINERRS,
+	TCPOUTRSTS,
+	TCPRTOMIN,
+	TCPRTOMAX,
+	NR_COUNTERS
+};
+
 static int iwch_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct iwch_dev *dev;
 	struct tp_mib_stats m;
 	int ret;

+	if (port != 0)
+		return 0;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 	dev = to_iwch_dev(ibdev);
 	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
 	if (ret)
 		return -ENOSYS;

-	memset(stats, 0, sizeof *stats);
-	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
-				m.ipInReceive_lo;
-	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
-				  m.ipInHdrErrors_lo;
-	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
-				   m.ipInAddrErrors_lo;
-	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
-				      m.ipInUnknownProtos_lo;
-	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
-				 m.ipInDiscards_lo;
-	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
-				 m.ipInDelivers_lo;
-	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
-				  m.ipOutRequests_lo;
-	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
-				  m.ipOutDiscards_lo;
-	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
-				  m.ipOutNoRoutes_lo;
-	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
-	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
-	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
-	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
-	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
-	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
-	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
-	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
-	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
-	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
-	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
-			      m.tcpInSegs_lo;
-	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
-			       m.tcpOutSegs_lo;
-	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
-				  m.tcpRetransSeg_lo;
-	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
-			      m.tcpInErrs_lo;
-	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
-	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
+	/* Emsure we provided all values */
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+
+	stats->dirname = "iw_stats";
+	stats->name = names;
+
+	stats->value[IPINRECEIVES] = ((u64)m.ipInReceive_hi << 32) +	m.ipInReceive_lo;
+	stats->value[IPINHDRERRORS] = ((u64)m.ipInHdrErrors_hi << 32) + m.ipInHdrErrors_lo;
+	stats->value[IPINADDRERRORS] = ((u64)m.ipInAddrErrors_hi << 32) + m.ipInAddrErrors_lo;
+	stats->value[IPINUNKNOWNPROTOS] = ((u64)m.ipInUnknownProtos_hi << 32) + m.ipInUnknownProtos_lo;
+	stats->value[IPINDISCARDS] = ((u64)m.ipInDiscards_hi << 32) + m.ipInDiscards_lo;
+	stats->value[IPINDELIVERS] = ((u64)m.ipInDelivers_hi << 32) + m.ipInDelivers_lo;
+	stats->value[IPOUTREQUESTS] = ((u64)m.ipOutRequests_hi << 32) + m.ipOutRequests_lo;
+	stats->value[IPOUTDISCARDS] = ((u64)m.ipOutDiscards_hi << 32) + m.ipOutDiscards_lo;
+	stats->value[IPOUTNOROUTES] = ((u64)m.ipOutNoRoutes_hi << 32) + m.ipOutNoRoutes_lo;
+	stats->value[IPREASMTIMEOUT] = 	m.ipReasmTimeout;
+	stats->value[IPREASMREQDS] = m.ipReasmReqds;
+	stats->value[IPREASMOKS] = m.ipReasmOKs;
+	stats->value[IPREASMFAILS] = m.ipReasmFails;
+	stats->value[TCPACTIVEOPENS] =	m.tcpActiveOpens;
+	stats->value[TCPPASSIVEOPENS] =	m.tcpPassiveOpens;
+	stats->value[TCPATTEMPTFAILS] = m.tcpAttemptFails;
+	stats->value[TCPESTABRESETS] = m.tcpEstabResets;
+	stats->value[TCPCURRESTAB] = m.tcpOutRsts;
+	stats->value[TCPINSEGS] = m.tcpCurrEstab;
+	stats->value[TCPOUTSEGS] = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
+	stats->value[TCPRETRANSSEGS] = ((u64)m.tcpOutSegs_hi << 32) + m.tcpOutSegs_lo;
+	stats->value[TCPINERRS] = ((u64)m.tcpRetransSeg_hi << 32) + m.tcpRetransSeg_lo,
+	stats->value[TCPOUTRSTS] = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
+	stats->value[TCPRTOMIN] = m.tcpRtoMin;
+	stats->value[TCPRTOMAX] = m.tcpRtoMax;
+
 	return 0;
 }

Index: linux/drivers/infiniband/hw/cxgb4/provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb4/provider.c
+++ linux/drivers/infiniband/hw/cxgb4/provider.c
@@ -446,18 +446,42 @@ static ssize_t show_board(struct device
 		       c4iw_dev->rdev.lldi.pdev->device);
 }

+static char *names[] = {
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpOutRsts",
+	NULL
+};
+
+enum counters {
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPOUTRSTS,
+	NR_COUNTERS
+};
+
 static int c4iw_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct tp_tcp_stats v4, v6;
 	struct c4iw_dev *c4iw_dev = to_c4iw_dev(ibdev);

+	if (port != 0)
+		return 0;
+
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+
 	cxgb4_get_tcp_stats(c4iw_dev->rdev.lldi.pdev, &v4, &v6);
-	memset(stats, 0, sizeof *stats);
-	stats->iw.tcpInSegs = v4.tcp_in_segs + v6.tcp_in_segs;
-	stats->iw.tcpOutSegs = v4.tcp_out_segs + v6.tcp_out_segs;
-	stats->iw.tcpRetransSegs = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
-	stats->iw.tcpOutRsts = v4.tcp_out_rsts + v6.tcp_out_rsts;
+	stats->value[TCPINSEGS] = v4.tcp_in_segs + v6.tcp_in_segs;
+	stats->value[TCPOUTSEGS] = v4.tcp_out_segs + v6.tcp_out_segs;
+	stats->value[TCPRETRANSSEGS] = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
+	stats->value[TCPOUTRSTS] = v4.tcp_out_rsts + v6.tcp_out_rsts;
+
+	stats->name = names;
+	stats->dirname = "iw_stats";

 	return 0;
 }
Index: linux/include/rdma/ib_verbs.h
===================================================================
--- linux.orig/include/rdma/ib_verbs.h
+++ linux/include/rdma/ib_verbs.h
@@ -402,55 +402,13 @@ enum ib_port_speed {
 	IB_SPEED_EDR	= 32
 };

-struct ib_protocol_stats {
-	/* TBD... */
-};
-
-struct iw_protocol_stats {
-	u64	ipInReceives;
-	u64	ipInHdrErrors;
-	u64	ipInTooBigErrors;
-	u64	ipInNoRoutes;
-	u64	ipInAddrErrors;
-	u64	ipInUnknownProtos;
-	u64	ipInTruncatedPkts;
-	u64	ipInDiscards;
-	u64	ipInDelivers;
-	u64	ipOutForwDatagrams;
-	u64	ipOutRequests;
-	u64	ipOutDiscards;
-	u64	ipOutNoRoutes;
-	u64	ipReasmTimeout;
-	u64	ipReasmReqds;
-	u64	ipReasmOKs;
-	u64	ipReasmFails;
-	u64	ipFragOKs;
-	u64	ipFragFails;
-	u64	ipFragCreates;
-	u64	ipInMcastPkts;
-	u64	ipOutMcastPkts;
-	u64	ipInBcastPkts;
-	u64	ipOutBcastPkts;
-
-	u64	tcpRtoAlgorithm;
-	u64	tcpRtoMin;
-	u64	tcpRtoMax;
-	u64	tcpMaxConn;
-	u64	tcpActiveOpens;
-	u64	tcpPassiveOpens;
-	u64	tcpAttemptFails;
-	u64	tcpEstabResets;
-	u64	tcpCurrEstab;
-	u64	tcpInSegs;
-	u64	tcpOutSegs;
-	u64	tcpRetransSegs;
-	u64	tcpInErrs;
-	u64	tcpOutRsts;
-};
-
-union rdma_protocol_stats {
-	struct ib_protocol_stats	ib;
-	struct iw_protocol_stats	iw;
+#define MAX_NR_PROTOCOL_STATS 120
+struct rdma_protocol_stats {
+	char *dirname;		/* Directory name */
+	char **name;		/* NULL terminated list of strings for the names
+				 * of the counters
+				 */
+	u64 value[MAX_NR_PROTOCOL_STATS];
 };

 /* Define bits for the various functionality this port needs to be supported by
@@ -1686,7 +1644,8 @@ struct ib_device {
 	struct iw_cm_verbs	     *iwcm;

 	int		           (*get_protocol_stats)(struct ib_device *device,
-							 union rdma_protocol_stats *stats);
+							 struct rdma_protocol_stats *stats,
+							 u8 port);
 	int		           (*query_device)(struct ib_device *device,
 						   struct ib_device_attr *device_attr,
 						   struct ib_udata *udata);
@@ -1903,6 +1862,7 @@ struct ib_device {
 	u8                           node_type;
 	u8                           phys_port_cnt;
 	struct ib_device_attr        attrs;
+	struct attribute_group	     *ag;

 	/**
 	 * The following mandatory functions are used only at device
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                   ` <alpine.DEB.2.20.1605161203010.26453-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-16 17:27                     ` Doug Ledford
       [not found]                       ` <db1bf3dd-20b1-3f2f-eaa3-25e5e8aff35b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-16 17:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

On 05/16/2016 01:04 PM, Christoph Lameter wrote:
> On Mon, 16 May 2016, Doug Ledford wrote:
> 
>>>> Secondly, do *not* use a BUG_ON in this patch.  I saw at least two of
>>>> them.  There is nothing in this patch so serious that we should crash
>>>> the kernel.  Any failure here is something we can work around and keep
>>>> running.
>>>
>>> WARN_ON_ONCE then?
>> Yep.
> 
> BUILD_BUG_ON works even better once we have the enum. Revised patch
> follows:

There's still one BUG_ON.  See below...

> +static struct attribute_group *create_protocol_stats(struct ib_device *device,
> +						     struct kobject *kobj,
> +						     u8 port) {
> +	struct attribute_group *ag;
> +	struct rdma_protocol_stats stats = {0};

We allocated our struct on the stack.  We have a static length counter
array in this struct.

> +	u32 counters;
> +	u32 i;
> +	int ret;
> +
> +	ret = device->get_protocol_stats(device, &stats, port);

We then pass our struct to get_protocol_stats to fill out.

> +
> +	if (ret || !stats.name)
> +		return NULL;
> +
> +	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
> +	if (!ag)
> +		return NULL;
> +
> +	ag->name = stats.dirname;
> +
> +	for (counters = 0; stats.name[counters]; counters++)
> +		;

We count how many names there are...

> +
> +	BUG_ON(counters > MAX_NR_PROTOCOL_STATS);

And we BUG_ON if there are more names than we allocated static counters
in our struct.  However, by now, it's too late.  The device's
get_protocol_stats function has already likely written beyond the end of
our array and smashed our stack.  Let's fix this another way entirely...


> +	/* Emsure we provided all values */

Typo in comment...

> +	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);

Here's where we ant to fix the problem.  We need a second BUILD_BUG_ON
that reads:

	BUILD_BUG_ON(NR_COUNTERS > MAX_NR_PROTOCOL_STATS);

Then we can safely reduce the size of the MAX_NR_PROTOCOL_STATS and know
that if we exceed our allocation, the build will fail and we can
increase it accordingly.  So in all of the get_stats routines we want
both build time checks to make sure that the counter structs are sized
appropriately and the name/enum list lengths match.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                       ` <db1bf3dd-20b1-3f2f-eaa3-25e5e8aff35b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-16 17:49                         ` Christoph Lameter
       [not found]                           ` <alpine.DEB.2.20.1605161238560.10085-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-16 17:49 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w


Ok patch with the changes requested. Check in core/sysfs.c removed.



Subject: ib core: Make device counter infrastructure dynamic V3

V1->V2:
 - Use enums for counters for cxgb3/cxgb4
 - Use BUILD_BUG_ON instead of BUG_ON

V2->v3
 - Change the way to check for MAX_NR_PROTOCOL_STATS
 - Fix typo

The current approach in ib_verbs.h is to define a struct with
statistics fields. ib_verbs is an abstraction layer for
device drivers and the protocol_stats should be abstract. However,
in practice these are actually device specific counters (also used
in such a way in Mellanox OFED). The device stats are a union
and in practice evolve to a union of device specific stats.

ARGH!!!

This should not be defined in a concrete way in ib_verbs.h since the
API definition needs to be device neutral.

What we need is an API that can return an arbitrary number
of counters that may be general or device specific.

Here we simply define an array of strings that describe the
counters and an array of 64 bit counters that are
displayed through sysfs. This makes ib_verbs API appropriately
abstract and device drivers do not need to update ib_verbs.h
(and thus break the API) to add more counters,.

This patch is groundwork to later define the protocol stats for
Mellanox devices.

The strings being used in the device drives should only be present
if the driver actually supports those counters. If there is
a standardized name then devices should use the same name
as in other infiniband device drivers.

Note on cxgb3/4 portions:
- I was unable to test because of the lack of hardware
- The counters that are never incremented were omitted. There may be the
  impression that less counter are supported. But the other counters
  have never been supported.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Index: linux/drivers/infiniband/core/sysfs.c
===================================================================
--- linux.orig/drivers/infiniband/core/sysfs.c
+++ linux/drivers/infiniband/core/sysfs.c
@@ -58,6 +58,7 @@ struct ib_port {
 	struct attribute_group pkey_group;
 	u8                     port_num;
 	struct attribute_group *pma_table;
+	struct attribute_group *ag;
 };

 struct port_attribute {
@@ -80,6 +81,16 @@ struct port_table_attribute {
 	__be16			attr_id;
 };

+struct ib_port_stats {
+	struct port_attribute	attr;
+	u32			index;
+};
+
+struct ib_device_stats {
+	struct device_attribute	attr;
+	u32			index;
+};
+
 static ssize_t port_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
@@ -733,6 +744,145 @@ static struct attribute_group *get_count
 	return &pma_group;
 }

+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
+				   u8 port, char *buf)
+{
+	struct rdma_protocol_stats stats = {0};
+	ssize_t ret;
+
+	ret = dev->get_protocol_stats(dev, &stats, port);
+	if (ret)
+		return ret;
+
+	return sprintf(buf, "%llu\n", stats.value[index]);
+}
+
+static ssize_t show_device_protocol_stats(struct device *device,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+	struct ib_device_stats *ds;
+
+	ds = container_of(attr, struct ib_device_stats, attr);
+	return show_protocol_stats(dev, ds->index, 0, buf);
+}
+
+static ssize_t show_port_protocol_stats(struct ib_port *p,
+					struct port_attribute *attr,
+					char *buf)
+{
+	struct ib_port_stats *ps;
+
+	ps = container_of(attr, struct ib_port_stats, attr);
+	return	show_protocol_stats(p->ibdev, ps->index, p->port_num, buf);
+}
+
+static void free_protocol_stats(struct kobject *kobj,
+				struct attribute_group *ag)
+{
+	struct attribute **da;
+
+	sysfs_remove_group(kobj, ag);
+
+	for (da = ag->attrs; *da; da++)
+		kfree(*da);
+
+	kfree(ag->attrs);
+	kfree(ag);
+}
+
+static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
+{
+	struct ib_port_stats *ps;
+
+	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
+	if (!ps)
+		return NULL;
+
+	ps->attr.attr.name = name;
+	ps->attr.attr.mode = S_IRUGO;
+	ps->attr.show = show_port_protocol_stats;
+	ps->attr.store = NULL;
+	ps->index = index;
+
+	return &ps->attr.attr;
+}
+
+static struct attribute *alloc_stats_device(u32 index, u8 port, char *name)
+{
+	struct ib_device_stats *da;
+
+	da = kmalloc(sizeof(*da), GFP_KERNEL);
+	if (!da)
+		return NULL;
+
+	da->attr.attr.name = name;
+	da->attr.attr.mode = S_IRUGO;
+	da->attr.show = show_device_protocol_stats;
+	da->attr.store = NULL;
+	da->index = index;
+
+	return &da->attr.attr;
+}
+
+static struct attribute *alloc_stats_attribute(u32 index, u8 port, char *name)
+{
+	return (port) ? alloc_stats_port(index, port, name) :
+		alloc_stats_device(index, port, name);
+}
+
+static struct attribute_group *create_protocol_stats(struct ib_device *device,
+						     struct kobject *kobj,
+						     u8 port) {
+	struct attribute_group *ag;
+	struct rdma_protocol_stats stats = {0};
+	u32 counters;
+	u32 i;
+	int ret;
+
+	ret = device->get_protocol_stats(device, &stats, port);
+
+	if (ret || !stats.name)
+		return NULL;
+
+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	if (!ag)
+		return NULL;
+
+	ag->name = stats.dirname;
+	ag->attrs = kcalloc((counters + 1), sizeof(struct attribute *),
+			    GFP_KERNEL);
+	if (!ag->attrs)
+		goto nomem;
+
+	for (i = 0; i < counters; i++) {
+		struct attribute *attr;
+
+		attr = alloc_stats_attribute(i, port, stats.name[i]);
+		if (!attr)
+			goto nomem2;
+
+		ag->attrs[i] = attr;
+	}
+	ag->attrs[counters] = NULL;
+
+	ret = sysfs_create_group(kobj, ag);
+
+	if (ret)
+		goto nomem2;
+
+	return ag;
+nomem2:
+	for (i = 0; i < counters; i++)
+		kfree(ag->attrs[i]);
+
+	kfree(ag->attrs);
+nomem:
+	kfree(ag);
+	return NULL;
+}
+
 static int add_port(struct ib_device *device, int port_num,
 		    int (*port_callback)(struct ib_device *,
 					 u8, struct kobject *))
@@ -835,6 +985,12 @@ static int add_port(struct ib_device *de
 			goto err_remove_pkey;
 	}

+	/* If port == 0, it means we have only one port, so the device should
+	 * hold the protocol stats
+	 */
+	if (device->get_protocol_stats && port_num)
+		p->ag = create_protocol_stats(device, &p->kobj, port_num);
+
 	list_add_tail(&p->kobj.entry, &device->port_list);

 	kobject_uevent(&p->kobj, KOBJ_ADD);
@@ -972,120 +1128,6 @@ static struct device_attribute *ib_class
 	&dev_attr_node_desc
 };

-/* Show a given an attribute in the statistics group */
-static ssize_t show_protocol_stat(const struct device *device,
-			    struct device_attribute *attr, char *buf,
-			    unsigned offset)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-	union rdma_protocol_stats stats;
-	ssize_t ret;
-
-	ret = dev->get_protocol_stats(dev, &stats);
-	if (ret)
-		return ret;
-
-	return sprintf(buf, "%llu\n",
-		       (unsigned long long) ((u64 *) &stats)[offset]);
-}
-
-/* generate a read-only iwarp statistics attribute */
-#define IW_STATS_ENTRY(name)						\
-static ssize_t show_##name(struct device *device,			\
-			   struct device_attribute *attr, char *buf)	\
-{									\
-	return show_protocol_stat(device, attr, buf,			\
-				  offsetof(struct iw_protocol_stats, name) / \
-				  sizeof (u64));			\
-}									\
-static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
-
-IW_STATS_ENTRY(ipInReceives);
-IW_STATS_ENTRY(ipInHdrErrors);
-IW_STATS_ENTRY(ipInTooBigErrors);
-IW_STATS_ENTRY(ipInNoRoutes);
-IW_STATS_ENTRY(ipInAddrErrors);
-IW_STATS_ENTRY(ipInUnknownProtos);
-IW_STATS_ENTRY(ipInTruncatedPkts);
-IW_STATS_ENTRY(ipInDiscards);
-IW_STATS_ENTRY(ipInDelivers);
-IW_STATS_ENTRY(ipOutForwDatagrams);
-IW_STATS_ENTRY(ipOutRequests);
-IW_STATS_ENTRY(ipOutDiscards);
-IW_STATS_ENTRY(ipOutNoRoutes);
-IW_STATS_ENTRY(ipReasmTimeout);
-IW_STATS_ENTRY(ipReasmReqds);
-IW_STATS_ENTRY(ipReasmOKs);
-IW_STATS_ENTRY(ipReasmFails);
-IW_STATS_ENTRY(ipFragOKs);
-IW_STATS_ENTRY(ipFragFails);
-IW_STATS_ENTRY(ipFragCreates);
-IW_STATS_ENTRY(ipInMcastPkts);
-IW_STATS_ENTRY(ipOutMcastPkts);
-IW_STATS_ENTRY(ipInBcastPkts);
-IW_STATS_ENTRY(ipOutBcastPkts);
-IW_STATS_ENTRY(tcpRtoAlgorithm);
-IW_STATS_ENTRY(tcpRtoMin);
-IW_STATS_ENTRY(tcpRtoMax);
-IW_STATS_ENTRY(tcpMaxConn);
-IW_STATS_ENTRY(tcpActiveOpens);
-IW_STATS_ENTRY(tcpPassiveOpens);
-IW_STATS_ENTRY(tcpAttemptFails);
-IW_STATS_ENTRY(tcpEstabResets);
-IW_STATS_ENTRY(tcpCurrEstab);
-IW_STATS_ENTRY(tcpInSegs);
-IW_STATS_ENTRY(tcpOutSegs);
-IW_STATS_ENTRY(tcpRetransSegs);
-IW_STATS_ENTRY(tcpInErrs);
-IW_STATS_ENTRY(tcpOutRsts);
-
-static struct attribute *iw_proto_stats_attrs[] = {
-	&dev_attr_ipInReceives.attr,
-	&dev_attr_ipInHdrErrors.attr,
-	&dev_attr_ipInTooBigErrors.attr,
-	&dev_attr_ipInNoRoutes.attr,
-	&dev_attr_ipInAddrErrors.attr,
-	&dev_attr_ipInUnknownProtos.attr,
-	&dev_attr_ipInTruncatedPkts.attr,
-	&dev_attr_ipInDiscards.attr,
-	&dev_attr_ipInDelivers.attr,
-	&dev_attr_ipOutForwDatagrams.attr,
-	&dev_attr_ipOutRequests.attr,
-	&dev_attr_ipOutDiscards.attr,
-	&dev_attr_ipOutNoRoutes.attr,
-	&dev_attr_ipReasmTimeout.attr,
-	&dev_attr_ipReasmReqds.attr,
-	&dev_attr_ipReasmOKs.attr,
-	&dev_attr_ipReasmFails.attr,
-	&dev_attr_ipFragOKs.attr,
-	&dev_attr_ipFragFails.attr,
-	&dev_attr_ipFragCreates.attr,
-	&dev_attr_ipInMcastPkts.attr,
-	&dev_attr_ipOutMcastPkts.attr,
-	&dev_attr_ipInBcastPkts.attr,
-	&dev_attr_ipOutBcastPkts.attr,
-	&dev_attr_tcpRtoAlgorithm.attr,
-	&dev_attr_tcpRtoMin.attr,
-	&dev_attr_tcpRtoMax.attr,
-	&dev_attr_tcpMaxConn.attr,
-	&dev_attr_tcpActiveOpens.attr,
-	&dev_attr_tcpPassiveOpens.attr,
-	&dev_attr_tcpAttemptFails.attr,
-	&dev_attr_tcpEstabResets.attr,
-	&dev_attr_tcpCurrEstab.attr,
-	&dev_attr_tcpInSegs.attr,
-	&dev_attr_tcpOutSegs.attr,
-	&dev_attr_tcpRetransSegs.attr,
-	&dev_attr_tcpInErrs.attr,
-	&dev_attr_tcpOutRsts.attr,
-	NULL
-};
-
-static struct attribute_group iw_stats_group = {
-	.name	= "proto_stats",
-	.attrs	= iw_proto_stats_attrs,
-};
-
 static void free_port_list_attributes(struct ib_device *device)
 {
 	struct kobject *p, *t;
@@ -1093,6 +1135,8 @@ static void free_port_list_attributes(st
 	list_for_each_entry_safe(p, t, &device->port_list, entry) {
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 		list_del(&p->entry);
+		if (port->ag)
+			free_protocol_stats(&port->kobj, port->ag);
 		sysfs_remove_group(p, port->pma_table);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
@@ -1149,12 +1193,14 @@ int ib_device_register_sysfs(struct ib_d
 		}
 	}

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats) {
-		ret = sysfs_create_group(&class_dev->kobj, &iw_stats_group);
-		if (ret)
-			goto err_put;
-	}
+	if (!device->get_protocol_stats)
+		return 0;
+
+	device->ag = create_protocol_stats(device, &class_dev->kobj, 0);

+	/* If create_protocol_stats returns NULL it might not be a failure, so
+	 * do nothing
+	 */
 	return 0;

 err_put:
@@ -1169,15 +1215,13 @@ err:

 void ib_device_unregister_sysfs(struct ib_device *device)
 {
-	/* Hold kobject until ib_dealloc_device() */
-	struct kobject *kobj_dev = kobject_get(&device->dev.kobj);
 	int i;

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats)
-		sysfs_remove_group(kobj_dev, &iw_stats_group);
-
 	free_port_list_attributes(device);

+	if (device->ag)
+		free_protocol_stats(&device->dev.kobj, device->ag);
+
 	for (i = 0; i < ARRAY_SIZE(ib_class_attributes); ++i)
 		device_remove_file(&device->dev, ib_class_attributes[i]);

Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1219,58 +1219,114 @@ static ssize_t show_board(struct device
 		       iwch_dev->rdev.rnic_info.pdev->device);
 }

+char *names[] = {
+	"ipInReceives",
+	"ipInHdrErrors",
+	"ipInAddrErrors",
+	"ipInUnknownProtos",
+	"ipInDiscards",
+	"ipInDelivers",
+	"ipOutRequests",
+	"ipOutDiscards",
+	"ipOutNoRoutes",
+	"ipReasmTimeout",
+	"ipReasmReqds",
+	"ipReasmOKs",
+	"ipReasmFails",
+	"tcpActiveOpens",
+	"tcpPassiveOpens",
+	"tcpAttemptFails",
+	"tcpEstabResets",
+	"tcpCurrEstab",
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpInErrs",
+	"tcpOutRsts",
+	"tcpRtoMin",
+	"tcpRtoMax",
+	NULL
+};
+
+enum counters {
+	IPINRECEIVES,
+	IPINHDRERRORS,
+	IPINADDRERRORS,
+	IPINUNKNOWNPROTOS,
+	IPINDISCARDS,
+	IPINDELIVERS,
+	IPOUTREQUESTS,
+	IPOUTDISCARDS,
+	IPOUTNOROUTES,
+	IPREASMTIMEOUT,
+	IPREASMREQDS,
+	IPREASMOKS,
+	IPREASMFAILS,
+	TCPACTIVEOPENS,
+	TCPPASSIVEOPENS,
+	TCPATTEMPTFAILS,
+	TCPESTABRESETS,
+	TCPCURRESTAB,
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPINERRS,
+	TCPOUTRSTS,
+	TCPRTOMIN,
+	TCPRTOMAX,
+	NR_COUNTERS
+};
+
 static int iwch_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct iwch_dev *dev;
 	struct tp_mib_stats m;
 	int ret;

+	if (port != 0)
+		return 0;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 	dev = to_iwch_dev(ibdev);
 	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
 	if (ret)
 		return -ENOSYS;

-	memset(stats, 0, sizeof *stats);
-	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
-				m.ipInReceive_lo;
-	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
-				  m.ipInHdrErrors_lo;
-	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
-				   m.ipInAddrErrors_lo;
-	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
-				      m.ipInUnknownProtos_lo;
-	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
-				 m.ipInDiscards_lo;
-	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
-				 m.ipInDelivers_lo;
-	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
-				  m.ipOutRequests_lo;
-	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
-				  m.ipOutDiscards_lo;
-	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
-				  m.ipOutNoRoutes_lo;
-	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
-	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
-	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
-	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
-	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
-	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
-	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
-	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
-	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
-	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
-	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
-			      m.tcpInSegs_lo;
-	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
-			       m.tcpOutSegs_lo;
-	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
-				  m.tcpRetransSeg_lo;
-	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
-			      m.tcpInErrs_lo;
-	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
-	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
+	/* Ensure we provided all values */
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+	BUILD_BUG_ON(NR_COUNTERS > MAX_NR_PROTOCOL_STATS);
+
+	stats->dirname = "iw_stats";
+	stats->name = names;
+
+	stats->value[IPINRECEIVES] = ((u64)m.ipInReceive_hi << 32) +	m.ipInReceive_lo;
+	stats->value[IPINHDRERRORS] = ((u64)m.ipInHdrErrors_hi << 32) + m.ipInHdrErrors_lo;
+	stats->value[IPINADDRERRORS] = ((u64)m.ipInAddrErrors_hi << 32) + m.ipInAddrErrors_lo;
+	stats->value[IPINUNKNOWNPROTOS] = ((u64)m.ipInUnknownProtos_hi << 32) + m.ipInUnknownProtos_lo;
+	stats->value[IPINDISCARDS] = ((u64)m.ipInDiscards_hi << 32) + m.ipInDiscards_lo;
+	stats->value[IPINDELIVERS] = ((u64)m.ipInDelivers_hi << 32) + m.ipInDelivers_lo;
+	stats->value[IPOUTREQUESTS] = ((u64)m.ipOutRequests_hi << 32) + m.ipOutRequests_lo;
+	stats->value[IPOUTDISCARDS] = ((u64)m.ipOutDiscards_hi << 32) + m.ipOutDiscards_lo;
+	stats->value[IPOUTNOROUTES] = ((u64)m.ipOutNoRoutes_hi << 32) + m.ipOutNoRoutes_lo;
+	stats->value[IPREASMTIMEOUT] = 	m.ipReasmTimeout;
+	stats->value[IPREASMREQDS] = m.ipReasmReqds;
+	stats->value[IPREASMOKS] = m.ipReasmOKs;
+	stats->value[IPREASMFAILS] = m.ipReasmFails;
+	stats->value[TCPACTIVEOPENS] =	m.tcpActiveOpens;
+	stats->value[TCPPASSIVEOPENS] =	m.tcpPassiveOpens;
+	stats->value[TCPATTEMPTFAILS] = m.tcpAttemptFails;
+	stats->value[TCPESTABRESETS] = m.tcpEstabResets;
+	stats->value[TCPCURRESTAB] = m.tcpOutRsts;
+	stats->value[TCPINSEGS] = m.tcpCurrEstab;
+	stats->value[TCPOUTSEGS] = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
+	stats->value[TCPRETRANSSEGS] = ((u64)m.tcpOutSegs_hi << 32) + m.tcpOutSegs_lo;
+	stats->value[TCPINERRS] = ((u64)m.tcpRetransSeg_hi << 32) + m.tcpRetransSeg_lo,
+	stats->value[TCPOUTRSTS] = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
+	stats->value[TCPRTOMIN] = m.tcpRtoMin;
+	stats->value[TCPRTOMAX] = m.tcpRtoMax;
+
 	return 0;
 }

Index: linux/drivers/infiniband/hw/cxgb4/provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb4/provider.c
+++ linux/drivers/infiniband/hw/cxgb4/provider.c
@@ -446,18 +446,43 @@ static ssize_t show_board(struct device
 		       c4iw_dev->rdev.lldi.pdev->device);
 }

+static char *names[] = {
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpOutRsts",
+	NULL
+};
+
+enum counters {
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPOUTRSTS,
+	NR_COUNTERS
+};
+
 static int c4iw_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port)
 {
 	struct tp_tcp_stats v4, v6;
 	struct c4iw_dev *c4iw_dev = to_c4iw_dev(ibdev);

+	if (port != 0)
+		return 0;
+
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+	BUILD_BUG_ON(NR_COUNTERS > MAX_NR_PROTOCOL_STATS);
+
 	cxgb4_get_tcp_stats(c4iw_dev->rdev.lldi.pdev, &v4, &v6);
-	memset(stats, 0, sizeof *stats);
-	stats->iw.tcpInSegs = v4.tcp_in_segs + v6.tcp_in_segs;
-	stats->iw.tcpOutSegs = v4.tcp_out_segs + v6.tcp_out_segs;
-	stats->iw.tcpRetransSegs = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
-	stats->iw.tcpOutRsts = v4.tcp_out_rsts + v6.tcp_out_rsts;
+	stats->value[TCPINSEGS] = v4.tcp_in_segs + v6.tcp_in_segs;
+	stats->value[TCPOUTSEGS] = v4.tcp_out_segs + v6.tcp_out_segs;
+	stats->value[TCPRETRANSSEGS] = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
+	stats->value[TCPOUTRSTS] = v4.tcp_out_rsts + v6.tcp_out_rsts;
+
+	stats->name = names;
+	stats->dirname = "iw_stats";

 	return 0;
 }
Index: linux/include/rdma/ib_verbs.h
===================================================================
--- linux.orig/include/rdma/ib_verbs.h
+++ linux/include/rdma/ib_verbs.h
@@ -402,55 +402,13 @@ enum ib_port_speed {
 	IB_SPEED_EDR	= 32
 };

-struct ib_protocol_stats {
-	/* TBD... */
-};
-
-struct iw_protocol_stats {
-	u64	ipInReceives;
-	u64	ipInHdrErrors;
-	u64	ipInTooBigErrors;
-	u64	ipInNoRoutes;
-	u64	ipInAddrErrors;
-	u64	ipInUnknownProtos;
-	u64	ipInTruncatedPkts;
-	u64	ipInDiscards;
-	u64	ipInDelivers;
-	u64	ipOutForwDatagrams;
-	u64	ipOutRequests;
-	u64	ipOutDiscards;
-	u64	ipOutNoRoutes;
-	u64	ipReasmTimeout;
-	u64	ipReasmReqds;
-	u64	ipReasmOKs;
-	u64	ipReasmFails;
-	u64	ipFragOKs;
-	u64	ipFragFails;
-	u64	ipFragCreates;
-	u64	ipInMcastPkts;
-	u64	ipOutMcastPkts;
-	u64	ipInBcastPkts;
-	u64	ipOutBcastPkts;
-
-	u64	tcpRtoAlgorithm;
-	u64	tcpRtoMin;
-	u64	tcpRtoMax;
-	u64	tcpMaxConn;
-	u64	tcpActiveOpens;
-	u64	tcpPassiveOpens;
-	u64	tcpAttemptFails;
-	u64	tcpEstabResets;
-	u64	tcpCurrEstab;
-	u64	tcpInSegs;
-	u64	tcpOutSegs;
-	u64	tcpRetransSegs;
-	u64	tcpInErrs;
-	u64	tcpOutRsts;
-};
-
-union rdma_protocol_stats {
-	struct ib_protocol_stats	ib;
-	struct iw_protocol_stats	iw;
+#define MAX_NR_PROTOCOL_STATS 120
+struct rdma_protocol_stats {
+	char *dirname;		/* Directory name */
+	char **name;		/* NULL terminated list of strings for the names
+				 * of the counters
+				 */
+	u64 value[MAX_NR_PROTOCOL_STATS];
 };

 /* Define bits for the various functionality this port needs to be supported by
@@ -1686,7 +1644,8 @@ struct ib_device {
 	struct iw_cm_verbs	     *iwcm;

 	int		           (*get_protocol_stats)(struct ib_device *device,
-							 union rdma_protocol_stats *stats);
+							 struct rdma_protocol_stats *stats,
+							 u8 port);
 	int		           (*query_device)(struct ib_device *device,
 						   struct ib_device_attr *device_attr,
 						   struct ib_udata *udata);
@@ -1903,6 +1862,7 @@ struct ib_device {
 	u8                           node_type;
 	u8                           phys_port_cnt;
 	struct ib_device_attr        attrs;
+	struct attribute_group	     *ag;

 	/**
 	 * The following mandatory functions are used only at device
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                           ` <alpine.DEB.2.20.1605161238560.10085-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-16 18:01                             ` Doug Ledford
       [not found]                               ` <102cd100-55f7-fa85-cd75-ba0db5b9fa34-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-05-16 18:06                             ` Jason Gunthorpe
  2016-05-16 18:27                             ` Steve Wise
  2 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-16 18:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 2319 bytes --]

On 05/16/2016 01:49 PM, Christoph Lameter wrote:
> 
> Ok patch with the changes requested. Check in core/sysfs.c removed.
> 
> 
> 
> Subject: ib core: Make device counter infrastructure dynamic V3
> 
> V1->V2:
>  - Use enums for counters for cxgb3/cxgb4
>  - Use BUILD_BUG_ON instead of BUG_ON
> 
> V2->v3
>  - Change the way to check for MAX_NR_PROTOCOL_STATS
>  - Fix typo
> 
> The current approach in ib_verbs.h is to define a struct with
> statistics fields. ib_verbs is an abstraction layer for
> device drivers and the protocol_stats should be abstract. However,
> in practice these are actually device specific counters (also used
> in such a way in Mellanox OFED). The device stats are a union
> and in practice evolve to a union of device specific stats.
> 
> ARGH!!!
> 
> This should not be defined in a concrete way in ib_verbs.h since the
> API definition needs to be device neutral.
> 
> What we need is an API that can return an arbitrary number
> of counters that may be general or device specific.
> 
> Here we simply define an array of strings that describe the
> counters and an array of 64 bit counters that are
> displayed through sysfs. This makes ib_verbs API appropriately
> abstract and device drivers do not need to update ib_verbs.h
> (and thus break the API) to add more counters,.
> 
> This patch is groundwork to later define the protocol stats for
> Mellanox devices.
> 
> The strings being used in the device drives should only be present
> if the driver actually supports those counters. If there is
> a standardized name then devices should use the same name
> as in other infiniband device drivers.
> 
> Note on cxgb3/4 portions:
> - I was unable to test because of the lack of hardware
> - The counters that are never incremented were omitted. There may be the
>   impression that less counter are supported. But the other counters
>   have never been supported.
> 
> Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Thanks, this looks good now.  When the other two patches come through
I'll apply the group.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                           ` <alpine.DEB.2.20.1605161238560.10085-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-05-16 18:01                             ` Doug Ledford
@ 2016-05-16 18:06                             ` Jason Gunthorpe
  2016-05-16 18:27                             ` Steve Wise
  2 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2016-05-16 18:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Mon, May 16, 2016 at 12:49:33PM -0500, Christoph Lameter wrote:

> +char *names[] = {

static const char * ?

Isn't Kees trying to shift this sort of stuff to .rodata these days?

> +	"ipInReceives",

[IPINRECEIVES] = "ipInReceives",

Is probably more maintainable

> +	NULL

[NR_COUNTERS] = NULL,

Makes many of the BUILD_BUG_ON's redundant

I think this is a fine idea, but I think we need to document the
common set of counters and what it is they mean, otherwise it will
just be chaos for userspace :(

The MIB based counters and IBTA are already defined fortunately, just
need to have common names.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                           ` <alpine.DEB.2.20.1605161238560.10085-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-05-16 18:01                             ` Doug Ledford
  2016-05-16 18:06                             ` Jason Gunthorpe
@ 2016-05-16 18:27                             ` Steve Wise
  2 siblings, 0 replies; 42+ messages in thread
From: Steve Wise @ 2016-05-16 18:27 UTC (permalink / raw)
  To: 'Christoph Lameter', 'Doug Ledford'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Mark Bloch',
	'Jason Gunthorpe', 'Steve Wise',
	'Majd Dibbiny',
	alonvi-VPRAkNaXOzVWk0Htik3J/w

Looks good to me.

Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                               ` <102cd100-55f7-fa85-cd75-ba0db5b9fa34-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-17 14:19                                 ` Christoph Lameter
       [not found]                                   ` <alpine.DEB.2.20.1605170918080.9956-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-17 14:19 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w


On Mon, 16 May 2016, Doug Ledford wrote:
>
> Thanks, this looks good now.  When the other two patches come through

The patch can stand on its own and there has been the expectation
expressed by Mellanox that they want to see this merged first. Guess this
is to reduce the amount of rewrite they would have to do if things change.
Then also the team from Mellanox can directly merge the driver changes
without my involvement.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                   ` <alpine.DEB.2.20.1605170918080.9956-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-17 16:00                                     ` Doug Ledford
       [not found]                                       ` <3e9a3e19-58cb-c25f-89a1-f0e51df562d8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-17 16:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch, Jason Gunthorpe,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 4292 bytes --]

On 05/17/2016 10:19 AM, Christoph Lameter wrote:
> 
> On Mon, 16 May 2016, Doug Ledford wrote:
>>
>> Thanks, this looks good now.  When the other two patches come through
> 
> The patch can stand on its own and there has been the expectation
> expressed by Mellanox that they want to see this merged first. Guess this
> is to reduce the amount of rewrite they would have to do if things change.
> Then also the team from Mellanox can directly merge the driver changes
> without my involvement.
> 

OK.  There are comments from Jason outstanding, and I found one thing
that I missed in my earlier reviews.  I think we need to refactor how we
pull out the stats, or at least consider doing so.  In particular, look
at how many stats the cxgb3 driver fills in:

+	stats->dirname = "iw_stats";
+	stats->name = names;
+
+	stats->value[IPINRECEIVES] = ((u64)m.ipInReceive_hi << 32) +
m.ipInReceive_lo;
+	stats->value[IPINHDRERRORS] = ((u64)m.ipInHdrErrors_hi << 32) +
m.ipInHdrErrors_lo;
+	stats->value[IPINADDRERRORS] = ((u64)m.ipInAddrErrors_hi << 32) +
m.ipInAddrErrors_lo;
+	stats->value[IPINUNKNOWNPROTOS] = ((u64)m.ipInUnknownProtos_hi << 32)
+ m.ipInUnknownProtos_lo;
+	stats->value[IPINDISCARDS] = ((u64)m.ipInDiscards_hi << 32) +
m.ipInDiscards_lo;
+	stats->value[IPINDELIVERS] = ((u64)m.ipInDelivers_hi << 32) +
m.ipInDelivers_lo;
+	stats->value[IPOUTREQUESTS] = ((u64)m.ipOutRequests_hi << 32) +
m.ipOutRequests_lo;
+	stats->value[IPOUTDISCARDS] = ((u64)m.ipOutDiscards_hi << 32) +
m.ipOutDiscards_lo;
+	stats->value[IPOUTNOROUTES] = ((u64)m.ipOutNoRoutes_hi << 32) +
m.ipOutNoRoutes_lo;
+	stats->value[IPREASMTIMEOUT] = 	m.ipReasmTimeout;
+	stats->value[IPREASMREQDS] = m.ipReasmReqds;
+	stats->value[IPREASMOKS] = m.ipReasmOKs;
+	stats->value[IPREASMFAILS] = m.ipReasmFails;
+	stats->value[TCPACTIVEOPENS] =	m.tcpActiveOpens;
+	stats->value[TCPPASSIVEOPENS] =	m.tcpPassiveOpens;
+	stats->value[TCPATTEMPTFAILS] = m.tcpAttemptFails;
+	stats->value[TCPESTABRESETS] = m.tcpEstabResets;
+	stats->value[TCPCURRESTAB] = m.tcpOutRsts;
+	stats->value[TCPINSEGS] = m.tcpCurrEstab;
+	stats->value[TCPOUTSEGS] = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
+	stats->value[TCPRETRANSSEGS] = ((u64)m.tcpOutSegs_hi << 32) +
m.tcpOutSegs_lo;
+	stats->value[TCPINERRS] = ((u64)m.tcpRetransSeg_hi << 32) +
m.tcpRetransSeg_lo,
+	stats->value[TCPOUTRSTS] = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
+	stats->value[TCPRTOMIN] = m.tcpRtoMin;
+	stats->value[TCPRTOMAX] = m.tcpRtoMax;

That's a lot of copies, and shifts, and everything else.  Then look at
what it does to get them:

 	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);

I didn't dig too deep, but that looks suspiciously like it might be an
actual mailbox command to the card.  That can be rather expensive.

Then look at how we get the stats to print them to user space:

+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
+				   u8 port, char *buf)
+{
+	struct rdma_protocol_stats stats = {0};
+	ssize_t ret;
+
+	ret = dev->get_protocol_stats(dev, &stats, port);
+	if (ret)
+		return ret;
+
+	return sprintf(buf, "%llu\n", stats.value[index]);
+}

In a nutshell, we go through the effort of a suspected mailbox command,
then we fill in all of the stats including all of the copies and shifts
and everything else, then we print out precisely one and only one stat
before we throw the rest of them away.  If someone goes into the stats
directory for a card and does cat * or for i in *; do echo -ne "$i:\t";
cat $i; done, then we will issue 25 mailbox commands, and fill out all
25 stats structs 25 times, just to print out one complete set of stats.
For cxgb4 this isn't so bad, it's only got 4 items.  But the longer the
list gets, the worst this is because it makes our efficiency of
operation O(n^2).  Since we can't break out mailbox commands to only
provide part of the data, I think we need to consider using a cached
struct for each device.  If the cached data is less than a certain age
on subsequent reads, we use the cached data.  If it's too old, we
discard it and get new data.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                       ` <3e9a3e19-58cb-c25f-89a1-f0e51df562d8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-17 16:06                                         ` Steve Wise
  2016-05-17 17:00                                         ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Steve Wise @ 2016-05-17 16:06 UTC (permalink / raw)
  To: 'Doug Ledford', 'Christoph Lameter'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Mark Bloch',
	'Jason Gunthorpe', 'Steve Wise',
	'Majd Dibbiny',
	alonvi-VPRAkNaXOzVWk0Htik3J/w



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Doug Ledford
> Sent: Tuesday, May 17, 2016 11:01 AM
> To: Christoph Lameter
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mark Bloch; Jason Gunthorpe; Steve Wise; Majd
> Dibbiny; alonvi-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
> Subject: Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
> 
> On 05/17/2016 10:19 AM, Christoph Lameter wrote:
> >
> > On Mon, 16 May 2016, Doug Ledford wrote:
> >>
> >> Thanks, this looks good now.  When the other two patches come through
> >
> > The patch can stand on its own and there has been the expectation
> > expressed by Mellanox that they want to see this merged first. Guess this
> > is to reduce the amount of rewrite they would have to do if things change.
> > Then also the team from Mellanox can directly merge the driver changes
> > without my involvement.
> >
> 
> OK.  There are comments from Jason outstanding, and I found one thing
> that I missed in my earlier reviews.  I think we need to refactor how we
> pull out the stats, or at least consider doing so.  In particular, look
> at how many stats the cxgb3 driver fills in:
> 
> +	stats->dirname = "iw_stats";
> +	stats->name = names;
> +
> +	stats->value[IPINRECEIVES] = ((u64)m.ipInReceive_hi << 32) +
> m.ipInReceive_lo;
> +	stats->value[IPINHDRERRORS] = ((u64)m.ipInHdrErrors_hi << 32) +
> m.ipInHdrErrors_lo;
> +	stats->value[IPINADDRERRORS] = ((u64)m.ipInAddrErrors_hi << 32) +
> m.ipInAddrErrors_lo;
> +	stats->value[IPINUNKNOWNPROTOS] = ((u64)m.ipInUnknownProtos_hi <<
> 32)
> + m.ipInUnknownProtos_lo;
> +	stats->value[IPINDISCARDS] = ((u64)m.ipInDiscards_hi << 32) +
> m.ipInDiscards_lo;
> +	stats->value[IPINDELIVERS] = ((u64)m.ipInDelivers_hi << 32) +
> m.ipInDelivers_lo;
> +	stats->value[IPOUTREQUESTS] = ((u64)m.ipOutRequests_hi << 32) +
> m.ipOutRequests_lo;
> +	stats->value[IPOUTDISCARDS] = ((u64)m.ipOutDiscards_hi << 32) +
> m.ipOutDiscards_lo;
> +	stats->value[IPOUTNOROUTES] = ((u64)m.ipOutNoRoutes_hi << 32) +
> m.ipOutNoRoutes_lo;
> +	stats->value[IPREASMTIMEOUT] = 	m.ipReasmTimeout;
> +	stats->value[IPREASMREQDS] = m.ipReasmReqds;
> +	stats->value[IPREASMOKS] = m.ipReasmOKs;
> +	stats->value[IPREASMFAILS] = m.ipReasmFails;
> +	stats->value[TCPACTIVEOPENS] =	m.tcpActiveOpens;
> +	stats->value[TCPPASSIVEOPENS] =	m.tcpPassiveOpens;
> +	stats->value[TCPATTEMPTFAILS] = m.tcpAttemptFails;
> +	stats->value[TCPESTABRESETS] = m.tcpEstabResets;
> +	stats->value[TCPCURRESTAB] = m.tcpOutRsts;
> +	stats->value[TCPINSEGS] = m.tcpCurrEstab;
> +	stats->value[TCPOUTSEGS] = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
> +	stats->value[TCPRETRANSSEGS] = ((u64)m.tcpOutSegs_hi << 32) +
> m.tcpOutSegs_lo;
> +	stats->value[TCPINERRS] = ((u64)m.tcpRetransSeg_hi << 32) +
> m.tcpRetransSeg_lo,
> +	stats->value[TCPOUTRSTS] = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
> +	stats->value[TCPRTOMIN] = m.tcpRtoMin;
> +	stats->value[TCPRTOMAX] = m.tcpRtoMax;
> 
> That's a lot of copies, and shifts, and everything else.  Then look at
> what it does to get them:
> 
>  	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
> 
> I didn't dig too deep, but that looks suspiciously like it might be an
> actual mailbox command to the card.  That can be rather expensive.
>

It is not a mailbox command, but indirect register reads (ie a write_reg +
read_reg operation).  See
cxgb_rdma_ctl(RDMA_GIT_MIB)->t3_tp_get_mib_stats()->t3_read_indirect().
 
> Then look at how we get the stats to print them to user space:
> 
> +static ssize_t show_protocol_stats(struct ib_device *dev, int index,
> +				   u8 port, char *buf)
> +{
> +	struct rdma_protocol_stats stats = {0};
> +	ssize_t ret;
> +
> +	ret = dev->get_protocol_stats(dev, &stats, port);
> +	if (ret)
> +		return ret;
> +
> +	return sprintf(buf, "%llu\n", stats.value[index]);
> +}
> 
> In a nutshell, we go through the effort of a suspected mailbox command,
> then we fill in all of the stats including all of the copies and shifts
> and everything else, then we print out precisely one and only one stat
> before we throw the rest of them away.  If someone goes into the stats
> directory for a card and does cat * or for i in *; do echo -ne "$i:\t";
> cat $i; done, then we will issue 25 mailbox commands, and fill out all
> 25 stats structs 25 times, just to print out one complete set of stats.
> For cxgb4 this isn't so bad, it's only got 4 items.  But the longer the
> list gets, the worst this is because it makes our efficiency of
> operation O(n^2).  Since we can't break out mailbox commands to only
> provide part of the data, I think we need to consider using a cached
> struct for each device.  If the cached data is less than a certain age
> on subsequent reads, we use the cached data.  If it's too old, we
> discard it and get new data.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                       ` <3e9a3e19-58cb-c25f-89a1-f0e51df562d8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-05-17 16:06                                         ` Steve Wise
@ 2016-05-17 17:00                                         ` Jason Gunthorpe
       [not found]                                           ` <20160517170027.GC19976-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2016-05-17 17:00 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Tue, May 17, 2016 at 12:00:39PM -0400, Doug Ledford wrote:

> operation O(n^2).  Since we can't break out mailbox commands to only
> provide part of the data, I think we need to consider using a cached
> struct for each device.  If the cached data is less than a certain age
> on subsequent reads, we use the cached data.  If it's too old, we
> discard it and get new data.

I noticed this too, but for sysfs reading I just felt it doesn't
matter.

Ultimately a followup patch should export the entire stats block
through netlink as one operation and nothing should use sysfs except
debugging. Keeping the driver interface simple seems valuable.

Caching is going to detrimental to apps that sync stats with external
time. (which is almost every real-world app)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                           ` <20160517170027.GC19976-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-05-17 17:34                                             ` Doug Ledford
       [not found]                                               ` <bc83cc75-aa2e-6fc6-e1c6-a0190b972013-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-17 17:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 2281 bytes --]

On 05/17/2016 01:00 PM, Jason Gunthorpe wrote:
> On Tue, May 17, 2016 at 12:00:39PM -0400, Doug Ledford wrote:
> 
>> operation O(n^2).  Since we can't break out mailbox commands to only
>> provide part of the data, I think we need to consider using a cached
>> struct for each device.  If the cached data is less than a certain age
>> on subsequent reads, we use the cached data.  If it's too old, we
>> discard it and get new data.
> 
> I noticed this too, but for sysfs reading I just felt it doesn't
> matter.

I prefer not to have O(n^2), even for sysfs.  You say it doesn't matter,
but if someone creates a script to check all of their stats via sysfs
once per second, and we start talking about mlx4 or mlx5 where they are
going to export a lot more variables, it eventually gets bad.

> Ultimately a followup patch should export the entire stats block
> through netlink as one operation

That's perfectly fine.

> and nothing should use sysfs except
> debugging.

Nobody manually checking on numbers themselves will use netlink.  And if
the stats are there, people will check them.  You can't depend on this
being used for debug access only.  If I recall correctly, ibstatus uses
all sysfs entries, and people would easily think that using it uses the
"preferred" method.  So, like I said, if it's there, it *will* get used,
and not just for debug,

> Keeping the driver interface simple seems valuable.

If you cache it in the core, the driver interface is simple.  And given
that the core is where the stats block is allocated and deallocated,
that's where the caching would need to be.  And it's not really that
hard.  I'll whip something up here in a bit...

> Caching is going to detrimental to apps that sync stats with external
> time. (which is almost every real-world app)

That's problematic with or without caching as the stats don't have a
timestamp, so scheduling delays could easily make the stats that you get
have a significant variance in time between the time they were retrieved
and when you actually processed them.  Are you suggesting we should go
ahead and add a timestamp to the stats output?

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                               ` <bc83cc75-aa2e-6fc6-e1c6-a0190b972013-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-18 17:25                                                 ` Jason Gunthorpe
       [not found]                                                   ` <20160518172542.GA15516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2016-05-18 17:25 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Tue, May 17, 2016 at 01:34:34PM -0400, Doug Ledford wrote:
> > I noticed this too, but for sysfs reading I just felt it doesn't
> > matter.
> 
> I prefer not to have O(n^2), even for sysfs.  You say it doesn't matter,
> but if someone creates a script to check all of their stats via
> sysfs

Well, it matters if it actually takes a long time, but I don't really
care of it is O(n^2) and still only adding a few additional 100us's..

> > and nothing should use sysfs except
> > debugging.
> 
> Nobody manually checking on numbers themselves will use netlink.  And if
> the stats are there, people will check them.  You can't depend on this
> being used for debug access only.  If I recall correctly, ibstatus uses
> all sysfs entries, and people would easily think that using it uses the
> "preferred" method.  So, like I said, if it's there, it *will* get used,
> and not just for debug,

Why would people manually use sysfs? A netlink interface would be
accompanied by a tool. I don't schlep around in sysfs for netdev, I
use 'ip -s link show'

> > Caching is going to detrimental to apps that sync stats with external
> > time. (which is almost every real-world app)
> 
> That's problematic with or without caching as the stats don't have a
> timestamp, so scheduling delays could easily make the stats that you
> get

One can avoid scheduling delays with the right SCHED_ policy, how do
you avoid timing jitter from caching?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                                   ` <20160518172542.GA15516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-05-18 18:11                                                     ` Doug Ledford
       [not found]                                                       ` <12e991bc-aa9b-a8b0-3cd4-b56d58a44d60-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2016-05-18 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 2912 bytes --]

On 05/18/2016 01:25 PM, Jason Gunthorpe wrote:
> On Tue, May 17, 2016 at 01:34:34PM -0400, Doug Ledford wrote:
>>> I noticed this too, but for sysfs reading I just felt it doesn't
>>> matter.
>>
>> I prefer not to have O(n^2), even for sysfs.  You say it doesn't matter,
>> but if someone creates a script to check all of their stats via
>> sysfs
> 
> Well, it matters if it actually takes a long time, but I don't really
> care of it is O(n^2) and still only adding a few additional 100us's..

If I understood Steve's response to this thread right, each access to
these stats on cxgb3 still involves a PCI register read/write.  Is that
right Steve?  Not a mailbox command but config space access?

Anyway, it's easy to imagine that as this grows, it might include more
and more stuff that hits the card.  I'm loathe to doing something as
inefficient as "read all stats, print one stat" on every access.

>>> and nothing should use sysfs except
>>> debugging.
>>
>> Nobody manually checking on numbers themselves will use netlink.  And if
>> the stats are there, people will check them.  You can't depend on this
>> being used for debug access only.  If I recall correctly, ibstatus uses
>> all sysfs entries, and people would easily think that using it uses the
>> "preferred" method.  So, like I said, if it's there, it *will* get used,
>> and not just for debug,
> 
> Why would people manually use sysfs? A netlink interface would be
> accompanied by a tool. I don't schlep around in sysfs for netdev, I
> use 'ip -s link show'

And how are you going to get RNR Retries from an IB device with that?
It might get added in the future, but it's currently not there (and
there's a decent argument against adding it to ip since that's actually
looking at the ipoib device and not the underlying ib device).  For some
stats, you may have no choice but to schlep around in sysfs.  If someone
does that, it needs to not suck.

>>> Caching is going to detrimental to apps that sync stats with external
>>> time. (which is almost every real-world app)
>>
>> That's problematic with or without caching as the stats don't have a
>> timestamp, so scheduling delays could easily make the stats that you
>> get
> 
> One can avoid scheduling delays with the right SCHED_ policy. how do
> you avoid timing jitter from caching?

I don't say I do.  I was pointing out that this isn't the distinction
you make it out to be.  However, as I'm working on this, the stats are
aged, and when they pass a certain age, they get recollected.  To make
that happen, I save the age in jiffies when they are taken.  That could
be used to provide their timestamp and give meaningful time context to
the counter.  They simply don't when you are reading the files in sysfs.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                                       ` <12e991bc-aa9b-a8b0-3cd4-b56d58a44d60-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-18 18:17                                                         ` Steve Wise
  2016-05-18 19:01                                                         ` Christoph Lameter
  2016-05-18 21:11                                                         ` Jason Gunthorpe
  2 siblings, 0 replies; 42+ messages in thread
From: Steve Wise @ 2016-05-18 18:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On 5/18/2016 1:11 PM, Doug Ledford wrote:
> On 05/18/2016 01:25 PM, Jason Gunthorpe wrote:
>> On Tue, May 17, 2016 at 01:34:34PM -0400, Doug Ledford wrote:
>>>> I noticed this too, but for sysfs reading I just felt it doesn't
>>>> matter.
>>> I prefer not to have O(n^2), even for sysfs.  You say it doesn't matter,
>>> but if someone creates a script to check all of their stats via
>>> sysfs
>> Well, it matters if it actually takes a long time, but I don't really
>> care of it is O(n^2) and still only adding a few additional 100us's..
> If I understood Steve's response to this thread right, each access to
> these stats on cxgb3 still involves a PCI register read/write.  Is that
> right Steve?  Not a mailbox command but config space access?

Correct.

> Anyway, it's easy to imagine that as this grows, it might include more
> and more stuff that hits the card.  I'm loathe to doing something as
> inefficient as "read all stats, print one stat" on every access.
>
>>>> and nothing should use sysfs except
>>>> debugging.
>>> Nobody manually checking on numbers themselves will use netlink.  And if
>>> the stats are there, people will check them.  You can't depend on this
>>> being used for debug access only.  If I recall correctly, ibstatus uses
>>> all sysfs entries, and people would easily think that using it uses the
>>> "preferred" method.  So, like I said, if it's there, it *will* get used,
>>> and not just for debug,
>> Why would people manually use sysfs? A netlink interface would be
>> accompanied by a tool. I don't schlep around in sysfs for netdev, I
>> use 'ip -s link show'
> And how are you going to get RNR Retries from an IB device with that?
> It might get added in the future, but it's currently not there (and
> there's a decent argument against adding it to ip since that's actually
> looking at the ipoib device and not the underlying ib device).  For some
> stats, you may have no choice but to schlep around in sysfs.  If someone
> does that, it needs to not suck.
>
>>>> Caching is going to detrimental to apps that sync stats with external
>>>> time. (which is almost every real-world app)
>>> That's problematic with or without caching as the stats don't have a
>>> timestamp, so scheduling delays could easily make the stats that you
>>> get
>> One can avoid scheduling delays with the right SCHED_ policy. how do
>> you avoid timing jitter from caching?
> I don't say I do.  I was pointing out that this isn't the distinction
> you make it out to be.  However, as I'm working on this, the stats are
> aged, and when they pass a certain age, they get recollected.  To make
> that happen, I save the age in jiffies when they are taken.  That could
> be used to provide their timestamp and give meaningful time context to
> the counter.  They simply don't when you are reading the files in sysfs.
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                                       ` <12e991bc-aa9b-a8b0-3cd4-b56d58a44d60-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-05-18 18:17                                                         ` Steve Wise
@ 2016-05-18 19:01                                                         ` Christoph Lameter
       [not found]                                                           ` <alpine.DEB.2.20.1605181401030.29313-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-05-18 21:11                                                         ` Jason Gunthorpe
  2 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2016-05-18 19:01 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

Attached a revision of the patch that only fetches one value. Simplifies
things too.


Subject: ib core: Make device counter infrastructure dynamic v3

V2->v3:
- Fetch individual countes from the device
- Get rid of the max counter limit

V1-V2:
 - Use enums for counters for cxgb3/cxgb4
 - Use BUILD_BUG_ON instead of BUG_ON

The current approach in ib_verbs.h is to define a struct with
statistics fields. ib_verbs is an abstraction layer for
device drivers and the protocol_stats should be abstract. However,
in practice these are actually device specific counters (also used
in such a way in Mellanox OFED). The device stats are a union
and in practice evolve to a union of device specific stats.

ARGH!!!

This should not be defined in a concrete way in ib_verbs.h since the
API definition needs to be device neutral.

What we need is an API that can return an arbitrary number
of counters that may be general or device specific.

Here we simply define an array of strings that describe the
counters and an array of 64 bit counters that are
displayed through sysfs. This makes ib_verbs API appropriately
abstract and device drivers do not need to update ib_verbs.h
(and thus break the API) to add more counters,.

This patch is groundwork to later define the protocol stats for
Mellanox devices.

The strings being used in the device drives should only be present
if the driver actually supports those counters. If there is
a standardized name then devices should use the same name
as in other infiniband device drivers.

Note on cxgb3/4 portions:
- I was unable to test because of the lack of hardware
- The counters that are never incremented were omitted. There may be the
  impression that less counter are supported. But the other counters
  have never been supported.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Index: linux/drivers/infiniband/core/sysfs.c
===================================================================
--- linux.orig/drivers/infiniband/core/sysfs.c
+++ linux/drivers/infiniband/core/sysfs.c
@@ -58,6 +58,7 @@ struct ib_port {
 	struct attribute_group pkey_group;
 	u8                     port_num;
 	struct attribute_group *pma_table;
+	struct attribute_group *ag;
 };

 struct port_attribute {
@@ -80,6 +81,16 @@ struct port_table_attribute {
 	__be16			attr_id;
 };

+struct ib_port_stats {
+	struct port_attribute	attr;
+	u32			index;
+};
+
+struct ib_device_stats {
+	struct device_attribute	attr;
+	u32			index;
+};
+
 static ssize_t port_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
@@ -733,6 +744,145 @@ static struct attribute_group *get_count
 	return &pma_group;
 }

+static ssize_t show_protocol_stats(struct ib_device *dev, int index,
+				   u8 port, char *buf)
+{
+	ssize_t ret;
+	struct rdma_protocol_stats s;
+
+	ret = dev->get_protocol_stats(dev, &s, port, index);
+	if (ret)
+		return ret;
+
+	return sprintf(buf, "%llu\n", s.value);
+}
+
+static ssize_t show_device_protocol_stats(struct device *device,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+	struct ib_device_stats *ds;
+
+	ds = container_of(attr, struct ib_device_stats, attr);
+	return show_protocol_stats(dev, ds->index, 0, buf);
+}
+
+static ssize_t show_port_protocol_stats(struct ib_port *p,
+					struct port_attribute *attr,
+					char *buf)
+{
+	struct ib_port_stats *ps;
+
+	ps = container_of(attr, struct ib_port_stats, attr);
+	return	show_protocol_stats(p->ibdev, ps->index, p->port_num, buf);
+}
+
+static void free_protocol_stats(struct kobject *kobj,
+				struct attribute_group *ag)
+{
+	struct attribute **da;
+
+	sysfs_remove_group(kobj, ag);
+
+	for (da = ag->attrs; *da; da++)
+		kfree(*da);
+
+	kfree(ag->attrs);
+	kfree(ag);
+}
+
+static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
+{
+	struct ib_port_stats *ps;
+
+	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
+	if (!ps)
+		return NULL;
+
+	ps->attr.attr.name = name;
+	ps->attr.attr.mode = S_IRUGO;
+	ps->attr.show = show_port_protocol_stats;
+	ps->attr.store = NULL;
+	ps->index = index;
+
+	return &ps->attr.attr;
+}
+
+static struct attribute *alloc_stats_device(u32 index, u8 port, char *name)
+{
+	struct ib_device_stats *da;
+
+	da = kmalloc(sizeof(*da), GFP_KERNEL);
+	if (!da)
+		return NULL;
+
+	da->attr.attr.name = name;
+	da->attr.attr.mode = S_IRUGO;
+	da->attr.show = show_device_protocol_stats;
+	da->attr.store = NULL;
+	da->index = index;
+
+	return &da->attr.attr;
+}
+
+static struct attribute *alloc_stats_attribute(u32 index, u8 port, char *name)
+{
+	return (port) ? alloc_stats_port(index, port, name) :
+		alloc_stats_device(index, port, name);
+}
+
+static struct attribute_group *create_protocol_stats(struct ib_device *device,
+						     struct kobject *kobj,
+						     u8 port) {
+	struct attribute_group *ag;
+	struct rdma_protocol_stats stats = {0};
+	u32 counters;
+	u32 i;
+	int ret;
+
+	ret = device->get_protocol_stats(device, &stats, port, 0);
+
+	if (ret || !stats.name)
+		return NULL;
+
+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	if (!ag)
+		return NULL;
+
+	ag->name = stats.dirname;
+	ag->attrs = kcalloc((counters + 1), sizeof(struct attribute *),
+			    GFP_KERNEL);
+	if (!ag->attrs)
+		goto nomem;
+
+	for (i = 0; i < counters; i++) {
+		struct attribute *attr;
+
+		attr = alloc_stats_attribute(i, port, stats.name[i]);
+		if (!attr)
+			goto nomem2;
+
+		ag->attrs[i] = attr;
+	}
+	ag->attrs[counters] = NULL;
+
+	ret = sysfs_create_group(kobj, ag);
+
+	if (ret)
+		goto nomem2;
+
+	return ag;
+nomem2:
+	for (i = 0; i < counters; i++)
+		kfree(ag->attrs[i]);
+
+	kfree(ag->attrs);
+nomem:
+	kfree(ag);
+	return NULL;
+}
+
 static int add_port(struct ib_device *device, int port_num,
 		    int (*port_callback)(struct ib_device *,
 					 u8, struct kobject *))
@@ -835,6 +985,12 @@ static int add_port(struct ib_device *de
 			goto err_remove_pkey;
 	}

+	/* If port == 0, it means we have only one port, so the device should
+	 * hold the protocol stats
+	 */
+	if (device->get_protocol_stats && port_num)
+		p->ag = create_protocol_stats(device, &p->kobj, port_num);
+
 	list_add_tail(&p->kobj.entry, &device->port_list);

 	kobject_uevent(&p->kobj, KOBJ_ADD);
@@ -972,120 +1128,6 @@ static struct device_attribute *ib_class
 	&dev_attr_node_desc
 };

-/* Show a given an attribute in the statistics group */
-static ssize_t show_protocol_stat(const struct device *device,
-			    struct device_attribute *attr, char *buf,
-			    unsigned offset)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-	union rdma_protocol_stats stats;
-	ssize_t ret;
-
-	ret = dev->get_protocol_stats(dev, &stats);
-	if (ret)
-		return ret;
-
-	return sprintf(buf, "%llu\n",
-		       (unsigned long long) ((u64 *) &stats)[offset]);
-}
-
-/* generate a read-only iwarp statistics attribute */
-#define IW_STATS_ENTRY(name)						\
-static ssize_t show_##name(struct device *device,			\
-			   struct device_attribute *attr, char *buf)	\
-{									\
-	return show_protocol_stat(device, attr, buf,			\
-				  offsetof(struct iw_protocol_stats, name) / \
-				  sizeof (u64));			\
-}									\
-static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
-
-IW_STATS_ENTRY(ipInReceives);
-IW_STATS_ENTRY(ipInHdrErrors);
-IW_STATS_ENTRY(ipInTooBigErrors);
-IW_STATS_ENTRY(ipInNoRoutes);
-IW_STATS_ENTRY(ipInAddrErrors);
-IW_STATS_ENTRY(ipInUnknownProtos);
-IW_STATS_ENTRY(ipInTruncatedPkts);
-IW_STATS_ENTRY(ipInDiscards);
-IW_STATS_ENTRY(ipInDelivers);
-IW_STATS_ENTRY(ipOutForwDatagrams);
-IW_STATS_ENTRY(ipOutRequests);
-IW_STATS_ENTRY(ipOutDiscards);
-IW_STATS_ENTRY(ipOutNoRoutes);
-IW_STATS_ENTRY(ipReasmTimeout);
-IW_STATS_ENTRY(ipReasmReqds);
-IW_STATS_ENTRY(ipReasmOKs);
-IW_STATS_ENTRY(ipReasmFails);
-IW_STATS_ENTRY(ipFragOKs);
-IW_STATS_ENTRY(ipFragFails);
-IW_STATS_ENTRY(ipFragCreates);
-IW_STATS_ENTRY(ipInMcastPkts);
-IW_STATS_ENTRY(ipOutMcastPkts);
-IW_STATS_ENTRY(ipInBcastPkts);
-IW_STATS_ENTRY(ipOutBcastPkts);
-IW_STATS_ENTRY(tcpRtoAlgorithm);
-IW_STATS_ENTRY(tcpRtoMin);
-IW_STATS_ENTRY(tcpRtoMax);
-IW_STATS_ENTRY(tcpMaxConn);
-IW_STATS_ENTRY(tcpActiveOpens);
-IW_STATS_ENTRY(tcpPassiveOpens);
-IW_STATS_ENTRY(tcpAttemptFails);
-IW_STATS_ENTRY(tcpEstabResets);
-IW_STATS_ENTRY(tcpCurrEstab);
-IW_STATS_ENTRY(tcpInSegs);
-IW_STATS_ENTRY(tcpOutSegs);
-IW_STATS_ENTRY(tcpRetransSegs);
-IW_STATS_ENTRY(tcpInErrs);
-IW_STATS_ENTRY(tcpOutRsts);
-
-static struct attribute *iw_proto_stats_attrs[] = {
-	&dev_attr_ipInReceives.attr,
-	&dev_attr_ipInHdrErrors.attr,
-	&dev_attr_ipInTooBigErrors.attr,
-	&dev_attr_ipInNoRoutes.attr,
-	&dev_attr_ipInAddrErrors.attr,
-	&dev_attr_ipInUnknownProtos.attr,
-	&dev_attr_ipInTruncatedPkts.attr,
-	&dev_attr_ipInDiscards.attr,
-	&dev_attr_ipInDelivers.attr,
-	&dev_attr_ipOutForwDatagrams.attr,
-	&dev_attr_ipOutRequests.attr,
-	&dev_attr_ipOutDiscards.attr,
-	&dev_attr_ipOutNoRoutes.attr,
-	&dev_attr_ipReasmTimeout.attr,
-	&dev_attr_ipReasmReqds.attr,
-	&dev_attr_ipReasmOKs.attr,
-	&dev_attr_ipReasmFails.attr,
-	&dev_attr_ipFragOKs.attr,
-	&dev_attr_ipFragFails.attr,
-	&dev_attr_ipFragCreates.attr,
-	&dev_attr_ipInMcastPkts.attr,
-	&dev_attr_ipOutMcastPkts.attr,
-	&dev_attr_ipInBcastPkts.attr,
-	&dev_attr_ipOutBcastPkts.attr,
-	&dev_attr_tcpRtoAlgorithm.attr,
-	&dev_attr_tcpRtoMin.attr,
-	&dev_attr_tcpRtoMax.attr,
-	&dev_attr_tcpMaxConn.attr,
-	&dev_attr_tcpActiveOpens.attr,
-	&dev_attr_tcpPassiveOpens.attr,
-	&dev_attr_tcpAttemptFails.attr,
-	&dev_attr_tcpEstabResets.attr,
-	&dev_attr_tcpCurrEstab.attr,
-	&dev_attr_tcpInSegs.attr,
-	&dev_attr_tcpOutSegs.attr,
-	&dev_attr_tcpRetransSegs.attr,
-	&dev_attr_tcpInErrs.attr,
-	&dev_attr_tcpOutRsts.attr,
-	NULL
-};
-
-static struct attribute_group iw_stats_group = {
-	.name	= "proto_stats",
-	.attrs	= iw_proto_stats_attrs,
-};
-
 static void free_port_list_attributes(struct ib_device *device)
 {
 	struct kobject *p, *t;
@@ -1093,6 +1135,8 @@ static void free_port_list_attributes(st
 	list_for_each_entry_safe(p, t, &device->port_list, entry) {
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 		list_del(&p->entry);
+		if (port->ag)
+			free_protocol_stats(&port->kobj, port->ag);
 		sysfs_remove_group(p, port->pma_table);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
@@ -1149,12 +1193,14 @@ int ib_device_register_sysfs(struct ib_d
 		}
 	}

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats) {
-		ret = sysfs_create_group(&class_dev->kobj, &iw_stats_group);
-		if (ret)
-			goto err_put;
-	}
+	if (!device->get_protocol_stats)
+		return 0;
+
+	device->ag = create_protocol_stats(device, &class_dev->kobj, 0);

+	/* If create_protocol_stats returns NULL it might not be a failure, so
+	 * do nothing
+	 */
 	return 0;

 err_put:
@@ -1169,15 +1215,13 @@ err:

 void ib_device_unregister_sysfs(struct ib_device *device)
 {
-	/* Hold kobject until ib_dealloc_device() */
-	struct kobject *kobj_dev = kobject_get(&device->dev.kobj);
 	int i;

-	if (device->node_type == RDMA_NODE_RNIC && device->get_protocol_stats)
-		sysfs_remove_group(kobj_dev, &iw_stats_group);
-
 	free_port_list_attributes(device);

+	if (device->ag)
+		free_protocol_stats(&device->dev.kobj, device->ag);
+
 	for (i = 0; i < ARRAY_SIZE(ib_class_attributes); ++i)
 		device_remove_file(&device->dev, ib_class_attributes[i]);

Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1219,58 +1219,168 @@ static ssize_t show_board(struct device
 		       iwch_dev->rdev.rnic_info.pdev->device);
 }

+char *names[] = {
+	"ipInReceives",
+	"ipInHdrErrors",
+	"ipInAddrErrors",
+	"ipInUnknownProtos",
+	"ipInDiscards",
+	"ipInDelivers",
+	"ipOutRequests",
+	"ipOutDiscards",
+	"ipOutNoRoutes",
+	"ipReasmTimeout",
+	"ipReasmReqds",
+	"ipReasmOKs",
+	"ipReasmFails",
+	"tcpActiveOpens",
+	"tcpPassiveOpens",
+	"tcpAttemptFails",
+	"tcpEstabResets",
+	"tcpCurrEstab",
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpInErrs",
+	"tcpOutRsts",
+	"tcpRtoMin",
+	"tcpRtoMax",
+	NULL
+};
+
+enum counters {
+	IPINRECEIVES,
+	IPINHDRERRORS,
+	IPINADDRERRORS,
+	IPINUNKNOWNPROTOS,
+	IPINDISCARDS,
+	IPINDELIVERS,
+	IPOUTREQUESTS,
+	IPOUTDISCARDS,
+	IPOUTNOROUTES,
+	IPREASMTIMEOUT,
+	IPREASMREQDS,
+	IPREASMOKS,
+	IPREASMFAILS,
+	TCPACTIVEOPENS,
+	TCPPASSIVEOPENS,
+	TCPATTEMPTFAILS,
+	TCPESTABRESETS,
+	TCPCURRESTAB,
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPINERRS,
+	TCPOUTRSTS,
+	TCPRTOMIN,
+	TCPRTOMAX,
+	NR_COUNTERS
+};
+
 static int iwch_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port, int index)
 {
 	struct iwch_dev *dev;
 	struct tp_mib_stats m;
 	int ret;

+	if (port != 0)
+		return 0;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 	dev = to_iwch_dev(ibdev);
 	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
 	if (ret)
 		return -ENOSYS;

-	memset(stats, 0, sizeof *stats);
-	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
-				m.ipInReceive_lo;
-	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
-				  m.ipInHdrErrors_lo;
-	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
-				   m.ipInAddrErrors_lo;
-	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
-				      m.ipInUnknownProtos_lo;
-	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
-				 m.ipInDiscards_lo;
-	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
-				 m.ipInDelivers_lo;
-	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
-				  m.ipOutRequests_lo;
-	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
-				  m.ipOutDiscards_lo;
-	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
-				  m.ipOutNoRoutes_lo;
-	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
-	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
-	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
-	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
-	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
-	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
-	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
-	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
-	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
-	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
-	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
-			      m.tcpInSegs_lo;
-	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
-			       m.tcpOutSegs_lo;
-	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
-				  m.tcpRetransSeg_lo;
-	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
-			      m.tcpInErrs_lo;
-	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
-	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
+	/* Ensure we provided all values */
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+
+	stats->dirname = "iw_stats";
+	stats->name = names;
+
+	switch (index) {
+		case IPINRECEIVES:
+			stats->value = ((u64)m.ipInReceive_hi << 32) +	m.ipInReceive_lo;
+			break;
+		case IPINHDRERRORS:
+			stats->value = ((u64)m.ipInHdrErrors_hi << 32) + m.ipInHdrErrors_lo;
+			break;
+		case IPINADDRERRORS:
+			stats->value = ((u64)m.ipInAddrErrors_hi << 32) + m.ipInAddrErrors_lo;
+			break;
+		case IPINUNKNOWNPROTOS:
+			stats->value = ((u64)m.ipInUnknownProtos_hi << 32) + m.ipInUnknownProtos_lo;
+			break;
+		case IPINDISCARDS:
+			stats->value = ((u64)m.ipInDiscards_hi << 32) + m.ipInDiscards_lo;
+			break;
+		case IPINDELIVERS:
+			stats->value = ((u64)m.ipInDelivers_hi << 32) + m.ipInDelivers_lo;
+			break;
+		case IPOUTREQUESTS:
+			stats->value = ((u64)m.ipOutRequests_hi << 32) + m.ipOutRequests_lo;
+			break;
+		case IPOUTDISCARDS:
+			stats->value = ((u64)m.ipOutDiscards_hi << 32) + m.ipOutDiscards_lo;
+			break;
+		case IPOUTNOROUTES:
+			stats->value = ((u64)m.ipOutNoRoutes_hi << 32) + m.ipOutNoRoutes_lo;
+			break;
+		case IPREASMTIMEOUT:
+			stats->value = m.ipReasmTimeout;
+			break;
+		case IPREASMREQDS:
+			stats->value = m.ipReasmReqds;
+			break;
+		case IPREASMOKS:
+			stats->value = m.ipReasmOKs;
+			break;
+		case IPREASMFAILS:
+			stats->value = m.ipReasmFails;
+			break;
+		case TCPACTIVEOPENS:
+			stats->value = m.tcpActiveOpens;
+			break;
+		case TCPPASSIVEOPENS:
+			stats->value = m.tcpPassiveOpens;
+			break;
+		case TCPATTEMPTFAILS:
+			stats->value = m.tcpAttemptFails;
+			break;
+		case TCPESTABRESETS:
+			stats->value = m.tcpEstabResets;
+			break;
+		case TCPCURRESTAB:
+			stats->value = m.tcpOutRsts;
+			break;
+		case TCPINSEGS:
+			stats->value = m.tcpCurrEstab;
+			break;
+		case TCPOUTSEGS:
+			stats->value = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
+			break;
+		case TCPRETRANSSEGS:
+			stats->value = ((u64)m.tcpOutSegs_hi << 32) + m.tcpOutSegs_lo;
+			break;
+		case TCPINERRS:
+			stats->value= ((u64)m.tcpRetransSeg_hi << 32) + m.tcpRetransSeg_lo;
+			break;
+		case TCPOUTRSTS:
+			stats->value = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
+			break;
+		case TCPRTOMIN:
+			stats->value= m.tcpRtoMin;
+			break;
+		case TCPRTOMAX:
+			stats->value= m.tcpRtoMax;
+			break;
+		default:
+			stats->value = -1;
+			break;
+	}
+
 	return 0;
 }

Index: linux/drivers/infiniband/hw/cxgb4/provider.c
===================================================================
--- linux.orig/drivers/infiniband/hw/cxgb4/provider.c
+++ linux/drivers/infiniband/hw/cxgb4/provider.c
@@ -446,18 +446,54 @@ static ssize_t show_board(struct device
 		       c4iw_dev->rdev.lldi.pdev->device);
 }

+static char *names[] = {
+	"tcpInSegs",
+	"tcpOutSegs",
+	"tcpRetransSegs",
+	"tcpOutRsts",
+	NULL
+};
+
+enum counters {
+	TCPINSEGS,
+	TCPOUTSEGS,
+	TCPRETRANSSEGS,
+	TCPOUTRSTS,
+	NR_COUNTERS
+};
+
 static int c4iw_get_mib(struct ib_device *ibdev,
-			union rdma_protocol_stats *stats)
+			struct rdma_protocol_stats *stats,
+			u8 port, int index)
 {
 	struct tp_tcp_stats v4, v6;
 	struct c4iw_dev *c4iw_dev = to_c4iw_dev(ibdev);

+	if (port != 0)
+		return 0;
+
+	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
+
 	cxgb4_get_tcp_stats(c4iw_dev->rdev.lldi.pdev, &v4, &v6);
-	memset(stats, 0, sizeof *stats);
-	stats->iw.tcpInSegs = v4.tcp_in_segs + v6.tcp_in_segs;
-	stats->iw.tcpOutSegs = v4.tcp_out_segs + v6.tcp_out_segs;
-	stats->iw.tcpRetransSegs = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
-	stats->iw.tcpOutRsts = v4.tcp_out_rsts + v6.tcp_out_rsts;
+
+	stats->name = names;
+	stats->dirname = "iw_stats";
+
+	switch (index) {
+		case TCPINSEGS:
+			stats->value = v4.tcp_in_segs + v6.tcp_in_segs;
+			break;
+		case TCPOUTSEGS:
+			stats->value = v4.tcp_out_segs + v6.tcp_out_segs;
+			break;
+		case TCPRETRANSSEGS:
+			stats->value = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
+			break;
+		case TCPOUTRSTS:
+			stats->value = v4.tcp_out_rsts + v6.tcp_out_rsts;
+		default:
+			stats->value = -1;
+	}

 	return 0;
 }
Index: linux/include/rdma/ib_verbs.h
===================================================================
--- linux.orig/include/rdma/ib_verbs.h
+++ linux/include/rdma/ib_verbs.h
@@ -402,55 +402,12 @@ enum ib_port_speed {
 	IB_SPEED_EDR	= 32
 };

-struct ib_protocol_stats {
-	/* TBD... */
-};
-
-struct iw_protocol_stats {
-	u64	ipInReceives;
-	u64	ipInHdrErrors;
-	u64	ipInTooBigErrors;
-	u64	ipInNoRoutes;
-	u64	ipInAddrErrors;
-	u64	ipInUnknownProtos;
-	u64	ipInTruncatedPkts;
-	u64	ipInDiscards;
-	u64	ipInDelivers;
-	u64	ipOutForwDatagrams;
-	u64	ipOutRequests;
-	u64	ipOutDiscards;
-	u64	ipOutNoRoutes;
-	u64	ipReasmTimeout;
-	u64	ipReasmReqds;
-	u64	ipReasmOKs;
-	u64	ipReasmFails;
-	u64	ipFragOKs;
-	u64	ipFragFails;
-	u64	ipFragCreates;
-	u64	ipInMcastPkts;
-	u64	ipOutMcastPkts;
-	u64	ipInBcastPkts;
-	u64	ipOutBcastPkts;
-
-	u64	tcpRtoAlgorithm;
-	u64	tcpRtoMin;
-	u64	tcpRtoMax;
-	u64	tcpMaxConn;
-	u64	tcpActiveOpens;
-	u64	tcpPassiveOpens;
-	u64	tcpAttemptFails;
-	u64	tcpEstabResets;
-	u64	tcpCurrEstab;
-	u64	tcpInSegs;
-	u64	tcpOutSegs;
-	u64	tcpRetransSegs;
-	u64	tcpInErrs;
-	u64	tcpOutRsts;
-};
-
-union rdma_protocol_stats {
-	struct ib_protocol_stats	ib;
-	struct iw_protocol_stats	iw;
+struct rdma_protocol_stats {
+	char *dirname;		/* Directory name */
+	char **name;		/* NULL terminated list of strings for the names
+				 * of the counters
+				 */
+	u64 value;
 };

 /* Define bits for the various functionality this port needs to be supported by
@@ -1686,7 +1643,8 @@ struct ib_device {
 	struct iw_cm_verbs	     *iwcm;

 	int		           (*get_protocol_stats)(struct ib_device *device,
-							 union rdma_protocol_stats *stats);
+							 struct rdma_protocol_stats *stats,
+							 u8 port, int index);
 	int		           (*query_device)(struct ib_device *device,
 						   struct ib_device_attr *device_attr,
 						   struct ib_udata *udata);
@@ -1903,6 +1861,7 @@ struct ib_device {
 	u8                           node_type;
 	u8                           phys_port_cnt;
 	struct ib_device_attr        attrs;
+	struct attribute_group	     *ag;

 	/**
 	 * The following mandatory functions are used only at device
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                                           ` <alpine.DEB.2.20.1605181401030.29313-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-05-18 20:27                                                             ` Steve Wise
  2016-05-19 14:34                                                               ` Christoph Lameter
  2016-05-19 19:24                                                               ` Doug Ledford
  0 siblings, 2 replies; 42+ messages in thread
From: Steve Wise @ 2016-05-18 20:27 UTC (permalink / raw)
  To: 'Christoph Lameter', 'Doug Ledford'
  Cc: 'Jason Gunthorpe',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Mark Bloch',
	'Steve Wise', 'Majd Dibbiny',
	alonvi-VPRAkNaXOzVWk0Htik3J/w



> -----Original Message-----
> From: Christoph Lameter [mailto:cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org]
> Sent: Wednesday, May 18, 2016 2:02 PM
> To: Doug Ledford
> Cc: Jason Gunthorpe; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mark Bloch; Steve Wise; Majd
> Dibbiny; alonvi-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
> Subject: Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
> 
> Attached a revision of the patch that only fetches one value. Simplifies
> things too.
> 
>

What if you enhance cxgb3 to allow fetching one MIB at a time?  Maybe add a new
RDMA_GET_MIB_ONE (or something similar) command that takes an offset?

 
> Subject: ib core: Make device counter infrastructure dynamic v3
> 
> V2->v3:
> - Fetch individual countes from the device
> - Get rid of the max counter limit
> 
> V1-V2:
>  - Use enums for counters for cxgb3/cxgb4
>  - Use BUILD_BUG_ON instead of BUG_ON
> 
> The current approach in ib_verbs.h is to define a struct with
> statistics fields. ib_verbs is an abstraction layer for
> device drivers and the protocol_stats should be abstract. However,
> in practice these are actually device specific counters (also used
> in such a way in Mellanox OFED). The device stats are a union
> and in practice evolve to a union of device specific stats.
> 
> ARGH!!!
> 
> This should not be defined in a concrete way in ib_verbs.h since the
> API definition needs to be device neutral.
> 
> What we need is an API that can return an arbitrary number
> of counters that may be general or device specific.
> 
> Here we simply define an array of strings that describe the
> counters and an array of 64 bit counters that are
> displayed through sysfs. This makes ib_verbs API appropriately
> abstract and device drivers do not need to update ib_verbs.h
> (and thus break the API) to add more counters,.
> 
> This patch is groundwork to later define the protocol stats for
> Mellanox devices.
> 
> The strings being used in the device drives should only be present
> if the driver actually supports those counters. If there is
> a standardized name then devices should use the same name
> as in other infiniband device drivers.
> 
> Note on cxgb3/4 portions:
> - I was unable to test because of the lack of hardware
> - The counters that are never incremented were omitted. There may be the
>   impression that less counter are supported. But the other counters
>   have never been supported.
> 
> Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Index: linux/drivers/infiniband/core/sysfs.c
> ===================================================================
> --- linux.orig/drivers/infiniband/core/sysfs.c
> +++ linux/drivers/infiniband/core/sysfs.c
> @@ -58,6 +58,7 @@ struct ib_port {
>  	struct attribute_group pkey_group;
>  	u8                     port_num;
>  	struct attribute_group *pma_table;
> +	struct attribute_group *ag;
>  };
> 
>  struct port_attribute {
> @@ -80,6 +81,16 @@ struct port_table_attribute {
>  	__be16			attr_id;
>  };
> 
> +struct ib_port_stats {
> +	struct port_attribute	attr;
> +	u32			index;
> +};
> +
> +struct ib_device_stats {
> +	struct device_attribute	attr;
> +	u32			index;
> +};
> +
>  static ssize_t port_attr_show(struct kobject *kobj,
>  			      struct attribute *attr, char *buf)
>  {
> @@ -733,6 +744,145 @@ static struct attribute_group *get_count
>  	return &pma_group;
>  }
> 
> +static ssize_t show_protocol_stats(struct ib_device *dev, int index,
> +				   u8 port, char *buf)
> +{
> +	ssize_t ret;
> +	struct rdma_protocol_stats s;
> +
> +	ret = dev->get_protocol_stats(dev, &s, port, index);
> +	if (ret)
> +		return ret;
> +
> +	return sprintf(buf, "%llu\n", s.value);
> +}
> +
> +static ssize_t show_device_protocol_stats(struct device *device,
> +					  struct device_attribute *attr,
> +					  char *buf)
> +{
> +	struct ib_device *dev = container_of(device, struct ib_device, dev);
> +	struct ib_device_stats *ds;
> +
> +	ds = container_of(attr, struct ib_device_stats, attr);
> +	return show_protocol_stats(dev, ds->index, 0, buf);
> +}
> +
> +static ssize_t show_port_protocol_stats(struct ib_port *p,
> +					struct port_attribute *attr,
> +					char *buf)
> +{
> +	struct ib_port_stats *ps;
> +
> +	ps = container_of(attr, struct ib_port_stats, attr);
> +	return	show_protocol_stats(p->ibdev, ps->index, p->port_num, buf);
> +}
> +
> +static void free_protocol_stats(struct kobject *kobj,
> +				struct attribute_group *ag)
> +{
> +	struct attribute **da;
> +
> +	sysfs_remove_group(kobj, ag);
> +
> +	for (da = ag->attrs; *da; da++)
> +		kfree(*da);
> +
> +	kfree(ag->attrs);
> +	kfree(ag);
> +}
> +
> +static struct attribute *alloc_stats_port(u32 index, u8 port, char *name)
> +{
> +	struct ib_port_stats *ps;
> +
> +	ps = kmalloc(sizeof(*ps), GFP_KERNEL);
> +	if (!ps)
> +		return NULL;
> +
> +	ps->attr.attr.name = name;
> +	ps->attr.attr.mode = S_IRUGO;
> +	ps->attr.show = show_port_protocol_stats;
> +	ps->attr.store = NULL;
> +	ps->index = index;
> +
> +	return &ps->attr.attr;
> +}
> +
> +static struct attribute *alloc_stats_device(u32 index, u8 port, char *name)
> +{
> +	struct ib_device_stats *da;
> +
> +	da = kmalloc(sizeof(*da), GFP_KERNEL);
> +	if (!da)
> +		return NULL;
> +
> +	da->attr.attr.name = name;
> +	da->attr.attr.mode = S_IRUGO;
> +	da->attr.show = show_device_protocol_stats;
> +	da->attr.store = NULL;
> +	da->index = index;
> +
> +	return &da->attr.attr;
> +}
> +
> +static struct attribute *alloc_stats_attribute(u32 index, u8 port, char
*name)
> +{
> +	return (port) ? alloc_stats_port(index, port, name) :
> +		alloc_stats_device(index, port, name);
> +}
> +
> +static struct attribute_group *create_protocol_stats(struct ib_device
*device,
> +						     struct kobject *kobj,
> +						     u8 port) {
> +	struct attribute_group *ag;
> +	struct rdma_protocol_stats stats = {0};
> +	u32 counters;
> +	u32 i;
> +	int ret;
> +
> +	ret = device->get_protocol_stats(device, &stats, port, 0);
> +
> +	if (ret || !stats.name)
> +		return NULL;
> +
> +	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
> +	if (!ag)
> +		return NULL;
> +
> +	ag->name = stats.dirname;
> +	ag->attrs = kcalloc((counters + 1), sizeof(struct attribute *),
> +			    GFP_KERNEL);
> +	if (!ag->attrs)
> +		goto nomem;
> +
> +	for (i = 0; i < counters; i++) {
> +		struct attribute *attr;
> +
> +		attr = alloc_stats_attribute(i, port, stats.name[i]);
> +		if (!attr)
> +			goto nomem2;
> +
> +		ag->attrs[i] = attr;
> +	}
> +	ag->attrs[counters] = NULL;
> +
> +	ret = sysfs_create_group(kobj, ag);
> +
> +	if (ret)
> +		goto nomem2;
> +
> +	return ag;
> +nomem2:
> +	for (i = 0; i < counters; i++)
> +		kfree(ag->attrs[i]);
> +
> +	kfree(ag->attrs);
> +nomem:
> +	kfree(ag);
> +	return NULL;
> +}
> +
>  static int add_port(struct ib_device *device, int port_num,
>  		    int (*port_callback)(struct ib_device *,
>  					 u8, struct kobject *))
> @@ -835,6 +985,12 @@ static int add_port(struct ib_device *de
>  			goto err_remove_pkey;
>  	}
> 
> +	/* If port == 0, it means we have only one port, so the device should
> +	 * hold the protocol stats
> +	 */
> +	if (device->get_protocol_stats && port_num)
> +		p->ag = create_protocol_stats(device, &p->kobj, port_num);
> +
>  	list_add_tail(&p->kobj.entry, &device->port_list);
> 
>  	kobject_uevent(&p->kobj, KOBJ_ADD);
> @@ -972,120 +1128,6 @@ static struct device_attribute *ib_class
>  	&dev_attr_node_desc
>  };
> 
> -/* Show a given an attribute in the statistics group */
> -static ssize_t show_protocol_stat(const struct device *device,
> -			    struct device_attribute *attr, char *buf,
> -			    unsigned offset)
> -{
> -	struct ib_device *dev = container_of(device, struct ib_device, dev);
> -	union rdma_protocol_stats stats;
> -	ssize_t ret;
> -
> -	ret = dev->get_protocol_stats(dev, &stats);
> -	if (ret)
> -		return ret;
> -
> -	return sprintf(buf, "%llu\n",
> -		       (unsigned long long) ((u64 *) &stats)[offset]);
> -}
> -
> -/* generate a read-only iwarp statistics attribute */
> -#define IW_STATS_ENTRY(name)						\
> -static ssize_t show_##name(struct device *device,			\
> -			   struct device_attribute *attr, char *buf)	\
> -{									\
> -	return show_protocol_stat(device, attr, buf,			\
> -				  offsetof(struct iw_protocol_stats, name) / \
> -				  sizeof (u64));			\
> -}									\
> -static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
> -
> -IW_STATS_ENTRY(ipInReceives);
> -IW_STATS_ENTRY(ipInHdrErrors);
> -IW_STATS_ENTRY(ipInTooBigErrors);
> -IW_STATS_ENTRY(ipInNoRoutes);
> -IW_STATS_ENTRY(ipInAddrErrors);
> -IW_STATS_ENTRY(ipInUnknownProtos);
> -IW_STATS_ENTRY(ipInTruncatedPkts);
> -IW_STATS_ENTRY(ipInDiscards);
> -IW_STATS_ENTRY(ipInDelivers);
> -IW_STATS_ENTRY(ipOutForwDatagrams);
> -IW_STATS_ENTRY(ipOutRequests);
> -IW_STATS_ENTRY(ipOutDiscards);
> -IW_STATS_ENTRY(ipOutNoRoutes);
> -IW_STATS_ENTRY(ipReasmTimeout);
> -IW_STATS_ENTRY(ipReasmReqds);
> -IW_STATS_ENTRY(ipReasmOKs);
> -IW_STATS_ENTRY(ipReasmFails);
> -IW_STATS_ENTRY(ipFragOKs);
> -IW_STATS_ENTRY(ipFragFails);
> -IW_STATS_ENTRY(ipFragCreates);
> -IW_STATS_ENTRY(ipInMcastPkts);
> -IW_STATS_ENTRY(ipOutMcastPkts);
> -IW_STATS_ENTRY(ipInBcastPkts);
> -IW_STATS_ENTRY(ipOutBcastPkts);
> -IW_STATS_ENTRY(tcpRtoAlgorithm);
> -IW_STATS_ENTRY(tcpRtoMin);
> -IW_STATS_ENTRY(tcpRtoMax);
> -IW_STATS_ENTRY(tcpMaxConn);
> -IW_STATS_ENTRY(tcpActiveOpens);
> -IW_STATS_ENTRY(tcpPassiveOpens);
> -IW_STATS_ENTRY(tcpAttemptFails);
> -IW_STATS_ENTRY(tcpEstabResets);
> -IW_STATS_ENTRY(tcpCurrEstab);
> -IW_STATS_ENTRY(tcpInSegs);
> -IW_STATS_ENTRY(tcpOutSegs);
> -IW_STATS_ENTRY(tcpRetransSegs);
> -IW_STATS_ENTRY(tcpInErrs);
> -IW_STATS_ENTRY(tcpOutRsts);
> -
> -static struct attribute *iw_proto_stats_attrs[] = {
> -	&dev_attr_ipInReceives.attr,
> -	&dev_attr_ipInHdrErrors.attr,
> -	&dev_attr_ipInTooBigErrors.attr,
> -	&dev_attr_ipInNoRoutes.attr,
> -	&dev_attr_ipInAddrErrors.attr,
> -	&dev_attr_ipInUnknownProtos.attr,
> -	&dev_attr_ipInTruncatedPkts.attr,
> -	&dev_attr_ipInDiscards.attr,
> -	&dev_attr_ipInDelivers.attr,
> -	&dev_attr_ipOutForwDatagrams.attr,
> -	&dev_attr_ipOutRequests.attr,
> -	&dev_attr_ipOutDiscards.attr,
> -	&dev_attr_ipOutNoRoutes.attr,
> -	&dev_attr_ipReasmTimeout.attr,
> -	&dev_attr_ipReasmReqds.attr,
> -	&dev_attr_ipReasmOKs.attr,
> -	&dev_attr_ipReasmFails.attr,
> -	&dev_attr_ipFragOKs.attr,
> -	&dev_attr_ipFragFails.attr,
> -	&dev_attr_ipFragCreates.attr,
> -	&dev_attr_ipInMcastPkts.attr,
> -	&dev_attr_ipOutMcastPkts.attr,
> -	&dev_attr_ipInBcastPkts.attr,
> -	&dev_attr_ipOutBcastPkts.attr,
> -	&dev_attr_tcpRtoAlgorithm.attr,
> -	&dev_attr_tcpRtoMin.attr,
> -	&dev_attr_tcpRtoMax.attr,
> -	&dev_attr_tcpMaxConn.attr,
> -	&dev_attr_tcpActiveOpens.attr,
> -	&dev_attr_tcpPassiveOpens.attr,
> -	&dev_attr_tcpAttemptFails.attr,
> -	&dev_attr_tcpEstabResets.attr,
> -	&dev_attr_tcpCurrEstab.attr,
> -	&dev_attr_tcpInSegs.attr,
> -	&dev_attr_tcpOutSegs.attr,
> -	&dev_attr_tcpRetransSegs.attr,
> -	&dev_attr_tcpInErrs.attr,
> -	&dev_attr_tcpOutRsts.attr,
> -	NULL
> -};
> -
> -static struct attribute_group iw_stats_group = {
> -	.name	= "proto_stats",
> -	.attrs	= iw_proto_stats_attrs,
> -};
> -
>  static void free_port_list_attributes(struct ib_device *device)
>  {
>  	struct kobject *p, *t;
> @@ -1093,6 +1135,8 @@ static void free_port_list_attributes(st
>  	list_for_each_entry_safe(p, t, &device->port_list, entry) {
>  		struct ib_port *port = container_of(p, struct ib_port, kobj);
>  		list_del(&p->entry);
> +		if (port->ag)
> +			free_protocol_stats(&port->kobj, port->ag);
>  		sysfs_remove_group(p, port->pma_table);
>  		sysfs_remove_group(p, &port->pkey_group);
>  		sysfs_remove_group(p, &port->gid_group);
> @@ -1149,12 +1193,14 @@ int ib_device_register_sysfs(struct ib_d
>  		}
>  	}
> 
> -	if (device->node_type == RDMA_NODE_RNIC && device-
> >get_protocol_stats) {
> -		ret = sysfs_create_group(&class_dev->kobj, &iw_stats_group);
> -		if (ret)
> -			goto err_put;
> -	}
> +	if (!device->get_protocol_stats)
> +		return 0;
> +
> +	device->ag = create_protocol_stats(device, &class_dev->kobj, 0);
> 
> +	/* If create_protocol_stats returns NULL it might not be a failure, so
> +	 * do nothing
> +	 */
>  	return 0;
> 
>  err_put:
> @@ -1169,15 +1215,13 @@ err:
> 
>  void ib_device_unregister_sysfs(struct ib_device *device)
>  {
> -	/* Hold kobject until ib_dealloc_device() */
> -	struct kobject *kobj_dev = kobject_get(&device->dev.kobj);
>  	int i;
> 
> -	if (device->node_type == RDMA_NODE_RNIC && device-
> >get_protocol_stats)
> -		sysfs_remove_group(kobj_dev, &iw_stats_group);
> -
>  	free_port_list_attributes(device);
> 
> +	if (device->ag)
> +		free_protocol_stats(&device->dev.kobj, device->ag);
> +
>  	for (i = 0; i < ARRAY_SIZE(ib_class_attributes); ++i)
>  		device_remove_file(&device->dev, ib_class_attributes[i]);
> 
> Index: linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
> ===================================================================
> --- linux.orig/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ linux/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1219,58 +1219,168 @@ static ssize_t show_board(struct device
>  		       iwch_dev->rdev.rnic_info.pdev->device);
>  }
> 
> +char *names[] = {
> +	"ipInReceives",
> +	"ipInHdrErrors",
> +	"ipInAddrErrors",
> +	"ipInUnknownProtos",
> +	"ipInDiscards",
> +	"ipInDelivers",
> +	"ipOutRequests",
> +	"ipOutDiscards",
> +	"ipOutNoRoutes",
> +	"ipReasmTimeout",
> +	"ipReasmReqds",
> +	"ipReasmOKs",
> +	"ipReasmFails",
> +	"tcpActiveOpens",
> +	"tcpPassiveOpens",
> +	"tcpAttemptFails",
> +	"tcpEstabResets",
> +	"tcpCurrEstab",
> +	"tcpInSegs",
> +	"tcpOutSegs",
> +	"tcpRetransSegs",
> +	"tcpInErrs",
> +	"tcpOutRsts",
> +	"tcpRtoMin",
> +	"tcpRtoMax",
> +	NULL
> +};
> +
> +enum counters {
> +	IPINRECEIVES,
> +	IPINHDRERRORS,
> +	IPINADDRERRORS,
> +	IPINUNKNOWNPROTOS,
> +	IPINDISCARDS,
> +	IPINDELIVERS,
> +	IPOUTREQUESTS,
> +	IPOUTDISCARDS,
> +	IPOUTNOROUTES,
> +	IPREASMTIMEOUT,
> +	IPREASMREQDS,
> +	IPREASMOKS,
> +	IPREASMFAILS,
> +	TCPACTIVEOPENS,
> +	TCPPASSIVEOPENS,
> +	TCPATTEMPTFAILS,
> +	TCPESTABRESETS,
> +	TCPCURRESTAB,
> +	TCPINSEGS,
> +	TCPOUTSEGS,
> +	TCPRETRANSSEGS,
> +	TCPINERRS,
> +	TCPOUTRSTS,
> +	TCPRTOMIN,
> +	TCPRTOMAX,
> +	NR_COUNTERS
> +};
> +
>  static int iwch_get_mib(struct ib_device *ibdev,
> -			union rdma_protocol_stats *stats)
> +			struct rdma_protocol_stats *stats,
> +			u8 port, int index)
>  {
>  	struct iwch_dev *dev;
>  	struct tp_mib_stats m;
>  	int ret;
> 
> +	if (port != 0)
> +		return 0;
> +
>  	PDBG("%s ibdev %p\n", __func__, ibdev);
>  	dev = to_iwch_dev(ibdev);
>  	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
>  	if (ret)
>  		return -ENOSYS;
> 
> -	memset(stats, 0, sizeof *stats);
> -	stats->iw.ipInReceives = ((u64) m.ipInReceive_hi << 32) +
> -				m.ipInReceive_lo;
> -	stats->iw.ipInHdrErrors = ((u64) m.ipInHdrErrors_hi << 32) +
> -				  m.ipInHdrErrors_lo;
> -	stats->iw.ipInAddrErrors = ((u64) m.ipInAddrErrors_hi << 32) +
> -				   m.ipInAddrErrors_lo;
> -	stats->iw.ipInUnknownProtos = ((u64) m.ipInUnknownProtos_hi << 32) +
> -				      m.ipInUnknownProtos_lo;
> -	stats->iw.ipInDiscards = ((u64) m.ipInDiscards_hi << 32) +
> -				 m.ipInDiscards_lo;
> -	stats->iw.ipInDelivers = ((u64) m.ipInDelivers_hi << 32) +
> -				 m.ipInDelivers_lo;
> -	stats->iw.ipOutRequests = ((u64) m.ipOutRequests_hi << 32) +
> -				  m.ipOutRequests_lo;
> -	stats->iw.ipOutDiscards = ((u64) m.ipOutDiscards_hi << 32) +
> -				  m.ipOutDiscards_lo;
> -	stats->iw.ipOutNoRoutes = ((u64) m.ipOutNoRoutes_hi << 32) +
> -				  m.ipOutNoRoutes_lo;
> -	stats->iw.ipReasmTimeout = (u64) m.ipReasmTimeout;
> -	stats->iw.ipReasmReqds = (u64) m.ipReasmReqds;
> -	stats->iw.ipReasmOKs = (u64) m.ipReasmOKs;
> -	stats->iw.ipReasmFails = (u64) m.ipReasmFails;
> -	stats->iw.tcpActiveOpens = (u64) m.tcpActiveOpens;
> -	stats->iw.tcpPassiveOpens = (u64) m.tcpPassiveOpens;
> -	stats->iw.tcpAttemptFails = (u64) m.tcpAttemptFails;
> -	stats->iw.tcpEstabResets = (u64) m.tcpEstabResets;
> -	stats->iw.tcpOutRsts = (u64) m.tcpOutRsts;
> -	stats->iw.tcpCurrEstab = (u64) m.tcpCurrEstab;
> -	stats->iw.tcpInSegs = ((u64) m.tcpInSegs_hi << 32) +
> -			      m.tcpInSegs_lo;
> -	stats->iw.tcpOutSegs = ((u64) m.tcpOutSegs_hi << 32) +
> -			       m.tcpOutSegs_lo;
> -	stats->iw.tcpRetransSegs = ((u64) m.tcpRetransSeg_hi << 32) +
> -				  m.tcpRetransSeg_lo;
> -	stats->iw.tcpInErrs = ((u64) m.tcpInErrs_hi << 32) +
> -			      m.tcpInErrs_lo;
> -	stats->iw.tcpRtoMin = (u64) m.tcpRtoMin;
> -	stats->iw.tcpRtoMax = (u64) m.tcpRtoMax;
> +	/* Ensure we provided all values */
> +	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
> +
> +	stats->dirname = "iw_stats";
> +	stats->name = names;
> +
> +	switch (index) {
> +		case IPINRECEIVES:
> +			stats->value = ((u64)m.ipInReceive_hi << 32) +
> 	m.ipInReceive_lo;
> +			break;
> +		case IPINHDRERRORS:
> +			stats->value = ((u64)m.ipInHdrErrors_hi << 32) +
> m.ipInHdrErrors_lo;
> +			break;
> +		case IPINADDRERRORS:
> +			stats->value = ((u64)m.ipInAddrErrors_hi << 32) +
> m.ipInAddrErrors_lo;
> +			break;
> +		case IPINUNKNOWNPROTOS:
> +			stats->value = ((u64)m.ipInUnknownProtos_hi << 32) +
> m.ipInUnknownProtos_lo;
> +			break;
> +		case IPINDISCARDS:
> +			stats->value = ((u64)m.ipInDiscards_hi << 32) +
> m.ipInDiscards_lo;
> +			break;
> +		case IPINDELIVERS:
> +			stats->value = ((u64)m.ipInDelivers_hi << 32) +
> m.ipInDelivers_lo;
> +			break;
> +		case IPOUTREQUESTS:
> +			stats->value = ((u64)m.ipOutRequests_hi << 32) +
> m.ipOutRequests_lo;
> +			break;
> +		case IPOUTDISCARDS:
> +			stats->value = ((u64)m.ipOutDiscards_hi << 32) +
> m.ipOutDiscards_lo;
> +			break;
> +		case IPOUTNOROUTES:
> +			stats->value = ((u64)m.ipOutNoRoutes_hi << 32) +
> m.ipOutNoRoutes_lo;
> +			break;
> +		case IPREASMTIMEOUT:
> +			stats->value = m.ipReasmTimeout;
> +			break;
> +		case IPREASMREQDS:
> +			stats->value = m.ipReasmReqds;
> +			break;
> +		case IPREASMOKS:
> +			stats->value = m.ipReasmOKs;
> +			break;
> +		case IPREASMFAILS:
> +			stats->value = m.ipReasmFails;
> +			break;
> +		case TCPACTIVEOPENS:
> +			stats->value = m.tcpActiveOpens;
> +			break;
> +		case TCPPASSIVEOPENS:
> +			stats->value = m.tcpPassiveOpens;
> +			break;
> +		case TCPATTEMPTFAILS:
> +			stats->value = m.tcpAttemptFails;
> +			break;
> +		case TCPESTABRESETS:
> +			stats->value = m.tcpEstabResets;
> +			break;
> +		case TCPCURRESTAB:
> +			stats->value = m.tcpOutRsts;
> +			break;
> +		case TCPINSEGS:
> +			stats->value = m.tcpCurrEstab;
> +			break;
> +		case TCPOUTSEGS:
> +			stats->value = ((u64)m.tcpInSegs_hi << 32) +
> m.tcpInSegs_lo;
> +			break;
> +		case TCPRETRANSSEGS:
> +			stats->value = ((u64)m.tcpOutSegs_hi << 32) +
> m.tcpOutSegs_lo;
> +			break;
> +		case TCPINERRS:
> +			stats->value= ((u64)m.tcpRetransSeg_hi << 32) +
> m.tcpRetransSeg_lo;
> +			break;
> +		case TCPOUTRSTS:
> +			stats->value = ((u64)m.tcpInErrs_hi << 32) +
m.tcpInErrs_lo;
> +			break;
> +		case TCPRTOMIN:
> +			stats->value= m.tcpRtoMin;
> +			break;
> +		case TCPRTOMAX:
> +			stats->value= m.tcpRtoMax;
> +			break;
> +		default:
> +			stats->value = -1;
> +			break;
> +	}
> +
>  	return 0;
>  }
> 
> Index: linux/drivers/infiniband/hw/cxgb4/provider.c
> ===================================================================
> --- linux.orig/drivers/infiniband/hw/cxgb4/provider.c
> +++ linux/drivers/infiniband/hw/cxgb4/provider.c
> @@ -446,18 +446,54 @@ static ssize_t show_board(struct device
>  		       c4iw_dev->rdev.lldi.pdev->device);
>  }
> 
> +static char *names[] = {
> +	"tcpInSegs",
> +	"tcpOutSegs",
> +	"tcpRetransSegs",
> +	"tcpOutRsts",
> +	NULL
> +};
> +
> +enum counters {
> +	TCPINSEGS,
> +	TCPOUTSEGS,
> +	TCPRETRANSSEGS,
> +	TCPOUTRSTS,
> +	NR_COUNTERS
> +};
> +
>  static int c4iw_get_mib(struct ib_device *ibdev,
> -			union rdma_protocol_stats *stats)
> +			struct rdma_protocol_stats *stats,
> +			u8 port, int index)
>  {
>  	struct tp_tcp_stats v4, v6;
>  	struct c4iw_dev *c4iw_dev = to_c4iw_dev(ibdev);
> 
> +	if (port != 0)
> +		return 0;
> +
> +	BUILD_BUG_ON(NR_COUNTERS != (sizeof(names) / sizeof(char *)) - 1);
> +
>  	cxgb4_get_tcp_stats(c4iw_dev->rdev.lldi.pdev, &v4, &v6);
> -	memset(stats, 0, sizeof *stats);
> -	stats->iw.tcpInSegs = v4.tcp_in_segs + v6.tcp_in_segs;
> -	stats->iw.tcpOutSegs = v4.tcp_out_segs + v6.tcp_out_segs;
> -	stats->iw.tcpRetransSegs = v4.tcp_retrans_segs + v6.tcp_retrans_segs;
> -	stats->iw.tcpOutRsts = v4.tcp_out_rsts + v6.tcp_out_rsts;
> +
> +	stats->name = names;
> +	stats->dirname = "iw_stats";
> +
> +	switch (index) {
> +		case TCPINSEGS:
> +			stats->value = v4.tcp_in_segs + v6.tcp_in_segs;
> +			break;
> +		case TCPOUTSEGS:
> +			stats->value = v4.tcp_out_segs + v6.tcp_out_segs;
> +			break;
> +		case TCPRETRANSSEGS:
> +			stats->value = v4.tcp_retrans_segs +
v6.tcp_retrans_segs;
> +			break;
> +		case TCPOUTRSTS:
> +			stats->value = v4.tcp_out_rsts + v6.tcp_out_rsts;
> +		default:
> +			stats->value = -1;
> +	}
> 
>  	return 0;
>  }
> Index: linux/include/rdma/ib_verbs.h
> ===================================================================
> --- linux.orig/include/rdma/ib_verbs.h
> +++ linux/include/rdma/ib_verbs.h
> @@ -402,55 +402,12 @@ enum ib_port_speed {
>  	IB_SPEED_EDR	= 32
>  };
> 
> -struct ib_protocol_stats {
> -	/* TBD... */
> -};
> -
> -struct iw_protocol_stats {
> -	u64	ipInReceives;
> -	u64	ipInHdrErrors;
> -	u64	ipInTooBigErrors;
> -	u64	ipInNoRoutes;
> -	u64	ipInAddrErrors;
> -	u64	ipInUnknownProtos;
> -	u64	ipInTruncatedPkts;
> -	u64	ipInDiscards;
> -	u64	ipInDelivers;
> -	u64	ipOutForwDatagrams;
> -	u64	ipOutRequests;
> -	u64	ipOutDiscards;
> -	u64	ipOutNoRoutes;
> -	u64	ipReasmTimeout;
> -	u64	ipReasmReqds;
> -	u64	ipReasmOKs;
> -	u64	ipReasmFails;
> -	u64	ipFragOKs;
> -	u64	ipFragFails;
> -	u64	ipFragCreates;
> -	u64	ipInMcastPkts;
> -	u64	ipOutMcastPkts;
> -	u64	ipInBcastPkts;
> -	u64	ipOutBcastPkts;
> -
> -	u64	tcpRtoAlgorithm;
> -	u64	tcpRtoMin;
> -	u64	tcpRtoMax;
> -	u64	tcpMaxConn;
> -	u64	tcpActiveOpens;
> -	u64	tcpPassiveOpens;
> -	u64	tcpAttemptFails;
> -	u64	tcpEstabResets;
> -	u64	tcpCurrEstab;
> -	u64	tcpInSegs;
> -	u64	tcpOutSegs;
> -	u64	tcpRetransSegs;
> -	u64	tcpInErrs;
> -	u64	tcpOutRsts;
> -};
> -
> -union rdma_protocol_stats {
> -	struct ib_protocol_stats	ib;
> -	struct iw_protocol_stats	iw;
> +struct rdma_protocol_stats {
> +	char *dirname;		/* Directory name */
> +	char **name;		/* NULL terminated list of strings for the names
> +				 * of the counters
> +				 */
> +	u64 value;
>  };
> 
>  /* Define bits for the various functionality this port needs to be supported
by
> @@ -1686,7 +1643,8 @@ struct ib_device {
>  	struct iw_cm_verbs	     *iwcm;
> 
>  	int		           (*get_protocol_stats)(struct ib_device
*device,
> -							 union
> rdma_protocol_stats *stats);
> +							 struct
> rdma_protocol_stats *stats,
> +							 u8 port, int index);
>  	int		           (*query_device)(struct ib_device *device,
>  						   struct ib_device_attr
> *device_attr,
>  						   struct ib_udata *udata);
> @@ -1903,6 +1861,7 @@ struct ib_device {
>  	u8                           node_type;
>  	u8                           phys_port_cnt;
>  	struct ib_device_attr        attrs;
> +	struct attribute_group	     *ag;
> 
>  	/**
>  	 * The following mandatory functions are used only at device

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
       [not found]                                                       ` <12e991bc-aa9b-a8b0-3cd4-b56d58a44d60-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-05-18 18:17                                                         ` Steve Wise
  2016-05-18 19:01                                                         ` Christoph Lameter
@ 2016-05-18 21:11                                                         ` Jason Gunthorpe
  2 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2016-05-18 21:11 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mark Bloch,
	Steve Wise, Majd Dibbiny, alonvi-VPRAkNaXOzVWk0Htik3J/w

On Wed, May 18, 2016 at 02:11:27PM -0400, Doug Ledford wrote:
> On 05/18/2016 01:25 PM, Jason Gunthorpe wrote:
> > Why would people manually use sysfs? A netlink interface would be
> > accompanied by a tool. I don't schlep around in sysfs for netdev, I
> > use 'ip -s link show'
> 
> And how are you going to get RNR Retries from an IB device with
> that?

We don't have a netlink tool for rdma devices yet, I'm imagining that
as part of adding netlink bulk access.

> that happen, I save the age in jiffies when they are taken.  That could
> be used to provide their timestamp and give meaningful time context to
> the counter.  They simply don't when you are reading the files in sysfs.

Why do all this weird work to solve a problem we don't even have?
netlink is the way to get bulk stats out of devices, that is what
netdev does, overkilling sysfs seems unnecessary...

JAson
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
  2016-05-18 20:27                                                             ` Steve Wise
@ 2016-05-19 14:34                                                               ` Christoph Lameter
  2016-05-19 19:24                                                               ` Doug Ledford
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2016-05-19 14:34 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Doug Ledford', 'Jason Gunthorpe',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Mark Bloch',
	'Steve Wise', 'Majd Dibbiny',
	alonvi-VPRAkNaXOzVWk0Htik3J/w

Good idea. Do it. This patch enables retrieval of single values from the
drivers. The way that a driver implements that is beyond the scope of
this patch.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
  2016-05-18 20:27                                                             ` Steve Wise
  2016-05-19 14:34                                                               ` Christoph Lameter
@ 2016-05-19 19:24                                                               ` Doug Ledford
  1 sibling, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2016-05-19 19:24 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Lameter'
  Cc: 'Jason Gunthorpe',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Mark Bloch',
	'Steve Wise', 'Majd Dibbiny',
	alonvi-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

On 05/18/2016 04:27 PM, Steve Wise wrote:
> 
> 
>> -----Original Message-----
>> From: Christoph Lameter [mailto:cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org]
>> Sent: Wednesday, May 18, 2016 2:02 PM
>> To: Doug Ledford
>> Cc: Jason Gunthorpe; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mark Bloch; Steve Wise; Majd
>> Dibbiny; alonvi-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
>> Subject: Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
>>
>> Attached a revision of the patch that only fetches one value. Simplifies
>> things too.
>>
>>
> 
> What if you enhance cxgb3 to allow fetching one MIB at a time?  Maybe add a new
> RDMA_GET_MIB_ONE (or something similar) command that takes an offset?

I have a version that does caching compiling now.  I'm getting ready to
test it.  All in all, there were significant re-writes from Christoph's
patch.  But the caching is flexible enough that it supports the i40iw
driver's needs out of the box (namely, typical cache age on parent
devices, but make vfs have a 1 second delay between updates because of
the expense of getting the stats).  It also works out nicely in that in
the i40iw driver, I'm pretty sure you can safely just pass the core
value[] array to the get_hw_stats function and it would all work, but
for now I left it using its own struct and doing a memcpy before it
passes the struct back to the core.  Other drivers might be able to do
the same thing.

And I thought about making the get_stats function pass in the specific
stat, and then leaving it up to the driver whether or not to update the
lot of just one, depending on the cost of getting stats.  That would be
an easy change from what I have now.

Once I have a successful test on cxgb4 I'll post the updated patch.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-05-19 19:24 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-15 15:54 [PATCH 0/3] Dynamically extendable device counter support Christoph Lameter
2016-03-15 15:54 ` [PATCH 1/3] ib core: Make device counter infrastructure dynamic Christoph Lameter
     [not found]   ` <20160315155455.173645653-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
2016-03-16 15:47     ` Dennis Dalessandro
     [not found]       ` <20160316154738.GA26530-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2016-03-16 17:17         ` Christoph Lameter
     [not found]           ` <alpine.DEB.2.20.1603161213560.15010-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-03-16 17:45             ` Dennis Dalessandro
2016-03-17  7:23     ` Leon Romanovsky
     [not found]       ` <20160317072354.GB25216-2ukJVAZIZ/Y@public.gmane.org>
2016-03-17  8:17         ` Leon Romanovsky
     [not found]           ` <20160317081716.GD25216-2ukJVAZIZ/Y@public.gmane.org>
2016-03-18  1:57             ` Christoph Lameter
     [not found]               ` <alpine.DEB.2.20.1603172057030.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-03-18  6:20                 ` Leon Romanovsky
     [not found]                   ` <20160318062009.GI25216-2ukJVAZIZ/Y@public.gmane.org>
2016-03-18 14:33                     ` Christoph Lameter
     [not found]                       ` <alpine.DEB.2.20.1603180932310.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-03-18 15:51                         ` Leon Romanovsky
2016-03-18  1:56         ` Christoph Lameter
     [not found]           ` <alpine.DEB.2.20.1603172054540.18714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-03-18  6:16             ` Leon Romanovsky
     [not found]               ` <20160318061659.GH25216-2ukJVAZIZ/Y@public.gmane.org>
2016-03-18 14:34                 ` Christoph Lameter
     [not found]                   ` <alpine.DEB.2.20.1603180933320.25235-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-03-18 15:42                     ` Leon Romanovsky
2016-05-13 19:18     ` Doug Ledford
     [not found]       ` <057c8ac8-1d34-e7b9-c0ad-91d805c81139-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-16 13:52         ` Christoph Lameter
     [not found]           ` <alpine.DEB.2.20.1605160851370.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-16 15:43             ` Doug Ledford
     [not found]               ` <041c6da0-e022-2bd1-5f00-e569c077e154-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-16 17:04                 ` Christoph Lameter
     [not found]                   ` <alpine.DEB.2.20.1605161203010.26453-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-16 17:27                     ` Doug Ledford
     [not found]                       ` <db1bf3dd-20b1-3f2f-eaa3-25e5e8aff35b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-16 17:49                         ` Christoph Lameter
     [not found]                           ` <alpine.DEB.2.20.1605161238560.10085-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-16 18:01                             ` Doug Ledford
     [not found]                               ` <102cd100-55f7-fa85-cd75-ba0db5b9fa34-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-17 14:19                                 ` Christoph Lameter
     [not found]                                   ` <alpine.DEB.2.20.1605170918080.9956-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-17 16:00                                     ` Doug Ledford
     [not found]                                       ` <3e9a3e19-58cb-c25f-89a1-f0e51df562d8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-17 16:06                                         ` Steve Wise
2016-05-17 17:00                                         ` Jason Gunthorpe
     [not found]                                           ` <20160517170027.GC19976-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-05-17 17:34                                             ` Doug Ledford
     [not found]                                               ` <bc83cc75-aa2e-6fc6-e1c6-a0190b972013-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-18 17:25                                                 ` Jason Gunthorpe
     [not found]                                                   ` <20160518172542.GA15516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-05-18 18:11                                                     ` Doug Ledford
     [not found]                                                       ` <12e991bc-aa9b-a8b0-3cd4-b56d58a44d60-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-18 18:17                                                         ` Steve Wise
2016-05-18 19:01                                                         ` Christoph Lameter
     [not found]                                                           ` <alpine.DEB.2.20.1605181401030.29313-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-18 20:27                                                             ` Steve Wise
2016-05-19 14:34                                                               ` Christoph Lameter
2016-05-19 19:24                                                               ` Doug Ledford
2016-05-18 21:11                                                         ` Jason Gunthorpe
2016-05-16 18:06                             ` Jason Gunthorpe
2016-05-16 18:27                             ` Steve Wise
2016-03-15 15:54 ` [PATCH 2/3] mlx4: Add support for protocol statistics Christoph Lameter
2016-03-15 15:54 ` [PATCH 3/3] mlx5: Implement new counter infrastructure Christoph Lameter
     [not found]   ` <20160315155455.397561811-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
2016-05-13 19:18     ` Doug Ledford
     [not found]       ` <d01d42d9-08a8-968a-97f9-40188301170a-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-16 13:52         ` Christoph Lameter
     [not found]           ` <alpine.DEB.2.20.1605160852260.23895-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-05-16 15:44             ` Doug Ledford

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.