All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices
@ 2021-06-07  8:17 Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 01/15] RDMA: Split the alloc_hw_stats() ops to port and device variants Leon Romanovsky
                   ` (14 more replies)
  0 siblings, 15 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Leon Romanovsky,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Chagelog:
v1:
 * Added two new patches to the series
     RDMA/core: Allow port_groups to be used with namespaces
     RDMA: Remove rdma_set_device_sysfs_group()
 * Fixed missing ops definition in device.c
 * Passed proper internal validation and review
 * changed EXPORT_SYMBOL to be EXPORT_SYMBOL_GPL for the ib_port_sysfs_create_groups
 * qib was converted to use .is_visible() callback together with static
   attribute_group declaration.
v0: https://lore.kernel.org/linux-rdma/0-v1-34c90fa45f1c+3c7b0-port_sysfs_jgg@nvidia.com/

-------------------------------------------------------------------------------
From Jason,

IB has a complex sysfs with a deep nesting of attributes. Nathan and Kees
recently noticed this was not even slightly sane with how it was handling
attributes and a deeper inspection shows the whole thing is a pretty
"ick" coding style.

Further review shows the ick extends outward from the ib_port sysfs and
basically everything is pretty crazy.

Simplify all of it:

 - Organize the ib_port and gid_attr's kobj's to have clear setup/destroy
   function pairings that work only on their own kobjs.

 - All memory allocated in service of a kobject's attributes is freed as
   part of the kobj release function. Thus all the error handling defers
   the memory frees to a put.

 - Build up lists of groups for every kobject and add the entire group
   list as a one-shot operation as the last thing in setup function.

 - Remove essentially all the error cleanup. The final kobject_put() will
   always free any memory allocated or do an internal kobject_del() if
   required. The new ordering eliminates all the other cleanup cases.

 - Make all attributes use proper typing for the kobj they are attached
   to. Split device and port hw_stats handling.

 - Create a ib_port_attribute type and change hfi1, qib and the CM code to
   work with attribute lists of ib_port_attribute type instead of building
   their own kobject madness

Thanks

Jason Gunthorpe (15):
  RDMA: Split the alloc_hw_stats() ops to port and device variants
  RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port
    pointer
  RDMA/core: Split port and device counter sysfs attributes
  RDMA/core: Split gid_attrs related sysfs from add_port()
  RDMA/core: Simplify how the gid_attrs sysfs is created
  RDMA/core: Simplify how the port sysfs is created
  RDMA/core: Create the device hw_counters through the normal groups
    mechanism
  RDMA/core: Remove the kobject_uevent() NOP
  RDMA/core: Expose the ib port sysfs attribute machinery
  RDMA/cm: Use an attribute_group on the ib_port_attribute intead of
    kobj's
  RDMA/qib: Use attributes for the port sysfs
  RDMA/hfi1: Use attributes for the port sysfs
  RDMA: Change ops->init_port to ops->port_groups
  RDMA/core: Allow port_groups to be used with namespaces
  RDMA: Remove rdma_set_device_sysfs_group()

 drivers/infiniband/core/cm.c                  |  227 ++--
 drivers/infiniband/core/core_priv.h           |   13 +-
 drivers/infiniband/core/counters.c            |    4 +-
 drivers/infiniband/core/device.c              |   30 +-
 drivers/infiniband/core/nldev.c               |   10 +-
 drivers/infiniband/core/sysfs.c               | 1095 ++++++++---------
 drivers/infiniband/hw/bnxt_re/hw_counters.c   |    7 +-
 drivers/infiniband/hw/bnxt_re/hw_counters.h   |    4 +-
 drivers/infiniband/hw/bnxt_re/main.c          |    4 +-
 drivers/infiniband/hw/cxgb4/provider.c        |   11 +-
 drivers/infiniband/hw/efa/efa.h               |    3 +-
 drivers/infiniband/hw/efa/efa_main.c          |    3 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |   11 +-
 drivers/infiniband/hw/hfi1/hfi.h              |    7 +-
 drivers/infiniband/hw/hfi1/sysfs.c            |  529 +++-----
 drivers/infiniband/hw/hfi1/verbs.c            |   92 +-
 drivers/infiniband/hw/irdma/verbs.c           |   11 +-
 drivers/infiniband/hw/mlx4/main.c             |   27 +-
 drivers/infiniband/hw/mlx5/counters.c         |   42 +-
 drivers/infiniband/hw/mlx5/main.c             |    2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |    2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c    |    2 +-
 drivers/infiniband/hw/qedr/main.c             |    2 +-
 drivers/infiniband/hw/qib/qib.h               |    8 +-
 drivers/infiniband/hw/qib/qib_sysfs.c         |  616 ++++------
 drivers/infiniband/hw/qib/qib_verbs.c         |    6 +-
 drivers/infiniband/hw/usnic/usnic_ib_main.c   |    3 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |    2 +-
 drivers/infiniband/sw/rdmavt/vt.c             |    2 +-
 drivers/infiniband/sw/rxe/rxe_hw_counters.c   |    7 +-
 drivers/infiniband/sw/rxe/rxe_hw_counters.h   |    4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |    4 +-
 include/rdma/ib_sysfs.h                       |   37 +
 include/rdma/ib_verbs.h                       |   68 +-
 34 files changed, 1313 insertions(+), 1582 deletions(-)
 create mode 100644 include/rdma/ib_sysfs.h

-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 01/15] RDMA: Split the alloc_hw_stats() ops to port and device variants
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer Leon Romanovsky
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

This is being used to implement both the port and device global stats,
which is causing some confusion in the drivers. For instance EFA and i40iw
both seem to be misusing the device stats.

Split it into two ops so drivers that don't support one or the other can
leave the op NULL'd, making the calling code a little simpler to
understand.

Tested-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/counters.c          |  4 +-
 drivers/infiniband/core/device.c            |  3 +-
 drivers/infiniband/core/nldev.c             |  2 +-
 drivers/infiniband/core/sysfs.c             | 10 ++-
 drivers/infiniband/hw/bnxt_re/hw_counters.c |  7 +-
 drivers/infiniband/hw/bnxt_re/hw_counters.h |  4 +-
 drivers/infiniband/hw/bnxt_re/main.c        |  2 +-
 drivers/infiniband/hw/cxgb4/provider.c      |  9 +--
 drivers/infiniband/hw/efa/efa.h             |  3 +-
 drivers/infiniband/hw/efa/efa_main.c        |  3 +-
 drivers/infiniband/hw/efa/efa_verbs.c       | 11 ++-
 drivers/infiniband/hw/hfi1/verbs.c          | 86 ++++++++++-----------
 drivers/infiniband/hw/irdma/verbs.c         | 11 +--
 drivers/infiniband/hw/mlx4/main.c           | 25 ++++--
 drivers/infiniband/hw/mlx5/counters.c       | 42 +++++++---
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  7 +-
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c       |  2 +-
 include/rdma/ib_verbs.h                     | 13 ++--
 19 files changed, 142 insertions(+), 106 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 15493357cfef..df9e6c5e4ddf 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -605,10 +605,10 @@ void rdma_counter_init(struct ib_device *dev)
 		port_counter->mode.mode = RDMA_COUNTER_MODE_NONE;
 		mutex_init(&port_counter->lock);
 
-		if (!dev->ops.alloc_hw_stats)
+		if (!dev->ops.alloc_hw_port_stats)
 			continue;
 
-		port_counter->hstats = dev->ops.alloc_hw_stats(dev, port);
+		port_counter->hstats = dev->ops.alloc_hw_port_stats(dev, port);
 		if (!port_counter->hstats)
 			goto fail;
 	}
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c660cef66ac6..86a16cd7d7fd 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2595,7 +2595,8 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, add_gid);
 	SET_DEVICE_OP(dev_ops, advise_mr);
 	SET_DEVICE_OP(dev_ops, alloc_dm);
-	SET_DEVICE_OP(dev_ops, alloc_hw_stats);
+	SET_DEVICE_OP(dev_ops, alloc_hw_device_stats);
+	SET_DEVICE_OP(dev_ops, alloc_hw_port_stats);
 	SET_DEVICE_OP(dev_ops, alloc_mr);
 	SET_DEVICE_OP(dev_ops, alloc_mr_integrity);
 	SET_DEVICE_OP(dev_ops, alloc_mw);
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 34d0cc1a4147..01316926cef6 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -2060,7 +2060,7 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
 	if (!device)
 		return -EINVAL;
 
-	if (!device->ops.alloc_hw_stats || !device->ops.get_hw_stats) {
+	if (!device->ops.alloc_hw_port_stats || !device->ops.get_hw_stats) {
 		ret = -EINVAL;
 		goto err;
 	}
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 05b702de00e8..d11ceff2b4e4 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -981,8 +981,10 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
 	struct rdma_hw_stats *stats;
 	int i, ret;
 
-	stats = device->ops.alloc_hw_stats(device, port_num);
-
+	if (port_num)
+		stats = device->ops.alloc_hw_port_stats(device, port_num);
+	else
+		stats = device->ops.alloc_hw_device_stats(device);
 	if (!stats)
 		return;
 
@@ -1165,7 +1167,7 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	 * port, so holder should be device. Therefore skip per port conunter
 	 * initialization.
 	 */
-	if (device->ops.alloc_hw_stats && port_num && is_full_dev)
+	if (device->ops.alloc_hw_port_stats && port_num && is_full_dev)
 		setup_hw_stats(device, p, port_num);
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
@@ -1409,7 +1411,7 @@ int ib_device_register_sysfs(struct ib_device *device)
 	if (ret)
 		return ret;
 
-	if (device->ops.alloc_hw_stats)
+	if (device->ops.alloc_hw_device_stats)
 		setup_hw_stats(device, NULL, 0);
 
 	return 0;
diff --git a/drivers/infiniband/hw/bnxt_re/hw_counters.c b/drivers/infiniband/hw/bnxt_re/hw_counters.c
index 3e54e1ae75b4..7ba07797845c 100644
--- a/drivers/infiniband/hw/bnxt_re/hw_counters.c
+++ b/drivers/infiniband/hw/bnxt_re/hw_counters.c
@@ -234,13 +234,10 @@ int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
 	return ARRAY_SIZE(bnxt_re_stat_name);
 }
 
-struct rdma_hw_stats *bnxt_re_ib_alloc_hw_stats(struct ib_device *ibdev,
-						u32 port_num)
+struct rdma_hw_stats *bnxt_re_ib_alloc_hw_port_stats(struct ib_device *ibdev,
+						     u32 port_num)
 {
 	BUILD_BUG_ON(ARRAY_SIZE(bnxt_re_stat_name) != BNXT_RE_NUM_COUNTERS);
-	/* We support only per port stats */
-	if (!port_num)
-		return NULL;
 
 	return rdma_alloc_hw_stats_struct(bnxt_re_stat_name,
 					  ARRAY_SIZE(bnxt_re_stat_name),
diff --git a/drivers/infiniband/hw/bnxt_re/hw_counters.h b/drivers/infiniband/hw/bnxt_re/hw_counters.h
index ede048607d6c..6f2d2f91d9ff 100644
--- a/drivers/infiniband/hw/bnxt_re/hw_counters.h
+++ b/drivers/infiniband/hw/bnxt_re/hw_counters.h
@@ -96,8 +96,8 @@ enum bnxt_re_hw_stats {
 	BNXT_RE_NUM_COUNTERS
 };
 
-struct rdma_hw_stats *bnxt_re_ib_alloc_hw_stats(struct ib_device *ibdev,
-						u32 port_num);
+struct rdma_hw_stats *bnxt_re_ib_alloc_hw_port_stats(struct ib_device *ibdev,
+						     u32 port_num);
 int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
 			    struct rdma_hw_stats *stats,
 			    u32 port, int index);
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 58c2a7dfb438..58bd905048ae 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -629,7 +629,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
 	.uverbs_abi_ver = BNXT_RE_ABI_VERSION,
 
 	.add_gid = bnxt_re_add_gid,
-	.alloc_hw_stats = bnxt_re_ib_alloc_hw_stats,
+	.alloc_hw_port_stats = bnxt_re_ib_alloc_hw_port_stats,
 	.alloc_mr = bnxt_re_alloc_mr,
 	.alloc_pd = bnxt_re_alloc_pd,
 	.alloc_ucontext = bnxt_re_alloc_ucontext,
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 3f1893e180dd..4e453d1dde11 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -377,14 +377,11 @@ static const char * const names[] = {
 	[IP6OUTRSTS] = "ip6OutRsts"
 };
 
-static struct rdma_hw_stats *c4iw_alloc_stats(struct ib_device *ibdev,
-					      u32 port_num)
+static struct rdma_hw_stats *c4iw_alloc_device_stats(struct ib_device *ibdev)
 {
 	BUILD_BUG_ON(ARRAY_SIZE(names) != NR_COUNTERS);
 
-	if (port_num != 0)
-		return NULL;
-
+	/* FIXME: these look like port stats */
 	return rdma_alloc_hw_stats_struct(names, NR_COUNTERS,
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
@@ -455,7 +452,7 @@ static const struct ib_device_ops c4iw_dev_ops = {
 	.driver_id = RDMA_DRIVER_CXGB4,
 	.uverbs_abi_ver = C4IW_UVERBS_ABI_VERSION,
 
-	.alloc_hw_stats = c4iw_alloc_stats,
+	.alloc_hw_device_stats = c4iw_alloc_device_stats,
 	.alloc_mr = c4iw_alloc_mr,
 	.alloc_pd = c4iw_allocate_pd,
 	.alloc_ucontext = c4iw_alloc_ucontext,
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index ea322cec27d2..2b8ca099b381 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -157,7 +157,8 @@ int efa_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 		  int qp_attr_mask, struct ib_udata *udata);
 enum rdma_link_layer efa_port_link_layer(struct ib_device *ibdev,
 					 u32 port_num);
-struct rdma_hw_stats *efa_alloc_hw_stats(struct ib_device *ibdev, u32 port_num);
+struct rdma_hw_stats *efa_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num);
+struct rdma_hw_stats *efa_alloc_hw_device_stats(struct ib_device *ibdev);
 int efa_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats,
 		     u32 port_num, int index);
 
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 816cfd65b7ac..203e6ddcacbc 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -242,7 +242,8 @@ static const struct ib_device_ops efa_dev_ops = {
 	.driver_id = RDMA_DRIVER_EFA,
 	.uverbs_abi_ver = EFA_UVERBS_ABI_VERSION,
 
-	.alloc_hw_stats = efa_alloc_hw_stats,
+	.alloc_hw_port_stats = efa_alloc_hw_port_stats,
+	.alloc_hw_device_stats = efa_alloc_hw_device_stats,
 	.alloc_pd = efa_alloc_pd,
 	.alloc_ucontext = efa_alloc_ucontext,
 	.create_cq = efa_create_cq,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 51572f1dc611..be6d3ff0f1be 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1904,13 +1904,22 @@ int efa_destroy_ah(struct ib_ah *ibah, u32 flags)
 	return 0;
 }
 
-struct rdma_hw_stats *efa_alloc_hw_stats(struct ib_device *ibdev, u32 port_num)
+struct rdma_hw_stats *efa_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
 {
 	return rdma_alloc_hw_stats_struct(efa_stats_names,
 					  ARRAY_SIZE(efa_stats_names),
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
+struct rdma_hw_stats *efa_alloc_hw_device_stats(struct ib_device *ibdev)
+{
+	/*
+	 * It is probably a bug that efa reports its port stats as device
+	 * stats
+	 */
+	return efa_alloc_hw_port_stats(ibdev, 0);
+}
+
 int efa_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats,
 		     u32 port_num, int index)
 {
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 554294340caa..85deba07a675 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1693,54 +1693,53 @@ static int init_cntr_names(const char *names_in,
 	return 0;
 }
 
-static struct rdma_hw_stats *alloc_hw_stats(struct ib_device *ibdev,
-					    u32 port_num)
+static int init_counters(struct ib_device *ibdev)
 {
-	int i, err;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	int i, err = 0;
 
 	mutex_lock(&cntr_names_lock);
-	if (!cntr_names_initialized) {
-		struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
-
-		err = init_cntr_names(dd->cntrnames,
-				      dd->cntrnameslen,
-				      num_driver_cntrs,
-				      &num_dev_cntrs,
-				      &dev_cntr_names);
-		if (err) {
-			mutex_unlock(&cntr_names_lock);
-			return NULL;
-		}
-
-		for (i = 0; i < num_driver_cntrs; i++)
-			dev_cntr_names[num_dev_cntrs + i] =
-				driver_cntr_names[i];
-
-		err = init_cntr_names(dd->portcntrnames,
-				      dd->portcntrnameslen,
-				      0,
-				      &num_port_cntrs,
-				      &port_cntr_names);
-		if (err) {
-			kfree(dev_cntr_names);
-			dev_cntr_names = NULL;
-			mutex_unlock(&cntr_names_lock);
-			return NULL;
-		}
-		cntr_names_initialized = 1;
+	if (cntr_names_initialized)
+		goto out_unlock;
+
+	err = init_cntr_names(dd->cntrnames, dd->cntrnameslen, num_driver_cntrs,
+			      &num_dev_cntrs, &dev_cntr_names);
+	if (err)
+		goto out_unlock;
+
+	for (i = 0; i < num_driver_cntrs; i++)
+		dev_cntr_names[num_dev_cntrs + i] = driver_cntr_names[i];
+
+	err = init_cntr_names(dd->portcntrnames, dd->portcntrnameslen, 0,
+			      &num_port_cntrs, &port_cntr_names);
+	if (err) {
+		kfree(dev_cntr_names);
+		dev_cntr_names = NULL;
+		goto out_unlock;
 	}
+	cntr_names_initialized = 1;
+
+out_unlock:
 	mutex_unlock(&cntr_names_lock);
+	return err;
+}
 
-	if (!port_num)
-		return rdma_alloc_hw_stats_struct(
-				dev_cntr_names,
-				num_dev_cntrs + num_driver_cntrs,
-				RDMA_HW_STATS_DEFAULT_LIFESPAN);
-	else
-		return rdma_alloc_hw_stats_struct(
-				port_cntr_names,
-				num_port_cntrs,
-				RDMA_HW_STATS_DEFAULT_LIFESPAN);
+static struct rdma_hw_stats *hfi1_alloc_hw_device_stats(struct ib_device *ibdev)
+{
+	if (init_counters(ibdev))
+		return NULL;
+	return rdma_alloc_hw_stats_struct(dev_cntr_names,
+					  num_dev_cntrs + num_driver_cntrs,
+					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
+
+static struct rdma_hw_stats *hfi_alloc_hw_port_stats(struct ib_device *ibdev,
+						     u32 port_num)
+{
+	if (init_counters(ibdev))
+		return NULL;
+	return rdma_alloc_hw_stats_struct(port_cntr_names, num_port_cntrs,
+					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
 static u64 hfi1_sps_ints(void)
@@ -1787,7 +1786,8 @@ static const struct ib_device_ops hfi1_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_HFI1,
 
-	.alloc_hw_stats = alloc_hw_stats,
+	.alloc_hw_device_stats = hfi1_alloc_hw_device_stats,
+	.alloc_hw_port_stats = hfi_alloc_hw_port_stats,
 	.alloc_rdma_netdev = hfi1_vnic_alloc_rn,
 	.get_dev_fw_str = hfi1_get_dev_fw_str,
 	.get_hw_stats = get_hw_stats,
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index 294155293243..10b602eccbc4 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -3727,20 +3727,17 @@ static void irdma_get_dev_fw_str(struct ib_device *dev, char *str)
 }
 
 /**
- * irdma_alloc_hw_stats - Allocate a hw stats structure
+ * irdma_alloc_hw_port_stats - Allocate a hw stats structure
  * @ibdev: device pointer from stack
  * @port_num: port number
  */
-static struct rdma_hw_stats *irdma_alloc_hw_stats(struct ib_device *ibdev,
-						  u32 port_num)
+static struct rdma_hw_stats *irdma_alloc_hw_port_stats(struct ib_device *ibdev,
+						       u32 port_num)
 {
 	int num_counters = IRDMA_HW_STAT_INDEX_MAX_32 +
 			   IRDMA_HW_STAT_INDEX_MAX_64;
 	unsigned long lifespan = RDMA_HW_STATS_DEFAULT_LIFESPAN;
 
-	if (!port_num)
-		return NULL;
-
 	BUILD_BUG_ON(ARRAY_SIZE(irdma_hw_stat_names) !=
 		     (IRDMA_HW_STAT_INDEX_MAX_32 + IRDMA_HW_STAT_INDEX_MAX_64));
 
@@ -4354,7 +4351,7 @@ static const struct ib_device_ops irdma_dev_ops = {
 	.driver_id = RDMA_DRIVER_IRDMA,
 	.uverbs_abi_ver = IRDMA_ABI_VER,
 
-	.alloc_hw_stats = irdma_alloc_hw_stats,
+	.alloc_hw_port_stats = irdma_alloc_hw_port_stats,
 	.alloc_mr = irdma_alloc_mr,
 	.alloc_mw = irdma_alloc_mw,
 	.alloc_pd = irdma_alloc_pd,
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 22898d97ecbd..341162aa2175 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2105,17 +2105,29 @@ static const struct diag_counter diag_device_only[] = {
 	DIAG_COUNTER(rq_num_udsdprd, 0x118),
 };
 
-static struct rdma_hw_stats *mlx4_ib_alloc_hw_stats(struct ib_device *ibdev,
-						    u32 port_num)
+static struct rdma_hw_stats *
+mlx4_ib_alloc_hw_device_stats(struct ib_device *ibdev)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct mlx4_ib_diag_counters *diag = dev->diag_counters;
 
-	if (!diag[!!port_num].name)
+	if (!diag[0].name)
 		return NULL;
 
-	return rdma_alloc_hw_stats_struct(diag[!!port_num].name,
-					  diag[!!port_num].num_counters,
+	return rdma_alloc_hw_stats_struct(diag[0].name, diag[0].num_counters,
+					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
+
+static struct rdma_hw_stats *
+mlx4_ib_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
+{
+	struct mlx4_ib_dev *dev = to_mdev(ibdev);
+	struct mlx4_ib_diag_counters *diag = dev->diag_counters;
+
+	if (!diag[1].name)
+		return NULL;
+
+	return rdma_alloc_hw_stats_struct(diag[1].name, diag[1].num_counters,
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
@@ -2206,7 +2218,8 @@ static void mlx4_ib_fill_diag_counters(struct mlx4_ib_dev *ibdev,
 }
 
 static const struct ib_device_ops mlx4_ib_hw_stats_ops = {
-	.alloc_hw_stats = mlx4_ib_alloc_hw_stats,
+	.alloc_hw_device_stats = mlx4_ib_alloc_hw_device_stats,
+	.alloc_hw_port_stats = mlx4_ib_alloc_hw_port_stats,
 	.get_hw_stats = mlx4_ib_get_hw_stats,
 };
 
diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index e365341057cb..224ba36f2946 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -161,22 +161,29 @@ u16 mlx5_ib_get_counters_id(struct mlx5_ib_dev *dev, u32 port_num)
 	return cnts->set_id;
 }
 
-static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
-						    u32 port_num)
+static struct rdma_hw_stats *
+mlx5_ib_alloc_hw_device_stats(struct ib_device *ibdev)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
-	const struct mlx5_ib_counters *cnts;
-	bool is_switchdev = is_mdev_switchdev_mode(dev->mdev);
+	const struct mlx5_ib_counters *cnts = &dev->port[0].cnts;
 
-	if ((is_switchdev && port_num) || (!is_switchdev && !port_num))
-		return NULL;
+	return rdma_alloc_hw_stats_struct(cnts->names,
+					  cnts->num_q_counters +
+						  cnts->num_cong_counters +
+						  cnts->num_ext_ppcnt_counters,
+					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
 
-	cnts = get_counters(dev, port_num - 1);
+static struct rdma_hw_stats *
+mlx5_ib_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
+{
+	struct mlx5_ib_dev *dev = to_mdev(ibdev);
+	const struct mlx5_ib_counters *cnts = &dev->port[port_num - 1].cnts;
 
 	return rdma_alloc_hw_stats_struct(cnts->names,
 					  cnts->num_q_counters +
-					  cnts->num_cong_counters +
-					  cnts->num_ext_ppcnt_counters,
+						  cnts->num_cong_counters +
+						  cnts->num_ext_ppcnt_counters,
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
@@ -666,7 +673,17 @@ void mlx5_ib_counters_clear_description(struct ib_counters *counters)
 }
 
 static const struct ib_device_ops hw_stats_ops = {
-	.alloc_hw_stats = mlx5_ib_alloc_hw_stats,
+	.alloc_hw_port_stats = mlx5_ib_alloc_hw_port_stats,
+	.get_hw_stats = mlx5_ib_get_hw_stats,
+	.counter_bind_qp = mlx5_ib_counter_bind_qp,
+	.counter_unbind_qp = mlx5_ib_counter_unbind_qp,
+	.counter_dealloc = mlx5_ib_counter_dealloc,
+	.counter_alloc_stats = mlx5_ib_counter_alloc_stats,
+	.counter_update_stats = mlx5_ib_counter_update_stats,
+};
+
+static const struct ib_device_ops hw_switchdev_stats_ops = {
+	.alloc_hw_device_stats = mlx5_ib_alloc_hw_device_stats,
 	.get_hw_stats = mlx5_ib_get_hw_stats,
 	.counter_bind_qp = mlx5_ib_counter_bind_qp,
 	.counter_unbind_qp = mlx5_ib_counter_unbind_qp,
@@ -690,7 +707,10 @@ int mlx5_ib_counters_init(struct mlx5_ib_dev *dev)
 	if (!MLX5_CAP_GEN(dev->mdev, max_qp_cnt))
 		return 0;
 
-	ib_set_device_ops(&dev->ib_dev, &hw_stats_ops);
+	if (is_mdev_switchdev_mode(dev->mdev))
+		ib_set_device_ops(&dev->ib_dev, &hw_switchdev_stats_ops);
+	else
+		ib_set_device_ops(&dev->ib_dev, &hw_stats_ops);
 	return mlx5_ib_alloc_counters(dev);
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index f469fd1c753d..d5ceb706d964 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -40,13 +40,10 @@ int rxe_ib_get_hw_stats(struct ib_device *ibdev,
 	return ARRAY_SIZE(rxe_counter_name);
 }
 
-struct rdma_hw_stats *rxe_ib_alloc_hw_stats(struct ib_device *ibdev,
-					    u32 port_num)
+struct rdma_hw_stats *rxe_ib_alloc_hw_port_stats(struct ib_device *ibdev,
+						 u32 port_num)
 {
 	BUILD_BUG_ON(ARRAY_SIZE(rxe_counter_name) != RXE_NUM_OF_COUNTERS);
-	/* We support only per port stats */
-	if (!port_num)
-		return NULL;
 
 	return rdma_alloc_hw_stats_struct(rxe_counter_name,
 					  ARRAY_SIZE(rxe_counter_name),
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 2f369acb46d7..71f4d4fa9dc8 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -29,8 +29,8 @@ enum rxe_counters {
 	RXE_NUM_OF_COUNTERS
 };
 
-struct rdma_hw_stats *rxe_ib_alloc_hw_stats(struct ib_device *ibdev,
-					    u32 port_num);
+struct rdma_hw_stats *rxe_ib_alloc_hw_port_stats(struct ib_device *ibdev,
+						 u32 port_num);
 int rxe_ib_get_hw_stats(struct ib_device *ibdev,
 			struct rdma_hw_stats *stats,
 			u32 port, int index);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 33731b5dd9c3..d3df59d897a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1104,7 +1104,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.driver_id = RDMA_DRIVER_RXE,
 	.uverbs_abi_ver = RXE_UVERBS_ABI_VERSION,
 
-	.alloc_hw_stats = rxe_ib_alloc_hw_stats,
+	.alloc_hw_port_stats = rxe_ib_alloc_hw_port_stats,
 	.alloc_mr = rxe_alloc_mr,
 	.alloc_pd = rxe_alloc_pd,
 	.alloc_ucontext = rxe_alloc_ucontext,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 05dbc216eb64..849a06441e29 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2522,13 +2522,14 @@ struct ib_device_ops {
 			    unsigned int *meta_sg_offset);
 
 	/**
-	 * alloc_hw_stats - Allocate a struct rdma_hw_stats and fill in the
-	 *   driver initialized data.  The struct is kfree()'ed by the sysfs
-	 *   core when the device is removed.  A lifespan of -1 in the return
-	 *   struct tells the core to set a default lifespan.
+	 * alloc_hw_[device,port]_stats - Allocate a struct rdma_hw_stats and
+	 *   fill in the driver initialized data.  The struct is kfree()'ed by
+	 *   the sysfs core when the device is removed.  A lifespan of -1 in the
+	 *   return struct tells the core to set a default lifespan.
 	 */
-	struct rdma_hw_stats *(*alloc_hw_stats)(struct ib_device *device,
-						u32 port_num);
+	struct rdma_hw_stats *(*alloc_hw_device_stats)(struct ib_device *device);
+	struct rdma_hw_stats *(*alloc_hw_port_stats)(struct ib_device *device,
+						     u32 port_num);
 	/**
 	 * get_hw_stats - Fill in the counter value(s) in the stats struct.
 	 * @index - The index in the value array we wish to have updated, or
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 01/15] RDMA: Split the alloc_hw_stats() ops to port and device variants Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07 10:23   ` Greg KH
  2021-06-07  8:17 ` [PATCH rdma-next v1 03/15] RDMA/core: Split port and device counter sysfs attributes Leon Romanovsky
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

It is much saner to store a pointer to the kobject structure that contains
the cannonical stats pointer than to copy the stats pointers into a public
structure.

Future patches will require the sysfs pointer for other purposes.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/core_priv.h |  1 +
 drivers/infiniband/core/nldev.c     |  8 ++------
 drivers/infiniband/core/sysfs.c     | 14 +++++++++++---
 include/rdma/ib_verbs.h             |  3 ++-
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 29809dd30041..ec5c2c3db423 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -378,6 +378,7 @@ struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr);
 
 void ib_free_port_attrs(struct ib_core_device *coredev);
 int ib_setup_port_attrs(struct ib_core_device *coredev);
+struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num);
 
 int rdma_compatdev_set(u8 enable);
 
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 01316926cef6..e9b4b2cccaa0 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -2066,7 +2066,8 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
 	}
 
 	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
-	if (!rdma_is_port_valid(device, port)) {
+	stats = ib_get_hw_stats_port(device, port);
+	if (!stats) {
 		ret = -EINVAL;
 		goto err;
 	}
@@ -2088,11 +2089,6 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
 		goto err_msg;
 	}
 
-	stats = device->port_data ? device->port_data[port].hw_stats : NULL;
-	if (stats == NULL) {
-		ret = -EINVAL;
-		goto err_msg;
-	}
 	mutex_lock(&stats->lock);
 
 	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index d11ceff2b4e4..b153dee1e0fa 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -1031,8 +1031,6 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
 			goto err;
 		port->hw_stats_ag = hsag;
 		port->hw_stats = stats;
-		if (device->port_data)
-			device->port_data[port_num].hw_stats = stats;
 	} else {
 		struct kobject *kobj = &device->dev.kobj;
 		ret = sysfs_create_group(kobj, hsag);
@@ -1053,6 +1051,14 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
 	kfree(stats);
 }
 
+struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
+					   u32 port_num)
+{
+	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num))
+		return NULL;
+	return ibdev->port_data[port_num].sysfs->hw_stats;
+}
+
 static int add_port(struct ib_core_device *coredev, int port_num)
 {
 	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
@@ -1171,6 +1177,8 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 		setup_hw_stats(device, p, port_num);
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
+	if (device->port_data && is_full_dev)
+		device->port_data[port_num].sysfs = p;
 
 	kobject_uevent(&p->kobj, KOBJ_ADD);
 	return 0;
@@ -1361,7 +1369,7 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 			free_hsag(&port->kobj, port->hw_stats_ag);
 		kfree(port->hw_stats);
 		if (device->port_data && is_full_dev)
-			device->port_data[port->port_num].hw_stats = NULL;
+			device->port_data[port->port_num].sysfs = NULL;
 
 		if (port->pma_table)
 			sysfs_remove_group(p, port->pma_table);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 849a06441e29..7a4cb7022f91 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -50,6 +50,7 @@ struct ib_uqp_object;
 struct ib_usrq_object;
 struct ib_uwq_object;
 struct rdma_cm_id;
+struct ib_port;
 
 extern struct workqueue_struct *ib_wq;
 extern struct workqueue_struct *ib_comp_wq;
@@ -2182,7 +2183,7 @@ struct ib_port_data {
 	struct net_device __rcu *netdev;
 	struct hlist_node ndev_hash_link;
 	struct rdma_port_counter port_counter;
-	struct rdma_hw_stats *hw_stats;
+	struct ib_port *sysfs;
 };
 
 /* rdma netdev type - specifies protocol type */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 03/15] RDMA/core: Split port and device counter sysfs attributes
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 01/15] RDMA: Split the alloc_hw_stats() ops to port and device variants Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 04/15] RDMA/core: Split gid_attrs related sysfs from add_port() Leon Romanovsky
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

This code creates a 'struct hw_stats_attribute' for each sysfs entry that
contains a naked 'struct attribute' inside.

It then proceeds to attach this same structure to a 'struct device' kobj
and a 'struct ib_port' kobj. However, this violates the typing
requirements.  'struct device' requires the attribute to be a 'struct
device_attribute' and 'struct ib_port' requires the attribute to be
'struct port_attribute'.

This happens to work because the show/store function pointers in all three
structures happen to be at the same offset and happen to be nearly the
same signature. This means when container_of() was used to go between the
wrong two types it still managed to work.

However clang CFI detection notices that the function pointers have a
slightly different signature. As with show/store this was only working
because the device and port struct layouts happened to have the kobj at
the front.

Correct this by have two independent sets of data structures for the port
and device case. The two different attributes correctly include the
port/device_attribute struct and everything from there up is kept
split. The show/store function call chains start with device/port unique
functions that invoke a common show/store function pointer.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 458 ++++++++++++++++++++------------
 include/rdma/ib_verbs.h         |   4 +-
 2 files changed, 292 insertions(+), 170 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index b153dee1e0fa..114fecda9764 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -60,8 +60,7 @@ struct ib_port {
 	struct attribute_group gid_group;
 	struct attribute_group *pkey_group;
 	const struct attribute_group *pma_table;
-	struct attribute_group *hw_stats_ag;
-	struct rdma_hw_stats   *hw_stats;
+	struct hw_stats_port_data *hw_stats_data;
 	u32                     port_num;
 };
 
@@ -85,16 +84,35 @@ struct port_table_attribute {
 	__be16			attr_id;
 };
 
-struct hw_stats_attribute {
-	struct attribute	attr;
-	ssize_t			(*show)(struct kobject *kobj,
-					struct attribute *attr, char *buf);
-	ssize_t			(*store)(struct kobject *kobj,
-					 struct attribute *attr,
-					 const char *buf,
-					 size_t count);
-	int			index;
-	u32			port_num;
+struct hw_stats_device_attribute {
+	struct device_attribute attr;
+	ssize_t (*show)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
+			unsigned int index, unsigned int port_num, char *buf);
+	ssize_t (*store)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
+			 unsigned int index, unsigned int port_num,
+			 const char *buf, size_t count);
+};
+
+struct hw_stats_port_attribute {
+	struct port_attribute attr;
+	ssize_t (*show)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
+			unsigned int index, unsigned int port_num, char *buf);
+	ssize_t (*store)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
+			 unsigned int index, unsigned int port_num,
+			 const char *buf, size_t count);
+};
+
+struct hw_stats_device_data {
+	struct attribute_group group;
+	const struct attribute_group *groups[2];
+	struct rdma_hw_stats *stats;
+	struct hw_stats_device_attribute attrs[];
+};
+
+struct hw_stats_port_data {
+	struct attribute_group group;
+	struct rdma_hw_stats *stats;
+	struct hw_stats_port_attribute attrs[];
 };
 
 static ssize_t port_attr_show(struct kobject *kobj,
@@ -128,6 +146,53 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.store	= port_attr_store
 };
 
+static ssize_t hw_stat_device_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct hw_stats_device_attribute *stat_attr =
+		container_of(attr, struct hw_stats_device_attribute, attr);
+	struct ib_device *ibdev = container_of(dev, struct ib_device, dev);
+
+	return stat_attr->show(ibdev, ibdev->hw_stats_data->stats,
+			       stat_attr - ibdev->hw_stats_data->attrs, 0, buf);
+}
+
+static ssize_t hw_stat_device_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct hw_stats_device_attribute *stat_attr =
+		container_of(attr, struct hw_stats_device_attribute, attr);
+	struct ib_device *ibdev = container_of(dev, struct ib_device, dev);
+
+	return stat_attr->store(ibdev, ibdev->hw_stats_data->stats,
+				stat_attr - ibdev->hw_stats_data->attrs, 0, buf,
+				count);
+}
+
+static ssize_t hw_stat_port_show(struct ib_port *port,
+				 struct port_attribute *attr, char *buf)
+{
+	struct hw_stats_port_attribute *stat_attr =
+		container_of(attr, struct hw_stats_port_attribute, attr);
+
+	return stat_attr->show(port->ibdev, port->hw_stats_data->stats,
+			       stat_attr - port->hw_stats_data->attrs,
+			       port->port_num, buf);
+}
+
+static ssize_t hw_stat_port_store(struct ib_port *port,
+				  struct port_attribute *attr, const char *buf,
+				  size_t count)
+{
+	struct hw_stats_port_attribute *stat_attr =
+		container_of(attr, struct hw_stats_port_attribute, attr);
+
+	return stat_attr->store(port->ibdev, port->hw_stats_data->stats,
+				stat_attr - port->hw_stats_data->attrs,
+				port->port_num, buf, count);
+}
+
 static ssize_t gid_attr_show(struct kobject *kobj,
 			     struct attribute *attr, char *buf)
 {
@@ -835,56 +900,30 @@ static int print_hw_stat(struct ib_device *dev, int port_num,
 	return sysfs_emit(buf, "%llu\n", stats->value[index] + v);
 }
 
-static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
-			     char *buf)
+static ssize_t show_hw_stats(struct ib_device *ibdev,
+			     struct rdma_hw_stats *stats, unsigned int index,
+			     unsigned int port_num, char *buf)
 {
-	struct ib_device *dev;
-	struct ib_port *port;
-	struct hw_stats_attribute *hsa;
-	struct rdma_hw_stats *stats;
 	int ret;
 
-	hsa = container_of(attr, struct hw_stats_attribute, attr);
-	if (!hsa->port_num) {
-		dev = container_of((struct device *)kobj,
-				   struct ib_device, dev);
-		stats = dev->hw_stats;
-	} else {
-		port = container_of(kobj, struct ib_port, kobj);
-		dev = port->ibdev;
-		stats = port->hw_stats;
-	}
 	mutex_lock(&stats->lock);
-	ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
+	ret = update_hw_stats(ibdev, stats, port_num, index);
 	if (ret)
 		goto unlock;
-	ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
+	ret = print_hw_stat(ibdev, port_num, stats, index, buf);
 unlock:
 	mutex_unlock(&stats->lock);
 
 	return ret;
 }
 
-static ssize_t show_stats_lifespan(struct kobject *kobj,
-				   struct attribute *attr,
+static ssize_t show_stats_lifespan(struct ib_device *ibdev,
+				   struct rdma_hw_stats *stats,
+				   unsigned int index, unsigned int port_num,
 				   char *buf)
 {
-	struct hw_stats_attribute *hsa;
-	struct rdma_hw_stats *stats;
 	int msecs;
 
-	hsa = container_of(attr, struct hw_stats_attribute, attr);
-	if (!hsa->port_num) {
-		struct ib_device *dev = container_of((struct device *)kobj,
-						     struct ib_device, dev);
-
-		stats = dev->hw_stats;
-	} else {
-		struct ib_port *p = container_of(kobj, struct ib_port, kobj);
-
-		stats = p->hw_stats;
-	}
-
 	mutex_lock(&stats->lock);
 	msecs = jiffies_to_msecs(stats->lifespan);
 	mutex_unlock(&stats->lock);
@@ -892,12 +931,11 @@ static ssize_t show_stats_lifespan(struct kobject *kobj,
 	return sysfs_emit(buf, "%d\n", msecs);
 }
 
-static ssize_t set_stats_lifespan(struct kobject *kobj,
-				  struct attribute *attr,
-				  const char *buf, size_t count)
+static ssize_t set_stats_lifespan(struct ib_device *ibdev,
+				   struct rdma_hw_stats *stats,
+				   unsigned int index, unsigned int port_num,
+				   const char *buf, size_t count)
 {
-	struct hw_stats_attribute *hsa;
-	struct rdma_hw_stats *stats;
 	int msecs;
 	int jiffies;
 	int ret;
@@ -908,17 +946,6 @@ static ssize_t set_stats_lifespan(struct kobject *kobj,
 	if (msecs < 0 || msecs > 10000)
 		return -EINVAL;
 	jiffies = msecs_to_jiffies(msecs);
-	hsa = container_of(attr, struct hw_stats_attribute, attr);
-	if (!hsa->port_num) {
-		struct ib_device *dev = container_of((struct device *)kobj,
-						     struct ib_device, dev);
-
-		stats = dev->hw_stats;
-	} else {
-		struct ib_port *p = container_of(kobj, struct ib_port, kobj);
-
-		stats = p->hw_stats;
-	}
 
 	mutex_lock(&stats->lock);
 	stats->lifespan = jiffies;
@@ -927,67 +954,125 @@ static ssize_t set_stats_lifespan(struct kobject *kobj,
 	return count;
 }
 
-static void free_hsag(struct kobject *kobj, struct attribute_group *attr_group)
+static struct hw_stats_device_data *
+alloc_hw_stats_device(struct ib_device *ibdev)
 {
-	struct attribute **attr;
+	struct hw_stats_device_data *data;
+	struct rdma_hw_stats *stats;
+
+	if (!ibdev->ops.alloc_hw_device_stats)
+		return ERR_PTR(-EOPNOTSUPP);
+	stats = ibdev->ops.alloc_hw_device_stats(ibdev);
+	if (!stats)
+		return ERR_PTR(-ENOMEM);
+	if (!stats->names || stats->num_counters <= 0)
+		goto err_free_stats;
+
+	/*
+	 * Two extra attribue elements here, one for the lifespan entry and
+	 * one to NULL terminate the list for the sysfs core code
+	 */
+	data = kzalloc(struct_size(data, attrs, stats->num_counters + 1),
+		       GFP_KERNEL);
+	if (!data)
+		goto err_free_stats;
+	data->group.attrs = kcalloc(stats->num_counters + 2,
+				    sizeof(*data->group.attrs), GFP_KERNEL);
+	if (!data->group.attrs)
+		goto err_free_data;
 
-	sysfs_remove_group(kobj, attr_group);
+	mutex_init(&stats->lock);
+	data->group.name = "hw_counters";
+	data->stats = stats;
+	data->groups[0] = &data->group;
+	return data;
 
-	for (attr = attr_group->attrs; *attr; attr++)
-		kfree(*attr);
-	kfree(attr_group);
+err_free_data:
+	kfree(data);
+err_free_stats:
+	kfree(stats);
+	return ERR_PTR(-ENOMEM);
 }
 
-static struct attribute *alloc_hsa(int index, u32 port_num, const char *name)
+static void free_hw_stats_device(struct hw_stats_device_data *data)
 {
-	struct hw_stats_attribute *hsa;
+	kfree(data->group.attrs);
+	kfree(data->stats);
+	kfree(data);
+}
 
-	hsa = kmalloc(sizeof(*hsa), GFP_KERNEL);
-	if (!hsa)
-		return NULL;
+static int setup_hw_device_stats(struct ib_device *ibdev)
+{
+	struct hw_stats_device_attribute *attr;
+	struct hw_stats_device_data *data;
+	int i, ret;
 
-	hsa->attr.name = (char *)name;
-	hsa->attr.mode = S_IRUGO;
-	hsa->show = show_hw_stats;
-	hsa->store = NULL;
-	hsa->index = index;
-	hsa->port_num = port_num;
+	data = alloc_hw_stats_device(ibdev);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
 
-	return &hsa->attr;
-}
+	ret = ibdev->ops.get_hw_stats(ibdev, data->stats, 0,
+				      data->stats->num_counters);
+	if (ret != data->stats->num_counters) {
+		if (WARN_ON(ret >= 0))
+			ret = -EINVAL;
+		goto err_free;
+	}
 
-static struct attribute *alloc_hsa_lifespan(char *name, u32 port_num)
-{
-	struct hw_stats_attribute *hsa;
+	data->stats->timestamp = jiffies;
 
-	hsa = kmalloc(sizeof(*hsa), GFP_KERNEL);
-	if (!hsa)
-		return NULL;
+	for (i = 0; i < data->stats->num_counters; i++) {
+		attr = &data->attrs[i];
+		sysfs_attr_init(&attr->attr.attr);
+		attr->attr.attr.name = data->stats->names[i];
+		attr->attr.attr.mode = 0444;
+		attr->attr.show = hw_stat_device_show;
+		attr->show = show_hw_stats;
+		data->group.attrs[i] = &attr->attr.attr;
+	}
+
+	attr = &data->attrs[i];
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = "lifespan";
+	attr->attr.attr.mode = 0644;
+	attr->attr.show = hw_stat_device_show;
+	attr->show = show_stats_lifespan;
+	attr->attr.store = hw_stat_device_store;
+	attr->store = set_stats_lifespan;
+	data->group.attrs[i] = &attr->attr.attr;
+
+	ibdev->hw_stats_data = data;
+	ret = device_add_groups(&ibdev->dev, data->groups);
+	if (ret)
+		goto err_free;
+	return 0;
 
-	hsa->attr.name = name;
-	hsa->attr.mode = S_IWUSR | S_IRUGO;
-	hsa->show = show_stats_lifespan;
-	hsa->store = set_stats_lifespan;
-	hsa->index = 0;
-	hsa->port_num = port_num;
+err_free:
+	free_hw_stats_device(data);
+	ibdev->hw_stats_data = NULL;
+	return ret;
+}
 
-	return &hsa->attr;
+static void destroy_hw_device_stats(struct ib_device *ibdev)
+{
+	if (!ibdev->hw_stats_data)
+		return;
+	device_remove_groups(&ibdev->dev, ibdev->hw_stats_data->groups);
+	free_hw_stats_device(ibdev->hw_stats_data);
+	ibdev->hw_stats_data = NULL;
 }
 
-static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
-			   u32 port_num)
+static struct hw_stats_port_data *alloc_hw_stats_port(struct ib_port *port)
 {
-	struct attribute_group *hsag;
+	struct ib_device *ibdev = port->ibdev;
+	struct hw_stats_port_data *data;
 	struct rdma_hw_stats *stats;
-	int i, ret;
 
-	if (port_num)
-		stats = device->ops.alloc_hw_port_stats(device, port_num);
-	else
-		stats = device->ops.alloc_hw_device_stats(device);
+	if (!ibdev->ops.alloc_hw_port_stats)
+		return ERR_PTR(-EOPNOTSUPP);
+	stats = ibdev->ops.alloc_hw_port_stats(port->ibdev, port->port_num);
 	if (!stats)
-		return;
-
+		return ERR_PTR(-ENOMEM);
 	if (!stats->names || stats->num_counters <= 0)
 		goto err_free_stats;
 
@@ -995,68 +1080,102 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
 	 * Two extra attribue elements here, one for the lifespan entry and
 	 * one to NULL terminate the list for the sysfs core code
 	 */
-	hsag = kzalloc(sizeof(*hsag) +
-		       sizeof(void *) * (stats->num_counters + 2),
+	data = kzalloc(struct_size(data, attrs, stats->num_counters + 1),
 		       GFP_KERNEL);
-	if (!hsag)
+	if (!data)
 		goto err_free_stats;
+	data->group.attrs = kcalloc(stats->num_counters + 2,
+				    sizeof(*data->group.attrs), GFP_KERNEL);
+	if (!data->group.attrs)
+		goto err_free_data;
 
-	ret = device->ops.get_hw_stats(device, stats, port_num,
-				       stats->num_counters);
-	if (ret != stats->num_counters)
-		goto err_free_hsag;
+	mutex_init(&stats->lock);
+	data->group.name = "hw_counters";
+	data->stats = stats;
+	return data;
 
-	stats->timestamp = jiffies;
+err_free_data:
+	kfree(data);
+err_free_stats:
+	kfree(stats);
+	return ERR_PTR(-ENOMEM);
+}
 
-	hsag->name = "hw_counters";
-	hsag->attrs = (void *)hsag + sizeof(*hsag);
+static void free_hw_stats_port(struct hw_stats_port_data *data)
+{
+	kfree(data->group.attrs);
+	kfree(data->stats);
+	kfree(data);
+}
 
-	for (i = 0; i < stats->num_counters; i++) {
-		hsag->attrs[i] = alloc_hsa(i, port_num, stats->names[i]);
-		if (!hsag->attrs[i])
-			goto err;
-		sysfs_attr_init(hsag->attrs[i]);
-	}
+static int setup_hw_port_stats(struct ib_port *port)
+{
+	struct hw_stats_port_attribute *attr;
+	struct hw_stats_port_data *data;
+	int i, ret;
 
-	mutex_init(&stats->lock);
-	/* treat an error here as non-fatal */
-	hsag->attrs[i] = alloc_hsa_lifespan("lifespan", port_num);
-	if (hsag->attrs[i])
-		sysfs_attr_init(hsag->attrs[i]);
-
-	if (port) {
-		struct kobject *kobj = &port->kobj;
-		ret = sysfs_create_group(kobj, hsag);
-		if (ret)
-			goto err;
-		port->hw_stats_ag = hsag;
-		port->hw_stats = stats;
-	} else {
-		struct kobject *kobj = &device->dev.kobj;
-		ret = sysfs_create_group(kobj, hsag);
-		if (ret)
-			goto err;
-		device->hw_stats_ag = hsag;
-		device->hw_stats = stats;
+	data = alloc_hw_stats_port(port);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
+
+	ret = port->ibdev->ops.get_hw_stats(port->ibdev, data->stats,
+					    port->port_num,
+					    data->stats->num_counters);
+	if (ret != data->stats->num_counters) {
+		if (WARN_ON(ret >= 0))
+			ret = -EINVAL;
+		goto err_free;
+	}
+	data->stats->timestamp = jiffies;
+
+	for (i = 0; i < data->stats->num_counters; i++) {
+		attr = &data->attrs[i];
+		sysfs_attr_init(&attr->attr.attr);
+		attr->attr.attr.name = data->stats->names[i];
+		attr->attr.attr.mode = 0444;
+		attr->attr.show = hw_stat_port_show;
+		attr->show = show_hw_stats;
+		data->group.attrs[i] = &attr->attr.attr;
 	}
 
-	return;
+	attr = &data->attrs[i];
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = "lifespan";
+	attr->attr.attr.mode = 0644;
+	attr->attr.show = hw_stat_port_show;
+	attr->show = show_stats_lifespan;
+	attr->attr.store = hw_stat_port_store;
+	attr->store = set_stats_lifespan;
+	data->group.attrs[i] = &attr->attr.attr;
+
+	port->hw_stats_data = data;
+	ret = sysfs_create_group(&port->kobj, &data->group);
+	if (ret)
+		goto err_free;
+	return 0;
 
-err:
-	for (; i >= 0; i--)
-		kfree(hsag->attrs[i]);
-err_free_hsag:
-	kfree(hsag);
-err_free_stats:
-	kfree(stats);
+err_free:
+	free_hw_stats_port(data);
+	port->hw_stats_data = NULL;
+	return ret;
+}
+
+static void destroy_hw_port_stats(struct ib_port *port)
+{
+	if (!port->hw_stats_data)
+		return;
+	sysfs_remove_group(&port->kobj, &port->hw_stats_data->group);
+	free_hw_stats_port(port->hw_stats_data);
+	port->hw_stats_data = NULL;
 }
 
 struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
 					   u32 port_num)
 {
-	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num))
+	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num) ||
+	    !ibdev->port_data[port_num].sysfs->hw_stats_data)
 		return NULL;
-	return ibdev->port_data[port_num].sysfs->hw_stats;
+	return ibdev->port_data[port_num].sysfs->hw_stats_data->stats;
 }
 
 static int add_port(struct ib_core_device *coredev, int port_num)
@@ -1161,21 +1280,23 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 			goto err_free_pkey;
 	}
 
+	/*
+	 * If port == 0, it means hw_counters are per device and not per
+	 * port, so holder should be device. Therefore skip per port
+	 * counter initialization.
+	 */
+	if (port_num && is_full_dev) {
+		ret = setup_hw_port_stats(p);
+		if (ret && ret != -EOPNOTSUPP)
+			goto err_remove_pkey;
+	}
 
 	if (device->ops.init_port && is_full_dev) {
 		ret = device->ops.init_port(device, port_num, &p->kobj);
 		if (ret)
-			goto err_remove_pkey;
+			goto err_remove_stats;
 	}
 
-	/*
-	 * If port == 0, it means hw_counters are per device and not per
-	 * port, so holder should be device. Therefore skip per port conunter
-	 * initialization.
-	 */
-	if (device->ops.alloc_hw_port_stats && port_num && is_full_dev)
-		setup_hw_stats(device, p, port_num);
-
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
 	if (device->port_data && is_full_dev)
 		device->port_data[port_num].sysfs = p;
@@ -1183,6 +1304,9 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	kobject_uevent(&p->kobj, KOBJ_ADD);
 	return 0;
 
+err_remove_stats:
+	destroy_hw_port_stats(p);
+
 err_remove_pkey:
 	if (p->pkey_group)
 		sysfs_remove_group(&p->kobj, p->pkey_group);
@@ -1365,9 +1489,7 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 
 		list_del(&p->entry);
-		if (port->hw_stats_ag)
-			free_hsag(&port->kobj, port->hw_stats_ag);
-		kfree(port->hw_stats);
+		destroy_hw_port_stats(port);
 		if (device->port_data && is_full_dev)
 			device->port_data[port->port_num].sysfs = NULL;
 
@@ -1419,18 +1541,18 @@ int ib_device_register_sysfs(struct ib_device *device)
 	if (ret)
 		return ret;
 
-	if (device->ops.alloc_hw_device_stats)
-		setup_hw_stats(device, NULL, 0);
+	ret = setup_hw_device_stats(device);
+	if (ret && ret != -EOPNOTSUPP) {
+		ib_free_port_attrs(&device->coredev);
+		return ret;
+	}
 
 	return 0;
 }
 
 void ib_device_unregister_sysfs(struct ib_device *device)
 {
-	if (device->hw_stats_ag)
-		free_hsag(&device->dev.kobj, device->hw_stats_ag);
-	kfree(device->hw_stats);
-
+	destroy_hw_device_stats(device);
 	ib_free_port_attrs(&device->coredev);
 }
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7a4cb7022f91..0dc7ab1a8dcf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -51,6 +51,7 @@ struct ib_usrq_object;
 struct ib_uwq_object;
 struct rdma_cm_id;
 struct ib_port;
+struct hw_stats_device_data;
 
 extern struct workqueue_struct *ib_wq;
 extern struct workqueue_struct *ib_comp_wq;
@@ -2695,8 +2696,7 @@ struct ib_device {
 	u8                           node_type;
 	u32			     phys_port_cnt;
 	struct ib_device_attr        attrs;
-	struct attribute_group	     *hw_stats_ag;
-	struct rdma_hw_stats         *hw_stats;
+	struct hw_stats_device_data *hw_stats_data;
 
 #ifdef CONFIG_CGROUP_RDMA
 	struct rdmacg_device         cg_device;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 04/15] RDMA/core: Split gid_attrs related sysfs from add_port()
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (2 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 03/15] RDMA/core: Split port and device counter sysfs attributes Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 05/15] RDMA/core: Simplify how the gid_attrs sysfs is created Leon Romanovsky
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

The gid_attrs directory is a dedicated kobj nested under the port,
construct/destruct it with its own pair of functions for
understandability. This is much more readable than having it weirdly
inlined out of order into the add_port() function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 160 ++++++++++++++++++--------------
 1 file changed, 89 insertions(+), 71 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 114fecda9764..d2a089a6f666 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -1178,6 +1178,85 @@ struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
 	return ibdev->port_data[port_num].sysfs->hw_stats_data->stats;
 }
 
+/*
+ * Create the sysfs:
+ *  ibp0s9/ports/XX/gid_attrs/{ndevs,types}/YYY
+ * YYY is the gid table index in decimal
+ */
+static int setup_gid_attrs(struct ib_port *port,
+			   const struct ib_port_attr *attr)
+{
+	struct gid_attr_group *gid_attr_group;
+	int ret;
+	int i;
+
+	gid_attr_group = kzalloc(sizeof(*gid_attr_group), GFP_KERNEL);
+	if (!gid_attr_group)
+		return -ENOMEM;
+
+	gid_attr_group->port = port;
+	ret = kobject_init_and_add(&gid_attr_group->kobj, &gid_attr_type,
+				   &port->kobj, "gid_attrs");
+	if (ret)
+		goto err_put_gid_attrs;
+
+	gid_attr_group->ndev.name = "ndevs";
+	gid_attr_group->ndev.attrs =
+		alloc_group_attrs(show_port_gid_attr_ndev, attr->gid_tbl_len);
+	if (!gid_attr_group->ndev.attrs) {
+		ret = -ENOMEM;
+		goto err_put_gid_attrs;
+	}
+
+	ret = sysfs_create_group(&gid_attr_group->kobj, &gid_attr_group->ndev);
+	if (ret)
+		goto err_free_gid_ndev;
+
+	gid_attr_group->type.name = "types";
+	gid_attr_group->type.attrs = alloc_group_attrs(
+		show_port_gid_attr_gid_type, attr->gid_tbl_len);
+	if (!gid_attr_group->type.attrs) {
+		ret = -ENOMEM;
+		goto err_remove_gid_ndev;
+	}
+
+	ret = sysfs_create_group(&gid_attr_group->kobj, &gid_attr_group->type);
+	if (ret)
+		goto err_free_gid_type;
+
+	port->gid_attr_group = gid_attr_group;
+	return 0;
+
+err_free_gid_type:
+	for (i = 0; i < attr->gid_tbl_len; ++i)
+		kfree(gid_attr_group->type.attrs[i]);
+
+	kfree(gid_attr_group->type.attrs);
+	gid_attr_group->type.attrs = NULL;
+err_remove_gid_ndev:
+	sysfs_remove_group(&gid_attr_group->kobj, &gid_attr_group->ndev);
+err_free_gid_ndev:
+	for (i = 0; i < attr->gid_tbl_len; ++i)
+		kfree(gid_attr_group->ndev.attrs[i]);
+
+	kfree(gid_attr_group->ndev.attrs);
+	gid_attr_group->ndev.attrs = NULL;
+err_put_gid_attrs:
+	kobject_put(&gid_attr_group->kobj);
+	return ret;
+}
+
+static void destroy_gid_attrs(struct ib_port *port)
+{
+	struct gid_attr_group *gid_attr_group = port->gid_attr_group;
+
+	sysfs_remove_group(&gid_attr_group->kobj,
+			   &gid_attr_group->ndev);
+	sysfs_remove_group(&gid_attr_group->kobj,
+			   &gid_attr_group->type);
+	kobject_put(&gid_attr_group->kobj);
+}
+
 static int add_port(struct ib_core_device *coredev, int port_num)
 {
 	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
@@ -1204,23 +1283,11 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	if (ret)
 		goto err_put;
 
-	p->gid_attr_group = kzalloc(sizeof(*p->gid_attr_group), GFP_KERNEL);
-	if (!p->gid_attr_group) {
-		ret = -ENOMEM;
-		goto err_put;
-	}
-
-	p->gid_attr_group->port = p;
-	ret = kobject_init_and_add(&p->gid_attr_group->kobj, &gid_attr_type,
-				   &p->kobj, "gid_attrs");
-	if (ret)
-		goto err_put_gid_attrs;
-
 	if (device->ops.process_mad && is_full_dev) {
 		p->pma_table = get_counter_table(device, port_num);
 		ret = sysfs_create_group(&p->kobj, p->pma_table);
 		if (ret)
-			goto err_put_gid_attrs;
+			goto err_put;
 	}
 
 	p->gid_group.name  = "gids";
@@ -1234,37 +1301,11 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	if (ret)
 		goto err_free_gid;
 
-	p->gid_attr_group->ndev.name = "ndevs";
-	p->gid_attr_group->ndev.attrs = alloc_group_attrs(show_port_gid_attr_ndev,
-							  attr.gid_tbl_len);
-	if (!p->gid_attr_group->ndev.attrs) {
-		ret = -ENOMEM;
-		goto err_remove_gid;
-	}
-
-	ret = sysfs_create_group(&p->gid_attr_group->kobj,
-				 &p->gid_attr_group->ndev);
-	if (ret)
-		goto err_free_gid_ndev;
-
-	p->gid_attr_group->type.name = "types";
-	p->gid_attr_group->type.attrs = alloc_group_attrs(show_port_gid_attr_gid_type,
-							  attr.gid_tbl_len);
-	if (!p->gid_attr_group->type.attrs) {
-		ret = -ENOMEM;
-		goto err_remove_gid_ndev;
-	}
-
-	ret = sysfs_create_group(&p->gid_attr_group->kobj,
-				 &p->gid_attr_group->type);
-	if (ret)
-		goto err_free_gid_type;
-
 	if (attr.pkey_tbl_len) {
 		p->pkey_group = kzalloc(sizeof(*p->pkey_group), GFP_KERNEL);
 		if (!p->pkey_group) {
 			ret = -ENOMEM;
-			goto err_remove_gid_type;
+			goto err_remove_gid;
 		}
 
 		p->pkey_group->name  = "pkeys";
@@ -1290,11 +1331,14 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 		if (ret && ret != -EOPNOTSUPP)
 			goto err_remove_pkey;
 	}
+	ret = setup_gid_attrs(p, &attr);
+	if (ret)
+		goto err_remove_stats;
 
 	if (device->ops.init_port && is_full_dev) {
 		ret = device->ops.init_port(device, port_num, &p->kobj);
 		if (ret)
-			goto err_remove_stats;
+			goto err_remove_gid_attrs;
 	}
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
@@ -1304,6 +1348,9 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	kobject_uevent(&p->kobj, KOBJ_ADD);
 	return 0;
 
+err_remove_gid_attrs:
+	destroy_gid_attrs(p);
+
 err_remove_stats:
 	destroy_hw_port_stats(p);
 
@@ -1323,28 +1370,6 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 err_free_pkey_group:
 	kfree(p->pkey_group);
 
-err_remove_gid_type:
-	sysfs_remove_group(&p->gid_attr_group->kobj,
-			   &p->gid_attr_group->type);
-
-err_free_gid_type:
-	for (i = 0; i < attr.gid_tbl_len; ++i)
-		kfree(p->gid_attr_group->type.attrs[i]);
-
-	kfree(p->gid_attr_group->type.attrs);
-	p->gid_attr_group->type.attrs = NULL;
-
-err_remove_gid_ndev:
-	sysfs_remove_group(&p->gid_attr_group->kobj,
-			   &p->gid_attr_group->ndev);
-
-err_free_gid_ndev:
-	for (i = 0; i < attr.gid_tbl_len; ++i)
-		kfree(p->gid_attr_group->ndev.attrs[i]);
-
-	kfree(p->gid_attr_group->ndev.attrs);
-	p->gid_attr_group->ndev.attrs = NULL;
-
 err_remove_gid:
 	sysfs_remove_group(&p->kobj, &p->gid_group);
 
@@ -1359,9 +1384,6 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	if (p->pma_table)
 		sysfs_remove_group(&p->kobj, p->pma_table);
 
-err_put_gid_attrs:
-	kobject_put(&p->gid_attr_group->kobj);
-
 err_put:
 	kobject_put(&p->kobj);
 	return ret;
@@ -1498,11 +1520,7 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 		if (port->pkey_group)
 			sysfs_remove_group(p, port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
-		sysfs_remove_group(&port->gid_attr_group->kobj,
-				   &port->gid_attr_group->ndev);
-		sysfs_remove_group(&port->gid_attr_group->kobj,
-				   &port->gid_attr_group->type);
-		kobject_put(&port->gid_attr_group->kobj);
+		destroy_gid_attrs(port);
 		kobject_put(p);
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 05/15] RDMA/core: Simplify how the gid_attrs sysfs is created
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (3 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 04/15] RDMA/core: Split gid_attrs related sysfs from add_port() Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 06/15] RDMA/core: Simplify how the port " Leon Romanovsky
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

Instead of having an whole bunch of different allocations to create the
gid_attr kobjects reduce it to three, one for the kobj struct plus the
attributes, and one for the attribute list for each of the two
groups.

Move the freeing of all allocations to the release function.

Reorder the operations so all the allocations happen first then the
kobject & sysfs operations are last.

This removes the majority of the complicated error unwind since the
release function will always undo all the memory allocations. Freeing the
memory is also much simpler since there is a lot less of it.

Consolidate creating the "group of array indexes" pattern into one helper
function. Ensure kobject_del is used.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 170 +++++++++++++++++---------------
 1 file changed, 89 insertions(+), 81 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index d2a089a6f666..006bf759e890 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -47,23 +47,6 @@
 
 struct ib_port;
 
-struct gid_attr_group {
-	struct ib_port		*port;
-	struct kobject		kobj;
-	struct attribute_group	ndev;
-	struct attribute_group	type;
-};
-struct ib_port {
-	struct kobject         kobj;
-	struct ib_device      *ibdev;
-	struct gid_attr_group *gid_attr_group;
-	struct attribute_group gid_group;
-	struct attribute_group *pkey_group;
-	const struct attribute_group *pma_table;
-	struct hw_stats_port_data *hw_stats_data;
-	u32                     port_num;
-};
-
 struct port_attribute {
 	struct attribute attr;
 	ssize_t (*show)(struct ib_port *, struct port_attribute *, char *buf);
@@ -84,6 +67,25 @@ struct port_table_attribute {
 	__be16			attr_id;
 };
 
+struct gid_attr_group {
+	struct ib_port *port;
+	struct kobject kobj;
+	struct attribute_group groups[2];
+	const struct attribute_group *groups_list[3];
+	struct port_table_attribute attrs_list[];
+};
+
+struct ib_port {
+	struct kobject kobj;
+	struct ib_device *ibdev;
+	struct gid_attr_group *gid_attr_group;
+	struct attribute_group gid_group;
+	struct attribute_group *pkey_group;
+	const struct attribute_group *pma_table;
+	struct hw_stats_port_data *hw_stats_data;
+	u32 port_num;
+};
+
 struct hw_stats_device_attribute {
 	struct device_attribute attr;
 	ssize_t (*show)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
@@ -776,26 +778,13 @@ static void ib_port_release(struct kobject *kobj)
 
 static void ib_port_gid_attr_release(struct kobject *kobj)
 {
-	struct gid_attr_group *g = container_of(kobj, struct gid_attr_group,
-						kobj);
-	struct attribute *a;
+	struct gid_attr_group *gid_attr_group =
+		container_of(kobj, struct gid_attr_group, kobj);
 	int i;
 
-	if (g->ndev.attrs) {
-		for (i = 0; (a = g->ndev.attrs[i]); ++i)
-			kfree(a);
-
-		kfree(g->ndev.attrs);
-	}
-
-	if (g->type.attrs) {
-		for (i = 0; (a = g->type.attrs[i]); ++i)
-			kfree(a);
-
-		kfree(g->type.attrs);
-	}
-
-	kfree(g);
+	for (i = 0; i != ARRAY_SIZE(gid_attr_group->groups); i++)
+		kfree(gid_attr_group->groups[i].attrs);
+	kfree(gid_attr_group);
 }
 
 static struct kobj_type port_type = {
@@ -1178,6 +1167,41 @@ struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
 	return ibdev->port_data[port_num].sysfs->hw_stats_data->stats;
 }
 
+static int alloc_port_table_group(
+	const char *name, struct attribute_group *group,
+	struct port_table_attribute *attrs, size_t num,
+	ssize_t (*show)(struct ib_port *, struct port_attribute *, char *buf))
+{
+	struct attribute **attr_list;
+	int i;
+
+	attr_list = kcalloc(num + 1, sizeof(*attr_list), GFP_KERNEL);
+	if (!attr_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num; i++) {
+		struct port_table_attribute *element = &attrs[i];
+
+		if (snprintf(element->name, sizeof(element->name), "%d", i) >=
+		    sizeof(element->name))
+			goto err;
+
+		sysfs_attr_init(&element->attr.attr);
+		element->attr.attr.name = element->name;
+		element->attr.attr.mode = 0444;
+		element->attr.show = show;
+		element->index = i;
+
+		attr_list[i] = &element->attr.attr;
+	}
+	group->name = name;
+	group->attrs = attr_list;
+	return 0;
+err:
+	kfree(attr_list);
+	return -EINVAL;
+}
+
 /*
  * Create the sysfs:
  *  ibp0s9/ports/XX/gid_attrs/{ndevs,types}/YYY
@@ -1188,60 +1212,44 @@ static int setup_gid_attrs(struct ib_port *port,
 {
 	struct gid_attr_group *gid_attr_group;
 	int ret;
-	int i;
 
-	gid_attr_group = kzalloc(sizeof(*gid_attr_group), GFP_KERNEL);
+	gid_attr_group = kzalloc(struct_size(gid_attr_group, attrs_list,
+					     attr->gid_tbl_len * 2),
+				 GFP_KERNEL);
 	if (!gid_attr_group)
 		return -ENOMEM;
-
 	gid_attr_group->port = port;
-	ret = kobject_init_and_add(&gid_attr_group->kobj, &gid_attr_type,
-				   &port->kobj, "gid_attrs");
-	if (ret)
-		goto err_put_gid_attrs;
-
-	gid_attr_group->ndev.name = "ndevs";
-	gid_attr_group->ndev.attrs =
-		alloc_group_attrs(show_port_gid_attr_ndev, attr->gid_tbl_len);
-	if (!gid_attr_group->ndev.attrs) {
-		ret = -ENOMEM;
-		goto err_put_gid_attrs;
-	}
+	kobject_init(&gid_attr_group->kobj, &gid_attr_type);
 
-	ret = sysfs_create_group(&gid_attr_group->kobj, &gid_attr_group->ndev);
+	ret = alloc_port_table_group("ndevs", &gid_attr_group->groups[0],
+				     gid_attr_group->attrs_list,
+				     attr->gid_tbl_len,
+				     show_port_gid_attr_ndev);
 	if (ret)
-		goto err_free_gid_ndev;
-
-	gid_attr_group->type.name = "types";
-	gid_attr_group->type.attrs = alloc_group_attrs(
-		show_port_gid_attr_gid_type, attr->gid_tbl_len);
-	if (!gid_attr_group->type.attrs) {
-		ret = -ENOMEM;
-		goto err_remove_gid_ndev;
-	}
+		goto err_put;
+	gid_attr_group->groups_list[0] = &gid_attr_group->groups[0];
 
-	ret = sysfs_create_group(&gid_attr_group->kobj, &gid_attr_group->type);
+	ret = alloc_port_table_group(
+		"types", &gid_attr_group->groups[1],
+		gid_attr_group->attrs_list + attr->gid_tbl_len,
+		attr->gid_tbl_len, show_port_gid_attr_gid_type);
 	if (ret)
-		goto err_free_gid_type;
+		goto err_put;
+	gid_attr_group->groups_list[1] = &gid_attr_group->groups[1];
 
+	ret = kobject_add(&gid_attr_group->kobj, &port->kobj, "gid_attrs");
+	if (ret)
+		goto err_put;
+	ret = sysfs_create_groups(&gid_attr_group->kobj,
+				  gid_attr_group->groups_list);
+	if (ret)
+		goto err_del;
 	port->gid_attr_group = gid_attr_group;
 	return 0;
 
-err_free_gid_type:
-	for (i = 0; i < attr->gid_tbl_len; ++i)
-		kfree(gid_attr_group->type.attrs[i]);
-
-	kfree(gid_attr_group->type.attrs);
-	gid_attr_group->type.attrs = NULL;
-err_remove_gid_ndev:
-	sysfs_remove_group(&gid_attr_group->kobj, &gid_attr_group->ndev);
-err_free_gid_ndev:
-	for (i = 0; i < attr->gid_tbl_len; ++i)
-		kfree(gid_attr_group->ndev.attrs[i]);
-
-	kfree(gid_attr_group->ndev.attrs);
-	gid_attr_group->ndev.attrs = NULL;
-err_put_gid_attrs:
+err_del:
+	kobject_del(&gid_attr_group->kobj);
+err_put:
 	kobject_put(&gid_attr_group->kobj);
 	return ret;
 }
@@ -1250,10 +1258,10 @@ static void destroy_gid_attrs(struct ib_port *port)
 {
 	struct gid_attr_group *gid_attr_group = port->gid_attr_group;
 
-	sysfs_remove_group(&gid_attr_group->kobj,
-			   &gid_attr_group->ndev);
-	sysfs_remove_group(&gid_attr_group->kobj,
-			   &gid_attr_group->type);
+	if (!gid_attr_group)
+		return;
+	sysfs_remove_groups(&gid_attr_group->kobj, gid_attr_group->groups_list);
+	kobject_del(&gid_attr_group->kobj);
 	kobject_put(&gid_attr_group->kobj);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 06/15] RDMA/core: Simplify how the port sysfs is created
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (4 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 05/15] RDMA/core: Simplify how the gid_attrs sysfs is created Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 07/15] RDMA/core: Create the device hw_counters through the normal groups mechanism Leon Romanovsky
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

Use the same technique as gid_attrs now uses to manage the port
sysfs. Bundle everything into three allocations and use a single
sysfs_create_groups() to build everything in one shot.

All the memory is always freed in the kobj release function, removing most
of the error unwinding.

The gid_attr technique and the hw_counters are very similar, merge the two
together and combine the sysfs_create_group() call for hw_counters with
the single sysfs group setup.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 322 ++++++++++----------------------
 1 file changed, 103 insertions(+), 219 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 006bf759e890..2631c179e004 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -79,11 +79,12 @@ struct ib_port {
 	struct kobject kobj;
 	struct ib_device *ibdev;
 	struct gid_attr_group *gid_attr_group;
-	struct attribute_group gid_group;
-	struct attribute_group *pkey_group;
-	const struct attribute_group *pma_table;
 	struct hw_stats_port_data *hw_stats_data;
+
+	struct attribute_group groups[3];
+	const struct attribute_group *groups_list[5];
 	u32 port_num;
+	struct port_table_attribute attrs_list[];
 };
 
 struct hw_stats_device_attribute {
@@ -112,7 +113,6 @@ struct hw_stats_device_data {
 };
 
 struct hw_stats_port_data {
-	struct attribute_group group;
 	struct rdma_hw_stats *stats;
 	struct hw_stats_port_attribute attrs[];
 };
@@ -750,30 +750,15 @@ static const struct attribute_group pma_group_noietf = {
 
 static void ib_port_release(struct kobject *kobj)
 {
-	struct ib_port *p = container_of(kobj, struct ib_port, kobj);
-	struct attribute *a;
+	struct ib_port *port = container_of(kobj, struct ib_port, kobj);
 	int i;
 
-	if (p->gid_group.attrs) {
-		for (i = 0; (a = p->gid_group.attrs[i]); ++i)
-			kfree(a);
-
-		kfree(p->gid_group.attrs);
-	}
-
-	if (p->pkey_group) {
-		if (p->pkey_group->attrs) {
-			for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
-				kfree(a);
-
-			kfree(p->pkey_group->attrs);
-		}
-
-		kfree(p->pkey_group);
-		p->pkey_group = NULL;
-	}
-
-	kfree(p);
+	for (i = 0; i != ARRAY_SIZE(port->groups); i++)
+		kfree(port->groups[i].attrs);
+	if (port->hw_stats_data)
+		kfree(port->hw_stats_data->stats);
+	kfree(port->hw_stats_data);
+	kfree(port);
 }
 
 static void ib_port_gid_attr_release(struct kobject *kobj)
@@ -798,49 +783,6 @@ static struct kobj_type gid_attr_type = {
 	.release        = ib_port_gid_attr_release
 };
 
-static struct attribute **
-alloc_group_attrs(ssize_t (*show)(struct ib_port *,
-				  struct port_attribute *, char *buf),
-		  int len)
-{
-	struct attribute **tab_attr;
-	struct port_table_attribute *element;
-	int i;
-
-	tab_attr = kcalloc(1 + len, sizeof(struct attribute *), GFP_KERNEL);
-	if (!tab_attr)
-		return NULL;
-
-	for (i = 0; i < len; i++) {
-		element = kzalloc(sizeof(struct port_table_attribute),
-				  GFP_KERNEL);
-		if (!element)
-			goto err;
-
-		if (snprintf(element->name, sizeof(element->name),
-			     "%d", i) >= sizeof(element->name)) {
-			kfree(element);
-			goto err;
-		}
-
-		element->attr.attr.name  = element->name;
-		element->attr.attr.mode  = S_IRUGO;
-		element->attr.show       = show;
-		element->index		 = i;
-		sysfs_attr_init(&element->attr.attr);
-
-		tab_attr[i] = &element->attr.attr;
-	}
-
-	return tab_attr;
-
-err:
-	while (--i >= 0)
-		kfree(tab_attr[i]);
-	kfree(tab_attr);
-	return NULL;
-}
-
 /*
  * Figure out which counter table to use depending on
  * the device capabilities.
@@ -1051,7 +993,8 @@ static void destroy_hw_device_stats(struct ib_device *ibdev)
 	ibdev->hw_stats_data = NULL;
 }
 
-static struct hw_stats_port_data *alloc_hw_stats_port(struct ib_port *port)
+static struct hw_stats_port_data *
+alloc_hw_stats_port(struct ib_port *port, struct attribute_group *group)
 {
 	struct ib_device *ibdev = port->ibdev;
 	struct hw_stats_port_data *data;
@@ -1073,13 +1016,13 @@ static struct hw_stats_port_data *alloc_hw_stats_port(struct ib_port *port)
 		       GFP_KERNEL);
 	if (!data)
 		goto err_free_stats;
-	data->group.attrs = kcalloc(stats->num_counters + 2,
-				    sizeof(*data->group.attrs), GFP_KERNEL);
-	if (!data->group.attrs)
+	group->attrs = kcalloc(stats->num_counters + 2,
+				    sizeof(*group->attrs), GFP_KERNEL);
+	if (!group->attrs)
 		goto err_free_data;
 
 	mutex_init(&stats->lock);
-	data->group.name = "hw_counters";
+	group->name = "hw_counters";
 	data->stats = stats;
 	return data;
 
@@ -1090,20 +1033,14 @@ static struct hw_stats_port_data *alloc_hw_stats_port(struct ib_port *port)
 	return ERR_PTR(-ENOMEM);
 }
 
-static void free_hw_stats_port(struct hw_stats_port_data *data)
-{
-	kfree(data->group.attrs);
-	kfree(data->stats);
-	kfree(data);
-}
-
-static int setup_hw_port_stats(struct ib_port *port)
+static int setup_hw_port_stats(struct ib_port *port,
+			       struct attribute_group *group)
 {
 	struct hw_stats_port_attribute *attr;
 	struct hw_stats_port_data *data;
 	int i, ret;
 
-	data = alloc_hw_stats_port(port);
+	data = alloc_hw_stats_port(port, group);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
@@ -1112,9 +1049,10 @@ static int setup_hw_port_stats(struct ib_port *port)
 					    data->stats->num_counters);
 	if (ret != data->stats->num_counters) {
 		if (WARN_ON(ret >= 0))
-			ret = -EINVAL;
-		goto err_free;
+			return -EINVAL;
+		return ret;
 	}
+
 	data->stats->timestamp = jiffies;
 
 	for (i = 0; i < data->stats->num_counters; i++) {
@@ -1124,7 +1062,7 @@ static int setup_hw_port_stats(struct ib_port *port)
 		attr->attr.attr.mode = 0444;
 		attr->attr.show = hw_stat_port_show;
 		attr->show = show_hw_stats;
-		data->group.attrs[i] = &attr->attr.attr;
+		group->attrs[i] = &attr->attr.attr;
 	}
 
 	attr = &data->attrs[i];
@@ -1135,27 +1073,10 @@ static int setup_hw_port_stats(struct ib_port *port)
 	attr->show = show_stats_lifespan;
 	attr->attr.store = hw_stat_port_store;
 	attr->store = set_stats_lifespan;
-	data->group.attrs[i] = &attr->attr.attr;
+	group->attrs[i] = &attr->attr.attr;
 
 	port->hw_stats_data = data;
-	ret = sysfs_create_group(&port->kobj, &data->group);
-	if (ret)
-		goto err_free;
 	return 0;
-
-err_free:
-	free_hw_stats_port(data);
-	port->hw_stats_data = NULL;
-	return ret;
-}
-
-static void destroy_hw_port_stats(struct ib_port *port)
-{
-	if (!port->hw_stats_data)
-		return;
-	sysfs_remove_group(&port->kobj, &port->hw_stats_data->group);
-	free_hw_stats_port(port->hw_stats_data);
-	port->hw_stats_data = NULL;
 }
 
 struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
@@ -1265,68 +1186,42 @@ static void destroy_gid_attrs(struct ib_port *port)
 	kobject_put(&gid_attr_group->kobj);
 }
 
-static int add_port(struct ib_core_device *coredev, int port_num)
+/*
+ * Create the sysfs:
+ *  ibp0s9/ports/XX/{gids,pkeys,counters}/YYY
+ */
+static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
+				  const struct ib_port_attr *attr)
 {
 	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
 	bool is_full_dev = &device->coredev == coredev;
+	const struct attribute_group **cur_group;
 	struct ib_port *p;
-	struct ib_port_attr attr;
-	int i;
 	int ret;
 
-	ret = ib_query_port(device, port_num, &attr);
-	if (ret)
-		return ret;
-
-	p = kzalloc(sizeof *p, GFP_KERNEL);
+	p = kzalloc(struct_size(p, attrs_list,
+				attr->gid_tbl_len + attr->pkey_tbl_len),
+		    GFP_KERNEL);
 	if (!p)
-		return -ENOMEM;
-
-	p->ibdev      = device;
-	p->port_num   = port_num;
+		return ERR_PTR(-ENOMEM);
+	p->ibdev = device;
+	p->port_num = port_num;
+	kobject_init(&p->kobj, &port_type);
 
-	ret = kobject_init_and_add(&p->kobj, &port_type,
-				   coredev->ports_kobj,
-				   "%d", port_num);
+	cur_group = p->groups_list;
+	ret = alloc_port_table_group("gids", &p->groups[0], p->attrs_list,
+				     attr->gid_tbl_len, show_port_gid);
 	if (ret)
 		goto err_put;
+	*cur_group++ = &p->groups[0];
 
-	if (device->ops.process_mad && is_full_dev) {
-		p->pma_table = get_counter_table(device, port_num);
-		ret = sysfs_create_group(&p->kobj, p->pma_table);
+	if (attr->pkey_tbl_len) {
+		ret = alloc_port_table_group("pkeys", &p->groups[1],
+					     p->attrs_list + attr->gid_tbl_len,
+					     attr->pkey_tbl_len, show_port_pkey);
 		if (ret)
 			goto err_put;
-	}
-
-	p->gid_group.name  = "gids";
-	p->gid_group.attrs = alloc_group_attrs(show_port_gid, attr.gid_tbl_len);
-	if (!p->gid_group.attrs) {
-		ret = -ENOMEM;
-		goto err_remove_pma;
-	}
-
-	ret = sysfs_create_group(&p->kobj, &p->gid_group);
-	if (ret)
-		goto err_free_gid;
-
-	if (attr.pkey_tbl_len) {
-		p->pkey_group = kzalloc(sizeof(*p->pkey_group), GFP_KERNEL);
-		if (!p->pkey_group) {
-			ret = -ENOMEM;
-			goto err_remove_gid;
-		}
-
-		p->pkey_group->name  = "pkeys";
-		p->pkey_group->attrs = alloc_group_attrs(show_port_pkey,
-							 attr.pkey_tbl_len);
-		if (!p->pkey_group->attrs) {
-			ret = -ENOMEM;
-			goto err_free_pkey_group;
-		}
-
-		ret = sysfs_create_group(&p->kobj, p->pkey_group);
-		if (ret)
-			goto err_free_pkey;
+		*cur_group++ = &p->groups[1];
 	}
 
 	/*
@@ -1335,66 +1230,45 @@ static int add_port(struct ib_core_device *coredev, int port_num)
 	 * counter initialization.
 	 */
 	if (port_num && is_full_dev) {
-		ret = setup_hw_port_stats(p);
+		ret = setup_hw_port_stats(p, &p->groups[2]);
 		if (ret && ret != -EOPNOTSUPP)
-			goto err_remove_pkey;
+			goto err_put;
+		if (!ret)
+			*cur_group++ = &p->groups[2];
 	}
-	ret = setup_gid_attrs(p, &attr);
-	if (ret)
-		goto err_remove_stats;
 
-	if (device->ops.init_port && is_full_dev) {
-		ret = device->ops.init_port(device, port_num, &p->kobj);
-		if (ret)
-			goto err_remove_gid_attrs;
-	}
+	if (device->ops.process_mad && is_full_dev)
+		*cur_group++ = get_counter_table(device, port_num);
+
+	ret = kobject_add(&p->kobj, coredev->ports_kobj, "%d", port_num);
+	if (ret)
+		goto err_put;
+	ret = sysfs_create_groups(&p->kobj, p->groups_list);
+	if (ret)
+		goto err_del;
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
 	if (device->port_data && is_full_dev)
 		device->port_data[port_num].sysfs = p;
 
-	kobject_uevent(&p->kobj, KOBJ_ADD);
-	return 0;
-
-err_remove_gid_attrs:
-	destroy_gid_attrs(p);
-
-err_remove_stats:
-	destroy_hw_port_stats(p);
-
-err_remove_pkey:
-	if (p->pkey_group)
-		sysfs_remove_group(&p->kobj, p->pkey_group);
-
-err_free_pkey:
-	if (p->pkey_group) {
-		for (i = 0; i < attr.pkey_tbl_len; ++i)
-			kfree(p->pkey_group->attrs[i]);
-
-		kfree(p->pkey_group->attrs);
-		p->pkey_group->attrs = NULL;
-	}
-
-err_free_pkey_group:
-	kfree(p->pkey_group);
-
-err_remove_gid:
-	sysfs_remove_group(&p->kobj, &p->gid_group);
-
-err_free_gid:
-	for (i = 0; i < attr.gid_tbl_len; ++i)
-		kfree(p->gid_group.attrs[i]);
-
-	kfree(p->gid_group.attrs);
-	p->gid_group.attrs = NULL;
-
-err_remove_pma:
-	if (p->pma_table)
-		sysfs_remove_group(&p->kobj, p->pma_table);
+	return p;
 
+err_del:
+	kobject_del(&p->kobj);
 err_put:
 	kobject_put(&p->kobj);
-	return ret;
+	return ERR_PTR(ret);
+}
+
+static void destroy_port(struct ib_port *port)
+{
+	if (port->ibdev->port_data &&
+	    port->ibdev->port_data[port->port_num].sysfs == port)
+		port->ibdev->port_data[port->port_num].sysfs = NULL;
+	list_del(&port->kobj.entry);
+	sysfs_remove_groups(&port->kobj, port->groups_list);
+	kobject_del(&port->kobj);
+	kobject_put(&port->kobj);
 }
 
 static const char *node_type_string(int node_type)
@@ -1511,25 +1385,13 @@ const struct attribute_group ib_dev_attr_group = {
 
 void ib_free_port_attrs(struct ib_core_device *coredev)
 {
-	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
-	bool is_full_dev = &device->coredev == coredev;
 	struct kobject *p, *t;
 
 	list_for_each_entry_safe(p, t, &coredev->port_list, entry) {
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 
-		list_del(&p->entry);
-		destroy_hw_port_stats(port);
-		if (device->port_data && is_full_dev)
-			device->port_data[port->port_num].sysfs = NULL;
-
-		if (port->pma_table)
-			sysfs_remove_group(p, port->pma_table);
-		if (port->pkey_group)
-			sysfs_remove_group(p, port->pkey_group);
-		sysfs_remove_group(p, &port->gid_group);
 		destroy_gid_attrs(port);
-		kobject_put(p);
+		destroy_port(port);
 	}
 
 	kobject_put(coredev->ports_kobj);
@@ -1538,7 +1400,8 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 int ib_setup_port_attrs(struct ib_core_device *coredev)
 {
 	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
-	u32 port;
+	bool is_full_dev = &device->coredev == coredev;
+	u32 port_num;
 	int ret;
 
 	coredev->ports_kobj = kobject_create_and_add("ports",
@@ -1546,12 +1409,33 @@ int ib_setup_port_attrs(struct ib_core_device *coredev)
 	if (!coredev->ports_kobj)
 		return -ENOMEM;
 
-	rdma_for_each_port (device, port) {
-		ret = add_port(coredev, port);
+	rdma_for_each_port (device, port_num) {
+		struct ib_port_attr attr;
+		struct ib_port *port;
+
+		ret = ib_query_port(device, port_num, &attr);
 		if (ret)
 			goto err_put;
-	}
 
+		port = setup_port(coredev, port_num, &attr);
+		if (IS_ERR(port)) {
+			ret = PTR_ERR(port);
+			goto err_put;
+		}
+
+		ret = setup_gid_attrs(port, &attr);
+		if (ret)
+			goto err_put;
+
+		if (device->ops.init_port && is_full_dev) {
+			ret = device->ops.init_port(device, port_num,
+						    &port->kobj);
+			if (ret)
+				goto err_put;
+		}
+
+		kobject_uevent(&port->kobj, KOBJ_ADD);
+	}
 	return 0;
 
 err_put:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 07/15] RDMA/core: Create the device hw_counters through the normal groups mechanism
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (5 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 06/15] RDMA/core: Simplify how the port " Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 08/15] RDMA/core: Remove the kobject_uevent() NOP Leon Romanovsky
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

Instead of calling device_add_groups() add the group to the existing
groups array which is managed through device_add().

This requires setting up the hw_counters before device_add(), so it gets
split up from the already split port sysfs flow.

Move all the memory freeing to the release function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/core_priv.h |  4 +-
 drivers/infiniband/core/device.c    | 11 ++++-
 drivers/infiniband/core/sysfs.c     | 66 +++++++----------------------
 include/rdma/ib_verbs.h             |  9 ++--
 4 files changed, 32 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index ec5c2c3db423..6066c4b39876 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -78,8 +78,6 @@ static inline struct rdma_dev_net *rdma_net_to_dev_net(struct net *net)
 	return net_generic(net, rdma_dev_net_id);
 }
 
-int ib_device_register_sysfs(struct ib_device *device);
-void ib_device_unregister_sysfs(struct ib_device *device);
 int ib_device_rename(struct ib_device *ibdev, const char *name);
 int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim);
 
@@ -379,6 +377,8 @@ struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr);
 void ib_free_port_attrs(struct ib_core_device *coredev);
 int ib_setup_port_attrs(struct ib_core_device *coredev);
 struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num);
+void ib_device_release_hw_stats(struct hw_stats_device_data *data);
+int ib_setup_device_attrs(struct ib_device *ibdev);
 
 int rdma_compatdev_set(u8 enable);
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 86a16cd7d7fd..030a4041b2e0 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -491,6 +491,8 @@ static void ib_device_release(struct device *device)
 
 	free_netdevs(dev);
 	WARN_ON(refcount_read(&dev->refcount));
+	if (dev->hw_stats_data)
+		ib_device_release_hw_stats(dev->hw_stats_data);
 	if (dev->port_data) {
 		ib_cache_release_one(dev);
 		ib_security_release_port_pkey_list(dev);
@@ -1394,6 +1396,10 @@ int ib_register_device(struct ib_device *device, const char *name,
 		return ret;
 	}
 
+	ret = ib_setup_device_attrs(device);
+	if (ret)
+		goto cache_cleanup;
+
 	ib_device_register_rdmacg(device);
 
 	rdma_counter_init(device);
@@ -1407,7 +1413,7 @@ int ib_register_device(struct ib_device *device, const char *name,
 	if (ret)
 		goto cg_cleanup;
 
-	ret = ib_device_register_sysfs(device);
+	ret = ib_setup_port_attrs(&device->coredev);
 	if (ret) {
 		dev_warn(&device->dev,
 			 "Couldn't register device with driver model\n");
@@ -1449,6 +1455,7 @@ int ib_register_device(struct ib_device *device, const char *name,
 cg_cleanup:
 	dev_set_uevent_suppress(&device->dev, false);
 	ib_device_unregister_rdmacg(device);
+cache_cleanup:
 	ib_cache_cleanup_one(device);
 	return ret;
 }
@@ -1473,7 +1480,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
 	/* Expedite removing unregistered pointers from the hash table */
 	free_netdevs(ib_dev);
 
-	ib_device_unregister_sysfs(ib_dev);
+	ib_free_port_attrs(&ib_dev->coredev);
 	device_del(&ib_dev->dev);
 	ib_device_unregister_rdmacg(ib_dev);
 	ib_cache_cleanup_one(ib_dev);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 2631c179e004..07a00d3d3d44 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -107,7 +107,6 @@ struct hw_stats_port_attribute {
 
 struct hw_stats_device_data {
 	struct attribute_group group;
-	const struct attribute_group *groups[2];
 	struct rdma_hw_stats *stats;
 	struct hw_stats_device_attribute attrs[];
 };
@@ -915,7 +914,6 @@ alloc_hw_stats_device(struct ib_device *ibdev)
 	mutex_init(&stats->lock);
 	data->group.name = "hw_counters";
 	data->stats = stats;
-	data->groups[0] = &data->group;
 	return data;
 
 err_free_data:
@@ -925,29 +923,33 @@ alloc_hw_stats_device(struct ib_device *ibdev)
 	return ERR_PTR(-ENOMEM);
 }
 
-static void free_hw_stats_device(struct hw_stats_device_data *data)
+void ib_device_release_hw_stats(struct hw_stats_device_data *data)
 {
 	kfree(data->group.attrs);
 	kfree(data->stats);
 	kfree(data);
 }
 
-static int setup_hw_device_stats(struct ib_device *ibdev)
+int ib_setup_device_attrs(struct ib_device *ibdev)
 {
 	struct hw_stats_device_attribute *attr;
 	struct hw_stats_device_data *data;
 	int i, ret;
 
 	data = alloc_hw_stats_device(ibdev);
-	if (IS_ERR(data))
+	if (IS_ERR(data)) {
+		if (PTR_ERR(data) == -EOPNOTSUPP)
+			return 0;
 		return PTR_ERR(data);
+	}
+	ibdev->hw_stats_data = data;
 
 	ret = ibdev->ops.get_hw_stats(ibdev, data->stats, 0,
 				      data->stats->num_counters);
 	if (ret != data->stats->num_counters) {
 		if (WARN_ON(ret >= 0))
-			ret = -EINVAL;
-		goto err_free;
+			return -EINVAL;
+		return ret;
 	}
 
 	data->stats->timestamp = jiffies;
@@ -971,26 +973,13 @@ static int setup_hw_device_stats(struct ib_device *ibdev)
 	attr->attr.store = hw_stat_device_store;
 	attr->store = set_stats_lifespan;
 	data->group.attrs[i] = &attr->attr.attr;
-
-	ibdev->hw_stats_data = data;
-	ret = device_add_groups(&ibdev->dev, data->groups);
-	if (ret)
-		goto err_free;
-	return 0;
-
-err_free:
-	free_hw_stats_device(data);
-	ibdev->hw_stats_data = NULL;
-	return ret;
-}
-
-static void destroy_hw_device_stats(struct ib_device *ibdev)
-{
-	if (!ibdev->hw_stats_data)
-		return;
-	device_remove_groups(&ibdev->dev, ibdev->hw_stats_data->groups);
-	free_hw_stats_device(ibdev->hw_stats_data);
-	ibdev->hw_stats_data = NULL;
+	for (i = 0; i != ARRAY_SIZE(ibdev->groups); i++)
+		if (!ibdev->groups[i]) {
+			ibdev->groups[i] = &data->group;
+			return 0;
+		}
+	WARN(true, "struct ib_device->groups is too small");
+	return -EINVAL;
 }
 
 static struct hw_stats_port_data *
@@ -1443,29 +1432,6 @@ int ib_setup_port_attrs(struct ib_core_device *coredev)
 	return ret;
 }
 
-int ib_device_register_sysfs(struct ib_device *device)
-{
-	int ret;
-
-	ret = ib_setup_port_attrs(&device->coredev);
-	if (ret)
-		return ret;
-
-	ret = setup_hw_device_stats(device);
-	if (ret && ret != -EOPNOTSUPP) {
-		ib_free_port_attrs(&device->coredev);
-		return ret;
-	}
-
-	return 0;
-}
-
-void ib_device_unregister_sysfs(struct ib_device *device)
-{
-	destroy_hw_device_stats(device);
-	ib_free_port_attrs(&device->coredev);
-}
-
 /**
  * ib_port_register_module_stat - add module counters under relevant port
  *  of IB device.
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0dc7ab1a8dcf..5ca1cb82a543 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2677,11 +2677,12 @@ struct ib_device {
 		struct ib_core_device	coredev;
 	};
 
-	/* First group for device attributes,
-	 * Second group for driver provided attributes (optional).
-	 * It is NULL terminated array.
+	/* First group is for device attributes,
+	 * Second group is for driver provided attributes (optional).
+	 * Third group is for the hw_stats
+	 * It is a NULL terminated array.
 	 */
-	const struct attribute_group	*groups[3];
+	const struct attribute_group	*groups[4];
 
 	u64			     uverbs_cmd_mask;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 08/15] RDMA/core: Remove the kobject_uevent() NOP
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (6 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 07/15] RDMA/core: Create the device hw_counters through the normal groups mechanism Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 09/15] RDMA/core: Expose the ib port sysfs attribute machinery Leon Romanovsky
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

This call does nothing because the ib_port kobj is nested under a struct
device kobject and the dev_uevent_filter() function of the struct device
blocks uevents for any children kobj's that are not also struct devices.

A uevent for the struct device will be triggered after
ib_setup_port_attrs() returns which causes udev to pick up all the deep
"attributes" which are implemented as kobjects nested under a struct
device and assign them to the udev object for the struct device:

 $ udevadm info -a /sys/class/infiniband/ibp0s9
     ATTR{ports/1/counters/excessive_buffer_overrun_errors}=="0"

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 07a00d3d3d44..14b838863b5d 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -1422,8 +1422,6 @@ int ib_setup_port_attrs(struct ib_core_device *coredev)
 			if (ret)
 				goto err_put;
 		}
-
-		kobject_uevent(&port->kobj, KOBJ_ADD);
 	}
 	return 0;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 09/15] RDMA/core: Expose the ib port sysfs attribute machinery
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (7 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 08/15] RDMA/core: Remove the kobject_uevent() NOP Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's Leon Romanovsky
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

Other things outside the core code are creating attributes against the
port. This patch exposes the basic machinery to do this.

The ib_port_attribute type allows creating groups of attributes attatched
to the port and comes with the usual machinery to do this.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/sysfs.c | 217 +++++++++++++++++---------------
 include/rdma/ib_sysfs.h         |  41 ++++++
 2 files changed, 158 insertions(+), 100 deletions(-)
 create mode 100644 include/rdma/ib_sysfs.h

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 14b838863b5d..3c5541c39bf6 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -44,24 +44,10 @@
 #include <rdma/ib_pma.h>
 #include <rdma/ib_cache.h>
 #include <rdma/rdma_counter.h>
-
-struct ib_port;
-
-struct port_attribute {
-	struct attribute attr;
-	ssize_t (*show)(struct ib_port *, struct port_attribute *, char *buf);
-	ssize_t (*store)(struct ib_port *, struct port_attribute *,
-			 const char *buf, size_t count);
-};
-
-#define PORT_ATTR(_name, _mode, _show, _store) \
-struct port_attribute port_attr_##_name = __ATTR(_name, _mode, _show, _store)
-
-#define PORT_ATTR_RO(_name) \
-struct port_attribute port_attr_##_name = __ATTR_RO(_name)
+#include <rdma/ib_sysfs.h>
 
 struct port_table_attribute {
-	struct port_attribute	attr;
+	struct ib_port_attribute attr;
 	char			name[8];
 	int			index;
 	__be16			attr_id;
@@ -97,7 +83,7 @@ struct hw_stats_device_attribute {
 };
 
 struct hw_stats_port_attribute {
-	struct port_attribute attr;
+	struct ib_port_attribute attr;
 	ssize_t (*show)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
 			unsigned int index, unsigned int port_num, char *buf);
 	ssize_t (*store)(struct ib_device *ibdev, struct rdma_hw_stats *stats,
@@ -119,29 +105,55 @@ struct hw_stats_port_data {
 static ssize_t port_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
-	struct port_attribute *port_attr =
-		container_of(attr, struct port_attribute, attr);
+	struct ib_port_attribute *port_attr =
+		container_of(attr, struct ib_port_attribute, attr);
 	struct ib_port *p = container_of(kobj, struct ib_port, kobj);
 
 	if (!port_attr->show)
 		return -EIO;
 
-	return port_attr->show(p, port_attr, buf);
+	return port_attr->show(p->ibdev, p->port_num, port_attr, buf);
 }
 
 static ssize_t port_attr_store(struct kobject *kobj,
 			       struct attribute *attr,
 			       const char *buf, size_t count)
 {
-	struct port_attribute *port_attr =
-		container_of(attr, struct port_attribute, attr);
+	struct ib_port_attribute *port_attr =
+		container_of(attr, struct ib_port_attribute, attr);
 	struct ib_port *p = container_of(kobj, struct ib_port, kobj);
 
 	if (!port_attr->store)
 		return -EIO;
-	return port_attr->store(p, port_attr, buf, count);
+	return port_attr->store(p->ibdev, p->port_num, port_attr, buf, count);
 }
 
+int ib_port_sysfs_create_groups(struct ib_device *ibdev, u32 port_num,
+				const struct attribute_group **groups)
+{
+	return sysfs_create_groups(&ibdev->port_data[port_num].sysfs->kobj,
+				   groups);
+}
+EXPORT_SYMBOL_GPL(ib_port_sysfs_create_groups);
+
+void ib_port_sysfs_remove_groups(struct ib_device *ibdev, u32 port_num,
+				 const struct attribute_group **groups)
+{
+	return sysfs_remove_groups(&ibdev->port_data[port_num].sysfs->kobj,
+				   groups);
+}
+EXPORT_SYMBOL_GPL(ib_port_sysfs_remove_groups);
+
+struct ib_device *ib_port_sysfs_get_ibdev_kobj(struct kobject *kobj,
+					       u32 *port_num)
+{
+	struct ib_port *port = container_of(kobj, struct ib_port, kobj);
+
+	*port_num = port->port_num;
+	return port->ibdev;
+}
+EXPORT_SYMBOL(ib_port_sysfs_get_ibdev_kobj);
+
 static const struct sysfs_ops port_sysfs_ops = {
 	.show	= port_attr_show,
 	.store	= port_attr_store
@@ -171,25 +183,27 @@ static ssize_t hw_stat_device_store(struct device *dev,
 				count);
 }
 
-static ssize_t hw_stat_port_show(struct ib_port *port,
-				 struct port_attribute *attr, char *buf)
+static ssize_t hw_stat_port_show(struct ib_device *ibdev, u32 port_num,
+				 struct ib_port_attribute *attr, char *buf)
 {
 	struct hw_stats_port_attribute *stat_attr =
 		container_of(attr, struct hw_stats_port_attribute, attr);
+	struct ib_port *port = ibdev->port_data[port_num].sysfs;
 
-	return stat_attr->show(port->ibdev, port->hw_stats_data->stats,
+	return stat_attr->show(ibdev, port->hw_stats_data->stats,
 			       stat_attr - port->hw_stats_data->attrs,
 			       port->port_num, buf);
 }
 
-static ssize_t hw_stat_port_store(struct ib_port *port,
-				  struct port_attribute *attr, const char *buf,
-				  size_t count)
+static ssize_t hw_stat_port_store(struct ib_device *ibdev, u32 port_num,
+				  struct ib_port_attribute *attr,
+				  const char *buf, size_t count)
 {
 	struct hw_stats_port_attribute *stat_attr =
 		container_of(attr, struct hw_stats_port_attribute, attr);
+	struct ib_port *port = ibdev->port_data[port_num].sysfs;
 
-	return stat_attr->store(port->ibdev, port->hw_stats_data->stats,
+	return stat_attr->store(ibdev, port->hw_stats_data->stats,
 				stat_attr - port->hw_stats_data->attrs,
 				port->port_num, buf, count);
 }
@@ -197,23 +211,23 @@ static ssize_t hw_stat_port_store(struct ib_port *port,
 static ssize_t gid_attr_show(struct kobject *kobj,
 			     struct attribute *attr, char *buf)
 {
-	struct port_attribute *port_attr =
-		container_of(attr, struct port_attribute, attr);
+	struct ib_port_attribute *port_attr =
+		container_of(attr, struct ib_port_attribute, attr);
 	struct ib_port *p = container_of(kobj, struct gid_attr_group,
 					 kobj)->port;
 
 	if (!port_attr->show)
 		return -EIO;
 
-	return port_attr->show(p, port_attr, buf);
+	return port_attr->show(p->ibdev, p->port_num, port_attr, buf);
 }
 
 static const struct sysfs_ops gid_attr_sysfs_ops = {
 	.show = gid_attr_show
 };
 
-static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
-			  char *buf)
+static ssize_t state_show(struct ib_device *ibdev, u32 port_num,
+			  struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
@@ -227,7 +241,7 @@ static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 		[IB_PORT_ACTIVE_DEFER]	= "ACTIVE_DEFER"
 	};
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
@@ -238,81 +252,80 @@ static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 				  "UNKNOWN");
 }
 
-static ssize_t lid_show(struct ib_port *p, struct port_attribute *unused,
-			char *buf)
+static ssize_t lid_show(struct ib_device *ibdev, u32 port_num,
+			struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
 	return sysfs_emit(buf, "0x%x\n", attr.lid);
 }
 
-static ssize_t lid_mask_count_show(struct ib_port *p,
-				   struct port_attribute *unused,
-				   char *buf)
+static ssize_t lid_mask_count_show(struct ib_device *ibdev, u32 port_num,
+				   struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
 	return sysfs_emit(buf, "%d\n", attr.lmc);
 }
 
-static ssize_t sm_lid_show(struct ib_port *p, struct port_attribute *unused,
-			   char *buf)
+static ssize_t sm_lid_show(struct ib_device *ibdev, u32 port_num,
+			   struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
 	return sysfs_emit(buf, "0x%x\n", attr.sm_lid);
 }
 
-static ssize_t sm_sl_show(struct ib_port *p, struct port_attribute *unused,
-			  char *buf)
+static ssize_t sm_sl_show(struct ib_device *ibdev, u32 port_num,
+			  struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
 	return sysfs_emit(buf, "%d\n", attr.sm_sl);
 }
 
-static ssize_t cap_mask_show(struct ib_port *p, struct port_attribute *unused,
-			     char *buf)
+static ssize_t cap_mask_show(struct ib_device *ibdev, u32 port_num,
+			     struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
 	return sysfs_emit(buf, "0x%08x\n", attr.port_cap_flags);
 }
 
-static ssize_t rate_show(struct ib_port *p, struct port_attribute *unused,
-			 char *buf)
+static ssize_t rate_show(struct ib_device *ibdev, u32 port_num,
+			 struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 	char *speed = "";
 	int rate;		/* in deci-Gb/sec */
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
@@ -379,14 +392,14 @@ static const char *phys_state_to_str(enum ib_port_phys_state phys_state)
 	return "<unknown>";
 }
 
-static ssize_t phys_state_show(struct ib_port *p, struct port_attribute *unused,
-			       char *buf)
+static ssize_t phys_state_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *unused, char *buf)
 {
 	struct ib_port_attr attr;
 
 	ssize_t ret;
 
-	ret = ib_query_port(p->ibdev, p->port_num, &attr);
+	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;
 
@@ -394,12 +407,12 @@ static ssize_t phys_state_show(struct ib_port *p, struct port_attribute *unused,
 			  phys_state_to_str(attr.phys_state));
 }
 
-static ssize_t link_layer_show(struct ib_port *p, struct port_attribute *unused,
-			       char *buf)
+static ssize_t link_layer_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *unused, char *buf)
 {
 	const char *output;
 
-	switch (rdma_port_get_link_layer(p->ibdev, p->port_num)) {
+	switch (rdma_port_get_link_layer(ibdev, port_num)) {
 	case IB_LINK_LAYER_INFINIBAND:
 		output = "InfiniBand";
 		break;
@@ -414,26 +427,26 @@ static ssize_t link_layer_show(struct ib_port *p, struct port_attribute *unused,
 	return sysfs_emit(buf, "%s\n", output);
 }
 
-static PORT_ATTR_RO(state);
-static PORT_ATTR_RO(lid);
-static PORT_ATTR_RO(lid_mask_count);
-static PORT_ATTR_RO(sm_lid);
-static PORT_ATTR_RO(sm_sl);
-static PORT_ATTR_RO(cap_mask);
-static PORT_ATTR_RO(rate);
-static PORT_ATTR_RO(phys_state);
-static PORT_ATTR_RO(link_layer);
+static IB_PORT_ATTR_RO(state);
+static IB_PORT_ATTR_RO(lid);
+static IB_PORT_ATTR_RO(lid_mask_count);
+static IB_PORT_ATTR_RO(sm_lid);
+static IB_PORT_ATTR_RO(sm_sl);
+static IB_PORT_ATTR_RO(cap_mask);
+static IB_PORT_ATTR_RO(rate);
+static IB_PORT_ATTR_RO(phys_state);
+static IB_PORT_ATTR_RO(link_layer);
 
 static struct attribute *port_default_attrs[] = {
-	&port_attr_state.attr,
-	&port_attr_lid.attr,
-	&port_attr_lid_mask_count.attr,
-	&port_attr_sm_lid.attr,
-	&port_attr_sm_sl.attr,
-	&port_attr_cap_mask.attr,
-	&port_attr_rate.attr,
-	&port_attr_phys_state.attr,
-	&port_attr_link_layer.attr,
+	&ib_port_attr_state.attr,
+	&ib_port_attr_lid.attr,
+	&ib_port_attr_lid_mask_count.attr,
+	&ib_port_attr_sm_lid.attr,
+	&ib_port_attr_sm_sl.attr,
+	&ib_port_attr_cap_mask.attr,
+	&ib_port_attr_rate.attr,
+	&ib_port_attr_phys_state.attr,
+	&ib_port_attr_link_layer.attr,
 	NULL
 };
 
@@ -457,7 +470,8 @@ static ssize_t print_gid_type(const struct ib_gid_attr *gid_attr, char *buf)
 }
 
 static ssize_t _show_port_gid_attr(
-	struct ib_port *p, struct port_attribute *attr, char *buf,
+	struct ib_device *ibdev, u32 port_num, struct ib_port_attribute *attr,
+	char *buf,
 	ssize_t (*print)(const struct ib_gid_attr *gid_attr, char *buf))
 {
 	struct port_table_attribute *tab_attr =
@@ -465,7 +479,7 @@ static ssize_t _show_port_gid_attr(
 	const struct ib_gid_attr *gid_attr;
 	ssize_t ret;
 
-	gid_attr = rdma_get_gid_attr(p->ibdev, p->port_num, tab_attr->index);
+	gid_attr = rdma_get_gid_attr(ibdev, port_num, tab_attr->index);
 	if (IS_ERR(gid_attr))
 		/* -EINVAL is returned for user space compatibility reasons. */
 		return -EINVAL;
@@ -475,15 +489,15 @@ static ssize_t _show_port_gid_attr(
 	return ret;
 }
 
-static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
-			     char *buf)
+static ssize_t show_port_gid(struct ib_device *ibdev, u32 port_num,
+			     struct ib_port_attribute *attr, char *buf)
 {
 	struct port_table_attribute *tab_attr =
 		container_of(attr, struct port_table_attribute, attr);
 	const struct ib_gid_attr *gid_attr;
 	int len;
 
-	gid_attr = rdma_get_gid_attr(p->ibdev, p->port_num, tab_attr->index);
+	gid_attr = rdma_get_gid_attr(ibdev, port_num, tab_attr->index);
 	if (IS_ERR(gid_attr)) {
 		const union ib_gid zgid = {};
 
@@ -504,28 +518,30 @@ static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
 	return len;
 }
 
-static ssize_t show_port_gid_attr_ndev(struct ib_port *p,
-				       struct port_attribute *attr, char *buf)
+static ssize_t show_port_gid_attr_ndev(struct ib_device *ibdev, u32 port_num,
+				       struct ib_port_attribute *attr,
+				       char *buf)
 {
-	return _show_port_gid_attr(p, attr, buf, print_ndev);
+	return _show_port_gid_attr(ibdev, port_num, attr, buf, print_ndev);
 }
 
-static ssize_t show_port_gid_attr_gid_type(struct ib_port *p,
-					   struct port_attribute *attr,
+static ssize_t show_port_gid_attr_gid_type(struct ib_device *ibdev,
+					   u32 port_num,
+					   struct ib_port_attribute *attr,
 					   char *buf)
 {
-	return _show_port_gid_attr(p, attr, buf, print_gid_type);
+	return _show_port_gid_attr(ibdev, port_num, attr, buf, print_gid_type);
 }
 
-static ssize_t show_port_pkey(struct ib_port *p, struct port_attribute *attr,
-			      char *buf)
+static ssize_t show_port_pkey(struct ib_device *ibdev, u32 port_num,
+			      struct ib_port_attribute *attr, char *buf)
 {
 	struct port_table_attribute *tab_attr =
 		container_of(attr, struct port_table_attribute, attr);
 	u16 pkey;
 	int ret;
 
-	ret = ib_query_pkey(p->ibdev, p->port_num, tab_attr->index, &pkey);
+	ret = ib_query_pkey(ibdev, port_num, tab_attr->index, &pkey);
 	if (ret)
 		return ret;
 
@@ -594,8 +610,8 @@ static int get_perf_mad(struct ib_device *dev, int port_num, __be16 attr,
 	return ret;
 }
 
-static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
-				char *buf)
+static ssize_t show_pma_counter(struct ib_device *ibdev, u32 port_num,
+				struct ib_port_attribute *attr, char *buf)
 {
 	struct port_table_attribute *tab_attr =
 		container_of(attr, struct port_table_attribute, attr);
@@ -605,7 +621,7 @@ static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
 	u8 data[8];
 	int len;
 
-	ret = get_perf_mad(p->ibdev, p->port_num, tab_attr->attr_id, &data,
+	ret = get_perf_mad(ibdev, port_num, tab_attr->attr_id, &data,
 			40 + offset / 8, sizeof(data));
 	if (ret < 0)
 		return ret;
@@ -1077,10 +1093,11 @@ struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
 	return ibdev->port_data[port_num].sysfs->hw_stats_data->stats;
 }
 
-static int alloc_port_table_group(
-	const char *name, struct attribute_group *group,
-	struct port_table_attribute *attrs, size_t num,
-	ssize_t (*show)(struct ib_port *, struct port_attribute *, char *buf))
+static int
+alloc_port_table_group(const char *name, struct attribute_group *group,
+		       struct port_table_attribute *attrs, size_t num,
+		       ssize_t (*show)(struct ib_device *ibdev, u32 port_num,
+				       struct ib_port_attribute *, char *buf))
 {
 	struct attribute **attr_list;
 	int i;
diff --git a/include/rdma/ib_sysfs.h b/include/rdma/ib_sysfs.h
new file mode 100644
index 000000000000..f869d0e4fd30
--- /dev/null
+++ b/include/rdma/ib_sysfs.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (c) 2021 Mellanox Technologies Ltd.  All rights reserved.
+ */
+#ifndef DEF_RDMA_IB_SYSFS_H
+#define DEF_RDMA_IB_SYSFS_H
+
+#include <linux/sysfs.h>
+
+struct ib_device;
+
+struct ib_port_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct ib_device *ibdev, u32 port_num,
+			struct ib_port_attribute *attr, char *buf);
+	ssize_t (*store)(struct ib_device *ibdev, u32 port_num,
+			 struct ib_port_attribute *attr, const char *buf,
+			 size_t count);
+};
+
+#define IB_PORT_ATTR_RW(_name)                                                 \
+	struct ib_port_attribute ib_port_attr_##_name = __ATTR_RW(_name)
+
+#define IB_PORT_ATTR_ADMIN_RW(_name)                                           \
+	struct ib_port_attribute ib_port_attr_##_name =                        \
+		__ATTR_RW_MODE(_name, 0600)
+
+#define IB_PORT_ATTR_RO(_name)                                                 \
+	struct ib_port_attribute ib_port_attr_##_name = __ATTR_RO(_name)
+
+#define IB_PORT_ATTR_WO(_name)                                                 \
+	struct ib_port_attribute ib_port_attr_##_name = __ATTR_WO(_name)
+
+int ib_port_sysfs_create_groups(struct ib_device *ibdev, u32 port_num,
+				const struct attribute_group **groups);
+void ib_port_sysfs_remove_groups(struct ib_device *ibdev, u32 port_num,
+				 const struct attribute_group **groups);
+struct ib_device *ib_port_sysfs_get_ibdev_kobj(struct kobject *kobj,
+					       u32 *port_num);
+
+#endif
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (8 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 09/15] RDMA/core: Expose the ib port sysfs attribute machinery Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07 10:25   ` Greg KH
  2021-06-07  8:17 ` [PATCH rdma-next v1 11/15] RDMA/qib: Use attributes for the port sysfs Leon Romanovsky
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

This code is trying to attach a list of counters grouped into 4 groups to
the ib_port sysfs. Instead of creating a bunch of kobjects simply express
everything naturally as an ib_port_attribute and add a single
attribute_groups list.

Remove all the naked kobject manipulations.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cm.c        | 227 ++++++++++++----------------
 drivers/infiniband/core/core_priv.h |   8 +-
 drivers/infiniband/core/sysfs.c     |  50 ++----
 3 files changed, 119 insertions(+), 166 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 80087e678030..1827118c41e3 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -25,6 +25,7 @@
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
+#include <rdma/ib_sysfs.h>
 #include "cm_msgs.h"
 #include "core_priv.h"
 #include "cm_trace.h"
@@ -148,60 +149,17 @@ enum {
 	CM_COUNTER_GROUPS
 };
 
-static char const counter_group_names[CM_COUNTER_GROUPS]
-				     [sizeof("cm_rx_duplicates")] = {
-	"cm_tx_msgs", "cm_tx_retries",
-	"cm_rx_msgs", "cm_rx_duplicates"
-};
-
-struct cm_counter_group {
-	struct kobject obj;
-	atomic_long_t counter[CM_ATTR_COUNT];
-};
-
 struct cm_counter_attribute {
-	struct attribute attr;
-	int index;
-};
-
-#define CM_COUNTER_ATTR(_name, _index) \
-struct cm_counter_attribute cm_##_name##_counter_attr = { \
-	.attr = { .name = __stringify(_name), .mode = 0444 }, \
-	.index = _index \
-}
-
-static CM_COUNTER_ATTR(req, CM_REQ_COUNTER);
-static CM_COUNTER_ATTR(mra, CM_MRA_COUNTER);
-static CM_COUNTER_ATTR(rej, CM_REJ_COUNTER);
-static CM_COUNTER_ATTR(rep, CM_REP_COUNTER);
-static CM_COUNTER_ATTR(rtu, CM_RTU_COUNTER);
-static CM_COUNTER_ATTR(dreq, CM_DREQ_COUNTER);
-static CM_COUNTER_ATTR(drep, CM_DREP_COUNTER);
-static CM_COUNTER_ATTR(sidr_req, CM_SIDR_REQ_COUNTER);
-static CM_COUNTER_ATTR(sidr_rep, CM_SIDR_REP_COUNTER);
-static CM_COUNTER_ATTR(lap, CM_LAP_COUNTER);
-static CM_COUNTER_ATTR(apr, CM_APR_COUNTER);
-
-static struct attribute *cm_counter_default_attrs[] = {
-	&cm_req_counter_attr.attr,
-	&cm_mra_counter_attr.attr,
-	&cm_rej_counter_attr.attr,
-	&cm_rep_counter_attr.attr,
-	&cm_rtu_counter_attr.attr,
-	&cm_dreq_counter_attr.attr,
-	&cm_drep_counter_attr.attr,
-	&cm_sidr_req_counter_attr.attr,
-	&cm_sidr_rep_counter_attr.attr,
-	&cm_lap_counter_attr.attr,
-	&cm_apr_counter_attr.attr,
-	NULL
+	struct ib_port_attribute attr;
+	unsigned short group;
+	unsigned short index;
 };
 
 struct cm_port {
 	struct cm_device *cm_dev;
 	struct ib_mad_agent *mad_agent;
 	u32 port_num;
-	struct cm_counter_group counter_group[CM_COUNTER_GROUPS];
+	atomic_long_t counters[CM_COUNTER_GROUPS][CM_ATTR_COUNT];
 };
 
 struct cm_device {
@@ -1968,8 +1926,8 @@ static void cm_dup_req_handler(struct cm_work *work,
 	struct ib_mad_send_buf *msg = NULL;
 	int ret;
 
-	atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-			counter[CM_REQ_COUNTER]);
+	atomic_long_inc(
+		&work->port->counters[CM_RECV_DUPLICATES][CM_REQ_COUNTER]);
 
 	/* Quick state check to discard duplicate REQs. */
 	spin_lock_irq(&cm_id_priv->lock);
@@ -2467,8 +2425,8 @@ static void cm_dup_rep_handler(struct cm_work *work)
 	if (!cm_id_priv)
 		return;
 
-	atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-			counter[CM_REP_COUNTER]);
+	atomic_long_inc(
+		&work->port->counters[CM_RECV_DUPLICATES][CM_REP_COUNTER]);
 	ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg);
 	if (ret)
 		goto deref;
@@ -2645,8 +2603,8 @@ static int cm_rtu_handler(struct cm_work *work)
 	if (cm_id_priv->id.state != IB_CM_REP_SENT &&
 	    cm_id_priv->id.state != IB_CM_MRA_REP_RCVD) {
 		spin_unlock_irq(&cm_id_priv->lock);
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_RTU_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_RTU_COUNTER]);
 		goto out;
 	}
 	cm_id_priv->id.state = IB_CM_ESTABLISHED;
@@ -2850,8 +2808,8 @@ static int cm_dreq_handler(struct cm_work *work)
 		cpu_to_be32(IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg)),
 		cpu_to_be32(IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg)));
 	if (!cm_id_priv) {
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_DREQ_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_DREQ_COUNTER]);
 		cm_issue_drep(work->port, work->mad_recv_wc);
 		trace_icm_no_priv_err(
 			IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
@@ -2880,8 +2838,8 @@ static int cm_dreq_handler(struct cm_work *work)
 	case IB_CM_MRA_REP_RCVD:
 		break;
 	case IB_CM_TIMEWAIT:
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_DREQ_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_DREQ_COUNTER]);
 		msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
 		if (IS_ERR(msg))
 			goto unlock;
@@ -2896,8 +2854,8 @@ static int cm_dreq_handler(struct cm_work *work)
 			cm_free_response_msg(msg);
 		goto deref;
 	case IB_CM_DREQ_RCVD:
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_DREQ_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_DREQ_COUNTER]);
 		goto unlock;
 	default:
 		trace_icm_dreq_unknown_err(&cm_id_priv->id);
@@ -3248,17 +3206,17 @@ static int cm_mra_handler(struct cm_work *work)
 		    cm_id_priv->id.lap_state != IB_CM_LAP_SENT ||
 		    ib_modify_mad(cm_id_priv->msg, timeout)) {
 			if (cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD)
-				atomic_long_inc(&work->port->
-						counter_group[CM_RECV_DUPLICATES].
-						counter[CM_MRA_COUNTER]);
+				atomic_long_inc(
+					&work->port->counters[CM_RECV_DUPLICATES]
+							     [CM_MRA_COUNTER]);
 			goto out;
 		}
 		cm_id_priv->id.lap_state = IB_CM_MRA_LAP_RCVD;
 		break;
 	case IB_CM_MRA_REQ_RCVD:
 	case IB_CM_MRA_REP_RCVD:
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_MRA_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_MRA_COUNTER]);
 		fallthrough;
 	default:
 		trace_icm_mra_unknown_err(&cm_id_priv->id);
@@ -3384,8 +3342,8 @@ static int cm_lap_handler(struct cm_work *work)
 	case IB_CM_LAP_IDLE:
 		break;
 	case IB_CM_MRA_LAP_SENT:
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_LAP_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_LAP_COUNTER]);
 		msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
 		if (IS_ERR(msg))
 			goto unlock;
@@ -3402,8 +3360,8 @@ static int cm_lap_handler(struct cm_work *work)
 			cm_free_response_msg(msg);
 		goto deref;
 	case IB_CM_LAP_RCVD:
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_LAP_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_LAP_COUNTER]);
 		goto unlock;
 	default:
 		goto unlock;
@@ -3622,8 +3580,8 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	listen_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
 	if (listen_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
-		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
-				counter[CM_SIDR_REQ_COUNTER]);
+		atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
+						     [CM_SIDR_REQ_COUNTER]);
 		goto out; /* Duplicate message. */
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
@@ -3872,12 +3830,10 @@ static void cm_send_handler(struct ib_mad_agent *mad_agent,
 	if (!cm_id_priv && (attr_index != CM_REJ_COUNTER))
 		msg->retries = 1;
 
-	atomic_long_add(1 + msg->retries,
-			&port->counter_group[CM_XMIT].counter[attr_index]);
+	atomic_long_add(1 + msg->retries, &port->counters[CM_XMIT][attr_index]);
 	if (msg->retries)
 		atomic_long_add(msg->retries,
-				&port->counter_group[CM_XMIT_RETRIES].
-				counter[attr_index]);
+				&port->counters[CM_XMIT_RETRIES][attr_index]);
 
 	if (cm_id_priv)
 		cm_process_send_error(cm_id_priv, msg, state,
@@ -4098,8 +4054,7 @@ static void cm_recv_handler(struct ib_mad_agent *mad_agent,
 	}
 
 	attr_id = be16_to_cpu(mad_recv_wc->recv_buf.mad->mad_hdr.attr_id);
-	atomic_long_inc(&port->counter_group[CM_RECV].
-			counter[attr_id - CM_ATTR_ID_OFFSET]);
+	atomic_long_inc(&port->counters[CM_RECV][attr_id - CM_ATTR_ID_OFFSET]);
 
 	work = kmalloc(struct_size(work, path, paths), GFP_KERNEL);
 	if (!work) {
@@ -4301,59 +4256,74 @@ int ib_cm_init_qp_attr(struct ib_cm_id *cm_id,
 }
 EXPORT_SYMBOL(ib_cm_init_qp_attr);
 
-static ssize_t cm_show_counter(struct kobject *obj, struct attribute *attr,
-			       char *buf)
+static ssize_t cm_show_counter(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
 {
-	struct cm_counter_group *group;
-	struct cm_counter_attribute *cm_attr;
+	struct cm_counter_attribute *cm_attr =
+		container_of(attr, struct cm_counter_attribute, attr);
+	struct cm_device *cm_dev = ib_get_client_data(ibdev, &cm_client);
 
-	group = container_of(obj, struct cm_counter_group, obj);
-	cm_attr = container_of(attr, struct cm_counter_attribute, attr);
+	if (WARN_ON(!cm_dev))
+		return -EINVAL;
 
-	return sysfs_emit(buf, "%ld\n",
-			  atomic_long_read(&group->counter[cm_attr->index]));
+	return sysfs_emit(
+		buf, "%ld\n",
+		atomic_long_read(
+			&cm_dev->port[port_num - 1]
+				 ->counters[cm_attr->group][cm_attr->index]));
 }
 
-static const struct sysfs_ops cm_counter_ops = {
-	.show = cm_show_counter
-};
-
-static struct kobj_type cm_counter_obj_type = {
-	.sysfs_ops = &cm_counter_ops,
-	.default_attrs = cm_counter_default_attrs
-};
-
-static int cm_create_port_fs(struct cm_port *port)
-{
-	int i, ret;
-
-	for (i = 0; i < CM_COUNTER_GROUPS; i++) {
-		ret = ib_port_register_module_stat(port->cm_dev->ib_device,
-						   port->port_num,
-						   &port->counter_group[i].obj,
-						   &cm_counter_obj_type,
-						   counter_group_names[i]);
-		if (ret)
-			goto error;
+#define CM_COUNTER_ATTR(_name, _group, _index)                                 \
+	{                                                                      \
+		.attr = __ATTR(_name, 0444, cm_show_counter, NULL),            \
+		.group = _group, .index = _index                               \
 	}
 
-	return 0;
-
-error:
-	while (i--)
-		ib_port_unregister_module_stat(&port->counter_group[i].obj);
-	return ret;
-
-}
-
-static void cm_remove_port_fs(struct cm_port *port)
-{
-	int i;
-
-	for (i = 0; i < CM_COUNTER_GROUPS; i++)
-		ib_port_unregister_module_stat(&port->counter_group[i].obj);
+#define CM_COUNTER_GROUP(_group, _name)                                        \
+	static struct cm_counter_attribute cm_counter_attr_##_group[] = {      \
+		CM_COUNTER_ATTR(req, _group, CM_REQ_COUNTER),                  \
+		CM_COUNTER_ATTR(mra, _group, CM_MRA_COUNTER),                  \
+		CM_COUNTER_ATTR(rej, _group, CM_REJ_COUNTER),                  \
+		CM_COUNTER_ATTR(rep, _group, CM_REP_COUNTER),                  \
+		CM_COUNTER_ATTR(rtu, _group, CM_RTU_COUNTER),                  \
+		CM_COUNTER_ATTR(dreq, _group, CM_DREQ_COUNTER),                \
+		CM_COUNTER_ATTR(drep, _group, CM_DREP_COUNTER),                \
+		CM_COUNTER_ATTR(sidr_req, _group, CM_SIDR_REQ_COUNTER),        \
+		CM_COUNTER_ATTR(sidr_rep, _group, CM_SIDR_REP_COUNTER),        \
+		CM_COUNTER_ATTR(lap, _group, CM_LAP_COUNTER),                  \
+		CM_COUNTER_ATTR(apr, _group, CM_APR_COUNTER),                  \
+	};                                                                     \
+	static struct attribute *cm_counter_attrs_##_group[] = {               \
+		&cm_counter_attr_##_group[0].attr.attr,                        \
+		&cm_counter_attr_##_group[1].attr.attr,                        \
+		&cm_counter_attr_##_group[2].attr.attr,                        \
+		&cm_counter_attr_##_group[3].attr.attr,                        \
+		&cm_counter_attr_##_group[4].attr.attr,                        \
+		&cm_counter_attr_##_group[5].attr.attr,                        \
+		&cm_counter_attr_##_group[6].attr.attr,                        \
+		&cm_counter_attr_##_group[7].attr.attr,                        \
+		&cm_counter_attr_##_group[8].attr.attr,                        \
+		&cm_counter_attr_##_group[9].attr.attr,                        \
+		&cm_counter_attr_##_group[10].attr.attr,                       \
+		NULL,                                                          \
+	};                                                                     \
+	static const struct attribute_group cm_counter_group_##_group = {      \
+		.name = _name,                                                 \
+		.attrs = cm_counter_attrs_##_group,                            \
+	};
 
-}
+CM_COUNTER_GROUP(CM_XMIT, "cm_tx_msgs")
+CM_COUNTER_GROUP(CM_XMIT_RETRIES, "cm_tx_retries")
+CM_COUNTER_GROUP(CM_RECV, "cm_rx_msgs")
+CM_COUNTER_GROUP(CM_RECV_DUPLICATES, "cm_rx_duplicates")
+
+static const struct attribute_group *cm_counter_groups[] = {
+	&cm_counter_group_CM_XMIT,
+	&cm_counter_group_CM_XMIT_RETRIES,
+	&cm_counter_group_CM_RECV,
+	&cm_counter_group_CM_RECV_DUPLICATES,
+	NULL,
+};
 
 static int cm_add_one(struct ib_device *ib_device)
 {
@@ -4382,6 +4352,8 @@ static int cm_add_one(struct ib_device *ib_device)
 	cm_dev->ack_delay = ib_device->attrs.local_ca_ack_delay;
 	cm_dev->going_down = 0;
 
+	ib_set_client_data(ib_device, &cm_client, cm_dev);
+
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	rdma_for_each_port (ib_device, i) {
 		if (!rdma_cap_ib_cm(ib_device, i))
@@ -4397,7 +4369,8 @@ static int cm_add_one(struct ib_device *ib_device)
 		port->cm_dev = cm_dev;
 		port->port_num = i;
 
-		ret = cm_create_port_fs(port);
+		ret = ib_port_register_client_groups(ib_device, i,
+						     cm_counter_groups);
 		if (ret)
 			goto error1;
 
@@ -4426,8 +4399,6 @@ static int cm_add_one(struct ib_device *ib_device)
 		goto free;
 	}
 
-	ib_set_client_data(ib_device, &cm_client, cm_dev);
-
 	write_lock_irqsave(&cm.device_lock, flags);
 	list_add_tail(&cm_dev->list, &cm.device_list);
 	write_unlock_irqrestore(&cm.device_lock, flags);
@@ -4436,7 +4407,7 @@ static int cm_add_one(struct ib_device *ib_device)
 error3:
 	ib_unregister_mad_agent(port->mad_agent);
 error2:
-	cm_remove_port_fs(port);
+	ib_port_unregister_client_groups(ib_device, i, cm_counter_groups);
 error1:
 	port_modify.set_port_cap_mask = 0;
 	port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
@@ -4447,7 +4418,8 @@ static int cm_add_one(struct ib_device *ib_device)
 		port = cm_dev->port[i-1];
 		ib_modify_port(ib_device, port->port_num, 0, &port_modify);
 		ib_unregister_mad_agent(port->mad_agent);
-		cm_remove_port_fs(port);
+		ib_port_unregister_client_groups(ib_device, i,
+						 cm_counter_groups);
 	}
 free:
 	cm_device_put(cm_dev);
@@ -4495,7 +4467,8 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
 		port->mad_agent = NULL;
 		spin_unlock(&cm_dev->mad_agent_lock);
 		ib_unregister_mad_agent(mad_agent);
-		cm_remove_port_fs(port);
+		ib_port_unregister_client_groups(ib_device, i,
+						 cm_counter_groups);
 	}
 
 	cm_device_put(cm_dev);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6066c4b39876..78782cce47a1 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -382,10 +382,10 @@ int ib_setup_device_attrs(struct ib_device *ibdev);
 
 int rdma_compatdev_set(u8 enable);
 
-int ib_port_register_module_stat(struct ib_device *device, u32 port_num,
-				 struct kobject *kobj, struct kobj_type *ktype,
-				 const char *name);
-void ib_port_unregister_module_stat(struct kobject *kobj);
+int ib_port_register_client_groups(struct ib_device *ibdev, u32 port_num,
+				   const struct attribute_group **groups);
+void ib_port_unregister_client_groups(struct ib_device *ibdev, u32 port_num,
+				     const struct attribute_group **groups);
 
 int ib_device_set_netns_put(struct sk_buff *skb,
 			    struct ib_device *dev, u32 ns_fd);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 3c5541c39bf6..e550a7eb37f6 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -1448,46 +1448,26 @@ int ib_setup_port_attrs(struct ib_core_device *coredev)
 }
 
 /**
- * ib_port_register_module_stat - add module counters under relevant port
- *  of IB device.
+ * ib_port_register_client_groups - Add an ib_client's attributes to the port
  *
- * @device: IB device to add counters
+ * @ibdev: IB device to add counters
  * @port_num: valid port number
- * @kobj: pointer to the kobject to initialize
- * @ktype: pointer to the ktype for this kobject.
- * @name: the name of the kobject
+ * @groups: Group list of attributes
+ *
+ * Do not use. Only for legacy sysfs compatibility.
  */
-int ib_port_register_module_stat(struct ib_device *device, u32 port_num,
-				 struct kobject *kobj, struct kobj_type *ktype,
-				 const char *name)
+int ib_port_register_client_groups(struct ib_device *ibdev, u32 port_num,
+				   const struct attribute_group **groups)
 {
-	struct kobject *p, *t;
-	int ret;
-
-	list_for_each_entry_safe(p, t, &device->coredev.port_list, entry) {
-		struct ib_port *port = container_of(p, struct ib_port, kobj);
-
-		if (port->port_num != port_num)
-			continue;
-
-		ret = kobject_init_and_add(kobj, ktype, &port->kobj, "%s",
-					   name);
-		if (ret) {
-			kobject_put(kobj);
-			return ret;
-		}
-	}
-
-	return 0;
+	return sysfs_create_groups(&ibdev->port_data[port_num].sysfs->kobj,
+				   groups);
 }
-EXPORT_SYMBOL(ib_port_register_module_stat);
+EXPORT_SYMBOL(ib_port_register_client_groups);
 
-/**
- * ib_port_unregister_module_stat - release module counters
- * @kobj: pointer to the kobject to release
- */
-void ib_port_unregister_module_stat(struct kobject *kobj)
+void ib_port_unregister_client_groups(struct ib_device *ibdev, u32 port_num,
+				      const struct attribute_group **groups)
 {
-	kobject_put(kobj);
+	return sysfs_remove_groups(&ibdev->port_data[port_num].sysfs->kobj,
+				   groups);
 }
-EXPORT_SYMBOL(ib_port_unregister_module_stat);
+EXPORT_SYMBOL(ib_port_unregister_client_groups);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 11/15] RDMA/qib: Use attributes for the port sysfs
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (9 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 12/15] RDMA/hfi1: " Leon Romanovsky
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

qib should not be creating a mess of kobjects to attach to the port
kobject - this is all attributes. The proper API is to create an
attribute_group list and create it against the port's kobject.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/qib/qib.h       |   4 -
 drivers/infiniband/hw/qib/qib_sysfs.c | 606 +++++++++++---------------
 2 files changed, 254 insertions(+), 356 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 88497739029e..b8a2deb5b4d2 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -521,10 +521,6 @@ struct qib_pportdata {
 
 	struct qib_devdata *dd;
 	struct qib_chippport_specific *cpspec; /* chip-specific per-port */
-	struct kobject pport_kobj;
-	struct kobject pport_cc_kobj;
-	struct kobject sl2vl_kobj;
-	struct kobject diagc_kobj;
 
 	/* GUID for this interface, in network order */
 	__be64 guid;
diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
index 5e9e66f27064..a1e22c498712 100644
--- a/drivers/infiniband/hw/qib/qib_sysfs.c
+++ b/drivers/infiniband/hw/qib/qib_sysfs.c
@@ -32,25 +32,38 @@
  * SOFTWARE.
  */
 #include <linux/ctype.h>
+#include <rdma/ib_sysfs.h>
 
 #include "qib.h"
 #include "qib_mad.h"
 
-/* start of per-port functions */
+static struct qib_pportdata *qib_get_pportdata_kobj(struct kobject *kobj)
+{
+	u32 port_num;
+	struct ib_device *ibdev = ib_port_sysfs_get_ibdev_kobj(kobj, &port_num);
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+
+	return &dd->pport[port_num - 1];
+}
+
 /*
  * Get/Set heartbeat enable. OR of 1=enabled, 2=auto
  */
-static ssize_t show_hrtbt_enb(struct qib_pportdata *ppd, char *buf)
+static ssize_t hrtbt_enable_show(struct ib_device *ibdev, u32 port_num,
+				 struct ib_port_attribute *attr, char *buf)
 {
-	struct qib_devdata *dd = ppd->dd;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
 
 	return sysfs_emit(buf, "%d\n", dd->f_get_ib_cfg(ppd, QIB_IB_CFG_HRTBT));
 }
 
-static ssize_t store_hrtbt_enb(struct qib_pportdata *ppd, const char *buf,
-			       size_t count)
+static ssize_t hrtbt_enable_store(struct ib_device *ibdev, u32 port_num,
+				  struct ib_port_attribute *attr,
+				  const char *buf, size_t count)
 {
-	struct qib_devdata *dd = ppd->dd;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
 	int ret;
 	u16 val;
 
@@ -70,11 +83,14 @@ static ssize_t store_hrtbt_enb(struct qib_pportdata *ppd, const char *buf,
 	ret = dd->f_set_ib_cfg(ppd, QIB_IB_CFG_HRTBT, val);
 	return ret < 0 ? ret : count;
 }
+static IB_PORT_ATTR_RW(hrtbt_enable);
 
-static ssize_t store_loopback(struct qib_pportdata *ppd, const char *buf,
+static ssize_t loopback_store(struct ib_device *ibdev, u32 port_num,
+			      struct ib_port_attribute *attr, const char *buf,
 			      size_t count)
 {
-	struct qib_devdata *dd = ppd->dd;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
 	int ret = count, r;
 
 	r = dd->f_set_ib_loopback(ppd, buf);
@@ -83,11 +99,14 @@ static ssize_t store_loopback(struct qib_pportdata *ppd, const char *buf,
 
 	return ret;
 }
+static IB_PORT_ATTR_WO(loopback);
 
-static ssize_t store_led_override(struct qib_pportdata *ppd, const char *buf,
-				  size_t count)
+static ssize_t led_override_store(struct ib_device *ibdev, u32 port_num,
+				  struct ib_port_attribute *attr,
+				  const char *buf, size_t count)
 {
-	struct qib_devdata *dd = ppd->dd;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
 	int ret;
 	u16 val;
 
@@ -100,14 +119,20 @@ static ssize_t store_led_override(struct qib_pportdata *ppd, const char *buf,
 	qib_set_led_override(ppd, val);
 	return count;
 }
+static IB_PORT_ATTR_WO(led_override);
 
-static ssize_t show_status(struct qib_pportdata *ppd, char *buf)
+static ssize_t status_show(struct ib_device *ibdev, u32 port_num,
+			   struct ib_port_attribute *attr, char *buf)
 {
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
+
 	if (!ppd->statusp)
 		return -EINVAL;
 
 	return sysfs_emit(buf, "0x%llx\n", (unsigned long long)*(ppd->statusp));
 }
+static IB_PORT_ATTR_RO(status);
 
 /*
  * For userland compatibility, these offsets must remain fixed.
@@ -127,8 +152,11 @@ static const char * const qib_status_str[] = {
 	NULL,
 };
 
-static ssize_t show_status_str(struct qib_pportdata *ppd, char *buf)
+static ssize_t status_str_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
 {
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_pportdata *ppd = &dd->pport[port_num - 1];
 	int i, any;
 	u64 s;
 	ssize_t ret;
@@ -160,38 +188,22 @@ static ssize_t show_status_str(struct qib_pportdata *ppd, char *buf)
 bail:
 	return ret;
 }
+static IB_PORT_ATTR_RO(status_str);
 
 /* end of per-port functions */
 
-/*
- * Start of per-port file structures and support code
- * Because we are fitting into other infrastructure, we have to supply the
- * full set of kobject/sysfs_ops structures and routines.
- */
-#define QIB_PORT_ATTR(name, mode, show, store) \
-	static struct qib_port_attr qib_port_attr_##name = \
-		__ATTR(name, mode, show, store)
-
-struct qib_port_attr {
-	struct attribute attr;
-	ssize_t (*show)(struct qib_pportdata *, char *);
-	ssize_t (*store)(struct qib_pportdata *, const char *, size_t);
+static struct attribute *port_linkcontrol_attributes[] = {
+	&ib_port_attr_loopback.attr,
+	&ib_port_attr_led_override.attr,
+	&ib_port_attr_hrtbt_enable.attr,
+	&ib_port_attr_status.attr,
+	&ib_port_attr_status_str.attr,
+	NULL
 };
 
-QIB_PORT_ATTR(loopback, S_IWUSR, NULL, store_loopback);
-QIB_PORT_ATTR(led_override, S_IWUSR, NULL, store_led_override);
-QIB_PORT_ATTR(hrtbt_enable, S_IWUSR | S_IRUGO, show_hrtbt_enb,
-	      store_hrtbt_enb);
-QIB_PORT_ATTR(status, S_IRUGO, show_status, NULL);
-QIB_PORT_ATTR(status_str, S_IRUGO, show_status_str, NULL);
-
-static struct attribute *port_default_attributes[] = {
-	&qib_port_attr_loopback.attr,
-	&qib_port_attr_led_override.attr,
-	&qib_port_attr_hrtbt_enable.attr,
-	&qib_port_attr_status.attr,
-	&qib_port_attr_status_str.attr,
-	NULL
+static const struct attribute_group port_linkcontrol_group = {
+	.name = "linkcontrol",
+	.attrs = port_linkcontrol_attributes,
 };
 
 /*
@@ -201,13 +213,12 @@ static struct attribute *port_default_attributes[] = {
 /*
  * Congestion control table size followed by table entries
  */
-static ssize_t read_cc_table_bin(struct file *filp, struct kobject *kobj,
-		struct bin_attribute *bin_attr,
-		char *buf, loff_t pos, size_t count)
+static ssize_t cc_table_bin_read(struct file *filp, struct kobject *kobj,
+				 struct bin_attribute *bin_attr, char *buf,
+				 loff_t pos, size_t count)
 {
+	struct qib_pportdata *ppd = qib_get_pportdata_kobj(kobj);
 	int ret;
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, pport_cc_kobj);
 
 	if (!qib_cc_table_size || !ppd->ccti_entries_shadow)
 		return -EINVAL;
@@ -230,34 +241,19 @@ static ssize_t read_cc_table_bin(struct file *filp, struct kobject *kobj,
 
 	return count;
 }
-
-static void qib_port_release(struct kobject *kobj)
-{
-	/* nothing to do since memory is freed by qib_free_devdata() */
-}
-
-static struct kobj_type qib_port_cc_ktype = {
-	.release = qib_port_release,
-};
-
-static const struct bin_attribute cc_table_bin_attr = {
-	.attr = {.name = "cc_table_bin", .mode = 0444},
-	.read = read_cc_table_bin,
-	.size = PAGE_SIZE,
-};
+static BIN_ATTR_RO(cc_table_bin, PAGE_SIZE);
 
 /*
  * Congestion settings: port control, control map and an array of 16
  * entries for the congestion entries - increase, timer, event log
  * trigger threshold and the minimum injection rate delay.
  */
-static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
-		struct bin_attribute *bin_attr,
-		char *buf, loff_t pos, size_t count)
+static ssize_t cc_setting_bin_read(struct file *filp, struct kobject *kobj,
+				   struct bin_attribute *bin_attr, char *buf,
+				   loff_t pos, size_t count)
 {
+	struct qib_pportdata *ppd = qib_get_pportdata_kobj(kobj);
 	int ret;
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, pport_cc_kobj);
 
 	if (!qib_cc_table_size || !ppd->congestion_entries_shadow)
 		return -EINVAL;
@@ -278,67 +274,54 @@ static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
 
 	return count;
 }
+static BIN_ATTR_RO(cc_setting_bin, PAGE_SIZE);
 
-static const struct bin_attribute cc_setting_bin_attr = {
-	.attr = {.name = "cc_settings_bin", .mode = 0444},
-	.read = read_cc_setting_bin,
-	.size = PAGE_SIZE,
+static struct bin_attribute *port_ccmgta_attributes[] = {
+	&bin_attr_cc_setting_bin,
+	&bin_attr_cc_table_bin,
+	NULL,
 };
 
-
-static ssize_t qib_portattr_show(struct kobject *kobj,
-	struct attribute *attr, char *buf)
-{
-	struct qib_port_attr *pattr =
-		container_of(attr, struct qib_port_attr, attr);
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, pport_kobj);
-
-	if (!pattr->show)
-		return -EIO;
-
-	return pattr->show(ppd, buf);
-}
-
-static ssize_t qib_portattr_store(struct kobject *kobj,
-	struct attribute *attr, const char *buf, size_t len)
+static umode_t qib_ccmgta_is_bin_visible(struct kobject *kobj,
+				 struct bin_attribute *attr, int n)
 {
-	struct qib_port_attr *pattr =
-		container_of(attr, struct qib_port_attr, attr);
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, pport_kobj);
+	struct qib_pportdata *ppd = qib_get_pportdata_kobj(kobj);
 
-	if (!pattr->store)
-		return -EIO;
-
-	return pattr->store(ppd, buf, len);
+	if (!qib_cc_table_size || !ppd->congestion_entries_shadow)
+		return 0;
+	return attr->attr.mode;
 }
 
-
-static const struct sysfs_ops qib_port_ops = {
-	.show = qib_portattr_show,
-	.store = qib_portattr_store,
-};
-
-static struct kobj_type qib_port_ktype = {
-	.release = qib_port_release,
-	.sysfs_ops = &qib_port_ops,
-	.default_attrs = port_default_attributes
+static const struct attribute_group port_ccmgta_attribute_group = {
+	.name = "CCMgtA",
+	.is_bin_visible = qib_ccmgta_is_bin_visible,
+	.bin_attrs = port_ccmgta_attributes,
 };
 
 /* Start sl2vl */
 
-#define QIB_SL2VL_ATTR(N) \
-	static struct qib_sl2vl_attr qib_sl2vl_attr_##N = { \
-		.attr = { .name = __stringify(N), .mode = 0444 }, \
-		.sl = N \
-	}
-
 struct qib_sl2vl_attr {
-	struct attribute attr;
+	struct ib_port_attribute attr;
 	int sl;
 };
 
+static ssize_t sl2vl_attr_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
+{
+	struct qib_sl2vl_attr *sattr =
+		container_of(attr, struct qib_sl2vl_attr, attr);
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
+
+	return sysfs_emit(buf, "%u\n", qibp->sl_to_vl[sattr->sl]);
+}
+
+#define QIB_SL2VL_ATTR(N)                                                      \
+	static struct qib_sl2vl_attr qib_sl2vl_attr_##N = {                    \
+		.attr = __ATTR(N, 0444, sl2vl_attr_show, NULL),                \
+		.sl = N,                                                       \
+	}
+
 QIB_SL2VL_ATTR(0);
 QIB_SL2VL_ATTR(1);
 QIB_SL2VL_ATTR(2);
@@ -356,72 +339,74 @@ QIB_SL2VL_ATTR(13);
 QIB_SL2VL_ATTR(14);
 QIB_SL2VL_ATTR(15);
 
-static struct attribute *sl2vl_default_attributes[] = {
-	&qib_sl2vl_attr_0.attr,
-	&qib_sl2vl_attr_1.attr,
-	&qib_sl2vl_attr_2.attr,
-	&qib_sl2vl_attr_3.attr,
-	&qib_sl2vl_attr_4.attr,
-	&qib_sl2vl_attr_5.attr,
-	&qib_sl2vl_attr_6.attr,
-	&qib_sl2vl_attr_7.attr,
-	&qib_sl2vl_attr_8.attr,
-	&qib_sl2vl_attr_9.attr,
-	&qib_sl2vl_attr_10.attr,
-	&qib_sl2vl_attr_11.attr,
-	&qib_sl2vl_attr_12.attr,
-	&qib_sl2vl_attr_13.attr,
-	&qib_sl2vl_attr_14.attr,
-	&qib_sl2vl_attr_15.attr,
+static struct attribute *port_sl2vl_attributes[] = {
+	&qib_sl2vl_attr_0.attr.attr,
+	&qib_sl2vl_attr_1.attr.attr,
+	&qib_sl2vl_attr_2.attr.attr,
+	&qib_sl2vl_attr_3.attr.attr,
+	&qib_sl2vl_attr_4.attr.attr,
+	&qib_sl2vl_attr_5.attr.attr,
+	&qib_sl2vl_attr_6.attr.attr,
+	&qib_sl2vl_attr_7.attr.attr,
+	&qib_sl2vl_attr_8.attr.attr,
+	&qib_sl2vl_attr_9.attr.attr,
+	&qib_sl2vl_attr_10.attr.attr,
+	&qib_sl2vl_attr_11.attr.attr,
+	&qib_sl2vl_attr_12.attr.attr,
+	&qib_sl2vl_attr_13.attr.attr,
+	&qib_sl2vl_attr_14.attr.attr,
+	&qib_sl2vl_attr_15.attr.attr,
 	NULL
 };
 
-static ssize_t sl2vl_attr_show(struct kobject *kobj, struct attribute *attr,
-			       char *buf)
-{
-	struct qib_sl2vl_attr *sattr =
-		container_of(attr, struct qib_sl2vl_attr, attr);
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, sl2vl_kobj);
-	struct qib_ibport *qibp = &ppd->ibport_data;
-
-	return sysfs_emit(buf, "%u\n", qibp->sl_to_vl[sattr->sl]);
-}
-
-static const struct sysfs_ops qib_sl2vl_ops = {
-	.show = sl2vl_attr_show,
-};
-
-static struct kobj_type qib_sl2vl_ktype = {
-	.release = qib_port_release,
-	.sysfs_ops = &qib_sl2vl_ops,
-	.default_attrs = sl2vl_default_attributes
+static const struct attribute_group port_sl2vl_group = {
+	.name = "sl2vl",
+	.attrs = port_sl2vl_attributes,
 };
 
 /* End sl2vl */
 
 /* Start diag_counters */
 
-#define QIB_DIAGC_ATTR(N) \
-	static struct qib_diagc_attr qib_diagc_attr_##N = { \
-		.attr = { .name = __stringify(N), .mode = 0664 }, \
-		.counter = offsetof(struct qib_ibport, rvp.n_##N) \
-	}
-
-#define QIB_DIAGC_ATTR_PER_CPU(N) \
-	static struct qib_diagc_attr qib_diagc_attr_##N = { \
-		.attr = { .name = __stringify(N), .mode = 0664 }, \
-		.counter = offsetof(struct qib_ibport, rvp.z_##N) \
-	}
-
 struct qib_diagc_attr {
-	struct attribute attr;
+	struct ib_port_attribute attr;
 	size_t counter;
 };
 
-QIB_DIAGC_ATTR_PER_CPU(rc_acks);
-QIB_DIAGC_ATTR_PER_CPU(rc_qacks);
-QIB_DIAGC_ATTR_PER_CPU(rc_delayed_comp);
+static ssize_t diagc_attr_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
+{
+	struct qib_diagc_attr *dattr =
+		container_of(attr, struct qib_diagc_attr, attr);
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
+
+	return sysfs_emit(buf, "%llu\n", *((u64 *)qibp + dattr->counter));
+}
+
+static ssize_t diagc_attr_store(struct ib_device *ibdev, u32 port_num,
+				struct ib_port_attribute *attr, const char *buf,
+				size_t count)
+{
+	struct qib_diagc_attr *dattr =
+		container_of(attr, struct qib_diagc_attr, attr);
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
+	u64 val;
+	int ret;
+
+	ret = kstrtou64(buf, 0, &val);
+	if (ret)
+		return ret;
+	*((u64 *)qibp + dattr->counter) = val;
+	return count;
+}
+
+#define QIB_DIAGC_ATTR(N)                                                      \
+	static struct qib_diagc_attr qib_diagc_attr_##N = {                    \
+		.attr = __ATTR(N, 0664, diagc_attr_show, diagc_attr_store),    \
+		.counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0,    \
+	}
 
 QIB_DIAGC_ATTR(rc_resends);
 QIB_DIAGC_ATTR(seq_naks);
@@ -437,26 +422,6 @@ QIB_DIAGC_ATTR(rc_dupreq);
 QIB_DIAGC_ATTR(rc_seqnak);
 QIB_DIAGC_ATTR(rc_crwaits);
 
-static struct attribute *diagc_default_attributes[] = {
-	&qib_diagc_attr_rc_resends.attr,
-	&qib_diagc_attr_rc_acks.attr,
-	&qib_diagc_attr_rc_qacks.attr,
-	&qib_diagc_attr_rc_delayed_comp.attr,
-	&qib_diagc_attr_seq_naks.attr,
-	&qib_diagc_attr_rdma_seq.attr,
-	&qib_diagc_attr_rnr_naks.attr,
-	&qib_diagc_attr_other_naks.attr,
-	&qib_diagc_attr_rc_timeouts.attr,
-	&qib_diagc_attr_loop_pkts.attr,
-	&qib_diagc_attr_pkt_drops.attr,
-	&qib_diagc_attr_dmawait.attr,
-	&qib_diagc_attr_unaligned.attr,
-	&qib_diagc_attr_rc_dupreq.attr,
-	&qib_diagc_attr_rc_seqnak.attr,
-	&qib_diagc_attr_rc_crwaits.attr,
-	NULL
-};
-
 static u64 get_all_cpu_total(u64 __percpu *cntr)
 {
 	int cpu;
@@ -467,86 +432,127 @@ static u64 get_all_cpu_total(u64 __percpu *cntr)
 	return counter;
 }
 
-#define def_write_per_cpu(cntr) \
-static void write_per_cpu_##cntr(struct qib_pportdata *ppd, u32 data)	\
-{									\
-	struct qib_devdata *dd = ppd->dd;				\
-	struct qib_ibport *qibp = &ppd->ibport_data;			\
-	/*  A write can only zero the counter */			\
-	if (data == 0)							\
-		qibp->rvp.z_##cntr = get_all_cpu_total(qibp->rvp.cntr); \
-	else								\
-		qib_dev_err(dd, "Per CPU cntrs can only be zeroed");	\
+static ssize_t qib_store_per_cpu(struct qib_devdata *dd, const char *buf,
+				 size_t count, u64 *zero, u64 cur)
+{
+	u32 val;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &val);
+	if (ret)
+		return ret;
+	if (val != 0) {
+		qib_dev_err(dd, "Per CPU cntrs can only be zeroed");
+		return count;
+	}
+	*zero = cur;
+	return count;
 }
 
-def_write_per_cpu(rc_acks)
-def_write_per_cpu(rc_qacks)
-def_write_per_cpu(rc_delayed_comp)
+static ssize_t rc_acks_show(struct ib_device *ibdev, u32 port_num,
+			    struct ib_port_attribute *attr, char *buf)
+{
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
 
-#define READ_PER_CPU_CNTR(cntr) (get_all_cpu_total(qibp->rvp.cntr) - \
-							qibp->rvp.z_##cntr)
+	return sysfs_emit(buf, "%llu\n",
+			  get_all_cpu_total(qibp->rvp.rc_acks) -
+				  qibp->rvp.z_rc_acks);
+}
 
-static ssize_t diagc_attr_show(struct kobject *kobj, struct attribute *attr,
-			       char *buf)
+static ssize_t rc_acks_store(struct ib_device *ibdev, u32 port_num,
+			     struct ib_port_attribute *attr, const char *buf,
+			     size_t count)
 {
-	struct qib_diagc_attr *dattr =
-		container_of(attr, struct qib_diagc_attr, attr);
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, diagc_kobj);
-	struct qib_ibport *qibp = &ppd->ibport_data;
-	u64 val;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
 
-	if (!strncmp(dattr->attr.name, "rc_acks", 7))
-		val = READ_PER_CPU_CNTR(rc_acks);
-	else if (!strncmp(dattr->attr.name, "rc_qacks", 8))
-		val = READ_PER_CPU_CNTR(rc_qacks);
-	else if (!strncmp(dattr->attr.name, "rc_delayed_comp", 15))
-		val = READ_PER_CPU_CNTR(rc_delayed_comp);
-	else
-		val = *(u32 *)((char *)qibp + dattr->counter);
+	return qib_store_per_cpu(dd, buf, count, &qibp->rvp.z_rc_acks,
+				 get_all_cpu_total(qibp->rvp.rc_acks));
+}
+static IB_PORT_ATTR_RW(rc_acks);
+
+static ssize_t rc_qacks_show(struct ib_device *ibdev, u32 port_num,
+			     struct ib_port_attribute *attr, char *buf)
+{
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
 
-	return sysfs_emit(buf, "%llu\n", val);
+	return sysfs_emit(buf, "%llu\n",
+			  get_all_cpu_total(qibp->rvp.rc_qacks) -
+				  qibp->rvp.z_rc_qacks);
 }
 
-static ssize_t diagc_attr_store(struct kobject *kobj, struct attribute *attr,
-				const char *buf, size_t size)
+static ssize_t rc_qacks_store(struct ib_device *ibdev, u32 port_num,
+			      struct ib_port_attribute *attr, const char *buf,
+			      size_t count)
 {
-	struct qib_diagc_attr *dattr =
-		container_of(attr, struct qib_diagc_attr, attr);
-	struct qib_pportdata *ppd =
-		container_of(kobj, struct qib_pportdata, diagc_kobj);
-	struct qib_ibport *qibp = &ppd->ibport_data;
-	u32 val;
-	int ret;
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
 
-	ret = kstrtou32(buf, 0, &val);
-	if (ret)
-		return ret;
+	return qib_store_per_cpu(dd, buf, count, &qibp->rvp.z_rc_qacks,
+				 get_all_cpu_total(qibp->rvp.rc_qacks));
+}
+static IB_PORT_ATTR_RW(rc_qacks);
 
-	if (!strncmp(dattr->attr.name, "rc_acks", 7))
-		write_per_cpu_rc_acks(ppd, val);
-	else if (!strncmp(dattr->attr.name, "rc_qacks", 8))
-		write_per_cpu_rc_qacks(ppd, val);
-	else if (!strncmp(dattr->attr.name, "rc_delayed_comp", 15))
-		write_per_cpu_rc_delayed_comp(ppd, val);
-	else
-		*(u32 *)((char *)qibp + dattr->counter) = val;
-	return size;
+static ssize_t rc_delayed_comp_show(struct ib_device *ibdev, u32 port_num,
+				    struct ib_port_attribute *attr, char *buf)
+{
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
+
+	return sysfs_emit(buf, "%llu\n",
+			 get_all_cpu_total(qibp->rvp.rc_delayed_comp) -
+				 qibp->rvp.z_rc_delayed_comp);
 }
 
-static const struct sysfs_ops qib_diagc_ops = {
-	.show = diagc_attr_show,
-	.store = diagc_attr_store,
+static ssize_t rc_delayed_comp_store(struct ib_device *ibdev, u32 port_num,
+				     struct ib_port_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct qib_devdata *dd = dd_from_ibdev(ibdev);
+	struct qib_ibport *qibp = &dd->pport[port_num - 1].ibport_data;
+
+	return qib_store_per_cpu(dd, buf, count, &qibp->rvp.z_rc_delayed_comp,
+				 get_all_cpu_total(qibp->rvp.rc_delayed_comp));
+}
+static IB_PORT_ATTR_RW(rc_delayed_comp);
+
+static struct attribute *port_diagc_attributes[] = {
+	&qib_diagc_attr_rc_resends.attr.attr,
+	&qib_diagc_attr_seq_naks.attr.attr,
+	&qib_diagc_attr_rdma_seq.attr.attr,
+	&qib_diagc_attr_rnr_naks.attr.attr,
+	&qib_diagc_attr_other_naks.attr.attr,
+	&qib_diagc_attr_rc_timeouts.attr.attr,
+	&qib_diagc_attr_loop_pkts.attr.attr,
+	&qib_diagc_attr_pkt_drops.attr.attr,
+	&qib_diagc_attr_dmawait.attr.attr,
+	&qib_diagc_attr_unaligned.attr.attr,
+	&qib_diagc_attr_rc_dupreq.attr.attr,
+	&qib_diagc_attr_rc_seqnak.attr.attr,
+	&qib_diagc_attr_rc_crwaits.attr.attr,
+	&ib_port_attr_rc_acks.attr,
+	&ib_port_attr_rc_qacks.attr,
+	&ib_port_attr_rc_delayed_comp.attr,
+	NULL
 };
 
-static struct kobj_type qib_diagc_ktype = {
-	.release = qib_port_release,
-	.sysfs_ops = &qib_diagc_ops,
-	.default_attrs = diagc_default_attributes
+static const struct attribute_group port_diagc_group = {
+	.name = "linkcontrol",
+	.attrs = port_diagc_attributes,
 };
 
 /* End diag_counters */
 
+static const struct attribute_group *qib_port_groups[] = {
+	&port_linkcontrol_group,
+	&port_ccmgta_attribute_group,
+	&port_sl2vl_group,
+	&port_diagc_group,
+	NULL,
+};
+
 /* end of per-port file structures and support code */
 
 /*
@@ -731,99 +737,7 @@ const struct attribute_group qib_attr_group = {
 int qib_create_port_files(struct ib_device *ibdev, u32 port_num,
 			  struct kobject *kobj)
 {
-	struct qib_pportdata *ppd;
-	struct qib_devdata *dd = dd_from_ibdev(ibdev);
-	int ret;
-
-	if (!port_num || port_num > dd->num_pports) {
-		qib_dev_err(dd,
-			"Skipping infiniband class with invalid port %u\n",
-			port_num);
-		ret = -ENODEV;
-		goto bail;
-	}
-	ppd = &dd->pport[port_num - 1];
-
-	ret = kobject_init_and_add(&ppd->pport_kobj, &qib_port_ktype, kobj,
-				   "linkcontrol");
-	if (ret) {
-		qib_dev_err(dd,
-			"Skipping linkcontrol sysfs info, (err %d) port %u\n",
-			ret, port_num);
-		goto bail_link;
-	}
-	kobject_uevent(&ppd->pport_kobj, KOBJ_ADD);
-
-	ret = kobject_init_and_add(&ppd->sl2vl_kobj, &qib_sl2vl_ktype, kobj,
-				   "sl2vl");
-	if (ret) {
-		qib_dev_err(dd,
-			"Skipping sl2vl sysfs info, (err %d) port %u\n",
-			ret, port_num);
-		goto bail_sl;
-	}
-	kobject_uevent(&ppd->sl2vl_kobj, KOBJ_ADD);
-
-	ret = kobject_init_and_add(&ppd->diagc_kobj, &qib_diagc_ktype, kobj,
-				   "diag_counters");
-	if (ret) {
-		qib_dev_err(dd,
-			"Skipping diag_counters sysfs info, (err %d) port %u\n",
-			ret, port_num);
-		goto bail_diagc;
-	}
-	kobject_uevent(&ppd->diagc_kobj, KOBJ_ADD);
-
-	if (!qib_cc_table_size || !ppd->congestion_entries_shadow)
-		return 0;
-
-	ret = kobject_init_and_add(&ppd->pport_cc_kobj, &qib_port_cc_ktype,
-				kobj, "CCMgtA");
-	if (ret) {
-		qib_dev_err(dd,
-		 "Skipping Congestion Control sysfs info, (err %d) port %u\n",
-		 ret, port_num);
-		goto bail_cc;
-	}
-
-	kobject_uevent(&ppd->pport_cc_kobj, KOBJ_ADD);
-
-	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj,
-				&cc_setting_bin_attr);
-	if (ret) {
-		qib_dev_err(dd,
-		 "Skipping Congestion Control setting sysfs info, (err %d) port %u\n",
-		 ret, port_num);
-		goto bail_cc;
-	}
-
-	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj,
-				&cc_table_bin_attr);
-	if (ret) {
-		qib_dev_err(dd,
-		 "Skipping Congestion Control table sysfs info, (err %d) port %u\n",
-		 ret, port_num);
-		goto bail_cc_entry_bin;
-	}
-
-	qib_devinfo(dd->pcidev,
-		"IB%u: Congestion Control Agent enabled for port %d\n",
-		dd->unit, port_num);
-
-	return 0;
-
-bail_cc_entry_bin:
-	sysfs_remove_bin_file(&ppd->pport_cc_kobj, &cc_setting_bin_attr);
-bail_cc:
-	kobject_put(&ppd->pport_cc_kobj);
-bail_diagc:
-	kobject_put(&ppd->diagc_kobj);
-bail_sl:
-	kobject_put(&ppd->sl2vl_kobj);
-bail_link:
-	kobject_put(&ppd->pport_kobj);
-bail:
-	return ret;
+	return ib_port_sysfs_create_groups(ibdev, port_num, qib_port_groups);
 }
 
 /*
@@ -831,21 +745,9 @@ int qib_create_port_files(struct ib_device *ibdev, u32 port_num,
  */
 void qib_verbs_unregister_sysfs(struct qib_devdata *dd)
 {
-	struct qib_pportdata *ppd;
 	int i;
 
-	for (i = 0; i < dd->num_pports; i++) {
-		ppd = &dd->pport[i];
-		if (qib_cc_table_size &&
-			ppd->congestion_entries_shadow) {
-			sysfs_remove_bin_file(&ppd->pport_cc_kobj,
-				&cc_setting_bin_attr);
-			sysfs_remove_bin_file(&ppd->pport_cc_kobj,
-				&cc_table_bin_attr);
-			kobject_put(&ppd->pport_cc_kobj);
-		}
-		kobject_put(&ppd->diagc_kobj);
-		kobject_put(&ppd->sl2vl_kobj);
-		kobject_put(&ppd->pport_kobj);
-	}
+	for (i = 0; i < dd->num_pports; i++)
+		ib_port_sysfs_remove_groups(&dd->verbs_dev.rdi.ibdev, i,
+					    qib_port_groups);
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 12/15] RDMA/hfi1: Use attributes for the port sysfs
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (10 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 11/15] RDMA/qib: Use attributes for the port sysfs Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07 13:07   ` kernel test robot
  2021-06-07  8:17 ` [PATCH rdma-next v1 13/15] RDMA: Change ops->init_port to ops->port_groups Leon Romanovsky
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

hfi1 should not be creating a mess of kobjects to attach to the port
kobject - this is all attributes. The proper API is to create an
attribute_group list and create it against the port's kobject.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/hfi1/hfi.h   |   4 -
 drivers/infiniband/hw/hfi1/sysfs.c | 529 +++++++++++------------------
 2 files changed, 193 insertions(+), 340 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 9e020bb6f405..4bb807c154b2 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -772,10 +772,6 @@ struct hfi1_pportdata {
 	struct hfi1_ibport ibport_data;
 
 	struct hfi1_devdata *dd;
-	struct kobject pport_cc_kobj;
-	struct kobject sc2vl_kobj;
-	struct kobject sl2sc_kobj;
-	struct kobject vl2mtu_kobj;
 
 	/* PHY support */
 	struct qsfp_data qsfp_info;
diff --git a/drivers/infiniband/hw/hfi1/sysfs.c b/drivers/infiniband/hw/hfi1/sysfs.c
index eaf441ece25e..326c6c2842f2 100644
--- a/drivers/infiniband/hw/hfi1/sysfs.c
+++ b/drivers/infiniband/hw/hfi1/sysfs.c
@@ -45,11 +45,21 @@
  *
  */
 #include <linux/ctype.h>
+#include <rdma/ib_sysfs.h>
 
 #include "hfi.h"
 #include "mad.h"
 #include "trace.h"
 
+static struct hfi1_pportdata *hfi1_get_pportdata_kobj(struct kobject *kobj)
+{
+	u32 port_num;
+	struct ib_device *ibdev = ib_port_sysfs_get_ibdev_kobj(kobj, &port_num);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+
+	return &dd->pport[port_num - 1];
+}
+
 /*
  * Start of per-port congestion control structures and support code
  */
@@ -57,13 +67,12 @@
 /*
  * Congestion control table size followed by table entries
  */
-static ssize_t read_cc_table_bin(struct file *filp, struct kobject *kobj,
-				 struct bin_attribute *bin_attr,
-				 char *buf, loff_t pos, size_t count)
+static ssize_t cc_table_bin_read(struct file *filp, struct kobject *kobj,
+				 struct bin_attribute *bin_attr, char *buf,
+				 loff_t pos, size_t count)
 {
 	int ret;
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
+	struct hfi1_pportdata *ppd = hfi1_get_pportdata_kobj(kobj);
 	struct cc_state *cc_state;
 
 	ret = ppd->total_cct_entry * sizeof(struct ib_cc_table_entry_shadow)
@@ -89,30 +98,19 @@ static ssize_t read_cc_table_bin(struct file *filp, struct kobject *kobj,
 
 	return count;
 }
-
-static void port_release(struct kobject *kobj)
-{
-	/* nothing to do since memory is freed by hfi1_free_devdata() */
-}
-
-static const struct bin_attribute cc_table_bin_attr = {
-	.attr = {.name = "cc_table_bin", .mode = 0444},
-	.read = read_cc_table_bin,
-	.size = PAGE_SIZE,
-};
+static BIN_ATTR_RO(cc_table_bin, PAGE_SIZE);
 
 /*
  * Congestion settings: port control, control map and an array of 16
  * entries for the congestion entries - increase, timer, event log
  * trigger threshold and the minimum injection rate delay.
  */
-static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
+static ssize_t cc_setting_bin_read(struct file *filp, struct kobject *kobj,
 				   struct bin_attribute *bin_attr,
 				   char *buf, loff_t pos, size_t count)
 {
+	struct hfi1_pportdata *ppd = hfi1_get_pportdata_kobj(kobj);
 	int ret;
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
 	struct cc_state *cc_state;
 
 	ret = sizeof(struct opa_congestion_setting_attr_shadow);
@@ -136,27 +134,30 @@ static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
 
 	return count;
 }
+static BIN_ATTR_RO(cc_setting_bin, PAGE_SIZE);
 
-static const struct bin_attribute cc_setting_bin_attr = {
-	.attr = {.name = "cc_settings_bin", .mode = 0444},
-	.read = read_cc_setting_bin,
-	.size = PAGE_SIZE,
-};
-
-struct hfi1_port_attr {
-	struct attribute attr;
-	ssize_t	(*show)(struct hfi1_pportdata *, char *);
-	ssize_t	(*store)(struct hfi1_pportdata *, const char *, size_t);
+static struct bin_attribute *port_cc_bin_attributes[] = {
+	&bin_attr_cc_setting_bin,
+	&bin_attr_cc_table_bin,
+	NULL
 };
 
-static ssize_t cc_prescan_show(struct hfi1_pportdata *ppd, char *buf)
+static ssize_t cc_prescan_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
 {
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd = &dd->pport[port_num - 1];
+
 	return sysfs_emit(buf, "%s\n", ppd->cc_prescan ? "on" : "off");
 }
 
-static ssize_t cc_prescan_store(struct hfi1_pportdata *ppd, const char *buf,
+static ssize_t cc_prescan_store(struct ib_device *ibdev, u32 port_num,
+				struct ib_port_attribute *attr, const char *buf,
 				size_t count)
 {
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd = &dd->pport[port_num - 1];
+
 	if (!memcmp(buf, "on", 2))
 		ppd->cc_prescan = true;
 	else if (!memcmp(buf, "off", 3))
@@ -164,60 +165,41 @@ static ssize_t cc_prescan_store(struct hfi1_pportdata *ppd, const char *buf,
 
 	return count;
 }
+static IB_PORT_ATTR_ADMIN_RW(cc_prescan);
 
-static struct hfi1_port_attr cc_prescan_attr =
-		__ATTR(cc_prescan, 0600, cc_prescan_show, cc_prescan_store);
-
-static ssize_t cc_attr_show(struct kobject *kobj, struct attribute *attr,
-			    char *buf)
-{
-	struct hfi1_port_attr *port_attr =
-		container_of(attr, struct hfi1_port_attr, attr);
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
-
-	return port_attr->show(ppd, buf);
-}
-
-static ssize_t cc_attr_store(struct kobject *kobj, struct attribute *attr,
-			     const char *buf, size_t count)
-{
-	struct hfi1_port_attr *port_attr =
-		container_of(attr, struct hfi1_port_attr, attr);
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
-
-	return port_attr->store(ppd, buf, count);
-}
-
-static const struct sysfs_ops port_cc_sysfs_ops = {
-	.show = cc_attr_show,
-	.store = cc_attr_store
-};
-
-static struct attribute *port_cc_default_attributes[] = {
-	&cc_prescan_attr.attr,
+static struct attribute *port_cc_attributes[] = {
+	&ib_port_attr_cc_prescan.attr,
 	NULL
 };
 
-static struct kobj_type port_cc_ktype = {
-	.release = port_release,
-	.sysfs_ops = &port_cc_sysfs_ops,
-	.default_attrs = port_cc_default_attributes
+static const struct attribute_group port_cc_group = {
+	.name = "CCMgtA",
+	.attrs = port_cc_attributes,
+	.bin_attrs = port_cc_bin_attributes,
 };
 
 /* Start sc2vl */
-#define HFI1_SC2VL_ATTR(N)				    \
-	static struct hfi1_sc2vl_attr hfi1_sc2vl_attr_##N = { \
-		.attr = { .name = __stringify(N), .mode = 0444 }, \
-		.sc = N \
-	}
-
 struct hfi1_sc2vl_attr {
-	struct attribute attr;
+	struct ib_port_attribute attr;
 	int sc;
 };
 
+static ssize_t sc2vl_attr_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
+{
+	struct hfi1_sc2vl_attr *sattr =
+		container_of(attr, struct hfi1_sc2vl_attr, attr);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+
+	return sysfs_emit(buf, "%u\n", *((u8 *)dd->sc2vl + sattr->sc));
+}
+
+#define HFI1_SC2VL_ATTR(N)                                                     \
+	static struct hfi1_sc2vl_attr hfi1_sc2vl_attr_##N = {                  \
+		.attr = __ATTR(N, 0444, sc2vl_attr_show, NULL),                \
+		.sc = N,                                                       \
+	}
+
 HFI1_SC2VL_ATTR(0);
 HFI1_SC2VL_ATTR(1);
 HFI1_SC2VL_ATTR(2);
@@ -251,78 +233,70 @@ HFI1_SC2VL_ATTR(29);
 HFI1_SC2VL_ATTR(30);
 HFI1_SC2VL_ATTR(31);
 
-static struct attribute *sc2vl_default_attributes[] = {
-	&hfi1_sc2vl_attr_0.attr,
-	&hfi1_sc2vl_attr_1.attr,
-	&hfi1_sc2vl_attr_2.attr,
-	&hfi1_sc2vl_attr_3.attr,
-	&hfi1_sc2vl_attr_4.attr,
-	&hfi1_sc2vl_attr_5.attr,
-	&hfi1_sc2vl_attr_6.attr,
-	&hfi1_sc2vl_attr_7.attr,
-	&hfi1_sc2vl_attr_8.attr,
-	&hfi1_sc2vl_attr_9.attr,
-	&hfi1_sc2vl_attr_10.attr,
-	&hfi1_sc2vl_attr_11.attr,
-	&hfi1_sc2vl_attr_12.attr,
-	&hfi1_sc2vl_attr_13.attr,
-	&hfi1_sc2vl_attr_14.attr,
-	&hfi1_sc2vl_attr_15.attr,
-	&hfi1_sc2vl_attr_16.attr,
-	&hfi1_sc2vl_attr_17.attr,
-	&hfi1_sc2vl_attr_18.attr,
-	&hfi1_sc2vl_attr_19.attr,
-	&hfi1_sc2vl_attr_20.attr,
-	&hfi1_sc2vl_attr_21.attr,
-	&hfi1_sc2vl_attr_22.attr,
-	&hfi1_sc2vl_attr_23.attr,
-	&hfi1_sc2vl_attr_24.attr,
-	&hfi1_sc2vl_attr_25.attr,
-	&hfi1_sc2vl_attr_26.attr,
-	&hfi1_sc2vl_attr_27.attr,
-	&hfi1_sc2vl_attr_28.attr,
-	&hfi1_sc2vl_attr_29.attr,
-	&hfi1_sc2vl_attr_30.attr,
-	&hfi1_sc2vl_attr_31.attr,
+static struct attribute *port_sc2vl_attributes[] = {
+	&hfi1_sc2vl_attr_0.attr.attr,
+	&hfi1_sc2vl_attr_1.attr.attr,
+	&hfi1_sc2vl_attr_2.attr.attr,
+	&hfi1_sc2vl_attr_3.attr.attr,
+	&hfi1_sc2vl_attr_4.attr.attr,
+	&hfi1_sc2vl_attr_5.attr.attr,
+	&hfi1_sc2vl_attr_6.attr.attr,
+	&hfi1_sc2vl_attr_7.attr.attr,
+	&hfi1_sc2vl_attr_8.attr.attr,
+	&hfi1_sc2vl_attr_9.attr.attr,
+	&hfi1_sc2vl_attr_10.attr.attr,
+	&hfi1_sc2vl_attr_11.attr.attr,
+	&hfi1_sc2vl_attr_12.attr.attr,
+	&hfi1_sc2vl_attr_13.attr.attr,
+	&hfi1_sc2vl_attr_14.attr.attr,
+	&hfi1_sc2vl_attr_15.attr.attr,
+	&hfi1_sc2vl_attr_16.attr.attr,
+	&hfi1_sc2vl_attr_17.attr.attr,
+	&hfi1_sc2vl_attr_18.attr.attr,
+	&hfi1_sc2vl_attr_19.attr.attr,
+	&hfi1_sc2vl_attr_20.attr.attr,
+	&hfi1_sc2vl_attr_21.attr.attr,
+	&hfi1_sc2vl_attr_22.attr.attr,
+	&hfi1_sc2vl_attr_23.attr.attr,
+	&hfi1_sc2vl_attr_24.attr.attr,
+	&hfi1_sc2vl_attr_25.attr.attr,
+	&hfi1_sc2vl_attr_26.attr.attr,
+	&hfi1_sc2vl_attr_27.attr.attr,
+	&hfi1_sc2vl_attr_28.attr.attr,
+	&hfi1_sc2vl_attr_29.attr.attr,
+	&hfi1_sc2vl_attr_30.attr.attr,
+	&hfi1_sc2vl_attr_31.attr.attr,
 	NULL
 };
 
-static ssize_t sc2vl_attr_show(struct kobject *kobj, struct attribute *attr,
-			       char *buf)
-{
-	struct hfi1_sc2vl_attr *sattr =
-		container_of(attr, struct hfi1_sc2vl_attr, attr);
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, sc2vl_kobj);
-	struct hfi1_devdata *dd = ppd->dd;
-
-	return sysfs_emit(buf, "%u\n", *((u8 *)dd->sc2vl + sattr->sc));
-}
-
-static const struct sysfs_ops hfi1_sc2vl_ops = {
-	.show = sc2vl_attr_show,
+static const struct attribute_group port_sc2vl_group = {
+	.name = "sc2vl",
+	.attrs = port_sc2vl_attributes,
 };
-
-static struct kobj_type hfi1_sc2vl_ktype = {
-	.release = port_release,
-	.sysfs_ops = &hfi1_sc2vl_ops,
-	.default_attrs = sc2vl_default_attributes
-};
-
 /* End sc2vl */
 
 /* Start sl2sc */
-#define HFI1_SL2SC_ATTR(N)				    \
-	static struct hfi1_sl2sc_attr hfi1_sl2sc_attr_##N = {	  \
-		.attr = { .name = __stringify(N), .mode = 0444 }, \
-		.sl = N						  \
-	}
-
 struct hfi1_sl2sc_attr {
-	struct attribute attr;
+	struct ib_port_attribute attr;
 	int sl;
 };
 
+static ssize_t sl2sc_attr_show(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_attribute *attr, char *buf)
+{
+	struct hfi1_sl2sc_attr *sattr =
+		container_of(attr, struct hfi1_sl2sc_attr, attr);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibport *ibp = &dd->pport[port_num - 1].ibport_data;
+
+	return sysfs_emit(buf, "%u\n", ibp->sl_to_sc[sattr->sl]);
+}
+
+#define HFI1_SL2SC_ATTR(N)                                                     \
+	static struct hfi1_sl2sc_attr hfi1_sl2sc_attr_##N = {                  \
+		.attr = __ATTR(N, 0444, sl2sc_attr_show, NULL), .sl = N        \
+	}
+
 HFI1_SL2SC_ATTR(0);
 HFI1_SL2SC_ATTR(1);
 HFI1_SL2SC_ATTR(2);
@@ -356,79 +330,72 @@ HFI1_SL2SC_ATTR(29);
 HFI1_SL2SC_ATTR(30);
 HFI1_SL2SC_ATTR(31);
 
-static struct attribute *sl2sc_default_attributes[] = {
-	&hfi1_sl2sc_attr_0.attr,
-	&hfi1_sl2sc_attr_1.attr,
-	&hfi1_sl2sc_attr_2.attr,
-	&hfi1_sl2sc_attr_3.attr,
-	&hfi1_sl2sc_attr_4.attr,
-	&hfi1_sl2sc_attr_5.attr,
-	&hfi1_sl2sc_attr_6.attr,
-	&hfi1_sl2sc_attr_7.attr,
-	&hfi1_sl2sc_attr_8.attr,
-	&hfi1_sl2sc_attr_9.attr,
-	&hfi1_sl2sc_attr_10.attr,
-	&hfi1_sl2sc_attr_11.attr,
-	&hfi1_sl2sc_attr_12.attr,
-	&hfi1_sl2sc_attr_13.attr,
-	&hfi1_sl2sc_attr_14.attr,
-	&hfi1_sl2sc_attr_15.attr,
-	&hfi1_sl2sc_attr_16.attr,
-	&hfi1_sl2sc_attr_17.attr,
-	&hfi1_sl2sc_attr_18.attr,
-	&hfi1_sl2sc_attr_19.attr,
-	&hfi1_sl2sc_attr_20.attr,
-	&hfi1_sl2sc_attr_21.attr,
-	&hfi1_sl2sc_attr_22.attr,
-	&hfi1_sl2sc_attr_23.attr,
-	&hfi1_sl2sc_attr_24.attr,
-	&hfi1_sl2sc_attr_25.attr,
-	&hfi1_sl2sc_attr_26.attr,
-	&hfi1_sl2sc_attr_27.attr,
-	&hfi1_sl2sc_attr_28.attr,
-	&hfi1_sl2sc_attr_29.attr,
-	&hfi1_sl2sc_attr_30.attr,
-	&hfi1_sl2sc_attr_31.attr,
+static struct attribute *port_sl2sc_attributes[] = {
+	&hfi1_sl2sc_attr_0.attr.attr,
+	&hfi1_sl2sc_attr_1.attr.attr,
+	&hfi1_sl2sc_attr_2.attr.attr,
+	&hfi1_sl2sc_attr_3.attr.attr,
+	&hfi1_sl2sc_attr_4.attr.attr,
+	&hfi1_sl2sc_attr_5.attr.attr,
+	&hfi1_sl2sc_attr_6.attr.attr,
+	&hfi1_sl2sc_attr_7.attr.attr,
+	&hfi1_sl2sc_attr_8.attr.attr,
+	&hfi1_sl2sc_attr_9.attr.attr,
+	&hfi1_sl2sc_attr_10.attr.attr,
+	&hfi1_sl2sc_attr_11.attr.attr,
+	&hfi1_sl2sc_attr_12.attr.attr,
+	&hfi1_sl2sc_attr_13.attr.attr,
+	&hfi1_sl2sc_attr_14.attr.attr,
+	&hfi1_sl2sc_attr_15.attr.attr,
+	&hfi1_sl2sc_attr_16.attr.attr,
+	&hfi1_sl2sc_attr_17.attr.attr,
+	&hfi1_sl2sc_attr_18.attr.attr,
+	&hfi1_sl2sc_attr_19.attr.attr,
+	&hfi1_sl2sc_attr_20.attr.attr,
+	&hfi1_sl2sc_attr_21.attr.attr,
+	&hfi1_sl2sc_attr_22.attr.attr,
+	&hfi1_sl2sc_attr_23.attr.attr,
+	&hfi1_sl2sc_attr_24.attr.attr,
+	&hfi1_sl2sc_attr_25.attr.attr,
+	&hfi1_sl2sc_attr_26.attr.attr,
+	&hfi1_sl2sc_attr_27.attr.attr,
+	&hfi1_sl2sc_attr_28.attr.attr,
+	&hfi1_sl2sc_attr_29.attr.attr,
+	&hfi1_sl2sc_attr_30.attr.attr,
+	&hfi1_sl2sc_attr_31.attr.attr,
 	NULL
 };
 
-static ssize_t sl2sc_attr_show(struct kobject *kobj, struct attribute *attr,
-			       char *buf)
-{
-	struct hfi1_sl2sc_attr *sattr =
-		container_of(attr, struct hfi1_sl2sc_attr, attr);
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, sl2sc_kobj);
-	struct hfi1_ibport *ibp = &ppd->ibport_data;
-
-	return sysfs_emit(buf, "%u\n", ibp->sl_to_sc[sattr->sl]);
-}
-
-static const struct sysfs_ops hfi1_sl2sc_ops = {
-	.show = sl2sc_attr_show,
-};
-
-static struct kobj_type hfi1_sl2sc_ktype = {
-	.release = port_release,
-	.sysfs_ops = &hfi1_sl2sc_ops,
-	.default_attrs = sl2sc_default_attributes
+static const struct attribute_group port_sl2sc_group = {
+	.name = "sl2sc",
+	.attrs = port_sl2sc_attributes,
 };
 
 /* End sl2sc */
 
 /* Start vl2mtu */
 
-#define HFI1_VL2MTU_ATTR(N) \
-	static struct hfi1_vl2mtu_attr hfi1_vl2mtu_attr_##N = { \
-		.attr = { .name = __stringify(N), .mode = 0444 }, \
-		.vl = N						  \
-	}
-
 struct hfi1_vl2mtu_attr {
-	struct attribute attr;
+	struct ib_port_attribute attr;
 	int vl;
 };
 
+static ssize_t vl2mtu_attr_show(struct ib_device *ibdev, u32 port_num,
+				struct ib_port_attribute *attr, char *buf)
+{
+	struct hfi1_vl2mtu_attr *vlattr =
+		container_of(attr, struct hfi1_vl2mtu_attr, attr);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+
+	return sysfs_emit(buf, "%u\n", dd->vld[vlattr->vl].mtu);
+}
+
+#define HFI1_VL2MTU_ATTR(N)                                                    \
+	static struct hfi1_vl2mtu_attr hfi1_vl2mtu_attr_##N = {                \
+		.attr = __ATTR(N, 0444, vl2mtu_attr_show, NULL),               \
+		.vl = N,                                                       \
+	}
+
 HFI1_VL2MTU_ATTR(0);
 HFI1_VL2MTU_ATTR(1);
 HFI1_VL2MTU_ATTR(2);
@@ -446,46 +413,29 @@ HFI1_VL2MTU_ATTR(13);
 HFI1_VL2MTU_ATTR(14);
 HFI1_VL2MTU_ATTR(15);
 
-static struct attribute *vl2mtu_default_attributes[] = {
-	&hfi1_vl2mtu_attr_0.attr,
-	&hfi1_vl2mtu_attr_1.attr,
-	&hfi1_vl2mtu_attr_2.attr,
-	&hfi1_vl2mtu_attr_3.attr,
-	&hfi1_vl2mtu_attr_4.attr,
-	&hfi1_vl2mtu_attr_5.attr,
-	&hfi1_vl2mtu_attr_6.attr,
-	&hfi1_vl2mtu_attr_7.attr,
-	&hfi1_vl2mtu_attr_8.attr,
-	&hfi1_vl2mtu_attr_9.attr,
-	&hfi1_vl2mtu_attr_10.attr,
-	&hfi1_vl2mtu_attr_11.attr,
-	&hfi1_vl2mtu_attr_12.attr,
-	&hfi1_vl2mtu_attr_13.attr,
-	&hfi1_vl2mtu_attr_14.attr,
-	&hfi1_vl2mtu_attr_15.attr,
+static struct attribute *port_vl2mtu_attributes[] = {
+	&hfi1_vl2mtu_attr_0.attr.attr,
+	&hfi1_vl2mtu_attr_1.attr.attr,
+	&hfi1_vl2mtu_attr_2.attr.attr,
+	&hfi1_vl2mtu_attr_3.attr.attr,
+	&hfi1_vl2mtu_attr_4.attr.attr,
+	&hfi1_vl2mtu_attr_5.attr.attr,
+	&hfi1_vl2mtu_attr_6.attr.attr,
+	&hfi1_vl2mtu_attr_7.attr.attr,
+	&hfi1_vl2mtu_attr_8.attr.attr,
+	&hfi1_vl2mtu_attr_9.attr.attr,
+	&hfi1_vl2mtu_attr_10.attr.attr,
+	&hfi1_vl2mtu_attr_11.attr.attr,
+	&hfi1_vl2mtu_attr_12.attr.attr,
+	&hfi1_vl2mtu_attr_13.attr.attr,
+	&hfi1_vl2mtu_attr_14.attr.attr,
+	&hfi1_vl2mtu_attr_15.attr.attr,
 	NULL
 };
 
-static ssize_t vl2mtu_attr_show(struct kobject *kobj, struct attribute *attr,
-				char *buf)
-{
-	struct hfi1_vl2mtu_attr *vlattr =
-		container_of(attr, struct hfi1_vl2mtu_attr, attr);
-	struct hfi1_pportdata *ppd =
-		container_of(kobj, struct hfi1_pportdata, vl2mtu_kobj);
-	struct hfi1_devdata *dd = ppd->dd;
-
-	return sysfs_emit(buf, "%u\n", dd->vld[vlattr->vl].mtu);
-}
-
-static const struct sysfs_ops hfi1_vl2mtu_ops = {
-	.show = vl2mtu_attr_show,
-};
-
-static struct kobj_type hfi1_vl2mtu_ktype = {
-	.release = port_release,
-	.sysfs_ops = &hfi1_vl2mtu_ops,
-	.default_attrs = vl2mtu_default_attributes
+static const struct attribute_group port_vl2mtu_group = {
+	.name = "vl2mtu",
+	.attrs = port_vl2mtu_attributes,
 };
 
 /* end of per-port file structures and support code */
@@ -649,100 +599,17 @@ const struct attribute_group ib_hfi1_attr_group = {
 	.attrs = hfi1_attributes,
 };
 
+static const struct attribute_group *hfi1_port_groups[] = {
+	&port_sc2vl_group,
+	&port_sl2sc_group,
+	&port_vl2mtu_group,
+	NULL,
+};
+
 int hfi1_create_port_files(struct ib_device *ibdev, u32 port_num,
 			   struct kobject *kobj)
 {
-	struct hfi1_pportdata *ppd;
-	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
-	int ret;
-
-	if (!port_num || port_num > dd->num_pports) {
-		dd_dev_err(dd,
-			   "Skipping infiniband class with invalid port %u\n",
-			   port_num);
-		return -ENODEV;
-	}
-	ppd = &dd->pport[port_num - 1];
-
-	ret = kobject_init_and_add(&ppd->sc2vl_kobj, &hfi1_sc2vl_ktype, kobj,
-				   "sc2vl");
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping sc2vl sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		/*
-		 * Based on the documentation for kobject_init_and_add(), the
-		 * caller should call kobject_put even if this call fails.
-		 */
-		goto bail_sc2vl;
-	}
-	kobject_uevent(&ppd->sc2vl_kobj, KOBJ_ADD);
-
-	ret = kobject_init_and_add(&ppd->sl2sc_kobj, &hfi1_sl2sc_ktype, kobj,
-				   "sl2sc");
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping sl2sc sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		goto bail_sl2sc;
-	}
-	kobject_uevent(&ppd->sl2sc_kobj, KOBJ_ADD);
-
-	ret = kobject_init_and_add(&ppd->vl2mtu_kobj, &hfi1_vl2mtu_ktype, kobj,
-				   "vl2mtu");
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping vl2mtu sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		goto bail_vl2mtu;
-	}
-	kobject_uevent(&ppd->vl2mtu_kobj, KOBJ_ADD);
-
-	ret = kobject_init_and_add(&ppd->pport_cc_kobj, &port_cc_ktype,
-				   kobj, "CCMgtA");
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping Congestion Control sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		goto bail_cc;
-	}
-
-	kobject_uevent(&ppd->pport_cc_kobj, KOBJ_ADD);
-
-	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj, &cc_setting_bin_attr);
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping Congestion Control setting sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		goto bail_cc;
-	}
-
-	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj, &cc_table_bin_attr);
-	if (ret) {
-		dd_dev_err(dd,
-			   "Skipping Congestion Control table sysfs info, (err %d) port %u\n",
-			   ret, port_num);
-		goto bail_cc_entry_bin;
-	}
-
-	dd_dev_info(dd,
-		    "Congestion Control Agent enabled for port %d\n",
-		    port_num);
-
-	return 0;
-
-bail_cc_entry_bin:
-	sysfs_remove_bin_file(&ppd->pport_cc_kobj,
-			      &cc_setting_bin_attr);
-bail_cc:
-	kobject_put(&ppd->pport_cc_kobj);
-bail_vl2mtu:
-	kobject_put(&ppd->vl2mtu_kobj);
-bail_sl2sc:
-	kobject_put(&ppd->sl2sc_kobj);
-bail_sc2vl:
-	kobject_put(&ppd->sc2vl_kobj);
-	return ret;
+	return ib_port_sysfs_create_groups(ibdev, port_num, hfi1_port_groups);
 }
 
 struct sde_attribute {
@@ -868,23 +735,13 @@ int hfi1_verbs_register_sysfs(struct hfi1_devdata *dd)
  */
 void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd)
 {
-	struct hfi1_pportdata *ppd;
 	int i;
 
 	/* Unwind operations in hfi1_verbs_register_sysfs() */
 	for (i = 0; i < dd->num_sdma; i++)
 		kobject_put(&dd->per_sdma[i].kobj);
 
-	for (i = 0; i < dd->num_pports; i++) {
-		ppd = &dd->pport[i];
-
-		sysfs_remove_bin_file(&ppd->pport_cc_kobj,
-				      &cc_setting_bin_attr);
-		sysfs_remove_bin_file(&ppd->pport_cc_kobj,
-				      &cc_table_bin_attr);
-		kobject_put(&ppd->pport_cc_kobj);
-		kobject_put(&ppd->vl2mtu_kobj);
-		kobject_put(&ppd->sl2sc_kobj);
-		kobject_put(&ppd->sc2vl_kobj);
-	}
+	for (i = 0; i < dd->num_pports; i++)
+		ib_port_sysfs_remove_groups(&dd->verbs_dev.rdi.ibdev, i + 1,
+					    hfi1_port_groups);
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 13/15] RDMA: Change ops->init_port to ops->port_groups
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (11 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 12/15] RDMA/hfi1: " Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces Leon Romanovsky
  2021-06-07  8:17 ` [PATCH rdma-next v1 15/15] RDMA: Remove rdma_set_device_sysfs_group() Leon Romanovsky
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

init_port was only being used to register sysfs attributes against the
port kobject. Now that all users are creating static attribute_group's we can
simply set the attribute_group list in the ops and the core code can just
handle it directly.

This makes all the sysfs management quite straightforward and prevents any
driver from abusing the naked port kobject in future because no driver
code can access it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c      |  4 +--
 drivers/infiniband/core/sysfs.c       | 39 +++++++++------------------
 drivers/infiniband/hw/hfi1/hfi.h      |  3 +--
 drivers/infiniband/hw/hfi1/sysfs.c    | 12 +--------
 drivers/infiniband/hw/hfi1/verbs.c    |  2 +-
 drivers/infiniband/hw/qib/qib.h       |  4 +--
 drivers/infiniband/hw/qib/qib_sysfs.c | 20 +-------------
 drivers/infiniband/hw/qib/qib_verbs.c |  4 +--
 drivers/infiniband/sw/rdmavt/vt.c     |  2 +-
 include/rdma/ib_sysfs.h               |  4 ---
 include/rdma/ib_verbs.h               |  9 +++----
 11 files changed, 25 insertions(+), 78 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 030a4041b2e0..2cbd77933ea5 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1703,7 +1703,7 @@ int ib_device_set_netns_put(struct sk_buff *skb,
 	 * port_cleanup infrastructure is implemented, this limitation will be
 	 * removed.
 	 */
-	if (!dev->ops.disassociate_ucontext || dev->ops.init_port ||
+	if (!dev->ops.disassociate_ucontext || dev->ops.port_groups ||
 	    ib_devices_shared_netns) {
 		ret = -EOPNOTSUPP;
 		goto ns_err;
@@ -2668,7 +2668,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, get_vf_config);
 	SET_DEVICE_OP(dev_ops, get_vf_guid);
 	SET_DEVICE_OP(dev_ops, get_vf_stats);
-	SET_DEVICE_OP(dev_ops, init_port);
 	SET_DEVICE_OP(dev_ops, iw_accept);
 	SET_DEVICE_OP(dev_ops, iw_add_ref);
 	SET_DEVICE_OP(dev_ops, iw_connect);
@@ -2691,6 +2690,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, modify_wq);
 	SET_DEVICE_OP(dev_ops, peek_cq);
 	SET_DEVICE_OP(dev_ops, poll_cq);
+	SET_DEVICE_OP(dev_ops, port_groups);
 	SET_DEVICE_OP(dev_ops, post_recv);
 	SET_DEVICE_OP(dev_ops, post_send);
 	SET_DEVICE_OP(dev_ops, post_srq_recv);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index e550a7eb37f6..09a2e1066df0 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -128,22 +128,6 @@ static ssize_t port_attr_store(struct kobject *kobj,
 	return port_attr->store(p->ibdev, p->port_num, port_attr, buf, count);
 }
 
-int ib_port_sysfs_create_groups(struct ib_device *ibdev, u32 port_num,
-				const struct attribute_group **groups)
-{
-	return sysfs_create_groups(&ibdev->port_data[port_num].sysfs->kobj,
-				   groups);
-}
-EXPORT_SYMBOL_GPL(ib_port_sysfs_create_groups);
-
-void ib_port_sysfs_remove_groups(struct ib_device *ibdev, u32 port_num,
-				 const struct attribute_group **groups)
-{
-	return sysfs_remove_groups(&ibdev->port_data[port_num].sysfs->kobj,
-				   groups);
-}
-EXPORT_SYMBOL_GPL(ib_port_sysfs_remove_groups);
-
 struct ib_device *ib_port_sysfs_get_ibdev_kobj(struct kobject *kobj,
 					       u32 *port_num)
 {
@@ -1252,6 +1236,11 @@ static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
 	ret = sysfs_create_groups(&p->kobj, p->groups_list);
 	if (ret)
 		goto err_del;
+	if (is_full_dev) {
+		ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
+		if (ret)
+			goto err_groups;
+	}
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
 	if (device->port_data && is_full_dev)
@@ -1259,6 +1248,8 @@ static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
 
 	return p;
 
+err_groups:
+	sysfs_remove_groups(&p->kobj, p->groups_list);
 err_del:
 	kobject_del(&p->kobj);
 err_put:
@@ -1266,12 +1257,16 @@ static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
 	return ERR_PTR(ret);
 }
 
-static void destroy_port(struct ib_port *port)
+static void destroy_port(struct ib_core_device *coredev, struct ib_port *port)
 {
+	bool is_full_dev = &port->ibdev->coredev == coredev;
+
 	if (port->ibdev->port_data &&
 	    port->ibdev->port_data[port->port_num].sysfs == port)
 		port->ibdev->port_data[port->port_num].sysfs = NULL;
 	list_del(&port->kobj.entry);
+	if (is_full_dev)
+		sysfs_remove_groups(&port->kobj, port->ibdev->ops.port_groups);
 	sysfs_remove_groups(&port->kobj, port->groups_list);
 	kobject_del(&port->kobj);
 	kobject_put(&port->kobj);
@@ -1397,7 +1392,7 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 
 		destroy_gid_attrs(port);
-		destroy_port(port);
+		destroy_port(coredev, port);
 	}
 
 	kobject_put(coredev->ports_kobj);
@@ -1406,7 +1401,6 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 int ib_setup_port_attrs(struct ib_core_device *coredev)
 {
 	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
-	bool is_full_dev = &device->coredev == coredev;
 	u32 port_num;
 	int ret;
 
@@ -1432,13 +1426,6 @@ int ib_setup_port_attrs(struct ib_core_device *coredev)
 		ret = setup_gid_attrs(port, &attr);
 		if (ret)
 			goto err_put;
-
-		if (device->ops.init_port && is_full_dev) {
-			ret = device->ops.init_port(device, port_num,
-						    &port->kobj);
-			if (ret)
-				goto err_put;
-		}
 	}
 	return 0;
 
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 4bb807c154b2..31664f43c27f 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -2184,12 +2184,11 @@ static inline bool hfi1_packet_present(struct hfi1_ctxtdata *rcd)
 
 extern const char ib_hfi1_version[];
 extern const struct attribute_group ib_hfi1_attr_group;
+extern const struct attribute_group *hfi1_attr_port_groups[];
 
 int hfi1_device_create(struct hfi1_devdata *dd);
 void hfi1_device_remove(struct hfi1_devdata *dd);
 
-int hfi1_create_port_files(struct ib_device *ibdev, u32 port_num,
-			   struct kobject *kobj);
 int hfi1_verbs_register_sysfs(struct hfi1_devdata *dd);
 void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd);
 /* Hook for sysfs read of QSFP */
diff --git a/drivers/infiniband/hw/hfi1/sysfs.c b/drivers/infiniband/hw/hfi1/sysfs.c
index 326c6c2842f2..96ec62ae5b72 100644
--- a/drivers/infiniband/hw/hfi1/sysfs.c
+++ b/drivers/infiniband/hw/hfi1/sysfs.c
@@ -599,19 +599,13 @@ const struct attribute_group ib_hfi1_attr_group = {
 	.attrs = hfi1_attributes,
 };
 
-static const struct attribute_group *hfi1_port_groups[] = {
+const struct attribute_group *hfi1_attr_port_groups[] = {
 	&port_sc2vl_group,
 	&port_sl2sc_group,
 	&port_vl2mtu_group,
 	NULL,
 };
 
-int hfi1_create_port_files(struct ib_device *ibdev, u32 port_num,
-			   struct kobject *kobj)
-{
-	return ib_port_sysfs_create_groups(ibdev, port_num, hfi1_port_groups);
-}
-
 struct sde_attribute {
 	struct attribute attr;
 	ssize_t (*show)(struct sdma_engine *sde, char *buf);
@@ -740,8 +734,4 @@ void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd)
 	/* Unwind operations in hfi1_verbs_register_sysfs() */
 	for (i = 0; i < dd->num_sdma; i++)
 		kobject_put(&dd->per_sdma[i].kobj);
-
-	for (i = 0; i < dd->num_pports; i++)
-		ib_port_sysfs_remove_groups(&dd->verbs_dev.rdi.ibdev, i + 1,
-					    hfi1_port_groups);
 }
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 85deba07a675..49c6ed267a47 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1791,8 +1791,8 @@ static const struct ib_device_ops hfi1_dev_ops = {
 	.alloc_rdma_netdev = hfi1_vnic_alloc_rn,
 	.get_dev_fw_str = hfi1_get_dev_fw_str,
 	.get_hw_stats = get_hw_stats,
-	.init_port = hfi1_create_port_files,
 	.modify_device = modify_device,
+	.port_groups = hfi1_attr_port_groups,
 	/* keep process mad in the driver */
 	.process_mad = hfi1_process_mad,
 	.rdma_netdev_get_params = hfi1_ipoib_rn_get_params,
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index b8a2deb5b4d2..9363bccfc6e7 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -1361,13 +1361,11 @@ static inline u32 qib_get_rcvhdrtail(const struct qib_ctxtdata *rcd)
 
 extern const char ib_qib_version[];
 extern const struct attribute_group qib_attr_group;
+extern const struct attribute_group *qib_attr_port_groups[];
 
 int qib_device_create(struct qib_devdata *);
 void qib_device_remove(struct qib_devdata *);
 
-int qib_create_port_files(struct ib_device *ibdev, u32 port_num,
-			  struct kobject *kobj);
-void qib_verbs_unregister_sysfs(struct qib_devdata *);
 /* Hook for sysfs read of QSFP */
 extern int qib_qsfp_dump(struct qib_pportdata *ppd, char *buf, int len);
 
diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
index a1e22c498712..d57e49de6650 100644
--- a/drivers/infiniband/hw/qib/qib_sysfs.c
+++ b/drivers/infiniband/hw/qib/qib_sysfs.c
@@ -545,7 +545,7 @@ static const struct attribute_group port_diagc_group = {
 
 /* End diag_counters */
 
-static const struct attribute_group *qib_port_groups[] = {
+const struct attribute_group *qib_attr_port_groups[] = {
 	&port_linkcontrol_group,
 	&port_ccmgta_attribute_group,
 	&port_sl2vl_group,
@@ -733,21 +733,3 @@ static struct attribute *qib_attributes[] = {
 const struct attribute_group qib_attr_group = {
 	.attrs = qib_attributes,
 };
-
-int qib_create_port_files(struct ib_device *ibdev, u32 port_num,
-			  struct kobject *kobj)
-{
-	return ib_port_sysfs_create_groups(ibdev, port_num, qib_port_groups);
-}
-
-/*
- * Unregister and remove our files in /sys/class/infiniband.
- */
-void qib_verbs_unregister_sysfs(struct qib_devdata *dd)
-{
-	int i;
-
-	for (i = 0; i < dd->num_pports; i++)
-		ib_port_sysfs_remove_groups(&dd->verbs_dev.rdi.ibdev, i,
-					    qib_port_groups);
-}
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index d17d034ecdfd..8640a75d61d9 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1483,7 +1483,7 @@ static const struct ib_device_ops qib_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_QIB,
 
-	.init_port = qib_create_port_files,
+	.port_groups = qib_attr_port_groups,
 	.modify_device = qib_modify_device,
 	.process_mad = qib_process_mad,
 };
@@ -1644,8 +1644,6 @@ void qib_unregister_ib_device(struct qib_devdata *dd)
 {
 	struct qib_ibdev *dev = &dd->verbs_dev;
 
-	qib_verbs_unregister_sysfs(dd);
-
 	rvt_unregister_device(&dd->verbs_dev.rdi);
 
 	if (!list_empty(&dev->piowait))
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 3749380ff193..ac17209816cd 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -418,7 +418,7 @@ static noinline int check_support(struct rvt_dev_info *rdi, int verb)
 		 * These functions are not part of verbs specifically but are
 		 * required for rdmavt to function.
 		 */
-		if ((!rdi->ibdev.ops.init_port) ||
+		if ((!rdi->ibdev.ops.port_groups) ||
 		    (!rdi->driver_f.get_pci_dev))
 			return -EINVAL;
 		break;
diff --git a/include/rdma/ib_sysfs.h b/include/rdma/ib_sysfs.h
index f869d0e4fd30..3b77cfd74d9a 100644
--- a/include/rdma/ib_sysfs.h
+++ b/include/rdma/ib_sysfs.h
@@ -31,10 +31,6 @@ struct ib_port_attribute {
 #define IB_PORT_ATTR_WO(_name)                                                 \
 	struct ib_port_attribute ib_port_attr_##_name = __ATTR_WO(_name)
 
-int ib_port_sysfs_create_groups(struct ib_device *ibdev, u32 port_num,
-				const struct attribute_group **groups);
-void ib_port_sysfs_remove_groups(struct ib_device *ibdev, u32 port_num,
-				 const struct attribute_group **groups);
 struct ib_device *ib_port_sysfs_get_ibdev_kobj(struct kobject *kobj,
 					       u32 *port_num);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ca1cb82a543..303471585dde 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2300,6 +2300,8 @@ struct ib_device_ops {
 	u32 uverbs_abi_ver;
 	unsigned int uverbs_no_driver_id_binding:1;
 
+	const struct attribute_group **port_groups;
+
 	int (*post_send)(struct ib_qp *qp, const struct ib_send_wr *send_wr,
 			 const struct ib_send_wr **bad_send_wr);
 	int (*post_recv)(struct ib_qp *qp, const struct ib_recv_wr *recv_wr,
@@ -2546,12 +2548,7 @@ struct ib_device_ops {
 	 */
 	int (*get_hw_stats)(struct ib_device *device,
 			    struct rdma_hw_stats *stats, u32 port, int index);
-	/*
-	 * This function is called once for each port when a ib device is
-	 * registered.
-	 */
-	int (*init_port)(struct ib_device *device, u32 port_num,
-			 struct kobject *port_sysfs);
+
 	/**
 	 * Allows rdma drivers to add their own restrack attributes.
 	 */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (12 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 13/15] RDMA: Change ops->init_port to ops->port_groups Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  2021-06-07 13:29   ` Parav Pandit
  2021-06-07  8:17 ` [PATCH rdma-next v1 15/15] RDMA: Remove rdma_set_device_sysfs_group() Leon Romanovsky
  14 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

Now that the port_groups data is being destroyed and managed by the core
code this restriction is no longer needed. All the ib_port_attrs are
compatible with the core's sysfs lifecycle.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c | 10 ++++------
 drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 2cbd77933ea5..92f224a97481 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1698,13 +1698,11 @@ int ib_device_set_netns_put(struct sk_buff *skb,
 	}
 
 	/*
-	 * Currently supported only for those providers which support
-	 * disassociation and don't do port specific sysfs init. Once a
-	 * port_cleanup infrastructure is implemented, this limitation will be
-	 * removed.
+	 * All the ib_clients, including uverbs, are reset when the namespace is
+	 * changed and this cannot be blocked waiting for userspace to do
+	 * something, so disassociation is mandatory.
 	 */
-	if (!dev->ops.disassociate_ucontext || dev->ops.port_groups ||
-	    ib_devices_shared_netns) {
+	if (!dev->ops.disassociate_ucontext || ib_devices_shared_netns) {
 		ret = -EOPNOTSUPP;
 		goto ns_err;
 	}
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 09a2e1066df0..f42034fcf3d9 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
 	ret = sysfs_create_groups(&p->kobj, p->groups_list);
 	if (ret)
 		goto err_del;
-	if (is_full_dev) {
-		ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
-		if (ret)
-			goto err_groups;
-	}
+	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
+	if (ret)
+		goto err_groups;
 
 	list_add_tail(&p->kobj.entry, &coredev->port_list);
 	if (device->port_data && is_full_dev)
@@ -1257,16 +1255,13 @@ static struct ib_port *setup_port(struct ib_core_device *coredev, int port_num,
 	return ERR_PTR(ret);
 }
 
-static void destroy_port(struct ib_core_device *coredev, struct ib_port *port)
+static void destroy_port(struct ib_port *port)
 {
-	bool is_full_dev = &port->ibdev->coredev == coredev;
-
 	if (port->ibdev->port_data &&
 	    port->ibdev->port_data[port->port_num].sysfs == port)
 		port->ibdev->port_data[port->port_num].sysfs = NULL;
 	list_del(&port->kobj.entry);
-	if (is_full_dev)
-		sysfs_remove_groups(&port->kobj, port->ibdev->ops.port_groups);
+	sysfs_remove_groups(&port->kobj, port->ibdev->ops.port_groups);
 	sysfs_remove_groups(&port->kobj, port->groups_list);
 	kobject_del(&port->kobj);
 	kobject_put(&port->kobj);
@@ -1392,7 +1387,7 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 		struct ib_port *port = container_of(p, struct ib_port, kobj);
 
 		destroy_gid_attrs(port);
-		destroy_port(coredev, port);
+		destroy_port(port);
 	}
 
 	kobject_put(coredev->ports_kobj);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH rdma-next v1 15/15] RDMA: Remove rdma_set_device_sysfs_group()
  2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
                   ` (13 preceding siblings ...)
  2021-06-07  8:17 ` [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces Leon Romanovsky
@ 2021-06-07  8:17 ` Leon Romanovsky
  14 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07  8:17 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

The driver's device group can be specified as part of the ops structure
like the device's port group. No need for the complicated API.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c              |  4 ++-
 drivers/infiniband/hw/bnxt_re/main.c          |  2 +-
 drivers/infiniband/hw/cxgb4/provider.c        |  2 +-
 drivers/infiniband/hw/hfi1/verbs.c            |  4 +--
 drivers/infiniband/hw/mlx4/main.c             |  2 +-
 drivers/infiniband/hw/mlx5/main.c             |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  2 +-
 drivers/infiniband/hw/qedr/main.c             |  2 +-
 drivers/infiniband/hw/qib/qib_verbs.c         |  2 +-
 drivers/infiniband/hw/usnic/usnic_ib_main.c   |  3 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |  2 +-
 include/rdma/ib_verbs.h                       | 30 +++++--------------
 14 files changed, 22 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 92f224a97481..53a7ec66e0d1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -586,7 +586,6 @@ struct ib_device *_ib_alloc_device(size_t size)
 		return NULL;
 	}
 
-	device->groups[0] = &ib_dev_attr_group;
 	rdma_init_coredev(&device->coredev, device, &init_net);
 
 	INIT_LIST_HEAD(&device->event_handler_list);
@@ -1396,6 +1395,8 @@ int ib_register_device(struct ib_device *device, const char *name,
 		return ret;
 	}
 
+	device->groups[0] = &ib_dev_attr_group;
+	device->groups[1] = device->ops.device_group;
 	ret = ib_setup_device_attrs(device);
 	if (ret)
 		goto cache_cleanup;
@@ -2643,6 +2644,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, destroy_rwq_ind_table);
 	SET_DEVICE_OP(dev_ops, destroy_srq);
 	SET_DEVICE_OP(dev_ops, destroy_wq);
+	SET_DEVICE_OP(dev_ops, device_group);
 	SET_DEVICE_OP(dev_ops, detach_mcast);
 	SET_DEVICE_OP(dev_ops, disassociate_ucontext);
 	SET_DEVICE_OP(dev_ops, drain_rq);
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 58bd905048ae..2798a03a6c31 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -647,6 +647,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
 	.destroy_cq = bnxt_re_destroy_cq,
 	.destroy_qp = bnxt_re_destroy_qp,
 	.destroy_srq = bnxt_re_destroy_srq,
+	.device_group = &bnxt_re_dev_attr_group,
 	.get_dev_fw_str = bnxt_re_query_fw_str,
 	.get_dma_mr = bnxt_re_get_dma_mr,
 	.get_hw_stats = bnxt_re_ib_get_hw_stats,
@@ -693,7 +694,6 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 	ibdev->dev.parent = &rdev->en_dev->pdev->dev;
 	ibdev->local_dma_lkey = BNXT_QPLIB_RSVD_LKEY;
 
-	rdma_set_device_sysfs_group(ibdev, &bnxt_re_dev_attr_group);
 	ib_set_device_ops(ibdev, &bnxt_re_dev_ops);
 	ret = ib_device_set_netdev(&rdev->ibdev, rdev->netdev, 1);
 	if (ret)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 4e453d1dde11..881d515eb15a 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -465,6 +465,7 @@ static const struct ib_device_ops c4iw_dev_ops = {
 	.destroy_cq = c4iw_destroy_cq,
 	.destroy_qp = c4iw_destroy_qp,
 	.destroy_srq = c4iw_destroy_srq,
+	.device_group = &c4iw_attr_group,
 	.fill_res_cq_entry = c4iw_fill_res_cq_entry,
 	.fill_res_cm_id_entry = c4iw_fill_res_cm_id_entry,
 	.fill_res_mr_entry = c4iw_fill_res_mr_entry,
@@ -539,7 +540,6 @@ void c4iw_register_device(struct work_struct *work)
 	memcpy(dev->ibdev.iw_ifname, dev->rdev.lldi.ports[0]->name,
 	       sizeof(dev->ibdev.iw_ifname));
 
-	rdma_set_device_sysfs_group(&dev->ibdev, &c4iw_attr_group);
 	ib_set_device_ops(&dev->ibdev, &c4iw_dev_ops);
 	ret = set_netdevs(&dev->ibdev, &dev->rdev);
 	if (ret)
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 49c6ed267a47..9b198c35e1a1 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1789,6 +1789,7 @@ static const struct ib_device_ops hfi1_dev_ops = {
 	.alloc_hw_device_stats = hfi1_alloc_hw_device_stats,
 	.alloc_hw_port_stats = hfi_alloc_hw_port_stats,
 	.alloc_rdma_netdev = hfi1_vnic_alloc_rn,
+	.device_group = &ib_hfi1_attr_group,
 	.get_dev_fw_str = hfi1_get_dev_fw_str,
 	.get_hw_stats = get_hw_stats,
 	.modify_device = modify_device,
@@ -1927,9 +1928,6 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
 			      i,
 			      ppd->pkeys);
 
-	rdma_set_device_sysfs_group(&dd->verbs_dev.rdi.ibdev,
-				    &ib_hfi1_attr_group);
-
 	ret = rvt_register_device(&dd->verbs_dev.rdi);
 	if (ret)
 		goto err_verbs_txreq;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 341162aa2175..6cdcbbc53843 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2547,6 +2547,7 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 	.destroy_qp = mlx4_ib_destroy_qp,
 	.destroy_srq = mlx4_ib_destroy_srq,
 	.detach_mcast = mlx4_ib_mcg_detach,
+	.device_group = &mlx4_attr_group,
 	.disassociate_ucontext = mlx4_ib_disassociate_ucontext,
 	.drain_rq = mlx4_ib_drain_rq,
 	.drain_sq = mlx4_ib_drain_sq,
@@ -2806,7 +2807,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (mlx4_ib_alloc_diag_counters(ibdev))
 		goto err_steer_free_bitmap;
 
-	rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group);
 	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d",
 			       &dev->persist->pdev->dev))
 		goto err_diag_counters;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e541a9f5f163..c12517b63a8d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3752,6 +3752,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.disassociate_ucontext = mlx5_ib_disassociate_ucontext,
 	.drain_rq = mlx5_ib_drain_rq,
 	.drain_sq = mlx5_ib_drain_sq,
+	.device_group = &mlx5_attr_group,
 	.enable_driver = mlx5_ib_enable_driver,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mlx5_ib_get_dma_mr,
@@ -4039,7 +4040,6 @@ static int mlx5_ib_stage_ib_reg_init(struct mlx5_ib_dev *dev)
 {
 	const char *name;
 
-	rdma_set_device_sysfs_group(&dev->ib_dev, &mlx5_attr_group);
 	if (!mlx5_lag_is_roce(dev->mdev))
 		name = "mlx5_%d";
 	else
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 522bb606120e..adf4fcf0fee4 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1099,6 +1099,7 @@ static const struct ib_device_ops mthca_dev_ops = {
 	.destroy_cq = mthca_destroy_cq,
 	.destroy_qp = mthca_destroy_qp,
 	.detach_mcast = mthca_multicast_detach,
+	.device_group = &mthca_attr_group,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mthca_get_dma_mr,
 	.get_port_immutable = mthca_port_immutable,
@@ -1186,7 +1187,6 @@ int mthca_register_device(struct mthca_dev *dev)
 
 	mutex_init(&dev->cap_mask_mutex);
 
-	rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group);
 	ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 4882b3156edb..f329db0c591f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -161,6 +161,7 @@ static const struct ib_device_ops ocrdma_dev_ops = {
 	.destroy_ah = ocrdma_destroy_ah,
 	.destroy_cq = ocrdma_destroy_cq,
 	.destroy_qp = ocrdma_destroy_qp,
+	.device_group = &ocrdma_attr_group,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = ocrdma_get_dma_mr,
 	.get_link_layer = ocrdma_link_layer,
@@ -218,7 +219,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	if (ocrdma_get_asic_type(dev) == OCRDMA_ASIC_GEN_SKH_R)
 		ib_set_device_ops(&dev->ibdev, &ocrdma_dev_srq_ops);
 
-	rdma_set_device_sysfs_group(&dev->ibdev, &ocrdma_attr_group);
 	ret = ib_device_set_netdev(&dev->ibdev, dev->nic_info.netdev, 1);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 8334a9850220..de98e0604f91 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -208,6 +208,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.destroy_cq = qedr_destroy_cq,
 	.destroy_qp = qedr_destroy_qp,
 	.destroy_srq = qedr_destroy_srq,
+	.device_group = &qedr_attr_group,
 	.get_dev_fw_str = qedr_get_dev_fw_str,
 	.get_dma_mr = qedr_get_dma_mr,
 	.get_link_layer = qedr_link_layer,
@@ -256,7 +257,6 @@ static int qedr_register_device(struct qedr_dev *dev)
 	dev->ibdev.num_comp_vectors = dev->num_cnq;
 	dev->ibdev.dev.parent = &dev->pdev->dev;
 
-	rdma_set_device_sysfs_group(&dev->ibdev, &qedr_attr_group);
 	ib_set_device_ops(&dev->ibdev, &qedr_dev_ops);
 
 	rc = ib_device_set_netdev(&dev->ibdev, dev->ndev, 1);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 8640a75d61d9..ef91bff5c23c 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1484,6 +1484,7 @@ static const struct ib_device_ops qib_dev_ops = {
 	.driver_id = RDMA_DRIVER_QIB,
 
 	.port_groups = qib_attr_port_groups,
+	.device_group = &qib_attr_group,
 	.modify_device = qib_modify_device,
 	.process_mad = qib_process_mad,
 };
@@ -1612,7 +1613,6 @@ int qib_register_ib_device(struct qib_devdata *dd)
 			      i,
 			      dd->rcd[ctxt]->pkeys);
 	}
-	rdma_set_device_sysfs_group(&dd->verbs_dev.rdi.ibdev, &qib_attr_group);
 
 	ib_set_device_ops(ibdev, &qib_dev_ops);
 	ret = rvt_register_device(&dd->verbs_dev.rdi);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index ff6a40e259d5..c49f9e19d926 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -347,6 +347,7 @@ static const struct ib_device_ops usnic_dev_ops = {
 	.dereg_mr = usnic_ib_dereg_mr,
 	.destroy_cq = usnic_ib_destroy_cq,
 	.destroy_qp = usnic_ib_destroy_qp,
+	.device_group = &usnic_attr_group,
 	.get_dev_fw_str = usnic_get_dev_fw_str,
 	.get_link_layer = usnic_ib_port_link_layer,
 	.get_port_immutable = usnic_port_immutable,
@@ -400,8 +401,6 @@ static void *usnic_ib_device_add(struct pci_dev *dev)
 
 	ib_set_device_ops(&us_ibdev->ib_dev, &usnic_dev_ops);
 
-	rdma_set_device_sysfs_group(&us_ibdev->ib_dev, &usnic_attr_group);
-
 	ret = ib_device_set_netdev(&us_ibdev->ib_dev, us_ibdev->netdev, 1);
 	if (ret)
 		goto err_fwd_dealloc;
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
index 6bf2d2e47d07..8ed8bc24c69f 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
@@ -162,6 +162,7 @@ static const struct ib_device_ops pvrdma_dev_ops = {
 	.destroy_ah = pvrdma_destroy_ah,
 	.destroy_cq = pvrdma_destroy_cq,
 	.destroy_qp = pvrdma_destroy_qp,
+	.device_group = &pvrdma_attr_group,
 	.get_dev_fw_str = pvrdma_get_fw_ver_str,
 	.get_dma_mr = pvrdma_get_dma_mr,
 	.get_link_layer = pvrdma_port_link_layer,
@@ -240,7 +241,6 @@ static int pvrdma_register_device(struct pvrdma_dev *dev)
 	if (ret)
 		goto err_srq_free;
 	spin_lock_init(&dev->srq_tbl_lock);
-	rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group);
 
 	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d", &dev->pdev->dev);
 	if (ret)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index d3df59d897a7..90dd513cf993 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1123,6 +1123,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.destroy_qp = rxe_destroy_qp,
 	.destroy_srq = rxe_destroy_srq,
 	.detach_mcast = rxe_detach_mcast,
+	.device_group = &rxe_attr_group,
 	.enable_driver = rxe_enable_driver,
 	.get_dma_mr = rxe_get_dma_mr,
 	.get_hw_stats = rxe_ib_get_hw_stats,
@@ -1189,7 +1190,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	}
 	rxe->tfm = tfm;
 
-	rdma_set_device_sysfs_group(dev, &rxe_attr_group);
 	err = ib_register_device(dev, ibdev_name, NULL);
 	if (err)
 		pr_warn("%s failed with error %d\n", __func__, err);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 303471585dde..9423e70a881c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2300,6 +2300,12 @@ struct ib_device_ops {
 	u32 uverbs_abi_ver;
 	unsigned int uverbs_no_driver_id_binding:1;
 
+	/*
+	 * NOTE: New drivers should not make use of device_group; instead new
+	 * device parameter should be exposed via netlink command. This
+	 * mechanism exists only for existing drivers.
+	 */
+	const struct attribute_group *device_group;
 	const struct attribute_group **port_groups;
 
 	int (*post_send)(struct ib_qp *qp, const struct ib_send_wr *send_wr,
@@ -4567,28 +4573,6 @@ int rdma_init_netdev(struct ib_device *device, u32 port_num,
 		     void (*setup)(struct net_device *),
 		     struct net_device *netdev);
 
-/**
- * rdma_set_device_sysfs_group - Set device attributes group to have
- *				 driver specific sysfs entries at
- *				 for infiniband class.
- *
- * @device:	device pointer for which attributes to be created
- * @group:	Pointer to group which should be added when device
- *		is registered with sysfs.
- * rdma_set_device_sysfs_group() allows existing drivers to expose one
- * group per device to have sysfs attributes.
- *
- * NOTE: New drivers should not make use of this API; instead new device
- * parameter should be exposed via netlink command. This API and mechanism
- * exist only for existing drivers.
- */
-static inline void
-rdma_set_device_sysfs_group(struct ib_device *dev,
-			    const struct attribute_group *group)
-{
-	dev->groups[1] = group;
-}
-
 /**
  * rdma_device_to_ibdev - Get ib_device pointer from device pointer
  *
@@ -4624,7 +4608,7 @@ static inline int ibdev_to_node(struct ib_device *ibdev)
  *
  * NOTE: New drivers should not make use of this API; This API is only for
  * existing drivers who have exposed sysfs entries using
- * rdma_set_device_sysfs_group().
+ * ops->device_group.
  */
 #define rdma_device_to_drv_device(dev, drv_dev_struct, ibdev_member)           \
 	container_of(rdma_device_to_ibdev(dev), drv_dev_struct, ibdev_member)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
  2021-06-07  8:17 ` [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer Leon Romanovsky
@ 2021-06-07 10:23   ` Greg KH
  2021-06-07 11:00     ` Leon Romanovsky
  2021-06-07 12:32     ` Jason Gunthorpe
  0 siblings, 2 replies; 40+ messages in thread
From: Greg KH @ 2021-06-07 10:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 11:17:27AM +0300, Leon Romanovsky wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> It is much saner to store a pointer to the kobject structure that contains
> the cannonical stats pointer than to copy the stats pointers into a public
> structure.
> 
> Future patches will require the sysfs pointer for other purposes.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/core/core_priv.h |  1 +
>  drivers/infiniband/core/nldev.c     |  8 ++------
>  drivers/infiniband/core/sysfs.c     | 14 +++++++++++---
>  include/rdma/ib_verbs.h             |  3 ++-
>  4 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
> index 29809dd30041..ec5c2c3db423 100644
> --- a/drivers/infiniband/core/core_priv.h
> +++ b/drivers/infiniband/core/core_priv.h
> @@ -378,6 +378,7 @@ struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr);
>  
>  void ib_free_port_attrs(struct ib_core_device *coredev);
>  int ib_setup_port_attrs(struct ib_core_device *coredev);
> +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num);
>  
>  int rdma_compatdev_set(u8 enable);
>  
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 01316926cef6..e9b4b2cccaa0 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -2066,7 +2066,8 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
>  	}
>  
>  	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> -	if (!rdma_is_port_valid(device, port)) {
> +	stats = ib_get_hw_stats_port(device, port);
> +	if (!stats) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
> @@ -2088,11 +2089,6 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
>  		goto err_msg;
>  	}
>  
> -	stats = device->port_data ? device->port_data[port].hw_stats : NULL;
> -	if (stats == NULL) {
> -		ret = -EINVAL;
> -		goto err_msg;
> -	}
>  	mutex_lock(&stats->lock);
>  
>  	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index d11ceff2b4e4..b153dee1e0fa 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -1031,8 +1031,6 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
>  			goto err;
>  		port->hw_stats_ag = hsag;
>  		port->hw_stats = stats;
> -		if (device->port_data)
> -			device->port_data[port_num].hw_stats = stats;
>  	} else {
>  		struct kobject *kobj = &device->dev.kobj;
>  		ret = sysfs_create_group(kobj, hsag);
> @@ -1053,6 +1051,14 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
>  	kfree(stats);
>  }
>  
> +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
> +					   u32 port_num)
> +{
> +	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num))
> +		return NULL;
> +	return ibdev->port_data[port_num].sysfs->hw_stats;
> +}
> +
>  static int add_port(struct ib_core_device *coredev, int port_num)
>  {
>  	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
> @@ -1171,6 +1177,8 @@ static int add_port(struct ib_core_device *coredev, int port_num)
>  		setup_hw_stats(device, p, port_num);
>  
>  	list_add_tail(&p->kobj.entry, &coredev->port_list);
> +	if (device->port_data && is_full_dev)
> +		device->port_data[port_num].sysfs = p;

You are saving off a pointer to a reference counted structure without
incrementing the reference count on it?  That's brave, and really wrong.
Properly increment/decrement it when using and when done with it.

thnaks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07  8:17 ` [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's Leon Romanovsky
@ 2021-06-07 10:25   ` Greg KH
  2021-06-07 10:51     ` Leon Romanovsky
  2021-06-07 12:14     ` Jason Gunthorpe
  0 siblings, 2 replies; 40+ messages in thread
From: Greg KH @ 2021-06-07 10:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> This code is trying to attach a list of counters grouped into 4 groups to
> the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> everything naturally as an ib_port_attribute and add a single
> attribute_groups list.
> 
> Remove all the naked kobject manipulations.

Much nicer.

But why do you need your counters to be atomic in the first place?  What
are they counting that requires this?  Given that they are just a
statistic for userspace, making them be a u64 should work just the same,
right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 10:25   ` Greg KH
@ 2021-06-07 10:51     ` Leon Romanovsky
  2021-06-07 11:22       ` Greg KH
  2021-06-07 12:14     ` Jason Gunthorpe
  1 sibling, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07 10:51 UTC (permalink / raw)
  To: Greg KH
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > This code is trying to attach a list of counters grouped into 4 groups to
> > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > everything naturally as an ib_port_attribute and add a single
> > attribute_groups list.
> > 
> > Remove all the naked kobject manipulations.
> 
> Much nicer.
> 
> But why do you need your counters to be atomic in the first place?  What
> are they counting that requires this?  Given that they are just a
> statistic for userspace, making them be a u64 should work just the same,
> right?

The statistic counters are per-port, while the cm.c flows run in
asynchronically in parallel for every CM connection.

We need atomic variable to ensure that "write to u64" is not
interrupted.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
  2021-06-07 10:23   ` Greg KH
@ 2021-06-07 11:00     ` Leon Romanovsky
  2021-06-07 11:23       ` Greg KH
  2021-06-07 12:32     ` Jason Gunthorpe
  1 sibling, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07 11:00 UTC (permalink / raw)
  To: Greg KH
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 12:23:23PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 11:17:27AM +0300, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > It is much saner to store a pointer to the kobject structure that contains
> > the cannonical stats pointer than to copy the stats pointers into a public
> > structure.
> > 
> > Future patches will require the sysfs pointer for other purposes.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/core/core_priv.h |  1 +
> >  drivers/infiniband/core/nldev.c     |  8 ++------
> >  drivers/infiniband/core/sysfs.c     | 14 +++++++++++---
> >  include/rdma/ib_verbs.h             |  3 ++-
> >  4 files changed, 16 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
> > index 29809dd30041..ec5c2c3db423 100644
> > --- a/drivers/infiniband/core/core_priv.h
> > +++ b/drivers/infiniband/core/core_priv.h
> > @@ -378,6 +378,7 @@ struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr);
> >  
> >  void ib_free_port_attrs(struct ib_core_device *coredev);
> >  int ib_setup_port_attrs(struct ib_core_device *coredev);
> > +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num);
> >  
> >  int rdma_compatdev_set(u8 enable);
> >  
> > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > index 01316926cef6..e9b4b2cccaa0 100644
> > --- a/drivers/infiniband/core/nldev.c
> > +++ b/drivers/infiniband/core/nldev.c
> > @@ -2066,7 +2066,8 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
> >  	}
> >  
> >  	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> > -	if (!rdma_is_port_valid(device, port)) {
> > +	stats = ib_get_hw_stats_port(device, port);
> > +	if (!stats) {
> >  		ret = -EINVAL;
> >  		goto err;
> >  	}
> > @@ -2088,11 +2089,6 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
> >  		goto err_msg;
> >  	}
> >  
> > -	stats = device->port_data ? device->port_data[port].hw_stats : NULL;
> > -	if (stats == NULL) {
> > -		ret = -EINVAL;
> > -		goto err_msg;
> > -	}
> >  	mutex_lock(&stats->lock);
> >  
> >  	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
> > diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> > index d11ceff2b4e4..b153dee1e0fa 100644
> > --- a/drivers/infiniband/core/sysfs.c
> > +++ b/drivers/infiniband/core/sysfs.c
> > @@ -1031,8 +1031,6 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
> >  			goto err;
> >  		port->hw_stats_ag = hsag;
> >  		port->hw_stats = stats;
> > -		if (device->port_data)
> > -			device->port_data[port_num].hw_stats = stats;
> >  	} else {
> >  		struct kobject *kobj = &device->dev.kobj;
> >  		ret = sysfs_create_group(kobj, hsag);
> > @@ -1053,6 +1051,14 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
> >  	kfree(stats);
> >  }
> >  
> > +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
> > +					   u32 port_num)
> > +{
> > +	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num))
> > +		return NULL;
> > +	return ibdev->port_data[port_num].sysfs->hw_stats;
> > +}
> > +
> >  static int add_port(struct ib_core_device *coredev, int port_num)
> >  {
> >  	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
> > @@ -1171,6 +1177,8 @@ static int add_port(struct ib_core_device *coredev, int port_num)
> >  		setup_hw_stats(device, p, port_num);
> >  
> >  	list_add_tail(&p->kobj.entry, &coredev->port_list);
> > +	if (device->port_data && is_full_dev)
> > +		device->port_data[port_num].sysfs = p;
> 
> You are saving off a pointer to a reference counted structure without
> incrementing the reference count on it?  That's brave, and really wrong.

This is done to ensure that device->port_data[port_num].sysfs is not set
before "p" is properly configured. From reference point of view  "sysfs"
is equal to "p" and the latter already initialized.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 10:51     ` Leon Romanovsky
@ 2021-06-07 11:22       ` Greg KH
  2021-06-07 11:37         ` Leon Romanovsky
  0 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-06-07 11:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 01:51:36PM +0300, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > 
> > > This code is trying to attach a list of counters grouped into 4 groups to
> > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > everything naturally as an ib_port_attribute and add a single
> > > attribute_groups list.
> > > 
> > > Remove all the naked kobject manipulations.
> > 
> > Much nicer.
> > 
> > But why do you need your counters to be atomic in the first place?  What
> > are they counting that requires this?  Given that they are just a
> > statistic for userspace, making them be a u64 should work just the same,
> > right?
> 
> The statistic counters are per-port, while the cm.c flows run in
> asynchronically in parallel for every CM connection.
> 
> We need atomic variable to ensure that "write to u64" is not
> interrupted.

On what system is "write to u64" interruptable?  As these are per-port,
do multiple threads try to increment these at the same time?  And even
if they do, what happens if one is 'dropped' somehow because of this?
It's just a userspace statistic counter, what relies on this being
exact?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
  2021-06-07 11:00     ` Leon Romanovsky
@ 2021-06-07 11:23       ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-06-07 11:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 02:00:26PM +0300, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 12:23:23PM +0200, Greg KH wrote:
> > On Mon, Jun 07, 2021 at 11:17:27AM +0300, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > 
> > > It is much saner to store a pointer to the kobject structure that contains
> > > the cannonical stats pointer than to copy the stats pointers into a public
> > > structure.
> > > 
> > > Future patches will require the sysfs pointer for other purposes.
> > > 
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/infiniband/core/core_priv.h |  1 +
> > >  drivers/infiniband/core/nldev.c     |  8 ++------
> > >  drivers/infiniband/core/sysfs.c     | 14 +++++++++++---
> > >  include/rdma/ib_verbs.h             |  3 ++-
> > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
> > > index 29809dd30041..ec5c2c3db423 100644
> > > --- a/drivers/infiniband/core/core_priv.h
> > > +++ b/drivers/infiniband/core/core_priv.h
> > > @@ -378,6 +378,7 @@ struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr);
> > >  
> > >  void ib_free_port_attrs(struct ib_core_device *coredev);
> > >  int ib_setup_port_attrs(struct ib_core_device *coredev);
> > > +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num);
> > >  
> > >  int rdma_compatdev_set(u8 enable);
> > >  
> > > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > > index 01316926cef6..e9b4b2cccaa0 100644
> > > --- a/drivers/infiniband/core/nldev.c
> > > +++ b/drivers/infiniband/core/nldev.c
> > > @@ -2066,7 +2066,8 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
> > >  	}
> > >  
> > >  	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
> > > -	if (!rdma_is_port_valid(device, port)) {
> > > +	stats = ib_get_hw_stats_port(device, port);
> > > +	if (!stats) {
> > >  		ret = -EINVAL;
> > >  		goto err;
> > >  	}
> > > @@ -2088,11 +2089,6 @@ static int stat_get_doit_default_counter(struct sk_buff *skb,
> > >  		goto err_msg;
> > >  	}
> > >  
> > > -	stats = device->port_data ? device->port_data[port].hw_stats : NULL;
> > > -	if (stats == NULL) {
> > > -		ret = -EINVAL;
> > > -		goto err_msg;
> > > -	}
> > >  	mutex_lock(&stats->lock);
> > >  
> > >  	num_cnts = device->ops.get_hw_stats(device, stats, port, 0);
> > > diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> > > index d11ceff2b4e4..b153dee1e0fa 100644
> > > --- a/drivers/infiniband/core/sysfs.c
> > > +++ b/drivers/infiniband/core/sysfs.c
> > > @@ -1031,8 +1031,6 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
> > >  			goto err;
> > >  		port->hw_stats_ag = hsag;
> > >  		port->hw_stats = stats;
> > > -		if (device->port_data)
> > > -			device->port_data[port_num].hw_stats = stats;
> > >  	} else {
> > >  		struct kobject *kobj = &device->dev.kobj;
> > >  		ret = sysfs_create_group(kobj, hsag);
> > > @@ -1053,6 +1051,14 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
> > >  	kfree(stats);
> > >  }
> > >  
> > > +struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev,
> > > +					   u32 port_num)
> > > +{
> > > +	if (!ibdev->port_data || !rdma_is_port_valid(ibdev, port_num))
> > > +		return NULL;
> > > +	return ibdev->port_data[port_num].sysfs->hw_stats;
> > > +}
> > > +
> > >  static int add_port(struct ib_core_device *coredev, int port_num)
> > >  {
> > >  	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
> > > @@ -1171,6 +1177,8 @@ static int add_port(struct ib_core_device *coredev, int port_num)
> > >  		setup_hw_stats(device, p, port_num);
> > >  
> > >  	list_add_tail(&p->kobj.entry, &coredev->port_list);
> > > +	if (device->port_data && is_full_dev)
> > > +		device->port_data[port_num].sysfs = p;
> > 
> > You are saving off a pointer to a reference counted structure without
> > incrementing the reference count on it?  That's brave, and really wrong.
> 
> This is done to ensure that device->port_data[port_num].sysfs is not set
> before "p" is properly configured. From reference point of view  "sysfs"
> is equal to "p" and the latter already initialized.

That's fine, but again, you are saving a pointer to a reference counted
object, and then later using it, without ever having properly
incremented the reference count to assure that it really is going to
stick around.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 11:22       ` Greg KH
@ 2021-06-07 11:37         ` Leon Romanovsky
  2021-06-07 12:08           ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07 11:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 01:22:26PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 01:51:36PM +0300, Leon Romanovsky wrote:
> > On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > 
> > > > This code is trying to attach a list of counters grouped into 4 groups to
> > > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > > everything naturally as an ib_port_attribute and add a single
> > > > attribute_groups list.
> > > > 
> > > > Remove all the naked kobject manipulations.
> > > 
> > > Much nicer.
> > > 
> > > But why do you need your counters to be atomic in the first place?  What
> > > are they counting that requires this?  Given that they are just a
> > > statistic for userspace, making them be a u64 should work just the same,
> > > right?
> > 
> > The statistic counters are per-port, while the cm.c flows run in
> > asynchronically in parallel for every CM connection.
> > 
> > We need atomic variable to ensure that "write to u64" is not
> > interrupted.
> 
> On what system is "write to u64" interruptable? 

On 32 bits, and yes, we have a customer who still uses such system.

> As these are per-port, do multiple threads try to increment these at
> the same time?  

Yes, CM connection can be seen as thread. Bottom line everything in parallel.

> And even if they do, what happens if one is 'dropped' somehow because of this?

Probably nothing, we increment the statistics only.

> It's just a userspace statistic counter, what relies on this being
> exact?

In kernel nothing, but I have no idea what userspace does with these counters.

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 11:37         ` Leon Romanovsky
@ 2021-06-07 12:08           ` Greg KH
  2021-06-07 12:41             ` Leon Romanovsky
  0 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-06-07 12:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 02:37:16PM +0300, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 01:22:26PM +0200, Greg KH wrote:
> > On Mon, Jun 07, 2021 at 01:51:36PM +0300, Leon Romanovsky wrote:
> > > On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > > > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > 
> > > > > This code is trying to attach a list of counters grouped into 4 groups to
> > > > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > > > everything naturally as an ib_port_attribute and add a single
> > > > > attribute_groups list.
> > > > > 
> > > > > Remove all the naked kobject manipulations.
> > > > 
> > > > Much nicer.
> > > > 
> > > > But why do you need your counters to be atomic in the first place?  What
> > > > are they counting that requires this?  Given that they are just a
> > > > statistic for userspace, making them be a u64 should work just the same,
> > > > right?
> > > 
> > > The statistic counters are per-port, while the cm.c flows run in
> > > asynchronically in parallel for every CM connection.
> > > 
> > > We need atomic variable to ensure that "write to u64" is not
> > > interrupted.
> > 
> > On what system is "write to u64" interruptable? 
> 
> On 32 bits, and yes, we have a customer who still uses such system.

So you will see what, a "tear"?  Or a stale value?

> > As these are per-port, do multiple threads try to increment these at
> > the same time?  
> 
> Yes, CM connection can be seen as thread. Bottom line everything in parallel.
> 
> > And even if they do, what happens if one is 'dropped' somehow because of this?
> 
> Probably nothing, we increment the statistics only.

So you are hitting cache lines for no good reason, probably not a good
idea, you are wasting cpu cycles for nothing :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 10:25   ` Greg KH
  2021-06-07 10:51     ` Leon Romanovsky
@ 2021-06-07 12:14     ` Jason Gunthorpe
  2021-06-07 12:39       ` Greg KH
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2021-06-07 12:14 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Doug Ledford, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > This code is trying to attach a list of counters grouped into 4 groups to
> > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > everything naturally as an ib_port_attribute and add a single
> > attribute_groups list.
> > 
> > Remove all the naked kobject manipulations.
> 
> Much nicer.
> 
> But why do you need your counters to be atomic in the first place?  What
> are they counting that requires this?  

The write side of the counter is being updated from concurrent kernel
threads without locking, so this is an atomic because the write side
needs atomic_add().

Making them a naked u64 will cause significant corruption on the write
side, and packet counters that are not accurate after quiescence are
not very useful things.

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
  2021-06-07 10:23   ` Greg KH
  2021-06-07 11:00     ` Leon Romanovsky
@ 2021-06-07 12:32     ` Jason Gunthorpe
  1 sibling, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-06-07 12:32 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Doug Ledford, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 12:23:23PM +0200, Greg KH wrote:
> >  static int add_port(struct ib_core_device *coredev, int port_num)
> >  {
> >  	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
> > @@ -1171,6 +1177,8 @@ static int add_port(struct ib_core_device *coredev, int port_num)
> >  		setup_hw_stats(device, p, port_num);
> >  
> >  	list_add_tail(&p->kobj.entry, &coredev->port_list);
> > +	if (device->port_data && is_full_dev)
> > +		device->port_data[port_num].sysfs = p;
> 
> You are saving off a pointer to a reference counted structure without
> incrementing the reference count on it?  

This storage borrows another reference count, primarily because there
is no locking to read/write .sysfs. It is a fairly common idiom.

You can see it in the free path:

		port->ibdev->port_data[port->port_num].sysfs = NULL;
	kobject_put(&port->kobj);  // port == p above

Due to the lack of locks the whole external thing is arranged so that
the 3 users of .sysfs are sequenced properly around
setup_port()/destroy_port() using other external locks.

Adding more refs without also adding locking is just confusing what
the data protection model is. This is a borrowed ref and access is
only allowed when other locking is properly sequencing it with the ref
owner's manipulation of .sysfs.

Eg I would reject some code sequence like this:

	port->ibdev->port_data[port->port_num].sysfs = NULL;
	kobject_put(&port->kobj);  // one for .sysfs
	kobject_put(&port->kobj);  // one for our stack

As being pretty bogus.

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 12:14     ` Jason Gunthorpe
@ 2021-06-07 12:39       ` Greg KH
  2021-06-07 12:50         ` Jason Gunthorpe
  0 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-06-07 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > 
> > > This code is trying to attach a list of counters grouped into 4 groups to
> > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > everything naturally as an ib_port_attribute and add a single
> > > attribute_groups list.
> > > 
> > > Remove all the naked kobject manipulations.
> > 
> > Much nicer.
> > 
> > But why do you need your counters to be atomic in the first place?  What
> > are they counting that requires this?  
> 
> The write side of the counter is being updated from concurrent kernel
> threads without locking, so this is an atomic because the write side
> needs atomic_add().

So the atomic write forces a lock :(

> Making them a naked u64 will cause significant corruption on the write
> side, and packet counters that are not accurate after quiescence are
> not very useful things.

How "accurate" do these have to be?

And have you all tried them?

I'm pushing back here as I see a lot of atomics used for debugging
statistics for no good reason all over the place.  Especially when
userspace just does not care.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 12:08           ` Greg KH
@ 2021-06-07 12:41             ` Leon Romanovsky
  0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07 12:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 02:08:27PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 02:37:16PM +0300, Leon Romanovsky wrote:
> > On Mon, Jun 07, 2021 at 01:22:26PM +0200, Greg KH wrote:
> > > On Mon, Jun 07, 2021 at 01:51:36PM +0300, Leon Romanovsky wrote:
> > > > On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > > > > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > 
> > > > > > This code is trying to attach a list of counters grouped into 4 groups to
> > > > > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > > > > everything naturally as an ib_port_attribute and add a single
> > > > > > attribute_groups list.
> > > > > > 
> > > > > > Remove all the naked kobject manipulations.
> > > > > 
> > > > > Much nicer.
> > > > > 
> > > > > But why do you need your counters to be atomic in the first place?  What
> > > > > are they counting that requires this?  Given that they are just a
> > > > > statistic for userspace, making them be a u64 should work just the same,
> > > > > right?
> > > > 
> > > > The statistic counters are per-port, while the cm.c flows run in
> > > > asynchronically in parallel for every CM connection.
> > > > 
> > > > We need atomic variable to ensure that "write to u64" is not
> > > > interrupted.
> > > 
> > > On what system is "write to u64" interruptable? 
> > 
> > On 32 bits, and yes, we have a customer who still uses such system.
> 
> So you will see what, a "tear"?  Or a stale value?

Does it really make any difference for the users? The end result is that
value is incorrect.

> 
> > > As these are per-port, do multiple threads try to increment these at
> > > the same time?  
> > 
> > Yes, CM connection can be seen as thread. Bottom line everything in parallel.
> > 
> > > And even if they do, what happens if one is 'dropped' somehow because of this?
> > 
> > Probably nothing, we increment the statistics only.
> 
> So you are hitting cache lines for no good reason, probably not a good
> idea, you are wasting cpu cycles for nothing :(

I prefer correctness over performance optimization in paths that far
from being hot.

The thing is that we have such correctness for free without any extra
code complexity by simply using proper data structure. Why shouldn't
we use atomics?

Also, we aren't talking about some driver code, but about core infrastructure
for whole RDMA subsystem.

Thanks

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 12:39       ` Greg KH
@ 2021-06-07 12:50         ` Jason Gunthorpe
  2021-06-11  7:25           ` Haakon Bugge
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2021-06-07 12:50 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Doug Ledford, Kees Cook, Nathan Chancellor,
	Adit Ranadive, Ariel Elior, Christian Benvenuti,
	clang-built-linux, Dennis Dalessandro, Devesh Sharma,
	Gal Pressman, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Mustafa Ismail, Naresh Kumar PBS,
	Nelson Escobar, Nick Desaulniers, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, VMware PV-Drivers, Yishai Hadas,
	Zhu Yanjun

On Mon, Jun 07, 2021 at 02:39:45PM +0200, Greg KH wrote:
> On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> > > On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > 
> > > > This code is trying to attach a list of counters grouped into 4 groups to
> > > > the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> > > > everything naturally as an ib_port_attribute and add a single
> > > > attribute_groups list.
> > > > 
> > > > Remove all the naked kobject manipulations.
> > > 
> > > Much nicer.
> > > 
> > > But why do you need your counters to be atomic in the first place?  What
> > > are they counting that requires this?  
> > 
> > The write side of the counter is being updated from concurrent kernel
> > threads without locking, so this is an atomic because the write side
> > needs atomic_add().
> 
> So the atomic write forces a lock :(

Of course, but a single atomic is cheaper than the double atomic in a
full spinlock.
 
> > Making them a naked u64 will cause significant corruption on the write
> > side, and packet counters that are not accurate after quiescence are
> > not very useful things.
> 
> How "accurate" do these have to be?

They have to be accurate. They are networking packet counters. What is
the point of burning CPU cycles keeping track of inaccurate data?

> And have you all tried them?

I've used them over the years. This stuff is something like 15 years
old now.
 
> I'm pushing back here as I see a lot of atomics used for debugging
> statistics for no good reason all over the place.  Especially when
> userspace just does not care.

If userspace doesn't care then just delete the counter entirely.

Reporting a wrong/misleading debugging counter data sounds just
horrible to me.

What good is any debug result you get from the counter if it has to be
questioned because the counter is allowed to be wrong?

""The sender says it sent 7 packets, but the receivers debug counter
reports only 6! I guess my bug is a lost packet in the network.""

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 12/15] RDMA/hfi1: Use attributes for the port sysfs
  2021-06-07  8:17 ` [PATCH rdma-next v1 12/15] RDMA/hfi1: " Leon Romanovsky
@ 2021-06-07 13:07   ` kernel test robot
  2021-06-07 13:31     ` Leon Romanovsky
  0 siblings, 1 reply; 40+ messages in thread
From: kernel test robot @ 2021-06-07 13:07 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]

Hi Leon,

I love your patch! Perhaps something to improve:

[auto build test WARNING on rdma/for-next]
[also build test WARNING on next-20210607]
[cannot apply to v5.13-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Leon-Romanovsky/Reorganize-sysfs-file-creation-for-struct-ib_devices/20210607-161959
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: x86_64-randconfig-b001-20210607 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project ae973380c5f6be77ce395022be40350942260be9)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/1e8e4eebe16212ad4853b902eb8d045fce03f826
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Leon-Romanovsky/Reorganize-sysfs-file-creation-for-struct-ib_devices/20210607-161959
        git checkout 1e8e4eebe16212ad4853b902eb8d045fce03f826
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/infiniband/hw/hfi1/sysfs.c:175:37: warning: unused variable 'port_cc_group' [-Wunused-const-variable]
   static const struct attribute_group port_cc_group = {
                                       ^
   1 warning generated.


vim +/port_cc_group +175 drivers/infiniband/hw/hfi1/sysfs.c

   174	
 > 175	static const struct attribute_group port_cc_group = {
   176		.name = "CCMgtA",
   177		.attrs = port_cc_attributes,
   178		.bin_attrs = port_cc_bin_attributes,
   179	};
   180	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 32999 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-07  8:17 ` [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces Leon Romanovsky
@ 2021-06-07 13:29   ` Parav Pandit
  2021-06-07 23:32     ` Jason Gunthorpe
  2021-06-08  4:55     ` Leon Romanovsky
  0 siblings, 2 replies; 40+ messages in thread
From: Parav Pandit @ 2021-06-07 13:29 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford, Jason Gunthorpe
  Cc: Greg KH, Kees Cook, Nathan Chancellor, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, clang-built-linux,
	Dennis Dalessandro, Devesh Sharma, Gal Pressman, linux-kernel,
	linux-rdma, Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun



> From: Leon Romanovsky <leon@kernel.org>
> Sent: Monday, June 7, 2021 1:48 PM
> 
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> Now that the port_groups data is being destroyed and managed by the core
> code this restriction is no longer needed. All the ib_port_attrs are compatible
> with the core's sysfs lifecycle.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/core/device.c | 10 ++++------
> drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
>  2 files changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/core/device.c
> b/drivers/infiniband/core/device.c
> index 2cbd77933ea5..92f224a97481 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1698,13 +1698,11 @@ int ib_device_set_netns_put(struct sk_buff
> *skb,
>  	}
> 
>  	/*
> -	 * Currently supported only for those providers which support
> -	 * disassociation and don't do port specific sysfs init. Once a
> -	 * port_cleanup infrastructure is implemented, this limitation will be
> -	 * removed.
> +	 * All the ib_clients, including uverbs, are reset when the namespace
> is
> +	 * changed and this cannot be blocked waiting for userspace to do
> +	 * something, so disassociation is mandatory.
>  	 */
> -	if (!dev->ops.disassociate_ucontext || dev->ops.port_groups ||
> -	    ib_devices_shared_netns) {
> +	if (!dev->ops.disassociate_ucontext || ib_devices_shared_netns) {
>  		ret = -EOPNOTSUPP;
>  		goto ns_err;
>  	}
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 09a2e1066df0..f42034fcf3d9 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> ib_core_device *coredev, int port_num,
>  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
>  	if (ret)
>  		goto err_del;
> -	if (is_full_dev) {
> -		ret = sysfs_create_groups(&p->kobj, device-
> >ops.port_groups);
> -		if (ret)
> -			goto err_groups;
> -	}
> +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> +	if (ret)
> +		goto err_groups;
> 
This will expose counters in all net namespaces in shared mode (default case).
Application running in one net namespace will be able to monitor counters of other net namespace.
This should be avoided.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 12/15] RDMA/hfi1: Use attributes for the port sysfs
  2021-06-07 13:07   ` kernel test robot
@ 2021-06-07 13:31     ` Leon Romanovsky
  0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-07 13:31 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

On Mon, Jun 07, 2021 at 09:07:09PM +0800, kernel test robot wrote:
> Hi Leon,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on rdma/for-next]
> [also build test WARNING on next-20210607]
> [cannot apply to v5.13-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Leon-Romanovsky/Reorganize-sysfs-file-creation-for-struct-ib_devices/20210607-161959
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
> config: x86_64-randconfig-b001-20210607 (attached as .config)
> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project ae973380c5f6be77ce395022be40350942260be9)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install x86_64 cross compiling tool for clang build
>         # apt-get install binutils-x86-64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/1e8e4eebe16212ad4853b902eb8d045fce03f826
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Leon-Romanovsky/Reorganize-sysfs-file-creation-for-struct-ib_devices/20210607-161959
>         git checkout 1e8e4eebe16212ad4853b902eb8d045fce03f826
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> drivers/infiniband/hw/hfi1/sysfs.c:175:37: warning: unused variable 'port_cc_group' [-Wunused-const-variable]
>    static const struct attribute_group port_cc_group = {
>                                        ^
>    1 warning generated.

Jason,

You forgot to call to rdma_set_device_sysfs_group(... , port_cc_group)
in hfi1.

Thanks

> 
> 
> vim +/port_cc_group +175 drivers/infiniband/hw/hfi1/sysfs.c
> 
>    174	
>  > 175	static const struct attribute_group port_cc_group = {
>    176		.name = "CCMgtA",
>    177		.attrs = port_cc_attributes,
>    178		.bin_attrs = port_cc_bin_attributes,
>    179	};
>    180	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-07 13:29   ` Parav Pandit
@ 2021-06-07 23:32     ` Jason Gunthorpe
  2021-06-08  5:29       ` Parav Pandit
  2021-06-08  4:55     ` Leon Romanovsky
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2021-06-07 23:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Greg KH, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

On Mon, Jun 07, 2021 at 01:29:58PM +0000, Parav Pandit wrote:
> 
> 
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Monday, June 7, 2021 1:48 PM
> > 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Now that the port_groups data is being destroyed and managed by the core
> > code this restriction is no longer needed. All the ib_port_attrs are compatible
> > with the core's sysfs lifecycle.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >  drivers/infiniband/core/device.c | 10 ++++------
> > drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
> >  2 files changed, 10 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index 2cbd77933ea5..92f224a97481 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1698,13 +1698,11 @@ int ib_device_set_netns_put(struct sk_buff
> > *skb,
> >  	}
> > 
> >  	/*
> > -	 * Currently supported only for those providers which support
> > -	 * disassociation and don't do port specific sysfs init. Once a
> > -	 * port_cleanup infrastructure is implemented, this limitation will be
> > -	 * removed.
> > +	 * All the ib_clients, including uverbs, are reset when the namespace
> > is
> > +	 * changed and this cannot be blocked waiting for userspace to do
> > +	 * something, so disassociation is mandatory.
> >  	 */
> > -	if (!dev->ops.disassociate_ucontext || dev->ops.port_groups ||
> > -	    ib_devices_shared_netns) {

So this is OK since we have the clean up now

> > +	if (!dev->ops.disassociate_ucontext || ib_devices_shared_netns) {
> >  		ret = -EOPNOTSUPP;
> >  		goto ns_err;
> >  	}
> > diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> > index 09a2e1066df0..f42034fcf3d9 100644
> > +++ b/drivers/infiniband/core/sysfs.c
> > @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> > ib_core_device *coredev, int port_num,
> >  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
> >  	if (ret)
> >  		goto err_del;
> > -	if (is_full_dev) {
> > -		ret = sysfs_create_groups(&p->kobj, device-
> > >ops.port_groups);
> > -		if (ret)
> > -			goto err_groups;
> > -	}
> > +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> > +	if (ret)
> > +		goto err_groups;
> 
> This will expose counters in all net namespaces in shared mode
> (default case).  Application running in one net namespace will be
> able to monitor counters of other net namespace.  This should be
> avoided.

And you want this to stay blocked because the port_groups mostly
contain counters?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-07 13:29   ` Parav Pandit
  2021-06-07 23:32     ` Jason Gunthorpe
@ 2021-06-08  4:55     ` Leon Romanovsky
  2021-06-08  5:32       ` Parav Pandit
  1 sibling, 1 reply; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-08  4:55 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Doug Ledford, Jason Gunthorpe, Greg KH, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

On Mon, Jun 07, 2021 at 01:29:58PM +0000, Parav Pandit wrote:
> 
> 
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Monday, June 7, 2021 1:48 PM
> > 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Now that the port_groups data is being destroyed and managed by the core
> > code this restriction is no longer needed. All the ib_port_attrs are compatible
> > with the core's sysfs lifecycle.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/core/device.c | 10 ++++------
> > drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
> >  2 files changed, 10 insertions(+), 17 deletions(-)

<...>

> > diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> > index 09a2e1066df0..f42034fcf3d9 100644
> > --- a/drivers/infiniband/core/sysfs.c
> > +++ b/drivers/infiniband/core/sysfs.c
> > @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> > ib_core_device *coredev, int port_num,
> >  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
> >  	if (ret)
> >  		goto err_del;
> > -	if (is_full_dev) {
> > -		ret = sysfs_create_groups(&p->kobj, device-
> > >ops.port_groups);
> > -		if (ret)
> > -			goto err_groups;
> > -	}
> > +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> > +	if (ret)
> > +		goto err_groups;
> > 
> This will expose counters in all net namespaces in shared mode (default case).
> Application running in one net namespace will be able to monitor counters of other net namespace.
> This should be avoided.

In shared mode, we are sharing sysfs anyway and have two options to deal
with the port properties (counters):
1. Show them in all namespaces as being global to port which is shared anyway.
2. Show them in init_net namespace only and applications that were left
in this namespace will see not their counters anyway.

Why should we avoid "item 1"?

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-07 23:32     ` Jason Gunthorpe
@ 2021-06-08  5:29       ` Parav Pandit
  0 siblings, 0 replies; 40+ messages in thread
From: Parav Pandit @ 2021-06-08  5:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Greg KH, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun



> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, June 8, 2021 5:02 AM
> 
> On Mon, Jun 07, 2021 at 01:29:58PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Monday, June 7, 2021 1:48 PM
> > >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > >
> > > Now that the port_groups data is being destroyed and managed by the
> > > core code this restriction is no longer needed. All the
> > > ib_port_attrs are compatible with the core's sysfs lifecycle.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > drivers/infiniband/core/device.c | 10 ++++------
> > > drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
> > >  2 files changed, 10 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/device.c
> > > b/drivers/infiniband/core/device.c
> > > index 2cbd77933ea5..92f224a97481 100644
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -1698,13 +1698,11 @@ int ib_device_set_netns_put(struct sk_buff
> > > *skb,
> > >  	}
> > >
> > >  	/*
> > > -	 * Currently supported only for those providers which support
> > > -	 * disassociation and don't do port specific sysfs init. Once a
> > > -	 * port_cleanup infrastructure is implemented, this limitation will be
> > > -	 * removed.
> > > +	 * All the ib_clients, including uverbs, are reset when the
> > > +namespace
> > > is
> > > +	 * changed and this cannot be blocked waiting for userspace to do
> > > +	 * something, so disassociation is mandatory.
> > >  	 */
> > > -	if (!dev->ops.disassociate_ucontext || dev->ops.port_groups ||
> > > -	    ib_devices_shared_netns) {
> 
> So this is OK since we have the clean up now
> 
I didn't review the whole series, but yes if there is sysfs clean routine, it should be ok.

> > > +	if (!dev->ops.disassociate_ucontext || ib_devices_shared_netns) {
> > >  		ret = -EOPNOTSUPP;
> > >  		goto ns_err;
> > >  	}
> > > diff --git a/drivers/infiniband/core/sysfs.c
> > > b/drivers/infiniband/core/sysfs.c index 09a2e1066df0..f42034fcf3d9
> > > 100644
> > > +++ b/drivers/infiniband/core/sysfs.c
> > > @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> > > ib_core_device *coredev, int port_num,
> > >  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
> > >  	if (ret)
> > >  		goto err_del;
> > > -	if (is_full_dev) {
> > > -		ret = sysfs_create_groups(&p->kobj, device-
> > > >ops.port_groups);
> > > -		if (ret)
> > > -			goto err_groups;
> > > -	}
> > > +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> > > +	if (ret)
> > > +		goto err_groups;
> >
> > This will expose counters in all net namespaces in shared mode
> > (default case).  Application running in one net namespace will be able
> > to monitor counters of other net namespace.  This should be avoided.
> 
> And you want this to stay blocked because the port_groups mostly contain
> counters?
Yes.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-08  4:55     ` Leon Romanovsky
@ 2021-06-08  5:32       ` Parav Pandit
  2021-06-08  6:57         ` Leon Romanovsky
  0 siblings, 1 reply; 40+ messages in thread
From: Parav Pandit @ 2021-06-08  5:32 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Greg KH, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun



> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, June 8, 2021 10:25 AM
> 
> On Mon, Jun 07, 2021 at 01:29:58PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Monday, June 7, 2021 1:48 PM
> > >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > >
> > > Now that the port_groups data is being destroyed and managed by the
> > > core code this restriction is no longer needed. All the
> > > ib_port_attrs are compatible with the core's sysfs lifecycle.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/infiniband/core/device.c | 10 ++++------
> > > drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
> > >  2 files changed, 10 insertions(+), 17 deletions(-)
> 
> <...>
> 
> > > diff --git a/drivers/infiniband/core/sysfs.c
> > > b/drivers/infiniband/core/sysfs.c index 09a2e1066df0..f42034fcf3d9
> > > 100644
> > > --- a/drivers/infiniband/core/sysfs.c
> > > +++ b/drivers/infiniband/core/sysfs.c
> > > @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> > > ib_core_device *coredev, int port_num,
> > >  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
> > >  	if (ret)
> > >  		goto err_del;
> > > -	if (is_full_dev) {
> > > -		ret = sysfs_create_groups(&p->kobj, device-
> > > >ops.port_groups);
> > > -		if (ret)
> > > -			goto err_groups;
> > > -	}
> > > +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> > > +	if (ret)
> > > +		goto err_groups;
> > >
> > This will expose counters in all net namespaces in shared mode (default
> case).
> > Application running in one net namespace will be able to monitor counters
> of other net namespace.
> > This should be avoided.
> 
> In shared mode, we are sharing sysfs anyway and have two options to deal
> with the port properties (counters):
> 1. Show them in all namespaces as being global to port which is shared
> anyway.
> 2. Show them in init_net namespace only and applications that were left in
> this namespace will see not their counters anyway.
> 
> Why should we avoid "item 1"?
Because it is incorrect to show port counters updated by application running in net ns 1, to show to application running in net ns 2.
Once/if there is per netns counters exist, than those counters can be shown using more modern rdma stats command.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces
  2021-06-08  5:32       ` Parav Pandit
@ 2021-06-08  6:57         ` Leon Romanovsky
  0 siblings, 0 replies; 40+ messages in thread
From: Leon Romanovsky @ 2021-06-08  6:57 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Doug Ledford, Jason Gunthorpe, Greg KH, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

On Tue, Jun 08, 2021 at 05:32:56AM +0000, Parav Pandit wrote:
> 
> 
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, June 8, 2021 10:25 AM
> > 
> > On Mon, Jun 07, 2021 at 01:29:58PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Monday, June 7, 2021 1:48 PM
> > > >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > >
> > > > Now that the port_groups data is being destroyed and managed by the
> > > > core code this restriction is no longer needed. All the
> > > > ib_port_attrs are compatible with the core's sysfs lifecycle.
> > > >
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > ---
> > > >  drivers/infiniband/core/device.c | 10 ++++------
> > > > drivers/infiniband/core/sysfs.c  | 17 ++++++-----------
> > > >  2 files changed, 10 insertions(+), 17 deletions(-)
> > 
> > <...>
> > 
> > > > diff --git a/drivers/infiniband/core/sysfs.c
> > > > b/drivers/infiniband/core/sysfs.c index 09a2e1066df0..f42034fcf3d9
> > > > 100644
> > > > --- a/drivers/infiniband/core/sysfs.c
> > > > +++ b/drivers/infiniband/core/sysfs.c
> > > > @@ -1236,11 +1236,9 @@ static struct ib_port *setup_port(struct
> > > > ib_core_device *coredev, int port_num,
> > > >  	ret = sysfs_create_groups(&p->kobj, p->groups_list);
> > > >  	if (ret)
> > > >  		goto err_del;
> > > > -	if (is_full_dev) {
> > > > -		ret = sysfs_create_groups(&p->kobj, device-
> > > > >ops.port_groups);
> > > > -		if (ret)
> > > > -			goto err_groups;
> > > > -	}
> > > > +	ret = sysfs_create_groups(&p->kobj, device->ops.port_groups);
> > > > +	if (ret)
> > > > +		goto err_groups;
> > > >
> > > This will expose counters in all net namespaces in shared mode (default
> > case).
> > > Application running in one net namespace will be able to monitor counters
> > of other net namespace.
> > > This should be avoided.
> > 
> > In shared mode, we are sharing sysfs anyway and have two options to deal
> > with the port properties (counters):
> > 1. Show them in all namespaces as being global to port which is shared
> > anyway.
> > 2. Show them in init_net namespace only and applications that were left in
> > this namespace will see not their counters anyway.
> > 
> > Why should we avoid "item 1"?
> Because it is incorrect to show port counters updated by application running in net ns 1, to show to application running in net ns 2.
> Once/if there is per netns counters exist, than those counters can be shown using more modern rdma stats command.

ok, I see your point and have no strong position about it.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-07 12:50         ` Jason Gunthorpe
@ 2021-06-11  7:25           ` Haakon Bugge
  2021-06-11  8:16             ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Haakon Bugge @ 2021-06-11  7:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Greg KH, Leon Romanovsky, Doug Ledford, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, OFED mailing list,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun



> On 7 Jun 2021, at 14:50, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Mon, Jun 07, 2021 at 02:39:45PM +0200, Greg KH wrote:
>> On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
>>> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
>>>> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
>>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>>> 
>>>>> This code is trying to attach a list of counters grouped into 4 groups to
>>>>> the ib_port sysfs. Instead of creating a bunch of kobjects simply express
>>>>> everything naturally as an ib_port_attribute and add a single
>>>>> attribute_groups list.
>>>>> 
>>>>> Remove all the naked kobject manipulations.
>>>> 
>>>> Much nicer.
>>>> 
>>>> But why do you need your counters to be atomic in the first place?  What
>>>> are they counting that requires this?  
>>> 
>>> The write side of the counter is being updated from concurrent kernel
>>> threads without locking, so this is an atomic because the write side
>>> needs atomic_add().
>> 
>> So the atomic write forces a lock :(
> 
> Of course, but a single atomic is cheaper than the double atomic in a
> full spinlock.
> 
>>> Making them a naked u64 will cause significant corruption on the write
>>> side, and packet counters that are not accurate after quiescence are
>>> not very useful things.
>> 
>> How "accurate" do these have to be?
> 
> They have to be accurate. They are networking packet counters. What is
> the point of burning CPU cycles keeping track of inaccurate data?

Consider a CPU with a 32-bit wide datapath to memory, which reads and writes the most significant 4-byte word first:

    Memory                   CPU1                   CPU2
MSW         LSW        MSW         LSW        MSW         LSW
0x0  0xffffffff
0x0  0xffffffff        0x0
0x0  0xffffffff        0x0  0xffffffff
0x0  0xffffffff        0x1         0x0                         cpu1 has incremented its register
0x1  0xffffffff        0x1         0x0                         cpu1 has written msw
0x1  0xffffffff        0x1         0x0        0x1              cpu2 has read msw
0x1  0xffffffff        0x1         0x0        0x1  0xffffffff
0x1         0x0        0x1         0x0        0x2         0x0
0x2         0x0        0x1         0x0        0x2         0x0
0x2         0x0        0x1         0x0        0x2         0x0


I would say that 0x200000000 vs. 0x100000001 is more than inaccurate!


> 
>> And have you all tried them?
> 
> I've used them over the years. This stuff is something like 15 years
> old now.

We are using them.


Thxs, Håkon

>> I'm pushing back here as I see a lot of atomics used for debugging
>> statistics for no good reason all over the place.  Especially when
>> userspace just does not care.
> 
> If userspace doesn't care then just delete the counter entirely.
> 
> Reporting a wrong/misleading debugging counter data sounds just
> horrible to me.
> 
> What good is any debug result you get from the counter if it has to be
> questioned because the counter is allowed to be wrong?
> 
> ""The sender says it sent 7 packets, but the receivers debug counter
> reports only 6! I guess my bug is a lost packet in the network.""
> 
> Jason


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-11  7:25           ` Haakon Bugge
@ 2021-06-11  8:16             ` Greg KH
  2021-06-14  3:27               ` Haakon Bugge
  0 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-06-11  8:16 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Jason Gunthorpe, Leon Romanovsky, Doug Ledford, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, OFED mailing list,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun

On Fri, Jun 11, 2021 at 07:25:46AM +0000, Haakon Bugge wrote:
> 
> 
> > On 7 Jun 2021, at 14:50, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > On Mon, Jun 07, 2021 at 02:39:45PM +0200, Greg KH wrote:
> >> On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
> >>> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
> >>>> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
> >>>>> From: Jason Gunthorpe <jgg@nvidia.com>
> >>>>> 
> >>>>> This code is trying to attach a list of counters grouped into 4 groups to
> >>>>> the ib_port sysfs. Instead of creating a bunch of kobjects simply express
> >>>>> everything naturally as an ib_port_attribute and add a single
> >>>>> attribute_groups list.
> >>>>> 
> >>>>> Remove all the naked kobject manipulations.
> >>>> 
> >>>> Much nicer.
> >>>> 
> >>>> But why do you need your counters to be atomic in the first place?  What
> >>>> are they counting that requires this?  
> >>> 
> >>> The write side of the counter is being updated from concurrent kernel
> >>> threads without locking, so this is an atomic because the write side
> >>> needs atomic_add().
> >> 
> >> So the atomic write forces a lock :(
> > 
> > Of course, but a single atomic is cheaper than the double atomic in a
> > full spinlock.
> > 
> >>> Making them a naked u64 will cause significant corruption on the write
> >>> side, and packet counters that are not accurate after quiescence are
> >>> not very useful things.
> >> 
> >> How "accurate" do these have to be?
> > 
> > They have to be accurate. They are networking packet counters. What is
> > the point of burning CPU cycles keeping track of inaccurate data?
> 
> Consider a CPU with a 32-bit wide datapath to memory, which reads and writes the most significant 4-byte word first:

What CPU is that?

>     Memory                   CPU1                   CPU2
> MSW         LSW        MSW         LSW        MSW         LSW
> 0x0  0xffffffff
> 0x0  0xffffffff        0x0
> 0x0  0xffffffff        0x0  0xffffffff
> 0x0  0xffffffff        0x1         0x0                         cpu1 has incremented its register
> 0x1  0xffffffff        0x1         0x0                         cpu1 has written msw
> 0x1  0xffffffff        0x1         0x0        0x1              cpu2 has read msw
> 0x1  0xffffffff        0x1         0x0        0x1  0xffffffff
> 0x1         0x0        0x1         0x0        0x2         0x0
> 0x2         0x0        0x1         0x0        0x2         0x0
> 0x2         0x0        0x1         0x0        0x2         0x0
> 
> 
> I would say that 0x200000000 vs. 0x100000001 is more than inaccurate!

True, then maybe these should just be 32bit counters :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
  2021-06-11  8:16             ` Greg KH
@ 2021-06-14  3:27               ` Haakon Bugge
  0 siblings, 0 replies; 40+ messages in thread
From: Haakon Bugge @ 2021-06-14  3:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Jason Gunthorpe, Leon Romanovsky, Doug Ledford, Kees Cook,
	Nathan Chancellor, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, clang-built-linux, Dennis Dalessandro,
	Devesh Sharma, Gal Pressman, linux-kernel, OFED mailing list,
	Michal Kalderon, Mike Marciniszyn, Mustafa Ismail,
	Naresh Kumar PBS, Nelson Escobar, Nick Desaulniers,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem,
	VMware PV-Drivers, Yishai Hadas, Zhu Yanjun



> On 11 Jun 2021, at 10:16, Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> On Fri, Jun 11, 2021 at 07:25:46AM +0000, Haakon Bugge wrote:
>> 
>> 
>>> On 7 Jun 2021, at 14:50, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>> 
>>> On Mon, Jun 07, 2021 at 02:39:45PM +0200, Greg KH wrote:
>>>> On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
>>>>> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
>>>>>> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
>>>>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>>>>> 
>>>>>>> This code is trying to attach a list of counters grouped into 4 groups to
>>>>>>> the ib_port sysfs. Instead of creating a bunch of kobjects simply express
>>>>>>> everything naturally as an ib_port_attribute and add a single
>>>>>>> attribute_groups list.
>>>>>>> 
>>>>>>> Remove all the naked kobject manipulations.
>>>>>> 
>>>>>> Much nicer.
>>>>>> 
>>>>>> But why do you need your counters to be atomic in the first place?  What
>>>>>> are they counting that requires this?  
>>>>> 
>>>>> The write side of the counter is being updated from concurrent kernel
>>>>> threads without locking, so this is an atomic because the write side
>>>>> needs atomic_add().
>>>> 
>>>> So the atomic write forces a lock :(
>>> 
>>> Of course, but a single atomic is cheaper than the double atomic in a
>>> full spinlock.
>>> 
>>>>> Making them a naked u64 will cause significant corruption on the write
>>>>> side, and packet counters that are not accurate after quiescence are
>>>>> not very useful things.
>>>> 
>>>> How "accurate" do these have to be?
>>> 
>>> They have to be accurate. They are networking packet counters. What is
>>> the point of burning CPU cycles keeping track of inaccurate data?
>> 
>> Consider a CPU with a 32-bit wide datapath to memory, which reads and writes the most significant 4-byte word first:
> 
> What CPU is that?

Hypothetical32 :-)

>>    Memory                   CPU1                   CPU2
>> MSW         LSW        MSW         LSW        MSW         LSW
>> 0x0  0xffffffff
>> 0x0  0xffffffff        0x0
>> 0x0  0xffffffff        0x0  0xffffffff
>> 0x0  0xffffffff        0x1         0x0                         cpu1 has incremented its register
>> 0x1  0xffffffff        0x1         0x0                         cpu1 has written msw
>> 0x1  0xffffffff        0x1         0x0        0x1              cpu2 has read msw
>> 0x1  0xffffffff        0x1         0x0        0x1  0xffffffff
>> 0x1         0x0        0x1         0x0        0x2         0x0
>> 0x2         0x0        0x1         0x0        0x2         0x0
>> 0x2         0x0        0x1         0x0        0x2         0x0
>> 
>> 
>> I would say that 0x200000000 vs. 0x100000001 is more than inaccurate!
> 
> True, then maybe these should just be 32bit counters :)

How long can we then run without wrapping? Our UEK is security updated by means of ksplice, and since the introduction of Spectre/Meltdown CPU fixes in 2018, we have been able to update the kernels running wrt. security fixes.

I see no harm by using an atomic 64-bit add. Yes, it serializes the pipeline and locks the cache-line in the first-level cache for as long as it takes to for a RMW on it. Compared to surround an ordinary add with lock/unlock, an atomic increment is strongly preferred in my opinion.

Ordinary add without locking leads to the issue above on systems with 32-bit wide memory data paths.

Using 32-bit counters raises the issue of wrapping in systems running for years and having a high frequency of IB connection forming and resurrections.


Thxs, Håkon


> thanks,
> 
> greg k-h


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-06-14  3:27 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-07  8:17 [PATCH rdma-next v1 00/15] Reorganize sysfs file creation for struct ib_devices Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 01/15] RDMA: Split the alloc_hw_stats() ops to port and device variants Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 02/15] RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer Leon Romanovsky
2021-06-07 10:23   ` Greg KH
2021-06-07 11:00     ` Leon Romanovsky
2021-06-07 11:23       ` Greg KH
2021-06-07 12:32     ` Jason Gunthorpe
2021-06-07  8:17 ` [PATCH rdma-next v1 03/15] RDMA/core: Split port and device counter sysfs attributes Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 04/15] RDMA/core: Split gid_attrs related sysfs from add_port() Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 05/15] RDMA/core: Simplify how the gid_attrs sysfs is created Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 06/15] RDMA/core: Simplify how the port " Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 07/15] RDMA/core: Create the device hw_counters through the normal groups mechanism Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 08/15] RDMA/core: Remove the kobject_uevent() NOP Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 09/15] RDMA/core: Expose the ib port sysfs attribute machinery Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's Leon Romanovsky
2021-06-07 10:25   ` Greg KH
2021-06-07 10:51     ` Leon Romanovsky
2021-06-07 11:22       ` Greg KH
2021-06-07 11:37         ` Leon Romanovsky
2021-06-07 12:08           ` Greg KH
2021-06-07 12:41             ` Leon Romanovsky
2021-06-07 12:14     ` Jason Gunthorpe
2021-06-07 12:39       ` Greg KH
2021-06-07 12:50         ` Jason Gunthorpe
2021-06-11  7:25           ` Haakon Bugge
2021-06-11  8:16             ` Greg KH
2021-06-14  3:27               ` Haakon Bugge
2021-06-07  8:17 ` [PATCH rdma-next v1 11/15] RDMA/qib: Use attributes for the port sysfs Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 12/15] RDMA/hfi1: " Leon Romanovsky
2021-06-07 13:07   ` kernel test robot
2021-06-07 13:31     ` Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 13/15] RDMA: Change ops->init_port to ops->port_groups Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 14/15] RDMA/core: Allow port_groups to be used with namespaces Leon Romanovsky
2021-06-07 13:29   ` Parav Pandit
2021-06-07 23:32     ` Jason Gunthorpe
2021-06-08  5:29       ` Parav Pandit
2021-06-08  4:55     ` Leon Romanovsky
2021-06-08  5:32       ` Parav Pandit
2021-06-08  6:57         ` Leon Romanovsky
2021-06-07  8:17 ` [PATCH rdma-next v1 15/15] RDMA: Remove rdma_set_device_sysfs_group() Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.