* [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards
@ 2022-07-20 15:12 Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

This patchset implements two features:
1) "devlink dev info" is exposed for line card (patches 5-8)
2) "devlink dev flash" is implemented for line card gearbox
   flashing (patch 9)

For every line card, a "nested" auxiliary device is created, which
allows the features mentioned above to be bound to it (patch 3).

The relationship between a line card and the devlink instance of its
auxiliary device is carried over an extra line card netlink attribute
(patches 2 and 4).

The first patch removes devlink_mutex from devlink_register/unregister(),
which eliminates a possible deadlock during the devlink reload command.

Examples:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
       16x100G

$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
  versions:
      fixed:
        hw.revision 0
        fw.psid MT_0000000749
      running:
        ini.version 4
        fw 19.2010.1312

$ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

Jiri Pirko (11):
  net: devlink: make sure that devlink_try_get() works with valid
    pointer during xarray iteration
  net: devlink: introduce nested devlink entity for line card
  mlxsw: core_linecards: Introduce per line card auxiliary device
  mlxsw: core_linecards: Expose HW revision and INI version
  mlxsw: reg: Extend MDDQ by device_info
  mlxsw: core_linecards: Probe provisioned line cards for devices and
    expose FW version
  mlxsw: reg: Add Management DownStream Device Tunneling Register
  mlxsw: core_linecards: Expose device PSID over device info
  mlxsw: core_linecards: Implement line card device flashing
  selftests: mlxsw: Check line card info on provisioned line card
  selftests: mlxsw: Check line card info on activated line card

 Documentation/networking/devlink/mlxsw.rst    |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/Kconfig   |   1 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  44 +-
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  35 ++
 .../mellanox/mlxsw/core_linecard_dev.c        | 184 ++++++++
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 405 ++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 173 +++++++-
 include/net/devlink.h                         |   2 +
 include/uapi/linux/devlink.h                  |   2 +
 net/core/devlink.c                            | 156 ++++++-
 .../drivers/net/mlxsw/devlink_linecard.sh     |  54 +++
 12 files changed, 1050 insertions(+), 32 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c

-- 
2.35.3



* [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-20 22:25   ` Keller, Jacob E
  2022-07-21  0:49   ` Jakub Kicinski
  2022-07-20 15:12 ` [patch net-next v3 02/11] net: devlink: introduce nested devlink entity for line card Jiri Pirko
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Remove dependency on devlink_mutex during devlinks xarray iteration.

The reason is that the devlink_register/unregister() functions, which
take devlink_mutex, would deadlock during a devlink reload operation of
a devlink instance that registers/unregisters nested devlink instances.

The consistency of the devlinks xarray is ensured internally by the
xarray itself. A reference is taken when working with a devlink
instance using devlink_try_get(). But there is no guarantee that the
devlink pointer picked during xarray iteration is not freed before
devlink_try_get() is called.

Make sure that devlink_try_get() works with a valid pointer.
Achieve it by:
1) Splitting devlink_put() so the completion is sent only
   after the RCU grace period. The completion unblocks the
   devlink_unregister() routine, which is followed by devlink_free().
2) Iterating the devlinks xarray while holding the RCU read lock.
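
For illustration, a condensed sketch (not part of the patch itself) of
the iteration pattern the dumpit callbacks converge on after this
change; names mirror the kernel code, the loop body is simplified:

	rcu_read_lock();
	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
		if (!devlink_try_get(devlink))	/* may fail for dying instances */
			continue;
		rcu_read_unlock();	/* safe to sleep: we hold a reference now */

		/* ... fill the netlink message for this instance ... */

		devlink_put(devlink);	/* completion fires only after grace period */
		rcu_read_lock();	/* re-enter RCU before the next iteration */
	}
	rcu_read_unlock();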

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- s/enf/end/ in devlink_put() comment
- added missing rcu_read_lock() call to info_get_dumpit()
- extended patch description by motivation
- removed an extra "by" from patch description
v1->v2:
- new patch (originally part of different patchset)
---
 net/core/devlink.c | 114 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 96 insertions(+), 18 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 98d79feeb3dc..6a3931a8e338 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -70,6 +70,7 @@ struct devlink {
 	u8 reload_failed:1;
 	refcount_t refcount;
 	struct completion comp;
+	struct rcu_head rcu;
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
@@ -221,8 +222,6 @@ static DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
 /* devlink_mutex
  *
  * An overall lock guarding every operation coming from userspace.
- * It also guards devlink devices list and it is taken when
- * driver registers/unregisters it.
  */
 static DEFINE_MUTEX(devlink_mutex);
 
@@ -232,10 +231,21 @@ struct net *devlink_net(const struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_net);
 
+static void __devlink_put_rcu(struct rcu_head *head)
+{
+	struct devlink *devlink = container_of(head, struct devlink, rcu);
+
+	complete(&devlink->comp);
+}
+
 void devlink_put(struct devlink *devlink)
 {
 	if (refcount_dec_and_test(&devlink->refcount))
-		complete(&devlink->comp);
+		/* Make sure unregister operation that may await the completion
+		 * is unblocked only after all users are after the end of
+		 * RCU grace period.
+		 */
+		call_rcu(&devlink->rcu, __devlink_put_rcu);
 }
 
 struct devlink *__must_check devlink_try_get(struct devlink *devlink)
@@ -295,6 +305,7 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 
 	lockdep_assert_held(&devlink_mutex);
 
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0 &&
@@ -306,6 +317,7 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 
 	if (!found || !devlink_try_get(devlink))
 		devlink = ERR_PTR(-ENODEV);
+	rcu_read_unlock();
 
 	return devlink;
 }
@@ -1329,9 +1341,11 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -1358,7 +1372,9 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 	if (err != -EMSGSIZE)
@@ -1432,29 +1448,32 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk))) {
-			devlink_put(devlink);
-			continue;
-		}
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
 
-		if (idx < start) {
-			idx++;
-			devlink_put(devlink);
-			continue;
-		}
+		if (idx < start)
+			goto inc;
 
 		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
 				      NETLINK_CB(cb->skb).portid,
 				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
-		devlink_put(devlink);
-		if (err)
+		if (err) {
+			devlink_put(devlink);
 			goto out;
+		}
+inc:
 		idx++;
+retry:
+		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -1495,9 +1514,11 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -1523,7 +1544,9 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -2177,9 +2200,11 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -2208,7 +2233,9 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
 		mutex_unlock(&devlink->linecards_lock);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -2449,9 +2476,11 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -2477,7 +2506,9 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -2601,9 +2632,11 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_pool_get)
@@ -2626,7 +2659,9 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -2822,9 +2857,11 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_port_pool_get)
@@ -2847,7 +2884,9 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -3071,9 +3110,11 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_tc_pool_bind_get)
@@ -3097,7 +3138,9 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -5158,9 +5201,11 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -5188,7 +5233,9 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -5393,9 +5440,11 @@ static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -5428,7 +5477,9 @@ static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -5977,9 +6028,11 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -5990,7 +6043,9 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
 		devlink_put(devlink);
 		if (err)
 			goto out;
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 	cb->args[0] = idx;
@@ -6511,9 +6566,11 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -6531,13 +6588,16 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
 			err = 0;
 		else if (err) {
 			devlink_put(devlink);
+			rcu_read_lock();
 			break;
 		}
 inc:
 		idx++;
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 	mutex_unlock(&devlink_mutex);
 
 	if (err != -EMSGSIZE)
@@ -7691,9 +7751,11 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry_rep;
@@ -7719,11 +7781,13 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 		mutex_unlock(&devlink->reporters_lock);
 retry_rep:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
 
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry_port;
@@ -7754,7 +7818,9 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry_port:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -8291,9 +8357,11 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -8319,7 +8387,9 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -8518,9 +8588,11 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -8547,7 +8619,9 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -8832,9 +8906,11 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
 	int err;
 
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
@@ -8861,7 +8937,9 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
 		devl_unlock(devlink);
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 out:
 	mutex_unlock(&devlink_mutex);
 
@@ -9589,10 +9667,8 @@ void devlink_register(struct devlink *devlink)
 	ASSERT_DEVLINK_NOT_REGISTERED(devlink);
 	/* Make sure that we are in .probe() routine */
 
-	mutex_lock(&devlink_mutex);
 	xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
 	devlink_notify_register(devlink);
-	mutex_unlock(&devlink_mutex);
 }
 EXPORT_SYMBOL_GPL(devlink_register);
 
@@ -9609,10 +9685,8 @@ void devlink_unregister(struct devlink *devlink)
 	devlink_put(devlink);
 	wait_for_completion(&devlink->comp);
 
-	mutex_lock(&devlink_mutex);
 	devlink_notify_unregister(devlink);
 	xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
-	mutex_unlock(&devlink_mutex);
 }
 EXPORT_SYMBOL_GPL(devlink_unregister);
 
@@ -12281,9 +12355,11 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 	 * all devlink instances from this namespace into init_net.
 	 */
 	mutex_lock(&devlink_mutex);
+	rcu_read_lock();
 	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
 		if (!devlink_try_get(devlink))
 			continue;
+		rcu_read_unlock();
 
 		if (!net_eq(devlink_net(devlink), net))
 			goto retry;
@@ -12297,7 +12373,9 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 			pr_warn("Failed to reload devlink instance into init_net\n");
 retry:
 		devlink_put(devlink);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 	mutex_unlock(&devlink_mutex);
 }
 
-- 
2.35.3



* [patch net-next v3 02/11] net: devlink: introduce nested devlink entity for line card
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device Jiri Pirko
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

For the purpose of exposing device info and allowing flash update,
which are going to be implemented in follow-up patches, introduce the
possibility for a line card to expose its relation to a nested devlink
entity. The nested devlink entity represents the line card.
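
As a driver-facing illustration (not part of this patch), a minimal
sketch of the attach/detach calls a driver would make, assuming the
nested devlink instance is allocated and registered elsewhere:

	/* Attach the nested devlink instance representing the line card;
	 * the line card dump then carries DEVLINK_ATTR_NESTED_DEVLINK
	 * with the handle of that instance.
	 */
	devlink_linecard_nested_dl_set(linecard, nested_devlink);

	/* ... and detach it again on teardown. */
	devlink_linecard_nested_dl_set(linecard, NULL);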

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
       16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
v2->v3:
- added Ido's RWB tag
v1->v2:
- s/delink/devlink in devlink_linecard_nested_dl_set comment
- fixed alignment
- s/updated/update in patch description
- added Jakub's ack
- added "net: " prefix to patch subject
- rebased
---
 include/net/devlink.h        |  2 ++
 include/uapi/linux/devlink.h |  2 ++
 net/core/devlink.c           | 42 ++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 780744b550b8..5bd3fac12e9e 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1580,6 +1580,8 @@ void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
 void devlink_linecard_provision_fail(struct devlink_linecard *linecard);
 void devlink_linecard_activate(struct devlink_linecard *linecard);
 void devlink_linecard_deactivate(struct devlink_linecard *linecard);
+void devlink_linecard_nested_dl_set(struct devlink_linecard *linecard,
+				    struct devlink *nested_devlink);
 int devl_sb_register(struct devlink *devlink, unsigned int sb_index,
 		     u32 size, u16 ingress_pools_count,
 		     u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b3d40a5d72ff..541321695f52 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -576,6 +576,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
 	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
+	DEVLINK_ATTR_NESTED_DEVLINK,		/* nested */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6a3931a8e338..2833461fb703 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -89,6 +89,7 @@ struct devlink_linecard {
 	const char *type;
 	struct devlink_linecard_type *types;
 	unsigned int types_count;
+	struct devlink *nested_devlink;
 };
 
 /**
@@ -815,6 +816,24 @@ static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
 	return 0;
 }
 
+static int devlink_nl_put_nested_handle(struct sk_buff *msg, struct devlink *devlink)
+{
+	struct nlattr *nested_attr;
+
+	nested_attr = nla_nest_start(msg, DEVLINK_ATTR_NESTED_DEVLINK);
+	if (!nested_attr)
+		return -EMSGSIZE;
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+
+	nla_nest_end(msg, nested_attr);
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(msg, nested_attr);
+	return -EMSGSIZE;
+}
+
 struct devlink_reload_combination {
 	enum devlink_reload_action action;
 	enum devlink_reload_limit limit;
@@ -2127,6 +2146,10 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
 		nla_nest_end(msg, attr);
 	}
 
+	if (linecard->nested_devlink &&
+	    devlink_nl_put_nested_handle(msg, linecard->nested_devlink))
+		goto nla_put_failure;
+
 	genlmsg_end(msg, hdr);
 	return 0;
 
@@ -10390,6 +10413,7 @@ EXPORT_SYMBOL_GPL(devlink_linecard_provision_set);
 void devlink_linecard_provision_clear(struct devlink_linecard *linecard)
 {
 	mutex_lock(&linecard->state_lock);
+	WARN_ON(linecard->nested_devlink);
 	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
 	linecard->type = NULL;
 	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
@@ -10408,6 +10432,7 @@ EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear);
 void devlink_linecard_provision_fail(struct devlink_linecard *linecard)
 {
 	mutex_lock(&linecard->state_lock);
+	WARN_ON(linecard->nested_devlink);
 	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING_FAILED;
 	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
 	mutex_unlock(&linecard->state_lock);
@@ -10455,6 +10480,23 @@ void devlink_linecard_deactivate(struct devlink_linecard *linecard)
 }
 EXPORT_SYMBOL_GPL(devlink_linecard_deactivate);
 
+/**
+ *	devlink_linecard_nested_dl_set - Attach/detach nested devlink
+ *					 instance to linecard.
+ *
+ *	@linecard: devlink linecard
+ *	@nested_devlink: devlink instance to attach or NULL to detach
+ */
+void devlink_linecard_nested_dl_set(struct devlink_linecard *linecard,
+				    struct devlink *nested_devlink)
+{
+	mutex_lock(&linecard->state_lock);
+	linecard->nested_devlink = nested_devlink;
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+	mutex_unlock(&linecard->state_lock);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_nested_dl_set);
+
 int devl_sb_register(struct devlink *devlink, unsigned int sb_index,
 		     u32 size, u16 ingress_pools_count,
 		     u16 egress_pools_count, u16 ingress_tc_count,
-- 
2.35.3



* [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 02/11] net: devlink: introduce nested devlink entity for line card Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-21  8:04   ` Ido Schimmel
  2022-07-20 15:12 ` [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version Jiri Pirko
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

In order to eventually be able to expose the line card gearbox FW
version and the possibility to flash it, model the line card as a
separate device on the auxiliary bus.

Add the auxiliary device for a provisioned line card in order to be
able to expose the provisioned line card info over devlink dev info.
When the line card becomes active, additional info may be added to the
output.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- extended patch description
- added comment to mlxsw_linecard_bdev_del()
- squashed in the "mlxsw: core_linecard_dev: Set nested devlink
  relationship for a line card" patch
v1->v2:
- added auxdev removal to mlxsw_linecard_fini()
- adjusted mlxsw_linecard_bdev_del() to cope with bdev == NULL
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig   |   1 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  13 +-
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  10 ++
 .../mellanox/mlxsw/core_linecard_dev.c        | 160 ++++++++++++++++++
 .../ethernet/mellanox/mlxsw/core_linecards.c  |  11 ++
 6 files changed, 194 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index 4683312861ac..a510bf2cff2f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -7,6 +7,7 @@ config MLXSW_CORE
 	tristate "Mellanox Technologies Switch ASICs support"
 	select NET_DEVLINK
 	select MLXFW
+	select AUXILIARY_BUS
 	help
 	  This driver supports Mellanox Technologies Switch ASICs family.
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index c2d6d64ffe4b..3ca9fce759ea 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -2,7 +2,7 @@
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o core_acl_flex_keys.o \
 				   core_acl_flex_actions.o core_env.o \
-				   core_linecards.o
+				   core_linecards.o core_linecard_dev.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 61eb96b93889..831b0d3472c6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -3334,9 +3334,15 @@ static int __init mlxsw_core_module_init(void)
 {
 	int err;
 
+	err = mlxsw_linecard_driver_register();
+	if (err)
+		return err;
+
 	mlxsw_wq = alloc_workqueue(mlxsw_core_driver_name, 0, 0);
-	if (!mlxsw_wq)
-		return -ENOMEM;
+	if (!mlxsw_wq) {
+		err = -ENOMEM;
+		goto err_alloc_workqueue;
+	}
 	mlxsw_owq = alloc_ordered_workqueue("%s_ordered", 0,
 					    mlxsw_core_driver_name);
 	if (!mlxsw_owq) {
@@ -3347,6 +3353,8 @@ static int __init mlxsw_core_module_init(void)
 
 err_alloc_ordered_workqueue:
 	destroy_workqueue(mlxsw_wq);
+err_alloc_workqueue:
+	mlxsw_linecard_driver_unregister();
 	return err;
 }
 
@@ -3354,6 +3362,7 @@ static void __exit mlxsw_core_module_exit(void)
 {
 	destroy_workqueue(mlxsw_owq);
 	destroy_workqueue(mlxsw_wq);
+	mlxsw_linecard_driver_unregister();
 }
 
 module_init(mlxsw_core_module_init);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index a3491ef2aa7e..b22db13fa547 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -12,6 +12,7 @@
 #include <linux/skbuff.h>
 #include <linux/workqueue.h>
 #include <linux/net_namespace.h>
+#include <linux/auxiliary_bus.h>
 #include <net/devlink.h>
 
 #include "trap.h"
@@ -561,6 +562,8 @@ enum mlxsw_linecard_status_event_type {
 	MLXSW_LINECARD_STATUS_EVENT_TYPE_UNPROVISION,
 };
 
+struct mlxsw_linecard_bdev;
+
 struct mlxsw_linecard {
 	u8 slot_index;
 	struct mlxsw_linecards *linecards;
@@ -575,6 +578,7 @@ struct mlxsw_linecard {
 	   active:1;
 	u16 hw_revision;
 	u16 ini_version;
+	struct mlxsw_linecard_bdev *bdev;
 };
 
 struct mlxsw_linecard_types_info;
@@ -614,4 +618,10 @@ void mlxsw_linecards_event_ops_unregister(struct mlxsw_core *mlxsw_core,
 					  struct mlxsw_linecards_event_ops *ops,
 					  void *priv);
 
+int mlxsw_linecard_bdev_add(struct mlxsw_linecard *linecard);
+void mlxsw_linecard_bdev_del(struct mlxsw_linecard *linecard);
+
+int mlxsw_linecard_driver_register(void);
+void mlxsw_linecard_driver_unregister(void);
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
new file mode 100644
index 000000000000..b1fa9f681003
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
@@ -0,0 +1,160 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2022 NVIDIA Corporation and Mellanox Technologies. All rights reserved */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/idr.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+#include <net/devlink.h>
+#include "core.h"
+
+#define MLXSW_LINECARD_DEV_ID_NAME "lc"
+
+struct mlxsw_linecard_dev {
+	struct mlxsw_linecard *linecard;
+};
+
+struct mlxsw_linecard_bdev {
+	struct auxiliary_device adev;
+	struct mlxsw_linecard *linecard;
+	struct mlxsw_linecard_dev *linecard_dev;
+};
+
+static DEFINE_IDA(mlxsw_linecard_bdev_ida);
+
+static int mlxsw_linecard_bdev_id_alloc(void)
+{
+	return ida_alloc(&mlxsw_linecard_bdev_ida, GFP_KERNEL);
+}
+
+static void mlxsw_linecard_bdev_id_free(int id)
+{
+	ida_free(&mlxsw_linecard_bdev_ida, id);
+}
+
+static void mlxsw_linecard_bdev_release(struct device *device)
+{
+	struct auxiliary_device *adev =
+			container_of(device, struct auxiliary_device, dev);
+	struct mlxsw_linecard_bdev *linecard_bdev =
+			container_of(adev, struct mlxsw_linecard_bdev, adev);
+
+	mlxsw_linecard_bdev_id_free(adev->id);
+	kfree(linecard_bdev);
+}
+
+int mlxsw_linecard_bdev_add(struct mlxsw_linecard *linecard)
+{
+	struct mlxsw_linecard_bdev *linecard_bdev;
+	int err;
+	int id;
+
+	id = mlxsw_linecard_bdev_id_alloc();
+	if (id < 0)
+		return id;
+
+	linecard_bdev = kzalloc(sizeof(*linecard_bdev), GFP_KERNEL);
+	if (!linecard_bdev) {
+		mlxsw_linecard_bdev_id_free(id);
+		return -ENOMEM;
+	}
+	linecard_bdev->adev.id = id;
+	linecard_bdev->adev.name = MLXSW_LINECARD_DEV_ID_NAME;
+	linecard_bdev->adev.dev.release = mlxsw_linecard_bdev_release;
+	linecard_bdev->adev.dev.parent = linecard->linecards->bus_info->dev;
+	linecard_bdev->linecard = linecard;
+
+	err = auxiliary_device_init(&linecard_bdev->adev);
+	if (err) {
+		mlxsw_linecard_bdev_id_free(id);
+		kfree(linecard_bdev);
+		return err;
+	}
+
+	err = auxiliary_device_add(&linecard_bdev->adev);
+	if (err) {
+		auxiliary_device_uninit(&linecard_bdev->adev);
+		return err;
+	}
+
+	linecard->bdev = linecard_bdev;
+	return 0;
+}
+
+void mlxsw_linecard_bdev_del(struct mlxsw_linecard *linecard)
+{
+	struct mlxsw_linecard_bdev *linecard_bdev = linecard->bdev;
+
+	if (!linecard_bdev)
+		/* Unprovisioned line cards do not have an auxiliary device. */
+		return;
+	auxiliary_device_delete(&linecard_bdev->adev);
+	auxiliary_device_uninit(&linecard_bdev->adev);
+	linecard->bdev = NULL;
+}
+
+static const struct devlink_ops mlxsw_linecard_dev_devlink_ops = {
+};
+
+static int mlxsw_linecard_bdev_probe(struct auxiliary_device *adev,
+				     const struct auxiliary_device_id *id)
+{
+	struct mlxsw_linecard_bdev *linecard_bdev =
+			container_of(adev, struct mlxsw_linecard_bdev, adev);
+	struct mlxsw_linecard *linecard = linecard_bdev->linecard;
+	struct mlxsw_linecard_dev *linecard_dev;
+	struct devlink *devlink;
+
+	devlink = devlink_alloc(&mlxsw_linecard_dev_devlink_ops,
+				sizeof(*linecard_dev), &adev->dev);
+	if (!devlink)
+		return -ENOMEM;
+	linecard_dev = devlink_priv(devlink);
+	linecard_dev->linecard = linecard_bdev->linecard;
+	linecard_bdev->linecard_dev = linecard_dev;
+
+	devlink_register(devlink);
+	devlink_linecard_nested_dl_set(linecard->devlink_linecard, devlink);
+	return 0;
+}
+
+static void mlxsw_linecard_bdev_remove(struct auxiliary_device *adev)
+{
+	struct mlxsw_linecard_bdev *linecard_bdev =
+			container_of(adev, struct mlxsw_linecard_bdev, adev);
+	struct devlink *devlink = priv_to_devlink(linecard_bdev->linecard_dev);
+	struct mlxsw_linecard *linecard = linecard_bdev->linecard;
+
+	devlink_linecard_nested_dl_set(linecard->devlink_linecard, NULL);
+	devlink_unregister(devlink);
+	devlink_free(devlink);
+}
+
+static const struct auxiliary_device_id mlxsw_linecard_bdev_id_table[] = {
+	{ .name = KBUILD_MODNAME "." MLXSW_LINECARD_DEV_ID_NAME },
+	{},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, mlxsw_linecard_bdev_id_table);
+
+static struct auxiliary_driver mlxsw_linecard_driver = {
+	.name = MLXSW_LINECARD_DEV_ID_NAME,
+	.probe = mlxsw_linecard_bdev_probe,
+	.remove = mlxsw_linecard_bdev_remove,
+	.id_table = mlxsw_linecard_bdev_id_table,
+};
+
+int mlxsw_linecard_driver_register(void)
+{
+	return auxiliary_driver_register(&mlxsw_linecard_driver);
+}
+
+void mlxsw_linecard_driver_unregister(void)
+{
+	auxiliary_driver_unregister(&mlxsw_linecard_driver);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
index 5c9869dcf674..43696d8badca 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
@@ -232,6 +232,7 @@ mlxsw_linecard_provision_set(struct mlxsw_linecard *linecard, u8 card_type,
 {
 	struct mlxsw_linecards *linecards = linecard->linecards;
 	const char *type;
+	int err;
 
 	type = mlxsw_linecard_types_lookup(linecards, card_type);
 	mlxsw_linecard_status_event_done(linecard,
@@ -252,6 +253,14 @@ mlxsw_linecard_provision_set(struct mlxsw_linecard *linecard, u8 card_type,
 	linecard->provisioned = true;
 	linecard->hw_revision = hw_revision;
 	linecard->ini_version = ini_version;
+
+	err = mlxsw_linecard_bdev_add(linecard);
+	if (err) {
+		linecard->provisioned = false;
+		mlxsw_linecard_provision_fail(linecard);
+		return err;
+	}
+
 	devlink_linecard_provision_set(linecard->devlink_linecard, type);
 	return 0;
 }
@@ -260,6 +269,7 @@ static void mlxsw_linecard_provision_clear(struct mlxsw_linecard *linecard)
 {
 	mlxsw_linecard_status_event_done(linecard,
 					 MLXSW_LINECARD_STATUS_EVENT_TYPE_UNPROVISION);
+	mlxsw_linecard_bdev_del(linecard);
 	linecard->provisioned = false;
 	devlink_linecard_provision_clear(linecard->devlink_linecard);
 }
@@ -885,6 +895,7 @@ static void mlxsw_linecard_fini(struct mlxsw_core *mlxsw_core,
 	mlxsw_core_flush_owq();
 	if (linecard->active)
 		mlxsw_linecard_active_clear(linecard);
+	mlxsw_linecard_bdev_del(linecard);
 	devlink_linecard_destroy(linecard->devlink_linecard);
 	mutex_destroy(&linecard->lock);
 }
-- 
2.35.3



* [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (2 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-21  8:05   ` Ido Schimmel
  2022-07-20 15:12 ` [patch net-next v3 05/11] mlxsw: reg: Extend MDDQ by device_info Jiri Pirko
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Implement info_get() to expose the HW revision of a line card and the
loaded INI version.

Example:

$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
  versions:
      fixed:
        hw.revision 0
      running:
        ini.version 4

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- changed return value of mlxsw_linecard_devlink_info_get() to
  -EOPNOTSUPP if the line card is not provisioned
---
 Documentation/networking/devlink/mlxsw.rst    | 18 ++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  4 +++
 .../mellanox/mlxsw/core_linecard_dev.c        | 11 ++++++++
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 28 +++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst
index cf857cb4ba8f..aededcf68df4 100644
--- a/Documentation/networking/devlink/mlxsw.rst
+++ b/Documentation/networking/devlink/mlxsw.rst
@@ -58,6 +58,24 @@ The ``mlxsw`` driver reports the following versions
      - running
      - Three digit firmware version
 
+Line card auxiliary device info versions
+========================================
+
+The ``mlxsw`` driver reports the following versions for line card auxiliary device
+
+.. list-table:: devlink info versions implemented
+   :widths: 5 5 90
+
+   * - Name
+     - Type
+     - Description
+   * - ``hw.revision``
+     - fixed
+     - The hardware revision for this line card
+   * - ``ini.version``
+     - running
+     - Version of line card INI loaded
+
 Driver-specific Traps
 =====================
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index b22db13fa547..87c58b512536 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -599,6 +599,10 @@ mlxsw_linecard_get(struct mlxsw_linecards *linecards, u8 slot_index)
 	return &linecards->linecards[slot_index - 1];
 }
 
+int mlxsw_linecard_devlink_info_get(struct mlxsw_linecard *linecard,
+				    struct devlink_info_req *req,
+				    struct netlink_ext_ack *extack);
+
 int mlxsw_linecards_init(struct mlxsw_core *mlxsw_core,
 			 const struct mlxsw_bus_info *bus_info);
 void mlxsw_linecards_fini(struct mlxsw_core *mlxsw_core);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
index b1fa9f681003..13c20b83b190 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
@@ -98,7 +98,18 @@ void mlxsw_linecard_bdev_del(struct mlxsw_linecard *linecard)
 	linecard->bdev = NULL;
 }
 
+static int mlxsw_linecard_dev_devlink_info_get(struct devlink *devlink,
+					       struct devlink_info_req *req,
+					       struct netlink_ext_ack *extack)
+{
+	struct mlxsw_linecard_dev *linecard_dev = devlink_priv(devlink);
+	struct mlxsw_linecard *linecard = linecard_dev->linecard;
+
+	return mlxsw_linecard_devlink_info_get(linecard, req, extack);
+}
+
 static const struct devlink_ops mlxsw_linecard_dev_devlink_ops = {
+	.info_get			= mlxsw_linecard_dev_devlink_info_get,
 };
 
 static int mlxsw_linecard_bdev_probe(struct auxiliary_device *adev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
index 43696d8badca..ee986dd2c486 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
@@ -226,6 +226,34 @@ void mlxsw_linecards_event_ops_unregister(struct mlxsw_core *mlxsw_core,
 }
 EXPORT_SYMBOL(mlxsw_linecards_event_ops_unregister);
 
+int mlxsw_linecard_devlink_info_get(struct mlxsw_linecard *linecard,
+				    struct devlink_info_req *req,
+				    struct netlink_ext_ack *extack)
+{
+	char buf[32];
+	int err;
+
+	mutex_lock(&linecard->lock);
+	if (WARN_ON(!linecard->provisioned)) {
+		err = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	sprintf(buf, "%d", linecard->hw_revision);
+	err = devlink_info_version_fixed_put(req, "hw.revision", buf);
+	if (err)
+		goto unlock;
+
+	sprintf(buf, "%d", linecard->ini_version);
+	err = devlink_info_version_running_put(req, "ini.version", buf);
+	if (err)
+		goto unlock;
+
+unlock:
+	mutex_unlock(&linecard->lock);
+	return err;
+}
+
 static int
 mlxsw_linecard_provision_set(struct mlxsw_linecard *linecard, u8 card_type,
 			     u16 hw_revision, u16 ini_version)
-- 
2.35.3



* [patch net-next v3 05/11] mlxsw: reg: Extend MDDQ by device_info
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (3 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version Jiri Pirko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Extend the existing MDDQ register with the possibility to query
information about devices residing on a line card.
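
A hedged usage sketch (the real caller is added later in this series)
of how the new fields could be used to walk all devices on a slot;
error handling is trimmed and the consuming code is elided:

	u8 msg_seq = 0;

	do {
		char mddq_pl[MLXSW_REG_MDDQ_LEN];
		u16 fw_major, fw_minor, fw_sub_minor;
		bool data_valid, flash_owner;
		u8 device_index;
		int err;

		mlxsw_reg_mddq_device_info_pack(mddq_pl, slot_index, msg_seq);
		err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddq), mddq_pl);
		if (err)
			return err;
		mlxsw_reg_mddq_device_info_unpack(mddq_pl, &msg_seq, &data_valid,
						  &flash_owner, &device_index,
						  &fw_major, &fw_minor,
						  &fw_sub_minor);
		if (!data_valid)
			break;	/* no (more) devices on this slot */
		/* ... consume the device info ... */
	} while (msg_seq);	/* a response_msg_seq of 0 marks the last message */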

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
v2->v3:
- added Ido's RWB tag
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 83 ++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 17ce28e65464..76caf06b17d6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -11297,7 +11297,11 @@ MLXSW_ITEM32(reg, mddq, sie, 0x00, 31, 1);
 
 enum mlxsw_reg_mddq_query_type {
 	MLXSW_REG_MDDQ_QUERY_TYPE_SLOT_INFO = 1,
-	MLXSW_REG_MDDQ_QUERY_TYPE_SLOT_NAME = 3,
+	MLXSW_REG_MDDQ_QUERY_TYPE_DEVICE_INFO, /* If there are no devices
+						* on the slot, data_valid
+						* will be '0'.
+						*/
+	MLXSW_REG_MDDQ_QUERY_TYPE_SLOT_NAME,
 };
 
 /* reg_mddq_query_type
@@ -11311,6 +11315,28 @@ MLXSW_ITEM32(reg, mddq, query_type, 0x00, 16, 8);
  */
 MLXSW_ITEM32(reg, mddq, slot_index, 0x00, 0, 4);
 
+/* reg_mddq_response_msg_seq
+ * Response message sequential number. For a specific request, the response
+ * message sequential number is the following one. In addition, the last
+ * message should be 0.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, response_msg_seq, 0x04, 16, 8);
+
+/* reg_mddq_request_msg_seq
+ * Request message sequential number.
+ * The first message number should be 0.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mddq, request_msg_seq, 0x04, 0, 8);
+
+/* reg_mddq_data_valid
+ * If set, the data in the data field is valid and contain the information
+ * for the queried index.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, data_valid, 0x08, 31, 1);
+
 /* reg_mddq_slot_info_provisioned
  * If set, the INI file is applied and the card is provisioned.
  * Access: RO
@@ -11397,6 +11423,61 @@ mlxsw_reg_mddq_slot_info_unpack(const char *payload, u8 *p_slot_index,
 	*p_card_type = mlxsw_reg_mddq_slot_info_card_type_get(payload);
 }
 
+/* reg_mddq_device_info_flash_owner
+ * If set, the device is the flash owner. Otherwise, a shared flash
+ * is used by this device (another device is the flash owner).
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, device_info_flash_owner, 0x10, 30, 1);
+
+/* reg_mddq_device_info_device_index
+ * Device index. The first device should number 0.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, device_info_device_index, 0x10, 0, 8);
+
+/* reg_mddq_device_info_fw_major
+ * Major FW version number.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, device_info_fw_major, 0x14, 16, 16);
+
+/* reg_mddq_device_info_fw_minor
+ * Minor FW version number.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, device_info_fw_minor, 0x18, 16, 16);
+
+/* reg_mddq_device_info_fw_sub_minor
+ * Sub-minor FW version number.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddq, device_info_fw_sub_minor, 0x18, 0, 16);
+
+static inline void
+mlxsw_reg_mddq_device_info_pack(char *payload, u8 slot_index,
+				u8 request_msg_seq)
+{
+	__mlxsw_reg_mddq_pack(payload, slot_index,
+			      MLXSW_REG_MDDQ_QUERY_TYPE_DEVICE_INFO);
+	mlxsw_reg_mddq_request_msg_seq_set(payload, request_msg_seq);
+}
+
+static inline void
+mlxsw_reg_mddq_device_info_unpack(const char *payload, u8 *p_response_msg_seq,
+				  bool *p_data_valid, bool *p_flash_owner,
+				  u8 *p_device_index, u16 *p_fw_major,
+				  u16 *p_fw_minor, u16 *p_fw_sub_minor)
+{
+	*p_response_msg_seq = mlxsw_reg_mddq_response_msg_seq_get(payload);
+	*p_data_valid = mlxsw_reg_mddq_data_valid_get(payload);
+	*p_flash_owner = mlxsw_reg_mddq_device_info_flash_owner_get(payload);
+	*p_device_index = mlxsw_reg_mddq_device_info_device_index_get(payload);
+	*p_fw_major = mlxsw_reg_mddq_device_info_fw_major_get(payload);
+	*p_fw_minor = mlxsw_reg_mddq_device_info_fw_minor_get(payload);
+	*p_fw_sub_minor = mlxsw_reg_mddq_device_info_fw_sub_minor_get(payload);
+}
+
 #define MLXSW_REG_MDDQ_SLOT_ASCII_NAME_LEN 20
 
 /* reg_mddq_slot_ascii_name
-- 
2.35.3



* [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (4 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 05/11] mlxsw: reg: Extend MDDQ by device_info Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-21  8:11   ` Ido Schimmel
  2022-07-20 15:12 ` [patch net-next v3 07/11] mlxsw: reg: Add Management DownStream Device Tunneling Register Jiri Pirko
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

In case the line card is provisioned, go over all possible existing
devices (gearboxes) on it and expose the FW version of the flashable
one.

Example:

$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
  versions:
      fixed:
        hw.revision 0
      running:
        ini.version 4
        fw 19.2010.1312

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- changed the check in mlxsw_linecard_devlink_info_get() to ->active
---
 Documentation/networking/devlink/mlxsw.rst    |  3 +
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  9 +++
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 57 +++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst
index aededcf68df4..65ceed98f94d 100644
--- a/Documentation/networking/devlink/mlxsw.rst
+++ b/Documentation/networking/devlink/mlxsw.rst
@@ -75,6 +75,9 @@ The ``mlxsw`` driver reports the following versions for line card auxiliary devi
    * - ``ini.version``
      - running
      - Version of line card INI loaded
+   * - ``fw.version``
+     - running
+     - Three digit firmware version of line card device
 
 Driver-specific Traps
 =====================
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 87c58b512536..e19860c05e75 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -564,6 +564,12 @@ enum mlxsw_linecard_status_event_type {
 
 struct mlxsw_linecard_bdev;
 
+struct mlxsw_linecard_device_info {
+	u16 fw_major;
+	u16 fw_minor;
+	u16 fw_sub_minor;
+};
+
 struct mlxsw_linecard {
 	u8 slot_index;
 	struct mlxsw_linecards *linecards;
@@ -579,6 +585,9 @@ struct mlxsw_linecard {
 	u16 hw_revision;
 	u16 ini_version;
 	struct mlxsw_linecard_bdev *bdev;
+	struct {
+		struct mlxsw_linecard_device_info info;
+	} device;
 };
 
 struct mlxsw_linecard_types_info;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
index ee986dd2c486..a9568d72ba1b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
@@ -87,6 +87,47 @@ static const char *mlxsw_linecard_type_name(struct mlxsw_linecard *linecard)
 	return linecard->name;
 }
 
+static int mlxsw_linecard_device_info_update(struct mlxsw_linecard *linecard)
+{
+	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
+	bool flashable_found = false;
+	u8 msg_seq = 0;
+
+	do {
+		struct mlxsw_linecard_device_info info;
+		char mddq_pl[MLXSW_REG_MDDQ_LEN];
+		bool flash_owner;
+		bool data_valid;
+		u8 device_index;
+		int err;
+
+		mlxsw_reg_mddq_device_info_pack(mddq_pl, linecard->slot_index,
+						msg_seq);
+		err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddq), mddq_pl);
+		if (err)
+			return err;
+		mlxsw_reg_mddq_device_info_unpack(mddq_pl, &msg_seq,
+						  &data_valid, &flash_owner,
+						  &device_index,
+						  &info.fw_major,
+						  &info.fw_minor,
+						  &info.fw_sub_minor);
+		if (!data_valid)
+			break;
+		if (!flash_owner) /* We care only about flashable ones. */
+			continue;
+		if (flashable_found) {
+			dev_warn_once(linecard->linecards->bus_info->dev, "linecard %u: More flashable devices present, exposing only the first one\n",
+				      linecard->slot_index);
+			return 0;
+		}
+		linecard->device.info = info;
+		flashable_found = true;
+	} while (msg_seq);
+
+	return 0;
+}
+
 static void mlxsw_linecard_provision_fail(struct mlxsw_linecard *linecard)
 {
 	linecard->provisioned = false;
@@ -249,6 +290,18 @@ int mlxsw_linecard_devlink_info_get(struct mlxsw_linecard *linecard,
 	if (err)
 		goto unlock;
 
+	if (linecard->active) {
+		struct mlxsw_linecard_device_info *info = &linecard->device.info;
+
+		sprintf(buf, "%u.%u.%u", info->fw_major, info->fw_minor,
+			info->fw_sub_minor);
+		err = devlink_info_version_running_put(req,
+						       DEVLINK_INFO_VERSION_GENERIC_FW,
+						       buf);
+		if (err)
+			goto unlock;
+	}
+
 unlock:
 	mutex_unlock(&linecard->lock);
 	return err;
@@ -308,6 +361,10 @@ static int mlxsw_linecard_ready_set(struct mlxsw_linecard *linecard)
 	char mddc_pl[MLXSW_REG_MDDC_LEN];
 	int err;
 
+	err = mlxsw_linecard_device_info_update(linecard);
+	if (err)
+		return err;
+
 	mlxsw_reg_mddc_pack(mddc_pl, linecard->slot_index, false, true);
 	err = mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddc), mddc_pl);
 	if (err)
-- 
2.35.3



* [patch net-next v3 07/11] mlxsw: reg: Add Management DownStream Device Tunneling Register
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (5 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info Jiri Pirko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

The MDDT register allows delivering query and request messages (PRM
registers, commands) to a DownStream device.
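
To make the tunneling flow concrete, a hypothetical usage sketch (the
actual caller arrives in a follow-up patch); it assumes the inner
register is MGIR and that its pack accessor is applied to the inner
payload exactly as it would be to a directly issued register:

	char mddt_pl[MLXSW_REG_MDDT_LEN];
	char *mgir_pl;	/* inner payload lives inside the MDDT payload */
	int err;

	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index, device_index,
			    MLXSW_REG_MDDT_METHOD_QUERY, MLXSW_REG(mgir),
			    &mgir_pl);
	mlxsw_reg_mgir_pack(mgir_pl);	/* pack the tunneled register */
	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
	/* on success, unpack the reply from mgir_pl as usual */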

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 90 +++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 76caf06b17d6..e45df09df757 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -11276,6 +11276,95 @@ mlxsw_reg_mbct_unpack(const char *payload, u8 *p_slot_index,
 		*p_fsm_state = mlxsw_reg_mbct_fsm_state_get(payload);
 }
 
+/* MDDT - Management DownStream Device Tunneling Register
+ * ------------------------------------------------------
+ * This register allows to deliver query and request messages (PRM registers,
+ * commands) to a DownStream device.
+ */
+#define MLXSW_REG_MDDT_ID 0x9160
+#define MLXSW_REG_MDDT_LEN 0x110
+
+MLXSW_REG_DEFINE(mddt, MLXSW_REG_MDDT_ID, MLXSW_REG_MDDT_LEN);
+
+/* reg_mddt_slot_index
+ * Slot index.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mddt, slot_index, 0x00, 8, 4);
+
+/* reg_mddt_device_index
+ * Device index.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mddt, device_index, 0x00, 0, 8);
+
+/* reg_mddt_read_size
+ * Read size in D-Words.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, mddt, read_size, 0x04, 24, 8);
+
+/* reg_mddt_write_size
+ * Write size in D-Words.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, mddt, write_size, 0x04, 16, 8);
+
+enum mlxsw_reg_mddt_status {
+	MLXSW_REG_MDDT_STATUS_OK,
+};
+
+/* reg_mddt_status
+ * Return code of the Downstream Device to the register that was sent.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mddt, status, 0x0C, 24, 8);
+
+enum mlxsw_reg_mddt_method {
+	MLXSW_REG_MDDT_METHOD_QUERY,
+	MLXSW_REG_MDDT_METHOD_WRITE,
+};
+
+/* reg_mddt_method
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, mddt, method, 0x0C, 22, 2);
+
+/* reg_mddt_register_id
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mddt, register_id, 0x0C, 0, 16);
+
+#define MLXSW_REG_MDDT_PAYLOAD_OFFSET 0x0C
+#define MLXSW_REG_MDDT_PRM_REGISTER_HEADER_LEN 4
+
+static inline char *mlxsw_reg_mddt_inner_payload(char *payload)
+{
+	return payload + MLXSW_REG_MDDT_PAYLOAD_OFFSET +
+	       MLXSW_REG_MDDT_PRM_REGISTER_HEADER_LEN;
+}
+
+static inline void mlxsw_reg_mddt_pack(char *payload, u8 slot_index,
+				       u8 device_index,
+				       enum mlxsw_reg_mddt_method method,
+				       const struct mlxsw_reg_info *reg,
+				       char **inner_payload)
+{
+	int len = reg->len + MLXSW_REG_MDDT_PRM_REGISTER_HEADER_LEN;
+
+	if (WARN_ON(len + MLXSW_REG_MDDT_PAYLOAD_OFFSET > MLXSW_REG_MDDT_LEN))
+		len = MLXSW_REG_MDDT_LEN - MLXSW_REG_MDDT_PAYLOAD_OFFSET;
+
+	MLXSW_REG_ZERO(mddt, payload);
+	mlxsw_reg_mddt_slot_index_set(payload, slot_index);
+	mlxsw_reg_mddt_device_index_set(payload, device_index);
+	mlxsw_reg_mddt_method_set(payload, method);
+	mlxsw_reg_mddt_register_id_set(payload, reg->id);
+	mlxsw_reg_mddt_read_size_set(payload, len / 4);
+	mlxsw_reg_mddt_write_size_set(payload, len / 4);
+	*inner_payload = mlxsw_reg_mddt_inner_payload(payload);
+}
+
 /* MDDQ - Management DownStream Device Query Register
  * --------------------------------------------------
  * This register allows to query the DownStream device properties. The desired
@@ -12854,6 +12943,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mfgd),
 	MLXSW_REG(mgpir),
 	MLXSW_REG(mbct),
+	MLXSW_REG(mddt),
 	MLXSW_REG(mddq),
 	MLXSW_REG(mddc),
 	MLXSW_REG(mfde),
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (6 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 07/11] mlxsw: reg: Add Management DownStream Device Tunneling Register Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-21  8:13   ` Ido Schimmel
  2022-07-20 15:12 ` [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing Jiri Pirko
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Use a tunneled MGIR to obtain the PSID of the line card device and extend
the device_info_get() op to fill in the info with it.

Example:

$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
  versions:
      fixed:
        hw.revision 0
        fw.psid MT_0000000749
      running:
        ini.version 4
        fw 19.2010.1312

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- fixed s/Used/Use/ typo in patch description
---
 Documentation/networking/devlink/mlxsw.rst    |  3 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  1 +
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 31 +++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst
index 65ceed98f94d..433962225bd4 100644
--- a/Documentation/networking/devlink/mlxsw.rst
+++ b/Documentation/networking/devlink/mlxsw.rst
@@ -75,6 +75,9 @@ The ``mlxsw`` driver reports the following versions for line card auxiliary devi
    * - ``ini.version``
      - running
      - Version of line card INI loaded
+   * - ``fw.psid``
+     - fixed
+     - Line card device PSID
    * - ``fw.version``
      - running
      - Three digit firmware version of line card device
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index e19860c05e75..a3246082219d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -568,6 +568,7 @@ struct mlxsw_linecard_device_info {
 	u16 fw_major;
 	u16 fw_minor;
 	u16 fw_sub_minor;
+	char psid[MLXSW_REG_MGIR_FW_INFO_PSID_SIZE];
 };
 
 struct mlxsw_linecard {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
index a9568d72ba1b..771a3e43b8bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
@@ -87,6 +87,27 @@ static const char *mlxsw_linecard_type_name(struct mlxsw_linecard *linecard)
 	return linecard->name;
 }
 
+static int mlxsw_linecard_device_psid_get(struct mlxsw_linecard *linecard,
+					  u8 device_index, char *psid)
+{
+	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mgir_pl;
+	int err;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index, device_index,
+			    MLXSW_REG_MDDT_METHOD_QUERY,
+			    MLXSW_REG(mgir), &mgir_pl);
+
+	mlxsw_reg_mgir_pack(mgir_pl);
+	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_mgir_fw_info_psid_memcpy_from(mgir_pl, psid);
+	return 0;
+}
+
 static int mlxsw_linecard_device_info_update(struct mlxsw_linecard *linecard)
 {
 	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
@@ -121,6 +142,12 @@ static int mlxsw_linecard_device_info_update(struct mlxsw_linecard *linecard)
 				      linecard->slot_index);
 			return 0;
 		}
+
+		err = mlxsw_linecard_device_psid_get(linecard, device_index,
+						     info.psid);
+		if (err)
+			return err;
+
 		linecard->device.info = info;
 		flashable_found = true;
 	} while (msg_seq);
@@ -293,6 +320,10 @@ int mlxsw_linecard_devlink_info_get(struct mlxsw_linecard *linecard,
 	if (linecard->active) {
 		struct mlxsw_linecard_device_info *info = &linecard->device.info;
 
+		err = devlink_info_version_fixed_put(req,
+						     DEVLINK_INFO_VERSION_GENERIC_FW_PSID,
+						     info->psid);
+
 		sprintf(buf, "%u.%u.%u", info->fw_major, info->fw_minor,
 			info->fw_sub_minor);
 		err = devlink_info_version_running_put(req,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (7 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-21  8:25   ` Ido Schimmel
  2022-07-20 15:12 ` [patch net-next v3 10/11] selftests: mlxsw: Check line card info on provisioned line card Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 11/11] selftests: mlxsw: Check line card info on activated " Jiri Pirko
  10 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Implement the flash_update() devlink op for the line card devlink instance
to allow the user to update the line card gearbox FW using the MDDT register
and mlxfw.

Example:
$ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
- fixed the check in mlxsw_linecard_flash_update() to ->active, removed
  WARN_ON() and added extack fill-up.
---
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  31 +-
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  11 +
 .../mellanox/mlxsw/core_linecard_dev.c        |  13 +
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 278 ++++++++++++++++++
 4 files changed, 323 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 831b0d3472c6..abc9680527d8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -951,6 +951,20 @@ static struct mlxsw_driver *mlxsw_core_driver_get(const char *kind)
 	return mlxsw_driver;
 }
 
+int mlxsw_core_fw_flash(struct mlxsw_core *mlxsw_core,
+			struct mlxfw_dev *mlxfw_dev,
+			const struct firmware *firmware,
+			struct netlink_ext_ack *extack)
+{
+	int err;
+
+	mlxsw_core->fw_flash_in_progress = true;
+	err = mlxfw_firmware_flash(mlxfw_dev, firmware, extack);
+	mlxsw_core->fw_flash_in_progress = false;
+
+	return err;
+}
+
 struct mlxsw_core_fw_info {
 	struct mlxfw_dev mlxfw_dev;
 	struct mlxsw_core *mlxsw_core;
@@ -1105,8 +1119,9 @@ static const struct mlxfw_dev_ops mlxsw_core_fw_mlxsw_dev_ops = {
 	.fsm_release		= mlxsw_core_fw_fsm_release,
 };
 
-static int mlxsw_core_fw_flash(struct mlxsw_core *mlxsw_core, const struct firmware *firmware,
-			       struct netlink_ext_ack *extack)
+static int mlxsw_core_dev_fw_flash(struct mlxsw_core *mlxsw_core,
+				   const struct firmware *firmware,
+				   struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core_fw_info mlxsw_core_fw_info = {
 		.mlxfw_dev = {
@@ -1117,13 +1132,9 @@ static int mlxsw_core_fw_flash(struct mlxsw_core *mlxsw_core, const struct firmw
 		},
 		.mlxsw_core = mlxsw_core
 	};
-	int err;
 
-	mlxsw_core->fw_flash_in_progress = true;
-	err = mlxfw_firmware_flash(&mlxsw_core_fw_info.mlxfw_dev, firmware, extack);
-	mlxsw_core->fw_flash_in_progress = false;
-
-	return err;
+	return mlxsw_core_fw_flash(mlxsw_core, &mlxsw_core_fw_info.mlxfw_dev,
+				   firmware, extack);
 }
 
 static int mlxsw_core_fw_rev_validate(struct mlxsw_core *mlxsw_core,
@@ -1169,7 +1180,7 @@ static int mlxsw_core_fw_rev_validate(struct mlxsw_core *mlxsw_core,
 		return err;
 	}
 
-	err = mlxsw_core_fw_flash(mlxsw_core, firmware, NULL);
+	err = mlxsw_core_dev_fw_flash(mlxsw_core, firmware, NULL);
 	release_firmware(firmware);
 	if (err)
 		dev_err(mlxsw_bus_info->dev, "Could not upgrade firmware\n");
@@ -1187,7 +1198,7 @@ static int mlxsw_core_fw_flash_update(struct mlxsw_core *mlxsw_core,
 				      struct devlink_flash_update_params *params,
 				      struct netlink_ext_ack *extack)
 {
-	return mlxsw_core_fw_flash(mlxsw_core, params->fw, extack);
+	return mlxsw_core_dev_fw_flash(mlxsw_core, params->fw, extack);
 }
 
 static int mlxsw_core_devlink_param_fw_load_policy_validate(struct devlink *devlink, u32 id,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index a3246082219d..39c4a139188f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -19,6 +19,7 @@
 #include "reg.h"
 #include "cmd.h"
 #include "resources.h"
+#include "../mlxfw/mlxfw.h"
 
 enum mlxsw_core_resource_id {
 	MLXSW_CORE_RESOURCE_PORTS = 1,
@@ -48,6 +49,11 @@ mlxsw_core_fw_rev_minor_subminor_validate(const struct mlxsw_fw_rev *rev,
 int mlxsw_core_driver_register(struct mlxsw_driver *mlxsw_driver);
 void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
 
+int mlxsw_core_fw_flash(struct mlxsw_core *mlxsw_core,
+			struct mlxfw_dev *mlxfw_dev,
+			const struct firmware *firmware,
+			struct netlink_ext_ack *extack);
+
 int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 				   const struct mlxsw_bus *mlxsw_bus,
 				   void *bus_priv, bool reload,
@@ -588,6 +594,7 @@ struct mlxsw_linecard {
 	struct mlxsw_linecard_bdev *bdev;
 	struct {
 		struct mlxsw_linecard_device_info info;
+		u8 index;
 	} device;
 };
 
@@ -612,6 +619,10 @@ mlxsw_linecard_get(struct mlxsw_linecards *linecards, u8 slot_index)
 int mlxsw_linecard_devlink_info_get(struct mlxsw_linecard *linecard,
 				    struct devlink_info_req *req,
 				    struct netlink_ext_ack *extack);
+int mlxsw_linecard_flash_update(struct devlink *linecard_devlink,
+				struct mlxsw_linecard *linecard,
+				const struct firmware *firmware,
+				struct netlink_ext_ack *extack);
 
 int mlxsw_linecards_init(struct mlxsw_core *mlxsw_core,
 			 const struct mlxsw_bus_info *bus_info);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
index 13c20b83b190..49fee038a99c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c
@@ -108,8 +108,21 @@ static int mlxsw_linecard_dev_devlink_info_get(struct devlink *devlink,
 	return mlxsw_linecard_devlink_info_get(linecard, req, extack);
 }
 
+static int
+mlxsw_linecard_dev_devlink_flash_update(struct devlink *devlink,
+					struct devlink_flash_update_params *params,
+					struct netlink_ext_ack *extack)
+{
+	struct mlxsw_linecard_dev *linecard_dev = devlink_priv(devlink);
+	struct mlxsw_linecard *linecard = linecard_dev->linecard;
+
+	return mlxsw_linecard_flash_update(devlink, linecard,
+					   params->fw, extack);
+}
+
 static const struct devlink_ops mlxsw_linecard_dev_devlink_ops = {
 	.info_get			= mlxsw_linecard_dev_devlink_info_get,
+	.flash_update			= mlxsw_linecard_dev_devlink_flash_update,
 };
 
 static int mlxsw_linecard_bdev_probe(struct auxiliary_device *adev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
index 771a3e43b8bb..046db8495f02 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c
@@ -13,6 +13,7 @@
 #include <linux/vmalloc.h>
 
 #include "core.h"
+#include "../mlxfw/mlxfw.h"
 
 struct mlxsw_linecard_ini_file {
 	__le16 size;
@@ -87,6 +88,282 @@ static const char *mlxsw_linecard_type_name(struct mlxsw_linecard *linecard)
 	return linecard->name;
 }
 
+struct mlxsw_linecard_device_fw_info {
+	struct mlxfw_dev mlxfw_dev;
+	struct mlxsw_core *mlxsw_core;
+	struct mlxsw_linecard *linecard;
+};
+
+static int mlxsw_linecard_device_fw_component_query(struct mlxfw_dev *mlxfw_dev,
+						    u16 component_index,
+						    u32 *p_max_size,
+						    u8 *p_align_bits,
+						    u16 *p_max_write_size)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcqi_pl;
+	int err;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_QUERY,
+			    MLXSW_REG(mcqi), &mcqi_pl);
+
+	mlxsw_reg_mcqi_pack(mcqi_pl, component_index);
+	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+	if (err)
+		return err;
+	mlxsw_reg_mcqi_unpack(mcqi_pl, p_max_size, p_align_bits,
+			      p_max_write_size);
+
+	*p_align_bits = max_t(u8, *p_align_bits, 2);
+	*p_max_write_size = min_t(u16, *p_max_write_size,
+				  MLXSW_REG_MCDA_MAX_DATA_LEN);
+	return 0;
+}
+
+static int mlxsw_linecard_device_fw_fsm_lock(struct mlxfw_dev *mlxfw_dev,
+					     u32 *fwhandle)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	u8 control_state;
+	char *mcc_pl;
+	int err;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_QUERY,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, 0, 0, 0, 0);
+	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_mcc_unpack(mcc_pl, fwhandle, NULL, &control_state);
+	if (control_state != MLXFW_FSM_STATE_IDLE)
+		return -EBUSY;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_LOCK_UPDATE_HANDLE,
+			   0, *fwhandle, 0);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static int
+mlxsw_linecard_device_fw_fsm_component_update(struct mlxfw_dev *mlxfw_dev,
+					      u32 fwhandle,
+					      u16 component_index,
+					      u32 component_size)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcc_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_UPDATE_COMPONENT,
+			   component_index, fwhandle, component_size);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static int
+mlxsw_linecard_device_fw_fsm_block_download(struct mlxfw_dev *mlxfw_dev,
+					    u32 fwhandle, u8 *data,
+					    u16 size, u32 offset)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcda_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcda), &mcda_pl);
+	mlxsw_reg_mcda_pack(mcda_pl, fwhandle, offset, size, data);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static int
+mlxsw_linecard_device_fw_fsm_component_verify(struct mlxfw_dev *mlxfw_dev,
+					      u32 fwhandle, u16 component_index)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcc_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_VERIFY_COMPONENT,
+			   component_index, fwhandle, 0);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static int mlxsw_linecard_device_fw_fsm_activate(struct mlxfw_dev *mlxfw_dev,
+						 u32 fwhandle)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcc_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_ACTIVATE,
+			   0, fwhandle, 0);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static int
+mlxsw_linecard_device_fw_fsm_query_state(struct mlxfw_dev *mlxfw_dev,
+					 u32 fwhandle,
+					 enum mlxfw_fsm_state *fsm_state,
+					 enum mlxfw_fsm_state_err *fsm_state_err)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	u8 control_state;
+	u8 error_code;
+	char *mcc_pl;
+	int err;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_QUERY,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, 0, 0, fwhandle, 0);
+	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_mcc_unpack(mcc_pl, NULL, &error_code, &control_state);
+	*fsm_state = control_state;
+	*fsm_state_err = min_t(enum mlxfw_fsm_state_err, error_code,
+			       MLXFW_FSM_STATE_ERR_MAX);
+	return 0;
+}
+
+static void mlxsw_linecard_device_fw_fsm_cancel(struct mlxfw_dev *mlxfw_dev,
+						u32 fwhandle)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcc_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_CANCEL,
+			   0, fwhandle, 0);
+	mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static void mlxsw_linecard_device_fw_fsm_release(struct mlxfw_dev *mlxfw_dev,
+						 u32 fwhandle)
+{
+	struct mlxsw_linecard_device_fw_info *info =
+		container_of(mlxfw_dev, struct mlxsw_linecard_device_fw_info,
+			     mlxfw_dev);
+	struct mlxsw_linecard *linecard = info->linecard;
+	struct mlxsw_core *mlxsw_core = info->mlxsw_core;
+	char mddt_pl[MLXSW_REG_MDDT_LEN];
+	char *mcc_pl;
+
+	mlxsw_reg_mddt_pack(mddt_pl, linecard->slot_index,
+			    linecard->device.index,
+			    MLXSW_REG_MDDT_METHOD_WRITE,
+			    MLXSW_REG(mcc), &mcc_pl);
+	mlxsw_reg_mcc_pack(mcc_pl,
+			   MLXSW_REG_MCC_INSTRUCTION_RELEASE_UPDATE_HANDLE,
+			   0, fwhandle, 0);
+	mlxsw_reg_write(mlxsw_core, MLXSW_REG(mddt), mddt_pl);
+}
+
+static const struct mlxfw_dev_ops mlxsw_linecard_device_dev_ops = {
+	.component_query	= mlxsw_linecard_device_fw_component_query,
+	.fsm_lock		= mlxsw_linecard_device_fw_fsm_lock,
+	.fsm_component_update	= mlxsw_linecard_device_fw_fsm_component_update,
+	.fsm_block_download	= mlxsw_linecard_device_fw_fsm_block_download,
+	.fsm_component_verify	= mlxsw_linecard_device_fw_fsm_component_verify,
+	.fsm_activate		= mlxsw_linecard_device_fw_fsm_activate,
+	.fsm_query_state	= mlxsw_linecard_device_fw_fsm_query_state,
+	.fsm_cancel		= mlxsw_linecard_device_fw_fsm_cancel,
+	.fsm_release		= mlxsw_linecard_device_fw_fsm_release,
+};
+
+int mlxsw_linecard_flash_update(struct devlink *linecard_devlink,
+				struct mlxsw_linecard *linecard,
+				const struct firmware *firmware,
+				struct netlink_ext_ack *extack)
+{
+	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
+	struct mlxsw_linecard_device_fw_info info = {
+		.mlxfw_dev = {
+			.ops = &mlxsw_linecard_device_dev_ops,
+			.psid = linecard->device.info.psid,
+			.psid_size = strlen(linecard->device.info.psid),
+			.devlink = linecard_devlink,
+		},
+		.mlxsw_core = mlxsw_core,
+		.linecard = linecard,
+	};
+	int err;
+
+	mutex_lock(&linecard->lock);
+	if (!linecard->active) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to flash non-active linecard");
+		err = -EINVAL;
+		goto unlock;
+	}
+	err = mlxsw_core_fw_flash(mlxsw_core, &info.mlxfw_dev,
+				  firmware, extack);
+unlock:
+	mutex_unlock(&linecard->lock);
+	return err;
+}
+
 static int mlxsw_linecard_device_psid_get(struct mlxsw_linecard *linecard,
 					  u8 device_index, char *psid)
 {
@@ -149,6 +426,7 @@ static int mlxsw_linecard_device_info_update(struct mlxsw_linecard *linecard)
 			return err;
 
 		linecard->device.info = info;
+		linecard->device.index = device_index;
 		flashable_found = true;
 	} while (msg_seq);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v3 10/11] selftests: mlxsw: Check line card info on provisioned line card
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (8 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  2022-07-20 15:12 ` [patch net-next v3 11/11] selftests: mlxsw: Check line card info on activated " Jiri Pirko
  10 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Once a line card is provisioned, check that the HW revision and INI version
are exposed on the associated nested auxiliary device.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 .../drivers/net/mlxsw/devlink_linecard.sh     | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh b/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
index 08a922d8b86a..ca4e9b08a105 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
@@ -84,6 +84,13 @@ lc_wait_until_port_count_is()
 	busywait "$timeout" until_lc_port_count_is "$port_count" lc_port_count_get "$lc"
 }
 
+lc_nested_devlink_dev_get()
+{
+	local lc=$1
+
+	devlink lc show $DEVLINK_DEV lc $lc -j | jq -e -r ".[][][].nested_devlink"
+}
+
 PROV_UNPROV_TIMEOUT=8000 # ms
 POST_PROV_ACT_TIMEOUT=2000 # ms
 PROV_PORTS_INSTANTIATION_TIMEOUT=15000 # ms
@@ -191,12 +198,30 @@ ports_check()
 	check_err $? "Unexpected port count linecard $lc (got $port_count, expected $expected_port_count)"
 }
 
+lc_dev_info_provisioned_check()
+{
+	local lc=$1
+	local nested_devlink_dev=$2
+	local fixed_hw_revision
+	local running_ini_version
+
+	fixed_hw_revision=$(devlink dev info $nested_devlink_dev -j | \
+			    jq -e -r '.[][].versions.fixed."hw.revision"')
+	check_err $? "Failed to get linecard $lc fixed.hw.revision"
+	log_info "Linecard $lc fixed.hw.revision: \"$fixed_hw_revision\""
+	running_ini_version=$(devlink dev info $nested_devlink_dev -j | \
+			      jq -e -r '.[][].versions.running."ini.version"')
+	check_err $? "Failed to get linecard $lc running.ini.version"
+	log_info "Linecard $lc running.ini.version: \"$running_ini_version\""
+}
+
 provision_test()
 {
 	RET=0
 	local lc
 	local type
 	local state
+	local nested_devlink_dev
 
 	lc=$LC_SLOT
 	supported_types_check $lc
@@ -207,6 +232,11 @@ provision_test()
 	fi
 	provision_one $lc $LC_16X100G_TYPE
 	ports_check $lc $LC_16X100G_PORT_COUNT
+
+	nested_devlink_dev=$(lc_nested_devlink_dev_get $lc)
+	check_err $? "Failed to get nested devlink handle of linecard $lc"
+	lc_dev_info_provisioned_check $lc $nested_devlink_dev
+
 	log_test "Provision"
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v3 11/11] selftests: mlxsw: Check line card info on activated line card
  2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
                   ` (9 preceding siblings ...)
  2022-07-20 15:12 ` [patch net-next v3 10/11] selftests: mlxsw: Check line card info on provisioned line card Jiri Pirko
@ 2022-07-20 15:12 ` Jiri Pirko
  10 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-20 15:12 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

From: Jiri Pirko <jiri@nvidia.com>

Once a line card is activated, check that the FW version and PSID are exposed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 .../drivers/net/mlxsw/devlink_linecard.sh     | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh b/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
index ca4e9b08a105..224ca3695c89 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/devlink_linecard.sh
@@ -250,12 +250,32 @@ interface_check()
 	setup_wait
 }
 
+lc_dev_info_active_check()
+{
+	local lc=$1
+	local nested_devlink_dev=$2
+	local fixed_device_fw_psid
+	local running_device_fw
+
+	fixed_device_fw_psid=$(devlink dev info $nested_devlink_dev -j | \
+			       jq -e -r ".[][].versions.fixed" | \
+			       jq -e -r '."fw.psid"')
+	check_err $? "Failed to get linecard $lc fixed fw PSID"
+	log_info "Linecard $lc fixed.fw.psid: \"$fixed_device_fw_psid\""
+
+	running_device_fw=$(devlink dev info $nested_devlink_dev -j | \
+			    jq -e -r ".[][].versions.running.fw")
+	check_err $? "Failed to get linecard $lc running.fw.version"
+	log_info "Linecard $lc running.fw: \"$running_device_fw\""
+}
+
 activation_16x100G_test()
 {
 	RET=0
 	local lc
 	local type
 	local state
+	local nested_devlink_dev
 
 	lc=$LC_SLOT
 	type=$LC_16X100G_TYPE
@@ -268,6 +288,10 @@ activation_16x100G_test()
 
 	interface_check
 
+	nested_devlink_dev=$(lc_nested_devlink_dev_get $lc)
+	check_err $? "Failed to get nested devlink handle of linecard $lc"
+	lc_dev_info_active_check $lc $nested_devlink_dev
+
 	log_test "Activation 16x100G"
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
@ 2022-07-20 22:25   ` Keller, Jacob E
  2022-07-21  5:45     ` Jiri Pirko
  2022-07-21  0:49   ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Keller, Jacob E @ 2022-07-20 22:25 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Wednesday, July 20, 2022 8:12 AM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; kuba@kernel.org; idosch@nvidia.com;
> petrm@nvidia.com; pabeni@redhat.com; edumazet@google.com;
> mlxsw@nvidia.com; saeedm@nvidia.com; snelson@pensando.io
> Subject: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get()
> works with valid pointer during xarray iteration
> 
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Remove dependency on devlink_mutex during devlinks xarray iteration.
> 
> The reason is that devlink_register/unregister() functions taking
> devlink_mutex would deadlock during devlink reload operation of devlink
> instance which registers/unregisters nested devlink instances.
> 
> The devlinks xarray consistency is ensured internally by xarray.
> There is a reference taken when working with devlink using
> devlink_try_get(). But there is no guarantee that devlink pointer
> picked during xarray iteration is not freed before devlink_try_get()
> is called.
> 
> Make sure that devlink_try_get() works with valid pointer.
> Achieve it by:
> 1) Splitting devlink_put() so the completion is sent only
>    after grace period. Completion unblocks the devlink_unregister()
>    routine, which is followed-up by devlink_free()
> 2) Iterate the devlink xarray holding RCU read lock.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>


This makes sense as long as it's OK to drop the rcu_read_lock while in the body of the xa loops. That feels a bit odd to me...

> ---
> v2->v3:
> - s/enf/end/ in devlink_put() comment
> - added missing rcu_read_lock() call to info_get_dumpit()
> - extended patch description by motivation
> - removed an extra "by" from patch description
> v1->v2:
> - new patch (originally part of different patchset)
> ---
>  net/core/devlink.c | 114 ++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 96 insertions(+), 18 deletions(-)
> 
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 98d79feeb3dc..6a3931a8e338 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -70,6 +70,7 @@ struct devlink {
>  	u8 reload_failed:1;
>  	refcount_t refcount;
>  	struct completion comp;
> +	struct rcu_head rcu;
>  	char priv[] __aligned(NETDEV_ALIGN);
>  };
> 
> @@ -221,8 +222,6 @@ static DEFINE_XARRAY_FLAGS(devlinks,
> XA_FLAGS_ALLOC);
>  /* devlink_mutex
>   *
>   * An overall lock guarding every operation coming from userspace.
> - * It also guards devlink devices list and it is taken when
> - * driver registers/unregisters it.
>   */
>  static DEFINE_MUTEX(devlink_mutex);
> 
> @@ -232,10 +231,21 @@ struct net *devlink_net(const struct devlink *devlink)
>  }
>  EXPORT_SYMBOL_GPL(devlink_net);
> 
> +static void __devlink_put_rcu(struct rcu_head *head)
> +{
> +	struct devlink *devlink = container_of(head, struct devlink, rcu);
> +
> +	complete(&devlink->comp);
> +}
> +
>  void devlink_put(struct devlink *devlink)
>  {
>  	if (refcount_dec_and_test(&devlink->refcount))
> -		complete(&devlink->comp);
> +		/* Make sure unregister operation that may await the completion
> +		 * is unblocked only after all users are after the end of
> +		 * RCU grace period.
> +		 */
> +		call_rcu(&devlink->rcu, __devlink_put_rcu);
>  }
> 
>  struct devlink *__must_check devlink_try_get(struct devlink *devlink)
> @@ -295,6 +305,7 @@ static struct devlink *devlink_get_from_attrs(struct net
> *net,
> 
>  	lockdep_assert_held(&devlink_mutex);
> 
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
>  		    strcmp(dev_name(devlink->dev), devname) == 0 &&
> @@ -306,6 +317,7 @@ static struct devlink *devlink_get_from_attrs(struct net
> *net,
> 
>  	if (!found || !devlink_try_get(devlink))
>  		devlink = ERR_PTR(-ENODEV);
> +	rcu_read_unlock();
> 
>  	return devlink;
>  }
> @@ -1329,9 +1341,11 @@ static int devlink_nl_cmd_rate_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -1358,7 +1372,9 @@ static int devlink_nl_cmd_rate_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
>  	if (err != -EMSGSIZE)
> @@ -1432,29 +1448,32 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff
> *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 

Is it safe to rcu_read_unlock here while we're still in the middle of the array processing? What happens if something else updates the xarray? Is the for_each_marked safe?

> -		if (!net_eq(devlink_net(devlink), sock_net(msg->sk))) {
> -			devlink_put(devlink);
> -			continue;
> -		}
> +		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
> +			goto retry;
> 

Ahh retry is at the end of the loop, so we'll just skip this one and move to the next one without needing to duplicate both devlink_put and rcu_read_lock.. ok.

> -		if (idx < start) {
> -			idx++;
> -			devlink_put(devlink);
> -			continue;
> -		}
> +		if (idx < start)
> +			goto inc;
> 
>  		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
>  				      NETLINK_CB(cb->skb).portid,
>  				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
> -		devlink_put(devlink);
> -		if (err)
> +		if (err) {
> +			devlink_put(devlink);
>  			goto out;
> +		}
> +inc:
>  		idx++;
> +retry:
> +		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -1495,9 +1514,11 @@ static int devlink_nl_cmd_port_get_dumpit(struct
> sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -1523,7 +1544,9 @@ static int devlink_nl_cmd_port_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -2177,9 +2200,11 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct
> sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -2208,7 +2233,9 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct
> sk_buff *msg,
>  		mutex_unlock(&devlink->linecards_lock);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -2449,9 +2476,11 @@ static int devlink_nl_cmd_sb_get_dumpit(struct
> sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -2477,7 +2506,9 @@ static int devlink_nl_cmd_sb_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -2601,9 +2632,11 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
>  		    !devlink->ops->sb_pool_get)
> @@ -2626,7 +2659,9 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -2822,9 +2857,11 @@ static int
> devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
>  		    !devlink->ops->sb_port_pool_get)
> @@ -2847,7 +2884,9 @@ static int
> devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -3071,9 +3110,11 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
>  		    !devlink->ops->sb_tc_pool_bind_get)
> @@ -3097,7 +3138,9 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -5158,9 +5201,11 @@ static int devlink_nl_cmd_param_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -5188,7 +5233,9 @@ static int devlink_nl_cmd_param_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -5393,9 +5440,11 @@ static int
> devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -5428,7 +5477,9 @@ static int
> devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -5977,9 +6028,11 @@ static int devlink_nl_cmd_region_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -5990,7 +6043,9 @@ static int devlink_nl_cmd_region_get_dumpit(struct
> sk_buff *msg,
>  		devlink_put(devlink);
>  		if (err)
>  			goto out;
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
>  	cb->args[0] = idx;
> @@ -6511,9 +6566,11 @@ static int devlink_nl_cmd_info_get_dumpit(struct
> sk_buff *msg,
>  	int err = 0;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -6531,13 +6588,16 @@ static int devlink_nl_cmd_info_get_dumpit(struct
> sk_buff *msg,
>  			err = 0;
>  		else if (err) {
>  			devlink_put(devlink);
> +			rcu_read_lock();
>  			break;
>  		}
>  inc:
>  		idx++;
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  	mutex_unlock(&devlink_mutex);
> 
>  	if (err != -EMSGSIZE)
> @@ -7691,9 +7751,11 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct
> sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry_rep;
> @@ -7719,11 +7781,13 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct
> sk_buff *msg,
>  		mutex_unlock(&devlink->reporters_lock);
>  retry_rep:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> 
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry_port;
> @@ -7754,7 +7818,9 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry_port:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -8291,9 +8357,11 @@ static int devlink_nl_cmd_trap_get_dumpit(struct
> sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -8319,7 +8387,9 @@ static int devlink_nl_cmd_trap_get_dumpit(struct
> sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -8518,9 +8588,11 @@ static int
> devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -8547,7 +8619,9 @@ static int
> devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -8832,9 +8906,11 @@ static int
> devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
>  	int err;
> 
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>  			goto retry;
> @@ -8861,7 +8937,9 @@ static int
> devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
>  		devl_unlock(devlink);
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  out:
>  	mutex_unlock(&devlink_mutex);
> 
> @@ -9589,10 +9667,8 @@ void devlink_register(struct devlink *devlink)
>  	ASSERT_DEVLINK_NOT_REGISTERED(devlink);
>  	/* Make sure that we are in .probe() routine */
> 
> -	mutex_lock(&devlink_mutex);
>  	xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
>  	devlink_notify_register(devlink);
> -	mutex_unlock(&devlink_mutex);
>  }
>  EXPORT_SYMBOL_GPL(devlink_register);
> 
> @@ -9609,10 +9685,8 @@ void devlink_unregister(struct devlink *devlink)
>  	devlink_put(devlink);
>  	wait_for_completion(&devlink->comp);
> 
> -	mutex_lock(&devlink_mutex);
>  	devlink_notify_unregister(devlink);
>  	xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
> -	mutex_unlock(&devlink_mutex);
>  }
>  EXPORT_SYMBOL_GPL(devlink_unregister);
> 
> @@ -12281,9 +12355,11 @@ static void __net_exit
> devlink_pernet_pre_exit(struct net *net)
>  	 * all devlink instances from this namespace into init_net.
>  	 */
>  	mutex_lock(&devlink_mutex);
> +	rcu_read_lock();
>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>  		if (!devlink_try_get(devlink))
>  			continue;
> +		rcu_read_unlock();
> 
>  		if (!net_eq(devlink_net(devlink), net))
>  			goto retry;
> @@ -12297,7 +12373,9 @@ static void __net_exit devlink_pernet_pre_exit(struct
> net *net)
>  			pr_warn("Failed to reload devlink instance into
> init_net\n");
>  retry:
>  		devlink_put(devlink);
> +		rcu_read_lock();
>  	}
> +	rcu_read_unlock();
>  	mutex_unlock(&devlink_mutex);
>  }
> 
> --
> 2.35.3


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
  2022-07-20 22:25   ` Keller, Jacob E
@ 2022-07-21  0:49   ` Jakub Kicinski
  2022-07-21  5:51     ` Jiri Pirko
                       ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Jakub Kicinski @ 2022-07-21  0:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Wed, 20 Jul 2022 17:12:24 +0200 Jiri Pirko wrote:
> +static void __devlink_put_rcu(struct rcu_head *head)
> +{
> +	struct devlink *devlink = container_of(head, struct devlink, rcu);
> +
> +	complete(&devlink->comp);
> +}
> +
>  void devlink_put(struct devlink *devlink)
>  {
>  	if (refcount_dec_and_test(&devlink->refcount))
> -		complete(&devlink->comp);
> +		/* Make sure unregister operation that may await the completion
> +		 * is unblocked only after all users are after the end of
> +		 * RCU grace period.
> +		 */
> +		call_rcu(&devlink->rcu, __devlink_put_rcu);
>  }

Hm. I always assumed we'd just use the xa_lock(). Unmarking the
instance as registered takes that lock which provides a natural 
barrier for others trying to take a reference.

Something along these lines (untested):

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 98d79feeb3dc..6321ea123f79 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -278,6 +278,38 @@ void devl_unlock(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devl_unlock);
 
+static struct devlink *devlink_iter_next(unsigned long *index)
+{
+	struct devlink *devlink;
+
+	xa_lock(&devlinks);
+	devlink = xa_find_after(&devlinks, index, ULONG_MAX,
+				DEVLINK_REGISTERED);
+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
+		devlink = NULL;
+	xa_unlock(&devlinks);
+
+	return devlink ?: devlink_iter_next(index);
+}
+
+static struct devlink *devlink_iter_start(unsigned long *index)
+{
+	struct devlink *devlink;
+
+	xa_lock(&devlinks);
+	devlink = xa_find(&devlinks, index, ULONG_MAX, DEVLINK_REGISTERED);
+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
+		devlink = NULL;
+	xa_unlock(&devlinks);
+
+	return devlink ?: devlink_iter_next(index);
+}
+
+#define devlink_for_each_get(index, entry)			\
+	for (index = 0, entry = devlink_iter_start(&index);	\
+	     entry; entry = devlink_iter_next(&index))
+
 static struct devlink *devlink_get_from_attrs(struct net *net,
 					      struct nlattr **attrs)
 {
@@ -1329,10 +1361,7 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
 	int err = 0;
 
 	mutex_lock(&devlink_mutex);
-	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
-		if (!devlink_try_get(devlink))
-			continue;
-
+	devlink_for_each_get(index, devlink) {
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
 			goto retry;
 
etc.

Plus we need to be more careful about the unregistering order; I
believe the correct ordering is:

	clear_unmark()
	put()
	wait()
	notify()

but I believe we'll run afoul of Leon's notification suppression.
So I guess notify() has to go before clear_unmark(), but we should
unmark before we wait, otherwise we could livelock (once the mutex
is really gone, I mean).
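
Roughly, against the helpers in the hunk quoted above (untested, just to
make the sequencing concrete):

void devlink_unregister(struct devlink *devlink)
{
	/* Notify while the instance is still marked, otherwise the
	 * notification suppression mentioned above gets in the way.
	 */
	devlink_notify_unregister(devlink);

	/* Unmark so that new iterations stop picking the instance up,
	 * only then drop the reference and wait for in-flight users.
	 */
	xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);

	devlink_put(devlink);
	wait_for_completion(&devlink->comp);
}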

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-20 22:25   ` Keller, Jacob E
@ 2022-07-21  5:45     ` Jiri Pirko
  2022-07-21 18:55       ` Keller, Jacob E
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-21  5:45 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: netdev, davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw,
	saeedm, snelson

Thu, Jul 21, 2022 at 12:25:54AM CEST, jacob.e.keller@intel.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Wednesday, July 20, 2022 8:12 AM
>> To: netdev@vger.kernel.org
>> Cc: davem@davemloft.net; kuba@kernel.org; idosch@nvidia.com;
>> petrm@nvidia.com; pabeni@redhat.com; edumazet@google.com;
>> mlxsw@nvidia.com; saeedm@nvidia.com; snelson@pensando.io
>> Subject: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get()
>> works with valid pointer during xarray iteration
>> 
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> Remove dependency on devlink_mutex during devlinks xarray iteration.
>> 
>> The reason is that devlink_register/unregister() functions taking
>> devlink_mutex would deadlock during devlink reload operation of devlink
>> instance which registers/unregisters nested devlink instances.
>> 
>> The devlinks xarray consistency is ensured internally by xarray.
>> There is a reference taken when working with devlink using
>> devlink_try_get(). But there is no guarantee that devlink pointer
>> picked during xarray iteration is not freed before devlink_try_get()
>> is called.
>> 
>> Make sure that devlink_try_get() works with valid pointer.
>> Achieve it by:
>> 1) Splitting devlink_put() so the completion is sent only
>>    after grace period. Completion unblocks the devlink_unregister()
>>    routine, which is followed-up by devlink_free()
>> 2) Iterate the devlink xarray holding RCU read lock.
>> 
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>
>
>This makes sense as long as it's OK to drop the rcu_read_lock while in the body of the xa loops. That feels a bit odd to me...

Yes, it is okay. See my comment below.


>
>> ---
>> v2->v3:
>> - s/enf/end/ in devlink_put() comment
>> - added missing rcu_read_lock() call to info_get_dumpit()
>> - extended patch description by motivation
>> - removed an extra "by" from patch description
>> v1->v2:
>> - new patch (originally part of different patchset)
>> ---
>>  net/core/devlink.c | 114 ++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 96 insertions(+), 18 deletions(-)
>> 
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 98d79feeb3dc..6a3931a8e338 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -70,6 +70,7 @@ struct devlink {
>>  	u8 reload_failed:1;
>>  	refcount_t refcount;
>>  	struct completion comp;
>> +	struct rcu_head rcu;
>>  	char priv[] __aligned(NETDEV_ALIGN);
>>  };
>> 
>> @@ -221,8 +222,6 @@ static DEFINE_XARRAY_FLAGS(devlinks,
>> XA_FLAGS_ALLOC);
>>  /* devlink_mutex
>>   *
>>   * An overall lock guarding every operation coming from userspace.
>> - * It also guards devlink devices list and it is taken when
>> - * driver registers/unregisters it.
>>   */
>>  static DEFINE_MUTEX(devlink_mutex);
>> 
>> @@ -232,10 +231,21 @@ struct net *devlink_net(const struct devlink *devlink)
>>  }
>>  EXPORT_SYMBOL_GPL(devlink_net);
>> 
>> +static void __devlink_put_rcu(struct rcu_head *head)
>> +{
>> +	struct devlink *devlink = container_of(head, struct devlink, rcu);
>> +
>> +	complete(&devlink->comp);
>> +}
>> +
>>  void devlink_put(struct devlink *devlink)
>>  {
>>  	if (refcount_dec_and_test(&devlink->refcount))
>> -		complete(&devlink->comp);
>> +		/* Make sure unregister operation that may await the completion
>> +		 * is unblocked only after all users are after the end of
>> +		 * RCU grace period.
>> +		 */
>> +		call_rcu(&devlink->rcu, __devlink_put_rcu);
>>  }
>> 
>>  struct devlink *__must_check devlink_try_get(struct devlink *devlink)
>> @@ -295,6 +305,7 @@ static struct devlink *devlink_get_from_attrs(struct net
>> *net,
>> 
>>  	lockdep_assert_held(&devlink_mutex);
>> 
>> +	rcu_read_lock();
>>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>>  		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
>>  		    strcmp(dev_name(devlink->dev), devname) == 0 &&
>> @@ -306,6 +317,7 @@ static struct devlink *devlink_get_from_attrs(struct net
>> *net,
>> 
>>  	if (!found || !devlink_try_get(devlink))
>>  		devlink = ERR_PTR(-ENODEV);
>> +	rcu_read_unlock();
>> 
>>  	return devlink;
>>  }
>> @@ -1329,9 +1341,11 @@ static int devlink_nl_cmd_rate_get_dumpit(struct
>> sk_buff *msg,
>>  	int err = 0;
>> 
>>  	mutex_lock(&devlink_mutex);
>> +	rcu_read_lock();
>>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>>  		if (!devlink_try_get(devlink))
>>  			continue;
>> +		rcu_read_unlock();
>> 
>>  		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>>  			goto retry;
>> @@ -1358,7 +1372,9 @@ static int devlink_nl_cmd_rate_get_dumpit(struct
>> sk_buff *msg,
>>  		devl_unlock(devlink);
>>  retry:
>>  		devlink_put(devlink);
>> +		rcu_read_lock();
>>  	}
>> +	rcu_read_unlock();
>>  out:
>>  	mutex_unlock(&devlink_mutex);
>>  	if (err != -EMSGSIZE)
>> @@ -1432,29 +1448,32 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff
>> *msg,
>>  	int err;
>> 
>>  	mutex_lock(&devlink_mutex);
>> +	rcu_read_lock();
>>  	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>>  		if (!devlink_try_get(devlink))
>>  			continue;
>> +		rcu_read_unlock();
>> 
>
>Is it safe to rcu_read_unlock here while we're still in the middle of the array processing? What happens if something else updates the xarray? Is the for_each_marked safe?

Sure, you don't need to hold rcu_read_lock during the call to xa_for_each_marked().
The consistency of the xarray itself is guaranteed. The only reason to take
rcu_read_lock outside is to make sure that the devlink pointer, which is
rcu_dereference_check()'ed inside xa_for_each_marked(), is still valid
once we devlink_try_get() it.
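
It's the usual xarray + refcount + RCU pattern. Stripped down to a generic
sketch (made-up names, not devlink code; assumes the owner erases the object
from the xarray before the final reference can go away):

struct obj {
	refcount_t refcount;
	struct rcu_head rcu;
};

static void obj_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct obj, rcu));
}

static void obj_put(struct obj *obj)
{
	if (refcount_dec_and_test(&obj->refcount))
		/* Freed only after a grace period, so a concurrent
		 * iterator still sees valid memory until it re-checks
		 * the refcount under rcu_read_lock.
		 */
		call_rcu(&obj->rcu, obj_free_rcu);
}

static void obj_walk(struct xarray *objs)
{
	unsigned long index;
	struct obj *obj;

	rcu_read_lock();
	xa_for_each(objs, index, obj) {
		if (!refcount_inc_not_zero(&obj->refcount))
			continue;	/* object already going away */
		rcu_read_unlock();	/* we hold a reference, may sleep */

		/* ... sleepable work with obj ... */

		obj_put(obj);
		rcu_read_lock();	/* re-enter RCU before advancing */
	}
	rcu_read_unlock();
}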


>
>> -		if (!net_eq(devlink_net(devlink), sock_net(msg->sk))) {
>> -			devlink_put(devlink);
>> -			continue;
>> -		}
>> +		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
>> +			goto retry;
>> 
>
>Ahh retry is at the end of the loop, so we'll just skip this one and move to the next one without needing to duplicate both devlink_put and rcu_read_lock.. ok.

Yep.


>
>> -		if (idx < start) {
>> -			idx++;
>> -			devlink_put(devlink);
>> -			continue;
>> -		}
>> +		if (idx < start)
>> +			goto inc;
>> 
>>  		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
>>  				      NETLINK_CB(cb->skb).portid,
>>  				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
>> -		devlink_put(devlink);
>> -		if (err)
>> +		if (err) {
>> +			devlink_put(devlink);
>>  			goto out;
>> +		}
>> +inc:
>>  		idx++;
>> +retry:
>> +		devlink_put(devlink);
>> +		rcu_read_lock();
>>  	}
>> +	rcu_read_unlock();
>>  out:
>>  	mutex_unlock(&devlink_mutex);
>> 

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  0:49   ` Jakub Kicinski
@ 2022-07-21  5:51     ` Jiri Pirko
  2022-07-21  6:22       ` Jakub Kicinski
  2022-07-22  6:15     ` Jiri Pirko
  2022-07-22 15:50     ` Jiri Pirko
  2 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-21  5:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 02:49:53AM CEST, kuba@kernel.org wrote:
>On Wed, 20 Jul 2022 17:12:24 +0200 Jiri Pirko wrote:
>> +static void __devlink_put_rcu(struct rcu_head *head)
>> +{
>> +	struct devlink *devlink = container_of(head, struct devlink, rcu);
>> +
>> +	complete(&devlink->comp);
>> +}
>> +
>>  void devlink_put(struct devlink *devlink)
>>  {
>>  	if (refcount_dec_and_test(&devlink->refcount))
>> -		complete(&devlink->comp);
>> +		/* Make sure unregister operation that may await the completion
>> +		 * is unblocked only after all users are after the end of
>> +		 * RCU grace period.
>> +		 */
>> +		call_rcu(&devlink->rcu, __devlink_put_rcu);
>>  }
>
>Hm. I always assumed we'd just use the xa_lock(). Unmarking the
>instance as registered takes that lock which provides a natural 
>barrier for others trying to take a reference.

I guess that the xa_lock() scheme could work, as far as I can see. But
what's wrong with the RCU scheme? I actually find it quite neat. No need
to have another set of odd iteration helpers. We just benefit from the
xarray RCU internals to make sure the devlink pointer is valid at the
time we take a reference. Very clear.



>
>Something along these lines (untested):
>
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 98d79feeb3dc..6321ea123f79 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -278,6 +278,38 @@ void devl_unlock(struct devlink *devlink)
> }
> EXPORT_SYMBOL_GPL(devl_unlock);
> 
>+static struct devlink *devlink_iter_next(unsigned long *index)
>+{
>+	struct devlink *devlink;
>+
>+	xa_lock(&devlinks);
>+	devlink = xa_find_after(&devlinks, index, ULONG_MAX,
>+				DEVLINK_REGISTERED);
>+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
>+		devlink = NULL;
>+	xa_unlock(&devlinks);
>+
>+	return devlink ?: devlink_iter_next(index);
>+}
>+
>+static struct devlink *devlink_iter_start(unsigned long *index)
>+{
>+	struct devlink *devlink;
>+
>+	xa_lock(&devlinks);
>+	devlink = xa_find(&devlinks, index, ULONG_MAX, DEVLINK_REGISTERED);
>+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
>+		devlink = NULL;
>+	xa_unlock(&devlinks);
>+
>+	return devlink ?: devlink_iter_next(index);
>+}
>+
>+#define devlink_for_each_get(index, entry)			\
>+	for (index = 0, entry = devlink_iter_start(&index);	\
>+	     entry; entry = devlink_iter_next(&index))
>+
> static struct devlink *devlink_get_from_attrs(struct net *net,
> 					      struct nlattr **attrs)
> {
>@@ -1329,10 +1361,7 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
> 	int err = 0;
> 
> 	mutex_lock(&devlink_mutex);
>-	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>-		if (!devlink_try_get(devlink))
>-			continue;
>-
>+	devlink_for_each_get(index, devlink) {
> 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
> 			goto retry;
> 
>etc.
>
>Plus we need to be more careful about the unregistering order, I
>believe the correct ordering is:
>
>	clear_unmark()
>	put()
>	wait()
>	notify()
>
>but I believe we'll run afoul of Leon's notification suppression.
>So I guess notify() has to go before clear_unmark(), but we should
>unmark before we wait otherwise we could live lock (once the mutex 
>is really gone, I mean).

Will check.



* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  5:51     ` Jiri Pirko
@ 2022-07-21  6:22       ` Jakub Kicinski
  2022-07-21 12:04         ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2022-07-21  6:22 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Thu, 21 Jul 2022 07:51:37 +0200 Jiri Pirko wrote:
> >Hm. I always assumed we'd just use the xa_lock(). Unmarking the
> >instance as registered takes that lock which provides a natural 
> >barrier for others trying to take a reference.  
> 
> I guess that the xa_lock() scheme could work, as far as I can see. But
> what's wrong with the RCU scheme? I actually find it quite neat. No need
> to have another set of odd iteration helpers. We just benefit from the
> xarray RCU internals to make sure the devlink pointer is valid at the
> time we take a reference. Very clear.

Nothing strongly against the RCU scheme, TBH. Just didn't expect it.
I can concoct some argument like it's one extra sync primitive we
haven't had to think about in devlink so far, but really if you prefer
RCU, I don't mind.

I do like the idea of wrapping the iteration into our own helper, tho.
Contains the implementation details of the iteration nicely. I didn't
look in sufficient detail but I would have even considered rolling the
namespace check into it for dump.
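
Something along these lines maybe, just to illustrate folding the netns
check in (untested, the name is made up, and it would need a matching
_start variant like in the snippet above):

static struct devlink *devlink_dump_next(struct net *net,
					 unsigned long *index)
{
	struct devlink *devlink;

	for (;;) {
		xa_lock(&devlinks);
		devlink = xa_find_after(&devlinks, index, ULONG_MAX,
					DEVLINK_REGISTERED);
		if (devlink && !refcount_inc_not_zero(&devlink->refcount)) {
			xa_unlock(&devlinks);
			continue;	/* instance on its way out, skip it */
		}
		xa_unlock(&devlinks);

		if (!devlink)
			return NULL;	/* walk finished */
		if (net_eq(devlink_net(devlink), net))
			return devlink;	/* caller now holds a reference */

		/* other namespace, drop the reference and keep walking */
		devlink_put(devlink);
	}
}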


* Re: [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device
  2022-07-20 15:12 ` [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device Jiri Pirko
@ 2022-07-21  8:04   ` Ido Schimmel
  0 siblings, 0 replies; 32+ messages in thread
From: Ido Schimmel @ 2022-07-21  8:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Wed, Jul 20, 2022 at 05:12:26PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> In order to be eventually able to expose line card gearbox version and
> possibility to flash FW, model the line card as a separate device on
> auxiliary bus.
> 
> Add the auxiliary device for provisioned line card in order to be able
> to expose provisioned line card info over devlink dev info. When the
> line card becomes active, there may be other additional info added to
> the output.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* Re: [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version
  2022-07-20 15:12 ` [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version Jiri Pirko
@ 2022-07-21  8:05   ` Ido Schimmel
  0 siblings, 0 replies; 32+ messages in thread
From: Ido Schimmel @ 2022-07-21  8:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Wed, Jul 20, 2022 at 05:12:27PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Implement info_get() to expose HW revision of a linecard and loaded INI
> version.
> 
> Example:
> 
> $ devlink dev info auxiliary/mlxsw_core.lc.0
> auxiliary/mlxsw_core.lc.0:
>   versions:
>       fixed:
>         hw.revision 0
>       running:
>         ini.version 4
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* Re: [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version
  2022-07-20 15:12 ` [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version Jiri Pirko
@ 2022-07-21  8:11   ` Ido Schimmel
  2022-07-21 16:01     ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Ido Schimmel @ 2022-07-21  8:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

The subject is misleading, only ready/active line cards are probed for
FW version, not merely provisioned ones.

On Wed, Jul 20, 2022 at 05:12:29PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> In case the line card is provisioned, go over all possible existing

Same comment

> devices (gearboxes) on it and expose FW version of the flashable one.
> 
> Example:
> 
> $ devlink dev info auxiliary/mlxsw_core.lc.0
> auxiliary/mlxsw_core.lc.0:
>   versions:
>       fixed:
>         hw.revision 0
>       running:
>         ini.version 4
>         fw 19.2010.1312
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Assuming the above will be fixed in next version (it's already marked as
"Changes Requested"):

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* Re: [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info
  2022-07-20 15:12 ` [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info Jiri Pirko
@ 2022-07-21  8:13   ` Ido Schimmel
  0 siblings, 0 replies; 32+ messages in thread
From: Ido Schimmel @ 2022-07-21  8:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Wed, Jul 20, 2022 at 05:12:31PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Use tunneled MGIR to obtain PSID of line card device and extend
> device_info_get() op to fill up the info with that.
> 
> Example:
> 
> $ devlink dev info auxiliary/mlxsw_core.lc.0
> auxiliary/mlxsw_core.lc.0:
>   versions:
>       fixed:
>         hw.revision 0
>         fw.psid MT_0000000749
>       running:
>         ini.version 4
>         fw 19.2010.1312
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* Re: [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing
  2022-07-20 15:12 ` [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing Jiri Pirko
@ 2022-07-21  8:25   ` Ido Schimmel
  2022-07-21 16:01     ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Ido Schimmel @ 2022-07-21  8:25 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Wed, Jul 20, 2022 at 05:12:32PM +0200, Jiri Pirko wrote:
> +int mlxsw_linecard_flash_update(struct devlink *linecard_devlink,
> +				struct mlxsw_linecard *linecard,
> +				const struct firmware *firmware,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
> +	struct mlxsw_linecard_device_fw_info info = {
> +		.mlxfw_dev = {
> +			.ops = &mlxsw_linecard_device_dev_ops,
> +			.psid = linecard->device.info.psid,
> +			.psid_size = strlen(linecard->device.info.psid),
> +			.devlink = linecard_devlink,
> +		},
> +		.mlxsw_core = mlxsw_core,
> +		.linecard = linecard,
> +	};
> +	int err;
> +
> +	mutex_lock(&linecard->lock);
> +	if (!linecard->active) {
> +		NL_SET_ERR_MSG_MOD(extack, "Failed to flash non-active linecard");

IMO it's not clear enough that the problem is the fact that the line
card is inactive. Maybe:

"Only active linecards can be flashed"

Either way:

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


> +		err = -EINVAL;
> +		goto unlock;
> +	}
> +	err = mlxsw_core_fw_flash(mlxsw_core, &info.mlxfw_dev,
> +				  firmware, extack);
> +unlock:
> +	mutex_unlock(&linecard->lock);
> +	return err;
> +}


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  6:22       ` Jakub Kicinski
@ 2022-07-21 12:04         ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-21 12:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, netdev, davem, idosch, petrm, pabeni, edumazet,
	mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 08:22:58AM CEST, kuba@kernel.org wrote:
>On Thu, 21 Jul 2022 07:51:37 +0200 Jiri Pirko wrote:
>> >Hm. I always assumed we'd just use the xa_lock(). Unmarking the
>> >instance as registered takes that lock which provides a natural 
>> >barrier for others trying to take a reference.  
>> 
>> I guess that the xa_lock() scheme could work, as far as I can see. But
>> what's wrong with the RCU scheme? I actually find it quite neat. No need
>> to have another set of odd iteration helpers. We just benefit from the
>> xarray RCU internals to make sure the devlink pointer is valid at the
>> time we take a reference. Very clear.
>
>Nothing strongly against the RCU scheme, TBH. Just didn't expect it.
>I can concoct some argument like it's one extra sync primitive we
>haven't had to think about in devlink so far, but really if you prefer
>RCU, I don't mind.
>
>I do like the idea of wrapping the iteration into our own helper, tho.
>Contains the implementation details of the iteration nicely. I didn't
>look in sufficient detail but I would have even considered rolling the
>namespace check into it for dump.

Hmm, okay. I will think about helpers to contain the
iteration/rcu/refget stuff.

Thanks!


* Re: [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing
  2022-07-21  8:25   ` Ido Schimmel
@ 2022-07-21 16:01     ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-21 16:01 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 10:25:45AM CEST, idosch@nvidia.com wrote:
>On Wed, Jul 20, 2022 at 05:12:32PM +0200, Jiri Pirko wrote:
>> +int mlxsw_linecard_flash_update(struct devlink *linecard_devlink,
>> +				struct mlxsw_linecard *linecard,
>> +				const struct firmware *firmware,
>> +				struct netlink_ext_ack *extack)
>> +{
>> +	struct mlxsw_core *mlxsw_core = linecard->linecards->mlxsw_core;
>> +	struct mlxsw_linecard_device_fw_info info = {
>> +		.mlxfw_dev = {
>> +			.ops = &mlxsw_linecard_device_dev_ops,
>> +			.psid = linecard->device.info.psid,
>> +			.psid_size = strlen(linecard->device.info.psid),
>> +			.devlink = linecard_devlink,
>> +		},
>> +		.mlxsw_core = mlxsw_core,
>> +		.linecard = linecard,
>> +	};
>> +	int err;
>> +
>> +	mutex_lock(&linecard->lock);
>> +	if (!linecard->active) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Failed to flash non-active linecard");
>
>IMO it's not clear enough that the problem is the fact that the line
>card is inactive. Maybe:
>
>"Only active linecards can be flashed"

Fixed.

>
>Either way:
>
>Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>
>
>> +		err = -EINVAL;
>> +		goto unlock;
>> +	}
>> +	err = mlxsw_core_fw_flash(mlxsw_core, &info.mlxfw_dev,
>> +				  firmware, extack);
>> +unlock:
>> +	mutex_unlock(&linecard->lock);
>> +	return err;
>> +}


* Re: [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version
  2022-07-21  8:11   ` Ido Schimmel
@ 2022-07-21 16:01     ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-21 16:01 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 10:11:48AM CEST, idosch@nvidia.com wrote:
>The subject is misleading, only ready/active line cards are probed for
>FW version, not merely provisioned ones.

Fixed.


>
>On Wed, Jul 20, 2022 at 05:12:29PM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> In case the line card is provisioned, go over all possible existing
>
>Same comment

Fixed.


>
>> devices (gearboxes) on it and expose FW version of the flashable one.
>> 
>> Example:
>> 
>> $ devlink dev info auxiliary/mlxsw_core.lc.0
>> auxiliary/mlxsw_core.lc.0:
>>   versions:
>>       fixed:
>>         hw.revision 0
>>       running:
>>         ini.version 4
>>         fw 19.2010.1312
>> 
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>
>Assuming the above will be fixed in next version (it's already marked as
>"Changes Requested"):
>
>Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* RE: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  5:45     ` Jiri Pirko
@ 2022-07-21 18:55       ` Keller, Jacob E
  2022-07-22  6:15         ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Keller, Jacob E @ 2022-07-21 18:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw,
	saeedm, snelson



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Wednesday, July 20, 2022 10:45 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> idosch@nvidia.com; petrm@nvidia.com; pabeni@redhat.com;
> edumazet@google.com; mlxsw@nvidia.com; saeedm@nvidia.com;
> snelson@pensando.io
> Subject: Re: [patch net-next v3 01/11] net: devlink: make sure that
> devlink_try_get() works with valid pointer during xarray iteration
> 
> >Is it safe to rcu_read_unlock here while we're still in the middle of the array
> processing? What happens if something else updates the xarray? is the
> for_each_marked safe?
> 
> Sure, you don't need to hold rcu_read_lock during the call to
> xa_for_each_marked(). The consistency of the xarray itself is guaranteed.
> The only reason to take rcu_read_lock outside is to make sure that the
> devlink pointer, which is rcu_dereference_check()'ed inside
> xa_for_each_marked(), is still valid once we devlink_try_get() it.
> 

Excellent, ok. Basically we need the RCU for protecting just the pointer until we get a reference to it separately.

Thanks!

In that case:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  0:49   ` Jakub Kicinski
  2022-07-21  5:51     ` Jiri Pirko
@ 2022-07-22  6:15     ` Jiri Pirko
  2022-07-22 15:50     ` Jiri Pirko
  2 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-22  6:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 02:49:53AM CEST, kuba@kernel.org wrote:
>On Wed, 20 Jul 2022 17:12:24 +0200 Jiri Pirko wrote:
>> +static void __devlink_put_rcu(struct rcu_head *head)
>> +{
>> +	struct devlink *devlink = container_of(head, struct devlink, rcu);
>> +
>> +	complete(&devlink->comp);
>> +}
>> +
>>  void devlink_put(struct devlink *devlink)
>>  {
>>  	if (refcount_dec_and_test(&devlink->refcount))
>> -		complete(&devlink->comp);
>> +		/* Make sure unregister operation that may await the completion
>> +		 * is unblocked only after all users are after the end of
>> +		 * RCU grace period.
>> +		 */
>> +		call_rcu(&devlink->rcu, __devlink_put_rcu);
>>  }
>
>Hm. I always assumed we'd just use the xa_lock(). Unmarking the
>instance as registered takes that lock which provides a natural 
>barrier for others trying to take a reference.
>
>Something along these lines (untested):
>
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 98d79feeb3dc..6321ea123f79 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -278,6 +278,38 @@ void devl_unlock(struct devlink *devlink)
> }
> EXPORT_SYMBOL_GPL(devl_unlock);
> 
>+static struct devlink *devlink_iter_next(unsigned long *index)
>+{
>+	struct devlink *devlink;
>+
>+	xa_lock(&devlinks);
>+	devlink = xa_find_after(&devlinks, index, ULONG_MAX,
>+				DEVLINK_REGISTERED);
>+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
>+		devlink = NULL;
>+	xa_unlock(&devlinks);
>+
>+	return devlink ?: devlink_iter_next(index);
>+}
>+
>+static struct devlink *devlink_iter_start(unsigned long *index)
>+{
>+	struct devlink *devlink;
>+
>+	xa_lock(&devlinks);
>+	devlink = xa_find(&devlinks, index, ULONG_MAX, DEVLINK_REGISTERED);
>+	if (devlink && !refcount_inc_not_zero(&devlink->refcount))
>+		devlink = NULL;
>+	xa_unlock(&devlinks);
>+
>+	return devlink ?: devlink_iter_next(index);
>+}
>+
>+#define devlink_for_each_get(index, entry)			\
>+	for (index = 0, entry = devlink_iter_start(&index);	\
>+	     entry; entry = devlink_iter_next(&index))
>+
> static struct devlink *devlink_get_from_attrs(struct net *net,
> 					      struct nlattr **attrs)
> {
>@@ -1329,10 +1361,7 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
> 	int err = 0;
> 
> 	mutex_lock(&devlink_mutex);
>-	xa_for_each_marked(&devlinks, index, devlink, DEVLINK_REGISTERED) {
>-		if (!devlink_try_get(devlink))
>-			continue;
>-
>+	devlink_for_each_get(index, devlink) {
> 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
> 			goto retry;
> 
>etc.
>
>Plus we need to be more careful about the unregistering order, I
>believe the correct ordering is:
>
>	clear_unmark()
>	put()
>	wait()
>	notify()

Fixed.

>
>but I believe we'll run afoul of Leon's notification suppression.
>So I guess notify() has to go before clear_unmark(), but we should
>unmark before we wait otherwise we could live lock (once the mutex 
>is really gone, I mean).


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21 18:55       ` Keller, Jacob E
@ 2022-07-22  6:15         ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-22  6:15 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: netdev, davem, kuba, idosch, petrm, pabeni, edumazet, mlxsw,
	saeedm, snelson

Thu, Jul 21, 2022 at 08:55:04PM CEST, jacob.e.keller@intel.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Wednesday, July 20, 2022 10:45 PM
>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
>> idosch@nvidia.com; petrm@nvidia.com; pabeni@redhat.com;
>> edumazet@google.com; mlxsw@nvidia.com; saeedm@nvidia.com;
>> snelson@pensando.io
>> Subject: Re: [patch net-next v3 01/11] net: devlink: make sure that
>> devlink_try_get() works with valid pointer during xarray iteration
>> 
>> >Is it safe to rcu_read_unlock here while we're still in the middle of the array
>> processing? What happens if something else updates the xarray? is the
>> for_each_marked safe?
>> 
>> Sure, you don't need to hold rcu_read_lock during the call to
>> xa_for_each_marked(). The consistency of the xarray itself is guaranteed.
>> The only reason to take rcu_read_lock outside is to make sure that the
>> devlink pointer, which is rcu_dereference_check()'ed inside
>> xa_for_each_marked(), is still valid once we devlink_try_get() it.
>> 
>
>Excellent, ok. Basically we need the RCU for protecting just the pointer until we get a reference to it separately.

Yep.


>
>Thanks!
>
>In that case:
>
>Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks. I will send v4 soon, wrapping this up into a helper as Jakub
requested.



* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-21  0:49   ` Jakub Kicinski
  2022-07-21  5:51     ` Jiri Pirko
  2022-07-22  6:15     ` Jiri Pirko
@ 2022-07-22 15:50     ` Jiri Pirko
  2022-07-22 18:23       ` Jakub Kicinski
  2 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-22 15:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Thu, Jul 21, 2022 at 02:49:53AM CEST, kuba@kernel.org wrote:
>On Wed, 20 Jul 2022 17:12:24 +0200 Jiri Pirko wrote:

[...]


>Plus we need to be more careful about the unregistering order, I
>believe the correct ordering is:
>
>	clear_unmark()
>	put()
>	wait()
>	notify()
>
>but I believe we'll run afoul of Leon's notification suppression.
>So I guess notify() has to go before clear_unmark(), but we should
>unmark before we wait otherwise we could live lock (once the mutex 
>is really gone, I mean).

Kuba, could you elaborate a bit more about the live lock problem here?
Thanks!


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-22 15:50     ` Jiri Pirko
@ 2022-07-22 18:23       ` Jakub Kicinski
  2022-07-23 15:41         ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2022-07-22 18:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

On Fri, 22 Jul 2022 17:50:17 +0200 Jiri Pirko wrote:
> >Plus we need to be more careful about the unregistering order, I
> >believe the correct ordering is:
> >
> >	clear_unmark()
> >	put()
> >	wait()
> >	notify()
> >
> >but I believe we'll run afoul of Leon's notification suppression.
> >So I guess notify() has to go before clear_unmark(), but we should
> >unmark before we wait otherwise we could live lock (once the mutex 
> >is really gone, I mean).  
> 
> Kuba, could you elaborate a bit more about the live lock problem here?

Once the devlink_mutex lock is gone - (unprivileged) user space dumping
devlink objects could prevent any de-registration from happening
because it can keep the reference of the instance up. So we should mark
the instance as not REGISTERED first, then go to wait.

Pretty theoretical, I guess, but I wanted to mention it in case you can
figure out a solution along the way :S I don't think it's a blocker
right now since we still have the mutex.
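
In code terms that would be roughly (illustrative ordering only, not
meant as the literal devlink_unregister() body):

	devlink_notify(devlink, DEVLINK_CMD_DEL);	/* notify while still marked */
	xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
	devlink_put(devlink);				/* drop the registration reference */
	wait_for_completion(&devlink->comp);		/* wait for the last user to go away */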


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-22 18:23       ` Jakub Kicinski
@ 2022-07-23 15:41         ` Jiri Pirko
  2022-07-25  8:17           ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2022-07-23 15:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Fri, Jul 22, 2022 at 08:23:48PM CEST, kuba@kernel.org wrote:
>On Fri, 22 Jul 2022 17:50:17 +0200 Jiri Pirko wrote:
>> >Plus we need to be more careful about the unregistering order, I
>> >believe the correct ordering is:
>> >
>> >	clear_unmark()
>> >	put()
>> >	wait()
>> >	notify()
>> >
>> >but I believe we'll run afoul of Leon's notification suppression.
>> >So I guess notify() has to go before clear_unmark(), but we should
>> >unmark before we wait otherwise we could live lock (once the mutex 
>> >is really gone, I mean).  
>> 
>> Kuba, could you elaborate a bit more about the live lock problem here?
>
>Once the devlink_mutex lock is gone - (unprivileged) user space dumping
>devlink objects could prevent any de-registration from happening
>because it can keep the reference of the instance up. So we should mark
>the instance as not REGISTERED first, then go to wait.

Yeah, that is what I thought. I resolved it as you wrote. I removed the
WARN_ON from devlink_notify(). It is really not good for anything
anyway.


>
>Pretty theoretical, I guess, but I wanted to mention it in case you can
>figure out a solution along the way :S I don't think it's a blocker
>right now since we still have the mutex.

Got it.


* Re: [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
  2022-07-23 15:41         ` Jiri Pirko
@ 2022-07-25  8:17           ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2022-07-25  8:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, idosch, petrm, pabeni, edumazet, mlxsw, saeedm, snelson

Sat, Jul 23, 2022 at 05:41:08PM CEST, jiri@resnulli.us wrote:
>Fri, Jul 22, 2022 at 08:23:48PM CEST, kuba@kernel.org wrote:
>>On Fri, 22 Jul 2022 17:50:17 +0200 Jiri Pirko wrote:
>>> >Plus we need to be more careful about the unregistering order, I
>>> >believe the correct ordering is:
>>> >
>>> >	clear_unmark()
>>> >	put()
>>> >	wait()
>>> >	notify()
>>> >
>>> >but I believe we'll run afoul of Leon's notification suppression.
>>> >So I guess notify() has to go before clear_unmark(), but we should
>>> >unmark before we wait otherwise we could live lock (once the mutex 
>>> >is really gone, I mean).  
>>> 
>>> Kuba, could you elaborate a bit more about the live lock problem here?
>>
>>Once the devlink_mutex lock is gone - (unprivileged) user space dumping
>>devlink objects could prevent any de-registration from happening
>>because it can keep the reference of the instance up. So we should mark
>>the instance as not REGISTERED first, then go to wait.
>
>Yeah, that is what I thought. I resolved it as you wrote. I removed the
>WARN_ON from devlink_notify(). It is really not good for anything
>anyway.

The check for "registered" is present in more notifications. I will
handle this in the next patchset; you are right, it does not need to be
handled here.

Sending v4.

Thanks!

>
>
>>
>>Pretty theoretical, I guess, but I wanted to mention it in case you can
>>figure out a solution along the way :S I don't think it's a blocker
>>right now since we still have the mutex.
>
>Got it.


Thread overview: 32+ messages
2022-07-20 15:12 [patch net-next v3 00/11] mlxsw: Implement dev info and dev flash for line cards Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 01/11] net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration Jiri Pirko
2022-07-20 22:25   ` Keller, Jacob E
2022-07-21  5:45     ` Jiri Pirko
2022-07-21 18:55       ` Keller, Jacob E
2022-07-22  6:15         ` Jiri Pirko
2022-07-21  0:49   ` Jakub Kicinski
2022-07-21  5:51     ` Jiri Pirko
2022-07-21  6:22       ` Jakub Kicinski
2022-07-21 12:04         ` Jiri Pirko
2022-07-22  6:15     ` Jiri Pirko
2022-07-22 15:50     ` Jiri Pirko
2022-07-22 18:23       ` Jakub Kicinski
2022-07-23 15:41         ` Jiri Pirko
2022-07-25  8:17           ` Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 02/11] net: devlink: introduce nested devlink entity for line card Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 03/11] mlxsw: core_linecards: Introduce per line card auxiliary device Jiri Pirko
2022-07-21  8:04   ` Ido Schimmel
2022-07-20 15:12 ` [patch net-next v3 04/11] mlxsw: core_linecards: Expose HW revision and INI version Jiri Pirko
2022-07-21  8:05   ` Ido Schimmel
2022-07-20 15:12 ` [patch net-next v3 05/11] mlxsw: reg: Extend MDDQ by device_info Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 06/11] mlxsw: core_linecards: Probe provisioned line cards for devices and expose FW version Jiri Pirko
2022-07-21  8:11   ` Ido Schimmel
2022-07-21 16:01     ` Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 07/11] mlxsw: reg: Add Management DownStream Device Tunneling Register Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 08/11] mlxsw: core_linecards: Expose device PSID over device info Jiri Pirko
2022-07-21  8:13   ` Ido Schimmel
2022-07-20 15:12 ` [patch net-next v3 09/11] mlxsw: core_linecards: Implement line card device flashing Jiri Pirko
2022-07-21  8:25   ` Ido Schimmel
2022-07-21 16:01     ` Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 10/11] selftests: mlxsw: Check line card info on provisioned line card Jiri Pirko
2022-07-20 15:12 ` [patch net-next v3 11/11] selftests: mlxsw: Check line card info on activated " Jiri Pirko
