All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable
@ 2020-12-06 23:59 Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage Vladimir Oltean
                   ` (12 more replies)
  0 siblings, 13 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Leon Romanovsky, Heiko Carstens, Vasily Gorbik,
	linux-s390, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	Sridhar Samudrala, James E.J. Bottomley, Helge Deller,
	linux-parisc, Christian Brauner

This series converts all callers of dev_get_stats() to be in sleepable
context, so that we can do more work in the .ndo_get_stats64 method.

The situation today is that if we have hardware that needs to be
accessed through a slow bus like SPI, or through a firmware, we cannot
do that directly in .ndo_get_stats64, so we have to poll counters
periodically and return a cached (not up to date) copy in the atomic NDO
callback. This is undesirable on both ends: more work than strictly
needed is being done, and the end result is also worse (not guaranteed
to be up to date). So converting the code paths to be compatible with
sleeping seems to make more sense.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: Christian Brauner <christian.brauner@ubuntu.com>

Vladimir Oltean (13):
  RDMA/mlx4: remove bogus dev_base_lock usage
  net: mark dev_base_lock for deprecation
  net: introduce a mutex for the netns interface lists
  s390/appldata_net_sum: hold the netdev lists lock when retrieving
    device statistics
  net: bonding: hold the netdev lists lock when retrieving device
    statistics
  net_failover: hold the netdev lists lock when retrieving device
    statistics
  parisc/led: remove trailing whitespaces
  parisc/led: reindent the code that gathers device statistics
  parisc/led: hold the netdev lists lock when retrieving device
    statistics
  net: procfs: hold the netdev lists lock when retrieving device
    statistics
  net: sysfs: don't hold dev_base_lock while retrieving device
    statistics
  net: mark ndo_get_stats64 as being able to sleep
  net: remove obsolete comments about ndo_get_stats64 context from eth
    drivers

 Documentation/networking/netdevices.rst     |   4 +-
 Documentation/networking/statistics.rst     |   9 +-
 arch/s390/appldata/appldata_net_sum.c       |   8 +-
 drivers/infiniband/hw/mlx4/main.c           |   3 -
 drivers/net/bonding/bond_main.c             |  16 +-
 drivers/net/ethernet/cisco/enic/enic_main.c |   1 -
 drivers/net/ethernet/nvidia/forcedeth.c     |   2 -
 drivers/net/ethernet/sfc/efx_common.c       |   1 -
 drivers/net/ethernet/sfc/falcon/efx.c       |   1 -
 drivers/net/net_failover.c                  |  15 +-
 drivers/parisc/led.c                        | 164 ++++++++++----------
 include/net/bonding.h                       |   1 -
 include/net/net_failover.h                  |   3 -
 include/net/net_namespace.h                 |   5 +
 net/core/dev.c                              |  63 +++++---
 net/core/net-procfs.c                       |  13 +-
 net/core/net-sysfs.c                        |   3 +-
 17 files changed, 162 insertions(+), 150 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-08  6:43   ` Leon Romanovsky
  2020-12-06 23:59 ` [RFC PATCH net-next 02/13] net: mark dev_base_lock for deprecation Vladimir Oltean
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Leon Romanovsky

The dev_base_lock does not protect dev->dev_addr, so it serves no
purpose here.

Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/infiniband/hw/mlx4/main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index f0864f40ea1a..e3cd402c079a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2265,10 +2265,7 @@ static void mlx4_ib_update_qps(struct mlx4_ib_dev *ibdev,
 	u64 release_mac = MLX4_IB_INVALID_MAC;
 	struct mlx4_ib_qp *qp;
 
-	read_lock(&dev_base_lock);
 	new_smac = mlx4_mac_to_u64(dev->dev_addr);
-	read_unlock(&dev_base_lock);
-
 	atomic64_set(&ibdev->iboe.mac[port - 1], new_smac);
 
 	/* no need for update QP1 and mac registration in non-SRIOV */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 02/13] net: mark dev_base_lock for deprecation
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 03/13] net: introduce a mutex for the netns interface lists Vladimir Oltean
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel

There is a movement to eliminate the usage of dev_base_lock, which
exists since as far as I could track the kernel history down (the
"7a2deb329241 Import changeset" commit from the bitkeeper branch).

The dev_base_lock approach has multiple issues:
- It is global and not per netns.
- Its meaning has mutated over the years and the vast majority of
  current users is using it to protect against changes of network device
  state, as opposed to changes of the interface list.
- It is overlapping with other protection mechanisms introduced in the
  meantime, which have higher performance for readers, like RCU
  introduced in 2009 by Eric Dumazet for kernel 2.6.

The old comment that I just deleted made this distinction: writers
protect only against readers by holding dev_base_lock, whereas they need
to protect against other writers by holding the RTNL mutex. This is
slightly confusing/incorrect, since a rwlock_t cannot have more than one
concurrently running writer, so that explanation does not justify why
the RTNL mutex would be necessary.

Instead, Stephen Hemminger makes this clarification here:
https://lore.kernel.org/netdev/20201129211230.4d704931@hermes.local/T/#u

| There are really two different domains being covered by locks here.
|
| The first area is change of state of network devices. This has traditionally
| been covered by RTNL because there are places that depend on coordinating
| state between multiple devices. RTNL is too big and held too long but getting
| rid of it is hard because there are corner cases (like state changes from userspace
| for VPN devices).
|
| The other area is code that wants to do read access to look at list of devices.
| These pure readers can/should be converted to RCU by now. Writers should hold RTNL.

This patch edits the comment for dev_base_lock, minimizing its role in
the protection of network interface lists, and clarifies that it is has
other purposes as well.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/dev.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d33099fd7a20..701a326682e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -169,23 +169,22 @@ static int call_netdevice_notifiers_extack(unsigned long val,
 static struct napi_struct *napi_by_id(unsigned int napi_id);
 
 /*
- * The @dev_base_head list is protected by @dev_base_lock and the rtnl
- * semaphore.
- *
- * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
- *
- * Writers must hold the rtnl semaphore while they loop through the
- * dev_base_head list, and hold dev_base_lock for writing when they do the
- * actual updates.  This allows pure readers to access the list even
- * while a writer is preparing to update it.
- *
- * To put it another way, dev_base_lock is held for writing only to
- * protect against pure readers; the rtnl semaphore provides the
- * protection against other writers.
- *
- * See, for example usages, register_netdevice() and
- * unregister_netdevice(), which must be called with the rtnl
- * semaphore held.
+ * The network interface list of a netns (@net->dev_base_head) and the hash
+ * tables per ifindex (@net->dev_index_head) and per name (@net->dev_name_head)
+ * are protected using the following rules:
+ *
+ * Pure readers should hold rcu_read_lock() which should protect them against
+ * concurrent changes to the interface lists made by the writers. Pure writers
+ * must serialize by holding the RTNL mutex while they loop through the list
+ * and make changes to it.
+ *
+ * It is also possible to hold the global rwlock_t @dev_base_lock for
+ * protection (holding its read side as an alternative to rcu_read_lock, and
+ * its write side as an alternative to the RTNL mutex), however this should not
+ * be done in new code, since it is deprecated and pending removal.
+ *
+ * One other role of @dev_base_lock is to protect against changes in the
+ * operational state of a network interface.
  */
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 03/13] net: introduce a mutex for the netns interface lists
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 02/13] net: mark dev_base_lock for deprecation Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 04/13] s390/appldata_net_sum: hold the netdev lists lock when retrieving device statistics Vladimir Oltean
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel

Currently, any writer that wants to alter the lists of network
interfaces (either the plain list net->dev_base_head, or the hash tables
net->dev_index_head and net->dev_name_head) can keep other writers at
bay using the RTNL mutex.

However, the RTNL mutex has become a very contended resource over the
years, so there is a movement to do finer grained locking.

This patch adds one more way for writers to the network interface lists
to serialize themselves. We assume that all writers to the network
interface lists are easily identifiable because the write side of
dev_base_lock also needs to be held (note that some instances of that
were deliberately skipped, since they only dealt with protecting the
operational state of the netdev).

Holding the RTNL mutex is now optional for new code that alters the
lists, since all relevant writers were made to also hold the new lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/net_namespace.h |  5 +++++
 net/core/dev.c              | 44 +++++++++++++++++++++++++------------
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 29567875f428..90826982843d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -105,6 +105,11 @@ struct net {
 	struct hlist_head	*dev_index_head;
 	struct raw_notifier_head	netdev_chain;
 
+	/* Serializes writers to @dev_base_head, @dev_name_head
+	 * and @dev_index_head
+	 */
+	struct mutex		netdev_lists_lock;
+
 	/* Note that @hash_mix can be read millions times per second,
 	 * it is critical that it is on a read_mostly cache line.
 	 */
diff --git a/net/core/dev.c b/net/core/dev.c
index 701a326682e8..18904acb7797 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -175,13 +175,16 @@ static struct napi_struct *napi_by_id(unsigned int napi_id);
  *
  * Pure readers should hold rcu_read_lock() which should protect them against
  * concurrent changes to the interface lists made by the writers. Pure writers
- * must serialize by holding the RTNL mutex while they loop through the list
- * and make changes to it.
+ * must serialize by holding the @net->netdev_lists_lock mutex while they loop
+ * through the list and make changes to it.
+ *
+ * It is possible to hold the RTNL mutex for serializing the writers too, but
+ * this should be avoided in new code due to lock contention.
  *
  * It is also possible to hold the global rwlock_t @dev_base_lock for
  * protection (holding its read side as an alternative to rcu_read_lock, and
- * its write side as an alternative to the RTNL mutex), however this should not
- * be done in new code, since it is deprecated and pending removal.
+ * its write side as an alternative to @net->netdev_lists_lock), however this
+ * should not be done in new code, since it is deprecated and pending removal.
  *
  * One other role of @dev_base_lock is to protect against changes in the
  * operational state of a network interface.
@@ -360,12 +363,14 @@ static void list_netdevice(struct net_device *dev)
 
 	ASSERT_RTNL();
 
+	mutex_lock(&net->netdev_lists_lock);
 	write_lock_bh(&dev_base_lock);
 	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
 	netdev_name_node_add(net, dev->name_node);
 	hlist_add_head_rcu(&dev->index_hlist,
 			   dev_index_hash(net, dev->ifindex));
 	write_unlock_bh(&dev_base_lock);
+	mutex_unlock(&net->netdev_lists_lock);
 
 	dev_base_seq_inc(net);
 }
@@ -375,16 +380,20 @@ static void list_netdevice(struct net_device *dev)
  */
 static void unlist_netdevice(struct net_device *dev)
 {
+	struct net *net = dev_net(dev);
+
 	ASSERT_RTNL();
 
 	/* Unlink dev from the device chain */
+	mutex_lock(&net->netdev_lists_lock);
 	write_lock_bh(&dev_base_lock);
 	list_del_rcu(&dev->dev_list);
 	netdev_name_node_del(dev->name_node);
 	hlist_del_rcu(&dev->index_hlist);
 	write_unlock_bh(&dev_base_lock);
+	mutex_unlock(&net->netdev_lists_lock);
 
-	dev_base_seq_inc(dev_net(dev));
+	dev_base_seq_inc(net);
 }
 
 /*
@@ -850,11 +859,11 @@ EXPORT_SYMBOL_GPL(dev_fill_metadata_dst);
  *	@net: the applicable net namespace
  *	@name: name to find
  *
- *	Find an interface by name. Must be called under RTNL semaphore
- *	or @dev_base_lock. If the name is found a pointer to the device
- *	is returned. If the name is not found then %NULL is returned. The
- *	reference counters are not incremented so the caller must be
- *	careful with locks.
+ *	Find an interface by name. Must be called under RTNL semaphore,
+ *	@net->netdev_lists_lock or @dev_base_lock. If the name is found,
+ *	a pointer to the device is returned. If the name is not found then
+ *	%NULL is returned. The reference counters are not incremented so the
+ *	caller must be careful with locks.
  */
 
 struct net_device *__dev_get_by_name(struct net *net, const char *name)
@@ -920,8 +929,8 @@ EXPORT_SYMBOL(dev_get_by_name);
  *	Search for an interface by index. Returns %NULL if the device
  *	is not found or a pointer to the device. The device has not
  *	had its reference counter increased so the caller must be careful
- *	about locking. The caller must hold either the RTNL semaphore
- *	or @dev_base_lock.
+ *	about locking. The caller must hold either the RTNL semaphore,
+ *	@net->netdev_lists_lock or @dev_base_lock.
  */
 
 struct net_device *__dev_get_by_index(struct net *net, int ifindex)
@@ -1330,15 +1339,19 @@ int dev_change_name(struct net_device *dev, const char *newname)
 
 	netdev_adjacent_rename_links(dev, oldname);
 
+	mutex_lock(&net->netdev_lists_lock);
 	write_lock_bh(&dev_base_lock);
 	netdev_name_node_del(dev->name_node);
 	write_unlock_bh(&dev_base_lock);
+	mutex_unlock(&net->netdev_lists_lock);
 
 	synchronize_rcu();
 
+	mutex_lock(&net->netdev_lists_lock);
 	write_lock_bh(&dev_base_lock);
 	netdev_name_node_add(net, dev->name_node);
 	write_unlock_bh(&dev_base_lock);
+	mutex_unlock(&net->netdev_lists_lock);
 
 	ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
 	ret = notifier_to_errno(ret);
@@ -9379,8 +9392,9 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
  *	@net: the applicable net namespace
  *
  *	Returns a suitable unique value for a new device interface
- *	number.  The caller must hold the rtnl semaphore or the
- *	dev_base_lock to be sure it remains unique.
+ *	number.
+ *	The caller must hold the rtnl semaphore, @net->netdev_lists_lock or the
+ *	@dev_base_lock to be sure it remains unique.
  */
 static int dev_new_index(struct net *net)
 {
@@ -10958,6 +10972,8 @@ static int __net_init netdev_init(struct net *net)
 	if (net->dev_index_head == NULL)
 		goto err_idx;
 
+	mutex_init(&net->netdev_lists_lock);
+
 	RAW_INIT_NOTIFIER_HEAD(&net->netdev_chain);
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 04/13] s390/appldata_net_sum: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (2 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 03/13] net: introduce a mutex for the netns interface lists Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 05/13] net: bonding: " Vladimir Oltean
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Heiko Carstens, Vasily Gorbik, linux-s390

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

In the case of the appldata driver, an RCU read-side critical section is
used to ensure the integrity of the list of network interfaces, because
the driver iterates through all net devices in the netns to aggregate
statistics. We still need some protection against an interface
registering or deregistering, and the writer-side lock, the netns's
mutex, is fine for that, because it offers sleepable context.

The ops->callback function is called from under appldata_ops_mutex
protection, so this is proof that the context is sleepable and holding
a mutex is therefore fine.

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 arch/s390/appldata/appldata_net_sum.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/s390/appldata/appldata_net_sum.c b/arch/s390/appldata/appldata_net_sum.c
index 59c282ca002f..27e9cc998d4d 100644
--- a/arch/s390/appldata/appldata_net_sum.c
+++ b/arch/s390/appldata/appldata_net_sum.c
@@ -78,8 +78,9 @@ static void appldata_get_net_sum_data(void *data)
 	tx_dropped = 0;
 	collisions = 0;
 
-	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, dev) {
+	mutex_lock(&init_net.netdev_lists_lock);
+
+	for_each_netdev(&init_net, dev) {
 		const struct rtnl_link_stats64 *stats;
 		struct rtnl_link_stats64 temp;
 
@@ -95,7 +96,8 @@ static void appldata_get_net_sum_data(void *data)
 		collisions += stats->collisions;
 		i++;
 	}
-	rcu_read_unlock();
+
+	mutex_unlock(&init_net.netdev_lists_unlock);
 
 	net_data->nr_interfaces = i;
 	net_data->rx_packets = rx_packets;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (3 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 04/13] s390/appldata_net_sum: hold the netdev lists lock when retrieving device statistics Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-07  1:00   ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 06/13] net_failover: " Vladimir Oltean
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

The bonding driver uses an RCU read-side critical section to ensure the
integrity of the list of network interfaces, because the driver iterates
through all net devices in the netns to find the ones which are its
configured slaves. We still need some protection against an interface
registering or deregistering, and the writer-side lock, the netns mutex,
is fine for that, because it offers sleepable context.

This mutex now serves double duty. It offers code serialization,
something which the stats_lock already did. So now that serves no
purpose, let's remove it.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/bonding/bond_main.c | 16 +++++-----------
 include/net/bonding.h           |  1 -
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..f788f9fa1858 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3738,21 +3738,16 @@ static void bond_get_stats(struct net_device *bond_dev,
 			   struct rtnl_link_stats64 *stats)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct net *net = dev_net(bond_dev);
 	struct rtnl_link_stats64 temp;
 	struct list_head *iter;
 	struct slave *slave;
-	int nest_level = 0;
 
+	mutex_lock(&net->netdev_lists_lock);
 
-	rcu_read_lock();
-#ifdef CONFIG_LOCKDEP
-	nest_level = bond_get_lowest_level_rcu(bond_dev);
-#endif
-
-	spin_lock_nested(&bond->stats_lock, nest_level);
 	memcpy(stats, &bond->bond_stats, sizeof(*stats));
 
-	bond_for_each_slave_rcu(bond, slave, iter) {
+	bond_for_each_slave(bond, slave, iter) {
 		const struct rtnl_link_stats64 *new =
 			dev_get_stats(slave->dev, &temp);
 
@@ -3763,8 +3758,8 @@ static void bond_get_stats(struct net_device *bond_dev,
 	}
 
 	memcpy(&bond->bond_stats, stats, sizeof(*stats));
-	spin_unlock(&bond->stats_lock);
-	rcu_read_unlock();
+
+	mutex_unlock(&net->netdev_lists_lock);
 }
 
 static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd)
@@ -5192,7 +5187,6 @@ static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
-	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
 	list_add_tail(&bond->bond_list, &bn->dev_list);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index d9d0ff3b0ad3..6fbde9713424 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -224,7 +224,6 @@ struct bonding {
 	 * ALB mode (6) - to sync the use and modifications of its hash table
 	 */
 	spinlock_t mode_lock;
-	spinlock_t stats_lock;
 	u8	 send_peer_notif;
 	u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 06/13] net_failover: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (4 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 05/13] net: bonding: " Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-07  2:22   ` kernel test robot
  2020-12-06 23:59 ` [RFC PATCH net-next 07/13] parisc/led: remove trailing whitespaces Vladimir Oltean
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Sridhar Samudrala

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

The net_failover driver makes copious abuse of RCU protection for the
slave interfaces, which is probably unnecessary given the fact that it
already calls dev_hold on slave interfaces. Nonetheless, to avoid
regressions, we still need to offer the same level of protection against
unregistering the standby and primary devices. We can achieve this by
holding the netns mutex, which gives us the sleepable context that
dev_get_stats() wants to see. Holding this mutex also removes the need
for a separate lock for statistics.

Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/net_failover.c | 15 +++++++--------
 include/net/net_failover.h |  3 ---
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 2a4892402ed8..90db0358bc1d 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -184,32 +184,31 @@ static void net_failover_get_stats(struct net_device *dev,
 {
 	struct net_failover_info *nfo_info = netdev_priv(dev);
 	const struct rtnl_link_stats64 *new;
+	struct net *net = dev_net(dev);
 	struct rtnl_link_stats64 temp;
 	struct net_device *slave_dev;
 
-	spin_lock(&nfo_info->stats_lock);
-	memcpy(stats, &nfo_info->failover_stats, sizeof(*stats));
+	mutex_lock(&net->netdev_lists_lock);
 
-	rcu_read_lock();
+	memcpy(stats, &nfo_info->failover_stats, sizeof(*stats));
 
-	slave_dev = rcu_dereference(nfo_info->primary_dev);
+	slave_dev = nfo_info->primary_dev;
 	if (slave_dev) {
 		new = dev_get_stats(slave_dev, &temp);
 		net_failover_fold_stats(stats, new, &nfo_info->primary_stats);
 		memcpy(&nfo_info->primary_stats, new, sizeof(*new));
 	}
 
-	slave_dev = rcu_dereference(nfo_info->standby_dev);
+	slave_dev = nfo_info->standby_dev;
 	if (slave_dev) {
 		new = dev_get_stats(slave_dev, &temp);
 		net_failover_fold_stats(stats, new, &nfo_info->standby_stats);
 		memcpy(&nfo_info->standby_stats, new, sizeof(*new));
 	}
 
-	rcu_read_unlock();
-
 	memcpy(&nfo_info->failover_stats, stats, sizeof(*stats));
-	spin_unlock(&nfo_info->stats_lock);
+
+	mutex_unlock(&net->netdev_lists_lock);
 }
 
 static int net_failover_change_mtu(struct net_device *dev, int new_mtu)
diff --git a/include/net/net_failover.h b/include/net/net_failover.h
index b12a1c469d1c..1e0089800a28 100644
--- a/include/net/net_failover.h
+++ b/include/net/net_failover.h
@@ -22,9 +22,6 @@ struct net_failover_info {
 
 	/* aggregated stats */
 	struct rtnl_link_stats64 failover_stats;
-
-	/* spinlock while updating stats */
-	spinlock_t stats_lock;
 };
 
 struct failover *net_failover_create(struct net_device *standby_dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 07/13] parisc/led: remove trailing whitespaces
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (5 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 06/13] net_failover: " Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 08/13] parisc/led: reindent the code that gathers device statistics Vladimir Oltean
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, James E.J. Bottomley, Helge Deller, linux-parisc

This file looks bad in text editors.

Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/parisc/led.c | 128 +++++++++++++++++++++----------------------
 1 file changed, 64 insertions(+), 64 deletions(-)

diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index 36c6613f7a36..676a12bb94c9 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -47,10 +47,10 @@
 #include <asm/pdc.h>
 #include <linux/uaccess.h>
 
-/* The control of the LEDs and LCDs on PARISC-machines have to be done 
+/* The control of the LEDs and LCDs on PARISC-machines have to be done
    completely in software. The necessary calculations are done in a work queue
-   task which is scheduled regularly, and since the calculations may consume a 
-   relatively large amount of CPU time, some of the calculations can be 
+   task which is scheduled regularly, and since the calculations may consume a
+   relatively large amount of CPU time, some of the calculations can be
    turned off with the following variables (controlled via procfs) */
 
 static int led_type __read_mostly = -1;
@@ -80,7 +80,7 @@ struct lcd_block {
 };
 
 /* Structure returned by PDC_RETURN_CHASSIS_INFO */
-/* NOTE: we use unsigned long:16 two times, since the following member 
+/* NOTE: we use unsigned long:16 two times, since the following member
    lcd_cmd_reg_addr needs to be 64bit aligned on 64bit PA2.0-machines */
 struct pdc_chassis_lcd_info_ret_block {
 	unsigned long model:16;		/* DISPLAY_MODEL_XXXX */
@@ -103,7 +103,7 @@ struct pdc_chassis_lcd_info_ret_block {
 #define KITTYHAWK_LCD_CMD  F_EXTEND(0xf0190000UL) /* 64bit-ready */
 #define KITTYHAWK_LCD_DATA (KITTYHAWK_LCD_CMD+1)
 
-/* lcd_info is pre-initialized to the values needed to program KittyHawk LCD's 
+/* lcd_info is pre-initialized to the values needed to program KittyHawk LCD's
  * HP seems to have used Sharp/Hitachi HD44780 LCDs most of the time. */
 static struct pdc_chassis_lcd_info_ret_block
 lcd_info __attribute__((aligned(8))) __read_mostly =
@@ -119,16 +119,16 @@ lcd_info __attribute__((aligned(8))) __read_mostly =
 
 
 /* direct access to some of the lcd_info variables */
-#define LCD_CMD_REG	lcd_info.lcd_cmd_reg_addr	 
-#define LCD_DATA_REG	lcd_info.lcd_data_reg_addr	 
+#define LCD_CMD_REG	lcd_info.lcd_cmd_reg_addr
+#define LCD_DATA_REG	lcd_info.lcd_data_reg_addr
 #define LED_DATA_REG	lcd_info.lcd_cmd_reg_addr	/* LASI & ASP only */
 
 #define LED_HASLCD 1
 #define LED_NOLCD  0
 
 /* The workqueue must be created at init-time */
-static int start_task(void) 
-{	
+static int start_task(void)
+{
 	/* Display the default text now */
 	if (led_type == LED_HASLCD) lcd_print( lcd_text_default );
 
@@ -136,7 +136,7 @@ static int start_task(void)
 	if (lcd_no_led_support) return 0;
 
 	/* Create the work queue and queue the LED task */
-	led_wq = create_singlethread_workqueue("led_wq");	
+	led_wq = create_singlethread_workqueue("led_wq");
 	queue_delayed_work(led_wq, &led_task, 0);
 
 	return 0;
@@ -214,14 +214,14 @@ static ssize_t led_proc_write(struct file *file, const char __user *buf,
 	case LED_HASLCD:
 		if (*cur && cur[strlen(cur)-1] == '\n')
 			cur[strlen(cur)-1] = 0;
-		if (*cur == 0) 
+		if (*cur == 0)
 			cur = lcd_text_default;
 		lcd_print(cur);
 		break;
 	default:
 		return 0;
 	}
-	
+
 	return count;
 
 parse_error:
@@ -267,9 +267,9 @@ static int __init led_create_procfs(void)
 #endif
 
 /*
-   ** 
+   **
    ** led_ASP_driver()
-   ** 
+   **
  */
 #define	LED_DATA	0x01	/* data to shift (0:on 1:off) */
 #define	LED_STROBE	0x02	/* strobe to clock data */
@@ -289,9 +289,9 @@ static void led_ASP_driver(unsigned char leds)
 
 
 /*
-   ** 
+   **
    ** led_LASI_driver()
-   ** 
+   **
  */
 static void led_LASI_driver(unsigned char leds)
 {
@@ -301,16 +301,16 @@ static void led_LASI_driver(unsigned char leds)
 
 
 /*
-   ** 
+   **
    ** led_LCD_driver()
-   **   
+   **
  */
 static void led_LCD_driver(unsigned char leds)
 {
 	static int i;
 	static unsigned char mask[4] = { LED_HEARTBEAT, LED_DISK_IO,
 		LED_LAN_RCV, LED_LAN_TX };
-	
+
 	static struct lcd_block * blockp[4] = {
 		&lcd_info.heartbeat,
 		&lcd_info.disk_io,
@@ -320,15 +320,15 @@ static void led_LCD_driver(unsigned char leds)
 
 	/* Convert min_cmd_delay to milliseconds */
 	unsigned int msec_cmd_delay = 1 + (lcd_info.min_cmd_delay / 1000);
-	
-	for (i=0; i<4; ++i) 
+
+	for (i=0; i<4; ++i)
 	{
-		if ((leds & mask[i]) != (lastleds & mask[i])) 
+		if ((leds & mask[i]) != (lastleds & mask[i]))
 		{
 			gsc_writeb( blockp[i]->command, LCD_CMD_REG );
 			msleep(msec_cmd_delay);
-			
-			gsc_writeb( leds & mask[i] ? blockp[i]->on : 
+
+			gsc_writeb( leds & mask[i] ? blockp[i]->on :
 					blockp[i]->off, LCD_DATA_REG );
 			msleep(msec_cmd_delay);
 		}
@@ -337,15 +337,15 @@ static void led_LCD_driver(unsigned char leds)
 
 
 /*
-   ** 
+   **
    ** led_get_net_activity()
-   ** 
+   **
    ** calculate if there was TX- or RX-throughput on the network interfaces
    ** (analog to dev_get_info() from net/core/dev.c)
-   **   
+   **
  */
 static __inline__ int led_get_net_activity(void)
-{ 
+{
 #ifndef CONFIG_NET
 	return 0;
 #else
@@ -355,7 +355,7 @@ static __inline__ int led_get_net_activity(void)
 	int retval;
 
 	rx_total = tx_total = 0;
-	
+
 	/* we are running as a workqueue task, so we can use an RCU lookup */
 	rcu_read_lock();
 	for_each_netdev_rcu(&init_net, dev) {
@@ -390,14 +390,14 @@ static __inline__ int led_get_net_activity(void)
 
 
 /*
-   ** 
+   **
    ** led_get_diskio_activity()
-   ** 
+   **
    ** calculate if there was disk-io in the system
-   **   
+   **
  */
 static __inline__ int led_get_diskio_activity(void)
-{	
+{
 	static unsigned long last_pgpgin, last_pgpgout;
 	unsigned long events[NR_VM_EVENT_ITEMS];
 	int changed;
@@ -418,7 +418,7 @@ static __inline__ int led_get_diskio_activity(void)
 
 /*
    ** led_work_func()
-   ** 
+   **
    ** manages when and which chassis LCD/LED gets updated
 
     TODO:
@@ -453,9 +453,9 @@ static void led_work_func (struct work_struct *unused)
 		/* flash heartbeat-LED like a real heart
 		 * (2 x short then a long delay)
 		 */
-		if (count_HZ < HEARTBEAT_LEN || 
+		if (count_HZ < HEARTBEAT_LEN ||
 				(count_HZ >= HEARTBEAT_2ND_RANGE_START &&
-				count_HZ < HEARTBEAT_2ND_RANGE_END)) 
+				count_HZ < HEARTBEAT_2ND_RANGE_END))
 			currentleds |= LED_HEARTBEAT;
 	}
 
@@ -488,10 +488,10 @@ static void led_work_func (struct work_struct *unused)
 
 /*
    ** led_halt()
-   ** 
+   **
    ** called by the reboot notifier chain at shutdown and stops all
    ** LED/LCD activities.
-   ** 
+   **
  */
 
 static int led_halt(struct notifier_block *, unsigned long, void *);
@@ -501,7 +501,7 @@ static struct notifier_block led_notifier = {
 };
 static int notifier_disabled = 0;
 
-static int led_halt(struct notifier_block *nb, unsigned long event, void *buf) 
+static int led_halt(struct notifier_block *nb, unsigned long event, void *buf)
 {
 	char *txt;
 
@@ -518,45 +518,45 @@ static int led_halt(struct notifier_block *nb, unsigned long event, void *buf)
 				break;
 	default:		return NOTIFY_DONE;
 	}
-	
+
 	/* Cancel the work item and delete the queue */
 	if (led_wq) {
 		cancel_delayed_work_sync(&led_task);
 		destroy_workqueue(led_wq);
 		led_wq = NULL;
 	}
- 
+
 	if (lcd_info.model == DISPLAY_MODEL_LCD)
 		lcd_print(txt);
 	else
 		if (led_func_ptr)
 			led_func_ptr(0xff); /* turn all LEDs ON */
-	
+
 	return NOTIFY_OK;
 }
 
 /*
    ** register_led_driver()
-   ** 
+   **
    ** registers an external LED or LCD for usage by this driver.
    ** currently only LCD-, LASI- and ASP-style LCD/LED's are supported.
-   ** 
+   **
  */
 
 int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long data_reg)
 {
 	static int initialized;
-	
+
 	if (initialized || !data_reg)
 		return 1;
-	
+
 	lcd_info.model = model;		/* store the values */
 	LCD_CMD_REG = (cmd_reg == LED_CMD_REG_NONE) ? 0 : cmd_reg;
 
 	switch (lcd_info.model) {
 	case DISPLAY_MODEL_LCD:
 		LCD_DATA_REG = data_reg;
-		printk(KERN_INFO "LCD display at %lx,%lx registered\n", 
+		printk(KERN_INFO "LCD display at %lx,%lx registered\n",
 			LCD_CMD_REG , LCD_DATA_REG);
 		led_func_ptr = led_LCD_driver;
 		led_type = LED_HASLCD;
@@ -575,7 +575,7 @@ int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long d
 	case DISPLAY_MODEL_OLD_ASP:
 		LED_DATA_REG = data_reg;
 		led_func_ptr = led_ASP_driver;
-		printk(KERN_INFO "LED (ASP-style) display at %lx registered\n", 
+		printk(KERN_INFO "LED (ASP-style) display at %lx registered\n",
 		    LED_DATA_REG);
 		led_type = LED_NOLCD;
 		break;
@@ -585,8 +585,8 @@ int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long d
 		       __func__, lcd_info.model);
 		return 1;
 	}
-	
-	/* mark the LCD/LED driver now as initialized and 
+
+	/* mark the LCD/LED driver now as initialized and
 	 * register to the reboot notifier chain */
 	initialized++;
 	register_reboot_notifier(&led_notifier);
@@ -601,11 +601,11 @@ int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long d
 
 /*
    ** register_led_regions()
-   ** 
+   **
    ** register_led_regions() registers the LCD/LED regions for /procfs.
-   ** At bootup - where the initialisation of the LCD/LED normally happens - 
+   ** At bootup - where the initialisation of the LCD/LED normally happens -
    ** not all internal structures of request_region() are properly set up,
-   ** so that we delay the led-registration until after busdevices_init() 
+   ** so that we delay the led-registration until after busdevices_init()
    ** has been executed.
    **
  */
@@ -626,9 +626,9 @@ void __init register_led_regions(void)
 
 
 /*
-   ** 
+   **
    ** lcd_print()
-   ** 
+   **
    ** Displays the given string on the LCD-Display of newer machines.
    ** lcd_print() disables/enables the timer-based led work queue to
    ** avoid a race condition while writing the CMD/DATA register pair.
@@ -640,7 +640,7 @@ int lcd_print( const char *str )
 
 	if (!led_func_ptr || lcd_info.model != DISPLAY_MODEL_LCD)
 	    return 0;
-	
+
 	/* temporarily disable the led work task */
 	if (led_wq)
 		cancel_delayed_work_sync(&led_task);
@@ -660,7 +660,7 @@ int lcd_print( const char *str )
 		gsc_writeb(' ', LCD_DATA_REG);
 	    udelay(lcd_info.min_cmd_delay);
 	}
-	
+
 	/* re-queue the work */
 	if (led_wq) {
 		queue_delayed_work(led_wq, &led_task, 0);
@@ -671,8 +671,8 @@ int lcd_print( const char *str )
 
 /*
    ** led_init()
-   ** 
-   ** led_init() is called very early in the bootup-process from setup.c 
+   **
+   ** led_init() is called very early in the bootup-process from setup.c
    ** and asks the PDC for an usable chassis LCD or LED.
    ** If the PDC doesn't return any info, then the LED
    ** is detected by lasi.c or asp.c and registered with the
@@ -715,20 +715,20 @@ int __init led_init(void)
 			 (lcd_info.model==DISPLAY_MODEL_LCD) ? "LCD" :
 			  (lcd_info.model==DISPLAY_MODEL_LASI) ? "LED" : "unknown",
 			 lcd_info.lcd_width, lcd_info.min_cmd_delay,
-			 __FILE__, sizeof(lcd_info), 
+			 __FILE__, sizeof(lcd_info),
 			 chassis_info.actcnt, chassis_info.maxcnt));
 		DPRINTK((KERN_INFO "%s: cmd=%p, data=%p, reset1=%x, reset2=%x, act_enable=%d\n",
-			__FILE__, lcd_info.lcd_cmd_reg_addr, 
-			lcd_info.lcd_data_reg_addr, lcd_info.reset_cmd1,  
+			__FILE__, lcd_info.lcd_cmd_reg_addr,
+			lcd_info.lcd_data_reg_addr, lcd_info.reset_cmd1,
 			lcd_info.reset_cmd2, lcd_info.act_enable ));
-	
+
 		/* check the results. Some machines have a buggy PDC */
 		if (chassis_info.actcnt <= 0 || chassis_info.actcnt != chassis_info.maxcnt)
 			goto not_found;
 
 		switch (lcd_info.model) {
 		case DISPLAY_MODEL_LCD:		/* LCD display */
-			if (chassis_info.actcnt < 
+			if (chassis_info.actcnt <
 				offsetof(struct pdc_chassis_lcd_info_ret_block, _pad)-1)
 				goto not_found;
 			if (!lcd_info.act_enable) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 08/13] parisc/led: reindent the code that gathers device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (6 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 07/13] parisc/led: remove trailing whitespaces Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, James E.J. Bottomley, Helge Deller, linux-parisc

The standard in the Linux kernel is to use one tab character per
indentation level.

Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/parisc/led.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index 676a12bb94c9..b7005aaa782b 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -359,16 +359,19 @@ static __inline__ int led_get_net_activity(void)
 	/* we are running as a workqueue task, so we can use an RCU lookup */
 	rcu_read_lock();
 	for_each_netdev_rcu(&init_net, dev) {
-	    const struct rtnl_link_stats64 *stats;
-	    struct rtnl_link_stats64 temp;
-	    struct in_device *in_dev = __in_dev_get_rcu(dev);
-	    if (!in_dev || !in_dev->ifa_list)
-		continue;
-	    if (ipv4_is_loopback(in_dev->ifa_list->ifa_local))
-		continue;
-	    stats = dev_get_stats(dev, &temp);
-	    rx_total += stats->rx_packets;
-	    tx_total += stats->tx_packets;
+		const struct rtnl_link_stats64 *stats;
+		struct rtnl_link_stats64 temp;
+		struct in_device *in_dev = __in_dev_get_rcu(dev);
+
+		if (!in_dev || !in_dev->ifa_list)
+			continue;
+
+		if (ipv4_is_loopback(in_dev->ifa_list->ifa_local))
+			continue;
+
+		stats = dev_get_stats(dev, &temp);
+		rx_total += stats->rx_packets;
+		tx_total += stats->tx_packets;
 	}
 	rcu_read_unlock();
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (7 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 08/13] parisc/led: reindent the code that gathers device statistics Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-07  6:14   ` kernel test robot
  2020-12-06 23:59 ` [RFC PATCH net-next 10/13] net: procfs: " Vladimir Oltean
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, James E.J. Bottomley, Helge Deller, linux-parisc

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

The LED driver for HP-PARISC workstations uses a workqueue to
periodically check for updates in network interface statistics, and
flicker when those have changed (i.e. there has been activity on the
line). Ignoring the fact that this driver is completely duplicating
drivers/leds/trigger/ledtrig-netdev.c, there is an even bigger problem.
Now, the dev_get_stats call can sleep, and iterating through the list of
network interfaces still needs to ensure the integrity of list of
network interfaces. So that leaves us only one locking option given the
current design of the network stack, and that is the netns mutex.

Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/parisc/led.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index b7005aaa782b..1289dd3ea6e4 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -38,7 +38,6 @@
 #include <linux/ctype.h>
 #include <linux/blkdev.h>
 #include <linux/workqueue.h>
-#include <linux/rcupdate.h>
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/hardware.h>
@@ -356,24 +355,28 @@ static __inline__ int led_get_net_activity(void)
 
 	rx_total = tx_total = 0;
 
-	/* we are running as a workqueue task, so we can use an RCU lookup */
-	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, dev) {
+	/* we are running as a workqueue task, so we can sleep */
+	mutex_lock(&init_net->netdev_lists_lock);
+
+	for_each_netdev(&init_net, dev) {
+		struct in_device *in_dev = in_dev_get(dev);
 		const struct rtnl_link_stats64 *stats;
 		struct rtnl_link_stats64 temp;
-		struct in_device *in_dev = __in_dev_get_rcu(dev);
 
-		if (!in_dev || !in_dev->ifa_list)
+		if (!in_dev || !in_dev->ifa_list ||
+		    ipv4_is_loopback(in_dev->ifa_list->ifa_local)) {
+			in_dev_put(in_dev);
 			continue;
+		}
 
-		if (ipv4_is_loopback(in_dev->ifa_list->ifa_local))
-			continue;
+		in_dev_put(in_dev);
 
 		stats = dev_get_stats(dev, &temp);
 		rx_total += stats->rx_packets;
 		tx_total += stats->tx_packets;
 	}
-	rcu_read_unlock();
+
+	mutex_unlock(&init_net->netdev_lists_lock);
 
 	retval = 0;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 10/13] net: procfs: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (8 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 11/13] net: sysfs: don't hold dev_base_lock while " Vladimir Oltean
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

The /proc/net/dev file uses an RCU read-side critical section to ensure
the integrity of the list of network interfaces, because it iterates
through all net devices in the netns to show their statistics.

To offer the equivalent protection against an interface registering or
deregistering, while also remaining in sleepable context, we can use the
netns mutex for the interface lists.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/net-procfs.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index c714e6a9dad4..83f8a89dfc5a 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -21,7 +21,7 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff
 	unsigned int count = 0, offset = get_offset(*pos);
 
 	h = &net->dev_index_head[get_bucket(*pos)];
-	hlist_for_each_entry_rcu(dev, h, index_hlist) {
+	hlist_for_each_entry(dev, h, index_hlist) {
 		if (++count == offset)
 			return dev;
 	}
@@ -51,9 +51,11 @@ static inline struct net_device *dev_from_bucket(struct seq_file *seq, loff_t *p
  *	in detail.
  */
 static void *dev_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(RCU)
 {
-	rcu_read_lock();
+	struct net *net = seq_file_net(seq);
+
+	mutex_lock(&net->netdev_lists_lock);
+
 	if (!*pos)
 		return SEQ_START_TOKEN;
 
@@ -70,9 +72,10 @@ static void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 }
 
 static void dev_seq_stop(struct seq_file *seq, void *v)
-	__releases(RCU)
 {
-	rcu_read_unlock();
+	struct net *net = seq_file_net(seq);
+
+	mutex_unlock(&net->netdev_lists_lock);
 }
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 11/13] net: sysfs: don't hold dev_base_lock while retrieving device statistics
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (9 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 10/13] net: procfs: " Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 12/13] net: mark ndo_get_stats64 as being able to sleep Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 13/13] net: remove obsolete comments about ndo_get_stats64 context from eth drivers Vladimir Oltean
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Christian Brauner

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

I need to preface this by saying that I have no idea why netstat_show
takes the dev_base_lock rwlock. Two things can be observed:
(a) it does not appear to be due to dev_isalive requiring it for some
    reason, because broadcast_show() also calls dev_isalive() and has
    had no problem existing since the beginning of git.
(b) the dev_get_stats function definitely does not need dev_base_lock
    protection either. In fact, holding the dev_base_lock is the entire
    problem here, because we want to make dev_get_stats sleepable, and
    holding a rwlock gives us atomic context.

So since no protection seems to be necessary, just run unlocked while
retrieving the /sys/class/net/eth0/statistics/* values.

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/net-sysfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 999b70c59761..0782a476b424 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -585,14 +585,13 @@ static ssize_t netstat_show(const struct device *d,
 	WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
 		offset % sizeof(u64) != 0);
 
-	read_lock(&dev_base_lock);
 	if (dev_isalive(dev)) {
 		struct rtnl_link_stats64 temp;
 		const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
 		ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset));
 	}
-	read_unlock(&dev_base_lock);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 12/13] net: mark ndo_get_stats64 as being able to sleep
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (10 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 11/13] net: sysfs: don't hold dev_base_lock while " Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  2020-12-06 23:59 ` [RFC PATCH net-next 13/13] net: remove obsolete comments about ndo_get_stats64 context from eth drivers Vladimir Oltean
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel

Now that all callers have been converted to not use atomic context when
calling dev_get_stats, it is time to update the documentation and put a
notice in the function that it expects process context.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 Documentation/networking/netdevices.rst | 4 ++--
 Documentation/networking/statistics.rst | 9 ++++-----
 net/core/dev.c                          | 2 ++
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 5a85fcc80c76..9d005cbf84f7 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -64,8 +64,8 @@ ndo_do_ioctl:
 	Context: process
 
 ndo_get_stats:
-	Synchronization: dev_base_lock rwlock.
-	Context: nominally process, but don't sleep inside an rwlock
+	Synchronization: none. rtnl_lock() might be held, but not guaranteed.
+	Context: process
 
 ndo_start_xmit:
 	Synchronization: __netif_tx_lock spinlock.
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index 234abedc29b2..ad3e353df0dd 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -155,11 +155,10 @@ Drivers must ensure best possible compliance with
 Please note for example that detailed error statistics must be
 added into the general `rx_error` / `tx_error` counters.
 
-The `.ndo_get_stats64` callback can not sleep because of accesses
-via `/proc/net/dev`. If driver may sleep when retrieving the statistics
-from the device it should do so periodically asynchronously and only return
-a recent copy from `.ndo_get_stats64`. Ethtool interrupt coalescing interface
-allows setting the frequency of refreshing statistics, if needed.
+Drivers may sleep when retrieving the statistics from the device, or they might
+read the counters periodically and only return in `.ndo_get_stats64` a recent
+copy collected asynchronously. In the latter case, the ethtool interrupt
+coalescing interface allows setting the frequency of refreshing statistics.
 
 Retrieving ethtool statistics is a multi-syscall process, drivers are advised
 to keep the number of statistics constant to avoid race conditions with
diff --git a/net/core/dev.c b/net/core/dev.c
index 18904acb7797..45a845526b64 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10367,6 +10367,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
+	might_sleep();
+
 	if (ops->ndo_get_stats64) {
 		memset(storage, 0, sizeof(*storage));
 		ops->ndo_get_stats64(dev, storage);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 13/13] net: remove obsolete comments about ndo_get_stats64 context from eth drivers
  2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
                   ` (11 preceding siblings ...)
  2020-12-06 23:59 ` [RFC PATCH net-next 12/13] net: mark ndo_get_stats64 as being able to sleep Vladimir Oltean
@ 2020-12-06 23:59 ` Vladimir Oltean
  12 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-06 23:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel

Now that we have a good summary in Documentation/networking/netdevices.rst,
these comments serve no purpose and are actually distracting/confusing.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 1 -
 drivers/net/ethernet/nvidia/forcedeth.c     | 2 --
 drivers/net/ethernet/sfc/efx_common.c       | 1 -
 drivers/net/ethernet/sfc/falcon/efx.c       | 1 -
 4 files changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index fb269d587b74..7425f94f9091 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -870,7 +870,6 @@ static netdev_tx_t enic_hard_start_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
-/* dev_base_lock rwlock held, nominally process context */
 static void enic_get_stats(struct net_device *netdev,
 			   struct rtnl_link_stats64 *net_stats)
 {
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 8724d6a9ed02..8fa254dc64e9 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1761,8 +1761,6 @@ static void nv_get_stats(int cpu, struct fe_priv *np,
 /*
  * nv_get_stats64: dev->ndo_get_stats64 function
  * Get latest stats value from the nic.
- * Called with read_lock(&dev_base_lock) held for read -
- * only synchronized against unregister_netdevice.
  */
 static void
 nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
index de797e1ac5a9..4d8047b35fb2 100644
--- a/drivers/net/ethernet/sfc/efx_common.c
+++ b/drivers/net/ethernet/sfc/efx_common.c
@@ -596,7 +596,6 @@ void efx_stop_all(struct efx_nic *efx)
 	efx_stop_datapath(efx);
 }
 
-/* Context: process, dev_base_lock or RTNL held, non-blocking. */
 void efx_net_stats(struct net_device *net_dev, struct rtnl_link_stats64 *stats)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index f8979991970e..6db2b6583dec 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -2096,7 +2096,6 @@ int ef4_net_stop(struct net_device *net_dev)
 	return 0;
 }
 
-/* Context: process, dev_base_lock or RTNL held, non-blocking. */
 static void ef4_net_stats(struct net_device *net_dev,
 			  struct rtnl_link_stats64 *stats)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 ` [RFC PATCH net-next 05/13] net: bonding: " Vladimir Oltean
@ 2020-12-07  1:00   ` Vladimir Oltean
  2020-12-07 15:22     ` Vladimir Oltean
  2020-12-08 23:57     ` Jakub Kicinski
  0 siblings, 2 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-07  1:00 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek

On Mon, Dec 07, 2020 at 01:59:11AM +0200, Vladimir Oltean wrote:
> In the effort of making .ndo_get_stats64 be able to sleep, we need to
> ensure the callers of dev_get_stats do not use atomic context.
>
> The bonding driver uses an RCU read-side critical section to ensure the
> integrity of the list of network interfaces, because the driver iterates
> through all net devices in the netns to find the ones which are its
> configured slaves. We still need some protection against an interface
> registering or deregistering, and the writer-side lock, the netns mutex,
> is fine for that, because it offers sleepable context.
>
> This mutex now serves double duty. It offers code serialization,
> something which the stats_lock already did. So now that serves no
> purpose, let's remove it.
>
> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

There is a very obvious deadlock here which happens when we have
bond-over-bond and the upper calls dev_get_stats from the lower.

Conceptually, the same can happen even in any number of stacking
combinations between bonding, net_failover, [ insert any other driver
that takes net->netdev_lists_lock here ].

There would be two approaches trying to solve this issue:
- using mutex_lock_nested where we aren't sure that we are top level
- ensuring through convention that user space always takes
  net->netdev_lists_lock when calling dev_get_stats, and documenting
  that, and therefore making it unnecessary to lock in bonding.

I took neither of the two approaches (I don't really like either one too
much), hence [ one of ] the reasons for the RFC. Comments?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 06/13] net_failover: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 ` [RFC PATCH net-next 06/13] net_failover: " Vladimir Oltean
@ 2020-12-07  2:22   ` kernel test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2020-12-07  2:22 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3352 bytes --]

Hi Vladimir,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vladimir-Oltean/Make-ndo_get_stats64-sleepable/20201207-080606
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 4054eebf0fb07b010098adcdea1e1d3978490b9a
config: i386-randconfig-s001-20201207 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-179-ga00755aa-dirty
        # https://github.com/0day-ci/linux/commit/6ffc75ea1bd719ef7329193f8e3bd056ae0cc9ab
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vladimir-Oltean/Make-ndo_get_stats64-sleepable/20201207-080606
        git checkout 6ffc75ea1bd719ef7329193f8e3bd056ae0cc9ab
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/net/net_failover.c:195:19: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct net_device *slave_dev @@     got struct net_device [noderef] __rcu *primary_dev @@
   drivers/net/net_failover.c:195:19: sparse:     expected struct net_device *slave_dev
   drivers/net/net_failover.c:195:19: sparse:     got struct net_device [noderef] __rcu *primary_dev
>> drivers/net/net_failover.c:202:19: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct net_device *slave_dev @@     got struct net_device [noderef] __rcu *standby_dev @@
   drivers/net/net_failover.c:202:19: sparse:     expected struct net_device *slave_dev
   drivers/net/net_failover.c:202:19: sparse:     got struct net_device [noderef] __rcu *standby_dev

vim +195 drivers/net/net_failover.c

   181	
   182	static void net_failover_get_stats(struct net_device *dev,
   183					   struct rtnl_link_stats64 *stats)
   184	{
   185		struct net_failover_info *nfo_info = netdev_priv(dev);
   186		const struct rtnl_link_stats64 *new;
   187		struct net *net = dev_net(dev);
   188		struct rtnl_link_stats64 temp;
   189		struct net_device *slave_dev;
   190	
   191		mutex_lock(&net->netdev_lists_lock);
   192	
   193		memcpy(stats, &nfo_info->failover_stats, sizeof(*stats));
   194	
 > 195		slave_dev = nfo_info->primary_dev;
   196		if (slave_dev) {
   197			new = dev_get_stats(slave_dev, &temp);
   198			net_failover_fold_stats(stats, new, &nfo_info->primary_stats);
   199			memcpy(&nfo_info->primary_stats, new, sizeof(*new));
   200		}
   201	
 > 202		slave_dev = nfo_info->standby_dev;
   203		if (slave_dev) {
   204			new = dev_get_stats(slave_dev, &temp);
   205			net_failover_fold_stats(stats, new, &nfo_info->standby_stats);
   206			memcpy(&nfo_info->standby_stats, new, sizeof(*new));
   207		}
   208	
   209		memcpy(&nfo_info->failover_stats, stats, sizeof(*stats));
   210	
   211		mutex_unlock(&net->netdev_lists_lock);
   212	}
   213	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 38615 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving device statistics
  2020-12-06 23:59 ` [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
@ 2020-12-07  6:14   ` kernel test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2020-12-07  6:14 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3518 bytes --]

Hi Vladimir,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vladimir-Oltean/Make-ndo_get_stats64-sleepable/20201207-080606
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 4054eebf0fb07b010098adcdea1e1d3978490b9a
config: parisc-randconfig-c004-20201207 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4c4eb38dca7fc52d77a0b9f2866090f780a6bc2d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vladimir-Oltean/Make-ndo_get_stats64-sleepable/20201207-080606
        git checkout 4c4eb38dca7fc52d77a0b9f2866090f780a6bc2d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/parisc/led.c: In function 'led_get_net_activity':
>> drivers/parisc/led.c:359:22: error: invalid type argument of '->' (have 'struct net')
     359 |  mutex_lock(&init_net->netdev_lists_lock);
         |                      ^~
   drivers/parisc/led.c:379:24: error: invalid type argument of '->' (have 'struct net')
     379 |  mutex_unlock(&init_net->netdev_lists_lock);
         |                        ^~

vim +359 drivers/parisc/led.c

   336	
   337	
   338	/*
   339	   **
   340	   ** led_get_net_activity()
   341	   **
   342	   ** calculate if there was TX- or RX-throughput on the network interfaces
   343	   ** (analog to dev_get_info() from net/core/dev.c)
   344	   **
   345	 */
   346	static __inline__ int led_get_net_activity(void)
   347	{
   348	#ifndef CONFIG_NET
   349		return 0;
   350	#else
   351		static u64 rx_total_last, tx_total_last;
   352		u64 rx_total, tx_total;
   353		struct net_device *dev;
   354		int retval;
   355	
   356		rx_total = tx_total = 0;
   357	
   358		/* we are running as a workqueue task, so we can sleep */
 > 359		mutex_lock(&init_net->netdev_lists_lock);
   360	
   361		for_each_netdev(&init_net, dev) {
   362			struct in_device *in_dev = in_dev_get(dev);
   363			const struct rtnl_link_stats64 *stats;
   364			struct rtnl_link_stats64 temp;
   365	
   366			if (!in_dev || !in_dev->ifa_list ||
   367			    ipv4_is_loopback(in_dev->ifa_list->ifa_local)) {
   368				in_dev_put(in_dev);
   369				continue;
   370			}
   371	
   372			in_dev_put(in_dev);
   373	
   374			stats = dev_get_stats(dev, &temp);
   375			rx_total += stats->rx_packets;
   376			tx_total += stats->tx_packets;
   377		}
   378	
   379		mutex_unlock(&init_net->netdev_lists_lock);
   380	
   381		retval = 0;
   382	
   383		if (rx_total != rx_total_last) {
   384			rx_total_last = rx_total;
   385			retval |= LED_LAN_RCV;
   386		}
   387	
   388		if (tx_total != tx_total_last) {
   389			tx_total_last = tx_total;
   390			retval |= LED_LAN_TX;
   391		}
   392	
   393		return retval;
   394	#endif
   395	}
   396	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 26870 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-07  1:00   ` Vladimir Oltean
@ 2020-12-07 15:22     ` Vladimir Oltean
  2020-12-08 23:57     ` Jakub Kicinski
  1 sibling, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-07 15:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Andrew Lunn, Florian Fainelli, Paul Gortmaker,
	Pablo Neira Ayuso, Jiri Benc, Cong Wang, Jamal Hadi Salim,
	Stephen Hemminger, Eric Dumazet, George McCollister,
	Oleksij Rempel, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek

On Mon, Dec 07, 2020 at 03:00:40AM +0200, Vladimir Oltean wrote:
> There is a very obvious deadlock here which happens when we have
> bond-over-bond and the upper calls dev_get_stats from the lower.
>
> Conceptually, the same can happen even in any number of stacking
> combinations between bonding, net_failover, [ insert any other driver
> that takes net->netdev_lists_lock here ].
>
> There would be two approaches trying to solve this issue:
> - using mutex_lock_nested where we aren't sure that we are top level
> - ensuring through convention that user space always takes
>   net->netdev_lists_lock when calling dev_get_stats, and documenting
>   that, and therefore making it unnecessary to lock in bonding.
>
> I took neither of the two approaches (I don't really like either one too
> much), hence [ one of ] the reasons for the RFC. Comments?

And there are also issues which are more subtle (or maybe just to me, at
the time I wrote the patch). Like the fact that the netdev adjacency
lists are not protected by net->netdev_lists_lock, but still by the RTNL
mutex and RCU. I think that in order for the iteration through lower
interfaces to capture a consistent state of the adjancency lists of all
interfaces, the __netdev_adjacent_dev_link_lists and
__netdev_adjacent_dev_unlink_lists functions would need to be run under
the net->netdev_lists_lock, and not just under some lock per-netdev.
But this is raising the locking domain covered by net->netdev_lists_lock
to more than I initially intended. I'll try to do this and see how it
works.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage
  2020-12-06 23:59 ` [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage Vladimir Oltean
@ 2020-12-08  6:43   ` Leon Romanovsky
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-12-08  6:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David S . Miller, Jakub Kicinski, netdev, Andrew Lunn,
	Florian Fainelli, Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc,
	Cong Wang, Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel

On Mon, Dec 07, 2020 at 01:59:07AM +0200, Vladimir Oltean wrote:
> The dev_base_lock does not protect dev->dev_addr, so it serves no
> purpose here.
>
> Cc: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/infiniband/hw/mlx4/main.c | 3 ---
>  1 file changed, 3 deletions(-)

Agree with the description, most likely the authors wanted to ensure
that "dev" doesn't disappear, but it is not correct way to do and not
needed in that flow.

Thanks for the patch,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-07  1:00   ` Vladimir Oltean
  2020-12-07 15:22     ` Vladimir Oltean
@ 2020-12-08 23:57     ` Jakub Kicinski
  2020-12-09  0:03       ` Vladimir Oltean
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Kicinski @ 2020-12-08 23:57 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David S . Miller, netdev, Andrew Lunn, Florian Fainelli,
	Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc, Cong Wang,
	Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek

On Mon, 7 Dec 2020 01:00:40 +0000 Vladimir Oltean wrote:
> - ensuring through convention that user space always takes
>   net->netdev_lists_lock when calling dev_get_stats, and documenting
>   that, and therefore making it unnecessary to lock in bonding.

This seems like the better option to me. Makes the locking rules pretty
clear.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-08 23:57     ` Jakub Kicinski
@ 2020-12-09  0:03       ` Vladimir Oltean
  2020-12-09  0:17         ` Jakub Kicinski
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-09  0:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, netdev, Andrew Lunn, Florian Fainelli,
	Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc, Cong Wang,
	Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek

On Tue, Dec 08, 2020 at 03:57:44PM -0800, Jakub Kicinski wrote:
> On Mon, 7 Dec 2020 01:00:40 +0000 Vladimir Oltean wrote:
> > - ensuring through convention that user space always takes
> >   net->netdev_lists_lock when calling dev_get_stats, and documenting
> >   that, and therefore making it unnecessary to lock in bonding.
>
> This seems like the better option to me. Makes the locking rules pretty
> clear.

It is non-obvious to me that top-level callers of dev_get_stats should
hold a lock as specific as the one protecting the lists of network
interfaces. In the vast majority of implementations of dev_get_stats,
that lock would not actually protect anything, which would lead into
just one more lock that is used for more than it should be. In my tree I
had actually already switched over to mutex_lock_nested. Nonetheless, I
am still open if you want to make the case that simplicity should prevail
over specificity. But in that case, maybe we should just keep on using
the RTNL mutex.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-09  0:03       ` Vladimir Oltean
@ 2020-12-09  0:17         ` Jakub Kicinski
  2020-12-09  1:14           ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Kicinski @ 2020-12-09  0:17 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David S . Miller, netdev, Andrew Lunn, Florian Fainelli,
	Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc, Cong Wang,
	Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek

On Wed, 9 Dec 2020 00:03:56 +0000 Vladimir Oltean wrote:
> On Tue, Dec 08, 2020 at 03:57:44PM -0800, Jakub Kicinski wrote:
> > On Mon, 7 Dec 2020 01:00:40 +0000 Vladimir Oltean wrote:  
> > > - ensuring through convention that user space always takes
> > >   net->netdev_lists_lock when calling dev_get_stats, and documenting
> > >   that, and therefore making it unnecessary to lock in bonding.  
> >
> > This seems like the better option to me. Makes the locking rules pretty
> > clear.  
> 
> It is non-obvious to me that top-level callers of dev_get_stats should
> hold a lock as specific as the one protecting the lists of network
> interfaces. In the vast majority of implementations of dev_get_stats,
> that lock would not actually protect anything, which would lead into
> just one more lock that is used for more than it should be. In my tree I
> had actually already switched over to mutex_lock_nested. Nonetheless, I
> am still open if you want to make the case that simplicity should prevail
> over specificity.

What are the locking rules you have in mind then? Caller may hold RTNL
or ifc mutex?

> But in that case, maybe we should just keep on using the RTNL mutex.

That's a wasted opportunity, RTNL lock is pretty busy.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-09  0:17         ` Jakub Kicinski
@ 2020-12-09  1:14           ` Vladimir Oltean
  2020-12-09  1:28             ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-09  1:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, netdev, Andrew Lunn, Florian Fainelli,
	Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc, Cong Wang,
	Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek

On Tue, Dec 08, 2020 at 04:17:37PM -0800, Jakub Kicinski wrote:
> On Wed, 9 Dec 2020 00:03:56 +0000 Vladimir Oltean wrote:
> > On Tue, Dec 08, 2020 at 03:57:44PM -0800, Jakub Kicinski wrote:
> > > On Mon, 7 Dec 2020 01:00:40 +0000 Vladimir Oltean wrote:
> > > > - ensuring through convention that user space always takes
> > > >   net->netdev_lists_lock when calling dev_get_stats, and documenting
> > > >   that, and therefore making it unnecessary to lock in bonding.
> > >
> > > This seems like the better option to me. Makes the locking rules pretty
> > > clear.
> >
> > It is non-obvious to me that top-level callers of dev_get_stats should
> > hold a lock as specific as the one protecting the lists of network
> > interfaces. In the vast majority of implementations of dev_get_stats,
> > that lock would not actually protect anything, which would lead into
> > just one more lock that is used for more than it should be. In my tree I
> > had actually already switched over to mutex_lock_nested. Nonetheless, I
> > am still open if you want to make the case that simplicity should prevail
> > over specificity.
>
> What are the locking rules you have in mind then? Caller may hold RTNL
> or ifc mutex?

Well, it's clear that calling mutex_lock_nested would only silence lockdep,
there would still be this non-reentrant mutex that will be held from a
potentially recursive code path. So it a non-solution, and even worse
than using plain mutex_lock, because at least that is detectable when it
locks up the system.

net_failover and bonding are the only drivers that are creating this
recursivity requirement in dev_get_stats. Other one-over-many stackable
interfaces, like the bridge, just use dev_get_tstats64. I'm almost
thinking that it would be cleaner to convert these two to dev_get_tstats64
too, that would simplify things enormously. Even team uses something
that is based on software counters, something reminiscent of
dev_get_tstats64, definitely not counters retrieved from the underlying
device. Of course, the disadvantage with doing that is going to be that
virtual interfaces cannot retrieve statistics recursively from their
lower interface. I'm trying to think how much of a real disadvantage
that will be. For offloaded interfaces they will be completely off,
that's for sure. And this is one of the reasons that mandated the DSA
rework to expose MAC-based counters in dev_get_stats in the first place.
But thinking in the larger sense. An offloading interface that supports
IP forwarding, with 50 VLAN uppers. How could the statistics counters of
those VLAN uppers ever be correct. It's not realistic to expect of the
underlying hardware to keep separate statistics for each upper, that the
upper could then go ahead and just query. Sure, in the cases where a
lower can have only one upper at a time (bridge, bonding) what I said is
not applicable, but the general goal of having accurate counters for
offloading interfaces everywhere seems to not be really achievable.

In case this doesn't work out, I guess I'll have to document that
dev_get_stats is recursive, and all mutual exclusion based locks need to
be taken upfront, which is the only strategy that works with recursion
that I know of.

> > But in that case, maybe we should just keep on using the RTNL mutex.
>
> That's a wasted opportunity, RTNL lock is pretty busy.

It is certainly easy to use and easy to enforce though. It also seems to
protect everything, which is generally the reason why you tend to not
think a lot when using it.

To be clear, I am not removing the RTNL mutex from any place where it is
held today. Letting me do that would be like letting a bull in a china
shop. I am only creating a sub-lock of it, which is protecting a subset
of what the RTNL mutex is protecting. By sub-lock I mean that code paths
that currently hold the RTNL mutex, like list_netdevice(), will also
take net->netdev_lists_lock - never the other way around, that would
create lock inversion. The plan is to then require new code that
iterates through network interface lists to use the new locking scheme
and not RTNL mutex, and in parallel to migrate, on a best-effort basis,
code that runs under ASSERT_RTNL + net->netdev_lists_lock to only take
the net->netdev_lists_lock and be RTNL-free.

But is that any better at runtime than just taking the RTNL mutex, will
it make it less contended? Nope, due to the indirect serialization
effect that is created by the fact that net->netdev_lists_lock is a
sub-lock of the RTNL mutex. Say net/core/net-procfs.c holds
net->netdev_lists_lock while dumping the statistics from a firmware and
sleeping while doing so. All is rosy in its RTNL mutex free code path.
Then there comes along another thread which calls for_each_netdev,
something else which today requires the RTNL mutex but could be reworked
to use the net->netdev_lists_lock. Let's assume we are at a stage where
the gazillion places that call for_each_netdev() have been converted to
also take the net->netdev_lists_lock. But they couldn't be converted to
also _not_ take the RTNL mutex, or at least all of them. This means that
there will be code paths that try to take the RTNL mutex and end up
waiting for our net/core/net-procfs.c to complete dumping the firmware.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 05/13] net: bonding: hold the netdev lists lock when retrieving device statistics
  2020-12-09  1:14           ` Vladimir Oltean
@ 2020-12-09  1:28             ` Vladimir Oltean
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2020-12-09  1:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, netdev, Andrew Lunn, Florian Fainelli,
	Paul Gortmaker, Pablo Neira Ayuso, Jiri Benc, Cong Wang,
	Jamal Hadi Salim, Stephen Hemminger, Eric Dumazet,
	George McCollister, Oleksij Rempel, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek

On Wed, Dec 09, 2020 at 03:14:04AM +0200, Vladimir Oltean wrote:
> net_failover and bonding are the only drivers that are creating this
> recursivity requirement in dev_get_stats. Other one-over-many stackable
> interfaces, like the bridge, just use dev_get_tstats64. I'm almost
> thinking that it would be cleaner to convert these two to dev_get_tstats64
> too, that would simplify things enormously. Even team uses something
> that is based on software counters, something reminiscent of
> dev_get_tstats64, definitely not counters retrieved from the underlying
> device. Of course, the disadvantage with doing that is going to be that
> virtual interfaces cannot retrieve statistics recursively from their
> lower interface. I'm trying to think how much of a real disadvantage
> that will be. For offloaded interfaces they will be completely off,
> that's for sure. And this is one of the reasons that mandated the DSA
> rework to expose MAC-based counters in dev_get_stats in the first place.
> But thinking in the larger sense. An offloading interface that supports
> IP forwarding, with 50 VLAN uppers. How could the statistics counters of
> those VLAN uppers ever be correct. It's not realistic to expect of the
> underlying hardware to keep separate statistics for each upper, that the
> upper could then go ahead and just query. Sure, in the cases where a
> lower can have only one upper at a time (bridge, bonding) what I said is
> not applicable, but the general goal of having accurate counters for
> offloading interfaces everywhere seems to not be really achievable.

Ok, putting some more thought into it after sending the email, maybe it
isn't unreasonable for devices with IP fwd offload to keep statistics
for each upper. They need to be aware of those uppers for a plethora of
other reasons, after all.

So, here's another idea for eliminating the recursion. Maybe we just add
a new netdevice feature, NETIF_F_FOLDING_STATS, or something like that,
which bonding and failover will use. Then dev_get_stats sees that flag,
and instead of calling ->ndo_get_stats64 for it, it just iterates
through its bottom-most level of lower interfaces and aggregates the
stats by itself.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-12-09  1:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-06 23:59 [RFC PATCH net-next 00/13] Make .ndo_get_stats64 sleepable Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 01/13] RDMA/mlx4: remove bogus dev_base_lock usage Vladimir Oltean
2020-12-08  6:43   ` Leon Romanovsky
2020-12-06 23:59 ` [RFC PATCH net-next 02/13] net: mark dev_base_lock for deprecation Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 03/13] net: introduce a mutex for the netns interface lists Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 04/13] s390/appldata_net_sum: hold the netdev lists lock when retrieving device statistics Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 05/13] net: bonding: " Vladimir Oltean
2020-12-07  1:00   ` Vladimir Oltean
2020-12-07 15:22     ` Vladimir Oltean
2020-12-08 23:57     ` Jakub Kicinski
2020-12-09  0:03       ` Vladimir Oltean
2020-12-09  0:17         ` Jakub Kicinski
2020-12-09  1:14           ` Vladimir Oltean
2020-12-09  1:28             ` Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 06/13] net_failover: " Vladimir Oltean
2020-12-07  2:22   ` kernel test robot
2020-12-06 23:59 ` [RFC PATCH net-next 07/13] parisc/led: remove trailing whitespaces Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 08/13] parisc/led: reindent the code that gathers device statistics Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 09/13] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
2020-12-07  6:14   ` kernel test robot
2020-12-06 23:59 ` [RFC PATCH net-next 10/13] net: procfs: " Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 11/13] net: sysfs: don't hold dev_base_lock while " Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 12/13] net: mark ndo_get_stats64 as being able to sleep Vladimir Oltean
2020-12-06 23:59 ` [RFC PATCH net-next 13/13] net: remove obsolete comments about ndo_get_stats64 context from eth drivers Vladimir Oltean

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.