netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: avoid net core runtime resume for most drivers
@ 2024-02-07  9:51 Stanislaw Gruszka
  2024-02-09 20:45 ` Jakub Kicinski
  0 siblings, 1 reply; 3+ messages in thread
From: Stanislaw Gruszka @ 2024-02-07  9:51 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Johannes Berg, Tony Nguyen, Jesse Brandeburg, Sasha Neftin,
	Dima Ruinskiy, Heiner Kallweit, Rafael J . Wysocki

Introducing runtime resume before ndo_open and ethtool ops by commits:

d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")

caused rtnl_lock -> rtnl_lock deadlock for igc/igb drivers if user enabled
runtime power management by:

echo auto > /sys/bus/pci/devices/{PCI_ID}/power/control

and then use ethtool or open the link, when device is suspended.

The deadlock was reported at few places before, for example:

https://lore.kernel.org/netdev/20211124144505.31e15716@hermes.local/
https://lore.kernel.org/all/20231202221402.GA11155@merlins.org/

and has fix for igb:

ac8c58f5b535 ("igb: fix deadlock caused by taking RTNL in RPM resume path")

However this solution does not take into account various corner cases.
For example it breaks RTNL lock assertion for netif_set_real_num_queues().
Fixing the deadlock issue properly in race free manner in igb/igc drivers
is not easy.

Additionally for other drivers, that fine tune their pm runtime
implementation, runtime resume by net core cause unnecessary wake-ups.
Various ethtool ops do not require HW access and can be done without
resuming device. On dev_open(), we can error exit before ndo_open(),
then not-used device will stay permanently enabled (if not implemented
runtime pm with autosuspend).

So, remove the runtime resume calls from net core.

However, now seems there are some benefits of the calls for r8169
driver, so keep them for it by introducing IFF_DO_RUNTIME_PM priv flag
and use it for r8169.

Fixes: d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 2 +-
 include/linux/netdevice.h                 | 1 +
 net/core/dev.c                            | 2 +-
 net/ethtool/ioctl.c                       | 4 ++--
 net/ethtool/netlink.c                     | 6 +++---
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index dd73df6b17b0..f430df3c9e51 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5287,7 +5287,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 			   NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
 	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
-	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_DO_RUNTIME_PM;
 
 	/*
 	 * Pretend we are using VLANs; This bypasses a nasty bug where
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 118c40258d07..b149eca32adc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1768,6 +1768,7 @@ enum netdev_priv_flags {
 	IFF_TX_SKB_NO_LINEAR		= BIT_ULL(31),
 	IFF_CHANGE_PROTO_DOWN		= BIT_ULL(32),
 	IFF_SEE_ALL_HWTSTAMP_REQUESTS	= BIT_ULL(33),
+	IFF_DO_RUNTIME_PM		= BIT_ULL(34),
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
diff --git a/net/core/dev.c b/net/core/dev.c
index cb2dab0feee0..bd9af0469dfa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1420,7 +1420,7 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 
 	if (!netif_device_present(dev)) {
 		/* may be detached because parent is runtime-suspended */
-		if (dev->dev.parent)
+		if (dev->priv_flags & IFF_DO_RUNTIME_PM && dev->dev.parent)
 			pm_runtime_resume(dev->dev.parent);
 		if (!netif_device_present(dev))
 			return -ENODEV;
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 7519b0818b91..8e424c08de89 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -2875,7 +2875,7 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 			return -EPERM;
 	}
 
-	if (dev->dev.parent)
+	if (dev->priv_flags & IFF_DO_RUNTIME_PM && dev->dev.parent)
 		pm_runtime_get_sync(dev->dev.parent);
 
 	if (!netif_device_present(dev)) {
@@ -3106,7 +3106,7 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 	if (old_features != dev->features)
 		netdev_features_change(dev);
 out:
-	if (dev->dev.parent)
+	if (dev->priv_flags & IFF_DO_RUNTIME_PM && dev->dev.parent)
 		pm_runtime_put(dev->dev.parent);
 
 	return rc;
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index fe3553f60bf3..089a88a12f7a 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -37,7 +37,7 @@ int ethnl_ops_begin(struct net_device *dev)
 	if (!dev)
 		return -ENODEV;
 
-	if (dev->dev.parent)
+	if (dev->priv_flags & IFF_DO_RUNTIME_PM && dev->dev.parent)
 		pm_runtime_get_sync(dev->dev.parent);
 
 	if (!netif_device_present(dev) ||
@@ -54,7 +54,7 @@ int ethnl_ops_begin(struct net_device *dev)
 
 	return 0;
 err:
-	if (dev->dev.parent)
+	if (dev->priv_flags & IFF_DO_RUNTIME_PM && dev->dev.parent)
 		pm_runtime_put(dev->dev.parent);
 
 	return ret;
@@ -65,7 +65,7 @@ void ethnl_ops_complete(struct net_device *dev)
 	if (dev->ethtool_ops->complete)
 		dev->ethtool_ops->complete(dev);
 
-	if (dev->dev.parent)
+	if (dev->priv_flags & IFF_DO_RUNTIME_PM  && dev->dev.parent)
 		pm_runtime_put(dev->dev.parent);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] net: avoid net core runtime resume for most drivers
  2024-02-07  9:51 [PATCH] net: avoid net core runtime resume for most drivers Stanislaw Gruszka
@ 2024-02-09 20:45 ` Jakub Kicinski
  2024-02-10 10:48   ` Stanislaw Gruszka
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Kicinski @ 2024-02-09 20:45 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Johannes Berg, Tony Nguyen, Jesse Brandeburg, Sasha Neftin,
	Dima Ruinskiy, Heiner Kallweit, Rafael J . Wysocki

On Wed,  7 Feb 2024 10:51:11 +0100 Stanislaw Gruszka wrote:
> Introducing runtime resume before ndo_open and ethtool ops by commits:
> 
> d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
> bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")

We should revisit whether core should try to help drivers with PM
or not once the Intel drivers are fixed. Taking the global networking
lock from device resume routine is inexcusable. I really don't want to
make precedents for adjusting the core because driver code is poor
quality :(

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net: avoid net core runtime resume for most drivers
  2024-02-09 20:45 ` Jakub Kicinski
@ 2024-02-10 10:48   ` Stanislaw Gruszka
  0 siblings, 0 replies; 3+ messages in thread
From: Stanislaw Gruszka @ 2024-02-10 10:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Johannes Berg, Tony Nguyen, Jesse Brandeburg, Sasha Neftin,
	Dima Ruinskiy, Heiner Kallweit, Rafael J . Wysocki

On Fri, Feb 09, 2024 at 12:45:36PM -0800, Jakub Kicinski wrote:
> On Wed,  7 Feb 2024 10:51:11 +0100 Stanislaw Gruszka wrote:
> > Introducing runtime resume before ndo_open and ethtool ops by commits:
> > 
> > d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
> > bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> 
> We should revisit whether core should try to help drivers with PM
> or not once the Intel drivers are fixed. Taking the global networking
> lock from device resume routine is inexcusable.

Ok, we need get rid of it in igc (and fix broken assertion in igb).

> I really don't want to
> make precedents for adjusting the core because driver code is poor
> quality :(

I see this rather as removal of special core adjustment added
by above commits. It's only needed for r8169. For all others
it is just pure harm. It could be done without the priv flag,
but then r8169 probably would need changes.

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-02-10 10:48 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-07  9:51 [PATCH] net: avoid net core runtime resume for most drivers Stanislaw Gruszka
2024-02-09 20:45 ` Jakub Kicinski
2024-02-10 10:48   ` Stanislaw Gruszka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).