netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
@ 2018-05-24 11:11 Vladimir Zapolskiy
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

For ages trivial changes to RAVB and SuperH ethernet links by means of
standard 'ethtool' trigger a 'sleeping function called from invalid
context' bug, to visualize it on r8a7795 ULCB:

  % ethtool -r eth0
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
  in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
  INFO: lockdep is turned off.
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>]           (null)
  hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
  softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
  softirqs last disabled at (0): [<0000000000000000>]           (null)
  CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
  Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
  Call trace:
   dump_backtrace+0x0/0x198
   show_stack+0x24/0x30
   dump_stack+0xb8/0xf4
   ___might_sleep+0x1c8/0x1f8
   __might_sleep+0x58/0x90
   __mutex_lock+0x50/0x890
   mutex_lock_nested+0x3c/0x50
   phy_start_aneg_priv+0x38/0x180
   phy_start_aneg+0x24/0x30
   ravb_nway_reset+0x3c/0x68
   dev_ethtool+0x3dc/0x2338
   dev_ioctl+0x19c/0x490
   sock_do_ioctl+0xe0/0x238
   sock_ioctl+0x254/0x460
   do_vfs_ioctl+0xb0/0x918
   ksys_ioctl+0x50/0x80
   sys_ioctl+0x34/0x48
   __sys_trace_return+0x0/0x4

The root cause is that an attempt to modify ECMR and GECMR registers
only when RX/TX function is disabled was too overcomplicated in its
original implementation, also processing of an optional Link Change
interrupt added even more complexity, as a result the implementation
was error prone.

The new locking scheme is confirmed to be correct by dumping driver
specific and generic PHY framework function calls with aid of ftrace
while running more or less advanced tests.

Please note that sh_eth patches from the series were built-tested only.

On purpose I do not add Fixes tags, the reused PHY handlers were added
way later than the fixed problems were firstly found in the drivers.

Vladimir Zapolskiy (6):
  ravb: remove custom .nway_reset from ethtool ops
  ravb: remove custom .get_link_ksettings from ethtool ops
  ravb: remove custom .set_link_ksettings from ethtool ops
  sh_eth: remove custom .nway_reset from ethtool ops
  sh_eth: remove custom .get_link_ksettings from ethtool ops
  sh_eth: remove custom .set_link_ksettings from ethtool ops

 drivers/net/ethernet/renesas/ravb_main.c | 93 ++++++-------------------------
 drivers/net/ethernet/renesas/sh_eth.c    | 94 ++++++--------------------------
 2 files changed, 34 insertions(+), 153 deletions(-)

-- 
2.8.1

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
@ 2018-05-24 11:11 ` Vladimir Zapolskiy
  2018-05-24 13:22   ` Andrew Lunn
                     ` (3 more replies)
  2018-05-24 11:11 ` [PATCH 2/6] ravb: remove custom .get_link_ksettings " Vladimir Zapolskiy
                   ` (5 subsequent siblings)
  6 siblings, 4 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change fixes a sleep in atomic context issue, which can be
always triggered by running 'ethtool -r' command, because
phy_start_aneg() protects phydev fields by a mutex.

Another note is that the change implicitly replaces phy_start_aneg()
with a newer phy_restart_aneg().

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 68f122140966..4a043eb0e2aa 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
 	return error;
 }
 
-static int ravb_nway_reset(struct net_device *ndev)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	int error = -ENODEV;
-	unsigned long flags;
-
-	if (ndev->phydev) {
-		spin_lock_irqsave(&priv->lock, flags);
-		error = phy_start_aneg(ndev->phydev);
-		spin_unlock_irqrestore(&priv->lock, flags);
-	}
-
-	return error;
-}
-
 static u32 ravb_get_msglevel(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
@@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 }
 
 static const struct ethtool_ops ravb_ethtool_ops = {
-	.nway_reset		= ravb_nway_reset,
+	.nway_reset		= phy_ethtool_nway_reset,
 	.get_msglevel		= ravb_get_msglevel,
 	.set_msglevel		= ravb_set_msglevel,
 	.get_link		= ethtool_op_get_link,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/6] ravb: remove custom .get_link_ksettings from ethtool ops
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
@ 2018-05-24 11:11 ` Vladimir Zapolskiy
  2018-05-26 17:07   ` Sergei Shtylyov
  2018-05-24 11:11 ` [PATCH 3/6] ravb: remove custom .set_link_ksettings " Vladimir Zapolskiy
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change replaces a custom implementation of .get_link_ksettings
callback with a shared phy_ethtool_get_link_ksettings(), note that
&priv->lock wrapping is not needed, because the lock does not
serialize access to phydev fields.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4a043eb0e2aa..3d91caa44176 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1096,22 +1096,6 @@ static int ravb_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int ravb_get_link_ksettings(struct net_device *ndev,
-				   struct ethtool_link_ksettings *cmd)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	unsigned long flags;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&priv->lock, flags);
-	phy_ethtool_ksettings_get(ndev->phydev, cmd);
-	spin_unlock_irqrestore(&priv->lock, flags);
-
-	return 0;
-}
-
 static int ravb_set_link_ksettings(struct net_device *ndev,
 				   const struct ethtool_link_ksettings *cmd)
 {
@@ -1372,7 +1356,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
 	.get_ringparam		= ravb_get_ringparam,
 	.set_ringparam		= ravb_set_ringparam,
 	.get_ts_info		= ravb_get_ts_info,
-	.get_link_ksettings	= ravb_get_link_ksettings,
+	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
 	.set_link_ksettings	= ravb_set_link_ksettings,
 	.get_wol		= ravb_get_wol,
 	.set_wol		= ravb_set_wol,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
  2018-05-24 11:11 ` [PATCH 2/6] ravb: remove custom .get_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 11:11 ` Vladimir Zapolskiy
  2018-05-24 13:29   ` Andrew Lunn
  2018-05-26 19:50   ` Sergei Shtylyov
  2018-05-24 11:11 ` [PATCH 4/6] sh_eth: remove custom .nway_reset " Vladimir Zapolskiy
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change replaces a custom implementation of .set_link_ksettings
callback with a shared phy_ethtool_set_link_ksettings(), this fixes
sleep in atomic context bug, which is encountered every time when link
settings are changed by ethtool.

Now duplex mode setting is enforced in ravb_adjust_link() only, also
now TX/RX is disabled when link is put down or modifications to E-MAC
registers ECMR and GECMR are expected for both cases of checked and
ignored link status pin state from E-MAC interrupt handler.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
 1 file changed, 15 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 3d91caa44176..0d811c02ff34 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
 	struct ravb_private *priv = netdev_priv(ndev);
 	struct phy_device *phydev = ndev->phydev;
 	bool new_state = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	/* Disable TX and RX right over here, if E-MAC change is ignored */
+	if (priv->no_avb_link)
+		ravb_rcv_snd_disable(ndev);
 
 	if (phydev->link) {
 		if (phydev->duplex != priv->duplex) {
@@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
 			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
 			new_state = true;
 			priv->link = phydev->link;
-			if (priv->no_avb_link)
-				ravb_rcv_snd_enable(ndev);
 		}
 	} else if (priv->link) {
 		new_state = true;
 		priv->link = 0;
 		priv->speed = 0;
 		priv->duplex = -1;
-		if (priv->no_avb_link)
-			ravb_rcv_snd_disable(ndev);
 	}
 
+	/* Enable TX and RX right over here, if E-MAC change is ignored */
+	if (priv->no_avb_link && phydev->link)
+		ravb_rcv_snd_enable(ndev);
+
+	mmiowb();
+	spin_unlock_irqrestore(&priv->lock, flags);
+
 	if (new_state && netif_msg_link(priv))
 		phy_print_status(phydev);
 }
@@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int ravb_set_link_ksettings(struct net_device *ndev,
-				   const struct ethtool_link_ksettings *cmd)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	unsigned long flags;
-	int error;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&priv->lock, flags);
-
-	/* Disable TX and RX */
-	ravb_rcv_snd_disable(ndev);
-
-	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
-	if (error)
-		goto error_exit;
-
-	if (cmd->base.duplex == DUPLEX_FULL)
-		priv->duplex = 1;
-	else
-		priv->duplex = 0;
-
-	ravb_set_duplex(ndev);
-
-error_exit:
-	mdelay(1);
-
-	/* Enable TX and RX */
-	ravb_rcv_snd_enable(ndev);
-
-	mmiowb();
-	spin_unlock_irqrestore(&priv->lock, flags);
-
-	return error;
-}
-
 static u32 ravb_get_msglevel(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
@@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
 	.set_ringparam		= ravb_set_ringparam,
 	.get_ts_info		= ravb_get_ts_info,
 	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
-	.set_link_ksettings	= ravb_set_link_ksettings,
+	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
 	.get_wol		= ravb_get_wol,
 	.set_wol		= ravb_set_wol,
 };
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/6] sh_eth: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
                   ` (2 preceding siblings ...)
  2018-05-24 11:11 ` [PATCH 3/6] ravb: remove custom .set_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 11:11 ` Vladimir Zapolskiy
  2018-05-24 13:30   ` Andrew Lunn
  2018-05-26 18:46   ` Sergei Shtylyov
  2018-05-24 11:14 ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Vladimir Zapolskiy
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change fixes a sleep in atomic context issue, which can be
always triggered by running 'ethtool -r' command, because
phy_start_aneg() protects phydev fields by a mutex.

Another note is that the change implicitly replaces phy_start_aneg()
with a newer phy_restart_aneg().

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/sh_eth.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index d9cadfb1bc4a..6d1fed2b4a4a 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2252,22 +2252,6 @@ static void sh_eth_get_regs(struct net_device *ndev, struct ethtool_regs *regs,
 	pm_runtime_put_sync(&mdp->pdev->dev);
 }
 
-static int sh_eth_nway_reset(struct net_device *ndev)
-{
-	struct sh_eth_private *mdp = netdev_priv(ndev);
-	unsigned long flags;
-	int ret;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-	ret = phy_start_aneg(ndev->phydev);
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
-	return ret;
-}
-
 static u32 sh_eth_get_msglevel(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
@@ -2418,7 +2402,7 @@ static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_regs_len	= sh_eth_get_regs_len,
 	.get_regs	= sh_eth_get_regs,
-	.nway_reset	= sh_eth_nway_reset,
+	.nway_reset	= phy_ethtool_nway_reset,
 	.get_msglevel	= sh_eth_get_msglevel,
 	.set_msglevel	= sh_eth_set_msglevel,
 	.get_link	= ethtool_op_get_link,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/6] sh_eth: remove custom .get_link_ksettings from ethtool ops
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
                   ` (3 preceding siblings ...)
  2018-05-24 11:11 ` [PATCH 4/6] sh_eth: remove custom .nway_reset " Vladimir Zapolskiy
@ 2018-05-24 11:14 ` Vladimir Zapolskiy
  2018-05-24 11:14   ` [PATCH 6/6] sh_eth: remove custom .set_link_ksettings " Vladimir Zapolskiy
  2018-05-24 13:31   ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Andrew Lunn
  2018-05-24 16:40 ` [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Sergei Shtylyov
  2018-05-25  8:11 ` Geert Uytterhoeven
  6 siblings, 2 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:14 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change replaces a custom implementation of .get_link_ksettings
callback with a shared phy_ethtool_get_link_ksettings(), note that
&priv->lock wrapping is not needed, because the lock does not
serialize access to phydev fields.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/sh_eth.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 6d1fed2b4a4a..e627b2b6c3b3 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2019,22 +2019,6 @@ static int sh_eth_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int sh_eth_get_link_ksettings(struct net_device *ndev,
-				     struct ethtool_link_ksettings *cmd)
-{
-	struct sh_eth_private *mdp = netdev_priv(ndev);
-	unsigned long flags;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-	phy_ethtool_ksettings_get(ndev->phydev, cmd);
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
-	return 0;
-}
-
 static int sh_eth_set_link_ksettings(struct net_device *ndev,
 				     const struct ethtool_link_ksettings *cmd)
 {
@@ -2411,7 +2395,7 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_sset_count     = sh_eth_get_sset_count,
 	.get_ringparam	= sh_eth_get_ringparam,
 	.set_ringparam	= sh_eth_set_ringparam,
-	.get_link_ksettings = sh_eth_get_link_ksettings,
+	.get_link_ksettings = phy_ethtool_get_link_ksettings,
 	.set_link_ksettings = sh_eth_set_link_ksettings,
 	.get_wol	= sh_eth_get_wol,
 	.set_wol	= sh_eth_set_wol,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/6] sh_eth: remove custom .set_link_ksettings from ethtool ops
  2018-05-24 11:14 ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 11:14   ` Vladimir Zapolskiy
  2018-05-24 13:31   ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Andrew Lunn
  1 sibling, 0 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 11:14 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

The change replaces a custom implementation of .set_link_ksettings
callback with a shared phy_ethtool_set_link_ksettings(), this fixes
sleep in atomic context bug, which is encountered every time when link
settings are changed by ethtool.

Now duplex mode setting is enforced in ravb_adjust_link() only, also
now TX/RX is disabled when link is put down or modifications to E-MAC
registers ECMR and GECMR are expected for both cases of checked and
ignored link status pin state from E-MAC interrupt handler.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/sh_eth.c | 58 +++++++++--------------------------
 1 file changed, 15 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index e627b2b6c3b3..a3115888bd04 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1916,8 +1916,15 @@ static void sh_eth_adjust_link(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	struct phy_device *phydev = ndev->phydev;
+	unsigned long flags;
 	int new_state = 0;
 
+	spin_lock_irqsave(&mdp->lock, flags);
+
+	/* Disable TX and RX right over here, if E-MAC change is ignored */
+	if (mdp->cd->no_psr || mdp->no_ether_link)
+		sh_eth_rcv_snd_disable(ndev);
+
 	if (phydev->link) {
 		if (phydev->duplex != mdp->duplex) {
 			new_state = 1;
@@ -1936,18 +1943,21 @@ static void sh_eth_adjust_link(struct net_device *ndev)
 			sh_eth_modify(ndev, ECMR, ECMR_TXF, 0);
 			new_state = 1;
 			mdp->link = phydev->link;
-			if (mdp->cd->no_psr || mdp->no_ether_link)
-				sh_eth_rcv_snd_enable(ndev);
 		}
 	} else if (mdp->link) {
 		new_state = 1;
 		mdp->link = 0;
 		mdp->speed = 0;
 		mdp->duplex = -1;
-		if (mdp->cd->no_psr || mdp->no_ether_link)
-			sh_eth_rcv_snd_disable(ndev);
 	}
 
+	/* Enable TX and RX right over here, if E-MAC change is ignored */
+	if ((mdp->cd->no_psr || mdp->no_ether_link) && phydev->link)
+		sh_eth_rcv_snd_enable(ndev);
+
+	mmiowb();
+	spin_unlock_irqrestore(&mdp->lock, flags);
+
 	if (new_state && netif_msg_link(mdp))
 		phy_print_status(phydev);
 }
@@ -2019,44 +2029,6 @@ static int sh_eth_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int sh_eth_set_link_ksettings(struct net_device *ndev,
-				     const struct ethtool_link_ksettings *cmd)
-{
-	struct sh_eth_private *mdp = netdev_priv(ndev);
-	unsigned long flags;
-	int ret;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-
-	/* disable tx and rx */
-	sh_eth_rcv_snd_disable(ndev);
-
-	ret = phy_ethtool_ksettings_set(ndev->phydev, cmd);
-	if (ret)
-		goto error_exit;
-
-	if (cmd->base.duplex == DUPLEX_FULL)
-		mdp->duplex = 1;
-	else
-		mdp->duplex = 0;
-
-	if (mdp->cd->set_duplex)
-		mdp->cd->set_duplex(ndev);
-
-error_exit:
-	mdelay(1);
-
-	/* enable tx and rx */
-	sh_eth_rcv_snd_enable(ndev);
-
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
-	return ret;
-}
-
 /* If it is ever necessary to increase SH_ETH_REG_DUMP_MAX_REGS, the
  * version must be bumped as well.  Just adding registers up to that
  * limit is fine, as long as the existing register indices don't
@@ -2396,7 +2368,7 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_ringparam	= sh_eth_get_ringparam,
 	.set_ringparam	= sh_eth_set_ringparam,
 	.get_link_ksettings = phy_ethtool_get_link_ksettings,
-	.set_link_ksettings = sh_eth_set_link_ksettings,
+	.set_link_ksettings = phy_ethtool_set_link_ksettings,
 	.get_wol	= sh_eth_get_wol,
 	.set_wol	= sh_eth_set_wol,
 };
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
@ 2018-05-24 13:22   ` Andrew Lunn
  2018-05-24 14:11     ` Vladimir Zapolskiy
  2018-05-26 16:56   ` Sergei Shtylyov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 13:22 UTC (permalink / raw)
  To: Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, linux-renesas-soc

On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote:
> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.
> 
> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);
> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}

Eck! phylib assumes thread context and takes a mutex while calling
into the PHY driver.

It would be good to add some sort of fixes: tag. Maybe for the commit
that added the generic nway_reset? That would at least cover some
stable kernels.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-24 11:11 ` [PATCH 3/6] ravb: remove custom .set_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 13:29   ` Andrew Lunn
  2018-05-24 14:06     ` Vladimir Zapolskiy
  2018-05-26 19:50   ` Sergei Shtylyov
  1 sibling, 1 reply; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 13:29 UTC (permalink / raw)
  To: Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, linux-renesas-soc

On Thu, May 24, 2018 at 02:11:55PM +0300, Vladimir Zapolskiy wrote:
> The change replaces a custom implementation of .set_link_ksettings
> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
> sleep in atomic context bug, which is encountered every time when link
> settings are changed by ethtool.
> 
> Now duplex mode setting is enforced in ravb_adjust_link() only, also
> now TX/RX is disabled when link is put down or modifications to E-MAC
> registers ECMR and GECMR are expected for both cases of checked and
> ignored link status pin state from E-MAC interrupt handler.
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>  1 file changed, 15 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 3d91caa44176..0d811c02ff34 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>  	struct ravb_private *priv = netdev_priv(ndev);
>  	struct phy_device *phydev = ndev->phydev;
>  	bool new_state = false;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&priv->lock, flags);

Hi Vladimir

It is pretty unusual to see an adjust_link callback take a lock. Is it
clearly defined what it is protecting?

	Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] sh_eth: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 4/6] sh_eth: remove custom .nway_reset " Vladimir Zapolskiy
@ 2018-05-24 13:30   ` Andrew Lunn
  2018-05-26 18:46   ` Sergei Shtylyov
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 13:30 UTC (permalink / raw)
  To: Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, linux-renesas-soc

On Thu, May 24, 2018 at 02:11:56PM +0300, Vladimir Zapolskiy wrote:
> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.
> 
> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] sh_eth: remove custom .get_link_ksettings from ethtool ops
  2018-05-24 11:14 ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Vladimir Zapolskiy
  2018-05-24 11:14   ` [PATCH 6/6] sh_eth: remove custom .set_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 13:31   ` Andrew Lunn
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 13:31 UTC (permalink / raw)
  To: Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, linux-renesas-soc

On Thu, May 24, 2018 at 02:14:43PM +0300, Vladimir Zapolskiy wrote:
> The change replaces a custom implementation of .get_link_ksettings
> callback with a shared phy_ethtool_get_link_ksettings(), note that
> &priv->lock wrapping is not needed, because the lock does not
> serialize access to phydev fields.
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-24 13:29   ` Andrew Lunn
@ 2018-05-24 14:06     ` Vladimir Zapolskiy
  0 siblings, 0 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 14:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Zapolskiy, David S. Miller, Sergei Shtylyov, netdev,
	linux-renesas-soc

Hi Andrew,

On 05/24/2018 04:29 PM, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 02:11:55PM +0300, Vladimir Zapolskiy wrote:
>> The change replaces a custom implementation of .set_link_ksettings
>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>> sleep in atomic context bug, which is encountered every time when link
>> settings are changed by ethtool.
>>
>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>> now TX/RX is disabled when link is put down or modifications to E-MAC
>> registers ECMR and GECMR are expected for both cases of checked and
>> ignored link status pin state from E-MAC interrupt handler.
>>
>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>> ---
>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>> index 3d91caa44176..0d811c02ff34 100644
>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>  	struct ravb_private *priv = netdev_priv(ndev);
>>  	struct phy_device *phydev = ndev->phydev;
>>  	bool new_state = false;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&priv->lock, flags);
> 
> Hi Vladimir
> 
> It is pretty unusual to see an adjust_link callback take a lock. Is it
> clearly defined what it is protecting?
> 

thank you for review.

As the commit message says, the hardware manual claims that
any modifications to ECMR and GECMR registers, i.e. calls to
ravb_set_duplex() and ravb_set_rate() from the modified
ravb_adjust_link() function, has to be done when RX/TX is
disabled (same ECMR register bit fields), the spinlock
serializes interrupt handlers and modifications to ECMR contents,
its previous usage for ethtool handlers was obviously wrong.

The information is quite implicit, but the change emphasizes
it.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 13:22   ` Andrew Lunn
@ 2018-05-24 14:11     ` Vladimir Zapolskiy
  2018-05-24 16:18       ` Sergei Shtylyov
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-24 14:11 UTC (permalink / raw)
  To: Andrew Lunn, Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, linux-renesas-soc

On 05/24/2018 04:22 PM, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote:
>> The change fixes a sleep in atomic context issue, which can be
>> always triggered by running 'ethtool -r' command, because
>> phy_start_aneg() protects phydev fields by a mutex.
>>
>> Another note is that the change implicitly replaces phy_start_aneg()
>> with a newer phy_restart_aneg().
>>
>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>> ---
>>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>>  1 file changed, 1 insertion(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>> index 68f122140966..4a043eb0e2aa 100644
>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>>  	return error;
>>  }
>>  
>> -static int ravb_nway_reset(struct net_device *ndev)
>> -{
>> -	struct ravb_private *priv = netdev_priv(ndev);
>> -	int error = -ENODEV;
>> -	unsigned long flags;
>> -
>> -	if (ndev->phydev) {
>> -		spin_lock_irqsave(&priv->lock, flags);
>> -		error = phy_start_aneg(ndev->phydev);
>> -		spin_unlock_irqrestore(&priv->lock, flags);
>> -	}
> 
> Eck! phylib assumes thread context and takes a mutex while calling
> into the PHY driver.
> 
> It would be good to add some sort of fixes: tag. Maybe for the commit
> that added the generic nway_reset? That would at least cover some
> stable kernels.
> 
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> 

Hi Andrew, thank you for review.

generally it makes sense to add Fixes tag, but as I said in
the commit message the problem is present before reused phy_ethtool_*()
functions were added to the kernel, so some kind of juggling with
the proper kernel version would be required in assumption that
the fixes are backported as an unmodified changes.

Hopefully Sergei as the driver maintainer can verify the fixes on
older kernels and suggest the right kernel versions for backporting.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 14:11     ` Vladimir Zapolskiy
@ 2018-05-24 16:18       ` Sergei Shtylyov
  2018-05-24 16:44         ` Andrew Lunn
  0 siblings, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-24 16:18 UTC (permalink / raw)
  To: Vladimir Zapolskiy, Andrew Lunn, Vladimir Zapolskiy
  Cc: David S. Miller, netdev, linux-renesas-soc

Hello!

On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote:

>>> The change fixes a sleep in atomic context issue, which can be
>>> always triggered by running 'ethtool -r' command, because
>>> phy_start_aneg() protects phydev fields by a mutex.

  You don't say that *not* grabbing the spinlock is safe... 

>>> Another note is that the change implicitly replaces phy_start_aneg()
>>> with a newer phy_restart_aneg().

   Hm, perphaps this could be a material for a separate patch? 

>>>
>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>> ---
>>>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>>>  1 file changed, 1 insertion(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 68f122140966..4a043eb0e2aa 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>>>  	return error;
>>>  }
>>>  
>>> -static int ravb_nway_reset(struct net_device *ndev)
>>> -{
>>> -	struct ravb_private *priv = netdev_priv(ndev);
>>> -	int error = -ENODEV;
>>> -	unsigned long flags;
>>> -
>>> -	if (ndev->phydev) {
>>> -		spin_lock_irqsave(&priv->lock, flags);
>>> -		error = phy_start_aneg(ndev->phydev);
>>> -		spin_unlock_irqrestore(&priv->lock, flags);
>>> -	}
>>
>> Eck! phylib assumes thread context and takes a mutex while calling
>> into the PHY driver.
>>
>> It would be good to add some sort of fixes: tag. Maybe for the commit
>> that added the generic nway_reset? That would at least cover some
>> stable kernels.
>>
>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>>
> 
> Hi Andrew, thank you for review.
> 
> generally it makes sense to add Fixes tag, but as I said in
> the commit message the problem is present before reused phy_ethtool_*()
> functions were added to the kernel, so some kind of juggling with
> the proper kernel version would be required in assumption that
> the fixes are backported as an unmodified changes.

   The -stable fixes can vary from version to version, IIUC. You could be
asked to backport your patch if Greg KH (or somebody else from the -stable
kernel maintainers) gets in trouble backporting your patch. 

> Hopefully Sergei as the driver maintainer can verify the fixes on

   I'm *not* a maintainer, just a humble reviewer! :-)

> older kernels and suggest the right kernel versions for backporting.

   This would be asking too much from me, I'm afraid...
   Still, Dave, could you please give me a couple of days to spend on
this series?

> --
> With best wishes,
> Vladimir

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
                   ` (4 preceding siblings ...)
  2018-05-24 11:14 ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Vladimir Zapolskiy
@ 2018-05-24 16:40 ` Sergei Shtylyov
  2018-05-24 17:24   ` Sergei Shtylyov
  2018-05-25  8:11 ` Geert Uytterhoeven
  6 siblings, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-24 16:40 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> For ages trivial changes to RAVB and SuperH ethernet links by means of
> standard 'ethtool' trigger a 'sleeping function called from invalid
> context' bug, to visualize it on r8a7795 ULCB:
> 
>   % ethtool -r eth0
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>   in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>   INFO: lockdep is turned off.
>   irq event stamp: 0
>   hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>   hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>   softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>   softirqs last disabled at (0): [<0000000000000000>]           (null)
>   CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>   Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>   Call trace:
>    dump_backtrace+0x0/0x198
>    show_stack+0x24/0x30
>    dump_stack+0xb8/0xf4
>    ___might_sleep+0x1c8/0x1f8
>    __might_sleep+0x58/0x90
>    __mutex_lock+0x50/0x890
>    mutex_lock_nested+0x3c/0x50
>    phy_start_aneg_priv+0x38/0x180
>    phy_start_aneg+0x24/0x30
>    ravb_nway_reset+0x3c/0x68
>    dev_ethtool+0x3dc/0x2338
>    dev_ioctl+0x19c/0x490
>    sock_do_ioctl+0xe0/0x238
>    sock_ioctl+0x254/0x460
>    do_vfs_ioctl+0xb0/0x918
>    ksys_ioctl+0x50/0x80
>    sys_ioctl+0x34/0x48
>    __sys_trace_return+0x0/0x4
> 
> The root cause is that an attempt to modify ECMR and GECMR registers
> only when RX/TX function is disabled was too overcomplicated in its
> original implementation, also processing of an optional Link Change
> interrupt added even more complexity, as a result the implementation
> was error prone.
> 
> The new locking scheme is confirmed to be correct by dumping driver
> specific and generic PHY framework function calls with aid of ftrace
> while running more or less advanced tests.
> 
> Please note that sh_eth patches from the series were built-tested only.
> 
> On purpose I do not add Fixes tags, the reused PHY handlers were added
> way later than the fixed problems were firstly found in the drivers.

   I think you went one step too far with these fixes. On the first glance,
the real fixes are to remove grabbing/releasing the spinlock for the duration
of the phylib calls. Am I right? If so, making use of the new phylib APIs
would be a further enhancement, it's not needed for fixing the splats per se...

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 16:18       ` Sergei Shtylyov
@ 2018-05-24 16:44         ` Andrew Lunn
  2018-05-24 17:01           ` Sergei Shtylyov
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 16:44 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Vladimir Zapolskiy, Vladimir Zapolskiy, David S. Miller, netdev,
	linux-renesas-soc

On Thu, May 24, 2018 at 07:18:28PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote:
> 
> >>> The change fixes a sleep in atomic context issue, which can be
> >>> always triggered by running 'ethtool -r' command, because
> >>> phy_start_aneg() protects phydev fields by a mutex.
> 
>   You don't say that *not* grabbing the spinlock is safe... 

For it to be unsafe, i think that would mean phylib would need to call
back into the MAC driver? The only way that could happen is via the
adjust_link call. And that will deadlock, since it takes the same
lock.

Or am i/we missing something?

	    Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 16:44         ` Andrew Lunn
@ 2018-05-24 17:01           ` Sergei Shtylyov
  2018-05-24 17:23             ` Andrew Lunn
  2018-05-25  6:05             ` Vladimir Zapolskiy
  0 siblings, 2 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-24 17:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Zapolskiy, Vladimir Zapolskiy, David S. Miller, netdev,
	linux-renesas-soc

On 05/24/2018 07:44 PM, Andrew Lunn wrote:

>>>>> The change fixes a sleep in atomic context issue, which can be
>>>>> always triggered by running 'ethtool -r' command, because
>>>>> phy_start_aneg() protects phydev fields by a mutex.
>>
>>   You don't say that *not* grabbing the spinlock is safe... 
> 
> For it to be unsafe, i think that would mean phylib would need to call
> back into the MAC driver? The only way that could happen is via the
> adjust_link call. And that will deadlock, since it takes the same
> lock.
> 
> Or am i/we missing something?

   It doesn't take any locks currently, only patches #3/#6 makes it do so...

> 	    Andrew

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 17:01           ` Sergei Shtylyov
@ 2018-05-24 17:23             ` Andrew Lunn
  2018-05-25  6:05             ` Vladimir Zapolskiy
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Lunn @ 2018-05-24 17:23 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Vladimir Zapolskiy, Vladimir Zapolskiy, David S. Miller, netdev,
	linux-renesas-soc

> > For it to be unsafe, i think that would mean phylib would need to call
> > back into the MAC driver? The only way that could happen is via the
> > adjust_link call. And that will deadlock, since it takes the same
> > lock.
> > 
> > Or am i/we missing something?
> 
>    It doesn't take any locks currently, only patches #3/#6 makes it do so...

Ah, yes.

You should not be holding any spinlocks when calling into phylib.
It does its own locking, which is mutex based.

The code in this patch is not touching the MAC, so looks safe to me.

    Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
  2018-05-24 16:40 ` [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Sergei Shtylyov
@ 2018-05-24 17:24   ` Sergei Shtylyov
  2018-05-25  6:25     ` Vladimir Zapolskiy
  0 siblings, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-24 17:24 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 07:40 PM, Sergei Shtylyov wrote:

>> For ages trivial changes to RAVB and SuperH ethernet links by means of
>> standard 'ethtool' trigger a 'sleeping function called from invalid
>> context' bug, to visualize it on r8a7795 ULCB:
>>
>>   % ethtool -r eth0
>>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>>   in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>>   INFO: lockdep is turned off.
>>   irq event stamp: 0
>>   hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>>   hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>   softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>   softirqs last disabled at (0): [<0000000000000000>]           (null)
>>   CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>>   Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>>   Call trace:
>>    dump_backtrace+0x0/0x198
>>    show_stack+0x24/0x30
>>    dump_stack+0xb8/0xf4
>>    ___might_sleep+0x1c8/0x1f8
>>    __might_sleep+0x58/0x90
>>    __mutex_lock+0x50/0x890
>>    mutex_lock_nested+0x3c/0x50
>>    phy_start_aneg_priv+0x38/0x180
>>    phy_start_aneg+0x24/0x30
>>    ravb_nway_reset+0x3c/0x68
>>    dev_ethtool+0x3dc/0x2338
>>    dev_ioctl+0x19c/0x490
>>    sock_do_ioctl+0xe0/0x238
>>    sock_ioctl+0x254/0x460
>>    do_vfs_ioctl+0xb0/0x918
>>    ksys_ioctl+0x50/0x80
>>    sys_ioctl+0x34/0x48
>>    __sys_trace_return+0x0/0x4
>>
>> The root cause is that an attempt to modify ECMR and GECMR registers
>> only when RX/TX function is disabled was too overcomplicated in its
>> original implementation, also processing of an optional Link Change
>> interrupt added even more complexity, as a result the implementation
>> was error prone.
>>
>> The new locking scheme is confirmed to be correct by dumping driver
>> specific and generic PHY framework function calls with aid of ftrace
>> while running more or less advanced tests.
>>
>> Please note that sh_eth patches from the series were built-tested only.
>>
>> On purpose I do not add Fixes tags, the reused PHY handlers were added
>> way later than the fixed problems were firstly found in the drivers.
> 
>    I think you went one step too far with these fixes. On the first glance,
> the real fixes are to remove grabbing/releasing the spinlock for the duration
> of the phylib calls. Am I right? If so, making use of the new phylib APIs
> would be a further enhancement, it's not needed for fixing the splats per se...

   Note that I hadn't looked at the patches #3/#6 at the time of writing this;
those seem to be more complicated than the rest.

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 17:01           ` Sergei Shtylyov
  2018-05-24 17:23             ` Andrew Lunn
@ 2018-05-25  6:05             ` Vladimir Zapolskiy
  1 sibling, 0 replies; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-25  6:05 UTC (permalink / raw)
  To: Sergei Shtylyov, Andrew Lunn
  Cc: Vladimir Zapolskiy, David S. Miller, netdev, linux-renesas-soc

Hello Sergei,

On 05/24/2018 08:01 PM, Sergei Shtylyov wrote:
> On 05/24/2018 07:44 PM, Andrew Lunn wrote:
> 
>>>>>> The change fixes a sleep in atomic context issue, which can be
>>>>>> always triggered by running 'ethtool -r' command, because
>>>>>> phy_start_aneg() protects phydev fields by a mutex.
>>>
>>>   You don't say that *not* grabbing the spinlock is safe...

I say both that it is the fix and it is safe, I've already described
the function of the spinlock in my comments, and it is more or less
clear from the driver code.

>>
>> For it to be unsafe, i think that would mean phylib would need to call
>> back into the MAC driver? The only way that could happen is via the
>> adjust_link call. And that will deadlock, since it takes the same
>> lock.
>>
>> Or am i/we missing something?
> 
>    It doesn't take any locks currently, only patches #3/#6 makes it do so...

And that's the proper fix in my opinion, my tests don't unveil any issues.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
  2018-05-24 17:24   ` Sergei Shtylyov
@ 2018-05-25  6:25     ` Vladimir Zapolskiy
  2018-05-26 17:31       ` Sergei Shtylyov
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-25  6:25 UTC (permalink / raw)
  To: Sergei Shtylyov, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello Sergei,

On 05/24/2018 08:24 PM, Sergei Shtylyov wrote:
> On 05/24/2018 07:40 PM, Sergei Shtylyov wrote:
> 
>>> For ages trivial changes to RAVB and SuperH ethernet links by means of
>>> standard 'ethtool' trigger a 'sleeping function called from invalid
>>> context' bug, to visualize it on r8a7795 ULCB:
>>>
>>>   % ethtool -r eth0
>>>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>>>   in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>>>   INFO: lockdep is turned off.
>>>   irq event stamp: 0
>>>   hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>>>   hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>   softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>   softirqs last disabled at (0): [<0000000000000000>]           (null)
>>>   CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>>>   Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>>>   Call trace:
>>>    dump_backtrace+0x0/0x198
>>>    show_stack+0x24/0x30
>>>    dump_stack+0xb8/0xf4
>>>    ___might_sleep+0x1c8/0x1f8
>>>    __might_sleep+0x58/0x90
>>>    __mutex_lock+0x50/0x890
>>>    mutex_lock_nested+0x3c/0x50
>>>    phy_start_aneg_priv+0x38/0x180
>>>    phy_start_aneg+0x24/0x30
>>>    ravb_nway_reset+0x3c/0x68
>>>    dev_ethtool+0x3dc/0x2338
>>>    dev_ioctl+0x19c/0x490
>>>    sock_do_ioctl+0xe0/0x238
>>>    sock_ioctl+0x254/0x460
>>>    do_vfs_ioctl+0xb0/0x918
>>>    ksys_ioctl+0x50/0x80
>>>    sys_ioctl+0x34/0x48
>>>    __sys_trace_return+0x0/0x4
>>>
>>> The root cause is that an attempt to modify ECMR and GECMR registers
>>> only when RX/TX function is disabled was too overcomplicated in its
>>> original implementation, also processing of an optional Link Change
>>> interrupt added even more complexity, as a result the implementation
>>> was error prone.
>>>
>>> The new locking scheme is confirmed to be correct by dumping driver
>>> specific and generic PHY framework function calls with aid of ftrace
>>> while running more or less advanced tests.
>>>
>>> Please note that sh_eth patches from the series were built-tested only.
>>>
>>> On purpose I do not add Fixes tags, the reused PHY handlers were added
>>> way later than the fixed problems were firstly found in the drivers.
>>
>>    I think you went one step too far with these fixes. On the first glance,
>> the real fixes are to remove grabbing/releasing the spinlock for the duration
>> of the phylib calls. Am I right? If so, making use of the new phylib APIs
>> would be a further enhancement, it's not needed for fixing the splats per se...
> 
>    Note that I hadn't looked at the patches #3/#6 at the time of writing this;
> those seem to be more complicated than the rest.

Right, the simplistic approach of just removing the held spinlock does
not fit well into the overall lame locking model found in the driver.

The thing is that I would prefer to exhibit 'remove custom callbacks'
side of the changes as it is done now, and fixing severe 'invalid contex'
bugs is left as a valuable side effect. I may attempt to find enough
free time to follow your instructions, but frankly speaking I don't
see it beneficial to split a single good all-sufficient change into
three or more: removal of spinlocks, replacement of phy_start_aneg(),
then a non-functional clean-up. Bikeshedding isn't my preference,
but a report about technical flaws related to the published changes
is appreciated, otherwise let me ask you to accept the changes as is,
secondary optimizations can be done on top of them.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
  2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
                   ` (5 preceding siblings ...)
  2018-05-24 16:40 ` [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Sergei Shtylyov
@ 2018-05-25  8:11 ` Geert Uytterhoeven
  6 siblings, 0 replies; 34+ messages in thread
From: Geert Uytterhoeven @ 2018-05-25  8:11 UTC (permalink / raw)
  To: Vladimir Zapolskiy
  Cc: David S. Miller, Sergei Shtylyov, netdev, Linux-Renesas

Hi Vladimir,

On Thu, May 24, 2018 at 1:11 PM, Vladimir Zapolskiy
<vladimir_zapolskiy@mentor.com> wrote:
> For ages trivial changes to RAVB and SuperH ethernet links by means of
> standard 'ethtool' trigger a 'sleeping function called from invalid
> context' bug, to visualize it on r8a7795 ULCB:
>
>   % ethtool -r eth0
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747

[...]

> Please note that sh_eth patches from the series were built-tested only.

Thanks you very much!

I've tested this on both R-Car M2-W (sh_eth) and R-Car H3 ES2.0 (ravb),
and the BUG disappeared.

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
  2018-05-24 13:22   ` Andrew Lunn
@ 2018-05-26 16:56   ` Sergei Shtylyov
  2018-05-26 16:56   ` Sergei Shtylyov
  2018-05-26 17:51   ` Sergei Shtylyov
  3 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 16:56 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello.

   A formal patch review this time...

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   OK so far...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().

   Why? Is this necessary to fix the BUG()?

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);

   OK, removing spin_lock_irqsave() fixes the BUG()...
   Not sure what we rotect against here anyway, MAC interrupts?

> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}
> -
> -	return error;
> -}
> -
>  static u32 ravb_get_msglevel(struct net_device *ndev)
>  {
>  	struct ravb_private *priv = netdev_priv(ndev);
> @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
>  }
>  
>  static const struct ethtool_ops ravb_ethtool_ops = {
> -	.nway_reset		= ravb_nway_reset,
> +	.nway_reset		= phy_ethtool_nway_reset,

   What does this fix?

>  	.get_msglevel		= ravb_get_msglevel,
>  	.set_msglevel		= ravb_set_msglevel,
>  	.get_link		= ethtool_op_get_link,

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
  2018-05-24 13:22   ` Andrew Lunn
  2018-05-26 16:56   ` Sergei Shtylyov
@ 2018-05-26 16:56   ` Sergei Shtylyov
  2018-05-26 17:51   ` Sergei Shtylyov
  3 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 16:56 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello.

   A formal patch review this time...

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   OK so far...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().

   Why? Is this necessary to fix the BUG()?

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);

   OK, removing spin_lock_irqsave() fixes the BUG()...
   Not sure what we rotect against here anyway, MAC interrupts?

> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}
> -
> -	return error;
> -}
> -
>  static u32 ravb_get_msglevel(struct net_device *ndev)
>  {
>  	struct ravb_private *priv = netdev_priv(ndev);
> @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
>  }
>  
>  static const struct ethtool_ops ravb_ethtool_ops = {
> -	.nway_reset		= ravb_nway_reset,
> +	.nway_reset		= phy_ethtool_nway_reset,

   What does this fix?

>  	.get_msglevel		= ravb_get_msglevel,
>  	.set_msglevel		= ravb_set_msglevel,
>  	.get_link		= ethtool_op_get_link,

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] ravb: remove custom .get_link_ksettings from ethtool ops
  2018-05-24 11:11 ` [PATCH 2/6] ravb: remove custom .get_link_ksettings " Vladimir Zapolskiy
@ 2018-05-26 17:07   ` Sergei Shtylyov
  0 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 17:07 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change replaces a custom implementation of .get_link_ksettings
> callback with a shared phy_ethtool_get_link_ksettings(), note that

> &priv->lock wrapping is not needed, because the lock does not
> serialize access to phydev fields.

   No BUG() here, AFAICT. But then this is not a fix but an enhancement.
And I would have done that in 2 steps: 1st removing the spinlock code
and the 2nd removing the custom method implementation. 

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
  2018-05-25  6:25     ` Vladimir Zapolskiy
@ 2018-05-26 17:31       ` Sergei Shtylyov
  0 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 17:31 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/25/2018 09:25 AM, Vladimir Zapolskiy wrote:

>>>> For ages trivial changes to RAVB and SuperH ethernet links by means of
>>>> standard 'ethtool' trigger a 'sleeping function called from invalid
>>>> context' bug, to visualize it on r8a7795 ULCB:
>>>>
>>>>   % ethtool -r eth0
>>>>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>>>>   in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>>>>   INFO: lockdep is turned off.
>>>>   irq event stamp: 0
>>>>   hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>>>>   hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>>   softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>>   softirqs last disabled at (0): [<0000000000000000>]           (null)
>>>>   CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>>>>   Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>>>>   Call trace:
>>>>    dump_backtrace+0x0/0x198
>>>>    show_stack+0x24/0x30
>>>>    dump_stack+0xb8/0xf4
>>>>    ___might_sleep+0x1c8/0x1f8
>>>>    __might_sleep+0x58/0x90
>>>>    __mutex_lock+0x50/0x890
>>>>    mutex_lock_nested+0x3c/0x50
>>>>    phy_start_aneg_priv+0x38/0x180
>>>>    phy_start_aneg+0x24/0x30
>>>>    ravb_nway_reset+0x3c/0x68
>>>>    dev_ethtool+0x3dc/0x2338
>>>>    dev_ioctl+0x19c/0x490
>>>>    sock_do_ioctl+0xe0/0x238
>>>>    sock_ioctl+0x254/0x460
>>>>    do_vfs_ioctl+0xb0/0x918
>>>>    ksys_ioctl+0x50/0x80
>>>>    sys_ioctl+0x34/0x48
>>>>    __sys_trace_return+0x0/0x4
>>>>
>>>> The root cause is that an attempt to modify ECMR and GECMR registers
>>>> only when RX/TX function is disabled was too overcomplicated in its
>>>> original implementation, also processing of an optional Link Change
>>>> interrupt added even more complexity, as a result the implementation
>>>> was error prone.
>>>>
>>>> The new locking scheme is confirmed to be correct by dumping driver
>>>> specific and generic PHY framework function calls with aid of ftrace
>>>> while running more or less advanced tests.
>>>>
>>>> Please note that sh_eth patches from the series were built-tested only.
>>>>
>>>> On purpose I do not add Fixes tags, the reused PHY handlers were added
>>>> way later than the fixed problems were firstly found in the drivers.
>>>
>>>    I think you went one step too far with these fixes. On the first glance,
>>> the real fixes are to remove grabbing/releasing the spinlock for the duration
>>> of the phylib calls. Am I right? If so, making use of the new phylib APIs
>>> would be a further enhancement, it's not needed for fixing the splats per se...
>>
>>    Note that I hadn't looked at the patches #3/#6 at the time of writing this;
>> those seem to be more complicated than the rest.
> 
> Right, the simplistic approach of just removing the held spinlock does
> not fit well into the overall lame locking model found in the driver.

   Yet you only try fixing it in the patches #3 and #6. I was talking about
the patches #1 and #4 mostly (#2 and #5 turned out to be non-fixes).

> The thing is that I would prefer to exhibit 'remove custom callbacks'
> side of the changes as it is done now, and fixing severe 'invalid contex'
> bugs is left as a valuable side effect. I may attempt to find enough
> free time to follow your instructions, but frankly speaking I don't
> see it beneficial to split a single good all-sufficient change into
> three or more: removal of spinlocks, replacement of phy_start_aneg(),
> then a non-functional clean-up.
   Yes, I would prefer these step-by-step changes.

> Bikeshedding isn't my preference,

   This is not about bikeshedding. What you are trying to do clearly
violates the 2 basic principles of the kernel development: "don't mix
fixes and enhancements" and "do one thing per patch". 

> but a report about technical flaws related to the published changes
> is appreciated, otherwise let me ask you to accept the changes as is,
> secondary optimizations can be done on top of them.

   No, I'll certainly have to NAK patches #1/#3 in their current form.
I'm yet to review patches #3/#6... anyway, if you lack the time to do things
properly, I'll have to take this burden on my shoulders (giving you credits).
Yet I'm basically is in the same situation as you -- I have to spend my copiuos
free time on the large patch sets (like yours) and I'm still having some cleanups
to sh_eth cooking here (which I'll most probably have to defer)...

> --
> With best wishes,
> Vladimir

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
                     ` (2 preceding siblings ...)
  2018-05-26 16:56   ` Sergei Shtylyov
@ 2018-05-26 17:51   ` Sergei Shtylyov
  3 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 17:51 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   BTW, I was unable to trigger the BUG() with 'ethtool -r eth0' where 'eth0'
is EtherAVB. What am I doing wrong? :-)

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] sh_eth: remove custom .nway_reset from ethtool ops
  2018-05-24 11:11 ` [PATCH 4/6] sh_eth: remove custom .nway_reset " Vladimir Zapolskiy
  2018-05-24 13:30   ` Andrew Lunn
@ 2018-05-26 18:46   ` Sergei Shtylyov
  2018-05-26 19:22     ` Sergei Shtylyov
  1 sibling, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 18:46 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   Again, I'm unable to reproduce this BUG()...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] sh_eth: remove custom .nway_reset from ethtool ops
  2018-05-26 18:46   ` Sergei Shtylyov
@ 2018-05-26 19:22     ` Sergei Shtylyov
  0 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 19:22 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/26/2018 09:46 PM, Sergei Shtylyov wrote:

>> The change fixes a sleep in atomic context issue, which can be
>> always triggered by running 'ethtool -r' command, because
>> phy_start_aneg() protects phydev fields by a mutex.
> 
>    Again, I'm unable to reproduce this BUG()...

   Now I can! I started to suspect this check needs to be specifically enabled
under the Kernel Hacking menu, and it turned out to be so...

>> Another note is that the change implicitly replaces phy_start_aneg()
>> with a newer phy_restart_aneg().
>>
>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> [...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-24 11:11 ` [PATCH 3/6] ravb: remove custom .set_link_ksettings " Vladimir Zapolskiy
  2018-05-24 13:29   ` Andrew Lunn
@ 2018-05-26 19:50   ` Sergei Shtylyov
  2018-05-28  9:51     ` Vladimir Zapolskiy
  1 sibling, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-05-26 19:50 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change replaces a custom implementation of .set_link_ksettings
> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
> sleep in atomic context bug, which is encountered every time when link
> settings are changed by ethtool.

   Seeing it now...

> Now duplex mode setting is enforced in ravb_adjust_link() only, also
> now TX/RX is disabled when link is put down or modifications to E-MAC
> registers ECMR and GECMR are expected for both cases of checked and
> ignored link status pin state from E-MAC interrupt handler.
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>  1 file changed, 15 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 3d91caa44176..0d811c02ff34 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>  	struct ravb_private *priv = netdev_priv(ndev);
>  	struct phy_device *phydev = ndev->phydev;
>  	bool new_state = false;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&priv->lock, flags);
> +
> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
> +	if (priv->no_avb_link)
> +		ravb_rcv_snd_disable(ndev);
>  
>  	if (phydev->link) {
>  		if (phydev->duplex != priv->duplex) {
> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>  			new_state = true;
>  			priv->link = phydev->link;
> -			if (priv->no_avb_link)
> -				ravb_rcv_snd_enable(ndev);
>  		}
>  	} else if (priv->link) {
>  		new_state = true;
>  		priv->link = 0;
>  		priv->speed = 0;
>  		priv->duplex = -1;
> -		if (priv->no_avb_link)
> -			ravb_rcv_snd_disable(ndev);
>  	}
>  
> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
> +	if (priv->no_avb_link && phydev->link)
> +		ravb_rcv_snd_enable(ndev);
> +
> +	mmiowb();
> +	spin_unlock_irqrestore(&priv->lock, flags);
> +

   I like this part. :-)

>  	if (new_state && netif_msg_link(priv))
>  		phy_print_status(phydev);
>  }
> @@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
>  	return 0;
>  }
>  
> -static int ravb_set_link_ksettings(struct net_device *ndev,
> -				   const struct ethtool_link_ksettings *cmd)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	unsigned long flags;
> -	int error;
> -
> -	if (!ndev->phydev)
> -		return -ENODEV;
> -
> -	spin_lock_irqsave(&priv->lock, flags);
> -
> -	/* Disable TX and RX */
> -	ravb_rcv_snd_disable(ndev);
> -
> -	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
> -	if (error)
> -		goto error_exit;
> -
> -	if (cmd->base.duplex == DUPLEX_FULL)
> -		priv->duplex = 1;
> -	else
> -		priv->duplex = 0;
> -
> -	ravb_set_duplex(ndev);
> -
> -error_exit:
> -	mdelay(1);
> -
> -	/* Enable TX and RX */
> -	ravb_rcv_snd_enable(ndev);
> -
> -	mmiowb();
> -	spin_unlock_irqrestore(&priv->lock, flags);
> -
> -	return error;
> -}
> -

   But this part is clearly lumping it all together... 

[...]
> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>  	.set_ringparam		= ravb_set_ringparam,
>  	.get_ts_info		= ravb_get_ts_info,
>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
> -	.set_link_ksettings	= ravb_set_link_ksettings,
> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,

   Should have been a part of the final patch in the fix/enhancement chain...

>  	.get_wol		= ravb_get_wol,
>  	.set_wol		= ravb_set_wol,
>  };

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-26 19:50   ` Sergei Shtylyov
@ 2018-05-28  9:51     ` Vladimir Zapolskiy
  2018-06-03 15:42       ` Sergei Shtylyov
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-05-28  9:51 UTC (permalink / raw)
  To: Sergei Shtylyov, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello Sergei,

On 05/26/2018 10:50 PM, Sergei Shtylyov wrote:
> On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:
> 
>> The change replaces a custom implementation of .set_link_ksettings
>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>> sleep in atomic context bug, which is encountered every time when link
>> settings are changed by ethtool.
> 
>    Seeing it now...
> 
>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>> now TX/RX is disabled when link is put down or modifications to E-MAC
>> registers ECMR and GECMR are expected for both cases of checked and
>> ignored link status pin state from E-MAC interrupt handler.
>>
>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>> ---
>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>> index 3d91caa44176..0d811c02ff34 100644
>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>  	struct ravb_private *priv = netdev_priv(ndev);
>>  	struct phy_device *phydev = ndev->phydev;
>>  	bool new_state = false;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&priv->lock, flags);
>> +
>> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
>> +	if (priv->no_avb_link)
>> +		ravb_rcv_snd_disable(ndev);
>>  
>>  	if (phydev->link) {
>>  		if (phydev->duplex != priv->duplex) {
>> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>>  			new_state = true;
>>  			priv->link = phydev->link;
>> -			if (priv->no_avb_link)
>> -				ravb_rcv_snd_enable(ndev);
>>  		}
>>  	} else if (priv->link) {
>>  		new_state = true;
>>  		priv->link = 0;
>>  		priv->speed = 0;
>>  		priv->duplex = -1;
>> -		if (priv->no_avb_link)
>> -			ravb_rcv_snd_disable(ndev);
>>  	}
>>  
>> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
>> +	if (priv->no_avb_link && phydev->link)
>> +		ravb_rcv_snd_enable(ndev);
>> +
>> +	mmiowb();
>> +	spin_unlock_irqrestore(&priv->lock, flags);
>> +
> 
>    I like this part. :-)
> 

A weight off my mind :) And I hope that this change will remain the less
questionable one, other ones from the series are trivial.

Anyway I hope it is understandable that this part of the change can not
be simply extracted from the rest one below, otherwise there'll be bugs of
another type intorduced.

>>  	if (new_state && netif_msg_link(priv))
>>  		phy_print_status(phydev);
>>  }
>> @@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
>>  	return 0;
>>  }
>>  
>> -static int ravb_set_link_ksettings(struct net_device *ndev,
>> -				   const struct ethtool_link_ksettings *cmd)
>> -{
>> -	struct ravb_private *priv = netdev_priv(ndev);
>> -	unsigned long flags;
>> -	int error;
>> -
>> -	if (!ndev->phydev)
>> -		return -ENODEV;
>> -
>> -	spin_lock_irqsave(&priv->lock, flags);
>> -
>> -	/* Disable TX and RX */
>> -	ravb_rcv_snd_disable(ndev);
>> -
>> -	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
>> -	if (error)
>> -		goto error_exit;
>> -
>> -	if (cmd->base.duplex == DUPLEX_FULL)
>> -		priv->duplex = 1;
>> -	else
>> -		priv->duplex = 0;
>> -
>> -	ravb_set_duplex(ndev);
>> -
>> -error_exit:
>> -	mdelay(1);
>> -
>> -	/* Enable TX and RX */
>> -	ravb_rcv_snd_enable(ndev);
>> -
>> -	mmiowb();
>> -	spin_unlock_irqrestore(&priv->lock, flags);
>> -
>> -	return error;
>> -}
>> -
> 
>    But this part is clearly lumping it all together... 

Please elaborate.

> 
> [...]
>> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>>  	.set_ringparam		= ravb_set_ringparam,
>>  	.get_ts_info		= ravb_get_ts_info,
>>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
>> -	.set_link_ksettings	= ravb_set_link_ksettings,
>> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
> 
>    Should have been a part of the final patch in the fix/enhancement chain...

Please elaborate.

Do you mean that firstly I have to make erroneous ravb_set_link_ksettings()
to look similar to phy_ethtool_set_link_ksettings() and then remove it?

As I see it in the current context (removal of ravb_set_duplex() call and
so on), the problem with this approach is that the actual fix change will
be done on top of a number of enchancement changes, thus it contradicts to
the accepted development/maintenace model "fixes first", and most probably
it won't be possible to backport the real fix, however this sole change can
be backported.

> 
>>  	.get_wol		= ravb_get_wol,
>>  	.set_wol		= ravb_set_wol,
>>  };
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-05-28  9:51     ` Vladimir Zapolskiy
@ 2018-06-03 15:42       ` Sergei Shtylyov
  2018-06-04 11:07         ` Vladimir Zapolskiy
  0 siblings, 1 reply; 34+ messages in thread
From: Sergei Shtylyov @ 2018-06-03 15:42 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello!

   Sorry for the delay replying, the management keeps me busy... :-(

On 05/28/2018 12:51 PM, Vladimir Zapolskiy wrote:

>>> The change replaces a custom implementation of .set_link_ksettings
>>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>>> sleep in atomic context bug, which is encountered every time when link
>>> settings are changed by ethtool.
>>
>>    Seeing it now...

   And to say that this is *fixed* by removing the custom method is err...
simply misleading. The sleep in atomic context is fixed solely by the removal
of the spinlock grabbing before the phylib call.

>>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>>> now TX/RX is disabled when link is put down or modifications to E-MAC
>>> registers ECMR and GECMR are expected for both cases of checked and
>>> ignored link status pin state from E-MAC interrupt handler.
>>>
>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>> ---
>>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 3d91caa44176..0d811c02ff34 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>  	struct ravb_private *priv = netdev_priv(ndev);
>>>  	struct phy_device *phydev = ndev->phydev;
>>>  	bool new_state = false;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&priv->lock, flags);
>>> +
>>> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
>>> +	if (priv->no_avb_link)
>>> +		ravb_rcv_snd_disable(ndev);
>>>  
>>>  	if (phydev->link) {
>>>  		if (phydev->duplex != priv->duplex) {
>>> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>>>  			new_state = true;
>>>  			priv->link = phydev->link;
>>> -			if (priv->no_avb_link)
>>> -				ravb_rcv_snd_enable(ndev);
>>>  		}
>>>  	} else if (priv->link) {
>>>  		new_state = true;
>>>  		priv->link = 0;
>>>  		priv->speed = 0;
>>>  		priv->duplex = -1;
>>> -		if (priv->no_avb_link)
>>> -			ravb_rcv_snd_disable(ndev);
>>>  	}
>>>  
>>> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
>>> +	if (priv->no_avb_link && phydev->link)
>>> +		ravb_rcv_snd_enable(ndev);
>>> +
>>> +	mmiowb();
>>> +	spin_unlock_irqrestore(&priv->lock, flags);
>>> +
>>
>>    I like this part. :-)
>>
> 
> A weight off my mind :) And I hope that this change will remain the less
> questionable one, other ones from the series are trivial.
> 
> Anyway I hope it is understandable that this part of the change can not
> be simply extracted from the rest one below, otherwise there'll be bugs of
> another type intorduced.

   I never said I'd like to apply this part alone, my idea was more like removing
the spinlock grabbing and the duplex handling down below.

[...]
>>> @@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
>>>  	return 0;
>>>  }
>>>  
>>> -static int ravb_set_link_ksettings(struct net_device *ndev,
>>> -				   const struct ethtool_link_ksettings *cmd)
>>> -{
>>> -	struct ravb_private *priv = netdev_priv(ndev);
>>> -	unsigned long flags;
>>> -	int error;
>>> -
>>> -	if (!ndev->phydev)
>>> -		return -ENODEV;
>>> -
>>> -	spin_lock_irqsave(&priv->lock, flags);
>>> -
>>> -	/* Disable TX and RX */
>>> -	ravb_rcv_snd_disable(ndev);
>>> -
>>> -	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
>>> -	if (error)
>>> -		goto error_exit;
>>> -
>>> -	if (cmd->base.duplex == DUPLEX_FULL)
>>> -		priv->duplex = 1;
>>> -	else
>>> -		priv->duplex = 0;
>>> -
>>> -	ravb_set_duplex(ndev);
>>> -
>>> -error_exit:
>>> -	mdelay(1);
>>> -
>>> -	/* Enable TX and RX */
>>> -	ravb_rcv_snd_enable(ndev);
>>> -
>>> -	mmiowb();
>>> -	spin_unlock_irqrestore(&priv->lock, flags);
>>> -
>>> -	return error;
>>> -}
>>> -
>>
>>    But this part is clearly lumping it all together... 
> 
> Please elaborate.

   My point is still that complete removal of the custom method was somewhat
premature and completely unnecessary for fixing the issues we have.

>> [...]
>>> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>>>  	.set_ringparam		= ravb_set_ringparam,
>>>  	.get_ts_info		= ravb_get_ts_info,
>>>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
>>> -	.set_link_ksettings	= ravb_set_link_ksettings,
>>> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
>>
>>    Should have been a part of the final patch in the fix/enhancement chain...
> 
> Please elaborate.
> 
> Do you mean that firstly I have to make erroneous ravb_set_link_ksettings()
> to look similar to phy_ethtool_set_link_ksettings() and then remove it?

   Yes.

> As I see it in the current context (removal of ravb_set_duplex() call and
> so on), the problem with this approach is that the actual fix change will
> be done on top of a number of enchancement changes, thus it contradicts to

   Now I have to ask you to elaborate. I have no idea what you mean. :-(

   And of course, sometimes the things are broken in a so subtle way, that
only as pile of "cleanups" fixed them, we had that situation in e.g. the
R-Car I2C driver -- *none* of AFAIR 9 patches was good as a -stable patch...

> the accepted development/maintenace model "fixes first", and most probably
> it won't be possible to backport the real fix, however this sole change can
> be backported.

   My idea was to move the [G]ECMR writes to the adjust_link() callback and
to stop grabbing the spinlock where it *was* grabbed in the same fix patch.
Then just a single clean up, to start using the new phylib method.

[...]
> --
> With best wishes,
> Vladimir

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-06-03 15:42       ` Sergei Shtylyov
@ 2018-06-04 11:07         ` Vladimir Zapolskiy
  2018-06-06 20:34           ` Sergei Shtylyov
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Zapolskiy @ 2018-06-04 11:07 UTC (permalink / raw)
  To: Sergei Shtylyov, David S. Miller; +Cc: netdev, linux-renesas-soc

Hello Sergei,

On 06/03/2018 06:42 PM, Sergei Shtylyov wrote:
> Hello!
> 
>    Sorry for the delay replying, the management keeps me busy... :-(
> 
> On 05/28/2018 12:51 PM, Vladimir Zapolskiy wrote:
> 
>>>> The change replaces a custom implementation of .set_link_ksettings
>>>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>>>> sleep in atomic context bug, which is encountered every time when link
>>>> settings are changed by ethtool.
>>>
>>>    Seeing it now...
> 
>    And to say that this is *fixed* by removing the custom method is err...
> simply misleading. The sleep in atomic context is fixed solely by the removal
> of the spinlock grabbing before the phylib call.
> 

As I've already said, "the removal of the spinlock grabbing before the phylib
call" is not a valid fix, but it will introduce another regression.

>>>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>>>> now TX/RX is disabled when link is put down or modifications to E-MAC
>>>> registers ECMR and GECMR are expected for both cases of checked and
>>>> ignored link status pin state from E-MAC interrupt handler.
>>>>
>>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>>> ---
>>>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>>> index 3d91caa44176..0d811c02ff34 100644
>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>>  	struct ravb_private *priv = netdev_priv(ndev);
>>>>  	struct phy_device *phydev = ndev->phydev;
>>>>  	bool new_state = false;
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&priv->lock, flags);
>>>> +
>>>> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
>>>> +	if (priv->no_avb_link)
>>>> +		ravb_rcv_snd_disable(ndev);
>>>>  
>>>>  	if (phydev->link) {
>>>>  		if (phydev->duplex != priv->duplex) {
>>>> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>>>>  			new_state = true;
>>>>  			priv->link = phydev->link;
>>>> -			if (priv->no_avb_link)
>>>> -				ravb_rcv_snd_enable(ndev);
>>>>  		}
>>>>  	} else if (priv->link) {
>>>>  		new_state = true;
>>>>  		priv->link = 0;
>>>>  		priv->speed = 0;
>>>>  		priv->duplex = -1;
>>>> -		if (priv->no_avb_link)
>>>> -			ravb_rcv_snd_disable(ndev);
>>>>  	}
>>>>  
>>>> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
>>>> +	if (priv->no_avb_link && phydev->link)
>>>> +		ravb_rcv_snd_enable(ndev);
>>>> +
>>>> +	mmiowb();
>>>> +	spin_unlock_irqrestore(&priv->lock, flags);
>>>> +
>>>
>>>    I like this part. :-)
>>>
>>
>> A weight off my mind :) And I hope that this change will remain the less
>> questionable one, other ones from the series are trivial.
>>
>> Anyway I hope it is understandable that this part of the change can not
>> be simply extracted from the rest one below, otherwise there'll be bugs of
>> another type intorduced.
> 
>    I never said I'd like to apply this part alone, my idea was more like removing
> the spinlock grabbing and the duplex handling down below.
> 

As I've already said, "the removal of the spinlock grabbing" is not a valid fix,
but it will introduce another regression.

Please tell me, a removal of duplex handling change should be done before
a removal of the spinlock grabbing? Or after?

> [...]
>>>> @@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> -static int ravb_set_link_ksettings(struct net_device *ndev,
>>>> -				   const struct ethtool_link_ksettings *cmd)
>>>> -{
>>>> -	struct ravb_private *priv = netdev_priv(ndev);
>>>> -	unsigned long flags;
>>>> -	int error;
>>>> -
>>>> -	if (!ndev->phydev)
>>>> -		return -ENODEV;
>>>> -
>>>> -	spin_lock_irqsave(&priv->lock, flags);
>>>> -
>>>> -	/* Disable TX and RX */
>>>> -	ravb_rcv_snd_disable(ndev);
>>>> -
>>>> -	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
>>>> -	if (error)
>>>> -		goto error_exit;
>>>> -
>>>> -	if (cmd->base.duplex == DUPLEX_FULL)
>>>> -		priv->duplex = 1;
>>>> -	else
>>>> -		priv->duplex = 0;
>>>> -
>>>> -	ravb_set_duplex(ndev);
>>>> -
>>>> -error_exit:
>>>> -	mdelay(1);
>>>> -
>>>> -	/* Enable TX and RX */
>>>> -	ravb_rcv_snd_enable(ndev);
>>>> -
>>>> -	mmiowb();
>>>> -	spin_unlock_irqrestore(&priv->lock, flags);
>>>> -
>>>> -	return error;
>>>> -}
>>>> -
>>>
>>>    But this part is clearly lumping it all together... 
>>
>> Please elaborate.
> 
>    My point is still that complete removal of the custom method was somewhat
> premature and completely unnecessary for fixing the issues we have.
> 
>>> [...]
>>>> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>>>>  	.set_ringparam		= ravb_set_ringparam,
>>>>  	.get_ts_info		= ravb_get_ts_info,
>>>>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
>>>> -	.set_link_ksettings	= ravb_set_link_ksettings,
>>>> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
>>>
>>>    Should have been a part of the final patch in the fix/enhancement chain...
>>
>> Please elaborate.
>>
>> Do you mean that firstly I have to make erroneous ravb_set_link_ksettings()
>> to look similar to phy_ethtool_set_link_ksettings() and then remove it?
> 
>    Yes.
> 

Then this change of "ravb_set_link_ksettings() looks similar to
phy_ethtool_set_link_ksettings()" will be a single commit, and it will be
a fix. Does it sound good?

>> As I see it in the current context (removal of ravb_set_duplex() call and
>> so on), the problem with this approach is that the actual fix change will
>> be done on top of a number of enchancement changes, thus it contradicts to
> 
>    Now I have to ask you to elaborate. I have no idea what you mean. :-(
> 

My statement is based on the next facts:
1. ravb_set_duplex() call in ravb_set_link_ksettings() is unnecessary,
   however its removal is an enchancement,
2. removal of the spinlock grabbing is just a *part* of the proper fix,
   and the complete proper fix includes ravb_set_duplex() call removal,
   adding spinlock grabbing to ravb_adjust_link() and other modifications
   to ravb_adjust_link() from this commit.
3. commits with fixes must precede commits with enchancements in the
   series, because enchancements are not backported.

The question remains the same, what do you ask me to do? 

>    And of course, sometimes the things are broken in a so subtle way, that
> only as pile of "cleanups" fixed them, we had that situation in e.g. the
> R-Car I2C driver -- *none* of AFAIR 9 patches was good as a -stable patch...
> 
>> the accepted development/maintenace model "fixes first", and most probably
>> it won't be possible to backport the real fix, however this sole change can
>> be backported.
> 
>    My idea was to move the [G]ECMR writes to the adjust_link() callback and
> to stop grabbing the spinlock where it *was* grabbed in the same fix patch.
> Then just a single clean up, to start using the new phylib method.
> 

It will be okay iff ravb_set_duplex() call removal is added to the first
part ("fixes" part) of two changes. Please confirm that our understanding
is aligned.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
  2018-06-04 11:07         ` Vladimir Zapolskiy
@ 2018-06-06 20:34           ` Sergei Shtylyov
  0 siblings, 0 replies; 34+ messages in thread
From: Sergei Shtylyov @ 2018-06-06 20:34 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc

On 06/04/2018 02:07 PM, Vladimir Zapolskiy wrote:

>>>>> The change replaces a custom implementation of .set_link_ksettings
>>>>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>>>>> sleep in atomic context bug, which is encountered every time when link
>>>>> settings are changed by ethtool.
>>>>
>>>>    Seeing it now...
>>
>>    And to say that this is *fixed* by removing the custom method is err...
>> simply misleading. The sleep in atomic context is fixed solely by the removal
>> of the spinlock grabbing before the phylib call.

> As I've already said, "the removal of the spinlock grabbing before the phylib
> call" is not a valid fix, but it will introduce another regression.

   It's the necessary part of the fix, unlike using the generic phylib method.

>>>>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>>>>> now TX/RX is disabled when link is put down or modifications to E-MAC
>>>>> registers ECMR and GECMR are expected for both cases of checked and
>>>>> ignored link status pin state from E-MAC interrupt handler.
>>>>>
>>>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>>>> ---
>>>>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>>>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> index 3d91caa44176..0d811c02ff34 100644
>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>>>  	struct ravb_private *priv = netdev_priv(ndev);
>>>>>  	struct phy_device *phydev = ndev->phydev;
>>>>>  	bool new_state = false;
>>>>> +	unsigned long flags;
>>>>> +
>>>>> +	spin_lock_irqsave(&priv->lock, flags);
>>>>> +
>>>>> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
>>>>> +	if (priv->no_avb_link)
>>>>> +		ravb_rcv_snd_disable(ndev);
>>>>>  
>>>>>  	if (phydev->link) {
>>>>>  		if (phydev->duplex != priv->duplex) {
>>>>> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>>>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>>>>>  			new_state = true;
>>>>>  			priv->link = phydev->link;
>>>>> -			if (priv->no_avb_link)
>>>>> -				ravb_rcv_snd_enable(ndev);
>>>>>  		}
>>>>>  	} else if (priv->link) {
>>>>>  		new_state = true;
>>>>>  		priv->link = 0;
>>>>>  		priv->speed = 0;
>>>>>  		priv->duplex = -1;
>>>>> -		if (priv->no_avb_link)
>>>>> -			ravb_rcv_snd_disable(ndev);
>>>>>  	}
>>>>>  
>>>>> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
>>>>> +	if (priv->no_avb_link && phydev->link)
>>>>> +		ravb_rcv_snd_enable(ndev);
>>>>> +
>>>>> +	mmiowb();
>>>>> +	spin_unlock_irqrestore(&priv->lock, flags);
>>>>> +
>>>>
>>>>    I like this part. :-)
>>>>
>>>
>>> A weight off my mind :) And I hope that this change will remain the less
>>> questionable one, other ones from the series are trivial.
>>>
>>> Anyway I hope it is understandable that this part of the change can not
>>> be simply extracted from the rest one below, otherwise there'll be bugs of
>>> another type intorduced.
>>
>>    I never said I'd like to apply this part alone, my idea was more like removing
>> the spinlock grabbing and the duplex handling down below.
>>
> 
> As I've already said, "the removal of the spinlock grabbing" is not a valid fix,
> but it will introduce another regression.
> 
> Please tell me, a removal of duplex handling change should be done before
> a removal of the spinlock grabbing? Or after?

   As much as I was able to figure out, at the same time.

>> [...]
>>>>> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>>>>>  	.set_ringparam		= ravb_set_ringparam,
>>>>>  	.get_ts_info		= ravb_get_ts_info,
>>>>>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
>>>>> -	.set_link_ksettings	= ravb_set_link_ksettings,
>>>>> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
>>>>
>>>>    Should have been a part of the final patch in the fix/enhancement chain...
>>>
>>> Please elaborate.
>>>
>>> Do you mean that firstly I have to make erroneous ravb_set_link_ksettings()
>>> to look similar to phy_ethtool_set_link_ksettings() and then remove it?
>>
>>    Yes.

> Then this change of "ravb_set_link_ksettings() looks similar to
> phy_ethtool_set_link_ksettings()" will be a single commit, and it will be
> a fix. Does it sound good?

   It does, as I was trying to tell you. :-)

>>> As I see it in the current context (removal of ravb_set_duplex() call and
>>> so on), the problem with this approach is that the actual fix change will
>>> be done on top of a number of enchancement changes, thus it contradicts to
>>
>>    Now I have to ask you to elaborate. I have no idea what you mean. :-(
> 
> My statement is based on the next facts:

  s/next/following/?

> 1. ravb_set_duplex() call in ravb_set_link_ksettings() is unnecessary,
>    however its removal is an enchancement,
> 2. removal of the spinlock grabbing is just a *part* of the proper fix,
>    and the complete proper fix includes ravb_set_duplex() call removal,
>    adding spinlock grabbing to ravb_adjust_link() and other modifications
>    to ravb_adjust_link() from this commit.

   So far, so good. :-)

> 3. commits with fixes must precede commits with enchancements in the
>    series, because enchancements are not backported.

   Enhancements?
   Yes, if at all possible.

> The question remains the same, what do you ask me to do? 

   Mainly, to separate fixes from clean-ups, as much as possible. That'll simplify
the -stable backport handling for DaveM and the people maintaining the earlier kernel
versions.

>>    And of course, sometimes the things are broken in a so subtle way, that
>> only as pile of "cleanups" fixed them, we had that situation in e.g. the
>> R-Car I2C driver -- *none* of AFAIR 9 patches was good as a -stable patch...
>>
>>> the accepted development/maintenace model "fixes first", and most probably
>>> it won't be possible to backport the real fix, however this sole change can
>>> be backported.
>>
>>    My idea was to move the [G]ECMR writes to the adjust_link() callback and
>> to stop grabbing the spinlock where it *was* grabbed in the same fix patch.
>> Then just a single clean up, to start using the new phylib method.

> It will be okay iff ravb_set_duplex() call removal is added to the first
> part ("fixes" part) of two changes. Please confirm that our understanding
> is aligned.

   Yes, and I've tried to communicate that to you but somehow failed. :-)
> --
> With best wishes,
> Vladimir

MBR, Sergei

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-06-06 20:34 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 11:11 [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Vladimir Zapolskiy
2018-05-24 11:11 ` [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops Vladimir Zapolskiy
2018-05-24 13:22   ` Andrew Lunn
2018-05-24 14:11     ` Vladimir Zapolskiy
2018-05-24 16:18       ` Sergei Shtylyov
2018-05-24 16:44         ` Andrew Lunn
2018-05-24 17:01           ` Sergei Shtylyov
2018-05-24 17:23             ` Andrew Lunn
2018-05-25  6:05             ` Vladimir Zapolskiy
2018-05-26 16:56   ` Sergei Shtylyov
2018-05-26 16:56   ` Sergei Shtylyov
2018-05-26 17:51   ` Sergei Shtylyov
2018-05-24 11:11 ` [PATCH 2/6] ravb: remove custom .get_link_ksettings " Vladimir Zapolskiy
2018-05-26 17:07   ` Sergei Shtylyov
2018-05-24 11:11 ` [PATCH 3/6] ravb: remove custom .set_link_ksettings " Vladimir Zapolskiy
2018-05-24 13:29   ` Andrew Lunn
2018-05-24 14:06     ` Vladimir Zapolskiy
2018-05-26 19:50   ` Sergei Shtylyov
2018-05-28  9:51     ` Vladimir Zapolskiy
2018-06-03 15:42       ` Sergei Shtylyov
2018-06-04 11:07         ` Vladimir Zapolskiy
2018-06-06 20:34           ` Sergei Shtylyov
2018-05-24 11:11 ` [PATCH 4/6] sh_eth: remove custom .nway_reset " Vladimir Zapolskiy
2018-05-24 13:30   ` Andrew Lunn
2018-05-26 18:46   ` Sergei Shtylyov
2018-05-26 19:22     ` Sergei Shtylyov
2018-05-24 11:14 ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Vladimir Zapolskiy
2018-05-24 11:14   ` [PATCH 6/6] sh_eth: remove custom .set_link_ksettings " Vladimir Zapolskiy
2018-05-24 13:31   ` [PATCH 5/6] sh_eth: remove custom .get_link_ksettings " Andrew Lunn
2018-05-24 16:40 ` [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers Sergei Shtylyov
2018-05-24 17:24   ` Sergei Shtylyov
2018-05-25  6:25     ` Vladimir Zapolskiy
2018-05-26 17:31       ` Sergei Shtylyov
2018-05-25  8:11 ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).