linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure
@ 2022-02-23  3:14 Heyi Guo
  2022-02-23  3:14 ` [PATCH 1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call Heyi Guo
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-23  3:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heyi Guo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	Joel Stanley, Guangbin Huang, Hao Chen, Arnd Bergmann,
	Dylan Hung, netdev

This patch set is to fix the issues discussed in the mail thread:
https://lore.kernel.org/netdev/51f5b7a7-330f-6b3c-253d-10e45cdb6805@linux.alibaba.com/
and follows the advice from Andrew Lunn.

The first 2 patches refactors the code to enable adjust_link calling reset
function directly.

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Guangbin Huang <huangguangbin2@huawei.com>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dylan Hung <dylan_hung@aspeedtech.com>
Cc: netdev@vger.kernel.org


Heyi Guo (3):
  drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct
    function call
  drivers/net/ftgmac100: adjust code place for function call dependency
  drivers/net/ftgmac100: fix DHCP potential failure with systemd

 drivers/net/ethernet/faraday/ftgmac100.c | 243 ++++++++++++-----------
 1 file changed, 129 insertions(+), 114 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call
  2022-02-23  3:14 [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure Heyi Guo
@ 2022-02-23  3:14 ` Heyi Guo
  2022-02-23  3:14 ` [PATCH 2/3] drivers/net/ftgmac100: adjust code place for function call dependency Heyi Guo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-23  3:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heyi Guo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	Joel Stanley, Guangbin Huang, Hao Chen, Arnd Bergmann,
	Dylan Hung, netdev

This is to prepare for ftgmac100_adjust_link() to call reset function
directly, instead of task schedule.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>

---
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Guangbin Huang <huangguangbin2@huawei.com>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dylan Hung <dylan_hung@aspeedtech.com>
Cc: netdev@vger.kernel.org


---
 drivers/net/ethernet/faraday/ftgmac100.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 691605c152659..1f3eb44314753 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1410,10 +1410,8 @@ static int ftgmac100_init_all(struct ftgmac100 *priv, bool ignore_alloc_err)
 	return err;
 }
 
-static void ftgmac100_reset_task(struct work_struct *work)
+static void ftgmac100_reset(struct ftgmac100 *priv)
 {
-	struct ftgmac100 *priv = container_of(work, struct ftgmac100,
-					      reset_task);
 	struct net_device *netdev = priv->netdev;
 	int err;
 
@@ -1459,6 +1457,14 @@ static void ftgmac100_reset_task(struct work_struct *work)
 	rtnl_unlock();
 }
 
+static void ftgmac100_reset_task(struct work_struct *work)
+{
+	struct ftgmac100 *priv = container_of(work, struct ftgmac100,
+					      reset_task);
+
+	ftgmac100_reset(priv);
+}
+
 static int ftgmac100_open(struct net_device *netdev)
 {
 	struct ftgmac100 *priv = netdev_priv(netdev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] drivers/net/ftgmac100: adjust code place for function call dependency
  2022-02-23  3:14 [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure Heyi Guo
  2022-02-23  3:14 ` [PATCH 1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call Heyi Guo
@ 2022-02-23  3:14 ` Heyi Guo
  2022-02-23  3:14 ` [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd Heyi Guo
  2022-02-23 13:20 ` [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure patchwork-bot+netdevbpf
  3 siblings, 0 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-23  3:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heyi Guo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	Joel Stanley, Guangbin Huang, Hao Chen, Arnd Bergmann,
	Dylan Hung, netdev

This is to prepare for ftgmac100_adjust_link() to call
ftgmac100_reset() directly. Only code places are changed.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
---
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Guangbin Huang <huangguangbin2@huawei.com>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dylan Hung <dylan_hung@aspeedtech.com>
Cc: netdev@vger.kernel.org


---
 drivers/net/ethernet/faraday/ftgmac100.c | 222 +++++++++++------------
 1 file changed, 111 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 1f3eb44314753..c1deb6e5d26c5 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -989,117 +989,6 @@ static int ftgmac100_alloc_rx_buffers(struct ftgmac100 *priv)
 	return 0;
 }
 
-static void ftgmac100_adjust_link(struct net_device *netdev)
-{
-	struct ftgmac100 *priv = netdev_priv(netdev);
-	struct phy_device *phydev = netdev->phydev;
-	bool tx_pause, rx_pause;
-	int new_speed;
-
-	/* We store "no link" as speed 0 */
-	if (!phydev->link)
-		new_speed = 0;
-	else
-		new_speed = phydev->speed;
-
-	/* Grab pause settings from PHY if configured to do so */
-	if (priv->aneg_pause) {
-		rx_pause = tx_pause = phydev->pause;
-		if (phydev->asym_pause)
-			tx_pause = !rx_pause;
-	} else {
-		rx_pause = priv->rx_pause;
-		tx_pause = priv->tx_pause;
-	}
-
-	/* Link hasn't changed, do nothing */
-	if (phydev->speed == priv->cur_speed &&
-	    phydev->duplex == priv->cur_duplex &&
-	    rx_pause == priv->rx_pause &&
-	    tx_pause == priv->tx_pause)
-		return;
-
-	/* Print status if we have a link or we had one and just lost it,
-	 * don't print otherwise.
-	 */
-	if (new_speed || priv->cur_speed)
-		phy_print_status(phydev);
-
-	priv->cur_speed = new_speed;
-	priv->cur_duplex = phydev->duplex;
-	priv->rx_pause = rx_pause;
-	priv->tx_pause = tx_pause;
-
-	/* Link is down, do nothing else */
-	if (!new_speed)
-		return;
-
-	/* Disable all interrupts */
-	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-
-	/* Reset the adapter asynchronously */
-	schedule_work(&priv->reset_task);
-}
-
-static int ftgmac100_mii_probe(struct net_device *netdev)
-{
-	struct ftgmac100 *priv = netdev_priv(netdev);
-	struct platform_device *pdev = to_platform_device(priv->dev);
-	struct device_node *np = pdev->dev.of_node;
-	struct phy_device *phydev;
-	phy_interface_t phy_intf;
-	int err;
-
-	/* Default to RGMII. It's a gigabit part after all */
-	err = of_get_phy_mode(np, &phy_intf);
-	if (err)
-		phy_intf = PHY_INTERFACE_MODE_RGMII;
-
-	/* Aspeed only supports these. I don't know about other IP
-	 * block vendors so I'm going to just let them through for
-	 * now. Note that this is only a warning if for some obscure
-	 * reason the DT really means to lie about it or it's a newer
-	 * part we don't know about.
-	 *
-	 * On the Aspeed SoC there are additionally straps and SCU
-	 * control bits that could tell us what the interface is
-	 * (or allow us to configure it while the IP block is held
-	 * in reset). For now I chose to keep this driver away from
-	 * those SoC specific bits and assume the device-tree is
-	 * right and the SCU has been configured properly by pinmux
-	 * or the firmware.
-	 */
-	if (priv->is_aspeed && !(phy_interface_mode_is_rgmii(phy_intf))) {
-		netdev_warn(netdev,
-			    "Unsupported PHY mode %s !\n",
-			    phy_modes(phy_intf));
-	}
-
-	phydev = phy_find_first(priv->mii_bus);
-	if (!phydev) {
-		netdev_info(netdev, "%s: no PHY found\n", netdev->name);
-		return -ENODEV;
-	}
-
-	phydev = phy_connect(netdev, phydev_name(phydev),
-			     &ftgmac100_adjust_link, phy_intf);
-
-	if (IS_ERR(phydev)) {
-		netdev_err(netdev, "%s: Could not attach to PHY\n", netdev->name);
-		return PTR_ERR(phydev);
-	}
-
-	/* Indicate that we support PAUSE frames (see comment in
-	 * Documentation/networking/phy.rst)
-	 */
-	phy_support_asym_pause(phydev);
-
-	/* Display what we found */
-	phy_attached_info(phydev);
-
-	return 0;
-}
-
 static int ftgmac100_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 {
 	struct net_device *netdev = bus->priv;
@@ -1465,6 +1354,117 @@ static void ftgmac100_reset_task(struct work_struct *work)
 	ftgmac100_reset(priv);
 }
 
+static void ftgmac100_adjust_link(struct net_device *netdev)
+{
+	struct ftgmac100 *priv = netdev_priv(netdev);
+	struct phy_device *phydev = netdev->phydev;
+	bool tx_pause, rx_pause;
+	int new_speed;
+
+	/* We store "no link" as speed 0 */
+	if (!phydev->link)
+		new_speed = 0;
+	else
+		new_speed = phydev->speed;
+
+	/* Grab pause settings from PHY if configured to do so */
+	if (priv->aneg_pause) {
+		rx_pause = tx_pause = phydev->pause;
+		if (phydev->asym_pause)
+			tx_pause = !rx_pause;
+	} else {
+		rx_pause = priv->rx_pause;
+		tx_pause = priv->tx_pause;
+	}
+
+	/* Link hasn't changed, do nothing */
+	if (phydev->speed == priv->cur_speed &&
+	    phydev->duplex == priv->cur_duplex &&
+	    rx_pause == priv->rx_pause &&
+	    tx_pause == priv->tx_pause)
+		return;
+
+	/* Print status if we have a link or we had one and just lost it,
+	 * don't print otherwise.
+	 */
+	if (new_speed || priv->cur_speed)
+		phy_print_status(phydev);
+
+	priv->cur_speed = new_speed;
+	priv->cur_duplex = phydev->duplex;
+	priv->rx_pause = rx_pause;
+	priv->tx_pause = tx_pause;
+
+	/* Link is down, do nothing else */
+	if (!new_speed)
+		return;
+
+	/* Disable all interrupts */
+	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
+
+	/* Reset the adapter asynchronously */
+	schedule_work(&priv->reset_task);
+}
+
+static int ftgmac100_mii_probe(struct net_device *netdev)
+{
+	struct ftgmac100 *priv = netdev_priv(netdev);
+	struct platform_device *pdev = to_platform_device(priv->dev);
+	struct device_node *np = pdev->dev.of_node;
+	struct phy_device *phydev;
+	phy_interface_t phy_intf;
+	int err;
+
+	/* Default to RGMII. It's a gigabit part after all */
+	err = of_get_phy_mode(np, &phy_intf);
+	if (err)
+		phy_intf = PHY_INTERFACE_MODE_RGMII;
+
+	/* Aspeed only supports these. I don't know about other IP
+	 * block vendors so I'm going to just let them through for
+	 * now. Note that this is only a warning if for some obscure
+	 * reason the DT really means to lie about it or it's a newer
+	 * part we don't know about.
+	 *
+	 * On the Aspeed SoC there are additionally straps and SCU
+	 * control bits that could tell us what the interface is
+	 * (or allow us to configure it while the IP block is held
+	 * in reset). For now I chose to keep this driver away from
+	 * those SoC specific bits and assume the device-tree is
+	 * right and the SCU has been configured properly by pinmux
+	 * or the firmware.
+	 */
+	if (priv->is_aspeed && !(phy_interface_mode_is_rgmii(phy_intf))) {
+		netdev_warn(netdev,
+			    "Unsupported PHY mode %s !\n",
+			    phy_modes(phy_intf));
+	}
+
+	phydev = phy_find_first(priv->mii_bus);
+	if (!phydev) {
+		netdev_info(netdev, "%s: no PHY found\n", netdev->name);
+		return -ENODEV;
+	}
+
+	phydev = phy_connect(netdev, phydev_name(phydev),
+			     &ftgmac100_adjust_link, phy_intf);
+
+	if (IS_ERR(phydev)) {
+		netdev_err(netdev, "%s: Could not attach to PHY\n", netdev->name);
+		return PTR_ERR(phydev);
+	}
+
+	/* Indicate that we support PAUSE frames (see comment in
+	 * Documentation/networking/phy.rst)
+	 */
+	phy_support_asym_pause(phydev);
+
+	/* Display what we found */
+	phy_attached_info(phydev);
+
+	return 0;
+}
+
 static int ftgmac100_open(struct net_device *netdev)
 {
 	struct ftgmac100 *priv = netdev_priv(netdev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23  3:14 [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure Heyi Guo
  2022-02-23  3:14 ` [PATCH 1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call Heyi Guo
  2022-02-23  3:14 ` [PATCH 2/3] drivers/net/ftgmac100: adjust code place for function call dependency Heyi Guo
@ 2022-02-23  3:14 ` Heyi Guo
  2022-02-23  5:00   ` Florian Fainelli
  2022-02-23 10:48   ` Andrew Lunn
  2022-02-23 13:20 ` [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure patchwork-bot+netdevbpf
  3 siblings, 2 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-23  3:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heyi Guo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	Joel Stanley, Guangbin Huang, Hao Chen, Arnd Bergmann,
	Dylan Hung, netdev

DHCP failures were observed with systemd 247.6. The issue could be
reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
down/up.

It is caused by below procedures in the driver:

1. ftgmac100_open() enables net interface and call phy_start()
2. When PHY is link up, it calls netif_carrier_on() and then
adjust_link callback
3. ftgmac100_adjust_link() will schedule the reset task
4. ftgmac100_reset_task() will then reset the MAC in another schedule

After step 2, systemd will be notified to send DHCP discover packet,
while the packet might be corrupted by MAC reset operation in step 4.

Call ftgmac100_reset() directly instead of scheduling task to fix the
issue.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
---
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Guangbin Huang <huangguangbin2@huawei.com>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dylan Hung <dylan_hung@aspeedtech.com>
Cc: netdev@vger.kernel.org


---
 drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index c1deb6e5d26c5..d5356db7539a4 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
 	/* Disable all interrupts */
 	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
-	/* Reset the adapter asynchronously */
-	schedule_work(&priv->reset_task);
+	/* Release phy lock to allow ftgmac100_reset to aquire it, keeping lock
+	 * order consistent to prevent dead lock.
+	 */
+	if (netdev->phydev)
+		mutex_unlock(&netdev->phydev->lock);
+
+	ftgmac100_reset(priv);
+
+	if (netdev->phydev)
+		mutex_lock(&netdev->phydev->lock);
+
 }
 
 static int ftgmac100_mii_probe(struct net_device *netdev)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23  3:14 ` [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd Heyi Guo
@ 2022-02-23  5:00   ` Florian Fainelli
  2022-02-23 11:39     ` Heyi Guo
  2022-02-23 10:48   ` Andrew Lunn
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Fainelli @ 2022-02-23  5:00 UTC (permalink / raw)
  To: Heyi Guo, linux-kernel
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev



On 2/22/2022 7:14 PM, Heyi Guo wrote:
> DHCP failures were observed with systemd 247.6. The issue could be
> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
> down/up.
> 
> It is caused by below procedures in the driver:
> 
> 1. ftgmac100_open() enables net interface and call phy_start()
> 2. When PHY is link up, it calls netif_carrier_on() and then
> adjust_link callback
> 3. ftgmac100_adjust_link() will schedule the reset task
> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
> 
> After step 2, systemd will be notified to send DHCP discover packet,
> while the packet might be corrupted by MAC reset operation in step 4.
> 
> Call ftgmac100_reset() directly instead of scheduling task to fix the
> issue.
> 
> Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
> ---
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Guangbin Huang <huangguangbin2@huawei.com>
> Cc: Hao Chen <chenhao288@hisilicon.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Dylan Hung <dylan_hung@aspeedtech.com>
> Cc: netdev@vger.kernel.org
> 
> 
> ---
>   drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index c1deb6e5d26c5..d5356db7539a4 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
>   	/* Disable all interrupts */
>   	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>   
> -	/* Reset the adapter asynchronously */
> -	schedule_work(&priv->reset_task);
> +	/* Release phy lock to allow ftgmac100_reset to aquire it, keeping lock

typo: acquire

> +	 * order consistent to prevent dead lock.
> +	 */
> +	if (netdev->phydev)
> +		mutex_unlock(&netdev->phydev->lock);
> +
> +	ftgmac100_reset(priv);
> +
> +	if (netdev->phydev)
> +		mutex_lock(&netdev->phydev->lock);

Do you really need to perform a full MAC reset whenever the link goes up 
or down? Instead cannot you just extract the maccr configuration which 
adjusts the speed and be done with it?

What kind of Ethernet MAC design is this seriously.
-- 
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23  3:14 ` [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd Heyi Guo
  2022-02-23  5:00   ` Florian Fainelli
@ 2022-02-23 10:48   ` Andrew Lunn
  2022-02-23 11:40     ` Heyi Guo
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2022-02-23 10:48 UTC (permalink / raw)
  To: Heyi Guo
  Cc: linux-kernel, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev

> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index c1deb6e5d26c5..d5356db7539a4 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
>  	/* Disable all interrupts */
>  	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>  
> -	/* Reset the adapter asynchronously */
> -	schedule_work(&priv->reset_task);
> +	/* Release phy lock to allow ftgmac100_reset to aquire it, keeping lock
> +	 * order consistent to prevent dead lock.
> +	 */
> +	if (netdev->phydev)
> +		mutex_unlock(&netdev->phydev->lock);

No need to do this test. The fact that adjust_link is being called
indicates there must be a PHY.

    Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23  5:00   ` Florian Fainelli
@ 2022-02-23 11:39     ` Heyi Guo
  2022-02-23 17:55       ` Florian Fainelli
  0 siblings, 1 reply; 12+ messages in thread
From: Heyi Guo @ 2022-02-23 11:39 UTC (permalink / raw)
  To: Florian Fainelli, linux-kernel
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev

Hi Florian,

在 2022/2/23 下午1:00, Florian Fainelli 写道:
>
>
> On 2/22/2022 7:14 PM, Heyi Guo wrote:
>> DHCP failures were observed with systemd 247.6. The issue could be
>> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
>> down/up.
>>
>> It is caused by below procedures in the driver:
>>
>> 1. ftgmac100_open() enables net interface and call phy_start()
>> 2. When PHY is link up, it calls netif_carrier_on() and then
>> adjust_link callback
>> 3. ftgmac100_adjust_link() will schedule the reset task
>> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
>>
>> After step 2, systemd will be notified to send DHCP discover packet,
>> while the packet might be corrupted by MAC reset operation in step 4.
>>
>> Call ftgmac100_reset() directly instead of scheduling task to fix the
>> issue.
>>
>> Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
>> ---
>> Cc: Andrew Lunn <andrew@lunn.ch>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Joel Stanley <joel@jms.id.au>
>> Cc: Guangbin Huang <huangguangbin2@huawei.com>
>> Cc: Hao Chen <chenhao288@hisilicon.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Dylan Hung <dylan_hung@aspeedtech.com>
>> Cc: netdev@vger.kernel.org
>>
>>
>> ---
>>   drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
>> b/drivers/net/ethernet/faraday/ftgmac100.c
>> index c1deb6e5d26c5..d5356db7539a4 100644
>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct 
>> net_device *netdev)
>>       /* Disable all interrupts */
>>       iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>   -    /* Reset the adapter asynchronously */
>> -    schedule_work(&priv->reset_task);
>> +    /* Release phy lock to allow ftgmac100_reset to aquire it, 
>> keeping lock
>
> typo: acquire
>
Thanks for the catch :)
>> +     * order consistent to prevent dead lock.
>> +     */
>> +    if (netdev->phydev)
>> +        mutex_unlock(&netdev->phydev->lock);
>> +
>> +    ftgmac100_reset(priv);
>> +
>> +    if (netdev->phydev)
>> +        mutex_lock(&netdev->phydev->lock);
>
> Do you really need to perform a full MAC reset whenever the link goes 
> up or down? Instead cannot you just extract the maccr configuration 
> which adjusts the speed and be done with it?

This is the original behavior and not changed in this patch set, and I'm 
not familiar with the hardware design of ftgmac100, so I'd like to limit 
the changes to the code which really causes practical issues.

Thanks,

Heyi

>
> What kind of Ethernet MAC design is this seriously.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23 10:48   ` Andrew Lunn
@ 2022-02-23 11:40     ` Heyi Guo
  0 siblings, 0 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-23 11:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux-kernel, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev


在 2022/2/23 下午6:48, Andrew Lunn 写道:
>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
>> index c1deb6e5d26c5..d5356db7539a4 100644
>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
>>   	/* Disable all interrupts */
>>   	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>   
>> -	/* Reset the adapter asynchronously */
>> -	schedule_work(&priv->reset_task);
>> +	/* Release phy lock to allow ftgmac100_reset to aquire it, keeping lock
>> +	 * order consistent to prevent dead lock.
>> +	 */
>> +	if (netdev->phydev)
>> +		mutex_unlock(&netdev->phydev->lock);
> No need to do this test. The fact that adjust_link is being called
> indicates there must be a PHY.

OK, will remove the check in v2.

Thanks,

Heyi

>
>      Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure
  2022-02-23  3:14 [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure Heyi Guo
                   ` (2 preceding siblings ...)
  2022-02-23  3:14 ` [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd Heyi Guo
@ 2022-02-23 13:20 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-23 13:20 UTC (permalink / raw)
  To: Heyi Guo
  Cc: linux-kernel, andrew, davem, kuba, joel, huangguangbin2,
	chenhao288, arnd, dylan_hung, netdev

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 23 Feb 2022 11:14:33 +0800 you wrote:
> This patch set is to fix the issues discussed in the mail thread:
> https://lore.kernel.org/netdev/51f5b7a7-330f-6b3c-253d-10e45cdb6805@linux.alibaba.com/
> and follows the advice from Andrew Lunn.
> 
> The first 2 patches refactors the code to enable adjust_link calling reset
> function directly.
> 
> [...]

Here is the summary with links:
  - [1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call
    https://git.kernel.org/netdev/net/c/4f1e72850d45
  - [2/3] drivers/net/ftgmac100: adjust code place for function call dependency
    https://git.kernel.org/netdev/net/c/3c773dba8182
  - [3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
    https://git.kernel.org/netdev/net/c/1baf2e50e48f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23 11:39     ` Heyi Guo
@ 2022-02-23 17:55       ` Florian Fainelli
  2022-02-23 18:05         ` Florian Fainelli
  2022-02-24  2:52         ` Heyi Guo
  0 siblings, 2 replies; 12+ messages in thread
From: Florian Fainelli @ 2022-02-23 17:55 UTC (permalink / raw)
  To: Heyi Guo, linux-kernel
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev



On 2/23/2022 3:39 AM, Heyi Guo wrote:
> Hi Florian,
> 
> 在 2022/2/23 下午1:00, Florian Fainelli 写道:
>>
>>
>> On 2/22/2022 7:14 PM, Heyi Guo wrote:
>>> DHCP failures were observed with systemd 247.6. The issue could be
>>> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
>>> down/up.
>>>
>>> It is caused by below procedures in the driver:
>>>
>>> 1. ftgmac100_open() enables net interface and call phy_start()
>>> 2. When PHY is link up, it calls netif_carrier_on() and then
>>> adjust_link callback
>>> 3. ftgmac100_adjust_link() will schedule the reset task
>>> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
>>>
>>> After step 2, systemd will be notified to send DHCP discover packet,
>>> while the packet might be corrupted by MAC reset operation in step 4.
>>>
>>> Call ftgmac100_reset() directly instead of scheduling task to fix the
>>> issue.
>>>
>>> Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
>>> ---
>>> Cc: Andrew Lunn <andrew@lunn.ch>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Joel Stanley <joel@jms.id.au>
>>> Cc: Guangbin Huang <huangguangbin2@huawei.com>
>>> Cc: Hao Chen <chenhao288@hisilicon.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Dylan Hung <dylan_hung@aspeedtech.com>
>>> Cc: netdev@vger.kernel.org
>>>
>>>
>>> ---
>>>   drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
>>> b/drivers/net/ethernet/faraday/ftgmac100.c
>>> index c1deb6e5d26c5..d5356db7539a4 100644
>>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct 
>>> net_device *netdev)
>>>       /* Disable all interrupts */
>>>       iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>>   -    /* Reset the adapter asynchronously */
>>> -    schedule_work(&priv->reset_task);
>>> +    /* Release phy lock to allow ftgmac100_reset to aquire it, 
>>> keeping lock
>>
>> typo: acquire
>>
> Thanks for the catch :)
>>> +     * order consistent to prevent dead lock.
>>> +     */
>>> +    if (netdev->phydev)
>>> +        mutex_unlock(&netdev->phydev->lock);
>>> +
>>> +    ftgmac100_reset(priv);
>>> +
>>> +    if (netdev->phydev)
>>> +        mutex_lock(&netdev->phydev->lock);
>>
>> Do you really need to perform a full MAC reset whenever the link goes 
>> up or down? Instead cannot you just extract the maccr configuration 
>> which adjusts the speed and be done with it?
> 
> This is the original behavior and not changed in this patch set, and I'm 
> not familiar with the hardware design of ftgmac100, so I'd like to limit 
> the changes to the code which really causes practical issues.

This unlocking and re-locking seems superfluous when you could introduce 
a version of ftgmac100_reset() which does not acquire the PHY device 
mutex, and have that version called from ftgmac100_adjust_link(). For 
every other call site, you would acquire it. Something like this for 
instance:


diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 691605c15265..98179c3fd9ee 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1038,7 +1038,7 @@ static void ftgmac100_adjust_link(struct 
net_device *netdev)
         iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);

         /* Reset the adapter asynchronously */
-       schedule_work(&priv->reset_task);
+       ftgmac100_reset(priv, false);
  }

  static int ftgmac100_mii_probe(struct net_device *netdev)
@@ -1410,10 +1410,8 @@ static int ftgmac100_init_all(struct ftgmac100 
*priv, bool ignore_alloc_err)
         return err;
  }

-static void ftgmac100_reset_task(struct work_struct *work)
+static void ftgmac100_reset_task(struct ftgmac100_priv *priv, bool 
lock_phy)
  {
-       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
-                                             reset_task);
         struct net_device *netdev = priv->netdev;
         int err;

@@ -1421,7 +1419,7 @@ static void ftgmac100_reset_task(struct 
work_struct *work)

         /* Lock the world */
         rtnl_lock();
-       if (netdev->phydev)
+       if (netdev->phydev && lock_phy)
                 mutex_lock(&netdev->phydev->lock);
         if (priv->mii_bus)
                 mutex_lock(&priv->mii_bus->mdio_lock);
@@ -1454,11 +1452,19 @@ static void ftgmac100_reset_task(struct 
work_struct *work)
   bail:
         if (priv->mii_bus)
                 mutex_unlock(&priv->mii_bus->mdio_lock);
-       if (netdev->phydev)
+       if (netdev->phydev && lock_phy)
                 mutex_unlock(&netdev->phydev->lock);
         rtnl_unlock();
  }

+static void ftgmac100_reset_task(struct work_struct *work)
+{
+       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
+                                             reset_task);
+
+       ftgmac100_reset(priv, true);
+}
+
  static int ftgmac100_open(struct net_device *netdev)
  {
         struct ftgmac100 *priv = netdev_priv(netdev)
-- 
Florian

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23 17:55       ` Florian Fainelli
@ 2022-02-23 18:05         ` Florian Fainelli
  2022-02-24  2:52         ` Heyi Guo
  1 sibling, 0 replies; 12+ messages in thread
From: Florian Fainelli @ 2022-02-23 18:05 UTC (permalink / raw)
  To: Heyi Guo, linux-kernel
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev



On 2/23/2022 9:55 AM, Florian Fainelli wrote:
> 
> 
> On 2/23/2022 3:39 AM, Heyi Guo wrote:
>> Hi Florian,
>>
>> 在 2022/2/23 下午1:00, Florian Fainelli 写道:
>>>
>>>
>>> On 2/22/2022 7:14 PM, Heyi Guo wrote:
>>>> DHCP failures were observed with systemd 247.6. The issue could be
>>>> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
>>>> down/up.
>>>>
>>>> It is caused by below procedures in the driver:
>>>>
>>>> 1. ftgmac100_open() enables net interface and call phy_start()
>>>> 2. When PHY is link up, it calls netif_carrier_on() and then
>>>> adjust_link callback
>>>> 3. ftgmac100_adjust_link() will schedule the reset task
>>>> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
>>>>
>>>> After step 2, systemd will be notified to send DHCP discover packet,
>>>> while the packet might be corrupted by MAC reset operation in step 4.
>>>>
>>>> Call ftgmac100_reset() directly instead of scheduling task to fix the
>>>> issue.
>>>>
>>>> Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
>>>> ---
>>>> Cc: Andrew Lunn <andrew@lunn.ch>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Joel Stanley <joel@jms.id.au>
>>>> Cc: Guangbin Huang <huangguangbin2@huawei.com>
>>>> Cc: Hao Chen <chenhao288@hisilicon.com>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Dylan Hung <dylan_hung@aspeedtech.com>
>>>> Cc: netdev@vger.kernel.org
>>>>
>>>>
>>>> ---
>>>>   drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>>>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
>>>> b/drivers/net/ethernet/faraday/ftgmac100.c
>>>> index c1deb6e5d26c5..d5356db7539a4 100644
>>>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>>>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>>>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct 
>>>> net_device *netdev)
>>>>       /* Disable all interrupts */
>>>>       iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>>>   -    /* Reset the adapter asynchronously */
>>>> -    schedule_work(&priv->reset_task);
>>>> +    /* Release phy lock to allow ftgmac100_reset to aquire it, 
>>>> keeping lock
>>>
>>> typo: acquire
>>>
>> Thanks for the catch :)
>>>> +     * order consistent to prevent dead lock.
>>>> +     */
>>>> +    if (netdev->phydev)
>>>> +        mutex_unlock(&netdev->phydev->lock);
>>>> +
>>>> +    ftgmac100_reset(priv);
>>>> +
>>>> +    if (netdev->phydev)
>>>> +        mutex_lock(&netdev->phydev->lock);
>>>
>>> Do you really need to perform a full MAC reset whenever the link goes 
>>> up or down? Instead cannot you just extract the maccr configuration 
>>> which adjusts the speed and be done with it?
>>
>> This is the original behavior and not changed in this patch set, and 
>> I'm not familiar with the hardware design of ftgmac100, so I'd like to 
>> limit the changes to the code which really causes practical issues.
> 
> This unlocking and re-locking seems superfluous when you could introduce 
> a version of ftgmac100_reset() which does not acquire the PHY device 
> mutex, and have that version called from ftgmac100_adjust_link(). For 
> every other call site, you would acquire it. Something like this for 
> instance:
> 
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 691605c15265..98179c3fd9ee 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1038,7 +1038,7 @@ static void ftgmac100_adjust_link(struct 
> net_device *netdev)
>          iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
> 
>          /* Reset the adapter asynchronously */
> -       schedule_work(&priv->reset_task);
> +       ftgmac100_reset(priv, false);
>   }
> 
>   static int ftgmac100_mii_probe(struct net_device *netdev)
> @@ -1410,10 +1410,8 @@ static int ftgmac100_init_all(struct ftgmac100 
> *priv, bool ignore_alloc_err)
>          return err;
>   }
> 
> -static void ftgmac100_reset_task(struct work_struct *work)
> +static void ftgmac100_reset_task(struct ftgmac100_priv *priv, bool 
> lock_phy)
>   {
> -       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
> -                                             reset_task);
>          struct net_device *netdev = priv->netdev;
>          int err;
> 
> @@ -1421,7 +1419,7 @@ static void ftgmac100_reset_task(struct 
> work_struct *work)
> 
>          /* Lock the world */
>          rtnl_lock();
> -       if (netdev->phydev)
> +       if (netdev->phydev && lock_phy)
>                  mutex_lock(&netdev->phydev->lock);
>          if (priv->mii_bus)
>                  mutex_lock(&priv->mii_bus->mdio_lock);
> @@ -1454,11 +1452,19 @@ static void ftgmac100_reset_task(struct 
> work_struct *work)
>    bail:
>          if (priv->mii_bus)
>                  mutex_unlock(&priv->mii_bus->mdio_lock);
> -       if (netdev->phydev)
> +       if (netdev->phydev && lock_phy)
>                  mutex_unlock(&netdev->phydev->lock);
>          rtnl_unlock();
>   }
> 
> +static void ftgmac100_reset_task(struct work_struct *work)
> +{
> +       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
> +                                             reset_task);
> +
> +       ftgmac100_reset(priv, true);
> +}
> +
>   static int ftgmac100_open(struct net_device *netdev)
>   {
>          struct ftgmac100 *priv = netdev_priv(netdev)

Well this whole patch series has been applied already so I guess those 
comments are partially or totally moot now.

I have not received my notification about these patches being applied, 
unless when Jakub applies them, so either it is another vger/gmail lag 
that is absolutely unnerving or it is a difference of process between 
David and Jakub, in which case it really ought to be fixed such that it 
is consistent.
-- 
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd
  2022-02-23 17:55       ` Florian Fainelli
  2022-02-23 18:05         ` Florian Fainelli
@ 2022-02-24  2:52         ` Heyi Guo
  1 sibling, 0 replies; 12+ messages in thread
From: Heyi Guo @ 2022-02-24  2:52 UTC (permalink / raw)
  To: Florian Fainelli, linux-kernel
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Joel Stanley,
	Guangbin Huang, Hao Chen, Arnd Bergmann, Dylan Hung, netdev

Hi Florian,

在 2022/2/24 上午1:55, Florian Fainelli 写道:
>
>
> On 2/23/2022 3:39 AM, Heyi Guo wrote:
>> Hi Florian,
>>
>> 在 2022/2/23 下午1:00, Florian Fainelli 写道:
>>>
>>>
>>> On 2/22/2022 7:14 PM, Heyi Guo wrote:
>>>> DHCP failures were observed with systemd 247.6. The issue could be
>>>> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
>>>> down/up.
>>>>
>>>> It is caused by below procedures in the driver:
>>>>
>>>> 1. ftgmac100_open() enables net interface and call phy_start()
>>>> 2. When PHY is link up, it calls netif_carrier_on() and then
>>>> adjust_link callback
>>>> 3. ftgmac100_adjust_link() will schedule the reset task
>>>> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
>>>>
>>>> After step 2, systemd will be notified to send DHCP discover packet,
>>>> while the packet might be corrupted by MAC reset operation in step 4.
>>>>
>>>> Call ftgmac100_reset() directly instead of scheduling task to fix the
>>>> issue.
>>>>
>>>> Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
>>>> ---
>>>> Cc: Andrew Lunn <andrew@lunn.ch>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Joel Stanley <joel@jms.id.au>
>>>> Cc: Guangbin Huang <huangguangbin2@huawei.com>
>>>> Cc: Hao Chen <chenhao288@hisilicon.com>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Dylan Hung <dylan_hung@aspeedtech.com>
>>>> Cc: netdev@vger.kernel.org
>>>>
>>>>
>>>> ---
>>>>   drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>>>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
>>>> b/drivers/net/ethernet/faraday/ftgmac100.c
>>>> index c1deb6e5d26c5..d5356db7539a4 100644
>>>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>>>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>>>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct 
>>>> net_device *netdev)
>>>>       /* Disable all interrupts */
>>>>       iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>>>   -    /* Reset the adapter asynchronously */
>>>> -    schedule_work(&priv->reset_task);
>>>> +    /* Release phy lock to allow ftgmac100_reset to aquire it, 
>>>> keeping lock
>>>
>>> typo: acquire
>>>
>> Thanks for the catch :)
>>>> +     * order consistent to prevent dead lock.
>>>> +     */
>>>> +    if (netdev->phydev)
>>>> +        mutex_unlock(&netdev->phydev->lock);
>>>> +
>>>> +    ftgmac100_reset(priv);
>>>> +
>>>> +    if (netdev->phydev)
>>>> +        mutex_lock(&netdev->phydev->lock);
>>>
>>> Do you really need to perform a full MAC reset whenever the link 
>>> goes up or down? Instead cannot you just extract the maccr 
>>> configuration which adjusts the speed and be done with it?
>>
>> This is the original behavior and not changed in this patch set, and 
>> I'm not familiar with the hardware design of ftgmac100, so I'd like 
>> to limit the changes to the code which really causes practical issues.
>
> This unlocking and re-locking seems superfluous when you could 
> introduce a version of ftgmac100_reset() which does not acquire the 
> PHY device mutex, and have that version called from 
> ftgmac100_adjust_link(). For every other call site, you would acquire 
> it. Something like this for instance:
>
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 691605c15265..98179c3fd9ee 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1038,7 +1038,7 @@ static void ftgmac100_adjust_link(struct 
> net_device *netdev)
>         iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>
>         /* Reset the adapter asynchronously */
> -       schedule_work(&priv->reset_task);
> +       ftgmac100_reset(priv, false);
>  }
>
>  static int ftgmac100_mii_probe(struct net_device *netdev)
> @@ -1410,10 +1410,8 @@ static int ftgmac100_init_all(struct ftgmac100 
> *priv, bool ignore_alloc_err)
>         return err;
>  }
>
> -static void ftgmac100_reset_task(struct work_struct *work)
> +static void ftgmac100_reset_task(struct ftgmac100_priv *priv, bool 
> lock_phy)
>  {
> -       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
> -                                             reset_task);
>         struct net_device *netdev = priv->netdev;
>         int err;
>
> @@ -1421,7 +1419,7 @@ static void ftgmac100_reset_task(struct 
> work_struct *work)
>
>         /* Lock the world */
>         rtnl_lock();
> -       if (netdev->phydev)
> +       if (netdev->phydev && lock_phy)
>                 mutex_lock(&netdev->phydev->lock);
>         if (priv->mii_bus)
>                 mutex_lock(&priv->mii_bus->mdio_lock);
> @@ -1454,11 +1452,19 @@ static void ftgmac100_reset_task(struct 
> work_struct *work)
>   bail:
>         if (priv->mii_bus)
>                 mutex_unlock(&priv->mii_bus->mdio_lock);
> -       if (netdev->phydev)
> +       if (netdev->phydev && lock_phy)
>                 mutex_unlock(&netdev->phydev->lock);
>         rtnl_unlock();
>  }

This is also what we supposed to do at first, however it will introduce 
a potential dead lock for different locks acquiring order, and 
CONFIG_PROVE_LOCKING will complain about it:

[   16.852199] ======================================================
[   16.859102] WARNING: possible circular locking dependency detected
[   16.866012] 5.10.36-60b3c9d-dirty-15f4fba #1 Not tainted
[   16.871976] ------------------------------------------------------
[   16.871991] kworker/1:1/23 is trying to acquire lock:
[   16.872000] 80fa0920 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x28
[   16.872047]
[   16.872047] but task is already holding lock:
[   16.872051] 821d44c0 (&dev->lock){+.+.}-{3:3}, at: 
phy_state_machine+0x50/0x290
[   16.872076]
[   16.872076] which lock already depends on the new lock.
[   16.872076]
[   16.872080]
[   16.872080] the existing dependency chain (in reverse order) is:
[   16.872083]
[   16.872083] -> #1 (&dev->lock){+.+.}-{3:3}:
[   16.872106]        lock_acquire+0x6c/0x74
[   16.872117]        __mutex_lock+0xb4/0xa48
[   16.872132]        mutex_lock_nested+0x2c/0x34
[   16.872141]        phy_start+0x30/0xc4
[   16.872155]        ftgmac100_open+0x1a0/0x254
[   16.872168]        __dev_open+0x110/0x1d0
[   16.872180]        __dev_change_flags+0x1d0/0x258
[   16.872192]        dev_change_flags+0x28/0x58
[   16.872204]        do_setlink+0x258/0xc60
[   16.872212]        rtnl_setlink+0x110/0x18c
[   16.872219]        rtnetlink_rcv_msg+0x1d0/0x53c
[   16.872226]        netlink_rcv_skb+0xd0/0x128
[   16.872233]        rtnetlink_rcv+0x20/0x24
[   16.872244]        netlink_unicast+0x1a8/0x26c
[   16.872254]        netlink_sendmsg+0x220/0x464
[   16.872265]        __sys_sendto+0xe4/0x134
[   16.872276]        sys_sendto+0x24/0x2c
[   16.872288]        ret_fast_syscall+0x0/0x28
[   16.872297]        0x7ed9e928
[   16.872301]
[   16.872301] -> #0 (rtnl_mutex){+.+.}-{3:3}:
[   16.872325]        __lock_acquire+0x17e8/0x3268
[   16.872331]        lock_acquire.part.0+0xcc/0x394
[   16.872341]        lock_acquire+0x6c/0x74
[   16.872354]        __mutex_lock+0xb4/0xa48
[   16.872365]        mutex_lock_nested+0x2c/0x34
[   16.872377]        rtnl_lock+0x24/0x28
[   16.872389]        ftgmac100_adjust_link+0xc0/0x144
[   16.872401]        phy_link_change+0x38/0x64
[   16.872411]        phy_check_link_status+0xa8/0xfc
[   16.872422]        phy_state_machine+0x80/0x290
[   16.872435]        process_one_work+0x294/0x7d8
[   16.872447]        worker_thread+0x6c/0x548
[   16.872456]        kthread+0x170/0x178
[   16.872462]        ret_from_fork+0x14/0x20
[   16.872467]        0x0
[   16.872471]
[   16.872471] other info that might help us debug this:
[   16.872471]
[   16.872475]  Possible unsafe locking scenario:
[   16.872475]
[   16.872478]        CPU0                    CPU1
[   16.872482]        ----                    ----
[   16.872485]   lock(&dev->lock);
[   16.872495]                                lock(rtnl_mutex);
[   16.872505] lock(&dev->lock);
[   16.872513]   lock(rtnl_mutex);
[   16.872522]
[   16.872522]  *** DEADLOCK ***
[   16.872522]
[   16.872528] 3 locks held by kworker/1:1/23:
[   16.872532]  #0: 818472a8 
((wq_completion)events_power_efficient){+.+.}-{0:0}, at: 
process_one_work+0x1e8/0x7d8
[   16.872558]  #1: 819fbef8 
((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: 
process_one_work+0x1e8/0x7d8
[   16.872582]  #2: 821d44c0 (&dev->lock){+.+.}-{3:3}, at: 
phy_state_machine+0x50/0x290

Thanks,

Heyi

>
> +static void ftgmac100_reset_task(struct work_struct *work)
> +{
> +       struct ftgmac100 *priv = container_of(work, struct ftgmac100,
> +                                             reset_task);
> +
> +       ftgmac100_reset(priv, true);
> +}
> +
>  static int ftgmac100_open(struct net_device *netdev)
>  {
>         struct ftgmac100 *priv = netdev_priv(netdev)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-02-24  2:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-23  3:14 [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure Heyi Guo
2022-02-23  3:14 ` [PATCH 1/3] drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call Heyi Guo
2022-02-23  3:14 ` [PATCH 2/3] drivers/net/ftgmac100: adjust code place for function call dependency Heyi Guo
2022-02-23  3:14 ` [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure with systemd Heyi Guo
2022-02-23  5:00   ` Florian Fainelli
2022-02-23 11:39     ` Heyi Guo
2022-02-23 17:55       ` Florian Fainelli
2022-02-23 18:05         ` Florian Fainelli
2022-02-24  2:52         ` Heyi Guo
2022-02-23 10:48   ` Andrew Lunn
2022-02-23 11:40     ` Heyi Guo
2022-02-23 13:20 ` [PATCH 0/3] drivers/net/ftgmac100: fix occasional DHCP failure patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).