All of lore.kernel.org
 help / color / mirror / Atom feed
* net: fec: memory corruption caused by err_enet_mii_probe error path
@ 2021-12-22  7:06 Kegl Rohit
  2021-12-22  8:01 ` Joakim Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Kegl Rohit @ 2021-12-22  7:06 UTC (permalink / raw)
  To: netdev; +Cc: Andy Duan

Hello!

There is an issue with the error path of fec_enet_mii_probe in
fec_enet_open(struct net_device *ndev) which leads to random memory
corruption.

In open() the buffers are initialized:
https://github.com/torvalds/linux/blob/v5.10/drivers/net/ethernet/freescale/fec_main.c#L3001

Then fec_restart will start the DMA engines.
https://github.com/torvalds/linux/blob/v5.10/drivers/net/ethernet/freescale/fec_main.c#L3006

Now if fec_enet_mii_probe fails (e.g. phy did not respond via mii) the
err_enet_mii_probe error path will be used
https://github.com/torvalds/linux/blob/v5.10/drivers/net/ethernet/freescale/fec_main.c#L3031

err_enet_mii_probe:
fec_enet_free_buffers(ndev);
err_enet_alloc:
fec_enet_clk_enable(ndev, false);
clk_enable:
pm_runtime_mark_last_busy(&fep->pdev->dev);
pm_runtime_put_autosuspend(&fep->pdev->dev);
pinctrl_pm_select_sleep_state(&fep->pdev->dev);
return ret;

This error path frees the DMA buffers, BUT as far I could see it does
not stop the DMA engines.
=> open() fails => frees buffers => DMA still active => MAC receives
network packet => DMA starts => random memory corruption (use after
free) => random kernel panics

So maybe fec_stop() as counterpart to fec_restart() is missing before
freeing the buffers?
err_enet_mii_probe:
fec_stop(ndev);
fec_enet_free_buffers(ndev);

Issue happend with 5.10.83 and should also also happen with current master.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-22  7:06 net: fec: memory corruption caused by err_enet_mii_probe error path Kegl Rohit
@ 2021-12-22  8:01 ` Joakim Zhang
  2021-12-22 10:33   ` Andrew Lunn
  2021-12-22 11:05   ` Kegl Rohit
  0 siblings, 2 replies; 7+ messages in thread
From: Joakim Zhang @ 2021-12-22  8:01 UTC (permalink / raw)
  To: Kegl Rohit, netdev, Andrew Lunn


Hi Kegl,

Removing Andy Duan, since he has left NXP, and mail is unavailable any longer, to avoid noise.

> -----Original Message-----
> From: Kegl Rohit <keglrohit@gmail.com>
> Sent: 2021年12月22日 15:06
> To: netdev <netdev@vger.kernel.org>
> Cc: Andy Duan <fugang.duan@nxp.com>
> Subject: net: fec: memory corruption caused by err_enet_mii_probe error
> path
> 
> Hello!
> 
> There is an issue with the error path of fec_enet_mii_probe in
> fec_enet_open(struct net_device *ndev) which leads to random memory
> corruption.
> 
> In open() the buffers are initialized:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fv5.10%2Fdrivers%2Fnet%2Fethernet
> %2Ffreescale%2Ffec_main.c%23L3001&amp;data=04%7C01%7Cqiangqing.zh
> ang%40nxp.com%7Cc7511826d9684e2b9d3508d9c51991d0%7C686ea1d3bc2b
> 4c6fa92cd99c5c301635%7C0%7C1%7C637757535872985607%7CUnknown%7C
> TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
> CJXVCI6Mn0%3D%7C3000&amp;sdata=FovOc4z3RIuatEraWWO8lgco8sIea8B0
> 3RVKNd83QZc%3D&amp;reserved=0
> 
> Then fec_restart will start the DMA engines.
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fv5.10%2Fdrivers%2Fnet%2Fethernet
> %2Ffreescale%2Ffec_main.c%23L3006&amp;data=04%7C01%7Cqiangqing.zh
> ang%40nxp.com%7Cc7511826d9684e2b9d3508d9c51991d0%7C686ea1d3bc2b
> 4c6fa92cd99c5c301635%7C0%7C1%7C637757535872985607%7CUnknown%7C
> TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
> CJXVCI6Mn0%3D%7C3000&amp;sdata=FAEb9MAGVmVYv4gdiJrwhT7k4Pys8
> UlRLnydpAJ6N94%3D&amp;reserved=0
> 
> Now if fec_enet_mii_probe fails (e.g. phy did not respond via mii) the
> err_enet_mii_probe error path will be used
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fv5.10%2Fdrivers%2Fnet%2Fethernet
> %2Ffreescale%2Ffec_main.c%23L3031&amp;data=04%7C01%7Cqiangqing.zh
> ang%40nxp.com%7Cc7511826d9684e2b9d3508d9c51991d0%7C686ea1d3bc2b
> 4c6fa92cd99c5c301635%7C0%7C1%7C637757535872985607%7CUnknown%7C
> TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
> CJXVCI6Mn0%3D%7C3000&amp;sdata=QLT9%2BcOkH7%2BdRSZzdIzp6oq0rc
> wnOy0vfVN7Sa4YTAM%3D&amp;reserved=0
> 
> err_enet_mii_probe:
> fec_enet_free_buffers(ndev);
> err_enet_alloc:
> fec_enet_clk_enable(ndev, false);
> clk_enable:
> pm_runtime_mark_last_busy(&fep->pdev->dev);
> pm_runtime_put_autosuspend(&fep->pdev->dev);
> pinctrl_pm_select_sleep_state(&fep->pdev->dev);
> return ret;
> 
> This error path frees the DMA buffers, BUT as far I could see it does not stop
> the DMA engines.
> => open() fails => frees buffers => DMA still active => MAC receives network
> packet => DMA starts => random memory corruption (use after
> free) => random kernel panics

A question here, why receive path still active? MAC has not connected to PHY when this failure happened, should not see network activities.

> So maybe fec_stop() as counterpart to fec_restart() is missing before freeing
> the buffers?
> err_enet_mii_probe:
> fec_stop(ndev);
> fec_enet_free_buffers(ndev);
>
> Issue happend with 5.10.83 and should also also happen with current master.

It's fine for me, please see if anyone else has some comments. If not, please cook a formal patch, thanks.

Best Regards,
Joakim Zhang

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-22  8:01 ` Joakim Zhang
@ 2021-12-22 10:33   ` Andrew Lunn
  2021-12-22 11:05   ` Kegl Rohit
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Lunn @ 2021-12-22 10:33 UTC (permalink / raw)
  To: Joakim Zhang; +Cc: Kegl Rohit, netdev

> > err_enet_mii_probe:
> > fec_enet_free_buffers(ndev);
> > err_enet_alloc:
> > fec_enet_clk_enable(ndev, false);
> > clk_enable:
> > pm_runtime_mark_last_busy(&fep->pdev->dev);
> > pm_runtime_put_autosuspend(&fep->pdev->dev);
> > pinctrl_pm_select_sleep_state(&fep->pdev->dev);
> > return ret;
> > 
> > This error path frees the DMA buffers, BUT as far I could see it does not stop
> > the DMA engines.
> > => open() fails => frees buffers => DMA still active => MAC receives network
> > packet => DMA starts => random memory corruption (use after
> > free) => random kernel panics

> A question here, why receive path still active? MAC has not
> connected to PHY when this failure happened, should not see network
> activities.

Not every system has a PHY. There are plenty of boards with the FEC
directly connected to an Ethernet switch. So packets could be flowing.

	 Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-22  8:01 ` Joakim Zhang
  2021-12-22 10:33   ` Andrew Lunn
@ 2021-12-22 11:05   ` Kegl Rohit
  2021-12-23  1:53     ` Joakim Zhang
  1 sibling, 1 reply; 7+ messages in thread
From: Kegl Rohit @ 2021-12-22 11:05 UTC (permalink / raw)
  To: Joakim Zhang; +Cc: netdev, Andrew Lunn

On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang <qiangqing.zhang@nxp.com> wrote:

> > This error path frees the DMA buffers, BUT as far I could see it does not stop
> > the DMA engines.
> > => open() fails => frees buffers => DMA still active => MAC receives network
> > packet => DMA starts => random memory corruption (use after
> > free) => random kernel panics
>
> A question here, why receive path still active? MAC has not connected to PHY when this failure happened, should not see network activities.

It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.

One of our devices (out of 10) eth1 did not detect a phy for eth1 on
ifup ( fec_open() ) => mdio error path => random system crashes
But the phy is there and the link is good, only MDIO access failed.
phys have autoneg activated after reset. So the RX path is active,
even if the fec driver says that there is no phy.

Without attached ethernet cable the phy was also not detected, BUT the
system did not crash. => Because no packets will arrive without
attached cable.

So the main issue on our side is the not detected phy. And the other
issue is the use after free in the error path.

I think the main issue has something to do with phy reset handling.
On a cold boot the eth1 phy is detected successfully. A warm restart (
reboot -f ) will always lead to a not detected eth1 phy. The eth0 phy
is always detected.
From past experience the dual ethernet implementation in the driver
was not the most stable one. Maybe because of a smaller user base.
In our setup both phy reset lines are connected to gpios with the
correct entry in the mac0 & mac1 DT entry.
Revert of https://github.com/torvalds/linux/commit/7705b5ed8adccd921423019e870df672fa423279#diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb551112d3fe37e50b1a15
seems to get the phys in a correct state before the fec driver is
probed and both phys are detected.

> > So maybe fec_stop() as counterpart to fec_restart() is missing before freeing
> > the buffers?
> > err_enet_mii_probe:
> > fec_stop(ndev);
> > fec_enet_free_buffers(ndev);
> >
> > Issue happend with 5.10.83 and should also also happen with current master.
>
> It's fine for me, please see if anyone else has some comments. If not, please cook a formal patch, thanks.
So fec_stop is the right guess to stop the rx/tx dma rings.

Other paths use netif_tx_lock_bh before calling fec_stop().
I don't think this is necessary because netif_tx_start_all_queues()
was not called before in this error path case?
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/freescale/fec_main.c#L3207

index 1b1f7f2a6..8f208b4a9 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)

        return 0;

 err_enet_mii_probe:
+       fec_stop(ndev);
        fec_enet_free_buffers(ndev);
 err_enet_alloc:
        fec_enet_clk_enable(ndev, false);
 clk_enable:

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-22 11:05   ` Kegl Rohit
@ 2021-12-23  1:53     ` Joakim Zhang
  2021-12-23 13:35       ` Kegl Rohit
  0 siblings, 1 reply; 7+ messages in thread
From: Joakim Zhang @ 2021-12-23  1:53 UTC (permalink / raw)
  To: Kegl Rohit; +Cc: netdev, Andrew Lunn


Hi Kegl,

> -----Original Message-----
> From: Kegl Rohit <keglrohit@gmail.com>
> Sent: 2021年12月22日 19:06
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> Subject: Re: net: fec: memory corruption caused by err_enet_mii_probe
> error path
> 
> On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang <qiangqing.zhang@nxp.com>
> wrote:
> 
> > > This error path frees the DMA buffers, BUT as far I could see it
> > > does not stop the DMA engines.
> > > => open() fails => frees buffers => DMA still active => MAC receives
> > > network packet => DMA starts => random memory corruption (use after
> > > free) => random kernel panics
> >
> > A question here, why receive path still active? MAC has not connected to
> PHY when this failure happened, should not see network activities.
> 
> It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
> 
> One of our devices (out of 10) eth1 did not detect a phy for eth1 on ifup
> ( fec_open() ) => mdio error path => random system crashes But the phy is
> there and the link is good, only MDIO access failed.
> phys have autoneg activated after reset. So the RX path is active, even if the
> fec driver says that there is no phy.
> 
> Without attached ethernet cable the phy was also not detected, BUT the
> system did not crash. => Because no packets will arrive without attached
> cable.
> 
> So the main issue on our side is the not detected phy. And the other issue is
> the use after free in the error path.

Thanks for you detailed explanation :)

> I think the main issue has something to do with phy reset handling.
> On a cold boot the eth1 phy is detected successfully. A warm restart ( reboot
> -f ) will always lead to a not detected eth1 phy. The eth0 phy is always
> detected.
> From past experience the dual ethernet implementation in the driver was
> not the most stable one. Maybe because of a smaller user base.
> In our setup both phy reset lines are connected to gpios with the correct
> entry in the mac0 & mac1 DT entry.
> Revert of
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Ftorvalds%2Flinux%2Fcommit%2F7705b5ed8adccd921423019e870df
> 672fa423279%23diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb55111
> 2d3fe37e50b1a15&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C
> 2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4c6fa92cd99c5c301635
> %7C0%7C1%7C637757679518224181%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> 000&amp;sdata=yIQBX3v6PRvPerVBhdDcRJpjt1aIiq%2BtL8RXfNmKs%2FA%3
> D&amp;reserved=0
> seems to get the phys in a correct state before the fec driver is probed and
> both phys are detected.

You mean it always can detect both eth0 and eth1 PHY on a code boot, but eth1 PHY failed when do
warm reset, it may deserve a further analyze the different for these two scenarios.

I am not sure when you will hardware reset PHY on you platforms. For FEC driver, it seems only hardware
reset PHY when probe MAC (please note PHY reset has moved to phylib, not recommend to do it in MAC driver),
fec_probe->of_mdiobus_register->of_mdiobus_register_phy->of_mdiobus_phy_device_register->phy_device_register

Generally, a hardware reset will trigger an auto-nego together, so I think PHY should in a correct state when you ifup the device,
although, it will also trigger auto-nego when connected PHY. What I want to say is that you may need to check if the hardware reset
at you side is suitable? Not sure you use single GPIO to control two PHYs reset?

Just some analysis, hope this can give you some hints.
 
> > > So maybe fec_stop() as counterpart to fec_restart() is missing
> > > before freeing the buffers?
> > > err_enet_mii_probe:
> > > fec_stop(ndev);
> > > fec_enet_free_buffers(ndev);
> > >
> > > Issue happend with 5.10.83 and should also also happen with current
> master.
> >
> > It's fine for me, please see if anyone else has some comments. If not,
> please cook a formal patch, thanks.
> So fec_stop is the right guess to stop the rx/tx dma rings.
> 
> Other paths use netif_tx_lock_bh before calling fec_stop().
> I don't think this is necessary because netif_tx_start_all_queues() was not
> called before in this error path case?
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fmaster%2Fdrivers%2Fnet%2Fethern
> et%2Ffreescale%2Ffec_main.c%23L3207&amp;data=04%7C01%7Cqiangqing.z
> hang%40nxp.com%7C2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4
> c6fa92cd99c5c301635%7C0%7C1%7C637757679518224181%7CUnknown%7CT
> WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> JXVCI6Mn0%3D%7C1000&amp;sdata=KfDdWkJ76Tvsd7nmXpNhdrY%2Fz9x24
> UYJl6fxjwvqtYA%3D&amp;reserved=0

ACK, please submit a patch if you test there is no regressions.

Best Regards,
Joakim Zhang
> index 1b1f7f2a6..8f208b4a9 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
> 
>         return 0;
> 
>  err_enet_mii_probe:
> +       fec_stop(ndev);
>         fec_enet_free_buffers(ndev);
>  err_enet_alloc:
>         fec_enet_clk_enable(ndev, false);
>  clk_enable:

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-23  1:53     ` Joakim Zhang
@ 2021-12-23 13:35       ` Kegl Rohit
  2021-12-24  2:29         ` Joakim Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Kegl Rohit @ 2021-12-23 13:35 UTC (permalink / raw)
  To: Joakim Zhang; +Cc: netdev, Andrew Lunn

Thanks for the hint to use of_mdiobus_register instead of the old way
with fec { phy-reset-gpios = ....}

This change could solve another issue with the smsc phys.
I think phy_reset_after_clk_enable does only something if the phy was
specified in the DT with reset-gpio = <
Coming from an older kernel I was using the fecs phy-reset-gpios and
mdio auto probing and therefore phy_reset_after_clk_enable did
nothing.
And my smsc phys need this reset after enet_out enable (PHY_RST_AFTER_CLK_EN).

int phy_reset_after_clk_enable(struct phy_device *phydev)
{
if (!phydev || !phydev->drv)
return -ENODEV;

if (phydev->drv->flags & PHY_RST_AFTER_CLK_EN) {
phy_device_reset(phydev, 1);
phy_device_reset(phydev, 0);
return 1;
}
return 0;
}

So I adapted the DT to specify the reset gpios.
BUT this solved not my issue with the not detected phy.
So i changed the generic
compatible = "ethernet-phy-ieee802.3-c22";
to the exact phy id
compatible = "ethernet-phy-id0007.c0f0";

=> Now the phy is detected on every boot

if (!is_c45 && !of_get_phy_id(child, &phy_id))
phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
else
phy = get_phy_device(mdio, addr, is_c45);
if (IS_ERR(phy)) {
if (mii_ts)
unregister_mii_timestamper(mii_ts);
return PTR_ERR(phy);
}
rc = of_mdiobus_phy_device_register(mdio, phy, child, addr);

With a given phy id mdio probe will issue phy_device_create; otherwise
get_phy_device to read the id via mdio.

ethernet-phy@0 { reset-gpio = ....} gpio does not get
asserted/deasserted before the get_phy_device call.
Only the global mdio { reset-gpio = .... } gpio is asserted/deasserted
before the get_phy_device call.
So the phy does not get its reset after enet_out clock enable => phy
is in bad state => get_phy_device does not find the phy => no
of_mdiobus_phy_device_register => missing phy

I would need something like this in addition to ethernet-phy@0 {
reset-gpio = ....}
mdio { reset-gpio = <&gpio4 13 GPIO_ACTIVE_LOW &gpio2 7 GPIO_ACTIVE_LOW}
But mdio { reset-gpio } does only support a single gpio.

For now I am ok with the fixed phy ids.


&mac0 {
phy-mode = "rmii";
phy-handle = <&mac0_phy>;
phy-supply = <&reg_etn_3v3>;
pinctrl-names = "default";
pinctrl-0 = <&mac0_pins_a>, <&mac0_phy_reset_pin>, <&mac1_phy_reset_pin>;
status = "okay";


mdio {
#address-cells = <1>;
#size-cells = <0>;

mac0_phy: ethernet-phy@0 {
reg = <0>;
compatible = "ethernet-phy-id0007.c0f0";
reset-gpios = <&gpio4 13 GPIO_ACTIVE_LOW>;
reset-assert-us = <10000>;
reset-deassert-us = <30000>;
smsc,disable-energy-detect;
};

mac1_phy: ethernet-phy@1 {
reg = <1>;
compatible = "ethernet-phy-id0007.c0f0";
reset-gpios = <&gpio2 7 GPIO_ACTIVE_LOW>;
reset-assert-us = <10000>;
reset-deassert-us = <30000>;
smsc,disable-energy-detect;
};
};
};

On Thu, Dec 23, 2021 at 2:53 AM Joakim Zhang <qiangqing.zhang@nxp.com> wrote:
>
>
> Hi Kegl,
>
> > -----Original Message-----
> > From: Kegl Rohit <keglrohit@gmail.com>
> > Sent: 2021年12月22日 19:06
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> > Subject: Re: net: fec: memory corruption caused by err_enet_mii_probe
> > error path
> >
> > On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang <qiangqing.zhang@nxp.com>
> > wrote:
> >
> > > > This error path frees the DMA buffers, BUT as far I could see it
> > > > does not stop the DMA engines.
> > > > => open() fails => frees buffers => DMA still active => MAC receives
> > > > network packet => DMA starts => random memory corruption (use after
> > > > free) => random kernel panics
> > >
> > > A question here, why receive path still active? MAC has not connected to
> > PHY when this failure happened, should not see network activities.
> >
> > It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
> >
> > One of our devices (out of 10) eth1 did not detect a phy for eth1 on ifup
> > ( fec_open() ) => mdio error path => random system crashes But the phy is
> > there and the link is good, only MDIO access failed.
> > phys have autoneg activated after reset. So the RX path is active, even if the
> > fec driver says that there is no phy.
> >
> > Without attached ethernet cable the phy was also not detected, BUT the
> > system did not crash. => Because no packets will arrive without attached
> > cable.
> >
> > So the main issue on our side is the not detected phy. And the other issue is
> > the use after free in the error path.
>
> Thanks for you detailed explanation :)
>
> > I think the main issue has something to do with phy reset handling.
> > On a cold boot the eth1 phy is detected successfully. A warm restart ( reboot
> > -f ) will always lead to a not detected eth1 phy. The eth0 phy is always
> > detected.
> > From past experience the dual ethernet implementation in the driver was
> > not the most stable one. Maybe because of a smaller user base.
> > In our setup both phy reset lines are connected to gpios with the correct
> > entry in the mac0 & mac1 DT entry.
> > Revert of
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> > b.com%2Ftorvalds%2Flinux%2Fcommit%2F7705b5ed8adccd921423019e870df
> > 672fa423279%23diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb55111
> > 2d3fe37e50b1a15&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C
> > 2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > %7C0%7C1%7C637757679518224181%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> > oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> > 000&amp;sdata=yIQBX3v6PRvPerVBhdDcRJpjt1aIiq%2BtL8RXfNmKs%2FA%3
> > D&amp;reserved=0
> > seems to get the phys in a correct state before the fec driver is probed and
> > both phys are detected.
>
> You mean it always can detect both eth0 and eth1 PHY on a code boot, but eth1 PHY failed when do
> warm reset, it may deserve a further analyze the different for these two scenarios.
>
> I am not sure when you will hardware reset PHY on you platforms. For FEC driver, it seems only hardware
> reset PHY when probe MAC (please note PHY reset has moved to phylib, not recommend to do it in MAC driver),
> fec_probe->of_mdiobus_register->of_mdiobus_register_phy->of_mdiobus_phy_device_register->phy_device_register
>
> Generally, a hardware reset will trigger an auto-nego together, so I think PHY should in a correct state when you ifup the device,
> although, it will also trigger auto-nego when connected PHY. What I want to say is that you may need to check if the hardware reset
> at you side is suitable? Not sure you use single GPIO to control two PHYs reset?
>
> Just some analysis, hope this can give you some hints.
>
> > > > So maybe fec_stop() as counterpart to fec_restart() is missing
> > > > before freeing the buffers?
> > > > err_enet_mii_probe:
> > > > fec_stop(ndev);
> > > > fec_enet_free_buffers(ndev);
> > > >
> > > > Issue happend with 5.10.83 and should also also happen with current
> > master.
> > >
> > > It's fine for me, please see if anyone else has some comments. If not,
> > please cook a formal patch, thanks.
> > So fec_stop is the right guess to stop the rx/tx dma rings.
> >
> > Other paths use netif_tx_lock_bh before calling fec_stop().
> > I don't think this is necessary because netif_tx_start_all_queues() was not
> > called before in this error path case?
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> > b.com%2Ftorvalds%2Flinux%2Fblob%2Fmaster%2Fdrivers%2Fnet%2Fethern
> > et%2Ffreescale%2Ffec_main.c%23L3207&amp;data=04%7C01%7Cqiangqing.z
> > hang%40nxp.com%7C2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4
> > c6fa92cd99c5c301635%7C0%7C1%7C637757679518224181%7CUnknown%7CT
> > WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> > JXVCI6Mn0%3D%7C1000&amp;sdata=KfDdWkJ76Tvsd7nmXpNhdrY%2Fz9x24
> > UYJl6fxjwvqtYA%3D&amp;reserved=0
>
> ACK, please submit a patch if you test there is no regressions.
>
> Best Regards,
> Joakim Zhang
> > index 1b1f7f2a6..8f208b4a9 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
> >
> >         return 0;
> >
> >  err_enet_mii_probe:
> > +       fec_stop(ndev);
> >         fec_enet_free_buffers(ndev);
> >  err_enet_alloc:
> >         fec_enet_clk_enable(ndev, false);
> >  clk_enable:

On Thu, Dec 23, 2021 at 2:53 AM Joakim Zhang <qiangqing.zhang@nxp.com> wrote:
>
>
> Hi Kegl,
>
> > -----Original Message-----
> > From: Kegl Rohit <keglrohit@gmail.com>
> > Sent: 2021年12月22日 19:06
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> > Subject: Re: net: fec: memory corruption caused by err_enet_mii_probe
> > error path
> >
> > On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang <qiangqing.zhang@nxp.com>
> > wrote:
> >
> > > > This error path frees the DMA buffers, BUT as far I could see it
> > > > does not stop the DMA engines.
> > > > => open() fails => frees buffers => DMA still active => MAC receives
> > > > network packet => DMA starts => random memory corruption (use after
> > > > free) => random kernel panics
> > >
> > > A question here, why receive path still active? MAC has not connected to
> > PHY when this failure happened, should not see network activities.
> >
> > It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
> >
> > One of our devices (out of 10) eth1 did not detect a phy for eth1 on ifup
> > ( fec_open() ) => mdio error path => random system crashes But the phy is
> > there and the link is good, only MDIO access failed.
> > phys have autoneg activated after reset. So the RX path is active, even if the
> > fec driver says that there is no phy.
> >
> > Without attached ethernet cable the phy was also not detected, BUT the
> > system did not crash. => Because no packets will arrive without attached
> > cable.
> >
> > So the main issue on our side is the not detected phy. And the other issue is
> > the use after free in the error path.
>
> Thanks for you detailed explanation :)
>
> > I think the main issue has something to do with phy reset handling.
> > On a cold boot the eth1 phy is detected successfully. A warm restart ( reboot
> > -f ) will always lead to a not detected eth1 phy. The eth0 phy is always
> > detected.
> > From past experience the dual ethernet implementation in the driver was
> > not the most stable one. Maybe because of a smaller user base.
> > In our setup both phy reset lines are connected to gpios with the correct
> > entry in the mac0 & mac1 DT entry.
> > Revert of
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> > b.com%2Ftorvalds%2Flinux%2Fcommit%2F7705b5ed8adccd921423019e870df
> > 672fa423279%23diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb55111
> > 2d3fe37e50b1a15&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C
> > 2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > %7C0%7C1%7C637757679518224181%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> > oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> > 000&amp;sdata=yIQBX3v6PRvPerVBhdDcRJpjt1aIiq%2BtL8RXfNmKs%2FA%3
> > D&amp;reserved=0
> > seems to get the phys in a correct state before the fec driver is probed and
> > both phys are detected.
>
> You mean it always can detect both eth0 and eth1 PHY on a code boot, but eth1 PHY failed when do
> warm reset, it may deserve a further analyze the different for these two scenarios.
>
> I am not sure when you will hardware reset PHY on you platforms. For FEC driver, it seems only hardware
> reset PHY when probe MAC (please note PHY reset has moved to phylib, not recommend to do it in MAC driver),
> fec_probe->of_mdiobus_register->of_mdiobus_register_phy->of_mdiobus_phy_device_register->phy_device_register
>
> Generally, a hardware reset will trigger an auto-nego together, so I think PHY should in a correct state when you ifup the device,
> although, it will also trigger auto-nego when connected PHY. What I want to say is that you may need to check if the hardware reset
> at you side is suitable? Not sure you use single GPIO to control two PHYs reset?
>
> Just some analysis, hope this can give you some hints.
>
> > > > So maybe fec_stop() as counterpart to fec_restart() is missing
> > > > before freeing the buffers?
> > > > err_enet_mii_probe:
> > > > fec_stop(ndev);
> > > > fec_enet_free_buffers(ndev);
> > > >
> > > > Issue happend with 5.10.83 and should also also happen with current
> > master.
> > >
> > > It's fine for me, please see if anyone else has some comments. If not,
> > please cook a formal patch, thanks.
> > So fec_stop is the right guess to stop the rx/tx dma rings.
> >
> > Other paths use netif_tx_lock_bh before calling fec_stop().
> > I don't think this is necessary because netif_tx_start_all_queues() was not
> > called before in this error path case?
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> > b.com%2Ftorvalds%2Flinux%2Fblob%2Fmaster%2Fdrivers%2Fnet%2Fethern
> > et%2Ffreescale%2Ffec_main.c%23L3207&amp;data=04%7C01%7Cqiangqing.z
> > hang%40nxp.com%7C2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4
> > c6fa92cd99c5c301635%7C0%7C1%7C637757679518224181%7CUnknown%7CT
> > WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> > JXVCI6Mn0%3D%7C1000&amp;sdata=KfDdWkJ76Tvsd7nmXpNhdrY%2Fz9x24
> > UYJl6fxjwvqtYA%3D&amp;reserved=0
>
> ACK, please submit a patch if you test there is no regressions.
>
> Best Regards,
> Joakim Zhang
> > index 1b1f7f2a6..8f208b4a9 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
> >
> >         return 0;
> >
> >  err_enet_mii_probe:
> > +       fec_stop(ndev);
> >         fec_enet_free_buffers(ndev);
> >  err_enet_alloc:
> >         fec_enet_clk_enable(ndev, false);
> >  clk_enable:

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: net: fec: memory corruption caused by err_enet_mii_probe error path
  2021-12-23 13:35       ` Kegl Rohit
@ 2021-12-24  2:29         ` Joakim Zhang
  0 siblings, 0 replies; 7+ messages in thread
From: Joakim Zhang @ 2021-12-24  2:29 UTC (permalink / raw)
  To: Kegl Rohit; +Cc: netdev, Andrew Lunn


Hi Kegl,

> -----Original Message-----
> From: Kegl Rohit <keglrohit@gmail.com>
> Sent: 2021年12月23日 21:35
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> Subject: Re: net: fec: memory corruption caused by err_enet_mii_probe
> error path
> 
> Thanks for the hint to use of_mdiobus_register instead of the old way with
> fec { phy-reset-gpios = ....}
> 
> This change could solve another issue with the smsc phys.
> I think phy_reset_after_clk_enable does only something if the phy was
> specified in the DT with reset-gpio = < Coming from an older kernel I was
> using the fecs phy-reset-gpios and mdio auto probing and therefore
> phy_reset_after_clk_enable did nothing.
> And my smsc phys need this reset after enet_out enable
> (PHY_RST_AFTER_CLK_EN).
> 
> int phy_reset_after_clk_enable(struct phy_device *phydev) { if (!phydev
> || !phydev->drv) return -ENODEV;
> 
> if (phydev->drv->flags & PHY_RST_AFTER_CLK_EN)
> { phy_device_reset(phydev, 1); phy_device_reset(phydev, 0); return 1; }
> return 0; }

Yes, please use mdio or phy reset, instead of mac reset, which marked as deprecated in DT-binding.
 
> So I adapted the DT to specify the reset gpios.
> BUT this solved not my issue with the not detected phy.
> So i changed the generic
> compatible = "ethernet-phy-ieee802.3-c22"; to the exact phy id compatible =
> "ethernet-phy-id0007.c0f0";
> 
> => Now the phy is detected on every boot
> 
> if (!is_c45 && !of_get_phy_id(child, &phy_id)) phy =
> phy_device_create(mdio, addr, phy_id, 0, NULL); else phy =
> get_phy_device(mdio, addr, is_c45); if (IS_ERR(phy)) { if (mii_ts)
> unregister_mii_timestamper(mii_ts);
> return PTR_ERR(phy);
> }
> rc = of_mdiobus_phy_device_register(mdio, phy, child, addr);
> 
> With a given phy id mdio probe will issue phy_device_create; otherwise
> get_phy_device to read the id via mdio.
> 
> ethernet-phy@0 { reset-gpio = ....} gpio does not get asserted/deasserted
> before the get_phy_device call.
> Only the global mdio { reset-gpio = .... } gpio is asserted/deasserted before
> the get_phy_device call.
> So the phy does not get its reset after enet_out clock enable => phy is in bad
> state => get_phy_device does not find the phy => no
> of_mdiobus_phy_device_register => missing phy

It sounds like that phylib would play a mdio reset very earlier, before we accessing PHY registers.

Yes, generic compatible and exact phy-id compatible indeed different at this point, you caught the root case, also
let me learn more :)

> I would need something like this in addition to ethernet-phy@0 { reset-gpio
> = ....} mdio { reset-gpio = <&gpio4 13 GPIO_ACTIVE_LOW &gpio2 7
> GPIO_ACTIVE_LOW} But mdio { reset-gpio } does only support a single gpio.
> 
> For now I am ok with the fixed phy ids.
> 
> 
> &mac0 {
> phy-mode = "rmii";
> phy-handle = <&mac0_phy>;
> phy-supply = <&reg_etn_3v3>;
> pinctrl-names = "default";
> pinctrl-0 = <&mac0_pins_a>, <&mac0_phy_reset_pin>,
> <&mac1_phy_reset_pin>; status = "okay";
> 
> 
> mdio {
> #address-cells = <1>;
> #size-cells = <0>;
> 
> mac0_phy: ethernet-phy@0 {
> reg = <0>;
> compatible = "ethernet-phy-id0007.c0f0"; reset-gpios = <&gpio4 13
> GPIO_ACTIVE_LOW>; reset-assert-us = <10000>; reset-deassert-us =
> <30000>; smsc,disable-energy-detect; };
> 
> mac1_phy: ethernet-phy@1 {
> reg = <1>;
> compatible = "ethernet-phy-id0007.c0f0"; reset-gpios = <&gpio2 7
> GPIO_ACTIVE_LOW>; reset-assert-us = <10000>; reset-deassert-us =
> <30000>; smsc,disable-energy-detect; }; }; };

Best Regards,
Joakim Zhang
> On Thu, Dec 23, 2021 at 2:53 AM Joakim Zhang <qiangqing.zhang@nxp.com>
> wrote:
> >
> >
> > Hi Kegl,
> >
> > > -----Original Message-----
> > > From: Kegl Rohit <keglrohit@gmail.com>
> > > Sent: 2021年12月22日 19:06
> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > > Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> > > Subject: Re: net: fec: memory corruption caused by
> > > err_enet_mii_probe error path
> > >
> > > On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang
> > > <qiangqing.zhang@nxp.com>
> > > wrote:
> > >
> > > > > This error path frees the DMA buffers, BUT as far I could see it
> > > > > does not stop the DMA engines.
> > > > > => open() fails => frees buffers => DMA still active => MAC
> > > > > receives network packet => DMA starts => random memory
> > > > > corruption (use after
> > > > > free) => random kernel panics
> > > >
> > > > A question here, why receive path still active? MAC has not
> > > > connected to
> > > PHY when this failure happened, should not see network activities.
> > >
> > > It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
> > >
> > > One of our devices (out of 10) eth1 did not detect a phy for eth1 on
> > > ifup ( fec_open() ) => mdio error path => random system crashes But
> > > the phy is there and the link is good, only MDIO access failed.
> > > phys have autoneg activated after reset. So the RX path is active,
> > > even if the fec driver says that there is no phy.
> > >
> > > Without attached ethernet cable the phy was also not detected, BUT
> > > the system did not crash. => Because no packets will arrive without
> > > attached cable.
> > >
> > > So the main issue on our side is the not detected phy. And the other
> > > issue is the use after free in the error path.
> >
> > Thanks for you detailed explanation :)
> >
> > > I think the main issue has something to do with phy reset handling.
> > > On a cold boot the eth1 phy is detected successfully. A warm restart
> > > ( reboot -f ) will always lead to a not detected eth1 phy. The eth0
> > > phy is always detected.
> > > From past experience the dual ethernet implementation in the driver
> > > was not the most stable one. Maybe because of a smaller user base.
> > > In our setup both phy reset lines are connected to gpios with the
> > > correct entry in the mac0 & mac1 DT entry.
> > > Revert of
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thu
> > >
> b.com%2Ftorvalds%2Flinux%2Fcommit%2F7705b5ed8adccd921423019e870df
> > >
> 672fa423279%23diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb55111
> > >
> 2d3fe37e50b1a15&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C
> > >
> 2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > > %7C0%7C1%7C637757679518224181%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIj
> > >
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> > >
> 000&amp;sdata=yIQBX3v6PRvPerVBhdDcRJpjt1aIiq%2BtL8RXfNmKs%2FA%3
> > > D&amp;reserved=0
> > > seems to get the phys in a correct state before the fec driver is
> > > probed and both phys are detected.
> >
> > You mean it always can detect both eth0 and eth1 PHY on a code boot,
> > but eth1 PHY failed when do warm reset, it may deserve a further analyze
> the different for these two scenarios.
> >
> > I am not sure when you will hardware reset PHY on you platforms. For
> > FEC driver, it seems only hardware reset PHY when probe MAC (please
> > note PHY reset has moved to phylib, not recommend to do it in MAC
> > driver),
> >
> fec_probe->of_mdiobus_register->of_mdiobus_register_phy->of_mdiobus
> _ph
> > y_device_register->phy_device_register
> >
> > Generally, a hardware reset will trigger an auto-nego together, so I
> > think PHY should in a correct state when you ifup the device,
> > although, it will also trigger auto-nego when connected PHY. What I want to
> say is that you may need to check if the hardware reset at you side is suitable?
> Not sure you use single GPIO to control two PHYs reset?
> >
> > Just some analysis, hope this can give you some hints.
> >
> > > > > So maybe fec_stop() as counterpart to fec_restart() is missing
> > > > > before freeing the buffers?
> > > > > err_enet_mii_probe:
> > > > > fec_stop(ndev);
> > > > > fec_enet_free_buffers(ndev);
> > > > >
> > > > > Issue happend with 5.10.83 and should also also happen with
> > > > > current
> > > master.
> > > >
> > > > It's fine for me, please see if anyone else has some comments. If
> > > > not,
> > > please cook a formal patch, thanks.
> > > So fec_stop is the right guess to stop the rx/tx dma rings.
> > >
> > > Other paths use netif_tx_lock_bh before calling fec_stop().
> > > I don't think this is necessary because netif_tx_start_all_queues()
> > > was not called before in this error path case?
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thu
> > >
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fmaster%2Fdrivers%2Fnet%2Fethern
> > >
> et%2Ffreescale%2Ffec_main.c%23L3207&amp;data=04%7C01%7Cqiangqing.z
> > >
> hang%40nxp.com%7C2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4
> > >
> c6fa92cd99c5c301635%7C0%7C1%7C637757679518224181%7CUnknown%7CT
> > >
> WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> > >
> JXVCI6Mn0%3D%7C1000&amp;sdata=KfDdWkJ76Tvsd7nmXpNhdrY%2Fz9x24
> > > UYJl6fxjwvqtYA%3D&amp;reserved=0
> >
> > ACK, please submit a patch if you test there is no regressions.
> >
> > Best Regards,
> > Joakim Zhang
> > > index 1b1f7f2a6..8f208b4a9 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
> > >
> > >         return 0;
> > >
> > >  err_enet_mii_probe:
> > > +       fec_stop(ndev);
> > >         fec_enet_free_buffers(ndev);
> > >  err_enet_alloc:
> > >         fec_enet_clk_enable(ndev, false);
> > >  clk_enable:
> 
> On Thu, Dec 23, 2021 at 2:53 AM Joakim Zhang <qiangqing.zhang@nxp.com>
> wrote:
> >
> >
> > Hi Kegl,
> >
> > > -----Original Message-----
> > > From: Kegl Rohit <keglrohit@gmail.com>
> > > Sent: 2021年12月22日 19:06
> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > > Cc: netdev <netdev@vger.kernel.org>; Andrew Lunn <andrew@lunn.ch>
> > > Subject: Re: net: fec: memory corruption caused by
> > > err_enet_mii_probe error path
> > >
> > > On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang
> > > <qiangqing.zhang@nxp.com>
> > > wrote:
> > >
> > > > > This error path frees the DMA buffers, BUT as far I could see it
> > > > > does not stop the DMA engines.
> > > > > => open() fails => frees buffers => DMA still active => MAC
> > > > > receives network packet => DMA starts => random memory
> > > > > corruption (use after
> > > > > free) => random kernel panics
> > > >
> > > > A question here, why receive path still active? MAC has not
> > > > connected to
> > > PHY when this failure happened, should not see network activities.
> > >
> > > It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
> > >
> > > One of our devices (out of 10) eth1 did not detect a phy for eth1 on
> > > ifup ( fec_open() ) => mdio error path => random system crashes But
> > > the phy is there and the link is good, only MDIO access failed.
> > > phys have autoneg activated after reset. So the RX path is active,
> > > even if the fec driver says that there is no phy.
> > >
> > > Without attached ethernet cable the phy was also not detected, BUT
> > > the system did not crash. => Because no packets will arrive without
> > > attached cable.
> > >
> > > So the main issue on our side is the not detected phy. And the other
> > > issue is the use after free in the error path.
> >
> > Thanks for you detailed explanation :)
> >
> > > I think the main issue has something to do with phy reset handling.
> > > On a cold boot the eth1 phy is detected successfully. A warm restart
> > > ( reboot -f ) will always lead to a not detected eth1 phy. The eth0
> > > phy is always detected.
> > > From past experience the dual ethernet implementation in the driver
> > > was not the most stable one. Maybe because of a smaller user base.
> > > In our setup both phy reset lines are connected to gpios with the
> > > correct entry in the mac0 & mac1 DT entry.
> > > Revert of
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thu
> > >
> b.com%2Ftorvalds%2Flinux%2Fcommit%2F7705b5ed8adccd921423019e870df
> > >
> 672fa423279%23diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb55111
> > >
> 2d3fe37e50b1a15&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C
> > >
> 2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > > %7C0%7C1%7C637757679518224181%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIj
> > >
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> > >
> 000&amp;sdata=yIQBX3v6PRvPerVBhdDcRJpjt1aIiq%2BtL8RXfNmKs%2FA%3
> > > D&amp;reserved=0
> > > seems to get the phys in a correct state before the fec driver is
> > > probed and both phys are detected.
> >
> > You mean it always can detect both eth0 and eth1 PHY on a code boot,
> > but eth1 PHY failed when do warm reset, it may deserve a further analyze
> the different for these two scenarios.
> >
> > I am not sure when you will hardware reset PHY on you platforms. For
> > FEC driver, it seems only hardware reset PHY when probe MAC (please
> > note PHY reset has moved to phylib, not recommend to do it in MAC
> > driver),
> >
> fec_probe->of_mdiobus_register->of_mdiobus_register_phy->of_mdiobus
> _ph
> > y_device_register->phy_device_register
> >
> > Generally, a hardware reset will trigger an auto-nego together, so I
> > think PHY should in a correct state when you ifup the device,
> > although, it will also trigger auto-nego when connected PHY. What I want to
> say is that you may need to check if the hardware reset at you side is suitable?
> Not sure you use single GPIO to control two PHYs reset?
> >
> > Just some analysis, hope this can give you some hints.
> >
> > > > > So maybe fec_stop() as counterpart to fec_restart() is missing
> > > > > before freeing the buffers?
> > > > > err_enet_mii_probe:
> > > > > fec_stop(ndev);
> > > > > fec_enet_free_buffers(ndev);
> > > > >
> > > > > Issue happend with 5.10.83 and should also also happen with
> > > > > current
> > > master.
> > > >
> > > > It's fine for me, please see if anyone else has some comments. If
> > > > not,
> > > please cook a formal patch, thanks.
> > > So fec_stop is the right guess to stop the rx/tx dma rings.
> > >
> > > Other paths use netif_tx_lock_bh before calling fec_stop().
> > > I don't think this is necessary because netif_tx_start_all_queues()
> > > was not called before in this error path case?
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thu
> > >
> b.com%2Ftorvalds%2Flinux%2Fblob%2Fmaster%2Fdrivers%2Fnet%2Fethern
> > >
> et%2Ffreescale%2Ffec_main.c%23L3207&amp;data=04%7C01%7Cqiangqing.z
> > >
> hang%40nxp.com%7C2f10c3ff43de4ccccc8708d9c53b037e%7C686ea1d3bc2b4
> > >
> c6fa92cd99c5c301635%7C0%7C1%7C637757679518224181%7CUnknown%7CT
> > >
> WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> > >
> JXVCI6Mn0%3D%7C1000&amp;sdata=KfDdWkJ76Tvsd7nmXpNhdrY%2Fz9x24
> > > UYJl6fxjwvqtYA%3D&amp;reserved=0
> >
> > ACK, please submit a patch if you test there is no regressions.
> >
> > Best Regards,
> > Joakim Zhang
> > > index 1b1f7f2a6..8f208b4a9 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
> > >
> > >         return 0;
> > >
> > >  err_enet_mii_probe:
> > > +       fec_stop(ndev);
> > >         fec_enet_free_buffers(ndev);
> > >  err_enet_alloc:
> > >         fec_enet_clk_enable(ndev, false);
> > >  clk_enable:

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-12-24  2:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-22  7:06 net: fec: memory corruption caused by err_enet_mii_probe error path Kegl Rohit
2021-12-22  8:01 ` Joakim Zhang
2021-12-22 10:33   ` Andrew Lunn
2021-12-22 11:05   ` Kegl Rohit
2021-12-23  1:53     ` Joakim Zhang
2021-12-23 13:35       ` Kegl Rohit
2021-12-24  2:29         ` Joakim Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.