netdev.vger.kernel.org archive mirror
* [PATCH net-next] net: fec: Always call fec_restart() in resume path
@ 2024-02-12 10:50 John Ernberg
  2024-02-14  2:44 ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: John Ernberg @ 2024-02-12 10:50 UTC (permalink / raw)
  To: Wei Fang
  Cc: Shenwei Wang, Clark Wang, NXP Linux Team, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	John Ernberg

When trying to resume from suspend the following can be observed:

    fec 5b040000.ethernet eth0: MDIO read timeout
    Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
    Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110

This is because the MAC is left powered down after resuming from suspend.

The MAC is brought up in both probe and open, so leaving it off in resume
from suspend is an imbalance.
This imbalance, combined with a LAN8700R that is permanently powered,
results in unusable networking if the board happens to suspend before
the link is brought up, and the only way out of it is a full
power cycle.
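
The imbalance described above can be illustrated with a small userspace toy
model. This is only a sketch under stated assumptions, not kernel code: the
`toy_*` names, the `struct toy_fec` fields and the `patched` flag are all
hypothetical, and the model assumes that suspend leaves the MAC powered down
and that the already-running branch of resume restarts the MAC. MDIO reads
are modelled as timing out (-110) whenever the MAC is down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ETIMEDOUT 110 /* same numeric value as the -110 in the log */

/* Hypothetical stand-in for driver state; NOT the real struct net_device. */
struct toy_fec {
	bool mac_enabled;   /* MAC (and with it the MDIO bus) is up */
	bool netif_running; /* interface has been opened */
};

/* Models the effect of fec_restart(): the MAC is brought up. */
static void toy_fec_restart(struct toy_fec *dev)
{
	assert(dev != NULL);
	dev->mac_enabled = true;
}

/* Probe and open both bring the MAC up, as the commit message states. */
static void toy_probe(struct toy_fec *dev)
{
	dev->netif_running = false;
	toy_fec_restart(dev);
}

static void toy_open(struct toy_fec *dev)
{
	dev->netif_running = true;
	toy_fec_restart(dev);
}

/* Assumption: suspend leaves the MAC powered down. */
static void toy_suspend(struct toy_fec *dev)
{
	dev->mac_enabled = false;
}

/* Resume: the else branch only exists when 'patched' is true. */
static void toy_resume(struct toy_fec *dev, bool patched)
{
	if (dev->netif_running)
		toy_fec_restart(dev); /* assumed existing behaviour */
	else if (patched)
		toy_fec_restart(dev); /* what the patch adds */
}

/* An MDIO read only succeeds while the MAC is up. */
static int toy_mdio_read(const struct toy_fec *dev)
{
	return dev->mac_enabled ? 0 : -TOY_ETIMEDOUT;
}

/* Scenario from the report: suspend before the link was ever brought up. */
int toy_suspend_resume_before_open(bool patched)
{
	struct toy_fec dev;

	toy_probe(&dev);
	toy_suspend(&dev);
	toy_resume(&dev, patched);
	return toy_mdio_read(&dev); /* the PHY resume would read MDIO here */
}

/* Control scenario: the interface was opened before suspending. */
int toy_suspend_resume_after_open(bool patched)
{
	struct toy_fec dev;

	toy_probe(&dev);
	toy_open(&dev);
	toy_suspend(&dev);
	toy_resume(&dev, patched);
	return toy_mdio_read(&dev);
}
```

In this model, `toy_suspend_resume_before_open(false)` returns -110 and the
same call with `true` returns 0, while the after-open scenario returns 0
either way. It says nothing about clocks or runtime PM on real hardware.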

NOTE: With this change the PHY ends up taking different resume paths when
the link has never been up compared to once the link has been up. Currently
the resume process is identical and just happens at different times, so
this *should* not have any unforeseen consequences.

Signed-off-by: John Ernberg <john.ernberg@actia.se>
---

Tested on 6.1 kernel and forward ported. I discovered this when we
upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
this imbalance since at least 2009.

This is also why I target the -next tree, I can't identify a proper commit
to blame with a Fixes. Let me know if this should be the net tree anyway.

 drivers/net/ethernet/freescale/fec_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 42bdc01a304e..e6804c068d6b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -4706,6 +4706,8 @@ static int __maybe_unused fec_resume(struct device *dev)
 		napi_enable(&fep->napi);
 		phy_init_hw(ndev->phydev);
 		phy_start(ndev->phydev);
+	} else {
+		fec_restart(ndev);
 	}
 	rtnl_unlock();
 
-- 
2.43.0


* Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path
  2024-02-12 10:50 [PATCH net-next] net: fec: Always call fec_restart() in resume path John Ernberg
@ 2024-02-14  2:44 ` Jakub Kicinski
  2024-02-14  8:27   ` John Ernberg
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2024-02-14  2:44 UTC (permalink / raw)
  To: John Ernberg
  Cc: Wei Fang, Shenwei Wang, Clark Wang, NXP Linux Team,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel

On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
> Tested on 6.1 kernel and forward ported. I discovered this when we
> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
> this imbalance since at least 2009.
> 
> This is also why I target the -next tree, I can't identify a proper commit
> to blame with a Fixes. Let me know if this should be the net tree anyway.

I thought you bisected it to one or two specific changes?
I'd put those down as Fixes tags and target net.


* Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path
  2024-02-14  2:44 ` Jakub Kicinski
@ 2024-02-14  8:27   ` John Ernberg
  2024-02-14 14:52     ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: John Ernberg @ 2024-02-14  8:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wei Fang, Shenwei Wang, Clark Wang, NXP Linux Team,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel

On 2/14/24 03:44, Jakub Kicinski wrote:
> On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
>> Tested on 6.1 kernel and forward ported. I discovered this when we
>> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
>> this imbalance since at least 2009.
>>
>> This is also why I target the -next tree, I can't identify a proper commit
>> to blame with a Fixes. Let me know if this should be the net tree anyway.
> 
> I thought you bisected it to one or two specific changes?
> I'd put those down as Fixes tags and target net.

Hi Jakub,

You are correct, we thought so too at [1], but bisection is really hard 
because we need a whole bunch of patches on top to even boot the system 
(imx8qxp specific stuff in the NXP vendor tree that's difficult to 
rebase), we left it a bit open ended.

Over the course of the weekend I lost all confidence in my bisection 
after being confident for 4-5 days, because the more I thought about it 
the less it made sense for that commit to be the culprit.

I should probably have both followed up on that mail with that, and been 
clearer here. I apologize for failing that.

Best regards // John Ernberg

[1]: 
https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/


* Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path
  2024-02-14  8:27   ` John Ernberg
@ 2024-02-14 14:52     ` Jakub Kicinski
  2024-02-14 15:49       ` John Ernberg
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2024-02-14 14:52 UTC (permalink / raw)
  To: John Ernberg
  Cc: Wei Fang, Shenwei Wang, Clark Wang, NXP Linux Team,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel

On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
> You are correct, we thought so too at [1], but bisection is really hard 
> because we need a whole bunch of patches on top to even boot the system 
> (imx8qxp specific stuff in the NXP vendor tree that's difficult to 
> rebase), we left it a bit open ended.
> 
> Over the course of the weekend I lost all confidence in my bisection 
> after being confident for 4-5 days, because the more I thought about it 
> the less it made sense for that commit to be the culprit.
> 
> I should probably have both followed up on that mail with that, and been 
> clearer here. I apologize for failing that.

Is it perhaps possible that upstream 5.10 also didn't work?
I'm not saying the change itself is incorrect, indeed there 
is fec_restart() on probe and open paths, as you say.
Did you try reverting as many of the changes that happened
in the meantime as possible (instead of bisection)?

The other question is whether we need to enable any of the
clocks or runtime resume before calling fec_restart()?


* Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path
  2024-02-14 14:52     ` Jakub Kicinski
@ 2024-02-14 15:49       ` John Ernberg
  0 siblings, 0 replies; 5+ messages in thread
From: John Ernberg @ 2024-02-14 15:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wei Fang, Shenwei Wang, Clark Wang, NXP Linux Team,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel

On 2/14/24 15:52, Jakub Kicinski wrote:
> On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
>> You are correct, we thought so too at [1], but bisection is really hard
>> because we need a whole bunch of patches on top to even boot the system
>> (imx8qxp specific stuff in the NXP vendor tree that's difficult to
>> rebase), we left it a bit open ended.
>>
>> Over the course of the weekend I lost all confidence in my bisection
>> after being confident for 4-5 days, because the more I thought about it
>> the less it made sense for that commit to be the culprit.
>>
>> I should probably have both followed up on that mail with that, and been
>> clearer here. I apologize for failing that.
> 
> Is it perhaps possible that upstream 5.10 also didn't work?
> I'm not saying the change itself is incorrect, indeed there
> is fec_restart() on probe and open paths, as you say.
> Did you try reverting as many of the changes that happened
> in the meantime as possible (instead of bisection)?
> 

That's a really good point. I'll make some time for this in the next weeks.
Please mark it with changes requested in the meantime, as I expect to 
make changes to the patch when I have a result.

> The other question is whether we need to enable any of the
> clocks or runtime resume before calling fec_restart()?

On our board it works fine without it, I don't know enough about this 
SoC or other NXP SoCs to know if it's necessary in other situations.

The clocks are re-enabled in the open call which appears to be enough to 
get traffic going again when the link is brought up.

Perhaps NXP can fill us in?

Thanks! // John Ernberg

