linux-samsung-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
       [not found]   ` <CGME20220517101846eucas1p2c132f7e7032ed00996e222e9cc6cdf99@eucas1p2.samsung.com>
@ 2022-05-17 10:18     ` Marek Szyprowski
  2022-05-19 19:08       ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-05-17 10:18 UTC (permalink / raw)
  To: Lukas Wunner, David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi Lukas,

On 12.05.2022 10:42, Lukas Wunner wrote:
> Link status of SMSC LAN95xx chips is polled once per second, even though
> they're capable of signaling PHY interrupts through the MAC layer.
>
> Forward those interrupts to the PHY driver to avoid polling.  Benefits
> are reduced bus traffic, reduced CPU overhead and quicker interface
> bringup.
>
> Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
> smsc95xx: fix link detection for disabled autonegotiation").
> Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
> hence couldn't detect link-up events when auto-negotiation was disabled.
> The proper solution would have been to enable the ENERGYON interrupt
> instead of polling.
>
> Since then, PHY handling was moved from the LAN95xx driver to the SMSC
> PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
> That PHY driver is capable of link detection with auto-negotiation
> disabled because it enables the ENERGYON interrupt.
>
> Note that signaling interrupts through the MAC layer not only works with
> the integrated PHY, but also with an external PHY, provided its
> interrupt pin is attached to LAN95xx's nPHY_INT pin.
>
> In the unlikely event that the interrupt pin of an external PHY is
> attached to a GPIO of the SoC (or not connected at all), the driver can
> be amended to retrieve the irq from the PHY's of_node.
>
> To forward PHY interrupts to phylib, it is not sufficient to call
> phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
> so that PHY interrupts are cleared.  That's because according to page
> 119 of the LAN950x datasheet, "The source of this interrupt is a level.
> The interrupt persists until it is cleared in the PHY."
>
> https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
>
> Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
> future, the IRQ domain may be extended to support the 11 GPIOs on the
> LAN95xx.
>
> Normally the PHY interrupt should be masked until the PHY driver has
> cleared it.  However masking requires a (sleeping) USB transaction and
> interrupts are received in (non-sleepable) softirq context.  I decided
> not to mask the interrupt at all (by using the dummy_irq_chip's noop
> ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
> intervals and normally that's sufficient to wake the PHY driver's IRQ
> thread and have it clear the interrupt.  If it does take longer, worst
> thing that can happen is the IRQ thread is woken again.  No big deal.
>
> Because PHY interrupts are now perpetually enabled, there's no need to
> selectively enable them on suspend.  So remove all invocations of
> smsc95xx_enable_phy_wakeup_interrupts().
>
> In smsc95xx_resume(), move the call of phy_init_hw() before
> usbnet_resume() (which restarts the status URB) to ensure that the PHY
> is fully initialized when an interrupt is handled.
>
> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
> Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
> Cc: Andre Edich <andre.edich@microchip.com>

This patch landed in the recent linux next-20220516 as commit 
1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to 
avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation 
after system suspend-resume cycle. On the Odroid XU3 board I got the 
following warning in the kernel log:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
PM: suspend entry (deep)
Filesystems sync: 0.001 seconds
Freezing user space processes ... (elapsed 0.002 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
printk: Suspending console(s) (use no_console_suspend to debug)
smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
------------[ cut here ]------------
WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946 
phy_state_machine+0x98/0x28c
Modules linked in: snd_soc_hdmi_codec snd_soc_odroid governor_passive 
snd_soc_i2s exynos_bus snd_soc_idma snd_soc_s3c_dma exynosdrm 
analogix_dp snd_soc_max98090 snd_soc_core ac97_bus snd_pcm_dmaengine 
snd_pcm clk_s2mps11 rtc_s5m snd_timer snd soundcore ina2xx exynos_gsc 
pwm_samsung exynos_adc s5p_jpeg ohci_exynosv4l2_mem2mem phy_exynos_usb2 
panfrost ehci_exynos s5p_mfc drm_shmem_helper videobuf2_dma_contig 
videobuf2_memops videobuf2_v4l2 videobuf2_common gpu_sched videodev mc 
exynos_ppmu exynos5422_dmc exynos_nocp s5p_sss rtc_s3c exynos_rng 
s3c2410_wdt s5p_cec pwm_fan
CPU: 2 PID: 73 Comm: kworker/2:1 Tainted: G        W 
5.18.0-rc6-01433-g1ce8b37241ed #5040
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_power_efficient phy_state_machine
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x40/0x4c
  dump_stack_lvl from __warn+0xc8/0x13c
  __warn from warn_slowpath_fmt+0x5c/0xb4
  warn_slowpath_fmt from phy_state_machine+0x98/0x28c
  phy_state_machine from process_one_work+0x1ec/0x4cc
  process_one_work from worker_thread+0x58/0x54c
  worker_thread from kthread+0xd0/0xec
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf0aa9fb0 to 0xf0aa9ff8)
...
---[ end trace 0000000000000000 ]---

It looks that the driver's suspend/resume operations might need some 
adjustments. After the system suspend/resume cycle the driver is not 
operational anymore. Reverting the $subject patch on top of linux 
next-20220516 restores ethernet operation after system suspend/resume.

> ---
> Only change since v2:
>   * Drop call to __irq_enter_raw() which worked around a warning in
>     generic_handle_domain_irq().  That warning is gone since
>     792ea6a074ae (queued on tip.git/irq/urgent).
>     (Marc Zyngier, Thomas Gleixner)
>
>   drivers/net/usb/smsc95xx.c | 113 ++++++++++++++++++++-----------------
>   1 file changed, 61 insertions(+), 52 deletions(-)

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-17 10:18     ` [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling Marek Szyprowski
@ 2022-05-19 19:08       ` Lukas Wunner
  2022-05-19 21:22         ` Marek Szyprowski
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-05-19 19:08 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> This patch landed in the recent linux next-20220516 as commit 
> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to 
> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation 
> after system suspend-resume cycle. On the Odroid XU3 board I got the 
> following warning in the kernel log:
> 
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> PM: suspend entry (deep)
> Filesystems sync: 0.001 seconds
> Freezing user space processes ... (elapsed 0.002 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> printk: Suspending console(s) (use no_console_suspend to debug)
> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> phy_state_machine+0x98/0x28c
[...]
> It looks that the driver's suspend/resume operations might need some 
> adjustments. After the system suspend/resume cycle the driver is not 
> operational anymore. Reverting the $subject patch on top of linux 
> next-20220516 restores ethernet operation after system suspend/resume.

Thanks a lot for the report.  It seems the PHY is signaling a link change
shortly before system sleep and by the time the phy_state_machine() worker
gets around to handle it, the device has already been suspended and thus
refuses any further USB requests with -EHOSTUNREACH (-113):

usb_suspend_both()
  usb_suspend_interface()
    smsc95xx_suspend()
      usbnet_suspend()
        __usbnet_status_stop_force() # stops interrupt polling,
                                     # link change is signaled before this

  udev->can_submit = 0               # refuse further URBs

Assuming the above theory is correct, calling phy_stop_machine()
after usbnet_suspend() would be sufficient to fix the issue.
It cancels the phy_state_machine() worker.

The small patch below does that.  Could you give it a spin?

Taking a step back though, I'm wondering if there's a bigger problem here:
This is a USB device, so we stop receiving interrupts once the Interrupt
Endpoint is no longer polled.  But what if a PHY's interrupt is attached
to a GPIO of the SoC and that interrupt is raised while the system is
suspending?  The interrupt handler may likewise try to reach an
inaccessible (suspended) device.

The right thing to do would probably be to signal wakeup.  But the
PHY drivers' irq handlers instead schedule the phy_state_machine().
Perhaps we need something like the following at the top of
phy_state_machine():

	if (phydev->suspended) {
		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
		return;
	}

However, phydev->suspended is set at the *bottom* of phy_suspend(),
it would have to be set at the *top* of mdio_bus_phy_suspend()
for the above to be correct.  Hmmm...

Thanks,

Lukas

-- >8 --
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index bd03e16..d351a6c 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 	}
 
 	pdata->phydev->irq = phy_irq;
+	pdata->phydev->mac_managed_pm = true;
 	pdata->phydev->is_internal = is_internal_phy;
 
 	/* detect device revision as different features may be available */
@@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 		return ret;
 	}
 
+	if (netif_running(dev->net))
+		phy_stop(pdata->phydev);
+
 	if (pdata->suspend_flags) {
 		netdev_warn(dev->net, "error during last resume\n");
 		pdata->suspend_flags = 0;
@@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface *intf)
 	}
 
 	phy_init_hw(pdata->phydev);
+	if (netif_running(dev->net))
+		phy_start(pdata->phydev);
 
 	ret = usbnet_resume(intf);
 	if (ret < 0)

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-19 19:08       ` Lukas Wunner
@ 2022-05-19 21:22         ` Marek Szyprowski
  2022-05-23  9:43           ` Lukas Wunner
  2022-08-26  6:51           ` Marek Szyprowski
  0 siblings, 2 replies; 26+ messages in thread
From: Marek Szyprowski @ 2022-05-19 21:22 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi Lukas,

On 19.05.2022 21:08, Lukas Wunner wrote:
> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>> This patch landed in the recent linux next-20220516 as commit
>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to
>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation
>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>> following warning in the kernel log:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>> PM: suspend entry (deep)
>> Filesystems sync: 0.001 seconds
>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> printk: Suspending console(s) (use no_console_suspend to debug)
>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>> phy_state_machine+0x98/0x28c
> [...]
>> It looks that the driver's suspend/resume operations might need some
>> adjustments. After the system suspend/resume cycle the driver is not
>> operational anymore. Reverting the $subject patch on top of linux
>> next-20220516 restores ethernet operation after system suspend/resume.
> Thanks a lot for the report.  It seems the PHY is signaling a link change
> shortly before system sleep and by the time the phy_state_machine() worker
> gets around to handle it, the device has already been suspended and thus
> refuses any further USB requests with -EHOSTUNREACH (-113):
>
> usb_suspend_both()
>    usb_suspend_interface()
>      smsc95xx_suspend()
>        usbnet_suspend()
>          __usbnet_status_stop_force() # stops interrupt polling,
>                                       # link change is signaled before this
>
>    udev->can_submit = 0               # refuse further URBs
>
> Assuming the above theory is correct, calling phy_stop_machine()
> after usbnet_suspend() would be sufficient to fix the issue.
> It cancels the phy_state_machine() worker.
>
> The small patch below does that.  Could you give it a spin?

That's it. Your analysis is right and the patch fixes the issue. Thanks!

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> Taking a step back though, I'm wondering if there's a bigger problem here:
> This is a USB device, so we stop receiving interrupts once the Interrupt
> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
> to a GPIO of the SoC and that interrupt is raised while the system is
> suspending?  The interrupt handler may likewise try to reach an
> inaccessible (suspended) device.
>
> The right thing to do would probably be to signal wakeup.  But the
> PHY drivers' irq handlers instead schedule the phy_state_machine().
> Perhaps we need something like the following at the top of
> phy_state_machine():
>
> 	if (phydev->suspended) {
> 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
> 		return;
> 	}
>
> However, phydev->suspended is set at the *bottom* of phy_suspend(),
> it would have to be set at the *top* of mdio_bus_phy_suspend()
> for the above to be correct.  Hmmm...
Well, your concern sounds valid, but I don't have a board with such hw 
configuration, so I cannot really test.
> Thanks,
>
> Lukas
>
> -- >8 --
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index bd03e16..d351a6c 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
>   	}
>   
>   	pdata->phydev->irq = phy_irq;
> +	pdata->phydev->mac_managed_pm = true;
>   	pdata->phydev->is_internal = is_internal_phy;
>   
>   	/* detect device revision as different features may be available */
> @@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
>   		return ret;
>   	}
>   
> +	if (netif_running(dev->net))
> +		phy_stop(pdata->phydev);
> +
>   	if (pdata->suspend_flags) {
>   		netdev_warn(dev->net, "error during last resume\n");
>   		pdata->suspend_flags = 0;
> @@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface *intf)
>   	}
>   
>   	phy_init_hw(pdata->phydev);
> +	if (netif_running(dev->net))
> +		phy_start(pdata->phydev);
>   
>   	ret = usbnet_resume(intf);
>   	if (ret < 0)
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-19 21:22         ` Marek Szyprowski
@ 2022-05-23  9:43           ` Lukas Wunner
  2022-05-23 11:38             ` Marek Szyprowski
                               ` (2 more replies)
  2022-08-26  6:51           ` Marek Szyprowski
  1 sibling, 3 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-05-23  9:43 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Thu, May 19, 2022 at 11:22:36PM +0200, Marek Szyprowski wrote:
> On 19.05.2022 21:08, Lukas Wunner wrote:
> > Taking a step back though, I'm wondering if there's a bigger problem here:
> > This is a USB device, so we stop receiving interrupts once the Interrupt
> > Endpoint is no longer polled.  But what if a PHY's interrupt is attached
> > to a GPIO of the SoC and that interrupt is raised while the system is
> > suspending?  The interrupt handler may likewise try to reach an
> > inaccessible (suspended) device.
> >
> > The right thing to do would probably be to signal wakeup.  But the
> > PHY drivers' irq handlers instead schedule the phy_state_machine().
> > Perhaps we need something like the following at the top of
> > phy_state_machine():
> >
> > 	if (phydev->suspended) {
> > 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
> > 		return;
> > 	}
> >
> > However, phydev->suspended is set at the *bottom* of phy_suspend(),
> > it would have to be set at the *top* of mdio_bus_phy_suspend()
> > for the above to be correct.  Hmmm...
> 
> Well, your concern sounds valid, but I don't have a board with such hw 
> configuration, so I cannot really test.

I'm torn whether I should submit the quick fix in my last e-mail
or attempt to address the deeper issue.  The quick fix would ensure
v5.19-rc1 isn't broken, but if possible I'd rather address the deeper
issue...

Below is another patch.  Would you mind testing if it fixes the problem
for you?  It's a replacement for the patch in my last e-mail and seeks
to fix the problem for all drivers, not just smsc95xx.  If you don't
have time to test it, let me know and I'll just submit the quick fix
in my previous e-mail.

BTW, getting a PHY interrupt on suspend seems like a corner case to me,
so I'm amazed you found this and seem to be able to reproduce it 100%.
Out of curiosity, is this a CI test you're performing?

Thanks,

Lukas

-- >8 --

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ef62f357b76d..c2442c38d312 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -31,6 +31,7 @@
 #include <linux/io.h>
 #include <linux/uaccess.h>
 #include <linux/atomic.h>
+#include <linux/suspend.h>
 #include <net/netlink.h>
 #include <net/genetlink.h>
 #include <net/sock.h>
@@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	struct phy_driver *drv = phydev->drv;
 	irqreturn_t ret;
 
+	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
+	    (phydev->mdio.dev.power.is_prepared ||
+	     phydev->mdio.dev.power.is_suspended)) {
+		struct net_device *netdev = phydev->attached_dev;
+
+		if (netdev) {
+			struct device *parent = netdev->dev.parent;
+
+			if (netdev->wol_enabled)
+				pm_system_wakeup();
+			else if (device_may_wakeup(&netdev->dev))
+				pm_wakeup_dev_event(&netdev->dev, 0, true);
+			else if (parent && device_may_wakeup(parent))
+				pm_wakeup_dev_event(parent, 0, true);
+		}
+
+		return IRQ_HANDLED;
+	}
+
 	mutex_lock(&phydev->lock);
 	ret = drv->handle_interrupt(phydev);
 	mutex_unlock(&phydev->lock);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 431a8719c635..da6d70ddf167 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
 	 * may call phy routines that try to grab the same lock, and that may
 	 * lead to a deadlock.
 	 */
-	if (phydev->attached_dev && phydev->adjust_link)
+	if (phydev->attached_dev && phydev->adjust_link) {
+		if (phy_interrupt_is_valid(phydev))
+			synchronize_irq(phydev->irq);
 		phy_stop_machine(phydev);
+	}
 
 	if (!mdio_bus_phy_may_suspend(phydev))
 		return 0;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-23  9:43           ` Lukas Wunner
@ 2022-05-23 11:38             ` Marek Szyprowski
  2022-05-23 12:34             ` Andrew Lunn
  2022-05-24  1:08             ` Andrew Lunn
  2 siblings, 0 replies; 26+ messages in thread
From: Marek Szyprowski @ 2022-05-23 11:38 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi Lukas,

On 23.05.2022 11:43, Lukas Wunner wrote:
> On Thu, May 19, 2022 at 11:22:36PM +0200, Marek Szyprowski wrote:
>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>> Taking a step back though, I'm wondering if there's a bigger problem here:
>>> This is a USB device, so we stop receiving interrupts once the Interrupt
>>> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
>>> to a GPIO of the SoC and that interrupt is raised while the system is
>>> suspending?  The interrupt handler may likewise try to reach an
>>> inaccessible (suspended) device.
>>>
>>> The right thing to do would probably be to signal wakeup.  But the
>>> PHY drivers' irq handlers instead schedule the phy_state_machine().
>>> Perhaps we need something like the following at the top of
>>> phy_state_machine():
>>>
>>> 	if (phydev->suspended) {
>>> 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
>>> 		return;
>>> 	}
>>>
>>> However, phydev->suspended is set at the *bottom* of phy_suspend(),
>>> it would have to be set at the *top* of mdio_bus_phy_suspend()
>>> for the above to be correct.  Hmmm...
>> Well, your concern sounds valid, but I don't have a board with such hw
>> configuration, so I cannot really test.
> I'm torn whether I should submit the quick fix in my last e-mail
> or attempt to address the deeper issue.  The quick fix would ensure
> v5.19-rc1 isn't broken, but if possible I'd rather address the deeper
> issue...
>
> Below is another patch.  Would you mind testing if it fixes the problem
> for you?  It's a replacement for the patch in my last e-mail and seeks
> to fix the problem for all drivers, not just smsc95xx.  If you don't
> have time to test it, let me know and I'll just submit the quick fix
> in my previous e-mail.

I've just tested it on top of next-20220519 and I was not able to 
reproduce the issue, so it looks it also fixes the issue. :)

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> BTW, getting a PHY interrupt on suspend seems like a corner case to me,
> so I'm amazed you found this and seem to be able to reproduce it 100%.
> Out of curiosity, is this a CI test you're performing?

I've have some semi-automated (based on simple bash scripts) tests 
utilizing remote test boards (with remote power on/off control, serial 
console, tftp booting). This issue was quite easy to reproduce, even 
manually. Maybe it is somehow specific to the Odroid-XU3/XU3-lite boards 
and the way the smsc95xx USB ethernet chip is connected there, but it 
happens there usually in 2 of 3 suspend/resume tests.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-23  9:43           ` Lukas Wunner
  2022-05-23 11:38             ` Marek Szyprowski
@ 2022-05-23 12:34             ` Andrew Lunn
  2022-05-23 13:47               ` Lukas Wunner
  2022-05-24  1:08             ` Andrew Lunn
  2 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-05-23 12:34 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
>  	 * may call phy routines that try to grab the same lock, and that may
>  	 * lead to a deadlock.
>  	 */
> -	if (phydev->attached_dev && phydev->adjust_link)
> +	if (phydev->attached_dev && phydev->adjust_link) {
> +		if (phy_interrupt_is_valid(phydev))
> +			synchronize_irq(phydev->irq);
>  		phy_stop_machine(phydev);
> +	}

What is this hunk trying to achieve? As far as i know, interrupts have
not been disabled. So as soon as the call to synchronize_irq()
finishes, could well be another interrupt happens.

	  Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-23 12:34             ` Andrew Lunn
@ 2022-05-23 13:47               ` Lukas Wunner
  2022-05-24  0:52                 ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-05-23 13:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On Mon, May 23, 2022 at 02:34:49PM +0200, Andrew Lunn wrote:
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
> >  	 * may call phy routines that try to grab the same lock, and that may
> >  	 * lead to a deadlock.
> >  	 */
> > -	if (phydev->attached_dev && phydev->adjust_link)
> > +	if (phydev->attached_dev && phydev->adjust_link) {
> > +		if (phy_interrupt_is_valid(phydev))
> > +			synchronize_irq(phydev->irq);
> >  		phy_stop_machine(phydev);
> > +	}
> 
> What is this hunk trying to achieve? As far as i know, interrupts have
> not been disabled. So as soon as the call to synchronize_irq()
> finishes, could well be another interrupt happens.

That other interrupt would bail out of phy_interrupt() because
the is_prepared flag is set on the PHY's struct device, see
first hunk of the patch.

The problem is that an interrupt may occur before the system
sleep transition commences.  phy_interrupt() will notice that
is_prepared is not (yet) set, hence invokes drv->handle_interrupt().
Let's say the IRQ thread is preempted at that point, the system
sleep transition is started and mdio_bus_phy_suspend() is run.
It calls phy_stop_machine(), so the state machine is now stopped.
Now phy_interrupt() continues, and the PHY driver's ->handle_interrupt()
callback starts the state machine.  Boom, that's not what we want.

So the synchronize_irq() ensures that any already running
phy_interrupt() runs to completion before phy_stop_machine()
is called.  It doesn't matter if another interrupt occurs
because then is_prepared will have been set and therefore
phy_interrupt() won't call drv->handle_interrupt().

Let me know if I haven't explained it in sufficient clarity,
I'll be happy to try again. :)

I'm more concerned about the first hunk of the patch because I'm
not sure I got the wakeup stuff right...

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-23 13:47               ` Lukas Wunner
@ 2022-05-24  0:52                 ` Andrew Lunn
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Lunn @ 2022-05-24  0:52 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On Mon, May 23, 2022 at 03:47:09PM +0200, Lukas Wunner wrote:
> On Mon, May 23, 2022 at 02:34:49PM +0200, Andrew Lunn wrote:
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
> > >  	 * may call phy routines that try to grab the same lock, and that may
> > >  	 * lead to a deadlock.
> > >  	 */
> > > -	if (phydev->attached_dev && phydev->adjust_link)
> > > +	if (phydev->attached_dev && phydev->adjust_link) {
> > > +		if (phy_interrupt_is_valid(phydev))
> > > +			synchronize_irq(phydev->irq);
> > >  		phy_stop_machine(phydev);
> > > +	}
> > 
> > What is this hunk trying to achieve? As far as i know, interrupts have
> > not been disabled. So as soon as the call to synchronize_irq()
> > finishes, could well be another interrupt happens.
> 
> That other interrupt would bail out of phy_interrupt() because
> the is_prepared flag is set on the PHY's struct device, see
> first hunk of the patch.
> 
> The problem is that an interrupt may occur before the system
> sleep transition commences.  phy_interrupt() will notice that
> is_prepared is not (yet) set, hence invokes drv->handle_interrupt().
> Let's say the IRQ thread is preempted at that point, the system
> sleep transition is started and mdio_bus_phy_suspend() is run.
> It calls phy_stop_machine(), so the state machine is now stopped.
> Now phy_interrupt() continues, and the PHY driver's ->handle_interrupt()
> callback starts the state machine.  Boom, that's not what we want.
> 
> So the synchronize_irq() ensures that any already running
> phy_interrupt() runs to completion before phy_stop_machine()
> is called.  It doesn't matter if another interrupt occurs
> because then is_prepared will have been set and therefore
> phy_interrupt() won't call drv->handle_interrupt().
> 
> Let me know if I haven't explained it in sufficient clarity,
> I'll be happy to try again. :)

I think some comments are needed. If i don't understand what is going
on, i'm sure others don't as well.

    Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-23  9:43           ` Lukas Wunner
  2022-05-23 11:38             ` Marek Szyprowski
  2022-05-23 12:34             ` Andrew Lunn
@ 2022-05-24  1:08             ` Andrew Lunn
  2022-05-24  6:16               ` Marek Szyprowski
  2022-05-24 12:13               ` Lukas Wunner
  2 siblings, 2 replies; 26+ messages in thread
From: Andrew Lunn @ 2022-05-24  1:08 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>  	struct phy_driver *drv = phydev->drv;
>  	irqreturn_t ret;
>  
> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> +	    (phydev->mdio.dev.power.is_prepared ||
> +	     phydev->mdio.dev.power.is_suspended)) {
> +		struct net_device *netdev = phydev->attached_dev;
> +
> +		if (netdev) {
> +			struct device *parent = netdev->dev.parent;
> +
> +			if (netdev->wol_enabled)
> +				pm_system_wakeup();
> +			else if (device_may_wakeup(&netdev->dev))
> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> +			else if (parent && device_may_wakeup(parent))
> +				pm_wakeup_dev_event(parent, 0, true);
> +		}
> +
> +		return IRQ_HANDLED;

I'm not sure you can just throw the interrupt away. There have been
issues with WoL, where the WoL signal has been applied to a PMC, not
an actual interrupt. Yet the PHY driver assumes it is an
interrupt. And in order for WoL to work correctly, it needs the
interrupt handler to be called. We said the hardware is broken, WoL
cannot work for that setup.

Here you have correct hardware, but you are throwing the interrupt
away, which will have the same result. So i think you need to abort
the suspend, get the bus working again, and call the interrupt
handler. If this is a WoL interrupt you are supposed to be waking up
anyway.

	Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-24  1:08             ` Andrew Lunn
@ 2022-05-24  6:16               ` Marek Szyprowski
  2022-05-24 12:03                 ` Andrew Lunn
  2022-05-24 12:13               ` Lukas Wunner
  1 sibling, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-05-24  6:16 UTC (permalink / raw)
  To: Andrew Lunn, Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On 24.05.2022 03:08, Andrew Lunn wrote:
>> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>>   	struct phy_driver *drv = phydev->drv;
>>   	irqreturn_t ret;
>>   
>> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
>> +	    (phydev->mdio.dev.power.is_prepared ||
>> +	     phydev->mdio.dev.power.is_suspended)) {
>> +		struct net_device *netdev = phydev->attached_dev;
>> +
>> +		if (netdev) {
>> +			struct device *parent = netdev->dev.parent;
>> +
>> +			if (netdev->wol_enabled)
>> +				pm_system_wakeup();
>> +			else if (device_may_wakeup(&netdev->dev))
>> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
>> +			else if (parent && device_may_wakeup(parent))
>> +				pm_wakeup_dev_event(parent, 0, true);
>> +		}
>> +
>> +		return IRQ_HANDLED;
> I'm not sure you can just throw the interrupt away. There have been
> issues with WoL, where the WoL signal has been applied to a PMC, not
> an actual interrupt. Yet the PHY driver assumes it is an
> interrupt. And in order for WoL to work correctly, it needs the
> interrupt handler to be called. We said the hardware is broken, WoL
> cannot work for that setup.
>
> Here you have correct hardware, but you are throwing the interrupt
> away, which will have the same result. So i think you need to abort
> the suspend, get the bus working again, and call the interrupt
> handler. If this is a WoL interrupt you are supposed to be waking up
> anyway.

This hardware doesn't support wake-on-lan. It looks somehow that it 
manages to throw an interrupt just a moment before the power regulator 
for the whole usb bus is cut off.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-24  6:16               ` Marek Szyprowski
@ 2022-05-24 12:03                 ` Andrew Lunn
  2022-05-26 13:55                   ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-05-24 12:03 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Lukas Wunner, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On Tue, May 24, 2022 at 08:16:23AM +0200, Marek Szyprowski wrote:
> On 24.05.2022 03:08, Andrew Lunn wrote:
> >> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
> >>   	struct phy_driver *drv = phydev->drv;
> >>   	irqreturn_t ret;
> >>   
> >> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> >> +	    (phydev->mdio.dev.power.is_prepared ||
> >> +	     phydev->mdio.dev.power.is_suspended)) {
> >> +		struct net_device *netdev = phydev->attached_dev;
> >> +
> >> +		if (netdev) {
> >> +			struct device *parent = netdev->dev.parent;
> >> +
> >> +			if (netdev->wol_enabled)
> >> +				pm_system_wakeup();
> >> +			else if (device_may_wakeup(&netdev->dev))
> >> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> >> +			else if (parent && device_may_wakeup(parent))
> >> +				pm_wakeup_dev_event(parent, 0, true);
> >> +		}
> >> +
> >> +		return IRQ_HANDLED;
> > I'm not sure you can just throw the interrupt away. There have been
> > issues with WoL, where the WoL signal has been applied to a PMC, not
> > an actual interrupt. Yet the PHY driver assumes it is an
> > interrupt. And in order for WoL to work correctly, it needs the
> > interrupt handler to be called. We said the hardware is broken, WoL
> > cannot work for that setup.
> >
> > Here you have correct hardware, but you are throwing the interrupt
> > away, which will have the same result. So i think you need to abort
> > the suspend, get the bus working again, and call the interrupt
> > handler. If this is a WoL interrupt you are supposed to be waking up
> > anyway.
> 
> This hardware doesn't support wake-on-lan. It looks somehow that it 
> manages to throw an interrupt just a moment before the power regulator 
> for the whole usb bus is cut off.

Your hardware might not support WOL, but this is generic code. It
needs to work for all hardware.

As for this hardware, if it does not support WOL, why are interrupts
still enabled?

      Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-24  1:08             ` Andrew Lunn
  2022-05-24  6:16               ` Marek Szyprowski
@ 2022-05-24 12:13               ` Lukas Wunner
  2022-06-06  1:28                 ` Andrew Lunn
  1 sibling, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-05-24 12:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On Tue, May 24, 2022 at 03:08:07AM +0200, Andrew Lunn wrote:
> > @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
> >  	struct phy_driver *drv = phydev->drv;
> >  	irqreturn_t ret;
> >  
> > +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> > +	    (phydev->mdio.dev.power.is_prepared ||
> > +	     phydev->mdio.dev.power.is_suspended)) {
> > +		struct net_device *netdev = phydev->attached_dev;
> > +
> > +		if (netdev) {
> > +			struct device *parent = netdev->dev.parent;
> > +
> > +			if (netdev->wol_enabled)
> > +				pm_system_wakeup();
> > +			else if (device_may_wakeup(&netdev->dev))
> > +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> > +			else if (parent && device_may_wakeup(parent))
> > +				pm_wakeup_dev_event(parent, 0, true);
> > +		}
> > +
> > +		return IRQ_HANDLED;
> 
> I'm not sure you can just throw the interrupt away. There have been
> issues with WoL, where the WoL signal has been applied to a PMC, not
> an actual interrupt. Yet the PHY driver assumes it is an
> interrupt. And in order for WoL to work correctly, it needs the
> interrupt handler to be called. We said the hardware is broken, WoL
> cannot work for that setup.
> 
> Here you have correct hardware, but you are throwing the interrupt
> away, which will have the same result. So i think you need to abort
> the suspend, get the bus working again, and call the interrupt
> handler. If this is a WoL interrupt you are supposed to be waking up
> anyway.

mdio_bus_phy_resume() does trigger the state machine via
phy_start_machine(), so link state changes *are* detected after wakeup.

But you're saying that's not sufficient and you really want the
PHY driver's IRQ handler to be called, do I understand that correctly?

That could be achieved with a flag indicating that the IRQ handler
needs to be rerun after resume.  A simple invocation of irq_wake_thread()
will then achieve that.

It has also occurred to me that not clearing the IRQ may lead to
an interrupt storm if it's level-triggered.  So I need to disable
the IRQ and re-enable it after the PHY has been resumed.
Back to the drawing board...

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-24 12:03                 ` Andrew Lunn
@ 2022-05-26 13:55                   ` Lukas Wunner
  0 siblings, 0 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-05-26 13:55 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

On Tue, May 24, 2022 at 02:03:52PM +0200, Andrew Lunn wrote:
> As for this hardware, if it does not support WOL, why are interrupts
> still enabled?

LAN95xx chips do support WoL and will signal a USB wake event.
But whether that actually results in resume from system sleep
depends on the capabilities of the SoC and its USB host controller.

LAN95xx supports a variety of wake options (WoL, PHY Energy Detect, ...)
and can use either its integrated SMSC PHY or an external PHY.
I'm not sure all wake options will work with arbitrary external PHYs.

If WoL or Wake on PHY Energy Detect is not used, we just program the
LAN95xx to enter a deeper power state which results in the respective
wake events being ignored.  As a result, interrupts may be left enabled
even though they're not used as a wakeup source.  The phylib doesn't
provide an API to selectively disable or enable interrupts, other than
phy_stop() and phy_start(), which does a lot more.

The patch I've submitted today treats such unnecessarily enabled
interrupts leniently:  It will not signal wakeup if that wasn't enabled
and just remembers that an interrupt occurred.  The interrupt will be
replayed upon resume and that's it.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-24 12:13               ` Lukas Wunner
@ 2022-06-06  1:28                 ` Andrew Lunn
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Lunn @ 2022-06-06  1:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Russell King, Ferry Toth,
	Krzysztof Kozlowski, 'Linux Samsung SOC'

> mdio_bus_phy_resume() does trigger the state machine via
> phy_start_machine(), so link state changes *are* detected after wakeup.
> 
> But you're saying that's not sufficient and you really want the
> PHY driver's IRQ handler to be called, do I understand that correctly?

It is an interrupt, so i would expect the handler to be called. I've
never looked deeply how the kernel handles this, but maybe there is
some core support for this. The kernel does know about wake up
interrupts. The interesting bit is how do you defer the interrupt
until you have enough of the system running again you can actually
service the interrupt.

PHY interrupts mostly are level, not edge, because there are multiple
sources of interrupts within the PHY. So you do need to clear the
interrupt source, or you are going to get a storm, as you pointed out.
But being a level might actually help you. It fires once to get you
out of sleep, and then fires again when the interrupt controller is
resumed and is enabled.

	  Andrew
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-05-19 21:22         ` Marek Szyprowski
  2022-05-23  9:43           ` Lukas Wunner
@ 2022-08-26  6:51           ` Marek Szyprowski
  2022-08-26  7:19             ` Lukas Wunner
  1 sibling, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-08-26  6:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On 19.05.2022 23:22, Marek Szyprowski wrote:
> On 19.05.2022 21:08, Lukas Wunner wrote:
>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>> This patch landed in the recent linux next-20220516 as commit
>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY 
>>> driver to
>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet 
>>> operation
>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>> following warning in the kernel log:
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.001 seconds
>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>> phy_state_machine+0x98/0x28c
>> [...]
>>> It looks that the driver's suspend/resume operations might need some
>>> adjustments. After the system suspend/resume cycle the driver is not
>>> operational anymore. Reverting the $subject patch on top of linux
>>> next-20220516 restores ethernet operation after system suspend/resume.
>> Thanks a lot for the report.  It seems the PHY is signaling a link 
>> change
>> shortly before system sleep and by the time the phy_state_machine() 
>> worker
>> gets around to handle it, the device has already been suspended and thus
>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>
>> usb_suspend_both()
>>    usb_suspend_interface()
>>      smsc95xx_suspend()
>>        usbnet_suspend()
>>          __usbnet_status_stop_force() # stops interrupt polling,
>>                                       # link change is signaled 
>> before this
>>
>>    udev->can_submit = 0               # refuse further URBs
>>
>> Assuming the above theory is correct, calling phy_stop_machine()
>> after usbnet_suspend() would be sufficient to fix the issue.
>> It cancels the phy_state_machine() worker.
>>
>> The small patch below does that.  Could you give it a spin?
>
> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>
> Feel free to add:
>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>


Gentle ping for the final patch...

It looks that the similar fix is posted for other drivers, i.e.:

https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/


>
>> Taking a step back though, I'm wondering if there's a bigger problem 
>> here:
>> This is a USB device, so we stop receiving interrupts once the Interrupt
>> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
>> to a GPIO of the SoC and that interrupt is raised while the system is
>> suspending?  The interrupt handler may likewise try to reach an
>> inaccessible (suspended) device.
>>
>> The right thing to do would probably be to signal wakeup.  But the
>> PHY drivers' irq handlers instead schedule the phy_state_machine().
>> Perhaps we need something like the following at the top of
>> phy_state_machine():
>>
>>     if (phydev->suspended) {
>>         pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
>>         return;
>>     }
>>
>> However, phydev->suspended is set at the *bottom* of phy_suspend(),
>> it would have to be set at the *top* of mdio_bus_phy_suspend()
>> for the above to be correct.  Hmmm...
> Well, your concern sounds valid, but I don't have a board with such hw 
> configuration, so I cannot really test.
>> Thanks,
>>
>> Lukas
>>
>> -- >8 --
>> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
>> index bd03e16..d351a6c 100644
>> --- a/drivers/net/usb/smsc95xx.c
>> +++ b/drivers/net/usb/smsc95xx.c
>> @@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, 
>> struct usb_interface *intf)
>>       }
>>         pdata->phydev->irq = phy_irq;
>> +    pdata->phydev->mac_managed_pm = true;
>>       pdata->phydev->is_internal = is_internal_phy;
>>         /* detect device revision as different features may be 
>> available */
>> @@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct 
>> usb_interface *intf, pm_message_t message)
>>           return ret;
>>       }
>>   +    if (netif_running(dev->net))
>> +        phy_stop(pdata->phydev);
>> +
>>       if (pdata->suspend_flags) {
>>           netdev_warn(dev->net, "error during last resume\n");
>>           pdata->suspend_flags = 0;
>> @@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface 
>> *intf)
>>       }
>>         phy_init_hw(pdata->phydev);
>> +    if (netif_running(dev->net))
>> +        phy_start(pdata->phydev);
>>         ret = usbnet_resume(intf);
>>       if (ret < 0)
>>
> Best regards

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-26  6:51           ` Marek Szyprowski
@ 2022-08-26  7:19             ` Lukas Wunner
  2022-08-26  7:41               ` Marek Szyprowski
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-08-26  7:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
> On 19.05.2022 23:22, Marek Szyprowski wrote:
> > On 19.05.2022 21:08, Lukas Wunner wrote:
> >> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> >>> This patch landed in the recent linux next-20220516 as commit
> >>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY 
> >>> driver to
> >>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet 
> >>> operation
> >>> after system suspend-resume cycle. On the Odroid XU3 board I got the
> >>> following warning in the kernel log:
> >>>
> >>> # time rtcwake -s10 -mmem
> >>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> >>> PM: suspend entry (deep)
> >>> Filesystems sync: 0.001 seconds
> >>> Freezing user space processes ... (elapsed 0.002 seconds) done.
> >>> OOM killer disabled.
> >>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>> printk: Suspending console(s) (use no_console_suspend to debug)
> >>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> >>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> >>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> >>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> >>> ------------[ cut here ]------------
> >>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> >>> phy_state_machine+0x98/0x28c
> >> [...]
> >>> It looks that the driver's suspend/resume operations might need some
> >>> adjustments. After the system suspend/resume cycle the driver is not
> >>> operational anymore. Reverting the $subject patch on top of linux
> >>> next-20220516 restores ethernet operation after system suspend/resume.
> >> Thanks a lot for the report. It seems the PHY is signaling a link 
> >> change
> >> shortly before system sleep and by the time the phy_state_machine() 
> >> worker
> >> gets around to handle it, the device has already been suspended and thus
> >> refuses any further USB requests with -EHOSTUNREACH (-113):
[...]
> >> Assuming the above theory is correct, calling phy_stop_machine()
> >> after usbnet_suspend() would be sufficient to fix the issue.
> >> It cancels the phy_state_machine() worker.
> >>
> >> The small patch below does that. Could you give it a spin?
> >
> > That's it. Your analysis is right and the patch fixes the issue. Thanks!
> >
> > Feel free to add:
> >
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Gentle ping for the final patch...

Hm?  Actually this issue is supposed to be fixed by mainline commit
1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").

The initial fix attempt that you're replying to should not be necessary
with that commit.

Are you still seeing issues even with 1758bde2e4aa applied?
Or are you maybe using a custom downstream tree which is missing that commit?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-26  7:19             ` Lukas Wunner
@ 2022-08-26  7:41               ` Marek Szyprowski
  2022-08-26  7:53                 ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-08-26  7:41 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi Lukas,

On 26.08.2022 09:19, Lukas Wunner wrote:
> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>> driver to
>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>> operation
>>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>>>> following warning in the kernel log:
>>>>>
>>>>> # time rtcwake -s10 -mmem
>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>>>> PM: suspend entry (deep)
>>>>> Filesystems sync: 0.001 seconds
>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>> OOM killer disabled.
>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>> phy_state_machine+0x98/0x28c
>>>> [...]
>>>>> It looks that the driver's suspend/resume operations might need some
>>>>> adjustments. After the system suspend/resume cycle the driver is not
>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>> next-20220516 restores ethernet operation after system suspend/resume.
>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>> change
>>>> shortly before system sleep and by the time the phy_state_machine()
>>>> worker
>>>> gets around to handle it, the device has already been suspended and thus
>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
> [...]
>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>> It cancels the phy_state_machine() worker.
>>>>
>>>> The small patch below does that. Could you give it a spin?
>>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>>>
>>> Feel free to add:
>>>
>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>
>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Gentle ping for the final patch...
> Hm?  Actually this issue is supposed to be fixed by mainline commit
> 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
>
> The initial fix attempt that you're replying to should not be necessary
> with that commit.
>
> Are you still seeing issues even with 1758bde2e4aa applied?
> Or are you maybe using a custom downstream tree which is missing that commit?

On Linux next-20220825 I still get the following warning during 
suspend/resume cycle:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 
mdio_bus_phy_resume+0x10c/0x110
Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig 
v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
mc s5p_cec
CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xc8/0x220
  __warn from warn_slowpath_fmt+0x5c/0xb4
  warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
  mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
  dpm_run_callback from device_resume+0x124/0x21c
  device_resume from dpm_resume+0x108/0x278
  dpm_resume from dpm_resume_end+0xc/0x18
  dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
  suspend_devices_and_enter from pm_suspend+0x364/0x430
  pm_suspend from state_store+0x68/0xc8
  state_store from kernfs_fop_write_iter+0x110/0x1d4
  kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
  vfs_write from ksys_write+0x5c/0xd4
  ksys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
5fa0:                   00000004 0002b438 00000004 0002b438 00000004 
00000000
5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c 
00028160
5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
irq event stamp: 58381
hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
---[ end trace 0000000000000000 ]---

The mentioned patch fixes it.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-26  7:41               ` Marek Szyprowski
@ 2022-08-26  7:53                 ` Lukas Wunner
  2022-08-26  8:29                   ` Marek Szyprowski
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-08-26  7:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
> On 26.08.2022 09:19, Lukas Wunner wrote:
> > On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
> >> On 19.05.2022 23:22, Marek Szyprowski wrote:
> >>> On 19.05.2022 21:08, Lukas Wunner wrote:
> >>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> >>>>> This patch landed in the recent linux next-20220516 as commit
> >>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
> >>>>> driver to
> >>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
> >>>>> operation
> >>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
> >>>>> following warning in the kernel log:
> >>>>>
> >>>>> # time rtcwake -s10 -mmem
> >>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> >>>>> PM: suspend entry (deep)
> >>>>> Filesystems sync: 0.001 seconds
> >>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
> >>>>> OOM killer disabled.
> >>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>>>> printk: Suspending console(s) (use no_console_suspend to debug)
> >>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> >>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> >>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> >>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> >>>>> ------------[ cut here ]------------
> >>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> >>>>> phy_state_machine+0x98/0x28c
> >>>> [...]
> >>>>> It looks that the driver's suspend/resume operations might need some
> >>>>> adjustments. After the system suspend/resume cycle the driver is not
> >>>>> operational anymore. Reverting the $subject patch on top of linux
> >>>>> next-20220516 restores ethernet operation after system suspend/resume.
> >>>> Thanks a lot for the report. It seems the PHY is signaling a link
> >>>> change
> >>>> shortly before system sleep and by the time the phy_state_machine()
> >>>> worker
> >>>> gets around to handle it, the device has already been suspended and thus
> >>>> refuses any further USB requests with -EHOSTUNREACH (-113):
> > [...]
> >>>> Assuming the above theory is correct, calling phy_stop_machine()
> >>>> after usbnet_suspend() would be sufficient to fix the issue.
> >>>> It cancels the phy_state_machine() worker.
> >>>>
> >>>> The small patch below does that. Could you give it a spin?
> >>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
> >>>
> >>> Feel free to add:
> >>>
> >>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >>>
> >>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Gentle ping for the final patch...
> > Hm?  Actually this issue is supposed to be fixed by mainline commit
> > 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
> >
> > The initial fix attempt that you're replying to should not be necessary
> > with that commit.
> >
> > Are you still seeing issues even with 1758bde2e4aa applied?
> > Or are you maybe using a custom downstream tree which is missing that commit?
> 
> On Linux next-20220825 I still get the following warning during 
> suspend/resume cycle:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 
> mdio_bus_phy_resume+0x10c/0x110
> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig 
> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
> mc s5p_cec
> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
> Hardware name: Samsung Exynos (Flattened Device Tree)
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from __warn+0xc8/0x220
>   __warn from warn_slowpath_fmt+0x5c/0xb4
>   warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>   mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>   dpm_run_callback from device_resume+0x124/0x21c
>   device_resume from dpm_resume+0x108/0x278
>   dpm_resume from dpm_resume_end+0xc/0x18
>   dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>   suspend_devices_and_enter from pm_suspend+0x364/0x430
>   pm_suspend from state_store+0x68/0xc8
>   state_store from kernfs_fop_write_iter+0x110/0x1d4
>   kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>   vfs_write from ksys_write+0x5c/0xd4
>   ksys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004 
> 00000000
> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c 
> 00028160
> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
> irq event stamp: 58381
> hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
> hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
> softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
> softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
> ---[ end trace 0000000000000000 ]---
> 
> The mentioned patch fixes it.

Color me confused.

With "the mentioned patch", are you referring to 1758bde2e4aa
or are you referring to the little test patch in this e-mail:
https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/

There's a Tested-by from you on 1758bde2e4aa, so I assume the
commit fixed the issue at the time.  Does it still occur
intermittently?  Or does it occur every time?  In the latter case,
I'd assume some other change was made in the meantime and that
other change exposed the issue again...

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-26  7:53                 ` Lukas Wunner
@ 2022-08-26  8:29                   ` Marek Szyprowski
  2022-08-29 11:40                     ` Marek Szyprowski
  0 siblings, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-08-26  8:29 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi Lukas,

On 26.08.2022 09:53, Lukas Wunner wrote:
> On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
>> On 26.08.2022 09:19, Lukas Wunner wrote:
>>> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>>>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>>>> driver to
>>>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>>>> operation
>>>>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>>>>>> following warning in the kernel log:
>>>>>>>
>>>>>>> # time rtcwake -s10 -mmem
>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>>>>>> PM: suspend entry (deep)
>>>>>>> Filesystems sync: 0.001 seconds
>>>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>>>> OOM killer disabled.
>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>>>> ------------[ cut here ]------------
>>>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>>>> phy_state_machine+0x98/0x28c
>>>>>> [...]
>>>>>>> It looks that the driver's suspend/resume operations might need some
>>>>>>> adjustments. After the system suspend/resume cycle the driver is not
>>>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>>>> next-20220516 restores ethernet operation after system suspend/resume.
>>>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>>>> change
>>>>>> shortly before system sleep and by the time the phy_state_machine()
>>>>>> worker
>>>>>> gets around to handle it, the device has already been suspended and thus
>>>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>> [...]
>>>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>>>> It cancels the phy_state_machine() worker.
>>>>>>
>>>>>> The small patch below does that. Could you give it a spin?
>>>>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>>>>>
>>>>> Feel free to add:
>>>>>
>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>
>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>> Gentle ping for the final patch...
>>> Hm?  Actually this issue is supposed to be fixed by mainline commit
>>> 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
>>>
>>> The initial fix attempt that you're replying to should not be necessary
>>> with that commit.
>>>
>>> Are you still seeing issues even with 1758bde2e4aa applied?
>>> Or are you maybe using a custom downstream tree which is missing that commit?
>> On Linux next-20220825 I still get the following warning during
>> suspend/resume cycle:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323
>> mdio_bus_phy_resume+0x10c/0x110
>> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig
>> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev
>> mc s5p_cec
>> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>    unwind_backtrace from show_stack+0x10/0x14
>>    show_stack from dump_stack_lvl+0x58/0x70
>>    dump_stack_lvl from __warn+0xc8/0x220
>>    __warn from warn_slowpath_fmt+0x5c/0xb4
>>    warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>>    mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>>    dpm_run_callback from device_resume+0x124/0x21c
>>    device_resume from dpm_resume+0x108/0x278
>>    dpm_resume from dpm_resume_end+0xc/0x18
>>    dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>>    suspend_devices_and_enter from pm_suspend+0x364/0x430
>>    pm_suspend from state_store+0x68/0xc8
>>    state_store from kernfs_fop_write_iter+0x110/0x1d4
>>    kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>>    vfs_write from ksys_write+0x5c/0xd4
>>    ksys_write from ret_fast_syscall+0x0/0x1c
>> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
>> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004
>> 00000000
>> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c
>> 00028160
>> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
>> irq event stamp: 58381
>> hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
>> hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
>> softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
>> softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
>> ---[ end trace 0000000000000000 ]---
>>
>> The mentioned patch fixes it.
> Color me confused.
>
> With "the mentioned patch", are you referring to 1758bde2e4aa
> or are you referring to the little test patch in this e-mail:
> https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/
>
> There's a Tested-by from you on 1758bde2e4aa, so I assume the
> commit fixed the issue at the time.  Does it still occur
> intermittently?  Or does it occur every time?  In the latter case,
> I'd assume some other change was made in the meantime and that
> other change exposed the issue again...

The issue seems to be exposed again. On 1758bde2e4aa everything works 
fine and there is no warning during suspend/resume cycle. Then I've 
noticed that warning again while playing with current linux-next. I will 
check when it got broken and report again.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-26  8:29                   ` Marek Szyprowski
@ 2022-08-29 11:40                     ` Marek Szyprowski
  2022-09-18 19:13                       ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Marek Szyprowski @ 2022-08-29 11:40 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

Hi All,

On 26.08.2022 10:29, Marek Szyprowski wrote:
> On 26.08.2022 09:53, Lukas Wunner wrote:
>> On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
>>> On 26.08.2022 09:19, Lukas Wunner wrote:
>>>> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>>>>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>>>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>>>>> driver to
>>>>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>>>>> operation
>>>>>>>> after system suspend-resume cycle. On the Odroid XU3 board I 
>>>>>>>> got the
>>>>>>>> following warning in the kernel log:
>>>>>>>>
>>>>>>>> # time rtcwake -s10 -mmem
>>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 
>>>>>>>> 09:16:07 2022
>>>>>>>> PM: suspend entry (deep)
>>>>>>>> Filesystems sync: 0.001 seconds
>>>>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>>>>> OOM killer disabled.
>>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) 
>>>>>>>> done.
>>>>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>>>>> ------------[ cut here ]------------
>>>>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>>>>> phy_state_machine+0x98/0x28c
>>>>>>> [...]
>>>>>>>> It looks that the driver's suspend/resume operations might need 
>>>>>>>> some
>>>>>>>> adjustments. After the system suspend/resume cycle the driver 
>>>>>>>> is not
>>>>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>>>>> next-20220516 restores ethernet operation after system 
>>>>>>>> suspend/resume.
>>>>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>>>>> change
>>>>>>> shortly before system sleep and by the time the phy_state_machine()
>>>>>>> worker
>>>>>>> gets around to handle it, the device has already been suspended 
>>>>>>> and thus
>>>>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>>> [...]
>>>>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>>>>> It cancels the phy_state_machine() worker.
>>>>>>>
>>>>>>> The small patch below does that. Could you give it a spin?
>>>>>> That's it. Your analysis is right and the patch fixes the issue. 
>>>>>> Thanks!
>>>>>>
>>>>>> Feel free to add:
>>>>>>
>>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> Gentle ping for the final patch...
>>>> Hm?  Actually this issue is supposed to be fixed by mainline commit
>>>> 1758bde2e4aa ("net: phy: Don't trigger state machine while in 
>>>> suspend").
>>>>
>>>> The initial fix attempt that you're replying to should not be 
>>>> necessary
>>>> with that commit.
>>>>
>>>> Are you still seeing issues even with 1758bde2e4aa applied?
>>>> Or are you maybe using a custom downstream tree which is missing 
>>>> that commit?
>>> On Linux next-20220825 I still get the following warning during
>>> suspend/resume cycle:
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323
>>> mdio_bus_phy_resume+0x10c/0x110
>>> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig
>>> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev
>>> mc s5p_cec
>>> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 
>>> #5482
>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>>    unwind_backtrace from show_stack+0x10/0x14
>>>    show_stack from dump_stack_lvl+0x58/0x70
>>>    dump_stack_lvl from __warn+0xc8/0x220
>>>    __warn from warn_slowpath_fmt+0x5c/0xb4
>>>    warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>>>    mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>>>    dpm_run_callback from device_resume+0x124/0x21c
>>>    device_resume from dpm_resume+0x108/0x278
>>>    dpm_resume from dpm_resume_end+0xc/0x18
>>>    dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>>>    suspend_devices_and_enter from pm_suspend+0x364/0x430
>>>    pm_suspend from state_store+0x68/0xc8
>>>    state_store from kernfs_fop_write_iter+0x110/0x1d4
>>>    kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>>>    vfs_write from ksys_write+0x5c/0xd4
>>>    ksys_write from ret_fast_syscall+0x0/0x1c
>>> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
>>> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004
>>> 00000000
>>> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c
>>> 00028160
>>> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
>>> irq event stamp: 58381
>>> hardirqs last  enabled at (58393): [<c019ff28>] 
>>> vprintk_emit+0x320/0x344
>>> hardirqs last disabled at (58400): [<c019fedc>] 
>>> vprintk_emit+0x2d4/0x344
>>> softirqs last  enabled at (58258): [<c0101694>] 
>>> __do_softirq+0x354/0x618
>>> softirqs last disabled at (58247): [<c012dd18>] 
>>> __irq_exit_rcu+0x140/0x1ec
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> The mentioned patch fixes it.
>> Color me confused.
>>
>> With "the mentioned patch", are you referring to 1758bde2e4aa
>> or are you referring to the little test patch in this e-mail:
>> https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/
>>
>> There's a Tested-by from you on 1758bde2e4aa, so I assume the
>> commit fixed the issue at the time.  Does it still occur
>> intermittently?  Or does it occur every time?  In the latter case,
>> I'd assume some other change was made in the meantime and that
>> other change exposed the issue again...
>
> The issue seems to be exposed again. On 1758bde2e4aa everything works 
> fine and there is no warning during suspend/resume cycle. Then I've 
> noticed that warning again while playing with current linux-next. I 
> will check when it got broken and report again.

I've finally traced what has happened. I've double checked and indeed 
the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as 
such it has been merged to linus tree. Then the commit 744d23c71af3 
("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been 
merged to linus tree, which triggers a new warning during the 
suspend/resume cycle with smsc95xx driver. Please note, that the 
smsc95xx still works fine regardless that warning. However it look that 
the commit 1758bde2e4aa only hide a real problem, which the commit 
744d23c71af3 warns about.

Probably a proper fix for smsc95xx driver is to call phy_stop/start 
during suspend/resume cycle, like in similar patches for other drivers:

https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-08-29 11:40                     ` Marek Szyprowski
@ 2022-09-18 19:13                       ` Lukas Wunner
  2022-09-18 20:41                         ` Florian Fainelli
  2022-09-22 13:21                         ` Marek Szyprowski
  0 siblings, 2 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-09-18 19:13 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC',
	Florian Fainelli

[cc += Florian]

On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> I've finally traced what has happened. I've double checked and indeed 
> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as 
> such it has been merged to linus tree. Then the commit 744d23c71af3 
> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been 
> merged to linus tree, which triggers a new warning during the 
> suspend/resume cycle with smsc95xx driver. Please note, that the 
> smsc95xx still works fine regardless that warning. However it look that 
> the commit 1758bde2e4aa only hide a real problem, which the commit 
> 744d23c71af3 warns about.
> 
> Probably a proper fix for smsc95xx driver is to call phy_stop/start 
> during suspend/resume cycle, like in similar patches for other drivers:
> 
> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/

No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
no need to call phy_{stop,start}().

744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
of the fallout.

However the WARN() condition still seems too broad and causes false
positives such as in your case.  In particular, mdio_bus_phy_suspend()
may leave the device in PHY_UP state, so that's a legal state that
needs to be exempted from the WARN().

Does the issue still appear even after 6dbe852c379f?

If it does, could you test whether exempting PHY_UP silences the
gratuitous WARN splat?  I.e.:

-	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY);
+	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
+		phydev->state != PHY_UP);

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-09-18 19:13                       ` Lukas Wunner
@ 2022-09-18 20:41                         ` Florian Fainelli
  2022-09-18 20:55                           ` Lukas Wunner
  2022-09-22 13:21                         ` Marek Szyprowski
  1 sibling, 1 reply; 26+ messages in thread
From: Florian Fainelli @ 2022-09-18 20:41 UTC (permalink / raw)
  To: Lukas Wunner, Marek Szyprowski
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'



On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> [cc += Florian]
> 
> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>> I've finally traced what has happened. I've double checked and indeed
>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>> such it has been merged to linus tree. Then the commit 744d23c71af3
>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>> merged to linus tree, which triggers a new warning during the
>> suspend/resume cycle with smsc95xx driver. Please note, that the
>> smsc95xx still works fine regardless that warning. However it look that
>> the commit 1758bde2e4aa only hide a real problem, which the commit
>> 744d23c71af3 warns about.
>>
>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>> during suspend/resume cycle, like in similar patches for other drivers:
>>
>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> 
> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> no need to call phy_{stop,start}() >
> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> of the fallout.
> 
> However the WARN() condition still seems too broad and causes false
> positives such as in your case.  In particular, mdio_bus_phy_suspend()
> may leave the device in PHY_UP state, so that's a legal state that
> needs to be exempted from the WARN().

How is that a legal state when the PHY should be suspended? Even if we 
are interrupt driven, the state machine should be stopped, does not mean 
that Wake-on-LAN or other activity interrupts should be disabled.

> 
> Does the issue still appear even after 6dbe852c379f?
> 
> If it does, could you test whether exempting PHY_UP silences the
> gratuitous WARN splat?  I.e.:

If you allow PHY_UP, then the warning becomes effectively useless, so I 
don't believe this is quite what you want to do here.
-- 
Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-09-18 20:41                         ` Florian Fainelli
@ 2022-09-18 20:55                           ` Lukas Wunner
  2022-09-18 22:11                             ` Florian Fainelli
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-09-18 20:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
> On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> > On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> > > I've finally traced what has happened. I've double checked and indeed
> > > the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
> > > such it has been merged to linus tree. Then the commit 744d23c71af3
> > > ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
> > > merged to linus tree, which triggers a new warning during the
> > > suspend/resume cycle with smsc95xx driver. Please note, that the
> > > smsc95xx still works fine regardless that warning. However it look that
> > > the commit 1758bde2e4aa only hide a real problem, which the commit
> > > 744d23c71af3 warns about.
> > > 
> > > Probably a proper fix for smsc95xx driver is to call phy_stop/start
> > > during suspend/resume cycle, like in similar patches for other drivers:
> > > 
> > > https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> > 
> > No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> > no need to call phy_{stop,start}() >
> > 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> > of the fallout.
> > 
> > However the WARN() condition still seems too broad and causes false
> > positives such as in your case.  In particular, mdio_bus_phy_suspend()
> > may leave the device in PHY_UP state, so that's a legal state that
> > needs to be exempted from the WARN().
> 
> How is that a legal state when the PHY should be suspended? Even if we are
> interrupt driven, the state machine should be stopped, does not mean that
> Wake-on-LAN or other activity interrupts should be disabled.

mdio_bus_phy_suspend()
  phy_stop_machine()
    phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)

So apparently PHY_UP is a legal state for a suspended PHY.


> > Does the issue still appear even after 6dbe852c379f?
> > 
> > If it does, could you test whether exempting PHY_UP silences the
> > gratuitous WARN splat?  I.e.:
> 
> If you allow PHY_UP, then the warning becomes effectively useless, so I
> don't believe this is quite what you want to do here.

Hm, maybe the WARN() should be dropped altogether?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-09-18 20:55                           ` Lukas Wunner
@ 2022-09-18 22:11                             ` Florian Fainelli
  2022-09-23  4:20                               ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Fainelli @ 2022-09-18 22:11 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'



On 9/18/2022 1:55 PM, Lukas Wunner wrote:
> On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
>> On 9/18/2022 12:13 PM, Lukas Wunner wrote:
>>> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>>>> I've finally traced what has happened. I've double checked and indeed
>>>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>>>> such it has been merged to linus tree. Then the commit 744d23c71af3
>>>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>>>> merged to linus tree, which triggers a new warning during the
>>>> suspend/resume cycle with smsc95xx driver. Please note, that the
>>>> smsc95xx still works fine regardless that warning. However it look that
>>>> the commit 1758bde2e4aa only hide a real problem, which the commit
>>>> 744d23c71af3 warns about.
>>>>
>>>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>>>> during suspend/resume cycle, like in similar patches for other drivers:
>>>>
>>>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
>>>
>>> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
>>> no need to call phy_{stop,start}() >
>>> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
>>> of the fallout.
>>>
>>> However the WARN() condition still seems too broad and causes false
>>> positives such as in your case.  In particular, mdio_bus_phy_suspend()
>>> may leave the device in PHY_UP state, so that's a legal state that
>>> needs to be exempted from the WARN().
>>
>> How is that a legal state when the PHY should be suspended? Even if we are
>> interrupt driven, the state machine should be stopped, does not mean that
>> Wake-on-LAN or other activity interrupts should be disabled.
> 
> mdio_bus_phy_suspend()
>    phy_stop_machine()
>      phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)
> 
> So apparently PHY_UP is a legal state for a suspended PHY.

It is not clear to me why, however. Sure it does ensure that when we 
resume we set needs_aneg = true but this feels like a hack in the sense 
that we are setting the PHY in a provisional state in anticipation for 
what might come next.

> 
> 
>>> Does the issue still appear even after 6dbe852c379f?
>>>
>>> If it does, could you test whether exempting PHY_UP silences the
>>> gratuitous WARN splat?  I.e.:
>>
>> If you allow PHY_UP, then the warning becomes effectively useless, so I
>> don't believe this is quite what you want to do here.
> 
> Hm, maybe the WARN() should be dropped altogether?

And then be left with debugging similar problems that prompted me to 
submit the patch in the first place, no thank you. I guess I would 
rather accept that PHY_UP needs to be special cased then.
-- 
Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-09-18 19:13                       ` Lukas Wunner
  2022-09-18 20:41                         ` Florian Fainelli
@ 2022-09-22 13:21                         ` Marek Szyprowski
  1 sibling, 0 replies; 26+ messages in thread
From: Marek Szyprowski @ 2022-09-22 13:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, linux-usb, Steve Glendinning, UNGLinuxDriver,
	Oliver Neukum, Andre Edich, Oleksij Rempel, Martyn Welch,
	Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC',
	Florian Fainelli

On 18.09.2022 21:13, Lukas Wunner wrote:
> [cc += Florian]
>
> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>> I've finally traced what has happened. I've double checked and indeed
>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>> such it has been merged to linus tree. Then the commit 744d23c71af3
>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>> merged to linus tree, which triggers a new warning during the
>> suspend/resume cycle with smsc95xx driver. Please note, that the
>> smsc95xx still works fine regardless that warning. However it look that
>> the commit 1758bde2e4aa only hide a real problem, which the commit
>> 744d23c71af3 warns about.
>>
>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>> during suspend/resume cycle, like in similar patches for other drivers:
>>
>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> no need to call phy_{stop,start}().
>
> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> of the fallout.
>
> However the WARN() condition still seems too broad and causes false
> positives such as in your case.  In particular, mdio_bus_phy_suspend()
> may leave the device in PHY_UP state, so that's a legal state that
> needs to be exempted from the WARN().
>
> Does the issue still appear even after 6dbe852c379f?
>
> If it does, could you test whether exempting PHY_UP silences the
> gratuitous WARN splat?  I.e.:
>
> -	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY);
> +	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
> +		phydev->state != PHY_UP);

Yes, this hides this warning in my case. I've tested on linux 
next-20220921 with the above change.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
  2022-09-18 22:11                             ` Florian Fainelli
@ 2022-09-23  4:20                               ` Lukas Wunner
  0 siblings, 0 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-09-23  4:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Marek Szyprowski, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, netdev, linux-usb, Steve Glendinning,
	UNGLinuxDriver, Oliver Neukum, Andre Edich, Oleksij Rempel,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King,
	Ferry Toth, Krzysztof Kozlowski, 'Linux Samsung SOC'

On Sun, Sep 18, 2022 at 03:11:47PM -0700, Florian Fainelli wrote:
> On 9/18/2022 1:55 PM, Lukas Wunner wrote:
> > On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
> > > On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> > > > On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> > > > > I've finally traced what has happened. I've double checked and indeed
> > > > > the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
> > > > > such it has been merged to linus tree. Then the commit 744d23c71af3
> > > > > ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
> > > > > merged to linus tree, which triggers a new warning during the
> > > > > suspend/resume cycle with smsc95xx driver. Please note, that the
> > > > > smsc95xx still works fine regardless that warning. However it look that
> > > > > the commit 1758bde2e4aa only hide a real problem, which the commit
> > > > > 744d23c71af3 warns about.
> > > > > 
> > > > > Probably a proper fix for smsc95xx driver is to call phy_stop/start
> > > > > during suspend/resume cycle, like in similar patches for other drivers:
> > > > > 
> > > > > https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> > > > 
> > > > No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> > > > no need to call phy_{stop,start}() >
> > > > 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> > > > of the fallout.
> > > > 
> > > > However the WARN() condition still seems too broad and causes false
> > > > positives such as in your case.  In particular, mdio_bus_phy_suspend()
> > > > may leave the device in PHY_UP state, so that's a legal state that
> > > > needs to be exempted from the WARN().
> > > 
> > > How is that a legal state when the PHY should be suspended? Even if we are
> > > interrupt driven, the state machine should be stopped, does not mean that
> > > Wake-on-LAN or other activity interrupts should be disabled.
> > 
> > mdio_bus_phy_suspend()
> >    phy_stop_machine()
> >      phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)
> > 
> > So apparently PHY_UP is a legal state for a suspended PHY.
> 
> It is not clear to me why, however. Sure it does ensure that when we resume
> we set needs_aneg = true but this feels like a hack in the sense that we are
> setting the PHY in a provisional state in anticipation for what might come
> next.

I've just submitted a fix so that at least v6.0 doesn't get released
with a false-positive WARN splat on resume:

https://lore.kernel.org/netdev/8128fdb51eeebc9efbf3776a4097363a1317aaf1.1663905575.git.lukas@wunner.de/

I guess we can look into making the state setting more logical in a
separate step.


> > > If you allow PHY_UP, then the warning becomes effectively useless, so I
> > > don't believe this is quite what you want to do here.
> > 
> > Hm, maybe the WARN() should be dropped altogether?
> 
> And then be left with debugging similar problems that prompted me to submit
> the patch in the first place, no thank you. I guess I would rather accept
> that PHY_UP needs to be special cased then.

I've interpreted that as an Acked-by for exempting PHY_UP.
If that was not what you wanted, please speak up.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-09-23  4:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1652343655.git.lukas@wunner.de>
     [not found] ` <748ac44eeb97b209f66182f3788d2a49d7bc28fe.1652343655.git.lukas@wunner.de>
     [not found]   ` <CGME20220517101846eucas1p2c132f7e7032ed00996e222e9cc6cdf99@eucas1p2.samsung.com>
2022-05-17 10:18     ` [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling Marek Szyprowski
2022-05-19 19:08       ` Lukas Wunner
2022-05-19 21:22         ` Marek Szyprowski
2022-05-23  9:43           ` Lukas Wunner
2022-05-23 11:38             ` Marek Szyprowski
2022-05-23 12:34             ` Andrew Lunn
2022-05-23 13:47               ` Lukas Wunner
2022-05-24  0:52                 ` Andrew Lunn
2022-05-24  1:08             ` Andrew Lunn
2022-05-24  6:16               ` Marek Szyprowski
2022-05-24 12:03                 ` Andrew Lunn
2022-05-26 13:55                   ` Lukas Wunner
2022-05-24 12:13               ` Lukas Wunner
2022-06-06  1:28                 ` Andrew Lunn
2022-08-26  6:51           ` Marek Szyprowski
2022-08-26  7:19             ` Lukas Wunner
2022-08-26  7:41               ` Marek Szyprowski
2022-08-26  7:53                 ` Lukas Wunner
2022-08-26  8:29                   ` Marek Szyprowski
2022-08-29 11:40                     ` Marek Szyprowski
2022-09-18 19:13                       ` Lukas Wunner
2022-09-18 20:41                         ` Florian Fainelli
2022-09-18 20:55                           ` Lukas Wunner
2022-09-18 22:11                             ` Florian Fainelli
2022-09-23  4:20                               ` Lukas Wunner
2022-09-22 13:21                         ` Marek Szyprowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).