* Re: stable-rc/linux-5.18.y bisection: baseline.login on panda
[not found] <63068874.170a0220.2f9fc.b822@mx.google.com>
@ 2022-08-24 22:47 ` Mark Brown
2022-08-25 4:21 ` Lukas Wunner
0 siblings, 1 reply; 2+ messages in thread
From: Mark Brown @ 2022-08-24 22:47 UTC (permalink / raw)
To: Oleksij Rempel, Ferry Toth, Lukas Wunner, Andrew Lunn,
Andre Edich, David S. Miller, Sasha Levin
Cc: kernelci-results, bot, gtucker, stable, netdev, linux-omap
[-- Attachment #1: Type: text/plain, Size: 17812 bytes --]
On Wed, Aug 24, 2022 at 01:22:12PM -0700, KernelCI bot wrote:
The KernelCI bisection bot identified 7eea9a60703ca ("usbnet: smsc95xx:
Forward PHY interrupts to PHY driver to avoid polling") as causing a
boot regression on Panda in v5.18. The board stops detecting a link
which breks NFS boot:
<6>[ 4.953613] smsc95xx 2-1.1:1.0 eth0: register 'smsc95xx' at usb-4a064c00.ehci-1.1, smsc95xx USB 2.0 Ethernet, 02:03:01:8c:13:b0
<6>[ 5.032928] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup
<6>[ 5.044036] smsc95xx 2-1.1:1.0 eth0: Link is Down
<6>[ 25.053924] Waiting up to 100 more seconds for network.
<6>[ 45.074005] Waiting up to 80 more seconds for network.
<6>[ 65.093933] Waiting up to 60 more seconds for network.
<6>[ 85.123992] Waiting up to 40 more seconds for network.
<6>[ 105.143951] Waiting up to 20 more seconds for network.
<5>[ 125.084014] Sending DHCP requests ...... timed out!
<6>[ 207.765808] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup
<3>[ 207.773468] IP-Config: Reopening network devices...
<6>[ 207.840332] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup
<6>[ 207.851226] smsc95xx 2-1.1:1.0 eth0: Link is Down
<6>[ 227.873931] Waiting up to 100 more seconds for network.
<6>[ 247.873931] Waiting up to 80 more seconds for network.
We're seen other failures on this board in mainline which stop boot
sooner so can't confirm if there's a similar issue there.
I've left the whole report, including links to more details like full
logs and a tag for the bot below:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis *
> * that you may be involved with the breaking commit it has *
> * found. No manual investigation has been done to verify it, *
> * and the root cause of the problem may be somewhere else. *
> * *
> * If you do send a fix, please include this trailer: *
> * Reported-by: "kernelci.org bot" <bot@kernelci.org> *
> * *
> * Hope this helps! *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> stable-rc/linux-5.18.y bisection: baseline.login on panda
>
> Summary:
> Start: 22a992953741a Linux 5.18.19
> Plain log: https://storage.kernelci.org/stable-rc/linux-5.18.y/v5.18.19/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-panda.txt
> HTML log: https://storage.kernelci.org/stable-rc/linux-5.18.y/v5.18.19/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-panda.html
> Result: 7eea9a60703ca usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
>
> Checks:
> revert: PASS
> verify: PASS
>
> Parameters:
> Tree: stable-rc
> URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Branch: linux-5.18.y
> Target: panda
> CPU arch: arm
> Lab: lab-baylibre
> Compiler: gcc-10
> Config: multi_v7_defconfig
> Test case: baseline.login
>
> Breaking commit found:
>
> -------------------------------------------------------------------------------
> commit 7eea9a60703caf1b04eccba93cd103f1c8fc6809
> Author: Lukas Wunner <lukas@wunner.de>
> Date: Thu May 12 10:42:05 2022 +0200
>
> usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
>
> [ Upstream commit 1ce8b37241ed291af56f7a49bbdbf20c08728e88 ]
>
> Link status of SMSC LAN95xx chips is polled once per second, even though
> they're capable of signaling PHY interrupts through the MAC layer.
>
> Forward those interrupts to the PHY driver to avoid polling. Benefits
> are reduced bus traffic, reduced CPU overhead and quicker interface
> bringup.
>
> Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
> smsc95xx: fix link detection for disabled autonegotiation").
> Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
> hence couldn't detect link-up events when auto-negotiation was disabled.
> The proper solution would have been to enable the ENERGYON interrupt
> instead of polling.
>
> Since then, PHY handling was moved from the LAN95xx driver to the SMSC
> PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
> That PHY driver is capable of link detection with auto-negotiation
> disabled because it enables the ENERGYON interrupt.
>
> Note that signaling interrupts through the MAC layer not only works with
> the integrated PHY, but also with an external PHY, provided its
> interrupt pin is attached to LAN95xx's nPHY_INT pin.
>
> In the unlikely event that the interrupt pin of an external PHY is
> attached to a GPIO of the SoC (or not connected at all), the driver can
> be amended to retrieve the irq from the PHY's of_node.
>
> To forward PHY interrupts to phylib, it is not sufficient to call
> phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run
> so that PHY interrupts are cleared. That's because according to page
> 119 of the LAN950x datasheet, "The source of this interrupt is a level.
> The interrupt persists until it is cleared in the PHY."
>
> https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
>
> Therefore, create an IRQ domain with a single IRQ for the PHY. In the
> future, the IRQ domain may be extended to support the 11 GPIOs on the
> LAN95xx.
>
> Normally the PHY interrupt should be masked until the PHY driver has
> cleared it. However masking requires a (sleeping) USB transaction and
> interrupts are received in (non-sleepable) softirq context. I decided
> not to mask the interrupt at all (by using the dummy_irq_chip's noop
> ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec
> intervals and normally that's sufficient to wake the PHY driver's IRQ
> thread and have it clear the interrupt. If it does take longer, worst
> thing that can happen is the IRQ thread is woken again. No big deal.
>
> Because PHY interrupts are now perpetually enabled, there's no need to
> selectively enable them on suspend. So remove all invocations of
> smsc95xx_enable_phy_wakeup_interrupts().
>
> In smsc95xx_resume(), move the call of phy_init_hw() before
> usbnet_resume() (which restarts the status URB) to ensure that the PHY
> is fully initialized when an interrupt is handled.
>
> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
> Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
> Cc: Andre Edich <andre.edich@microchip.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index f5a208948d22a..358b170cc8fb7 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -18,6 +18,8 @@
> #include <linux/usb/usbnet.h>
> #include <linux/slab.h>
> #include <linux/of_net.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> #include <linux/mdio.h>
> #include <linux/phy.h>
> #include <net/selftests.h>
> @@ -53,6 +55,9 @@
> #define SUSPEND_ALLMODES (SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \
> SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3)
>
> +#define SMSC95XX_NR_IRQS (1) /* raise to 12 for GPIOs */
> +#define PHY_HWIRQ (SMSC95XX_NR_IRQS - 1)
> +
> struct smsc95xx_priv {
> u32 mac_cr;
> u32 hash_hi;
> @@ -61,6 +66,9 @@ struct smsc95xx_priv {
> spinlock_t mac_cr_lock;
> u8 features;
> u8 suspend_flags;
> + struct irq_chip irqchip;
> + struct irq_domain *irqdomain;
> + struct fwnode_handle *irqfwnode;
> struct mii_bus *mdiobus;
> struct phy_device *phydev;
> };
> @@ -597,6 +605,8 @@ static void smsc95xx_mac_update_fullduplex(struct usbnet *dev)
>
> static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
> {
> + struct smsc95xx_priv *pdata = dev->driver_priv;
> + unsigned long flags;
> u32 intdata;
>
> if (urb->actual_length != 4) {
> @@ -608,11 +618,15 @@ static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
> intdata = get_unaligned_le32(urb->transfer_buffer);
> netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);
>
> + local_irq_save(flags);
> +
> if (intdata & INT_ENP_PHY_INT_)
> - ;
> + generic_handle_domain_irq(pdata->irqdomain, PHY_HWIRQ);
> else
> netdev_warn(dev->net, "unexpected interrupt, intdata=0x%08X\n",
> intdata);
> +
> + local_irq_restore(flags);
> }
>
> /* Enable or disable Tx & Rx checksum offload engines */
> @@ -1098,8 +1112,9 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
> {
> struct smsc95xx_priv *pdata;
> bool is_internal_phy;
> + char usb_path[64];
> + int ret, phy_irq;
> u32 val;
> - int ret;
>
> printk(KERN_INFO SMSC_CHIPNAME " v" SMSC_DRIVER_VERSION "\n");
>
> @@ -1139,10 +1154,38 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
> if (ret)
> goto free_pdata;
>
> + /* create irq domain for use by PHY driver and GPIO consumers */
> + usb_make_path(dev->udev, usb_path, sizeof(usb_path));
> + pdata->irqfwnode = irq_domain_alloc_named_fwnode(usb_path);
> + if (!pdata->irqfwnode) {
> + ret = -ENOMEM;
> + goto free_pdata;
> + }
> +
> + pdata->irqdomain = irq_domain_create_linear(pdata->irqfwnode,
> + SMSC95XX_NR_IRQS,
> + &irq_domain_simple_ops,
> + pdata);
> + if (!pdata->irqdomain) {
> + ret = -ENOMEM;
> + goto free_irqfwnode;
> + }
> +
> + phy_irq = irq_create_mapping(pdata->irqdomain, PHY_HWIRQ);
> + if (!phy_irq) {
> + ret = -ENOENT;
> + goto remove_irqdomain;
> + }
> +
> + pdata->irqchip = dummy_irq_chip;
> + pdata->irqchip.name = SMSC_CHIPNAME;
> + irq_set_chip_and_handler_name(phy_irq, &pdata->irqchip,
> + handle_simple_irq, "phy");
> +
> pdata->mdiobus = mdiobus_alloc();
> if (!pdata->mdiobus) {
> ret = -ENOMEM;
> - goto free_pdata;
> + goto dispose_irq;
> }
>
> ret = smsc95xx_read_reg(dev, HW_CFG, &val);
> @@ -1175,6 +1218,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
> goto unregister_mdio;
> }
>
> + pdata->phydev->irq = phy_irq;
> pdata->phydev->is_internal = is_internal_phy;
>
> /* detect device revision as different features may be available */
> @@ -1217,6 +1261,15 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
> free_mdio:
> mdiobus_free(pdata->mdiobus);
>
> +dispose_irq:
> + irq_dispose_mapping(phy_irq);
> +
> +remove_irqdomain:
> + irq_domain_remove(pdata->irqdomain);
> +
> +free_irqfwnode:
> + irq_domain_free_fwnode(pdata->irqfwnode);
> +
> free_pdata:
> kfree(pdata);
> return ret;
> @@ -1229,6 +1282,9 @@ static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf)
> phy_disconnect(dev->net->phydev);
> mdiobus_unregister(pdata->mdiobus);
> mdiobus_free(pdata->mdiobus);
> + irq_dispose_mapping(irq_find_mapping(pdata->irqdomain, PHY_HWIRQ));
> + irq_domain_remove(pdata->irqdomain);
> + irq_domain_free_fwnode(pdata->irqfwnode);
> netif_dbg(dev, ifdown, dev->net, "free pdata\n");
> kfree(pdata);
> }
> @@ -1253,29 +1309,6 @@ static u32 smsc_crc(const u8 *buffer, size_t len, int filter)
> return crc << ((filter % 2) * 16);
> }
>
> -static int smsc95xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask)
> -{
> - int ret;
> -
> - netdev_dbg(dev->net, "enabling PHY wakeup interrupts\n");
> -
> - /* read to clear */
> - ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_SRC);
> - if (ret < 0)
> - return ret;
> -
> - /* enable interrupt source */
> - ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_MASK);
> - if (ret < 0)
> - return ret;
> -
> - ret |= mask;
> -
> - smsc95xx_mdio_write_nopm(dev, PHY_INT_MASK, ret);
> -
> - return 0;
> -}
> -
> static int smsc95xx_link_ok_nopm(struct usbnet *dev)
> {
> int ret;
> @@ -1442,7 +1475,6 @@ static int smsc95xx_enter_suspend3(struct usbnet *dev)
> static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
> {
> struct smsc95xx_priv *pdata = dev->driver_priv;
> - int ret;
>
> if (!netif_running(dev->net)) {
> /* interface is ifconfig down so fully power down hw */
> @@ -1461,27 +1493,10 @@ static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
> }
>
> netdev_dbg(dev->net, "autosuspend entering SUSPEND1\n");
> -
> - /* enable PHY wakeup events for if cable is attached */
> - ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
> - PHY_INT_MASK_ANEG_COMP_);
> - if (ret < 0) {
> - netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
> - return ret;
> - }
> -
> netdev_info(dev->net, "entering SUSPEND1 mode\n");
> return smsc95xx_enter_suspend1(dev);
> }
>
> - /* enable PHY wakeup events so we remote wakeup if cable is pulled */
> - ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
> - PHY_INT_MASK_LINK_DOWN_);
> - if (ret < 0) {
> - netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
> - return ret;
> - }
> -
> netdev_dbg(dev->net, "autosuspend entering SUSPEND3\n");
> return smsc95xx_enter_suspend3(dev);
> }
> @@ -1547,13 +1562,6 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
> }
>
> if (pdata->wolopts & WAKE_PHY) {
> - ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
> - (PHY_INT_MASK_ANEG_COMP_ | PHY_INT_MASK_LINK_DOWN_));
> - if (ret < 0) {
> - netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
> - goto done;
> - }
> -
> /* if link is down then configure EDPD and enter SUSPEND1,
> * otherwise enter SUSPEND0 below
> */
> @@ -1787,11 +1795,12 @@ static int smsc95xx_resume(struct usb_interface *intf)
> return ret;
> }
>
> + phy_init_hw(pdata->phydev);
> +
> ret = usbnet_resume(intf);
> if (ret < 0)
> netdev_warn(dev->net, "usbnet_resume error\n");
>
> - phy_init_hw(pdata->phydev);
> return ret;
> }
> -------------------------------------------------------------------------------
>
>
> Git bisection log:
>
> -------------------------------------------------------------------------------
> git bisect start
> # good: [9aa5a042881d4a99657f82c774e9e15353ebeb2d] Linux 5.18.14
> git bisect good 9aa5a042881d4a99657f82c774e9e15353ebeb2d
> # bad: [22a992953741ad79c07890d3f4104585e52ef26b] Linux 5.18.19
> git bisect bad 22a992953741ad79c07890d3f4104585e52ef26b
> # good: [77d5174ed962c259965226386f71575790646ec0] drm/mediatek: dpi: Remove output format of YUV
> git bisect good 77d5174ed962c259965226386f71575790646ec0
> # good: [598bc9541e2f82a7fe8dfe23891201b355a56cda] usb: dwc3: core: Do not perform GCTL_CORE_SOFTRESET during bootup
> git bisect good 598bc9541e2f82a7fe8dfe23891201b355a56cda
> # good: [876f57cc94922896cc71dd4696013a7c0558c9b4] f2fs: give priority to select unpinned section for foreground GC
> git bisect good 876f57cc94922896cc71dd4696013a7c0558c9b4
> # bad: [5f4e505909fe50a4e256704076594ee3def0b9b1] block: add a bdev_max_zone_append_sectors helper
> git bisect bad 5f4e505909fe50a4e256704076594ee3def0b9b1
> # good: [b9c3401f7cac6ae291a16784dadcd1bf116218fe] x86/kprobes: Update kcb status flag after singlestepping
> git bisect good b9c3401f7cac6ae291a16784dadcd1bf116218fe
> # bad: [73ce2046e04ad488cecc66757c36cbe1bdf089d4] iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
> git bisect bad 73ce2046e04ad488cecc66757c36cbe1bdf089d4
> # good: [f624c94ad56b663693a9413d8c8c03635435f369] drm/vc4: drv: Adopt the dma configuration from the HVS or V3D component
> git bisect good f624c94ad56b663693a9413d8c8c03635435f369
> # bad: [87c4896d5dd7fd9927c814cf3c6289f41de3b562] firmware: arm_scpi: Ensure scpi_info is not assigned if the probe fails
> git bisect bad 87c4896d5dd7fd9927c814cf3c6289f41de3b562
> # good: [9b61971cab92a918cab7168d439a351b1c799aca] usbnet: smsc95xx: Avoid link settings race on interrupt reception
> git bisect good 9b61971cab92a918cab7168d439a351b1c799aca
> # bad: [e81395cfbe62518f41af79a1b287fc8fe7c96b37] usbnet: smsc95xx: Fix deadlock on runtime resume
> git bisect bad e81395cfbe62518f41af79a1b287fc8fe7c96b37
> # bad: [7eea9a60703caf1b04eccba93cd103f1c8fc6809] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
> git bisect bad 7eea9a60703caf1b04eccba93cd103f1c8fc6809
> # first bad commit: [7eea9a60703caf1b04eccba93cd103f1c8fc6809] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
> -------------------------------------------------------------------------------
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Groups.io Links: You receive all messages sent to this group.
> View/Reply Online (#30878): https://groups.io/g/kernelci-results/message/30878
> Mute This Topic: https://groups.io/mt/93235418/1131744
> Group Owner: kernelci-results+owner@groups.io
> Unsubscribe: https://groups.io/g/kernelci-results/unsub [broonie@kernel.org]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: stable-rc/linux-5.18.y bisection: baseline.login on panda
2022-08-24 22:47 ` stable-rc/linux-5.18.y bisection: baseline.login on panda Mark Brown
@ 2022-08-25 4:21 ` Lukas Wunner
0 siblings, 0 replies; 2+ messages in thread
From: Lukas Wunner @ 2022-08-25 4:21 UTC (permalink / raw)
To: Mark Brown
Cc: Oleksij Rempel, Ferry Toth, Andrew Lunn, Andre Edich,
David S. Miller, Sasha Levin, kernelci-results, bot, gtucker,
stable, netdev, linux-omap
On Wed, Aug 24, 2022 at 11:47:03PM +0100, Mark Brown wrote:
> On Wed, Aug 24, 2022 at 01:22:12PM -0700, KernelCI bot wrote:
> The KernelCI bisection bot identified 7eea9a60703ca ("usbnet: smsc95xx:
> Forward PHY interrupts to PHY driver to avoid polling") as causing a
> boot regression on Panda in v5.18. The board stops detecting a link
> which breks NFS boot:
>
> <6>[ 4.953613] smsc95xx 2-1.1:1.0 eth0: register 'smsc95xx' at usb-4a064c00.ehci-1.1, smsc95xx USB 2.0 Ethernet, 02:03:01:8c:13:b0
> <6>[ 5.032928] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup
> <6>[ 5.044036] smsc95xx 2-1.1:1.0 eth0: Link is Down
> <6>[ 25.053924] Waiting up to 100 more seconds for network.
> <6>[ 45.074005] Waiting up to 80 more seconds for network.
> <6>[ 65.093933] Waiting up to 60 more seconds for network.
> <6>[ 85.123992] Waiting up to 40 more seconds for network.
> <6>[ 105.143951] Waiting up to 20 more seconds for network.
> <5>[ 125.084014] Sending DHCP requests ...... timed out!
Looks like v5.19 boots fine (with the offending commit), so only
v5.18-stable and v5.15-stable are affected:
https://storage.kernelci.org/stable-rc/linux-5.19.y/v5.19.3-366-g32f80a5b58e2/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-panda.html
The offending commit was a new feature introduced in v5.19,
it deliberately wasn't tagged for stable. Aggressively
backporting such new features always carries a risk of
regressions.
Off the cuff I don't see that a prerequisite commit is missing
in stable kernels. So just reverting the offending commit in
v5.18-stable and v5.15-stable (but not in v5.19-stable) might
be the simplest solution.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-08-25 4:21 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <63068874.170a0220.2f9fc.b822@mx.google.com>
2022-08-24 22:47 ` stable-rc/linux-5.18.y bisection: baseline.login on panda Mark Brown
2022-08-25 4:21 ` Lukas Wunner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).