netdev.vger.kernel.org archive mirror
* Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?
       [not found] ` <CALAqxLWopjCkiM=NR868DTcX-apPc1MPnONJMppm1jzCboAheg@mail.gmail.com>
@ 2020-04-03 11:47   ` Geert Uytterhoeven
  2020-04-03 13:05     ` Geert Uytterhoeven
  2020-04-04  4:18     ` John Stultz
  0 siblings, 2 replies; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-04-03 11:47 UTC (permalink / raw)
  To: John Stultz; +Cc: Yoshihiro Shimoda, gregkh, rafael, iommu, LKML, netdev

Hi John,

On Thu, Apr 2, 2020 at 7:27 PM John Stultz <john.stultz@linaro.org> wrote:
> On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@renesas.com> wrote:
> >
> > I found an issue after applying the following patches:
> > ---
> > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> > e2cec7d driver core: Set deferred_probe_timeout to a longer default if CONFIG_MODULES is set

Note that just setting deferred_probe_timeout = -1 like for the
CONFIG_MODULES=n case doesn't help.

> > c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
> > ---
> >
> > Before these patches, in my environment [1], some device drivers
> > whose nodes have an iommus property output the following message when probing:
> >
> > [    3.222205] ravb e6800000.ethernet: ignoring dependency for device, assuming no driver
> > [    3.257174] ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> >
> > So, since the ravb driver is probed within 4 seconds, we can use NFS rootfs correctly.
> >
> > However, after these patches are applied, probing always waits 30 seconds
> > for of_iommu_configure() when the IOMMU hardware is disabled, and drivers/base/dd.c outputs a WARN.
> > Also, since ravb cannot be probed for 30 seconds, we cannot use NFS rootfs anymore.
> > JFYI, I copied the kernel log to the end of this email.
>
> Hey,
>   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> to remove the WARN messages, but I'm a bit more concerned about why
> after the 30 second delay, the ethernet driver loads:
>   [   36.218666] ravb e6800000.ethernet eth0: Base address at
> 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> but NFS fails.
>
> Is it just that the 30 second delay is too long and NFS gives up?

I added some debug code to mount_nfs_root(), which shows that the first
3 tries happen before ravb is instantiated, and the last 3 tries happen
after.  So NFS root should work, if the network works.

However, it seems the Ethernet PHY is never initialized, hence the link
never becomes ready.  Dmesg before/after:

     ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:02:ea:ff, IRQ 108.

Good.

     ...
    -gpio_rcar e6052000.gpio: sense irq = 11, type = 8

This is the GPIO the PHY IRQ is connected to.
Note that that GPIO controller has been instantiated before.

     ...
    -Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=197)
     ...
    -ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

Oops.

    -Sending DHCP requests .., OK
    -IP-Config: Got DHCP answer from ...
     ...
    +VFS: Unable to mount root fs via NFS, trying floppy.
    +VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6

> Does booting with deferred_probe_timeout=0 work?

It does, as everything using optional links (DMA and IOMMU) is now
instantiated on first try.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?
  2020-04-03 11:47   ` How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y? Geert Uytterhoeven
@ 2020-04-03 13:05     ` Geert Uytterhoeven
  2020-04-04  4:18     ` John Stultz
  1 sibling, 0 replies; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-04-03 13:05 UTC (permalink / raw)
  To: John Stultz
  Cc: Yoshihiro Shimoda, gregkh, rafael, iommu, LKML, netdev,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski

Hi John,

On Fri, Apr 3, 2020 at 1:47 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Thu, Apr 2, 2020 at 7:27 PM John Stultz <john.stultz@linaro.org> wrote:
> > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> > <yoshihiro.shimoda.uh@renesas.com> wrote:
> > >
> > > I found an issue after applying the following patches:
> > > ---
> > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if CONFIG_MODULES is set
>
> Note that just setting deferred_probe_timeout = -1 like for the
> CONFIG_MODULES=n case doesn't help.
>
> > > c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
> > > ---
> > >
> > > Before these patches, in my environment [1], some device drivers
> > > whose nodes have an iommus property output the following message when probing:
> > >
> > > [    3.222205] ravb e6800000.ethernet: ignoring dependency for device, assuming no driver
> > > [    3.257174] ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> > >
> > > So, since the ravb driver is probed within 4 seconds, we can use NFS rootfs correctly.
> > >
> > > However, after these patches are applied, probing always waits 30 seconds
> > > for of_iommu_configure() when the IOMMU hardware is disabled, and drivers/base/dd.c outputs a WARN.
> > > Also, since ravb cannot be probed for 30 seconds, we cannot use NFS rootfs anymore.
> > > JFYI, I copied the kernel log to the end of this email.
> >
> > Hey,
> >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > to remove the WARN messages, but I'm a bit more concerned about why
> > after the 30 second delay, the ethernet driver loads:
> >   [   36.218666] ravb e6800000.ethernet eth0: Base address at
> > 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> > but NFS fails.
> >
> > Is it just that the 30 second delay is too long and NFS gives up?
>
> I added some debug code to mount_nfs_root(), which shows that the first
> 3 tries happen before ravb is instantiated, and the last 3 tries happen
> after.  So NFS root should work, if the network works.
>
> However, it seems the Ethernet PHY is never initialized, hence the link
> never becomes ready.

So the issue is not nfsroot per se, but the ip-config step that needs to
happen before that.

The call to wait_for_devices() in ip_auto_config() (which is a
late_initcall()) returns -ENODEV, as the network device hasn't probed
successfully yet, so ip-config is aborted.

The (whitespace-damaged) patch below fixes that, but may have unintended
side-effects.

--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1469,7 +1469,11 @@ static int __init ip_auto_config(void)
        /* Wait for devices to appear */
        err = wait_for_devices();
        if (err)
+#ifdef IPCONFIG_DYNAMIC
+               goto try_try_again;
+#else
                return err;
+#endif

        /* Setup all network devices */
        err = ic_open_devs();

Probably we want at least some CONFIG_ROOT_NFS || CONFIG_CIFS_ROOT,
and ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS checks.
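
As a rough illustration of the kind of guard meant here, the following is a
minimal userspace sketch, not kernel code. The enum and the helper
model_should_retry() are hypothetical names invented for this example; the
enum values only stand in for the kernel's Root_NFS and Root_CIFS device
numbers, and the dynamic_ipconfig flag stands in for IPCONFIG_DYNAMIC.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's Root_NFS / Root_CIFS values. */
enum model_root_dev { MODEL_ROOT_OTHER, MODEL_ROOT_NFS, MODEL_ROOT_CIFS };

/*
 * Hypothetical helper: decide whether ip_auto_config() should jump to its
 * retry path instead of returning the error from wait_for_devices().
 */
bool model_should_retry(int wait_err, bool dynamic_ipconfig,
			enum model_root_dev root)
{
	if (!wait_err)
		return false;	/* devices appeared, nothing to retry */
	if (!dynamic_ipconfig)
		return false;	/* no retry path without dynamic ip-config */
	/* Only keep retrying when the root filesystem needs the network. */
	return root == MODEL_ROOT_NFS || root == MODEL_ROOT_CIFS;
}
```

With a guard like this, the retry path would only be taken when the root
filesystem actually depends on the network.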

Thanks for your comments!

Gr{oetje,eeting}s,

                        Geert



* Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?
  2020-04-03 11:47   ` How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y? Geert Uytterhoeven
  2020-04-03 13:05     ` Geert Uytterhoeven
@ 2020-04-04  4:18     ` John Stultz
  2020-04-06  8:43       ` Yoshihiro Shimoda
  1 sibling, 1 reply; 9+ messages in thread
From: John Stultz @ 2020-04-04  4:18 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Yoshihiro Shimoda, gregkh, rafael, iommu, LKML, netdev

On Fri, Apr 3, 2020 at 4:47 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Thu, Apr 2, 2020 at 7:27 PM John Stultz <john.stultz@linaro.org> wrote:
> > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> > <yoshihiro.shimoda.uh@renesas.com> wrote:
> > >
> > > I found an issue after applying the following patches:
> > > ---
> > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if CONFIG_MODULES is set
>
> Note that just setting deferred_probe_timeout = -1 like for the
> CONFIG_MODULES=n case doesn't help.

Yea. I can see why in that case, as we're checking
!IS_ENABLED(CONFIG_MODULES) directly in
driver_deferred_probe_check_state.

I guess we could switch that to checking
(driver_deferred_probe_timeout == -1) which would have the same logic
and at least make it consistent if someone specifies -1 on the command
line (since right now that effectively makes it return -EPROBE_DEFER
forever). But having a timeout=infinity could also be useful if
folks don't want the deferring to time out.  Maybe in the !modules
case setting it to =0 would be the most clear.
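
To make the two variants concrete, here is a minimal userspace sketch of the
check. It is a model under assumptions, not the code in drivers/base/dd.c:
the function names, the boolean parameters, and the MODEL_EPROBE_DEFER
constant are made up for illustration, and the initcalls_done condition on
the first branch is assumed.

```c
#include <errno.h>
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal; this value is defined for the model only. */
#define MODEL_EPROBE_DEFER 517

/* Model of the check as it stands: the first branch is keyed on CONFIG_MODULES. */
int check_state_current(bool modules_enabled, bool initcalls_done, int timeout)
{
	if (!modules_enabled && initcalls_done)
		return -ENODEV;		/* ignore the dependency */
	if (timeout == 0)
		return -ETIMEDOUT;	/* deferred probe timeout expired */
	return -MODEL_EPROBE_DEFER;	/* keep deferring */
}

/* Model of the variant discussed above: the first branch is keyed on timeout == -1. */
int check_state_proposed(bool initcalls_done, int timeout)
{
	if (timeout == -1 && initcalls_done)
		return -ENODEV;
	if (timeout == 0)
		return -ETIMEDOUT;
	return -MODEL_EPROBE_DEFER;
}
```

In this model, a modules-enabled kernel booted with a timeout of -1 keeps
deferring with the current check, while the timeout-keyed variant treats -1
the same way on both kinds of kernel.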

But that's sort of a further cleanup. I'm still more worried about the
NFS failure below.


> > Hey,
> >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > to remove the WARN messages, but I'm a bit more concerned about why
> > after the 30 second delay, the ethernet driver loads:
> >   [   36.218666] ravb e6800000.ethernet eth0: Base address at
> > 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> > but NFS fails.
> >
> > Is it just that the 30 second delay is too long and NFS gives up?
>
> I added some debug code to mount_nfs_root(), which shows that the first
> 3 tries happen before ravb is instantiated, and the last 3 tries happen
> after.  So NFS root should work, if the network works.
>
> However, it seems the Ethernet PHY is never initialized, hence the link
> never becomes ready.  Dmesg before/after:
>
>      ravb e6800000.ethernet eth0: Base address at 0xe6800000,
> 2e:09:0a:02:ea:ff, IRQ 108.
>
> Good.
>
>      ...
>     -gpio_rcar e6052000.gpio: sense irq = 11, type = 8
>
> This is the GPIO the PHY IRQ is connected to.
> Note that that GPIO controller has been instantiated before.
>
>      ...
>     -Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00:
> attached PHY driver [Micrel KSZ9031 Gigabit PHY]
> (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=197)
>      ...
>     -ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> Oops.
>
>     -Sending DHCP requests .., OK
>     -IP-Config: Got DHCP answer from ...
>      ...
>     +VFS: Unable to mount root fs via NFS, trying floppy.
>     +VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
>
> > Does booting with deferred_probe_timeout=0 work?
>
> It does, as everything using optional links (DMA and IOMMU) is now
> instantiated on first try.

Thanks so much for helping clarify this!

So it's at least good to hear that booting with
deferred_probe_timeout=0 is working!  But I'm bummed the NFS (or as
you pointed out in your later mail,  ip_auto_config) falls over
because the network isn't immediately there.

Looking a little closer at the ip_auto_config() code, I think the
issue may be that wait_for_device_probe() is effectively returning too
early, since the deferred probe timeout is still active? I need to dig a
bit more on that code, on Monday, as I don't fully understand it yet.

If I can't find a way to address that, I think the best course will be
to set the driver_deferred_probe_timeout value to default to 0
regardless of the value of CONFIG_MODULES, so we don't cause any
apparent regression from previous behavior. That will also sort out
the less intuitive = -1 initialization in the non-modules case.

In any case, I'll try to have a patch to send out on Monday.

thanks
-john


* RE: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?
  2020-04-04  4:18     ` John Stultz
@ 2020-04-06  8:43       ` Yoshihiro Shimoda
  2020-04-06 10:01         ` Yoshihiro Shimoda
  0 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-04-06  8:43 UTC (permalink / raw)
  To: John Stultz, Geert Uytterhoeven; +Cc: gregkh, rafael, iommu, LKML, netdev

Hi John, Geert,

> From: John Stultz, Sent: Saturday, April 4, 2020 1:19 PM
> 
> On Fri, Apr 3, 2020 at 4:47 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Thu, Apr 2, 2020 at 7:27 PM John Stultz <john.stultz@linaro.org> wrote:
> > > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> > > <yoshihiro.shimoda.uh@renesas.com> wrote:
> > > >
> > > > I found an issue after applying the following patches:
> > > > ---
> > > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > > bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> > > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if CONFIG_MODULES is set
> >
> > Note that just setting deferred_probe_timeout = -1 like for the
> > CONFIG_MODULES=n case doesn't help.
> 
> Yea. I can see why in that case, as we're checking
> !IS_ENABLED(CONFIG_MODULES) directly in
> driver_deferred_probe_check_state.
> 
> I guess we could switch that to checking
> (driver_deferred_probe_timeout == -1) which would have the same logic
> and at least make it consistent if someone specifies -1 on the command
> line (since right now that effectively makes it return -EPROBE_DEFER
> forever). But having a timeout=infinity could also be useful if
> folks don't want the deferring to time out.  Maybe in the !modules
> case setting it to =0 would be the most clear.
> 
> But that's sort of a further cleanup. I'm still more worried about the
> NFS failure below.
> 
> 
> > > Hey,
> > >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > > to remove the WARN messages, but I'm a bit more concerned about why
> > > after the 30 second delay, the ethernet driver loads:
> > >   [   36.218666] ravb e6800000.ethernet eth0: Base address at
> > > 0xe6800000, 2e:09:0a:02:eb:2d, IRQ 117.
> > > but NFS fails.
> > >
> > > Is it just that the 30 second delay is too long and NFS gives up?
> >
> > I added some debug code to mount_nfs_root(), which shows that the first
> > 3 tries happen before ravb is instantiated, and the last 3 tries happen
> > after.  So NFS root should work, if the network works.
> >
> > However, it seems the Ethernet PHY is never initialized, hence the link
> > never becomes ready.  Dmesg before/after:
> >
> >      ravb e6800000.ethernet eth0: Base address at 0xe6800000,
> > 2e:09:0a:02:ea:ff, IRQ 108.
> >
> > Good.
> >
> >      ...
> >     -gpio_rcar e6052000.gpio: sense irq = 11, type = 8
> >
> > This is the GPIO the PHY IRQ is connected to.
> > Note that that GPIO controller has been instantiated before.
> >
> >      ...
> >     -Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00:
> > attached PHY driver [Micrel KSZ9031 Gigabit PHY]
> > (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=197)
> >      ...
> >     -ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> >
> > Oops.
> >
> >     -Sending DHCP requests .., OK
> >     -IP-Config: Got DHCP answer from ...
> >      ...
> >     +VFS: Unable to mount root fs via NFS, trying floppy.
> >     +VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
> >
> > > Does booting with deferred_probe_timeout=0 work?
> >
> > It does, as everything using optional links (DMA and IOMMU) is now
> > instantiated on first try.
> 
> Thanks so much for helping clarify this!
> 
> So it's at least good to hear that booting with
> deferred_probe_timeout=0 is working!  But I'm bummed the NFS (or as
> you pointed out in your later mail,  ip_auto_config) falls over
> because the network isn't immediately there.
> 
> Looking a little closer at the ip_auto_config() code, I think the
> issue may be that wait_for_device_probe() is effectively returning too
> early, since the deferred probe timeout is still active? I need to dig a
> bit more on that code, on Monday, as I don't fully understand it yet.

I think so. I investigated this issue further, and the following
patch seems to be related because it changes the return value a bit.

c8c43ce driver core: Fix driver_deferred_probe_check_state() logic

# By the way, this is a separate topic, but IIUC we should revise the
# deferred_probe_timeout= entry in Documentation/admin-guide/kernel-parameters.txt
# for commit c8c43ce. In particular, " A timeout of 0 will timeout at the end of initcalls."
# no longer matches the behavior after that commit.

I'm guessing we should add the following flush_work() for deferred_probe_timeout_work.
# Sorry, for various reasons I haven't tested this yet...

+       /* wait for the deferred probe timeout workqueue to finish */
+       if (driver_deferred_probe_timeout > 0)
+               flush_work(&deferred_probe_timeout_work);

> If I can't find a way to address that, I think the best course will be
> to set the driver_deferred_probe_timeout value to default to 0
> regardless of the value of CONFIG_MODULES, so we don't cause any
> apparent regression from previous behavior. That will also sort out
> the less intuitive = -1 initialization in the non-modules case.
> 
> In any case, I'll try to have a patch to send out on Monday.

Thanks!

Best regards,
Yoshihiro Shimoda

> thanks
> -john


* RE: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?
  2020-04-06  8:43       ` Yoshihiro Shimoda
@ 2020-04-06 10:01         ` Yoshihiro Shimoda
  2020-04-07  7:06           ` [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-04-06 10:01 UTC (permalink / raw)
  To: John Stultz, Geert Uytterhoeven; +Cc: gregkh, rafael, iommu, LKML, netdev

Hi again,

<snip>
> I'm guessing we should add the following flush_work() for deferred_probe_timeout_work.
> # Sorry, for various reasons I haven't tested this yet...
> 
> +       /* wait for the deferred probe timeout workqueue to finish */
> +       if (driver_deferred_probe_timeout > 0)
> +               flush_work(&deferred_probe_timeout_work);

I'm sorry. This code causes a build error because deferred_probe_timeout_work
is a struct delayed_work. Also, I don't think flush_delayed_work() would do
what I want (wait until the deferred probe timeout expires)...

Best regards,
Yoshihiro Shimoda



* [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-06 10:01         ` Yoshihiro Shimoda
@ 2020-04-07  7:06           ` John Stultz
  2020-04-07  7:50             ` Geert Uytterhoeven
  0 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2020-04-07  7:06 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, netdev, linux-pm

In commit c8c43cee29f6 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we set the default
driver_deferred_probe_timeout value to 30 seconds to allow for
drivers that are missing dependencies to have some time so that
the dependency may be loaded from userland after initcalls_done
is set.

However, Yoshihiro Shimoda reported that his device, which
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, ip_auto_config() had already failed,
causing NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe(), which doesn't wait for the
driver_deferred_probe_timeout to fire.

This patch tries to fix the issue by creating a waitqueue
for the driver_deferred_probe_timeout, and calling wait_event()
in wait_for_device_probe() to wait until
driver_deferred_probe_timeout is zero, so that all the probing
is finished.

NOTE: I'm not 100% sure this won't have other unwanted side
effects (I don't have failing hardware myself to validate),
so I'd appreciate testing and close review.

If this approach doesn't work, I'll simply set the default
driver_deferred_probe_timeout value back to zero, to avoid any
behavioral change from before.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/dd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06ec0e851fa1..8c13f0df3282 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -232,9 +232,10 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 int driver_deferred_probe_timeout = 30;
 #else
 /* In the case of !modules, no probe timeout needed */
-int driver_deferred_probe_timeout = -1;
+int driver_deferred_probe_timeout;
 #endif
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
+static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -266,7 +267,7 @@ int driver_deferred_probe_check_state(struct device *dev)
 		return -ENODEV;
 	}
 
-	if (!driver_deferred_probe_timeout) {
+	if (!driver_deferred_probe_timeout && initcalls_done) {
 		dev_WARN(dev, "deferred probe timeout, ignoring dependency");
 		return -ETIMEDOUT;
 	}
@@ -284,6 +285,7 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 
 	list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(private->device, "deferred probe pending");
+	wake_up(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -658,6 +660,9 @@ int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
+	/* wait for probe timeout */
+	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
+
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);
 
-- 
2.17.1



* Re: [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-07  7:06           ` [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
@ 2020-04-07  7:50             ` Geert Uytterhoeven
  2020-04-07 16:46               ` Geert Uytterhoeven
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-04-07  7:50 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Yoshihiro Shimoda, netdev, Linux PM list

Hi John,

On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we set the default
> driver_deferred_probe_timeout value to 30 seconds to allow for
> drivers that are missing dependencies to have some time so that
> the dependency may be loaded from userland after initcalls_done
> is set.
>
> However, Yoshihiro Shimoda reported that his device, which
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
>
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, ip_auto_config() had already failed,
> causing NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe(), which doesn't wait for the
> driver_deferred_probe_timeout to fire.
>
> This patch tries to fix the issue by creating a waitqueue
> for the driver_deferred_probe_timeout, and calling wait_event()
> in wait_for_device_probe() to wait until
> driver_deferred_probe_timeout is zero, so that all the probing
> is finished.
>
> NOTE: I'm not 100% sure this won't have other unwanted side
> effects (I don't have failing hardware myself to validate),
> so I'd appreciate testing and close review.
>
> If this approach doesn't work, I'll simply set the default
> driver_deferred_probe_timeout value back to zero, to avoid any
> behavioral change from before.
>
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Thanks, this fixes the issue for me!

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert



* Re: [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-07  7:50             ` Geert Uytterhoeven
@ 2020-04-07 16:46               ` Geert Uytterhoeven
  2020-04-07 18:38                 ` John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-04-07 16:46 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Yoshihiro Shimoda, netdev, Linux PM list

Hi John,

On Tue, Apr 7, 2020 at 9:50 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> > In commit c8c43cee29f6 ("driver core: Fix
> > driver_deferred_probe_check_state() logic"), we set the default
> > driver_deferred_probe_timeout value to 30 seconds to allow for
> > drivers that are missing dependencies to have some time so that
> > the dependency may be loaded from userland after initcalls_done
> > is set.
> >
> > However, Yoshihiro Shimoda reported that his device, which
> > expects to have unmet dependencies (due to "optional links" in
> > its devicetree), was failing to mount the NFS root.
> >
> > In digging further, it seemed the problem was that while the
> > device properly probes after waiting 30 seconds for any missing
> > modules to load, ip_auto_config() had already failed,
> > causing NFS to fail. This was due to ip_auto_config()
> > calling wait_for_device_probe(), which doesn't wait for the
> > driver_deferred_probe_timeout to fire.
> >
> > This patch tries to fix the issue by creating a waitqueue
> > for the driver_deferred_probe_timeout, and calling wait_event()
> > in wait_for_device_probe() to wait until
> > driver_deferred_probe_timeout is zero, so that all the probing
> > is finished.
> >
> > NOTE: I'm not 100% sure this won't have other unwanted side
> > effects (I don't have failing hardware myself to validate),
> > so I'd appreciate testing and close review.
> >
> > If this approach doesn't work, I'll simply set the default
> > driver_deferred_probe_timeout value back to zero, to avoid any
> > behavioral change from before.
> >
> > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > was failing in this case!
> >
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Cc: netdev <netdev@vger.kernel.org>
> > Cc: linux-pm@vger.kernel.org
> > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
>
> Thanks, this fixes the issue for me!
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Unfortunately this adds another delay of ca. 30 s to mounting NFS root
when using a kernel config that does include IOMMU and MODULES
support.
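
A back-of-the-envelope model of why this happens, as a minimal userspace
sketch. The function name and parameters are hypothetical; it assumes the
timeout work is armed once at a known time and that, with the RFC patch,
wait_for_device_probe() blocks until driver_deferred_probe_timeout reaches
zero. Only timeout values >= 0 are modelled.

```c
#include <stdbool.h>

/*
 * Model: earliest time (seconds since boot) at which wait_for_device_probe()
 * can return. called_at is when it is called, timer_armed_at is when the
 * timeout work was scheduled, timeout_s is the configured timeout (>= 0).
 */
double probe_wait_returns_at(double called_at, double timer_armed_at,
			     int timeout_s, bool rfc_patch)
{
	double expiry;

	/* Without the patch, or with a zero timeout, there is nothing to wait for. */
	if (!rfc_patch || timeout_s == 0)
		return called_at;

	/* With the patch, the caller sleeps until the timeout work has fired. */
	expiry = timer_armed_at + timeout_s;
	return expiry > called_at ? expiry : called_at;
}
```

For example, with the work armed at 4 s and the default 30 s timeout, the
model gives 34 s with the RFC patch and 4 s without, whether or not any
device was actually deferred.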

Gr{oetje,eeting}s,

                        Geert



* Re: [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-07 16:46               ` Geert Uytterhoeven
@ 2020-04-07 18:38                 ` John Stultz
  0 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2020-04-07 18:38 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Yoshihiro Shimoda, netdev, Linux PM list

On Tue, Apr 7, 2020 at 9:46 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi John,
>
> On Tue, Apr 7, 2020 at 9:50 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> > > In commit c8c43cee29f6 ("driver core: Fix
> > > driver_deferred_probe_check_state() logic"), we set the default
> > > driver_deferred_probe_timeout value to 30 seconds to allow for
> > > drivers that are missing dependencies to have some time so that
> > > the dependency may be loaded from userland after initcalls_done
> > > is set.
> > >
> > > However, Yoshihiro Shimoda reported that his device, which
> > > expects to have unmet dependencies (due to "optional links" in
> > > its devicetree), was failing to mount the NFS root.
> > >
> > > In digging further, it seemed the problem was that while the
> > > device properly probes after waiting 30 seconds for any missing
> > > modules to load, ip_auto_config() had already failed,
> > > causing the NFS root mount to fail. This was due to ip_auto_config()
> > > calling wait_for_device_probe() which doesn't wait for the
> > > driver_deferred_probe_timeout to fire.
> > >
> > > This patch tries to fix the issue by creating a waitqueue
> > > for the driver_deferred_probe_timeout, and calling wait_event()
> > > to make sure driver_deferred_probe_timeout is zero in
> > > wait_for_device_probe() to make sure all the probing is
> > > finished.
> > >
> > > NOTE: I'm not 100% sure this won't have other unwanted side
> > > effects (I don't have failing hardware myself to validate),
> > > so I'd appreciate testing and close review.
> > >
> > > If this approach doesn't work, I'll simply set the default
> > > driver_deferred_probe_timeout value back to zero, to avoid any
> > > behavioral change from before.
> > >
> > > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > > was failing in this case!
> > >
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Cc: Rob Herring <robh@kernel.org>
> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Cc: netdev <netdev@vger.kernel.org>
> > > Cc: linux-pm@vger.kernel.org
> > > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> >
> > Thanks, this fixes the issue for me!
> >
> > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Unfortunately this adds another delay of ca. 30 s to mounting NFS root
> when using a kernel config that does include IOMMU and MODULES
> support.

Yeah. I worry the other downside is that systems with no missing
dependencies will also see the stall here, since we're waiting for the
timeout regardless of whether any drivers are actually missing.

So in the light of morning (well, just barely), I think just setting
the probe timeout to zero by default is the best approach. The series
then doesn't change behavior but just cleans things up.

Though, I guess one could argue this fix should go along with setting
the value to zero, so at least if folks specify a delay on the boot
cmd, things don't fail because they didn't wait.
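
To make the trade-off concrete, here is a rough userspace model of the
cases discussed above. It is only a sketch, not kernel code: the struct
and function names are made up for illustration, and it assumes an
idealized boot where a device with an unmet optional dependency probes
exactly when the deferred probe timeout fires, and everything else
probes at time zero.

```c
#include <stdbool.h>

/* Hypothetical model of one boot configuration (not a kernel structure). */
struct boot_model {
	int timeout_s;               /* deferred probe timeout, in seconds */
	bool wait_covers_timeout;    /* true: wait_for_device_probe() also
	                              * waits for the timeout (the RFC patch) */
	bool has_unmet_optional_dep; /* e.g. an iommus link with no driver */
};

/* Time at which the network driver finishes probing, in this model. */
static int nic_probe_time(const struct boot_model *m)
{
	return m->has_unmet_optional_dep ? m->timeout_s : 0;
}

/* Time at which the modelled wait_for_device_probe() returns. */
static int wait_return_time(const struct boot_model *m)
{
	return m->wait_covers_timeout ? m->timeout_s : 0;
}

/* IP autoconfig (and so NFS root) only works if the NIC has probed
 * by the time the wait returns. */
static bool nfs_root_ok(const struct boot_model *m)
{
	return wait_return_time(m) >= nic_probe_time(m);
}
```

In this model, a 30 second timeout without the patch fails NFS root for
a device with an unmet optional dependency; with the patch it works, but
every system stalls for 30 seconds whether or not anything is missing;
and with a zero timeout there is neither a failure nor a stall.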

thanks
-john


Thread overview: 9+ messages
     [not found] <TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com>
     [not found] ` <CALAqxLWopjCkiM=NR868DTcX-apPc1MPnONJMppm1jzCboAheg@mail.gmail.com>
2020-04-03 11:47   ` How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y? Geert Uytterhoeven
2020-04-03 13:05     ` Geert Uytterhoeven
2020-04-04  4:18     ` John Stultz
2020-04-06  8:43       ` Yoshihiro Shimoda
2020-04-06 10:01         ` Yoshihiro Shimoda
2020-04-07  7:06           ` [RFC][PATCH] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
2020-04-07  7:50             ` Geert Uytterhoeven
2020-04-07 16:46               ` Geert Uytterhoeven
2020-04-07 18:38                 ` John Stultz
