netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Fixes for deferred_probe_timeout cleanup
@ 2020-04-22 20:32 John Stultz
  2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: John Stultz @ 2020-04-22 20:32 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, linux-pm

Just wanted to resubmit these three fixes for the
deferred_probe_timeout cleanup that landed in the v5.7-rc1 merge
window.

The first resets the default timeout value back to zero so we
have no behavioral change from 5.6. This avoids regressions on
boards that have "optional links" in their device tree.

The second changes the code to use dev_warn() instead of
dev_WARN() to address complaints about unnecessary backtraces in
the boot log.

The last fixes an issue discovered by Yoshihiro Shimoda
and Geert Uytterhoeven, where if a timeout was set, things
like NFS root might fail due to wait_for_device_probe()
not blocking until the timeout expires.

thanks
-john

New in v3:
* Just included the previously posted dev_warn() fix into the
  series, so they didn't collide.


Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org

John Stultz (3):
  driver core: Revert default driver_deferred_probe_timeout value to 0
  driver core: Use dev_warn() instead of dev_WARN() for
    deferred_probe_timeout warnings
  driver core: Ensure wait_for_device_probe() waits until the
    deferred_probe_timeout fires

 drivers/base/dd.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-04-22 20:32 [PATCH v3 0/3] Fixes for deferred_probe_timeout cleanup John Stultz
@ 2020-04-22 20:32 ` John Stultz
  2020-04-23  7:23   ` Geert Uytterhoeven
                     ` (2 more replies)
  2020-04-22 20:32 ` [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings John Stultz
  2020-04-22 20:32 ` [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
  2 siblings, 3 replies; 14+ messages in thread
From: John Stultz @ 2020-04-22 20:32 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, linux-pm

This patch addresses a regression in 5.7-rc1+

In commit c8c43cee29f6 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we both cleaned up
the logic and also set the default driver_deferred_probe_timeout
value to 30 seconds to allow for drivers that are missing
dependencies to have some time so that the dependency may be
loaded from userland after initcalls_done is set.

However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.

Fixing that issue is possible, but could also introduce 30
second delays in bootups for users who don't have any
missing dependencies, which is not ideal.

So I think the best solution to avoid any regressions is to
revert back to a default timeout value of zero, and allow
systems that need to utilize the timeout in order for userland
to load any modules that supply misisng dependencies in the dts
to specify the timeout length via the exiting documented boot
argument.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/dd.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06ec0e851fa1..908ae4d7805e 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -224,16 +224,7 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
-#ifdef CONFIG_MODULES
-/*
- * In the case of modules, set the default probe timeout to
- * 30 seconds to give userland some time to load needed modules
- */
-int driver_deferred_probe_timeout = 30;
-#else
-/* In the case of !modules, no probe timeout needed */
-int driver_deferred_probe_timeout = -1;
-#endif
+int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)
@@ -266,7 +257,7 @@ int driver_deferred_probe_check_state(struct device *dev)
 		return -ENODEV;
 	}
 
-	if (!driver_deferred_probe_timeout) {
+	if (!driver_deferred_probe_timeout && initcalls_done) {
 		dev_WARN(dev, "deferred probe timeout, ignoring dependency");
 		return -ETIMEDOUT;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
  2020-04-22 20:32 [PATCH v3 0/3] Fixes for deferred_probe_timeout cleanup John Stultz
  2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
@ 2020-04-22 20:32 ` John Stultz
  2020-04-23  9:33   ` Yoshihiro Shimoda
  2020-04-22 20:32 ` [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2020-04-22 20:32 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, linux-pm

In commit c8c43cee29f6 ("driver core: Fix
driver_deferred_probe_check_state() logic") and following
changes the logic was changes slightly so that if there is no
driver to match whats found in the dtb, we wait the sepcified
seconds for modules to be loaded by userland, and then timeout,
where as previously we'd print "ignoring dependency for device,
assuming no driver" and immediately return -ENODEV after
initcall_done.

However, in the timeout case (which previously existed but was
practicaly un-used without a boot argument), the timeout message
uses dev_WARN(). This means folks are now seeing a big backtrace
in their boot logs if there a entry in their dts that doesn't
have a driver.

To fix this, lets use dev_warn(), instead of dev_WARN() to match
the previous error path.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org
Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/dd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 908ae4d7805e..9c88afa5c74a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -258,7 +258,7 @@ int driver_deferred_probe_check_state(struct device *dev)
 	}
 
 	if (!driver_deferred_probe_timeout && initcalls_done) {
-		dev_WARN(dev, "deferred probe timeout, ignoring dependency");
+		dev_warn(dev, "deferred probe timeout, ignoring dependency");
 		return -ETIMEDOUT;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-22 20:32 [PATCH v3 0/3] Fixes for deferred_probe_timeout cleanup John Stultz
  2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
  2020-04-22 20:32 ` [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings John Stultz
@ 2020-04-22 20:32 ` John Stultz
  2020-04-23  7:25   ` Geert Uytterhoeven
  2 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2020-04-22 20:32 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, linux-pm

In commit c8c43cee29f6 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we set the default
driver_deferred_probe_timeout value to 30 seconds to allow for
drivers that are missing dependencies to have some time so that
the dependency may be loaded from userland after initcalls_done
is set.

However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.

This patch tries to fix the issue by creating a waitqueue
for the driver_deferred_probe_timeout, and calling wait_event()
to make sure driver_deferred_probe_timeout is zero in
wait_for_device_probe() to make sure all the probing is
finished.

The downside to this solution is that kernel functionality that
uses wait_for_device_probe(), will block until the
driver_deferred_probe_timeout fires, regardless of if there is
any missing dependencies.

However, the previous patch reverts the default timeout value to
zero, so this side-effect will only affect users who specify a
driver_deferred_probe_timeout= value as a boot argument, where
the additional delay would be beneficial to allow modules to
load later during boot.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/dd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9c88afa5c74a..94037be7f5d7 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -226,6 +226,7 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
 int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
+static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -275,6 +276,7 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 
 	list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(private->device, "deferred probe pending");
+	wake_up(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -649,6 +651,9 @@ int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
+	/* wait for probe timeout */
+	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
+
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
@ 2020-04-23  7:23   ` Geert Uytterhoeven
       [not found]   ` <CGME20200429134605eucas1p2bd601082e7a6b8c8fdbe79c83972e2e3@eucas1p2.samsung.com>
  2020-08-06 13:52   ` Thierry Reding
  2 siblings, 0 replies; 14+ messages in thread
From: Geert Uytterhoeven @ 2020-04-23  7:23 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko,
	Sudeep Holla, Andy Shevchenko, Naresh Kamboju, Basil Eljuse,
	Ferry Toth, Arnd Bergmann, Anders Roxell, netdev, Linux PM list

Hi John,

On Wed, Apr 22, 2020 at 10:33 PM John Stultz <john.stultz@linaro.org> wrote:
> This patch addresses a regression in 5.7-rc1+
>
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we both cleaned up
> the logic and also set the default driver_deferred_probe_timeout
> value to 30 seconds to allow for drivers that are missing
> dependencies to have some time so that the dependency may be
> loaded from userland after initcalls_done is set.
>
> However, Yoshihiro Shimoda reported that on his device that
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
>
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, the ip_auto_config() had already failed,
> resulting in NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe() which doesn't wait for the
> driver_deferred_probe_timeout to fire.
>
> Fixing that issue is possible, but could also introduce 30
> second delays in bootups for users who don't have any
> missing dependencies, which is not ideal.
>
> So I think the best solution to avoid any regressions is to
> revert back to a default timeout value of zero, and allow
> systems that need to utilize the timeout in order for userland
> to load any modules that supply misisng dependencies in the dts
> to specify the timeout length via the exiting documented boot
> argument.
>
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> Cc: Ferry Toth <fntoth@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Looks like you lost my
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
on v2.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  2020-04-22 20:32 ` [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
@ 2020-04-23  7:25   ` Geert Uytterhoeven
  0 siblings, 0 replies; 14+ messages in thread
From: Geert Uytterhoeven @ 2020-04-23  7:25 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko,
	Sudeep Holla, Andy Shevchenko, Naresh Kamboju, Basil Eljuse,
	Ferry Toth, Arnd Bergmann, Anders Roxell, netdev, Linux PM list

On Wed, Apr 22, 2020 at 10:33 PM John Stultz <john.stultz@linaro.org> wrote:
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we set the default
> driver_deferred_probe_timeout value to 30 seconds to allow for
> drivers that are missing dependencies to have some time so that
> the dependency may be loaded from userland after initcalls_done
> is set.
>
> However, Yoshihiro Shimoda reported that on his device that
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
>
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, the ip_auto_config() had already failed,
> resulting in NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe() which doesn't wait for the
> driver_deferred_probe_timeout to fire.
>
> This patch tries to fix the issue by creating a waitqueue
> for the driver_deferred_probe_timeout, and calling wait_event()
> to make sure driver_deferred_probe_timeout is zero in
> wait_for_device_probe() to make sure all the probing is
> finished.
>
> The downside to this solution is that kernel functionality that
> uses wait_for_device_probe(), will block until the
> driver_deferred_probe_timeout fires, regardless of if there is
> any missing dependencies.
>
> However, the previous patch reverts the default timeout value to
> zero, so this side-effect will only affect users who specify a
> driver_deferred_probe_timeout= value as a boot argument, where
> the additional delay would be beneficial to allow modules to
> load later during boot.
>
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> Cc: Ferry Toth <fntoth@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Let's add my
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
for v2.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
  2020-04-22 20:32 ` [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings John Stultz
@ 2020-04-23  9:33   ` Yoshihiro Shimoda
  0 siblings, 0 replies; 14+ messages in thread
From: Yoshihiro Shimoda @ 2020-04-23  9:33 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Robin Murphy, Andy Shevchenko,
	Sudeep Holla, Andy Shevchenko, Naresh Kamboju, Basil Eljuse,
	Ferry Toth, Arnd Bergmann, Anders Roxell, netdev, linux-pm

Hi John,

> From: John Stultz, Sent: Thursday, April 23, 2020 5:33 AM
> 
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic") and following
> changes the logic was changes slightly so that if there is no
> driver to match whats found in the dtb, we wait the sepcified
> seconds for modules to be loaded by userland, and then timeout,
> where as previously we'd print "ignoring dependency for device,
> assuming no driver" and immediately return -ENODEV after
> initcall_done.
> 
> However, in the timeout case (which previously existed but was
> practicaly un-used without a boot argument), the timeout message
> uses dev_WARN(). This means folks are now seeing a big backtrace
> in their boot logs if there a entry in their dts that doesn't
> have a driver.
> 
> To fix this, lets use dev_warn(), instead of dev_WARN() to match
> the previous error path.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> Cc: Ferry Toth <fntoth@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---

Thank you for the patch!

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
       [not found]   ` <CGME20200429134605eucas1p2bd601082e7a6b8c8fdbe79c83972e2e3@eucas1p2.samsung.com>
@ 2020-04-29 13:46     ` Marek Szyprowski
  2020-04-29 13:52       ` Mark Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Szyprowski @ 2020-04-29 13:46 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Yoshihiro Shimoda, Robin Murphy,
	Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Basil Eljuse, Ferry Toth, Arnd Bergmann, Anders Roxell, netdev,
	linux-pm, Mark Brown, 'Linux Samsung SOC'

Hi John,

On 22.04.2020 22:32, John Stultz wrote:
> This patch addresses a regression in 5.7-rc1+
>
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we both cleaned up
> the logic and also set the default driver_deferred_probe_timeout
> value to 30 seconds to allow for drivers that are missing
> dependencies to have some time so that the dependency may be
> loaded from userland after initcalls_done is set.
>
> However, Yoshihiro Shimoda reported that on his device that
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
>
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, the ip_auto_config() had already failed,
> resulting in NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe() which doesn't wait for the
> driver_deferred_probe_timeout to fire.
>
> Fixing that issue is possible, but could also introduce 30
> second delays in bootups for users who don't have any
> missing dependencies, which is not ideal.
>
> So I think the best solution to avoid any regressions is to
> revert back to a default timeout value of zero, and allow
> systems that need to utilize the timeout in order for userland
> to load any modules that supply misisng dependencies in the dts
> to specify the timeout length via the exiting documented boot
> argument.
>
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> Cc: Ferry Toth <fntoth@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Please also revert dca0b44957e5 "regulator: Use 
driver_deferred_probe_timeout for regulator_init_complete_work" then, 
because now with the default 0 timeout some regulators gets disabled 
during boot, before their supplies gets instantiated.

This patch broke booting of Samsung Exynos5800-based Peach-Pi Chromeboot 
with the default multi_v7_defconfig.

> ---
>   drivers/base/dd.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 06ec0e851fa1..908ae4d7805e 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -224,16 +224,7 @@ static int deferred_devs_show(struct seq_file *s, void *data)
>   }
>   DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>   
> -#ifdef CONFIG_MODULES
> -/*
> - * In the case of modules, set the default probe timeout to
> - * 30 seconds to give userland some time to load needed modules
> - */
> -int driver_deferred_probe_timeout = 30;
> -#else
> -/* In the case of !modules, no probe timeout needed */
> -int driver_deferred_probe_timeout = -1;
> -#endif
> +int driver_deferred_probe_timeout;
>   EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
>   
>   static int __init deferred_probe_timeout_setup(char *str)
> @@ -266,7 +257,7 @@ int driver_deferred_probe_check_state(struct device *dev)
>   		return -ENODEV;
>   	}
>   
> -	if (!driver_deferred_probe_timeout) {
> +	if (!driver_deferred_probe_timeout && initcalls_done) {
>   		dev_WARN(dev, "deferred probe timeout, ignoring dependency");
>   		return -ETIMEDOUT;
>   	}

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-04-29 13:46     ` Marek Szyprowski
@ 2020-04-29 13:52       ` Mark Brown
  2020-04-29 16:56         ` John Stultz
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Brown @ 2020-04-29 13:52 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Stultz, lkml, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, linux-pm,
	'Linux Samsung SOC'

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On Wed, Apr 29, 2020 at 03:46:04PM +0200, Marek Szyprowski wrote:
> On 22.04.2020 22:32, John Stultz wrote:

> > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > Signed-off-by: John Stultz <john.stultz@linaro.org>

> Please also revert dca0b44957e5 "regulator: Use 
> driver_deferred_probe_timeout for regulator_init_complete_work" then, 
> because now with the default 0 timeout some regulators gets disabled 
> during boot, before their supplies gets instantiated.

Yes, please - I requested this when the revert was originally proposed :(

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-04-29 13:52       ` Mark Brown
@ 2020-04-29 16:56         ` John Stultz
  0 siblings, 0 replies; 14+ messages in thread
From: John Stultz @ 2020-04-29 16:56 UTC (permalink / raw)
  To: Mark Brown
  Cc: Marek Szyprowski, lkml, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Greg Kroah-Hartman,
	Rafael J . Wysocki, Rob Herring, Geert Uytterhoeven,
	Yoshihiro Shimoda, Robin Murphy, Andy Shevchenko, Sudeep Holla,
	Andy Shevchenko, Naresh Kamboju, Basil Eljuse, Ferry Toth,
	Arnd Bergmann, Anders Roxell, netdev, Linux PM list,
	Linux Samsung SOC

On Wed, Apr 29, 2020 at 6:52 AM Mark Brown <broonie@kernel.org> wrote:
> On Wed, Apr 29, 2020 at 03:46:04PM +0200, Marek Szyprowski wrote:
> > On 22.04.2020 22:32, John Stultz wrote:
>
> > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
>
> > Please also revert dca0b44957e5 "regulator: Use
> > driver_deferred_probe_timeout for regulator_init_complete_work" then,
> > because now with the default 0 timeout some regulators gets disabled
> > during boot, before their supplies gets instantiated.
>
> Yes, please - I requested this when the revert was originally proposed :(

Oh, my apologies. I misunderstood what you were suggesting earlier.
Sorry for being thick headed.

I'll spin up a revert here shortly.

thanks
-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
  2020-04-23  7:23   ` Geert Uytterhoeven
       [not found]   ` <CGME20200429134605eucas1p2bd601082e7a6b8c8fdbe79c83972e2e3@eucas1p2.samsung.com>
@ 2020-08-06 13:52   ` Thierry Reding
  2020-08-07  2:09     ` John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: Thierry Reding @ 2020-08-06 13:52 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Yoshihiro Shimoda, Robin Murphy,
	Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Basil Eljuse, Ferry Toth, Arnd Bergmann, Anders Roxell, netdev,
	linux-pm, linux-tegra, Jon Hunter

[-- Attachment #1: Type: text/plain, Size: 7524 bytes --]

On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote:
> This patch addresses a regression in 5.7-rc1+
> 
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we both cleaned up
> the logic and also set the default driver_deferred_probe_timeout
> value to 30 seconds to allow for drivers that are missing
> dependencies to have some time so that the dependency may be
> loaded from userland after initcalls_done is set.
> 
> However, Yoshihiro Shimoda reported that on his device that
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
> 
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, the ip_auto_config() had already failed,
> resulting in NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe() which doesn't wait for the
> driver_deferred_probe_timeout to fire.
> 
> Fixing that issue is possible, but could also introduce 30
> second delays in bootups for users who don't have any
> missing dependencies, which is not ideal.
> 
> So I think the best solution to avoid any regressions is to
> revert back to a default timeout value of zero, and allow
> systems that need to utilize the timeout in order for userland
> to load any modules that supply misisng dependencies in the dts
> to specify the timeout length via the exiting documented boot
> argument.
> 
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> Cc: Ferry Toth <fntoth@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/base/dd.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)

Sorry for being a bit late to the party, but this breaks suspend/resume
support on various Tegra devices. I've only noticed now because, well,
suspend/resume have been broken for other reasons for a little while and
it's taken us a bit to resolve those issues.

But now that those other issues have been fixed, I've started seeing an
issue where after resume from suspend some of the I2C controllers are no
longer working. The reason for this is that they share pins with DP AUX
controllers via the pinctrl framework. The DP AUX driver registers as
part of the DRM/KMS driver, which usually happens in userspace. Since
the deferred probe timeout was set to 0 by default this no longer works
because no pinctrl states are assigned to the I2C controller and
therefore upon resume the pins cannot be configured for I2C operation.

I'm also somewhat confused by this patch and a few before because they
claim that they restore previous default behaviour, but that's just not
true. Originally when this timeout was introduced it was -1, which meant
that there was no timeout at all and hence users had to opt-in if they
wanted to use a deferred probe timeout.

But now after this series the default is for there to be a very short
timeout, which in turn causes existing use-cases to potentially break.
I'm also going to suggest here that in most cases a driver will require
the resources that it asks for, so the case that Yoshihiro described and
that this patch is meant to fix sounds to me like it's the odd one out
rather than the other way around.

But I realize that that's not very constructive. So perhaps we can find
some other way for drivers to advertise that their dependencies are
optional? I came up with the below patch, which restores suspend/resume
on Tegra and could be used in conjunction with a patch that opts into
this behaviour for the problematic driver in Yoshihiro's case to make
this again work for everyone.

--- >8 ---
From a95f8f41b8a32dee3434db4f0515af7376d1873a Mon Sep 17 00:00:00 2001
From: Thierry Reding <treding@nvidia.com>
Date: Thu, 6 Aug 2020 14:51:59 +0200
Subject: [PATCH] driver core: Do not ignore dependencies by default

Many drivers do require the resources that they ask for and timing out
may not always be an option. While there is a way to allow probing to
continue to be deferred for some time after the system has booted, the
fact that this is controlled via a command-line parameter is undesired
because it require manual intervention, whereas in can be avoid in the
majority of cases.

Instead of requiring users to edit the kernel command-line, add a way
for drivers to specify whether or not their dependencies are optional
so that they can continue deferring probe indefinitely.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/base/dd.c             | 2 +-
 include/linux/device/driver.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 857b0a928e8d..11e747070eae 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -279,7 +279,7 @@ int driver_deferred_probe_check_state(struct device *dev)
 		return -ENODEV;
 	}
 
-	if (!driver_deferred_probe_timeout && initcalls_done) {
+	if (dev->driver->ignore_dependencies && !driver_deferred_probe_timeout && initcalls_done) {
 		dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
 		return -ETIMEDOUT;
 	}
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index ee7ba5b5417e..6994455e8a2e 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -100,6 +100,7 @@ struct device_driver {
 	const char		*mod_name;	/* used for built-in modules */
 
 	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool ignore_dependencies;	/* ignores dependencies */
 	enum probe_type probe_type;
 
 	const struct of_device_id	*of_match_table;
-- 
2.27.0
--- >8 ---

Although, thinking about it a bit more it sounds to me like an even
better approach would be to make this part of the API where a resource
is requested. There are in fact already APIs that can request optional
resources (such as regulator_get_optional()), so I think it would make
more sense for any driver that can live without a resource to request
it with an optional flag, which in turn could then trigger this code
path for the deferred probe timeout. For anyone that really needs the
resources that they request, they really shouldn't have to jump through
hoops to get there.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-08-06 13:52   ` Thierry Reding
@ 2020-08-07  2:09     ` John Stultz
  2020-08-07 11:02       ` Thierry Reding
  0 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2020-08-07  2:09 UTC (permalink / raw)
  To: Thierry Reding
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Yoshihiro Shimoda, Robin Murphy,
	Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Basil Eljuse, Ferry Toth, Arnd Bergmann, Anders Roxell, netdev,
	Linux PM list, linux-tegra, Jon Hunter

On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote:
> > This patch addresses a regression in 5.7-rc1+
> >
> > In commit c8c43cee29f6 ("driver core: Fix
> > driver_deferred_probe_check_state() logic"), we both cleaned up
> > the logic and also set the default driver_deferred_probe_timeout
> > value to 30 seconds to allow for drivers that are missing
> > dependencies to have some time so that the dependency may be
> > loaded from userland after initcalls_done is set.
> >
> > However, Yoshihiro Shimoda reported that on his device that
> > expects to have unmet dependencies (due to "optional links" in
> > its devicetree), was failing to mount the NFS root.
> >
> > In digging further, it seemed the problem was that while the
> > device properly probes after waiting 30 seconds for any missing
> > modules to load, the ip_auto_config() had already failed,
> > resulting in NFS to fail. This was due to ip_auto_config()
> > calling wait_for_device_probe() which doesn't wait for the
> > driver_deferred_probe_timeout to fire.
> >
> > Fixing that issue is possible, but could also introduce 30
> > second delays in bootups for users who don't have any
> > missing dependencies, which is not ideal.
> >
> > So I think the best solution to avoid any regressions is to
> > revert back to a default timeout value of zero, and allow
> > systems that need to utilize the timeout in order for userland
> > to load any modules that supply misisng dependencies in the dts
> > to specify the timeout length via the exiting documented boot
> > argument.
> >
> > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > was failing in this case!
> >
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> > Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> > Cc: Ferry Toth <fntoth@gmail.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Anders Roxell <anders.roxell@linaro.org>
> > Cc: netdev <netdev@vger.kernel.org>
> > Cc: linux-pm@vger.kernel.org
> > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  drivers/base/dd.c | 13 ++-----------
> >  1 file changed, 2 insertions(+), 11 deletions(-)
>
> Sorry for being a bit late to the party, but this breaks suspend/resume
> support on various Tegra devices. I've only noticed now because, well,
> suspend/resume have been broken for other reasons for a little while and
> it's taken us a bit to resolve those issues.
>
> But now that those other issues have been fixed, I've started seeing an
> issue where after resume from suspend some of the I2C controllers are no
> longer working. The reason for this is that they share pins with DP AUX
> controllers via the pinctrl framework. The DP AUX driver registers as
> part of the DRM/KMS driver, which usually happens in userspace. Since
> the deferred probe timeout was set to 0 by default this no longer works
> because no pinctrl states are assigned to the I2C controller and
> therefore upon resume the pins cannot be configured for I2C operation.

Oof. My apologies!

> I'm also somewhat confused by this patch and a few before because they
> claim that they restore previous default behaviour, but that's just not
> true. Originally when this timeout was introduced it was -1, which meant
> that there was no timeout at all and hence users had to opt-in if they
> wanted to use a deferred probe timeout.

I don't think that's quite true, since the point of my original
changes were to avoid troubles I was seeing with drivers not loading
because once the timeout fired after init, driver loading would fail
with ENODEV instead of returning EPROBE_DEFER. The logic that existed
was buggy so the timeout handling didn't really work (changing the
boot argument wouldn't help, because after init the logic would return
ENODEV before it checked the timeout value).

That said, looking at it now, I do realize the
driver_deferred_probe_check_state_continue() logic in effect never
returned ETIMEDOUT before was consolidated in the earlier changes, and
now we've backed the default timeout to 0, old user (see bec6c0ecb243)
will now get ETIMEDOUT where they wouldn't before.

So would the following fix it up for you? (sorry its whitespace corrupted)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index c6fe7d64c913..c7448be64d07 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,9 +129,8 @@ static int dt_to_map_one_config(struct pinctrl *p,
                if (!np_pctldev || of_node_is_root(np_pctldev)) {
                        of_node_put(np_pctldev);
                        ret = driver_deferred_probe_check_state(p->dev);
-                       /* keep deferring if modules are enabled
unless we've timed out */
-                       if (IS_ENABLED(CONFIG_MODULES) && !allow_default &&
-                           (ret == -ENODEV))
+                       /* keep deferring if modules are enabled */
+                       if (IS_ENABLED(CONFIG_MODULES) &&
!allow_default && ret < 0)
                                ret = -EPROBE_DEFER;
                        return ret;
                }

(you could probably argue calling driver_deferred_probe_check_state
checking ret at all is silly here, since EPROBE_DEFER is the only
option you want)

> But now after this series the default is for there to be a very short
> timeout, which in turn causes existing use-cases to potentially break.
> I'm also going to suggest here that in most cases a driver will require
> the resources that it asks for, so the case that Yoshihiro described and
> that this patch is meant to fix sounds to me like it's the odd one out
> rather than the other way around.
>
> But I realize that that's not very constructive. So perhaps we can find
> some other way for drivers to advertise that their dependencies are
> optional? I came up with the below patch, which restores suspend/resume
> on Tegra and could be used in conjunction with a patch that opts into
> this behaviour for the problematic driver in Yoshihiro's case to make
> this again work for everyone.
>
> --- >8 ---
> From a95f8f41b8a32dee3434db4f0515af7376d1873a Mon Sep 17 00:00:00 2001
> From: Thierry Reding <treding@nvidia.com>
> Date: Thu, 6 Aug 2020 14:51:59 +0200
> Subject: [PATCH] driver core: Do not ignore dependencies by default
>
> Many drivers do require the resources that they ask for and timing out
> may not always be an option. While there is a way to allow probing to
> continue to be deferred for some time after the system has booted, the
> fact that this is controlled via a command-line parameter is undesired
> because it require manual intervention, whereas in can be avoid in the
> majority of cases.
>
> Instead of requiring users to edit the kernel command-line, add a way
> for drivers to specify whether or not their dependencies are optional
> so that they can continue deferring probe indefinitely.
>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/base/dd.c             | 2 +-
>  include/linux/device/driver.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 857b0a928e8d..11e747070eae 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -279,7 +279,7 @@ int driver_deferred_probe_check_state(struct device *dev)
>                 return -ENODEV;
>         }
>
> -       if (!driver_deferred_probe_timeout && initcalls_done) {
> +       if (dev->driver->ignore_dependencies && !driver_deferred_probe_timeout && initcalls_done) {
>                 dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
>                 return -ETIMEDOUT;
>         }
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index ee7ba5b5417e..6994455e8a2e 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -100,6 +100,7 @@ struct device_driver {
>         const char              *mod_name;      /* used for built-in modules */
>
>         bool suppress_bind_attrs;       /* disables bind/unbind via sysfs */
> +       bool ignore_dependencies;       /* ignores dependencies */
>         enum probe_type probe_type;
>
>         const struct of_device_id       *of_match_table;
> --
> 2.27.0
> --- >8 ---
>
> Although, thinking about it a bit more it sounds to me like an even
> better approach would be to make this part of the API where a resource
> is requested. There are in fact already APIs that can request optional
> resources (such as regulator_get_optional()), so I think it would make
> more sense for any driver that can live without a resource to request
> it with an optional flag, which in turn could then trigger this code
> path for the deferred probe timeout. For anyone that really needs the
> resources that they request, they really shouldn't have to jump through
> hoops to get there.

I do like these suggestions. I feel like the optional dt links
handling is very opaque as there's no real way to tell if a dt link is
truly optional or not. Having the timeout is a really subtle and
implicit way of getting that optional link handling behavior.

thanks
-john

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-08-07  2:09     ` John Stultz
@ 2020-08-07 11:02       ` Thierry Reding
  2020-08-07 12:26         ` Thierry Reding
  0 siblings, 1 reply; 14+ messages in thread
From: Thierry Reding @ 2020-08-07 11:02 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Yoshihiro Shimoda, Robin Murphy,
	Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Basil Eljuse, Ferry Toth, Arnd Bergmann, Anders Roxell, netdev,
	Linux PM list, linux-tegra, Jon Hunter

[-- Attachment #1: Type: text/plain, Size: 11152 bytes --]

On Thu, Aug 06, 2020 at 07:09:16PM -0700, John Stultz wrote:
> On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote:
> > > This patch addresses a regression in 5.7-rc1+
> > >
> > > In commit c8c43cee29f6 ("driver core: Fix
> > > driver_deferred_probe_check_state() logic"), we both cleaned up
> > > the logic and also set the default driver_deferred_probe_timeout
> > > value to 30 seconds to allow for drivers that are missing
> > > dependencies to have some time so that the dependency may be
> > > loaded from userland after initcalls_done is set.
> > >
> > > However, Yoshihiro Shimoda reported that on his device that
> > > expects to have unmet dependencies (due to "optional links" in
> > > its devicetree), was failing to mount the NFS root.
> > >
> > > In digging further, it seemed the problem was that while the
> > > device properly probes after waiting 30 seconds for any missing
> > > modules to load, the ip_auto_config() had already failed,
> > > resulting in NFS to fail. This was due to ip_auto_config()
> > > calling wait_for_device_probe() which doesn't wait for the
> > > driver_deferred_probe_timeout to fire.
> > >
> > > Fixing that issue is possible, but could also introduce 30
> > > second delays in bootups for users who don't have any
> > > missing dependencies, which is not ideal.
> > >
> > > So I think the best solution to avoid any regressions is to
> > > revert back to a default timeout value of zero, and allow
> > > systems that need to utilize the timeout in order for userland
> > > to load any modules that supply misisng dependencies in the dts
> > > to specify the timeout length via the exiting documented boot
> > > argument.
> > >
> > > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > > was failing in this case!
> > >
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Cc: Rob Herring <robh@kernel.org>
> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Cc: Robin Murphy <robin.murphy@arm.com>
> > > Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> > > Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> > > Cc: Ferry Toth <fntoth@gmail.com>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Anders Roxell <anders.roxell@linaro.org>
> > > Cc: netdev <netdev@vger.kernel.org>
> > > Cc: linux-pm@vger.kernel.org
> > > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > > ---
> > >  drivers/base/dd.c | 13 ++-----------
> > >  1 file changed, 2 insertions(+), 11 deletions(-)
> >
> > Sorry for being a bit late to the party, but this breaks suspend/resume
> > support on various Tegra devices. I've only noticed now because, well,
> > suspend/resume have been broken for other reasons for a little while and
> > it's taken us a bit to resolve those issues.
> >
> > But now that those other issues have been fixed, I've started seeing an
> > issue where after resume from suspend some of the I2C controllers are no
> > longer working. The reason for this is that they share pins with DP AUX
> > controllers via the pinctrl framework. The DP AUX driver registers as
> > part of the DRM/KMS driver, which usually happens in userspace. Since
> > the deferred probe timeout was set to 0 by default this no longer works
> > because no pinctrl states are assigned to the I2C controller and
> > therefore upon resume the pins cannot be configured for I2C operation.
> 
> Oof. My apologies!
> 
> > I'm also somewhat confused by this patch and a few before because they
> > claim that they restore previous default behaviour, but that's just not
> > true. Originally when this timeout was introduced it was -1, which meant
> > that there was no timeout at all and hence users had to opt-in if they
> > wanted to use a deferred probe timeout.
> 
> I don't think that's quite true, since the point of my original
> changes were to avoid troubles I was seeing with drivers not loading
> because once the timeout fired after init, driver loading would fail
> with ENODEV instead of returning EPROBE_DEFER. The logic that existed
> was buggy so the timeout handling didn't really work (changing the
> boot argument wouldn't help, because after init the logic would return
> ENODEV before it checked the timeout value).
> 
> That said, looking at it now, I do realize the
> driver_deferred_probe_check_state_continue() logic in effect never
> returned ETIMEDOUT before was consolidated in the earlier changes, and
> now we've backed the default timeout to 0, old user (see bec6c0ecb243)
> will now get ETIMEDOUT where they wouldn't before.
> 
> So would the following fix it up for you? (sorry its whitespace corrupted)
> 
> diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
> index c6fe7d64c913..c7448be64d07 100644
> --- a/drivers/pinctrl/devicetree.c
> +++ b/drivers/pinctrl/devicetree.c
> @@ -129,9 +129,8 @@ static int dt_to_map_one_config(struct pinctrl *p,
>                 if (!np_pctldev || of_node_is_root(np_pctldev)) {
>                         of_node_put(np_pctldev);
>                         ret = driver_deferred_probe_check_state(p->dev);
> -                       /* keep deferring if modules are enabled
> unless we've timed out */
> -                       if (IS_ENABLED(CONFIG_MODULES) && !allow_default &&
> -                           (ret == -ENODEV))
> +                       /* keep deferring if modules are enabled */
> +                       if (IS_ENABLED(CONFIG_MODULES) &&
> !allow_default && ret < 0)
>                                 ret = -EPROBE_DEFER;
>                         return ret;
>                 }
> 
> (you could probably argue calling driver_deferred_probe_check_state
> checking ret at all is silly here, since EPROBE_DEFER is the only
> option you want)

Just by looking at it I would've said, yes, this looks like it would fix
it. However, upon giving this a try, I see the same pinmux issues as I
was seeing before. I'm going to have to dig a little deeper to see if I
can find out why that is, because it isn't obvious to me right away.

> 
> > But now after this series the default is for there to be a very short
> > timeout, which in turn causes existing use-cases to potentially break.
> > I'm also going to suggest here that in most cases a driver will require
> > the resources that it asks for, so the case that Yoshihiro described and
> > that this patch is meant to fix sounds to me like it's the odd one out
> > rather than the other way around.
> >
> > But I realize that that's not very constructive. So perhaps we can find
> > some other way for drivers to advertise that their dependencies are
> > optional? I came up with the below patch, which restores suspend/resume
> > on Tegra and could be used in conjunction with a patch that opts into
> > this behaviour for the problematic driver in Yoshihiro's case to make
> > this again work for everyone.
> >
> > --- >8 ---
> > From a95f8f41b8a32dee3434db4f0515af7376d1873a Mon Sep 17 00:00:00 2001
> > From: Thierry Reding <treding@nvidia.com>
> > Date: Thu, 6 Aug 2020 14:51:59 +0200
> > Subject: [PATCH] driver core: Do not ignore dependencies by default
> >
> > Many drivers do require the resources that they ask for and timing out
> > may not always be an option. While there is a way to allow probing to
> > continue to be deferred for some time after the system has booted, the
> > fact that this is controlled via a command-line parameter is undesired
> > because it require manual intervention, whereas in can be avoid in the
> > majority of cases.
> >
> > Instead of requiring users to edit the kernel command-line, add a way
> > for drivers to specify whether or not their dependencies are optional
> > so that they can continue deferring probe indefinitely.
> >
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  drivers/base/dd.c             | 2 +-
> >  include/linux/device/driver.h | 1 +
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 857b0a928e8d..11e747070eae 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -279,7 +279,7 @@ int driver_deferred_probe_check_state(struct device *dev)
> >                 return -ENODEV;
> >         }
> >
> > -       if (!driver_deferred_probe_timeout && initcalls_done) {
> > +       if (dev->driver->ignore_dependencies && !driver_deferred_probe_timeout && initcalls_done) {
> >                 dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
> >                 return -ETIMEDOUT;
> >         }
> > diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> > index ee7ba5b5417e..6994455e8a2e 100644
> > --- a/include/linux/device/driver.h
> > +++ b/include/linux/device/driver.h
> > @@ -100,6 +100,7 @@ struct device_driver {
> >         const char              *mod_name;      /* used for built-in modules */
> >
> >         bool suppress_bind_attrs;       /* disables bind/unbind via sysfs */
> > +       bool ignore_dependencies;       /* ignores dependencies */
> >         enum probe_type probe_type;
> >
> >         const struct of_device_id       *of_match_table;
> > --
> > 2.27.0
> > --- >8 ---
> >
> > Although, thinking about it a bit more it sounds to me like an even
> > better approach would be to make this part of the API where a resource
> > is requested. There are in fact already APIs that can request optional
> > resources (such as regulator_get_optional()), so I think it would make
> > more sense for any driver that can live without a resource to request
> > it with an optional flag, which in turn could then trigger this code
> > path for the deferred probe timeout. For anyone that really needs the
> > resources that they request, they really shouldn't have to jump through
> > hoops to get there.
> 
> I do like these suggestions. I feel like the optional dt links
> handling is very opaque as there's no real way to tell if a dt link is
> truly optional or not. Having the timeout is a really subtle and
> implicit way of getting that optional link handling behavior.

Agreed. Handling this implicitly is pretty wonky and as evidenced by
these two cases bound to break one way or another.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0
  2020-08-07 11:02       ` Thierry Reding
@ 2020-08-07 12:26         ` Thierry Reding
  0 siblings, 0 replies; 14+ messages in thread
From: Thierry Reding @ 2020-08-07 12:26 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Greg Kroah-Hartman, Rafael J . Wysocki,
	Rob Herring, Geert Uytterhoeven, Yoshihiro Shimoda, Robin Murphy,
	Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Basil Eljuse, Ferry Toth, Arnd Bergmann, Anders Roxell, netdev,
	Linux PM list, linux-tegra, Jon Hunter

[-- Attachment #1: Type: text/plain, Size: 7431 bytes --]

On Fri, Aug 07, 2020 at 01:02:44PM +0200, Thierry Reding wrote:
> On Thu, Aug 06, 2020 at 07:09:16PM -0700, John Stultz wrote:
> > On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding <thierry.reding@gmail.com> wrote:
> > >
> > > On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote:
> > > > This patch addresses a regression in 5.7-rc1+
> > > >
> > > > In commit c8c43cee29f6 ("driver core: Fix
> > > > driver_deferred_probe_check_state() logic"), we both cleaned up
> > > > the logic and also set the default driver_deferred_probe_timeout
> > > > value to 30 seconds to allow for drivers that are missing
> > > > dependencies to have some time so that the dependency may be
> > > > loaded from userland after initcalls_done is set.
> > > >
> > > > However, Yoshihiro Shimoda reported that on his device that
> > > > expects to have unmet dependencies (due to "optional links" in
> > > > its devicetree), was failing to mount the NFS root.
> > > >
> > > > In digging further, it seemed the problem was that while the
> > > > device properly probes after waiting 30 seconds for any missing
> > > > modules to load, the ip_auto_config() had already failed,
> > > > resulting in NFS to fail. This was due to ip_auto_config()
> > > > calling wait_for_device_probe() which doesn't wait for the
> > > > driver_deferred_probe_timeout to fire.
> > > >
> > > > Fixing that issue is possible, but could also introduce 30
> > > > second delays in bootups for users who don't have any
> > > > missing dependencies, which is not ideal.
> > > >
> > > > So I think the best solution to avoid any regressions is to
> > > > revert back to a default timeout value of zero, and allow
> > > > systems that need to utilize the timeout in order for userland
> > > > to load any modules that supply misisng dependencies in the dts
> > > > to specify the timeout length via the exiting documented boot
> > > > argument.
> > > >
> > > > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > > > was failing in this case!
> > > >
> > > > Cc: "David S. Miller" <davem@davemloft.net>
> > > > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > Cc: Rob Herring <robh@kernel.org>
> > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > > Cc: Robin Murphy <robin.murphy@arm.com>
> > > > Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> > > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> > > > Cc: Basil Eljuse <Basil.Eljuse@arm.com>
> > > > Cc: Ferry Toth <fntoth@gmail.com>
> > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > Cc: Anders Roxell <anders.roxell@linaro.org>
> > > > Cc: netdev <netdev@vger.kernel.org>
> > > > Cc: linux-pm@vger.kernel.org
> > > > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > > Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > > > ---
> > > >  drivers/base/dd.c | 13 ++-----------
> > > >  1 file changed, 2 insertions(+), 11 deletions(-)
> > >
> > > Sorry for being a bit late to the party, but this breaks suspend/resume
> > > support on various Tegra devices. I've only noticed now because, well,
> > > suspend/resume have been broken for other reasons for a little while and
> > > it's taken us a bit to resolve those issues.
> > >
> > > But now that those other issues have been fixed, I've started seeing an
> > > issue where after resume from suspend some of the I2C controllers are no
> > > longer working. The reason for this is that they share pins with DP AUX
> > > controllers via the pinctrl framework. The DP AUX driver registers as
> > > part of the DRM/KMS driver, which usually happens in userspace. Since
> > > the deferred probe timeout was set to 0 by default this no longer works
> > > because no pinctrl states are assigned to the I2C controller and
> > > therefore upon resume the pins cannot be configured for I2C operation.
> > 
> > Oof. My apologies!
> > 
> > > I'm also somewhat confused by this patch and a few before because they
> > > claim that they restore previous default behaviour, but that's just not
> > > true. Originally when this timeout was introduced it was -1, which meant
> > > that there was no timeout at all and hence users had to opt-in if they
> > > wanted to use a deferred probe timeout.
> > 
> > I don't think that's quite true, since the point of my original
> > changes were to avoid troubles I was seeing with drivers not loading
> > because once the timeout fired after init, driver loading would fail
> > with ENODEV instead of returning EPROBE_DEFER. The logic that existed
> > was buggy so the timeout handling didn't really work (changing the
> > boot argument wouldn't help, because after init the logic would return
> > ENODEV before it checked the timeout value).
> > 
> > That said, looking at it now, I do realize the
> > driver_deferred_probe_check_state_continue() logic in effect never
> > returned ETIMEDOUT before was consolidated in the earlier changes, and
> > now we've backed the default timeout to 0, old user (see bec6c0ecb243)
> > will now get ETIMEDOUT where they wouldn't before.
> > 
> > So would the following fix it up for you? (sorry its whitespace corrupted)
> > 
> > diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
> > index c6fe7d64c913..c7448be64d07 100644
> > --- a/drivers/pinctrl/devicetree.c
> > +++ b/drivers/pinctrl/devicetree.c
> > @@ -129,9 +129,8 @@ static int dt_to_map_one_config(struct pinctrl *p,
> >                 if (!np_pctldev || of_node_is_root(np_pctldev)) {
> >                         of_node_put(np_pctldev);
> >                         ret = driver_deferred_probe_check_state(p->dev);
> > -                       /* keep deferring if modules are enabled
> > unless we've timed out */
> > -                       if (IS_ENABLED(CONFIG_MODULES) && !allow_default &&
> > -                           (ret == -ENODEV))
> > +                       /* keep deferring if modules are enabled */
> > +                       if (IS_ENABLED(CONFIG_MODULES) &&
> > !allow_default && ret < 0)
> >                                 ret = -EPROBE_DEFER;
> >                         return ret;
> >                 }
> > 
> > (you could probably argue calling driver_deferred_probe_check_state
> > checking ret at all is silly here, since EPROBE_DEFER is the only
> > option you want)
> 
> Just by looking at it I would've said, yes, this looks like it would fix
> it. However, upon giving this a try, I see the same pinmux issues as I
> was seeing before. I'm going to have to dig a little deeper to see if I
> can find out why that is, because it isn't obvious to me right away.

Oh, nevermind. I managed to somehow mess up the device tree. With that
fixed the above also restores the pinmux attachment for the I2C
controller and suspend/resume works fine.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-08-07 12:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22 20:32 [PATCH v3 0/3] Fixes for deferred_probe_timeout cleanup John Stultz
2020-04-22 20:32 ` [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0 John Stultz
2020-04-23  7:23   ` Geert Uytterhoeven
     [not found]   ` <CGME20200429134605eucas1p2bd601082e7a6b8c8fdbe79c83972e2e3@eucas1p2.samsung.com>
2020-04-29 13:46     ` Marek Szyprowski
2020-04-29 13:52       ` Mark Brown
2020-04-29 16:56         ` John Stultz
2020-08-06 13:52   ` Thierry Reding
2020-08-07  2:09     ` John Stultz
2020-08-07 11:02       ` Thierry Reding
2020-08-07 12:26         ` Thierry Reding
2020-04-22 20:32 ` [PATCH v3 2/3] driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings John Stultz
2020-04-23  9:33   ` Yoshihiro Shimoda
2020-04-22 20:32 ` [PATCH v3 3/3] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires John Stultz
2020-04-23  7:25   ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).