On Fri, Aug 07, 2020 at 01:02:44PM +0200, Thierry Reding wrote: > On Thu, Aug 06, 2020 at 07:09:16PM -0700, John Stultz wrote: > > On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding wrote: > > > > > > On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote: > > > > This patch addresses a regression in 5.7-rc1+ > > > > > > > > In commit c8c43cee29f6 ("driver core: Fix > > > > driver_deferred_probe_check_state() logic"), we both cleaned up > > > > the logic and also set the default driver_deferred_probe_timeout > > > > value to 30 seconds to allow for drivers that are missing > > > > dependencies to have some time so that the dependency may be > > > > loaded from userland after initcalls_done is set. > > > > > > > > However, Yoshihiro Shimoda reported that on his device that > > > > expects to have unmet dependencies (due to "optional links" in > > > > its devicetree), was failing to mount the NFS root. > > > > > > > > In digging further, it seemed the problem was that while the > > > > device properly probes after waiting 30 seconds for any missing > > > > modules to load, the ip_auto_config() had already failed, > > > > resulting in NFS to fail. This was due to ip_auto_config() > > > > calling wait_for_device_probe() which doesn't wait for the > > > > driver_deferred_probe_timeout to fire. > > > > > > > > Fixing that issue is possible, but could also introduce 30 > > > > second delays in bootups for users who don't have any > > > > missing dependencies, which is not ideal. > > > > > > > > So I think the best solution to avoid any regressions is to > > > > revert back to a default timeout value of zero, and allow > > > > systems that need to utilize the timeout in order for userland > > > > to load any modules that supply misisng dependencies in the dts > > > > to specify the timeout length via the exiting documented boot > > > > argument. > > > > > > > > Thanks to Geert for chasing down that ip_auto_config was why NFS > > > > was failing in this case! > > > > > > > > Cc: "David S. Miller" > > > > Cc: Alexey Kuznetsov > > > > Cc: Hideaki YOSHIFUJI > > > > Cc: Jakub Kicinski > > > > Cc: Greg Kroah-Hartman > > > > Cc: Rafael J. Wysocki > > > > Cc: Rob Herring > > > > Cc: Geert Uytterhoeven > > > > Cc: Yoshihiro Shimoda > > > > Cc: Robin Murphy > > > > Cc: Andy Shevchenko > > > > Cc: Sudeep Holla > > > > Cc: Andy Shevchenko > > > > Cc: Naresh Kamboju > > > > Cc: Basil Eljuse > > > > Cc: Ferry Toth > > > > Cc: Arnd Bergmann > > > > Cc: Anders Roxell > > > > Cc: netdev > > > > Cc: linux-pm@vger.kernel.org > > > > Reported-by: Yoshihiro Shimoda > > > > Tested-by: Yoshihiro Shimoda > > > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic") > > > > Signed-off-by: John Stultz > > > > --- > > > > drivers/base/dd.c | 13 ++----------- > > > > 1 file changed, 2 insertions(+), 11 deletions(-) > > > > > > Sorry for being a bit late to the party, but this breaks suspend/resume > > > support on various Tegra devices. I've only noticed now because, well, > > > suspend/resume have been broken for other reasons for a little while and > > > it's taken us a bit to resolve those issues. > > > > > > But now that those other issues have been fixed, I've started seeing an > > > issue where after resume from suspend some of the I2C controllers are no > > > longer working. The reason for this is that they share pins with DP AUX > > > controllers via the pinctrl framework. The DP AUX driver registers as > > > part of the DRM/KMS driver, which usually happens in userspace. Since > > > the deferred probe timeout was set to 0 by default this no longer works > > > because no pinctrl states are assigned to the I2C controller and > > > therefore upon resume the pins cannot be configured for I2C operation. > > > > Oof. My apologies! > > > > > I'm also somewhat confused by this patch and a few before because they > > > claim that they restore previous default behaviour, but that's just not > > > true. Originally when this timeout was introduced it was -1, which meant > > > that there was no timeout at all and hence users had to opt-in if they > > > wanted to use a deferred probe timeout. > > > > I don't think that's quite true, since the point of my original > > changes were to avoid troubles I was seeing with drivers not loading > > because once the timeout fired after init, driver loading would fail > > with ENODEV instead of returning EPROBE_DEFER. The logic that existed > > was buggy so the timeout handling didn't really work (changing the > > boot argument wouldn't help, because after init the logic would return > > ENODEV before it checked the timeout value). > > > > That said, looking at it now, I do realize the > > driver_deferred_probe_check_state_continue() logic in effect never > > returned ETIMEDOUT before was consolidated in the earlier changes, and > > now we've backed the default timeout to 0, old user (see bec6c0ecb243) > > will now get ETIMEDOUT where they wouldn't before. > > > > So would the following fix it up for you? (sorry its whitespace corrupted) > > > > diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c > > index c6fe7d64c913..c7448be64d07 100644 > > --- a/drivers/pinctrl/devicetree.c > > +++ b/drivers/pinctrl/devicetree.c > > @@ -129,9 +129,8 @@ static int dt_to_map_one_config(struct pinctrl *p, > > if (!np_pctldev || of_node_is_root(np_pctldev)) { > > of_node_put(np_pctldev); > > ret = driver_deferred_probe_check_state(p->dev); > > - /* keep deferring if modules are enabled > > unless we've timed out */ > > - if (IS_ENABLED(CONFIG_MODULES) && !allow_default && > > - (ret == -ENODEV)) > > + /* keep deferring if modules are enabled */ > > + if (IS_ENABLED(CONFIG_MODULES) && > > !allow_default && ret < 0) > > ret = -EPROBE_DEFER; > > return ret; > > } > > > > (you could probably argue calling driver_deferred_probe_check_state > > checking ret at all is silly here, since EPROBE_DEFER is the only > > option you want) > > Just by looking at it I would've said, yes, this looks like it would fix > it. However, upon giving this a try, I see the same pinmux issues as I > was seeing before. I'm going to have to dig a little deeper to see if I > can find out why that is, because it isn't obvious to me right away. Oh, nevermind. I managed to somehow mess up the device tree. With that fixed the above also restores the pinmux attachment for the I2C controller and suspend/resume works fine. Thierry