linux-arm-kernel.lists.infradead.org archive mirror
* Re: [GIT PULL] Driver core changes for 6.0-rc1
       [not found] <YuqDMLF2AQyj4+N1@kroah.com>
@ 2022-09-12 17:23 ` Olof Johansson
  2022-09-12 17:24   ` Olof Johansson
  0 siblings, 1 reply; 10+ messages in thread
From: Olof Johansson @ 2022-09-12 17:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
	Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

Hi,

On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> Saravana Kannan (11):
>       PM: domains: Delete usage of driver_deferred_probe_check_state()
>       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
>       net: mdio: Delete usage of driver_deferred_probe_check_state()
>       driver core: Add wait_for_init_devices_probe helper function
>       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
>       Revert "driver core: Set default deferred_probe_timeout back to 0."
>       driver core: Set fw_devlink.strict=1 by default
>       iommu/of: Delete usage of driver_deferred_probe_check_state()
>       driver core: Delete driver_deferred_probe_check_state()
>       driver core: fw_devlink: Allow firmware to mark devices as best effort
>       of: base: Avoid console probe delay when fw_devlink.strict=1

The last patch in this list regresses my HoneyComb LX2K (ironically
the machine I do maintainer work on). It stops PCIe from probing, but
without a single message indicating why.

The reason seems to be that the iommu-map property doesn't get
patched up by my (older) u-boot, and thus isn't a valid reference.
The system works fine without the IOMMU, which is how I've run it for a
couple of years.

It's also extremely hard to diagnose out of the box because there are
*no error messages*. And there were no warnings leading up to this
strict enforcement.

This "feature" seems to have been done backwards. The checks should
have been running (and not skipped due to the "optional" flag), but
also not causing errors, just warnings. That would have given users a
chance to know that this is something that needs to be fixed.
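To make the three behaviours in this argument concrete, here is a minimal C sketch of how a consumer's probe could be gated on a single supplier reference. It is a simplified model, not the kernel's fw_devlink code: `enum link_policy`, `struct supplier_ref` and `should_block_probe()` are hypothetical names. `POLICY_SKIP_OPTIONAL` stands in for the old non-strict behaviour, where optional suppliers such as the IOMMU are never looked at. `POLICY_ENFORCE` stands in for fw_devlink.strict=1. `POLICY_WARN_ONLY` stands in for the check-but-only-warn approach Olof suggests above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policies; not the kernel's fw_devlink implementation. */
enum link_policy {
	POLICY_SKIP_OPTIONAL,	/* non-strict: optional suppliers not tracked */
	POLICY_ENFORCE,		/* strict: optional suppliers gate probe too */
	POLICY_WARN_ONLY,	/* proposed: check and warn, never block */
};

/* One supplier reference parsed from a DT property (hypothetical type). */
struct supplier_ref {
	const char *prop;	/* e.g. "iommu-map" */
	bool optional;		/* binding is marked optional */
	bool supplier_ready;	/* has the supplier device probed? */
};

/* Return true if the consumer's probe should be held back. */
static bool should_block_probe(const struct supplier_ref *s,
			       enum link_policy policy)
{
	assert(s != NULL);

	/* Mandatory suppliers gate probe under every policy. */
	if (!s->optional)
		return !s->supplier_ready;

	switch (policy) {
	case POLICY_SKIP_OPTIONAL:
		return false;			/* never checked at all */
	case POLICY_ENFORCE:
		return !s->supplier_ready;	/* block the probe */
	case POLICY_WARN_ONLY:
		if (!s->supplier_ready)
			fprintf(stderr,
				"warning: %s: optional supplier not ready, probing anyway\n",
				s->prop);
		return false;
	}
	return false;
}
```

In this model only the warn-only branch would surface a broken optional reference in the log while still letting the consumer probe.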

And when you flip the switch, at least report what failed so that
people don't need to spend a whole night bisecting kernels, please.

Greg, mind reverting just the last one? If I hit this, I presume
others would too.


-Olof

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-12 17:23 ` [GIT PULL] Driver core changes for 6.0-rc1 Olof Johansson
@ 2022-09-12 17:24   ` Olof Johansson
  2022-09-13 15:15     ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Olof Johansson @ 2022-09-12 17:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
	Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote:
>
> Greg, mind reverting just the last one? If I hit this, I presume
> others would too.

Apologies, wrong patch pointed out. The culprit is "driver core: Set
fw_devlink.strict=1 by default", 71066545b48e42.


-Olof

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-12 17:24   ` Olof Johansson
@ 2022-09-13 15:15     ` Greg KH
  2022-09-13 16:28       ` Olof Johansson
  0 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2022-09-13 15:15 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
	Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote:
> Apologies, wrong patch pointed out. The culprit is "driver core: Set
> fw_devlink.strict=1 by default", 71066545b48e42.

Is this still an issue in -rc5?  A number of patches in the above series
were just reverted and should hopefully have resolved the issue you are
seeing.

thanks,

greg k-h

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-13 15:15     ` Greg KH
@ 2022-09-13 16:28       ` Olof Johansson
  2022-09-14 14:00         ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Olof Johansson @ 2022-09-13 16:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
	Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Is this still an issue in -rc5?  A number of patches in the above series
> were just reverted and should hopefully have resolved the issue you are
> seeing.

Unfortunately, I discovered this regression with -rc5 in the first
place, so it's still there.


-Olof

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-13 16:28       ` Olof Johansson
@ 2022-09-14 14:00         ` Greg KH
  2022-09-14 16:24           ` Olof Johansson
  0 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2022-09-14 14:00 UTC (permalink / raw)
  To: Saravana Kannan, Olof Johansson
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
	Linux ARM Mailing List, Shawn Guo, Li Yang

On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote:
> Unfortunately, I discovered this regression with -rc5 in the first
> place, so it's still there.

Ick, ok. Saravana, any thoughts?  I know you're at the conference this
week with me; maybe you can give Olof a hint as to what to look for
here?

thanks,

greg k-h

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-14 14:00         ` Greg KH
@ 2022-09-14 16:24           ` Olof Johansson
  2022-09-14 17:35             ` Saravana Kannan
  0 siblings, 1 reply; 10+ messages in thread
From: Olof Johansson @ 2022-09-14 16:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel,
	Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang

Hi,

On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Ick, ok. Saravana, any thoughts?  I know you're at the conference this
> week with me; maybe you can give Olof a hint as to what to look for
> here?

I'm not sure what you want me to look for. The patch turns on
enforcement of DT contents that were never enforced before, so now my
computer no longer boots. And it does so in a way that makes it
impossible for someone who isn't rebuilding kernels to debug and
figure out what happened.

The patch needs to be reverted.


-Olof

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-14 16:24           ` Olof Johansson
@ 2022-09-14 17:35             ` Saravana Kannan
  2022-09-15  3:56               ` Olof Johansson
  0 siblings, 1 reply; 10+ messages in thread
From: Saravana Kannan @ 2022-09-14 17:35 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Greg KH, Linus Torvalds, Andrew Morton, linux-kernel,
	Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang

On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson <olof@lixom.net> wrote:
>
> I'm not sure what you want me to look for. The patch turns on
> enforcement of DT contents that were never enforced before, so now my
> computer no longer boots. And it does so in a way that makes it
> impossible for someone who isn't rebuilding kernels to debug and
> figure out what happened.

Hi Olof,

Sorry for the trouble. It doesn't print any error messages because
there are cases where it blocks a probe and that isn't an error. If I
printed a message every time fw_devlink blocked a probe, it'd be a ton
of messages.

Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes
that stop it from blocking probes indefinitely. So what you are seeing
shouldn't be happening. After about 10 seconds (configurable), it
should stop blocking the probes.

If you actually want to see the reasons the probe is being blocked,
you can enable the existing debug messages in drivers/base/core.c.

Would you mind pointing me to the dts (not dtsi) file that corresponds
to this board, please? And which specific PCI device is being blocked
from probing? I can try to debug it further. Also, can you try to
see why it doesn't get unblocked when driver_deferred_probe_timeout
expires, or why that's not helping here?

In the meantime, if you want this patch reverted, I'm not opposed to
that. But if you can use fw_devlink.strict=0 on your command line for
now and give me time to debug, that'd be nicer.

Thanks,
Saravana

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-14 17:35             ` Saravana Kannan
@ 2022-09-15  3:56               ` Olof Johansson
  2022-09-15 10:48                 ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Olof Johansson @ 2022-09-15  3:56 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg KH, Linus Torvalds, Andrew Morton, linux-kernel,
	Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang

On Wed, Sep 14, 2022 at 10:36 AM Saravana Kannan <saravanak@google.com> wrote:
>
> Hi Olof,
>
> Sorry for the trouble. It doesn't print any error messages because
> there are cases where it blocks a probe and that isn't an error. If I
> printed a message every time fw_devlink blocked a probe, it'd be a ton
> of messages.
>
> Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes
> that stop it from blocking probes indefinitely. So what you are seeing
> shouldn't be happening. After about 10 seconds (configurable), it
> should stop blocking the probes.

"Shouldn't be happening" is a pretty bold statement. In my case it
isn't actually stuck waiting on a timeout, and it never recovers.

Instead, what seems to be happening is that the PCIe driver, which
registers as a platform_driver here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c#n255

ends up registering, but the driver core now refuses to probe the
matching devices, since their suppliers are no longer fulfilled.
Without strict mode, the SMMU suppliers would not be tracked at all,
since they are marked optional here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/property.c#n1449

So the driver registration succeeds, but no devices get bound to it.
When registration returns to the platform core, the core sees that no
devices are bound to this driver, so it unregisters the driver:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n951

That explains why the PCIe driver just disappears and never retries.

This is what it looks like with CONFIG_DEBUG_DRIVER and CONFIG_DEBUG_DEVRES:

[    5.178538] bus: 'platform': add driver layerscape-pcie-gen4
[    5.184301] bus: 'platform': __driver_probe_device: matched device 3600000.pcie with driver layerscape-pcie-gen4
[    5.194498] platform 3600000.pcie: error -EPROBE_DEFER: supplier 5000000.iommu not ready
[    5.202607] platform 3600000.pcie: Added to deferred list
[    5.208024] bus: 'platform': __driver_probe_device: matched device 3800000.pcie with driver layerscape-pcie-gen4
[    5.218227] platform 3800000.pcie: error -EPROBE_DEFER: supplier 5000000.iommu not ready
[    5.226333] platform 3800000.pcie: Added to deferred list
[    5.231814] bus: 'platform': remove driver layerscape-pcie-gen4
[    5.237761] driver: 'layerscape-pcie-gen4': driver_release

Note that the platform driver registration sets flags to disable async
probing, supposedly so it can assume that any matching devices would
be found by the time registration returns:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n917

	/*
	 * We have to run our probes synchronously because we check if
	 * we find any devices to bind to and exit with error if there
	 * are any.
	 */
	drv->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;

	/*
	 * Prevent driver from requesting probe deferral to avoid further
	 * futile probe attempts.
	 */
	drv->prevent_deferred_probe = true;
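The log above shows the devices still being "Added to deferred list" despite prevent_deferred_probe. A toy C model of why that can happen, under the assumption (hedged, not verified here against every kernel version) that the fw_devlink supplier check runs before the driver's probe callback, while prevent_deferred_probe only rewrites a deferral returned by that callback (the assumption is that it becomes -ENXIO). MODEL_EPROBE_DEFER mirrors the kernel's EPROBE_DEFER value (517); all other names are hypothetical.

```c
/*
 * Toy model (not kernel code) of the order of checks when a platform
 * device is probed. Names are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_EPROBE_DEFER 517 /* same numeric value as the kernel's EPROBE_DEFER */

struct toy_probe_ctx {
	bool suppliers_ready;        /* fw_devlink supplier check result */
	bool prevent_deferred_probe; /* set for platform_driver_probe() users */
	int driver_probe_ret;        /* what the driver's probe would return */
	bool added_to_deferred_list;
};

static int toy_probe_device(struct toy_probe_ctx *c)
{
	/* 1. Supplier check: defers before the driver is ever called. */
	if (!c->suppliers_ready) {
		c->added_to_deferred_list = true;
		return -MODEL_EPROBE_DEFER;
	}

	/* 2. The driver's own probe callback. */
	int ret = c->driver_probe_ret;

	/* 3. Only a deferral from the callback becomes a hard error. */
	if (ret == -MODEL_EPROBE_DEFER && c->prevent_deferred_probe)
		ret = -ENXIO;
	return ret;
}
```

Under this ordering the flag never gets a chance to act on a supplier-driven deferral, so the device is parked on a list that its (soon unregistered) driver will never revisit.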

Bottom line: How was this code tested? This seems far from mature. It
doesn't seem like a particularly obscure condition, and if it's this
fragile it could create minefields for others down the road.

-Olof

>
> If you actually want to see the reasons the probe is being blocked,
> you can enable the existing dbg messages in drivers/base/core.c.
>
> Would you mind pointing me to the dts (not dtsi) file that corresponds
> to this board, please? And which specific PCI device is being blocked
> from probing? I can try to debug it further. Also, can you try to
> see why it doesn't get unblocked when driver_deferred_probe_timeout
> expires, or why that's not helping here?
>
> In the meantime, if you want this patch reverted, I'm not opposed to
> that. But if you can use fw_devlink.strict=0 on your kernel command line
> for now and give me time to debug, that'd be nicer.
>
> Thanks,
> Saravana

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-15  3:56               ` Olof Johansson
@ 2022-09-15 10:48                 ` Greg KH
  2022-09-15 15:53                   ` Olof Johansson
  0 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2022-09-15 10:48 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel,
	Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang

On Wed, Sep 14, 2022 at 08:56:04PM -0700, Olof Johansson wrote:
> Bottom line: How was this code tested? This seems far from mature. It
> doesn't seem like a particularly obscure condition, and if it's this
> fragile it could create minefields for others down the road.

I've reverted it for now, let's get this worked out for later releases.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-15 10:48                 ` Greg KH
@ 2022-09-15 15:53                   ` Olof Johansson
  0 siblings, 0 replies; 10+ messages in thread
From: Olof Johansson @ 2022-09-15 15:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel,
	Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang

On Thu, Sep 15, 2022 at 3:47 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> I've reverted it for now, let's get this worked out for later releases.

Thanks Greg!

-Olof

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-09-15 15:54 UTC | newest]

Thread overview: 10+ messages
     [not found] <YuqDMLF2AQyj4+N1@kroah.com>
2022-09-12 17:23 ` [GIT PULL] Driver core changes for 6.0-rc1 Olof Johansson
2022-09-12 17:24   ` Olof Johansson
2022-09-13 15:15     ` Greg KH
2022-09-13 16:28       ` Olof Johansson
2022-09-14 14:00         ` Greg KH
2022-09-14 16:24           ` Olof Johansson
2022-09-14 17:35             ` Saravana Kannan
2022-09-15  3:56               ` Olof Johansson
2022-09-15 10:48                 ` Greg KH
2022-09-15 15:53                   ` Olof Johansson