* [GIT PULL] Driver core changes for 6.0-rc1
@ 2022-08-03 14:16 Greg KH
  2022-08-04 19:27 ` pr-tracker-bot
  2022-09-12 17:23 ` Olof Johansson
  0 siblings, 2 replies; 22+ messages in thread
From: Greg KH @ 2022-08-03 14:16 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Stephen Rothwell, Saravana Kannan

The following changes since commit f2906aa863381afb0015a9eb7fefad885d4e5a56:

  Linux 5.19-rc1 (2022-06-05 17:18:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git tags/driver-core-6.0-rc1

for you to fetch changes up to 273aaa24369cb8d0f246bb16f7122b91a1ef5188:

  docs: embargoed-hardware-issues: fix invalid AMD contact email (2022-07-29 16:10:04 +0200)

----------------------------------------------------------------
Driver core / kernfs changes for 6.0-rc1

Here is the set of driver core and kernfs changes for 6.0-rc1.

"biggest" thing in here is some scalability improvements for kernfs for
large systems.  Other than that, included in here are:
  - arch topology and cache info changes that have been reviewed
    and discussed a lot.
  - potential error path cleanup fixes
  - deferred driver probe cleanups
  - firmware loader cleanups and tweaks
  - documentation updates
  - other small things

All of these have been in the linux-next tree for a while with no
reported problems.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

----------------------------------------------------------------
Andy Shevchenko (2):
      driver core: Introduce device_find_any_child() helper
      spi: Use device_find_any_child() instead of custom approach

Dave Airlie (1):
      docs: driver-api: firmware: add driver firmware guidelines. (v3)

Duoming Zhou (2):
      devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm
      mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv

Fabio M. De Francesco (1):
      firmware_loader: Replace kmap() with kmap_local_page()

Florian Fainelli (1):
      MAINTAINERS: Change mentions of mpm to olivia

Greg Kroah-Hartman (4):
      Revert "mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv"
      Revert "devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm"
      Merge tag 'arch-cache-topo-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into driver-core-next
      docs: embargoed-hardware-issues: fix invalid AMD contact email

Imran Khan (5):
      kernfs: make ->attr.open RCU protected.
      kernfs: Change kernfs_notify_list to llist.
      kernfs: Introduce interface to access global kernfs_open_file_mutex.
      kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.
      Revert "kernfs: Change kernfs_notify_list to llist."

Ionela Voinescu (1):
      arch_topology: Limit span of cpu_clustergroup_mask()

Lee Jones (2):
      docs: ABI: sysfs-class-pwm: Update Lee Jones' email address
      docs: ABI: sysfs-devices-soc: Update Lee Jones' email address

Liang He (1):
      firmware: Hold a reference for of_find_compatible_node()

Lin Feng (1):
      kernfs/file.c: remove redundant error return counter assignment

Mauro Carvalho Chehab (1):
      ABI: testing/sysfs-devices-system-cpu: remove duplicated core_id

Nick Desaulniers (1):
      Documentation/process: Add embargoed HW contact for LLVM

Phil Auld (1):
      drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist

Randy Dunlap (1):
      kobject: fix Kconfig.debug "its" grammar

Saravana Kannan (11):
      PM: domains: Delete usage of driver_deferred_probe_check_state()
      pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
      net: mdio: Delete usage of driver_deferred_probe_check_state()
      driver core: Add wait_for_init_devices_probe helper function
      net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
      Revert "driver core: Set default deferred_probe_timeout back to 0."
      driver core: Set fw_devlink.strict=1 by default
      iommu/of: Delete usage of driver_deferred_probe_check_state()
      driver core: Delete driver_deferred_probe_check_state()
      driver core: fw_devlink: Allow firmware to mark devices as best effort
      of: base: Avoid console probe delay when fw_devlink.strict=1

Slark Xiao (2):
      kernfs: Fix typo 'the the' in comment
      sysfs docs: ABI: Fix typo in comment

Sudeep Holla (23):
      ACPI: PPTT: Use table offset as fw_token instead of virtual address
      cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
      cacheinfo: Add helper to access any cache index for a given CPU
      cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
      cacheinfo: Add support to check if last level cache(LLC) is valid or shared
      cacheinfo: Allow early detection and population of cache attributes
      cacheinfo: Use cache identifiers to check if the caches are shared if available
      cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability
      arch_topology: Add support to parse and detect cache attributes
      arch_topology: Use the last level cache information from the cacheinfo
      arm64: topology: Remove redundant setting of llc_id in CPU topology
      arch_topology: Drop LLC identifier stash from the CPU topology
      arch_topology: Set thread sibling cpumask only within the cluster
      arch_topology: Check for non-negative value rather than -1 for IDs validity
      arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
      arch_topology: Don't set cluster identifier as physical package identifier
      arch_topology: Set cluster identifier in each core/thread from /cpu-map
      arch_topology: Add support for parsing sockets in /cpu-map
      arch_topology: Warn that topology for nested clusters is not supported
      ACPI: Remove the unused find_acpi_cpu_cache_topology()
      cacheinfo: Use atomic allocation for percpu cache attributes
      ACPI: PPTT: Leave the table mapped for the runtime usage
      arch_topology: Fix cache attributes detection in the CPU hotplug path

Yangxi Xiang (1):
      devtmpfs: fix the dangling pointer of global devtmpfsd thread

Yushan Zhou (1):
      kernfs: fix potential NULL dereference in __kernfs_remove

Zhang Wensheng (1):
      driver core: fix potential deadlock in __driver_attach

 Documentation/ABI/stable/sysfs-module | 2 +-
 Documentation/ABI/testing/sysfs-class-pwm | 2 +-
 Documentation/ABI/testing/sysfs-class-rtrs-client | 2 +-
 Documentation/ABI/testing/sysfs-class-rtrs-server | 2 +-
 .../ABI/testing/sysfs-devices-platform-ACPI-TAD | 2 +-
 Documentation/ABI/testing/sysfs-devices-power | 2 +-
 Documentation/ABI/testing/sysfs-devices-soc | 14 +-
 Documentation/ABI/testing/sysfs-devices-system-cpu | 7 +-
 Documentation/driver-api/firmware/core.rst | 1 +
 .../firmware/firmware-usage-guidelines.rst | 44 +++++
 .../process/embargoed-hardware-issues.rst | 5 +-
 .../zh_CN/process/embargoed-hardware-issues.rst | 2 +-
 .../zh_TW/process/embargoed-hardware-issues.rst | 2 +-
 MAINTAINERS | 4 +-
 arch/arm64/kernel/topology.c | 14 --
 drivers/acpi/pptt.c | 142 +++++---------
 drivers/base/arch_topology.c | 100 +++++++---
 drivers/base/base.h | 1 +
 drivers/base/cacheinfo.c | 145 +++++++++------
 drivers/base/core.c | 123 ++++++++++++-
 drivers/base/dd.c | 59 +++---
 drivers/base/devtmpfs.c | 1 +
 drivers/base/firmware_loader/main.c | 4 +-
 drivers/base/firmware_loader/sysfs.c | 10 +-
 drivers/base/node.c | 4 +-
 drivers/base/power/domain.c | 2 +-
 drivers/base/topology.c | 32 ++--
 drivers/iommu/of_iommu.c | 2 +-
 drivers/net/mdio/fwnode_mdio.c | 4 +-
 drivers/of/base.c | 2 +
 drivers/pinctrl/devicetree.c | 2 +-
 drivers/spi/spi.c | 9 +-
 fs/kernfs/dir.c | 7 +-
 fs/kernfs/file.c | 205 ++++++++++++++-------
 fs/kernfs/kernfs-internal.h | 4 +
 fs/kernfs/mount.c | 19 ++
 include/linux/acpi.h | 5 -
 include/linux/arch_topology.h | 1 -
 include/linux/cacheinfo.h | 3 +
 include/linux/cpumask.h | 18 ++
 include/linux/device.h | 2 +
 include/linux/device/driver.h | 2 +-
 include/linux/firmware/trusted_foundations.h | 8 +-
 include/linux/fwnode.h | 4 +
 include/linux/kernfs.h | 59 +++++-
 lib/Kconfig.debug | 2 +-
 net/ipv4/ipconfig.c | 6 +
 47 files changed, 718 insertions(+), 374 deletions(-)
 create mode 100644 Documentation/driver-api/firmware/firmware-usage-guidelines.rst

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-08-03 14:16 [GIT PULL] Driver core changes for 6.0-rc1 Greg KH
@ 2022-08-04 19:27 ` pr-tracker-bot
  2022-09-12 17:23 ` Olof Johansson
  1 sibling, 0 replies; 22+ messages in thread
From: pr-tracker-bot @ 2022-08-04 19:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Saravana Kannan

The pull request you sent on Wed, 3 Aug 2022 16:16:16 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git tags/driver-core-6.0-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/cfeafd94668910334a77c9437a18212baf9f5610

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-08-03 14:16 [GIT PULL] Driver core changes for 6.0-rc1 Greg KH
@ 2022-09-12 17:23 ` Olof Johansson
  2022-09-12 17:23 ` Olof Johansson
  1 sibling, 0 replies; 22+ messages in thread
From: Olof Johansson @ 2022-09-12 17:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

Hi,

On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> Saravana Kannan (11):
>       PM: domains: Delete usage of driver_deferred_probe_check_state()
>       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
>       net: mdio: Delete usage of driver_deferred_probe_check_state()
>       driver core: Add wait_for_init_devices_probe helper function
>       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
>       Revert "driver core: Set default deferred_probe_timeout back to 0."
>       driver core: Set fw_devlink.strict=1 by default
>       iommu/of: Delete usage of driver_deferred_probe_check_state()
>       driver core: Delete driver_deferred_probe_check_state()
>       driver core: fw_devlink: Allow firmware to mark devices as best effort
>       of: base: Avoid console probe delay when fw_devlink.strict=1

The last patch in this list regresses my HoneyComb LX2K (ironically
the machine I do maintainer work on). It stops PCIe from probing, but
without a single message indicating why.

The reason seems to be that the iommu-maps property doesn't get
patched up by my (older) u-boot, and thus isn't a valid reference.
System works fine without IOMMU, which is how I've ran it for a couple
of years.

It's also extremely hard to diagnose out of the box because there are
*no error messages*. And there were no warnings leading up to this
strict enforcement.

This "feature" seems to have been done backwards. The checks should
have been running (and not skipped due to the "optional" flag), but
also not causing errors, just warnings. That would have given users a
chance to know that this is something that needs to be fixed.

And when you flip the switch, at least report what failed so that
people don't need to spend a whole night bisecting kernels, please.

Greg, mind reverting just the last one? If I hit this, I presume
others would too.

-Olof

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-12 17:23 ` Olof Johansson
@ 2022-09-12 17:24 ` Olof Johansson
  -1 siblings, 0 replies; 22+ messages in thread
From: Olof Johansson @ 2022-09-12 17:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote:
>
> Hi,
>
> On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> > Saravana Kannan (11):
> >       PM: domains: Delete usage of driver_deferred_probe_check_state()
> >       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
> >       net: mdio: Delete usage of driver_deferred_probe_check_state()
> >       driver core: Add wait_for_init_devices_probe helper function
> >       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
> >       Revert "driver core: Set default deferred_probe_timeout back to 0."
> >       driver core: Set fw_devlink.strict=1 by default
> >       iommu/of: Delete usage of driver_deferred_probe_check_state()
> >       driver core: Delete driver_deferred_probe_check_state()
> >       driver core: fw_devlink: Allow firmware to mark devices as best effort
> >       of: base: Avoid console probe delay when fw_devlink.strict=1
>
> The last patch in this list regresses my HoneyComb LX2K (ironically
> the machine I do maintainer work on). It stops PCIe from probing, but
> without a single message indicating why.
>
> The reason seems to be that the iommu-maps property doesn't get
> patched up by my (older) u-boot, and thus isn't a valid reference.
> System works fine without IOMMU, which is how I've ran it for a couple
> of years.
>
> It's also extremely hard to diagnose out of the box because there are
> *no error messages*. And there were no warnings leading up to this
> strict enforcement.
>
> This "feature" seems to have been done backwards. The checks should
> have been running (and not skipped due to the "optional" flag), but
> also not causing errors, just warnings. That would have given users a
> chance to know that this is something that needs to be fixed.
>
> And when you flip the switch, at least report what failed so that
> people don't need to spend a whole night bisecting kernels, please.
>
> Greg, mind reverting just the last one? If I hit this, I presume
> others would too.

Apologies, wrong patch pointed out. The culprit is "driver core: Set
fw_devlink.strict=1 by default", 71066545b48e42.

-Olof

^ permalink raw reply [flat|nested] 22+ messages in thread
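
Since the report above names "driver core: Set fw_devlink.strict=1 by default" as the culprit, anyone trying to reproduce it may want to know whether a given boot overrides that default. What follows is only a minimal sketch, assuming `fw_devlink.strict` is passed as a boolean `key=value` token on the kernel command line (as the patch subject suggests) and that the last occurrence takes effect. The function name and the accepted boolean spellings are illustrative assumptions, not taken from the thread or from the kernel source.

```python
# Hypothetical helper: reports an explicit fw_devlink.strict override found
# in a kernel command line string (for example the contents of /proc/cmdline).

# Assumption: only these common boolean spellings are recognised here; the
# kernel's own parser may accept more or fewer.
_TRUE = {"1", "y", "yes", "on", "true"}
_FALSE = {"0", "n", "no", "off", "false"}


def fw_devlink_strict_override(cmdline):
    """Return True or False if fw_devlink.strict is set explicitly, else None."""
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "fw_devlink.strict" or not sep:
            continue
        value = value.lower()
        if value in _TRUE:
            result = True      # later occurrences overwrite earlier ones
        elif value in _FALSE:
            result = False
        # unrecognised values are ignored and leave any earlier result intact
    return result
```

A return value of `None` means the command line carries no recognisable override, so the built-in default of the running kernel applies. On a live system the argument could be the text read from `/proc/cmdline`.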
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-12 17:24 ` Olof Johansson
@ 2022-09-13 15:15 ` Greg KH
  -1 siblings, 0 replies; 22+ messages in thread
From: Greg KH @ 2022-09-13 15:15 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote:
> On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote:
> >
> > Hi,
> >
> > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > > Saravana Kannan (11):
> > >       PM: domains: Delete usage of driver_deferred_probe_check_state()
> > >       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
> > >       net: mdio: Delete usage of driver_deferred_probe_check_state()
> > >       driver core: Add wait_for_init_devices_probe helper function
> > >       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
> > >       Revert "driver core: Set default deferred_probe_timeout back to 0."
> > >       driver core: Set fw_devlink.strict=1 by default
> > >       iommu/of: Delete usage of driver_deferred_probe_check_state()
> > >       driver core: Delete driver_deferred_probe_check_state()
> > >       driver core: fw_devlink: Allow firmware to mark devices as best effort
> > >       of: base: Avoid console probe delay when fw_devlink.strict=1
> >
> > The last patch in this list regresses my HoneyComb LX2K (ironically
> > the machine I do maintainer work on). It stops PCIe from probing, but
> > without a single message indicating why.
> >
> > The reason seems to be that the iommu-maps property doesn't get
> > patched up by my (older) u-boot, and thus isn't a valid reference.
> > System works fine without IOMMU, which is how I've ran it for a couple
> > of years.
> >
> > It's also extremely hard to diagnose out of the box because there are
> > *no error messages*. And there were no warnings leading up to this
> > strict enforcement.
> >
> > This "feature" seems to have been done backwards. The checks should
> > have been running (and not skipped due to the "optional" flag), but
> > also not causing errors, just warnings. That would have given users a
> > chance to know that this is something that needs to be fixed.
> >
> > And when you flip the switch, at least report what failed so that
> > people don't need to spend a whole night bisecting kernels, please.
> >
> > Greg, mind reverting just the last one? If I hit this, I presume
> > others would too.
>
> Apologies, wrong patch pointed out. The culprit is "driver core: Set
> fw_devlink.strict=1 by default", 71066545b48e42.

Is this still an issue in -rc5? A number of patches in the above series
was just reverted and hopefully should have resolved the issue you are
seeing.

thanks,

greg k-h

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-13 15:15 ` Greg KH
@ 2022-09-13 16:28 ` Olof Johansson
  -1 siblings, 0 replies; 22+ messages in thread
From: Olof Johansson @ 2022-09-13 16:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Saravana Kannan, Linux ARM Mailing List, Shawn Guo, Li Yang

On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote:
> > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > > Saravana Kannan (11):
> > > >       PM: domains: Delete usage of driver_deferred_probe_check_state()
> > > >       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
> > > >       net: mdio: Delete usage of driver_deferred_probe_check_state()
> > > >       driver core: Add wait_for_init_devices_probe helper function
> > > >       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
> > > >       Revert "driver core: Set default deferred_probe_timeout back to 0."
> > > >       driver core: Set fw_devlink.strict=1 by default
> > > >       iommu/of: Delete usage of driver_deferred_probe_check_state()
> > > >       driver core: Delete driver_deferred_probe_check_state()
> > > >       driver core: fw_devlink: Allow firmware to mark devices as best effort
> > > >       of: base: Avoid console probe delay when fw_devlink.strict=1
> > >
> > > The last patch in this list regresses my HoneyComb LX2K (ironically
> > > the machine I do maintainer work on). It stops PCIe from probing, but
> > > without a single message indicating why.
> > >
> > > The reason seems to be that the iommu-maps property doesn't get
> > > patched up by my (older) u-boot, and thus isn't a valid reference.
> > > System works fine without IOMMU, which is how I've ran it for a couple
> > > of years.
> > >
> > > It's also extremely hard to diagnose out of the box because there are
> > > *no error messages*. And there were no warnings leading up to this
> > > strict enforcement.
> > >
> > > This "feature" seems to have been done backwards. The checks should
> > > have been running (and not skipped due to the "optional" flag), but
> > > also not causing errors, just warnings. That would have given users a
> > > chance to know that this is something that needs to be fixed.
> > >
> > > And when you flip the switch, at least report what failed so that
> > > people don't need to spend a whole night bisecting kernels, please.
> > >
> > > Greg, mind reverting just the last one? If I hit this, I presume
> > > others would too.
> >
> > Apologies, wrong patch pointed out. The culprit is "driver core: Set
> > fw_devlink.strict=1 by default", 71066545b48e42.
>
> Is this still an issue in -rc5? A number of patches in the above series
> was just reverted and hopefully should have resolved the issue you are
> seeing.

Unfortunately, I discovered this regression with -rc5 in the first
place, so it's still there.

-Olof

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1
  2022-09-13 16:28 ` Olof Johansson
@ 2022-09-14 14:00 ` Greg KH
  -1 siblings, 0 replies; 22+ messages in thread
From: Greg KH @ 2022-09-14 14:00 UTC (permalink / raw)
  To: Saravana Kannan, Olof Johansson
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell,
      Linux ARM Mailing List, Shawn Guo, Li Yang

On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote:
> On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote:
> > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > > Saravana Kannan (11):
> > > > >       PM: domains: Delete usage of driver_deferred_probe_check_state()
> > > > >       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
> > > > >       net: mdio: Delete usage of driver_deferred_probe_check_state()
> > > > >       driver core: Add wait_for_init_devices_probe helper function
> > > > >       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
> > > > >       Revert "driver core: Set default deferred_probe_timeout back to 0."
> > > > >       driver core: Set fw_devlink.strict=1 by default
> > > > >       iommu/of: Delete usage of driver_deferred_probe_check_state()
> > > > >       driver core: Delete driver_deferred_probe_check_state()
> > > > >       driver core: fw_devlink: Allow firmware to mark devices as best effort
> > > > >       of: base: Avoid console probe delay when fw_devlink.strict=1
> > > >
> > > > The last patch in this list regresses my HoneyComb LX2K (ironically
> > > > the machine I do maintainer work on). It stops PCIe from probing, but
> > > > without a single message indicating why.
> > > >
> > > > The reason seems to be that the iommu-maps property doesn't get
> > > > patched up by my (older) u-boot, and thus isn't a valid reference.
> > > > System works fine without IOMMU, which is how I've ran it for a couple
> > > > of years.
> > > >
> > > > It's also extremely hard to diagnose out of the box because there are
> > > > *no error messages*. And there were no warnings leading up to this
> > > > strict enforcement.
> > > >
> > > > This "feature" seems to have been done backwards. The checks should
> > > > have been running (and not skipped due to the "optional" flag), but
> > > > also not causing errors, just warnings. That would have given users a
> > > > chance to know that this is something that needs to be fixed.
> > > >
> > > > And when you flip the switch, at least report what failed so that
> > > > people don't need to spend a whole night bisecting kernels, please.
> > > >
> > > > Greg, mind reverting just the last one? If I hit this, I presume
> > > > others would too.
> > >
> > > Apologies, wrong patch pointed out. The culprit is "driver core: Set
> > > fw_devlink.strict=1 by default", 71066545b48e42.
> >
> > Is this still an issue in -rc5? A number of patches in the above series
> > was just reverted and hopefully should have resolved the issue you are
> > seeing.
>
> Unfortunately, I discovered this regression with -rc5 in the first
> place, so it's still there.

Ick, ok, Saravana, any thoughts? I know you're at the conference this
week with me, maybe you can give Olof a hint as to what to look for
here?

thanks,

greg k-h

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1 2022-09-14 14:00 ` Greg KH @ 2022-09-14 16:24 ` Olof Johansson -1 siblings, 0 replies; 22+ messages in thread From: Olof Johansson @ 2022-09-14 16:24 UTC (permalink / raw) To: Greg KH Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang Hi, On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote: > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote: > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > Saravana Kannan (11): > > > > > > PM: domains: Delete usage of driver_deferred_probe_check_state() > > > > > > pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() > > > > > > net: mdio: Delete usage of driver_deferred_probe_check_state() > > > > > > driver core: Add wait_for_init_devices_probe helper function > > > > > > net: ipconfig: Relax fw_devlink if we need to mount a network rootfs > > > > > > Revert "driver core: Set default deferred_probe_timeout back to 0." > > > > > > driver core: Set fw_devlink.strict=1 by default > > > > > > iommu/of: Delete usage of driver_deferred_probe_check_state() > > > > > > driver core: Delete driver_deferred_probe_check_state() > > > > > > driver core: fw_devlink: Allow firmware to mark devices as best effort > > > > > > of: base: Avoid console probe delay when fw_devlink.strict=1 > > > > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically > > > > > the machine I do maintainer work on). It stops PCIe from probing, but > > > > > without a single message indicating why. 
> > > > > > > > > > The reason seems to be that the iommu-maps property doesn't get > > > > > patched up by my (older) u-boot, and thus isn't a valid reference. > > > > > System works fine without IOMMU, which is how I've ran it for a couple > > > > > of years. > > > > > > > > > > It's also extremely hard to diagnose out of the box because there are > > > > > *no error messages*. And there were no warnings leading up to this > > > > > strict enforcement. > > > > > > > > > > This "feature" seems to have been done backwards. The checks should > > > > > have been running (and not skipped due to the "optional" flag), but > > > > > also not causing errors, just warnings. That would have given users a > > > > > chance to know that this is something that needs to be fixed. > > > > > > > > > > And when you flip the switch, at least report what failed so that > > > > > people don't need to spend a whole night bisecting kernels, please. > > > > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume > > > > > others would too. > > > > > > > > Apologies, wrong patch pointed out. The culprit is "driver core: Set > > > > fw_devlink.strict=1 by default", 71066545b48e42. > > > > > > Is this still an issue in -rc5? A number of patches in the above series > > > was just reverted and hopefully should have resolved the issue you are > > > seeing. > > > > Unfortunately, I discovered this regression with -rc5 in the first > > place, so it's still there. > > Ick, ok, Saravana, any thoughts? I know you're at the conference this > week with me, maybe you can give Olof a hint as to what to look for > here?

I'm not sure what you want me to look for. The patch turns on
enforcement of DT contents that never used to be enforced, so now my
computer no longer boots. And it does it in a way that makes it
impossible for someone who isn't rebuilding kernels to debug it and
figure out what happened.

The patch needs to be reverted.

-Olof

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1 2022-09-14 16:24 ` Olof Johansson @ 2022-09-14 17:35 ` Saravana Kannan -1 siblings, 0 replies; 22+ messages in thread From: Saravana Kannan @ 2022-09-14 17:35 UTC (permalink / raw) To: Olof Johansson Cc: Greg KH, Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson <olof@lixom.net> wrote: > > Hi, > > On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote: > > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote: > > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > Saravana Kannan (11): > > > > > > > PM: domains: Delete usage of driver_deferred_probe_check_state() > > > > > > > pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() > > > > > > > net: mdio: Delete usage of driver_deferred_probe_check_state() > > > > > > > driver core: Add wait_for_init_devices_probe helper function > > > > > > > net: ipconfig: Relax fw_devlink if we need to mount a network rootfs > > > > > > > Revert "driver core: Set default deferred_probe_timeout back to 0." 
> > > > > > > driver core: Set fw_devlink.strict=1 by default > > > > > > > iommu/of: Delete usage of driver_deferred_probe_check_state() > > > > > > > driver core: Delete driver_deferred_probe_check_state() > > > > > > > driver core: fw_devlink: Allow firmware to mark devices as best effort > > > > > > > of: base: Avoid console probe delay when fw_devlink.strict=1 > > > > > > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically > > > > > > the machine I do maintainer work on). It stops PCIe from probing, but > > > > > > without a single message indicating why. > > > > > > > > > > > > The reason seems to be that the iommu-maps property doesn't get > > > > > > patched up by my (older) u-boot, and thus isn't a valid reference. > > > > > > System works fine without IOMMU, which is how I've ran it for a couple > > > > > > of years. > > > > > > > > > > > > It's also extremely hard to diagnose out of the box because there are > > > > > > *no error messages*. And there were no warnings leading up to this > > > > > > strict enforcement. > > > > > > > > > > > > This "feature" seems to have been done backwards. The checks should > > > > > > have been running (and not skipped due to the "optional" flag), but > > > > > > also not causing errors, just warnings. That would have given users a > > > > > > chance to know that this is something that needs to be fixed. > > > > > > > > > > > > And when you flip the switch, at least report what failed so that > > > > > > people don't need to spend a whole night bisecting kernels, please. > > > > > > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume > > > > > > others would too. > > > > > > > > > > Apologies, wrong patch pointed out. The culprit is "driver core: Set > > > > > fw_devlink.strict=1 by default", 71066545b48e42. > > > > > > > > Is this still an issue in -rc5? 
A number of patches in the above series > > > > was just reverted and hopefully should have resolved the issue you are > > > > seeing. > > > > > > Unfortunately, I discovered this regression with -rc5 in the first > > > place, so it's still there. > > > > Ick, ok, Saravana, any thoughts? I know you're at the conference this > > week with me, maybe you can give Olof a hint as to what to look for > > here? > > I'm not sure what you want me to look for. The patch turns on > enforcement of DT contents that never used to be enforced, so now my > computer no longer boots. And it does it in a way that makes it > impossible for someone not rebuilding kernels to debug to figure out > what happened.

Hi Olof,

Sorry for the trouble. It doesn't print any error messages because
there are cases where it blocks the probe and it wouldn't be an
error. If I printed it every time fw_devlink blocked a probe, it'd be
a ton of messages.

Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes
that'll stop indefinitely blocking probes. So what you are seeing
shouldn't be happening. After about 10 seconds (configurable), it
should stop blocking the probes.

If you actually want to see the reasons the probe is being blocked,
you can enable the existing dbg messages in drivers/base/core.c.

Would you mind pointing me to the dts (not dtsi) file that corresponds
to this board please? And which specific PCI device is being blocked
from probing? I can try to debug it further. Also, can you try to
see why it doesn't get unblocked when driver_deferred_probe_timeout
expires? Or why that's not helping here?

In the meantime, if you want this patch reverted, I'm not opposed to
that. But if you can use fw_devlink.strict=0 in your commandline for
now and give me time to debug, that'd be nicer.

Thanks,
Saravana

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1 2022-09-14 17:35 ` Saravana Kannan @ 2022-09-15 3:56 ` Olof Johansson -1 siblings, 0 replies; 22+ messages in thread From: Olof Johansson @ 2022-09-15 3:56 UTC (permalink / raw) To: Saravana Kannan Cc: Greg KH, Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang On Wed, Sep 14, 2022 at 10:36 AM Saravana Kannan <saravanak@google.com> wrote: > > On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson <olof@lixom.net> wrote: > > > > Hi, > > > > On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote: > > > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote: > > > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > Saravana Kannan (11): > > > > > > > > PM: domains: Delete usage of driver_deferred_probe_check_state() > > > > > > > > pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() > > > > > > > > net: mdio: Delete usage of driver_deferred_probe_check_state() > > > > > > > > driver core: Add wait_for_init_devices_probe helper function > > > > > > > > net: ipconfig: Relax fw_devlink if we need to mount a network rootfs > > > > > > > > Revert "driver core: Set default deferred_probe_timeout back to 0." 
> > > > > > > > driver core: Set fw_devlink.strict=1 by default > > > > > > > > iommu/of: Delete usage of driver_deferred_probe_check_state() > > > > > > > > driver core: Delete driver_deferred_probe_check_state() > > > > > > > > driver core: fw_devlink: Allow firmware to mark devices as best effort > > > > > > > > of: base: Avoid console probe delay when fw_devlink.strict=1 > > > > > > > > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically > > > > > > > the machine I do maintainer work on). It stops PCIe from probing, but > > > > > > > without a single message indicating why. > > > > > > > > > > > > > > The reason seems to be that the iommu-maps property doesn't get > > > > > > > patched up by my (older) u-boot, and thus isn't a valid reference. > > > > > > > System works fine without IOMMU, which is how I've ran it for a couple > > > > > > > of years. > > > > > > > > > > > > > > It's also extremely hard to diagnose out of the box because there are > > > > > > > *no error messages*. And there were no warnings leading up to this > > > > > > > strict enforcement. > > > > > > > > > > > > > > This "feature" seems to have been done backwards. The checks should > > > > > > > have been running (and not skipped due to the "optional" flag), but > > > > > > > also not causing errors, just warnings. That would have given users a > > > > > > > chance to know that this is something that needs to be fixed. > > > > > > > > > > > > > > And when you flip the switch, at least report what failed so that > > > > > > > people don't need to spend a whole night bisecting kernels, please. > > > > > > > > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume > > > > > > > others would too. > > > > > > > > > > > > Apologies, wrong patch pointed out. The culprit is "driver core: Set > > > > > > fw_devlink.strict=1 by default", 71066545b48e42. > > > > > > > > > > Is this still an issue in -rc5? 
A number of patches in the above series > > > > > was just reverted and hopefully should have resolved the issue you are > > > > > seeing. > > > > > > > > Unfortunately, I discovered this regression with -rc5 in the first > > > > place, so it's still there. > > > > > > Ick, ok, Saravana, any thoughts? I know you're at the conference this > > > week with me, maybe you can give Olof a hint as to what to look for > > > here? > > > > I'm not sure what you want me to look for. The patch turns on > > enforcement of DT contents that never used to be enforced, so now my > > computer no longer boots. And it does it in a way that makes it > > impossible for someone not rebuilding kernels to debug to figure out > > what happened. > > Hi Olof, > > Sorry for the trouble. It doesn't print any error messages because > there are cases where it's block the probe where it wouldn't be an > error. If I printed it every time fw_devlink blocked a probe, it'd be > a ton of messages. > > Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes > that'll stop indefinitely blocking probes. So what you are seeing > shouldn't be happening. After about 10 seconds (configurable), it > should stop blocking the probes.

"Shouldn't be happening" is a pretty bold statement.

It's not actually stuck on timeout in my case, and doesn't recover.

Instead, what seems to be happening is that the PCIe driver, which
registers as a platform_driver here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c#n255

ends up registering, and the driver core now refuses to probe the
matching devices, since they no longer have their suppliers fulfilled
(the smmu suppliers would not be tracked since they are optional here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/property.c#n1449

).

So what happens is that the driver registration succeeds, but there
have been no devices matched to it. So when it returns to the platform
core, it thinks there are no devices bound to this driver, so it
should be unregistered:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n951

That explains why the pcie core just disappears and stops retrying.

This is what it looks like with CONFIG_DEBUG_DRIVER and
CONFIG_DEBUG_DEVRES:

[ 5.178538] bus: 'platform': add driver layerscape-pcie-gen4
[ 5.184301] bus: 'platform': __driver_probe_device: matched device 3600000.pcie with driver layerscape-pcie-gen4
[ 5.194498] platform 3600000.pcie: error -EPROBE_DEFER: supplier 5000000.iommu not ready
[ 5.202607] platform 3600000.pcie: Added to deferred list
[ 5.208024] bus: 'platform': __driver_probe_device: matched device 3800000.pcie with driver layerscape-pcie-gen4
[ 5.218227] platform 3800000.pcie: error -EPROBE_DEFER: supplier 5000000.iommu not ready
[ 5.226333] platform 3800000.pcie: Added to deferred list
[ 5.231814] bus: 'platform': remove driver layerscape-pcie-gen4
[ 5.237761] driver: 'layerscape-pcie-gen4': driver_release

Note that the platform driver registration sets flags to disable async
probing, supposedly so it can assume that any matching devices would
be found by the time registration returns:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n917

	/*
	 * We have to run our probes synchronously because we check if
	 * we find any devices to bind to and exit with error if there
	 * are any.
	 */
	drv->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;

	/*
	 * Prevent driver from requesting probe deferral to avoid further
	 * futile probe attempts.
	 */
	drv->prevent_deferred_probe = true;

Bottom line: How was this code tested? This seems far from mature.
This doesn't seem like that obscure a condition to hit, and it could
create minefields for others down the road if it's fragile.

-Olof

> If you actually want to see the reasons the probe is being blocked,
> you can enable the existing dbg messages in drivers/base/core.c.
>
> Would you mind pointing me to the dts (not dtsi) file that corresponds
> to this board please? And which specific PCI device is being blocked
> from probing? I'll can try to debug it further. Also, can you try to
> see why it doesn't get unblocked when driver_deferred_probe_timeout
> expires? Or why that's not helping here?
>
> In the meantime, if you want this patch reverted, I'm not opposed to
> that. But if you can use fw_devlinks.strict=0 in your commandline for
> now and give me time to debug, that'd be nicer.
>
> Thanks,
> Saravana

^ permalink raw reply [flat|nested] 22+ messages in thread
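The failure mode described in the message above can be sketched as a
small userspace model in C. This is a toy illustration, not kernel
code: every identifier (toy_device, toy_driver, toy_probe_one,
toy_one_shot_register) is hypothetical, and the two error constants
only mirror the usual Linux values of EPROBE_DEFER and ENODEV. It
assumes a registration path that probes all devices synchronously,
exactly once, and drops the driver again if nothing bound, which is
the behaviour Olof describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants; they mirror the usual Linux errno values. */
#define TOY_ENODEV        19
#define TOY_EPROBE_DEFER  517

/* Toy stand-in for a platform device with one supplier (e.g. an IOMMU). */
struct toy_device {
	const char *name;
	bool supplier_ready;	/* has the supplier probed yet? */
	bool bound;		/* is a driver bound to this device? */
};

/* Toy stand-in for a driver that is registered with the bus. */
struct toy_driver {
	const char *name;
	bool registered;
};

/* Try to bind one device; defer when its supplier is not ready. */
static int toy_probe_one(struct toy_device *dev)
{
	if (!dev->supplier_ready)
		return -TOY_EPROBE_DEFER;
	dev->bound = true;
	return 0;
}

/*
 * One-shot registration: probe every device synchronously, then
 * unregister the driver if no device ended up bound. Deferred
 * devices are never retried, because the driver is gone by then.
 */
static int toy_one_shot_register(struct toy_driver *drv,
				 struct toy_device *devs, size_t ndevs)
{
	size_t i, nbound = 0;

	assert(drv != NULL);
	drv->registered = true;

	for (i = 0; i < ndevs; i++)
		if (toy_probe_one(&devs[i]) == 0)
			nbound++;

	if (nbound == 0) {
		drv->registered = false;	/* "remove driver" */
		return -TOY_ENODEV;
	}
	return 0;
}
```

With both devices still waiting on their supplier,
toy_one_shot_register() returns -TOY_ENODEV and leaves the driver
unregistered, so nothing is left to retry the deferred devices once
the supplier shows up or a timeout expires. That corresponds to the
"remove driver" line at the end of the log above.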
* Re: [GIT PULL] Driver core changes for 6.0-rc1 2022-09-15 3:56 ` Olof Johansson @ 2022-09-15 10:48 ` Greg KH -1 siblings, 0 replies; 22+ messages in thread From: Greg KH @ 2022-09-15 10:48 UTC (permalink / raw) To: Olof Johansson Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang On Wed, Sep 14, 2022 at 08:56:04PM -0700, Olof Johansson wrote: > On Wed, Sep 14, 2022 at 10:36 AM Saravana Kannan <saravanak@google.com> wrote: > > > > On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > Hi, > > > > > > On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote: > > > > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote: > > > > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > Saravana Kannan (11): > > > > > > > > > PM: domains: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > net: mdio: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > driver core: Add wait_for_init_devices_probe helper function > > > > > > > > > net: ipconfig: Relax fw_devlink if we need to mount a network rootfs > > > > > > > > > Revert "driver core: Set default deferred_probe_timeout back to 0." 
> > > > > > > > > driver core: Set fw_devlink.strict=1 by default > > > > > > > > > iommu/of: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > driver core: Delete driver_deferred_probe_check_state() > > > > > > > > > driver core: fw_devlink: Allow firmware to mark devices as best effort > > > > > > > > > of: base: Avoid console probe delay when fw_devlink.strict=1 > > > > > > > > > > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically > > > > > > > > the machine I do maintainer work on). It stops PCIe from probing, but > > > > > > > > without a single message indicating why. > > > > > > > > > > > > > > > > The reason seems to be that the iommu-maps property doesn't get > > > > > > > > patched up by my (older) u-boot, and thus isn't a valid reference. > > > > > > > > System works fine without IOMMU, which is how I've ran it for a couple > > > > > > > > of years. > > > > > > > > > > > > > > > > It's also extremely hard to diagnose out of the box because there are > > > > > > > > *no error messages*. And there were no warnings leading up to this > > > > > > > > strict enforcement. > > > > > > > > > > > > > > > > This "feature" seems to have been done backwards. The checks should > > > > > > > > have been running (and not skipped due to the "optional" flag), but > > > > > > > > also not causing errors, just warnings. That would have given users a > > > > > > > > chance to know that this is something that needs to be fixed. > > > > > > > > > > > > > > > > And when you flip the switch, at least report what failed so that > > > > > > > > people don't need to spend a whole night bisecting kernels, please. > > > > > > > > > > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume > > > > > > > > others would too. > > > > > > > > > > > > > > Apologies, wrong patch pointed out. The culprit is "driver core: Set > > > > > > > fw_devlink.strict=1 by default", 71066545b48e42. 
> > > > > > > > > > > > Is this still an issue in -rc5? A number of patches in the above series > > > > > > was just reverted and hopefully should have resolved the issue you are > > > > > > seeing. > > > > > > > > > > Unfortunately, I discovered this regression with -rc5 in the first > > > > > place, so it's still there. > > > > > > > > Ick, ok, Saravana, any thoughts? I know you're at the conference this > > > > week with me, maybe you can give Olof a hint as to what to look for > > > > here? > > > > > > I'm not sure what you want me to look for. The patch turns on > > > enforcement of DT contents that never used to be enforced, so now my > > > computer no longer boots. And it does it in a way that makes it > > > impossible for someone not rebuilding kernels to debug to figure out > > > what happened. > > > > Hi Olof, > > > > Sorry for the trouble. It doesn't print any error messages because > > there are cases where it's block the probe where it wouldn't be an > > error. If I printed it every time fw_devlink blocked a probe, it'd be > > a ton of messages. > > > > Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes > > that'll stop indefinitely blocking probes. So what you are seeing > > shouldn't be happening. After about 10 seconds (configurable), it > > should stop blocking the probes. > > "Shouldn't be happening" is a pretty bold statement. It's not actually > stuck on timeout in my case, and doesn't recover. 
> > Instead, what seems to be happening is that the PCIe driver, which > registers as a platform_driver here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c#n255 > > ends up registering, and the driver core now refuses to try to probe > the device matches, since they no longer have their suppliers > fulfilled (the smmu suppliers would not be tracked since they are > optional here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/property.c#n1449 > > So what happens is that the driver registration succeeds, but there > have been no devices matched to it. So when it returns to the platform > core, it thinks there are no devices bound to this driver, so it > should be unregistered: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n951 > > That explains why the pcie core doesn't retry and just disappears, and > stops retrying. 
>
> This is what it looks like with CONFIG_DEBUG_DRIVER and CONFIG_DEBUG_DEVRES:
> [ 5.178538] bus: 'platform': add driver layerscape-pcie-gen4
> [ 5.184301] bus: 'platform': __driver_probe_device: matched device
> 3600000.pcie with driver layerscape-pcie-gen4
> [ 5.194498] platform 3600000.pcie: error -EPROBE_DEFER: supplier
> 5000000.iommu not ready
> [ 5.202607] platform 3600000.pcie: Added to deferred list
> [ 5.208024] bus: 'platform': __driver_probe_device: matched device
> 3800000.pcie with driver layerscape-pcie-gen4
> [ 5.218227] platform 3800000.pcie: error -EPROBE_DEFER: supplier
> 5000000.iommu not ready
> [ 5.226333] platform 3800000.pcie: Added to deferred list
> [ 5.231814] bus: 'platform': remove driver layerscape-pcie-gen4
> [ 5.237761] driver: 'layerscape-pcie-gen4': driver_release
>
> Note that the platform driver registration sets flags to disable async
> probing, supposedly so it can assume that any matching devices would
> be found by the time registration returns:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n917
> :
>
> /*
>  * We have to run our probes synchronously because we check if
>  * we find any devices to bind to and exit with error if there
>  * are any.
>  */
> drv->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>
> /*
>  * Prevent driver from requesting probe deferral to avoid further
>  * futile probe attempts.
>  */
> drv->prevent_deferred_probe = true;
>
>
>
>
> Bottom line: How was this code tested? This seems far from mature,
> this doesn't seem like that of an obscure condition to occur and it
> could create minefields for others down the road if it's fragile.

I've reverted it for now, let's get this worked out for later releases.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [GIT PULL] Driver core changes for 6.0-rc1 2022-09-15 10:48 ` Greg KH @ 2022-09-15 15:53 ` Olof Johansson -1 siblings, 0 replies; 22+ messages in thread From: Olof Johansson @ 2022-09-15 15:53 UTC (permalink / raw) To: Greg KH Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang On Thu, Sep 15, 2022 at 3:47 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Wed, Sep 14, 2022 at 08:56:04PM -0700, Olof Johansson wrote: > > On Wed, Sep 14, 2022 at 10:36 AM Saravana Kannan <saravanak@google.com> wrote: > > > > > > On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > Hi, > > > > > > > > On Wed, Sep 14, 2022 at 7:00 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote: > > > > > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote: > > > > > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson <olof@lixom.net> wrote: > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > On Wed, Aug 3, 2022 at 7:16 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > > Saravana Kannan (11): > > > > > > > > > > PM: domains: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > > pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > > net: mdio: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > > driver core: Add wait_for_init_devices_probe helper function > > > > > > > > > > net: ipconfig: Relax fw_devlink if we need to mount a network rootfs > > > > > > > > > > Revert "driver core: Set default deferred_probe_timeout back to 0." 
> > > > > > > > > > driver core: Set fw_devlink.strict=1 by default > > > > > > > > > > iommu/of: Delete usage of driver_deferred_probe_check_state() > > > > > > > > > > driver core: Delete driver_deferred_probe_check_state() > > > > > > > > > > driver core: fw_devlink: Allow firmware to mark devices as best effort > > > > > > > > > > of: base: Avoid console probe delay when fw_devlink.strict=1 > > > > > > > > > > > > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically > > > > > > > > > the machine I do maintainer work on). It stops PCIe from probing, but > > > > > > > > > without a single message indicating why. > > > > > > > > > > > > > > > > > > The reason seems to be that the iommu-maps property doesn't get > > > > > > > > > patched up by my (older) u-boot, and thus isn't a valid reference. > > > > > > > > > System works fine without IOMMU, which is how I've ran it for a couple > > > > > > > > > of years. > > > > > > > > > > > > > > > > > > It's also extremely hard to diagnose out of the box because there are > > > > > > > > > *no error messages*. And there were no warnings leading up to this > > > > > > > > > strict enforcement. > > > > > > > > > > > > > > > > > > This "feature" seems to have been done backwards. The checks should > > > > > > > > > have been running (and not skipped due to the "optional" flag), but > > > > > > > > > also not causing errors, just warnings. That would have given users a > > > > > > > > > chance to know that this is something that needs to be fixed. > > > > > > > > > > > > > > > > > > And when you flip the switch, at least report what failed so that > > > > > > > > > people don't need to spend a whole night bisecting kernels, please. > > > > > > > > > > > > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume > > > > > > > > > others would too. > > > > > > > > > > > > > > > > Apologies, wrong patch pointed out. 
The culprit is "driver core: Set > > > > > > > > fw_devlink.strict=1 by default", 71066545b48e42. > > > > > > > > > > > > > > Is this still an issue in -rc5? A number of patches in the above series > > > > > > > was just reverted and hopefully should have resolved the issue you are > > > > > > > seeing. > > > > > > > > > > > > Unfortunately, I discovered this regression with -rc5 in the first > > > > > > place, so it's still there. > > > > > > > > > > Ick, ok, Saravana, any thoughts? I know you're at the conference this > > > > > week with me, maybe you can give Olof a hint as to what to look for > > > > > here? > > > > > > > > I'm not sure what you want me to look for. The patch turns on > > > > enforcement of DT contents that never used to be enforced, so now my > > > > computer no longer boots. And it does it in a way that makes it > > > > impossible for someone not rebuilding kernels to debug to figure out > > > > what happened. > > > > > > Hi Olof, > > > > > > Sorry for the trouble. It doesn't print any error messages because > > > there are cases where it's block the probe where it wouldn't be an > > > error. If I printed it every time fw_devlink blocked a probe, it'd be > > > a ton of messages. > > > > > > Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes > > > that'll stop indefinitely blocking probes. So what you are seeing > > > shouldn't be happening. After about 10 seconds (configurable), it > > > should stop blocking the probes. > > > > "Shouldn't be happening" is a pretty bold statement. It's not actually > > stuck on timeout in my case, and doesn't recover. 
> > > > Instead, what seems to be happening is that the PCIe driver, which > > registers as a platform_driver here: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c#n255 > > > > ends up registering, and the driver core now refuses to try to probe > > the device matches, since they no longer have their suppliers > > fulfilled (the smmu suppliers would not be tracked since they are > > optional here: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/property.c#n1449 > > > > So what happens is that the driver registration succeeds, but there > > have been no devices matched to it. So when it returns to the platform > > core, it thinks there are no devices bound to this driver, so it > > should be unregistered: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n951 > > > > That explains why the pcie core doesn't retry and just disappears, and > > stops retrying. 
> >
> > This is what it looks like with CONFIG_DEBUG_DRIVER and CONFIG_DEBUG_DEVRES:
> > [ 5.178538] bus: 'platform': add driver layerscape-pcie-gen4
> > [ 5.184301] bus: 'platform': __driver_probe_device: matched device
> > 3600000.pcie with driver layerscape-pcie-gen4
> > [ 5.194498] platform 3600000.pcie: error -EPROBE_DEFER: supplier
> > 5000000.iommu not ready
> > [ 5.202607] platform 3600000.pcie: Added to deferred list
> > [ 5.208024] bus: 'platform': __driver_probe_device: matched device
> > 3800000.pcie with driver layerscape-pcie-gen4
> > [ 5.218227] platform 3800000.pcie: error -EPROBE_DEFER: supplier
> > 5000000.iommu not ready
> > [ 5.226333] platform 3800000.pcie: Added to deferred list
> > [ 5.231814] bus: 'platform': remove driver layerscape-pcie-gen4
> > [ 5.237761] driver: 'layerscape-pcie-gen4': driver_release
> >
> > Note that the platform driver registration sets flags to disable async
> > probing, supposedly so it can assume that any matching devices would
> > be found by the time registration returns:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n917
> > :
> >
> > /*
> >  * We have to run our probes synchronously because we check if
> >  * we find any devices to bind to and exit with error if there
> >  * are any.
> >  */
> > drv->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
> >
> > /*
> >  * Prevent driver from requesting probe deferral to avoid further
> >  * futile probe attempts.
> >  */
> > drv->prevent_deferred_probe = true;
> >
> >
> >
> >
> > Bottom line: How was this code tested? This seems far from mature,
> > this doesn't seem like that of an obscure condition to occur and it
> > could create minefields for others down the road if it's fragile.
>
> I've reverted it for now, let's get this worked out for later releases.

Thanks Greg!

-Olof

^ permalink raw reply [flat|nested] 22+ messages in thread
Thread overview: 22+ messages
2022-08-03 14:16 [GIT PULL] Driver core changes for 6.0-rc1 Greg KH
2022-08-04 19:27 ` pr-tracker-bot
2022-09-12 17:23 ` Olof Johansson
2022-09-12 17:23 ` Olof Johansson
2022-09-12 17:24 ` Olof Johansson
2022-09-12 17:24 ` Olof Johansson
2022-09-13 15:15 ` Greg KH
2022-09-13 15:15 ` Greg KH
2022-09-13 16:28 ` Olof Johansson
2022-09-13 16:28 ` Olof Johansson
2022-09-14 14:00 ` Greg KH
2022-09-14 14:00 ` Greg KH
2022-09-14 16:24 ` Olof Johansson
2022-09-14 16:24 ` Olof Johansson
2022-09-14 17:35 ` Saravana Kannan
2022-09-14 17:35 ` Saravana Kannan
2022-09-15 3:56 ` Olof Johansson
2022-09-15 3:56 ` Olof Johansson
2022-09-15 10:48 ` Greg KH
2022-09-15 10:48 ` Greg KH
2022-09-15 15:53 ` Olof Johansson
2022-09-15 15:53 ` Olof Johansson