linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
       [not found] ` <20201218031703.3053753-6-saravanak@google.com>
@ 2021-01-18 17:39   ` Geert Uytterhoeven
  2021-01-18 17:59     ` Marc Zyngier
       [not found]   ` <CGME20210111141814eucas1p1f388df07b789693a999042b27f0d8c2a@eucas1p1.samsung.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-18 17:39 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Android Kernel Team,
	Linux Kernel Mailing List, Jisheng Zhang, Kevin Hilman,
	John Stultz, Nicolas Saenz Julienne, Marc Zyngier,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com> wrote:
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
>   worry about functional dependencies between modules (depmod is still
>   needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it.  If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Shimoda-san reported that next-20210111 and later fail to boot
on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
is enabled.

I have bisected this to commit e590474768f1cc04 ("driver core: Set
fw_devlink=on by default").

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
       [not found]             ` <CAGETcx_QY3h83q1fSr=h_vMQdH-TMhVYPozPuSr=q4uv2Lr48w@mail.gmail.com>
@ 2021-01-18 17:43               ` Geert Uytterhoeven
  0 siblings, 0 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-18 17:43 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marek Szyprowski, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, LKML, Jisheng Zhang, Kevin Hilman,
	John Stultz, Nicolas Saenz Julienne, Marc Zyngier,
	Linux Samsung SOC, Krzysztof Kozlowski,
	Bartlomiej Zolnierkiewicz, Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Wed, Jan 13, 2021 at 3:34 AM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > On 11.01.2021 22:47, Saravana Kannan wrote:
> > > On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> > > <m.szyprowski@samsung.com> wrote:
> > >> On 11.01.2021 12:12, Marek Szyprowski wrote:
> > >>> On 18.12.2020 04:17, Saravana Kannan wrote:
> > >>>> Cyclic dependencies in some firmware was one of the last remaining
> > >>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >>>> dependencies don't block probing, set fw_devlink=on by default.
> > >>>>
> > >>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > >>>> only for systems with device tree firmware):
> > >>>> * Significantly cuts down deferred probes.
> > >>>> * Device probe is effectively attempted in graph order.
> > >>>> * Makes it much easier to load drivers as modules without having to
> > >>>>     worry about functional dependencies between modules (depmod is still
> > >>>>     needed for symbol dependencies).
> > >>>>
> > >>>> If this patch prevents some devices from probing, it's very likely due
> > >>>> to the system having one or more device drivers that "probe"/set up a
> > >>>> device (DT node with compatible property) without creating a struct
> > >>>> device for it.  If we hit such cases, the device drivers need to be
> > >>>> fixed so that they populate struct devices and probe them like normal
> > >>>> device drivers so that the driver core is aware of the devices and their
> > >>>> status. See [1] for an example of such a case.
> > >>>>
> > >>>> [1] -
> > >>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > >>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >>> This patch landed recently in linux next-20210111 as commit
> > >>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> > >>> breaks Exynos IOMMU operation, what causes lots of devices being
> > >>> deferred and not probed at all. I've briefly checked and noticed that
> > >>> exynos_sysmmu_probe() is never called after this patch. This is really
> > >>> strange for me, as the SYSMMU controllers on Exynos platform are
> > >>> regular platform devices registered by the OF code. The driver code is
> > >>> here: drivers/iommu/exynos-iommu.c, example dts:
> > >>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
> > >> Okay, I found the source of this problem. It is caused by Exynos power
> > >> domain driver, which is not platform driver yet. I will post a patch,
> > >> which converts it to the platform driver.
> > > Thanks Marek! Hopefully the debug logs I added were sufficient to
> > > figure out the reason.
> >
> > Frankly, it took me a while to figure out that device core waits for the
> > power domain devices. Maybe it would be possible to add some more debug
> > messages or hints? Like the reason of the deferred probe in
> > /sys/kernel/debug/devices_deferred ?
>
> There's already a /sys/devices/.../<device>/waiting_for_supplier file
> that tells you if the device is waiting for a supplier device to be
> added. That file goes away once the device probes. If the file has 1,
> then it's waiting for the supplier device to be added (like your
> case). If it's 0, then the device is just waiting on one of the
> existing suppliers to probe. You can find the existing suppliers
> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> these dev_dbg() to dev_info() if you need more details about deferred
> probing.

How are we supposed to check the contents of that file, if the system
doesn't even boot into userspace with a ramdisk? All hardware drivers
fail to probe. The only thing that works is "earlycon keep_bootcon",
and kernel output just stops after a while.

Thanks for your suggestions!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-18 17:39   ` [PATCH v1 5/5] driver core: Set fw_devlink=on by default Geert Uytterhoeven
@ 2021-01-18 17:59     ` Marc Zyngier
  2021-01-18 19:16       ` Geert Uytterhoeven
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2021-01-18 17:59 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Geert,

On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> Hi Saravana,
> 
> On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com> 
> wrote:
>> Cyclic dependencies in some firmware was one of the last remaining
>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> dependencies don't block probing, set fw_devlink=on by default.
>> 
>> Setting fw_devlink=on by default brings a bunch of benefits 
>> (currently,
>> only for systems with device tree firmware):
>> * Significantly cuts down deferred probes.
>> * Device probe is effectively attempted in graph order.
>> * Makes it much easier to load drivers as modules without having to
>>   worry about functional dependencies between modules (depmod is still
>>   needed for symbol dependencies).
>> 
>> If this patch prevents some devices from probing, it's very likely due
>> to the system having one or more device drivers that "probe"/set up a
>> device (DT node with compatible property) without creating a struct
>> device for it.  If we hit such cases, the device drivers need to be
>> fixed so that they populate struct devices and probe them like normal
>> device drivers so that the driver core is aware of the devices and 
>> their
>> status. See [1] for an example of such a case.
>> 
>> [1] - 
>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> 
> Shimoda-san reported that next-20210111 and later fail to boot
> on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> is enabled.
> 
> I have bisected this to commit e590474768f1cc04 ("driver core: Set
> fw_devlink=on by default").

There is a tentative patch from Saravana here[1], which works around
some issues on my RK3399 platform, and it'd be interesting to find
out whether that helps on your system.

Thanks,

         M.

[1] 
https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-18 17:59     ` Marc Zyngier
@ 2021-01-18 19:16       ` Geert Uytterhoeven
  2021-01-18 19:30         ` Marc Zyngier
  2021-01-18 21:18         ` Saravana Kannan
  0 siblings, 2 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-18 19:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Marc,

On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > wrote:
> >> Cyclic dependencies in some firmware was one of the last remaining
> >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >> dependencies don't block probing, set fw_devlink=on by default.
> >>
> >> Setting fw_devlink=on by default brings a bunch of benefits
> >> (currently,
> >> only for systems with device tree firmware):
> >> * Significantly cuts down deferred probes.
> >> * Device probe is effectively attempted in graph order.
> >> * Makes it much easier to load drivers as modules without having to
> >>   worry about functional dependencies between modules (depmod is still
> >>   needed for symbol dependencies).
> >>
> >> If this patch prevents some devices from probing, it's very likely due
> >> to the system having one or more device drivers that "probe"/set up a
> >> device (DT node with compatible property) without creating a struct
> >> device for it.  If we hit such cases, the device drivers need to be
> >> fixed so that they populate struct devices and probe them like normal
> >> device drivers so that the driver core is aware of the devices and
> >> their
> >> status. See [1] for an example of such a case.
> >>
> >> [1] -
> >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Shimoda-san reported that next-20210111 and later fail to boot
> > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > is enabled.
> >
> > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > fw_devlink=on by default").
>
> There is a tentative patch from Saravana here[1], which works around
> some issues on my RK3399 platform, and it'd be interesting to find
> out whether that helps on your system.
>
> Thanks,
>
>          M.
>
> [1]
> https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com

Thanks for the suggestion, but given no devices probe (incl. GPIO
providers), I'm afraid it won't help. [testing] Indeed.

With the debug prints in device_links_check_suppliers enabled, and
some postprocessing, I get:

    255 supplier e6180000.system-controller not ready
      9 supplier fe990000.iommu not ready
      9 supplier fe980000.iommu not ready
      6 supplier febd0000.iommu not ready
      6 supplier ec670000.iommu not ready
      3 supplier febe0000.iommu not ready
      3 supplier e7740000.iommu not ready
      3 supplier e6740000.iommu not ready
      3 supplier e65ee000.usb-phy not ready
      3 supplier e6570000.iommu not ready
      3 supplier e6054000.gpio not ready
      3 supplier e6053000.gpio not ready

As everything is part of a PM Domain, the (lack of the) system controller
must be the culprit. What's wrong with it? It is registered very early in
the boot:

[    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-18 19:16       ` Geert Uytterhoeven
@ 2021-01-18 19:30         ` Marc Zyngier
  2021-01-18 21:18         ` Saravana Kannan
  1 sibling, 0 replies; 15+ messages in thread
From: Marc Zyngier @ 2021-01-18 19:30 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On 2021-01-18 19:16, Geert Uytterhoeven wrote:
> Hi Marc,
> 
> On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
>> On 2021-01-18 17:39, Geert Uytterhoeven wrote:
>> > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
>> > wrote:
>> >> Cyclic dependencies in some firmware was one of the last remaining
>> >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> >> dependencies don't block probing, set fw_devlink=on by default.
>> >>
>> >> Setting fw_devlink=on by default brings a bunch of benefits
>> >> (currently,
>> >> only for systems with device tree firmware):
>> >> * Significantly cuts down deferred probes.
>> >> * Device probe is effectively attempted in graph order.
>> >> * Makes it much easier to load drivers as modules without having to
>> >>   worry about functional dependencies between modules (depmod is still
>> >>   needed for symbol dependencies).
>> >>
>> >> If this patch prevents some devices from probing, it's very likely due
>> >> to the system having one or more device drivers that "probe"/set up a
>> >> device (DT node with compatible property) without creating a struct
>> >> device for it.  If we hit such cases, the device drivers need to be
>> >> fixed so that they populate struct devices and probe them like normal
>> >> device drivers so that the driver core is aware of the devices and
>> >> their
>> >> status. See [1] for an example of such a case.
>> >>
>> >> [1] -
>> >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> >> Signed-off-by: Saravana Kannan <saravanak@google.com>
>> >
>> > Shimoda-san reported that next-20210111 and later fail to boot
>> > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
>> > is enabled.
>> >
>> > I have bisected this to commit e590474768f1cc04 ("driver core: Set
>> > fw_devlink=on by default").
>> 
>> There is a tentative patch from Saravana here[1], which works around
>> some issues on my RK3399 platform, and it'd be interesting to find
>> out whether that helps on your system.
>> 
>> Thanks,
>> 
>>          M.
>> 
>> [1]
>> https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> 
> Thanks for the suggestion, but given no devices probe (incl. GPIO
> providers), I'm afraid it won't help. [testing] Indeed.
> 
> With the debug prints in device_links_check_suppliers enabled, and
> some postprocessing, I get:
> 
>     255 supplier e6180000.system-controller not ready
>       9 supplier fe990000.iommu not ready
>       9 supplier fe980000.iommu not ready
>       6 supplier febd0000.iommu not ready
>       6 supplier ec670000.iommu not ready
>       3 supplier febe0000.iommu not ready
>       3 supplier e7740000.iommu not ready
>       3 supplier e6740000.iommu not ready
>       3 supplier e65ee000.usb-phy not ready
>       3 supplier e6570000.iommu not ready
>       3 supplier e6054000.gpio not ready
>       3 supplier e6053000.gpio not ready
> 
> As everything is part of a PM Domain, the (lack of the) system 
> controller
> must be the culprit. What's wrong with it? It is registered very early 
> in
> the boot:
> 
> [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() 
> returned 0

Yeah, this looks like the exact same problem. The devlink stuff assumes
that because there is a "compatible" property, there will be a driver
directly associated with the node containing this property.

If any other node has a reference to that first node, the dependency
will only get resolved if/when that first node is bound to a driver.
Trouble is, there are *tons* of code in the tree that invalidate
this heuristic, and for each occurrence of this we get another failure.

The patch I referred to papers over it by registering a dummy driver,
but that doesn't scale easily...

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-18 19:16       ` Geert Uytterhoeven
  2021-01-18 19:30         ` Marc Zyngier
@ 2021-01-18 21:18         ` Saravana Kannan
       [not found]           ` <CAMuHMdVTKEy3rbdYYUKS+L1pY0y0ctMWRXNf7o+hJWyGR7L-Dg@mail.gmail.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Saravana Kannan @ 2021-01-18 21:18 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Marc,
>
> On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > wrote:
> > >> Cyclic dependencies in some firmware was one of the last remaining
> > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >> dependencies don't block probing, set fw_devlink=on by default.
> > >>
> > >> Setting fw_devlink=on by default brings a bunch of benefits
> > >> (currently,
> > >> only for systems with device tree firmware):
> > >> * Significantly cuts down deferred probes.
> > >> * Device probe is effectively attempted in graph order.
> > >> * Makes it much easier to load drivers as modules without having to
> > >>   worry about functional dependencies between modules (depmod is still
> > >>   needed for symbol dependencies).
> > >>
> > >> If this patch prevents some devices from probing, it's very likely due
> > >> to the system having one or more device drivers that "probe"/set up a
> > >> device (DT node with compatible property) without creating a struct
> > >> device for it.  If we hit such cases, the device drivers need to be
> > >> fixed so that they populate struct devices and probe them like normal
> > >> device drivers so that the driver core is aware of the devices and
> > >> their
> > >> status. See [1] for an example of such a case.
> > >>
> > >> [1] -
> > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > Shimoda-san reported that next-20210111 and later fail to boot
> > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > is enabled.
> > >
> > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > fw_devlink=on by default").
> >
> > There is a tentative patch from Saravana here[1], which works around
> > some issues on my RK3399 platform, and it'd be interesting to find
> > out whether that helps on your system.
> >
> > Thanks,
> >
> >          M.
> >
> > [1]
> > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
>
> Thanks for the suggestion, but given no devices probe (incl. GPIO
> providers), I'm afraid it won't help. [testing] Indeed.
>
> With the debug prints in device_links_check_suppliers enabled, and
> some postprocessing, I get:
>
>     255 supplier e6180000.system-controller not ready
>       9 supplier fe990000.iommu not ready
>       9 supplier fe980000.iommu not ready
>       6 supplier febd0000.iommu not ready
>       6 supplier ec670000.iommu not ready
>       3 supplier febe0000.iommu not ready
>       3 supplier e7740000.iommu not ready
>       3 supplier e6740000.iommu not ready
>       3 supplier e65ee000.usb-phy not ready
>       3 supplier e6570000.iommu not ready
>       3 supplier e6054000.gpio not ready
>       3 supplier e6053000.gpio not ready
>
> As everything is part of a PM Domain, the (lack of the) system controller
> must be the culprit. What's wrong with it? It is registered very early in
> the boot:
>
> [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

Hi Geert,

Thanks for reporting the issue.

Looks like you found the important logs. Can you please enable all
these logs and send the early con logs as an attachment (so I don't
need to deal with lines getting wrapped)?
1. The ones in device_links_check_suppliers()
2. The ones in device_link_add()
3. initcall_debug=1

That should help us figure out what's going on. Also, what's the DT
that corresponds to one of the boards that see this issue?

Lastly, can you please pick up these 3 patches (some need clean up
before they merge) to make sure it's not an issue being worked on from
other bug reports?
https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/

I have a strong hunch the 2nd one will fix your issues. fw_devlink can
handle cyclic dependencies now (it basically reverts to
fw_devlink=permissive mode for devices in the cycle), but it needs to
"see" all the dependencies to know there's a cycle. So want to make
sure it "sees" the "gpios" binding used all over some of the Renesas
DT files.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
       [not found]           ` <CAMuHMdVTKEy3rbdYYUKS+L1pY0y0ctMWRXNf7o+hJWyGR7L-Dg@mail.gmail.com>
@ 2021-01-19 18:08             ` Saravana Kannan
  2021-01-19 21:50               ` Saravana Kannan
  2021-01-20  9:11               ` Geert Uytterhoeven
  0 siblings, 2 replies; 15+ messages in thread
From: Saravana Kannan @ 2021-01-19 18:08 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > <geert@linux-m68k.org> wrote:
> > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > wrote:
> > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > >>
> > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > >> (currently,
> > > > >> only for systems with device tree firmware):
> > > > >> * Significantly cuts down deferred probes.
> > > > >> * Device probe is effectively attempted in graph order.
> > > > >> * Makes it much easier to load drivers as modules without having to
> > > > >>   worry about functional dependencies between modules (depmod is still
> > > > >>   needed for symbol dependencies).
> > > > >>
> > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > >> device (DT node with compatible property) without creating a struct
> > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > >> fixed so that they populate struct devices and probe them like normal
> > > > >> device drivers so that the driver core is aware of the devices and
> > > > >> their
> > > > >> status. See [1] for an example of such a case.
> > > > >>
> > > > >> [1] -
> > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > >
> > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > is enabled.
> > > > >
> > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > fw_devlink=on by default").
> > > >
> > > > There is a tentative patch from Saravana here[1], which works around
> > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > out whether that helps on your system.
> > > >
> > > > Thanks,
> > > >
> > > >          M.
> > > >
> > > > [1]
> > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> > >
> > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > providers), I'm afraid it won't help. [testing] Indeed.
> > >
> > > With the debug prints in device_links_check_suppliers enabled, and
> > > some postprocessing, I get:
> > >
> > >     255 supplier e6180000.system-controller not ready
> > >       9 supplier fe990000.iommu not ready
> > >       9 supplier fe980000.iommu not ready
> > >       6 supplier febd0000.iommu not ready
> > >       6 supplier ec670000.iommu not ready
> > >       3 supplier febe0000.iommu not ready
> > >       3 supplier e7740000.iommu not ready
> > >       3 supplier e6740000.iommu not ready
> > >       3 supplier e65ee000.usb-phy not ready
> > >       3 supplier e6570000.iommu not ready
> > >       3 supplier e6054000.gpio not ready
> > >       3 supplier e6053000.gpio not ready
> > >
> > > As everything is part of a PM Domain, the (lack of the) system controller
> > > must be the culprit. What's wrong with it? It is registered very early in
> > > the boot:
> > >
> > > [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
>
> > Looks like you found the important logs. Can you please enable all
> > these logs and send the early con logs as an attachment (so I don't
> > need to deal with lines getting wrapped)?
> > 1. The ones in device_links_check_suppliers()
> > 2. The ones in device_link_add()
> > 3. initcall_debug=1
>
> I have attached[*] the requested log.
>
> > That should help us figure out what's going on. Also, what's the DT
> > that corresponds to one of the boards that see this issue?
>
> arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
>
> > Lastly, can you please pick up these 3 patches (some need clean up
> > before they merge) to make sure it's not an issue being worked on from
> > other bug reports?
> > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
> > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
> > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/
> >
> > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > handle cyclic dependencies now (it basically reverts to
> > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > "see" all the dependencies to know there's a cycle. So want to make
> > sure it "sees" the "gpios" binding used all over some of the Renesas
> > DT files.
>
> These patches don't help.
> The 2nd one actually introduces a new failure:
>
>      OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> #gpio-cells for /cpus/cpu@102
>
> Note that my issues don't seem to be GPIO-related at all.
>
> BTW, you are aware IOMMUs and DMA controllers are optional?
> I.e. device drivers with iommus and/or dmas DT properties where the
> targets of these properties do not have a driver should still be probed,
> eventually.  But if the IOMMU or DMA drivers are present, they should be
> probed first, so the device drivers can make use of them.

Thanks for the logs and details.

Yeah, this is going to be a problem then. How is this handled in
static kernels today? Do we just try to make sure the iommus driver
probes the iommu device before the consumers? And then the consumers
simply don't defer probe on failure to get iommu?

I can make this work if modules are not enabled (needs some code
changes), but it's not going to work when there are modules. There's
no way to tell if an iommu module won't be loaded soon. Also, device
links doing this behavior only for iommu/dma is probably not a good
idea. So, whatever we do will have to be common behavior. :(

Another intermediate option I was thinking was having a
CONFIG_FW_DEVLINK_OFF/PERMISSIVE/ON and defaulting it to ON for ARM64
and turning it off in the defconfig for boards for which this doesn't
work. That way, we can incrementally enable fw_devlink.

This week is a very hectic week for me. So, please bear with slow
responses from me for rest of this week. Let me think about this a bit
to see if I can come up with a better solution than what I have in
mind.

Also, can you try deleting "iommu" and "dma" parsing in
of_supplier_bindings[] in driver/of/property.c and see if it helps?
Then we'd know this is the reason for things not working in your case.

> Thanks!
>
> [*] Although attaching means people like myself cannot read and comment
>     on the log easily, without saving the attachment first.
>     That's also the reason why patches should be submitted inline...

Yeah, I see your concern. If you want to add comments to logs when
sending them, yeah, please go ahead and put it inline. Or if someone
wants to add comments to what you attached, they could copy paste the
relevant sections and add comments.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-19 18:08             ` Saravana Kannan
@ 2021-01-19 21:50               ` Saravana Kannan
  2021-01-20  9:40                 ` Geert Uytterhoeven
  2021-01-20  9:11               ` Geert Uytterhoeven
  1 sibling, 1 reply; 15+ messages in thread
From: Saravana Kannan @ 2021-01-19 21:50 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > wrote:
> > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > >>
> > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > >> (currently,
> > > > > >> only for systems with device tree firmware):
> > > > > >> * Significantly cuts down deferred probes.
> > > > > >> * Device probe is effectively attempted in graph order.
> > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > >>   needed for symbol dependencies).
> > > > > >>
> > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > >> device (DT node with compatible property) without creating a struct
> > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > >> their
> > > > > >> status. See [1] for an example of such a case.
> > > > > >>
> > > > > >> [1] -
> > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > >
> > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > is enabled.
> > > > > >
> > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > fw_devlink=on by default").
> > > > >
> > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > out whether that helps on your system.
> > > > >
> > > > > Thanks,
> > > > >
> > > > >          M.
> > > > >
> > > > > [1]
> > > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> > > >
> > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > >
> > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > some postprocessing, I get:
> > > >
> > > >     255 supplier e6180000.system-controller not ready
> > > >       9 supplier fe990000.iommu not ready
> > > >       9 supplier fe980000.iommu not ready
> > > >       6 supplier febd0000.iommu not ready
> > > >       6 supplier ec670000.iommu not ready
> > > >       3 supplier febe0000.iommu not ready
> > > >       3 supplier e7740000.iommu not ready
> > > >       3 supplier e6740000.iommu not ready
> > > >       3 supplier e65ee000.usb-phy not ready
> > > >       3 supplier e6570000.iommu not ready
> > > >       3 supplier e6054000.gpio not ready
> > > >       3 supplier e6053000.gpio not ready
> > > >
> > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > the boot:
> > > >
> > > > [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> >
> > > Looks like you found the important logs. Can you please enable all
> > > these logs and send the early con logs as an attachment (so I don't
> > > need to deal with lines getting wrapped)?
> > > 1. The ones in device_links_check_suppliers()
> > > 2. The ones in device_link_add()
> > > 3. initcall_debug=1
> >
> > I have attached[*] the requested log.
> >
> > > That should help us figure out what's going on. Also, what's the DT
> > > that corresponds to one of the boards that see this issue?
> >
> > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> >
> > > Lastly, can you please pick up these 3 patches (some need clean up
> > > before they merge) to make sure it's not an issue being worked on from
> > > other bug reports?
> > > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
> > > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
> > > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/
> > >
> > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > handle cyclic dependencies now (it basically reverts to
> > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > "see" all the dependencies to know there's a cycle. So want to make
> > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > DT files.
> >
> > These patches don't help.
> > The 2nd one actually introduces a new failure:
> >
> >      OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > #gpio-cells for /cpus/cpu@102
> >
> > Note that my issues don't seem to be GPIO-related at all.
> >
> > BTW, you are aware IOMMUs and DMA controllers are optional?
> > I.e. device drivers with iommus and/or dmas DT properties where the
> > targets of these properties do not have a driver should still be probed,
> > eventually.  But if the IOMMU or DMA drivers are present, they should be
> > probed first, so the device drivers can make use of them.
>
> Thanks for the logs and details.
>
> Yeah, this is going to be a problem then. How is this handled in
> static kernels today? Do we just try to make sure the iommus driver
> probes the iommu device before the consumers? And then the consumers
> simply don't defer probe on failure to get iommu?
>
> I can make this work if modules are not enabled (needs some code
> changes), but it's not going to work when there are modules. There's
> no way to tell if an iommu module won't be loaded soon. Also, device
> links doing this behavior only for iommu/dma is probably not a good
> idea. So, whatever we do will have to be common behavior. :(
>
> Another intermediate option I was thinking was having a
> CONFIG_FW_DEVLINK_OFF/PERMISSIVE/ON and defaulting it to ON for ARM64
> and turning it off in the defconfig for boards for which this doesn't
> work. That way, we can incrementally enable fw_devlink.
>
> This week is a very hectic week for me. So, please bear with slow
> responses from me for rest of this week. Let me think about this a bit
> to see if I can come up with a better solution than what I have in
> mind.
>
> Also, can you try deleting "iommu" and "dma" parsing in
> of_supplier_bindings[] in driver/of/property.c and see if it helps?
> Then we'd know this is the reason for things not working in your case.

Hi Geert,

I took a look at your logs. It looks like your guess is right. It's at
least one of the issues.

You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
driver. You already have a platform device created for it. So just go
ahead and probe it with a platform driver. See what Marek did here
[1].

You probably had to implement it as an "initcall based driver"
because you had to play initcall chicken to make sure the PD hardware
was initialized before the consumers. With fw_devlink=on you won't
have to worry about that. As an added benefit of implementing a proper
platform driver, you can  actually implement runtime PM now, your
suspend/resume would be more robust, etc.

[1] - https://lore.kernel.org/lkml/20210113110320.13149-1-m.szyprowski@samsung.com/

-Saravana

>
> > Thanks!
> >
> > [*] Although attaching means people like myself cannot read and comment
> >     on the log easily, without saving the attachment first.
> >     That's also the reason why patches should be submitted inline...
>
> Yeah, I see your concern. If you want to add comments to logs when
> sending them, yeah, please go ahead and put it inline. Or if someone
> wants to add comments to what you attached, they could copy paste the
> relevant sections and add comments.
>
> Thanks,
> Saravana

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-19 18:08             ` Saravana Kannan
  2021-01-19 21:50               ` Saravana Kannan
@ 2021-01-20  9:11               ` Geert Uytterhoeven
  1 sibling, 0 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-20  9:11 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Tue, Jan 19, 2021 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > BTW, you are aware IOMMUs and DMA controllers are optional?
> > I.e. device drivers with iommus and/or dmas DT properties where the
> > targets of these properties do not have a driver should still be probed,
> > eventually.  But if the IOMMU or DMA drivers are present, they should be
> > probed first, so the device drivers can make use of them.
>
> Yeah, this is going to be a problem then. How is this handled in
> static kernels today? Do we just try to make sure the iommus driver
> probes the iommu device before the consumers? And then the consumers
> simply don't defer probe on failure to get iommu?

Iommus are handled by the iommu framework, not by the driver.
So the framework decides if/when it's OK to probe a device tied to an
iommu.  Hence the consumers' drivers don't return -EPROBE_DEFER, the
framework takes care of that, before drivers' probe() functions are
called.

DMA is handled by consumer drivers, and driver-specific.  Many consumer
drivers consider DMA optional, and fall back to PIO if getting the DMA
channel failed.  Some drivers retry getting the DMA channel when the
device is used, and thus may start using DMA when the DMAC driver
appears and probes.

> I can make this work if modules are not enabled (needs some code
> changes), but it's not going to work when there are modules. There's
> no way to tell if an iommu module won't be loaded soon. Also, device
> links doing this behavior only for iommu/dma is probably not a good
> idea. So, whatever we do will have to be common behavior. :(

The iommu driver definitely needs to be built-in.
Modular DMAC drivers currently work with consumer drivers that
either consider DMA mandatory, or retry obtaining DMA channels.

> Also, can you try deleting "iommu" and "dma" parsing in
> of_supplier_bindings[] in driver/of/property.c and see if it helps?
> Then we'd know this is the reason for things not working in your case.

It also fails on another system without "iommus" properties:

    182 supplier e6180000.system-controller not ready
     18 supplier e6055400.gpio not ready
     15 supplier e6055800.gpio not ready
     15 supplier e6052000.gpio not ready
      6 supplier e6055000.gpio not ready

The system controller is the culprit, and is a dependency for all
devices due to power-domains.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-19 21:50               ` Saravana Kannan
@ 2021-01-20  9:40                 ` Geert Uytterhoeven
  2021-01-20 14:26                   ` Geert Uytterhoeven
  0 siblings, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-20  9:40 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > <geert@linux-m68k.org> wrote:
> > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > wrote:
> > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > >>
> > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > >> (currently,
> > > > > > >> only for systems with device tree firmware):
> > > > > > >> * Significantly cuts down deferred probes.
> > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > >>   needed for symbol dependencies).
> > > > > > >>
> > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > >> their
> > > > > > >> status. See [1] for an example of such a case.
> > > > > > >>
> > > > > > >> [1] -
> > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > >
> > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > is enabled.
> > > > > > >
> > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > fw_devlink=on by default").
> > > > > >
> > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > out whether that helps on your system.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >          M.
> > > > > >
> > > > > > [1]
> > > > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> > > > >
> > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > >
> > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > some postprocessing, I get:
> > > > >
> > > > >     255 supplier e6180000.system-controller not ready
> > > > >       9 supplier fe990000.iommu not ready
> > > > >       9 supplier fe980000.iommu not ready
> > > > >       6 supplier febd0000.iommu not ready
> > > > >       6 supplier ec670000.iommu not ready
> > > > >       3 supplier febe0000.iommu not ready
> > > > >       3 supplier e7740000.iommu not ready
> > > > >       3 supplier e6740000.iommu not ready
> > > > >       3 supplier e65ee000.usb-phy not ready
> > > > >       3 supplier e6570000.iommu not ready
> > > > >       3 supplier e6054000.gpio not ready
> > > > >       3 supplier e6053000.gpio not ready
> > > > >
> > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > the boot:
> > > > >
> > > > > [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > >
> > > > Looks like you found the important logs. Can you please enable all
> > > > these logs and send the early con logs as an attachment (so I don't
> > > > need to deal with lines getting wrapped)?
> > > > 1. The ones in device_links_check_suppliers()
> > > > 2. The ones in device_link_add()
> > > > 3. initcall_debug=1
> > >
> > > I have attached[*] the requested log.
> > >
> > > > That should help us figure out what's going on. Also, what's the DT
> > > > that corresponds to one of the boards that see this issue?
> > >
> > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > >
> > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > before they merge) to make sure it's not an issue being worked on from
> > > > other bug reports?
> > > > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
> > > > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
> > > > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/
> > > >
> > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > handle cyclic dependencies now (it basically reverts to
> > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > DT files.
> > >
> > > These patches don't help.
> > > The 2nd one actually introduces a new failure:
> > >
> > >      OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > #gpio-cells for /cpus/cpu@102
> > >
> > > Note that my issues don't seem to be GPIO-related at all.

> I took a look at your logs. It looks like your guess is right. It's at
> least one of the issues.
>
> You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> driver. You already have a platform device created for it. So just go
> ahead and probe it with a platform driver. See what Marek did here
> [1].
>
> You probably had to implement it as an "initcall based driver"
> because you had to play initcall chicken to make sure the PD hardware
> was initialized before the consumers. With fw_devlink=on you won't
> have to worry about that. As an added benefit of implementing a proper
> platform driver, you can  actually implement runtime PM now, your
> suspend/resume would be more robust, etc.

On R-Car H1, the system controller driver needs to be active before
secondary CPU setup, hence the early_initcall().
platform_bus_init() is called after that, so this is gonna need a split
initialization.  Or a dummy platform driver to make devlinks think
everything is fine ;-)

So basically all producer DT drivers not using a platform (or e.g. i2c)
driver are now broken?
Including all clock drivers using CLK_OF_DECLARE()?

    $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
"\.compatible\>") | wc -l
    249

(includes false positives)

I doubt they'll all get fixed for v5.12, as we're already at rc4...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-20  9:40                 ` Geert Uytterhoeven
@ 2021-01-20 14:26                   ` Geert Uytterhoeven
  2021-01-20 17:22                     ` Saravana Kannan
  0 siblings, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-20 14:26 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > <geert@linux-m68k.org> wrote:
> > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > wrote:
> > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > >>
> > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > >> (currently,
> > > > > > > >> only for systems with device tree firmware):
> > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > > >>   needed for symbol dependencies).
> > > > > > > >>
> > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > >> their
> > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > >>
> > > > > > > >> [1] -
> > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > >
> > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > is enabled.
> > > > > > > >
> > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > fw_devlink=on by default").
> > > > > > >
> > > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > > out whether that helps on your system.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >          M.
> > > > > > >
> > > > > > > [1]
> > > > > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> > > > > >
> > > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > > >
> > > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > > some postprocessing, I get:
> > > > > >
> > > > > >     255 supplier e6180000.system-controller not ready
> > > > > >       9 supplier fe990000.iommu not ready
> > > > > >       9 supplier fe980000.iommu not ready
> > > > > >       6 supplier febd0000.iommu not ready
> > > > > >       6 supplier ec670000.iommu not ready
> > > > > >       3 supplier febe0000.iommu not ready
> > > > > >       3 supplier e7740000.iommu not ready
> > > > > >       3 supplier e6740000.iommu not ready
> > > > > >       3 supplier e65ee000.usb-phy not ready
> > > > > >       3 supplier e6570000.iommu not ready
> > > > > >       3 supplier e6054000.gpio not ready
> > > > > >       3 supplier e6053000.gpio not ready
> > > > > >
> > > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > > the boot:
> > > > > >
> > > > > > [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > > >
> > > > > Looks like you found the important logs. Can you please enable all
> > > > > these logs and send the early con logs as an attachment (so I don't
> > > > > need to deal with lines getting wrapped)?
> > > > > 1. The ones in device_links_check_suppliers()
> > > > > 2. The ones in device_link_add()
> > > > > 3. initcall_debug=1
> > > >
> > > > I have attached[*] the requested log.
> > > >
> > > > > That should help us figure out what's going on. Also, what's the DT
> > > > > that corresponds to one of the boards that see this issue?
> > > >
> > > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > > >
> > > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > > before they merge) to make sure it's not an issue being worked on from
> > > > > other bug reports?
> > > > > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
> > > > > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
> > > > > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/
> > > > >
> > > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > > handle cyclic dependencies now (it basically reverts to
> > > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > > DT files.
> > > >
> > > > These patches don't help.
> > > > The 2nd one actually introduces a new failure:
> > > >
> > > >      OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > > #gpio-cells for /cpus/cpu@102
> > > >
> > > > Note that my issues don't seem to be GPIO-related at all.
>
> > I took a look at your logs. It looks like your guess is right. It's at
> > least one of the issues.
> >
> > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > driver. You already have a platform device created for it. So just go
> > ahead and probe it with a platform driver. See what Marek did here
> > [1].
> >
> > You probably had to implement it as an "initcall based driver"
> > because you had to play initcall chicken to make sure the PD hardware
> > was initialized before the consumers. With fw_devlink=on you won't
> > have to worry about that. As an added benefit of implementing a proper
> > platform driver, you can  actually implement runtime PM now, your
> > suspend/resume would be more robust, etc.
>
> On R-Car H1, the system controller driver needs to be active before
> secondary CPU setup, hence the early_initcall().
> platform_bus_init() is called after that, so this is gonna need a split
> initialization.  Or a dummy platform driver to make devlinks think
> everything is fine ;-)

Note that adding a dummy platform driver does work.

> So basically all producer DT drivers not using a platform (or e.g. i2c)
> driver are now broken?
> Including all clock drivers using CLK_OF_DECLARE()?

Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
is set, and of_clk_init() sets that flag.  So rcar-sysc should do so, too.
Patch sent.

>     $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> "\.compatible\>") | wc -l
>     249
>
> (includes false positives)
>
> I doubt they'll all get fixed for v5.12, as we're already at rc4...

Still more than 100 drivers to fix?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-20 14:26                   ` Geert Uytterhoeven
@ 2021-01-20 17:22                     ` Saravana Kannan
  2021-01-21 16:04                       ` Geert Uytterhoeven
  0 siblings, 1 reply; 15+ messages in thread
From: Saravana Kannan @ 2021-01-20 17:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > <geert@linux-m68k.org> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > > wrote:
> > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > >>
> > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > >> (currently,
> > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > > > >>   needed for symbol dependencies).
> > > > > > > > >>
> > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > >> their
> > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > >>
> > > > > > > > >> [1] -
> > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > >
> > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > is enabled.
> > > > > > > > >
> > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > fw_devlink=on by default").
> > > > > > > >
> > > > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > > > out whether that helps on your system.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >          M.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@google.com
> > > > > > >
> > > > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > > > >
> > > > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > > > some postprocessing, I get:
> > > > > > >
> > > > > > >     255 supplier e6180000.system-controller not ready
> > > > > > >       9 supplier fe990000.iommu not ready
> > > > > > >       9 supplier fe980000.iommu not ready
> > > > > > >       6 supplier febd0000.iommu not ready
> > > > > > >       6 supplier ec670000.iommu not ready
> > > > > > >       3 supplier febe0000.iommu not ready
> > > > > > >       3 supplier e7740000.iommu not ready
> > > > > > >       3 supplier e6740000.iommu not ready
> > > > > > >       3 supplier e65ee000.usb-phy not ready
> > > > > > >       3 supplier e6570000.iommu not ready
> > > > > > >       3 supplier e6054000.gpio not ready
> > > > > > >       3 supplier e6053000.gpio not ready
> > > > > > >
> > > > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > > > the boot:
> > > > > > >
> > > > > > > [    0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > > > >
> > > > > > Looks like you found the important logs. Can you please enable all
> > > > > > these logs and send the early con logs as an attachment (so I don't
> > > > > > need to deal with lines getting wrapped)?
> > > > > > 1. The ones in device_links_check_suppliers()
> > > > > > 2. The ones in device_link_add()
> > > > > > 3. initcall_debug=1
> > > > >
> > > > > I have attached[*] the requested log.
> > > > >
> > > > > > That should help us figure out what's going on. Also, what's the DT
> > > > > > that corresponds to one of the boards that see this issue?
> > > > >
> > > > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > > > >
> > > > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > > > before they merge) to make sure it's not an issue being worked on from
> > > > > > other bug reports?
> > > > > > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@google.com/
> > > > > > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@google.com/
> > > > > > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@google.com/
> > > > > >
> > > > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > > > handle cyclic dependencies now (it basically reverts to
> > > > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > > > DT files.
> > > > >
> > > > > These patches don't help.
> > > > > The 2nd one actually introduces a new failure:
> > > > >
> > > > >      OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > > > #gpio-cells for /cpus/cpu@102
> > > > >
> > > > > Note that my issues don't seem to be GPIO-related at all.
> >
> > > I took a look at your logs. It looks like your guess is right. It's at
> > > least one of the issues.
> > >
> > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > driver. You already have a platform device created for it. So just go
> > > ahead and probe it with a platform driver. See what Marek did here
> > > [1].
> > >
> > > You probably had to implement it as an "initcall based driver"
> > > because you had to play initcall chicken to make sure the PD hardware
> > > was initialized before the consumers. With fw_devlink=on you won't
> > > have to worry about that. As an added benefit of implementing a proper
> > > platform driver, you can  actually implement runtime PM now, your
> > > suspend/resume would be more robust, etc.
> >
> > On R-Car H1, the system controller driver needs to be active before
> > secondary CPU setup, hence the early_initcall().
> > platform_bus_init() is called after that, so this is gonna need a split
> > initialization.  Or a dummy platform driver to make devlinks think
> > everything is fine ;-)

I was wondering if you could still probe the "not needed by CPU" power
domains (if there are any) as devices. Using driver-core brings you
good things :)

>
> Note that adding a dummy platform driver does work.
>
> > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > driver are now broken?
> > Including all clock drivers using CLK_OF_DECLARE()?
>
> Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> is set, and of_clk_init() sets that flag.  So rcar-sysc should do so, too.
> Patch sent.
>
> >     $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > "\.compatible\>") | wc -l
> >     249
> >
> > (includes false positives)
> >
> > I doubt they'll all get fixed for v5.12, as we're already at rc4...
>
> Still more than 100 drivers to fix?

Not fully sure what the grep is trying to catch, but fw_devlink
supports devices on any bus (i2c, platform, pci, etc). So that's not a
problem. It'll be a problem when a struct device is never created for
a real device. Or if it's created, but never probed.

I'm also looking into a bunch of other options for fallback when
fw_devlink=on doesn't work. Too much to explain here -- patches are
easier :)

-Saravana

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-20 17:22                     ` Saravana Kannan
@ 2021-01-21 16:04                       ` Geert Uytterhoeven
  2021-01-25 23:30                         ` Saravana Kannan
  0 siblings, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-21 16:04 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > <geert@linux-m68k.org> wrote:
> > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > <geert@linux-m68k.org> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > > > wrote:
> > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > >>
> > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > >> (currently,
> > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > > > > >>   needed for symbol dependencies).
> > > > > > > > > >>
> > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > >> their
> > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > >>
> > > > > > > > > >> [1] -
> > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > > >
> > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > fw_devlink=on by default").

> > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > driver. You already have a platform device created for it. So just go
> > > > ahead and probe it with a platform driver. See what Marek did here
> > > > [1].
> > > >
> > > > You probably had to implement it as an "initcall based driver"
> > > > because you had to play initcall chicken to make sure the PD hardware
> > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > have to worry about that. As an added benefit of implementing a proper
> > > > platform driver, you can  actually implement runtime PM now, your
> > > > suspend/resume would be more robust, etc.
> > >
> > > On R-Car H1, the system controller driver needs to be active before
> > > secondary CPU setup, hence the early_initcall().
> > > platform_bus_init() is called after that, so this is gonna need a split
> > > initialization.  Or a dummy platform driver to make devlinks think
> > > everything is fine ;-)
>
> I was wondering if you could still probe the "not needed by CPU" power
> domains (if there are any) as devices. Using driver-core brings you
> good things :)

 1. That would mean splitting the driver in two parts, looping over the
    tables twice, while everything can just be done in the first pass?

 2. Which "good things" do you have in mind? Making the driver modular?
    Ignoring the dependency for secondary CPU setup on R-Car H1, this
    driver could indeed be modular on R-Car Gen2 and Gen3, as long as
    the boot loader would pass a ramdisk with the module to the kernel.
    The ramdisk could not be loaded in any other way, as all I/O
    devices are part of a PM Domain, and thus depend on the SYSC driver.
    Note that on some (non-R-Car) SoCs, the timers may be part of a PM
    Domain, too.

> > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > driver are now broken?
> > > Including all clock drivers using CLK_OF_DECLARE()?
> >
> > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > is set, and of_clk_init() sets that flag.  So rcar-sysc should do so, too.
> > Patch sent.
> > >     $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > "\.compatible\>") | wc -l
> > >     249
> > >
> > > (includes false positives)
> > >
> > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> >
> > Still more than 100 drivers to fix?
>
> Not fully sure what the grep is trying to catch, but fw_devlink
> supports devices on any bus (i2c, platform, pci, etc). So that's not a
> problem. It'll be a problem when a struct device is never created for
> a real device. Or if it's created, but never probed.

The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
and not using a driver model driver (i.e. not matching "*_driver").

> I'm also looking into a bunch of other options for fallback when
> fw_devlink=on doesn't work. Too much to explain here -- patches are
> easier :)

I gave it a try on all Renesas platforms I have local access to:

  - R-Car Gen2/Gen3:
    Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
    config boot again.  Remaining issues:
      - CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
      - CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
        Note that Ethernet does not use the R-Car DMAC, so DHCP works.
        Nevertheless, after that everything hangs, and the board does not
        respond to pings anymore
    Both IOMMU and DMAC dependencies are optional, hence should be dropped
    at late boot (late_initcall?).

  - SH-Mobile AG5 and R-Mobile APE6:
    The rmobile-sysc driver is similar to the rcar-sysc driver, and does
    not use a platform device.
    Still, it works, because all dependencies on the System Controller
    become unblocked when the rmobile-reset driver binds against the
    "renesas,sysc-rmobile" device.  Obviously it would fail if no
    support for that driver is included in your kernel...

  - R-Mobile A1:
    Also using the rmobile-sysc driver.
    However, this is a single core Cortex-A9, i.e. it does not have an
    ARM architectured timer (like R-Mobile APE6) or Cortex-A9 Global
    Timer (like SH-Mobile AG5).  The timer used (TMU) is located in a PM
    Domain controlled by the rmobile-sysc driver, and driver
    initialization is postponed beyond the point where something relies
    on a working timer, causing a hang.

    Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
    this, but prevents the rmobile-reset driver from binding against the
    same device node, so the reset handling will have to be incorporated
    into the rmobile-sysc driver (and will thus be registered very
    early).

  - RZ/A1 and RZ/A2:
    These are not affected, as the timer used (OSTM) is not a platform
    driver, but uses TIMER_OF_DECLARE().
    Note that the RZ/A2 clock driver uses split initialization:
      1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
      2. Other clocks are initialized by platform_driver_probe() from a
         subsys_initcall.
    If the OSTM driver would be a platform_driver, it would block on the
    block dependency.  Setting the OF_POPULATED flag in the clock driver
    would not work: while that flag would unblock probing of the timer
    driver, it would also prevent the second part of the clock driver
    initialization.

Now, back to the things I was supposed to work on this week ;-)

[1] https://lore.kernel.org/linux-arm-kernel/20210120142323.2203705-1-geert+renesas@glider.be/


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-21 16:04                       ` Geert Uytterhoeven
@ 2021-01-25 23:30                         ` Saravana Kannan
  2021-01-26  8:25                           ` Geert Uytterhoeven
  0 siblings, 1 reply; 15+ messages in thread
From: Saravana Kannan @ 2021-01-25 23:30 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

On Thu, Jan 21, 2021 at 8:04 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > > <geert@linux-m68k.org> wrote:
> > > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > > >>
> > > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > > >> (currently,
> > > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > > > > > >>   needed for symbol dependencies).
> > > > > > > > > > >>
> > > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > > >> their
> > > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > > >>
> > > > > > > > > > >> [1] -
> > > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > > > >
> > > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > > is enabled.
> > > > > > > > > > >
> > > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > > fw_devlink=on by default").
>
> > > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > > driver. You already have a platform device created for it. So just go
> > > > > ahead and probe it with a platform driver. See what Marek did here
> > > > > [1].
> > > > >
> > > > > You probably had to implement it as an "initcall based driver"
> > > > > because you had to play initcall chicken to make sure the PD hardware
> > > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > > have to worry about that. As an added benefit of implementing a proper
> > > > > platform driver, you can  actually implement runtime PM now, your
> > > > > suspend/resume would be more robust, etc.
> > > >
> > > > On R-Car H1, the system controller driver needs to be active before
> > > > secondary CPU setup, hence the early_initcall().
> > > > platform_bus_init() is called after that, so this is gonna need a split
> > > > initialization.  Or a dummy platform driver to make devlinks think
> > > > everything is fine ;-)
> >
> > I was wondering if you could still probe the "not needed by CPU" power
> > domains (if there are any) as devices. Using driver-core brings you
> > good things :)
>
>  1. That would mean splitting the driver in two parts, looping over the
>     tables twice, while everything can just be done in the first pass?
>
>  2. Which "good things" do you have in mind? Making the driver modular?
>     Ignoring the dependency for secondary CPU setup on R-Car H1, this
>     driver could indeed be modular on R-Car Gen2 and Gen3, as long as
>     the boot loader would pass a ramdisk with the module to the kernel.
>     The ramdisk could not be loaded in any other way, as all I/O
>     devices are part of a PM Domain, and thus depend on the SYSC driver.
>     Note that on some (non-R-Car) SoCs, the timers may be part of a PM
>     Domain, too.

"Good things" like being able to implement runtime pm, suspend/resume
robustness (due to device links). There were a few more benefits I had
in mind when I wrote it, but I don't remember what it was.

The double pass itself is not that big of a deal IMHO. It probably
adds less than a millisecond.

>
> > > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > > driver are now broken?
> > > > Including all clock drivers using CLK_OF_DECLARE()?
> > >
> > > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > > is set, and of_clk_init() sets that flag.  So rcar-sysc should do so, too.
> > > Patch sent.
> > > >     $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > > "\.compatible\>") | wc -l
> > > >     249
> > > >
> > > > (includes false positives)
> > > >
> > > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> > >
> > > Still more than 100 drivers to fix?
> >
> > Not fully sure what the grep is trying to catch, but fw_devlink
> > supports devices on any bus (i2c, platform, pci, etc). So that's not a
> > problem. It'll be a problem when a struct device is never created for
> > a real device. Or if it's created, but never probed.
>
> The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
> and not using a driver model driver (i.e. not matching "*_driver").

Ah TIL about -L and -l. Thanks.

> > I'm also looking into a bunch of other options for fallback when
> > fw_devlink=on doesn't work. Too much to explain here -- patches are
> > easier :)
>
> I gave it a try on all Renesas platforms I have local access to:

Thanks a lot! Really appreciate the testing and reporting.

>
>   - R-Car Gen2/Gen3:
>     Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
>     config boot again.  Remaining issues:
>       - CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
>       - CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
>         Note that Ethernet does not use the R-Car DMAC, so DHCP works.
>         Nevertheless, after that everything hangs, and the board does not
>         respond to pings anymore
>     Both IOMMU and DMAC dependencies are optional, hence should be dropped
>     at late boot (late_initcall?).

Yeah, I'm looking into a good/clean way of handling optional
suppliers. There are a bunch of corner cases I need to consider. But
in the end, I need to have it behave as closely as possible to
fw_devlink=permissive.

>
>   - SH-Mobile AG5 and R-Mobile APE6:
>     The rmobile-sysc driver is similar to the rcar-sysc driver, and does
>     not use a platform device.
>     Still, it works, because all dependencies on the System Controller
>     become unblocked when the rmobile-reset driver binds against the
>     "renesas,sysc-rmobile" device.  Obviously it would fail if no
>     support for that driver is included in your kernel...

Yeah, IMHO two real drivers (not stubs) for a single device tree node
is wrong/weird at a high level. I'd think one should be a child of the
other. But too late to fix that DT now.

Does it make sense for the rmobile-sysc driver to create a new
platform device and have the rmobule-reset bind to that instead? And
then you can bind a stub driver to the "renesas,sysc-rmobile" device?
I know this can be handled by whatever solution I come up with for the
IOMMU case, but that doesn't seem right for this case. We don't have
to decide on this now, but that's my current view.

>   - R-Mobile A1:
>     Also using the rmobile-sysc driver.
>     However, this is a single core Cortex-A9, i.e. it does not have an
>     ARM architectured timer (like R-Mobile APE6) or Cortex-A9 Global
>     Timer (like SH-Mobile AG5).  The timer used (TMU) is located in a PM
>     Domain controlled by the rmobile-sysc driver, and driver
>     initialization is postponed beyond the point where something relies
>     on a working timer, causing a hang.
>
>     Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
>     this, but prevents the rmobile-reset driver from binding against the
>     same device node, so the reset handling will have to be incorporated
>     into the rmobile-sysc driver (and will thus be registered very
>     early).

Or you can do the "create a child device" option I suggested above.

>   - RZ/A1 and RZ/A2:
>     These are not affected, as the timer used (OSTM) is not a platform
>     driver, but uses TIMER_OF_DECLARE().
>     Note that the RZ/A2 clock driver uses split initialization:
>       1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
>       2. Other clocks are initialized by platform_driver_probe() from a
>          subsys_initcall.
>     If the OSTM driver would be a platform_driver, it would block on the
>     block dependency.  Setting the OF_POPULATED flag in the clock driver
>     would not work: while that flag would unblock probing of the timer
>     driver, it would also prevent the second part of the clock driver
>     initialization.

So this looks like it's all working fine, right? Yeah, I already took
into account the *OF*_DECLARE macros when I wrote this and was aware
of the split driver implementations. So hopefully this all works out
fine.

> Now, back to the things I was supposed to work on this week ;-)

Really appreciate all this testing and feedback!

-Saravana

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
  2021-01-25 23:30                         ` Saravana Kannan
@ 2021-01-26  8:25                           ` Geert Uytterhoeven
  0 siblings, 0 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2021-01-26  8:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki,
	Android Kernel Team, Linux Kernel Mailing List, Jisheng Zhang,
	Kevin Hilman, John Stultz, Nicolas Saenz Julienne,
	Yoshihiro Shimoda, Linux-Renesas

Hi Saravana,

On Tue, Jan 26, 2021 at 12:31 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Jan 21, 2021 at 8:04 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > > > <geert@linux-m68k.org> wrote:
> > > > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > > > <geert@linux-m68k.org> wrote:
> > > > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > > > >> (currently,
> > > > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > > > >>   worry about functional dependencies between modules (depmod is still
> > > > > > > > > > > >>   needed for symbol dependencies).
> > > > > > > > > > > >>
> > > > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > > > >> device for it.  If we hit such cases, the device drivers need to be
> > > > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > > > >> their
> > > > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] -
> > > > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > > > > >
> > > > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > > > is enabled.
> > > > > > > > > > > >
> > > > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > > > fw_devlink=on by default").
> >
> > > > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > > > driver. You already have a platform device created for it. So just go
> > > > > > ahead and probe it with a platform driver. See what Marek did here
> > > > > > [1].
> > > > > >
> > > > > > You probably had to implement it as an "initcall based driver"
> > > > > > because you had to play initcall chicken to make sure the PD hardware
> > > > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > > > have to worry about that. As an added benefit of implementing a proper
> > > > > > platform driver, you can  actually implement runtime PM now, your
> > > > > > suspend/resume would be more robust, etc.
> > > > >
> > > > > On R-Car H1, the system controller driver needs to be active before
> > > > > secondary CPU setup, hence the early_initcall().
> > > > > platform_bus_init() is called after that, so this is gonna need a split
> > > > > initialization.  Or a dummy platform driver to make devlinks think
> > > > > everything is fine ;-)
> > >
> > > I was wondering if you could still probe the "not needed by CPU" power
> > > domains (if there are any) as devices. Using driver-core brings you
> > > good things :)
> >
> >  1. That would mean splitting the driver in two parts, looping over the
> >     tables twice, while everything can just be done in the first pass?
> >
> >  2. Which "good things" do you have in mind? Making the driver modular?
> >     Ignoring the dependency for secondary CPU setup on R-Car H1, this
> >     driver could indeed be modular on R-Car Gen2 and Gen3, as long as
> >     the boot loader would pass a ramdisk with the module to the kernel.
> >     The ramdisk could not be loaded in any other way, as all I/O
> >     devices are part of a PM Domain, and thus depend on the SYSC driver.
> >     Note that on some (non-R-Car) SoCs, the timers may be part of a PM
> >     Domain, too.
>
> "Good things" like being able to implement runtime pm, suspend/resume
> robustness (due to device links). There were a few more benefits I had
> in mind when I wrote it, but I don't remember what it was.

While that is valid for I/O devices, the System Controller is a power
provider, and thus provides Runtime PM services itself.  It does not use
Runtime PM itself, as it is always-on.

Note that, in theory, you can have a power provider that can be
powered-down, and thus would use (need) Runtime PM, but then you need to
have a second power provider that is always-on to control the first one,
and the problem would just shift to the second one.

> The double pass itself is not that big of a deal IMHO. It probably
> adds less than a millisecond.

Not all embedded systems run at multi-GHz speed...

> > > > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > > > driver are now broken?
> > > > > Including all clock drivers using CLK_OF_DECLARE()?
> > > >
> > > > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > > > is set, and of_clk_init() sets that flag.  So rcar-sysc should do so, too.
> > > > Patch sent.
> > > > >     $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > > > "\.compatible\>") | wc -l
> > > > >     249
> > > > >
> > > > > (includes false positives)
> > > > >
> > > > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> > > >
> > > > Still more than 100 drivers to fix?
> > >
> > > Not fully sure what the grep is trying to catch, but fw_devlink
> > > supports devices on any bus (i2c, platform, pci, etc). So that's not a
> > > problem. It'll be a problem when a struct device is never created for
> > > a real device. Or if it's created, but never probed.
> >
> > The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
> > and not using a driver model driver (i.e. not matching "*_driver").
>
> Ah TIL about -L and -l. Thanks.
>
> > > I'm also looking into a bunch of other options for fallback when
> > > fw_devlink=on doesn't work. Too much to explain here -- patches are
> > > easier :)
> >
> > I gave it a try on all Renesas platforms I have local access to:
>
> Thanks a lot! Really appreciate the testing and reporting.
>
> >
> >   - R-Car Gen2/Gen3:
> >     Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
> >     config boot again.  Remaining issues:
> >       - CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
> >       - CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
> >         Note that Ethernet does not use the R-Car DMAC, so DHCP works.
> >         Nevertheless, after that everything hangs, and the board does not
> >         respond to pings anymore
> >     Both IOMMU and DMAC dependencies are optional, hence should be dropped
> >     at late boot (late_initcall?).
>
> Yeah, I'm looking into a good/clean way of handling optional
> suppliers. There are a bunch of corner cases I need to consider. But
> in the end, I need to have it behave as closely as possible to
> fw_devlink=permissive.

OK.

> >   - SH-Mobile AG5 and R-Mobile APE6:
> >     The rmobile-sysc driver is similar to the rcar-sysc driver, and does
> >     not use a platform device.
> >     Still, it works, because all dependencies on the System Controller
> >     become unblocked when the rmobile-reset driver binds against the
> >     "renesas,sysc-rmobile" device.  Obviously it would fail if no
> >     support for that driver is included in your kernel...
>
> Yeah, IMHO two real drivers (not stubs) for a single device tree node
> is wrong/weird at a high level. I'd think one should be a child of the
> other. But too late to fix that DT now.
>
> Does it make sense for the rmobile-sysc driver to create a new
> platform device and have the rmobule-reset bind to that instead? And
> then you can bind a stub driver to the "renesas,sysc-rmobile" device?
> I know this can be handled by whatever solution I come up with for the
> IOMMU case, but that doesn't seem right for this case. We don't have
> to decide on this now, but that's my current view.

I guess registering the (system) reset handler in the rmobile-sysc driver
is the simplest solution.  We already have clock drivers
registering (device) reset support, as module clock and module reset
are typically combined in the same hardware block.

> >   - R-Mobile A1:
> >     Also using the rmobile-sysc driver.
> >     However, this is a single core Cortex-A9, i.e. it does not have an
> >     ARM architectured timer (like R-Mobile APE6) or Cortex-A9 Global
> >     Timer (like SH-Mobile AG5).  The timer used (TMU) is located in a PM
> >     Domain controlled by the rmobile-sysc driver, and driver
> >     initialization is postponed beyond the point where something relies
> >     on a working timer, causing a hang.
> >
> >     Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
> >     this, but prevents the rmobile-reset driver from binding against the
> >     same device node, so the reset handling will have to be incorporated
> >     into the rmobile-sysc driver (and will thus be registered very
> >     early).

So the rmobile-sysc driver has to stay a DT driver.

> Or you can do the "create a child device" option I suggested above.

Registering a reset handler from the rmobile-sysc driver is fine.

> >   - RZ/A1 and RZ/A2:
> >     These are not affected, as the timer used (OSTM) is not a platform
> >     driver, but uses TIMER_OF_DECLARE().
> >     Note that the RZ/A2 clock driver uses split initialization:
> >       1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
> >       2. Other clocks are initialized by platform_driver_probe() from a
> >          subsys_initcall.
> >     If the OSTM driver would be a platform_driver, it would block on the
> >     block dependency.  Setting the OF_POPULATED flag in the clock driver
> >     would not work: while that flag would unblock probing of the timer
> >     driver, it would also prevent the second part of the clock driver
> >     initialization.
>
> So this looks like it's all working fine, right? Yeah, I already took
> into account the *OF*_DECLARE macros when I wrote this and was aware
> of the split driver implementations. So hopefully this all works out
> fine.

Some of it is working by accident.

I expect there are systems where the timer driver has been converted
from TIMER_OF_DECLARE() to a platform driver (which people are
recommending, as it is needed for Runtime PM support etc.), and that
will break.  It's hard to predict.

I tested on all Renesas boards I had, as I expected to discover
breakage.  But what exactly broke, and why, was sometimes a bit of a
surprise to me ;-)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-01-26 18:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20201218031703.3053753-1-saravanak@google.com>
     [not found] ` <20201218031703.3053753-6-saravanak@google.com>
2021-01-18 17:39   ` [PATCH v1 5/5] driver core: Set fw_devlink=on by default Geert Uytterhoeven
2021-01-18 17:59     ` Marc Zyngier
2021-01-18 19:16       ` Geert Uytterhoeven
2021-01-18 19:30         ` Marc Zyngier
2021-01-18 21:18         ` Saravana Kannan
     [not found]           ` <CAMuHMdVTKEy3rbdYYUKS+L1pY0y0ctMWRXNf7o+hJWyGR7L-Dg@mail.gmail.com>
2021-01-19 18:08             ` Saravana Kannan
2021-01-19 21:50               ` Saravana Kannan
2021-01-20  9:40                 ` Geert Uytterhoeven
2021-01-20 14:26                   ` Geert Uytterhoeven
2021-01-20 17:22                     ` Saravana Kannan
2021-01-21 16:04                       ` Geert Uytterhoeven
2021-01-25 23:30                         ` Saravana Kannan
2021-01-26  8:25                           ` Geert Uytterhoeven
2021-01-20  9:11               ` Geert Uytterhoeven
     [not found]   ` <CGME20210111141814eucas1p1f388df07b789693a999042b27f0d8c2a@eucas1p1.samsung.com>
     [not found]     ` <5484316b-0f27-6c36-9259-5c765bb6b96c@samsung.com>
     [not found]       ` <2556a69b-5da5-bf80-e051-df2d02fbc40f@samsung.com>
     [not found]         ` <CAGETcx8-1YzF2Br0sszJROLAWo3DSm27K071Md9wY5SOwUeLdw@mail.gmail.com>
     [not found]           ` <fde65185-fd00-1f79-0f80-245eaa6c95cb@samsung.com>
     [not found]             ` <CAGETcx_QY3h83q1fSr=h_vMQdH-TMhVYPozPuSr=q4uv2Lr48w@mail.gmail.com>
2021-01-18 17:43               ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).