Linux-Doc Archive on lore.kernel.org
 help / color / Atom feed
From: Saravana Kannan <saravanak@google.com>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Kevin Hilman <khilman@kernel.org>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Len Brown <len.brown@intel.com>, Len Brown <lenb@kernel.org>,
	Pavel Machek <pavel@ucw.cz>,
	Michael Turquette <mturquette@baylibre.com>,
	Stephen Boyd <sboyd@kernel.org>, Rob Herring <robh+dt@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Marc Zyngier <maz@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	linux-clk <linux-clk@vger.kernel.org>,
	"open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" 
	<devicetree@vger.kernel.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Android Kernel Team <kernel-team@android.com>,
	Linux-Renesas <linux-renesas-soc@vger.kernel.org>
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving
Date: Thu, 11 Feb 2021 18:59:54 -0800
Message-ID: <CAGETcx-668+uGigaOMcsvv00mo6o_eGPcH0YyD28OCVEyVbw+w@mail.gmail.com> (raw)
In-Reply-To: <CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com>

On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <saravanak@google.com> wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patch 1 and 2 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
> >
> > Thanks,
> > Saravana
> >
> > Saravana Kannan (8):
> >   driver core: fw_devlink: Detect supplier devices that will never be
> >     added
> >   of: property: Don't add links to absent suppliers
> >   driver core: Add fw_devlink.strict kernel param
> >   of: property: Add fw_devlink support for optional properties
> >   driver core: fw_devlink: Handle suppliers that don't use driver core
> >   irqdomain: Mark fwnodes when their irqdomain is added/removed
> >   PM: domains: Mark fwnodes when their powerdomain is added/removed
> >   clk: Mark fwnodes when their clock provider is added/removed
>
> Thanks for your series, which is now part of driver-core-next.
> I gave driver-core-next + [1] a try on various Renesas boards.

Thanks!

> Test results are below.
> In general, the result looks much better than before.

Ah, good to hear this.

> [1] - https://lore.kernel.org/lkml/20210210114435.122242-1-tudor.ambarus@microchip.com/
>
>   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
>
>       - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
>         node OF_POPULATED after init") is no longer needed (but already
>         queued for v5.12 anyway)

Rob doesn't like the proliferation of OF_POPULATED and we don't need
it anymore, so maybe work it out with him? It's a balance between some
wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

>       - Some devices are reprobed, despite their drivers returning
>         a real error code, and not -EPROBE_DEFER:

Sorry, it's not obvious from the logs below where "reprobing" is
happening. Can you give more pointers please?

Also, thinking more about this, the only way I could see this happen is:
1. Device fails with error that's not -EPROBE_DEFER
2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
flag) where it's a consumer.
3. The supplier probes and the device gets added to the deferred probe
list again.

But I can't see how this sequence can happen. Device links are created
only when a device is added. And is the supplier isn't added yet, the
consumer wouldn't have probed in the first place.

Other than "annoying waste of time" is this causing any other problems?

>             renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.*
>             (rwdt_probe() returns -ENODEV)
>
>             sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by
> ee090000.pci; cannot claim for e6590000.usb
>             sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22
>             sh-pfc e6060000.pinctrl: could not request pin 247
> (GP_7_23) from group usb0  on device sh-pfc
>             renesas_usbhs e6590000.usb: Error applying setting,
> reverse things back
>             renesas_usbhs: probe of e6590000.usb failed with error -22
>
>             rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges:
>             rcar-pcie fe000000.pcie:       IO
> 0x00fe100000..0x00fe1fffff -> 0x0000000000
>             rcar-pcie fe000000.pcie:      MEM
> 0x00fe200000..0x00fe3fffff -> 0x00fe200000
>             rcar-pcie fe000000.pcie:      MEM
> 0x0030000000..0x0037ffffff -> 0x0030000000
>             rcar-pcie fe000000.pcie:      MEM
> 0x0038000000..0x003fffffff -> 0x0038000000
>             rcar-pcie fe000000.pcie:   IB MEM
> 0x0040000000..0x00bfffffff -> 0x0040000000
>             rcar-pcie fe000000.pcie:   IB MEM
> 0x0200000000..0x02ffffffff -> 0x0200000000
>             rcar-pcie fe000000.pcie: PCIe link down
>             (rcar_pcie_probe() returns -ENODEV)
>
>             xhci-hcd ee000000.usb: xHCI Host Controller
>             xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7
>             xhci-hcd ee000000.usb: Direct firmware load for
> r8a779x_usb3_v3.dlmem failed with error -2
>             xhci-hcd ee000000.usb: can't setup: -2
>             xhci-hcd ee000000.usb: USB bus 7 deregistered
>             xhci-hcd: probe of ee000000.usb failed with error -2
>
>       - The PCI reprobing leads to a memory leak, for which I've sent a fix
>         "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
>         https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/

Wrt PCI reprobing,
1. Is this PCI never expected to probe, but it's being reattempted
despite the NOT EPROBE_DEFER error? Or
2. The PCI was deferred probe when it should have probed and then when
it's finally reattemped and it could succeed, we are hitting this mem
leak issue?

I'm basically trying to distinguish between "this stuff should never
be retried" vs "this/it's suppliers got probe deferred with
fw_devlink=on vs but didn't get probe deferred with
fw_devlink=permissive and that's causing issues"

>       - I2C on R-Car Gen3 does not seem to use DMA, according to
>         /sys/kernel/debug/dmaengine/summary:
>
>             -dma4chan0    | e66d8000.i2c:tx
>             -dma4chan1    | e66d8000.i2c:rx
>             -dma5chan0    | e6510000.i2c:tx

I think I need more context on the problem before I can try to fix it.
I'm also very unfamiliar with that file. With fw_devlink=permissive,
I2C was using DMA? If so, the next step is to see if the I2C relative
probe order with DMA is getting changed and if so, why.

>       - Disabling CONFIG_IPMMU_VMSA (IOMMU) now works, good!
>
>            ignoring dependency for device, assuming no driver
>
>       - Disabling CONFIG_RCAR_DMAC works for most devices, except for
>         sound:
>
>             -rcar_sound ec500000.sound: probed
>
>              ALSA device list:
>             -  #0: rcar-sound
>             +  No soundcards found.
>
>             # cat  /sys/kernel/debug/devices_deferred
>             2-0010
>             sound
>             ec500000.sound
>
>             platform e6510000.i2c: Linked as a sync state only
> consumer to ec500000.sound
>             platform ec500000.sound: Linked as a consumer to e6060000.pinctrl
>             platform ec500000.sound: Linked as a consumer to
> e6150000.clock-controller
>             i2c 2-0010: Linked as a consumer to ec500000.sound
>             platform ec500000.sound: Linked as a consumer to 2-004f
>             cs2000-cp 2-004f: revision - C1
>             i2c-rcar e6510000.i2c: probed
>             i2c-rcar e6510000.i2c: Dropping the link to ec500000.sound
>             i2c 2-0010: probe deferral - supplier ec500000.sound not ready
>
>         With CONFIG_RCAR_DMAC=y, ec500000.sound is probed quite early.

I saw your other reply, so I'll ignore this sound/DMA issue.

>
>             arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
>
>             ak4613: codec@10 {
>                     clocks = <&rcar_sound 3>;
>
>                     port {
>                             ak4613_endpoint: endpoint {
>                                     remote-endpoint = <&rsnd_endpoint0>;
>                             };
>                     };
>             };
>
>             sound_card: sound {
>                     dais = <&rsnd_port0     /* ak4613 */
>                             &rsnd_port1     /* HDMI0  */
>                             &rsnd_port2>;   /* HDMI1  */
>             };
>
>             rcar_sound: sound@ec500000 {
>                     ports {
>                             rsnd_port0: port@0 {
>                                     rsnd_endpoint0: endpoint {
>                                             remote-endpoint =
> <&ak4613_endpoint>;
>                                     }
>                             }
>                     }
>             };
>
>
>   2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
>
>       - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
>         reset handling" is no longer needed
>         https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+renesas@glider.be/

Good to see more evidence that this series is fixing things at a more
generic level.

>       - On R-Mobile A1, I get a BUG and a memory leak:
>
>             BUG: spinlock bad magic on CPU#0, swapper/1
>              lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner:
> <none>/-1, .owner_cpu: 0
>             CPU: 0 PID: 1 Comm: swapper Not tainted
> 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
>             Hardware name: Generic R8A7740 (Flattened Device Tree)
>             [<c010c3c8>] (unwind_backtrace) from [<c010a49c>]
> (show_stack+0x10/0x14)
>             [<c010a49c>] (show_stack) from [<c0159534>]
> (do_raw_spin_lock+0x20/0x94)
>             [<c0159534>] (do_raw_spin_lock) from [<c04089d8>]
> (dev_pm_get_subsys_data+0x30/0xa0)
>             [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>]
> (genpd_add_device+0x34/0x1c0)
>             [<c0413698>] (genpd_add_device) from [<c041389c>]
> (of_genpd_add_device+0x34/0x4c)
>             [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>]
> (board_staging_register_device+0xf8/0x118)
>             [<c0a1e9bc>] (board_staging_register_device) from
> [<c0a1ea00>] (board_staging_register_devices+0x24/0x28)
>             [<c0a1ea00>] (board_staging_register_devices) from
> [<c0a1ea30>] (runtime_board_check+0x2c/0x40)
>             [<c0a1ea30>] (runtime_board_check) from [<c0101fac>]
> (do_one_initcall+0xe0/0x278)
>             [<c0101fac>] (do_one_initcall) from [<c0a01034>]
> (kernel_init_freeable+0x174/0x1c0)
>             [<c0a01034>] (kernel_init_freeable) from [<c05fd568>]
> (kernel_init+0x8/0x118)
>             [<c05fd568>] (kernel_init) from [<c010011c>]
> (ret_from_fork+0x14/0x38)
>             Exception stack(0xc19c9fb0 to 0xc19c9ff8)
>             9fa0:                                     00000000
> 00000000 00000000 00000000
>             9fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
>             9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
>             unreferenced object 0xc4134e00 (size 512):
>               comm "swapper", pid 1, jiffies 4294937296 (age 3541.930s)
>               hex dump (first 32 bytes):
>                 00 4e 13 c4 00 4e 13 c4 ff ff ff 7f ff ff ff 7f
> .N...N..........
>                 ff ff ff 7f 02 00 00 00 00 5f 13 c4 1c 4e 13 c4
> ........._...N..
>               backtrace:
>                 [<de1a3c34>] dev_pm_qos_constraints_allocate+0x10/0xcc
>                 [<d21cf6e4>] dev_pm_qos_add_notifier+0x6c/0xd0
>                 [<e04bbc90>] genpd_add_device+0x178/0x1c0
>                 [<95067303>] of_genpd_add_device+0x34/0x4c
>                 [<c334b97a>] board_staging_register_device+0xf8/0x118
>                 [<01bd495a>] board_staging_register_devices+0x24/0x28
>                 [<fb25a5d8>] runtime_board_check+0x2c/0x40
>                 [<65aed679>] do_one_initcall+0xe0/0x278
>                 [<97e3f4f7>] kernel_init_freeable+0x174/0x1c0
>                 [<63c8fed0>] kernel_init+0x8/0x118
>                 [<f704d96c>] ret_from_fork+0x14/0x38
>                 [<00000000>] 0x0

Hmm... I looked at this in bits and pieces throughout the day. At
least spent an hour looking at this. This doesn't make a lot of sense
to me. I don't even touch anything in this code path AFAICT.  Are
modules/kernel mixed up somehow? I need more info before I can help.
Does reverting my pm domain change make any difference (assume it
boots this far without it).

>
>   3. RZ/A1 and RZ/A2: No issues.

Great!

-Saravana

  parent reply index

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20210205222651eucas1p28ef87073dea33c1c5224c14aa203bec5@eucas1p2.samsung.com>
2021-02-05 22:26 ` Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 1/8] driver core: fw_devlink: Detect supplier devices that will never be added Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 2/8] of: property: Don't add links to absent suppliers Saravana Kannan
2021-02-09 21:25     ` Rob Herring
2021-02-05 22:26   ` [PATCH v4 3/8] driver core: Add fw_devlink.strict kernel param Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 4/8] of: property: Add fw_devlink support for optional properties Saravana Kannan
2021-02-09 21:33     ` Rob Herring
2021-02-09 21:54       ` Saravana Kannan
2021-02-10  8:25         ` Geert Uytterhoeven
2021-02-05 22:26   ` [PATCH v4 5/8] driver core: fw_devlink: Handle suppliers that don't use driver core Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 6/8] irqdomain: Mark fwnodes when their irqdomain is added/removed Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 7/8] PM: domains: Mark fwnodes when their powerdomain " Saravana Kannan
2021-02-05 22:26   ` [PATCH v4 8/8] clk: Mark fwnodes when their clock provider " Saravana Kannan
2021-02-08 15:38     ` Rob Herring
2021-02-08 23:34       ` Saravana Kannan
2021-02-10 11:44     ` [PATCH] clk: Mark fwnodes when their clock provider is added Tudor Ambarus
2021-02-10 11:44       ` Tudor Ambarus
2021-02-11 13:00         ` Greg KH
2021-02-13  0:37           ` Stephen Boyd
     [not found]         ` <CGME20210325133159eucas1p297b769beb681743fb32d362a86cc6e3e@eucas1p2.samsung.com>
2021-03-25 13:31           ` Marek Szyprowski
2021-03-25 15:47             ` Geert Uytterhoeven
2021-03-25 18:25             ` Nicolas Saenz Julienne
2021-03-26 18:13               ` Stephen Boyd
2021-03-26 18:29                 ` Geert Uytterhoeven
     [not found]                   ` <161705310317.3012082.15148238105608149214@swboyd.mtv.corp.google.com>
2021-03-29 23:28                     ` Saravana Kannan
     [not found]                       ` <161706920822.3012082.10047587064612237296@swboyd.mtv.corp.google.com>
2021-03-30  6:58                         ` Geert Uytterhoeven
     [not found]                           ` <161715734080.2260335.881350237641202575@swboyd.mtv.corp.google.com>
2021-03-31  7:05                             ` Geert Uytterhoeven
     [not found]                               ` <161721871083.2260335.2392646934517115770@swboyd.mtv.corp.google.com>
2021-04-05 11:04                                 ` Nicolas Saenz Julienne
2021-04-21  3:26         ` Guenter Roeck
2021-04-21  7:00           ` Saravana Kannan
2021-02-10 19:13       ` Saravana Kannan
2021-03-30 15:42       ` Guenter Roeck
2021-03-30 16:26         ` Saravana Kannan
     [not found]     ` <161317679292.1254594.15797939257637374295@swboyd.mtv.corp.google.com>
2021-02-14 21:15       ` [PATCH v4 8/8] clk: Mark fwnodes when their clock provider is added/removed Saravana Kannan
2021-02-06  2:45   ` [PATCH v4 0/8] Make fw_devlink=on more forgiving Saravana Kannan
2021-02-06 19:41   ` Geert Uytterhoeven
2021-02-06 20:47     ` Saravana Kannan
2021-02-08  8:40   ` Marek Szyprowski
2021-02-08 23:57     ` Saravana Kannan
2021-02-10  8:19   ` Tudor.Ambarus
2021-02-10  8:54     ` Saravana Kannan
2021-02-10 10:02       ` Tudor.Ambarus
2021-02-10 19:14         ` Saravana Kannan
2021-02-11 13:00   ` Geert Uytterhoeven
2021-02-11 13:05     ` Geert Uytterhoeven
2021-02-12  2:59     ` Saravana Kannan [this message]
2021-02-12  8:14       ` Geert Uytterhoeven
2021-02-12 20:57         ` Saravana Kannan
2021-02-15 12:38       ` Geert Uytterhoeven
2021-02-15 21:26         ` Saravana Kannan
2021-02-16  8:05           ` Geert Uytterhoeven
2021-02-16 18:48             ` Saravana Kannan
2021-02-16 20:31               ` Geert Uytterhoeven
2021-02-17 23:57                 ` Saravana Kannan
2021-02-25  9:21                   ` Geert Uytterhoeven
2021-02-15 15:16       ` Geert Uytterhoeven
2021-02-15 21:57         ` Saravana Kannan
2021-02-16 12:58           ` Geert Uytterhoeven
2021-02-15 11:19     ` Geert Uytterhoeven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGETcx-668+uGigaOMcsvv00mo6o_eGPcH0YyD28OCVEyVbw+w@mail.gmail.com \
    --to=saravanak@google.com \
    --cc=corbet@lwn.net \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=kernel-team@android.com \
    --cc=khilman@kernel.org \
    --cc=len.brown@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=maz@kernel.org \
    --cc=mturquette@baylibre.com \
    --cc=pavel@ucw.cz \
    --cc=rafael@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Doc Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-doc/0 linux-doc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-doc linux-doc/ https://lore.kernel.org/linux-doc \
		linux-doc@vger.kernel.org
	public-inbox-index linux-doc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-doc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git