* [PATCH] driver core: Fix double failed probing with fw_devlink=on
@ 2021-02-15 11:16 Geert Uytterhoeven
  2021-02-15 14:58 ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2021-02-15 11:16 UTC (permalink / raw)
  To: Saravana Kannan, Greg Kroah-Hartman, Rafael J . Wysocki
  Cc: linux-renesas-soc, linux-kernel, Geert Uytterhoeven

With fw_devlink=permissive, devices are added to the deferred probe
pending list if their driver's .probe() method returns -EPROBE_DEFER.

With fw_devlink=on, devices are added to the deferred probe pending list
if they are determined to be a consumer, which happens before their
driver's .probe() method is called.  If the actual probe fails later
(real failure, not -EPROBE_DEFER), the device will still be on the
deferred probe pending list, and it will be probed again when deferred
probing kicks in, which is futile.

Fix this by explicitly removing the device from the deferred probe
pending list in case of probe failures.

Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
Seen on various Renesas R-Car platforms, cfr.
https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
---
 drivers/base/dd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9179825ff646f4e3..91c4181093c43709 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	case -ENXIO:
 		pr_debug("%s: probe of %s rejects match %d\n",
 			 drv->name, dev_name(dev), ret);
+		driver_deferred_probe_del(dev);
 		break;
 	default:
 		/* driver matched but the probe failed */
 		pr_warn("%s: probe of %s failed with error %d\n",
 			drv->name, dev_name(dev), ret);
+		driver_deferred_probe_del(dev);
 	}
 	/*
 	 * Ignore errors returned by ->probe so that the next driver can try
-- 
2.25.1
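The list bookkeeping described in the commit message can be modeled in userspace. This is a sketch only, not kernel code. `MODEL_EPROBE_DEFER` and the `model_*` names are hypothetical stand-ins for EPROBE_DEFER, driver_deferred_probe_add() and driver_deferred_probe_del(). The model shows how the outcome handling would look with the patch applied:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel-internal EPROBE_DEFER. */
#define MODEL_EPROBE_DEFER 517

struct model_dev {
	const char *name;
	bool deferred;            /* currently on the pending list? */
	struct model_dev *next;   /* link in the pending list */
};

static struct model_dev *pending;  /* deferred probe pending list */

/* Models driver_deferred_probe_add(): idempotent insert at the head. */
static void model_deferred_add(struct model_dev *dev)
{
	if (dev->deferred)
		return;
	dev->next = pending;
	pending = dev;
	dev->deferred = true;
}

/* Models driver_deferred_probe_del(): unlink the device if it is queued. */
static void model_deferred_del(struct model_dev *dev)
{
	for (struct model_dev **pp = &pending; *pp; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			dev->next = NULL;
			dev->deferred = false;
			return;
		}
	}
}

/* Handling of the ->probe() return value, with the patch applied. */
static void model_probe_result(struct model_dev *dev, int ret)
{
	switch (ret) {
	case 0:                         /* bound; in the kernel, driver_bound() dequeues */
		model_deferred_del(dev);
		break;
	case -MODEL_EPROBE_DEFER:       /* try again later */
		model_deferred_add(dev);
		break;
	case -ENODEV:
	case -ENXIO:                    /* driver rejected the match */
		model_deferred_del(dev); /* added by the patch */
		break;
	default:                        /* real probe failure */
		model_deferred_del(dev); /* added by the patch */
	}
}

static size_t model_pending_count(void)
{
	size_t n = 0;

	for (struct model_dev *d = pending; d; d = d->next)
		n++;
	return n;
}
```

Now drop the two `model_deferred_del()` calls that the patch adds, in the -ENODEV/-ENXIO and default branches. A device that a supplier bind queued would then stay on the list after a real probe failure, and it would be probed again for nothing.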



* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-15 11:16 [PATCH] driver core: Fix double failed probing with fw_devlink=on Geert Uytterhoeven
@ 2021-02-15 14:58 ` Rafael J. Wysocki
  2021-02-15 18:26   ` Saravana Kannan
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2021-02-15 14:58 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Saravana Kannan, Greg Kroah-Hartman, Rafael J . Wysocki,
	Linux-Renesas, Linux Kernel Mailing List

On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
>
> With fw_devlink=permissive, devices are added to the deferred probe
> pending list if their driver's .probe() method returns -EPROBE_DEFER.
>
> With fw_devlink=on, devices are added to the deferred probe pending list
> if they are determined to be a consumer, which happens before their
> driver's .probe() method is called.  If the actual probe fails later
> (real failure, not -EPROBE_DEFER), the device will still be on the
> deferred probe pending list, and it will be probed again when deferred
> probing kicks in, which is futile.
>
> Fix this by explicitly removing the device from the deferred probe
> pending list in case of probe failures.
>
> Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Good catch:

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> Seen on various Renesas R-Car platforms, cfr.
> https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
> ---
>  drivers/base/dd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9179825ff646f4e3..91c4181093c43709 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>         case -ENXIO:
>                 pr_debug("%s: probe of %s rejects match %d\n",
>                          drv->name, dev_name(dev), ret);
> +               driver_deferred_probe_del(dev);
>                 break;
>         default:
>                 /* driver matched but the probe failed */
>                 pr_warn("%s: probe of %s failed with error %d\n",
>                         drv->name, dev_name(dev), ret);
> +               driver_deferred_probe_del(dev);
>         }
>         /*
>          * Ignore errors returned by ->probe so that the next driver can try
> --
> 2.25.1
>


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-15 14:58 ` Rafael J. Wysocki
@ 2021-02-15 18:26   ` Saravana Kannan
  2021-02-15 19:08     ` Geert Uytterhoeven
  0 siblings, 1 reply; 8+ messages in thread
From: Saravana Kannan @ 2021-02-15 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Linux-Renesas,
	Linux Kernel Mailing List

On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> <geert+renesas@glider.be> wrote:
> >
> > With fw_devlink=permissive, devices are added to the deferred probe
> > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> >
> > With fw_devlink=on, devices are added to the deferred probe pending list
> > if they are determined to be a consumer,

If they are determined to be a consumer or if they are determined to
have a supplier that hasn't probed yet?

> > which happens before their
> > driver's .probe() method is called.  If the actual probe fails later
> > (real failure, not -EPROBE_DEFER), the device will still be on the
> > deferred probe pending list, and it will be probed again when deferred
> > probing kicks in, which is futile.
> >
> > Fix this by explicitly removing the device from the deferred probe
> > pending list in case of probe failures.
> >
> > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Good catch:
>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Geert,

The issue is real and needs to be fixed. But I'm confused how this can
happen. We won't even enter really_probe() if the driver isn't ready.
We also won't get to run the driver's .probe() if the suppliers aren't
ready. So how does the device get added to the deferred probe list
before the driver is ready? Is this due to device_links_driver_bound()
on the supplier?

Can you give a more detailed step by step on the case you are hitting?

Greg/Rafael,

Let's hold off picking this patch till I get to take a closer look
(within a day or two) please.

-Saravana

>
> > ---
> > Seen on various Renesas R-Car platforms, cfr.
> > https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
> > ---
> >  drivers/base/dd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9179825ff646f4e3..91c4181093c43709 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >         case -ENXIO:
> >                 pr_debug("%s: probe of %s rejects match %d\n",
> >                          drv->name, dev_name(dev), ret);
> > +               driver_deferred_probe_del(dev);
> >                 break;
> >         default:
> >                 /* driver matched but the probe failed */
> >                 pr_warn("%s: probe of %s failed with error %d\n",
> >                         drv->name, dev_name(dev), ret);
> > +               driver_deferred_probe_del(dev);
> >         }
> >         /*
> >          * Ignore errors returned by ->probe so that the next driver can try
> > --
> > 2.25.1
> >


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-15 18:26   ` Saravana Kannan
@ 2021-02-15 19:08     ` Geert Uytterhoeven
  2021-02-15 20:59       ` Saravana Kannan
  0 siblings, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2021-02-15 19:08 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Geert Uytterhoeven, Greg Kroah-Hartman,
	Linux-Renesas, Linux Kernel Mailing List

Hi Saravana,

On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > <geert+renesas@glider.be> wrote:
> > > With fw_devlink=permissive, devices are added to the deferred probe
> > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > >
> > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > if they are determined to be a consumer,
>
> If they are determined to be a consumer or if they are determined to
> have a supplier that hasn't probed yet?

When the supplier has probed:

    bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
    bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
    PM: Added domain provider from /soc/clock-controller@e6150000
    driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
    platform e6055800.gpio: Added to deferred list
    [...]
    platform e6020000.watchdog: Added to deferred list
    [...]
    platform fe000000.pcie: Added to deferred list

> > > which happens before their
> > > driver's .probe() method is called.  If the actual probe fails later
> > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > deferred probe pending list, and it will be probed again when deferred
> > > probing kicks in, which is futile.
> > >
> > > Fix this by explicitly removing the device from the deferred probe
> > > pending list in case of probe failures.
> > >
> > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> >
> > Good catch:
> >
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The issue is real and needs to be fixed. But I'm confused how this can
> happen. We won't even enter really_probe() if the driver isn't ready.
> We also won't get to run the driver's .probe() if the suppliers aren't
> ready. So how does the device get added to the deferred probe list
> before the driver is ready? Is this due to device_links_driver_bound()
> on the supplier?
>
> Can you give a more detailed step by step on the case you are hitting?

The device is added to the list due to device_links_driver_bound()
calling driver_deferred_probe_add() on all consumer devices.
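Geert's explanation can be shown with a small userspace model. It is a sketch, not the driver core's code, and `struct model_link` and the `model_*` functions are hypothetical. When a supplier binds, the model walks its consumer links and queues every consumer on the deferred list. That matches the log above: the consumers are "Added to deferred list" right after the clock controller binds, before their own ->probe() has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_dev {
	const char *name;
	bool bound;      /* driver bound to this device */
	bool deferred;   /* queued on the deferred probe list */
};

/* A supplier -> consumer link, as fw_devlink would create from firmware. */
struct model_link {
	struct model_dev *supplier;
	struct model_dev *consumer;
};

/* Models driver_deferred_probe_add(): queue once and log it. */
static void model_deferred_probe_add(struct model_dev *dev)
{
	if (!dev->deferred) {
		dev->deferred = true;
		printf("platform %s: Added to deferred list\n", dev->name);
	}
}

/*
 * Rough model of device_links_driver_bound(): once the supplier is bound,
 * every consumer linked to it is queued for (re)probing. None of those
 * consumers has run ->probe() at this point.
 */
static void model_links_driver_bound(struct model_dev *supplier,
				     struct model_link *links, size_t n)
{
	supplier->bound = true;
	for (size_t i = 0; i < n; i++)
		if (links[i].supplier == supplier)
			model_deferred_probe_add(links[i].consumer);
}
```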

> > > +++ b/drivers/base/dd.c
> > > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >         case -ENXIO:
> > >                 pr_debug("%s: probe of %s rejects match %d\n",
> > >                          drv->name, dev_name(dev), ret);
> > > +               driver_deferred_probe_del(dev);
> > >                 break;
> > >         default:
> > >                 /* driver matched but the probe failed */
> > >                 pr_warn("%s: probe of %s failed with error %d\n",
> > >                         drv->name, dev_name(dev), ret);
> > > +               driver_deferred_probe_del(dev);
> > >         }
> > >         /*
> > >          * Ignore errors returned by ->probe so that the next driver can try

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-15 19:08     ` Geert Uytterhoeven
@ 2021-02-15 20:59       ` Saravana Kannan
  2021-02-16 17:07         ` Saravana Kannan
  0 siblings, 1 reply; 8+ messages in thread
From: Saravana Kannan @ 2021-02-15 20:59 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Geert Uytterhoeven, Greg Kroah-Hartman,
	Linux-Renesas, Linux Kernel Mailing List

On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > <geert+renesas@glider.be> wrote:
> > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > >
> > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > if they are determined to be a consumer,
> >
> > If they are determined to be a consumer or if they are determined to
> > have a supplier that hasn't probed yet?
>
> When the supplier has probed:
>
>     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
>     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
>     PM: Added domain provider from /soc/clock-controller@e6150000
>     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
>     platform e6055800.gpio: Added to deferred list
>     [...]
>     platform e6020000.watchdog: Added to deferred list
>     [...]
>     platform fe000000.pcie: Added to deferred list
>
> > > > which happens before their
> > > > driver's .probe() method is called.  If the actual probe fails later
> > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > deferred probe pending list, and it will be probed again when deferred
> > > > probing kicks in, which is futile.
> > > >
> > > > Fix this by explicitly removing the device from the deferred probe
> > > > pending list in case of probe failures.
> > > >
> > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > >
> > > Good catch:
> > >
> > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The issue is real and needs to be fixed. But I'm confused how this can
> > happen. We won't even enter really_probe() if the driver isn't ready.
> > We also won't get to run the driver's .probe() if the suppliers aren't
> > ready. So how does the device get added to the deferred probe list
> > before the driver is ready? Is this due to device_links_driver_bound()
> > on the supplier?
> >
> > Can you give a more detailed step by step on the case you are hitting?
>
> The device is added to the list due to device_links_driver_bound()
> calling driver_deferred_probe_add() on all consumer devices.

Thanks for the explanation. Maybe add more details like this to the
commit text or in the code?

For the code:
Reviewed-by: Saravana Kannan <saravanak@google.com>

-Saravana

>
> > > > +++ b/drivers/base/dd.c
> > > > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > >         case -ENXIO:
> > > >                 pr_debug("%s: probe of %s rejects match %d\n",
> > > >                          drv->name, dev_name(dev), ret);
> > > > +               driver_deferred_probe_del(dev);
> > > >                 break;
> > > >         default:
> > > >                 /* driver matched but the probe failed */
> > > >                 pr_warn("%s: probe of %s failed with error %d\n",
> > > >                         drv->name, dev_name(dev), ret);
> > > > +               driver_deferred_probe_del(dev);
> > > >         }
> > > >         /*
> > > >          * Ignore errors returned by ->probe so that the next driver can try
>
> Gr{oetje,eeting}s,
>
>                         Geert
>


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-15 20:59       ` Saravana Kannan
@ 2021-02-16 17:07         ` Saravana Kannan
  2021-07-07  8:43           ` Geert Uytterhoeven
  0 siblings, 1 reply; 8+ messages in thread
From: Saravana Kannan @ 2021-02-16 17:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Geert Uytterhoeven, Greg Kroah-Hartman,
	Linux-Renesas, Linux Kernel Mailing List

On Mon, Feb 15, 2021 at 12:59 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > > <geert+renesas@glider.be> wrote:
> > > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > > >
> > > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > > if they are determined to be a consumer,
> > >
> > > If they are determined to be a consumer or if they are determined to
> > > have a supplier that hasn't probed yet?
> >
> > When the supplier has probed:
> >
> >     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
> >     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
> >     PM: Added domain provider from /soc/clock-controller@e6150000
> >     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
> >     platform e6055800.gpio: Added to deferred list
> >     [...]
> >     platform e6020000.watchdog: Added to deferred list
> >     [...]
> >     platform fe000000.pcie: Added to deferred list
> >
> > > > > which happens before their
> > > > > driver's .probe() method is called.  If the actual probe fails later
> > > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > > deferred probe pending list, and it will be probed again when deferred
> > > > > probing kicks in, which is futile.
> > > > >
> > > > > Fix this by explicitly removing the device from the deferred probe
> > > > > pending list in case of probe failures.
> > > > >
> > > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > >
> > > > Good catch:
> > > >
> > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The issue is real and needs to be fixed. But I'm confused how this can
> > > happen. We won't even enter really_probe() if the driver isn't ready.
> > > We also won't get to run the driver's .probe() if the suppliers aren't
> > > ready. So how does the device get added to the deferred probe list
> > > before the driver is ready? Is this due to device_links_driver_bound()
> > > on the supplier?
> > >
> > > Can you give a more detailed step by step on the case you are hitting?
> >
> > The device is added to the list due to device_links_driver_bound()
> > calling driver_deferred_probe_add() on all consumer devices.
>
> Thanks for the explanation. Maybe add more details like this to the
> commit text or in the code?
>
> For the code:
> Reviewed-by: Saravana Kannan <saravanak@google.com>

Ugh... I just realized that I might have to give this a Nak because of
bad locking in deferred_probe_work_func(). The unlock/lock inside the
loop is a terrible hack. If we add this patch, we can end up modifying
a linked list while it's being traversed and cause a crash or busy
loop (you'll accidentally end up on an "empty list"). I ran into a
similar issue during one of my unrelated refactors.
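The hazard described here is a general one. If a worker keeps a cursor into a list while it has dropped the lock, another thread can unlink the entry that cursor points to. One common way to avoid this is to detach the current head under the lock on every iteration, so no position is ever held across the unlocked region. The sketch below is a hedged userspace illustration of that pattern using pthreads. It does not describe deferred_probe_work_func(), and all names in it are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	bool queued;
	int probed;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *active;   /* list being drained by the worker */

/* Unlink n if still queued; takes the lock itself, like a deletion
 * issued from a probe path while the worker has dropped the lock. */
static void node_del(struct node *n)
{
	pthread_mutex_lock(&list_lock);
	for (struct node **pp = &active; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			n->queued = false;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
}

/* Drain the list without keeping a cursor across the unlocked region:
 * each pass detaches the current head under the lock, so concurrent
 * node_del() calls on other entries cannot leave us on a stale node. */
static void drain(void (*probe)(struct node *))
{
	pthread_mutex_lock(&list_lock);
	while (active) {
		struct node *n = active;

		active = n->next;
		n->next = NULL;
		n->queued = false;
		pthread_mutex_unlock(&list_lock);

		probe(n);   /* may call node_del() on other entries */

		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
}

static struct node demo[3];

/* Demo callback: probing demo[0] unlinks demo[2] mid-walk. */
static void demo_probe(struct node *n)
{
	n->probed++;
	if (n == &demo[0])
		node_del(&demo[2]);
}

/* Build the list demo[0] -> demo[1] -> demo[2]. */
static void demo_setup(void)
{
	for (int i = 2; i >= 0; i--) {
		demo[i].next = active;
		demo[i].queued = true;
		demo[i].probed = 0;
		active = &demo[i];
	}
}
```

In the demo, probing demo[0] unlinks demo[2] while the lock is dropped. The worker re-reads the head after relocking, so demo[2] is simply never visited. The walk never follows a pointer into an entry that has already been removed.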

-Saravana


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-02-16 17:07         ` Saravana Kannan
@ 2021-07-07  8:43           ` Geert Uytterhoeven
  2021-07-07 17:45             ` Saravana Kannan
  0 siblings, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2021-07-07  8:43 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux-Renesas,
	Linux Kernel Mailing List

Hi Saravana,

(going over an old patch I still have in my local tree)

On Tue, Feb 16, 2021 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Feb 15, 2021 at 12:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
> > <geert@linux-m68k.org> wrote:
> > > On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > > > <geert+renesas@glider.be> wrote:
> > > > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > > > >
> > > > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > > > if they are determined to be a consumer,
> > > >
> > > > If they are determined to be a consumer or if they are determined to
> > > > have a supplier that hasn't probed yet?
> > >
> > > When the supplier has probed:
> > >
> > >     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
> > >     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
> > >     PM: Added domain provider from /soc/clock-controller@e6150000
> > >     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
> > >     platform e6055800.gpio: Added to deferred list
> > >     [...]
> > >     platform e6020000.watchdog: Added to deferred list
> > >     [...]
> > >     platform fe000000.pcie: Added to deferred list
> > >
> > > > > > which happens before their
> > > > > > driver's .probe() method is called.  If the actual probe fails later
> > > > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > > > deferred probe pending list, and it will be probed again when deferred
> > > > > > probing kicks in, which is futile.
> > > > > >
> > > > > > Fix this by explicitly removing the device from the deferred probe
> > > > > > pending list in case of probe failures.
> > > > > >
> > > > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > >
> > > > > Good catch:
> > > > >
> > > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > The issue is real and needs to be fixed. But I'm confused how this can
> > > > happen. We won't even enter really_probe() if the driver isn't ready.
> > > > We also won't get to run the driver's .probe() if the suppliers aren't
> > > > ready. So how does the device get added to the deferred probe list
> > > > before the driver is ready? Is this due to device_links_driver_bound()
> > > > on the supplier?
> > > >
> > > > Can you give a more detailed step by step on the case you are hitting?
> > >
> > > The device is added to the list due to device_links_driver_bound()
> > > calling driver_deferred_probe_add() on all consumer devices.
> >
> > Thanks for the explanation. Maybe add more details like this to the
> > commit text or in the code?
> >
> > For the code:
> > Reviewed-by: Saravana Kannan <saravanak@google.com>
>
> Ugh... I just realized that I might have to give this a Nak because of
> bad locking in deferred_probe_work_func(). The unlock/lock inside the
> loop is a terrible hack. If we add this patch, we can end up modifying
> a linked list while it's being traversed and cause a crash or busy
> loop (you'll accidentally end up on an "empty list"). I ran into a
> similar issue during one of my unrelated refactors.

Turns out the issue I was seeing went away due to commit
f2db85b64f0af141 ("driver core: Avoid pointless deferred probe
attempts"), so there is no need to apply this patch.


Gr{oetje,eeting}s,

                        Geert


* Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on
  2021-07-07  8:43           ` Geert Uytterhoeven
@ 2021-07-07 17:45             ` Saravana Kannan
  0 siblings, 0 replies; 8+ messages in thread
From: Saravana Kannan @ 2021-07-07 17:45 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux-Renesas,
	Linux Kernel Mailing List

On Wed, Jul 7, 2021 at 1:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> (going over an old patch I still have in my local tree)
>
> On Tue, Feb 16, 2021 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Mon, Feb 15, 2021 at 12:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > > On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > > > > <geert+renesas@glider.be> wrote:
> > > > > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > > > > >
> > > > > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > > > > if they are determined to be a consumer,
> > > > >
> > > > > If they are determined to be a consumer or if they are determined to
> > > > > have a supplier that hasn't probed yet?
> > > >
> > > > When the supplier has probed:
> > > >
> > > >     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
> > > >     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
> > > >     PM: Added domain provider from /soc/clock-controller@e6150000
> > > >     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
> > > >     platform e6055800.gpio: Added to deferred list
> > > >     [...]
> > > >     platform e6020000.watchdog: Added to deferred list
> > > >     [...]
> > > >     platform fe000000.pcie: Added to deferred list
> > > >
> > > > > > > which happens before their
> > > > > > > driver's .probe() method is called.  If the actual probe fails later
> > > > > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > > > > deferred probe pending list, and it will be probed again when deferred
> > > > > > > probing kicks in, which is futile.
> > > > > > >
> > > > > > > Fix this by explicitly removing the device from the deferred probe
> > > > > > > pending list in case of probe failures.
> > > > > > >
> > > > > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > > >
> > > > > > Good catch:
> > > > > >
> > > > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > The issue is real and needs to be fixed. But I'm confused how this can
> > > > > happen. We won't even enter really_probe() if the driver isn't ready.
> > > > > We also won't get to run the driver's .probe() if the suppliers aren't
> > > > > ready. So how does the device get added to the deferred probe list
> > > > > before the driver is ready? Is this due to device_links_driver_bound()
> > > > > on the supplier?
> > > > >
> > > > > Can you give a more detailed step by step on the case you are hitting?
> > > >
> > > > The device is added to the list due to device_links_driver_bound()
> > > > calling driver_deferred_probe_add() on all consumer devices.
> > >
> > > Thanks for the explanation. Maybe add more details like this to the
> > > commit text or in the code?
> > >
> > > For the code:
> > > Reviewed-by: Saravana Kannan <saravanak@google.com>
> >
> > Ugh... I just realized that I might have to give this a Nak because of
> > bad locking in deferred_probe_work_func(). The unlock/lock inside the
> > loop is a terrible hack. If we add this patch, we can end up modifying
> > a linked list while it's being traversed and cause a crash or busy
> > loop (you'll accidentally end up on an "empty list"). I ran into a
> > similar issue during one of my unrelated refactors.
>
> Turns out the issue I was seeing went away due to commit
> f2db85b64f0af141 ("driver core: Avoid pointless deferred probe
> attempts"), so there is no need to apply this patch.
>

Yay! That was the goal :) I'm assuming it wasn't ever applied.

-Saravana


Thread overview: 8 messages
2021-02-15 11:16 [PATCH] driver core: Fix double failed probing with fw_devlink=on Geert Uytterhoeven
2021-02-15 14:58 ` Rafael J. Wysocki
2021-02-15 18:26   ` Saravana Kannan
2021-02-15 19:08     ` Geert Uytterhoeven
2021-02-15 20:59       ` Saravana Kannan
2021-02-16 17:07         ` Saravana Kannan
2021-07-07  8:43           ` Geert Uytterhoeven
2021-07-07 17:45             ` Saravana Kannan
