linux-input.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [BUG] amba: Remove deferred device addition
       [not found]                   ` <85b28900-5f42-b997-2ded-0b952bc2a03e@huawei.com>
@ 2021-08-27 19:09                     ` Saravana Kannan
  2021-08-28  1:09                       ` Kefeng Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Saravana Kannan @ 2021-08-27 19:09 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input

On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2021/8/27 8:04, Saravana Kannan wrote:
> > On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
> >>>>> solution that we have for amba devices. That causes a bunch of other
> >>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
> >>>>> adding more reasons for delaying the addition of the device.
> >> Hi Saravana, I try the link[1], but with it, there is a crash when boot
> >> (qemu-system-arm -M vexpress-a15),

I'm assuming it's this one?
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts

> > Hi,
> >
> > It's hard to make sense of the logs. Looks like two different threads
> > might be printing to the log at the same time? Can you please enable
> > the config that prints the thread ID (forgot what it's called) and
> > collect this again? With what I could tell the crash seems to be
> > happening somewhere in platform_match(), but that's not related to
> > this patch at all?
>
> Can you reproduce it? it is very likely related(without your patch, the
> boot is fine),

Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.

> the NULL ptr is about serio, it is registed from amba driver.
>
> ambakmi_driver_init
>
>   -- amba_kmi_probe
>
>     -- __serio_register_port

Thanks for the pointer. I took a look at the logs and the code. It's
very strange. As you can see from the backtrace, platform_match() is
being called for the device_add() from serio_handle_event(). But the
device that gets added there is on the serio_bus which obviously
should be using the serio_bus_match.

>
> +Dmitry and input maillist, is there some known issue about serio ?
>
> I add some debug, the full log is attached.
>
> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
> drv c1090fc0, drv->name vexpress-reset

Based on the logs you added, it's pretty clear we are getting to
platform_match(). It's also strange that the drv->name is
vexpress-reset

> [    2.977928][   T41] 8<--- cut here ---
> [    2.978162][   T41] Unhandled fault: page domain fault (0x01b) at
> 0x00000000
> [    2.978494][   T41] pgd = (ptrval)
> [    2.978819][   T41] [00000000] *pgd=00000000
> [    2.979881][   T41] Internal error: : 1b [#1] SMP ARM
> [    2.980385][   T41] Modules linked in:
> [    2.980810][   T41] CPU: 0 PID: 41 Comm: kworker/0:2 Not tainted
> 5.14.0-rc7+ #213
> [    2.981229][   T41] Hardware name: ARM-Versatile Express
> [    2.981780][   T41] Workqueue: events_long serio_handle_event
> [    2.982737][   T41] PC is at strcmp+0x18/0x44
> [    2.983030][   T41] LR is at platform_match+0xdc/0xf0
> [    2.983283][   T41] pc : [<c0560bcc>]    lr : [<c0646358>]    psr:
> 600b0013
> [    2.983572][   T41] sp : c1675d68  ip : c1675d78  fp : c1675d74
> [    2.983832][   T41] r10: 00000000  r9 : 00000000  r8 : 00000001
> [    2.984095][   T41] r7 : c1e05518  r6 : c1675df4  r5 : c1e05518  r4 :
> c1090fc0
> [    2.984395][   T41] r3 : c0a5e180  r2 : 6bede3db  r1 : c0b82a04  r0 :
> 00000000
> [    2.984906][   T41] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32 ISA
> ARM  Segment none

---- 8< ---- cleaning up a bunch of register dumps

> [    3.003113][   T41] Backtrace:
> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)

But the platform_match() is happening for the device_add() from
serio_event_handle() that's adding a device to the serio_bus and it
should be using serio_bus_match().

I haven't reached any conclusion yet, but my current thought process
is that it's either:
1. My patch is somehow causing list corruption. But I don't directly
touch any list in my change (other than deleting a list entirely), so
it's not clear how that would be happening.
2. Without my patch, these AMBA device's probe would be delayed at
least until 5 seconds or possibly later. I'm wondering if my patch is
catching some bad timing assumptions in other code.

You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
doesn't mean it's not (2), but if it does, then we know for sure it's
(2).

I'll continue debugging further.

-Saravana

>
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 836d6d23bba3..883a58c658c2 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -237,6 +237,7 @@ static int amba_match(struct device *dev, struct
> device_driver *drv)
>
>          if (!pcdev->periphid) {
>                  int ret = amba_read_periphid(pcdev);
> +               dev_info(dev, "%s, amba_read_periphid ret = %d\n",
> __func__, ret);
>
>                  if (ret)
>                          return ret;
> @@ -522,6 +523,7 @@ int amba_device_add(struct amba_device *dev, struct
> resource *parent)
>          /* If primecell ID isn't hard-coded, figure it out */
>          if (!dev->periphid) {
>                  ret = amba_read_periphid(dev);
> +               dev_info(&dev->dev, "%s, amba_read_periphid ret = %d\n",
> __func__, ret);
>                  if (ret && ret != -EPROBE_DEFER)
>                          goto err_release;
>                  /*
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 8640578f45e9..f7c1933c56b5 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1360,6 +1360,7 @@ static int platform_match(struct device *dev,
> struct device_driver *drv)
>          struct platform_device *pdev = to_platform_device(dev);
>          struct platform_driver *pdrv = to_platform_driver(drv);
>
> +       dev_info(dev, "pdev %px, pdev->name %s, drv %px, drv->name %s",
> pdev, pdev->name, drv, drv->name);
>          /* When driver_override is set, only bind to the matching driver */
>          if (pdev->driver_override)
>                  return !strcmp(pdev->driver_override, drv->name);
>
>
> > [1] - https://lore.kernel.org/lkml/CAGETcx8b228nDUho3cX9AAQ-pXOfZTMv8cj2vhdx9yc_pk8q+A@mail.gmail.com/
> > .
> >
> >>> .
> >>>
> > .
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] amba: Remove deferred device addition
  2021-08-27 19:09                     ` [BUG] amba: Remove deferred device addition Saravana Kannan
@ 2021-08-28  1:09                       ` Kefeng Wang
  2021-09-09  3:30                         ` Saravana Kannan
  0 siblings, 1 reply; 6+ messages in thread
From: Kefeng Wang @ 2021-08-28  1:09 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input


On 2021/8/28 3:09, Saravana Kannan wrote:
> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>> On 2021/8/27 8:04, Saravana Kannan wrote:
>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
>>>>>>> solution that we have for amba devices. That causes a bunch of other
>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
>>>>>>> adding more reasons for delaying the addition of the device.
>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
>>>> (qemu-system-arm -M vexpress-a15),
> I'm assuming it's this one?
> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts

I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.

qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu 
cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd 
initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0 
earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev 
type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown 
-nographic

>
>>> Hi,
>>>
>>> It's hard to make sense of the logs. Looks like two different threads
>>> might be printing to the log at the same time? Can you please enable
>>> the config that prints the thread ID (forgot what it's called) and
>>> collect this again? With what I could tell the crash seems to be
>>> happening somewhere in platform_match(), but that's not related to
>>> this patch at all?
>> Can you reproduce it? it is very likely related(without your patch, the
>> boot is fine),
> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
>
>> the NULL ptr is about serio, it is registed from amba driver.
>>
>> ambakmi_driver_init
>>
>>    -- amba_kmi_probe
>>
>>      -- __serio_register_port
> Thanks for the pointer. I took a look at the logs and the code. It's
> very strange. As you can see from the backtrace, platform_match() is
> being called for the device_add() from serio_handle_event(). But the
> device that gets added there is on the serio_bus which obviously
> should be using the serio_bus_match.
Yes, I am confused too.
>
>> +Dmitry and input maillist, is there some known issue about serio ?
>>
>> I add some debug, the full log is attached.
>>
>> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
>> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
>> drv c1090fc0, drv->name vexpress-reset
> Based on the logs you added, it's pretty clear we are getting to
> platform_match(). It's also strange that the drv->name is
> vexpress-reset
...
>
>> [    3.003113][   T41] Backtrace:
>> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
>> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
>> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
>> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
>> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
>> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
>> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
>> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
>> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
>> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
>> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
>> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
>> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
> But the platform_match() is happening for the device_add() from
> serio_event_handle() that's adding a device to the serio_bus and it
> should be using serio_bus_match().
>
> I haven't reached any conclusion yet, but my current thought process
> is that it's either:
> 1. My patch is somehow causing list corruption. But I don't directly
> touch any list in my change (other than deleting a list entirely), so
> it's not clear how that would be happening.

Maybe some concurrent driver load?

> 2. Without my patch, these AMBA device's probe would be delayed at
> least until 5 seconds or possibly later. I'm wondering if my patch is
> catching some bad timing assumptions in other code.

After Rob's patch, It will retry soon.

commit 039599c92d3b2e73689e8b6e519d653fd4770abb
Author: Rob Herring <robh@kernel.org>
Date:   Wed Apr 29 15:58:12 2020 -0500

     amba: Retry adding deferred devices at late_initcall

     If amba bus devices defer when adding, the amba bus code simply retries
     adding the devices every 5 seconds. This doesn't work well as it
     completely unsynchronized with starting the init process which can
     happen in less than 5 secs. Add a retry during late_initcall. If the
     amba devices are added, then deferred probe takes over. If the
     dependencies have not probed at this point, then there's no improvement
     over previous behavior. To completely solve this, we'd need to retry
     after every successful probe as deferred probe does.

     The list_empty() check now happens outside the mutex, but the mutex
     wasn't necessary in the first place.

     This needed to use deferred probe instead of fragile initcall ordering
     on 32-bit VExpress systems where the apb_pclk has a number of probe
     dependencies (vexpress-sysregs, vexpress-config).


>
> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
> doesn't mean it's not (2), but if it does, then we know for sure it's
> (2).
ok, I will try this one, but due to above patch, it may not work.

>
> I'll continue debugging further.
>
> -Saravana

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] amba: Remove deferred device addition
  2021-08-28  1:09                       ` Kefeng Wang
@ 2021-09-09  3:30                         ` Saravana Kannan
  2021-09-10  7:59                           ` Kefeng Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Saravana Kannan @ 2021-09-09  3:30 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input

On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2021/8/28 3:09, Saravana Kannan wrote:
> > On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >> On 2021/8/27 8:04, Saravana Kannan wrote:
> >>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
> >>>>>>> solution that we have for amba devices. That causes a bunch of other
> >>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
> >>>>>>> adding more reasons for delaying the addition of the device.
> >>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
> >>>> (qemu-system-arm -M vexpress-a15),
> > I'm assuming it's this one?
> > arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>
> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>
> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> -nographic
>
> >
> >>> Hi,
> >>>
> >>> It's hard to make sense of the logs. Looks like two different threads
> >>> might be printing to the log at the same time? Can you please enable
> >>> the config that prints the thread ID (forgot what it's called) and
> >>> collect this again? With what I could tell the crash seems to be
> >>> happening somewhere in platform_match(), but that's not related to
> >>> this patch at all?
> >> Can you reproduce it? it is very likely related(without your patch, the
> >> boot is fine),
> > Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
> >
> >> the NULL ptr is about serio, it is registed from amba driver.
> >>
> >> ambakmi_driver_init
> >>
> >>    -- amba_kmi_probe
> >>
> >>      -- __serio_register_port
> > Thanks for the pointer. I took a look at the logs and the code. It's
> > very strange. As you can see from the backtrace, platform_match() is
> > being called for the device_add() from serio_handle_event(). But the
> > device that gets added there is on the serio_bus which obviously
> > should be using the serio_bus_match.
> Yes, I am confused too.
> >
> >> +Dmitry and input maillist, is there some known issue about serio ?
> >>
> >> I add some debug, the full log is attached.
> >>
> >> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
> >> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
> >> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
> >> drv c1090fc0, drv->name vexpress-reset
> > Based on the logs you added, it's pretty clear we are getting to
> > platform_match(). It's also strange that the drv->name is
> > vexpress-reset
> ...
> >
> >> [    3.003113][   T41] Backtrace:
> >> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
> >> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
> >> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
> >> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
> >> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
> >> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
> >> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
> >> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
> >> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
> >> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
> >> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
> >> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> >> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
> > But the platform_match() is happening for the device_add() from
> > serio_event_handle() that's adding a device to the serio_bus and it
> > should be using serio_bus_match().
> >
> > I haven't reached any conclusion yet, but my current thought process
> > is that it's either:
> > 1. My patch is somehow causing list corruption. But I don't directly
> > touch any list in my change (other than deleting a list entirely), so
> > it's not clear how that would be happening.
>
> Maybe some concurrent driver load?
>
> > 2. Without my patch, these AMBA device's probe would be delayed at
> > least until 5 seconds or possibly later. I'm wondering if my patch is
> > catching some bad timing assumptions in other code.
>
> After Rob's patch, It will retry soon.
>
> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
> Author: Rob Herring <robh@kernel.org>
> Date:   Wed Apr 29 15:58:12 2020 -0500
>
>      amba: Retry adding deferred devices at late_initcall
>
>      If amba bus devices defer when adding, the amba bus code simply retries
>      adding the devices every 5 seconds. This doesn't work well as it
>      completely unsynchronized with starting the init process which can
>      happen in less than 5 secs. Add a retry during late_initcall. If the
>      amba devices are added, then deferred probe takes over. If the
>      dependencies have not probed at this point, then there's no improvement
>      over previous behavior. To completely solve this, we'd need to retry
>      after every successful probe as deferred probe does.
>
>      The list_empty() check now happens outside the mutex, but the mutex
>      wasn't necessary in the first place.
>
>      This needed to use deferred probe instead of fragile initcall ordering
>      on 32-bit VExpress systems where the apb_pclk has a number of probe
>      dependencies (vexpress-sysregs, vexpress-config).
>
>
> >
> > You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
> > a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
> > doesn't mean it's not (2), but if it does, then we know for sure it's
> > (2).
> ok, I will try this one, but due to above patch, it may not work.

Were you able to find anything more?

-Saravana

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] amba: Remove deferred device addition
  2021-09-09  3:30                         ` Saravana Kannan
@ 2021-09-10  7:59                           ` Kefeng Wang
  2022-07-05 19:25                             ` Saravana Kannan
  0 siblings, 1 reply; 6+ messages in thread
From: Kefeng Wang @ 2021-09-10  7:59 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input


On 2021/9/9 11:30, Saravana Kannan wrote:
> On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>> On 2021/8/28 3:09, Saravana Kannan wrote:
>>> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>> On 2021/8/27 8:04, Saravana Kannan wrote:
>>>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
>>>>>>>>> solution that we have for amba devices. That causes a bunch of other
>>>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
>>>>>>>>> adding more reasons for delaying the addition of the device.
>>>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
>>>>>> (qemu-system-arm -M vexpress-a15),
>>> I'm assuming it's this one?
>>> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>>
>> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
>> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
>> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
>> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
>> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
>> -nographic
>>
>>>>> Hi,
>>>>>
>>>>> It's hard to make sense of the logs. Looks like two different threads
>>>>> might be printing to the log at the same time? Can you please enable
>>>>> the config that prints the thread ID (forgot what it's called) and
>>>>> collect this again? With what I could tell the crash seems to be
>>>>> happening somewhere in platform_match(), but that's not related to
>>>>> this patch at all?
>>>> Can you reproduce it? it is very likely related(without your patch, the
>>>> boot is fine),
>>> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
>>>
>>>> the NULL ptr is about serio, it is registed from amba driver.
>>>>
>>>> ambakmi_driver_init
>>>>
>>>>     -- amba_kmi_probe
>>>>
>>>>       -- __serio_register_port
>>> Thanks for the pointer. I took a look at the logs and the code. It's
>>> very strange. As you can see from the backtrace, platform_match() is
>>> being called for the device_add() from serio_handle_event(). But the
>>> device that gets added there is on the serio_bus which obviously
>>> should be using the serio_bus_match.
>> Yes, I am confused too.
>>>> +Dmitry and input maillist, is there some known issue about serio ?
>>>>
>>>> I add some debug, the full log is attached.
>>>>
>>>> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
>>>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
>>>> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
>>>> drv c1090fc0, drv->name vexpress-reset
>>> Based on the logs you added, it's pretty clear we are getting to
>>> platform_match(). It's also strange that the drv->name is
>>> vexpress-reset
>> ...
>>>> [    3.003113][   T41] Backtrace:
>>>> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
>>>> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
>>>> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
>>>> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
>>>> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
>>>> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
>>>> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
>>>> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
>>>> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
>>>> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
>>>> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
>>>> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
>>>> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
>>> But the platform_match() is happening for the device_add() from
>>> serio_event_handle() that's adding a device to the serio_bus and it
>>> should be using serio_bus_match().
>>>
>>> I haven't reached any conclusion yet, but my current thought process
>>> is that it's either:
>>> 1. My patch is somehow causing list corruption. But I don't directly
>>> touch any list in my change (other than deleting a list entirely), so
>>> it's not clear how that would be happening.
>> Maybe some concurrent driver load?
>>
>>> 2. Without my patch, these AMBA device's probe would be delayed at
>>> least until 5 seconds or possibly later. I'm wondering if my patch is
>>> catching some bad timing assumptions in other code.
>> After Rob's patch, It will retry soon.
>>
>> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
>> Author: Rob Herring <robh@kernel.org>
>> Date:   Wed Apr 29 15:58:12 2020 -0500
>>
>>       amba: Retry adding deferred devices at late_initcall
>>
>>       If amba bus devices defer when adding, the amba bus code simply retries
>>       adding the devices every 5 seconds. This doesn't work well as it
>>       completely unsynchronized with starting the init process which can
>>       happen in less than 5 secs. Add a retry during late_initcall. If the
>>       amba devices are added, then deferred probe takes over. If the
>>       dependencies have not probed at this point, then there's no improvement
>>       over previous behavior. To completely solve this, we'd need to retry
>>       after every successful probe as deferred probe does.
>>
>>       The list_empty() check now happens outside the mutex, but the mutex
>>       wasn't necessary in the first place.
>>
>>       This needed to use deferred probe instead of fragile initcall ordering
>>       on 32-bit VExpress systems where the apb_pclk has a number of probe
>>       dependencies (vexpress-sysregs, vexpress-config).
>>
>>
>>> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
>>> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
>>> doesn't mean it's not (2), but if it does, then we know for sure it's
>>> (2).
>> ok, I will try this one, but due to above patch, it may not work.
> Were you able to find anything more?
I can't find any clue, and have no time to check this for now, is there 
any news from your side?
>
> -Saravana
> .
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] amba: Remove deferred device addition
  2021-09-10  7:59                           ` Kefeng Wang
@ 2022-07-05 19:25                             ` Saravana Kannan
  2022-08-27 10:26                               ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 6+ messages in thread
From: Saravana Kannan @ 2022-07-05 19:25 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input

On Fri, Sep 10, 2021 at 12:59 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2021/9/9 11:30, Saravana Kannan wrote:
> > On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >> On 2021/8/28 3:09, Saravana Kannan wrote:
> >>> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>> On 2021/8/27 8:04, Saravana Kannan wrote:
> >>>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
> >>>>>>>>> solution that we have for amba devices. That causes a bunch of other
> >>>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
> >>>>>>>>> adding more reasons for delaying the addition of the device.
> >>>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
> >>>>>> (qemu-system-arm -M vexpress-a15),
> >>> I'm assuming it's this one?
> >>> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
> >> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
> >>
> >> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
> >> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
> >> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
> >> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
> >> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> >> -nographic
> >>
> >>>>> Hi,
> >>>>>
> >>>>> It's hard to make sense of the logs. Looks like two different threads
> >>>>> might be printing to the log at the same time? Can you please enable
> >>>>> the config that prints the thread ID (forgot what it's called) and
> >>>>> collect this again? With what I could tell the crash seems to be
> >>>>> happening somewhere in platform_match(), but that's not related to
> >>>>> this patch at all?
> >>>> Can you reproduce it? it is very likely related(without your patch, the
> >>>> boot is fine),
> >>> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
> >>>
> >>>> the NULL ptr is about serio, it is registed from amba driver.
> >>>>
> >>>> ambakmi_driver_init
> >>>>
> >>>>     -- amba_kmi_probe
> >>>>
> >>>>       -- __serio_register_port
> >>> Thanks for the pointer. I took a look at the logs and the code. It's
> >>> very strange. As you can see from the backtrace, platform_match() is
> >>> being called for the device_add() from serio_handle_event(). But the
> >>> device that gets added there is on the serio_bus which obviously
> >>> should be using the serio_bus_match.
> >> Yes, I am confused too.
> >>>> +Dmitry and input maillist, is there some known issue about serio ?
> >>>>
> >>>> I add some debug, the full log is attached.
> >>>>
> >>>> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
> >>>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
> >>>> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
> >>>> drv c1090fc0, drv->name vexpress-reset
> >>> Based on the logs you added, it's pretty clear we are getting to
> >>> platform_match(). It's also strange that the drv->name is
> >>> vexpress-reset
> >> ...
> >>>> [    3.003113][   T41] Backtrace:
> >>>> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
> >>>> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
> >>>> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
> >>>> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
> >>>> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
> >>>> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
> >>>> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
> >>>> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
> >>>> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
> >>>> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
> >>>> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
> >>>> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> >>>> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
> >>> But the platform_match() is happening for the device_add() from
> >>> serio_event_handle() that's adding a device to the serio_bus and it
> >>> should be using serio_bus_match().
> >>>
> >>> I haven't reached any conclusion yet, but my current thought process
> >>> is that it's either:
> >>> 1. My patch is somehow causing list corruption. But I don't directly
> >>> touch any list in my change (other than deleting a list entirely), so
> >>> it's not clear how that would be happening.
> >> Maybe some concurrent driver load?
> >>
> >>> 2. Without my patch, these AMBA device's probe would be delayed at
> >>> least until 5 seconds or possibly later. I'm wondering if my patch is
> >>> catching some bad timing assumptions in other code.
> >> After Rob's patch, It will retry soon.
> >>
> >> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
> >> Author: Rob Herring <robh@kernel.org>
> >> Date:   Wed Apr 29 15:58:12 2020 -0500
> >>
> >>       amba: Retry adding deferred devices at late_initcall
> >>
> >>       If amba bus devices defer when adding, the amba bus code simply retries
> >>       adding the devices every 5 seconds. This doesn't work well as it
> >>       completely unsynchronized with starting the init process which can
> >>       happen in less than 5 secs. Add a retry during late_initcall. If the
> >>       amba devices are added, then deferred probe takes over. If the
> >>       dependencies have not probed at this point, then there's no improvement
> >>       over previous behavior. To completely solve this, we'd need to retry
> >>       after every successful probe as deferred probe does.
> >>
> >>       The list_empty() check now happens outside the mutex, but the mutex
> >>       wasn't necessary in the first place.
> >>
> >>       This needed to use deferred probe instead of fragile initcall ordering
> >>       on 32-bit VExpress systems where the apb_pclk has a number of probe
> >>       dependencies (vexpress-sysregs, vexpress-config).
> >>
> >>
> >>> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
> >>> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
> >>> doesn't mean it's not (2), but if it does, then we know for sure it's
> >>> (2).
> >> ok, I will try this one, but due to above patch, it may not work.
> > Were you able to find anything more?
> I can't find any clue, and have no time to check this for now, is there
> any news from your side?

To close out this thread, the issue was due to a UAF bug in driver
core that was fixed by:
https://lore.kernel.org/all/20220513112444.45112-1-schspa@gmail.com/

With that fix, there wouldn't have been a crash, but amba driver
registration would have failed (because match returned
non-EPROBE_DEFER error).

-Saravana

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] amba: Remove deferred device addition
  2022-07-05 19:25                             ` Saravana Kannan
@ 2022-08-27 10:26                               ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 6+ messages in thread
From: Leizhen (ThunderTown) @ 2022-08-27 10:26 UTC (permalink / raw)
  To: Saravana Kannan, Kefeng Wang
  Cc: Rob Herring, linux-kernel, Frank Rowand, devicetree,
	Russell King, Linus Walleij, linux-arm-kernel, Dmitry Torokhov,
	linux-input



On 2022/7/6 3:25, Saravana Kannan wrote:
> On Fri, Sep 10, 2021 at 12:59 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>> On 2021/9/9 11:30, Saravana Kannan wrote:
>>> On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>
>>>> On 2021/8/28 3:09, Saravana Kannan wrote:
>>>>> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>> On 2021/8/27 8:04, Saravana Kannan wrote:
>>>>>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
>>>>>>>>>>> solution that we have for amba devices. That causes a bunch of other
>>>>>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
>>>>>>>>>>> adding more reasons for delaying the addition of the device.
>>>>>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
>>>>>>>> (qemu-system-arm -M vexpress-a15),
>>>>> I'm assuming it's this one?
>>>>> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>>>> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>>>>
>>>> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
>>>> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
>>>> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
>>>> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
>>>> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
>>>> -nographic
>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It's hard to make sense of the logs. Looks like two different threads
>>>>>>> might be printing to the log at the same time? Can you please enable
>>>>>>> the config that prints the thread ID (forgot what it's called) and
>>>>>>> collect this again? With what I could tell the crash seems to be
>>>>>>> happening somewhere in platform_match(), but that's not related to
>>>>>>> this patch at all?
>>>>>> Can you reproduce it? it is very likely related(without your patch, the
>>>>>> boot is fine),
>>>>> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
>>>>>
>>>>>> the NULL ptr is about serio, it is registed from amba driver.
>>>>>>
>>>>>> ambakmi_driver_init
>>>>>>
>>>>>>     -- amba_kmi_probe
>>>>>>
>>>>>>       -- __serio_register_port
>>>>> Thanks for the pointer. I took a look at the logs and the code. It's
>>>>> very strange. As you can see from the backtrace, platform_match() is
>>>>> being called for the device_add() from serio_handle_event(). But the
>>>>> device that gets added there is on the serio_bus which obviously
>>>>> should be using the serio_bus_match.
>>>> Yes, I am confused too.
>>>>>> +Dmitry and input maillist, is there some known issue about serio ?
>>>>>>
>>>>>> I add some debug, the full log is attached.
>>>>>>
>>>>>> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
>>>>>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
>>>>>> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
>>>>>> drv c1090fc0, drv->name vexpress-reset
>>>>> Based on the logs you added, it's pretty clear we are getting to
>>>>> platform_match(). It's also strange that the drv->name is
>>>>> vexpress-reset
>>>> ...
>>>>>> [    3.003113][   T41] Backtrace:
>>>>>> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
>>>>>> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
>>>>>> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
>>>>>> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
>>>>>> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
>>>>>> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
>>>>>> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
>>>>>> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
>>>>>> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
>>>>>> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
>>>>>> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
>>>>>> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
>>>>>> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
>>>>> But the platform_match() is happening for the device_add() from
>>>>> serio_event_handle() that's adding a device to the serio_bus and it
>>>>> should be using serio_bus_match().
>>>>>
>>>>> I haven't reached any conclusion yet, but my current thought process
>>>>> is that it's either:
>>>>> 1. My patch is somehow causing list corruption. But I don't directly
>>>>> touch any list in my change (other than deleting a list entirely), so
>>>>> it's not clear how that would be happening.
>>>> Maybe some concurrent driver load?
>>>>
>>>>> 2. Without my patch, these AMBA device's probe would be delayed at
>>>>> least until 5 seconds or possibly later. I'm wondering if my patch is
>>>>> catching some bad timing assumptions in other code.
>>>> After Rob's patch, It will retry soon.
>>>>
>>>> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
>>>> Author: Rob Herring <robh@kernel.org>
>>>> Date:   Wed Apr 29 15:58:12 2020 -0500
>>>>
>>>>       amba: Retry adding deferred devices at late_initcall
>>>>
>>>>       If amba bus devices defer when adding, the amba bus code simply retries
>>>>       adding the devices every 5 seconds. This doesn't work well as it
>>>>       completely unsynchronized with starting the init process which can
>>>>       happen in less than 5 secs. Add a retry during late_initcall. If the
>>>>       amba devices are added, then deferred probe takes over. If the
>>>>       dependencies have not probed at this point, then there's no improvement
>>>>       over previous behavior. To completely solve this, we'd need to retry
>>>>       after every successful probe as deferred probe does.
>>>>
>>>>       The list_empty() check now happens outside the mutex, but the mutex
>>>>       wasn't necessary in the first place.
>>>>
>>>>       This needed to use deferred probe instead of fragile initcall ordering
>>>>       on 32-bit VExpress systems where the apb_pclk has a number of probe
>>>>       dependencies (vexpress-sysregs, vexpress-config).
>>>>
>>>>
>>>>> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
>>>>> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
>>>>> doesn't mean it's not (2), but if it does, then we know for sure it's
>>>>> (2).
>>>> ok, I will try this one, but due to above patch, it may not work.
>>> Were you able to find anything more?
>> I can't find any clue, and have no time to check this for now, is there
>> any news from your side?

Hi, Saravana and Kefeng:
  I've spent the whole afternoon trying to figure this out, and the fix
patch has been cc you two.

> 
> To close out this thread, the issue was due to a UAF bug in driver
> core that was fixed by:
> https://lore.kernel.org/all/20220513112444.45112-1-schspa@gmail.com/
> 
> With that fix, there wouldn't have been a crash, but amba driver
> registration would have failed (because match returned
> non-EPROBE_DEFER error).
> 
> -Saravana
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-27 10:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210816074619.177383-1-wangkefeng.wang@huawei.com>
     [not found] ` <20210816074619.177383-4-wangkefeng.wang@huawei.com>
     [not found]   ` <CAL_JsqLBddXVeP-t++wqPNp=xYF7tvEcnCbjFnK9CUBLK2+9JA@mail.gmail.com>
     [not found]     ` <CAGETcx8SY14rcd7g=Gdwmw7sUMb=jdEV+ffuNpg6btDoL1jmWw@mail.gmail.com>
     [not found]       ` <ee649111-dc07-d6db-8872-dcb692802236@huawei.com>
     [not found]         ` <CAGETcx9drOdE_vfn-nhDZM9MbgxGxYJN6ydiAVxo_Ltqve9eTg@mail.gmail.com>
     [not found]           ` <b5eb935f-26e1-6475-63af-e7f6101eb017@huawei.com>
     [not found]             ` <CAGETcx9yaWZOzt=gcyNAshoHdPoYizhmrKS-kU9c2QM2+HqeEw@mail.gmail.com>
     [not found]               ` <df8e7756-8b0d-d7de-a9ff-3f6eb0ffa8a5@huawei.com>
     [not found]                 ` <CAGETcx-47yRUcBjEdWFBtroSEkHXRNrJ4zaD8WpE0DPEPp9NxQ@mail.gmail.com>
     [not found]                   ` <85b28900-5f42-b997-2ded-0b952bc2a03e@huawei.com>
2021-08-27 19:09                     ` [BUG] amba: Remove deferred device addition Saravana Kannan
2021-08-28  1:09                       ` Kefeng Wang
2021-09-09  3:30                         ` Saravana Kannan
2021-09-10  7:59                           ` Kefeng Wang
2022-07-05 19:25                             ` Saravana Kannan
2022-08-27 10:26                               ` Leizhen (ThunderTown)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).