All of lore.kernel.org
 help / color / mirror / Atom feed
From: Saravana Kannan <saravanak@google.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rob Herring <robh+dt@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	devicetree@vger.kernel.org, Russell King <linux@armlinux.org.uk>,
	Linus Walleij <linus.walleij@linaro.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	linux-input@vger.kernel.org
Subject: Re: [BUG] amba: Remove deferred device addition
Date: Wed, 8 Sep 2021 20:30:24 -0700	[thread overview]
Message-ID: <CAGETcx9m4=7V25nvYa0030ChKeJw5bu3ogs6gjFpjNKdq+_B_Q@mail.gmail.com> (raw)
In-Reply-To: <265bb783-10da-a7c1-2625-055dec5643a3@huawei.com>

On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2021/8/28 3:09, Saravana Kannan wrote:
> > On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >> On 2021/8/27 8:04, Saravana Kannan wrote:
> >>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
> >>>>>>> solution that we have for amba devices. That causes a bunch of other
> >>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
> >>>>>>> adding more reasons for delaying the addition of the device.
> >>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
> >>>> (qemu-system-arm -M vexpress-a15),
> > I'm assuming it's this one?
> > arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>
> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>
> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> -nographic
>
> >
> >>> Hi,
> >>>
> >>> It's hard to make sense of the logs. Looks like two different threads
> >>> might be printing to the log at the same time? Can you please enable
> >>> the config that prints the thread ID (forgot what it's called) and
> >>> collect this again? With what I could tell the crash seems to be
> >>> happening somewhere in platform_match(), but that's not related to
> >>> this patch at all?
> >> Can you reproduce it? it is very likely related(without your patch, the
> >> boot is fine),
> > Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
> >
> >> the NULL ptr is about serio, it is registed from amba driver.
> >>
> >> ambakmi_driver_init
> >>
> >>    -- amba_kmi_probe
> >>
> >>      -- __serio_register_port
> > Thanks for the pointer. I took a look at the logs and the code. It's
> > very strange. As you can see from the backtrace, platform_match() is
> > being called for the device_add() from serio_handle_event(). But the
> > device that gets added there is on the serio_bus which obviously
> > should be using the serio_bus_match.
> Yes, I am confused too.
> >
> >> +Dmitry and input maillist, is there some known issue about serio ?
> >>
> >> I add some debug, the full log is attached.
> >>
> >> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
> >> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
> >> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
> >> drv c1090fc0, drv->name vexpress-reset
> > Based on the logs you added, it's pretty clear we are getting to
> > platform_match(). It's also strange that the drv->name is
> > vexpress-reset
> ...
> >
> >> [    3.003113][   T41] Backtrace:
> >> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
> >> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
> >> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
> >> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
> >> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
> >> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
> >> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
> >> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
> >> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
> >> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
> >> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
> >> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> >> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
> > But the platform_match() is happening for the device_add() from
> > serio_event_handle() that's adding a device to the serio_bus and it
> > should be using serio_bus_match().
> >
> > I haven't reached any conclusion yet, but my current thought process
> > is that it's either:
> > 1. My patch is somehow causing list corruption. But I don't directly
> > touch any list in my change (other than deleting a list entirely), so
> > it's not clear how that would be happening.
>
> Maybe some concurrent driver load?
>
> > 2. Without my patch, these AMBA device's probe would be delayed at
> > least until 5 seconds or possibly later. I'm wondering if my patch is
> > catching some bad timing assumptions in other code.
>
> After Rob's patch, It will retry soon.
>
> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
> Author: Rob Herring <robh@kernel.org>
> Date:   Wed Apr 29 15:58:12 2020 -0500
>
>      amba: Retry adding deferred devices at late_initcall
>
>      If amba bus devices defer when adding, the amba bus code simply retries
>      adding the devices every 5 seconds. This doesn't work well as it
>      completely unsynchronized with starting the init process which can
>      happen in less than 5 secs. Add a retry during late_initcall. If the
>      amba devices are added, then deferred probe takes over. If the
>      dependencies have not probed at this point, then there's no improvement
>      over previous behavior. To completely solve this, we'd need to retry
>      after every successful probe as deferred probe does.
>
>      The list_empty() check now happens outside the mutex, but the mutex
>      wasn't necessary in the first place.
>
>      This needed to use deferred probe instead of fragile initcall ordering
>      on 32-bit VExpress systems where the apb_pclk has a number of probe
>      dependencies (vexpress-sysregs, vexpress-config).
>
>
> >
> > You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
> > a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
> > doesn't mean it's not (2), but if it does, then we know for sure it's
> > (2).
> ok, I will try this one, but due to above patch, it may not work.

Were you able to find anything more?

-Saravana

WARNING: multiple messages have this Message-ID (diff)
From: Saravana Kannan <saravanak@google.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rob Herring <robh+dt@kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	 devicetree@vger.kernel.org, Russell King <linux@armlinux.org.uk>,
	 Linus Walleij <linus.walleij@linaro.org>,
	 linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	 Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	linux-input@vger.kernel.org
Subject: Re: [BUG] amba: Remove deferred device addition
Date: Wed, 8 Sep 2021 20:30:24 -0700	[thread overview]
Message-ID: <CAGETcx9m4=7V25nvYa0030ChKeJw5bu3ogs6gjFpjNKdq+_B_Q@mail.gmail.com> (raw)
In-Reply-To: <265bb783-10da-a7c1-2625-055dec5643a3@huawei.com>

On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2021/8/28 3:09, Saravana Kannan wrote:
> > On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >> On 2021/8/27 8:04, Saravana Kannan wrote:
> >>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
> >>>>>>> solution that we have for amba devices. That causes a bunch of other
> >>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
> >>>>>>> adding more reasons for delaying the addition of the device.
> >>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
> >>>> (qemu-system-arm -M vexpress-a15),
> > I'm assuming it's this one?
> > arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>
> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>
> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> -nographic
>
> >
> >>> Hi,
> >>>
> >>> It's hard to make sense of the logs. Looks like two different threads
> >>> might be printing to the log at the same time? Can you please enable
> >>> the config that prints the thread ID (forgot what it's called) and
> >>> collect this again? With what I could tell the crash seems to be
> >>> happening somewhere in platform_match(), but that's not related to
> >>> this patch at all?
> >> Can you reproduce it? it is very likely related(without your patch, the
> >> boot is fine),
> > Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
> >
> >> the NULL ptr is about serio, it is registed from amba driver.
> >>
> >> ambakmi_driver_init
> >>
> >>    -- amba_kmi_probe
> >>
> >>      -- __serio_register_port
> > Thanks for the pointer. I took a look at the logs and the code. It's
> > very strange. As you can see from the backtrace, platform_match() is
> > being called for the device_add() from serio_handle_event(). But the
> > device that gets added there is on the serio_bus which obviously
> > should be using the serio_bus_match.
> Yes, I am confused too.
> >
> >> +Dmitry and input maillist, is there some known issue about serio ?
> >>
> >> I add some debug, the full log is attached.
> >>
> >> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
> >> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
> >> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
> >> drv c1090fc0, drv->name vexpress-reset
> > Based on the logs you added, it's pretty clear we are getting to
> > platform_match(). It's also strange that the drv->name is
> > vexpress-reset
> ...
> >
> >> [    3.003113][   T41] Backtrace:
> >> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
> >> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
> >> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
> >> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
> >> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
> >> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
> >> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
> >> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
> >> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
> >> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
> >> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
> >> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> >> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
> > But the platform_match() is happening for the device_add() from
> > serio_event_handle() that's adding a device to the serio_bus and it
> > should be using serio_bus_match().
> >
> > I haven't reached any conclusion yet, but my current thought process
> > is that it's either:
> > 1. My patch is somehow causing list corruption. But I don't directly
> > touch any list in my change (other than deleting a list entirely), so
> > it's not clear how that would be happening.
>
> Maybe some concurrent driver load?
>
> > 2. Without my patch, these AMBA device's probe would be delayed at
> > least until 5 seconds or possibly later. I'm wondering if my patch is
> > catching some bad timing assumptions in other code.
>
> After Rob's patch, It will retry soon.
>
> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
> Author: Rob Herring <robh@kernel.org>
> Date:   Wed Apr 29 15:58:12 2020 -0500
>
>      amba: Retry adding deferred devices at late_initcall
>
>      If amba bus devices defer when adding, the amba bus code simply retries
>      adding the devices every 5 seconds. This doesn't work well as it
>      completely unsynchronized with starting the init process which can
>      happen in less than 5 secs. Add a retry during late_initcall. If the
>      amba devices are added, then deferred probe takes over. If the
>      dependencies have not probed at this point, then there's no improvement
>      over previous behavior. To completely solve this, we'd need to retry
>      after every successful probe as deferred probe does.
>
>      The list_empty() check now happens outside the mutex, but the mutex
>      wasn't necessary in the first place.
>
>      This needed to use deferred probe instead of fragile initcall ordering
>      on 32-bit VExpress systems where the apb_pclk has a number of probe
>      dependencies (vexpress-sysregs, vexpress-config).
>
>
> >
> > You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
> > a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
> > doesn't mean it's not (2), but if it does, then we know for sure it's
> > (2).
> ok, I will try this one, but due to above patch, it may not work.

Were you able to find anything more?

-Saravana

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-09-09  3:31 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16  7:46 [PATCH 0/3] amba: Properly handle device probe without IRQ domain Kefeng Wang
2021-08-16  7:46 ` Kefeng Wang
2021-08-16  7:46 ` [PATCH 1/3] amba: Drop unused functions about APB/AHB devices add Kefeng Wang
2021-08-16  7:46   ` Kefeng Wang
2021-08-16  7:46 ` [PATCH 2/3] Revert "ARM: amba: make use of -1 IRQs warn" Kefeng Wang
2021-08-16  7:46   ` Kefeng Wang
2021-08-16  7:46 ` [PATCH 3/3] amba: Properly handle device probe without IRQ domain Kefeng Wang
2021-08-16  7:46   ` Kefeng Wang
2021-08-24 20:05   ` Rob Herring
2021-08-24 20:05     ` Rob Herring
2021-08-24 20:08     ` Saravana Kannan
2021-08-24 20:08       ` Saravana Kannan
2021-08-25  4:05       ` Kefeng Wang
2021-08-25  4:05         ` Kefeng Wang
2021-08-25  8:04         ` Saravana Kannan
2021-08-25  8:04           ` Saravana Kannan
2021-08-25  8:39           ` Kefeng Wang
2021-08-25  8:39             ` Kefeng Wang
2021-08-26  2:45           ` Kefeng Wang
2021-08-26  2:45             ` Kefeng Wang
2021-08-26  4:45             ` Saravana Kannan
2021-08-26  4:45               ` Saravana Kannan
2021-08-26  6:22               ` Kefeng Wang
2021-08-26  6:22                 ` Kefeng Wang
2021-08-26  8:22               ` [BUG] amba: Remove deferred device addition Kefeng Wang
2021-08-27  0:04                 ` Saravana Kannan
2021-08-27  0:04                   ` Saravana Kannan
2021-08-27 14:38                   ` Kefeng Wang
2021-08-27 19:09                     ` Saravana Kannan
2021-08-27 19:09                       ` Saravana Kannan
2021-08-28  1:09                       ` Kefeng Wang
2021-08-28  1:09                         ` Kefeng Wang
2021-09-09  3:30                         ` Saravana Kannan [this message]
2021-09-09  3:30                           ` Saravana Kannan
2021-09-10  7:59                           ` Kefeng Wang
2021-09-10  7:59                             ` Kefeng Wang
2022-07-05 19:25                             ` Saravana Kannan
2022-07-05 19:25                               ` Saravana Kannan
2022-08-27 10:26                               ` Leizhen (ThunderTown)
2022-08-27 10:26                                 ` Leizhen (ThunderTown)
2021-08-25 12:33         ` [PATCH 3/3] amba: Properly handle device probe without IRQ domain Rob Herring
2021-08-25 12:33           ` Rob Herring
2021-08-25 14:41           ` Kefeng Wang
2021-08-25 14:41             ` Kefeng Wang
2021-08-17 22:27 ` [PATCH 0/3] " Rob Herring
2021-08-17 22:27   ` Rob Herring
2021-08-23  2:19   ` Kefeng Wang
2021-08-23  2:19     ` Kefeng Wang
2021-08-23  9:05     ` Russell King (Oracle)
2021-08-23  9:05       ` Russell King (Oracle)
2021-08-23 10:57       ` Kefeng Wang
2021-08-23 10:57         ` Kefeng Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGETcx9m4=7V25nvYa0030ChKeJw5bu3ogs6gjFpjNKdq+_B_Q@mail.gmail.com' \
    --to=saravanak@google.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=frowand.list@gmail.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-input@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=robh+dt@kernel.org \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.