All of lore.kernel.org
 help / color / mirror / Atom feed
* Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
@ 2016-02-10 12:21 Rafał Miłecki
       [not found] ` <CACna6rxQPv_Ut9NwN+BMbwSW+5-tJMoKXC3QWDfSP7zsCqfj7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Rafał Miłecki @ 2016-02-10 12:21 UTC (permalink / raw)
  To: Mark Brown, linux-spi; +Cc: Hauke Mehrtens

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

Hi,

I recently tried few recent kernels and found out that spi-bcm53xx
fails (rcu_sched) with them. I knew it was working OK with 3.18, so I
tried newer ones one by one and discovered it's a 4.0 regression.
Bisecting pointed following commit:

commit 0461a4149836c792d186027c8c859637a4cfb11a
Author: Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date:   Tue Dec 9 21:38:05 2014 +0000

    spi: Pump transfers inside calling context for spi_sync()

I also confirmed that reverting two commits:
spi: Only idle the message pump in the worker kthread
spi: Pump transfers inside calling context for spi_sync()
from top of 4.0.9 (clean), 4.1.16 (clean) and 4.4 (manual) fixes
(workarounds) that issue for me.

Unfortunately the bug in spi-bcm53xx isn't obvious for me. I kept
looking at its code for a bit but couldn't find anything.

Knowing the commit that exposed this bug, does it give you any hint
what may be wrong with spi-bcm53xx?

-- 
Rafał

[-- Attachment #2: rcu_sched.txt --]
[-- Type: text/plain, Size: 5122 bytes --]

[   74.438399] INFO: rcu_sched self-detected stall on CPU { 0}  (t=6000 jiffies g=-116 c=-117 q=12)
[   74.447203] Task dump for CPU 0:
[   74.450417] mtd             R running      0   505    503 0x00000002
[   74.456758] Backtrace: 
[   74.459242] [<c001590c>] (dump_backtrace) from [<c0015b1c>] (show_stack+0x18/0x1c)
[   74.466771]  r7:c03e1c40 r6:c03db0fc r5:c7981c00 r4:00000000
[   74.472447] [<c0015b04>] (show_stack) from [<c003c428>] (sched_show_task+0xb0/0xd0)
[   74.480069] [<c003c378>] (sched_show_task) from [<c003e210>] (dump_cpu_task+0x34/0x44)
[   74.487947]  r5:00000000 r4:00000000
[   74.491531] [<c003e1dc>] (dump_cpu_task) from [<c0054354>] (rcu_dump_cpu_stacks+0x7c/0xa0)
[   74.499750]  r5:00000000 r4:c03e1c40
[   74.503332] [<c00542d8>] (rcu_dump_cpu_stacks) from [<c0056ca8>] (rcu_check_callbacks+0x1d8/0x568)
[   74.512243]  r9:c03da100 r8:c03db148 r7:c7060008 r6:0000000c r5:c6dd4208 r4:c03e1c40
[   74.520001] [<c0056ad0>] (rcu_check_callbacks) from [<c00589f4>] (update_process_times+0x40/0x60)
[   74.528823]  r10:51eb851f r9:09f12a73 r8:0000001d r7:00000000 r6:00000000 r5:c7981c00
[   74.536651]  r4:c7060000
[   74.539197] [<c00589b4>] (update_process_times) from [<c006340c>] (tick_periodic+0xa8/0xc0)
[   74.547503]  r7:00000000 r6:c6dd8000 r5:7fffffff r4:c03da140
[   74.553173] [<c0063364>] (tick_periodic) from [<c0063608>] (tick_handle_periodic+0x30/0x90)
[   74.561482]  r5:7fffffff r4:ffffffff
[   74.565063] [<c00635d8>] (tick_handle_periodic) from [<c0017dd0>] (twd_handler+0x34/0x40)
[   74.573198]  r9:09f12a73 r8:0000001d r7:c78141c0 r6:c03e5b78 r5:c7805640 r4:00000001
[   74.580960] [<c0017d9c>] (twd_handler) from [<c00519ec>] (handle_percpu_devid_irq+0x70/0x8c)
[   74.589355]  r5:c7805640 r4:c6dd8000
[   74.592934] [<c005197c>] (handle_percpu_devid_irq) from [<c004e14c>] (generic_handle_irq+0x28/0x38)
[   74.601934]  r9:09f12a73 r8:00000001 r7:c780e800 r6:c03d6894 r5:00000000 r4:0000001d
[   74.609686] [<c004e124>] (generic_handle_irq) from [<c004e40c>] (__handle_domain_irq+0x84/0xa8)
[   74.618341]  r5:00000000 r4:00000000
[   74.621922] [<c004e388>] (__handle_domain_irq) from [<c000866c>] (gic_handle_irq+0x44/0x64)
[   74.630231]  r9:09f12a73 r8:c03e5964 r7:c7061c14 r6:c7061be0 r5:c03db480 r4:c8802100
[   74.637981] [<c0008628>] (gic_handle_irq) from [<c0009200>] (__irq_svc+0x40/0x54)
[   74.645431] Exception stack(0xc7061be0 to 0xc7061c28)
[   74.650465] 1be0: 00000001 c88e0220 6666666c c88e0000 c03da100 c7a4c9b8 ffffa7e2 c03e5964
[   74.658603] 1c00: c03e5964 09f12a73 51eb851f c7061c34 c7061c38 c7061c28 c01ef8b4 c0209db4
[   74.666736] 1c20: 60000013 ffffffff
[   74.670207]  r7:c7061c14 r6:ffffffff r5:60000013 r4:c0209db4
[   74.675886] [<c0209d98>] (bcma_host_soc_read32) from [<c01ef8b4>] (bcm53xxspi_wait.isra.0+0xc0/0x124)
[   74.685068] [<c01ef7f4>] (bcm53xxspi_wait.isra.0) from [<c01efc18>] (bcm53xxspi_transfer_one+0x300/0x374)
[   74.694581]  r9:09f12a73 r8:c713b52b r7:00000010 r6:00004ad5 r5:00000010 r4:c7a4c9b8
[   74.702345] [<c01ef918>] (bcm53xxspi_transfer_one) from [<c01ee2c8>] (spi_transfer_one_message+0x70/0x1d0)
[   74.711948]  r10:c7a4c900 r9:00000001 r8:00000000 r7:c7a4c800 r6:00000000 r5:c7061dcc
[   74.719776]  r4:c7061d60 r3:c01ef918
[   74.723356] [<c01ee258>] (spi_transfer_one_message) from [<c01eea00>] (spi_pump_messages+0x3c8/0x40c)
[   74.732528]  r9:c7061d08 r8:c7061d60 r7:00000000 r6:c7a4c800 r5:c7a4c95c r4:c7a4c94c
[   74.740281] [<c01ee638>] (spi_pump_messages) from [<c01ef4d4>] (__spi_sync+0x148/0x170)
[   74.748244]  r10:c7a4c900 r9:c7061d08 r8:c7a4c95c r7:00000000 r6:c7a4c800 r5:c7a4c960
[   74.756073]  r4:c7061d60
[   74.758605] [<c01ef38c>] (__spi_sync) from [<c01ef510>] (spi_sync+0x14/0x18)
[   74.765620]  r10:c7a6ce00 r9:00000000 r8:c7a48810 r7:c7061d60 r6:c7061ef8 r5:01c40000
[   74.773447]  r4:00000005
[   74.775981] [<c01ef4fc>] (spi_sync) from [<c01dda0c>] (bcm53xxspiflash_read+0xd0/0xfc)
[   74.783867] [<c01dd93c>] (bcm53xxspiflash_read) from [<c01dc704>] (spi_nor_read+0x60/0x7c)
[   74.792084]  r8:00040000 r7:00000000 r6:01c40000 r5:c7a489f0 r4:00000000
[   74.798799] [<c01dc6a4>] (spi_nor_read) from [<c01c0c3c>] (part_read+0x50/0x88)
[   74.806072]  r7:c7061f80 r6:00000000 r5:00000000 r4:c7a6ce00
[   74.811751] [<c01c0bec>] (part_read) from [<c01be308>] (mtd_read+0x6c/0xa4)
[   74.818677]  r9:00000000 r8:c7a6ce00 r6:c7120000 r5:00000000 r4:003b0000
[   74.825391] [<c01be29c>] (mtd_read) from [<c01c2340>] (mtdchar_read+0x13c/0x204)
[   74.832753]  r10:c7060000 r9:b6ab1010 r8:c7a6ce00 r7:c7061f80 r6:c7120000 r5:003b0000
[   74.840580]  r4:00000000
[   74.843124] [<c01c2204>] (mtdchar_read) from [<c00a7e84>] (vfs_read+0x9c/0x130)
[   74.850395]  r10:00000000 r9:c7060000 r8:c0008a84 r7:c01c2204 r6:c7061f80 r5:c702f880
[   74.858223]  r4:b4eb1010
[   74.860757] [<c00a7de8>] (vfs_read) from [<c00a8508>] (SyS_pread64+0x68/0x88)
[   74.867857]  r9:c7060000 r8:c0008a84 r7:01fb0000 r6:c702f880 r5:c702f880 r4:b4eb1010
[   74.875607] [<c00a84a0>] (SyS_pread64) from [<c00088e0>] (ret_fast_syscall+0x0/0x30)
[   74.883314]  r7:000000b4 r6:00000003 r5:00000000 r4:00000000

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
       [not found] ` <CACna6rxQPv_Ut9NwN+BMbwSW+5-tJMoKXC3QWDfSP7zsCqfj7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-02-10 12:48   ` Mark Brown
       [not found]     ` <20160210124828.GX13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2016-02-10 12:48 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: linux-spi, Hauke Mehrtens

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

On Wed, Feb 10, 2016 at 01:21:04PM +0100, Rafał Miłecki wrote:

> Unfortunately the bug in spi-bcm53xx isn't obvious for me. I kept
> looking at its code for a bit but couldn't find anything.

> Knowing the commit that exposed this bug, does it give you any hint
> what may be wrong with spi-bcm53xx?

It's most likely just hammering the CPU for a very long time without
scheduling due to doing a long PIO transfers.  That change just removes
some needless context switching.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
       [not found]     ` <20160210124828.GX13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2016-02-10 13:04       ` Rafał Miłecki
       [not found]         ` <CACna6rzL-++_k68-VFQ4A9+gKHBJwhB5PJgU7wDwLQwi0vCCQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Rafał Miłecki @ 2016-02-10 13:04 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-spi, Hauke Mehrtens

On 10 February 2016 at 13:48, Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Feb 10, 2016 at 01:21:04PM +0100, Rafał Miłecki wrote:
>
>> Unfortunately the bug in spi-bcm53xx isn't obvious for me. I kept
>> looking at its code for a bit but couldn't find anything.
>
>> Knowing the commit that exposed this bug, does it give you any hint
>> what may be wrong with spi-bcm53xx?
>
> It's most likely just hammering the CPU for a very long time without
> scheduling due to doing a long PIO transfers.  That change just removes
> some needless context switching.

The fact is it happens only after flashing firmware, when flash is
formatted for the first time. After flashing some changes from ramfs
are likely flushed to the falsh. I guess it matches your theory of
long time CPU hammering.

Any idea how to handle this?

-- 
Rafał
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
       [not found]         ` <CACna6rzL-++_k68-VFQ4A9+gKHBJwhB5PJgU7wDwLQwi0vCCQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-02-10 13:15           ` Mark Brown
       [not found]             ` <20160210131552.GY13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2016-02-10 13:15 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: linux-spi, Hauke Mehrtens

[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]

On Wed, Feb 10, 2016 at 02:04:21PM +0100, Rafał Miłecki wrote:
> On 10 February 2016 at 13:48, Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> > It's most likely just hammering the CPU for a very long time without
> > scheduling due to doing a long PIO transfers.  That change just removes
> > some needless context switching.

> The fact is it happens only after flashing firmware, when flash is
> formatted for the first time. After flashing some changes from ramfs
> are likely flushed to the falsh. I guess it matches your theory of
> long time CPU hammering.

I bet you can trigger it with a heavy read load too (especially lots of
reads going on in parallel).  You'll have always been able to trigger
it, it's just that this has made it easier.

> Any idea how to handle this?

Don't busy wait in the driver, ideally implement DMA support.  Add some
sleeps if you really need to.  A PIO driver really isn't fit for purpose
with large transfers like you get on flash.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
       [not found]             ` <20160210131552.GY13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2016-02-10 15:30               ` Rafał Miłecki
       [not found]                 ` <CACna6rzHFx8VQs39gRGMKoF=P_t_MwGKQhUQb0O6x4OQeYrX5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Rafał Miłecki @ 2016-02-10 15:30 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-spi, Hauke Mehrtens

On 10 February 2016 at 14:15, Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Feb 10, 2016 at 02:04:21PM +0100, Rafał Miłecki wrote:
>> On 10 February 2016 at 13:48, Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
>> > It's most likely just hammering the CPU for a very long time without
>> > scheduling due to doing a long PIO transfers.  That change just removes
>> > some needless context switching.
>
>> The fact is it happens only after flashing firmware, when flash is
>> formatted for the first time. After flashing some changes from ramfs
>> are likely flushed to the falsh. I guess it matches your theory of
>> long time CPU hammering.
>
> I bet you can trigger it with a heavy read load too (especially lots of
> reads going on in parallel).  You'll have always been able to trigger
> it, it's just that this has made it easier.
>
>> Any idea how to handle this?
>
> Don't busy wait in the driver, ideally implement DMA support.  Add some
> sleeps if you really need to.  A PIO driver really isn't fit for purpose
> with large transfers like you get on flash.

The only place with busy waiting seems to be bcm53xxspi_wait. I tried
adding cpu_relax() before udelay(5) as well as cpu_relax() at the
beginning of this function. I didn't chage anything. Should I raher
use some timer & sleeping/waking mechanism?

I'm afraid there isn't a real DMA available. It seems Broadcom uses
something called "Boot SPI". Enabling this mode maps flash content
into a memory, but it may be a bit complex to implement. It would
require flash driver to have better control over SPI controller and
I'm not aware of any existing API for that.

-- 
Rafał
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()")
       [not found]                 ` <CACna6rzHFx8VQs39gRGMKoF=P_t_MwGKQhUQb0O6x4OQeYrX5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-02-10 18:06                   ` Mark Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2016-02-10 18:06 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: linux-spi, Hauke Mehrtens

[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]

On Wed, Feb 10, 2016 at 04:30:03PM +0100, Rafał Miłecki wrote:
> On 10 February 2016 at 14:15, Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> > Don't busy wait in the driver, ideally implement DMA support.  Add some
> > sleeps if you really need to.  A PIO driver really isn't fit for purpose
> > with large transfers like you get on flash.

> The only place with busy waiting seems to be bcm53xxspi_wait. I tried
> adding cpu_relax() before udelay(5) as well as cpu_relax() at the
> beginning of this function. I didn't chage anything. Should I raher
> use some timer & sleeping/waking mechanism?

Yes, you'd need to either add some actual sleeps (probably a brief one
after every transfer) or work with the maintainers of the lockup
detector to come up with some other way that the thread can tell the
detector that it really intends to burn huge amounts of CPU time in
kernel.

> I'm afraid there isn't a real DMA available. It seems Broadcom uses
> something called "Boot SPI". Enabling this mode maps flash content
> into a memory, but it may be a bit complex to implement. It would
> require flash driver to have better control over SPI controller and
> I'm not aware of any existing API for that.

Read support has actually just been implemented (it ends up just being a
callback with a MTD specific accessor), write support could also be
added along similar lines I guess.  See spi_flash_read() in -next.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-02-10 18:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-10 12:21 Bug in in spi-bcm53xx (rcu_sched) exposed by 0461a41 ("spi: Pump transfers inside calling context for spi_sync()") Rafał Miłecki
     [not found] ` <CACna6rxQPv_Ut9NwN+BMbwSW+5-tJMoKXC3QWDfSP7zsCqfj7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-10 12:48   ` Mark Brown
     [not found]     ` <20160210124828.GX13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2016-02-10 13:04       ` Rafał Miłecki
     [not found]         ` <CACna6rzL-++_k68-VFQ4A9+gKHBJwhB5PJgU7wDwLQwi0vCCQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-10 13:15           ` Mark Brown
     [not found]             ` <20160210131552.GY13270-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2016-02-10 15:30               ` Rafał Miłecki
     [not found]                 ` <CACna6rzHFx8VQs39gRGMKoF=P_t_MwGKQhUQb0O6x4OQeYrX5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-10 18:06                   ` Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.