* [BUG] irqchip: armada-370-xp: workqueue lockup
@ 2021-09-21  8:40 Steffen Trumtrar
  2021-09-21 15:18 ` Marc Zyngier
  2021-09-22 13:27 ` [irqchip: irq/irqchip-fixes] irqchip/armada-370-xp: Fix ack/eoi breakage irqchip-bot for Marc Zyngier
  0 siblings, 2 replies; 6+ messages in thread
From: Steffen Trumtrar @ 2021-09-21  8:40 UTC (permalink / raw)
  To: Valentin Schneider, Marc Zyngier
  Cc: Andrew Lunn, Gregory Clement, Sebastian Hesselbarth, linux-arm-kernel


Hi,

I noticed that after the patch

        e52e73b7e9f7d08b8c2ef6fb1657105093e22a03
        From: Valentin Schneider <valentin.schneider@arm.com>
        Date: Mon, 9 Nov 2020 09:41:18 +0000
        Subject: [PATCH] irqchip/armada-370-xp: Make IPIs use
        handle_percpu_devid_irq()

        As done for the Arm GIC irqchips, move IPIs to handle_percpu_devid_irq() as
        handle_percpu_devid_fasteoi_ipi() isn't actually required.

        Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
        Signed-off-by: Marc Zyngier <maz@kernel.org>
        Link: https://lore.kernel.org/r/20201109094121.29975-3-valentin.schneider@arm.com
        ---
        drivers/irqchip/irq-armada-370-xp.c | 2 +-
        1 file changed, 1 insertion(+), 1 deletion(-)

        diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
        index d7eb2e93db8f..32938dfc0e46 100644
        --- a/drivers/irqchip/irq-armada-370-xp.c
        +++ b/drivers/irqchip/irq-armada-370-xp.c
        @@ -382,7 +382,7 @@ static int armada_370_xp_ipi_alloc(struct irq_domain *d,
                        irq_set_percpu_devid(virq + i);
                        irq_domain_set_info(d, virq + i, i, &ipi_irqchip,
                                        d->host_data,
        -                                   handle_percpu_devid_fasteoi_ipi,
        +                                   handle_percpu_devid_irq,
                                        NULL, NULL);
                }

I get workqueue lockups on my Armada-XP based board.
When I run the following test on v5.15-rc2

        stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 120s

I get a backtrace like this:

        stress-ng: info:  [7740] dispatching hogs: 8 cpu, 4 io, 2 vm, 4 fork
        [ 1670.169087] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        [ 1670.169102] 	(detected by 0, t=5252 jiffies, g=50257, q=3369)
        [ 1670.169112] rcu: All QSes seen, last rcu_preempt kthread activity 5252 (342543-337291), jiffies_till_next_fqs=1, root ->qsmask 0x0
        [ 1670.169121] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
        [ 1670.169128] rcu: 	Possible timer handling issue on cpu=1 timer-softirq=20398
        [ 1670.169132] rcu: rcu_preempt kthread starved for 5252 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
        [ 1670.169140] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
        [ 1670.169143] rcu: RCU grace-period kthread stack dump:
        [ 1670.169146] task:rcu_preempt     state:R stack:    0 pid:   13 ppid:     2 flags:0x00000000
        [ 1670.169157] Backtrace:
        [ 1670.169163] [<c0a19c20>] (__schedule) from [<c0a1a458>] (schedule+0x64/0x110)
        [ 1670.169185]  r10:00000001 r9:c190e000 r8:c137b690 r7:c137b69c r6:c190fed4 r5:c190e000
        [ 1670.169189]  r4:c197c880
        [ 1670.169192] [<c0a1a3f4>] (schedule) from [<c0a20048>] (schedule_timeout+0xa8/0x1c0)
        [ 1670.169206]  r5:c1303d00 r4:0005258c
        [ 1670.169209] [<c0a1ffa0>] (schedule_timeout) from [<c01a1664>] (rcu_gp_fqs_loop+0x120/0x3ac)
        [ 1670.169227]  r7:c137b69c r6:c1303d00 r5:c137b4c0 r4:00000000
        [ 1670.169230] [<c01a1544>] (rcu_gp_fqs_loop) from [<c01a3dac>] (rcu_gp_kthread+0xfc/0x1b0)
        [ 1670.169247]  r10:c190ff5c r9:c1303d00 r8:c137b4c0 r7:c190e000 r6:c137b69e r5:c137b690
        [ 1670.169251]  r4:c137b69c
        [ 1670.169253] [<c01a3cb0>] (rcu_gp_kthread) from [<c0153b14>] (kthread+0x16c/0x1a0)
        [ 1670.169268]  r7:00000000
        [ 1670.169271] [<c01539a8>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
        [ 1670.169282] Exception stack(0xc190ffb0 to 0xc190fff8)
        [ 1670.169288] ffa0:                                     ???????? ???????? ???????? ????????
        [ 1670.169293] ffc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
        [ 1670.169297] ffe0: ???????? ???????? ???????? ???????? ???????? ????????
        [ 1670.169305]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01539a8
        [ 1670.169310]  r4:c19320c0 r3:00000000
        [ 1670.169313] rcu: Stack dump where RCU GP kthread last ran:
        [ 1670.169316] Sending NMI from CPU 0 to CPUs 1:
        [ 1670.169327] NMI backtrace for cpu 1
        [ 1670.169335] CPU: 1 PID: 7764 Comm: stress-ng-cpu Tainted: G        W         5.15.0-rc2+ #5
        [ 1670.169343] Hardware name: Marvell Armada 370/XP (Device Tree)
        [ 1670.169346] PC is at 0x4bde7a
        [ 1670.169354] LR is at 0x4bdf21
        [ 1670.169359] pc : [<004bde7a>]    lr : [<004bdf21>]    psr: 20030030
        [ 1670.169363] sp : beb8270c  ip : 00004650  fp : beb8289c
        [ 1670.169367] r10: 00e5e800  r9 : 00514760  r8 : 0000036b
        [ 1670.169371] r7 : beb828a8  r6 : 000001f7  r5 : 000001fd  r4 : 000bacd7
        [ 1670.169375] r3 : 004bde30  r2 : 0000000b  r1 : 000001fd  r0 : 0001bbd7
        [ 1670.169380] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
        [ 1670.169386] Control: 10c5387d  Table: 0334806a  DAC: 00000055
        [ 1670.169389] CPU: 1 PID: 7764 Comm: stress-ng-cpu Tainted: G        W         5.15.0-rc2+ #5
        [ 1670.169395] Hardware name: Marvell Armada 370/XP (Device Tree)
        [ 1670.169398] Backtrace:
        [ 1670.169402] [<c0a0b758>] (dump_backtrace) from [<c0a0b9a4>] (show_stack+0x20/0x24)
        [ 1670.169418]  r7:c18db400 r6:c7875fb0 r5:60030193 r4:c1099c7c
        [ 1670.169421] [<c0a0b984>] (show_stack) from [<c0a11988>] (dump_stack_lvl+0x48/0x54)
        [ 1670.169433] [<c0a11940>] (dump_stack_lvl) from [<c0a119ac>] (dump_stack+0x18/0x1c)
        [ 1670.169445]  r5:00000001 r4:20030193
        [ 1670.169447] [<c0a11994>] (dump_stack) from [<c0109984>] (show_regs+0x1c/0x20)
        [ 1670.169461] [<c0109968>] (show_regs) from [<c05f6af8>] (nmi_cpu_backtrace+0xc0/0x10c)
        [ 1670.169474] [<c05f6a38>] (nmi_cpu_backtrace) from [<c010ffa4>] (do_handle_IPI+0x54/0x3b8)
        [ 1670.169489]  r7:c18db400 r6:00000017 r5:00000001 r4:00000007
        [ 1670.169491] [<c010ff50>] (do_handle_IPI) from [<c0110330>] (ipi_handler+0x28/0x30)
        [ 1670.169505]  r10:c7875f58 r9:c7875fb0 r8:c7875f30 r7:c18db400 r6:00000017 r5:c13ecadc
        [ 1670.169509]  r4:c18d9300 r3:00000010
        [ 1670.169511] [<c0110308>] (ipi_handler) from [<c0193200>] (handle_percpu_devid_irq+0xb4/0x288)
        [ 1670.169525] [<c019314c>] (handle_percpu_devid_irq) from [<c018c4b4>] (handle_domain_irq+0x8c/0xc0)
        [ 1670.169539]  r9:c7875fb0 r8:00000007 r7:00000000 r6:c1863d80 r5:00000000 r4:c12781e0
        [ 1670.169542] [<c018c428>] (handle_domain_irq) from [<c01012cc>] (armada_370_xp_handle_irq+0xdc/0x124)
        [ 1670.169556]  r10:00e5e800 r9:00514760 r8:10c5387d r7:c147d604 r6:c7875fb0 r5:000003fe
        [ 1670.169560]  r4:00000007 r3:00000007
        [ 1670.169562] [<c01011f0>] (armada_370_xp_handle_irq) from [<c0100e58>] (__irq_usr+0x58/0x80)
        [ 1670.169571] Exception stack(0xc7875fb0 to 0xc7875ff8)
        [ 1670.169576] 5fa0:                                     ???????? ???????? ???????? ????????
        [ 1670.169580] 5fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
        [ 1670.169584] 5fe0: ???????? ???????? ???????? ???????? ???????? ????????
        [ 1670.169590]  r7:10c5387d r6:ffffffff r5:20030030 r4:004bde7a
        [ 1690.589098] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 38s!
        [ 1690.589133] Showing busy workqueues and worker pools:
        [ 1690.589138] workqueue events_unbound: flags=0x2
        [ 1690.589142]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
        [ 1690.589157]     in-flight: 7:call_usermodehelper_exec_work
        [ 1690.589177]     pending: flush_memcg_stats_work, flush_memcg_stats_dwork
        [ 1690.589198] workqueue events_power_efficient: flags=0x80
        [ 1690.589203]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
        [ 1690.589218]     in-flight: 53:fb_flashcursor fb_flashcursor
        [ 1690.589236]     pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
        [ 1690.589265] workqueue mm_percpu_wq: flags=0x8
        [ 1690.589269]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
        [ 1690.589284]     pending: vmstat_update
        [ 1690.589301] workqueue edac-poller: flags=0xa000a
        [ 1690.589305]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
        [ 1690.589318]     pending: edac_mc_workq_function
        [ 1690.589331]     inactive: edac_device_workq_function
        [ 1690.589346] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=38s workers=3 idle: 7621 6478
        [ 1690.589370] pool 4: cpus=0-1 flags=0x4 nice=0 hung=41s workers=3 idle: 6967 5672
        [ 1721.313097] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 69s!
        [ 1721.313136] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 72s!
        [ 1721.313149] Showing busy workqueues and worker pools:
        [ 1721.313154] workqueue events_unbound: flags=0x2
        [ 1721.313158]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
        [ 1721.313173]     in-flight: 7:call_usermodehelper_exec_work
        [ 1721.313193]     pending: flush_memcg_stats_work, flush_memcg_stats_dwork
        [ 1721.313213] workqueue events_power_efficient: flags=0x80
        [ 1721.313218]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
        [ 1721.313234]     in-flight: 53:fb_flashcursor fb_flashcursor
        [ 1721.313251]     pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
        [ 1721.313282] workqueue mm_percpu_wq: flags=0x8
        [ 1721.313285]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
        [ 1721.313301]     pending: vmstat_update
        [ 1721.313319] workqueue edac-poller: flags=0xa000a
        [ 1721.313323]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
        [ 1721.313336]     pending: edac_mc_workq_function
        [ 1721.313349]     inactive: edac_device_workq_function
        [ 1721.313366] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=69s workers=3 idle: 7621 6478
        [ 1721.313390] pool 4: cpus=0-1 flags=0x4 nice=0 hung=72s workers=3 idle: 6967 5672
        [ 1733.189086] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        [ 1733.189101] 	(detected by 0, t=21007 jiffies, g=50257, q=13112)
        [ 1733.189111] rcu: All QSes seen, last rcu_preempt kthread activity 21007 (358298-337291), jiffies_till_next_fqs=1, root ->qsmask 0x0
        [ 1733.189119] rcu: rcu_preempt kthread timer wakeup didn't happen for 21006 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
        [ 1733.189126] rcu: 	Possible timer handling issue on cpu=1 timer-softirq=20834
        [ 1733.189131] rcu: rcu_preempt kthread starved for 21007 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
        [ 1733.189138] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
        [ 1733.189141] rcu: RCU grace-period kthread stack dump:
        [ 1733.189144] task:rcu_preempt     state:R stack:    0 pid:   13 ppid:     2 flags:0x00000000
        [ 1733.189156] Backtrace:
        [ 1733.189162] [<c0a19c20>] (__schedule) from [<c0a1a458>] (schedule+0x64/0x110)
        [ 1733.189184]  r10:00000001 r9:c190e000 r8:c137b690 r7:c137b69c r6:c190fed4 r5:c190e000
        [ 1733.189188]  r4:c197c880
        [ 1733.189191] [<c0a1a3f4>] (schedule) from [<c0a20048>] (schedule_timeout+0xa8/0x1c0)
        [ 1733.189205]  r5:c1303d00 r4:0005258c
        [ 1733.189208] [<c0a1ffa0>] (schedule_timeout) from [<c01a1664>] (rcu_gp_fqs_loop+0x120/0x3ac)
        [ 1733.189226]  r7:c137b69c r6:c1303d00 r5:c137b4c0 r4:00000000
        [ 1733.189229] [<c01a1544>] (rcu_gp_fqs_loop) from [<c01a3dac>] (rcu_gp_kthread+0xfc/0x1b0)
        [ 1733.189246]  r10:c190ff5c r9:c1303d00 r8:c137b4c0 r7:c190e000 r6:c137b69e r5:c137b690
        [ 1733.189249]  r4:c137b69c
        [ 1733.189252] [<c01a3cb0>] (rcu_gp_kthread) from [<c0153b14>] (kthread+0x16c/0x1a0)
        [ 1733.189267]  r7:00000000
        [ 1733.189270] [<c01539a8>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
        [ 1733.189281] Exception stack(0xc190ffb0 to 0xc190fff8)
        [ 1733.189287] ffa0:                                     ???????? ???????? ???????? ????????
        [ 1733.189292] ffc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
        [ 1733.189297] ffe0: ???????? ???????? ???????? ???????? ???????? ????????
        [ 1733.189304]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01539a8
        [ 1733.189309]  r4:c19320c0 r3:00000000
        [ 1733.189312] rcu: Stack dump where RCU GP kthread last ran:
        [ 1733.189315] Sending NMI from CPU 0 to CPUs 1:
        [ 1733.189327] NMI backtrace for cpu 1
        [ 1733.189335] CPU: 1 PID: 7755 Comm: stress-ng-cpu Tainted: G        W         5.15.0-rc2+ #5
        [ 1733.189343] Hardware name: Marvell Armada 370/XP (Device Tree)
        [ 1733.189346] PC is at 0x4bdee0
        [ 1733.189354] LR is at 0x4bdf21
        [ 1733.189358] pc : [<004bdee0>]    lr : [<004bdf21>]    psr: 20030030
        [ 1733.189363] sp : beb8270c  ip : 00004650  fp : beb8289c
        [ 1733.189367] r10: 00e5e800  r9 : 00514760  r8 : 00000358
        [ 1733.189370] r7 : beb828a8  r6 : 00000047  r5 : 0000004d  r4 : 000b2ab7
        [ 1733.189375] r3 : 004bde10  r2 : 00001217  r1 : 0000004f  r0 : 00000085
        [ 1733.189379] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
        [ 1733.189385] Control: 10c5387d  Table: 0734006a  DAC: 00000055
        [ 1733.189389] CPU: 1 PID: 7755 Comm: stress-ng-cpu Tainted: G        W         5.15.0-rc2+ #5
        [ 1733.189395] Hardware name: Marvell Armada 370/XP (Device Tree)
        [ 1733.189397] Backtrace:
        [ 1733.189402] [<c0a0b758>] (dump_backtrace) from [<c0a0b9a4>] (show_stack+0x20/0x24)
        [ 1733.189417]  r7:c18db400 r6:c7375fb0 r5:60030193 r4:c1099c7c
        [ 1733.189420] [<c0a0b984>] (show_stack) from [<c0a11988>] (dump_stack_lvl+0x48/0x54)
        [ 1733.189432] [<c0a11940>] (dump_stack_lvl) from [<c0a119ac>] (dump_stack+0x18/0x1c)
        [ 1733.189444]  r5:00000001 r4:20030193
        [ 1733.189446] [<c0a11994>] (dump_stack) from [<c0109984>] (show_regs+0x1c/0x20)
        [ 1733.189460] [<c0109968>] (show_regs) from [<c05f6af8>] (nmi_cpu_backtrace+0xc0/0x10c)
        [ 1733.189473] [<c05f6a38>] (nmi_cpu_backtrace) from [<c010ffa4>] (do_handle_IPI+0x54/0x3b8)
        [ 1733.189488]  r7:c18db400 r6:00000017 r5:00000001 r4:00000007
        [ 1733.189490] [<c010ff50>] (do_handle_IPI) from [<c0110330>] (ipi_handler+0x28/0x30)
        [ 1733.189504]  r10:c7375f58 r9:c7375fb0 r8:c7375f30 r7:c18db400 r6:00000017 r5:c13ecadc
        [ 1733.189508]  r4:c18d9300 r3:00000010
        [ 1733.189510] [<c0110308>] (ipi_handler) from [<c0193200>] (handle_percpu_devid_irq+0xb4/0x288)
        [ 1733.189523] [<c019314c>] (handle_percpu_devid_irq) from [<c018c4b4>] (handle_domain_irq+0x8c/0xc0)
        [ 1733.189538]  r9:c7375fb0 r8:00000007 r7:00000000 r6:c1863d80 r5:00000000 r4:c12781e0
        [ 1733.189540] [<c018c428>] (handle_domain_irq) from [<c01012cc>] (armada_370_xp_handle_irq+0xdc/0x124)
        [ 1733.189555]  r10:00e5e800 r9:00514760 r8:10c5387d r7:c147d604 r6:c7375fb0 r5:000003fe
        [ 1733.189559]  r4:00000007 r3:00000007
        [ 1733.189561] [<c01011f0>] (armada_370_xp_handle_irq) from [<c0100e58>] (__irq_usr+0x58/0x80)
        [ 1733.189570] Exception stack(0xc7375fb0 to 0xc7375ff8)
        [ 1733.189575] 5fa0:                                     ???????? ???????? ???????? ????????
        [ 1733.189579] 5fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
        [ 1733.189583] 5fe0: ???????? ???????? ???????? ???????? ???????? ????????
        [ 1733.189589]  r7:10c5387d r6:ffffffff r5:20030030 r4:004bdee0
        [ 1752.029102] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 100s!
        [ 1752.029137] Showing busy workqueues and worker pools:
        [ 1752.029141] workqueue events_unbound: flags=0x2
        [ 1752.029146]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
        [ 1752.029161]     in-flight: 7:call_usermodehelper_exec_work
        [ 1752.029180]     pending: flush_memcg_stats_work, flush_memcg_stats_dwork
        [ 1752.029200] workqueue events_power_efficient: flags=0x80
        [ 1752.029205]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
        [ 1752.029221]     in-flight: 53:fb_flashcursor fb_flashcursor
        [ 1752.029239]     pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
        [ 1752.029269] workqueue mm_percpu_wq: flags=0x8
        [ 1752.029272]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
        [ 1752.029288]     pending: vmstat_update
        [ 1752.029306] workqueue edac-poller: flags=0xa000a
        [ 1752.029310]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
        [ 1752.029323]     pending: edac_mc_workq_function
        [ 1752.029337]     inactive: edac_device_workq_function
        [ 1752.029353] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=100s workers=3 idle: 7621 6478
        [ 1752.029378] pool 4: cpus=0-1 flags=0x4 nice=0 hung=102s workers=3 idle: 6967 5672
        stress-ng: info:  [7740] successful run completed in 125.31s (2 mins, 5.31 secs)

Earlier kernels (e.g. v5.13.9) completely froze the machine, resulting in
the watchdog triggering and rebooting it. So, $something was
already fixed here.

Bisecting leads to the mentioned commit; reverting it results
in a BUG-less run of the stress-ng test.
Any idea what might cause this and how to fix it?


Best regards,
Steffen Trumtrar

--
Pengutronix e.K.                | Dipl.-Inform. Steffen Trumtrar |
Steuerwalder Str. 21            | https://www.pengutronix.de/    |
31137 Hildesheim, Germany       | Phone: +49-5121-206917-0       |
Amtsgericht Hildesheim, HRA 2686| Fax:   +49-5121-206917-5555    |

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [BUG] irqchip: armada-370-xp: workqueue lockup
  2021-09-21  8:40 [BUG] irqchip: armada-370-xp: workqueue lockup Steffen Trumtrar
@ 2021-09-21 15:18 ` Marc Zyngier
  2021-09-22  6:49   ` Steffen Trumtrar
  2021-09-22 13:27 ` [irqchip: irq/irqchip-fixes] irqchip/armada-370-xp: Fix ack/eoi breakage irqchip-bot for Marc Zyngier
  1 sibling, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2021-09-21 15:18 UTC (permalink / raw)
  To: Steffen Trumtrar
  Cc: Valentin Schneider, Andrew Lunn, Gregory Clement,
	Sebastian Hesselbarth, linux-arm-kernel

Hi Steffen,

On Tue, 21 Sep 2021 09:40:59 +0100,
Steffen Trumtrar <s.trumtrar@pengutronix.de> wrote:
> 
> 
> Hi,
> 
> I noticed that after the patch
> 
>         e52e73b7e9f7d08b8c2ef6fb1657105093e22a03
>         From: Valentin Schneider <valentin.schneider@arm.com>
>         Date: Mon, 9 Nov 2020 09:41:18 +0000
>         Subject: [PATCH] irqchip/armada-370-xp: Make IPIs use
>         handle_percpu_devid_irq()
> 
>         As done for the Arm GIC irqchips, move IPIs to handle_percpu_devid_irq() as
>         handle_percpu_devid_fasteoi_ipi() isn't actually required.
> 
>         Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>         Signed-off-by: Marc Zyngier <maz@kernel.org>
>         Link: https://lore.kernel.org/r/20201109094121.29975-3-valentin.schneider@arm.com
>         ---
>         drivers/irqchip/irq-armada-370-xp.c | 2 +-
>         1 file changed, 1 insertion(+), 1 deletion(-)
> 
>         diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
>         index d7eb2e93db8f..32938dfc0e46 100644
>         --- a/drivers/irqchip/irq-armada-370-xp.c
>         +++ b/drivers/irqchip/irq-armada-370-xp.c
>         @@ -382,7 +382,7 @@ static int armada_370_xp_ipi_alloc(struct irq_domain *d,
>                         irq_set_percpu_devid(virq + i);
>                         irq_domain_set_info(d, virq + i, i, &ipi_irqchip,
>                                         d->host_data,
>         -                                   handle_percpu_devid_fasteoi_ipi,
>         +                                   handle_percpu_devid_irq,
>                                         NULL, NULL);
>                 }
> 
> I get workqueue lockups on my Armada-XP based board.
> When I run the following test on v5.15-rc2
> 
>         stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 120s
>
> I get a backtrace like this:

[...]

> Earlier kernels (i.e v5.13.9) completely froze the machine resulting in
> the watchdog triggering and rebooting the machine. So, $something was
> already fixed here.

Fixed? Or broken? More likely the latter.

> Bisecting leads to the mentioned commit, reverting of the commit results
> in a BUG-less run of the stress-ng test.
> Any idea what might cause this and how to fix it?

It isn't obvious to me how reverting this patch fixes anything. The
fasteoi flow does the same thing as far as the IPI driver is concerned.

However, it appears that I have broken that part much earlier in
f02147dd02eb ("irqchip/armada-370-xp: Configure IPIs as standard
interrupts"), as the write to ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS that
used to occur before the handling (an ACK) has now been moved after as
an EOI. That's a pretty good way to lose edge interrupts.

Could you try the following patch on top of v5.15-rc2?

Thanks,

	M.

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 7557ab551295..53e0fb0562c1 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
 		ARMADA_370_XP_SW_TRIG_INT_OFFS);
 }
 
-static void armada_370_xp_ipi_eoi(struct irq_data *d)
+static void armada_370_xp_ipi_ack(struct irq_data *d)
 {
 	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
 }
 
 static struct irq_chip ipi_irqchip = {
 	.name		= "IPI",
+	.irq_ack	= armada_370_xp_ipi_ack,
 	.irq_mask	= armada_370_xp_ipi_mask,
 	.irq_unmask	= armada_370_xp_ipi_unmask,
-	.irq_eoi	= armada_370_xp_ipi_eoi,
 	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
 };
 

-- 
Without deviation from the norm, progress is not possible.



* Re: [BUG] irqchip: armada-370-xp: workqueue lockup
  2021-09-21 15:18 ` Marc Zyngier
@ 2021-09-22  6:49   ` Steffen Trumtrar
  2021-09-22  8:12     ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread
From: Steffen Trumtrar @ 2021-09-22  6:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Steffen Trumtrar, Valentin Schneider, Andrew Lunn,
	Gregory Clement, Sebastian Hesselbarth, linux-arm-kernel


Hi,

Marc Zyngier <maz@kernel.org> writes:
> It isn't obvious to me how reverting this patch fixes anything.  The
> fasteoi flow does the same thing as far as the IPI driver is concerned
>

Didn't the fasteoi flow just call the irq_eoi earlier? The same as the
irq_ack does now?

>
> However, it appears that I have broken that part much earlier in
> f02147dd02eb ("irqchip/armada-370-xp: Configure IPIs as standard
> interrupts"), as the write to ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS that
> used to occur before the handling (an ACK) has now been moved after as
> an EOI. That's a pretty good way to lose edge interrupts.
>
> Could you try the following patch on top of 5.12-rc2?
>
> Thanks,
>
> 	M.
>
> diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> index 7557ab551295..53e0fb0562c1 100644
> --- a/drivers/irqchip/irq-armada-370-xp.c
> +++ b/drivers/irqchip/irq-armada-370-xp.c
> @@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
>  		ARMADA_370_XP_SW_TRIG_INT_OFFS);
>  }
>
> -static void armada_370_xp_ipi_eoi(struct irq_data *d)
> +static void armada_370_xp_ipi_ack(struct irq_data *d)
>  {
>  	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
>  }
>
>  static struct irq_chip ipi_irqchip = {
>  	.name		= "IPI",
> +	.irq_ack	= armada_370_xp_ipi_ack,
>  	.irq_mask	= armada_370_xp_ipi_mask,
>  	.irq_unmask	= armada_370_xp_ipi_unmask,
> -	.irq_eoi	= armada_370_xp_ipi_eoi,
>  	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
>  };

This fixes it, yes \o/


Best regards,
Steffen




* Re: [BUG] irqchip: armada-370-xp: workqueue lockup
  2021-09-22  6:49   ` Steffen Trumtrar
@ 2021-09-22  8:12     ` Marc Zyngier
  2021-09-22  8:24       ` Steffen Trumtrar
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2021-09-22  8:12 UTC (permalink / raw)
  To: Steffen Trumtrar
  Cc: Valentin Schneider, Andrew Lunn, Gregory Clement,
	Sebastian Hesselbarth, linux-arm-kernel

On Wed, 22 Sep 2021 07:49:05 +0100,
Steffen Trumtrar <s.trumtrar@pengutronix.de> wrote:
> 
> 
> Hi,
> 
> Marc Zyngier <maz@kernel.org> writes:
> > It isn't obvious to me how reverting this patch fixes anything.  The
> > fasteoi flow does the same thing as far as the IPI driver is concerned
> >
> 
> didn't the fasteoi flow just call the irq_eoi earlier? Same as the
> irq_ack now?

Yes, of course, you are correct. Another proof that the whole initial
fasteoi flow that used EOI as an ACK was *a bad idea* (tm).

> 
> >
> > However, it appears that I have broken that part much earlier in
> > f02147dd02eb ("irqchip/armada-370-xp: Configure IPIs as standard
> > interrupts"), as the write to ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS that
> > used to occur before the handling (an ACK) has now been moved after as
> > an EOI. That's a pretty good way to lose edge interrupts.
> >
> > Could you try the following patch on top of 5.12-rc2?
> >
> > Thanks,
> >
> > 	M.
> >
> > diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> > index 7557ab551295..53e0fb0562c1 100644
> > --- a/drivers/irqchip/irq-armada-370-xp.c
> > +++ b/drivers/irqchip/irq-armada-370-xp.c
> > @@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
> >  		ARMADA_370_XP_SW_TRIG_INT_OFFS);
> >  }
> >
> > -static void armada_370_xp_ipi_eoi(struct irq_data *d)
> > +static void armada_370_xp_ipi_ack(struct irq_data *d)
> >  {
> >  	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
> >  }
> >
> >  static struct irq_chip ipi_irqchip = {
> >  	.name		= "IPI",
> > +	.irq_ack	= armada_370_xp_ipi_ack,
> >  	.irq_mask	= armada_370_xp_ipi_mask,
> >  	.irq_unmask	= armada_370_xp_ipi_unmask,
> > -	.irq_eoi	= armada_370_xp_ipi_eoi,
> >  	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
> >  };
> 
> This fixes it, yes \o/

Thanks. Can I use this as a Tested-by: tag in the official patch?

	M.




* Re: [BUG] irqchip: armada-370-xp: workqueue lockup
  2021-09-22  8:12     ` Marc Zyngier
@ 2021-09-22  8:24       ` Steffen Trumtrar
  0 siblings, 0 replies; 6+ messages in thread
From: Steffen Trumtrar @ 2021-09-22  8:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Steffen Trumtrar, Valentin Schneider, Andrew Lunn,
	Gregory Clement, linux-arm-kernel


Marc Zyngier <maz@kernel.org> writes:

> On Wed, 22 Sep 2021 07:49:05 +0100,
> Steffen Trumtrar <s.trumtrar@pengutronix.de> wrote:
>>
>>
>> Hi,
>>
>> Marc Zyngier <maz@kernel.org> writes:
>> > It isn't obvious to me how reverting this patch fixes anything.  The
>> > fasteoi flow does the same thing as far as the IPI driver is concerned
>> >
>>
>> didn't the fasteoi flow just call the irq_eoi earlier? Same as the
>> irq_ack now?
>
> Yes, of course, you are correct. Another proof that the whole initial
> fasteoi flow that used EOI as an ACK was *a bad idea* (tm).
>
>>
>> >
>> > However, it appears that I have broken that part much earlier in
>> > f02147dd02eb ("irqchip/armada-370-xp: Configure IPIs as standard
>> > interrupts"), as the write to ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS that
>> > used to occur before the handling (an ACK) has now been moved after as
>> > an EOI. That's a pretty good way to lose edge interrupts.
>> >
>> > Could you try the following patch on top of 5.12-rc2?
>> >
>> > Thanks,
>> >
>> > 	M.
>> >
>> > diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
>> > index 7557ab551295..53e0fb0562c1 100644
>> > --- a/drivers/irqchip/irq-armada-370-xp.c
>> > +++ b/drivers/irqchip/irq-armada-370-xp.c
>> > @@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
>> >  		ARMADA_370_XP_SW_TRIG_INT_OFFS);
>> >  }
>> >
>> > -static void armada_370_xp_ipi_eoi(struct irq_data *d)
>> > +static void armada_370_xp_ipi_ack(struct irq_data *d)
>> >  {
>> >  	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
>> >  }
>> >
>> >  static struct irq_chip ipi_irqchip = {
>> >  	.name		= "IPI",
>> > +	.irq_ack	= armada_370_xp_ipi_ack,
>> >  	.irq_mask	= armada_370_xp_ipi_mask,
>> >  	.irq_unmask	= armada_370_xp_ipi_unmask,
>> > -	.irq_eoi	= armada_370_xp_ipi_eoi,
>> >  	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
>> >  };
>>
>> This fixes it, yes \o/
>
> Thanks. Can I use this as a Tested-by: tag in the official patch?
>

Yes, of course. Go ahead.


Thanks,
Steffen




* [irqchip: irq/irqchip-fixes] irqchip/armada-370-xp: Fix ack/eoi breakage
  2021-09-21  8:40 [BUG] irqchip: armada-370-xp: workqueue lockup Steffen Trumtrar
  2021-09-21 15:18 ` Marc Zyngier
@ 2021-09-22 13:27 ` irqchip-bot for Marc Zyngier
  1 sibling, 0 replies; 6+ messages in thread
From: irqchip-bot for Marc Zyngier @ 2021-09-22 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steffen Trumtrar, Marc Zyngier, Valentin Schneider, stable, tglx

The following commit has been merged into the irq/irqchip-fixes branch of irqchip:

Commit-ID:     2a7313dc81e88adc7bb09d0f056985fa8afc2b89
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/2a7313dc81e88adc7bb09d0f056985fa8afc2b89
Author:        Marc Zyngier <maz@kernel.org>
AuthorDate:    Wed, 22 Sep 2021 14:19:41 +01:00
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Wed, 22 Sep 2021 14:24:49 +01:00

irqchip/armada-370-xp: Fix ack/eoi breakage

When converting the driver to use handle_percpu_devid_irq(),
we forgot to repaint the irq_eoi() callback into irq_ack(),
as handle_percpu_devid_fasteoi_ipi() actually issued the EOI
really early in the handling. Yes, this was a stupid idea.

Fix this by using the HW ack method as irq_ack().

Fixes: e52e73b7e9f7 ("irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()")
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuiexq5f.fsf@pengutronix.de
---
 drivers/irqchip/irq-armada-370-xp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 7557ab5..53e0fb0 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
 		ARMADA_370_XP_SW_TRIG_INT_OFFS);
 }
 
-static void armada_370_xp_ipi_eoi(struct irq_data *d)
+static void armada_370_xp_ipi_ack(struct irq_data *d)
 {
 	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
 }
 
 static struct irq_chip ipi_irqchip = {
 	.name		= "IPI",
+	.irq_ack	= armada_370_xp_ipi_ack,
 	.irq_mask	= armada_370_xp_ipi_mask,
 	.irq_unmask	= armada_370_xp_ipi_unmask,
-	.irq_eoi	= armada_370_xp_ipi_eoi,
 	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
 };
 


