All of lore.kernel.org
 help / color / mirror / Atom feed
* serial: imx: Possible circular locking dependency
@ 2021-09-29 16:10 Fabio Estevam
  2021-09-29 18:35 ` Fabio Estevam
  0 siblings, 1 reply; 3+ messages in thread
From: Fabio Estevam @ 2021-09-29 16:10 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek; +Cc: linux-serial, rostedt, Marek Vasut

Hi Petr and Sergey,

I know this has been reported before [1] and [2], but I am still
observing the deadlock below
on an imx6q board since commit dbdda842fe96 ("printk: Add console
owner and waiter logic to load balance console writes").

To reproduce it:

# echo t > /proc/sysrq-trigger

[   20.325246] ======================================================
[   20.325252] WARNING: possible circular locking dependency detected
[   20.325260] 5.15.0-rc2-next-20210924-00004-gd2d6e664f29f-dirty #163
Not tainted
[   20.325273] ------------------------------------------------------
[   20.325279] sh/236 is trying to acquire lock:
[   20.325293] c1618614 (console_owner){-...}-{0:0}, at:
console_unlock+0x180/0x5bc
[   20.325361]
[   20.325361] but task is already holding lock:
[   20.325368] eefccc90 (&pool->lock){-.-.}-{2:2}, at:
show_workqueue_state+0x104/0x3c8
[   20.325432]
[   20.325432] which lock already depends on the new lock.
[   20.325432]
[   20.325439]
[   20.325439] the existing dependency chain (in reverse order) is:
[   20.325446]
[   20.325446] -> #3 (&pool->lock){-.-.}-{2:2}:
[   20.325475]        __queue_work+0x114/0x810
[   20.325497]        queue_work_on+0x54/0x94
[   20.325516]        put_pwq_unlocked.part.0+0x18/0x24
[   20.325535]        trace_eval_sync+0x18/0x2c
[   20.325560]        do_one_initcall+0x78/0x388
[   20.325582]        kernel_init_freeable+0x198/0x22c
[   20.325613]        kernel_init+0x10/0x128
[   20.325631]        ret_from_fork+0x14/0x38
[   20.325646]        0x0
[   20.325657]
[   20.325657] -> #2 (&pool->lock/1){-.-.}-{2:2}:
[   20.325690]        __queue_work+0x114/0x810
[   20.325710]        queue_work_on+0x54/0x94
[   20.325727]        __imx_uart_rxint.constprop.0+0x1b4/0x2e0
[   20.325760]        imx_uart_int+0x270/0x310
[   20.325778]        __handle_irq_event_percpu+0x48/0x34c
[   20.325796]        handle_irq_event_percpu+0x2c/0x84
[   20.325811]        handle_irq_event+0x38/0x5c
[   20.325824]        handle_fasteoi_irq+0xc8/0x180
[   20.325848]        handle_domain_irq+0x68/0xa4
[   20.325862]        gic_handle_irq+0x88/0xb0
[   20.325881]        __irq_svc+0x60/0x80
[   20.325896]        cpuidle_enter_state+0x138/0x5a4
[   20.325915]        cpuidle_enter_state+0x138/0x5a4
[   20.325928]        cpuidle_enter+0x28/0x38
[   20.325941]        do_idle+0x210/0x2bc
[   20.325966]        cpu_startup_entry+0x18/0x1c
[   20.325982]        start_kernel+0x5d4/0x6d8
[   20.326005]
[   20.326005] -> #1 (&port_lock_key){-...}-{2:2}:
[   20.326033]        imx_uart_console_write+0x168/0x1ac
[   20.326059]        console_unlock+0x2a4/0x5bc
[   20.326083]        register_console+0x18c/0x2ec
[   20.326102]        uart_add_one_port+0x4b4/0x50c
[   20.326121]        imx_uart_probe+0x4ac/0x728
[   20.326140]        platform_probe+0x58/0xb8
[   20.326162]        really_probe+0x14c/0x320
[   20.326192]        __driver_probe_device+0x80/0xe4
[   20.326214]        driver_probe_device+0x30/0xd4
[   20.326234]        __driver_attach+0x70/0xf4
[   20.326255]        bus_for_each_dev+0x74/0xc0
[   20.326277]        bus_add_driver+0x150/0x1dc
[   20.326298]        driver_register+0x74/0x108
[   20.326313]        imx_uart_init+0x20/0x40
[   20.326341]        do_one_initcall+0x78/0x388
[   20.326358]        kernel_init_freeable+0x198/0x22c
[   20.326380]        kernel_init+0x10/0x128
[   20.326396]        ret_from_fork+0x14/0x38
[   20.326410]        0x0
[   20.326419]
[   20.326419] -> #0 (console_owner){-...}-{0:0}:
[   20.326445]        lock_acquire+0x130/0x400
[   20.326462]        console_unlock+0x1d0/0x5bc
[   20.326481]        vprintk_emit+0x1f0/0x2e8
[   20.326503]        vprintk_default+0x20/0x28
[   20.326522]        _printk+0x2c/0x5c
[   20.326553]        show_pwq+0x58/0x2b0
[   20.326572]        show_workqueue_state+0x11c/0x3c8
[   20.326597]        __handle_sysrq+0xf4/0x210
[   20.326619]        write_sysrq_trigger+0x6c/0x84
[   20.326636]        proc_reg_write+0xa8/0xcc
[   20.326667]        vfs_write+0xb8/0x544
[   20.326697]        ksys_write+0x60/0xec
[   20.326717]        ret_fast_syscall+0x0/0x1c
[   20.326733]        0xbec5b950
[   20.326743]
[   20.326743] other info that might help us debug this:
[   20.326743]
[   20.326750] Chain exists of:
[   20.326750]   console_owner --> &pool->lock/1 --> &pool->lock
[   20.326750]
[   20.326788]  Possible unsafe locking scenario:
[   20.326788]
[   20.326792]        CPU0                    CPU1
[   20.326797]        ----                    ----
[   20.326803]   lock(&pool->lock);
[   20.326814]                                lock(&pool->lock/1);
[   20.326833]                                lock(&pool->lock);
[   20.326846]   lock(console_owner);
[   20.326858]
[   20.326858]  *** DEADLOCK ***
[   20.326858]
[   20.326862] 5 locks held by sh/236:
[   20.326876]  #0: c2abd284 (sb_writers#4){.+.+}-{0:0}, at:
ksys_write+0x60/0xec
[   20.326933]  #1: c16e0d2c (rcu_read_lock){....}-{1:2}, at:
__handle_sysrq+0x0/0x210
[   20.326984]  #2: c16e0d2c (rcu_read_lock){....}-{1:2}, at:
show_workqueue_state+0x0/0x3c8
[   20.327035]  #3: eefccc90 (&pool->lock){-.-.}-{2:2}, at:
show_workqueue_state+0x104/0x3c8
[   20.327081]  #4: c161858c (console_lock){+.+.}-{0:0}, at:
vprintk_emit+0xd8/0x2e8
[   20.327133]
[   20.327133] stack backtrace:
[   20.327141] CPU: 1 PID: 236 Comm: sh Not tainted
5.15.0-rc2-next-20210924-00004-gd2d6e664f29f-dirty #163
[   20.327163] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   20.327177] [<c0111918>] (unwind_backtrace) from [<c010be08>]
(show_stack+0x10/0x14)
[   20.327224] [<c010be08>] (show_stack) from [<c0e87838>]
(dump_stack_lvl+0x58/0x70)
[   20.327260] [<c0e87838>] (dump_stack_lvl) from [<c0189520>]
(check_noncircular+0xf4/0x168)
[   20.327290] [<c0189520>] (check_noncircular) from [<c018d3a8>]
(__lock_acquire+0x167c/0x31e8)
[   20.327320] [<c018d3a8>] (__lock_acquire) from [<c018f97c>]
(lock_acquire+0x130/0x400)
[   20.327347] [<c018f97c>] (lock_acquire) from [<c01984d0>]
(console_unlock+0x1d0/0x5bc)
[   20.327381] [<c01984d0>] (console_unlock) from [<c019a778>]
(vprintk_emit+0x1f0/0x2e8)
[   20.327416] [<c019a778>] (vprintk_emit) from [<c019a890>]
(vprintk_default+0x20/0x28)
[   20.327453] [<c019a890>] (vprintk_default) from [<c0e83744>]
(_printk+0x2c/0x5c)
[   20.327498] [<c0e83744>] (_printk) from [<c0e826c4>] (show_pwq+0x58/0x2b0)
[   20.327534] [<c0e826c4>] (show_pwq) from [<c014acfc>]
(show_workqueue_state+0x11c/0x3c8)
[   20.327576] [<c014acfc>] (show_workqueue_state) from [<c06cedd8>]
(__handle_sysrq+0xf4/0x210)
[   20.327617] [<c06cedd8>] (__handle_sysrq) from [<c06cf3dc>]
(write_sysrq_trigger+0x6c/0x84)
[   20.327648] [<c06cf3dc>] (write_sysrq_trigger) from [<c03707d0>]
(proc_reg_write+0xa8/0xcc)
[   20.327692] [<c03707d0>] (proc_reg_write) from [<c02cfa40>]
(vfs_write+0xb8/0x544)
[   20.327733] [<c02cfa40>] (vfs_write) from [<c02d001c>] (ksys_write+0x60/0xec)
[   20.327770] [<c02d001c>] (ksys_write) from [<c0100080>]
(ret_fast_syscall+0x0/0x1c)
[   20.327803] Exception stack(0xc2a55fa8 to 0xc2a55ff0)
[   20.327824] 5fa0:                   00000001 0128cf28 00000001
0128cf28 00000002 00000000
[   20.327846] 5fc0: 00000001 0128cf28 00000002 00000004 00585444
bec5b98c 00000020 0128b008
[   20.327861] 5fe0: 00584c60 bec5b950 004bd630 b6f66a3c

[1] https://lore.kernel.org/lkml/20190917141034.gvjg7bgylqbbxyv7@pengutronix.de/
[2] https://lore.kernel.org/linux-serial/1574416632-32321-1-git-send-email-fugang.duan@nxp.com/

Petr has suggested a way to fix it:
https://lore.kernel.org/lkml/20190926085855.debu7t46s7kgb26p@pathway.suse.cz/

but I am not familiar with the serial core code enough to give it a
try at implementing it.

Would it be possible for you to cook a patch to fix this issue so I
could try it?

Thanks a lot,

Fabio Estevam

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: serial: imx: Possible circular locking dependency
  2021-09-29 16:10 serial: imx: Possible circular locking dependency Fabio Estevam
@ 2021-09-29 18:35 ` Fabio Estevam
  2021-09-29 20:50   ` Fabio Estevam
  0 siblings, 1 reply; 3+ messages in thread
From: Fabio Estevam @ 2021-09-29 18:35 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek; +Cc: linux-serial, rostedt, Marek Vasut

On Wed, Sep 29, 2021 at 1:10 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Petr and Sergey,
>
> I know this has been reported before [1] and [2], but I am still
> observing the deadlock below
> on an imx6q board since commit dbdda842fe96 ("printk: Add console
> owner and waiter logic to load balance console writes").
>
> To reproduce it:
>
> # echo t > /proc/sysrq-trigger
>
> [   20.325246] ======================================================
> [   20.325252] WARNING: possible circular locking dependency detected
> [   20.325260] 5.15.0-rc2-next-20210924-00004-gd2d6e664f29f-dirty #163

If I move the __imx_uart_rxint() block outside the spin_lock() like this:

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 8b121cd869e9..c94704f5dd99 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -955,13 +955,6 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)
        if ((ucr4 & UCR4_OREN) == 0)
                usr2 &= ~USR2_ORE;

-       if (usr1 & (USR1_RRDY | USR1_AGTIM)) {
-               imx_uart_writel(sport, USR1_AGTIM, USR1);
-
-               __imx_uart_rxint(irq, dev_id);
-               ret = IRQ_HANDLED;
-       }
-
        if ((usr1 & USR1_TRDY) || (usr2 & USR2_TXDC)) {
                imx_uart_transmit_buffer(sport);
                ret = IRQ_HANDLED;
@@ -993,6 +986,13 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)

        spin_unlock(&sport->port.lock);

+       if (usr1 & (USR1_RRDY | USR1_AGTIM)) {
+               imx_uart_writel(sport, USR1_AGTIM, USR1);
+
+               __imx_uart_rxint(irq, dev_id);
+               ret = IRQ_HANDLED;
+       }
+
        return ret;
 }

Then the problem does not happen anymore.

Is this a proper fix?

Thanks

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: serial: imx: Possible circular locking dependency
  2021-09-29 18:35 ` Fabio Estevam
@ 2021-09-29 20:50   ` Fabio Estevam
  0 siblings, 0 replies; 3+ messages in thread
From: Fabio Estevam @ 2021-09-29 20:50 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek; +Cc: linux-serial, rostedt, Marek Vasut

On Wed, Sep 29, 2021 at 3:35 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> On Wed, Sep 29, 2021 at 1:10 PM Fabio Estevam <festevam@gmail.com> wrote:
> >
> > Hi Petr and Sergey,
> >
> > I know this has been reported before [1] and [2], but I am still
> > observing the deadlock below
> > on an imx6q board since commit dbdda842fe96 ("printk: Add console
> > owner and waiter logic to load balance console writes").
> >
> > To reproduce it:
> >
> > # echo t > /proc/sysrq-trigger
> >
> > [   20.325246] ======================================================
> > [   20.325252] WARNING: possible circular locking dependency detected
> > [   20.325260] 5.15.0-rc2-next-20210924-00004-gd2d6e664f29f-dirty #163
>
> If I move the __imx_uart_rxint() block outside the spin_lock() like this:
>
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 8b121cd869e9..c94704f5dd99 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -955,13 +955,6 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)
>         if ((ucr4 & UCR4_OREN) == 0)
>                 usr2 &= ~USR2_ORE;
>
> -       if (usr1 & (USR1_RRDY | USR1_AGTIM)) {
> -               imx_uart_writel(sport, USR1_AGTIM, USR1);
> -
> -               __imx_uart_rxint(irq, dev_id);
> -               ret = IRQ_HANDLED;
> -       }
> -
>         if ((usr1 & USR1_TRDY) || (usr2 & USR2_TXDC)) {
>                 imx_uart_transmit_buffer(sport);
>                 ret = IRQ_HANDLED;
> @@ -993,6 +986,13 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)
>
>         spin_unlock(&sport->port.lock);
>
> +       if (usr1 & (USR1_RRDY | USR1_AGTIM)) {
> +               imx_uart_writel(sport, USR1_AGTIM, USR1);
> +
> +               __imx_uart_rxint(irq, dev_id);
> +               ret = IRQ_HANDLED;
> +       }
> +
>         return ret;
>  }
>
> Then the problem does not happen anymore.
>
> Is this a proper fix?

It seems that a proper fix would be to use the same approach as in

5697df7322fe ("serial: fsl_lpuart: split sysrq handling):

I will send a proper patch with this solution.

Thanks

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-09-29 20:50 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29 16:10 serial: imx: Possible circular locking dependency Fabio Estevam
2021-09-29 18:35 ` Fabio Estevam
2021-09-29 20:50   ` Fabio Estevam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.