linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kdb: Fix the deadlock issue in KDB debugging.
@ 2024-02-28  2:56 LiuYe
  2024-02-28 12:05 ` Daniel Thompson
  2024-03-02 20:44 ` [PATCH] " Greg KH
  0 siblings, 2 replies; 48+ messages in thread
From: LiuYe @ 2024-02-28  2:56 UTC (permalink / raw)
  To: jason.wessel, daniel.thompson
  Cc: dianders, gregkh, jirislaby, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yeC

master cpu : After executing the go command, a deadlock occurs.
slave cpu: may be performing thread migration,
        acquiring the running queue lock of master CPU.
        Then it was interrupted by kdb NMI and entered the nmi_handler process.
        (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
        while(1){ touch wathcdog}.)

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Signed-off-by: LiuYe <liu.yeC@h3c.com>
---
 drivers/tty/serial/kgdboc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..945318ef1 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,9 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/smp.h>
+
+#include "../kernel/sched/sched.h"

 #define MAX_CONFIG_LEN         40

@@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
                dbg_restore_graphics = 0;
                con_debug_leave();
        }
-       kgdboc_restore_input();
+       if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
+               kgdboc_restore_input();
 }

 static struct kgdb_io kgdboc_io_ops = {
--
2.25.1

-------------------------------------------------------------------------------------------------------------------------------------
±¾Óʼþ¼°Æ丽¼þº¬ÓÐлªÈý¼¯Íŵı£ÃÜÐÅÏ¢£¬½öÏÞÓÚ·¢Ë͸øÉÏÃæµØÖ·ÖÐÁгö
µÄ¸öÈË»òȺ×é¡£½ûÖ¹ÈκÎÆäËûÈËÒÔÈκÎÐÎʽʹÓ㨰üÀ¨µ«²»ÏÞÓÚÈ«²¿»ò²¿·ÖµØй¶¡¢¸´ÖÆ¡¢
»òÉ¢·¢£©±¾ÓʼþÖеÄÐÅÏ¢¡£Èç¹ûÄú´íÊÕÁ˱¾Óʼþ£¬ÇëÄúÁ¢¼´µç»°»òÓʼþ֪ͨ·¢¼þÈ˲¢É¾³ý±¾
Óʼþ£¡
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-02-28  2:56 [PATCH] kdb: Fix the deadlock issue in KDB debugging LiuYe
@ 2024-02-28 12:05 ` Daniel Thompson
  2024-03-01  3:30   ` 答复: " Liuye
  2024-03-02 20:44 ` [PATCH] " Greg KH
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-02-28 12:05 UTC (permalink / raw)
  To: LiuYe
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> master cpu : After executing the go command, a deadlock occurs.
> slave cpu: may be performing thread migration,
>         acquiring the running queue lock of master CPU.
>         Then it was interrupted by kdb NMI and entered the nmi_handler process.
>         (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
>         while(1){ touch wathcdog}.)

I think this description is a little short and doesn't clearly explain
the cause. How about:

Currently, if kgdboc includes 'kdb', then kgdboc will attempt to
use schedule_work() to provoke a keyboard reset when transitioning out
of the debugger and back to normal operation. This can cause
deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slace CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().


> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
>
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> ---
>  drivers/tty/serial/kgdboc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..945318ef1 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,9 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/smp.h>
> +
> +#include "../kernel/sched/sched.h"
>
>  #define MAX_CONFIG_LEN         40
>
> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
>                 dbg_restore_graphics = 0;
>                 con_debug_leave();
>         }
> -       kgdboc_restore_input();
> +       if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
> +               kgdboc_restore_input();

I don't think solving this by access internal scheduler state is the
right approach .

The description I wrote above perhaps already suggests why. The
deadlock occurs because it is unsafe to call schedule_work() from
the debug trap handler. The debug trap handler in your stack trace is not
running from an NMI but it certainly has NMI-like properties. Therefore
a better fix is not to call schedule_work() at all from the debug trap
handler.

Instead we need to use an NMI-safe API such as irq_work_queue() and that
irq_work can call schedule_work() and trigger the keyboard reset.


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-02-28 12:05 ` Daniel Thompson
@ 2024-03-01  3:30   ` Liuye
  2024-03-01 10:59     ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-01  3:30 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, Liuye

>On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
>> master cpu : After executing the go command, a deadlock occurs.
>> slave cpu: may be performing thread migration,
>>         acquiring the running queue lock of master CPU.
>>         Then it was interrupted by kdb NMI and entered the nmi_handler process.
>>         (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
>>         while(1){ touch wathcdog}.)
>
>I think this description is a little short and doesn't clearly explain the cause. How about:
>
>Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use schedule_work() to provoke a keyboard reset when transitioning out of the debugger and back to normal operation. This can cause deadlock because schedule_work() is not NMI-safe.
>
>The stack trace below shows an example of the problem. In this case the master cpu is not running from NMI but it has parked the slace CPUs using an NMI and the parked CPUs is holding spinlocks needed by schedule_work().
>
>

Due to the brevity of the description, there may be some misunderstanding, so a detailed description is provided as follows:

before KDB command “go”:

When a specific key is detected by the serial port, it will trigger kgdb_breakpoint, and the master CPU0 will enter the kdb_main_loop to process user commands in a loop.

kgdb_breakpoint
int3
do_int3
notify_die
atomic_notifier_call_chain
__atomic_notifier_call_chain
notifier_call_chain
kgdb_notify
__kgdb_notify
kgdb_handle_exception
kgdb_cpu_enter (kgdb_roundup_cpus send IPI to other slave CPU)
kdb_stub
kdb_main_loop

slave CPU1, CPU2, CPU3 ... and other CPUs:
Using CPU1 as an example:
Currently holding the running queue lock of the master CPU0 due to load_balance or other reasons, responding to the NMI sent by master CPU0 through kgdb_roundup_cpus. Enter the following stack:
nmi_handle
kgdb_nmicallback
kgdb_cpu_enter (The slave CPU1 will loop touch watchdog and wait for the master CPU0 to exit.)

The above is the state before executing the KDB command "go".

When the user executes the KDB command "go", it will trigger a deadlock.
master CPU0 :
kdb_main_loop return
kdb_stub return
kgdb_cpu_enter
kgdboc_post_exp_handler
queue_work_on
__queue_work
insert_work
wake_up_process
try_to_wake_up
_raw_spin_lock (Acquire the spin lock of master CPU0 rq->lock, but at this time the spin lock of master CPU0 is held by CPU1)

As a result, a deadlock has occurred.

Therefore, when the master CPU0 exits, if the rq->lock of CPU0 is locked, it should not wake up the worker on the system_wq.

>> example:
>>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888,
>> .owner_cpu: 1
>>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
>> Call Trace:
>>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0  [<ffffffff8147a7fc>] ?
>> schedule+0x3c/0x90  [<ffffffff8147e71a>] ?
>> schedule_hrtimeout_range_clock+0x10a/0x120
>>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10  [<ffffffff811c839b>] ?
>> ep_scan_ready_list+0x1db/0x1e0  [<ffffffff8147e743>] ?
>> schedule_hrtimeout_range+0x13/0x20
>>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0  [<ffffffff8108c540>] ?
>> wake_up_q+0x70/0x70  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>> [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>>   Call Trace:
>>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>>   [<ffffffff81017219>] do_int3+0x89/0x120
>>   [<ffffffff81480fb4>] int3+0x44/0x80
>>
>> Signed-off-by: LiuYe <liu.yeC@h3c.com>
>> ---
>>  drivers/tty/serial/kgdboc.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
>> index 7ce7bb164..945318ef1 100644
>> --- a/drivers/tty/serial/kgdboc.c
>> +++ b/drivers/tty/serial/kgdboc.c
>> @@ -22,6 +22,9 @@
>>  #include <linux/module.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/serial_core.h>
>> +#include <linux/smp.h>
>> +
>> +#include "../kernel/sched/sched.h"
>>
>>  #define MAX_CONFIG_LEN         40
>>
>> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
>>                 dbg_restore_graphics = 0;
>>                 con_debug_leave();
>>         }
>> -       kgdboc_restore_input();
>> +       if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
>> +               kgdboc_restore_input();
>
>I don't think solving this by access internal scheduler state is the right approach .
>
>The description I wrote above perhaps already suggests why. The deadlock occurs because it is unsafe to call schedule_work() from the debug trap handler. The debug trap handler in your stack trace is not running from an NMI but it certainly has NMI-like properties. Therefore a better fix is not to call schedule_work() at all from the debug trap handler.
>
>Instead we need to use an NMI-safe API such as irq_work_queue() and that irq_work can call schedule_work() and trigger the keyboard reset.
>
>
>Daniel.


-------------------------------------------------------------------------------------------------------------------------------------
本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
邮件!
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-01  3:30   ` 答复: " Liuye
@ 2024-03-01 10:59     ` Daniel Thompson
  2024-03-12  8:37       ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-01 10:59 UTC (permalink / raw)
  To: Liuye
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Fri, Mar 01, 2024 at 03:30:25AM +0000, Liuye wrote:
> >On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> >> master cpu : After executing the go command, a deadlock occurs.
> >> slave cpu: may be performing thread migration, acquiring the
> >> running queue lock of master CPU.  Then it was interrupted by kdb
> >> NMI and entered the nmi_handler process.  (nmi_handle->
> >> kgdb_nmicallback-> kgdb_cpu_enter while(1){ touch wathcdog}.)
> >
> >I think this description is a little short and doesn't clearly
> >explain the cause. How about:
> >
> >Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use
> >schedule_work() to provoke a keyboard reset when transitioning out of
> >the debugger and back to normal operation. This can cause deadlock
> >because schedule_work() is not NMI-safe.
> >
> >The stack trace below shows an example of the problem. In this case
> >the master cpu is not running from NMI but it has parked the slace
> >CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >schedule_work().
>
> Due to the brevity of the description, there may be some
> misunderstanding, so a detailed description is provided as follows:

So, there is a small mistake in the example description I provided.
After double checking the code it should start slightly differently:
  "Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
  attempt to use schedule_work() ...".

However other than that I think it is correct.

The important bit of feedback here is that the patch description should
commence with a description of the bug rather than a description of the
symptom. In this case the bug is kgdboc calls a function that is not
safe to call from this calling context.

It is really useful to describe the symptom as part of the patch
description. However if we focus on the symptom without additional
code review then we can end up with the wrong fix. That is what
happened here. It is unsafe to call schedule_work() and checking
the runqueue locks is insufficient to make it safe because we are
still calling a function from an inappropriate calling context..


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-02-28  2:56 [PATCH] kdb: Fix the deadlock issue in KDB debugging LiuYe
  2024-02-28 12:05 ` Daniel Thompson
@ 2024-03-02 20:44 ` Greg KH
  1 sibling, 0 replies; 48+ messages in thread
From: Greg KH @ 2024-03-02 20:44 UTC (permalink / raw)
  To: LiuYe
  Cc: jason.wessel, daniel.thompson, dianders, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> This e-mail and its attachments contain confidential information from New H3C, which is
> intended only for the person or entity whose address is listed above. Any use of the
> information contained herein in any way (including, but not limited to, total or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!

Now deleted.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-01 10:59     ` Daniel Thompson
@ 2024-03-12  8:37       ` Liuye
  2024-03-12  9:57         ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-12  8:37 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

I know that you said schedule_work is not NMI save, which is the first issue. Perhaps it can be fixed using irq_work_queue. But even if irq_work_queue is used to implement it, there will still be a deadlock problem because slave cpu1 still has not released the running queue lock of master CPU0.




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-12  8:37       ` 答复: " Liuye
@ 2024-03-12  9:57         ` Daniel Thompson
  2024-03-12 10:04           ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-12  9:57 UTC (permalink / raw)
  To: Liuye
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> I know that you said schedule_work is not NMI save, which is the first
> issue. Perhaps it can be fixed using irq_work_queue. But even if
> irq_work_queue is used to implement it, there will still be a deadlock
> problem because slave cpu1 still has not released the running queue
> lock of master CPU0.

This doesn't sound right to me. Why do you think CPU1 won't release the
run queue lock?


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-12  9:57         ` Daniel Thompson
@ 2024-03-12 10:04           ` Liuye
  2024-03-12 10:24             ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-12 10:04 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

>On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> I know that you said schedule_work is not NMI save, which is the first 
>> issue. Perhaps it can be fixed using irq_work_queue. But even if 
>> irq_work_queue is used to implement it, there will still be a deadlock 
>> problem because slave cpu1 still has not released the running queue 
>> lock of master CPU0.
>
>This doesn't sound right to me. Why do you think CPU1 won't release the run queue lock?

In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-12 10:04           ` 答复: " Liuye
@ 2024-03-12 10:24             ` Daniel Thompson
  2024-03-13  1:22               ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-12 10:24 UTC (permalink / raw)
  To: Liuye
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> I know that you said schedule_work is not NMI save, which is the first
> >> issue. Perhaps it can be fixed using irq_work_queue. But even if
> >> irq_work_queue is used to implement it, there will still be a deadlock
> >> problem because slave cpu1 still has not released the running queue
> >> lock of master CPU0.
> >
> >This doesn't sound right to me. Why do you think CPU1 won't release
> >the run queue lock?
>
> In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.

That shouldn't be a problem. CPU0 will have released that lock by
the time the irq work is dispatched.


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-12 10:24             ` Daniel Thompson
@ 2024-03-13  1:22               ` Liuye
  2024-03-13 14:17                 ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-13  1:22 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

>On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> I know that you said schedule_work is not NMI save, which is the 
>> >> first issue. Perhaps it can be fixed using irq_work_queue. But even 
>> >> if irq_work_queue is used to implement it, there will still be a 
>> >> deadlock problem because slave cpu1 still has not released the 
>> >> running queue lock of master CPU0.
>> >
>> >This doesn't sound right to me. Why do you think CPU1 won't release 
>> >the run queue lock?
>>
>> In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.
>
>That shouldn't be a problem. CPU0 will have released that lock by the time the irq work is dispatched.

Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to be handled, and we are back to the previous issue.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-13  1:22               ` 答复: " Liuye
@ 2024-03-13 14:17                 ` Daniel Thompson
  2024-03-14  7:06                   ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-13 14:17 UTC (permalink / raw)
  To: Liuye
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> >> I know that you said schedule_work is not NMI save, which is the
> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But
> >> >> even if irq_work_queue is used to implement it, there will still
> >> >> be a deadlock problem because slave cpu1 still has not released
> >> >> the running queue lock of master CPU0.
> >> >
> >> >This doesn't sound right to me. Why do you think CPU1 won't
> >> >release the run queue lock?
> >>
> >> In this example, CPU1 is waiting for CPU0 to release
> >> dbg_slave_lock.
> >
> >That shouldn't be a problem. CPU0 will have released that lock by the
> >time the irq work is dispatched.
>
> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to be
> handled, and we are back to the previous issue.

Sorry but I still don't understand what problem you think can happen
here. What is wrong with calling schedule_work() from the IRQ work
handler?

Both irq_work_queue() and schedule_work() are calls to queue deferred
work. It does not matter when the work is queued (providing we are lock
safe). What matters is when the work is actually executed.

Please can you describe the problem you think exists based on when the
work is executed.


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-13 14:17                 ` Daniel Thompson
@ 2024-03-14  7:06                   ` Liuye
  2024-03-14 13:09                     ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-14  7:06 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

>On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> I know that you said schedule_work is not NMI save, which is the 
>> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But 
>> >> >> even if irq_work_queue is used to implement it, there will still 
>> >> >> be a deadlock problem because slave cpu1 still has not released 
>> >> >> the running queue lock of master CPU0.
>> >> >
>> >> >This doesn't sound right to me. Why do you think CPU1 won't 
>> >> >release the run queue lock?
>> >>
>> >> In this example, CPU1 is waiting for CPU0 to release 
>> >> dbg_slave_lock.
>> >
>> >That shouldn't be a problem. CPU0 will have released that lock by the 
>> >time the irq work is dispatched.
>>
>> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to be 
>> handled, and we are back to the previous issue.
>
>Sorry but I still don't understand what problem you think can happen here. What is wrong with calling schedule_work() from the IRQ work handler?
>
>Both irq_work_queue() and schedule_work() are calls to queue deferred work. It does not matter when the work is queued (providing we are lock safe). What matters is when the work is actually executed.
>
>Please can you describe the problem you think exists based on when the work is executed.

CPU0 enters the KDB process when processing serial port interrupts and triggers an IPI (NMI) to other CPUs. 
After entering a stable state, CPU0 is in interrupt context, while other CPUs are in NMI context. 
Before other CPUs enter NMI context, there is a chance to obtain the running queue of CPU0. 
At this time, when CPU0 is processing kgdboc_restore_input, calling schedule_work, need_more_worker here determines the chance to wake up processes on system_wq. 
This will cause CPU0 to acquire the running queue lock of this core, which is held by other CPUs. 
but other CPUs are still in NMI context and have not exited because waiting for CPU0 to release the dbg_slave_lock after schedule_work.

After thinking about it, the problem is not whether schedule_work is NMI safe, but that processes on system_wq should not be awakened immediately when schedule_work is called. 
I replaced schedule_work with schedule_delayed_work, and this solved my problem.

The new patch is as follows:

Index: drivers/tty/serial/kgdboc.c
===================================================================
--- drivers/tty/serial/kgdboc.c (revision 57862)
+++ drivers/tty/serial/kgdboc.c (working copy)
@@ -92,12 +92,12 @@
        mutex_unlock(&kgdboc_reset_mutex);
 }

-static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static DECLARE_DELAYED_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);

 static void kgdboc_restore_input(void)
 {
        if (likely(system_state == SYSTEM_RUNNING))
-               schedule_work(&kgdboc_restore_input_work);
+               schedule_delayed_work(&kgdboc_restore_input_work,2*HZ);
 }

 static int kgdboc_register_kbd(char **cptr)
@@ -128,7 +128,7 @@
                        i--;
                }
        }
-       flush_work(&kgdboc_restore_input_work);
+       flush_delayed_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
 #define kgdboc_register_kbd(x) 0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-14  7:06                   ` 答复: " Liuye
@ 2024-03-14 13:09                     ` Daniel Thompson
  2024-03-15  9:59                       ` 答复: " Liuye
  2024-03-16  2:34                       ` [PATCH v1] " liu.yec
  0 siblings, 2 replies; 48+ messages in thread
From: Daniel Thompson @ 2024-03-14 13:09 UTC (permalink / raw)
  To: Liuye
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> >> >> I know that you said schedule_work is not NMI save, which is
> >> >> >> the first issue. Perhaps it can be fixed using
> >> >> >> irq_work_queue. But even if irq_work_queue is used to
> >> >> >> implement it, there will still be a deadlock problem because
> >> >> >> slave cpu1 still has not released the running queue lock of
> >> >> >> master CPU0.
> >> >> >
> >> >> >This doesn't sound right to me. Why do you think CPU1 won't
> >> >> >release the run queue lock?
> >> >>
> >> >> In this example, CPU1 is waiting for CPU0 to release
> >> >> dbg_slave_lock.
> >> >
> >> >That shouldn't be a problem. CPU0 will have released that lock by
> >> >the time the irq work is dispatched.
> >>
> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to
> >> be handled, and we are back to the previous issue.
> >
> > Sorry but I still don't understand what problem you think can happen
> > here. What is wrong with calling schedule_work() from the IRQ work
> > handler?
> >
> > Both irq_work_queue() and schedule_work() are calls to queue deferred
> > work. It does not matter when the work is queued (providing we are
> > lock safe). What matters is when the work is actually executed.
> >
> > Please can you describe the problem you think exists based on when
> > the work is executed.
>
> CPU0 enters the KDB process when processing serial port interrupts and
> triggers an IPI (NMI) to other CPUs.  After entering a stable state,
> CPU0 is in interrupt context, while other CPUs are in NMI context.
> Before other CPUs enter NMI context, there is a chance to obtain the
> running queue of CPU0.

Focusing on the run queue locks in this analysis is a mistake. Before
the other CPUs enter NMI context there is a chance for them to obtain
*any* locks, including the timer wheel locks.


> At this time, when CPU0 is processing kgdboc_restore_input, calling
> schedule_work, need_more_worker here determines the chance to wake up
> processes on system_wq.
>
> This will cause CPU0 to acquire the running queue lock of this core,
> which is held by other CPUs.  but other CPUs are still in NMI context
> and have not exited because waiting for CPU0 to release the
> dbg_slave_lock after schedule_work.
>
> After thinking about it, the problem is not whether schedule_work is
> NMI safe, but that processes on system_wq should not be awakened
> immediately when schedule_work is called.

I disagree with this conclusion.

The problem *is* that schedue_work() is not NMI-safe.

You cannot solve an NMI safety problem by replacing schedule_work()
with another function that is also not NMI-safe. That simply changes
the locks that need to be taken to provoke a deadlock.


> I replaced schedule_work with schedule_delayed_work, and this solved
> my problem.

This may stop some specific reproduction from taking place but it
does not fix the potential deadlock.

I still believe that using irq_work is the only way to solve this
properly. Please try the following change:

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb1640054..161b25ecc5e15 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>

 #define MAX_CONFIG_LEN		40

@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)

 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);

+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }

 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-14 13:09                     ` Daniel Thompson
@ 2024-03-15  9:59                       ` Liuye
  2024-03-16  2:34                       ` [PATCH v1] " liu.yec
  1 sibling, 0 replies; 48+ messages in thread
From: Liuye @ 2024-03-15  9:59 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jason.wessel, dianders, gregkh, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

>On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
>> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> >> I know that you said schedule_work is not NMI save, which is 
>> >> >> >> the first issue. Perhaps it can be fixed using 
>> >> >> >> irq_work_queue. But even if irq_work_queue is used to 
>> >> >> >> implement it, there will still be a deadlock problem because 
>> >> >> >> slave cpu1 still has not released the running queue lock of 
>> >> >> >> master CPU0.
>> >> >> >
>> >> >> >This doesn't sound right to me. Why do you think CPU1 won't 
>> >> >> >release the run queue lock?
>> >> >>
>> >> >> In this example, CPU1 is waiting for CPU0 to release 
>> >> >> dbg_slave_lock.
>> >> >
>> >> >That shouldn't be a problem. CPU0 will have released that lock by 
>> >> >the time the irq work is dispatched.
>> >>
>> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to 
>> >> be handled, and we are back to the previous issue.
>> >
>> > Sorry but I still don't understand what problem you think can happen 
>> > here. What is wrong with calling schedule_work() from the IRQ work 
>> > handler?
>> >
>> > Both irq_work_queue() and schedule_work() are calls to queue 
>> > deferred work. It does not matter when the work is queued (providing 
>> > we are lock safe). What matters is when the work is actually executed.
>> >
>> > Please can you describe the problem you think exists based on when 
>> > the work is executed.
>>
>> CPU0 enters the KDB process when processing serial port interrupts and 
>> triggers an IPI (NMI) to other CPUs.  After entering a stable state,
>> CPU0 is in interrupt context, while other CPUs are in NMI context.
>> Before other CPUs enter NMI context, there is a chance to obtain the 
>> running queue of CPU0.
>
>Focusing on the run queue locks in this analysis is a mistake. Before the other CPUs enter NMI context there is a chance for them to obtain
>*any* locks, including the timer wheel locks.

Yes, you are right. I did not consider it comprehensively.

>> At this time, when CPU0 is processing kgdboc_restore_input, calling 
>> schedule_work, need_more_worker here determines the chance to wake up 
>> processes on system_wq.
>>
>> This will cause CPU0 to acquire the running queue lock of this core, 
>> which is held by other CPUs.  but other CPUs are still in NMI context 
>> and have not exited because waiting for CPU0 to release the 
>> dbg_slave_lock after schedule_work.
>>
>> After thinking about it, the problem is not whether schedule_work is 
>> NMI safe, but that processes on system_wq should not be awakened 
>> immediately when schedule_work is called.
>
>I disagree with this conclusion.
>
>The problem *is* that schedue_work() is not NMI-safe.
>
>You cannot solve an NMI safety problem by replacing schedule_work() with another function that is also not NMI-safe. That simply changes the locks that need to be taken to provoke a deadlock.
>
>
>> I replaced schedule_work with schedule_delayed_work, and this solved 
>> my problem.
>
>This may stop some specific reproduction from taking place but it does not fix the potential deadlock.
>
>I still believe that using irq_work is the only way to solve this properly. Please try the following change:

I tried the following patch and it also resolved the issue.
Thank you for your guidance and suggestions. I will organize this issue and resend a patch.

>diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index 7ce7bb1640054..161b25ecc5e15 100644
>--- a/drivers/tty/serial/kgdboc.c
>+++ b/drivers/tty/serial/kgdboc.c
>@@ -22,6 +22,7 @@
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/serial_core.h>
>+#include <linux/irq_work.h>
>
> #define MAX_CONFIG_LEN		40
>
>@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>
> static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
>
>+static void kgdboc_queue_restore_input_helper(struct irq_work *unused) 
>+{
>+	schedule_work(&kgdboc_restore_input_work);
>+}
>+
>+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, 
>+kgdboc_queue_restore_input_helper);
>+
> static void kgdboc_restore_input(void)
> {
> 	if (likely(system_state == SYSTEM_RUNNING))
>-		schedule_work(&kgdboc_restore_input_work);
>+		irq_work_queue(&kgdboc_restore_input_irq_work);
> }
>
> static int kgdboc_register_kbd(char **cptr) @@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
> 			i--;
> 		}
> 	}
>+	irq_work_sync(&kgdboc_restore_input_irq_work);
> 	flush_work(&kgdboc_restore_input_work);
> }
> #else /* ! CONFIG_KDB_KEYBOARD */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v1] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-14 13:09                     ` Daniel Thompson
  2024-03-15  9:59                       ` 答复: " Liuye
@ 2024-03-16  2:34                       ` liu.yec
  2024-03-20 16:28                         ` Daniel Thompson
  1 sibling, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-16  2:34 UTC (permalink / raw)
  To: daniel.thompson
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, liu.yeC

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
  attempt to use schedule_work() to provoke a keyboard reset when transitioning out
of the debugger and back to normal operation. This can cause
deadlock because schedule_work() is not NMI-safe.

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Suggested-by: daniel.thompson@linaro.org
Signed-off-by: LiuYe <liu.yeC@h3c.com>
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v1] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-16  2:34                       ` [PATCH v1] " liu.yec
@ 2024-03-20 16:28                         ` Daniel Thompson
  2024-03-21  2:26                           ` [PATCH V3] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-20 16:28 UTC (permalink / raw)
  To: liu.yec
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

^^^ This is v2 isn't it?

On Sat, Mar 16, 2024 at 10:34:43AM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
>   attempt to use schedule_work() to provoke a keyboard reset when transitioning out

Please format the description correctly.


> of the debugger and back to normal operation. This can cause
> deadlock because schedule_work() is not NMI-safe.

Wasn't there another paragraph to go here?

: The stack trace below shows an example of the problem. In this
: case the master cpu is not running from NMI but it has parked
: the slave CPUs using an NMI and the parked CPUs is holding
: spinlocks needed by schedule_work().


> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
>
> Suggested-by: daniel.thompson@linaro.org

Thanks for the credit but I think the following is probably a better way
to express it:

Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-20 16:28                         ` Daniel Thompson
@ 2024-03-21  2:26                           ` liu.yec
  2024-03-21  7:38                             ` Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-21  2:26 UTC (permalink / raw)
  To: daniel.thompson
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, liu.yec, LiuYe

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-21  2:26                           ` [PATCH V3] " liu.yec
@ 2024-03-21  7:38                             ` Greg KH
  2024-03-21  7:57                               ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-03-21  7:38 UTC (permalink / raw)
  To: liu.yec
  Cc: daniel.thompson, dianders, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

On Thu, Mar 21, 2024 at 10:26:04AM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
> 
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
> 
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().
> 
> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
> 
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> ---
>  drivers/tty/serial/kgdboc.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..161b25ecc 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/irq_work.h>
>  
>  #define MAX_CONFIG_LEN		40
>  
> @@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>  
>  static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
>  
> +static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
> +{
> +	schedule_work(&kgdboc_restore_input_work);
> +}
> +
> +static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
> +
>  static void kgdboc_restore_input(void)
>  {
>  	if (likely(system_state == SYSTEM_RUNNING))
> -		schedule_work(&kgdboc_restore_input_work);
> +		irq_work_queue(&kgdboc_restore_input_irq_work);
>  }
>  
>  static int kgdboc_register_kbd(char **cptr)
> @@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
>  			i--;
>  		}
>  	}
> +	irq_work_sync(&kgdboc_restore_input_irq_work);
>  	flush_work(&kgdboc_restore_input_work);
>  }
>  #else /* ! CONFIG_KDB_KEYBOARD */
> -- 
> 2.25.1
> 
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/process/submitting-patches.rst for what
  needs to be done here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-21  7:38                             ` Greg KH
@ 2024-03-21  7:57                               ` Liuye
  2024-03-21 11:04                                 ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-21  7:57 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.thompson, dianders, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial


> The stack trace below shows an example of the problem. In this case 
> the master cpu is not running from NMI but it has parked the slave 
> CPUs using an NMI and the parked CPUs is holding spinlocks needed by 
> schedule_work().

Add description information 

> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

Add 
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: [PATCH V3] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-21  7:57                               ` 答复: " Liuye
@ 2024-03-21 11:04                                 ` Daniel Thompson
  2024-03-21 11:50                                   ` [PATCH V4] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-21 11:04 UTC (permalink / raw)
  To: Liuye
  Cc: Greg KH, dianders, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Thu, Mar 21, 2024 at 07:57:28AM +0000, Liuye wrote:
>
> > The stack trace below shows an example of the problem. In this case
> > the master cpu is not running from NMI but it has parked the slave
> > CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> > schedule_work().
>
> Add description information
>
> > Signed-off-by: LiuYe <liu.yeC@h3c.com>
> > Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>
> Add
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

I assume this reply is summarizing what changed between the previous
version and v3?

The best way to add the changelog requested by Greg's bot is to send
a corrected v4 version. If you follow the example in
https://docs.kernel.org/process/submitting-patches.html#the-canonical-patch-format
then you'll see that what is expected.

You can use git-notes to handle this. If you add a note containing
something like the following:
~~~
Changes in v4:
* Provide these changelogs.

Changes in v3:
* <please describe changes>

Changes in v2:
* <please describe changes>
~~~

Once you have the note than git-format-patch --notes will automatically
included the changelog with the patch.


Daniel.


PS I know that v2 was circulated with an incorrect subject line but I
   think it is probably OK to call it v2 anyway... it makes the
   changelog clearer!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-21 11:04                                 ` Daniel Thompson
@ 2024-03-21 11:50                                   ` liu.yec
  2024-03-22  6:54                                     ` Jiri Slaby
  0 siblings, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-21 11:50 UTC (permalink / raw)
  To: daniel.thompson
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, liu.yec, LiuYe

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-21 11:50                                   ` [PATCH V4] " liu.yec
@ 2024-03-22  6:54                                     ` Jiri Slaby
  2024-03-22  7:50                                       ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Slaby @ 2024-03-22  6:54 UTC (permalink / raw)
  To: liu.yec, daniel.thompson
  Cc: dianders, gregkh, jason.wessel, kgdb-bugreport, linux-kernel,
	linux-serial

On 21. 03. 24, 12:50, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
> 
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
> 
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().

I am missing here an explanation (perhaps because I cannot find any docs 
for irq_work) why irq_work works in this case.

And why you need to schedule another work in the irq_work and not do the 
job directly.

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-22  6:54                                     ` Jiri Slaby
@ 2024-03-22  7:50                                       ` Liuye
  2024-03-22 15:58                                         ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-03-22  7:50 UTC (permalink / raw)
  To: Jiri Slaby, daniel.thompson
  Cc: dianders, gregkh, jason.wessel, kgdb-bugreport, linux-kernel,
	linux-serial

>On 21. 03. 24, 12:50, liu.yec@h3c.com wrote:
>> From: LiuYe <liu.yeC@h3c.com>
>> 
>> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt 
>> to use schedule_work() to provoke a keyboard reset when transitioning 
>> out of the debugger and back to normal operation.
>> This can cause deadlock because schedule_work() is not NMI-safe.
>> 
>> The stack trace below shows an example of the problem. In this case 
>> the master cpu is not running from NMI but it has parked the slave 
>> CPUs using an NMI and the parked CPUs is holding spinlocks needed by 
>> schedule_work().
>
>I am missing here an explanation (perhaps because I cannot find any docs for irq_work) why irq_work works in this case.

Just need to postpone schedule_work to the slave CPU exiting the NMI context, and there will be no deadlock problem. 
irq_work will only respond to handle schedule_work after master cpu exiting the current interrupt context. 
When the master CPU exits the interrupt context, other CPUs will naturally exit the NMI context, so there will be no deadlock.

>And why you need to schedule another work in the irq_work and not do the job directly.

In the function kgdboc_restore_input_helper , use mutex_lock for protection. The mutex lock cannot be used in interrupt context.
Guess that the process needs to run in the context of the process. Therefore, call schedule_work in irq_work. Keep the original flow unchanged.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-22  7:50                                       ` 答复: " Liuye
@ 2024-03-22 15:58                                         ` Daniel Thompson
  2024-03-23  1:41                                           ` [PATCH V5] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-03-22 15:58 UTC (permalink / raw)
  To: Liuye
  Cc: Jiri Slaby, dianders, gregkh, jason.wessel, kgdb-bugreport,
	linux-kernel, linux-serial

On Fri, Mar 22, 2024 at 07:50:54AM +0000, Liuye wrote:
> >On 21. 03. 24, 12:50, liu.yec@h3c.com wrote:
> >> From: LiuYe <liu.yeC@h3c.com>
> >>
> >> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt
> >> to use schedule_work() to provoke a keyboard reset when transitioning
> >> out of the debugger and back to normal operation.
> >> This can cause deadlock because schedule_work() is not NMI-safe.
> >>
> >> The stack trace below shows an example of the problem. In this case
> >> the master cpu is not running from NMI but it has parked the slave
> >> CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >> schedule_work().
> >
> > I am missing here an explanation (perhaps because I cannot find any
> > docs for irq_work) why irq_work works in this case.
>
> Just need to postpone schedule_work to the slave CPU exiting the NMI
> context, and there will be no deadlock problem.  irq_work will only
> respond to handle schedule_work after master cpu exiting the current
> interrupt context.  When the master CPU exits the interrupt context,
> other CPUs will naturally exit the NMI context, so there will be no
> deadlock.
>
> > And why you need to schedule another work in the irq_work and not do
> > the job directly.
>
> In the function kgdboc_restore_input_helper , use mutex_lock for
> protection.

It is the call to input_register_handler() that forces us not to
do the work from irq_work's hardirq callback.

It is true that there are mutexes in kgdboc_restore_input_helper()
but if they were the only problem we could change the locking
strategy.


> The mutex lock cannot be used in interrupt context.  Guess
> that the process needs to run in the context of the process.
> Therefore, call schedule_work in irq_work. Keep the original flow
> unchanged.

You should answer these questions by posting a v5 with the explanation
in the patch description (otherwise the explanation of how the fix works
doesn't end up in the changelog).


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-22 15:58                                         ` Daniel Thompson
@ 2024-03-23  1:41                                           ` liu.yec
  2024-03-25 16:54                                             ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-23  1:41 UTC (permalink / raw)
  To: daniel.thompson, jirislaby
  Cc: dianders, gregkh, jason.wessel, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yec, LiuYe

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Just need to postpone schedule_work to the slave CPU exiting the NMI context.

irq_work will only respond to handle schedule_work after exiting the current interrupt context.

When the master CPU exits the interrupt context, other CPUs will naturally exit the NMI context, so there will be no deadlock.

It is the call to input_register_handler() that forces us not to do the work from irq_work's hardirq callback.

Therefore schedule another work in the irq_work and not do the job directly.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-23  1:41                                           ` [PATCH V5] " liu.yec
@ 2024-03-25 16:54                                             ` Daniel Thompson
  2024-03-26  0:47                                               ` 答复: " Liuye
  2024-03-26  7:40                                               ` [PATCH V6] " liu.yec
  0 siblings, 2 replies; 48+ messages in thread
From: Daniel Thompson @ 2024-03-25 16:54 UTC (permalink / raw)
  To: liu.yec
  Cc: jirislaby, dianders, gregkh, jason.wessel, kgdb-bugreport,
	linux-kernel, linux-serial

On Sat, Mar 23, 2024 at 09:41:41AM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().
>
> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
>
> Just need to postpone schedule_work to the slave CPU exiting the NMI context.
>
> irq_work will only respond to handle schedule_work after exiting the current interrupt context.
>
> When the master CPU exits the interrupt context, other CPUs will naturally exit the NMI context, so there will be no deadlock.
>
> It is the call to input_register_handler() that forces us not to do the work from irq_work's hardirq callback.
>
> Therefore schedule another work in the irq_work and not do the job directly.

This looks like it was copy and pasted from the e-mail thread without
any editing to make it make any sense. It not even formatted correctly
(where are the line breaks?).

How about:

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. irq_work is an NMI-safe deferred work
framework that performs the requested work from a hardirq context
(usually an IPI but it can be timer interrupt on some
architectures).

Note that we still need to  a workqueue since we cannot resync
the keyboard state from the hardirq context provided by irq_work.
That must be done from task context for the calls into the input
subystem. Hence we must defer the work twice. First to safely
switch from the debug trap (NMI-like context) to hardirq and
then, secondly, to get from hardirq to the system workqueue.


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-25 16:54                                             ` Daniel Thompson
@ 2024-03-26  0:47                                               ` Liuye
  2024-03-26  7:40                                               ` [PATCH V6] " liu.yec
  1 sibling, 0 replies; 48+ messages in thread
From: Liuye @ 2024-03-26  0:47 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: jirislaby, dianders, gregkh, jason.wessel, kgdb-bugreport,
	linux-kernel, linux-serial

>> Just need to postpone schedule_work to the slave CPU exiting the NMI context.
>>
>> irq_work will only respond to handle schedule_work after exiting the current interrupt context.
>>
>> When the master CPU exits the interrupt context, other CPUs will naturally exit the NMI context, so there will be no deadlock.
>>
>> It is the call to input_register_handler() that forces us not to do the work from irq_work's hardirq callback.
>>
>> Therefore schedule another work in the irq_work and not do the job directly.
>
>This looks like it was copy and pasted from the e-mail thread without any editing to make it make any sense. It not even formatted correctly (where are the line breaks?).
>
>How about:
>
>We fix the problem by using irq_work to call schedule_work() instead of calling it directly. irq_work is an NMI-safe deferred work framework that performs the requested work from a hardirq context (usually an IPI but it can be timer interrupt on some architectures).
>
>Note that we still need to  a workqueue since we cannot resync the keyboard state from the hardirq context provided by irq_work.
>That must be done from task context for the calls into the input subystem. Hence we must defer the work twice. First to safely switch from the debug trap (NMI-like context) to hardirq and then, secondly, to get from hardirq to the system workqueue.

I apologize for my poor writing skills, your answer is more professional and accurate. I will replace this part with your description in the V6.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V6] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-25 16:54                                             ` Daniel Thompson
  2024-03-26  0:47                                               ` 答复: " Liuye
@ 2024-03-26  7:40                                               ` liu.yec
  2024-03-26  8:22                                                 ` Greg KH
  1 sibling, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-26  7:40 UTC (permalink / raw)
  To: daniel.thompson, jirislaby
  Cc: dianders, gregkh, jason.wessel, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yec, LiuYe

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. irq_work is an NMI-safe deferred work
framework that performs the requested work from a hardirq context
(usually an IPI but it can be timer interrupt on some
architectures).

Note that we still need to  a workqueue since we cannot resync
the keyboard state from the hardirq context provided by irq_work.
That must be done from task context for the calls into the input
subystem. Hence we must defer the work twice. First to safely
switch from the debug trap (NMI-like context) to hardirq and
then, secondly, to get from hardirq to the system workqueue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V6] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-26  7:40                                               ` [PATCH V6] " liu.yec
@ 2024-03-26  8:22                                                 ` Greg KH
  2024-03-26  8:54                                                   ` [PATCH V7] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-03-26  8:22 UTC (permalink / raw)
  To: liu.yec
  Cc: daniel.thompson, jirislaby, dianders, jason.wessel,
	kgdb-bugreport, linux-kernel, linux-serial

On Tue, Mar 26, 2024 at 03:40:14PM +0800, liu.yec@h3c.com wrote:
> Note that we still need to  a workqueue since we cannot resync
> the keyboard state from the hardirq context provided by irq_work.

I think you are missing a word between "to" and "a", right?

> That must be done from task context for the calls into the input
> subystem. Hence we must defer the work twice. First to safely
> switch from the debug trap (NMI-like context) to hardirq and
> then, secondly, to get from hardirq to the system workqueue.
> 
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> 
> ---
> V5 -> V6: Replace with a more professional and accurate answer.
> V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
> V3 -> V4: Add changelogs
> V2 -> V3: Add description information
> V1 -> V2: using irq_work to solve this properly.
> ---
> ---
>  drivers/tty/serial/kgdboc.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..161b25ecc 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/irq_work.h>
>  
>  #define MAX_CONFIG_LEN		40
>  
> @@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>  
>  static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
>  
> +static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
> +{
> +	schedule_work(&kgdboc_restore_input_work);

As this is a "two stage deferment" or something like that, it should be
documented in the code exactly why this is needed and what is happening,
otherwise it looks very odd.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V7] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-26  8:22                                                 ` Greg KH
@ 2024-03-26  8:54                                                   ` liu.yec
  2024-04-02 12:58                                                     ` Daniel Thompson
  0 siblings, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-03-26  8:54 UTC (permalink / raw)
  To: gregkh, jirislaby
  Cc: dianders, jason.wessel, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yec, LiuYe, Daniel Thompson

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. irq_work is an NMI-safe deferred work
framework that performs the requested work from a hardirq context
(usually an IPI but it can be timer interrupt on some
architectures).

Note that we still need to a workqueue since we cannot resync
the keyboard state from the hardirq context provided by irq_work.
That must be done from task context for the calls into the input
subystem. Hence we must defer the work twice. First to safely
switch from the debug trap (NMI-like context) to hardirq and
then, secondly, to get from hardirq to the system workqueue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V6 -> V7: Add comments in the code.
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..750ed66d2 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -98,11 +99,29 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 }
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+/*
+ * We fix the problem by using irq_work to call schedule_work()
+ * instead of calling it directly. irq_work is an NMI-safe deferred work
+ * framework that performs the requested work from a hardirq context
+ * (usually an IPI but it can be timer interrupt on some
+ * architectures). Note that we still need to a workqueue since we cannot resync
+ * the keyboard state from the hardirq context provided by irq_work.
+ * That must be done from task context for the calls into the input
+ * subystem. Hence we must defer the work twice. First to safely
+ * switch from the debug trap (NMI-like context) to hardirq and
+ * then, secondly, to get from hardirq to the system workqueue.
+ */
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
 
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +152,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V7] kdb: Fix the deadlock issue in KDB debugging.
  2024-03-26  8:54                                                   ` [PATCH V7] " liu.yec
@ 2024-04-02 12:58                                                     ` Daniel Thompson
  2024-04-03  6:11                                                       ` [PATCH V8] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Thompson @ 2024-04-02 12:58 UTC (permalink / raw)
  To: liu.yec
  Cc: gregkh, jirislaby, dianders, jason.wessel, kgdb-bugreport,
	linux-kernel, linux-serial

On Tue, Mar 26, 2024 at 04:54:07PM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().
>
> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
>
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. irq_work is an NMI-safe deferred work
> framework that performs the requested work from a hardirq context
> (usually an IPI but it can be timer interrupt on some
> architectures).
>
> Note that we still need to a workqueue since we cannot resync
> the keyboard state from the hardirq context provided by irq_work.
> That must be done from task context for the calls into the input
> subystem. Hence we must defer the work twice. First to safely
> switch from the debug trap (NMI-like context) to hardirq and
> then, secondly, to get from hardirq to the system workqueue.
>
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>
> ---
> V6 -> V7: Add comments in the code.
> V5 -> V6: Replace with a more professional and accurate answer.
> V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
> V3 -> V4: Add changelogs
> V2 -> V3: Add description information
> V1 -> V2: using irq_work to solve this properly.
> ---
> ---
>  drivers/tty/serial/kgdboc.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..750ed66d2 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/irq_work.h>
>
>  #define MAX_CONFIG_LEN		40
>
> @@ -98,11 +99,29 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>  }
>
>  static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
> +/*
> + * We fix the problem by using irq_work to call schedule_work()
> + * instead of calling it directly.

What problem?

Put another way this text has been copy-pasted from
the commit message without any editing to make it make sense. Text in
the C file needs to be standalone!

More like:
--- cut here ---
This code ensures that the keyboard state, which is changed during kdb
execution, is resynchronized when we leave the debug trap. The resync
logic calls into the input subsystem to force a reset. The calls into
the input subsystem must be executed from normal task context.

We need to trigger the resync from the debug trap, which executes in an
NMI (or similar) context. To make it safe to call into the input
subsystem we end up having use two deferred execution techniques.
Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
hardirq context. Then, from the hardirq callback we use the system
workqueue to provoke the callback that actually performs the resync.
--- cut here ---

>                                     irq_work is an NMI-safe deferred work
> + * framework that performs the requested work from a hardirq context
> + * (usually an IPI but it can be timer interrupt on some
> + * architectures). Note that we still need to a workqueue since we cannot resync
> + * the keyboard state from the hardirq context provided by irq_work.
> + * That must be done from task context for the calls into the input
> + * subystem. Hence we must defer the work twice. First to safely
> + * switch from the debug trap (NMI-like context) to hardirq and
> + * then, secondly, to get from hardirq to the system workqueue.
> + */

Please find a better place to anchor the comment. It should be further
up in the file when the first bit of deferred work appears, perhaps near
kgdboc_restore_input_helper().


> +static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
> +{
> +	schedule_work(&kgdboc_restore_input_work);
> +}
> +
> +static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-02 12:58                                                     ` Daniel Thompson
@ 2024-04-03  6:11                                                       ` liu.yec
  2024-04-03 13:58                                                         ` Daniel Thompson
  2024-04-03 22:22                                                         ` Andy Shevchenko
  0 siblings, 2 replies; 48+ messages in thread
From: liu.yec @ 2024-04-03  6:11 UTC (permalink / raw)
  To: daniel.thompson
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, liu.yec, LiuYe

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. This is because we cannot
resynchronize the keyboard state from the hardirq context
provided by irq_work. This must be done from the task context
in order to call the input subsystem.

Therefore, we have to defer the work twice. First, safely
switch from the debug trap context (similar to NMI) to the
hardirq, and then switch from the hardirq to the system work queue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V7 -> V8: Update the description information and comments in the code.
	: Submit this patch based on version linux-6.9-rc2.
V6 -> V7: Add comments in the code.
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..d6ce945f0 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>
 
 #define MAX_CONFIG_LEN		40
 
@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = {
 
 static DEFINE_MUTEX(kgdboc_reset_mutex);
 
+/*
+ * This code ensures that the keyboard state, which is changed during kdb
+ * execution, is resynchronized when we leave the debug trap. The resync
+ * logic calls into the input subsystem to force a reset. The calls into
+ * the input subsystem must be executed from normal task context.
+ *
+ * We need to trigger the resync from the debug trap, which executes in an
+ * NMI (or similar) context. To make it safe to call into the input
+ * subsystem we end up having use two deferred execution techniques.
+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
+ * hardirq context. Then, from the hardirq callback we use the system
+ * workqueue to provoke the callback that actually performs the resync.
+ */
 static void kgdboc_restore_input_helper(struct work_struct *dummy)
 {
 	/*
@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-03  6:11                                                       ` [PATCH V8] " liu.yec
@ 2024-04-03 13:58                                                         ` Daniel Thompson
  2024-04-03 22:22                                                         ` Andy Shevchenko
  1 sibling, 0 replies; 48+ messages in thread
From: Daniel Thompson @ 2024-04-03 13:58 UTC (permalink / raw)
  To: liu.yec
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial

On Wed, Apr 03, 2024 at 02:11:09PM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> <snip>
>
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. This is because we cannot
> resynchronize the keyboard state from the hardirq context
> provided by irq_work. This must be done from the task context
> in order to call the input subsystem.
>
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.
>
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

I'm happy with how this is looking. In the long term it might be good to
move the keyboard resync code so it is with the rest of the kdb keyboard
code rather than in tty/serial. However I certainly don't want to tangle
that kind of clean up along with a bug fix so I think this is ready to
go now.

@Greg: I assume you want to take this via the tty/serial tree? I
contributed a fair bit to the eventual patch so a Reviewed-by from me
probably isn't appropriate but if you want to take the code it is
certainly:
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-03  6:11                                                       ` [PATCH V8] " liu.yec
  2024-04-03 13:58                                                         ` Daniel Thompson
@ 2024-04-03 22:22                                                         ` Andy Shevchenko
  2024-04-08  1:44                                                           ` LiuYe
  1 sibling, 1 reply; 48+ messages in thread
From: Andy Shevchenko @ 2024-04-03 22:22 UTC (permalink / raw)
  To: liu.yec
  Cc: daniel.thompson, dianders, gregkh, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

Wed, Apr 03, 2024 at 02:11:09PM +0800, liu.yec@h3c.com kirjoitti:
> From: LiuYe <liu.yeC@h3c.com>
> 
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
> 
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().

> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80

Ouch.
Please, read this
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages
and modify the commit message accordingly.

> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. This is because we cannot
> resynchronize the keyboard state from the hardirq context
> provided by irq_work. This must be done from the task context
> in order to call the input subsystem.
> 
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.

> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>

Correct tag is Co-developed-by, btw it's written in the same document the link
to which I provided a few lines above.

...

> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/irq_work.h>

Please, keep it ordered (with visible context this should go at least before
module.h).

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-03 22:22                                                         ` Andy Shevchenko
@ 2024-04-08  1:44                                                           ` LiuYe
  2024-04-08 10:29                                                             ` Andy Shevchenko
  0 siblings, 1 reply; 48+ messages in thread
From: LiuYe @ 2024-04-08  1:44 UTC (permalink / raw)
  To: andy.shevchenko
  Cc: daniel.thompson, dianders, gregkh, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial, liu.yec, LiuYe

>Wed, Apr 03, 2024 at 02:11:09PM +0800, liu.yec@h3c.com kirjoitti:
>> From: LiuYe <liu.yeC@h3c.com>
>> 
>> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
>> attempt to use schedule_work() to provoke a keyboard reset when
>> transitioning out of the debugger and back to normal operation.
>> This can cause deadlock because schedule_work() is not NMI-safe.
>> 
>> The stack trace below shows an example of the problem. In this
>> case the master cpu is not running from NMI but it has parked
>> the slave CPUs using an NMI and the parked CPUs is holding
>> spinlocks needed by schedule_work().
>
>> example:
>>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
>> Call Trace:
>>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>>   Call Trace:
>>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>>   [<ffffffff81017219>] do_int3+0x89/0x120
>>   [<ffffffff81480fb4>] int3+0x44/0x80
>
>Ouch.
>Please, read this
>https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages
>and modify the commit message accordingly.

The example is the printout of the kernel lockup detection mechanism, which may be easier to understand. 
If organized according to the format provided in the previous link, should it be arranged as follows?

Example:
BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
CPU1: Call Trace:
__schedule
schedule
schedule_hrtimeout_range_clock
mutex_unlock
ep_scan_ready_list
schedule_hrtimeout_range
ep_poll
wake_up_q
SyS_epoll_wait
entry_SYSCALL_64_fastpath

CPU0: Call Trace:
dump_stack
spin_dump
do_raw_spin_lock
_raw_spin_lock
try_to_wake_up
wake_up_process
insert_work
__queue_work
queue_work_on
kgdboc_post_exp_handler
kgdb_cpu_enter
kgdb_handle_exception
__kgdb_notify
kgdb_notify
notifier_call_chain
notify_die
do_int3
int3

>> We fix the problem by using irq_work to call schedule_work()
>> instead of calling it directly. This is because we cannot
>> resynchronize the keyboard state from the hardirq context
>> provided by irq_work. This must be done from the task context
>> in order to call the input subsystem.
>> 
>> Therefore, we have to defer the work twice. First, safely
>> switch from the debug trap context (similar to NMI) to the
>> hardirq, and then switch from the hardirq to the system work queue.
>
>> Signed-off-by: LiuYe <liu.yeC@h3c.com>
>> Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>
>
>Correct tag is Co-developed-by, btw it's written in the same document the link
>to which I provided a few lines above.

Yes, there will be warnings when using the ./scripts/checkpatch.pl script to check.

WARNING: Non-standard signature: Co-authored-by:
#68:
Co-authored-by: Daniel Thompson <daniel.thompson@linaro.org>

total: 0 errors, 1 warnings, 51 lines checked 

I will change it to the following:

Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>

>
>> --- a/drivers/tty/serial/kgdboc.c
>> +++ b/drivers/tty/serial/kgdboc.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/module.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/serial_core.h>
>> +#include <linux/irq_work.h>
>
>Please, keep it ordered (with visible context this should go at least before
>module.h).

I don't understand why this needs to be placed before module.h. Please explain further, thank you.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Re: [PATCH V8] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-08  1:44                                                           ` LiuYe
@ 2024-04-08 10:29                                                             ` Andy Shevchenko
  2024-04-09  2:03                                                               ` [PATCH V9] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: Andy Shevchenko @ 2024-04-08 10:29 UTC (permalink / raw)
  To: LiuYe
  Cc: daniel.thompson, dianders, gregkh, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

On Mon, Apr 8, 2024 at 4:46 AM LiuYe <liu.yeC@h3c.com> wrote:
> >Wed, Apr 03, 2024 at 02:11:09PM +0800, liu.yec@h3c.com kirjoitti:

...

> >Ouch.
> >Please, read this
> >https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages
> >and modify the commit message accordingly.
>
> The example is the printout of the kernel lockup detection mechanism, which may be easier to understand.
> If organized according to the format provided in the previous link, should it be arranged as follows?

Do you think all lines are important from this?
Do you think you haven't dropped anything useful?

If "yes" is the answer to both Qs, then go with it (but at least I see
that first seems to me as "no", some lines are not important)


> Example:
> BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
> CPU1: Call Trace:
> __schedule
> schedule
> schedule_hrtimeout_range_clock
> mutex_unlock
> ep_scan_ready_list
> schedule_hrtimeout_range
> ep_poll
> wake_up_q
> SyS_epoll_wait
> entry_SYSCALL_64_fastpath
>
> CPU0: Call Trace:
> dump_stack
> spin_dump
> do_raw_spin_lock
> _raw_spin_lock
> try_to_wake_up
> wake_up_process
> insert_work
> __queue_work
> queue_work_on
> kgdboc_post_exp_handler
> kgdb_cpu_enter
> kgdb_handle_exception
> __kgdb_notify
> kgdb_notify
> notifier_call_chain
> notify_die
> do_int3
> int3

...

> >>  #include <linux/module.h>
> >>  #include <linux/platform_device.h>
> >>  #include <linux/serial_core.h>
> >> +#include <linux/irq_work.h>
> >
> >Please, keep it ordered (with visible context this should go at least before
> >module.h).
>
> I don't understand why this needs to be placed before module.h. Please explain further, thank you.

Alphabetical order helps long-term maintenance. Yes, I know that it is
not _fully_ sorted, but don't add more mess to it.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V9] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-08 10:29                                                             ` Andy Shevchenko
@ 2024-04-09  2:03                                                               ` liu.yec
  2024-04-10  2:06                                                                 ` [PATCH V10] " liu.yec
  0 siblings, 1 reply; 48+ messages in thread
From: liu.yec @ 2024-04-09  2:03 UTC (permalink / raw)
  To: andy.shevchenko, daniel.thompson
  Cc: dianders, gregkh, jason.wessel, jirislaby, kgdb-bugreport,
	linux-kernel, linux-serial, liu.yeC

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

Example:
BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
CPU1: Call Trace:
__schedule
schedule
schedule_hrtimeout_range_clock
mutex_unlock
ep_scan_ready_list
schedule_hrtimeout_range
ep_poll
wake_up_q
SyS_epoll_wait
entry_SYSCALL_64_fastpath

CPU0: Call Trace:
dump_stack
spin_dump
do_raw_spin_lock
_raw_spin_lock
try_to_wake_up
wake_up_process
insert_work
__queue_work
queue_work_on
kgdboc_post_exp_handler
kgdb_cpu_enter
kgdb_handle_exception
__kgdb_notify
kgdb_notify
notifier_call_chain
notify_die
do_int3
int3

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. This is because we cannot
resynchronize the keyboard state from the hardirq context
provided by irq_work. This must be done from the task context
in order to call the input subsystem.

Therefore, we have to defer the work twice. First, safely
switch from the debug trap context (similar to NMI) to the
hardirq, and then switch from the hardirq to the system work queue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V8 -> V9: Modify call trace format and move irq_work.h before module.h
V7 -> V8: Update the description information and comments in the code.
	: Submit this patch based on version linux-6.9-rc2.
V6 -> V7: Add comments in the code.
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..32410fec7 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -19,6 +19,7 @@
 #include <linux/console.h>
 #include <linux/vt_kern.h>
 #include <linux/input.h>
+#include <linux/irq_work.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = {
 
 static DEFINE_MUTEX(kgdboc_reset_mutex);
 
+/*
+ * This code ensures that the keyboard state, which is changed during kdb
+ * execution, is resynchronized when we leave the debug trap. The resync
+ * logic calls into the input subsystem to force a reset. The calls into
+ * the input subsystem must be executed from normal task context.
+ *
+ * We need to trigger the resync from the debug trap, which executes in an
+ * NMI (or similar) context. To make it safe to call into the input
+ * subsystem we end up having use two deferred execution techniques.
+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
+ * hardirq context. Then, from the hardirq callback we use the system
+ * workqueue to provoke the callback that actually performs the resync.
+ */
 static void kgdboc_restore_input_helper(struct work_struct *dummy)
 {
 	/*
@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-09  2:03                                                               ` [PATCH V9] " liu.yec
@ 2024-04-10  2:06                                                                 ` liu.yec
  2024-04-10  3:59                                                                   ` Andy Shevchenko
  2024-04-10  5:30                                                                   ` Greg KH
  0 siblings, 2 replies; 48+ messages in thread
From: liu.yec @ 2024-04-10  2:06 UTC (permalink / raw)
  To: andy.shevchenko, daniel.thompson, gregkh
  Cc: dianders, jason.wessel, jirislaby, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yeC

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

Example:
BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
CPU1: Call Trace:
__schedule
schedule
schedule_hrtimeout_range_clock
mutex_unlock
ep_scan_ready_list
schedule_hrtimeout_range
ep_poll
wake_up_q
SyS_epoll_wait
entry_SYSCALL_64_fastpath

CPU0: Call Trace:
dump_stack
spin_dump
do_raw_spin_lock
_raw_spin_lock
try_to_wake_up
wake_up_process
insert_work
__queue_work
queue_work_on
kgdboc_post_exp_handler
kgdb_cpu_enter
kgdb_handle_exception
__kgdb_notify
kgdb_notify
notifier_call_chain
notify_die
do_int3
int3

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. This is because we cannot
resynchronize the keyboard state from the hardirq context
provided by irq_work. This must be done from the task context
in order to call the input subsystem.

Therefore, we have to defer the work twice. First, safely
switch from the debug trap context (similar to NMI) to the
hardirq, and then switch from the hardirq to the system work queue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>

---
V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson
V8 -> V9: Modify call trace format and move irq_work.h before module.h
V7 -> V8: Update the description information and comments in the code.
	: Submit this patch based on version linux-6.9-rc2.
V6 -> V7: Add comments in the code.
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..32410fec7 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -19,6 +19,7 @@
 #include <linux/console.h>
 #include <linux/vt_kern.h>
 #include <linux/input.h>
+#include <linux/irq_work.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = {
 
 static DEFINE_MUTEX(kgdboc_reset_mutex);
 
+/*
+ * This code ensures that the keyboard state, which is changed during kdb
+ * execution, is resynchronized when we leave the debug trap. The resync
+ * logic calls into the input subsystem to force a reset. The calls into
+ * the input subsystem must be executed from normal task context.
+ *
+ * We need to trigger the resync from the debug trap, which executes in an
+ * NMI (or similar) context. To make it safe to call into the input
+ * subsystem we end up having use two deferred execution techniques.
+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
+ * hardirq context. Then, from the hardirq callback we use the system
+ * workqueue to provoke the callback that actually performs the resync.
+ */
 static void kgdboc_restore_input_helper(struct work_struct *dummy)
 {
 	/*
@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  2:06                                                                 ` [PATCH V10] " liu.yec
@ 2024-04-10  3:59                                                                   ` Andy Shevchenko
  2024-04-10  5:30                                                                   ` Greg KH
  1 sibling, 0 replies; 48+ messages in thread
From: Andy Shevchenko @ 2024-04-10  3:59 UTC (permalink / raw)
  To: liu.yec
  Cc: daniel.thompson, gregkh, dianders, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 10, 2024 at 5:07 AM <liu.yec@h3c.com> wrote:
>
> From: LiuYe <liu.yeC@h3c.com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().
>
> Example:
> BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
> CPU1: Call Trace:
> __schedule
> schedule
> schedule_hrtimeout_range_clock
> mutex_unlock
> ep_scan_ready_list
> schedule_hrtimeout_range
> ep_poll
> wake_up_q
> SyS_epoll_wait
> entry_SYSCALL_64_fastpath
>
> CPU0: Call Trace:
> dump_stack
> spin_dump
> do_raw_spin_lock
> _raw_spin_lock
> try_to_wake_up
> wake_up_process
> insert_work
> __queue_work
> queue_work_on
> kgdboc_post_exp_handler
> kgdb_cpu_enter
> kgdb_handle_exception
> __kgdb_notify
> kgdb_notify
> notifier_call_chain
> notify_die
> do_int3
> int3
>
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. This is because we cannot
> resynchronize the keyboard state from the hardirq context
> provided by irq_work. This must be done from the task context
> in order to call the input subsystem.
>
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.

...

> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>

> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson

Huh?!

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  2:06                                                                 ` [PATCH V10] " liu.yec
  2024-04-10  3:59                                                                   ` Andy Shevchenko
@ 2024-04-10  5:30                                                                   ` Greg KH
  2024-04-10  5:54                                                                     ` 答复: " Liuye
  1 sibling, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-04-10  5:30 UTC (permalink / raw)
  To: liu.yec
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 10, 2024 at 10:06:15AM +0800, liu.yec@h3c.com wrote:
> From: LiuYe <liu.yeC@h3c.com>
> 
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
> 
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs is holding
> spinlocks needed by schedule_work().
> 
> Example:
> BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
> CPU1: Call Trace:
> __schedule
> schedule
> schedule_hrtimeout_range_clock
> mutex_unlock
> ep_scan_ready_list
> schedule_hrtimeout_range
> ep_poll
> wake_up_q
> SyS_epoll_wait
> entry_SYSCALL_64_fastpath
> 
> CPU0: Call Trace:
> dump_stack
> spin_dump
> do_raw_spin_lock
> _raw_spin_lock
> try_to_wake_up
> wake_up_process
> insert_work
> __queue_work
> queue_work_on
> kgdboc_post_exp_handler
> kgdb_cpu_enter
> kgdb_handle_exception
> __kgdb_notify
> kgdb_notify
> notifier_call_chain
> notify_die
> do_int3
> int3
> 
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. This is because we cannot
> resynchronize the keyboard state from the hardirq context
> provided by irq_work. This must be done from the task context
> in order to call the input subsystem.
> 
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.
> 
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>

I have NOT signed off on this commit.  You just said that I made a legal
statement about this commit without that actually being true???

Sorry, but that is flat out not acceptable at all.  Please go work with
your company lawyers to figure out what you did and come back with an
explaination of exactly what this is and how it will not happen again.

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  5:30                                                                   ` Greg KH
@ 2024-04-10  5:54                                                                     ` Liuye
  2024-04-10  5:59                                                                       ` Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-04-10  5:54 UTC (permalink / raw)
  To: Greg KH, andy.shevchenko
  Cc: daniel.thompson, dianders, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

>> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
>
>I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
>
>Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
>

>> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>
>> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by 
>> of Daniel Thompson
>
>Huh?!

@greg k-h :
@Andy Shevchenko :

Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.

I want to express my gratitude for the suggestions on the patch from the two of you. 

What do I need to do now? Release PATCH V11 and delete these two signatures in it ?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  5:54                                                                     ` 答复: " Liuye
@ 2024-04-10  5:59                                                                       ` Greg KH
  2024-04-10  6:10                                                                         ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-04-10  5:59 UTC (permalink / raw)
  To: Liuye
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
> >
> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
> >
> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
> >
> 
> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> >
> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by 
> >> of Daniel Thompson
> >
> >Huh?!
> 
> @greg k-h :
> @Andy Shevchenko :
> 
> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
> 
> I want to express my gratitude for the suggestions on the patch from the two of you. 
> 
> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?

As I said, go work with your corporate lawyers on this to understand
what just happened and have them let us know how it will not happen
again.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  5:59                                                                       ` Greg KH
@ 2024-04-10  6:10                                                                         ` Liuye
  2024-04-10  6:15                                                                           ` Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: Liuye @ 2024-04-10  6:10 UTC (permalink / raw)
  To: Greg KH
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

>On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
>> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
>> >
>> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
>> >
>> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
>> >
>> 
>> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>> >
>> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, 
>> >> Acked-by of Daniel Thompson
>> >
>> >Huh?!
>> 
>> @greg k-h :
>> @Andy Shevchenko :
>> 
>> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
>> 
>> I want to express my gratitude for the suggestions on the patch from the two of you. 
>> 
>> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?
>
>As I said, go work with your corporate lawyers on this to understand what just happened and have them let us know how it will not happen again.

I'm very sorry, this is my first time submitting a patch and I made a significant mistake in using "Signed-off-by". I now understand the meaning of this field and will not make the same mistake again in the future.

Thanks,
Liu Ye

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  6:10                                                                         ` 答复: " Liuye
@ 2024-04-10  6:15                                                                           ` Greg KH
  2024-04-10  6:30                                                                             ` 答复: " Liuye
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-04-10  6:15 UTC (permalink / raw)
  To: Liuye
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 10, 2024 at 06:10:10AM +0000, Liuye wrote:
> >On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
> >> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
> >> >
> >> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
> >> >
> >> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
> >> >
> >> 
> >> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> >> >
> >> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, 
> >> >> Acked-by of Daniel Thompson
> >> >
> >> >Huh?!
> >> 
> >> @greg k-h :
> >> @Andy Shevchenko :
> >> 
> >> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
> >> 
> >> I want to express my gratitude for the suggestions on the patch from the two of you. 
> >> 
> >> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?
> >
> >As I said, go work with your corporate lawyers on this to understand what just happened and have them let us know how it will not happen again.
> 
> I'm very sorry, this is my first time submitting a patch and I made a significant mistake in using "Signed-off-by". I now understand the meaning of this field and will not make the same mistake again in the future.

Understood, but you still need to go work with your corporate legal
group so that you all ensure this does not happen again for any other
developer in your company, as I am sure they will want to know about
this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  6:15                                                                           ` Greg KH
@ 2024-04-10  6:30                                                                             ` Liuye
  2024-04-10  7:18                                                                               ` [PATCH V11] " liu.yec
  2024-04-10  8:24                                                                               ` 答复: 答复: 答复: [PATCH V10] " Greg KH
  0 siblings, 2 replies; 48+ messages in thread
From: Liuye @ 2024-04-10  6:30 UTC (permalink / raw)
  To: Greg KH
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

>On Wed, Apr 10, 2024 at 06:10:10AM +0000, Liuye wrote:
>> >On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
>> >> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
>> >> >
>> >> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
>> >> >
>> >> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
>> >> >
>> >> 
>> >> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>> >> >
>> >> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, 
>> >> >> Acked-by of Daniel Thompson
>> >> >
>> >> >Huh?!
>> >> 
>> >> @greg k-h :
>> >> @Andy Shevchenko :
>> >> 
>> >> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
>> >> 
>> >> I want to express my gratitude for the suggestions on the patch from the two of you. 
>> >> 
>> >> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?
>> >
>> >As I said, go work with your corporate lawyers on this to understand what just happened and have them let us know how it will not happen again.
>> 
>> I'm very sorry, this is my first time submitting a patch and I made a significant mistake in using "Signed-off-by". I now understand the meaning of this field and will not make the same mistake again in the future.
>
>Understood, but you still need to go work with your corporate legal group so that you all ensure this does not happen again for any other developer in your company, as I am sure they will want to know about this.

Ok, I will report this to the company. At the same time, I will add an audit mechanism to the patch submission process. Thanks again for your reminder.

I will remove this part in PATCH V11.

Thanks,
Liu Ye

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V11] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  6:30                                                                             ` 答复: " Liuye
@ 2024-04-10  7:18                                                                               ` liu.yec
  2024-04-10  8:24                                                                               ` 答复: 答复: 答复: [PATCH V10] " Greg KH
  1 sibling, 0 replies; 48+ messages in thread
From: liu.yec @ 2024-04-10  7:18 UTC (permalink / raw)
  To: andy.shevchenko, daniel.thompson, gregkh
  Cc: dianders, jason.wessel, jirislaby, kgdb-bugreport, linux-kernel,
	linux-serial, liu.yeC

From: LiuYe <liu.yeC@h3c.com>

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
attempt to use schedule_work() to provoke a keyboard reset when
transitioning out of the debugger and back to normal operation.
This can cause deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slave CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().

Example:
BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
CPU1: Call Trace:
__schedule
schedule
schedule_hrtimeout_range_clock
mutex_unlock
ep_scan_ready_list
schedule_hrtimeout_range
ep_poll
wake_up_q
SyS_epoll_wait
entry_SYSCALL_64_fastpath

CPU0: Call Trace:
dump_stack
spin_dump
do_raw_spin_lock
_raw_spin_lock
try_to_wake_up
wake_up_process
insert_work
__queue_work
queue_work_on
kgdboc_post_exp_handler
kgdb_cpu_enter
kgdb_handle_exception
__kgdb_notify
kgdb_notify
notifier_call_chain
notify_die
do_int3
int3

We fix the problem by using irq_work to call schedule_work()
instead of calling it directly. This is because we cannot
resynchronize the keyboard state from the hardirq context
provided by irq_work. This must be done from the task context
in order to call the input subsystem.

Therefore, we have to defer the work twice. First, safely
switch from the debug trap context (similar to NMI) to the
hardirq, and then switch from the hardirq to the system work queue.

Signed-off-by: LiuYe <liu.yeC@h3c.com>
Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
V10 -> V11: Revert to V9
V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson
V8 -> V9: Modify call trace format and move irq_work.h before module.h
V7 -> V8: Update the description information and comments in the code.
	: Submit this patch based on version linux-6.9-rc2.
V6 -> V7: Add comments in the code.
V5 -> V6: Replace with a more professional and accurate answer.
V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: using irq_work to solve this properly.
---
---
 drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..32410fec7 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -19,6 +19,7 @@
 #include <linux/console.h>
 #include <linux/vt_kern.h>
 #include <linux/input.h>
+#include <linux/irq_work.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = {
 
 static DEFINE_MUTEX(kgdboc_reset_mutex);
 
+/*
+ * This code ensures that the keyboard state, which is changed during kdb
+ * execution, is resynchronized when we leave the debug trap. The resync
+ * logic calls into the input subsystem to force a reset. The calls into
+ * the input subsystem must be executed from normal task context.
+ *
+ * We need to trigger the resync from the debug trap, which executes in an
+ * NMI (or similar) context. To make it safe to call into the input
+ * subsystem we end up having use two deferred execution techniques.
+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
+ * hardirq context. Then, from the hardirq callback we use the system
+ * workqueue to provoke the callback that actually performs the resync.
+ */
 static void kgdboc_restore_input_helper(struct work_struct *dummy)
 {
 	/*
@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: 答复: 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  6:30                                                                             ` 答复: " Liuye
  2024-04-10  7:18                                                                               ` [PATCH V11] " liu.yec
@ 2024-04-10  8:24                                                                               ` Greg KH
  2024-04-10  8:38                                                                                 ` 答复: " Liuye
  1 sibling, 1 reply; 48+ messages in thread
From: Greg KH @ 2024-04-10  8:24 UTC (permalink / raw)
  To: Liuye
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 10, 2024 at 06:30:59AM +0000, Liuye wrote:
> >On Wed, Apr 10, 2024 at 06:10:10AM +0000, Liuye wrote:
> >> >On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
> >> >> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
> >> >> >
> >> >> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
> >> >> >
> >> >> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
> >> >> >
> >> >> 
> >> >> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> >> >> >
> >> >> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, 
> >> >> >> Acked-by of Daniel Thompson
> >> >> >
> >> >> >Huh?!
> >> >> 
> >> >> @greg k-h :
> >> >> @Andy Shevchenko :
> >> >> 
> >> >> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
> >> >> 
> >> >> I want to express my gratitude for the suggestions on the patch from the two of you. 
> >> >> 
> >> >> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?
> >> >
> >> >As I said, go work with your corporate lawyers on this to understand what just happened and have them let us know how it will not happen again.
> >> 
> >> I'm very sorry, this is my first time submitting a patch and I made a significant mistake in using "Signed-off-by". I now understand the meaning of this field and will not make the same mistake again in the future.
> >
> >Understood, but you still need to go work with your corporate legal group so that you all ensure this does not happen again for any other developer in your company, as I am sure they will want to know about this.
> 
> Ok, I will report this to the company. At the same time, I will add an audit mechanism to the patch submission process. Thanks again for your reminder.
> 
> I will remove this part in PATCH V11.

No, you will need to do this before we can accept your change.  And some
sort of verification that this is now in place properly for obvious
reasons.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* 答复: 答复: 答复: 答复: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-10  8:24                                                                               ` 答复: 答复: 答复: [PATCH V10] " Greg KH
@ 2024-04-10  8:38                                                                                 ` Liuye
  0 siblings, 0 replies; 48+ messages in thread
From: Liuye @ 2024-04-10  8:38 UTC (permalink / raw)
  To: Greg KH
  Cc: andy.shevchenko, daniel.thompson, dianders, jason.wessel,
	jirislaby, kgdb-bugreport, linux-kernel, linux-serial

>On Wed, Apr 10, 2024 at 06:30:59AM +0000, Liuye wrote:
>> >On Wed, Apr 10, 2024 at 06:10:10AM +0000, Liuye wrote:
>> >> >On Wed, Apr 10, 2024 at 05:54:08AM +0000, Liuye wrote:
>> >> >> >> Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
>> >> >> >
>> >> >> >I have NOT signed off on this commit.  You just said that I made a legal statement about this commit without that actually being true???
>> >> >> >
>> >> >> >Sorry, but that is flat out not acceptable at all.  Please go work with your company lawyers to figure out what you did and come back with an explaination of exactly what this is and how it will not happen again.
>> >> >> >
>> >> >> 
>> >> >> >> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>> >> >> >
>> >> >> >> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, 
>> >> >> >> Acked-by of Daniel Thompson
>> >> >> >
>> >> >> >Huh?!
>> >> >> 
>> >> >> @greg k-h :
>> >> >> @Andy Shevchenko :
>> >> >> 
>> >> >> Sorry, it was my mistake. I misunderstood the meaning of "signed-off-by", which led to usage issues.
>> >> >> 
>> >> >> I want to express my gratitude for the suggestions on the patch from the two of you. 
>> >> >> 
>> >> >> What do I need to do now? Release PATCH V11 and delete these two signatures in it ?
>> >> >
>> >> >As I said, go work with your corporate lawyers on this to understand what just happened and have them let us know how it will not happen again.
>> >> 
>> >> I'm very sorry, this is my first time submitting a patch and I made a significant mistake in using "Signed-off-by". I now understand the meaning of this field and will not make the same mistake again in the future.
>> >
>> >Understood, but you still need to go work with your corporate legal group so that you all ensure this does not happen again for any other developer in your company, as I am sure they will want to know about this.
>> 
>> Ok, I will report this to the company. At the same time, I will add an audit mechanism to the patch submission process. Thanks again for your reminder.
>> 
>> I will remove this part in PATCH V11.
>
>No, you will need to do this before we can accept your change.  And some sort of verification that this is now in place properly for obvious reasons.

What does "No" mean? Are you talking about giving feedback to the company to prevent this incident from happening? Or submitting PATCH V11? If it's the former, how should I give you feedback?"

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-04-10  8:39 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-28  2:56 [PATCH] kdb: Fix the deadlock issue in KDB debugging LiuYe
2024-02-28 12:05 ` Daniel Thompson
2024-03-01  3:30   ` 答复: " Liuye
2024-03-01 10:59     ` Daniel Thompson
2024-03-12  8:37       ` 答复: " Liuye
2024-03-12  9:57         ` Daniel Thompson
2024-03-12 10:04           ` 答复: " Liuye
2024-03-12 10:24             ` Daniel Thompson
2024-03-13  1:22               ` 答复: " Liuye
2024-03-13 14:17                 ` Daniel Thompson
2024-03-14  7:06                   ` 答复: " Liuye
2024-03-14 13:09                     ` Daniel Thompson
2024-03-15  9:59                       ` 答复: " Liuye
2024-03-16  2:34                       ` [PATCH v1] " liu.yec
2024-03-20 16:28                         ` Daniel Thompson
2024-03-21  2:26                           ` [PATCH V3] " liu.yec
2024-03-21  7:38                             ` Greg KH
2024-03-21  7:57                               ` 答复: " Liuye
2024-03-21 11:04                                 ` Daniel Thompson
2024-03-21 11:50                                   ` [PATCH V4] " liu.yec
2024-03-22  6:54                                     ` Jiri Slaby
2024-03-22  7:50                                       ` 答复: " Liuye
2024-03-22 15:58                                         ` Daniel Thompson
2024-03-23  1:41                                           ` [PATCH V5] " liu.yec
2024-03-25 16:54                                             ` Daniel Thompson
2024-03-26  0:47                                               ` 答复: " Liuye
2024-03-26  7:40                                               ` [PATCH V6] " liu.yec
2024-03-26  8:22                                                 ` Greg KH
2024-03-26  8:54                                                   ` [PATCH V7] " liu.yec
2024-04-02 12:58                                                     ` Daniel Thompson
2024-04-03  6:11                                                       ` [PATCH V8] " liu.yec
2024-04-03 13:58                                                         ` Daniel Thompson
2024-04-03 22:22                                                         ` Andy Shevchenko
2024-04-08  1:44                                                           ` LiuYe
2024-04-08 10:29                                                             ` Andy Shevchenko
2024-04-09  2:03                                                               ` [PATCH V9] " liu.yec
2024-04-10  2:06                                                                 ` [PATCH V10] " liu.yec
2024-04-10  3:59                                                                   ` Andy Shevchenko
2024-04-10  5:30                                                                   ` Greg KH
2024-04-10  5:54                                                                     ` 答复: " Liuye
2024-04-10  5:59                                                                       ` Greg KH
2024-04-10  6:10                                                                         ` 答复: " Liuye
2024-04-10  6:15                                                                           ` Greg KH
2024-04-10  6:30                                                                             ` 答复: " Liuye
2024-04-10  7:18                                                                               ` [PATCH V11] " liu.yec
2024-04-10  8:24                                                                               ` 答复: 答复: 答复: [PATCH V10] " Greg KH
2024-04-10  8:38                                                                                 ` 答复: " Liuye
2024-03-02 20:44 ` [PATCH] " Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).