All of lore.kernel.org
 help / color / mirror / Atom feed
* Re:[PATCH V11] kdb: Fix the deadlock issue in KDB debugging.
@ 2024-04-17 11:01 Liuye
  2024-04-18  9:05 ` Daniel Thompson
  0 siblings, 1 reply; 2+ messages in thread
From: Liuye @ 2024-04-17 11:01 UTC (permalink / raw)
  To: Greg KH, daniel.thompson, andy.shevchenko
  Cc: dianders, jason.wessel, jirislaby, kgdb-bugreport, linux-kernel,
	linux-serial

>From: LiuYe <liu.yeC@h3c.com>
>
>Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
>attempt to use schedule_work() to provoke a keyboard reset when
>transitioning out of the debugger and back to normal operation.
>This can cause deadlock because schedule_work() is not NMI-safe.
>
>The stack trace below shows an example of the problem. In this
>case the master cpu is not running from NMI but it has parked
>the slave CPUs using an NMI and the parked CPUs is holding
>spinlocks needed by schedule_work().
>
>Example:
>BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
>CPU1: Call Trace:
>__schedule
>schedule
>schedule_hrtimeout_range_clock
>mutex_unlock
>ep_scan_ready_list
>schedule_hrtimeout_range
>ep_poll
>wake_up_q
>SyS_epoll_wait
>entry_SYSCALL_64_fastpath
>
>CPU0: Call Trace:
>dump_stack
>spin_dump
>do_raw_spin_lock
>_raw_spin_lock
>try_to_wake_up
>wake_up_process
>insert_work
>__queue_work
>queue_work_on
>kgdboc_post_exp_handler
>kgdb_cpu_enter
>kgdb_handle_exception
>__kgdb_notify
>kgdb_notify
>notifier_call_chain
>notify_die
>do_int3
>int3
>
>We fix the problem by using irq_work to call schedule_work()
>instead of calling it directly. This is because we cannot
>resynchronize the keyboard state from the hardirq context
>provided by irq_work. This must be done from the task context
>in order to call the input subsystem.
>
>Therefore, we have to defer the work twice. First, safely
>switch from the debug trap context (similar to NMI) to the
>hardirq, and then switch from the hardirq to the system work queue.
>
>Signed-off-by: LiuYe <liu.yeC@h3c.com>
>Co-developed-by: Daniel Thompson <daniel.thompson@linaro.org>
>Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>
>---
>V10 -> V11: Revert to V9
>V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson
>V8 -> V9: Modify call trace format and move irq_work.h before module.h
>V7 -> V8: Update the description information and comments in the code.
>	: Submit this patch based on version linux-6.9-rc2.
>V6 -> V7: Add comments in the code.
>V5 -> V6: Replace with a more professional and accurate answer.
>V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly.
>V3 -> V4: Add changelogs
>V2 -> V3: Add description information
>V1 -> V2: using irq_work to solve this properly.
>---
>---
> drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
>index 7ce7bb164..32410fec7 100644
>--- a/drivers/tty/serial/kgdboc.c
>+++ b/drivers/tty/serial/kgdboc.c
>@@ -19,6 +19,7 @@
> #include <linux/console.h>
> #include <linux/vt_kern.h>
> #include <linux/input.h>
>+#include <linux/irq_work.h>
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/serial_core.h>
>@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = {
> 
> static DEFINE_MUTEX(kgdboc_reset_mutex);
> 
>+/*
>+ * This code ensures that the keyboard state, which is changed during kdb
>+ * execution, is resynchronized when we leave the debug trap. The resync
>+ * logic calls into the input subsystem to force a reset. The calls into
>+ * the input subsystem must be executed from normal task context.
>+ *
>+ * We need to trigger the resync from the debug trap, which executes in an
>+ * NMI (or similar) context. To make it safe to call into the input
>+ * subsystem we end up having use two deferred execution techniques.
>+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from
>+ * hardirq context. Then, from the hardirq callback we use the system
>+ * workqueue to provoke the callback that actually performs the resync.
>+ */
> static void kgdboc_restore_input_helper(struct work_struct *dummy)
> {
> 	/*
>@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
> 
> static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
> 
>+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
>+{
>+	schedule_work(&kgdboc_restore_input_work);
>+}
>+
>+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
>+
> static void kgdboc_restore_input(void)
> {
> 	if (likely(system_state == SYSTEM_RUNNING))
>-		schedule_work(&kgdboc_restore_input_work);
>+		irq_work_queue(&kgdboc_restore_input_irq_work);
> }
> 
> static int kgdboc_register_kbd(char **cptr)
>@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void)
> 			i--;
> 		}
> 	}
>+	irq_work_sync(&kgdboc_restore_input_irq_work);
> 	flush_work(&kgdboc_restore_input_work);
> }
> #else /* ! CONFIG_KDB_KEYBOARD */
>-- 
>2.25.1

What is the current status of PATCH V11? Are there any additional modifications needed?


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Re:[PATCH V11] kdb: Fix the deadlock issue in KDB debugging.
  2024-04-17 11:01 Re:[PATCH V11] kdb: Fix the deadlock issue in KDB debugging Liuye
@ 2024-04-18  9:05 ` Daniel Thompson
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Thompson @ 2024-04-18  9:05 UTC (permalink / raw)
  To: Liuye
  Cc: Greg KH, andy.shevchenko, dianders, jason.wessel, jirislaby,
	kgdb-bugreport, linux-kernel, linux-serial

On Wed, Apr 17, 2024 at 11:01:56AM +0000, Liuye wrote:
> >---
> >V10 -> V11: Revert to V9
> >V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked
> >            by of Daniel Thompson
> >V8 -> V9: Modify call trace format and move irq_work.h before module.h
> >V7 -> V8: Update the description information and comments in the code.
> >	   : Submit this patch based on version linux-6.9-rc2.
> >V6 -> V7: Add comments in the code.
> >V5 -> V6: Replace with a more professional and accurate answer.
> >V4 -> V5: Answer why schedule another work in the irq_work and not do
> >          the job directly.
> >V3 -> V4: Add changelogs
> >V2 -> V3: Add description information
> >V1 -> V2: using irq_work to solve this properly.
> >---
>
> What is the current status of PATCH V11? Are there any additional
> modifications needed?

I understood that is blocked pending outcome of the legal matters
raised by v10...  and that this is why you were asked not to post
v11 until they had been resolved.

To be honest given that [I wrote all of the C code][1] for the most
recent version of the patch and that I'd like to see the bug fixed,
then I will probably have to give up on co-authorship. Instead I can
post my code with a new comment and patch description and credit you
with a Reported-by:. That should take the pressure off in terms of
landing this bug fix.

However, the legal issues do still need to be resolved or there is a
risk that other upstream contributions from your company will be
delayed.


Daniel.


[1]: https://lore.kernel.org/all/20240314130916.GE202685@aspen.lan/

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-04-18  9:05 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-17 11:01 Re:[PATCH V11] kdb: Fix the deadlock issue in KDB debugging Liuye
2024-04-18  9:05 ` Daniel Thompson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.