All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] arm64: kgdb: fix single stepping
@ 2017-05-23  4:30 AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 1/4] " AKASHI Takahiro
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: AKASHI Takahiro @ 2017-05-23  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

This is the third version of my kgdb patch for fixing single stepping.
(Sorry for this lazy update.)


Kgdb support on arm64 was merged in v3.15, but from its first appearance,
"single step" has never worked well. This patchset fixes all the error
cases I found so far. Issues are reproducable, at least, on Hikey and fast
model and I tested the patches on both of them.

The original patch[1] was splitted into three pieces, patch #1 to #3,
addressing issues one by one, in order for necessary part of changes to
be applied to specific version of kernels, but it might no longer make
much sense.

Regarding to patch #4, I'm not still sure whether it is a right fix,
but it certainly fixes some corner case.

Changes for v3: (May 23, 2017)
* simplify patch #1 to respect the original code
* swap patch #2 and #3
* add patch #4

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/295632.html

AKASHI Takahiro (4):
  arm64: kgdb: fix single stepping
  arm64: kgdb: disable interrupts while a software step is enabled
  arm64: kgdb: prevent kgdb from being invoked recursively
  kgdb: select a correct cpu while in a single stepping

 arch/arm64/kernel/kgdb.c  | 62 ++++++++++++++++++++++++++++++++++++++---------
 kernel/debug/debug_core.c |  8 +++---
 2 files changed, 55 insertions(+), 15 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/4] arm64: kgdb: fix single stepping
  2017-05-23  4:30 [PATCH v3 0/4] arm64: kgdb: fix single stepping AKASHI Takahiro
@ 2017-05-23  4:30 ` AKASHI Takahiro
  2017-06-05 16:29   ` Will Deacon
  2017-05-23  4:30 ` [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled AKASHI Takahiro
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: AKASHI Takahiro @ 2017-05-23  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

After entering kgdb mode, the first 'stepi' can succeed, but the following
'stepi' never executes the next instruction.

This is because a software step cannot get enabled as the software step
bit(SS) in SPSR, which is cleared by the first single stepping, will not
be set again for the following 's' commands.
Please note that this bit, as well as the software step control bit(SS)
in MDSCR, must be set before resuming the execution.
kernel_active_single_step() called by kgdb_arch_handle_exception() checks
only for the bit in MDSCR, and so kgdb_enable_single_step() will never be
called.

This patch removes kgdb_disable_single_step() from 'c' command handling
and enables/disables a single step explicitly at every entry and exit
of 's' command handling.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
---
 arch/arm64/kernel/kgdb.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 2122cd187f19..b9176b324e5a 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -197,12 +197,6 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
 		atomic_set(&kgdb_cpu_doing_single_step, -1);
 		kgdb_single_step =  0;
 
-		/*
-		 * Received continue command, disable single step
-		 */
-		if (kernel_active_single_step())
-			kernel_disable_single_step();
-
 		err = 0;
 		break;
 	case 's':
@@ -217,7 +211,6 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
 		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
 		atomic_set(&kgdb_cpu_doing_single_step, raw_smp_processor_id());
 		kgdb_single_step =  1;
-
 		/*
 		 * Enable single step handling
 		 */
@@ -252,6 +245,8 @@ static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
 	if (!kgdb_single_step)
 		return DBG_HOOK_ERROR;
 
+	kernel_disable_single_step();
+
 	kgdb_handle_exception(1, SIGTRAP, 0, regs);
 	return 0;
 }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-05-23  4:30 [PATCH v3 0/4] arm64: kgdb: fix single stepping AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 1/4] " AKASHI Takahiro
@ 2017-05-23  4:30 ` AKASHI Takahiro
  2017-06-05 16:29   ` Will Deacon
  2017-05-23  4:30 ` [PATCH v3 3/4] arm64: kgdb: prevent kgdb from being invoked recursively AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 4/4] kgdb: select a correct cpu while in a single stepping AKASHI Takahiro
  3 siblings, 1 reply; 14+ messages in thread
From: AKASHI Takahiro @ 2017-05-23  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
somewhere in el1_irq.

This happens because a debug exception is always enabled in el1_irq
due to the following commit merged in v3.16:
  commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
			paths where possible")
A pending interrupt can be taken after kgdb has enabled a software step,
but before a debug exception is actually taken.

This patch enforces interrupts to be masked while single stepping.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
---
 arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index b9176b324e5a..fddbc6be3780 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -28,6 +28,7 @@
 
 #include <asm/debug-monitors.h>
 #include <asm/insn.h>
+#include <asm/ptrace.h>
 #include <asm/traps.h>
 
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
@@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
 	{ "fpcr", 4, -1 },
 };
 
+static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
+
 char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
 {
 	if (regno >= DBG_MAX_REG_NUM || regno < 0)
@@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
 		err = 0;
 		break;
 	case 's':
+		/* mask interrupts while single stepping */
+		__this_cpu_write(kgdb_pstate, linux_regs->pstate);
+		linux_regs->pstate |= PSR_I_BIT;
+
 		/*
 		 * Update step address value with address passed
 		 * with step packet.
@@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn);
 
 static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
 {
+	unsigned int pstate;
+
 	if (!kgdb_single_step)
 		return DBG_HOOK_ERROR;
 
 	kernel_disable_single_step();
 
+	/* restore interrupt mask status */
+	pstate = __this_cpu_read(kgdb_pstate);
+	if (pstate & PSR_I_BIT)
+		regs->pstate |= PSR_I_BIT;
+	else
+		regs->pstate &= ~PSR_I_BIT;
+
 	kgdb_handle_exception(1, SIGTRAP, 0, regs);
 	return 0;
 }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/4] arm64: kgdb: prevent kgdb from being invoked recursively
  2017-05-23  4:30 [PATCH v3 0/4] arm64: kgdb: fix single stepping AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 1/4] " AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled AKASHI Takahiro
@ 2017-05-23  4:30 ` AKASHI Takahiro
  2017-05-23  4:30 ` [PATCH v3 4/4] kgdb: select a correct cpu while in a single stepping AKASHI Takahiro
  3 siblings, 0 replies; 14+ messages in thread
From: AKASHI Takahiro @ 2017-05-23  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

If a breakpoint is set in an interrupt-sensitive place,
like gic_handle_irq(), a debug exception can be triggerred recursively.
We will see the following message:

    KGDB: re-enter error: breakpoint removed ffffffc000081258
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 650 at kernel/debug/debug_core.c:435
                                        kgdb_handle_exception+0x1dc/0x1f4()
    Modules linked in:
    CPU: 0 PID: 650 Comm: sh Not tainted 3.17.0-rc2+ #177
    Call trace:
    [<ffffffc000087fac>] dump_backtrace+0x0/0x130
    [<ffffffc0000880ec>] show_stack+0x10/0x1c
    [<ffffffc0004d683c>] dump_stack+0x74/0xb8
    [<ffffffc0000ab824>] warn_slowpath_common+0x8c/0xb4
    [<ffffffc0000ab90c>] warn_slowpath_null+0x14/0x20
    [<ffffffc000121bfc>] kgdb_handle_exception+0x1d8/0x1f4
    [<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
    [<ffffffc0000821c8>] brk_handler+0x9c/0xe8
    [<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
    Exception stack(0xffffffc07e027650 to 0xffffffc07e027770)
    ...
    [<ffffffc000083cac>] el1_dbg+0x14/0x68
    [<ffffffc00012178c>] kgdb_cpu_enter+0x464/0x5c0
    [<ffffffc000121bb4>] kgdb_handle_exception+0x190/0x1f4
    [<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
    [<ffffffc0000821c8>] brk_handler+0x9c/0xe8
    [<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
    Exception stack(0xffffffc07e027ac0 to 0xffffffc07e027be0)
    ...
    [<ffffffc000083cac>] el1_dbg+0x14/0x68
    [<ffffffc00032e4b4>] __handle_sysrq+0x11c/0x190
    [<ffffffc00032e93c>] write_sysrq_trigger+0x4c/0x60
    [<ffffffc0001e7d58>] proc_reg_write+0x54/0x84
    [<ffffffc000192fa4>] vfs_write+0x98/0x1c8
    [<ffffffc0001939b0>] SyS_write+0x40/0xa0

When some interrupt occurs, a breakpoint at gic_handle_irq() invokes kgdb.
Kgdb then calls kgdb_roundup_cpus() to sync with other cpus. Current
kgdb_roundup_cpus() unmasks interrupts temporarily to use
smp_call_function(). This eventually allows another interrupt to occur and
likely results in hitting a breakpoint at gic_handle_irq() again since
a debug exception is always enabled in el1_irq.

We can avoid this issue by specifying "nokgdbroundup" in a kernel command
line, but this will also leave other cpus be in unknown state in terms of
kgdb, and may result in interfering with kgdb.

This patch re-implements kgdb_roundup_cpus() by using irq_work so that we
won't have to enable interrupts there.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
---
 arch/arm64/kernel/kgdb.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index fddbc6be3780..58706501d460 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -20,10 +20,13 @@
  */
 
 #include <linux/bug.h>
+#include <linux/cpumask.h>
 #include <linux/irq.h>
+#include <linux/irq_work.h>
 #include <linux/kdebug.h>
 #include <linux/kgdb.h>
 #include <linux/kprobes.h>
+#include <linux/percpu.h>
 #include <linux/sched/task_stack.h>
 
 #include <asm/debug-monitors.h>
@@ -113,6 +116,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
 };
 
 static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
+static DEFINE_PER_CPU(struct irq_work, kgdb_irq_work);
 
 char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
 {
@@ -284,16 +288,33 @@ static struct step_hook kgdb_step_hook = {
 	.fn		= kgdb_step_brk_fn
 };
 
-static void kgdb_call_nmi_hook(void *ignored)
+static void kgdb_roundup_hook(struct irq_work *work)
 {
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
 void kgdb_roundup_cpus(unsigned long flags)
 {
-	local_irq_enable();
-	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
+	int cpu, this_cpu;
+	struct irq_work *work;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	/*
+	 * We assume that no cpus go up or down here, but we can't guarantee
+	 * this because get_online_cpus() may not be called in an atomic
+	 * context. It is fragile, but kgdb_handle_exception()/
+	 * kgdb_cpu_enter() doesn't deal with this, neither.
+	 */
+	this_cpu = raw_smp_processor_id();
+	for_each_online_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+
+		work = per_cpu_ptr(&kgdb_irq_work, cpu);
+		irq_work_queue_on(work, cpu);
+	}
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
@@ -334,6 +355,8 @@ static struct notifier_block kgdb_notifier = {
 int kgdb_arch_init(void)
 {
 	int ret = register_die_notifier(&kgdb_notifier);
+	int cpu;
+	struct irq_work *work;
 
 	if (ret != 0)
 		return ret;
@@ -341,6 +364,12 @@ int kgdb_arch_init(void)
 	register_break_hook(&kgdb_brkpt_hook);
 	register_break_hook(&kgdb_compiled_brkpt_hook);
 	register_step_hook(&kgdb_step_hook);
+
+	for_each_possible_cpu(cpu) {
+		work = per_cpu_ptr(&kgdb_irq_work, cpu);
+		init_irq_work(work, kgdb_roundup_hook);
+	}
+
 	return 0;
 }
 
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/4] kgdb: select a correct cpu while in a single stepping
  2017-05-23  4:30 [PATCH v3 0/4] arm64: kgdb: fix single stepping AKASHI Takahiro
                   ` (2 preceding siblings ...)
  2017-05-23  4:30 ` [PATCH v3 3/4] arm64: kgdb: prevent kgdb from being invoked recursively AKASHI Takahiro
@ 2017-05-23  4:30 ` AKASHI Takahiro
  3 siblings, 0 replies; 14+ messages in thread
From: AKASHI Takahiro @ 2017-05-23  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

When a breakpoint is set somewhere in an irq handler and it is hit
almost at the same time on multiple cpus, we may see the following
error from gdb in trying to do a single stepping:

  (gdb) si
  /home/tcwg-buildslave/workspace/tcwg-make-release/label/docker-trusty
  -amd64-tcwg-build/target/aarch64-linux-gnu/snapshots/binutils-gdb.git
  ~gdb-7.12-branch/gdb/infrun.c:5565: internal-error: int finish_step_o
  ver(execution_control_state*): Assertion `ecs->event_thread->control.
  trap_expected' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n)

This message indicates that, before and after a single stepping,
different cpus/tasks are selected as a kgdb master in kgdb_cpu_enter
(DCPU_WANT_MASTER).

This issue was seen on arm64 when a breakpoint was set, for example,
to gic_handle_irq.

How can this happen?
With the following commit, kgdb_cpu_enter() tries to "only enter on
the processor that was single stepping."

  commit 028e7b175970
  Author: Jason Wessel <jason.wessel@windriver.com>
  Date:   Fri Dec 11 08:43:17 2009 -0600

      kgdb: allow for cpu switch when single stepping

If idle task is running when a breakpoint is hit, kgdb_info[cpu].task
is set to current, pid of which is 0 (zero).
(Actually 'current' has nothing to do with this breakpoint though.)
Then if a few cpus are competing for becoming a kgdb master while in
idle state,

        if (atomic_read(&kgdb_cpu_doing_single_step) != -1 &&
            (kgdb_info[cpu].task &&
             kgdb_info[cpu].task->pid != kgdb_sstep_pid) && --sstep_tries)

this check will mistakenly pass and select a wrong cpu just because
kgdb_info[cpu].task->pid can be equal to 0 for all of the cpus at this
point.

So the issue is fixed in this patch by using 'task_struct *' instead of
process id to identify a targeted single-stepping cpu.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
---
 kernel/debug/debug_core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 65c0f1363788..4dd2a41c8309 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -135,7 +135,7 @@ struct task_struct		*kgdb_usethread;
 struct task_struct		*kgdb_contthread;
 
 int				kgdb_single_step;
-static pid_t			kgdb_sstep_pid;
+static struct task_struct	*kgdb_sstep_thread;
 
 /* to keep track of the CPU which is doing the single stepping*/
 atomic_t			kgdb_cpu_doing_single_step = ATOMIC_INIT(-1);
@@ -555,7 +555,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	 */
 	if (atomic_read(&kgdb_cpu_doing_single_step) != -1 &&
 	    (kgdb_info[cpu].task &&
-	     kgdb_info[cpu].task->pid != kgdb_sstep_pid) && --sstep_tries) {
+	     kgdb_info[cpu].task != kgdb_sstep_thread) && --sstep_tries) {
 		atomic_set(&kgdb_active, -1);
 		raw_spin_unlock(&dbg_master_lock);
 		dbg_touch_watchdogs();
@@ -658,9 +658,9 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
 		int sstep_cpu = atomic_read(&kgdb_cpu_doing_single_step);
 		if (kgdb_info[sstep_cpu].task)
-			kgdb_sstep_pid = kgdb_info[sstep_cpu].task->pid;
+			kgdb_sstep_thread = kgdb_info[sstep_cpu].task;
 		else
-			kgdb_sstep_pid = 0;
+			kgdb_sstep_thread = NULL;
 	}
 	if (arch_kgdb_ops.correct_hw_break)
 		arch_kgdb_ops.correct_hw_break();
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-05-23  4:30 ` [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled AKASHI Takahiro
@ 2017-06-05 16:29   ` Will Deacon
  2017-06-07  5:34     ` AKASHI Takahiro
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2017-06-05 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote:
> After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
> somewhere in el1_irq.
> 
> This happens because a debug exception is always enabled in el1_irq
> due to the following commit merged in v3.16:
>   commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> 			paths where possible")
> A pending interrupt can be taken after kgdb has enabled a software step,
> but before a debug exception is actually taken.
> 
> This patch enforces interrupts to be masked while single stepping.

The desired behaviour here boils down to whether or not KGDB expects to step
into or over interrupts in response a stepi instruction. What do other
architectures do? What happens if the instruction being stepped faults?

Will

> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Jason Wessel <jason.wessel@windriver.com>
> ---
>  arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index b9176b324e5a..fddbc6be3780 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -28,6 +28,7 @@
>  
>  #include <asm/debug-monitors.h>
>  #include <asm/insn.h>
> +#include <asm/ptrace.h>
>  #include <asm/traps.h>
>  
>  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> @@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
>  	{ "fpcr", 4, -1 },
>  };
>  
> +static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
> +
>  char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
>  {
>  	if (regno >= DBG_MAX_REG_NUM || regno < 0)
> @@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
>  		err = 0;
>  		break;
>  	case 's':
> +		/* mask interrupts while single stepping */
> +		__this_cpu_write(kgdb_pstate, linux_regs->pstate);
> +		linux_regs->pstate |= PSR_I_BIT;
> +
>  		/*
>  		 * Update step address value with address passed
>  		 * with step packet.
> @@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn);
>  
>  static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
>  {
> +	unsigned int pstate;
> +
>  	if (!kgdb_single_step)
>  		return DBG_HOOK_ERROR;
>  
>  	kernel_disable_single_step();
>  
> +	/* restore interrupt mask status */
> +	pstate = __this_cpu_read(kgdb_pstate);
> +	if (pstate & PSR_I_BIT)
> +		regs->pstate |= PSR_I_BIT;
> +	else
> +		regs->pstate &= ~PSR_I_BIT;
> +
>  	kgdb_handle_exception(1, SIGTRAP, 0, regs);
>  	return 0;
>  }
> -- 
> 2.11.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/4] arm64: kgdb: fix single stepping
  2017-05-23  4:30 ` [PATCH v3 1/4] " AKASHI Takahiro
@ 2017-06-05 16:29   ` Will Deacon
  2017-06-07  4:43     ` AKASHI Takahiro
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2017-06-05 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 23, 2017 at 01:30:55PM +0900, AKASHI Takahiro wrote:
> After entering kgdb mode, the first 'stepi' can succeed, but the following
> 'stepi' never executes the next instruction.
> 
> This is because a software step cannot get enabled as the software step
> bit(SS) in SPSR, which is cleared by the first single stepping, will not
> be set again for the following 's' commands.

For userspace, we have user_rewind_single_step to re-arm the state machine
on an unhandled step exception. It sounds like we need the kernel version of
that?

> Please note that this bit, as well as the software step control bit(SS)
> in MDSCR, must be set before resuming the execution.
> kernel_active_single_step() called by kgdb_arch_handle_exception() checks
> only for the bit in MDSCR, and so kgdb_enable_single_step() will never be
> called.

MDSCR.SS shouldn't get cleared by the hardware, so I don't understand your
point here.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/4] arm64: kgdb: fix single stepping
  2017-06-05 16:29   ` Will Deacon
@ 2017-06-07  4:43     ` AKASHI Takahiro
  0 siblings, 0 replies; 14+ messages in thread
From: AKASHI Takahiro @ 2017-06-07  4:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 05, 2017 at 05:29:25PM +0100, Will Deacon wrote:
> On Tue, May 23, 2017 at 01:30:55PM +0900, AKASHI Takahiro wrote:
> > After entering kgdb mode, the first 'stepi' can succeed, but the following
> > 'stepi' never executes the next instruction.
> > 
> > This is because a software step cannot get enabled as the software step
> > bit(SS) in SPSR, which is cleared by the first single stepping, will not
> > be set again for the following 's' commands.
> 
> For userspace, we have user_rewind_single_step to re-arm the state machine
> on an unhandled step exception. It sounds like we need the kernel version of
> that?

Bingo. All what we needed here is:
---8<---
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 2122cd187f19..a04c4242c3f8 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -253,6 +253,10 @@ static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
 		return DBG_HOOK_ERROR;
 
 	kgdb_handle_exception(1, SIGTRAP, 0, regs);
+
+	/* rewind a single step */
+	regs->pstate |= DBG_SPSR_SS;
+
 	return 0;
 }
 NOKPROBE_SYMBOL(kgdb_step_brk_fn);
--->8---

> > Please note that this bit, as well as the software step control bit(SS)
> > in MDSCR, must be set before resuming the execution.
> > kernel_active_single_step() called by kgdb_arch_handle_exception() checks
> > only for the bit in MDSCR, and so kgdb_enable_single_step() will never be
> > called.
> 
> MDSCR.SS shouldn't get cleared by the hardware, so I don't understand your
> point here.

I think I saw some description in ARM ARM, but don't find out any now.
Maybe I confused SPSR.SS with MDSCR.

Thanks,
-Takahiro AKASHI


> Will

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-05 16:29   ` Will Deacon
@ 2017-06-07  5:34     ` AKASHI Takahiro
  2017-06-07 16:50       ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: AKASHI Takahiro @ 2017-06-07  5:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote:
> On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote:
> > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
> > somewhere in el1_irq.
> > 
> > This happens because a debug exception is always enabled in el1_irq
> > due to the following commit merged in v3.16:
> >   commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> > 			paths where possible")
> > A pending interrupt can be taken after kgdb has enabled a software step,
> > but before a debug exception is actually taken.
> > 
> > This patch enforces interrupts to be masked while single stepping.
> 
> The desired behaviour here boils down to whether or not KGDB expects to step
> into or over interrupts in response a stepi instruction. What do other
> architectures do?

I don't know x86 case, but if we step into interrupt code here,
doing stepi on a normal function will be almost useless as every
attempt of stepi will end up falling into irq code (mostly for timer
interrupt).

> What happens if the instruction being stepped faults?

Well, as a matter of fact, we get a gdb control somewhere in exception code
(actually just after 'enable_dbg' in el1_sync).
But this is a synchronous event, and 'c' command will put us back where we
were before stepi.
I hope this will be a more desirable behavior.

(The only drawback here is we can't see a correct/complete backtrace of stack
because the current gdb is not kernel-aware.)

Thanks,
-Takahiro AKASHI

> Will
> 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Jason Wessel <jason.wessel@windriver.com>
> > ---
> >  arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> > index b9176b324e5a..fddbc6be3780 100644
> > --- a/arch/arm64/kernel/kgdb.c
> > +++ b/arch/arm64/kernel/kgdb.c
> > @@ -28,6 +28,7 @@
> >  
> >  #include <asm/debug-monitors.h>
> >  #include <asm/insn.h>
> > +#include <asm/ptrace.h>
> >  #include <asm/traps.h>
> >  
> >  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> > @@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> >  	{ "fpcr", 4, -1 },
> >  };
> >  
> > +static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
> > +
> >  char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
> >  {
> >  	if (regno >= DBG_MAX_REG_NUM || regno < 0)
> > @@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
> >  		err = 0;
> >  		break;
> >  	case 's':
> > +		/* mask interrupts while single stepping */
> > +		__this_cpu_write(kgdb_pstate, linux_regs->pstate);
> > +		linux_regs->pstate |= PSR_I_BIT;
> > +
> >  		/*
> >  		 * Update step address value with address passed
> >  		 * with step packet.
> > @@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn);
> >  
> >  static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
> >  {
> > +	unsigned int pstate;
> > +
> >  	if (!kgdb_single_step)
> >  		return DBG_HOOK_ERROR;
> >  
> >  	kernel_disable_single_step();
> >  
> > +	/* restore interrupt mask status */
> > +	pstate = __this_cpu_read(kgdb_pstate);
> > +	if (pstate & PSR_I_BIT)
> > +		regs->pstate |= PSR_I_BIT;
> > +	else
> > +		regs->pstate &= ~PSR_I_BIT;
> > +
> >  	kgdb_handle_exception(1, SIGTRAP, 0, regs);
> >  	return 0;
> >  }
> > -- 
> > 2.11.1
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-07  5:34     ` AKASHI Takahiro
@ 2017-06-07 16:50       ` Will Deacon
  2017-06-20  2:43         ` AKASHI Takahiro
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2017-06-07 16:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote:
> On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote:
> > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote:
> > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
> > > somewhere in el1_irq.
> > > 
> > > This happens because a debug exception is always enabled in el1_irq
> > > due to the following commit merged in v3.16:
> > >   commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> > > 			paths where possible")
> > > A pending interrupt can be taken after kgdb has enabled a software step,
> > > but before a debug exception is actually taken.
> > > 
> > > This patch enforces interrupts to be masked while single stepping.
> > 
> > The desired behaviour here boils down to whether or not KGDB expects to step
> > into or over interrupts in response a stepi instruction. What do other
> > architectures do?
> 
> I don't know x86 case, but if we step into interrupt code here,
> doing stepi on a normal function will be almost useless as every
> attempt of stepi will end up falling into irq code (mostly for timer
> interrupt).
> 
> > What happens if the instruction being stepped faults?
> 
> Well, as a matter of fact, we get a gdb control somewhere in exception code
> (actually just after 'enable_dbg' in el1_sync).

Ok, but don't we need to re-enable interrupts, otherwise we can't safely
handle the fault (which might involve blocking)?

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-07 16:50       ` Will Deacon
@ 2017-06-20  2:43         ` AKASHI Takahiro
  2017-06-20 17:07           ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: AKASHI Takahiro @ 2017-06-20  2:43 UTC (permalink / raw)
  To: linux-arm-kernel

Will,

Sorry for late response,

On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote:
> On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote:
> > On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote:
> > > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote:
> > > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
> > > > somewhere in el1_irq.
> > > > 
> > > > This happens because a debug exception is always enabled in el1_irq
> > > > due to the following commit merged in v3.16:
> > > >   commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> > > > 			paths where possible")
> > > > A pending interrupt can be taken after kgdb has enabled a software step,
> > > > but before a debug exception is actually taken.
> > > > 
> > > > This patch enforces interrupts to be masked while single stepping.
> > > 
> > > The desired behaviour here boils down to whether or not KGDB expects to step
> > > into or over interrupts in response a stepi instruction. What do other
> > > architectures do?
> > 
> > I don't know x86 case, but if we step into interrupt code here,
> > doing stepi on a normal function will be almost useless as every
> > attempt of stepi will end up falling into irq code (mostly for timer
> > interrupt).
> > 
> > > What happens if the instruction being stepped faults?
> > 
> > Well, as a matter of fact, we get a gdb control somewhere in exception code
> > (actually just after 'enable_dbg' in el1_sync).
> 
> Ok, but don't we need to re-enable interrupts, otherwise we can't safely
> handle the fault (which might involve blocking)?

I thought a lot, but have got no other way to solve the issue, which
totally makes stepi in vain.
In theory, you might be right, but in practice, people don't always expect
to step through the whole sequence of fault recovery with single stepping.
Once we do 'c(ontinue),' interrupts are enabled again and the execution
will follow as expected.

If you want to 'step over' a faulted instruction here, or to break
somewhere in a middle of exception handler, you need to manage to set
a breakpoint explicitly. But it will, I believe, be much better than
useless stepi from day-1 :)

Meanwhile, kprobes also disables interrupts while single stepping.
See setup_singlestep().

Thanks,
-Takahiro AKASHI

> Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-20  2:43         ` AKASHI Takahiro
@ 2017-06-20 17:07           ` Will Deacon
  2017-06-21  2:43             ` AKASHI Takahiro
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2017-06-20 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote:
> On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote:
> > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote:
> > > On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote:
> > > > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote:
> > > > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution
> > > > > somewhere in el1_irq.
> > > > > 
> > > > > This happens because a debug exception is always enabled in el1_irq
> > > > > due to the following commit merged in v3.16:
> > > > >   commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> > > > > 			paths where possible")
> > > > > A pending interrupt can be taken after kgdb has enabled a software step,
> > > > > but before a debug exception is actually taken.
> > > > > 
> > > > > This patch enforces interrupts to be masked while single stepping.
> > > > 
> > > > The desired behaviour here boils down to whether or not KGDB expects to step
> > > > into or over interrupts in response a stepi instruction. What do other
> > > > architectures do?
> > > 
> > > I don't know x86 case, but if we step into interrupt code here,
> > > doing stepi on a normal function will be almost useless as every
> > > attempt of stepi will end up falling into irq code (mostly for timer
> > > interrupt).
> > > 
> > > > What happens if the instruction being stepped faults?
> > > 
> > > Well, as a matter of fact, we get a gdb control somewhere in exception code
> > > (actually just after 'enable_dbg' in el1_sync).
> > 
> > Ok, but don't we need to re-enable interrupts, otherwise we can't safely
> > handle the fault (which might involve blocking)?
> 
> I thought a lot, but have got no other way to solve the issue, which
> totally makes stepi in vain.
> In theory, you might be right, but in practice, people don't always expect
> to step through the whole sequence of fault recovery with single stepping.
> Once we do 'c(ontinue),' interrupts are enabled again and the execution
> will follow as expected.

It's not the stepping guarantees I'm worried about. I'm more worried that
the fault handler panics because it's called with IRQs disabled, so the
debugger has ended up changing the behaviour of the kernel which is
absolutely not what you want!

> If you want to 'step over' a faulted instruction here, or to break
> somewhere in a middle of exception handler, you need to manage to set
> a breakpoint explicitly. But it will, I believe, be much better than
> useless stepi from day-1 :)
> 
> Meanwhile, kprobes also disables interrupts while single stepping.
> See setup_singlestep().

Sure, but I don't think those instructions can fault. Can KGDB make the same
guarantees?

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-20 17:07           ` Will Deacon
@ 2017-06-21  2:43             ` AKASHI Takahiro
  2017-06-21 10:00               ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: AKASHI Takahiro @ 2017-06-21  2:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 20, 2017 at 06:07:00PM +0100, Will Deacon wrote:
> On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote:
> > On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote:
> > > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote:
>
> > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely
> > > handle the fault (which might involve blocking)?
> > 
> > I thought a lot, but have got no other way to solve the issue, which
> > totally makes stepi in vain.
> > In theory, you might be right, but in practice, people don't always expect
> > to step through the whole sequence of fault recovery with single stepping.
> > Once we do 'c(ontinue),' interrupts are enabled again and the execution
> > will follow as expected.
> 
> It's not the stepping guarantees I'm worried about. I'm more worried that
> the fault handler panics because it's called with IRQs disabled,

I'm still wondering how it can cause panic.

> so the
> debugger has ended up changing the behaviour of the kernel which is
> absolutely not what you want!

That's what I'm saying, I guess.
As far as we are keeping on single stepping, no interrupts will be
taken but once the execution is continued, any outstanding interrupts
will be served.
Human intervention with a debugger, either software-oriented kgdb or
jtag-based ICE, will inevitably affect the system's behavior somehow.

> > If you want to 'step over' a faulted instruction here, or to break
> > somewhere in a middle of exception handler, you need to manage to set
> > a breakpoint explicitly. But it will, I believe, be much better than
> > useless stepi from day-1 :)
> > 
> > Meanwhile, kprobes also disables interrupts while single stepping.
> > See setup_singlestep().
> 
> Sure, but I don't think those instructions can fault.

Looking into arm_probe_decode_insn() & aarch64_insn_is_steppable(),
I doubt that no instruction to be single stepped by kprobes
will cause any (page) fault.

In addition, a comment in kprobe_fault_handler() says,
                /*
                 * We are here because the instruction being single
                 * stepped caused a page fault. We reset the current
                 * kprobe and the ip points back to the probe address
                 * and allow the page fault handler to continue as a
                 * normal page fault.
                 */
Kprobes seems to redo the instruction with single step disabled
if it has generated a page fault.

> Can KGDB make the same
> guarantees?

Taking a similar approach won't be impossible, but require lots of work,
in particular decoding instructions to be single stepped.
(I'm even not sure that gdb protocol allows this.)
It also raises another question, as you mentioned early, should we
step into or step over exception/interrupt handler?

Thanks,
-Takahiro AKASHI

> Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled
  2017-06-21  2:43             ` AKASHI Takahiro
@ 2017-06-21 10:00               ` Will Deacon
  0 siblings, 0 replies; 14+ messages in thread
From: Will Deacon @ 2017-06-21 10:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 21, 2017 at 11:43:05AM +0900, AKASHI Takahiro wrote:
> On Tue, Jun 20, 2017 at 06:07:00PM +0100, Will Deacon wrote:
> > On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote:
> > > On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote:
> > > > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote:
> >
> > > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely
> > > > handle the fault (which might involve blocking)?
> > > 
> > > I thought a lot, but have got no other way to solve the issue, which
> > > totally makes stepi in vain.
> > > In theory, you might be right, but in practice, people don't always expect
> > > to step through the whole sequence of fault recovery with single stepping.
> > > Once we do 'c(ontinue),' interrupts are enabled again and the execution
> > > will follow as expected.
> > 
> > It's not the stepping guarantees I'm worried about. I'm more worried that
> > the fault handler panics because it's called with IRQs disabled,
> 
> I'm still wondering how it can cause panic.

Well you're calling the page fault path with interrupts disabled. It might
try to allocate page tables without passing GFP_ATOMIC, or it might block on
I/O or reschedule.

The answer might well be "you can't step put_user and friends", but perhaps
we should try to enforce that rather than go horribly wrong if you do it.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-06-21 10:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23  4:30 [PATCH v3 0/4] arm64: kgdb: fix single stepping AKASHI Takahiro
2017-05-23  4:30 ` [PATCH v3 1/4] " AKASHI Takahiro
2017-06-05 16:29   ` Will Deacon
2017-06-07  4:43     ` AKASHI Takahiro
2017-05-23  4:30 ` [PATCH v3 2/4] arm64: kgdb: disable interrupts while a software step is enabled AKASHI Takahiro
2017-06-05 16:29   ` Will Deacon
2017-06-07  5:34     ` AKASHI Takahiro
2017-06-07 16:50       ` Will Deacon
2017-06-20  2:43         ` AKASHI Takahiro
2017-06-20 17:07           ` Will Deacon
2017-06-21  2:43             ` AKASHI Takahiro
2017-06-21 10:00               ` Will Deacon
2017-05-23  4:30 ` [PATCH v3 3/4] arm64: kgdb: prevent kgdb from being invoked recursively AKASHI Takahiro
2017-05-23  4:30 ` [PATCH v3 4/4] kgdb: select a correct cpu while in a single stepping AKASHI Takahiro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.