All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] arm64: hw_breakpoint: Add get_hwbkt_alignment_mask
@ 2016-09-23 15:18 Pavel Labath
  2016-09-23 15:19 ` [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses Pavel Labath
  2016-09-23 15:19 ` [PATCH 3/3] selftests: arm64: add test for inexact watchpoint address handling Pavel Labath
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Labath @ 2016-09-23 15:18 UTC (permalink / raw)
  To: linux-arm-kernel

This commit adds a utility function for computing the alignment mask of a
hardware breakpoint and fixes arch_validate_hwbkpt_settings to use that
instead.

Signed-off-by: Pavel Labath <labath@google.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 26a6bf7..14562ae 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -473,6 +473,16 @@ static int arch_build_bp_info(struct perf_event *bp)
 	return 0;
 }
 
+static unsigned int get_hwbkpt_alignment_mask(struct perf_event *bp)
+{
+	const struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+
+	if (is_compat_bp(bp))
+		return info->ctrl.len == ARM_BREAKPOINT_LEN_8 ? 0x7 : 0x3;
+	else
+		return info->ctrl.type == ARM_BREAKPOINT_EXECUTE ? 0x3 : 0x7;
+}
+
 /*
  * Validate the arch-specific HW Breakpoint register settings.
  */
@@ -496,11 +506,8 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
 	 * AArch32 tasks expect some simple alignment fixups, so emulate
 	 * that here.
 	 */
+	alignment_mask = get_hwbkpt_alignment_mask(bp);
 	if (is_compat_bp(bp)) {
-		if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
-			alignment_mask = 0x7;
-		else
-			alignment_mask = 0x3;
 		offset = info->address & alignment_mask;
 		switch (offset) {
 		case 0:
@@ -521,10 +528,6 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
 		info->address &= ~alignment_mask;
 		info->ctrl.len <<= offset;
 	} else {
-		if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
-			alignment_mask = 0x3;
-		else
-			alignment_mask = 0x7;
 		if (info->address & alignment_mask)
 			return -EINVAL;
 	}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-09-23 15:18 [PATCH 1/3] arm64: hw_breakpoint: Add get_hwbkt_alignment_mask Pavel Labath
@ 2016-09-23 15:19 ` Pavel Labath
  2016-09-26 14:06   ` Will Deacon
  2016-10-07 16:38   ` Pratyush Anand
  2016-09-23 15:19 ` [PATCH 3/3] selftests: arm64: add test for inexact watchpoint address handling Pavel Labath
  1 sibling, 2 replies; 15+ messages in thread
From: Pavel Labath @ 2016-09-23 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

Arm64 hardware does not always report a watchpoint hit address that
matches one of the watchpoints set. It can also report an address
"near" the watchpoint if a single instruction access both watched and
unwatched addresses. There is no straight-forward way, short of
disassembling the offending instruction, to map that address back to
the watchpoint.

Previously, when the hardware reported a watchpoint hit on an address
that did not match our watchpoint (this happens in case of instructions
which access large chunks of memory such as "stp") the process would
enter a loop where we would be continually resuming it (because we did
not recognise that watchpoint hit) and it would keep hitting the
watchpoint again and again. The tracing process would never get
notified of the watchpoint hit.

This commit fixes the problem by looking at the watchpoints near the
address reported by the hardware. If the address does not exactly match
one of the watchpoints we have set, it attributes the hit to the
nearest watchpoint we have.  This heuristic is a bit dodgy, but I don't
think we can do much more, given the hardware limitations.

Signed-off-by: Pavel Labath <labath@google.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++--------------
 1 file changed, 64 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 14562ae..3ce27ea 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -664,49 +664,63 @@ unlock:
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
+/*
+ * Arm64 hardware does not always report a watchpoint hit address that matches
+ * one of the watchpoints set. It can also report an address "near" the
+ * watchpoint if a single instruction access both watched and unwatched
+ * addresses. There is no straight-forward way, short of disassembling the
+ * offending instruction, to map that address back to the watchpoint. This
+ * function computes the distance of the memory access from the watchpoint as a
+ * heuristic for the likelyhood that a given access triggered the watchpoint.
+ *
+ * See Section D2.10.5 "Determining the memory location that caused a Watchpoint
+ * exception" of ARMv8 Architecture Reference Manual for details.
+ *
+ * The function returns the distance of the address from the bytes watched by
+ * the watchpoint. In case of an exact match, it returns 0.
+ */
+static u64 get_distance_from_watchpoint(unsigned long addr, int i,
+					struct arch_hw_breakpoint *info)
+{
+	u64 wp_low, wp_high;
+	int first_bit;
+
+	first_bit = ffs(info->ctrl.len);
+	if (first_bit == 0)
+		return -1;
+
+	wp_low = info->address + first_bit - 1;
+	wp_high = info->address + fls(info->ctrl.len) - 1;
+	if (addr < wp_low)
+		return wp_low - addr;
+	else if (addr > wp_high)
+		return addr - wp_high;
+	else
+		return 0;
+
+}
+
 static int watchpoint_handler(unsigned long addr, unsigned int esr,
 			      struct pt_regs *regs)
 {
-	int i, step = 0, *kernel_step, access;
-	u32 ctrl_reg;
-	u64 val, alignment_mask;
+	int i, step = 0, *kernel_step, access, closest_match = 0;
+	u64 min_dist = -1, dist;
 	struct perf_event *wp, **slots;
 	struct debug_info *debug_info;
 	struct arch_hw_breakpoint *info;
-	struct arch_hw_breakpoint_ctrl ctrl;
 
 	slots = this_cpu_ptr(wp_on_reg);
 	debug_info = &current->thread.debug;
 
+	/*
+	 * Find all watchpoints that match the reported address. If no exact
+	 * match is found. Attribute the hit to the closest watchpoint.
+	 */
+	rcu_read_lock();
 	for (i = 0; i < core_num_wrps; ++i) {
-		rcu_read_lock();
-
 		wp = slots[i];
-
 		if (wp == NULL)
-			goto unlock;
-
-		info = counter_arch_bp(wp);
-		/* AArch32 watchpoints are either 4 or 8 bytes aligned. */
-		if (is_compat_task()) {
-			if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
-				alignment_mask = 0x7;
-			else
-				alignment_mask = 0x3;
-		} else {
-			alignment_mask = 0x7;
-		}
-
-		/* Check if the watchpoint value matches. */
-		val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
-		if (val != (addr & ~alignment_mask))
-			goto unlock;
-
-		/* Possible match, check the byte address select to confirm. */
-		ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
-		decode_ctrl_reg(ctrl_reg, &ctrl);
-		if (!((1 << (addr & alignment_mask)) & ctrl.len))
-			goto unlock;
+			continue;
 
 		/*
 		 * Check that the access type matches.
@@ -715,7 +729,18 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
 		access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W :
 			 HW_BREAKPOINT_R;
 		if (!(access & hw_breakpoint_type(wp)))
-			goto unlock;
+			continue;
+
+		info = counter_arch_bp(wp);
+
+		dist = get_distance_from_watchpoint(addr, i, info);
+		if (dist < min_dist) {
+			min_dist = dist;
+			closest_match = i;
+		}
+		/* Is this an exact match? */
+		if (dist != 0)
+			continue;
 
 		info->trigger = addr;
 		perf_bp_event(wp, regs);
@@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
 		/* Do we need to handle the stepping? */
 		if (is_default_overflow_handler(wp))
 			step = 1;
-
-unlock:
-		rcu_read_unlock();
 	}
+	if (min_dist > 0 && min_dist != -1) {
+		/* No exact match found. */
+		wp = slots[closest_match];
+		info = counter_arch_bp(wp);
+		info->trigger = addr;
+		perf_bp_event(wp, regs);
+	}
+	rcu_read_unlock();
 
 	if (!step)
 		return 0;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] selftests: arm64: add test for inexact watchpoint address handling
  2016-09-23 15:18 [PATCH 1/3] arm64: hw_breakpoint: Add get_hwbkt_alignment_mask Pavel Labath
  2016-09-23 15:19 ` [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses Pavel Labath
@ 2016-09-23 15:19 ` Pavel Labath
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Labath @ 2016-09-23 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

This adds a test which triggers hardware watchpoints by memory accesses
of different sizes. Memory accesses of large blocks of memory were
confusing the kernel as in these cases, the address reported by the
hardware did not exactly match the address we were watching.

This is a follow-up to: "arm64: hw_breakpoint: Handle inexact watchpoint
addresses".

Signed-off-by: Pavel Labath <labath@google.com>
---
 tools/testing/selftests/breakpoints/Makefile       |   5 +-
 .../selftests/breakpoints/breakpoint_test-arm.c    | 217 +++++++++++++++++++++
 2 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test-arm.c

diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index 74e533f..458a31a 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -5,6 +5,9 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 ifeq ($(ARCH),x86)
 TEST_PROGS := breakpoint_test
 endif
+ifeq ($(ARCH),arm64)
+TEST_PROGS := breakpoint_test-arm
+endif
 
 TEST_PROGS += step_after_suspend_test
 
@@ -13,4 +16,4 @@ all: $(TEST_PROGS)
 include ../lib.mk
 
 clean:
-	rm -fr breakpoint_test step_after_suspend_test
+	rm -fr breakpoint_test breakpoint_test-arm step_after_suspend_test
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test-arm.c b/tools/testing/selftests/breakpoints/breakpoint_test-arm.c
new file mode 100644
index 0000000..3ed46b7
--- /dev/null
+++ b/tools/testing/selftests/breakpoints/breakpoint_test-arm.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define _GNU_SOURCE
+
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ptrace.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <elf.h>
+#include <signal.h>
+
+#include "../kselftest.h"
+
+enum Test {
+	TEST_WRITE_1, TEST_WRITE_2, TEST_WRITE_4, TEST_WRITE_8,
+	TEST_WRITE_16, TEST_WRITE_32, TEST_MAX
+};
+
+struct Data {
+	union {
+		uint8_t u8[32];
+		uint16_t u16[16];
+		uint32_t u32[8];
+		uint64_t u64[4];
+	};
+};
+volatile struct Data var __aligned(32);
+
+
+void child(enum Test test)
+{
+	if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) {
+		perror("ptrace(PTRACE_TRACEME) failed");
+		_exit(1);
+	}
+
+	if (raise(SIGSTOP) != 0) {
+		perror("raise(SIGSTOP) failed");
+		_exit(1);
+	}
+
+	switch (test) {
+	case TEST_WRITE_1:
+		var.u8[31] = 47;
+		break;
+	case TEST_WRITE_2:
+		var.u16[15] = 47;
+		break;
+	case TEST_WRITE_4:
+		var.u32[7] = 47;
+		break;
+	case TEST_WRITE_8:
+		var.u64[3] = 47;
+		break;
+	case TEST_WRITE_16:
+		__asm__ volatile ("stp x29, x30, %0" : "=m" (var.u64[2]));
+		break;
+	case TEST_WRITE_32:
+		__asm__ volatile ("stp q29, q30, %0" : "=m" (var));
+		break;
+	}
+
+	_exit(0);
+}
+
+static bool set_watchpoint(pid_t pid, const volatile void *address,
+			   size_t size)
+{
+	const unsigned int byte_mask = (1 << size) - 1;
+	const unsigned int type = 2; /* Write */
+	const unsigned int enable = 1;
+	const unsigned int control = byte_mask << 5 | type << 3 | enable;
+	struct user_hwdebug_state dreg_state;
+	struct iovec iov;
+
+	memset(&dreg_state, 0, sizeof(dreg_state));
+	dreg_state.dbg_regs[0].addr = (uintptr_t)address;
+	dreg_state.dbg_regs[0].ctrl = control;
+	iov.iov_base = &dreg_state;
+	iov.iov_len = offsetof(struct user_hwdebug_state, dbg_regs) +
+		      sizeof(dreg_state.dbg_regs[0]);
+	if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_WATCH, &iov) == 0)
+		return true;
+
+	if (errno == EIO) {
+		printf("ptrace(PTRACE_SETREGSET, NT_ARM_HW_WATCH) "
+		       "not supported on this hardware\n");
+		ksft_exit_skip();
+	}
+	perror("ptrace(PTRACE_SETREGSET, NT_ARM_HW_WATCH) failed");
+	return false;
+}
+
+
+bool run_test(enum Test test)
+{
+	int status;
+	siginfo_t siginfo;
+	pid_t pid = fork();
+	pid_t wpid;
+
+	if (pid < 0) {
+		perror("fork() failed");
+		return false;
+	}
+	if (pid == 0)
+		child(test);
+
+	wpid = waitpid(pid, &status, __WALL);
+	if (wpid != pid) {
+		perror("waitpid() failed");
+		return false;
+	}
+	if (!WIFSTOPPED(status)) {
+		printf("child did not stop\n");
+		return false;
+	}
+	if (WSTOPSIG(status) != SIGSTOP) {
+		printf("child did not stop with SIGSTOP\n");
+		return false;
+	}
+
+	if (!set_watchpoint(pid, &var.u64[3], 8))
+		return false;
+
+	if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) {
+		perror("ptrace(PTRACE_SINGLESTEP) failed");
+		return false;
+	}
+
+	alarm(3);
+	wpid = waitpid(pid, &status, __WALL);
+	if (wpid != pid) {
+		perror("waitpid() failed");
+		return false;
+	}
+	alarm(0);
+	if (WIFEXITED(status)) {
+		printf("child did not single-step\n");
+		return false;
+	}
+	if (!WIFSTOPPED(status)) {
+		printf("child did not stop\n");
+		return false;
+	}
+	if (WSTOPSIG(status) != SIGTRAP) {
+		printf("child did not stop with SIGTRAP\n");
+		return false;
+	}
+	if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &siginfo) != 0) {
+		perror("ptrace(PTRACE_GETSIGINFO)");
+		return false;
+	}
+	if (siginfo.si_code != TRAP_HWBKPT) {
+		printf("Unexpected si_code %d\n", siginfo.si_code);
+		return false;
+	}
+
+	kill(pid, SIGKILL);
+	wpid = waitpid(pid, &status, 0);
+	if (wpid != pid) {
+		perror("waitpid() failed");
+		return false;
+	}
+	return true;
+}
+
+void sigalrm(int sig)
+{
+}
+
+int main(int argc, char **argv)
+{
+	int opt;
+	bool succeeded = true;
+	enum Test test;
+	struct sigaction act;
+
+	act.sa_handler = sigalrm;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = 0;
+	sigaction(SIGALRM, &act, NULL);
+	for (test = 0; test < TEST_MAX; ++test) {
+		printf("Test %d ", test);
+		if (run_test(test)) {
+			printf("[OK]\n");
+			ksft_inc_pass_cnt();
+		} else {
+			printf("[FAILED]\n");
+			ksft_inc_fail_cnt();
+			succeeded = false;
+		}
+	}
+
+	ksft_print_cnts();
+	if (succeeded)
+		ksft_exit_pass();
+	else
+		ksft_exit_fail();
+}
+
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-09-23 15:19 ` [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses Pavel Labath
@ 2016-09-26 14:06   ` Will Deacon
  2016-10-07 16:38   ` Pratyush Anand
  1 sibling, 0 replies; 15+ messages in thread
From: Will Deacon @ 2016-09-26 14:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pavel,

On Fri, Sep 23, 2016 at 04:19:00PM +0100, Pavel Labath wrote:
> Arm64 hardware does not always report a watchpoint hit address that
> matches one of the watchpoints set. It can also report an address
> "near" the watchpoint if a single instruction access both watched and
> unwatched addresses. There is no straight-forward way, short of
> disassembling the offending instruction, to map that address back to
> the watchpoint.
> 
> Previously, when the hardware reported a watchpoint hit on an address
> that did not match our watchpoint (this happens in case of instructions
> which access large chunks of memory such as "stp") the process would
> enter a loop where we would be continually resuming it (because we did
> not recognise that watchpoint hit) and it would keep hitting the
> watchpoint again and again. The tracing process would never get
> notified of the watchpoint hit.
> 
> This commit fixes the problem by looking at the watchpoints near the
> address reported by the hardware. If the address does not exactly match
> one of the watchpoints we have set, it attributes the hit to the
> nearest watchpoint we have.  This heuristic is a bit dodgy, but I don't
> think we can do much more, given the hardware limitations.
> 
> Signed-off-by: Pavel Labath <labath@google.com>
> ---
>  arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++--------------
>  1 file changed, 64 insertions(+), 34 deletions(-)

If the first patch in the series is no longer required (as you stated in
your follow-up reply), then you can just drop it.

> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 14562ae..3ce27ea 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -664,49 +664,63 @@ unlock:
>  }
>  NOKPROBE_SYMBOL(breakpoint_handler);
>  
> +/*
> + * Arm64 hardware does not always report a watchpoint hit address that matches
> + * one of the watchpoints set. It can also report an address "near" the
> + * watchpoint if a single instruction access both watched and unwatched
> + * addresses. There is no straight-forward way, short of disassembling the
> + * offending instruction, to map that address back to the watchpoint. This
> + * function computes the distance of the memory access from the watchpoint as a
> + * heuristic for the likelyhood that a given access triggered the watchpoint.
> + *
> + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint
> + * exception" of ARMv8 Architecture Reference Manual for details.
> + *
> + * The function returns the distance of the address from the bytes watched by
> + * the watchpoint. In case of an exact match, it returns 0.
> + */
> +static u64 get_distance_from_watchpoint(unsigned long addr, int i,
> +					struct arch_hw_breakpoint *info)
> +{
> +	u64 wp_low, wp_high;
> +	int first_bit;
> +
> +	first_bit = ffs(info->ctrl.len);
> +	if (first_bit == 0)
> +		return -1;
> +
> +	wp_low = info->address + first_bit - 1;
> +	wp_high = info->address + fls(info->ctrl.len) - 1;

This would all be cleaner if you just called get_hbp_len(info->ctrl.len)
to get the size of the watchpoint. We don't do anything sophisticated
with the BAS, so you can assume everything is base + len.

> @@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
>  		/* Do we need to handle the stepping? */
>  		if (is_default_overflow_handler(wp))
>  			step = 1;
> -
> -unlock:
> -		rcu_read_unlock();
>  	}
> +	if (min_dist > 0 && min_dist != -1) {

min_dist is unsigned, so this could be:

	if (min_dist + 1 > 1)

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-09-23 15:19 ` [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses Pavel Labath
  2016-09-26 14:06   ` Will Deacon
@ 2016-10-07 16:38   ` Pratyush Anand
  2016-10-07 17:24     ` Pavel Labath
  1 sibling, 1 reply; 15+ messages in thread
From: Pratyush Anand @ 2016-10-07 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 23, 2016 at 8:49 PM, Pavel Labath
<test.tberghammer@gmail.com> wrote:
> Arm64 hardware does not always report a watchpoint hit address that
> matches one of the watchpoints set. It can also report an address
> "near" the watchpoint if a single instruction access both watched and
> unwatched addresses. There is no straight-forward way, short of
> disassembling the offending instruction, to map that address back to
> the watchpoint.
>
> Previously, when the hardware reported a watchpoint hit on an address
> that did not match our watchpoint (this happens in case of instructions
> which access large chunks of memory such as "stp") the process would
> enter a loop where we would be continually resuming it (because we did
> not recognise that watchpoint hit) and it would keep hitting the
> watchpoint again and again. The tracing process would never get
> notified of the watchpoint hit.
>
> This commit fixes the problem by looking at the watchpoints near the
> address reported by the hardware. If the address does not exactly match
> one of the watchpoints we have set, it attributes the hit to the
> nearest watchpoint we have.  This heuristic is a bit dodgy, but I don't
> think we can do much more, given the hardware limitations.

IIUC, then you see an issue when an address watched is not the base
address accessed by the instruction. For example, if an address 'a+8'
is watched and an instruction accesses instruction from  a to a +16. I
tried to reproduce the issue with mustang using your test-case in
patch3 (after couple of syntax modifcations for resolving compilation
issue with gcc). All the test case did pass with existing code in
v4.8. I noticed that, watchpoint exception is generated if any of the
sub-location accessed from a single instruction is watched, provided
watchdpoint watches either a byte, half word, word or double word
from the base.

So, either I must be missing something or the problem is not related
to all arm64 platform.

However, I did notice that it does not work if we watch an address
which is at some offset from address programmed. For example, it works
when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which
is supported by hardware).

I do have some patches to resolve that.

https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel

I will send them for review comment after some testing.

~Pratyush

>
> Signed-off-by: Pavel Labath <labath@google.com>
> ---
>  arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++--------------
>  1 file changed, 64 insertions(+), 34 deletions(-)
>
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 14562ae..3ce27ea 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -664,49 +664,63 @@ unlock:
>  }
>  NOKPROBE_SYMBOL(breakpoint_handler);
>
> +/*
> + * Arm64 hardware does not always report a watchpoint hit address that matches
> + * one of the watchpoints set. It can also report an address "near" the
> + * watchpoint if a single instruction access both watched and unwatched
> + * addresses. There is no straight-forward way, short of disassembling the
> + * offending instruction, to map that address back to the watchpoint. This
> + * function computes the distance of the memory access from the watchpoint as a
> + * heuristic for the likelyhood that a given access triggered the watchpoint.
> + *
> + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint
> + * exception" of ARMv8 Architecture Reference Manual for details.
> + *
> + * The function returns the distance of the address from the bytes watched by
> + * the watchpoint. In case of an exact match, it returns 0.
> + */
> +static u64 get_distance_from_watchpoint(unsigned long addr, int i,
> +                                       struct arch_hw_breakpoint *info)
> +{
> +       u64 wp_low, wp_high;
> +       int first_bit;
> +
> +       first_bit = ffs(info->ctrl.len);
> +       if (first_bit == 0)
> +               return -1;
> +
> +       wp_low = info->address + first_bit - 1;
> +       wp_high = info->address + fls(info->ctrl.len) - 1;
> +       if (addr < wp_low)
> +               return wp_low - addr;
> +       else if (addr > wp_high)
> +               return addr - wp_high;
> +       else
> +               return 0;
> +
> +}
> +
>  static int watchpoint_handler(unsigned long addr, unsigned int esr,
>                               struct pt_regs *regs)
>  {
> -       int i, step = 0, *kernel_step, access;
> -       u32 ctrl_reg;
> -       u64 val, alignment_mask;
> +       int i, step = 0, *kernel_step, access, closest_match = 0;
> +       u64 min_dist = -1, dist;
>         struct perf_event *wp, **slots;
>         struct debug_info *debug_info;
>         struct arch_hw_breakpoint *info;
> -       struct arch_hw_breakpoint_ctrl ctrl;
>
>         slots = this_cpu_ptr(wp_on_reg);
>         debug_info = &current->thread.debug;
>
> +       /*
> +        * Find all watchpoints that match the reported address. If no exact
> +        * match is found. Attribute the hit to the closest watchpoint.
> +        */
> +       rcu_read_lock();
>         for (i = 0; i < core_num_wrps; ++i) {
> -               rcu_read_lock();
> -
>                 wp = slots[i];
> -
>                 if (wp == NULL)
> -                       goto unlock;
> -
> -               info = counter_arch_bp(wp);
> -               /* AArch32 watchpoints are either 4 or 8 bytes aligned. */
> -               if (is_compat_task()) {
> -                       if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
> -                               alignment_mask = 0x7;
> -                       else
> -                               alignment_mask = 0x3;
> -               } else {
> -                       alignment_mask = 0x7;
> -               }
> -
> -               /* Check if the watchpoint value matches. */
> -               val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
> -               if (val != (addr & ~alignment_mask))
> -                       goto unlock;
> -
> -               /* Possible match, check the byte address select to confirm. */
> -               ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
> -               decode_ctrl_reg(ctrl_reg, &ctrl);
> -               if (!((1 << (addr & alignment_mask)) & ctrl.len))
> -                       goto unlock;
> +                       continue;
>
>                 /*
>                  * Check that the access type matches.
> @@ -715,7 +729,18 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
>                 access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W :
>                          HW_BREAKPOINT_R;
>                 if (!(access & hw_breakpoint_type(wp)))
> -                       goto unlock;
> +                       continue;
> +
> +               info = counter_arch_bp(wp);
> +
> +               dist = get_distance_from_watchpoint(addr, i, info);
> +               if (dist < min_dist) {
> +                       min_dist = dist;
> +                       closest_match = i;
> +               }
> +               /* Is this an exact match? */
> +               if (dist != 0)
> +                       continue;
>
>                 info->trigger = addr;
>                 perf_bp_event(wp, regs);
> @@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
>                 /* Do we need to handle the stepping? */
>                 if (is_default_overflow_handler(wp))
>                         step = 1;
> -
> -unlock:
> -               rcu_read_unlock();
>         }
> +       if (min_dist > 0 && min_dist != -1) {
> +               /* No exact match found. */
> +               wp = slots[closest_match];
> +               info = counter_arch_bp(wp);
> +               info->trigger = addr;
> +               perf_bp_event(wp, regs);
> +       }
> +       rcu_read_unlock();
>
>         if (!step)
>                 return 0;
> --
> 2.8.0.rc3.226.g39d4020
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-07 16:38   ` Pratyush Anand
@ 2016-10-07 17:24     ` Pavel Labath
  2016-10-08  5:10       ` Pratyush Anand
  2016-10-10 17:08       ` Pratyush Anand
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Labath @ 2016-10-07 17:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 October 2016 at 09:38, Pratyush Anand <panand@redhat.com> wrote:
>
>
> IIUC, then you see an issue when an address watched is not the base
> address accessed by the instruction. For example, if an address 'a+8'
> is watched and an instruction accesses instruction from  a to a +16. I
> tried to reproduce the issue with mustang using your test-case in
> patch3 (after couple of syntax modifcations for resolving compilation
> issue with gcc). All the test case did pass with existing code in
> v4.8. I noticed that, watchpoint exception is generated if any of the
> sub-location accessed from a single instruction is watched, provided
> watchdpoint watches either a byte, half word, word or double word
> from the base.
>
>
> So, either I must be missing something or the problem is not related
> to all arm64 platform.


Hello Pratyush,

Thank you for looking into this.

The thing is, I have observed different behavior here depending on the
exact hardware used. I don't have the exact parameters with me now,
but I can look it up next week.

The thing is that the spec is imprecise about what exact address the
hardware can report for the watchpoint hit. I presume that is
deliberate to give some leeway to implementers. The spec says the
address can be anywhere in the range from the lowest memory address
accessed by the instruction to the highest address watched by the
watchpoint, but most hardware seems to be stricter than that and
return an address that fits inside the watched range.

On chip 1, I observed the behavior where the hardware would
consistently report an address out of range of the watchpoint and we
would just spin it in a loop.

On chip 2, I observed the behavior where the hardware would report an
out-of-range address for the first two dozen (~) iterations, after
which it would "give up" and report an address that we were happy
with. I don't really have an explanation for this - I can only assume
that some external event like a reschedule to a different core caused
some internal state of the hardware to be reset and cause it to report
a different (better?) address instead. In the case where this was
happening, it had no observable effects on userspace - it did not see
the fact that we had re-executed the offending instruction a dozen
times and as far as it was concerned, the watchpoint functionality
worked perfectly. You can check whether this is happening in your case
by instrumenting the code to print the reported address whenever it
enters `watchpoint_handler`.

(I am sorry about the test errors. I was compiling the test case with
an android gcc - I'll make sure to check it with a vanilla linux gcc
also.)

 >
> However, I did notice that it does not work if we watch an address
> which is at some offset from address programmed. For example, it works
> when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which
> is supported by hardware).
>
> I do have some patches to resolve that.
>
> https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel
>
> I will send them for review comment after some testing.

I am looking forward to these patches - they were the next on my list
to look into after I got this resolved. :)

However: Are sure about 0x2 not being a valid byte mask? According to
my reading of the armv8 spec (section  D7.3.11, "DBGWCR<n>_EL1, Debug
Watchpoint Control Registers, n = 0 - 15") it should be fine.
====
The valid values for BAS are 0b0000000, or a binary number all of
whose set bits are contiguous. All other values are reserved and must
not be used by software.
====
So, 0x2 (as well as 0x6, 0xC, 0xE) should be fine as it has a
contiguous sequence of set bit(s). I haven't tried yet whether any
hardware actually handles that correctly, but I was certainly hoping
we would be able to watch more precise memory regions.

regards,
Pavel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-07 17:24     ` Pavel Labath
@ 2016-10-08  5:10       ` Pratyush Anand
  2016-10-12 13:50         ` Pavel Labath
  2016-10-10 17:08       ` Pratyush Anand
  1 sibling, 1 reply; 15+ messages in thread
From: Pratyush Anand @ 2016-10-08  5:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pavel,


On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote:
> On 7 October 2016 at 09:38, Pratyush Anand <panand@redhat.com> wrote:
>>
>>
>> IIUC, then you see an issue when an address watched is not the base
>> address accessed by the instruction. For example, if an address 'a+8'
>> is watched and an instruction accesses instruction from  a to a +16. I
>> tried to reproduce the issue with mustang using your test-case in
>> patch3 (after couple of syntax modifcations for resolving compilation
>> issue with gcc). All the test case did pass with existing code in
>> v4.8. I noticed that, watchpoint exception is generated if any of the
>> sub-location accessed from a single instruction is watched, provided
>> watchdpoint watches either a byte, half word, word or double word
>> from the base.
>>
>>
>> So, either I must be missing something or the problem is not related
>> to all arm64 platform.
>
>
> Hello Pratyush,
>
> Thank you for looking into this.
>
> The thing is, I have observed different behavior here depending on the
> exact hardware used. I don't have the exact parameters with me now,
> but I can look it up next week.
>
> The thing is that the spec is imprecise about what exact address the
> hardware can report for the watchpoint hit. I presume that is
> deliberate to give some leeway to implementers. The spec says the
> address can be anywhere in the range from the lowest memory address
> accessed by the instruction to the highest address watched by the
> watchpoint,

I think, my patches should  be able to take care of the above condition.

>  but most hardware seems to be stricter than that and
> return an address that fits inside the watched range.
>
> On chip 1, I observed the behavior where the hardware would
> consistently report an address out of range of the watchpoint and we
> would just spin it in a loop.
>
> On chip 2, I observed the behavior where the hardware would report an
> out-of-range address for the first two dozen (~) iterations, after
> which it would "give up" and report an address that we were happy
> with. I don't really have an explanation for this - I can only assume
> that some external event like a reschedule to a different core caused
> some internal state of the hardware to be reset and cause it to report
> a different (better?) address instead. In the case where this was
> happening, it had no observable effects on userspace - it did not see
> the fact that we had re-executed the offending instruction a dozen
> times and as far as it was concerned, the watchpoint functionality
> worked perfectly. You can check whether this is happening in your case
> by instrumenting the code to print the reported address whenever it
> enters `watchpoint_handler`.
>

Yes, I had done that. In my case it was always starting memory address
accessed by the instruction. e.g. if a strb writes to 0x400023, I
would get addr=0x400023 in watchpoint_handler(), if a str writes to
0x400020-0x400027, I would get addr=0x400020.

> (I am sorry about the test errors. I was compiling the test case with
> an android gcc - I'll make sure to check it with a vanilla linux gcc
> also.)
>

your test cases are helpful. I think, I would use them to build
further test cases.

>  >
>> However, I did notice that it does not work if we watch an address
>> which is at some offset from address programmed. For example, it works
>> when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which
>> is supported by hardware).
>>
>> I do have some patches to resolve that.
>>
>> https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel
>>
>> I will send them for review comment after some testing.
>
> I am looking forward to these patches - they were the next on my list
> to look into after I got this resolved. :)
>
> However: Are sure about 0x2 not being a valid byte mask? According to

It is a valid mask indeed, and you should be able to use all those
mask after my patches.

> my reading of the armv8 spec (section  D7.3.11, "DBGWCR<n>_EL1, Debug
> Watchpoint Control Registers, n = 0 - 15") it should be fine.
> ====
> The valid values for BAS are 0b0000000, or a binary number all of
> whose set bits are contiguous. All other values are reserved and must
> not be used by software.
> ====
> So, 0x2 (as well as 0x6, 0xC, 0xE) should be fine as it has a
> contiguous sequence of set bit(s). I haven't tried yet whether any
> hardware actually handles that correctly, but I was certainly hoping
> we would be able to watch more precise memory regions.

I have tested couple of them, and I think my patches should help with
using any contiguous bit mask acceptable by spec.

~Pratyush

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-07 17:24     ` Pavel Labath
  2016-10-08  5:10       ` Pratyush Anand
@ 2016-10-10 17:08       ` Pratyush Anand
  1 sibling, 0 replies; 15+ messages in thread
From: Pratyush Anand @ 2016-10-10 17:08 UTC (permalink / raw)
  To: linux-arm-kernel



On Friday 07 October 2016 10:54 PM, Pavel Labath wrote:
> However, I did notice that it does not work if we watch an address
>> which is at some offset from address programmed. For example, it works
>> when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which
>> is supported by hardware).
>>
>> I do have some patches to resolve that.
>>
>> https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel
>>
>> I will send them for review comment after some testing.

This branch has been updated with a test code as well. So far they work 
fine at my end. Planning to send them tomorrow for review.

~Pratyush

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-08  5:10       ` Pratyush Anand
@ 2016-10-12 13:50         ` Pavel Labath
  2016-10-13  9:58           ` Pratyush Anand
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Labath @ 2016-10-12 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Pratyush,

I am sorry about the delay. I have finally had a chance to try out
your changes today. Response inline.

On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote:
> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote:
>> The thing is, I have observed different behavior here depending on the
>> exact hardware used. I don't have the exact parameters with me now,
>> but I can look it up next week.
>>
>> The thing is that the spec is imprecise about what exact address the
>> hardware can report for the watchpoint hit. I presume that is
>> deliberate to give some leeway to implementers. The spec says the
>> address can be anywhere in the range from the lowest memory address
>> accessed by the instruction to the highest address watched by the
>> watchpoint,
>
> I think, my patches should  be able to take care of the above condition.
Unfortunately, the patch does not solve the problem for my hardware,
because of the leeway you give in watchpoint_handler is not big
enough. It does work however, if I change the line
> if (addr + 7 < val + lens || addr > val + lene)
to
> if (addr + 15 < val + lens || addr > val + lene)
I do not think we can assume that address reported by the hardware
will be at most 7 bytes off from the address we put the watchpoint at.
There is nothing in the spec that guarantees that, and it does not
seem to be enough for some hardware. In fact, I am not sure we can
assume 15 is enough either, but maybe it can do for now, until we can
find hardware that does not work with that (I haven't yet tried what
happens it the watchpoint is triggered by cache management
instructions, which can access much larger blocks of memory).

For reference, the hardware in question is:
> Processor : AArch64 Processor rev 0 (aarch64)
> processor : 0
> min_vddcx : 400000
> min_vddmx : 490000
> BogoMIPS : 38.00
> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
> CPU implementer : 0x51
> CPU architecture: 8
> CPU variant : 0x2
> CPU part : 0x201
> CPU revision : 0
> CPU param : 300 468 468 621 939 299 445 445 621 1077
> Hardware : Qualcomm Technologies, Inc MSM8996pro

And this is how it behaves:
The output of the test app triggering the watchpoint (I have set a
single-byte watchpoint at 555556705f)
>
> Writing to: 555556705f, size: 1
> Writing to: 555556705e, size: 2
> Writing to: 555556705c, size: 4
> Writing to: 5555567058, size: 8
> Writing to: 5555567050, size: 16
> Writing to: 5555567040, size: 32

The addresses received by the kernel:
[  251.812166] c1   3780 hw-breakpoint: watchpoint_handler: addr:
555556705f, val+lens: 555556705f, val+lene: 555556705f
[  251.820341] c1   3781 hw-breakpoint: watchpoint_handler: addr:
555556705e, val+lens: 555556705f, val+lene: 555556705f
[  251.825572] c0   3782 hw-breakpoint: watchpoint_handler: addr:
555556705c, val+lens: 555556705f, val+lene: 555556705f
[  251.831085] c0   3783 hw-breakpoint: watchpoint_handler: addr:
5555567058, val+lens: 555556705f, val+lene: 555556705f
[  251.835804] c0   3784 hw-breakpoint: watchpoint_handler: addr:
5555567050, val+lens: 555556705f, val+lene: 555556705f
[  251.841350] c0   3785 hw-breakpoint: watchpoint_handler: addr:
5555567050, val+lens: 555556705f, val+lene: 555556705f

Note that for the case of 16 and 32-byte access it returns the address
5555567050 -- this is why the "+15" is sufficient for me.


The other thing I am not so sure about in your patch is that it has
potential to mis-attribute the watchpoint hit if we have two
watchpoints close together. For example, if I have first watchpoint on
0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application
writes one byte to 0x1004, then your code will still attribute the hit
to the first watchpoint, even though it was not really triggered. This
is the reason I implemented my fix as a two-stage process, first
looking for exact hits, and then falling back to the nearest one. That
said, I don't know enough about the codebase to say if this is a real
problem.

On the plus side, I like the fact that we can watch arbitrary memory
regions now, and the feature is working perfectly. :)


My proposal would be to combine the two patches - take the byte mask
handling code from yours, and the hit-attribution code from my patch.
What do you think?

regards,
pavel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-12 13:50         ` Pavel Labath
@ 2016-10-13  9:58           ` Pratyush Anand
  2016-10-13 17:03             ` Pavel Labath
  0 siblings, 1 reply; 15+ messages in thread
From: Pratyush Anand @ 2016-10-13  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pavel,

Thanks a lot for your testing.

On Wed, Oct 12, 2016 at 7:20 PM, Pavel Labath <labath@google.com> wrote:
> Hello Pratyush,
>
> I am sorry about the delay. I have finally had a chance to try out
> your changes today. Response inline.
>
> On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote:
>> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote:
>>> The thing is, I have observed different behavior here depending on the
>>> exact hardware used. I don't have the exact parameters with me now,
>>> but I can look it up next week.
>>>
>>> The thing is that the spec is imprecise about what exact address the
>>> hardware can report for the watchpoint hit. I presume that is
>>> deliberate to give some leeway to implementers. The spec says the
>>> address can be anywhere in the range from the lowest memory address
>>> accessed by the instruction to the highest address watched by the
>>> watchpoint,
>>
>> I think, my patches should  be able to take care of the above condition.
> Unfortunately, the patch does not solve the problem for my hardware,
> because of the leeway you give in watchpoint_handler is not big
> enough. It does work however, if I change the line
>> if (addr + 7 < val + lens || addr > val + lene)
> to
>> if (addr + 15 < val + lens || addr > val + lene)

Yes, I missed that floating point register will be of size 16.

> I do not think we can assume that address reported by the hardware
> will be at most 7 bytes off from the address we put the watchpoint at.
> There is nothing in the spec that guarantees that, and it does not
> seem to be enough for some hardware. In fact, I am not sure we can
> assume 15 is enough either, but maybe it can do for now, until we can

Right. It might even be bigger, in case of cache maintenance instructions.

> find hardware that does not work with that (I haven't yet tried what
> happens it the watchpoint is triggered by cache management
> instructions, which can access much larger blocks of memory).

Not, sure, may be it can lie in cache line size range.

>
> For reference, the hardware in question is:
>> Processor : AArch64 Processor rev 0 (aarch64)
>> processor : 0
>> min_vddcx : 400000
>> min_vddmx : 490000
>> BogoMIPS : 38.00
>> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x51
>> CPU architecture: 8
>> CPU variant : 0x2
>> CPU part : 0x201
>> CPU revision : 0
>> CPU param : 300 468 468 621 939 299 445 445 621 1077
>> Hardware : Qualcomm Technologies, Inc MSM8996pro
>
> And this is how it behaves:
> The output of the test app triggering the watchpoint (I have set a
> single-byte watchpoint at 555556705f)
>>
>> Writing to: 555556705f, size: 1
>> Writing to: 555556705e, size: 2
>> Writing to: 555556705c, size: 4
>> Writing to: 5555567058, size: 8
>> Writing to: 5555567050, size: 16
>> Writing to: 5555567040, size: 32
>
> The addresses received by the kernel:
> [  251.812166] c1   3780 hw-breakpoint: watchpoint_handler: addr:
> 555556705f, val+lens: 555556705f, val+lene: 555556705f
> [  251.820341] c1   3781 hw-breakpoint: watchpoint_handler: addr:
> 555556705e, val+lens: 555556705f, val+lene: 555556705f
> [  251.825572] c0   3782 hw-breakpoint: watchpoint_handler: addr:
> 555556705c, val+lens: 555556705f, val+lene: 555556705f
> [  251.831085] c0   3783 hw-breakpoint: watchpoint_handler: addr:
> 5555567058, val+lens: 555556705f, val+lene: 555556705f
> [  251.835804] c0   3784 hw-breakpoint: watchpoint_handler: addr:
> 5555567050, val+lens: 555556705f, val+lene: 555556705f
> [  251.841350] c0   3785 hw-breakpoint: watchpoint_handler: addr:
> 5555567050, val+lens: 555556705f, val+lene: 555556705f
>
> Note that for the case of 16 and 32-byte access it returns the address
> 5555567050 -- this is why the "+15" is sufficient for me.
>
>
> The other thing I am not so sure about in your patch is that it has
> potential to mis-attribute the watchpoint hit if we have two
> watchpoints close together. For example, if I have first watchpoint on
> 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application
> writes one byte to 0x1004, then your code will still attribute the hit
> to the first watchpoint, even though it was not really triggered. This

Hummm..yes, thanks for pointing it out.
There could be only two solutions for it:
(1) We read instruction at the location regs->pc and analyse it and
find the size of read/write.
or(2) What you have suggested in your patch.

I think, its easier to go with your implementation. So, I have taken
your patch and updated my perf/upstream_arm64_devel branch. May be you
can give it a test for your test cases.

> is the reason I implemented my fix as a two-stage process, first
> looking for exact hits, and then falling back to the nearest one. That
> said, I don't know enough about the codebase to say if this is a real
> problem.
>
> On the plus side, I like the fact that we can watch arbitrary memory
> regions now, and the feature is working perfectly. :)
>

Thanks :-)

>
> My proposal would be to combine the two patches - take the byte mask
> handling code from yours, and the hit-attribution code from my patch.
> What do you think?

I am ok with merging them together as well as sending them as
different patch in my v2 series.

~Pratyush

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-13  9:58           ` Pratyush Anand
@ 2016-10-13 17:03             ` Pavel Labath
  2016-10-14  3:15               ` Pratyush Anand
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Labath @ 2016-10-13 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 October 2016 at 10:58, Pratyush Anand <panand@redhat.com> wrote:
> Hi Pavel,
>
> Thanks a lot for your testing.
>
> On Wed, Oct 12, 2016 at 7:20 PM, Pavel Labath <labath@google.com> wrote:
>> Hello Pratyush,
>>
>> I am sorry about the delay. I have finally had a chance to try out
>> your changes today. Response inline.
>>
>> On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote:
>>> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote:
>>>> The thing is, I have observed different behavior here depending on the
>>>> exact hardware used. I don't have the exact parameters with me now,
>>>> but I can look it up next week.
>>>>
>>>> The thing is that the spec is imprecise about what exact address the
>>>> hardware can report for the watchpoint hit. I presume that is
>>>> deliberate to give some leeway to implementers. The spec says the
>>>> address can be anywhere in the range from the lowest memory address
>>>> accessed by the instruction to the highest address watched by the
>>>> watchpoint,
>>>
>>> I think, my patches should  be able to take care of the above condition.
>> Unfortunately, the patch does not solve the problem for my hardware,
>> because of the leeway you give in watchpoint_handler is not big
>> enough. It does work however, if I change the line
>>> if (addr + 7 < val + lens || addr > val + lene)
>> to
>>> if (addr + 15 < val + lens || addr > val + lene)
>
> Yes, I missed that floating point register will be of size 16.
>
>> I do not think we can assume that address reported by the hardware
>> will be at most 7 bytes off from the address we put the watchpoint at.
>> There is nothing in the spec that guarantees that, and it does not
>> seem to be enough for some hardware. In fact, I am not sure we can
>> assume 15 is enough either, but maybe it can do for now, until we can
>
> Right. It might even be bigger, in case of cache maintenance instructions.
>
>> find hardware that does not work with that (I haven't yet tried what
>> happens it the watchpoint is triggered by cache management
>> instructions, which can access much larger blocks of memory).
>
> Not, sure, may be it can lie in cache line size range.
>
>>
>> For reference, the hardware in question is:
>>> Processor : AArch64 Processor rev 0 (aarch64)
>>> processor : 0
>>> min_vddcx : 400000
>>> min_vddmx : 490000
>>> BogoMIPS : 38.00
>>> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
>>> CPU implementer : 0x51
>>> CPU architecture: 8
>>> CPU variant : 0x2
>>> CPU part : 0x201
>>> CPU revision : 0
>>> CPU param : 300 468 468 621 939 299 445 445 621 1077
>>> Hardware : Qualcomm Technologies, Inc MSM8996pro
>>
>> And this is how it behaves:
>> The output of the test app triggering the watchpoint (I have set a
>> single-byte watchpoint at 555556705f)
>>>
>>> Writing to: 555556705f, size: 1
>>> Writing to: 555556705e, size: 2
>>> Writing to: 555556705c, size: 4
>>> Writing to: 5555567058, size: 8
>>> Writing to: 5555567050, size: 16
>>> Writing to: 5555567040, size: 32
>>
>> The addresses received by the kernel:
>> [  251.812166] c1   3780 hw-breakpoint: watchpoint_handler: addr:
>> 555556705f, val+lens: 555556705f, val+lene: 555556705f
>> [  251.820341] c1   3781 hw-breakpoint: watchpoint_handler: addr:
>> 555556705e, val+lens: 555556705f, val+lene: 555556705f
>> [  251.825572] c0   3782 hw-breakpoint: watchpoint_handler: addr:
>> 555556705c, val+lens: 555556705f, val+lene: 555556705f
>> [  251.831085] c0   3783 hw-breakpoint: watchpoint_handler: addr:
>> 5555567058, val+lens: 555556705f, val+lene: 555556705f
>> [  251.835804] c0   3784 hw-breakpoint: watchpoint_handler: addr:
>> 5555567050, val+lens: 555556705f, val+lene: 555556705f
>> [  251.841350] c0   3785 hw-breakpoint: watchpoint_handler: addr:
>> 5555567050, val+lens: 555556705f, val+lene: 555556705f
>>
>> Note that for the case of 16 and 32-byte access it returns the address
>> 5555567050 -- this is why the "+15" is sufficient for me.
>>
>>
>> The other thing I am not so sure about in your patch is that it has
>> potential to mis-attribute the watchpoint hit if we have two
>> watchpoints close together. For example, if I have first watchpoint on
>> 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application
>> writes one byte to 0x1004, then your code will still attribute the hit
>> to the first watchpoint, even though it was not really triggered. This
>
> Hummm..yes, thanks for pointing it out.
> There could be only two solutions for it:
> (1) We read instruction at the location regs->pc and analyse it and
> find the size of read/write.
> or(2) What you have suggested in your patch.
>
> I think, its easier to go with your implementation. So, I have taken
> your patch and updated my perf/upstream_arm64_devel branch. May be you
> can give it a test for your test cases.

I've checked out the new version of your branch, and it works great.
I'll write a patch with additional test cases to go on top of your
branch, as the tests there do not capture the bug I was fixing.

regards,
Pavel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-13 17:03             ` Pavel Labath
@ 2016-10-14  3:15               ` Pratyush Anand
  2016-10-19 12:07                 ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Pratyush Anand @ 2016-10-14  3:15 UTC (permalink / raw)
  To: linux-arm-kernel



On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote:
>> I think, its easier to go with your implementation. So, I have taken
>> > your patch and updated my perf/upstream_arm64_devel branch. May be you
>> > can give it a test for your test cases.
> I've checked out the new version of your branch, and it works great.
> I'll write a patch with additional test cases to go on top of your
> branch, as the tests there do not capture the bug I was fixing.

That would be great. We can send them all together as V2.

Thanks for your help.

~Pratyush

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-14  3:15               ` Pratyush Anand
@ 2016-10-19 12:07                 ` Will Deacon
  2016-10-19 13:30                   ` Pavel Labath
  2016-10-20  5:53                   ` Pratyush Anand
  0 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2016-10-19 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote:
> 
> 
> On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote:
> >>I think, its easier to go with your implementation. So, I have taken
> >>> your patch and updated my perf/upstream_arm64_devel branch. May be you
> >>> can give it a test for your test cases.
> >I've checked out the new version of your branch, and it works great.
> >I'll write a patch with additional test cases to go on top of your
> >branch, as the tests there do not capture the bug I was fixing.
> 
> That would be great. We can send them all together as V2.

Did you send a v2? I've been holding off reviewing this, but I just want
to make sure I didn't miss the update.

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-19 12:07                 ` Will Deacon
@ 2016-10-19 13:30                   ` Pavel Labath
  2016-10-20  5:53                   ` Pratyush Anand
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Labath @ 2016-10-19 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

I've send the test suite update to Pratyush via github yesterday. I
presume he will come with a v2 soon. :)

regards,
pavel

On 19 October 2016 at 13:07, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote:
>>
>>
>> On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote:
>> >>I think, its easier to go with your implementation. So, I have taken
>> >>> your patch and updated my perf/upstream_arm64_devel branch. May be you
>> >>> can give it a test for your test cases.
>> >I've checked out the new version of your branch, and it works great.
>> >I'll write a patch with additional test cases to go on top of your
>> >branch, as the tests there do not capture the bug I was fixing.
>>
>> That would be great. We can send them all together as V2.
>
> Did you send a v2? I've been holding off reviewing this, but I just want
> to make sure I didn't miss the update.
>
> Cheers,
>
> Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses
  2016-10-19 12:07                 ` Will Deacon
  2016-10-19 13:30                   ` Pavel Labath
@ 2016-10-20  5:53                   ` Pratyush Anand
  1 sibling, 0 replies; 15+ messages in thread
From: Pratyush Anand @ 2016-10-20  5:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Wednesday 19 October 2016 05:37 PM, Will Deacon wrote:
> On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote:
>>
>>
>> On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote:
>>>> I think, its easier to go with your implementation. So, I have taken
>>>>> your patch and updated my perf/upstream_arm64_devel branch. May be you
>>>>> can give it a test for your test cases.
>>> I've checked out the new version of your branch, and it works great.
>>> I'll write a patch with additional test cases to go on top of your
>>> branch, as the tests there do not capture the bug I was fixing.
>>
>> That would be great. We can send them all together as V2.
>
> Did you send a v2? I've been holding off reviewing this, but I just want
> to make sure I didn't miss the update.

I just posted V2.

http://www.spinics.net/lists/arm-kernel/msg537194.html

~Pratyush

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-10-20  5:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-23 15:18 [PATCH 1/3] arm64: hw_breakpoint: Add get_hwbkt_alignment_mask Pavel Labath
2016-09-23 15:19 ` [PATCH 2/3] arm64: hw_breakpoint: Handle inexact watchpoint addresses Pavel Labath
2016-09-26 14:06   ` Will Deacon
2016-10-07 16:38   ` Pratyush Anand
2016-10-07 17:24     ` Pavel Labath
2016-10-08  5:10       ` Pratyush Anand
2016-10-12 13:50         ` Pavel Labath
2016-10-13  9:58           ` Pratyush Anand
2016-10-13 17:03             ` Pavel Labath
2016-10-14  3:15               ` Pratyush Anand
2016-10-19 12:07                 ` Will Deacon
2016-10-19 13:30                   ` Pavel Labath
2016-10-20  5:53                   ` Pratyush Anand
2016-10-10 17:08       ` Pratyush Anand
2016-09-23 15:19 ` [PATCH 3/3] selftests: arm64: add test for inexact watchpoint address handling Pavel Labath

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.