linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] perf: Support compat mode for AUX ring buffer
@ 2021-08-09 11:27 Leo Yan
  2021-08-09 11:27 ` [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment Leo Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Leo Yan @ 2021-08-09 11:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight,
	linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: Leo Yan

For better organising and easier review, this patch series is extracted
from the patch set "perf: Refine barriers for AUX ring buffer" .  When
applying this patch series, it needs to be applied on the top of the
patch series [1].

To support the compat mode in perf tool, the patch 01 adds an new item
in "perf_env" to track if kernel is running in 64-bit mode.  This patch
is a preparation for later changes.

Patch 02 introduces compat variant functions for accessing AUX trace's
head and tail, these two functions are defined with weak attribute, so
they can be called when any architectures cannot provide 64-bit value
atomic accessing when perf is in compat mode.

Patch 03 supports compat_auxtrace_mmap__{read_head|write_tail} on Arm
platform.  For Arm platform with compat mode, the kernel runs in 64-bit
kernel mode and user space tool runs in 32-bit mode, it uses the
instructions "ldrd" and "strd" for 64-bit value atomicity.

This patch set have been tested on Arm64 Juno platform for the perf tool
is built with compiler arm-linux-gnueabihf-gcc.

[1] https://lore.kernel.org/patchwork/cover/1473916/


Leo Yan (3):
  perf env: Track kernel 64-bit mode in environment
  perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  perf auxtrace arm: Support
    compat_auxtrace_mmap__{read_head|write_tail}

 tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++
 tools/perf/util/auxtrace.c          | 88 +++++++++++++++++++++++++++--
 tools/perf/util/auxtrace.h          | 22 +++++++-
 tools/perf/util/env.c               | 24 +++++++-
 tools/perf/util/env.h               |  3 +
 5 files changed, 161 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment
  2021-08-09 11:27 [PATCH v1 0/3] perf: Support compat mode for AUX ring buffer Leo Yan
@ 2021-08-09 11:27 ` Leo Yan
  2021-08-09 20:11   ` Arnaldo Carvalho de Melo
  2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
  2021-08-09 11:27 ` [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
  2 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-08-09 11:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight,
	linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: Leo Yan, Arnaldo Carvalho de Melo

It's useful to know that the kernel is running in 32-bit or 64-bit
mode.  E.g. We can decide if perf tool is running in compat mode
based on the info.

This patch adds an item "kernel_is_64_bit" into session's environment
structure perf_env, its value is initialized based on the architecture
string.

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/env.c | 24 +++++++++++++++++++++++-
 tools/perf/util/env.h |  3 +++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index ab341050be46..8f7ff0035c41 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -219,13 +219,35 @@ void perf_env__exit(struct perf_env *env)
 	zfree(&env->hybrid_cpc_nodes);
 }
 
-void perf_env__init(struct perf_env *env __maybe_unused)
+void perf_env__init(struct perf_env *env)
 {
 #ifdef HAVE_LIBBPF_SUPPORT
 	env->bpf_progs.infos = RB_ROOT;
 	env->bpf_progs.btfs = RB_ROOT;
 	init_rwsem(&env->bpf_progs.lock);
 #endif
+	env->kernel_is_64_bit = -1;
+}
+
+static void perf_env__init_kernel_mode(struct perf_env *env)
+{
+	const char *arch = perf_env__raw_arch(env);
+
+	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
+	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
+	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
+	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
+		env->kernel_is_64_bit = 1;
+	else
+		env->kernel_is_64_bit = 0;
+}
+
+int perf_env__kernel_is_64_bit(struct perf_env *env)
+{
+	if (env->kernel_is_64_bit == -1)
+		perf_env__init_kernel_mode(env);
+
+	return env->kernel_is_64_bit;
 }
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 6824a7423a2d..1f5175820a05 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -61,6 +61,7 @@ struct perf_env {
 	unsigned long long	total_mem;
 	unsigned int		msr_pmu_type;
 	unsigned int		max_branches;
+	int			kernel_is_64_bit;
 
 	int			nr_cmdline;
 	int			nr_sibling_cores;
@@ -143,6 +144,8 @@ extern struct perf_env perf_env;
 
 void perf_env__exit(struct perf_env *env);
 
+int perf_env__kernel_is_64_bit(struct perf_env *env);
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
 int perf_env__read_cpuid(struct perf_env *env);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 [PATCH v1 0/3] perf: Support compat mode for AUX ring buffer Leo Yan
  2021-08-09 11:27 ` [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment Leo Yan
@ 2021-08-09 11:27 ` Leo Yan
  2021-08-09 20:12   ` Arnaldo Carvalho de Melo
                     ` (2 more replies)
  2021-08-09 11:27 ` [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
  2 siblings, 3 replies; 17+ messages in thread
From: Leo Yan @ 2021-08-09 11:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight,
	linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: Leo Yan

When perf runs in compat mode (kernel in 64-bit mode and the perf is in
32-bit mode), the 64-bit value atomicity in the user space cannot be
assured, E.g. on some architectures, the 64-bit value accessing is split
into two instructions, one is for the low 32-bit word accessing and
another is for the high 32-bit word.

This patch introduces weak functions compat_auxtrace_mmap__read_head()
and compat_auxtrace_mmap__write_tail(), as their naming indicates, when
perf tool works in compat mode, it uses these two functions to access
the AUX head and tail.  These two functions can allow the perf tool to
work properly in certain conditions, e.g. when perf tool works in
snapshot mode with only using AUX head pointer, or perf tool uses the
AUX buffer and the incremented tail is not bigger than 4GB.

When perf tool cannot handle the case when the AUX tail is bigger than
4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and
tells the caller to bail out for the error.

These two functions are declared as weak attribute, this allows to
implement arch specific functions if any arch can support the 64-bit
value atomicity in compat mode.

Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/auxtrace.c | 88 ++++++++++++++++++++++++++++++++++++--
 tools/perf/util/auxtrace.h | 22 ++++++++--
 2 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index f33f09b8b535..60975df05d54 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1669,6 +1669,82 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
 	return 0;
 }
 
+/*
+ * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
+ * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
+ * the issues caused by the below sequence on multiple CPUs: when perf tool
+ * accesses either the load operation or the store operation for 64-bit value,
+ * on some architectures the operation is divided into two instructions, one
+ * is for accessing the low 32-bit value and another is for the high 32-bit;
+ * thus these two user operations can give the kernel chances to access the
+ * 64-bit value, and thus leads to the unexpected load values.
+ *
+ *   kernel (64-bit)                        user (32-bit)
+ *
+ *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
+ *       STORE $aux_data      |       ,--->
+ *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
+ *       STORE ->aux_head   --|-------`     smp_rmb()
+ *   }                        |             LOAD $data
+ *                            |             smp_mb()
+ *                            |             STORE ->aux_tail_lo
+ *                            `----------->
+ *                                          STORE ->aux_tail_hi
+ *
+ * For this reason, it's impossible for the perf tool to work correctly when
+ * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
+ * can not simply limit the AUX ring buffer to less than 4GB, the reason is
+ * the pointers can be increased monotonically, whatever the buffer size it is,
+ * at the end the head and tail can be bigger than 4GB and carry out to the
+ * high 32-bit.
+ *
+ * To mitigate the issues and improve the user experience, we can allow the
+ * perf tool working in certain conditions and bail out with error if detect
+ * any overflow cannot be handled.
+ *
+ * For reading the AUX head, it reads out the values for three times, and
+ * compares the high 4 bytes of the values between the first time and the last
+ * time, if there has no change for high 4 bytes injected by the kernel during
+ * the user reading sequence, it's safe for use the second value.
+ *
+ * When update the AUX tail and detects any carrying in the high 32 bits, it
+ * means there have two store operations in user space and it cannot promise
+ * the atomicity for 64-bit write, so return '-1' in this case to tell the
+ * caller an overflow error has happened.
+ */
+u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 first, second, last;
+	u64 mask = (u64)(UINT32_MAX) << 32;
+
+	do {
+		first = READ_ONCE(pc->aux_head);
+		/* Ensure all reads are done after we read the head */
+		smp_rmb();
+		second = READ_ONCE(pc->aux_head);
+		/* Ensure all reads are done after we read the head */
+		smp_rmb();
+		last = READ_ONCE(pc->aux_head);
+	} while ((first & mask) != (last & mask));
+
+	return second;
+}
+
+int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 mask = (u64)(UINT32_MAX) << 32;
+
+	if (tail & mask)
+		return -1;
+
+	/* Ensure all reads are done before we write the tail out */
+	smp_mb();
+	WRITE_ONCE(pc->aux_tail, tail);
+	return 0;
+}
+
 static int __auxtrace_mmap__read(struct mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
@@ -1680,8 +1756,9 @@ static int __auxtrace_mmap__read(struct mmap *map,
 	size_t size, head_off, old_off, len1, len2, padding;
 	union perf_event ev;
 	void *data1, *data2;
+	int kernel_is_64_bit = perf_env__kernel_is_64_bit(evsel__env(NULL));
 
-	head = auxtrace_mmap__read_head(mm);
+	head = auxtrace_mmap__read_head(mm, kernel_is_64_bit);
 
 	if (snapshot &&
 	    auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old))
@@ -1764,10 +1841,13 @@ static int __auxtrace_mmap__read(struct mmap *map,
 	mm->prev = head;
 
 	if (!snapshot) {
-		auxtrace_mmap__write_tail(mm, head);
-		if (itr->read_finish) {
-			int err;
+		int err;
+
+		err = auxtrace_mmap__write_tail(mm, head, kernel_is_64_bit);
+		if (err < 0)
+			return err;
 
+		if (itr->read_finish) {
 			err = itr->read_finish(itr, mm->idx);
 			if (err < 0)
 				return err;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index d68a5e80b217..5f383908ca6e 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -440,23 +440,39 @@ struct auxtrace_cache;
 
 #ifdef HAVE_AUXTRACE_SUPPORT
 
-static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
+u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
+int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail);
+
+static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm,
+					   int kernel_is_64_bit __maybe_unused)
 {
 	struct perf_event_mmap_page *pc = mm->userpg;
-	u64 head = READ_ONCE(pc->aux_head);
+	u64 head;
+
+#if BITS_PER_LONG == 32
+	if (kernel_is_64_bit)
+		return compat_auxtrace_mmap__read_head(mm);
+#endif
+	head = READ_ONCE(pc->aux_head);
 
 	/* Ensure all reads are done after we read the head */
 	smp_rmb();
 	return head;
 }
 
-static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+static inline int auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail,
+					    int kernel_is_64_bit __maybe_unused)
 {
 	struct perf_event_mmap_page *pc = mm->userpg;
 
+#if BITS_PER_LONG == 32
+	if (kernel_is_64_bit)
+		return compat_auxtrace_mmap__write_tail(mm, tail);
+#endif
 	/* Ensure all reads are done before we write the tail out */
 	smp_mb();
 	WRITE_ONCE(pc->aux_tail, tail);
+	return 0;
 }
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 [PATCH v1 0/3] perf: Support compat mode for AUX ring buffer Leo Yan
  2021-08-09 11:27 ` [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment Leo Yan
  2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
@ 2021-08-09 11:27 ` Leo Yan
  2021-08-23 12:23   ` James Clark
  2 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-08-09 11:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight,
	linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: Leo Yan

When the tool runs with compat mode on Arm platform, the kernel is in
64-bit mode and user space is in 32-bit mode; the user space can use
instructions "ldrd" and "strd" for 64-bit value atomicity.

This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
building, it uses "ldrd" and "strd" instructions to ensure accessing
atomicity for aux head and tail.  The file arch/arm/util/auxtrace.c is
built for arm and arm64 building, these two functions are not needed for
arm64, so check the compiler macro "__arm__" to only include them for
arm building.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index b187bddbd01a..c7c7ec0812d5 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -107,3 +107,35 @@ struct auxtrace_record
 	*err = 0;
 	return NULL;
 }
+
+#if defined(__arm__)
+u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 result;
+
+	__asm__ __volatile__(
+"	ldrd    %0, %H0, [%1]"
+	: "=&r" (result)
+	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
+	);
+
+	return result;
+}
+
+int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+
+	/* Ensure all reads are done before we write the tail out */
+	smp_mb();
+
+	__asm__ __volatile__(
+"	strd    %2, %H2, [%1]"
+	: "=Qo" (pc->aux_tail)
+	: "r" (&pc->aux_tail), "r" (tail)
+	);
+
+	return 0;
+}
+#endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment
  2021-08-09 11:27 ` [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment Leo Yan
@ 2021-08-09 20:11   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-08-09 20:11 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Adrian Hunter, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Will Deacon,
	Russell King, Catalin Marinas, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, John Garry, Andi Kleen, Riccardo Mancini, Jin Yao,
	Li Huafei, coresight, linux-arm-kernel, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo

Em Mon, Aug 09, 2021 at 07:27:25PM +0800, Leo Yan escreveu:
> It's useful to know that the kernel is running in 32-bit or 64-bit
> mode.  E.g. We can decide if perf tool is running in compat mode
> based on the info.
> 
> This patch adds an item "kernel_is_64_bit" into session's environment
> structure perf_env, its value is initialized based on the architecture
> string.

Thanks, applied.

- Arnaldo

 
> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/env.c | 24 +++++++++++++++++++++++-
>  tools/perf/util/env.h |  3 +++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index ab341050be46..8f7ff0035c41 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -219,13 +219,35 @@ void perf_env__exit(struct perf_env *env)
>  	zfree(&env->hybrid_cpc_nodes);
>  }
>  
> -void perf_env__init(struct perf_env *env __maybe_unused)
> +void perf_env__init(struct perf_env *env)
>  {
>  #ifdef HAVE_LIBBPF_SUPPORT
>  	env->bpf_progs.infos = RB_ROOT;
>  	env->bpf_progs.btfs = RB_ROOT;
>  	init_rwsem(&env->bpf_progs.lock);
>  #endif
> +	env->kernel_is_64_bit = -1;
> +}
> +
> +static void perf_env__init_kernel_mode(struct perf_env *env)
> +{
> +	const char *arch = perf_env__raw_arch(env);
> +
> +	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
> +	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
> +	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
> +	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
> +		env->kernel_is_64_bit = 1;
> +	else
> +		env->kernel_is_64_bit = 0;
> +}
> +
> +int perf_env__kernel_is_64_bit(struct perf_env *env)
> +{
> +	if (env->kernel_is_64_bit == -1)
> +		perf_env__init_kernel_mode(env);
> +
> +	return env->kernel_is_64_bit;
>  }
>  
>  int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 6824a7423a2d..1f5175820a05 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -61,6 +61,7 @@ struct perf_env {
>  	unsigned long long	total_mem;
>  	unsigned int		msr_pmu_type;
>  	unsigned int		max_branches;
> +	int			kernel_is_64_bit;
>  
>  	int			nr_cmdline;
>  	int			nr_sibling_cores;
> @@ -143,6 +144,8 @@ extern struct perf_env perf_env;
>  
>  void perf_env__exit(struct perf_env *env);
>  
> +int perf_env__kernel_is_64_bit(struct perf_env *env);
> +
>  int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
>  
>  int perf_env__read_cpuid(struct perf_env *env);
> -- 
> 2.25.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
@ 2021-08-09 20:12   ` Arnaldo Carvalho de Melo
  2021-08-12  9:23   ` Adrian Hunter
  2021-08-13 16:22   ` James Clark
  2 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-08-09 20:12 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Adrian Hunter, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Will Deacon,
	Russell King, Catalin Marinas, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, John Garry, Andi Kleen, Riccardo Mancini, Jin Yao,
	Li Huafei, coresight, linux-arm-kernel, linux-perf-users,
	linux-kernel

Em Mon, Aug 09, 2021 at 07:27:26PM +0800, Leo Yan escreveu:
> When perf runs in compat mode (kernel in 64-bit mode and the perf is in
> 32-bit mode), the 64-bit value atomicity in the user space cannot be
> assured, E.g. on some architectures, the 64-bit value accessing is split
> into two instructions, one is for the low 32-bit word accessing and
> another is for the high 32-bit word.
> 
> This patch introduces weak functions compat_auxtrace_mmap__read_head()
> and compat_auxtrace_mmap__write_tail(), as their naming indicates, when
> perf tool works in compat mode, it uses these two functions to access
> the AUX head and tail.  These two functions can allow the perf tool to
> work properly in certain conditions, e.g. when perf tool works in
> snapshot mode with only using AUX head pointer, or perf tool uses the
> AUX buffer and the incremented tail is not bigger than 4GB.
> 
> When perf tool cannot handle the case when the AUX tail is bigger than
> 4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and
> tells the caller to bail out for the error.
> 
> These two functions are declared as weak attribute, this allows to
> implement arch specific functions if any arch can support the 64-bit
> value atomicity in compat mode.

Adrian, ARM guys, can you please review this?

Thanks,

- Arnaldo
 
> Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/auxtrace.c | 88 ++++++++++++++++++++++++++++++++++++--
>  tools/perf/util/auxtrace.h | 22 ++++++++--
>  2 files changed, 103 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index f33f09b8b535..60975df05d54 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1669,6 +1669,82 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
>  	return 0;
>  }
>  
> +/*
> + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
> + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
> + * the issues caused by the below sequence on multiple CPUs: when perf tool
> + * accesses either the load operation or the store operation for 64-bit value,
> + * on some architectures the operation is divided into two instructions, one
> + * is for accessing the low 32-bit value and another is for the high 32-bit;
> + * thus these two user operations can give the kernel chances to access the
> + * 64-bit value, and thus leads to the unexpected load values.
> + *
> + *   kernel (64-bit)                        user (32-bit)
> + *
> + *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
> + *       STORE $aux_data      |       ,--->
> + *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
> + *       STORE ->aux_head   --|-------`     smp_rmb()
> + *   }                        |             LOAD $data
> + *                            |             smp_mb()
> + *                            |             STORE ->aux_tail_lo
> + *                            `----------->
> + *                                          STORE ->aux_tail_hi
> + *
> + * For this reason, it's impossible for the perf tool to work correctly when
> + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
> + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
> + * the pointers can be increased monotonically, whatever the buffer size it is,
> + * at the end the head and tail can be bigger than 4GB and carry out to the
> + * high 32-bit.
> + *
> + * To mitigate the issues and improve the user experience, we can allow the
> + * perf tool working in certain conditions and bail out with error if detect
> + * any overflow cannot be handled.
> + *
> + * For reading the AUX head, it reads out the values for three times, and
> + * compares the high 4 bytes of the values between the first time and the last
> + * time, if there has no change for high 4 bytes injected by the kernel during
> + * the user reading sequence, it's safe for use the second value.
> + *
> + * When update the AUX tail and detects any carrying in the high 32 bits, it
> + * means there have two store operations in user space and it cannot promise
> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> + * caller an overflow error has happened.
> + */
> +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 first, second, last;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	do {
> +		first = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		second = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		last = READ_ONCE(pc->aux_head);
> +	} while ((first & mask) != (last & mask));
> +
> +	return second;
> +}
> +
> +int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	if (tail & mask)
> +		return -1;
> +
> +	/* Ensure all reads are done before we write the tail out */
> +	smp_mb();
> +	WRITE_ONCE(pc->aux_tail, tail);
> +	return 0;
> +}
> +
>  static int __auxtrace_mmap__read(struct mmap *map,
>  				 struct auxtrace_record *itr,
>  				 struct perf_tool *tool, process_auxtrace_t fn,
> @@ -1680,8 +1756,9 @@ static int __auxtrace_mmap__read(struct mmap *map,
>  	size_t size, head_off, old_off, len1, len2, padding;
>  	union perf_event ev;
>  	void *data1, *data2;
> +	int kernel_is_64_bit = perf_env__kernel_is_64_bit(evsel__env(NULL));
>  
> -	head = auxtrace_mmap__read_head(mm);
> +	head = auxtrace_mmap__read_head(mm, kernel_is_64_bit);
>  
>  	if (snapshot &&
>  	    auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old))
> @@ -1764,10 +1841,13 @@ static int __auxtrace_mmap__read(struct mmap *map,
>  	mm->prev = head;
>  
>  	if (!snapshot) {
> -		auxtrace_mmap__write_tail(mm, head);
> -		if (itr->read_finish) {
> -			int err;
> +		int err;
> +
> +		err = auxtrace_mmap__write_tail(mm, head, kernel_is_64_bit);
> +		if (err < 0)
> +			return err;
>  
> +		if (itr->read_finish) {
>  			err = itr->read_finish(itr, mm->idx);
>  			if (err < 0)
>  				return err;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index d68a5e80b217..5f383908ca6e 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -440,23 +440,39 @@ struct auxtrace_cache;
>  
>  #ifdef HAVE_AUXTRACE_SUPPORT
>  
> -static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
> +int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail);
> +
> +static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm,
> +					   int kernel_is_64_bit __maybe_unused)
>  {
>  	struct perf_event_mmap_page *pc = mm->userpg;
> -	u64 head = READ_ONCE(pc->aux_head);
> +	u64 head;
> +
> +#if BITS_PER_LONG == 32
> +	if (kernel_is_64_bit)
> +		return compat_auxtrace_mmap__read_head(mm);
> +#endif
> +	head = READ_ONCE(pc->aux_head);
>  
>  	/* Ensure all reads are done after we read the head */
>  	smp_rmb();
>  	return head;
>  }
>  
> -static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +static inline int auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail,
> +					    int kernel_is_64_bit __maybe_unused)
>  {
>  	struct perf_event_mmap_page *pc = mm->userpg;
>  
> +#if BITS_PER_LONG == 32
> +	if (kernel_is_64_bit)
> +		return compat_auxtrace_mmap__write_tail(mm, tail);
> +#endif
>  	/* Ensure all reads are done before we write the tail out */
>  	smp_mb();
>  	WRITE_ONCE(pc->aux_tail, tail);
> +	return 0;
>  }
>  
>  int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
> -- 
> 2.25.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
  2021-08-09 20:12   ` Arnaldo Carvalho de Melo
@ 2021-08-12  9:23   ` Adrian Hunter
  2021-08-13 16:22   ` James Clark
  2 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-08-12  9:23 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Will Deacon, Russell King, Catalin Marinas, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, John Garry, Andi Kleen,
	Riccardo Mancini, Jin Yao, Li Huafei, coresight,
	linux-arm-kernel, linux-perf-users, linux-kernel

On 9/08/21 2:27 pm, Leo Yan wrote:
> When perf runs in compat mode (kernel in 64-bit mode and the perf is in
> 32-bit mode), the 64-bit value atomicity in the user space cannot be
> assured, E.g. on some architectures, the 64-bit value accessing is split
> into two instructions, one is for the low 32-bit word accessing and
> another is for the high 32-bit word.
> 
> This patch introduces weak functions compat_auxtrace_mmap__read_head()
> and compat_auxtrace_mmap__write_tail(), as their naming indicates, when
> perf tool works in compat mode, it uses these two functions to access
> the AUX head and tail.  These two functions can allow the perf tool to
> work properly in certain conditions, e.g. when perf tool works in
> snapshot mode with only using AUX head pointer, or perf tool uses the
> AUX buffer and the incremented tail is not bigger than 4GB.
> 
> When perf tool cannot handle the case when the AUX tail is bigger than
> 4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and
> tells the caller to bail out for the error.
> 
> These two functions are declared as weak attribute, this allows to
> implement arch specific functions if any arch can support the 64-bit
> value atomicity in compat mode.
> 
> Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/util/auxtrace.c | 88 ++++++++++++++++++++++++++++++++++++--
>  tools/perf/util/auxtrace.h | 22 ++++++++--
>  2 files changed, 103 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index f33f09b8b535..60975df05d54 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1669,6 +1669,82 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
>  	return 0;
>  }
>  
> +/*
> + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
> + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
> + * the issues caused by the below sequence on multiple CPUs: when perf tool
> + * accesses either the load operation or the store operation for 64-bit value,
> + * on some architectures the operation is divided into two instructions, one
> + * is for accessing the low 32-bit value and another is for the high 32-bit;
> + * thus these two user operations can give the kernel chances to access the
> + * 64-bit value, and thus leads to the unexpected load values.
> + *
> + *   kernel (64-bit)                        user (32-bit)
> + *
> + *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
> + *       STORE $aux_data      |       ,--->
> + *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
> + *       STORE ->aux_head   --|-------`     smp_rmb()
> + *   }                        |             LOAD $data
> + *                            |             smp_mb()
> + *                            |             STORE ->aux_tail_lo
> + *                            `----------->
> + *                                          STORE ->aux_tail_hi
> + *
> + * For this reason, it's impossible for the perf tool to work correctly when
> + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
> + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
> + * the pointers can be increased monotonically, whatever the buffer size it is,
> + * at the end the head and tail can be bigger than 4GB and carry out to the
> + * high 32-bit.
> + *
> + * To mitigate the issues and improve the user experience, we can allow the
> + * perf tool working in certain conditions and bail out with error if detect
> + * any overflow cannot be handled.
> + *
> + * For reading the AUX head, it reads out the values for three times, and
> + * compares the high 4 bytes of the values between the first time and the last
> + * time, if there has no change for high 4 bytes injected by the kernel during
> + * the user reading sequence, it's safe for use the second value.
> + *
> + * When update the AUX tail and detects any carrying in the high 32 bits, it
> + * means there have two store operations in user space and it cannot promise
> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> + * caller an overflow error has happened.
> + */
> +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 first, second, last;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	do {
> +		first = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		second = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		last = READ_ONCE(pc->aux_head);
> +	} while ((first & mask) != (last & mask));
> +
> +	return second;
> +}
> +
> +int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	if (tail & mask)
> +		return -1;
> +
> +	/* Ensure all reads are done before we write the tail out */
> +	smp_mb();
> +	WRITE_ONCE(pc->aux_tail, tail);
> +	return 0;
> +}
> +
>  static int __auxtrace_mmap__read(struct mmap *map,
>  				 struct auxtrace_record *itr,
>  				 struct perf_tool *tool, process_auxtrace_t fn,
> @@ -1680,8 +1756,9 @@ static int __auxtrace_mmap__read(struct mmap *map,
>  	size_t size, head_off, old_off, len1, len2, padding;
>  	union perf_event ev;
>  	void *data1, *data2;
> +	int kernel_is_64_bit = perf_env__kernel_is_64_bit(evsel__env(NULL));
>  
> -	head = auxtrace_mmap__read_head(mm);
> +	head = auxtrace_mmap__read_head(mm, kernel_is_64_bit);
>  
>  	if (snapshot &&
>  	    auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old))
> @@ -1764,10 +1841,13 @@ static int __auxtrace_mmap__read(struct mmap *map,
>  	mm->prev = head;
>  
>  	if (!snapshot) {
> -		auxtrace_mmap__write_tail(mm, head);
> -		if (itr->read_finish) {
> -			int err;
> +		int err;
> +
> +		err = auxtrace_mmap__write_tail(mm, head, kernel_is_64_bit);
> +		if (err < 0)
> +			return err;
>  
> +		if (itr->read_finish) {
>  			err = itr->read_finish(itr, mm->idx);
>  			if (err < 0)
>  				return err;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index d68a5e80b217..5f383908ca6e 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -440,23 +440,39 @@ struct auxtrace_cache;
>  
>  #ifdef HAVE_AUXTRACE_SUPPORT
>  
> -static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
> +int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail);
> +
> +static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm,
> +					   int kernel_is_64_bit __maybe_unused)
>  {
>  	struct perf_event_mmap_page *pc = mm->userpg;
> -	u64 head = READ_ONCE(pc->aux_head);
> +	u64 head;
> +
> +#if BITS_PER_LONG == 32
> +	if (kernel_is_64_bit)
> +		return compat_auxtrace_mmap__read_head(mm);
> +#endif
> +	head = READ_ONCE(pc->aux_head);
>  
>  	/* Ensure all reads are done after we read the head */
>  	smp_rmb();
>  	return head;
>  }
>  
> -static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +static inline int auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail,
> +					    int kernel_is_64_bit __maybe_unused)
>  {
>  	struct perf_event_mmap_page *pc = mm->userpg;
>  
> +#if BITS_PER_LONG == 32
> +	if (kernel_is_64_bit)
> +		return compat_auxtrace_mmap__write_tail(mm, tail);
> +#endif
>  	/* Ensure all reads are done before we write the tail out */
>  	smp_mb();
>  	WRITE_ONCE(pc->aux_tail, tail);
> +	return 0;
>  }
>  
>  int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
  2021-08-09 20:12   ` Arnaldo Carvalho de Melo
  2021-08-12  9:23   ` Adrian Hunter
@ 2021-08-13 16:22   ` James Clark
  2021-08-23  9:51     ` Leo Yan
  2 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2021-08-13 16:22 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight



On 09/08/2021 12:27, Leo Yan wrote:
> +/*
> + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
> + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
> + * the issues caused by the below sequence on multiple CPUs: when perf tool
> + * accesses either the load operation or the store operation for 64-bit value,
> + * on some architectures the operation is divided into two instructions, one
> + * is for accessing the low 32-bit value and another is for the high 32-bit;
> + * thus these two user operations can give the kernel chances to access the
> + * 64-bit value, and thus leads to the unexpected load values.
> + *
> + *   kernel (64-bit)                        user (32-bit)
> + *
> + *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
> + *       STORE $aux_data      |       ,--->
> + *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
> + *       STORE ->aux_head   --|-------`     smp_rmb()
> + *   }                        |             LOAD $data
> + *                            |             smp_mb()
> + *                            |             STORE ->aux_tail_lo
> + *                            `----------->
> + *                                          STORE ->aux_tail_hi
> + *
> + * For this reason, it's impossible for the perf tool to work correctly when
> + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
> + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
> + * the pointers can be increased monotonically, whatever the buffer size it is,
> + * at the end the head and tail can be bigger than 4GB and carry out to the
> + * high 32-bit.
> + *
> + * To mitigate the issues and improve the user experience, we can allow the
> + * perf tool working in certain conditions and bail out with error if detect
> + * any overflow cannot be handled.
> + *
> + * For reading the AUX head, it reads out the values for three times, and
> + * compares the high 4 bytes of the values between the first time and the last
> + * time, if there has no change for high 4 bytes injected by the kernel during
> + * the user reading sequence, it's safe for use the second value.
> + *
> + * When update the AUX tail and detects any carrying in the high 32 bits, it
> + * means there have two store operations in user space and it cannot promise
> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> + * caller an overflow error has happened.
> + */
> +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 first, second, last;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	do {
> +		first = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		second = READ_ONCE(pc->aux_head);
> +		/* Ensure all reads are done after we read the head */
> +		smp_rmb();
> +		last = READ_ONCE(pc->aux_head);
> +	} while ((first & mask) != (last & mask));
> +
> +	return second;
> +}
> +

Hi Leo,

I had a couple of questions about this bit. If we're assuming that the
high bytes of 'first' and 'last' are equal, then 'second' is supposed
to be somewhere in between or equal to 'first' and 'last'.

If that's the case, wouldn't it be better to return 'last', because it's
closer to the value at the time of reading? And then in that case, if
last is returned, then why do a read for 'second' at all? Can 'second' be
skipped and just read first and last?

Also maybe it won't make a difference, but is there a missing smp_rmb()
between the read of 'last' and 'first'?

Thanks
James

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-13 16:22   ` James Clark
@ 2021-08-23  9:51     ` Leo Yan
  2021-08-23 10:57       ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-08-23  9:51 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight

Hi James,

On Fri, Aug 13, 2021 at 05:22:31PM +0100, James Clark wrote:
> On 09/08/2021 12:27, Leo Yan wrote:
> > +/*
> > + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
> > + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
> > + * the issues caused by the below sequence on multiple CPUs: when perf tool
> > + * accesses either the load operation or the store operation for 64-bit value,
> > + * on some architectures the operation is divided into two instructions, one
> > + * is for accessing the low 32-bit value and another is for the high 32-bit;
> > + * thus these two user operations can give the kernel chances to access the
> > + * 64-bit value, and thus leads to the unexpected load values.
> > + *
> > + *   kernel (64-bit)                        user (32-bit)
> > + *
> > + *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
> > + *       STORE $aux_data      |       ,--->
> > + *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
> > + *       STORE ->aux_head   --|-------`     smp_rmb()
> > + *   }                        |             LOAD $data
> > + *                            |             smp_mb()
> > + *                            |             STORE ->aux_tail_lo
> > + *                            `----------->
> > + *                                          STORE ->aux_tail_hi
> > + *
> > + * For this reason, it's impossible for the perf tool to work correctly when
> > + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
> > + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
> > + * the pointers can be increased monotonically, whatever the buffer size it is,
> > + * at the end the head and tail can be bigger than 4GB and carry out to the
> > + * high 32-bit.
> > + *
> > + * To mitigate the issues and improve the user experience, we can allow the
> > + * perf tool working in certain conditions and bail out with error if detect
> > + * any overflow cannot be handled.
> > + *
> > + * For reading the AUX head, it reads out the values for three times, and
> > + * compares the high 4 bytes of the values between the first time and the last
> > + * time, if there has no change for high 4 bytes injected by the kernel during
> > + * the user reading sequence, it's safe for use the second value.
> > + *
> > + * When update the AUX tail and detects any carrying in the high 32 bits, it
> > + * means there have two store operations in user space and it cannot promise
> > + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> > + * caller an overflow error has happened.
> > + */
> > +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> > +{
> > +	struct perf_event_mmap_page *pc = mm->userpg;
> > +	u64 first, second, last;
> > +	u64 mask = (u64)(UINT32_MAX) << 32;
> > +
> > +	do {
> > +		first = READ_ONCE(pc->aux_head);
> > +		/* Ensure all reads are done after we read the head */
> > +		smp_rmb();
> > +		second = READ_ONCE(pc->aux_head);
> > +		/* Ensure all reads are done after we read the head */
> > +		smp_rmb();
> > +		last = READ_ONCE(pc->aux_head);
> > +	} while ((first & mask) != (last & mask));
> > +
> > +	return second;
> > +}
> > +
> 
> Hi Leo,
> 
> I had a couple of questions about this bit. If we're assuming that the
> high bytes of 'first' and 'last' are equal, then 'second' is supposed
> to be somewhere in between or equal to 'first' and 'last'.
> 
> If that's the case, wouldn't it be better to return 'last', because it's
> closer to the value at the time of reading?

> And then in that case, if last is returned, then why do a read for
> 'second' at all? Can 'second' be skipped and just read first and last?

Simply to say, the logic can be depicted as:

  step 1: read 'first'
  step 2: read 'second' -> There have no any atomicity risk if 'first'
                           is same with 'last'
  step 3: read 'last'

The key point is if the 'first' and 'last' have the same value in the
high word, there have no any increment for high word in the middle of
'first' and 'last', so we don't worry about the atomicity for 'second'.

But we cannot promise the atomicity for reading 'last', let's see
below sequence:

             CPU(a)                                 CPU(b)
  step 1: read 'first' (high word)
          read 'first' (low word)
  step 2: read 'second' (high word)
          read 'second' (low word)
  step 3: read 'last' (high word)
                                       --> write 'last' (high word)
                                       --> write 'last' (low word)
          read 'last' (low word)


Even 'first' and 'last' have the same high word, but the 'last' cannot
be trusted.

> Also maybe it won't make a difference, but is there a missing smp_rmb()
> between the read of 'last' and 'first'?

Good question, from my understanding, we only need to promise the flow
from step 1 to step 3, it's not necessary to add barrier in the middle
of the two continuous loops.

Thanks for reviewing!

Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23  9:51     ` Leo Yan
@ 2021-08-23 10:57       ` James Clark
  2021-08-23 12:13         ` Leo Yan
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2021-08-23 10:57 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight



On 23/08/2021 10:51, Leo Yan wrote:
> Hi James,
> 
> On Fri, Aug 13, 2021 at 05:22:31PM +0100, James Clark wrote:
>> On 09/08/2021 12:27, Leo Yan wrote:
>>> +/*
>>> + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
>>> + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
>>> + * the issues caused by the below sequence on multiple CPUs: when perf tool
>>> + * accesses either the load operation or the store operation for 64-bit value,
>>> + * on some architectures the operation is divided into two instructions, one
>>> + * is for accessing the low 32-bit value and another is for the high 32-bit;
>>> + * thus these two user operations can give the kernel chances to access the
>>> + * 64-bit value, and thus leads to the unexpected load values.
>>> + *
>>> + *   kernel (64-bit)                        user (32-bit)
>>> + *
>>> + *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
>>> + *       STORE $aux_data      |       ,--->
>>> + *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
>>> + *       STORE ->aux_head   --|-------`     smp_rmb()
>>> + *   }                        |             LOAD $data
>>> + *                            |             smp_mb()
>>> + *                            |             STORE ->aux_tail_lo
>>> + *                            `----------->
>>> + *                                          STORE ->aux_tail_hi
>>> + *
>>> + * For this reason, it's impossible for the perf tool to work correctly when
>>> + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
>>> + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
>>> + * the pointers can be increased monotonically, whatever the buffer size it is,
>>> + * at the end the head and tail can be bigger than 4GB and carry out to the
>>> + * high 32-bit.
>>> + *
>>> + * To mitigate the issues and improve the user experience, we can allow the
>>> + * perf tool working in certain conditions and bail out with error if detect
>>> + * any overflow cannot be handled.
>>> + *
>>> + * For reading the AUX head, it reads out the values for three times, and
>>> + * compares the high 4 bytes of the values between the first time and the last
>>> + * time, if there has no change for high 4 bytes injected by the kernel during
>>> + * the user reading sequence, it's safe for use the second value.
>>> + *
>>> + * When update the AUX tail and detects any carrying in the high 32 bits, it
>>> + * means there have two store operations in user space and it cannot promise
>>> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
>>> + * caller an overflow error has happened.
>>> + */
>>> +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
>>> +{
>>> +	struct perf_event_mmap_page *pc = mm->userpg;
>>> +	u64 first, second, last;
>>> +	u64 mask = (u64)(UINT32_MAX) << 32;
>>> +
>>> +	do {
>>> +		first = READ_ONCE(pc->aux_head);
>>> +		/* Ensure all reads are done after we read the head */
>>> +		smp_rmb();
>>> +		second = READ_ONCE(pc->aux_head);
>>> +		/* Ensure all reads are done after we read the head */
>>> +		smp_rmb();
>>> +		last = READ_ONCE(pc->aux_head);
>>> +	} while ((first & mask) != (last & mask));
>>> +
>>> +	return second;
>>> +}
>>> +
>>
>> Hi Leo,
>>
>> I had a couple of questions about this bit. If we're assuming that the
>> high bytes of 'first' and 'last' are equal, then 'second' is supposed
>> to be somewhere in between or equal to 'first' and 'last'.
>>
>> If that's the case, wouldn't it be better to return 'last', because it's
>> closer to the value at the time of reading?
> 
>> And then in that case, if last is returned, then why do a read for
>> 'second' at all? Can 'second' be skipped and just read first and last?
> 
> Simply to say, the logic can be depicted as:
> 
>   step 1: read 'first'
>   step 2: read 'second' -> There have no any atomicity risk if 'first'
>                            is same with 'last'
>   step 3: read 'last'
> 
> The key point is if the 'first' and 'last' have the same value in the
> high word, there have no any increment for high word in the middle of
> 'first' and 'last', so we don't worry about the atomicity for 'second'.
> 
> But we cannot promise the atomicity for reading 'last', let's see
> below sequence:
> 
>              CPU(a)                                 CPU(b)
>   step 1: read 'first' (high word)
>           read 'first' (low word)
>   step 2: read 'second' (high word)
>           read 'second' (low word)
>   step 3: read 'last' (high word)
>                                        --> write 'last' (high word)
>                                        --> write 'last' (low word)
>           read 'last' (low word)
> 
> 
> Even 'first' and 'last' have the same high word, but the 'last' cannot
> be trusted.
> 
>> Also maybe it won't make a difference, but is there a missing smp_rmb()
>> between the read of 'last' and 'first'?
> 
> Good question, from my understanding, we only need to promise the flow
> from step 1 to step 3, it's not necessary to add barrier in the middle
> of the two continuous loops.
> 
> Thanks for reviewing!
> 

Ok thanks for the explanation, that makes sense now. I do have one other
point about the documentation for the function:

> + * When update the AUX tail and detects any carrying in the high 32 bits, it
> + * means there have two store operations in user space and it cannot promise
> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> + * caller an overflow error has happened.
> + */

I couldn't see how it can ever return -1, it seems like it would loop forever
until it reads the correct value.


> Leo
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 10:57       ` James Clark
@ 2021-08-23 12:13         ` Leo Yan
  2021-08-23 16:00           ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-08-23 12:13 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight

On Mon, Aug 23, 2021 at 11:57:52AM +0100, James Clark wrote:

[...]

> Ok thanks for the explanation, that makes sense now. I do have one other
> point about the documentation for the function:

Welcome!

> > + * When update the AUX tail and detects any carrying in the high 32 bits, it
> > + * means there have two store operations in user space and it cannot promise
> > + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> > + * caller an overflow error has happened.
> > + */
> 
> I couldn't see how it can ever return -1, it seems like it would loop forever
> until it reads the correct value.

I use this chunk comment to address the function
compat_auxtrace_mmap__write_tail():

+int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 mask = (u64)(UINT32_MAX) << 32;
+
+	if (tail & mask)
+		return -1;
+
+	/* Ensure all reads are done before we write the tail out */
+	smp_mb();
+	WRITE_ONCE(pc->aux_tail, tail);
+	return 0;
+}

Please let me know if this is okay or not?  Otherwise, if you think
the format can cause confusion, I'd like to split the comments into
two sections, one section for reading AUX head and another is for
writing AUX tail.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-09 11:27 ` [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
@ 2021-08-23 12:23   ` James Clark
  2021-08-23 13:30     ` Leo Yan
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2021-08-23 12:23 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mike Leach, Will Deacon,
	Peter Zijlstra, Adrian Hunter, Andi Kleen, Mark Rutland,
	Catalin Marinas, Russell King, linux-perf-users,
	linux-arm-kernel, Li Huafei, Jin Yao, Riccardo Mancini,
	Ingo Molnar, Namhyung Kim, Suzuki K Poulose, Mathieu Poirier,
	Alexander Shishkin, Jiri Olsa, John Garry, coresight,
	linux-kernel



On 09/08/2021 12:27, Leo Yan wrote:
> When the tool runs with compat mode on Arm platform, the kernel is in
> 64-bit mode and user space is in 32-bit mode; the user space can use
> instructions "ldrd" and "strd" for 64-bit value atomicity.
> 
> This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
> building, it uses "ldrd" and "strd" instructions to ensure accessing
> atomicity for aux head and tail.  The file arch/arm/util/auxtrace.c is
> built for arm and arm64 building, these two functions are not needed for
> arm64, so check the compiler macro "__arm__" to only include them for
> arm building.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> index b187bddbd01a..c7c7ec0812d5 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -107,3 +107,35 @@ struct auxtrace_record
>  	*err = 0;
>  	return NULL;
>  }
> +
> +#if defined(__arm__)
> +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 result;
> +
> +	__asm__ __volatile__(
> +"	ldrd    %0, %H0, [%1]"
> +	: "=&r" (result)
> +	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
> +	);
> +
> +	return result;
> +}

Hi Leo,

I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h

For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
make it not so easy for Arm. Just wondering if you considered trying to include the
existing one? Or decided that it was easier to duplicate it?

Other than that, I have tested that the change works with a 32bit build with snapshot
and normal mode.

Reviewed by: James Clark <james.clark@arm.com>
Tested by: James Clark <james.clark@arm.com>
 
> +
> +int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +
> +	/* Ensure all reads are done before we write the tail out */
> +	smp_mb();
> +
> +	__asm__ __volatile__(
> +"	strd    %2, %H2, [%1]"
> +	: "=Qo" (pc->aux_tail)
> +	: "r" (&pc->aux_tail), "r" (tail)
> +	);
> +
> +	return 0;
> +}
> +#endif
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 12:23   ` James Clark
@ 2021-08-23 13:30     ` Leo Yan
  2021-08-23 13:39       ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-08-23 13:30 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Mike Leach, Will Deacon,
	Peter Zijlstra, Adrian Hunter, Andi Kleen, Mark Rutland,
	Catalin Marinas, Russell King, linux-perf-users,
	linux-arm-kernel, Li Huafei, Jin Yao, Riccardo Mancini,
	Ingo Molnar, Namhyung Kim, Suzuki K Poulose, Mathieu Poirier,
	Alexander Shishkin, Jiri Olsa, John Garry, coresight,
	linux-kernel

Hi James,

On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:
> 
> 
> On 09/08/2021 12:27, Leo Yan wrote:
> > When the tool runs with compat mode on Arm platform, the kernel is in
> > 64-bit mode and user space is in 32-bit mode; the user space can use
> > instructions "ldrd" and "strd" for 64-bit value atomicity.
> > 
> > This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
> > building, it uses "ldrd" and "strd" instructions to ensure accessing
> > atomicity for aux head and tail.  The file arch/arm/util/auxtrace.c is
> > built for arm and arm64 building, these two functions are not needed for
> > arm64, so check the compiler macro "__arm__" to only include them for
> > arm building.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> > index b187bddbd01a..c7c7ec0812d5 100644
> > --- a/tools/perf/arch/arm/util/auxtrace.c
> > +++ b/tools/perf/arch/arm/util/auxtrace.c
> > @@ -107,3 +107,35 @@ struct auxtrace_record
> >  	*err = 0;
> >  	return NULL;
> >  }
> > +
> > +#if defined(__arm__)
> > +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> > +{
> > +	struct perf_event_mmap_page *pc = mm->userpg;
> > +	u64 result;
> > +
> > +	__asm__ __volatile__(
> > +"	ldrd    %0, %H0, [%1]"
> > +	: "=&r" (result)
> > +	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
> > +	);
> > +
> > +	return result;
> > +}
> 
> Hi Leo,
> 
> I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h

Exactly.

> For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> make it not so easy for Arm. Just wondering if you considered trying to include the
> existing one? Or decided that it was easier to duplicate it?

Good finding!

With you reminding, I recognized that the atomic operations for
arm/arm64 should be improved for user space program.  So far, perf tool
simply uses the compiler's atomic implementations (from
asm-generic/atomic-gcc.h) for arm/arm64; but for a more reliable
implementation, I think we should improve the user space program with
architecture's atomic instructions.

So I think your question should be converted to: should we export the
arm/arm64 atomicity operations to user space program?  Seems to me this
is a challenge work, we need at least to finish below items:

- Support arm64 atomic operations and reuse kernel's
  arch/arm/include/asm/atomic.h;
- Support arm atomic operation and reuse kernel's
  arch/arm/include/asm/atomic.h;
- For aarch32 building, we need to use configurations to distinguish
  different cases, like LPAE, Armv7, and ARMv6 variants (so far I have
  no idea how to use a graceful way to distinguish these different
  building in perf tool).

I am not sure if there have any existed ongoing effort for this part,
if anyone is working on this (or before have started related work),
then definitely we should look into how we can reuse the arch's
atomic headers.

Otherwise, I prefer to firstly merge this patch with dozen lines of
duplicate code; afterwards, we can send a separate patch set to
support arm/arm64 atomic operations in user space.

If any Arm/Arm64 maintainers could shed some light for this part work,
I think it would be very helpful.

> Other than that, I have tested that the change works with a 32bit build with snapshot
> and normal mode.
> 
> Reviewed by: James Clark <james.clark@arm.com>
> Tested by: James Clark <james.clark@arm.com>

Thanks for test and review!

Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 13:30     ` Leo Yan
@ 2021-08-23 13:39       ` Russell King (Oracle)
  2021-08-23 14:54         ` Leo Yan
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-08-23 13:39 UTC (permalink / raw)
  To: Leo Yan
  Cc: James Clark, Arnaldo Carvalho de Melo, Mike Leach, Will Deacon,
	Peter Zijlstra, Adrian Hunter, Andi Kleen, Mark Rutland,
	Catalin Marinas, linux-perf-users, linux-arm-kernel, Li Huafei,
	Jin Yao, Riccardo Mancini, Ingo Molnar, Namhyung Kim,
	Suzuki K Poulose, Mathieu Poirier, Alexander Shishkin, Jiri Olsa,
	John Garry, coresight, linux-kernel

On Mon, Aug 23, 2021 at 09:30:43PM +0800, Leo Yan wrote:
> Hi James,
> 
> On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:
> > 
> > 
> > On 09/08/2021 12:27, Leo Yan wrote:
> > > When the tool runs with compat mode on Arm platform, the kernel is in
> > > 64-bit mode and user space is in 32-bit mode; the user space can use
> > > instructions "ldrd" and "strd" for 64-bit value atomicity.
> > > 
> > > This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
> > > building, it uses "ldrd" and "strd" instructions to ensure accessing
> > > atomicity for aux head and tail.  The file arch/arm/util/auxtrace.c is
> > > built for arm and arm64 building, these two functions are not needed for
> > > arm64, so check the compiler macro "__arm__" to only include them for
> > > arm building.
> > > 
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > > 
> > > diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> > > index b187bddbd01a..c7c7ec0812d5 100644
> > > --- a/tools/perf/arch/arm/util/auxtrace.c
> > > +++ b/tools/perf/arch/arm/util/auxtrace.c
> > > @@ -107,3 +107,35 @@ struct auxtrace_record
> > >  	*err = 0;
> > >  	return NULL;
> > >  }
> > > +
> > > +#if defined(__arm__)
> > > +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> > > +{
> > > +	struct perf_event_mmap_page *pc = mm->userpg;
> > > +	u64 result;
> > > +
> > > +	__asm__ __volatile__(
> > > +"	ldrd    %0, %H0, [%1]"
> > > +	: "=&r" (result)
> > > +	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
> > > +	);
> > > +
> > > +	return result;
> > > +}
> > 
> > Hi Leo,
> > 
> > I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h
> 
> Exactly.
> 
> > For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> > include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> > make it not so easy for Arm. Just wondering if you considered trying to include the
> > existing one? Or decided that it was easier to duplicate it?
> 
> Good finding!
> 
> With you reminding, I recognized that the atomic operations for
> arm/arm64 should be improved for user space program.  So far, perf tool
> simply uses the compiler's atomic implementations (from
> asm-generic/atomic-gcc.h) for arm/arm64; but for a more reliable
> implementation, I think we should improve the user space program with
> architecture's atomic instructions.

No we should not. Sometimes, what's in the kernel is for the kernel's
use only, and not for userspace's use. That may be because what works
in kernel space does not work in userspace.

For example, the ARMv6+ atomic operations can be executed in userspace
_provided_ they are only used on memory which has an exclusive monitor.
They can't be used on anything that is not "normal memory". Prior to
ARMv6, the atomic operations rely on disabling interrupts. That
facility is simply not available to userspace, so these must not be
made available to userspace.

The same applies to bitops.

We've been here before in the past, when the kernel headers were not
separated from the user ABI headers, and people would write programs
that included e.g. bitops.h on x86 because they had optimised bitops
code. This made the userspace programs very non-portable - without
re-implementing userspace versions of this stuff in every userspace
program that did this stuff.

So no, having experienced the effects of this kind of thing in the
past, the kernel should _not_ export architecture specific code in
header files to userspace.

Also, it should be pointed out that by doing so, you create a licensing
issue. If the code is GPLv2, and you build your program such that it
incorporates GPLv2 code, then if the userspace program is not GPLv2
compliant, you have a licensing problem, and in effect the program
can be distributed.

Please do not go down this route.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 13:39       ` Russell King (Oracle)
@ 2021-08-23 14:54         ` Leo Yan
  0 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2021-08-23 14:54 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: James Clark, Arnaldo Carvalho de Melo, Mike Leach, Will Deacon,
	Peter Zijlstra, Adrian Hunter, Andi Kleen, Mark Rutland,
	Catalin Marinas, linux-perf-users, linux-arm-kernel, Li Huafei,
	Jin Yao, Riccardo Mancini, Ingo Molnar, Namhyung Kim,
	Suzuki K Poulose, Mathieu Poirier, Alexander Shishkin, Jiri Olsa,
	John Garry, coresight, linux-kernel

Hi Russell,

On Mon, Aug 23, 2021 at 02:39:18PM +0100, Russell King (Oracle) wrote:
> On Mon, Aug 23, 2021 at 09:30:43PM +0800, Leo Yan wrote:
> > On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:

[...]

> > > For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> > > include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> > > make it not so easy for Arm. Just wondering if you considered trying to include the
> > > existing one? Or decided that it was easier to duplicate it?
> > 
> > Good finding!
> > 
> > With you reminding, I recognized that the atomic operations for
> > arm/arm64 should be improved for user space program.  So far, perf tool
> > simply uses the compiler's atomic implementations (from
> > asm-generic/atomic-gcc.h) for arm/arm64; but for a more reliable
> > implementation, I think we should improve the user space program with
> > architecture's atomic instructions.
> 
> No we should not. Sometimes, what's in the kernel is for the kernel's
> use only, and not for userspace's use. That may be because what works
> in kernel space does not work in userspace.
> 
> For example, the ARMv6+ atomic operations can be executed in userspace
> _provided_ they are only used on memory which has an exclusive monitor.
> They can't be used on anything that is not "normal memory".

Okay, IIUC, the requirement for "normal memory" and exclusive monitor
should also apply on aarch64 for ldrex/strex, Load-Acquire and
Store-Release instructions, etc.  Otherwise, it's heavily dependent on
the exclusive monitors outside the cache coherency domain (but this is
out of the scopes for CPU).

perf tool is very likely to map memory with "normal memory" but we
cannot say it's always true.

So I agree there have risk for exporting the aarch32/aarch64 atomic
headers to user space.

> Prior to
> ARMv6, the atomic operations rely on disabling interrupts. That
> facility is simply not available to userspace, so these must not be
> made available to userspace.
> 
> The same applies to bitops.
> 
> We've been here before in the past, when the kernel headers were not
> separated from the user ABI headers, and people would write programs
> that included e.g. bitops.h on x86 because they had optimised bitops
> code. This made the userspace programs very non-portable - without
> re-implementing userspace versions of this stuff in every userspace
> program that did this stuff.
> 
> So no, having experienced the effects of this kind of thing in the
> past, the kernel should _not_ export architecture specific code in
> header files to userspace.
> 
> Also, it should be pointed out that by doing so, you create a licensing
> issue. If the code is GPLv2, and you build your program such that it
> incorporates GPLv2 code, then if the userspace program is not GPLv2
> compliant, you have a licensing problem, and in effect the program
> can be distributed.
> 
> Please do not go down this route.

Thanks a lot for the suggestion and quick response.

Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 12:13         ` Leo Yan
@ 2021-08-23 16:00           ` James Clark
  2021-08-24  2:13             ` Leo Yan
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2021-08-23 16:00 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight



On 23/08/2021 13:13, Leo Yan wrote:
> On Mon, Aug 23, 2021 at 11:57:52AM +0100, James Clark wrote:
> 
> [...]
> 
>> Ok thanks for the explanation, that makes sense now. I do have one other
>> point about the documentation for the function:
> 
> Welcome!
> 
>>> + * When update the AUX tail and detects any carrying in the high 32 bits, it
>>> + * means there have two store operations in user space and it cannot promise
>>> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
>>> + * caller an overflow error has happened.
>>> + */
>>
>> I couldn't see how it can ever return -1, it seems like it would loop forever
>> until it reads the correct value.
> 
> I use this chunk comment to address the function
> compat_auxtrace_mmap__write_tail():
> 
> +int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +	if (tail & mask)
> +		return -1;
> +
> +	/* Ensure all reads are done before we write the tail out */
> +	smp_mb();
> +	WRITE_ONCE(pc->aux_tail, tail);
> +	return 0;
> +}
> 
> Please let me know if this is okay or not?  Otherwise, if you think
> the format can cause confusion, I'd like to split the comments into
> two sections, one section for reading AUX head and another is for
> writing AUX tail.

I see what you mean now, I think keeping it in one section is fine, I would just
replace "When update the AUX tail and detects" with "When
compat_auxtrace_mmap__write_tail() detects". If the function name is there then
it's clear that it's not the return value of compat_auxtrace_mmap__read_head()

Thanks
James

> 
> Thanks,
> Leo
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
  2021-08-23 16:00           ` James Clark
@ 2021-08-24  2:13             ` Leo Yan
  0 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2021-08-24  2:13 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Adrian Hunter,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Will Deacon, Russell King, Catalin Marinas,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, John Garry,
	Andi Kleen, Riccardo Mancini, Jin Yao, Li Huafei, coresight

On Mon, Aug 23, 2021 at 05:00:14PM +0100, James Clark wrote:

[...]

> >> I couldn't see how it can ever return -1, it seems like it would loop forever
> >> until it reads the correct value.
> > 
> > I use this chunk comment to address the function
> > compat_auxtrace_mmap__write_tail():
> > 
> > +int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> > +{
> > +	struct perf_event_mmap_page *pc = mm->userpg;
> > +	u64 mask = (u64)(UINT32_MAX) << 32;
> > +
> > +	if (tail & mask)
> > +		return -1;
> > +
> > +	/* Ensure all reads are done before we write the tail out */
> > +	smp_mb();
> > +	WRITE_ONCE(pc->aux_tail, tail);
> > +	return 0;
> > +}
> > 
> > Please let me know if this is okay or not?  Otherwise, if you think
> > the format can cause confusion, I'd like to split the comments into
> > two sections, one section for reading AUX head and another is for
> > writing AUX tail.
> 
> I see what you mean now, I think keeping it in one section is fine, I would just
> replace "When update the AUX tail and detects" with "When
> compat_auxtrace_mmap__write_tail() detects". If the function name is there then
> it's clear that it's not the return value of compat_auxtrace_mmap__read_head()

Sure, will update and send out the new patch.

Thanks for suggestion.
Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-08-24  2:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 11:27 [PATCH v1 0/3] perf: Support compat mode for AUX ring buffer Leo Yan
2021-08-09 11:27 ` [PATCH v1 1/3] perf env: Track kernel 64-bit mode in environment Leo Yan
2021-08-09 20:11   ` Arnaldo Carvalho de Melo
2021-08-09 11:27 ` [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
2021-08-09 20:12   ` Arnaldo Carvalho de Melo
2021-08-12  9:23   ` Adrian Hunter
2021-08-13 16:22   ` James Clark
2021-08-23  9:51     ` Leo Yan
2021-08-23 10:57       ` James Clark
2021-08-23 12:13         ` Leo Yan
2021-08-23 16:00           ` James Clark
2021-08-24  2:13             ` Leo Yan
2021-08-09 11:27 ` [PATCH v1 3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail} Leo Yan
2021-08-23 12:23   ` James Clark
2021-08-23 13:30     ` Leo Yan
2021-08-23 13:39       ` Russell King (Oracle)
2021-08-23 14:54         ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).