linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@linaro.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@linaro.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org
Cc: Leo Yan <leo.yan@linaro.org>
Subject: [PATCH v3 10/10] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
Date: Sun,  4 Jul 2021 15:16:44 +0800	[thread overview]
Message-ID: <20210704071644.107397-11-leo.yan@linaro.org> (raw)
In-Reply-To: <20210704071644.107397-1-leo.yan@linaro.org>

When perf runs in compat mode (kernel in 64-bit mode and the perf is in
32-bit mode), the 64-bit value atomicity in the user space cannot be
assured, E.g. on some architectures, the 64-bit value accessing is split
into two instructions, one is the low 32-bit accessing and another is
for the high 32-bit.

This patch introduces two functions compat_auxtrace_mmap__read_head()
and compat_auxtrace_mmap__write_tail(), as their naming indicates, when
perf tool works in compat mode, it uses these two functions to access
the AUX head and tail.  These two functions can allow the perf tool to
work properly in certain conditions, e.g. when perf tool works in
snapshot mode with only using AUX head pointer, or perf tool uses the
AUX buffer and the tail is not bigger than 4GB.

When the perf tool cannot handle the case when the AUX tail is overflow
for 4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and
the caller is changed to bail out for the overflow error.

Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/auxtrace.c |  9 ++--
 tools/perf/util/auxtrace.h | 94 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index db6255b55c90..a430d74b279f 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1766,10 +1766,13 @@ static int __auxtrace_mmap__read(struct mmap *map,
 	mm->prev = head;
 
 	if (!snapshot) {
-		auxtrace_mmap__write_tail(mm, head);
-		if (itr->read_finish) {
-			int err;
+		int err;
 
+		err = auxtrace_mmap__write_tail(mm, head);
+		if (err < 0)
+			return err;
+
+		if (itr->read_finish) {
 			err = itr->read_finish(itr, mm->idx);
 			if (err < 0)
 				return err;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index d68a5e80b217..66de7b6e65ec 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -18,6 +18,8 @@
 #include <asm/bitsperlong.h>
 #include <asm/barrier.h>
 
+#include "env.h"
+
 union perf_event;
 struct perf_session;
 struct evlist;
@@ -440,23 +442,111 @@ struct auxtrace_cache;
 
 #ifdef HAVE_AUXTRACE_SUPPORT
 
+/*
+ * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
+ * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
+ * the issues caused by the below sequence on multiple CPUs: when perf tool
+ * accesses either the load operation or the store operation for 64-bit value,
+ * on some architectures the operation is divided into two instructions, one
+ * is for accessing the low 32-bit value and another is for the high 32-bit;
+ * thus these two user operations can give the kernel chances to access the
+ * 64-bit value, and thus leads to the unexpected load values.
+ *
+ *   kernel (64-bit)                        user (32-bit)
+ *
+ *   if (LOAD ->aux_tail) { --,             LOAD ->aux_head_lo
+ *       STORE $aux_data      |       ,--->
+ *       FLUSH $aux_data      |       |     LOAD ->aux_head_hi
+ *       STORE ->aux_head   --|-------`     smp_rmb()
+ *   }                        |             LOAD $data
+ *                            |             smp_mb()
+ *                            |             STORE ->aux_tail_lo
+ *                            `----------->
+ *                                          STORE ->aux_tail_hi
+ *
+ * For this reason, it's impossible for the perf tool to work correctly when
+ * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
+ * can not simply limit the AUX ring buffer to less than 4GB, the reason is
+ * the pointers can be increased monotonically (e.g in snapshot mode), whatever
+ * the buffer size it is, at the end the head and tail can be bigger than 4GB
+ * and carry out to the high 32-bit.
+ *
+ * To mitigate the issues and improve the user experience, we can allow the
+ * perf tool working in certain conditions and bail out with error if detect
+ * any overflow cannot be handled.
+ *
+ * For reading the AUX head, it reads out the values for three times, and
+ * compares the high 4 bytes of the values between the first time and the last
+ * time, if there has no change for high 4 bytes injected by the kernel during
+ * the user reading sequence, it's safe for use the second value.
+ *
+ * When update the AUX tail and detects any carrying in the high 32 bits, it
+ * means there have two store operations in user space and it cannot promise
+ * the atomicity for 64-bit write, so return '-1' in this case to tell the
+ * caller an overflow error has happened.
+ */
+static inline u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 first, second, last;
+	u64 mask = (u64)(UINT32_MAX) << 32;
+
+	do {
+		first = READ_ONCE(pc->aux_head);
+		/* Ensure all reads are done after we read the head */
+		smp_rmb();
+		second = READ_ONCE(pc->aux_head);
+		/* Ensure all reads are done after we read the head */
+		smp_rmb();
+		last = READ_ONCE(pc->aux_head);
+	} while ((first & mask) != (last & mask));
+
+	return second;
+}
+
+static inline int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm,
+						    u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 mask = (u64)(UINT32_MAX) << 32;
+
+	if (tail & mask)
+		return -1;
+
+	/* Ensure all reads are done before we write the tail out */
+	smp_mb();
+	WRITE_ONCE(pc->aux_tail, tail);
+	return 0;
+}
+
 static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
 {
 	struct perf_event_mmap_page *pc = mm->userpg;
-	u64 head = READ_ONCE(pc->aux_head);
+	u64 head;
+
+#if BITS_PER_LONG == 32
+	if (kernel_is_64_bit)
+		return compat_auxtrace_mmap__read_head(mm);
+#endif
+	head = READ_ONCE(pc->aux_head);
 
 	/* Ensure all reads are done after we read the head */
 	smp_rmb();
 	return head;
 }
 
-static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+static inline int auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
 {
 	struct perf_event_mmap_page *pc = mm->userpg;
 
+#if BITS_PER_LONG == 32
+	if (kernel_is_64_bit)
+		return compat_auxtrace_mmap__write_tail(mm, tail);
+#endif
 	/* Ensure all reads are done before we write the tail out */
 	smp_mb();
 	WRITE_ONCE(pc->aux_tail, tail);
+	return 0;
 }
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
-- 
2.25.1


      parent reply	other threads:[~2021-07-04  7:17 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-04  7:16 [PATCH v3 00/10] perf: Refine barriers for AUX ring buffer Leo Yan
2021-07-04  7:16 ` [PATCH v3 01/10] perf/ring_buffer: Add comment for barriers on " Leo Yan
2021-07-04  7:16 ` [PATCH v3 02/10] coresight: tmc-etr: Add barrier after updating " Leo Yan
2021-07-04  7:16 ` [PATCH v3 03/10] coresight: tmc-etf: Add comment for store ordering Leo Yan
2021-07-04  7:16 ` [PATCH v3 04/10] perf/x86: Add barrier after updating bts Leo Yan
2021-07-04  7:16 ` [PATCH v3 05/10] perf auxtrace: Drop legacy __sync functions Leo Yan
2021-07-10 12:34   ` Adrian Hunter
2021-07-11  7:41     ` Leo Yan
2021-07-04  7:16 ` [PATCH v3 06/10] perf auxtrace: Remove auxtrace_mmap__read_snapshot_head() Leo Yan
2021-07-04  7:16 ` [PATCH v3 07/10] perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT Leo Yan
2021-07-10 12:36   ` Adrian Hunter
2021-07-11  8:08     ` Leo Yan
2021-07-04  7:16 ` [PATCH v3 08/10] perf auxtrace: Use WRITE_ONCE() for updating aux_tail Leo Yan
2021-07-04  7:16 ` [PATCH v3 09/10] perf env: Set kernel bit mode Leo Yan
2021-07-04  7:16 ` Leo Yan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210704071644.107397-11-leo.yan@linaro.org \
    --to=leo.yan@linaro.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=coresight@lists.linaro.org \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --subject='Re: [PATCH v3 10/10] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).