All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] perf: add two new features
@ 2013-06-28 13:22 Adrian Hunter
  2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

Hi

Please consider these two new perf features:
      x86: add ability to calculate TSC from perf sample timestamps
      perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE

This is also a minor fix:
      perf: fix broken union in perf_event_mmap_page

And tests in perf tools:
      perf tools: add test for converting perf time to/from TSC
      perf tools: add 'keep tracking' test


Adrian Hunter (5):
      perf: fix broken union in perf_event_mmap_page
      x86: add ability to calculate TSC from perf sample timestamps
      perf tools: add test for converting perf time to/from TSC
      perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
      perf tools: add 'keep tracking' test

 arch/x86/include/asm/tsc.h          |   1 +
 arch/x86/kernel/cpu/perf_event.c    |   6 ++
 arch/x86/kernel/tsc.c               |   6 ++
 include/linux/perf_event.h          |   1 +
 include/uapi/linux/perf_event.h     |  29 +++++-
 kernel/events/core.c                |  21 ++++-
 tools/perf/Makefile                 |   4 +
 tools/perf/arch/x86/Makefile        |   2 +
 tools/perf/arch/x86/util/tsc.c      |  59 ++++++++++++
 tools/perf/arch/x86/util/tsc.h      |  20 ++++
 tools/perf/tests/builtin-test.c     |  10 ++
 tools/perf/tests/keep-tracking.c    | 168 ++++++++++++++++++++++++++++++++++
 tools/perf/tests/perf-time-to-tsc.c | 177 ++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h            |   2 +
 tools/perf/util/evlist.c            |  32 +++++++
 tools/perf/util/evlist.h            |   5 +
 16 files changed, 537 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/tsc.c
 create mode 100644 tools/perf/arch/x86/util/tsc.h
 create mode 100644 tools/perf/tests/keep-tracking.c
 create mode 100644 tools/perf/tests/perf-time-to-tsc.c


Regards
Adrian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/5] perf: fix broken union in perf_event_mmap_page
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
@ 2013-06-28 13:22 ` Adrian Hunter
  2013-06-28 15:22   ` Peter Zijlstra
  2013-07-24  3:56   ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' tip-bot for Adrian Hunter
  2013-06-28 13:22 ` [PATCH 2/5] x86: add ability to calculate TSC from perf sample timestamps Adrian Hunter
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

The capabilities bits must not be "union'ed" together.
Put them in a separate struct.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/uapi/linux/perf_event.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0b1df41..19f6ee5 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -375,9 +375,11 @@ struct perf_event_mmap_page {
 	__u64	time_running;		/* time event on cpu */
 	union {
 		__u64	capabilities;
-		__u64	cap_usr_time  : 1,
-			cap_usr_rdpmc : 1,
-			cap_____res   : 62;
+		struct {
+			__u64	cap_usr_time		: 1,
+				cap_usr_rdpmc		: 1,
+				cap_____res		: 62;
+		};
 	};
 
 	/*
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/5] x86: add ability to calculate TSC from perf sample timestamps
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
  2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
@ 2013-06-28 13:22 ` Adrian Hunter
  2013-07-24  3:56   ` [tip:perf/core] perf/x86: Add " tip-bot for Adrian Hunter
  2013-06-28 13:22 ` [PATCH 3/5] perf tools: add test for converting perf time to/from TSC Adrian Hunter
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

For modern CPUs, perf clock is directly related to TSC.  TSC
can be calculated from perf clock and vice versa using a simple
calculation.  Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page.  This patch
exports the third.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/include/asm/tsc.h       |  1 +
 arch/x86/kernel/cpu/perf_event.c |  6 ++++++
 arch/x86/kernel/tsc.c            |  6 ++++++
 include/uapi/linux/perf_event.h  | 22 ++++++++++++++++++++--
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index c91e8b9..235be70 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -49,6 +49,7 @@ extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
 extern int check_tsc_unstable(void);
+extern int check_tsc_disabled(void);
 extern unsigned long native_calibrate_tsc(void);
 
 extern int tsc_clocksource_reliable;
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index afc2413..3f74034 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1860,6 +1860,7 @@ static struct pmu pmu = {
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 	userpg->cap_usr_time = 0;
+	userpg->cap_usr_time_zero = 0;
 	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
@@ -1873,6 +1874,11 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
+
+	if (sched_clock_stable && !check_tsc_disabled()) {
+		userpg->cap_usr_time_zero = 1;
+		userpg->time_zero = this_cpu_read(cyc2ns_offset);
+	}
 }
 
 /*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 098b3cf..c810283 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -89,6 +89,12 @@ int check_tsc_unstable(void)
 }
 EXPORT_SYMBOL_GPL(check_tsc_unstable);
 
+int check_tsc_disabled(void)
+{
+	return tsc_disabled;
+}
+EXPORT_SYMBOL_GPL(check_tsc_disabled);
+
 #ifdef CONFIG_X86_TSC
 int __init notsc_setup(char *str)
 {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 19f6ee5..663be3f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -378,7 +378,8 @@ struct perf_event_mmap_page {
 		struct {
 			__u64	cap_usr_time		: 1,
 				cap_usr_rdpmc		: 1,
-				cap_____res		: 62;
+				cap_usr_time_zero	: 1,
+				cap_____res		: 61;
 		};
 	};
 
@@ -420,12 +421,29 @@ struct perf_event_mmap_page {
 	__u16	time_shift;
 	__u32	time_mult;
 	__u64	time_offset;
+	/*
+	 * If cap_usr_time_zero, the hardware clock (e.g. TSC) can be calculated
+	 * from sample timestamps.
+	 *
+	 *   time = timestamp - time_zero;
+	 *   quot = time / time_mult;
+	 *   rem  = time % time_mult;
+	 *   cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
+	 *
+	 * And vice versa:
+	 *
+	 *   quot = cyc >> time_shift;
+	 *   rem  = cyc & ((1 << time_shift) - 1);
+	 *   timestamp = time_zero + quot * time_mult +
+	 *               ((rem * time_mult) >> time_shift);
+	 */
+	__u64	time_zero;
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[120];	/* align to 1k */
+	__u64	__reserved[119];	/* align to 1k */
 
 	/*
 	 * Control data for the mmap() data buffer.
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/5] perf tools: add test for converting perf time to/from TSC
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
  2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
  2013-06-28 13:22 ` [PATCH 2/5] x86: add ability to calculate TSC from perf sample timestamps Adrian Hunter
@ 2013-06-28 13:22 ` Adrian Hunter
  2013-07-24  3:56   ` [tip:perf/core] perf tools: Add test for converting perf time to/ from TSC tip-bot for Adrian Hunter
  2013-06-28 13:22 ` [PATCH 4/5] perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE Adrian Hunter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

The test uses the newly added cap_usr_time_zero and time_zero of
perf_event_mmap_page.  TSC from rdtsc is compared with the time
from 2 perf events.  The test passes if the calculated times are
all in the correct order.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile                 |   3 +
 tools/perf/arch/x86/Makefile        |   2 +
 tools/perf/arch/x86/util/tsc.c      |  59 ++++++++++++
 tools/perf/arch/x86/util/tsc.h      |  20 ++++
 tools/perf/tests/builtin-test.c     |   6 ++
 tools/perf/tests/perf-time-to-tsc.c | 177 ++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h            |   1 +
 7 files changed, 268 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/tsc.c
 create mode 100644 tools/perf/arch/x86/util/tsc.h
 create mode 100644 tools/perf/tests/perf-time-to-tsc.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 203cb0e..7a8dc89 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -390,6 +390,9 @@ LIB_OBJS += $(OUTPUT)tests/bp_signal.o
 LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o
 LIB_OBJS += $(OUTPUT)tests/task-exit.o
 LIB_OBJS += $(OUTPUT)tests/sw-clock.o
+ifeq ($(ARCH),x86)
+LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o
+endif
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 815841c..8801fe0 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -6,3 +6,5 @@ ifndef NO_LIBUNWIND
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
 endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/tsc.o
+LIB_H += arch/$(ARCH)/util/tsc.h
diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
new file mode 100644
index 0000000..f111744
--- /dev/null
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -0,0 +1,59 @@
+#include <stdbool.h>
+#include <errno.h>
+
+#include <linux/perf_event.h>
+
+#include "../../perf.h"
+#include "../../util/types.h"
+#include "../../util/debug.h"
+#include "tsc.h"
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc)
+{
+	u64 time, quot, rem;
+
+	time = ns - tc->time_zero;
+	quot = time / tc->time_mult;
+	rem  = time % tc->time_mult;
+	return (quot << tc->time_shift) +
+	       (rem << tc->time_shift) / tc->time_mult;
+}
+
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
+{
+	u64 quot, rem;
+
+	quot = cyc >> tc->time_shift;
+	rem  = cyc & ((1 << tc->time_shift) - 1);
+	return tc->time_zero + quot * tc->time_mult +
+	       ((rem * tc->time_mult) >> tc->time_shift);
+}
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+			     struct perf_tsc_conversion *tc)
+{
+	bool cap_usr_time_zero;
+	u32 seq;
+	int i = 0;
+
+	while (1) {
+		seq = pc->lock;
+		rmb();
+		tc->time_mult = pc->time_mult;
+		tc->time_shift = pc->time_shift;
+		tc->time_zero = pc->time_zero;
+		cap_usr_time_zero = pc->cap_usr_time_zero;
+		rmb();
+		if (pc->lock == seq && !(seq & 1))
+			break;
+		if (++i > 10000) {
+			pr_debug("failed to get perf_event_mmap_page lock\n");
+			return -EINVAL;
+		}
+	}
+
+	if (!cap_usr_time_zero)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
diff --git a/tools/perf/arch/x86/util/tsc.h b/tools/perf/arch/x86/util/tsc.h
new file mode 100644
index 0000000..a24dec8
--- /dev/null
+++ b/tools/perf/arch/x86/util/tsc.h
@@ -0,0 +1,20 @@
+#ifndef TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
+#define TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
+
+#include "../../util/types.h"
+
+struct perf_tsc_conversion {
+	u16 time_shift;
+	u32 time_mult;
+	u64 time_zero;
+};
+
+struct perf_event_mmap_page;
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+			     struct perf_tsc_conversion *tc);
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
+
+#endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 35b45f1466..b7b4049 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -93,6 +93,12 @@ static struct test {
 		.desc = "Test software clock events have valid period values",
 		.func = test__sw_clock_freq,
 	},
+#if defined(__x86_64__) || defined(__i386__)
+	{
+		.desc = "Test converting perf time to TSC",
+		.func = test__perf_time_to_tsc,
+	},
+#endif
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
new file mode 100644
index 0000000..0ab61b1
--- /dev/null
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -0,0 +1,177 @@
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <sys/prctl.h>
+
+#include "parse-events.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "thread_map.h"
+#include "cpumap.h"
+#include "tests.h"
+
+#include "../arch/x86/util/tsc.h"
+
+#define CHECK__(x) {				\
+	while ((x) < 0) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+#define CHECK_NOT_NULL__(x) {			\
+	while ((x) == NULL) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+static u64 rdtsc(void)
+{
+	unsigned int low, high;
+
+	asm volatile("rdtsc" : "=a" (low), "=d" (high));
+
+	return low | ((u64)high) << 32;
+}
+
+/**
+ * test__perf_time_to_tsc - test converting perf time to TSC.
+ *
+ * This function implements a test that checks that the conversion of perf time
+ * to and from TSC is consistent with the order of events.  If the test passes
+ * %0 is returned, otherwise %-1 is returned.  If TSC conversion is not
+ * supported then then the test passes but " (not supported)" is printed.
+ */
+int test__perf_time_to_tsc(void)
+{
+	struct perf_record_opts opts = {
+		.mmap_pages	     = UINT_MAX,
+		.user_freq	     = UINT_MAX,
+		.user_interval	     = ULLONG_MAX,
+		.freq		     = 4000,
+		.target		     = {
+			.uses_mmap   = true,
+		},
+		.sample_time	     = true,
+	};
+	struct thread_map *threads = NULL;
+	struct cpu_map *cpus = NULL;
+	struct perf_evlist *evlist = NULL;
+	struct perf_evsel *evsel = NULL;
+	int err = -1, ret, i;
+	const char *comm1, *comm2;
+	struct perf_tsc_conversion tc;
+	struct perf_event_mmap_page *pc;
+	union perf_event *event;
+	u64 test_tsc, comm1_tsc, comm2_tsc;
+	u64 test_time, comm1_time = 0, comm2_time = 0;
+
+	threads = thread_map__new(-1, getpid(), UINT_MAX);
+	CHECK_NOT_NULL__(threads);
+
+	cpus = cpu_map__new(NULL);
+	CHECK_NOT_NULL__(cpus);
+
+	evlist = perf_evlist__new();
+	CHECK_NOT_NULL__(evlist);
+
+	perf_evlist__set_maps(evlist, cpus, threads);
+
+	CHECK__(parse_events(evlist, "cycles:u"));
+
+	perf_evlist__config(evlist, &opts);
+
+	evsel = perf_evlist__first(evlist);
+
+	evsel->attr.comm = 1;
+	evsel->attr.disabled = 1;
+	evsel->attr.enable_on_exec = 0;
+
+	CHECK__(perf_evlist__open(evlist));
+
+	CHECK__(perf_evlist__mmap(evlist, UINT_MAX, false));
+
+	pc = evlist->mmap[0].base;
+	ret = perf_read_tsc_conversion(pc, &tc);
+	if (ret) {
+		if (ret == -EOPNOTSUPP) {
+			fprintf(stderr, " (not supported)");
+			return 0;
+		}
+		goto out_err;
+	}
+
+	perf_evlist__enable(evlist);
+
+	comm1 = "Test COMM 1";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm1, 0, 0, 0));
+
+	test_tsc = rdtsc();
+
+	comm2 = "Test COMM 2";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm2, 0, 0, 0));
+
+	perf_evlist__disable(evlist);
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+			struct perf_sample sample;
+
+			if (event->header.type != PERF_RECORD_COMM ||
+			    (pid_t)event->comm.pid != getpid() ||
+			    (pid_t)event->comm.tid != getpid())
+				continue;
+
+			if (strcmp(event->comm.comm, comm1) == 0) {
+				CHECK__(perf_evsel__parse_sample(evsel, event,
+								 &sample));
+				comm1_time = sample.time;
+			}
+			if (strcmp(event->comm.comm, comm2) == 0) {
+				CHECK__(perf_evsel__parse_sample(evsel, event,
+								 &sample));
+				comm2_time = sample.time;
+			}
+		}
+	}
+
+	if (!comm1_time || !comm2_time)
+		goto out_err;
+
+	test_time = tsc_to_perf_time(test_tsc, &tc);
+	comm1_tsc = perf_time_to_tsc(comm1_time, &tc);
+	comm2_tsc = perf_time_to_tsc(comm2_time, &tc);
+
+	pr_debug("1st event perf time %"PRIu64" tsc %"PRIu64"\n",
+		 comm1_time, comm1_tsc);
+	pr_debug("rdtsc          time %"PRIu64" tsc %"PRIu64"\n",
+		 test_time, test_tsc);
+	pr_debug("2nd event perf time %"PRIu64" tsc %"PRIu64"\n",
+		 comm2_time, comm2_tsc);
+
+	if (test_time <= comm1_time ||
+	    test_time >= comm2_time)
+		goto out_err;
+
+	if (test_tsc <= comm1_tsc ||
+	    test_tsc >= comm2_tsc)
+		goto out_err;
+
+	err = 0;
+
+out_err:
+	if (evlist) {
+		perf_evlist__disable(evlist);
+		perf_evlist__munmap(evlist);
+		perf_evlist__close(evlist);
+		perf_evlist__delete(evlist);
+	}
+	if (cpus)
+		cpu_map__delete(cpus);
+	if (threads)
+		thread_map__delete(threads);
+
+	return err;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index dd7feae..107696e 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -27,5 +27,6 @@ int test__bp_signal(void);
 int test__bp_signal_overflow(void);
 int test__task_exit(void);
 int test__sw_clock_freq(void);
+int test__perf_time_to_tsc(void);
 
 #endif /* TESTS_H */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/5] perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
                   ` (2 preceding siblings ...)
  2013-06-28 13:22 ` [PATCH 3/5] perf tools: add test for converting perf time to/from TSC Adrian Hunter
@ 2013-06-28 13:22 ` Adrian Hunter
  2013-06-28 13:22 ` [PATCH 5/5] perf tools: add 'keep tracking' test Adrian Hunter
  2013-06-28 15:27 ` [PATCH 0/5] perf: add two new features Peter Zijlstra
  5 siblings, 0 replies; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

Make it possible to disable an event but continue to receive
"tracking" events i.e. continue to receive mmap, comm, task events.
The flag is updated by both PERF_EVENT_IOC_DISABLE and
PERF_EVENT_IOC_ENABLE.  The flag is cleared by prctl
PR_TASK_PERF_EVENTS_DISABLE.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            | 21 +++++++++++++++++++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 50b3efd..789eeeb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -436,6 +436,7 @@ struct perf_event {
 	struct perf_cgroup		*cgrp; /* cgroup event is attach to */
 	int				cgrp_defer_enabled;
 #endif
+	int				keep_tracking;
 
 #endif /* CONFIG_PERF_EVENTS */
 };
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 663be3f..7b4ecfb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -324,6 +324,7 @@ struct perf_event_attr {
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
+	PERF_IOC_KEEP_TRACKING		= 1U << 1,
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1db3af9..0c1fbe9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3495,6 +3495,16 @@ static inline int perf_fget_light(int fd, struct fd *p)
 	return 0;
 }
 
+static void perf_event_keep_tracking(struct perf_event *event, u32 flags)
+{
+	int keep_tracking = !!(flags & PERF_IOC_KEEP_TRACKING);
+
+	if (flags & PERF_IOC_FLAG_GROUP)
+		event->group_leader->keep_tracking = keep_tracking;
+	else
+		event->keep_tracking = keep_tracking;
+}
+
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
@@ -3510,6 +3520,7 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		func = perf_event_enable;
 		break;
 	case PERF_EVENT_IOC_DISABLE:
+		perf_event_keep_tracking(event, flags);
 		func = perf_event_disable;
 		break;
 	case PERF_EVENT_IOC_RESET:
@@ -3552,6 +3563,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	else
 		perf_event_for_each_child(event, func);
 
+	if (cmd == PERF_EVENT_IOC_ENABLE)
+		perf_event_keep_tracking(event, flags);
+
 	return 0;
 }
 
@@ -3572,8 +3586,10 @@ int perf_event_task_disable(void)
 	struct perf_event *event;
 
 	mutex_lock(&current->perf_event_mutex);
-	list_for_each_entry(event, &current->perf_event_list, owner_entry)
+	list_for_each_entry(event, &current->perf_event_list, owner_entry) {
 		perf_event_for_each_child(event, perf_event_disable);
+		event->keep_tracking = 0;
+	}
 	mutex_unlock(&current->perf_event_mutex);
 
 	return 0;
@@ -4670,7 +4686,8 @@ perf_event_aux_ctx(struct perf_event_context *ctx,
 	struct perf_event *event;
 
 	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
-		if (event->state < PERF_EVENT_STATE_INACTIVE)
+		if (event->state < PERF_EVENT_STATE_INACTIVE &&
+		    !event->keep_tracking)
 			continue;
 		if (!event_filter_match(event))
 			continue;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/5] perf tools: add 'keep tracking' test
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
                   ` (3 preceding siblings ...)
  2013-06-28 13:22 ` [PATCH 4/5] perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE Adrian Hunter
@ 2013-06-28 13:22 ` Adrian Hunter
  2013-06-28 15:27 ` [PATCH 0/5] perf: add two new features Peter Zijlstra
  5 siblings, 0 replies; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 13:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Arnaldo Carvalho de Melo,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian, Adrian Hunter

Add a test for the newly added 'keep tracking' flag for
the PERF_EVENT_IOC_DISABLE IOCTL.  The test checks that
comm events are present when the event is disabled with
the 'keep tracking' flag set.  The test also checks that
comm events are not present when the event is disabled
without the flag set.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile              |   1 +
 tools/perf/tests/builtin-test.c  |   4 +
 tools/perf/tests/keep-tracking.c | 168 +++++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h         |   1 +
 tools/perf/util/evlist.c         |  32 ++++++++
 tools/perf/util/evlist.h         |   5 ++
 6 files changed, 211 insertions(+)
 create mode 100644 tools/perf/tests/keep-tracking.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7a8dc89..9c3642f 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -393,6 +393,7 @@ LIB_OBJS += $(OUTPUT)tests/sw-clock.o
 ifeq ($(ARCH),x86)
 LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o
 endif
+LIB_OBJS += $(OUTPUT)tests/keep-tracking.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b7b4049..b0b2b88 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -100,6 +100,10 @@ static struct test {
 	},
 #endif
 	{
+		.desc = "Test event disable IOCTL 'keep tracking' flag",
+		.func = test__keep_tracking,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
new file mode 100644
index 0000000..ce6a1a1
--- /dev/null
+++ b/tools/perf/tests/keep-tracking.c
@@ -0,0 +1,168 @@
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+
+#include "parse-events.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "thread_map.h"
+#include "cpumap.h"
+#include "tests.h"
+
+#define CHECK__(x) {				\
+	while ((x) < 0) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+#define CHECK_NOT_NULL__(x) {			\
+	while ((x) == NULL) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+static int find_comm(struct perf_evlist *evlist, const char *comm)
+{
+	union perf_event *event;
+	int i, found;
+
+	found = 0;
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+			if (event->header.type == PERF_RECORD_COMM &&
+			    (pid_t)event->comm.pid == getpid() &&
+			    (pid_t)event->comm.tid == getpid() &&
+			    strcmp(event->comm.comm, comm) == 0)
+				found += 1;
+		}
+	}
+	return found;
+}
+
+/**
+ * test__keep_tracking - test event disable IOCTL with 'keep tracking' flag.
+ *
+ * This function implements a test that checks that tracking events continue
+ * when an event is disabled with the 'keep tracking flag'.  If the test passes
+ * %0 is returned, otherwise %-1 is returned.
+ */
+int test__keep_tracking(void)
+{
+	struct perf_record_opts opts = {
+		.mmap_pages	     = UINT_MAX,
+		.user_freq	     = UINT_MAX,
+		.user_interval	     = ULLONG_MAX,
+		.freq		     = 4000,
+		.target		     = {
+			.uses_mmap   = true,
+		},
+	};
+	struct thread_map *threads = NULL;
+	struct cpu_map *cpus = NULL;
+	struct perf_evlist *evlist = NULL;
+	struct perf_evsel *evsel = NULL;
+	int found, err = -1;
+	const char *comm;
+
+	threads = thread_map__new(-1, getpid(), UINT_MAX);
+	CHECK_NOT_NULL__(threads);
+
+	cpus = cpu_map__new(NULL);
+	CHECK_NOT_NULL__(cpus);
+
+	evlist = perf_evlist__new();
+	CHECK_NOT_NULL__(evlist);
+
+	perf_evlist__set_maps(evlist, cpus, threads);
+
+	CHECK__(parse_events(evlist, "cycles:u"));
+
+	perf_evlist__config(evlist, &opts);
+
+	evsel = perf_evlist__first(evlist);
+
+	evsel->attr.comm = 1;
+	evsel->attr.disabled = 1;
+	evsel->attr.enable_on_exec = 0;
+
+	CHECK__(perf_evlist__open(evlist));
+
+	CHECK__(perf_evlist__mmap(evlist, UINT_MAX, false));
+
+	/*
+	 * First, test that a 'comm' event can be found when the event is
+	 * enabled.
+	 */
+
+	perf_evlist__enable(evlist);
+
+	comm = "Test COMM 1";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm, 0, 0, 0));
+
+	perf_evlist__disable(evlist);
+
+	found = find_comm(evlist, comm);
+	if (found != 1) {
+		pr_debug("Failed to find tracking event.\n");
+		goto out_err;
+	}
+
+	/*
+	 * Secondly, test that a 'comm' event cannot be found when the event is
+	 * disabled without the 'keep tracking' flag.
+	 */
+
+	perf_evlist__enable(evlist);
+
+	CHECK__(perf_evlist__disable_event(evlist, evsel, 0));
+
+	comm = "Test COMM 2";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm, 0, 0, 0));
+
+	perf_evlist__disable(evlist);
+
+	found = find_comm(evlist, comm);
+	if (found) {
+		pr_debug("Found tracking event with the event disabled.\n");
+		goto out_err;
+	}
+
+	/*
+	 * Finally, test that a 'comm' event can be found when the event is
+	 * disabled with the 'keep tracking' flag.
+	 */
+
+	perf_evlist__enable(evlist);
+
+	CHECK__(perf_evlist__disable_event(evlist, evsel,
+					   PERF_IOC_KEEP_TRACKING));
+
+	comm = "Test COMM 3";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm, 0, 0, 0));
+
+	perf_evlist__disable(evlist);
+
+	found = find_comm(evlist, comm);
+	if (found != 1) {
+		pr_debug("PERF_IOC_KEEP_TRACKING flag had no effect.\n");
+		goto out_err;
+	}
+
+	err = 0;
+
+out_err:
+	if (evlist) {
+		perf_evlist__disable(evlist);
+		perf_evlist__munmap(evlist);
+		perf_evlist__close(evlist);
+		perf_evlist__delete(evlist);
+	}
+	if (cpus)
+		cpu_map__delete(cpus);
+	if (threads)
+		thread_map__delete(threads);
+
+	return err;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 107696e..971172a 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -28,5 +28,6 @@ int test__bp_signal_overflow(void);
 int test__task_exit(void);
 int test__sw_clock_freq(void);
 int test__perf_time_to_tsc(void);
+int test__keep_tracking(void);
 
 #endif /* TESTS_H */
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 78331da..abaf63b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -303,6 +303,38 @@ void perf_evlist__enable(struct perf_evlist *evlist)
 	}
 }
 
+int perf_evlist__disable_event(struct perf_evlist *evlist,
+			       struct perf_evsel *evsel, int flags)
+{
+	int cpu, thread, err;
+
+	for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
+		for (thread = 0; thread < evlist->threads->nr; thread++) {
+			err = ioctl(FD(evsel, cpu, thread),
+				    PERF_EVENT_IOC_DISABLE, flags);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+int perf_evlist__enable_event(struct perf_evlist *evlist,
+			      struct perf_evsel *evsel, int flags)
+{
+	int cpu, thread, err;
+
+	for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
+		for (thread = 0; thread < evlist->threads->nr; thread++) {
+			err = ioctl(FD(evsel, cpu, thread),
+				    PERF_EVENT_IOC_ENABLE, flags);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
 static int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
 	int nr_cpus = cpu_map__nr(evlist->cpus);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b1be475..3531516 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -104,6 +104,11 @@ void perf_evlist__munmap(struct perf_evlist *evlist);
 void perf_evlist__disable(struct perf_evlist *evlist);
 void perf_evlist__enable(struct perf_evlist *evlist);
 
+int perf_evlist__disable_event(struct perf_evlist *evlist,
+			       struct perf_evsel *evsel, int flags);
+int perf_evlist__enable_event(struct perf_evlist *evlist,
+			      struct perf_evsel *evsel, int flags);
+
 void perf_evlist__set_selected(struct perf_evlist *evlist,
 			       struct perf_evsel *evsel);
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] perf: fix broken union in perf_event_mmap_page
  2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
@ 2013-06-28 15:22   ` Peter Zijlstra
  2013-07-16 11:51     ` H. Peter Anvin
  2013-07-24  3:56   ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' tip-bot for Adrian Hunter
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-06-28 15:22 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On Fri, Jun 28, 2013 at 04:22:17PM +0300, Adrian Hunter wrote:
> The capabilities bits must not be "union'ed" together.
> Put them in a separate struct.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  include/uapi/linux/perf_event.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0b1df41..19f6ee5 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -375,9 +375,11 @@ struct perf_event_mmap_page {
>  	__u64	time_running;		/* time event on cpu */
>  	union {
>  		__u64	capabilities;
> -		__u64	cap_usr_time  : 1,
> -			cap_usr_rdpmc : 1,
> -			cap_____res   : 62;
> +		struct {
> +			__u64	cap_usr_time		: 1,
> +				cap_usr_rdpmc		: 1,
> +				cap_____res		: 62;
> +		};
>  	};

Ick, it did that!? and here I thought there was a difference between:

  int foo:1,
      bar:1;

and

  int foo:1;
  int bar:1;

That made all the difference in this particular case. I guess I should
go read the language spec more carefully next time.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] perf: add two new features
  2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
                   ` (4 preceding siblings ...)
  2013-06-28 13:22 ` [PATCH 5/5] perf tools: add 'keep tracking' test Adrian Hunter
@ 2013-06-28 15:27 ` Peter Zijlstra
  2013-06-28 19:22   ` Adrian Hunter
  5 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-06-28 15:27 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On Fri, Jun 28, 2013 at 04:22:16PM +0300, Adrian Hunter wrote:
> Hi
> 
> Please consider these two new perf features:
>       x86: add ability to calculate TSC from perf sample timestamps
>       perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE

Please explain to us why you'd like to do this..

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] perf: add two new features
  2013-06-28 15:27 ` [PATCH 0/5] perf: add two new features Peter Zijlstra
@ 2013-06-28 19:22   ` Adrian Hunter
  2013-07-16  6:22     ` Adrian Hunter
  0 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-06-28 19:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On 28/06/2013 6:27 p.m., Peter Zijlstra wrote:
> On Fri, Jun 28, 2013 at 04:22:16PM +0300, Adrian Hunter wrote:
>> Hi
>>
>> Please consider these two new perf features:
>>        x86: add ability to calculate TSC from perf sample timestamps
>>        perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
>
> Please explain to us why you'd like to do this..

I will see what information I can dig up.  The short answer is that I 
need to disable and re-enable a perf event but still be able to map IPs 
to their DSOs and symbols - which means not losing mmap events.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] perf: add two new features
  2013-06-28 19:22   ` Adrian Hunter
@ 2013-07-16  6:22     ` Adrian Hunter
  2013-07-16 14:34       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-07-16  6:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On 28/06/13 22:22, Adrian Hunter wrote:
> On 28/06/2013 6:27 p.m., Peter Zijlstra wrote:
>> On Fri, Jun 28, 2013 at 04:22:16PM +0300, Adrian Hunter wrote:
>>> Hi
>>>
>>> Please consider these two new perf features:
>>>        x86: add ability to calculate TSC from perf sample timestamps
>>>        perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
>>
>> Please explain to us why you'd like to do this..
> 
> I will see what information I can dig up.  The short answer is that I need
> to disable and re-enable a perf event but still be able to map IPs to their
> DSOs and symbols - which means not losing mmap events.

Any other comments?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] perf: fix broken union in perf_event_mmap_page
  2013-06-28 15:22   ` Peter Zijlstra
@ 2013-07-16 11:51     ` H. Peter Anvin
  0 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2013-07-16 11:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Ingo Molnar, Thomas Gleixner,
	Arnaldo Carvalho de Melo, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On 06/28/2013 08:22 AM, Peter Zijlstra wrote:
> 
> Ick, it did that!? and here I thought there was a difference between:
> 
>   int foo:1,
>       bar:1;
> 
> and
> 
>   int foo:1;
>   int bar:1;
> 
> That made all the difference in this particular case. I guess I should
> go read the language spec more carefully next time.
> 

There is absolutely no difference there...

	-hpa


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] perf: add two new features
  2013-07-16  6:22     ` Adrian Hunter
@ 2013-07-16 14:34       ` Peter Zijlstra
  2013-07-17 11:28         ` Adrian Hunter
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-07-16 14:34 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Tue, Jul 16, 2013 at 09:22:00AM +0300, Adrian Hunter wrote:
> On 28/06/13 22:22, Adrian Hunter wrote:
> > On 28/06/2013 6:27 p.m., Peter Zijlstra wrote:
> >> On Fri, Jun 28, 2013 at 04:22:16PM +0300, Adrian Hunter wrote:
> >>> Hi
> >>>
> >>> Please consider these two new perf features:
> >>>        x86: add ability to calculate TSC from perf sample timestamps
> >>>        perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
> >>
> >> Please explain to us why you'd like to do this..
> > 
> > I will see what information I can dig up.  The short answer is that I need
> > to disable and re-enable a perf event but still be able to map IPs to their
> > DSOs and symbols - which means not losing mmap events.
> 
> Any other comments?

Ah, thanks for the reminder.. well, I've applied patches 1-3 as those seem
useful on their own. I'm not entirely convinced about the 'keep tracking' thing
though.

It seems to me you could get the same by opening a second event into the same
buffer and keeping that enabled.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] perf: add two new features
  2013-07-16 14:34       ` Peter Zijlstra
@ 2013-07-17 11:28         ` Adrian Hunter
  0 siblings, 0 replies; 39+ messages in thread
From: Adrian Hunter @ 2013-07-17 11:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H Peter Anvin,
	Arnaldo Carvalho de Melo, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On 16/07/13 17:34, Peter Zijlstra wrote:
> On Tue, Jul 16, 2013 at 09:22:00AM +0300, Adrian Hunter wrote:
>> On 28/06/13 22:22, Adrian Hunter wrote:
>>> On 28/06/2013 6:27 p.m., Peter Zijlstra wrote:
>>>> On Fri, Jun 28, 2013 at 04:22:16PM +0300, Adrian Hunter wrote:
>>>>> Hi
>>>>>
>>>>> Please consider these two new perf features:
>>>>>        x86: add ability to calculate TSC from perf sample timestamps
>>>>>        perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE
>>>>
>>>> Please explain to us why you'd like to do this..
>>>
>>> I will see what information I can dig up.  The short answer is that I need
>>> to disable and re-enable a perf event but still be able to map IPs to their
>>> DSOs and symbols - which means not losing mmap events.
>>
>> Any other comments?
> 
> Ah, thanks for the reminder.. well, I've applied patches 1-3 as those seem
> useful on their own. I'm not entirely convinced about the 'keep tracking' thing
> though.
> 
> It seems to me you could get the same by opening a second event into the same
> buffer and keeping that enabled.

In that case I would like a dummy event to use for that purpose.  A software
event could be set aside e.g.

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 627bbcf..7ea4ccd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -109,6 +109,7 @@ enum perf_sw_ids {
        PERF_COUNT_SW_PAGE_FAULTS_MAJ           = 6,
        PERF_COUNT_SW_ALIGNMENT_FAULTS          = 7,
        PERF_COUNT_SW_EMULATION_FAULTS          = 8,
+       PERF_COUNT_SW_DUMMY                     = 9,

        PERF_COUNT_SW_MAX,                      /* non-ABI */
 };



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
  2013-06-28 15:22   ` Peter Zijlstra
@ 2013-07-24  3:56   ` tip-bot for Adrian Hunter
  2013-09-17 20:23     ` Vince Weaver
  1 sibling, 1 reply; 39+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-07-24  3:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, adrian.hunter, tglx

Commit-ID:  860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
Gitweb:     http://git.kernel.org/tip/860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 28 Jun 2013 16:22:17 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 23 Jul 2013 12:17:10 +0200

perf: Fix broken union in 'struct perf_event_mmap_page'

The capabilities bits must not be "union'ed" together.
Put them in a separate struct.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/uapi/linux/perf_event.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 00d8274..0041aed 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -375,9 +375,11 @@ struct perf_event_mmap_page {
 	__u64	time_running;		/* time event on cpu */
 	union {
 		__u64	capabilities;
-		__u64	cap_usr_time  : 1,
-			cap_usr_rdpmc : 1,
-			cap_____res   : 62;
+		struct {
+			__u64	cap_usr_time		: 1,
+				cap_usr_rdpmc		: 1,
+				cap_____res		: 62;
+		};
 	};
 
 	/*

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf/x86: Add ability to calculate TSC from perf sample timestamps
  2013-06-28 13:22 ` [PATCH 2/5] x86: add ability to calculate TSC from perf sample timestamps Adrian Hunter
@ 2013-07-24  3:56   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-07-24  3:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, adrian.hunter, tglx

Commit-ID:  c73deb6aecda2955716f31572516f09d930ef450
Gitweb:     http://git.kernel.org/tip/c73deb6aecda2955716f31572516f09d930ef450
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 28 Jun 2013 16:22:18 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 23 Jul 2013 12:17:45 +0200

perf/x86: Add ability to calculate TSC from perf sample timestamps

For modern CPUs, perf clock is directly related to TSC.  TSC
can be calculated from perf clock and vice versa using a simple
calculation.  Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page.  This patch
exports the third.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/tsc.h       |  1 +
 arch/x86/kernel/cpu/perf_event.c |  6 ++++++
 arch/x86/kernel/tsc.c            |  6 ++++++
 include/uapi/linux/perf_event.h  | 22 ++++++++++++++++++++--
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index c91e8b9..235be70 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -49,6 +49,7 @@ extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
 extern int check_tsc_unstable(void);
+extern int check_tsc_disabled(void);
 extern unsigned long native_calibrate_tsc(void);
 
 extern int tsc_clocksource_reliable;
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index a7c7305..8355c84 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1884,6 +1884,7 @@ static struct pmu pmu = {
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 	userpg->cap_usr_time = 0;
+	userpg->cap_usr_time_zero = 0;
 	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
@@ -1897,6 +1898,11 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
+
+	if (sched_clock_stable && !check_tsc_disabled()) {
+		userpg->cap_usr_time_zero = 1;
+		userpg->time_zero = this_cpu_read(cyc2ns_offset);
+	}
 }
 
 /*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6ff4924..930e5d4 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -89,6 +89,12 @@ int check_tsc_unstable(void)
 }
 EXPORT_SYMBOL_GPL(check_tsc_unstable);
 
+int check_tsc_disabled(void)
+{
+	return tsc_disabled;
+}
+EXPORT_SYMBOL_GPL(check_tsc_disabled);
+
 #ifdef CONFIG_X86_TSC
 int __init notsc_setup(char *str)
 {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0041aed..efef1d3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -378,7 +378,8 @@ struct perf_event_mmap_page {
 		struct {
 			__u64	cap_usr_time		: 1,
 				cap_usr_rdpmc		: 1,
-				cap_____res		: 62;
+				cap_usr_time_zero	: 1,
+				cap_____res		: 61;
 		};
 	};
 
@@ -420,12 +421,29 @@ struct perf_event_mmap_page {
 	__u16	time_shift;
 	__u32	time_mult;
 	__u64	time_offset;
+	/*
+	 * If cap_usr_time_zero, the hardware clock (e.g. TSC) can be calculated
+	 * from sample timestamps.
+	 *
+	 *   time = timestamp - time_zero;
+	 *   quot = time / time_mult;
+	 *   rem  = time % time_mult;
+	 *   cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
+	 *
+	 * And vice versa:
+	 *
+	 *   quot = cyc >> time_shift;
+	 *   rem  = cyc & ((1 << time_shift) - 1);
+	 *   timestamp = time_zero + quot * time_mult +
+	 *               ((rem * time_mult) >> time_shift);
+	 */
+	__u64	time_zero;
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[120];	/* align to 1k */
+	__u64	__reserved[119];	/* align to 1k */
 
 	/*
 	 * Control data for the mmap() data buffer.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf tools: Add test for converting perf time to/ from TSC
  2013-06-28 13:22 ` [PATCH 3/5] perf tools: add test for converting perf time to/from TSC Adrian Hunter
@ 2013-07-24  3:56   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-07-24  3:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, hpa, mingo, a.p.zijlstra, jolsa, adrian.hunter, tglx

Commit-ID:  3bd5a5fc8c6b9fe769777abf74b0ab5fbd7930b4
Gitweb:     http://git.kernel.org/tip/3bd5a5fc8c6b9fe769777abf74b0ab5fbd7930b4
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 28 Jun 2013 16:22:19 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 23 Jul 2013 12:17:59 +0200

perf tools: Add test for converting perf time to/from TSC

The test uses the newly added cap_usr_time_zero and time_zero of
perf_event_mmap_page.  TSC from rdtsc is compared with the time
from 2 perf events.  The test passes if the calculated times are
all in the correct order.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1372425741-1676-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/perf/Makefile                 |   3 +
 tools/perf/arch/x86/Makefile        |   2 +
 tools/perf/arch/x86/util/tsc.c      |  59 ++++++++++++
 tools/perf/arch/x86/util/tsc.h      |  20 ++++
 tools/perf/tests/builtin-test.c     |   6 ++
 tools/perf/tests/perf-time-to-tsc.c | 177 ++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h            |   1 +
 7 files changed, 268 insertions(+)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 024680b..bfd12d0 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -389,6 +389,9 @@ LIB_OBJS += $(OUTPUT)tests/bp_signal.o
 LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o
 LIB_OBJS += $(OUTPUT)tests/task-exit.o
 LIB_OBJS += $(OUTPUT)tests/sw-clock.o
+ifeq ($(ARCH),x86)
+LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o
+endif
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 815841c..8801fe0 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -6,3 +6,5 @@ ifndef NO_LIBUNWIND
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
 endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/tsc.o
+LIB_H += arch/$(ARCH)/util/tsc.h
diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
new file mode 100644
index 0000000..f111744
--- /dev/null
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -0,0 +1,59 @@
+#include <stdbool.h>
+#include <errno.h>
+
+#include <linux/perf_event.h>
+
+#include "../../perf.h"
+#include "../../util/types.h"
+#include "../../util/debug.h"
+#include "tsc.h"
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc)
+{
+	u64 time, quot, rem;
+
+	time = ns - tc->time_zero;
+	quot = time / tc->time_mult;
+	rem  = time % tc->time_mult;
+	return (quot << tc->time_shift) +
+	       (rem << tc->time_shift) / tc->time_mult;
+}
+
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
+{
+	u64 quot, rem;
+
+	quot = cyc >> tc->time_shift;
+	rem  = cyc & ((1 << tc->time_shift) - 1);
+	return tc->time_zero + quot * tc->time_mult +
+	       ((rem * tc->time_mult) >> tc->time_shift);
+}
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+			     struct perf_tsc_conversion *tc)
+{
+	bool cap_usr_time_zero;
+	u32 seq;
+	int i = 0;
+
+	while (1) {
+		seq = pc->lock;
+		rmb();
+		tc->time_mult = pc->time_mult;
+		tc->time_shift = pc->time_shift;
+		tc->time_zero = pc->time_zero;
+		cap_usr_time_zero = pc->cap_usr_time_zero;
+		rmb();
+		if (pc->lock == seq && !(seq & 1))
+			break;
+		if (++i > 10000) {
+			pr_debug("failed to get perf_event_mmap_page lock\n");
+			return -EINVAL;
+		}
+	}
+
+	if (!cap_usr_time_zero)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
diff --git a/tools/perf/arch/x86/util/tsc.h b/tools/perf/arch/x86/util/tsc.h
new file mode 100644
index 0000000..a24dec8
--- /dev/null
+++ b/tools/perf/arch/x86/util/tsc.h
@@ -0,0 +1,20 @@
+#ifndef TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
+#define TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
+
+#include "../../util/types.h"
+
+struct perf_tsc_conversion {
+	u16 time_shift;
+	u32 time_mult;
+	u64 time_zero;
+};
+
+struct perf_event_mmap_page;
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+			     struct perf_tsc_conversion *tc);
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
+
+#endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 35b45f1466..b7b4049 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -93,6 +93,12 @@ static struct test {
 		.desc = "Test software clock events have valid period values",
 		.func = test__sw_clock_freq,
 	},
+#if defined(__x86_64__) || defined(__i386__)
+	{
+		.desc = "Test converting perf time to TSC",
+		.func = test__perf_time_to_tsc,
+	},
+#endif
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
new file mode 100644
index 0000000..0ab61b1
--- /dev/null
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -0,0 +1,177 @@
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <sys/prctl.h>
+
+#include "parse-events.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "thread_map.h"
+#include "cpumap.h"
+#include "tests.h"
+
+#include "../arch/x86/util/tsc.h"
+
+#define CHECK__(x) {				\
+	while ((x) < 0) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+#define CHECK_NOT_NULL__(x) {			\
+	while ((x) == NULL) {			\
+		pr_debug(#x " failed!\n");	\
+		goto out_err;			\
+	}					\
+}
+
+static u64 rdtsc(void)
+{
+	unsigned int low, high;
+
+	asm volatile("rdtsc" : "=a" (low), "=d" (high));
+
+	return low | ((u64)high) << 32;
+}
+
+/**
+ * test__perf_time_to_tsc - test converting perf time to TSC.
+ *
+ * This function implements a test that checks that the conversion of perf time
+ * to and from TSC is consistent with the order of events.  If the test passes
+ * %0 is returned, otherwise %-1 is returned.  If TSC conversion is not
+ * supported then then the test passes but " (not supported)" is printed.
+ */
+int test__perf_time_to_tsc(void)
+{
+	struct perf_record_opts opts = {
+		.mmap_pages	     = UINT_MAX,
+		.user_freq	     = UINT_MAX,
+		.user_interval	     = ULLONG_MAX,
+		.freq		     = 4000,
+		.target		     = {
+			.uses_mmap   = true,
+		},
+		.sample_time	     = true,
+	};
+	struct thread_map *threads = NULL;
+	struct cpu_map *cpus = NULL;
+	struct perf_evlist *evlist = NULL;
+	struct perf_evsel *evsel = NULL;
+	int err = -1, ret, i;
+	const char *comm1, *comm2;
+	struct perf_tsc_conversion tc;
+	struct perf_event_mmap_page *pc;
+	union perf_event *event;
+	u64 test_tsc, comm1_tsc, comm2_tsc;
+	u64 test_time, comm1_time = 0, comm2_time = 0;
+
+	threads = thread_map__new(-1, getpid(), UINT_MAX);
+	CHECK_NOT_NULL__(threads);
+
+	cpus = cpu_map__new(NULL);
+	CHECK_NOT_NULL__(cpus);
+
+	evlist = perf_evlist__new();
+	CHECK_NOT_NULL__(evlist);
+
+	perf_evlist__set_maps(evlist, cpus, threads);
+
+	CHECK__(parse_events(evlist, "cycles:u"));
+
+	perf_evlist__config(evlist, &opts);
+
+	evsel = perf_evlist__first(evlist);
+
+	evsel->attr.comm = 1;
+	evsel->attr.disabled = 1;
+	evsel->attr.enable_on_exec = 0;
+
+	CHECK__(perf_evlist__open(evlist));
+
+	CHECK__(perf_evlist__mmap(evlist, UINT_MAX, false));
+
+	pc = evlist->mmap[0].base;
+	ret = perf_read_tsc_conversion(pc, &tc);
+	if (ret) {
+		if (ret == -EOPNOTSUPP) {
+			fprintf(stderr, " (not supported)");
+			return 0;
+		}
+		goto out_err;
+	}
+
+	perf_evlist__enable(evlist);
+
+	comm1 = "Test COMM 1";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm1, 0, 0, 0));
+
+	test_tsc = rdtsc();
+
+	comm2 = "Test COMM 2";
+	CHECK__(prctl(PR_SET_NAME, (unsigned long)comm2, 0, 0, 0));
+
+	perf_evlist__disable(evlist);
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+			struct perf_sample sample;
+
+			if (event->header.type != PERF_RECORD_COMM ||
+			    (pid_t)event->comm.pid != getpid() ||
+			    (pid_t)event->comm.tid != getpid())
+				continue;
+
+			if (strcmp(event->comm.comm, comm1) == 0) {
+				CHECK__(perf_evsel__parse_sample(evsel, event,
+								 &sample));
+				comm1_time = sample.time;
+			}
+			if (strcmp(event->comm.comm, comm2) == 0) {
+				CHECK__(perf_evsel__parse_sample(evsel, event,
+								 &sample));
+				comm2_time = sample.time;
+			}
+		}
+	}
+
+	if (!comm1_time || !comm2_time)
+		goto out_err;
+
+	test_time = tsc_to_perf_time(test_tsc, &tc);
+	comm1_tsc = perf_time_to_tsc(comm1_time, &tc);
+	comm2_tsc = perf_time_to_tsc(comm2_time, &tc);
+
+	pr_debug("1st event perf time %"PRIu64" tsc %"PRIu64"\n",
+		 comm1_time, comm1_tsc);
+	pr_debug("rdtsc          time %"PRIu64" tsc %"PRIu64"\n",
+		 test_time, test_tsc);
+	pr_debug("2nd event perf time %"PRIu64" tsc %"PRIu64"\n",
+		 comm2_time, comm2_tsc);
+
+	if (test_time <= comm1_time ||
+	    test_time >= comm2_time)
+		goto out_err;
+
+	if (test_tsc <= comm1_tsc ||
+	    test_tsc >= comm2_tsc)
+		goto out_err;
+
+	err = 0;
+
+out_err:
+	if (evlist) {
+		perf_evlist__disable(evlist);
+		perf_evlist__munmap(evlist);
+		perf_evlist__close(evlist);
+		perf_evlist__delete(evlist);
+	}
+	if (cpus)
+		cpu_map__delete(cpus);
+	if (threads)
+		thread_map__delete(threads);
+
+	return err;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 07a92f9..d22202a 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -35,5 +35,6 @@ int test__bp_signal(void);
 int test__bp_signal_overflow(void);
 int test__task_exit(void);
 int test__sw_clock_freq(void);
+int test__perf_time_to_tsc(void);
 
 #endif /* TESTS_H */

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-07-24  3:56   ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' tip-bot for Adrian Hunter
@ 2013-09-17 20:23     ` Vince Weaver
  2013-09-17 20:35       ` Vince Weaver
                         ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Vince Weaver @ 2013-09-17 20:23 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, adrian.hunter, tglx
  Cc: linux-tip-commits


This patch somehow breaks the perf-ABI.

If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
against the new header with this change (say from 3.12-rc1)
and then run it on an old kernel (say 3.11) then I get "0" for
cap_usr_rdpmc.

If I take the same program and recompile against the old (without this 
patch) header and run it on 3.11, I get the expected "1" value.

So something about this changed the bit pattern in an incompatible 
fashion.

Vince




On Tue, 23 Jul 2013, tip-bot for Adrian Hunter wrote:

> Commit-ID:  860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
> Gitweb:     http://git.kernel.org/tip/860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
> Author:     Adrian Hunter <adrian.hunter@intel.com>
> AuthorDate: Fri, 28 Jun 2013 16:22:17 +0300
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 23 Jul 2013 12:17:10 +0200
> 
> perf: Fix broken union in 'struct perf_event_mmap_page'
> 
> The capabilities bits must not be "union'ed" together.
> Put them in a separate struct.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  include/uapi/linux/perf_event.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 00d8274..0041aed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -375,9 +375,11 @@ struct perf_event_mmap_page {
>  	__u64	time_running;		/* time event on cpu */
>  	union {
>  		__u64	capabilities;
> -		__u64	cap_usr_time  : 1,
> -			cap_usr_rdpmc : 1,
> -			cap_____res   : 62;
> +		struct {
> +			__u64	cap_usr_time		: 1,
> +				cap_usr_rdpmc		: 1,
> +				cap_____res		: 62;
> +		};
>  	};
>  
>  	/*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
k

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-17 20:23     ` Vince Weaver
@ 2013-09-17 20:35       ` Vince Weaver
  2013-09-19  8:42         ` Ingo Molnar
  2013-09-18  8:57       ` Peter Zijlstra
  2013-09-18  9:13       ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' Adrian Hunter
  2 siblings, 1 reply; 39+ messages in thread
From: Vince Weaver @ 2013-09-17 20:35 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, adrian.hunter, tglx
  Cc: linux-tip-commits

On Tue, 17 Sep 2013, Vince Weaver wrote:

> 
> This patch somehow breaks the perf-ABI.
> 
> If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
> against the new header with this change (say from 3.12-rc1)
> and then run it on an old kernel (say 3.11) then I get "0" for
> cap_usr_rdpmc.
> 
> If I take the same program and recompile against the old (without this 
> patch) header and run it on 3.11, I get the expected "1" value.

To follow up, the original case:

        union {
                __u64   capabilities;
                __u64   cap_usr_time  : 1,
                        cap_usr_rdpmc : 1,
                        cap_____res   : 62;
        };

        Then mmap->usr_rdpmc=1; gets assembled as

        400420:       80 0d 11 05 20 00 02    orb    $0x2,0x200511(%rip)        

The new version

       union {
                __u64   capabilities;
                struct {
                        __u64   cap_usr_time            : 1,
                                cap_usr_rdpmc           : 1,
                                cap_usr_time_zero       : 1,
                                cap_____res             : 61;
                };
        };

        mmap->usr_rdpmc=1; gets assembled as


       400427:       80 0d 02 05 20 00 01    orb    $0x1,0x200502(%rip)        

note the difference in the or value.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-17 20:23     ` Vince Weaver
  2013-09-17 20:35       ` Vince Weaver
@ 2013-09-18  8:57       ` Peter Zijlstra
  2013-09-18 14:19         ` Vince Weaver
  2013-09-18  9:13       ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' Adrian Hunter
  2 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-18  8:57 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mingo, hpa, linux-kernel, adrian.hunter, tglx, linux-tip-commits

On Tue, Sep 17, 2013 at 04:23:25PM -0400, Vince Weaver wrote:
> >  include/uapi/linux/perf_event.h | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 00d8274..0041aed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -375,9 +375,11 @@ struct perf_event_mmap_page {
> >  	__u64	time_running;		/* time event on cpu */
> >  	union {
> >  		__u64	capabilities;
> > -		__u64	cap_usr_time  : 1,
> > -			cap_usr_rdpmc : 1,
> > -			cap_____res   : 62;
> > +		struct {
> > +			__u64	cap_usr_time		: 1,
> > +				cap_usr_rdpmc		: 1,
> > +				cap_____res		: 62;
> > +		};
> >  	};

> This patch somehow breaks the perf-ABI.

Difficult call that..

> If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
> against the new header with this change (say from 3.12-rc1)
> and then run it on an old kernel (say 3.11) then I get "0" for
> cap_usr_rdpmc.
> 
> If I take the same program and recompile against the old (without this 
> patch) header and run it on 3.11, I get the expected "1" value.
> 
> So something about this changed the bit pattern in an incompatible 
> fashion.

Which was the entire point of the change. Previously cap_usr_time and
cap_usr_rdpmc were the same bit which clearly cannot be right.

With the change they're consecutive bits in the capabilities word.

Should we preserve completely broken things that don't work?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-17 20:23     ` Vince Weaver
  2013-09-17 20:35       ` Vince Weaver
  2013-09-18  8:57       ` Peter Zijlstra
@ 2013-09-18  9:13       ` Adrian Hunter
  2013-09-18 14:10         ` Vince Weaver
  2 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-09-18  9:13 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mingo, hpa, linux-kernel, a.p.zijlstra, tglx, linux-tip-commits

On 17/09/13 23:23, Vince Weaver wrote:
> 
> This patch somehow breaks the perf-ABI.
> 
> If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
> against the new header with this change (say from 3.12-rc1)
> and then run it on an old kernel (say 3.11) then I get "0" for
> cap_usr_rdpmc.
> 
> If I take the same program and recompile against the old (without this 
> patch) header and run it on 3.11, I get the expected "1" value.
> 
> So something about this changed the bit pattern in an incompatible 
> fashion.


cap_usr_time and cap_usr_rdpmc were occupying the same bit position i.e. bit 0

That means that cap_usr_time and cap_usr_rdpmc were both unreliable.

If you look at the logic:

void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
{
    userpg->cap_usr_time = 0;
    userpg->cap_usr_time_zero = 0;
    userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
    userpg->pmc_width = x86_pmu.cntval_bits;

    if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
        return;

    if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
        return;

    userpg->cap_usr_time = 1;
    userpg->time_mult = this_cpu_read(cyc2ns);
    userpg->time_shift = CYC2NS_SCALE_FACTOR;
    userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;

    if (sched_clock_stable && !check_tsc_disabled()) {
        userpg->cap_usr_time_zero = 1;
        userpg->time_zero = this_cpu_read(cyc2ns_offset);
    }
}

The incorrect union caused 2 bugs:

1. On hardware with constant, non-stop TSC cap_usr_rdpmc was always 1.

2. On hardware without constant, non-stop TSC cap_usr_time was still 1 if
rdpmc was allowed in userspace.


Possible improvements are one or both of:
1. Add cap_usr_fixed to identify kernels that have the capabilities bits fixed
2. Swap the positions of cap_usr_time and cap_usr_rdpmc so that
cap_usr_rdpmc remains in bit 0






> 
> Vince
> 
> 
> 
> 
> On Tue, 23 Jul 2013, tip-bot for Adrian Hunter wrote:
> 
>> Commit-ID:  860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
>> Gitweb:     http://git.kernel.org/tip/860f085b74e9f0075de8140ed3a1e5b5e3e39aa8
>> Author:     Adrian Hunter <adrian.hunter@intel.com>
>> AuthorDate: Fri, 28 Jun 2013 16:22:17 +0300
>> Committer:  Ingo Molnar <mingo@kernel.org>
>> CommitDate: Tue, 23 Jul 2013 12:17:10 +0200
>>
>> perf: Fix broken union in 'struct perf_event_mmap_page'
>>
>> The capabilities bits must not be "union'ed" together.
>> Put them in a separate struct.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  include/uapi/linux/perf_event.h | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 00d8274..0041aed 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -375,9 +375,11 @@ struct perf_event_mmap_page {
>>  	__u64	time_running;		/* time event on cpu */
>>  	union {
>>  		__u64	capabilities;
>> -		__u64	cap_usr_time  : 1,
>> -			cap_usr_rdpmc : 1,
>> -			cap_____res   : 62;
>> +		struct {
>> +			__u64	cap_usr_time		: 1,
>> +				cap_usr_rdpmc		: 1,
>> +				cap_____res		: 62;
>> +		};
>>  	};
>>  
>>  	/*
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> k
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18  9:13       ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' Adrian Hunter
@ 2013-09-18 14:10         ` Vince Weaver
  0 siblings, 0 replies; 39+ messages in thread
From: Vince Weaver @ 2013-09-18 14:10 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: mingo, hpa, linux-kernel, a.p.zijlstra, tglx, linux-tip-commits

On Wed, 18 Sep 2013, Adrian Hunter wrote:

> On 17/09/13 23:23, Vince Weaver wrote:
> > 
> > This patch somehow breaks the perf-ABI.
> > 
> > If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
> > against the new header with this change (say from 3.12-rc1)
> > and then run it on an old kernel (say 3.11) then I get "0" for
> > cap_usr_rdpmc.
> > 
> > If I take the same program and recompile against the old (without this 
> > patch) header and run it on 3.11, I get the expected "1" value.
> > 
> > So something about this changed the bit pattern in an incompatible 
> > fashion.
> 
> 
> cap_usr_time and cap_usr_rdpmc were occupying the same bit position i.e. bit 0
> 
> That means that cap_usr_time and cap_usr_rdpmc were both unreliable.

well then I have to say this patch wins today's award for "least 
useful commit message".  Why wasn't this important info there?
>From what it originally sounded like this was just some shuffling of
low-level C minutia, not a hard-to-resolve break in the perf ABI.

> Possible improvements are one or both of:
> 1. Add cap_usr_fixed to identify kernels that have the capabilities bits fixed
> 2. Swap the positions of cap_usr_time and cap_usr_rdpmc so that
> cap_usr_rdpmc remains in bit 0

Or just create two new fields and mark the old ones as obsolete somehow.

The problem with just making this change silently is now tools like PAPI 
that care about cap_usr_rdpmc will behave in hard-to-debug ways.

For example, code compiled against new headers but run with old kernels 
won't detect rdpmc properly

Code compiled against old headers but run with new kernels won't detect 
rdpmc properly.

It's true existing code could have problems if you ran on a machine with 
conflicting rdpmc/usr_time, but my tests never noticed that, presumably 
because rdpmc and usr_time were added at the same time (in 3.4), x86 
always enables rdpmc, and none of my code uses usr_time so I never see 
wrong time results.

Vince


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18  8:57       ` Peter Zijlstra
@ 2013-09-18 14:19         ` Vince Weaver
  2013-09-18 15:42           ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Vince Weaver @ 2013-09-18 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, adrian.hunter, tglx, linux-tip-commits

On Wed, 18 Sep 2013, Peter Zijlstra wrote:

> > This patch somehow breaks the perf-ABI.
> 
> Difficult call that..

OK, let me rephrase.

This change broke existing working code.

Can you point to any code that is fixed by the commit?

If not, I think the rule is you revert the changeset.

Or you can just mark the bug "Won't fix, perf tool doesn't use it" I guess

You guys told me to start running my tests in -rc1 so I can catch bugs 
early enough to fix things.  So here I am, finding a problem in -rc1.
If no one cares I'll stop bothering and go back to just documenting the 
problems after the fact in complaining webpages.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18 14:19         ` Vince Weaver
@ 2013-09-18 15:42           ` Peter Zijlstra
  2013-09-18 18:33             ` Stephane Eranian
  2013-09-18 20:07             ` Vince Weaver
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-18 15:42 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mingo, hpa, linux-kernel, adrian.hunter, tglx, linux-tip-commits,
	eranian

On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
> Can you point to any code that is fixed by the commit?

I have some, but I don't think a lot of people use it.

Would you be ok with something like the below? It should preserve
functionality for code that only cares about cap_usr_rdpmc (PAPI).

Stephane, does libpfm use any of these?

--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -380,8 +380,8 @@ struct perf_event_mmap_page {
 	union {
 		__u64	capabilities;
 		struct {
-			__u64	cap_usr_time		: 1,
-				cap_usr_rdpmc		: 1,
+			__u64	cap_usr_rdpmc		: 1,
+				cap_usr_time		: 1,
 				cap_usr_time_zero	: 1,
 				cap_____res		: 61;
 		};

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18 15:42           ` Peter Zijlstra
@ 2013-09-18 18:33             ` Stephane Eranian
  2013-09-19  8:43               ` Peter Zijlstra
  2013-09-18 20:07             ` Vince Weaver
  1 sibling, 1 reply; 39+ messages in thread
From: Stephane Eranian @ 2013-09-18 18:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, Peter Anvin, LKML, adrian.hunter,
	Thomas Gleixner, linux-tip-commits

On Wed, Sep 18, 2013 at 5:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
>> Can you point to any code that is fixed by the commit?
>
> I have some, but I don't think a lot of people use it.
>
> Would you be ok with something like the below? It should preserve
> functionality for code that only cares about cap_usr_rdpmc (PAPI).
>
> Stephane, does libpfm use any of these?
>
Yes, there is an example using this. Need to verify it is not broken
currently (self_count.c).

> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -380,8 +380,8 @@ struct perf_event_mmap_page {
>         union {
>                 __u64   capabilities;
>                 struct {
> -                       __u64   cap_usr_time            : 1,
> -                               cap_usr_rdpmc           : 1,
> +                       __u64   cap_usr_rdpmc           : 1,
> +                               cap_usr_time            : 1,
>                                 cap_usr_time_zero       : 1,
>                                 cap_____res             : 61;
>                 };

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18 15:42           ` Peter Zijlstra
  2013-09-18 18:33             ` Stephane Eranian
@ 2013-09-18 20:07             ` Vince Weaver
  2013-09-19  8:16               ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Vince Weaver @ 2013-09-18 20:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, adrian.hunter, tglx, linux-tip-commits,
	eranian

On Wed, 18 Sep 2013, Peter Zijlstra wrote:

> On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
> > Can you point to any code that is fixed by the commit?
> 
> I have some, but I don't think a lot of people use it.
> 
> Would you be ok with something like the below? It should preserve
> functionality for code that only cares about cap_usr_rdpmc (PAPI).
> 
> Stephane, does libpfm use any of these?
> 
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -380,8 +380,8 @@ struct perf_event_mmap_page {
>  	union {
>  		__u64	capabilities;
>  		struct {
> -			__u64	cap_usr_time		: 1,
> -				cap_usr_rdpmc		: 1,
> +			__u64	cap_usr_rdpmc		: 1,
> +				cap_usr_time		: 1,
>  				cap_usr_time_zero	: 1,
>  				cap_____res		: 61;
>  		};
> 

It would be nice if there was some way to detect this change; I liked the 
idea of a "cap_usr_fixed" bit.

Even with your change you can't have code that can reliably detect both 
cap_usr_time and cap_usr_rdpmc unless you can guarantee that both 
perf_event.h and the kernel are 3.12 or newer, and it gets more 
complicated if distros backport this patch.

Tools like PAPI often carry around their own copy of perf_event.h and 
people like to move around binaries between machines with different kernel 
versions so things get complicated quickly.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18 20:07             ` Vince Weaver
@ 2013-09-19  8:16               ` Peter Zijlstra
  2013-09-19  9:14                 ` [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-19  8:16 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mingo, hpa, linux-kernel, adrian.hunter, tglx, linux-tip-commits,
	eranian

On Wed, Sep 18, 2013 at 04:07:52PM -0400, Vince Weaver wrote:
> It would be nice if there was some way to detect this change; I liked the 
> idea of a "cap_usr_fixed" bit.

How about we start using the version field for this? Arguably we should
have incremented that value every time we changed the thing but we might
as well start now.

---

diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b6..c44d55c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3660,6 +3660,25 @@ static void calc_timer_values(struct perf_event *event,
 	*running = ctx_time - event->tstamp_running;
 }
 
+static void perf_event_init_userpage(struct perf_event *event)
+{
+	struct perf_event_mmap_page *userpg;
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb);
+	if (!rb)
+		goto unlock;
+
+	userpg = rb->user_page;
+
+	userpg->version = 1;
+	userpg->compat_version = 0;
+
+unlock:
+	rcu_read_unlock();
+}
+
 void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 }
@@ -4044,6 +4063,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
 
+	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-17 20:35       ` Vince Weaver
@ 2013-09-19  8:42         ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19  8:42 UTC (permalink / raw)
  To: Vince Weaver
  Cc: hpa, linux-kernel, a.p.zijlstra, adrian.hunter, tglx, linux-tip-commits


* Vince Weaver <vince@deater.net> wrote:

> On Tue, 17 Sep 2013, Vince Weaver wrote:
> 
> > 
> > This patch somehow breaks the perf-ABI.
> > 
> > If I take a program that reads "mmap->cap_usr_rdpmc" and compile it
> > against the new header with this change (say from 3.12-rc1)
> > and then run it on an old kernel (say 3.11) then I get "0" for
> > cap_usr_rdpmc.
> > 
> > If I take the same program and recompile against the old (without this 
> > patch) header and run it on 3.11, I get the expected "1" value.
> 
> To follow up, the original case:
> 
>         union {
>                 __u64   capabilities;
>                 __u64   cap_usr_time  : 1,
>                         cap_usr_rdpmc : 1,
>                         cap_____res   : 62;
>         };
> 
>         Then mmap->usr_rdpmc=1; gets assembled as
> 
>         400420:       80 0d 11 05 20 00 02    orb    $0x2,0x200511(%rip)        

Hm, how can it be 0x2? Didn't you mix up the two assemblies accidentally?

> 
> The new version
> 
>        union {
>                 __u64   capabilities;
>                 struct {
>                         __u64   cap_usr_time            : 1,
>                                 cap_usr_rdpmc           : 1,
>                                 cap_usr_time_zero       : 1,
>                                 cap_____res             : 61;
>                 };
>         };
> 
>         mmap->usr_rdpmc=1; gets assembled as
> 
> 
>        400427:       80 0d 02 05 20 00 01    orb    $0x1,0x200502(%rip)        
> 
> note the difference in the or value.

Here the value should be 0x2, as cap_usr_rdpmc move to the second bit.

Right?

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-18 18:33             ` Stephane Eranian
@ 2013-09-19  8:43               ` Peter Zijlstra
  2013-09-19  8:55                 ` Stephane Eranian
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-19  8:43 UTC (permalink / raw)
  To: eranian
  Cc: Vince Weaver, Ingo Molnar, Peter Anvin, LKML, adrian.hunter,
	Thomas Gleixner, linux-tip-commits

On Wed, Sep 18, 2013 at 08:33:53PM +0200, Stephane Eranian wrote:
> On Wed, Sep 18, 2013 at 5:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
> >> Can you point to any code that is fixed by the commit?
> >
> > I have some, but I don't think a lot of people use it.
> >
> > Would you be ok with something like the below? It should preserve
> > functionality for code that only cares about cap_usr_rdpmc (PAPI).
> >
> > Stephane, does libpfm use any of these?
> >
> Yes, there is an example using this. Need to verify it is not broken
> currently (self_count.c).

So if that only uses cap_usr_rdpmc, you have the same issue as Vince and
the proposed solutions would work for you too.

If you also use cap_usr_time we've a bit of a problem.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-19  8:43               ` Peter Zijlstra
@ 2013-09-19  8:55                 ` Stephane Eranian
  2013-09-19  9:16                   ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Stephane Eranian @ 2013-09-19  8:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, Peter Anvin, LKML, adrian.hunter,
	Thomas Gleixner, linux-tip-commits

On Thu, Sep 19, 2013 at 10:43 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 18, 2013 at 08:33:53PM +0200, Stephane Eranian wrote:
>> On Wed, Sep 18, 2013 at 5:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
>> >> Can you point to any code that is fixed by the commit?
>> >
>> > I have some, but I don't think a lot of people use it.
>> >
>> > Would you be ok with something like the below? It should preserve
>> > functionality for code that only cares about cap_usr_rdpmc (PAPI).
>> >
>> > Stephane, does libpfm use any of these?
>> >
>> Yes, there is an example using this. Need to verify it is not broken
>> currently (self_count.c).
>
> So if that only uses cap_usr_rdpmc, you have the same issue as Vince and
> the proposed solutions would work for you too.
>
> If you also use cap_usr_time we've a bit of a problem.

I need to look at this program again, was written a long time ago.
It does not use cap_usr_rdpmc nor cap_use_time for sure.
So which one(s) are okay to use?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19  8:16               ` Peter Zijlstra
@ 2013-09-19  9:14                 ` Ingo Molnar
  2013-09-19 10:12                   ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19  9:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, hpa, linux-kernel, adrian.hunter, tglx,
	linux-tip-commits, eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Sep 18, 2013 at 04:07:52PM -0400, Vince Weaver wrote:
>
> > It would be nice if there was some way to detect this change; I liked 
> > the idea of a "cap_usr_fixed" bit.
> 
> How about we start using the version field for this? Arguably we should 
> have incremented that value every time we changed the thing but we might 
> as well start now.

But version fields are really fragile, the way we usually iterate ABIs is 
a self-maintaining size field - which is missing here.

So I think the best solution would be to make it all explicit and 
self-contained:

 - always clear bit 0, and rename it to usrpage->cap_bit0, to at least not 
   confuse old user-space binaries. RDPMC will be marked as unavailable 
   to old binaries but that's within the ABI.

 - rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - use bits 2, 3, 4 for the newly defined, correct functionality.

 - rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

I.e. something like the patch below. (untested)

The 'size' field can then be used in the future to add new fields and it 
will act as a natural ABI version indicator as well.

Thanks,

	Ingo

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8355c84..3ab624c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1883,9 +1883,9 @@ static struct pmu pmu = {
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
-	userpg->cap_usr_time = 0;
-	userpg->cap_usr_time_zero = 0;
-	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
+	userpg->cap_usr_time_used = 0;
+	userpg->cap_usr_time_zero_used = 0;
+	userpg->cap_usr_rdpmc_available = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
@@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 		return;
 
-	userpg->cap_usr_time = 1;
+	userpg->cap_usr_time_used = 1;
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
 
 	if (sched_clock_stable && !check_tsc_disabled()) {
-		userpg->cap_usr_time_zero = 1;
+		userpg->cap_usr_time_zero_used = 1;
 		userpg->time_zero = this_cpu_read(cyc2ns_offset);
 	}
 }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 40a1fb8..515d7d2 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -380,10 +380,13 @@ struct perf_event_mmap_page {
 	union {
 		__u64	capabilities;
 		struct {
-			__u64	cap_usr_time		: 1,
-				cap_usr_rdpmc		: 1,
-				cap_usr_time_zero	: 1,
-				cap_____res		: 61;
+			__u64	cap_bit0		: 1, /* Deprecated, always zero, see commit 860f085b74e9 */
+				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
+
+				cap_usr_rdpmc_available	: 1, /* The RDPMC instruction can be used to read counts */
+				cap_usr_time_used	: 1, /* The time_* fields are uses */
+				cap_usr_time_zero_used	: 1, /* The time_zero field is used */
+				cap_____res		: 59;
 		};
 	};
 
@@ -442,12 +445,14 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+	__u32	size;			/* Header size up to this point */
+	__u32	__reserved0;		/* 4 byte hole */
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[119];	/* align to 1k */
+	__u64	__reserved[118];	/* align to 1k */
 
 	/*
 	 * Control data for the mmap() data buffer.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b6..27d339f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
 	*running = ctx_time - event->tstamp_running;
 }
 
+static void perf_event_init_userpage(struct perf_event *event)
+{
+	struct perf_event_mmap_page *userpg;
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb);
+	if (!rb)
+		goto unlock;
+
+	userpg = rb->user_page;
+
+	/* Allow new userspace to detect that bit 0 is deprecated */
+	userpg->cap_bit0_is_deprecated = 1;
+	userpg->size = offsetof(struct perf_event_mmap_page, size);
+
+unlock:
+	rcu_read_unlock();
+}
+
 void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 }
@@ -4044,6 +4064,7 @@ again:
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
 
+	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page'
  2013-09-19  8:55                 ` Stephane Eranian
@ 2013-09-19  9:16                   ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19  9:16 UTC (permalink / raw)
  To: eranian
  Cc: Peter Zijlstra, Vince Weaver, Peter Anvin, LKML, adrian.hunter,
	Thomas Gleixner, linux-tip-commits


* Stephane Eranian <eranian@googlemail.com> wrote:

> On Thu, Sep 19, 2013 at 10:43 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Sep 18, 2013 at 08:33:53PM +0200, Stephane Eranian wrote:
> >> On Wed, Sep 18, 2013 at 5:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Wed, Sep 18, 2013 at 10:19:32AM -0400, Vince Weaver wrote:
> >> >> Can you point to any code that is fixed by the commit?
> >> >
> >> > I have some, but I don't think a lot of people use it.
> >> >
> >> > Would you be ok with something like the below? It should preserve
> >> > functionality for code that only cares about cap_usr_rdpmc (PAPI).
> >> >
> >> > Stephane, does libpfm use any of these?
> >> >
> >> Yes, there is an example using this. Need to verify it is not broken
> >> currently (self_count.c).
> >
> > So if that only uses cap_usr_rdpmc, you have the same issue as Vince and
> > the proposed solutions would work for you too.
> >
> > If you also use cap_usr_time we've a bit of a problem.
> 
> I need to look at this program again, was written a long time ago. It 
> does not use cap_usr_rdpmc nor cap_use_time for sure.

If it does not use either flag (which in released kernels is really a 
single flag ABI-wise) then it should be fine.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19  9:14                 ` [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI Ingo Molnar
@ 2013-09-19 10:12                   ` Peter Zijlstra
  2013-09-19 10:28                     ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-19 10:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, hpa, linux-kernel, adrian.hunter, tglx,
	linux-tip-commits, eranian

On Thu, Sep 19, 2013 at 11:14:53AM +0200, Ingo Molnar wrote:
> @@ -442,12 +445,14 @@ struct perf_event_mmap_page {
>  	 *               ((rem * time_mult) >> time_shift);
>  	 */
>  	__u64	time_zero;
> +	__u32	size;			/* Header size up to this point */
> +	__u32	__reserved0;		/* 4 byte hole */
>  
>  		/*
>  		 * Hole for extension of the self monitor capabilities
>  		 */
>  
> -	__u64	__reserved[119];	/* align to 1k */
> +	__u64	__reserved[118];	/* align to 1k */
>  
>  	/*
>  	 * Control data for the mmap() data buffer.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dd236b6..27d339f 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
>  	*running = ctx_time - event->tstamp_running;
>  }
>  
> +static void perf_event_init_userpage(struct perf_event *event)
> +{
> +	struct perf_event_mmap_page *userpg;
> +	struct ring_buffer *rb;
> +
> +	rcu_read_lock();
> +	rb = rcu_dereference(event->rb);
> +	if (!rb)
> +		goto unlock;
> +
> +	userpg = rb->user_page;
> +
> +	/* Allow new userspace to detect that bit 0 is deprecated */
> +	userpg->cap_bit0_is_deprecated = 1;
> +	userpg->size = offsetof(struct perf_event_mmap_page, size);

This is fragile and I'm 100% sure we'll forget to update it.

	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);

Will auto update and mostly do the right thing.

Also, how will userspace know there's a valid size field way out there?
Shouldn't we bump the version field to indicate so? :-) After all,
running on old userspace this field will be 0.

> +
> +unlock:
> +	rcu_read_unlock();
> +}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19 10:12                   ` Peter Zijlstra
@ 2013-09-19 10:28                     ` Ingo Molnar
  2013-09-19 10:35                       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19 10:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, hpa, linux-kernel, adrian.hunter, tglx,
	linux-tip-commits, eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 19, 2013 at 11:14:53AM +0200, Ingo Molnar wrote:
> > @@ -442,12 +445,14 @@ struct perf_event_mmap_page {
> >  	 *               ((rem * time_mult) >> time_shift);
> >  	 */
> >  	__u64	time_zero;
> > +	__u32	size;			/* Header size up to this point */
> > +	__u32	__reserved0;		/* 4 byte hole */
> >  
> >  		/*
> >  		 * Hole for extension of the self monitor capabilities
> >  		 */
> >  
> > -	__u64	__reserved[119];	/* align to 1k */
> > +	__u64	__reserved[118];	/* align to 1k */
> >  
> >  	/*
> >  	 * Control data for the mmap() data buffer.
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index dd236b6..27d339f 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
> >  	*running = ctx_time - event->tstamp_running;
> >  }
> >  
> > +static void perf_event_init_userpage(struct perf_event *event)
> > +{
> > +	struct perf_event_mmap_page *userpg;
> > +	struct ring_buffer *rb;
> > +
> > +	rcu_read_lock();
> > +	rb = rcu_dereference(event->rb);
> > +	if (!rb)
> > +		goto unlock;
> > +
> > +	userpg = rb->user_page;
> > +
> > +	/* Allow new userspace to detect that bit 0 is deprecated */
> > +	userpg->cap_bit0_is_deprecated = 1;
> > +	userpg->size = offsetof(struct perf_event_mmap_page, size);
> 
> This is fragile and I'm 100% sure we'll forget to update it.
> 
> 	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
> 
> Will auto update and mostly do the right thing.

Ah, yes, agreed 100% - that was my intention, just implemented it badly.

One detail: I think we want to track size with u8 granularity, to be able 
to detect when a new u32 (or u16) field gets added, right? Updated patch 
attached below.

Note that this way of writing the array size:

+	__u8	__reserved[118*8+4];	/* align to 1k. */

Makes it sure that we are aware of the current word alignment situation 
and makes it harder to break alignment in the future.

> Also, how will userspace know there's a valid size field way out there? 

The current value of the size field on old kernels is 0 so it easily 
detected by being nonzero.

> Shouldn't we bump the version field to indicate so? :-) After all, 
> running on old userspace this field will be 0.

Well, on old kernel this will be 0, right?

Thanks,

	Ingo

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8355c84..3ab624c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1883,9 +1883,9 @@ static struct pmu pmu = {
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
-	userpg->cap_usr_time = 0;
-	userpg->cap_usr_time_zero = 0;
-	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
+	userpg->cap_usr_time_used = 0;
+	userpg->cap_usr_time_zero_used = 0;
+	userpg->cap_usr_rdpmc_available = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
@@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 		return;
 
-	userpg->cap_usr_time = 1;
+	userpg->cap_usr_time_used = 1;
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
 
 	if (sched_clock_stable && !check_tsc_disabled()) {
-		userpg->cap_usr_time_zero = 1;
+		userpg->cap_usr_time_zero_used = 1;
 		userpg->time_zero = this_cpu_read(cyc2ns_offset);
 	}
 }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 40a1fb8..e3514d1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -380,10 +380,13 @@ struct perf_event_mmap_page {
 	union {
 		__u64	capabilities;
 		struct {
-			__u64	cap_usr_time		: 1,
-				cap_usr_rdpmc		: 1,
-				cap_usr_time_zero	: 1,
-				cap_____res		: 61;
+			__u64	cap_bit0		: 1, /* Deprecated, always zero, see commit 860f085b74e9 */
+				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
+
+				cap_usr_rdpmc_available	: 1, /* The RDPMC instruction can be used to read counts */
+				cap_usr_time_used	: 1, /* The time_* fields are uses */
+				cap_usr_time_zero_used	: 1, /* The time_zero field is used */
+				cap_____res		: 59;
 		};
 	};
 
@@ -442,12 +445,13 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+	__u32	size;			/* Header size up to __reserved[] fields. */
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[119];	/* align to 1k */
+	__u8	__reserved[118*8+4];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b6..cb4238e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
 	*running = ctx_time - event->tstamp_running;
 }
 
+static void perf_event_init_userpage(struct perf_event *event)
+{
+	struct perf_event_mmap_page *userpg;
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb);
+	if (!rb)
+		goto unlock;
+
+	userpg = rb->user_page;
+
+	/* Allow new userspace to detect that bit 0 is deprecated */
+	userpg->cap_bit0_is_deprecated = 1;
+	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
+
+unlock:
+	rcu_read_unlock();
+}
+
 void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 }
@@ -4044,6 +4064,7 @@ again:
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
 
+	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19 10:28                     ` Ingo Molnar
@ 2013-09-19 10:35                       ` Peter Zijlstra
  2013-09-19 10:40                         ` [PATCH, v3] " Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2013-09-19 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, hpa, linux-kernel, adrian.hunter, tglx,
	linux-tip-commits, eranian

On Thu, Sep 19, 2013 at 12:28:18PM +0200, Ingo Molnar wrote:

You really don't like version fields do you ;-)

> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 8355c84..3ab624c 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1883,9 +1883,9 @@ static struct pmu pmu = {
>  
>  void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
>  {
> -	userpg->cap_usr_time = 0;
> -	userpg->cap_usr_time_zero = 0;
> -	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
> +	userpg->cap_usr_time_used = 0;
> +	userpg->cap_usr_time_zero_used = 0;
> +	userpg->cap_usr_rdpmc_available = x86_pmu.attr_rdpmc;
>  	userpg->pmc_width = x86_pmu.cntval_bits;
>  
>  	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> @@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
>  	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
>  		return;
>  
> -	userpg->cap_usr_time = 1;
> +	userpg->cap_usr_time_used = 1;
>  	userpg->time_mult = this_cpu_read(cyc2ns);
>  	userpg->time_shift = CYC2NS_SCALE_FACTOR;
>  	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
>  
>  	if (sched_clock_stable && !check_tsc_disabled()) {
> -		userpg->cap_usr_time_zero = 1;
> +		userpg->cap_usr_time_zero_used = 1;
>  		userpg->time_zero = this_cpu_read(cyc2ns_offset);
>  	}
>  }
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 40a1fb8..e3514d1 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -380,10 +380,13 @@ struct perf_event_mmap_page {
>  	union {
>  		__u64	capabilities;
>  		struct {
> -			__u64	cap_usr_time		: 1,
> -				cap_usr_rdpmc		: 1,
> -				cap_usr_time_zero	: 1,
> -				cap_____res		: 61;
> +			__u64	cap_bit0		: 1, /* Deprecated, always zero, see commit 860f085b74e9 */
> +				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
> +
> +				cap_usr_rdpmc_available	: 1, /* The RDPMC instruction can be used to read counts */
> +				cap_usr_time_used	: 1, /* The time_* fields are uses */
> +				cap_usr_time_zero_used	: 1, /* The time_zero field is used */
> +				cap_____res		: 59;
>  		};
>  	};

Would it make sense to do something like s/cap_usr/cap_user/ and drop
the _available, _used postfixes? It results in different names but
avoids these terribly long ones.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH, v3] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19 10:35                       ` Peter Zijlstra
@ 2013-09-19 10:40                         ` Ingo Molnar
  2013-09-19 11:18                           ` Adrian Hunter
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19 10:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, hpa, linux-kernel, adrian.hunter, tglx,
	linux-tip-commits, eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 19, 2013 at 12:28:18PM +0200, Ingo Molnar wrote:
> 
> You really don't like version fields do you ;-)

Indeed they are a horrible concept.

> Would it make sense to do something like s/cap_usr/cap_user/ and drop 
> the _available, _used postfixes? It results in different names but 
> avoids these terribly long ones.

Absolutely! Find updated patch below.

Thanks,

	Ingo

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8355c84..a9c606b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1883,9 +1883,9 @@ static struct pmu pmu = {
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
-	userpg->cap_usr_time = 0;
-	userpg->cap_usr_time_zero = 0;
-	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
+	userpg->cap_user_time = 0;
+	userpg->cap_user_time_zero = 0;
+	userpg->cap_user_rdpmc = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
@@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 		return;
 
-	userpg->cap_usr_time = 1;
+	userpg->cap_user_time = 1;
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
 
 	if (sched_clock_stable && !check_tsc_disabled()) {
-		userpg->cap_usr_time_zero = 1;
+		userpg->cap_user_time_zero = 1;
 		userpg->time_zero = this_cpu_read(cyc2ns_offset);
 	}
 }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 40a1fb8..dd4c903 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -380,10 +380,13 @@ struct perf_event_mmap_page {
 	union {
 		__u64	capabilities;
 		struct {
-			__u64	cap_usr_time		: 1,
-				cap_usr_rdpmc		: 1,
-				cap_usr_time_zero	: 1,
-				cap_____res		: 61;
+			__u64	cap_bit0		: 1, /* Always 0, deprecated, see commit 860f085b74e9 */
+				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
+
+				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
+				cap_user_time		: 1, /* The time_* fields are used */
+				cap_user_time_zero	: 1, /* The time_zero field is used */
+				cap_____res		: 59;
 		};
 	};
 
@@ -442,12 +445,13 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+	__u32	size;			/* Header size up to __reserved[] fields. */
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[119];	/* align to 1k */
+	__u8	__reserved[118*8+4];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b6..cb4238e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
 	*running = ctx_time - event->tstamp_running;
 }
 
+static void perf_event_init_userpage(struct perf_event *event)
+{
+	struct perf_event_mmap_page *userpg;
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb);
+	if (!rb)
+		goto unlock;
+
+	userpg = rb->user_page;
+
+	/* Allow new userspace to detect that bit 0 is deprecated */
+	userpg->cap_bit0_is_deprecated = 1;
+	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
+
+unlock:
+	rcu_read_unlock();
+}
+
 void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 }
@@ -4044,6 +4064,7 @@ again:
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
 
+	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH, v3] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI
  2013-09-19 10:40                         ` [PATCH, v3] " Ingo Molnar
@ 2013-09-19 11:18                           ` Adrian Hunter
  2013-09-19 11:42                             ` [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Adrian Hunter @ 2013-09-19 11:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Vince Weaver, hpa, linux-kernel, tglx,
	linux-tip-commits, eranian

On 19/09/13 13:40, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, Sep 19, 2013 at 12:28:18PM +0200, Ingo Molnar wrote:
>>
>> You really don't like version fields do you ;-)
> 
> Indeed they are a horrible concept.
> 
>> Would it make sense to do something like s/cap_usr/cap_user/ and drop 
>> the _available, _used postfixes? It results in different names but 
>> avoids these terribly long ones.
> 
> Absolutely! Find updated patch below.
> 
> Thanks,
> 
> 	Ingo
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 8355c84..a9c606b 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1883,9 +1883,9 @@ static struct pmu pmu = {
>  
>  void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
>  {
> -	userpg->cap_usr_time = 0;
> -	userpg->cap_usr_time_zero = 0;
> -	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
> +	userpg->cap_user_time = 0;
> +	userpg->cap_user_time_zero = 0;
> +	userpg->cap_user_rdpmc = x86_pmu.attr_rdpmc;
>  	userpg->pmc_width = x86_pmu.cntval_bits;
>  
>  	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> @@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
>  	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
>  		return;
>  
> -	userpg->cap_usr_time = 1;
> +	userpg->cap_user_time = 1;
>  	userpg->time_mult = this_cpu_read(cyc2ns);
>  	userpg->time_shift = CYC2NS_SCALE_FACTOR;
>  	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
>  
>  	if (sched_clock_stable && !check_tsc_disabled()) {
> -		userpg->cap_usr_time_zero = 1;
> +		userpg->cap_user_time_zero = 1;
>  		userpg->time_zero = this_cpu_read(cyc2ns_offset);
>  	}
>  }
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 40a1fb8..dd4c903 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -380,10 +380,13 @@ struct perf_event_mmap_page {
>  	union {
>  		__u64	capabilities;
>  		struct {
> -			__u64	cap_usr_time		: 1,
> -				cap_usr_rdpmc		: 1,
> -				cap_usr_time_zero	: 1,
> -				cap_____res		: 61;
> +			__u64	cap_bit0		: 1, /* Always 0, deprecated, see commit 860f085b74e9 */
> +				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
> +
> +				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
> +				cap_user_time		: 1, /* The time_* fields are used */
> +				cap_user_time_zero	: 1, /* The time_zero field is used */
> +				cap_____res		: 59;
>  		};
>  	};
>  
> @@ -442,12 +445,13 @@ struct perf_event_mmap_page {
>  	 *               ((rem * time_mult) >> time_shift);
>  	 */
>  	__u64	time_zero;
> +	__u32	size;			/* Header size up to __reserved[] fields. */
>  
>  		/*
>  		 * Hole for extension of the self monitor capabilities
>  		 */
>  
> -	__u64	__reserved[119];	/* align to 1k */
> +	__u8	__reserved[118*8+4];	/* align to 1k. */
>  
>  	/*
>  	 * Control data for the mmap() data buffer.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dd236b6..cb4238e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3660,6 +3660,26 @@ static void calc_timer_values(struct perf_event *event,
>  	*running = ctx_time - event->tstamp_running;
>  }
>  
> +static void perf_event_init_userpage(struct perf_event *event)
> +{
> +	struct perf_event_mmap_page *userpg;
> +	struct ring_buffer *rb;
> +
> +	rcu_read_lock();
> +	rb = rcu_dereference(event->rb);
> +	if (!rb)
> +		goto unlock;
> +
> +	userpg = rb->user_page;
> +
> +	/* Allow new userspace to detect that bit 0 is deprecated */
> +	userpg->cap_bit0_is_deprecated = 1;
> +	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
> +
> +unlock:
> +	rcu_read_unlock();
> +}
> +
>  void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
>  {
>  }
> @@ -4044,6 +4064,7 @@ again:
>  	ring_buffer_attach(event, rb);
>  	rcu_assign_pointer(event->rb, rb);
>  
> +	perf_event_init_userpage(event);
>  	perf_event_update_userpage(event);
>  
>  unlock:
> 
> 

Please consider adding:


diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
index 9570c2b..b2519e4 100644
--- a/tools/perf/arch/x86/util/tsc.c
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -32,7 +32,7 @@ u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
 int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
 			     struct perf_tsc_conversion *tc)
 {
-	bool cap_usr_time_zero;
+	bool cap_user_time_zero;
 	u32 seq;
 	int i = 0;

@@ -42,7 +42,7 @@ int perf_read_tsc_conversion(const struct
perf_event_mmap_page *pc,
 		tc->time_mult = pc->time_mult;
 		tc->time_shift = pc->time_shift;
 		tc->time_zero = pc->time_zero;
-		cap_usr_time_zero = pc->cap_usr_time_zero;
+		cap_user_time_zero = pc->cap_user_time_zero;
 		rmb();
 		if (pc->lock == seq && !(seq & 1))
 			break;
@@ -52,7 +52,7 @@ int perf_read_tsc_conversion(const struct
perf_event_mmap_page *pc,
 		}
 	}

-	if (!cap_usr_time_zero)
+	if (!cap_user_time_zero)
 		return -EOPNOTSUPP;

 	return 0;


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  2013-09-19 11:18                           ` Adrian Hunter
@ 2013-09-19 11:42                             ` Ingo Molnar
  2013-09-19 17:40                               ` Vince Weaver
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-09-19 11:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Vince Weaver, hpa, linux-kernel, tglx,
	linux-tip-commits, eranian


* Adrian Hunter <adrian.hunter@intel.com> wrote:

> >  		struct {
> > -			__u64	cap_usr_time		: 1,
> > -				cap_usr_rdpmc		: 1,
> > -				cap_usr_time_zero	: 1,
> > -				cap_____res		: 61;
> > +			__u64	cap_bit0		: 1, /* Always 0, deprecated, see commit 860f085b74e9 */
> > +				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
> > +
> > +				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
> > +				cap_user_time		: 1, /* The time_* fields are used */
> > +				cap_user_time_zero	: 1, /* The time_zero field is used */
> > +				cap_____res		: 59;
> >  		};

> Please consider adding:

indeed - added. I also build-tested it all and added a changelog - see the 
updated patch below.

Thanks,

	Ingo

--------------------->
Subject: perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 19 Sep 2013 10:16:42 +0200

Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74e9 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
[ Restructured it, using Peter's original patches. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   10 +++++-----
 include/uapi/linux/perf_event.h  |   14 +++++++++-----
 kernel/events/core.c             |   21 +++++++++++++++++++++
 tools/perf/arch/x86/util/tsc.c   |    6 +++---
 4 files changed, 38 insertions(+), 13 deletions(-)

Index: tip/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- tip.orig/arch/x86/kernel/cpu/perf_event.c
+++ tip/arch/x86/kernel/cpu/perf_event.c
@@ -1883,9 +1883,9 @@ static struct pmu pmu = {
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
-	userpg->cap_usr_time = 0;
-	userpg->cap_usr_time_zero = 0;
-	userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
+	userpg->cap_user_time = 0;
+	userpg->cap_user_time_zero = 0;
+	userpg->cap_user_rdpmc = x86_pmu.attr_rdpmc;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
@@ -1894,13 +1894,13 @@ void arch_perf_update_userpage(struct pe
 	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 		return;
 
-	userpg->cap_usr_time = 1;
+	userpg->cap_user_time = 1;
 	userpg->time_mult = this_cpu_read(cyc2ns);
 	userpg->time_shift = CYC2NS_SCALE_FACTOR;
 	userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
 
 	if (sched_clock_stable && !check_tsc_disabled()) {
-		userpg->cap_usr_time_zero = 1;
+		userpg->cap_user_time_zero = 1;
 		userpg->time_zero = this_cpu_read(cyc2ns_offset);
 	}
 }
Index: tip/include/uapi/linux/perf_event.h
===================================================================
--- tip.orig/include/uapi/linux/perf_event.h
+++ tip/include/uapi/linux/perf_event.h
@@ -380,10 +380,13 @@ struct perf_event_mmap_page {
 	union {
 		__u64	capabilities;
 		struct {
-			__u64	cap_usr_time		: 1,
-				cap_usr_rdpmc		: 1,
-				cap_usr_time_zero	: 1,
-				cap_____res		: 61;
+			__u64	cap_bit0		: 1, /* Always 0, deprecated, see commit 860f085b74e9 */
+				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
+
+				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
+				cap_user_time		: 1, /* The time_* fields are used */
+				cap_user_time_zero	: 1, /* The time_zero field is used */
+				cap_____res		: 59;
 		};
 	};
 
@@ -442,12 +445,13 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+	__u32	size;			/* Header size up to __reserved[] fields. */
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u64	__reserved[119];	/* align to 1k */
+	__u8	__reserved[118*8+4];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.
Index: tip/kernel/events/core.c
===================================================================
--- tip.orig/kernel/events/core.c
+++ tip/kernel/events/core.c
@@ -3660,6 +3660,26 @@ static void calc_timer_values(struct per
 	*running = ctx_time - event->tstamp_running;
 }
 
+static void perf_event_init_userpage(struct perf_event *event)
+{
+	struct perf_event_mmap_page *userpg;
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb);
+	if (!rb)
+		goto unlock;
+
+	userpg = rb->user_page;
+
+	/* Allow new userspace to detect that bit 0 is deprecated */
+	userpg->cap_bit0_is_deprecated = 1;
+	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
+
+unlock:
+	rcu_read_unlock();
+}
+
 void __weak arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
 {
 }
@@ -4044,6 +4064,7 @@ again:
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
 
+	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:
Index: tip/tools/perf/arch/x86/util/tsc.c
===================================================================
--- tip.orig/tools/perf/arch/x86/util/tsc.c
+++ tip/tools/perf/arch/x86/util/tsc.c
@@ -32,7 +32,7 @@ u64 tsc_to_perf_time(u64 cyc, struct per
 int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
 			     struct perf_tsc_conversion *tc)
 {
-	bool cap_usr_time_zero;
+	bool cap_user_time_zero;
 	u32 seq;
 	int i = 0;
 
@@ -42,7 +42,7 @@ int perf_read_tsc_conversion(const struc
 		tc->time_mult = pc->time_mult;
 		tc->time_shift = pc->time_shift;
 		tc->time_zero = pc->time_zero;
-		cap_usr_time_zero = pc->cap_usr_time_zero;
+		cap_user_time_zero = pc->cap_user_time_zero;
 		rmb();
 		if (pc->lock == seq && !(seq & 1))
 			break;
@@ -52,7 +52,7 @@ int perf_read_tsc_conversion(const struc
 		}
 	}
 
-	if (!cap_usr_time_zero)
+	if (!cap_user_time_zero)
 		return -EOPNOTSUPP;
 
 	return 0;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  2013-09-19 11:42                             ` [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' Ingo Molnar
@ 2013-09-19 17:40                               ` Vince Weaver
  2013-09-20  7:44                                 ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Vince Weaver @ 2013-09-19 17:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Hunter, Peter Zijlstra, hpa, linux-kernel, tglx,
	linux-tip-commits, eranian

On Thu, 19 Sep 2013, Ingo Molnar wrote:

> To solve all that make this change explicit, detectable and self-contained,
> by iterating the ABI the following way:
> 
>  - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
>    confuse old user-space binaries. RDPMC will be marked as unavailable
>    to old binaries but that's within the ABI, this is a capability bit.
> 
>  - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
>    libraries can reliably detect that bit 0 is deprecated and perma-zero
>    without having to check the kernel version.
> 
>  - Use bits 2, 3, 4 for the newly defined, correct functionality:
> 
> 	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
> 	cap_user_time		: 1, /* The time_* fields are used */
> 	cap_user_time_zero	: 1, /* The time_zero field is used */
> 

So let me think through the possible combinations:

OLD-KERNEL OLD-HEADER
   + cap_usr_rdpmc and cap_usr_time map to the same bit
   + In general for kernels from 3.4 to 3.11 the bit will be set
     for all x86

NEW-KERNEL OLD-HEADER
   + cap_usr_rdpmc (cap_bit0) and cap_usr_time (cap_bit0) both are 0
   + code detecting cap_usr_rdpmc will probably fall back to non-rdpmc
     even through rdpmc code is probably there and functioning

OLD-KERNEL NEW-HEADER
   + cap_user_rdpmc and cap_user_time both set to 0
   + cap_bit0_is_deprecated can be read; if it is 0 you can
     read cap_bit0, and if it is 1, assume rdpmc is available
     with the same likelyhood you could with the old header

NEW-KERNEL NEW-HEADER
   + Use cap_user_rdpmc and cap_user_time and everything is OK


>  - Rename all the bitfield names in perf_event.h to be different from the
>    old names, to make sure it's not possible to mis-compile it
>    accidentally with old assumptions.

Well this breaks that API, though I guess there are no guarantees there.
I guess that's intentional since it will force 
users to fix their code, but a pain if you aren't expecting it and 
suddenly your project doesn't compile anymore after a kernel-headers 
update.  Most of my code carries around its own perf_event.h so I guess
I'll be unaffected.

In any case this seems to be about as reasonable a solution to this 
problem as we can get.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  2013-09-19 17:40                               ` Vince Weaver
@ 2013-09-20  7:44                                 ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2013-09-20  7:44 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Adrian Hunter, Peter Zijlstra, hpa, linux-kernel, tglx,
	linux-tip-commits, eranian


* Vince Weaver <vince@deater.net> wrote:

> On Thu, 19 Sep 2013, Ingo Molnar wrote:
> 
> > To solve all that make this change explicit, detectable and self-contained,
> > by iterating the ABI the following way:
> > 
> >  - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
> >    confuse old user-space binaries. RDPMC will be marked as unavailable
> >    to old binaries but that's within the ABI, this is a capability bit.
> > 
> >  - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
> >    libraries can reliably detect that bit 0 is deprecated and perma-zero
> >    without having to check the kernel version.
> > 
> >  - Use bits 2, 3, 4 for the newly defined, correct functionality:
> > 
> > 	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
> > 	cap_user_time		: 1, /* The time_* fields are used */
> > 	cap_user_time_zero	: 1, /* The time_zero field is used */
> > 
> 
> So let me think through the possible combinations:
> 
> OLD-KERNEL OLD-HEADER
>    + cap_usr_rdpmc and cap_usr_time map to the same bit
>    + In general for kernels from 3.4 to 3.11 the bit will be set
>      for all x86
> 
> NEW-KERNEL OLD-HEADER
>    + cap_usr_rdpmc (cap_bit0) and cap_usr_time (cap_bit0) both are 0
>    + code detecting cap_usr_rdpmc will probably fall back to non-rdpmc
>      even through rdpmc code is probably there and functioning
> 
> OLD-KERNEL NEW-HEADER
>    + cap_user_rdpmc and cap_user_time both set to 0
>    + cap_bit0_is_deprecated can be read; if it is 0 you can
>      read cap_bit0, and if it is 1, assume rdpmc is available
>      with the same likelyhood you could with the old header

Yes - this is the enabling vector to support old kernels (to the extent 
it's possible), in newly built libraries - without having to test for some 
explicit kernel version in the quirk code. (which is really a fragile 
concept when features/fixes get backported.)

> NEW-KERNEL NEW-HEADER
>    + Use cap_user_rdpmc and cap_user_time and everything is OK

Yes.

> >  - Rename all the bitfield names in perf_event.h to be different from the
> >    old names, to make sure it's not possible to mis-compile it
> >    accidentally with old assumptions.
> 
> Well this breaks that API, though I guess there are no guarantees there.

Not just that there are no guarantees, but without the rename old code 
with new headers could also result in hard to debug "it seems to work but 
is really broken" code.

> I guess that's intentional since it will force users to fix their code, 
> but a pain if you aren't expecting it and suddenly your project doesn't 
> compile anymore after a kernel-headers update.  Most of my code carries 
> around its own perf_event.h so I guess I'll be unaffected.

I'd expect mainly libraries to ever run into the build failure, not end 
users.

> In any case this seems to be about as reasonable a solution to this 
> problem as we can get.

Great, thanks for the review!

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-09-20  7:44 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-28 13:22 [PATCH 0/5] perf: add two new features Adrian Hunter
2013-06-28 13:22 ` [PATCH 1/5] perf: fix broken union in perf_event_mmap_page Adrian Hunter
2013-06-28 15:22   ` Peter Zijlstra
2013-07-16 11:51     ` H. Peter Anvin
2013-07-24  3:56   ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' tip-bot for Adrian Hunter
2013-09-17 20:23     ` Vince Weaver
2013-09-17 20:35       ` Vince Weaver
2013-09-19  8:42         ` Ingo Molnar
2013-09-18  8:57       ` Peter Zijlstra
2013-09-18 14:19         ` Vince Weaver
2013-09-18 15:42           ` Peter Zijlstra
2013-09-18 18:33             ` Stephane Eranian
2013-09-19  8:43               ` Peter Zijlstra
2013-09-19  8:55                 ` Stephane Eranian
2013-09-19  9:16                   ` Ingo Molnar
2013-09-18 20:07             ` Vince Weaver
2013-09-19  8:16               ` Peter Zijlstra
2013-09-19  9:14                 ` [PATCH] perf: Always set bit 0 in the capabilities field of 'struct perf_event_mmap_page' to 0, to maintain the ABI Ingo Molnar
2013-09-19 10:12                   ` Peter Zijlstra
2013-09-19 10:28                     ` Ingo Molnar
2013-09-19 10:35                       ` Peter Zijlstra
2013-09-19 10:40                         ` [PATCH, v3] " Ingo Molnar
2013-09-19 11:18                           ` Adrian Hunter
2013-09-19 11:42                             ` [PATCH, v4] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' Ingo Molnar
2013-09-19 17:40                               ` Vince Weaver
2013-09-20  7:44                                 ` Ingo Molnar
2013-09-18  9:13       ` [tip:perf/core] perf: Fix broken union in ' struct perf_event_mmap_page' Adrian Hunter
2013-09-18 14:10         ` Vince Weaver
2013-06-28 13:22 ` [PATCH 2/5] x86: add ability to calculate TSC from perf sample timestamps Adrian Hunter
2013-07-24  3:56   ` [tip:perf/core] perf/x86: Add " tip-bot for Adrian Hunter
2013-06-28 13:22 ` [PATCH 3/5] perf tools: add test for converting perf time to/from TSC Adrian Hunter
2013-07-24  3:56   ` [tip:perf/core] perf tools: Add test for converting perf time to/ from TSC tip-bot for Adrian Hunter
2013-06-28 13:22 ` [PATCH 4/5] perf: add 'keep tracking' flag to PERF_EVENT_IOC_DISABLE Adrian Hunter
2013-06-28 13:22 ` [PATCH 5/5] perf tools: add 'keep tracking' test Adrian Hunter
2013-06-28 15:27 ` [PATCH 0/5] perf: add two new features Peter Zijlstra
2013-06-28 19:22   ` Adrian Hunter
2013-07-16  6:22     ` Adrian Hunter
2013-07-16 14:34       ` Peter Zijlstra
2013-07-17 11:28         ` Adrian Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.