linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/8] ARCv2 port to Linux - (C) perf
@ 2015-08-05 15:13 Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 1/8] ARC: perf: support RAW events Alexey Brodkin
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz, Alexey Brodkin

Hi Peter,

This mini-series adds perf support for ARCv2-based cores, which brings in
overflow interrupts and SMP. Raw events are now supported as well.

Please review!

Compared to v1 this series has:
 [1] Addressed review comments
 [2] More verbose commit messages and comments in sources
 [3] Minor cosmetics

Thanks,
Alexey


Alexey Brodkin (6):
  ARC: perf: support RAW events
  ARCv2: perf: implement "event_set_period" for future use with
    interrupts
  ARCv2: perf: Support sampling events using overflow interrupts
  ARCv2: perf: set usable max period as a half of real max period
  ARCv2: perf: implement exclusion of event counting in user or kernel
    mode
  ARCv2: perf: SMP support

Vineet Gupta (2):
  ARC: perf: cap the number of counters to hardware max of 32
  ARCv2: perf: Finally introduce HS perf unit

 .../devicetree/bindings/arc/archs-pct.txt          |  17 +
 MAINTAINERS                                        |   2 +-
 arch/arc/include/asm/perf_event.h                  |  24 +-
 arch/arc/kernel/perf_event.c                       | 350 ++++++++++++++++++---
 4 files changed, 345 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

-- 
2.4.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/8] ARC: perf: support RAW events
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 2/8] ARC: perf: cap the number of counters to hardware max of 32 Alexey Brodkin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

To count a raw event, the user may issue the following command:
 -------------->-------------
 # perf stat -e r6372756e ls -la /proc > /dev/null

  Performance counter stats for 'ls -la /proc':

            7336905      r6372756e

        0.085494733 seconds time elapsed
 -------------->-------------

"-e rXXX" indicates a raw event to count, where XXX is a hexadecimal
value of up to 64 bits, i.e. an event name of up to 8 ASCII characters.
For example, 0x6372756e = "crun" (in ASCII).
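A minimal user-space sketch of how a condition name, as stored by the hardware, maps to the value typed after "-e r". It assumes the layout described in the probe-routine comment of this patch (two 32-bit words, first character in the least significant byte). The helper names are hypothetical, and swab64_sketch() is a hand-written stand-in for the kernel's __swab64().

```c
#include <stdint.h>

/* Hand-written stand-in for the kernel's __swab64(): reverse byte order. */
static uint64_t swab64_sketch(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xFF);
		x >>= 8;
	}
	return r;
}

/*
 * Turn an 8-byte condition name, laid out as in CC_NAME0/CC_NAME1
 * (first character in the least significant byte), into the value
 * a user types after "perf stat -e r".
 */
static uint64_t cc_name_to_raw(const char name[8])
{
	uint64_t dword = 0;

	/* Little-endian assembly: name[0] ends up in the LSB. */
	for (int i = 7; i >= 0; i--)
		dword = (dword << 8) | (uint8_t)name[i];

	dword = swab64_sketch(dword);

	/* Drop the trailing zero bytes of names shorter than 8 characters. */
	for (int i = 0; i < 8 && dword != 0 && !(dword & 0xFF); i++)
		dword >>= 8;

	return dword;
}
```

With char name[8] = "crun" (zero-padded), cc_name_to_raw() yields 0x6372756e, the value used in the perf stat example above.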

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

Compared to v1:
 [1] Swapping of event names moved to the probe routine, so now we're closer to
     a real raw event in terms of accepting exactly what the user entered.
 [2] Added comment in sources that explains logic for swapping etc.
 [3] Cosmetics

 arch/arc/include/asm/perf_event.h |  3 ++
 arch/arc/kernel/perf_event.c      | 78 +++++++++++++++++++++++++++++++++------
 2 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 2b8880e..ea43477 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -15,6 +15,9 @@
 /* real maximum varies per CPU, this is the maximum supported by the driver */
 #define ARC_PMU_MAX_HWEVENTS	64
 
+/* Max number of countable events that CPU may have */
+#define ARC_PERF_MAX_EVENTS	256
+
 #define ARC_REG_CC_BUILD	0xF6
 #define ARC_REG_CC_INDEX	0x240
 #define ARC_REG_CC_NAME0	0x241
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 1287388..ae4a921 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -22,8 +22,10 @@ struct arc_pmu {
 	struct pmu	pmu;
 	int		counter_size;	/* in bits */
 	int		n_counters;
+	int		n_events;
 	unsigned long	used_mask[BITS_TO_LONGS(ARC_PMU_MAX_HWEVENTS)];
 	int		ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
+	u64             raw_events[ARC_PERF_MAX_EVENTS];
 };
 
 struct arc_callchain_trace {
@@ -136,6 +138,18 @@ static int arc_pmu_cache_event(u64 config)
 	return ret;
 }
 
+static int arc_pmu_raw_event(u64 config)
+{
+	int i;
+
+	for (i = 0; i < arc_pmu->n_events; i++) {
+		if (config == arc_pmu->raw_events[i])
+			return i;
+	}
+
+	return -ENOENT;
+}
+
 /* initializes hw_perf_event structure if event is supported */
 static int arc_pmu_event_init(struct perf_event *event)
 {
@@ -159,6 +173,14 @@ static int arc_pmu_event_init(struct perf_event *event)
 			return ret;
 		hwc->config = arc_pmu->ev_hw_idx[ret];
 		return 0;
+
+	case PERF_TYPE_RAW:
+		ret = arc_pmu_raw_event(event->attr.config);
+		if (ret < 0)
+			return ret;
+		hwc->config |= ret;
+		return 0;
+
 	default:
 		return -ENOENT;
 	}
@@ -270,15 +292,15 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 	struct arc_reg_cc_build cc_bcr;
 	int i, j;
 
-	union cc_name {
-		struct {
-			uint32_t word0, word1;
-			char sentinel;
-		} indiv;
-		char str[9];
+	struct cc_name {
+		union {
+			uint32_t word[2];
+			u64	 dword;
+			char	 str[8];
+		} u;
+		char sentinel[8];
 	} cc_name;
 
-
 	READ_BCR(ARC_REG_PCT_BUILD, pct_bcr);
 	if (!pct_bcr.v) {
 		pr_err("This core does not have performance counters!\n");
@@ -288,6 +310,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 
 	READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
 	BUG_ON(!cc_bcr.v); /* Counters exist but No countable conditions ? */
+	BUG_ON(cc_bcr.c > ARC_PERF_MAX_EVENTS);
 
 	arc_pmu = devm_kzalloc(&pdev->dev, sizeof(struct arc_pmu), GFP_KERNEL);
 	if (!arc_pmu)
@@ -299,23 +322,54 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 	pr_info("ARC perf\t: %d counters (%d bits), %d countable conditions\n",
 		arc_pmu->n_counters, arc_pmu->counter_size, cc_bcr.c);
 
-	cc_name.str[8] = 0;
+	arc_pmu->n_events = cc_bcr.c;
+
 	for (i = 0; i < PERF_COUNT_ARC_HW_MAX; i++)
 		arc_pmu->ev_hw_idx[i] = -1;
 
+	cc_name.sentinel[0] = '\0';
+
 	/* loop thru all available h/w condition indexes */
 	for (j = 0; j < cc_bcr.c; j++) {
+		u64 name;
+
 		write_aux_reg(ARC_REG_CC_INDEX, j);
-		cc_name.indiv.word0 = read_aux_reg(ARC_REG_CC_NAME0);
-		cc_name.indiv.word1 = read_aux_reg(ARC_REG_CC_NAME1);
+		cc_name.u.word[0] = read_aux_reg(ARC_REG_CC_NAME0);
+		cc_name.u.word[1] = read_aux_reg(ARC_REG_CC_NAME1);
+
+		/*
+		 * condition name caching for raw events
+		 *
+		 * In PCT register CC_NAME{0,1} event name string[] is saved
+		 * from LSB side:
+		 * e.g. cycles corresponds to "crun" and is saved as 0x6e757263
+		 *						       n u r c
+		 * However in perf cmdline they are specified in human order as
+		 * r6372756e
+		 *
+		 * Thus save a 64bit swapped value for quick cross check at the
+		 * time of raw event request, which will give in example above:
+		 * __swab64(0x000000006e757263) = 0x6372756e00000000.
+		 * And then to finally have 0x6372756e, trim the trailing zeroes
+		 */
+		name = __swab64(cc_name.u.dword);
+
+		/* Trim trailing zeroes */
+		for (i = 0; i < sizeof(u64); i++)
+			if (!(name & 0xFF))
+				name = name >> 8;
+			else
+				break;
+
+		arc_pmu->raw_events[j] = name;
 
 		/* See if it has been mapped to a perf event_id */
 		for (i = 0; i < ARRAY_SIZE(arc_pmu_ev_hw_map); i++) {
 			if (arc_pmu_ev_hw_map[i] &&
-			    !strcmp(arc_pmu_ev_hw_map[i], cc_name.str) &&
+			    !strcmp(arc_pmu_ev_hw_map[i], cc_name.u.str) &&
 			    strlen(arc_pmu_ev_hw_map[i])) {
 				pr_debug("mapping perf event %2d to h/w event \'%8s\' (idx %d)\n",
-					 i, cc_name.str, j);
+					 i, cc_name.u.str, j);
 				arc_pmu->ev_hw_idx[i] = j;
 			}
 		}
-- 
2.4.3



* [PATCH v2 2/8] ARC: perf: cap the number of counters to hardware max of 32
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 1/8] ARC: perf: support RAW events Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts Alexey Brodkin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Arnaldo Carvalho de Melo, Alexey Brodkin

From: Vineet Gupta <vgupta@synopsys.com>

The number of counters in the PCT can never be more than 32 (while countable
conditions could be 100+) for both ARCompact and ARCv2.
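On 32-bit ARC, where unsigned long is 32 bits wide, this cap means used_mask shrinks from two words (64 bits) to a single word. The sketch below re-implements the BITS_TO_LONGS() arithmetic in user space under hypothetical SKETCH_ names, so on a host with 64-bit long it reports the host's answer rather than ARC's.

```c
#include <limits.h>
#include <stddef.h>

/* Local stand-ins mirroring the kernel's BITS_PER_LONG / BITS_TO_LONGS(). */
#define SKETCH_BITS_PER_LONG	(CHAR_BIT * sizeof(long))
#define SKETCH_BITS_TO_LONGS(nr) \
	(((nr) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

#define ARC_PMU_MAX_HWEVENTS	64	/* old bound, before this patch */
#define ARC_PERF_MAX_COUNTERS	32	/* new bound, PCT hardware maximum */

/* Number of unsigned longs needed for a bitmap of max_counters bits. */
static size_t used_mask_words(size_t max_counters)
{
	return SKETCH_BITS_TO_LONGS(max_counters);
}
```

With 32-bit longs, used_mask_words(ARC_PMU_MAX_HWEVENTS) is 2 and used_mask_words(ARC_PERF_MAX_COUNTERS) is 1.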

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

No changes since v1.

 arch/arc/include/asm/perf_event.h | 5 +++--
 arch/arc/kernel/perf_event.c      | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index ea43477..ca8c414 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -1,6 +1,7 @@
 /*
  * Linux performance counter support for ARC
  *
+ * Copyright (C) 2014-2015 Synopsys, Inc. (www.synopsys.com)
  * Copyright (C) 2011-2013 Synopsys, Inc. (www.synopsys.com)
  *
  * This program is free software; you can redistribute it and/or modify
@@ -12,8 +13,8 @@
 #ifndef __ASM_PERF_EVENT_H
 #define __ASM_PERF_EVENT_H
 
-/* real maximum varies per CPU, this is the maximum supported by the driver */
-#define ARC_PMU_MAX_HWEVENTS	64
+/* Max number of counters that PCT block may ever have */
+#define ARC_PERF_MAX_COUNTERS	32
 
 /* Max number of countable events that CPU may have */
 #define ARC_PERF_MAX_EVENTS	256
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index ae4a921..461fccf 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -23,7 +23,7 @@ struct arc_pmu {
 	int		counter_size;	/* in bits */
 	int		n_counters;
 	int		n_events;
-	unsigned long	used_mask[BITS_TO_LONGS(ARC_PMU_MAX_HWEVENTS)];
+	unsigned long	used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
 	int		ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
 	u64             raw_events[ARC_PERF_MAX_EVENTS];
 };
@@ -306,7 +306,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 		pr_err("This core does not have performance counters!\n");
 		return -ENODEV;
 	}
-	BUG_ON(pct_bcr.c > ARC_PMU_MAX_HWEVENTS);
+	BUG_ON(pct_bcr.c > ARC_PERF_MAX_COUNTERS);
 
 	READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
 	BUG_ON(!cc_bcr.v); /* Counters exist but No countable conditions ? */
-- 
2.4.3



* [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 1/8] ARC: perf: support RAW events Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 2/8] ARC: perf: cap the number of counters to hardware max of 32 Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-18 17:52   ` Peter Zijlstra
  2015-08-18 17:55   ` Peter Zijlstra
  2015-08-05 15:13 ` [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts Alexey Brodkin
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

This generalization prepares the ground for overflow interrupt support.

Hardware event counters on ARC work as follows:
each counter counts from a programmed start value (set in
ARC_REG_PCT_COUNT) to a limit value (set in ARC_REG_PCT_INT_CNT), and
once the limit value is reached, the counter generates an interrupt.

Even though this hardware implementation allows for more flexibility,
in the Linux kernel we decided to mimic the behavior of other architectures
as follows:

 [1] Set the limit value to half of the counter's max value (to allow the counter
     to keep running after reaching its limit, see below for more explanation):
 ---------->8-----------
 arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
 ---------->8-----------

 [2] Set the start value to "arc_pmu->max_period - sample_period" and then
     count up to the limit

Our event counters don't stop on reaching the max value (the one we set in
ARC_REG_PCT_INT_CNT) but continue to count until the kernel explicitly
stops each of them.

Setting the limit to half of the counter capacity allows capturing
additional events between the moment the interrupt is triggered and
the moment we actually process it in the PMU IRQ handler. That way
we try to be more precise.

For example, if we count CPU cycles, we keep track of the cycles spent
running through the generic IRQ handling code:

 [1] We set the counter period to, say, 100_000 events of type "crun"
 [2] The counter reaches that limit and raises its interrupt
 [3] Once we get into the PMU IRQ handler, we read the current counter value
     from ARC_REG_PCT_SNAP and see something like 105_000 there.

If counters stopped on reaching the limit value, we would miss those
additional 5000 cycles.
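The period arithmetic above can be modelled in a few lines of user-space C, using the 100_000/105_000 figures from the example. This is a sketch with hypothetical helper names, not the driver code: in the patch the same logic is spread over arc_pmu_event_set_period() and arc_perf_event_update(). It assumes the halved max period from [1] (the code in this patch still uses the full range; the halving comes later in the series) and a counter narrower than 64 bits.

```c
#include <stdint.h>

/*
 * Usable period: half of the counter range minus one, leaving headroom
 * above the interrupt limit. Assumes counter_size < 64 so the shift is
 * well defined.
 */
static uint64_t max_period_for(int counter_size)
{
	return (1ULL << counter_size) / 2 - 1ULL;
}

/* Start value such that 'left' more events bring the counter to the limit. */
static uint64_t start_value(uint64_t max_period, uint64_t left)
{
	if (left > max_period)
		left = max_period;	/* clamp, as set_period does */
	return max_period - left;
}

/* Events counted since arming, from a snapshot taken in the IRQ handler. */
static uint64_t events_since_start(uint64_t start, uint64_t snapshot)
{
	return snapshot - start;
}
```

With a 32-bit counter, a snapshot of start + 105_000 still fits below 2^32, so the extra 5000 events are recovered rather than lost to a wrap.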

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

Compared to v1:
 [1] Added verbose commit message with explanation of how PCT HW works on ARC
 [2] Simplified arc_perf_event_update()
 [3] Removed check for is_sampling_event() because we already set
     PERF_PMU_CAP_NO_INTERRUPT in probe()
 [4] Minor cosmetics

 arch/arc/kernel/perf_event.c | 81 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 461fccf..2d95440 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -20,10 +20,10 @@
 
 struct arc_pmu {
 	struct pmu	pmu;
-	int		counter_size;	/* in bits */
 	int		n_counters;
 	int		n_events;
 	unsigned long	used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
+	u64		max_period;
 	int		ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
 	u64             raw_events[ARC_PERF_MAX_EVENTS];
 };
@@ -90,18 +90,15 @@ static uint64_t arc_pmu_read_counter(int idx)
 static void arc_perf_event_update(struct perf_event *event,
 				  struct hw_perf_event *hwc, int idx)
 {
-	uint64_t prev_raw_count, new_raw_count;
-	int64_t delta;
-
-	do {
-		prev_raw_count = local64_read(&hwc->prev_count);
-		new_raw_count = arc_pmu_read_counter(idx);
-	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-				 new_raw_count) != prev_raw_count);
-
-	delta = (new_raw_count - prev_raw_count) &
-		((1ULL << arc_pmu->counter_size) - 1ULL);
+	uint64_t prev_raw_count = local64_read(&hwc->prev_count);
+	uint64_t new_raw_count = arc_pmu_read_counter(idx);
+	int64_t delta = new_raw_count - prev_raw_count;
 
+	/*
+	 * We aren't afraid of hwc->prev_count changing beneath our feet
+	 * because there's no way for us to re-enter this function at any time.
+	 */
+	local64_set(&hwc->prev_count, new_raw_count);
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
 }
@@ -156,6 +153,10 @@ static int arc_pmu_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int ret;
 
+	hwc->sample_period  = arc_pmu->max_period;
+	hwc->last_period = hwc->sample_period;
+	local64_set(&hwc->period_left, hwc->sample_period);
+
 	switch (event->attr.type) {
 	case PERF_TYPE_HARDWARE:
 		if (event->attr.config >= PERF_COUNT_HW_MAX)
@@ -167,6 +168,7 @@ static int arc_pmu_event_init(struct perf_event *event)
 			 (int) event->attr.config, (int) hwc->config,
 			 arc_pmu_ev_hw_map[event->attr.config]);
 		return 0;
+
 	case PERF_TYPE_HW_CACHE:
 		ret = arc_pmu_cache_event(event->attr.config);
 		if (ret < 0)
@@ -202,6 +204,49 @@ static void arc_pmu_disable(struct pmu *pmu)
 	write_aux_reg(ARC_REG_PCT_CONTROL, (tmp & 0xffff0000) | 0x0);
 }
 
+static int arc_pmu_event_set_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+	s64 period = hwc->sample_period;
+	int idx = hwc->idx;
+	int overflow = 0;
+	u64 value;
+
+	if (unlikely(left <= -period)) {
+		/* left underflowed by more than period. */
+		left = period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		overflow = 1;
+	} else if (unlikely(left <= 0)) {
+		/* left underflowed by less than period. */
+		left += period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		overflow = 1;
+	}
+
+	if (left > arc_pmu->max_period) {
+		left = arc_pmu->max_period;
+		local64_set(&hwc->period_left, left);
+	}
+
+	value = arc_pmu->max_period - left;
+	local64_set(&hwc->prev_count, value);
+
+	/* Select counter */
+	write_aux_reg(ARC_REG_PCT_INDEX, idx);
+
+	/* Write value */
+	write_aux_reg(ARC_REG_PCT_COUNTL, (u32)value);
+	write_aux_reg(ARC_REG_PCT_COUNTH, (value >> 32));
+
+	perf_event_update_userpage(event);
+
+	return overflow;
+}
+
 /*
  * Assigns hardware counter to hardware condition.
  * Note that there is no separate start/stop mechanism;
@@ -216,9 +261,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
 		return;
 
 	if (flags & PERF_EF_RELOAD)
-		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+	hwc->state = 0;
 
-	event->hw.state = 0;
+	arc_pmu_event_set_period(event);
 
 	/* enable ARC pmu here */
 	write_aux_reg(ARC_REG_PCT_INDEX, idx);
@@ -291,6 +338,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 	struct arc_reg_pct_build pct_bcr;
 	struct arc_reg_cc_build cc_bcr;
 	int i, j;
+	int counter_size;	/* in bits */
 
 	struct cc_name {
 		union {
@@ -317,10 +365,11 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	arc_pmu->n_counters = pct_bcr.c;
-	arc_pmu->counter_size = 32 + (pct_bcr.s << 4);
+	counter_size = 32 + (pct_bcr.s << 4);
+	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
 
 	pr_info("ARC perf\t: %d counters (%d bits), %d countable conditions\n",
-		arc_pmu->n_counters, arc_pmu->counter_size, cc_bcr.c);
+		arc_pmu->n_counters, counter_size, cc_bcr.c);
 
 	arc_pmu->n_events = cc_bcr.c;
 
-- 
2.4.3



* [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (2 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-18 22:12   ` Peter Zijlstra
  2015-08-05 15:13 ` [PATCH v2 5/8] ARCv2: perf: set usable max period as a half of real max period Alexey Brodkin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

In the days of ARC 700, performance counters didn't support interrupts,
so on ARC we only supported non-sampling events.

Put simply, only "perf stat" was functional.

Now ARC HS performance counters do support interrupts, and this change
adds support for them.

ARC performance counters generate interrupts as follows:
 [1] A counter counts starting from the value set in the PCT_COUNT register pair
 [2] Once the counter reaches the value set in PCT_INT_CNT, an interrupt is raised

The basic setup looks like this:
 [1] PCT_COUNT = 0;
 [2] PCT_INT_CNT = __limit_value__;
 [3] Enable interrupts for that counter and let it run
 [4] Let counter reach its limit
 [5] Handle interrupt when it happens

Note that the PCT HW block is built into the CPU core, so its interrupt
line (which is basically an OR of all counters' IRQs) is wired directly to
the top-level IRQC. That means that to de-assert the PCT interrupt, it is
required to reset the IRQs of all counters that have reached their limit values.
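The steps above, and the need to acknowledge every active counter before the shared line drops, can be illustrated with a small software model. Everything here is hypothetical: struct pct_model only mimics PCT_COUNT, PCT_INT_CNT, PCT_INT_CTRL and PCT_INT_ACT as plain variables, and it assumes (following the comment in arc_pmu_intr() below) that clearing an active bit also clears the matching enable bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one PCT block; no real register access. */
struct pct_model {
	uint64_t count[32];	/* PCT_COUNT per counter */
	uint64_t int_cnt[32];	/* PCT_INT_CNT limit per counter */
	uint32_t int_ctrl;	/* interrupt enable bits */
	uint32_t int_act;	/* interrupt active bits, write 1 to clear */
};

/* Steps [1]-[3]: program start value and limit, enable the interrupt. */
static void pct_setup(struct pct_model *p, int idx, uint64_t start,
		      uint64_t limit)
{
	p->count[idx] = start;
	p->int_cnt[idx] = limit;
	p->int_ctrl |= 1u << idx;
}

/* Step [4]: count events; in this model the active bit is raised on crossing the limit. */
static void pct_count(struct pct_model *p, int idx, uint64_t events)
{
	uint64_t before = p->count[idx];

	p->count[idx] += events;	/* keeps counting past the limit */
	if ((p->int_ctrl & (1u << idx)) &&
	    before < p->int_cnt[idx] && p->count[idx] >= p->int_cnt[idx])
		p->int_act |= 1u << idx;
}

/* The single line to the IRQC is the OR of all counters' interrupts. */
static bool pct_line_asserted(const struct pct_model *p)
{
	return p->int_act != 0;
}

/* Step [5]: acknowledge every active counter, then re-enable it. */
static int pct_service(struct pct_model *p, int n_counters)
{
	uint32_t active = p->int_act;
	int handled = 0;

	for (int idx = 0; idx < n_counters; idx++) {
		uint32_t bit = 1u << idx;

		if (!(active & bit))
			continue;
		p->int_act &= ~bit;	/* write-1-to-clear */
		p->int_ctrl &= ~bit;	/* enable is cleared along with it */
		p->int_ctrl |= bit;	/* so re-enable it explicitly */
		handled++;
	}
	return handled;
}
```

Acknowledging only one of two active counters would leave int_act non-zero, so the line would stay asserted; that is why the handler loops over all counters.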

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

Compared to v1:
 [1] Added commit message
 [2] Removed check for is_sampling_event() because we already set
     PERF_PMU_CAP_NO_INTERRUPT in probe()
 [3] Minor cosmetics

 arch/arc/include/asm/perf_event.h |   8 ++-
 arch/arc/kernel/perf_event.c      | 127 +++++++++++++++++++++++++++++++++++---
 2 files changed, 125 insertions(+), 10 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index ca8c414..33a6eb2 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -32,15 +32,19 @@
 #define ARC_REG_PCT_CONFIG	0x254
 #define ARC_REG_PCT_CONTROL	0x255
 #define ARC_REG_PCT_INDEX	0x256
+#define ARC_REG_PCT_INT_CNTL	0x25C
+#define ARC_REG_PCT_INT_CNTH	0x25D
+#define ARC_REG_PCT_INT_CTRL	0x25E
+#define ARC_REG_PCT_INT_ACT	0x25F
 
 #define ARC_REG_PCT_CONTROL_CC	(1 << 16)	/* clear counts */
 #define ARC_REG_PCT_CONTROL_SN	(1 << 17)	/* snapshot */
 
 struct arc_reg_pct_build {
 #ifdef CONFIG_CPU_BIG_ENDIAN
-	unsigned int m:8, c:8, r:6, s:2, v:8;
+	unsigned int m:8, c:8, r:5, i:1, s:2, v:8;
 #else
-	unsigned int v:8, s:2, r:6, c:8, m:8;
+	unsigned int v:8, s:2, i:1, r:5, c:8, m:8;
 #endif
 };
 
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 2d95440..1a9f922 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -11,6 +11,7 @@
  *
  */
 #include <linux/errno.h>
+#include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/perf_event.h>
@@ -25,6 +26,7 @@ struct arc_pmu {
 	unsigned long	used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
 	u64		max_period;
 	int		ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
+	struct perf_event *act_counter[ARC_PERF_MAX_COUNTERS];
 	u64             raw_events[ARC_PERF_MAX_EVENTS];
 };
 
@@ -153,9 +155,11 @@ static int arc_pmu_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int ret;
 
-	hwc->sample_period  = arc_pmu->max_period;
-	hwc->last_period = hwc->sample_period;
-	local64_set(&hwc->period_left, hwc->sample_period);
+	if (!is_sampling_event(event)) {
+		hwc->sample_period  = arc_pmu->max_period;
+		hwc->last_period = hwc->sample_period;
+		local64_set(&hwc->period_left, hwc->sample_period);
+	}
 
 	switch (event->attr.type) {
 	case PERF_TYPE_HARDWARE:
@@ -277,6 +281,17 @@ static void arc_pmu_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
+	/* Disable interrupt for this counter */
+	if (is_sampling_event(event)) {
+		/*
+		 * Reset the interrupt flag by writing 1. This is required
+		 * to make sure no pending interrupt is left.
+		 */
+		write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
+		write_aux_reg(ARC_REG_PCT_INT_CTRL,
+			      read_aux_reg(ARC_REG_PCT_INT_CTRL) & ~(1 << idx));
+	}
+
 	if (!(event->hw.state & PERF_HES_STOPPED)) {
 		/* stop ARC pmu here */
 		write_aux_reg(ARC_REG_PCT_INDEX, idx);
@@ -299,6 +314,8 @@ static void arc_pmu_del(struct perf_event *event, int flags)
 	arc_pmu_stop(event, PERF_EF_UPDATE);
 	__clear_bit(event->hw.idx, arc_pmu->used_mask);
 
+	arc_pmu->act_counter[event->hw.idx] = NULL;
+
 	perf_event_update_userpage(event);
 }
 
@@ -319,6 +336,20 @@ static int arc_pmu_add(struct perf_event *event, int flags)
 	}
 
 	write_aux_reg(ARC_REG_PCT_INDEX, idx);
+
+	arc_pmu->act_counter[idx] = event;
+
+	if (is_sampling_event(event)) {
+		/* Mimic full counter overflow as other arches do */
+		write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
+		write_aux_reg(ARC_REG_PCT_INT_CNTH,
+			      (arc_pmu->max_period >> 32));
+
+		/* Enable interrupt for this counter */
+		write_aux_reg(ARC_REG_PCT_INT_CTRL,
+			      read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
+	}
+
 	write_aux_reg(ARC_REG_PCT_CONFIG, 0);
 	write_aux_reg(ARC_REG_PCT_COUNTL, 0);
 	write_aux_reg(ARC_REG_PCT_COUNTH, 0);
@@ -333,11 +364,70 @@ static int arc_pmu_add(struct perf_event *event, int flags)
 	return 0;
 }
 
+#ifdef CONFIG_ISA_ARCV2
+static irqreturn_t arc_pmu_intr(int irq, void *dev)
+{
+	struct perf_sample_data data;
+	struct arc_pmu *arc_pmu = (struct arc_pmu *)dev;
+	struct pt_regs *regs;
+	int active_ints;
+	int idx;
+
+	arc_pmu_disable(&arc_pmu->pmu);
+
+	active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
+
+	regs = get_irq_regs();
+
+	for (idx = 0; idx < arc_pmu->n_counters; idx++) {
+		struct perf_event *event = arc_pmu->act_counter[idx];
+		struct hw_perf_event *hwc;
+
+		if (!(active_ints & (1 << idx)))
+			continue;
+
+		/* Reset the interrupt flag by writing 1 */
+		write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
+
+		/*
+		 * On reset of the "interrupt active" bit, the corresponding
+		 * "interrupt enable" bit gets automatically reset as well,
+		 * so we need to re-enable the interrupt for the counter.
+		 */
+		write_aux_reg(ARC_REG_PCT_INT_CTRL,
+			read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
+
+		hwc = &event->hw;
+
+		WARN_ON_ONCE(hwc->idx != idx);
+
+		arc_perf_event_update(event, &event->hw, event->hw.idx);
+		perf_sample_data_init(&data, 0, hwc->last_period);
+		if (!arc_pmu_event_set_period(event))
+			continue;
+
+		if (perf_event_overflow(event, &data, regs))
+			arc_pmu_stop(event, 0);
+	}
+
+	arc_pmu_enable(&arc_pmu->pmu);
+
+	return IRQ_HANDLED;
+}
+#else
+
+static irqreturn_t arc_pmu_intr(int irq, void *dev)
+{
+	return IRQ_NONE;
+}
+
+#endif /* CONFIG_ISA_ARCV2 */
+
 static int arc_pmu_device_probe(struct platform_device *pdev)
 {
 	struct arc_reg_pct_build pct_bcr;
 	struct arc_reg_cc_build cc_bcr;
-	int i, j;
+	int i, j, has_interrupts;
 	int counter_size;	/* in bits */
 
 	struct cc_name {
@@ -364,12 +454,16 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 	if (!arc_pmu)
 		return -ENOMEM;
 
+	has_interrupts = is_isa_arcv2() ? pct_bcr.i : 0;
+
 	arc_pmu->n_counters = pct_bcr.c;
 	counter_size = 32 + (pct_bcr.s << 4);
+
 	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
 
-	pr_info("ARC perf\t: %d counters (%d bits), %d countable conditions\n",
-		arc_pmu->n_counters, counter_size, cc_bcr.c);
+	pr_info("ARC perf\t: %d counters (%d bits), %d conditions%s\n",
+		arc_pmu->n_counters, counter_size, cc_bcr.c,
+		has_interrupts ? ", [overflow IRQ support]":"");
 
 	arc_pmu->n_events = cc_bcr.c;
 
@@ -435,8 +529,25 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 		.read		= arc_pmu_read,
 	};
 
-	/* ARC 700 PMU does not support sampling events */
-	arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+	if (has_interrupts) {
+		int irq = platform_get_irq(pdev, 0);
+
+		if (irq < 0) {
+			pr_err("Cannot get IRQ number for the platform\n");
+			return -ENODEV;
+		}
+
+		ret = devm_request_irq(&pdev->dev, irq, arc_pmu_intr, 0,
+				       "arc-pmu", arc_pmu);
+		if (ret) {
+			pr_err("could not allocate PMU IRQ\n");
+			return ret;
+		}
+
+		/* Clear all pending interrupt flags */
+		write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
+	} else
+		arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
 
 	return perf_pmu_register(&arc_pmu->pmu, pdev->name, PERF_TYPE_RAW);
 }
-- 
2.4.3



* [PATCH v2 5/8] ARCv2: perf: set usable max period as a half of real max period
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (3 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode Alexey Brodkin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

An overflow interrupt happens when the counter reaches a limit, which we set
to the maximum value of the counter.

But for better precision the counter keeps registering assigned events
even after reaching the pre-defined limit. To avoid wrapping around, we
leave half of the counter range free.
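For an n-bit counter (the probe code computes n as 32 + (pct_bcr.s << 4)), the usable period P_max set by this patch and the headroom H above it, i.e. how many further events fit before the counter itself wraps, work out as:

```latex
P_{\max} = \frac{2^{n}}{2} - 1 = 2^{n-1} - 1, \qquad
H = \left(2^{n} - 1\right) - P_{\max} = 2^{n-1}
```

For n = 32 this gives P_max = 2147483647 and H = 2147483648, so the interrupt handler has roughly 2^31 events of slack before the counter wraps.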

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

No changes since v1.

 arch/arc/kernel/perf_event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 1a9f922..81847ef 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -459,7 +459,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 	arc_pmu->n_counters = pct_bcr.c;
 	counter_size = 32 + (pct_bcr.s << 4);
 
-	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
+	arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
 
 	pr_info("ARC perf\t: %d counters (%d bits), %d conditions%s\n",
 		arc_pmu->n_counters, counter_size, cc_bcr.c,
-- 
2.4.3



* [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (4 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 5/8] ARCv2: perf: set usable max period as a half of real max period Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-18 23:37   ` Peter Zijlstra
  2015-08-05 15:13 ` [PATCH v2 7/8] ARCv2: perf: SMP support Alexey Brodkin
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

No changes since v1.

 arch/arc/include/asm/perf_event.h |  3 +++
 arch/arc/kernel/perf_event.c      | 16 ++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 33a6eb2..e915c0d 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -37,6 +37,9 @@
 #define ARC_REG_PCT_INT_CTRL	0x25E
 #define ARC_REG_PCT_INT_ACT	0x25F
 
+#define ARC_REG_PCT_CONFIG_USER	(1 << 18)	/* count in user mode */
+#define ARC_REG_PCT_CONFIG_KERN	(1 << 19)	/* count in kernel mode */
+
 #define ARC_REG_PCT_CONTROL_CC	(1 << 16)	/* clear counts */
 #define ARC_REG_PCT_CONTROL_SN	(1 << 17)	/* snapshot */
 
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 81847ef..3203141 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -161,13 +161,25 @@ static int arc_pmu_event_init(struct perf_event *event)
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
 
+	hwc->config = 0;
+
+	if (is_isa_arcv2()) {
+		/* "exclude user" means "count only kernel" */
+		if (event->attr.exclude_user)
+			hwc->config |= ARC_REG_PCT_CONFIG_KERN;
+
+		/* "exclude kernel" means "count only user" */
+		if (event->attr.exclude_kernel)
+			hwc->config |= ARC_REG_PCT_CONFIG_USER;
+	}
+
 	switch (event->attr.type) {
 	case PERF_TYPE_HARDWARE:
 		if (event->attr.config >= PERF_COUNT_HW_MAX)
 			return -ENOENT;
 		if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
 			return -ENOENT;
-		hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
+		hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];
 		pr_debug("init event %d with h/w %d \'%s\'\n",
 			 (int) event->attr.config, (int) hwc->config,
 			 arc_pmu_ev_hw_map[event->attr.config]);
@@ -177,7 +189,7 @@ static int arc_pmu_event_init(struct perf_event *event)
 		ret = arc_pmu_cache_event(event->attr.config);
 		if (ret < 0)
 			return ret;
-		hwc->config = arc_pmu->ev_hw_idx[ret];
+		hwc->config |= arc_pmu->ev_hw_idx[ret];
 		return 0;
 
 	case PERF_TYPE_RAW:
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 7/8] ARCv2: perf: SMP support
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (5 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-05 15:13 ` [PATCH v2 8/8] ARCv2: perf: Finally introduce HS perf unit Alexey Brodkin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Alexey Brodkin, Arnaldo Carvalho de Melo

* split off pmu info into singleton and per-cpu bits
* setup PMU on all cores

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

Compared to v1:
 [1] Rebased on top of previous patches, hence the changes in the patch itself
 [2] Cosmetics
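The split rests on one distinction. Counter bookkeeping (`used_mask`, `act_counter[]`) describes a core-local PMU, so each CPU needs its own copy. The counter count, max period and event map stay global. The sketch below is a minimal userspace model of that split. It assumes a fixed number of CPUs and uses a plain array where the kernel uses DEFINE_PER_CPU()/this_cpu_ptr(). All `model_*` names are hypothetical.

```c
#include <assert.h>

#define MODEL_NR_CPUS		2	/* assumption for the model */
#define MODEL_MAX_COUNTERS	32	/* ARC_PERF_MAX_COUNTERS in the patch */

/* Global, read-mostly PMU description (cf. struct arc_pmu). */
struct model_pmu {
	int n_counters;
};

/* Per-core state (cf. struct arc_pmu_cpu). */
struct model_pmu_cpu {
	unsigned long used_mask;			/* bit set = counter busy */
	const void *act_counter[MODEL_MAX_COUNTERS];	/* event per counter */
};

static struct model_pmu model_pmu = { .n_counters = 4 };
static struct model_pmu_cpu model_pmu_cpu[MODEL_NR_CPUS];

/*
 * Allocate a counter on @cpu for @event, preferring @idx (which must be in
 * [0, n_counters)) like arc_pmu_add() does. Returns the counter index, or
 * -1 where the driver returns -EAGAIN.
 */
static int model_pmu_add(int cpu, const void *event, int idx)
{
	struct model_pmu_cpu *pc = &model_pmu_cpu[cpu];

	if (pc->used_mask & (1UL << idx)) {
		/* preferred counter busy on this core: take the first free one */
		for (idx = 0; idx < model_pmu.n_counters; idx++)
			if (!(pc->used_mask & (1UL << idx)))
				break;
		if (idx == model_pmu.n_counters)
			return -1;
	}
	pc->used_mask |= 1UL << idx;
	pc->act_counter[idx] = event;
	return idx;
}
```

With a single global mask, the second core would have been pushed off counter 0 even though its own PMU has that counter free.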

 arch/arc/kernel/perf_event.c | 71 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 16 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 3203141..008fa58 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -21,13 +21,25 @@
 
 struct arc_pmu {
 	struct pmu	pmu;
+	unsigned int	irq;
 	int		n_counters;
 	int		n_events;
-	unsigned long	used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
 	u64		max_period;
 	int		ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
+	u64		raw_events[ARC_PERF_MAX_EVENTS];
+};
+
+struct arc_pmu_cpu {
+	/*
+	 * A 1 bit for an index indicates that the counter is being used for
+	 * an event. A 0 means that the counter can be used.
+	 */
+	unsigned long	used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
+
+	/*
+	 * The events that are active on the PMU for the given index.
+	 */
 	struct perf_event *act_counter[ARC_PERF_MAX_COUNTERS];
-	u64             raw_events[ARC_PERF_MAX_EVENTS];
 };
 
 struct arc_callchain_trace {
@@ -69,6 +81,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 }
 
 static struct arc_pmu *arc_pmu;
+static DEFINE_PER_CPU(struct arc_pmu_cpu, arc_pmu_cpu);
 
 /* read counter #idx; note that counter# != event# on ARC! */
 static uint64_t arc_pmu_read_counter(int idx)
@@ -323,10 +336,12 @@ static void arc_pmu_stop(struct perf_event *event, int flags)
 
 static void arc_pmu_del(struct perf_event *event, int flags)
 {
+	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
+
 	arc_pmu_stop(event, PERF_EF_UPDATE);
-	__clear_bit(event->hw.idx, arc_pmu->used_mask);
+	__clear_bit(event->hw.idx, pmu_cpu->used_mask);
 
-	arc_pmu->act_counter[event->hw.idx] = 0;
+	pmu_cpu->act_counter[event->hw.idx] = 0;
 
 	perf_event_update_userpage(event);
 }
@@ -334,22 +349,23 @@ static void arc_pmu_del(struct perf_event *event, int flags)
 /* allocate hardware counter and optionally start counting */
 static int arc_pmu_add(struct perf_event *event, int flags)
 {
+	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (__test_and_set_bit(idx, arc_pmu->used_mask)) {
-		idx = find_first_zero_bit(arc_pmu->used_mask,
+	if (__test_and_set_bit(idx, pmu_cpu->used_mask)) {
+		idx = find_first_zero_bit(pmu_cpu->used_mask,
 					  arc_pmu->n_counters);
 		if (idx == arc_pmu->n_counters)
 			return -EAGAIN;
 
-		__set_bit(idx, arc_pmu->used_mask);
+		__set_bit(idx, pmu_cpu->used_mask);
 		hwc->idx = idx;
 	}
 
 	write_aux_reg(ARC_REG_PCT_INDEX, idx);
 
-	arc_pmu->act_counter[idx] = event;
+	pmu_cpu->act_counter[idx] = event;
 
 	if (is_sampling_event(event)) {
 		/* Mimic full counter overflow as other arches do */
@@ -380,7 +396,7 @@ static int arc_pmu_add(struct perf_event *event, int flags)
 static irqreturn_t arc_pmu_intr(int irq, void *dev)
 {
 	struct perf_sample_data data;
-	struct arc_pmu *arc_pmu = (struct arc_pmu *)dev;
+	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
 	struct pt_regs *regs;
 	int active_ints;
 	int idx;
@@ -392,7 +408,7 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 	regs = get_irq_regs();
 
 	for (idx = 0; idx < arc_pmu->n_counters; idx++) {
-		struct perf_event *event = arc_pmu->act_counter[idx];
+		struct perf_event *event = pmu_cpu->act_counter[idx];
 		struct hw_perf_event *hwc;
 
 		if (!(active_ints & (1 << idx)))
@@ -435,6 +451,17 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 
 #endif /* CONFIG_ISA_ARCV2 */
 
+void arc_cpu_pmu_irq_init(void)
+{
+	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
+
+	arc_request_percpu_irq(arc_pmu->irq, smp_processor_id(), arc_pmu_intr,
+			       "ARC perf counters", pmu_cpu);
+
+	/* Clear all pending interrupt flags */
+	write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
+}
+
 static int arc_pmu_device_probe(struct platform_device *pdev)
 {
 	struct arc_reg_pct_build pct_bcr;
@@ -543,18 +570,30 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 
 	if (has_interrupts) {
 		int irq = platform_get_irq(pdev, 0);
+		unsigned long flags;
 
 		if (irq < 0) {
 			pr_err("Cannot get IRQ number for the platform\n");
 			return -ENODEV;
 		}
 
-		ret = devm_request_irq(&pdev->dev, irq, arc_pmu_intr, 0,
-				       "arc-pmu", arc_pmu);
-		if (ret) {
-			pr_err("could not allocate PMU IRQ\n");
-			return ret;
-		}
+		arc_pmu->irq = irq;
+
+		/*
+		 * arc_cpu_pmu_irq_init() needs to be called on all cores for
+		 * their respective local PMU.
+		 * However we use opencoded on_each_cpu() to ensure it is called
+		 * on core0 first, so that arc_request_percpu_irq() sets up
+		 * AUTOEN etc. Otherwise enable_percpu_irq() fails to enable
+		 * perf IRQ on non master cores.
+		 * see arc_request_percpu_irq()
+		 */
+		preempt_disable();
+		local_irq_save(flags);
+		arc_cpu_pmu_irq_init();
+		local_irq_restore(flags);
+		smp_call_function((smp_call_func_t)arc_cpu_pmu_irq_init, 0, 1);
+		preempt_enable();
 
 		/* Clean all pending interrupt flags */
 		write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 8/8] ARCv2: perf: Finally introduce HS perf unit
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (6 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 7/8] ARCv2: perf: SMP support Alexey Brodkin
@ 2015-08-05 15:13 ` Alexey Brodkin
  2015-08-10  8:29 ` [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
  2015-08-14  7:41 ` [arc-linux-dev] " Vineet Gupta
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-05 15:13 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, peterz,
	Arnaldo Carvalho de Melo, Alexey Brodkin

From: Vineet Gupta <vgupta@synopsys.com>

With all features in place, the ARC HS pct block can now be allowed to be
probed and used.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
---

Compared to v1:
 [1] MAINTAINERS file updated to cover new file.

 Documentation/devicetree/bindings/arc/archs-pct.txt | 17 +++++++++++++++++
 MAINTAINERS                                         |  2 +-
 arch/arc/include/asm/perf_event.h                   |  5 ++++-
 arch/arc/kernel/perf_event.c                        |  3 ++-
 4 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

diff --git a/Documentation/devicetree/bindings/arc/archs-pct.txt b/Documentation/devicetree/bindings/arc/archs-pct.txt
new file mode 100644
index 0000000..1ae98b8
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/archs-pct.txt
@@ -0,0 +1,17 @@
+* ARC HS Performance Counters
+
+The ARC HS can be configured with a pipeline performance monitor for counting
+CPU and cache events like cache misses and hits. Like conventional PCT, there
+are 100+ hardware conditions dynamically mapped to up to 32 counters.
+It also supports overflow interrupts.
+
+Required properties:
+
+- compatible : should contain
+	"snps,archs-pct"
+
+Example:
+
+pmu {
+        compatible = "snps,archs-pct";
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index a226416..ec9e729 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9865,7 +9865,7 @@ SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
 F:	arch/arc/
-F:	Documentation/devicetree/bindings/arc/
+F:	Documentation/devicetree/bindings/arc/*
 F:	drivers/tty/serial/arc_uart.c
 
 SYNOPSYS ARC SDP platform support
diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index e915c0d..e4d7587 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -108,8 +108,11 @@ static const char * const arc_pmu_ev_hw_map[] = {
 	[PERF_COUNT_HW_INSTRUCTIONS] = "iall",
 	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp",
 	[PERF_COUNT_ARC_BPOK]         = "bpok",	  /* NP-NT, PT-T, PNT-NT */
+#ifdef CONFIG_ISA_ARCV2
+	[PERF_COUNT_HW_BRANCH_MISSES] = "bpmp",
+#else
 	[PERF_COUNT_HW_BRANCH_MISSES] = "bpfail", /* NP-T, PT-NT, PNT-T */
-
+#endif
 	[PERF_COUNT_ARC_LDC] = "imemrdc",	/* Instr: mem read cached */
 	[PERF_COUNT_ARC_STC] = "imemwrc",	/* Instr: mem write cached */
 
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 008fa58..d1dba1c 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -606,6 +606,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 #ifdef CONFIG_OF
 static const struct of_device_id arc_pmu_match[] = {
 	{ .compatible = "snps,arc700-pct" },
+	{ .compatible = "snps,archs-pct" },
 	{},
 };
 MODULE_DEVICE_TABLE(of, arc_pmu_match);
@@ -613,7 +614,7 @@ MODULE_DEVICE_TABLE(of, arc_pmu_match);
 
 static struct platform_driver arc_pmu_driver = {
 	.driver	= {
-		.name		= "arc700-pct",
+		.name		= "arc-pct",
 		.of_match_table = of_match_ptr(arc_pmu_match),
 	},
 	.probe		= arc_pmu_device_probe,
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/8] ARCv2 port to Linux - (C) perf
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (7 preceding siblings ...)
  2015-08-05 15:13 ` [PATCH v2 8/8] ARCv2: perf: Finally introduce HS perf unit Alexey Brodkin
@ 2015-08-10  8:29 ` Alexey Brodkin
  2015-08-14  7:41 ` [arc-linux-dev] " Vineet Gupta
  9 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-10  8:29 UTC (permalink / raw)
  To: peterz; +Cc: linux-kernel, linux-arch, Vineet Gupta, arc-linux-dev, arnd

Hi Peter,

On Wed, 2015-08-05 at 18:13 +0300, Alexey Brodkin wrote:
> Hi Peter,
> 
> This mini-series adds perf support for ARCv2 based cores, which brings in
> overflow interupts and SMP. Additionally now raw events are supported as well.
> 
> Please review !
> 
> Compared to v1 this series has:
>  [1] Addressed review comments
>  [2] More verbose commit messages and comments in sources
>  [3] Minor cosmetics
> 
> Thanks,
> Alexey
> 
> 
> Alexey Brodkin (6):
>   ARC: perf: support RAW events
>   ARCv2: perf: implement "event_set_period" for future use with
>     interrupts
>   ARCv2: perf: Support sampling events using overflow interrupts
>   ARCv2: perf: set usable max period as a half of real max period
>   ARCv2: perf: implement exclusion of event counting in user or kernel
>     mode
>   ARCv2: perf: SMP support
> 
> Vineet Gupta (2):
>   ARC: perf: cap the number of counters to hardware max of 32
>   ARCv2: perf: Finally introduce HS perf unit
> 
>  .../devicetree/bindings/arc/archs-pct.txt          |  17 +
>  MAINTAINERS                                        |   2 +-
>  arch/arc/include/asm/perf_event.h                  |  24 +-
>  arch/arc/kernel/perf_event.c                       | 350 ++++++++++++++++++---
>  4 files changed, 345 insertions(+), 48 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt
> 

Any chance this respin of the series could be reviewed sometime soon?

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [arc-linux-dev] [PATCH v2 0/8] ARCv2 port to Linux - (C) perf
  2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
                   ` (8 preceding siblings ...)
  2015-08-10  8:29 ` [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
@ 2015-08-14  7:41 ` Vineet Gupta
  9 siblings, 0 replies; 20+ messages in thread
From: Vineet Gupta @ 2015-08-14  7:41 UTC (permalink / raw)
  To: peterz; +Cc: arc-linux-dev, linux-arch, linux-kernel, Alexey Brodkin

On Wednesday 05 August 2015 08:44 PM, Alexey Brodkin wrote:
> Hi Peter,
>
> This mini-series adds perf support for ARCv2 based cores, which brings in
> overflow interupts and SMP. Additionally now raw events are supported as well.
>
> Please review !
>
> Compared to v1 this series has:
>  [1] Addressed review comments
>  [2] More verbose commit messages and comments in sources
>  [3] Minor cosmetics
>
> Thanks,
> Alexey

Hi Peter,

Can you please skim through these sometime soon? The merge window is drawing nearer and it
would be nice to have perf for ARCv2-based cores in there.

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-05 15:13 ` [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts Alexey Brodkin
@ 2015-08-18 17:52   ` Peter Zijlstra
  2015-08-18 18:03     ` Alexey Brodkin
  2015-08-20 10:46     ` Alexey Brodkin
  2015-08-18 17:55   ` Peter Zijlstra
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2015-08-18 17:52 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-arch, linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo

On Wed, Aug 05, 2015 at 06:13:29PM +0300, Alexey Brodkin wrote:
> Even though this hardware implementation allows for more flexibility,
> in Linux kernel we decided to mimic behavior of other architectures this
> way:
> 
>  [1] Set limit value as half of counter's max value (to allow counter to
>      run after reaching it limit, see below for more explanation):
>  ---------->8-----------
>  arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
>  ---------->8-----------

> @@ -317,10 +365,11 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
>  		return -ENOMEM;
>  
>  	arc_pmu->n_counters = pct_bcr.c;
> -	arc_pmu->counter_size = 32 + (pct_bcr.s << 4);
> +	counter_size = 32 + (pct_bcr.s << 4);
> +	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
>  

I don't see that /2 there..

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-05 15:13 ` [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts Alexey Brodkin
  2015-08-18 17:52   ` Peter Zijlstra
@ 2015-08-18 17:55   ` Peter Zijlstra
  2015-08-20 11:25     ` Alexey Brodkin
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2015-08-18 17:55 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-arch, linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo

On Wed, Aug 05, 2015 at 06:13:29PM +0300, Alexey Brodkin wrote:
> +static int arc_pmu_event_set_period(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	s64 left = local64_read(&hwc->period_left);
> +	s64 period = hwc->sample_period;
> +	int idx = hwc->idx;
> +	int overflow = 0;
> +	u64 value;
> +
> +	if (unlikely(left <= -period)) {
> +		/* left underflowed by more than period. */
> +		left = period;
> +		local64_set(&hwc->period_left, left);
> +		hwc->last_period = period;
> +		overflow = 1;
> +	} else	if (unlikely(left <= 0)) {
> +		/* left underflowed by less than period. */
> +		left += period;
> +		local64_set(&hwc->period_left, left);
> +		hwc->last_period = period;
> +		overflow = 1;
> +	}
> +
> +	if (left > arc_pmu->max_period) {
> +		left = arc_pmu->max_period;
> +		local64_set(&hwc->period_left, left);

Given that you set counter_size to 32 + (pct_bcr.s << 4), I'm assuming these
counters are not 64-bit wide (or at least the hardware has the option of
not being full width).

That means this local64_set() is wrong.

The purpose here is to emulate a longer period with a short counter. So
even though we have to take the interrupt to observe the counter width
overflow and reprogram, we must not decrease the @left value.

Doing so will trigger one of the above two cases and result in @overflow
== 1, even though we've not actually had hwc->sample_period counts.

> +	}
> +
> +	value = arc_pmu->max_period - left;
> +	local64_set(&hwc->prev_count, value);
> +
> +	/* Select counter */
> +	write_aux_reg(ARC_REG_PCT_INDEX, idx);
> +
> +	/* Write value */
> +	write_aux_reg(ARC_REG_PCT_COUNTL, (u32)value);
> +	write_aux_reg(ARC_REG_PCT_COUNTH, (value >> 32));
> +
> +	perf_event_update_userpage(event);
> +
> +	return overflow;
> +}
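The point above can be checked with a small userspace model. It assumes a counter whose interrupt fires when it reaches max_period, and it uses a 16-bit-range max_period so the numbers stay small. With `store_clamped` set, the clamped value is written back to period_left, as in the quoted code. Without it, only the value programmed into the counter is clamped. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit: half of a 16-bit counter range, minus one. */
static const int64_t max_period = (1LL << 15) - 1;

/* Hypothetical stand-in for the hw_perf_event fields used here. */
struct model_hwc {
	int64_t period_left;
	int64_t sample_period;
	int64_t prev_count;
};

/* Model of arc_pmu_event_set_period(); returns 1 when a sample is due. */
static int model_set_period(struct model_hwc *hwc, int store_clamped)
{
	int64_t left = hwc->period_left;
	int64_t period = hwc->sample_period;
	int overflow = 0;

	if (left <= -period) {
		left = period;
		hwc->period_left = left;
		overflow = 1;
	} else if (left <= 0) {
		left += period;
		hwc->period_left = left;
		overflow = 1;
	}

	if (left > max_period) {
		left = max_period;
		if (store_clamped)
			hwc->period_left = left;	/* shrinks the period */
	}

	/* Start value chosen so the counter reaches max_period after @left. */
	hwc->prev_count = max_period - left;
	return overflow;
}

/* Count counter interrupts taken until the first sample is reported. */
static int model_irqs_until_sample(int64_t sample_period, int store_clamped)
{
	struct model_hwc hwc = { sample_period, sample_period, 0 };
	int irqs = 0;

	model_set_period(&hwc, store_clamped);
	do {
		/* Interrupt: counter ran from prev_count up to max_period. */
		hwc.period_left -= max_period - hwc.prev_count;
		irqs++;
	} while (!model_set_period(&hwc, store_clamped));

	return irqs;
}
```

With a period of 100000 and a limit of 32767, leaving period_left alone reports the sample after 4 interrupts (3 * 32767 + 1699 = 100000 events). Writing the clamped value back reports it after the first 32767 events.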


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-18 17:52   ` Peter Zijlstra
@ 2015-08-18 18:03     ` Alexey Brodkin
  2015-08-20 10:46     ` Alexey Brodkin
  1 sibling, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-18 18:03 UTC (permalink / raw)
  To: peterz; +Cc: linux-arch, linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd, acme

Hi Peter,

On Tue, 2015-08-18 at 19:52 +0200, Peter Zijlstra wrote:
> On Wed, Aug 05, 2015 at 06:13:29PM +0300, Alexey Brodkin wrote:
> > Even though this hardware implementation allows for more flexibility,
> > in Linux kernel we decided to mimic behavior of other architectures this
> > way:
> > 
> >  [1] Set limit value as half of counter's max value (to allow counter to
> >      run after reaching it limit, see below for more explanation):
> >  ---------->8-----------
> >  arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
> >  ---------->8-----------
> 
> > @@ -317,10 +365,11 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
> >  		return -ENOMEM;
> >  
> >  	arc_pmu->n_counters = pct_bcr.c;
> > -	arc_pmu->counter_size = 32 + (pct_bcr.s << 4);
> > +	counter_size = 32 + (pct_bcr.s << 4);
> > +	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
> >  
> 
> I don't see that /2 there..

My comment was a bit premature: that "/2" is actually introduced in a
subsequent commit.

Do you think I need to respin that patch with a commit message
that matches the real code (i.e. without the "/2")?

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts
  2015-08-05 15:13 ` [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts Alexey Brodkin
@ 2015-08-18 22:12   ` Peter Zijlstra
  2015-08-20 11:30     ` Alexey Brodkin
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2015-08-18 22:12 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-arch, linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo

On Wed, Aug 05, 2015 at 06:13:30PM +0300, Alexey Brodkin wrote:
> @@ -319,6 +336,20 @@ static int arc_pmu_add(struct perf_event *event, int flags)
>  	}
>  
>  	write_aux_reg(ARC_REG_PCT_INDEX, idx);
> +
> +	arc_pmu->act_counter[idx] = event;
> +
> +	if (is_sampling_event(event)) {
> +		/* Mimic full counter overflow as other arches do */
> +		write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> +		write_aux_reg(ARC_REG_PCT_INT_CNTH,
> +			      (arc_pmu->max_period >> 32));
> +
> +		/* Enable interrupt for this counter */
> +		write_aux_reg(ARC_REG_PCT_INT_CTRL,
> +			      read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> +	}

*confused* pmu::add should only start on flags & PERF_EF_START, and then
we start with hwc->sample_period, not the max_period.

> +
>  	write_aux_reg(ARC_REG_PCT_CONFIG, 0);
>  	write_aux_reg(ARC_REG_PCT_COUNTL, 0);
>  	write_aux_reg(ARC_REG_PCT_COUNTH, 0);
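One reading of the remark above: pmu::add() only claims a counter and starts it when PERF_EF_START is passed. pmu::start() then programs the counter so that the first interrupt arrives after hwc->sample_period events, rather than after max_period events from zero. The sketch below is a minimal userspace model of that split and covers sampling events only. MODEL_EF_START stands in for PERF_EF_START. The other `model_*` names and the 48-bit width are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EF_START	0x01	/* stands in for PERF_EF_START */

/* Assumed limit for a 48-bit counter (half range, minus one). */
static const uint64_t max_period = (1ULL << 47) - 1;

struct model_event {
	uint64_t sample_period;
	int running;		/* counting enabled */
	int irq_enabled;	/* overflow interrupt enabled */
	uint64_t count;		/* start value written to the counter */
};

/* pmu::start(): the interrupt threshold stays at max_period, and the start
 * value is chosen so the threshold is hit after sample_period events. */
static void model_start(struct model_event *e)
{
	uint64_t left = e->sample_period;

	if (left > max_period)
		left = max_period;
	e->count = max_period - left;
	e->irq_enabled = 1;
	e->running = 1;
}

/* pmu::add(): claim the counter; only start it when asked to. */
static void model_add(struct model_event *e, int flags)
{
	e->running = 0;
	e->irq_enabled = 0;
	if (flags & MODEL_EF_START)
		model_start(e);
}
```

In this model, enabling the interrupt happens in start rather than add, which is one possible answer to the follow-up question in this thread.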

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode
  2015-08-05 15:13 ` [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode Alexey Brodkin
@ 2015-08-18 23:37   ` Peter Zijlstra
  2015-08-20 11:33     ` Alexey Brodkin
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2015-08-18 23:37 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-arch, linux-kernel, Vineet.Gupta1, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo

On Wed, Aug 05, 2015 at 06:13:32PM +0300, Alexey Brodkin wrote:
> +	hwc->config = 0;
> +
> +	if (is_isa_arcv2()) {
> +		/* "exclude user" means "count only kernel" */
> +		if (event->attr.exclude_user)
> +			hwc->config |= ARC_REG_PCT_CONFIG_KERN;
> +
> +		/* "exclude kernel" means "count only user" */
> +		if (event->attr.exclude_kernel)
> +			hwc->config |= ARC_REG_PCT_CONFIG_USER;
> +	}
> +
>  	switch (event->attr.type) {
>  	case PERF_TYPE_HARDWARE:
>  		if (event->attr.config >= PERF_COUNT_HW_MAX)
>  			return -ENOENT;
>  		if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
>  			return -ENOENT;
> -		hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
> +		hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];

So I would still very much like perf_event_attr::config to reflect the
value you'll program into hardware.

If you want to do that weird 4 character lookup thing, use a special
hardware event (possibly 0 if that is not a valid value), and stuff the
4 chars in ::config1
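One possible reading of this suggestion has three parts. ::config keeps the value that goes into hardware. A reserved event number means "look up by name". The name itself travels in the 64-bit perf_event_attr::config1 field. The sketch below covers only the packing side. It assumes the first character goes in the low byte and allows at most 8 characters, since names like "imemrdc" are longer than 4. `pct_name_to_u64()` and `pct_u64_to_name()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack up to 8 ASCII characters into a u64, first character in the low byte. */
static uint64_t pct_name_to_u64(const char *name)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < 8 && name[i] != '\0'; i++)
		v |= (uint64_t)(unsigned char)name[i] << (8 * i);
	return v;
}

/* Reverse of the above; @out must hold 9 bytes (8 chars + NUL). */
static void pct_u64_to_name(uint64_t v, char out[9])
{
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = (char)((v >> (8 * i)) & 0xff);
	out[8] = '\0';
}
```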

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-18 17:52   ` Peter Zijlstra
  2015-08-18 18:03     ` Alexey Brodkin
@ 2015-08-20 10:46     ` Alexey Brodkin
  1 sibling, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-20 10:46 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: linux-arch, linux-kernel, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo, Vineet Gupta

Hi Peter,

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: 18 августа 2015 г. 20:52
> To: Alexey Brodkin
> Cc: linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; Vineet.Gupta1@synopsys.com; arc-linux-dev@synopsys.com;
> arnd@arndb.de; Arnaldo Carvalho de Melo
> Subject: Re: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
> 
> On Wed, Aug 05, 2015 at 06:13:29PM +0300, Alexey Brodkin wrote:
> > Even though this hardware implementation allows for more flexibility,
> > in Linux kernel we decided to mimic behavior of other architectures this
> > way:
> >
> >  [1] Set limit value as half of counter's max value (to allow counter to
> >      run after reaching it limit, see below for more explanation):
> >  ---------->8-----------
> >  arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
> >  ---------->8-----------
> 
> > @@ -317,10 +365,11 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
> >  		return -ENOMEM;
> >
> >  	arc_pmu->n_counters = pct_bcr.c;
> > -	arc_pmu->counter_size = 32 + (pct_bcr.s << 4);
> > +	counter_size = 32 + (pct_bcr.s << 4);
> > +	arc_pmu->max_period = (1ULL << counter_size) - 1ULL;
> >
> 
> I don't see that /2 there..

Another good solution could probably be to merge the following two patches into one:
 [1] [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
 [2] [PATCH v2 5/8] ARCv2: perf: set usable max period as a half of real max period

Then the comment would become valid.

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
  2015-08-18 17:55   ` Peter Zijlstra
@ 2015-08-20 11:25     ` Alexey Brodkin
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-20 11:25 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: linux-arch, linux-kernel, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo, Vineet Gupta

Hi Peter,

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: 18 августа 2015 г. 20:55
> To: Alexey Brodkin
> Cc: linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; Vineet.Gupta1@synopsys.com; arc-linux-dev@synopsys.com;
> arnd@arndb.de; Arnaldo Carvalho de Melo
> Subject: Re: [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts
> 
> On Wed, Aug 05, 2015 at 06:13:29PM +0300, Alexey Brodkin wrote:
> > +static int arc_pmu_event_set_period(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	s64 left = local64_read(&hwc->period_left);
> > +	s64 period = hwc->sample_period;
> > +	int idx = hwc->idx;
> > +	int overflow = 0;
> > +	u64 value;
> > +
> > +	if (unlikely(left <= -period)) {
> > +		/* left underflowed by more than period. */
> > +		left = period;
> > +		local64_set(&hwc->period_left, left);
> > +		hwc->last_period = period;
> > +		overflow = 1;
> > +	} else	if (unlikely(left <= 0)) {
> > +		/* left underflowed by less than period. */
> > +		left += period;
> > +		local64_set(&hwc->period_left, left);
> > +		hwc->last_period = period;
> > +		overflow = 1;
> > +	}
> > +
> > +	if (left > arc_pmu->max_period) {
> > +		left = arc_pmu->max_period;
> > +		local64_set(&hwc->period_left, left);
> 
> Given that you set counter_size to 32+bct_bcr.s << 4, I'm assuming these
> counters are not 64bit wide (or at least the hardware has the option of
> not being full width).

Indeed, our counters can be 32, 48 (default) or 64 bits wide.
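For these widths, the usable limit from the commit message of patch 3/8, (1ULL << counter_size) / 2 - 1, can be worked out directly. For a 64-bit counter, however, that literal expression shifts a 64-bit value by 64, which is undefined behaviour in C. Shifting by counter_size - 1 gives the same value for every width without that problem. The sketch below uses `pct_max_period()`, a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* (2^counter_size) / 2 - 1, valid for 1 <= counter_size <= 64. */
static uint64_t pct_max_period(unsigned int counter_size)
{
	return (1ULL << (counter_size - 1)) - 1ULL;
}
```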
 
> That means this local64_set() is wrong.

Do you mean the one used for setting "hwc->period_left"?

> The purpose here is to emulate a longer period with a short counter. So
> even though we have to take the interrupt to observe the counter width
> overflow and reprogram, we must not decrease the @left value.
> 
> Doing so will trigger one of the above two cases and result in @overflow
> == 1, even though we've not actually had hwc->sample_period counts.

My understanding was that here we're just checking whether, for some reason,
arc_perf_event_update() decremented "hwc->period_left" too much, so that it
became either just < 0 or even < -period. IMHO that is unlikely for a
sampling event (where we expect the interrupt to happen close to the point
where the period is crossed), but for a non-sampling event it is quite
possible if the counter value is checked too infrequently.
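The two underflow branches in the quoted code can be illustrated with numbers, assuming sample_period = 1000. `reload_left()` below is a hypothetical extraction of just those branches.

```c
#include <assert.h>
#include <stdint.h>

/* The two underflow cases of arc_pmu_event_set_period(), nothing else. */
static int64_t reload_left(int64_t left, int64_t period, int *overflow)
{
	*overflow = 0;
	if (left <= -period) {
		/* overshot by a full period or more: restart from one period */
		left = period;
		*overflow = 1;
	} else if (left <= 0) {
		/* overshot by less than a period: carry the excess over */
		left += period;
		*overflow = 1;
	}
	return left;
}
```

Suppose 2500 events are accounted against a 1000 period between two reads, the slow-polling case described above. Then left is -1500, the surplus is dropped, and only one overflow is reported.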

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts
  2015-08-18 22:12   ` Peter Zijlstra
@ 2015-08-20 11:30     ` Alexey Brodkin
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-20 11:30 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: linux-arch, linux-kernel, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo, Vineet Gupta

Hi Peter,

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: 19 августа 2015 г. 1:12
> To: Alexey Brodkin
> Cc: linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; Vineet.Gupta1@synopsys.com; arc-linux-dev@synopsys.com;
> arnd@arndb.de; Arnaldo Carvalho de Melo
> Subject: Re: [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts
> 
> On Wed, Aug 05, 2015 at 06:13:30PM +0300, Alexey Brodkin wrote:
> > @@ -319,6 +336,20 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> >  	}
> >
> >  	write_aux_reg(ARC_REG_PCT_INDEX, idx);
> > +
> > +	arc_pmu->act_counter[idx] = event;
> > +
> > +	if (is_sampling_event(event)) {
> > +		/* Mimic full counter overflow as other arches do */
> > +		write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> > +		write_aux_reg(ARC_REG_PCT_INT_CNTH,
> > +			      (arc_pmu->max_period >> 32));
> > +
> > +		/* Enable interrupt for this counter */
> > +		write_aux_reg(ARC_REG_PCT_INT_CTRL,
> > +			      read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> > +	}
> 
> *confused* pmu::add should only start on flags & PERF_EF_START, and then
> we start with hwc->sample_period, not the max_period.

Did you mean here that we should enable interrupts in arc_pmu_start() but not
in arc_pmu_add()?

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode
  2015-08-18 23:37   ` Peter Zijlstra
@ 2015-08-20 11:33     ` Alexey Brodkin
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Brodkin @ 2015-08-20 11:33 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: linux-arch, linux-kernel, arc-linux-dev, arnd,
	Arnaldo Carvalho de Melo, Vineet Gupta

Hi Peter,

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: 19 августа 2015 г. 2:37
> To: Alexey Brodkin
> Cc: linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; Vineet.Gupta1@synopsys.com; arc-linux-dev@synopsys.com;
> arnd@arndb.de; Arnaldo Carvalho de Melo
> Subject: Re: [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode
> 
> On Wed, Aug 05, 2015 at 06:13:32PM +0300, Alexey Brodkin wrote:
> > +	hwc->config = 0;
> > +
> > +	if (is_isa_arcv2()) {
> > +		/* "exclude user" means "count only kernel" */
> > +		if (event->attr.exclude_user)
> > +			hwc->config |= ARC_REG_PCT_CONFIG_KERN;
> > +
> > +		/* "exclude kernel" means "count only user" */
> > +		if (event->attr.exclude_kernel)
> > +			hwc->config |= ARC_REG_PCT_CONFIG_USER;
> > +	}
> > +
> >  	switch (event->attr.type) {
> >  	case PERF_TYPE_HARDWARE:
> >  		if (event->attr.config >= PERF_COUNT_HW_MAX)
> >  			return -ENOENT;
> >  		if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
> >  			return -ENOENT;
> > -		hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
> > +		hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];
> 
> So I would still very much like perf_event_attr::config to reflect the
> value you'll program into hardware.
> 
> If you want to do that weird 4 character lookup thing, use a special
> hardware event (possibly 0 if that is not a valid value), and stuff the
> 4 chars in ::config1

OK, I understand your concern here, but I don't quite understand
what you mean by "stuff the 4 chars in ::config1".

Could you please explain this in a bit more detail?
Is there an example of something similar I could take a look at?

-Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-08-20 11:34 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2015-08-05 15:13 [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 1/8] ARC: perf: support RAW events Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 2/8] ARC: perf: cap the number of counters to hardware max of 32 Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 3/8] ARCv2: perf: implement "event_set_period" for future use with interrupts Alexey Brodkin
2015-08-18 17:52   ` Peter Zijlstra
2015-08-18 18:03     ` Alexey Brodkin
2015-08-20 10:46     ` Alexey Brodkin
2015-08-18 17:55   ` Peter Zijlstra
2015-08-20 11:25     ` Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 4/8] ARCv2: perf: Support sampling events using overflow interrupts Alexey Brodkin
2015-08-18 22:12   ` Peter Zijlstra
2015-08-20 11:30     ` Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 5/8] ARCv2: perf: set usable max period as a half of real max period Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 6/8] ARCv2: perf: implement exclusion of event counting in user or kernel mode Alexey Brodkin
2015-08-18 23:37   ` Peter Zijlstra
2015-08-20 11:33     ` Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 7/8] ARCv2: perf: SMP support Alexey Brodkin
2015-08-05 15:13 ` [PATCH v2 8/8] ARCv2: perf: Finally introduce HS perf unit Alexey Brodkin
2015-08-10  8:29 ` [PATCH v2 0/8] ARCv2 port to Linux - (C) perf Alexey Brodkin
2015-08-14  7:41 ` [arc-linux-dev] " Vineet Gupta
