linux-csky.vger.kernel.org archive mirror
* [PATCH V4 0/6] csky: Add pmu hardware sampling support
@ 2019-06-04  2:23 Mao Han
  2019-06-04  2:23 ` [PATCH V4 1/6] csky: Init pmu as a device Mao Han
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Guo Ren

This patch set adds hardware sampling support for the csky PMU and
also adds some properties to the PMU node definition. perf can record
on hardware events with this patch set applied.

Cc: Guo Ren <guoren@kernel.org>

Changes since v3:
  - change reg-io-width to count-width
  - use macro sign_extend64
  - update commit log

Changes since v2:
  - update dt-binding(csky pmu use rising edge interrupt)
  - use cpuhp_setup_state to enable irq(fix irq enable on smp)

Changes since v1:
  - do not update hpcr when event type is invalid(fix option
    --all-kernel/--all-user)

Guo Ren (1):
  csky: Fixup some error count in 810 & 860.

Mao Han (5):
  csky: Init pmu as a device
  csky: Add count-width property for csky pmu
  csky: Add pmu interrupt support
  dt-bindings: csky: Add csky PMU bindings
  csky: Fix perf record in kernel/user space

 Documentation/devicetree/bindings/csky/pmu.txt |  38 +++
 arch/csky/kernel/perf_event.c                  | 424 +++++++++++++++++++++++--
 2 files changed, 441 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/csky/pmu.txt

-- 
2.7.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V4 1/6] csky: Init pmu as a device
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
@ 2019-06-04  2:23 ` Mao Han
  2019-06-04  5:50   ` Guo Ren
  2019-06-04  2:23 ` [PATCH V4 2/6] csky: Add count-width property for csky pmu Mao Han
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Guo Ren

This patch changes the csky PMU initialization from arch init to
device init. The PMU can then be configured with information from the
device tree (PMU device name, irq number, etc.).
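
As an illustration, a board dts could then declare the PMU roughly as in the
binding example added later in this series (the values below are examples
only, not a requirement of this patch):

        pmu {
                compatible = "csky,csky-pmu";
                interrupts = <0x17 IRQ_TYPE_EDGE_RISING>;
                interrupt-parent = <&intc>;
        };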

Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Guo Ren <guoren@kernel.org>
---
 arch/csky/kernel/perf_event.c | 58 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index 376c972..c022acc 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -21,6 +21,8 @@ struct csky_pmu_t {
 	uint32_t	hpcr;
 } csky_pmu;
 
+typedef int (*csky_pmu_init)(struct csky_pmu_t *);
+
 #define cprgr(reg)				\
 ({						\
 	unsigned int tmp;			\
@@ -1028,4 +1030,58 @@ int __init init_hw_perf_events(void)
 
 	return perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
 }
-arch_initcall(init_hw_perf_events);
+
+int csky_pmu_device_probe(struct platform_device *pdev,
+			  const struct of_device_id *of_table)
+{
+	const struct of_device_id *of_id;
+	csky_pmu_init init_fn;
+	struct device_node *node = pdev->dev.of_node;
+	int ret = -ENODEV;
+
+	of_id = of_match_node(of_table, pdev->dev.of_node);
+	if (node && of_id) {
+		init_fn = of_id->data;
+		ret = init_fn(&csky_pmu);
+	}
+
+	if (ret) {
+		pr_notice("[perf] failed to probe PMU!\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+const static struct of_device_id csky_pmu_of_device_ids[] = {
+	{.compatible = "csky,csky-pmu", .data = init_hw_perf_events},
+	{},
+};
+
+static int csky_pmu_dev_probe(struct platform_device *pdev)
+{
+	return csky_pmu_device_probe(pdev, csky_pmu_of_device_ids);
+}
+
+static struct platform_driver csky_pmu_driver = {
+	.driver = {
+		   .name = "csky-pmu",
+		   .of_match_table = csky_pmu_of_device_ids,
+		   },
+	.probe = csky_pmu_dev_probe,
+};
+
+static int __init csky_pmu_probe(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&csky_pmu_driver);
+	if (ret)
+		pr_notice("[perf] PMU initialization failed\n");
+	else
+		pr_notice("[perf] PMU initialization done\n");
+
+	return ret;
+}
+
+device_initcall(csky_pmu_probe);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V4 2/6] csky: Add count-width property for csky pmu
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
  2019-06-04  2:23 ` [PATCH V4 1/6] csky: Init pmu as a device Mao Han
@ 2019-06-04  2:23 ` Mao Han
  2019-06-04  5:35   ` Guo Ren
  2019-06-04  2:23 ` [PATCH V4 3/6] csky: Add pmu interrupt support Mao Han
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Guo Ren

The csky PMU counters may have different widths. When a counter is
narrower than 64 bits and the new counter value is smaller than the previous
one, the delta calculation yields an extremely large value. The sampled value
should therefore be sign-extended to 64 bits to avoid this; the number of
extension bits is derived from the count-width property in the dts.
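
A minimal sketch of the failure mode this avoids (illustrative values,
assuming a 48-bit counter, i.e. sign bit at bit 47; sign_extend64() is
from <linux/bitops.h>):

	/* previous 48-bit reading, just below the wrap point */
	uint64_t prev = 0x0000ffffffffffb0ULL;
	/* next 48-bit reading after 0x60 more events, counter has wrapped */
	uint64_t curr = 0x0000000000000010ULL;

	int64_t bad  = curr - prev;		/* huge bogus delta */
	int64_t good = sign_extend64(curr, 47) -
		       sign_extend64(prev, 47);	/* 0x60, the real delta */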

Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Guo Ren <guoren@kernel.org>
---
 arch/csky/kernel/perf_event.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index c022acc..36f7f20 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -9,6 +9,7 @@
 #include <linux/platform_device.h>
 
 #define CSKY_PMU_MAX_EVENTS 32
+#define DEFAULT_COUNT_WIDTH 48
 
 #define HPCR		"<0, 0x0>"	/* PMU Control reg */
 #define HPCNTENR	"<0, 0x4>"	/* Count Enable reg */
@@ -18,6 +19,7 @@ static void (*hw_raw_write_mapping[CSKY_PMU_MAX_EVENTS])(uint64_t val);
 
 struct csky_pmu_t {
 	struct pmu	pmu;
+	uint32_t	count_width;
 	uint32_t	hpcr;
 } csky_pmu;
 
@@ -806,7 +808,12 @@ static void csky_perf_event_update(struct perf_event *event,
 				   struct hw_perf_event *hwc)
 {
 	uint64_t prev_raw_count = local64_read(&hwc->prev_count);
-	uint64_t new_raw_count = hw_raw_read_mapping[hwc->idx]();
+	/*
+	 * Sign extend count value to 64bit, otherwise delta calculation
+	 * would be incorrect when overflow occurs.
+	 */
+	uint64_t new_raw_count = sign_extend64(
+			hw_raw_read_mapping[hwc->idx](), csky_pmu.count_width);
 	int64_t delta = new_raw_count - prev_raw_count;
 
 	/*
@@ -1045,6 +1052,11 @@ int csky_pmu_device_probe(struct platform_device *pdev,
 		ret = init_fn(&csky_pmu);
 	}
 
+	if (!of_property_read_u32(node, "count-width",
+				  &csky_pmu.count_width)) {
+		csky_pmu.count_width = DEFAULT_COUNT_WIDTH;
+	}
+
 	if (ret) {
 		pr_notice("[perf] failed to probe PMU!\n");
 		return ret;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V4 3/6] csky: Add pmu interrupt support
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
  2019-06-04  2:23 ` [PATCH V4 1/6] csky: Init pmu as a device Mao Han
  2019-06-04  2:23 ` [PATCH V4 2/6] csky: Add count-width property for csky pmu Mao Han
@ 2019-06-04  2:23 ` Mao Han
  2019-06-04  6:31   ` Guo Ren
  2019-06-04  2:23 ` [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings Mao Han
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Guo Ren

This patch adds the interrupt request and handler for the csky PMU.
perf can record on hardware events with this patch applied.

Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Guo Ren <guoren@kernel.org>
---
 arch/csky/kernel/perf_event.c | 292 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 276 insertions(+), 16 deletions(-)

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index 36f7f20..af09885 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -11,18 +11,50 @@
 #define CSKY_PMU_MAX_EVENTS 32
 #define DEFAULT_COUNT_WIDTH 48
 
-#define HPCR		"<0, 0x0>"	/* PMU Control reg */
-#define HPCNTENR	"<0, 0x4>"	/* Count Enable reg */
+#define HPCR		"<0, 0x0>"      /* PMU Control reg */
+#define HPSPR		"<0, 0x1>"      /* Start PC reg */
+#define HPEPR		"<0, 0x2>"      /* End PC reg */
+#define HPSIR		"<0, 0x3>"      /* Soft Counter reg */
+#define HPCNTENR	"<0, 0x4>"      /* Count Enable reg */
+#define HPINTENR	"<0, 0x5>"      /* Interrupt Enable reg */
+#define HPOFSR		"<0, 0x6>"      /* Interrupt Status reg */
+
+/* The events for a given PMU register set. */
+struct pmu_hw_events {
+	/*
+	 * The events that are active on the PMU for the given index.
+	 */
+	struct perf_event *events[CSKY_PMU_MAX_EVENTS];
+
+	/*
+	 * A 1 bit for an index indicates that the counter is being used for
+	 * an event. A 0 means that the counter can be used.
+	 */
+	unsigned long used_mask[BITS_TO_LONGS(CSKY_PMU_MAX_EVENTS)];
+
+	/*
+	 * Hardware lock to serialize accesses to PMU registers. Needed for the
+	 * read/modify/write sequences.
+	 */
+	raw_spinlock_t pmu_lock;
+};
 
 static uint64_t (*hw_raw_read_mapping[CSKY_PMU_MAX_EVENTS])(void);
 static void (*hw_raw_write_mapping[CSKY_PMU_MAX_EVENTS])(uint64_t val);
 
 struct csky_pmu_t {
-	struct pmu	pmu;
-	uint32_t	count_width;
-	uint32_t	hpcr;
+	struct pmu			pmu;
+	irqreturn_t			(*handle_irq)(int irq_num);
+	void				(*reset)(void *info);
+	struct pmu_hw_events __percpu	*hw_events;
+	struct platform_device		*plat_device;
+	uint32_t			count_width;
+	uint32_t			hpcr;
+	u64				max_period;
 } csky_pmu;
+static int csky_pmu_irq;
 
+#define to_csky_pmu(p)  (container_of(p, struct csky_pmu, pmu))
 typedef int (*csky_pmu_init)(struct csky_pmu_t *);
 
 #define cprgr(reg)				\
@@ -804,6 +836,51 @@ static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 	},
 };
 
+int  csky_pmu_event_set_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+	s64 period = hwc->sample_period;
+	int ret = 0;
+
+	if (unlikely(left <= -period)) {
+		left = period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		ret = 1;
+	}
+
+	if (unlikely(left <= 0)) {
+		left += period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		ret = 1;
+	}
+
+	if (left > (s64)csky_pmu.max_period)
+		left = csky_pmu.max_period;
+
+	/* Interrupt may lose when period is too small. */
+	if (left < 10)
+		left = 10;
+
+	/*
+	 * The hw event starts counting from this event offset,
+	 * mark it to be able to extract future "deltas":
+	 */
+	local64_set(&hwc->prev_count, (u64)(-left));
+
+	if (hw_raw_write_mapping[hwc->idx] != NULL)
+		hw_raw_write_mapping[hwc->idx]((u64)(-left) &
+						csky_pmu.max_period);
+
+	cpwcr(HPOFSR, ~BIT(hwc->idx) & cprcr(HPOFSR));
+
+	perf_event_update_userpage(event);
+
+	return ret;
+}
+
 static void csky_perf_event_update(struct perf_event *event,
 				   struct hw_perf_event *hwc)
 {
@@ -825,6 +902,11 @@ static void csky_perf_event_update(struct perf_event *event,
 	local64_sub(delta, &hwc->period_left);
 }
 
+static void csky_pmu_reset(void *info)
+{
+	cpwcr(HPCR, BIT(31) | BIT(30) | BIT(1));
+}
+
 static void csky_pmu_read(struct perf_event *event)
 {
 	csky_perf_event_update(event, &event->hw);
@@ -901,7 +983,9 @@ static void csky_pmu_disable(struct pmu *pmu)
 
 static void csky_pmu_start(struct perf_event *event, int flags)
 {
+	unsigned long irq_flags;
 	struct hw_perf_event *hwc = &event->hw;
+	struct pmu_hw_events *events = this_cpu_ptr(csky_pmu.hw_events);
 	int idx = hwc->idx;
 
 	if (WARN_ON_ONCE(idx == -1))
@@ -912,16 +996,35 @@ static void csky_pmu_start(struct perf_event *event, int flags)
 
 	hwc->state = 0;
 
+	csky_pmu_event_set_period(event);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, irq_flags);
+
+	cpwcr(HPINTENR, BIT(idx) | cprcr(HPINTENR));
 	cpwcr(HPCNTENR, BIT(idx) | cprcr(HPCNTENR));
+
+	raw_spin_unlock_irqrestore(&events->pmu_lock, irq_flags);
 }
 
-static void csky_pmu_stop(struct perf_event *event, int flags)
+static void csky_pmu_stop_event(struct perf_event *event)
 {
+	unsigned long irq_flags;
 	struct hw_perf_event *hwc = &event->hw;
+	struct pmu_hw_events *events = this_cpu_ptr(csky_pmu.hw_events);
 	int idx = hwc->idx;
 
+	raw_spin_lock_irqsave(&events->pmu_lock, irq_flags);
+
+	cpwcr(HPINTENR, ~BIT(idx) & cprcr(HPINTENR));
+	cpwcr(HPCNTENR, ~BIT(idx) & cprcr(HPCNTENR));
+
+	raw_spin_unlock_irqrestore(&events->pmu_lock, irq_flags);
+}
+
+static void csky_pmu_stop(struct perf_event *event, int flags)
+{
 	if (!(event->hw.state & PERF_HES_STOPPED)) {
-		cpwcr(HPCNTENR, ~BIT(idx) & cprcr(HPCNTENR));
+		csky_pmu_stop_event(event);
 		event->hw.state |= PERF_HES_STOPPED;
 	}
 
@@ -934,7 +1037,11 @@ static void csky_pmu_stop(struct perf_event *event, int flags)
 
 static void csky_pmu_del(struct perf_event *event, int flags)
 {
+	struct pmu_hw_events *hw_events = this_cpu_ptr(csky_pmu.hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
 	csky_pmu_stop(event, PERF_EF_UPDATE);
+	hw_events->events[hwc->idx] = NULL;
 
 	perf_event_update_userpage(event);
 }
@@ -942,12 +1049,10 @@ static void csky_pmu_del(struct perf_event *event, int flags)
 /* allocate hardware counter and optionally start counting */
 static int csky_pmu_add(struct perf_event *event, int flags)
 {
+	struct pmu_hw_events *hw_events = this_cpu_ptr(csky_pmu.hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 
-	local64_set(&hwc->prev_count, 0);
-
-	if (hw_raw_write_mapping[hwc->idx] != NULL)
-		hw_raw_write_mapping[hwc->idx](0);
+	hw_events->events[hwc->idx] = event;
 
 	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
 	if (flags & PERF_EF_START)
@@ -958,8 +1063,118 @@ static int csky_pmu_add(struct perf_event *event, int flags)
 	return 0;
 }
 
+static irqreturn_t csky_pmu_handle_irq(int irq_num)
+{
+	struct perf_sample_data data;
+	struct pmu_hw_events *cpuc = this_cpu_ptr(csky_pmu.hw_events);
+	struct pt_regs *regs;
+	int idx;
+
+	/*
+	 * Did an overflow occur?
+	 */
+	if (!cprcr(HPOFSR))
+		return IRQ_NONE;
+
+	/*
+	 * Handle the counter(s) overflow(s)
+	 */
+	regs = get_irq_regs();
+
+	csky_pmu_disable(&csky_pmu.pmu);
+	for (idx = 0; idx < CSKY_PMU_MAX_EVENTS; ++idx) {
+		struct perf_event *event = cpuc->events[idx];
+		struct hw_perf_event *hwc;
+
+		/* Ignore if we don't have an event. */
+		if (!event)
+			continue;
+		/*
+		 * We have a single interrupt for all counters. Check that
+		 * each counter has overflowed before we process it.
+		 */
+		if (!(cprcr(HPOFSR) & 1 << idx))
+			continue;
+
+		hwc = &event->hw;
+		csky_perf_event_update(event, &event->hw);
+		perf_sample_data_init(&data, 0, hwc->last_period);
+		csky_pmu_event_set_period(event);
+
+		if (perf_event_overflow(event, &data, regs))
+			csky_pmu_stop_event(event);
+	}
+	csky_pmu_enable(&csky_pmu.pmu);
+	/*
+	 * Handle the pending perf events.
+	 *
+	 * Note: this call *must* be run with interrupts disabled. For
+	 * platforms that can have the PMU interrupts raised as an NMI, this
+	 * will not work.
+	 */
+	irq_work_run();
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t csky_pmu_dispatch_irq(int irq, void *dev)
+{
+	int ret;
+
+	ret = csky_pmu.handle_irq(irq);
+
+	return ret;
+}
+
+static int csky_pmu_request_irq(irq_handler_t handler)
+{
+	int err, irq, irqs;
+	struct platform_device *pmu_device = csky_pmu.plat_device;
+
+	if (!pmu_device)
+		return -ENODEV;
+
+	irqs = min(pmu_device->num_resources, num_possible_cpus());
+	if (irqs < 1) {
+		pr_err("no irqs for PMUs defined\n");
+		return -ENODEV;
+	}
+
+	irq = platform_get_irq(pmu_device, 0);
+	if (irq < 0)
+		return -ENODEV;
+	err = request_percpu_irq(irq, handler, "csky-pmu",
+				 this_cpu_ptr(csky_pmu.hw_events));
+	if (err) {
+		pr_err("unable to request IRQ%d for CSKY PMU counters\n",
+		       irq);
+		return err;
+	}
+
+	return 0;
+}
+
+static void csky_pmu_free_irq(void)
+{
+	int irq;
+	struct platform_device *pmu_device = csky_pmu.plat_device;
+
+	irq = platform_get_irq(pmu_device, 0);
+	if (irq >= 0)
+		free_percpu_irq(irq, this_cpu_ptr(csky_pmu.hw_events));
+}
+
 int __init init_hw_perf_events(void)
 {
+	int cpu;
+
+	csky_pmu.hw_events = alloc_percpu_gfp(struct pmu_hw_events,
+					      GFP_KERNEL);
+	if (!csky_pmu.hw_events) {
+		pr_info("failed to allocate per-cpu PMU data.\n");
+		return -ENOMEM;
+	}
+
 	csky_pmu.pmu = (struct pmu) {
 		.pmu_enable	= csky_pmu_enable,
 		.pmu_disable	= csky_pmu_disable,
@@ -971,6 +1186,16 @@ int __init init_hw_perf_events(void)
 		.read		= csky_pmu_read,
 	};
 
+	csky_pmu.handle_irq = csky_pmu_handle_irq;
+	csky_pmu.reset = csky_pmu_reset;
+
+	for_each_possible_cpu(cpu) {
+		struct pmu_hw_events *events;
+
+		events = per_cpu_ptr(csky_pmu.hw_events, cpu);
+		raw_spin_lock_init(&events->pmu_lock);
+	}
+
 	memset((void *)hw_raw_read_mapping, 0,
 		sizeof(hw_raw_read_mapping[CSKY_PMU_MAX_EVENTS]));
 
@@ -1031,11 +1256,19 @@ int __init init_hw_perf_events(void)
 	hw_raw_write_mapping[0x1a] = csky_pmu_write_l2wac;
 	hw_raw_write_mapping[0x1b] = csky_pmu_write_l2wmc;
 
-	csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+	return 0;
+}
 
-	cpwcr(HPCR, BIT(31) | BIT(30) | BIT(1));
+static int csky_pmu_starting_cpu(unsigned int cpu)
+{
+	enable_percpu_irq(csky_pmu_irq, 0);
+	return 0;
+}
 
-	return perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
+static int csky_pmu_dying_cpu(unsigned int cpu)
+{
+	disable_percpu_irq(csky_pmu_irq);
+	return 0;
 }
 
 int csky_pmu_device_probe(struct platform_device *pdev,
@@ -1052,14 +1285,41 @@ int csky_pmu_device_probe(struct platform_device *pdev,
 		ret = init_fn(&csky_pmu);
 	}
 
+	if (ret) {
+		pr_notice("[perf] failed to probe PMU!\n");
+		return ret;
+	}
+
 	if (!of_property_read_u32(node, "count-width",
 				  &csky_pmu.count_width)) {
 		csky_pmu.count_width = DEFAULT_COUNT_WIDTH;
 	}
+	csky_pmu.max_period = ((u64)1 << csky_pmu.count_width) - 1;
 
+	csky_pmu.plat_device = pdev;
+
+	/* Ensure the PMU has sane values out of reset. */
+	if (csky_pmu.reset)
+		on_each_cpu(csky_pmu.reset, &csky_pmu, 1);
+
+	ret = csky_pmu_request_irq(csky_pmu_dispatch_irq);
 	if (ret) {
-		pr_notice("[perf] failed to probe PMU!\n");
-		return ret;
+		csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+		pr_notice("[perf] PMU request irq fail!\n");
+	}
+
+	ret = cpuhp_setup_state(CPUHP_AP_PERF_ONLINE, "AP_PERF_ONLINE",
+				csky_pmu_starting_cpu,
+				csky_pmu_dying_cpu);
+	if (ret) {
+		csky_pmu_free_irq();
+		free_percpu(csky_pmu.hw_events);
+	}
+
+	ret = perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
+	if (ret) {
+		csky_pmu_free_irq();
+		free_percpu(csky_pmu.hw_events);
 	}
 
 	return ret;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
                   ` (2 preceding siblings ...)
  2019-06-04  2:23 ` [PATCH V4 3/6] csky: Add pmu interrupt support Mao Han
@ 2019-06-04  2:23 ` Mao Han
  2019-06-04  5:38   ` Guo Ren
  2019-06-04  2:23 ` [PATCH V4 5/6] csky: Fixup some error count in 810 & 860 Mao Han
  2019-06-04  2:24 ` [PATCH V4 6/6] csky: Fix perf record in kernel/user space Mao Han
  5 siblings, 1 reply; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Rob Herring, Guo Ren

This patch adds documentation describing how to add the PMU node in
dts.

Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
---
 Documentation/devicetree/bindings/csky/pmu.txt | 38 ++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/csky/pmu.txt

diff --git a/Documentation/devicetree/bindings/csky/pmu.txt b/Documentation/devicetree/bindings/csky/pmu.txt
new file mode 100644
index 0000000..53c3b0a
--- /dev/null
+++ b/Documentation/devicetree/bindings/csky/pmu.txt
@@ -0,0 +1,38 @@
+============================
+C-SKY Performance Monitor Units
+============================
+
+C-SKY 8xx series cores often have a PMU for counting cpu and cache events.
+The C-SKY PMU representation in the device tree should be done as under:
+
+==============================
+PMU node bindings definition
+==============================
+
+	Description: Describes PMU
+
+	PROPERTIES
+
+	- compatible
+		Usage: required
+		Value type: <string>
+		Definition: must be "csky,csky-pmu"
+	- interrupts
+		Usage: required
+		Value type: <u32>
+		Definition: must be pmu irq num defined by soc
+	- count-width
+		Usage: optional
+		Value type: <u32>
+		Definition: the width of pmu counter
+
+Examples:
+---------
+
+        pmu {
+                compatible = "csky,csky-pmu";
+                interrupts = <0x17 IRQ_TYPE_EDGE_RISING>;
+                interrupt-parent = <&intc>;
+		count-width = <0x30>;
+        };
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V4 5/6] csky: Fixup some error count in 810 & 860.
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
                   ` (3 preceding siblings ...)
  2019-06-04  2:23 ` [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings Mao Han
@ 2019-06-04  2:23 ` Mao Han
  2019-06-04  2:24 ` [PATCH V4 6/6] csky: Fix perf record in kernel/user space Mao Han
  5 siblings, 0 replies; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Guo Ren, linux-csky, Mao Han, Guo Ren

From: Guo Ren <ren_guo@c-sky.com>

The CK810 PMU only supports events with index 0-8 and 0xd; the CK860 only
supports events 1-4 and 0xa-0x1b. So do not register unsupported events
in the hardware cache event map, which may lead to unknown behavior.
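
For context, the existing lookup path in csky_pmu_event_init() already
rejects entries marked this way (sketch of the relevant lines only):

	case PERF_TYPE_HW_CACHE:
		ret = csky_pmu_cache_event(event->attr.config);
		if (ret == CACHE_OP_UNSUPPORTED)
			return -ENOENT;	/* no counter gets programmed */
		hwc->idx = ret;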

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
 arch/csky/kernel/perf_event.c | 60 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 6 deletions(-)

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index af09885..dc84dc7 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -737,6 +737,20 @@ static const int csky_pmu_hw_map[PERF_COUNT_HW_MAX] = {
 #define CACHE_OP_UNSUPPORTED	0xffff
 static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 	[C(L1D)] = {
+#ifdef CONFIG_CPU_CK810
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= 0x5,
+			[C(RESULT_MISS)]	= 0x6,
+		},
+#else
 		[C(OP_READ)] = {
 			[C(RESULT_ACCESS)]	= 0x14,
 			[C(RESULT_MISS)]	= 0x15,
@@ -746,9 +760,10 @@ static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 			[C(RESULT_MISS)]	= 0x17,
 		},
 		[C(OP_PREFETCH)] = {
-			[C(RESULT_ACCESS)]	= 0x5,
-			[C(RESULT_MISS)]	= 0x6,
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
+#endif
 	},
 	[C(L1I)] = {
 		[C(OP_READ)] = {
@@ -765,6 +780,20 @@ static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		},
 	},
 	[C(LL)] = {
+#ifdef CONFIG_CPU_CK810
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= 0x7,
+			[C(RESULT_MISS)]	= 0x8,
+		},
+#else
 		[C(OP_READ)] = {
 			[C(RESULT_ACCESS)]	= 0x18,
 			[C(RESULT_MISS)]	= 0x19,
@@ -774,29 +803,48 @@ static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 			[C(RESULT_MISS)]	= 0x1b,
 		},
 		[C(OP_PREFETCH)] = {
-			[C(RESULT_ACCESS)]	= 0x7,
-			[C(RESULT_MISS)]	= 0x8,
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
+#endif
 	},
 	[C(DTLB)] = {
+#ifdef CONFIG_CPU_CK810
 		[C(OP_READ)] = {
-			[C(RESULT_ACCESS)]	= 0x5,
-			[C(RESULT_MISS)]	= 0xb,
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 		[C(OP_WRITE)] = {
 			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
+#else
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= 0x14,
+			[C(RESULT_MISS)]	= 0xb,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= 0x16,
+			[C(RESULT_MISS)]	= 0xb,
+		},
+#endif
 		[C(OP_PREFETCH)] = {
 			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
 	[C(ITLB)] = {
+#ifdef CONFIG_CPU_CK810
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+#else
 		[C(OP_READ)] = {
 			[C(RESULT_ACCESS)]	= 0x3,
 			[C(RESULT_MISS)]	= 0xa,
 		},
+#endif
 		[C(OP_WRITE)] = {
 			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V4 6/6] csky: Fix perf record in kernel/user space
  2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
                   ` (4 preceding siblings ...)
  2019-06-04  2:23 ` [PATCH V4 5/6] csky: Fixup some error count in 810 & 860 Mao Han
@ 2019-06-04  2:24 ` Mao Han
  2019-06-04  6:36   ` Guo Ren
  5 siblings, 1 reply; 12+ messages in thread
From: Mao Han @ 2019-06-04  2:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mao Han, linux-csky, Guo Ren

csky_pmu_event_init is called several times during perf record
initialization. After the event counter has been configured for either
kernel space or user space, csky_pmu_event_init is called twice more with
no attr specified, and the configuration gets overwritten to sample in
both kernel space and user space. --all-kernel/--all-user are therefore
useless without this patch applied.
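
Usage sketch (assuming a typical invocation): with this fix,
"perf record --all-kernel -- <workload>" samples kernel mode only and
"perf record --all-user -- <workload>" samples user mode only, instead of
both being silently widened to kernel plus user space.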

Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
 arch/csky/kernel/perf_event.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index dc84dc7..e3308ab 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -983,6 +983,12 @@ static int csky_pmu_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int ret;
 
+	if (event->attr.type != PERF_TYPE_HARDWARE &&
+	    event->attr.type != PERF_TYPE_HW_CACHE &&
+	    event->attr.type != PERF_TYPE_RAW) {
+		return -ENOENT;
+	}
+
 	if (event->attr.exclude_user)
 		csky_pmu.hpcr = BIT(2);
 	else if (event->attr.exclude_kernel)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V4 2/6] csky: Add count-width property for csky pmu
  2019-06-04  2:23 ` [PATCH V4 2/6] csky: Add count-width property for csky pmu Mao Han
@ 2019-06-04  5:35   ` Guo Ren
  0 siblings, 0 replies; 12+ messages in thread
From: Guo Ren @ 2019-06-04  5:35 UTC (permalink / raw)
  To: Mao Han; +Cc: linux-kernel, linux-csky

Hi Mao,

On Tue, Jun 4, 2019 at 10:25 AM Mao Han <han_mao@c-sky.com> wrote:
>
> The csky pmu counter may have different io width. When the counter is
> smaller then 64 bits and counter value is smaller than the old value, it
> will result to a extremely large delta value. So the sampled value should
> be extend to 64 bits to avoid this, the extension bits base on the
> count-width property from dts.
>
> Signed-off-by: Mao Han <han_mao@c-sky.com>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  arch/csky/kernel/perf_event.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
> index c022acc..36f7f20 100644
> --- a/arch/csky/kernel/perf_event.c
> +++ b/arch/csky/kernel/perf_event.c
> @@ -9,6 +9,7 @@
>  #include <linux/platform_device.h>
>
>  #define CSKY_PMU_MAX_EVENTS 32
> +#define DEFAULT_COUNT_WIDTH 48
>
>  #define HPCR           "<0, 0x0>"      /* PMU Control reg */
>  #define HPCNTENR       "<0, 0x4>"      /* Count Enable reg */
> @@ -18,6 +19,7 @@ static void (*hw_raw_write_mapping[CSKY_PMU_MAX_EVENTS])(uint64_t val);
>
>  struct csky_pmu_t {
>         struct pmu      pmu;
> +       uint32_t        count_width;
>         uint32_t        hpcr;
>  } csky_pmu;
>
> @@ -806,7 +808,12 @@ static void csky_perf_event_update(struct perf_event *event,
>                                    struct hw_perf_event *hwc)
>  {
>         uint64_t prev_raw_count = local64_read(&hwc->prev_count);
> -       uint64_t new_raw_count = hw_raw_read_mapping[hwc->idx]();
> +       /*
> +        * Sign extend count value to 64bit, otherwise delta calculation
> +        * would be incorrect when overflow occurs.
> +        */
> +       uint64_t new_raw_count = sign_extend64(
> +                       hw_raw_read_mapping[hwc->idx](), csky_pmu.count_width);
csky_pmu.count_width - 1? We need the sign-bit index here.
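
i.e. something like (just a sketch; sign_extend64() takes the 0-based
index of the sign bit):

	uint64_t new_raw_count = sign_extend64(
			hw_raw_read_mapping[hwc->idx](),
			csky_pmu.count_width - 1);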

Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings
  2019-06-04  2:23 ` [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings Mao Han
@ 2019-06-04  5:38   ` Guo Ren
  0 siblings, 0 replies; 12+ messages in thread
From: Guo Ren @ 2019-06-04  5:38 UTC (permalink / raw)
  To: Mao Han; +Cc: linux-kernel, linux-csky, Rob Herring

Reviewed-by: Guo Ren <ren_guo@c-sky.com>

On Tue, Jun 4, 2019 at 10:25 AM Mao Han <han_mao@c-sky.com> wrote:
>
> This patch adds the documentation to describe that how to add pmu node in
> dts.
>
> Signed-off-by: Mao Han <han_mao@c-sky.com>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  Documentation/devicetree/bindings/csky/pmu.txt | 38 ++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/csky/pmu.txt
>
> diff --git a/Documentation/devicetree/bindings/csky/pmu.txt b/Documentation/devicetree/bindings/csky/pmu.txt
> new file mode 100644
> index 0000000..53c3b0a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/csky/pmu.txt
> @@ -0,0 +1,38 @@
> +============================
> +C-SKY Performance Monitor Units
> +============================
> +
> +C-SKY 8xx series cores often have a PMU for counting cpu and cache events.
> +The C-SKY PMU representation in the device tree should be done as under:
> +
> +==============================
> +PMU node bindings definition
> +==============================
> +
> +       Description: Describes PMU
> +
> +       PROPERTIES
> +
> +       - compatible
> +               Usage: required
> +               Value type: <string>
> +               Definition: must be "csky,csky-pmu"
> +       - interrupts
> +               Usage: required
> +               Value type: <u32>
> +               Definition: must be pmu irq num defined by soc
> +       - count-width
> +               Usage: optional
> +               Value type: <u32>
> +               Definition: the width of pmu counter
> +
> +Examples:
> +---------
> +
> +        pmu {
> +                compatible = "csky,csky-pmu";
> +                interrupts = <0x17 IRQ_TYPE_EDGE_RISING>;
> +                interrupt-parent = <&intc>;
> +               count-width = <0x30>;
> +        };
> +
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V4 1/6] csky: Init pmu as a device
  2019-06-04  2:23 ` [PATCH V4 1/6] csky: Init pmu as a device Mao Han
@ 2019-06-04  5:50   ` Guo Ren
  0 siblings, 0 replies; 12+ messages in thread
From: Guo Ren @ 2019-06-04  5:50 UTC (permalink / raw)
  To: Mao Han; +Cc: linux-kernel, linux-csky

Hello Mao,

On Tue, Jun 4, 2019 at 10:25 AM Mao Han <han_mao@c-sky.com> wrote:
>
> This patch change the csky pmu initialization from arch init to
> device init. The pmu can be configued with information from
> device tree(pmu device name, irq number and etc.).
>
> Signed-off-by: Mao Han <han_mao@c-sky.com>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  arch/csky/kernel/perf_event.c | 58 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
> index 376c972..c022acc 100644
> --- a/arch/csky/kernel/perf_event.c
> +++ b/arch/csky/kernel/perf_event.c
> @@ -21,6 +21,8 @@ struct csky_pmu_t {
>         uint32_t        hpcr;
>  } csky_pmu;
>
> +typedef int (*csky_pmu_init)(struct csky_pmu_t *);
Is the type of csky_pmu_init() the same as that of init_hw_perf_events()?

And I also think you should remove the hook style, because there
is only one init for the driver.

> +
>  #define cprgr(reg)                             \
>  ({                                             \
>         unsigned int tmp;                       \
> @@ -1028,4 +1030,58 @@ int __init init_hw_perf_events(void)
>
>         return perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
>  }
> -arch_initcall(init_hw_perf_events);
> +
> +int csky_pmu_device_probe(struct platform_device *pdev,
> +                         const struct of_device_id *of_table)
> +{
> +       const struct of_device_id *of_id;
> +       csky_pmu_init init_fn;
> +       struct device_node *node = pdev->dev.of_node;
> +       int ret = -ENODEV;
> +

> +       of_id = of_match_node(of_table, pdev->dev.of_node);
> +       if (node && of_id) {
> +               init_fn = of_id->data;
> +               ret = init_fn(&csky_pmu);
> +       }
Ditto, all 7 lines above should be removed and the init function called directly, like:
            ret = init_hw_perf_events();

> +       if (ret) {
> +               pr_notice("[perf] failed to probe PMU!\n");
> +               return ret;
> +       }
> +
> +       return ret;
> +}
> +
> +const static struct of_device_id csky_pmu_of_device_ids[] = {
> +       {.compatible = "csky,csky-pmu", .data = init_hw_perf_events},
Ditto, Nothing for .data.

Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V4 3/6] csky: Add pmu interrupt support
  2019-06-04  2:23 ` [PATCH V4 3/6] csky: Add pmu interrupt support Mao Han
@ 2019-06-04  6:31   ` Guo Ren
  0 siblings, 0 replies; 12+ messages in thread
From: Guo Ren @ 2019-06-04  6:31 UTC (permalink / raw)
  To: Mao Han; +Cc: linux-kernel, linux-csky

Hello Mao,

Nice job and see my comment below.

On Tue, Jun 4, 2019 at 10:25 AM Mao Han <han_mao@c-sky.com> wrote:
>
> This patch add interrupt request and handler for csky pmu.
> perf can record on hardware event with this patch applied.
>
> Signed-off-by: Mao Han <han_mao@c-sky.com>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  arch/csky/kernel/perf_event.c | 292 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 276 insertions(+), 16 deletions(-)
>
> diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
> index 36f7f20..af09885 100644
> --- a/arch/csky/kernel/perf_event.c
> +++ b/arch/csky/kernel/perf_event.c
> @@ -11,18 +11,50 @@
>  #define CSKY_PMU_MAX_EVENTS 32
>  #define DEFAULT_COUNT_WIDTH 48
>
> -#define HPCR           "<0, 0x0>"      /* PMU Control reg */
> -#define HPCNTENR       "<0, 0x4>"      /* Count Enable reg */
> +#define HPCR           "<0, 0x0>"      /* PMU Control reg */
> +#define HPSPR          "<0, 0x1>"      /* Start PC reg */
> +#define HPEPR          "<0, 0x2>"      /* End PC reg */
> +#define HPSIR          "<0, 0x3>"      /* Soft Counter reg */
> +#define HPCNTENR       "<0, 0x4>"      /* Count Enable reg */
> +#define HPINTENR       "<0, 0x5>"      /* Interrupt Enable reg */
> +#define HPOFSR         "<0, 0x6>"      /* Interrupt Status reg */
> +
> +/* The events for a given PMU register set. */
> +struct pmu_hw_events {
> +       /*
> +        * The events that are active on the PMU for the given index.
> +        */
> +       struct perf_event *events[CSKY_PMU_MAX_EVENTS];
> +
> +       /*
> +        * A 1 bit for an index indicates that the counter is being used for
> +        * an event. A 0 means that the counter can be used.
> +        */
> +       unsigned long used_mask[BITS_TO_LONGS(CSKY_PMU_MAX_EVENTS)];
> +
> +       /*
> +        * Hardware lock to serialize accesses to PMU registers. Needed for the
> +        * read/modify/write sequences.
> +        */
> +       raw_spinlock_t pmu_lock;
> +};
>
>  static uint64_t (*hw_raw_read_mapping[CSKY_PMU_MAX_EVENTS])(void);
>  static void (*hw_raw_write_mapping[CSKY_PMU_MAX_EVENTS])(uint64_t val);
>
>  struct csky_pmu_t {
Please add static to the struct csky_pmu_t definition here.

> -       struct pmu      pmu;
> -       uint32_t        count_width;
> -       uint32_t        hpcr;
> +       struct pmu                      pmu;
> +       irqreturn_t                     (*handle_irq)(int irq_num);
There is only one PMU, so this hook is not needed.
> +       void                            (*reset)(void *info);
Ditto

> +       struct pmu_hw_events __percpu   *hw_events;
> +       struct platform_device          *plat_device;
> +       uint32_t                        count_width;
> +       uint32_t                        hpcr;
> +       u64                             max_period;
>  } csky_pmu;
> +static int csky_pmu_irq;
>
> +#define to_csky_pmu(p)  (container_of(p, struct csky_pmu, pmu))
>  typedef int (*csky_pmu_init)(struct csky_pmu_t *);
>
>  #define cprgr(reg)                             \
> @@ -804,6 +836,51 @@ static const int csky_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
>         },
>  };
>
> +int  csky_pmu_event_set_period(struct perf_event *event)
> +{
> +       struct hw_perf_event *hwc = &event->hw;
> +       s64 left = local64_read(&hwc->period_left);
> +       s64 period = hwc->sample_period;
> +       int ret = 0;
> +
> +       if (unlikely(left <= -period)) {
> +               left = period;
> +               local64_set(&hwc->period_left, left);
> +               hwc->last_period = period;
> +               ret = 1;
> +       }
> +
> +       if (unlikely(left <= 0)) {
> +               left += period;
> +               local64_set(&hwc->period_left, left);
> +               hwc->last_period = period;
> +               ret = 1;
> +       }
> +
> +       if (left > (s64)csky_pmu.max_period)
> +               left = csky_pmu.max_period;
> +
> +       /* Interrupt may lose when period is too small. */
> +       if (left < 10)
> +               left = 10;
Is that right? We've already solved the RISING_EDGE and request_percpu_irq enable problems.

> +
> +       /*
> +        * The hw event starts counting from this event offset,
> +        * mark it to be able to extract future "deltas":
> +        */
> +       local64_set(&hwc->prev_count, (u64)(-left));
> +
> +       if (hw_raw_write_mapping[hwc->idx] != NULL)
> +               hw_raw_write_mapping[hwc->idx]((u64)(-left) &
> +                                               csky_pmu.max_period);
> +
> +       cpwcr(HPOFSR, ~BIT(hwc->idx) & cprcr(HPOFSR));
> +
> +       perf_event_update_userpage(event);
> +
> +       return ret;
> +}
> +
>  static void csky_perf_event_update(struct perf_event *event,
>                                    struct hw_perf_event *hwc)
>  {
> @@ -825,6 +902,11 @@ static void csky_perf_event_update(struct perf_event *event,
>         local64_sub(delta, &hwc->period_left);
>  }
>
> +static void csky_pmu_reset(void *info)
> +{
> +       cpwcr(HPCR, BIT(31) | BIT(30) | BIT(1));
> +}
> +
>  static void csky_pmu_read(struct perf_event *event)
>  {
>         csky_perf_event_update(event, &event->hw);
> @@ -901,7 +983,9 @@ static void csky_pmu_disable(struct pmu *pmu)
>
>  static void csky_pmu_start(struct perf_event *event, int flags)
>  {
> +       unsigned long irq_flags;
>         struct hw_perf_event *hwc = &event->hw;
> +       struct pmu_hw_events *events = this_cpu_ptr(csky_pmu.hw_events);
>         int idx = hwc->idx;
>
>         if (WARN_ON_ONCE(idx == -1))
> @@ -912,16 +996,35 @@ static void csky_pmu_start(struct perf_event *event, int flags)
>
>         hwc->state = 0;
>
> +       csky_pmu_event_set_period(event);
> +
> +       raw_spin_lock_irqsave(&events->pmu_lock, irq_flags);
No need for a spin_lock here; the register is per-cpu.

> +
> +       cpwcr(HPINTENR, BIT(idx) | cprcr(HPINTENR));
>         cpwcr(HPCNTENR, BIT(idx) | cprcr(HPCNTENR));
> +
> +       raw_spin_unlock_irqrestore(&events->pmu_lock, irq_flags);
Ditto

>  }
>
> -static void csky_pmu_stop(struct perf_event *event, int flags)
> +static void csky_pmu_stop_event(struct perf_event *event)
>  {
> +       unsigned long irq_flags;
>         struct hw_perf_event *hwc = &event->hw;
> +       struct pmu_hw_events *events = this_cpu_ptr(csky_pmu.hw_events);
>         int idx = hwc->idx;
>
> +       raw_spin_lock_irqsave(&events->pmu_lock, irq_flags);
Ditto

> +
> +       cpwcr(HPINTENR, ~BIT(idx) & cprcr(HPINTENR));
> +       cpwcr(HPCNTENR, ~BIT(idx) & cprcr(HPCNTENR));
> +
> +       raw_spin_unlock_irqrestore(&events->pmu_lock, irq_flags);
Ditto

> +}
> +
> +static void csky_pmu_stop(struct perf_event *event, int flags)
> +{
>         if (!(event->hw.state & PERF_HES_STOPPED)) {
> -               cpwcr(HPCNTENR, ~BIT(idx) & cprcr(HPCNTENR));
> +               csky_pmu_stop_event(event);
>                 event->hw.state |= PERF_HES_STOPPED;
>         }
>
> @@ -934,7 +1037,11 @@ static void csky_pmu_stop(struct perf_event *event, int flags)
>
>  static void csky_pmu_del(struct perf_event *event, int flags)
>  {
> +       struct pmu_hw_events *hw_events = this_cpu_ptr(csky_pmu.hw_events);
> +       struct hw_perf_event *hwc = &event->hw;
> +
>         csky_pmu_stop(event, PERF_EF_UPDATE);
> +       hw_events->events[hwc->idx] = NULL;
>
>         perf_event_update_userpage(event);
>  }
> @@ -942,12 +1049,10 @@ static void csky_pmu_del(struct perf_event *event, int flags)
>  /* allocate hardware counter and optionally start counting */
>  static int csky_pmu_add(struct perf_event *event, int flags)
>  {
> +       struct pmu_hw_events *hw_events = this_cpu_ptr(csky_pmu.hw_events);
>         struct hw_perf_event *hwc = &event->hw;
>
> -       local64_set(&hwc->prev_count, 0);
> -
> -       if (hw_raw_write_mapping[hwc->idx] != NULL)
> -               hw_raw_write_mapping[hwc->idx](0);
> +       hw_events->events[hwc->idx] = event;
>
>         hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>         if (flags & PERF_EF_START)
> @@ -958,8 +1063,118 @@ static int csky_pmu_add(struct perf_event *event, int flags)
>         return 0;
>  }
>
> +static irqreturn_t csky_pmu_handle_irq(int irq_num)
> +{
> +       struct perf_sample_data data;
> +       struct pmu_hw_events *cpuc = this_cpu_ptr(csky_pmu.hw_events);
> +       struct pt_regs *regs;
> +       int idx;
> +
> +       /*
> +        * Did an overflow occur?
> +        */
> +       if (!cprcr(HPOFSR))
> +               return IRQ_NONE;
> +
> +       /*
> +        * Handle the counter(s) overflow(s)
> +        */
> +       regs = get_irq_regs();
> +
> +       csky_pmu_disable(&csky_pmu.pmu);
> +       for (idx = 0; idx < CSKY_PMU_MAX_EVENTS; ++idx) {
> +               struct perf_event *event = cpuc->events[idx];
> +               struct hw_perf_event *hwc;
> +
> +               /* Ignore if we don't have an event. */
> +               if (!event)
> +                       continue;
> +               /*
> +                * We have a single interrupt for all counters. Check that
> +                * each counter has overflowed before we process it.
> +                */
> +               if (!(cprcr(HPOFSR) & 1 << idx))
> +                       continue;
> +
> +               hwc = &event->hw;
> +               csky_perf_event_update(event, &event->hw);
> +               perf_sample_data_init(&data, 0, hwc->last_period);
> +               csky_pmu_event_set_period(event);
> +
> +               if (perf_event_overflow(event, &data, regs))
> +                       csky_pmu_stop_event(event);
> +       }
> +       csky_pmu_enable(&csky_pmu.pmu);
> +       /*
> +        * Handle the pending perf events.
> +        *
> +        * Note: this call *must* be run with interrupts disabled. For
> +        * platforms that can have the PMU interrupts raised as an NMI, this
> +        * will not work.
> +        */
> +       irq_work_run();
> +
> +       return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t csky_pmu_dispatch_irq(int irq, void *dev)
Remove the hook function. It's unnecessary for now.

> +{
> +       int ret;
> +
> +       ret = csky_pmu.handle_irq(irq);
> +
> +       return ret;
> +}
> +
> +static int csky_pmu_request_irq(irq_handler_t handler)
> +{
> +       int err, irq, irqs;
> +       struct platform_device *pmu_device = csky_pmu.plat_device;
> +
> +       if (!pmu_device)
> +               return -ENODEV;
> +
> +       irqs = min(pmu_device->num_resources, num_possible_cpus());
> +       if (irqs < 1) {
> +               pr_err("no irqs for PMUs defined\n");
> +               return -ENODEV;
> +       }
> +
> +       irq = platform_get_irq(pmu_device, 0);
> +       if (irq < 0)
> +               return -ENODEV;
> +       err = request_percpu_irq(irq, handler, "csky-pmu",
> +                                this_cpu_ptr(csky_pmu.hw_events));
> +       if (err) {
> +               pr_err("unable to request IRQ%d for CSKY PMU counters\n",
> +                      irq);
> +               return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static void csky_pmu_free_irq(void)
> +{
> +       int irq;
> +       struct platform_device *pmu_device = csky_pmu.plat_device;
> +
> +       irq = platform_get_irq(pmu_device, 0);
> +       if (irq >= 0)
> +               free_percpu_irq(irq, this_cpu_ptr(csky_pmu.hw_events));
> +}
> +
>  int __init init_hw_perf_events(void)
>  {
> +       int cpu;
> +
> +       csky_pmu.hw_events = alloc_percpu_gfp(struct pmu_hw_events,
> +                                             GFP_KERNEL);
> +       if (!csky_pmu.hw_events) {
> +               pr_info("failed to allocate per-cpu PMU data.\n");
> +               return -ENOMEM;
> +       }
> +
>         csky_pmu.pmu = (struct pmu) {
>                 .pmu_enable     = csky_pmu_enable,
>                 .pmu_disable    = csky_pmu_disable,
> @@ -971,6 +1186,16 @@ int __init init_hw_perf_events(void)
>                 .read           = csky_pmu_read,
>         };
>
> +       csky_pmu.handle_irq = csky_pmu_handle_irq;
> +       csky_pmu.reset = csky_pmu_reset;
> +
> +       for_each_possible_cpu(cpu) {
> +               struct pmu_hw_events *events;
> +
> +               events = per_cpu_ptr(csky_pmu.hw_events, cpu);
> +               raw_spin_lock_init(&events->pmu_lock);
> +       }
> +
>         memset((void *)hw_raw_read_mapping, 0,
>                 sizeof(hw_raw_read_mapping[CSKY_PMU_MAX_EVENTS]));
>
> @@ -1031,11 +1256,19 @@ int __init init_hw_perf_events(void)
>         hw_raw_write_mapping[0x1a] = csky_pmu_write_l2wac;
>         hw_raw_write_mapping[0x1b] = csky_pmu_write_l2wmc;
>
> -       csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
> +       return 0;
> +}
>
> -       cpwcr(HPCR, BIT(31) | BIT(30) | BIT(1));
> +static int csky_pmu_starting_cpu(unsigned int cpu)
> +{
> +       enable_percpu_irq(csky_pmu_irq, 0);
> +       return 0;
> +}
>
> -       return perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
> +static int csky_pmu_dying_cpu(unsigned int cpu)
> +{
> +       disable_percpu_irq(csky_pmu_irq);
> +       return 0;
>  }
>
>  int csky_pmu_device_probe(struct platform_device *pdev,
> @@ -1052,14 +1285,41 @@ int csky_pmu_device_probe(struct platform_device *pdev,
>                 ret = init_fn(&csky_pmu);
>         }
>
> +       if (ret) {
> +               pr_notice("[perf] failed to probe PMU!\n");
> +               return ret;
> +       }
> +
>         if (!of_property_read_u32(node, "count-width",
>                                   &csky_pmu.count_width)) {
>                 csky_pmu.count_width = DEFAULT_COUNT_WIDTH;
>         }
> +       csky_pmu.max_period = ((u64)1 << csky_pmu.count_width) - 1;
>
> +       csky_pmu.plat_device = pdev;
> +
> +       /* Ensure the PMU has sane values out of reset. */
> +       if (csky_pmu.reset)
> +               on_each_cpu(csky_pmu.reset, &csky_pmu, 1);
Ditto, No reset hook!

> +
> +       ret = csky_pmu_request_irq(csky_pmu_dispatch_irq);
Ditto, unnecessary hook for csky_pmu_dispatch_irq.

>         if (ret) {
> -               pr_notice("[perf] failed to probe PMU!\n");
> -               return ret;
> +               csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
> +               pr_notice("[perf] PMU request irq fail!\n");
> +       }
> +
> +       ret = cpuhp_setup_state(CPUHP_AP_PERF_ONLINE, "AP_PERF_ONLINE",
> +                               csky_pmu_starting_cpu,
> +                               csky_pmu_dying_cpu);
> +       if (ret) {
> +               csky_pmu_free_irq();
> +               free_percpu(csky_pmu.hw_events);
No return ?

> +       }
> +
> +       ret = perf_pmu_register(&csky_pmu.pmu, "cpu", PERF_TYPE_RAW);
> +       if (ret) {
> +               csky_pmu_free_irq();
> +               free_percpu(csky_pmu.hw_events);
>         }
>
>         return ret;
> --
> 2.7.4
>

Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V4 6/6] csky: Fix perf record in kernel/user space
  2019-06-04  2:24 ` [PATCH V4 6/6] csky: Fix perf record in kernel/user space Mao Han
@ 2019-06-04  6:36   ` Guo Ren
  0 siblings, 0 replies; 12+ messages in thread
From: Guo Ren @ 2019-06-04  6:36 UTC (permalink / raw)
  To: Mao Han; +Cc: linux-kernel, linux-csky

Just move the attr.exclude_user handling after the switch, like this:

OK?

diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index 376c972..3470cfa 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -844,15 +844,6 @@ static int csky_pmu_event_init(struct perf_event *event)
        struct hw_perf_event *hwc = &event->hw;
        int ret;

-       if (event->attr.exclude_user)
-               csky_pmu.hpcr = BIT(2);
-       else if (event->attr.exclude_kernel)
-               csky_pmu.hpcr = BIT(3);
-       else
-               csky_pmu.hpcr = BIT(2) | BIT(3);
-
-       csky_pmu.hpcr |= BIT(1) | BIT(0);
-
        switch (event->attr.type) {
        case PERF_TYPE_HARDWARE:
                if (event->attr.config >= PERF_COUNT_HW_MAX)
@@ -861,21 +852,31 @@ static int csky_pmu_event_init(struct perf_event *event)
                if (ret == HW_OP_UNSUPPORTED)
                        return -ENOENT;
                hwc->idx = ret;
-               return 0;
+               break;
        case PERF_TYPE_HW_CACHE:
                ret = csky_pmu_cache_event(event->attr.config);
                if (ret == CACHE_OP_UNSUPPORTED)
                        return -ENOENT;
                hwc->idx = ret;
-               return 0;
+               break;
        case PERF_TYPE_RAW:
                if (hw_raw_read_mapping[event->attr.config] == NULL)
                        return -ENOENT;
                hwc->idx = event->attr.config;
-               return 0;
+               break;
        default:
                return -ENOENT;
        }
+
+       if (event->attr.exclude_user)
+               csky_pmu.hpcr = BIT(2);
+       else if (event->attr.exclude_kernel)
+               csky_pmu.hpcr = BIT(3);
+       else
+               csky_pmu.hpcr = BIT(2) | BIT(3);
+
+       csky_pmu.hpcr |= BIT(1) | BIT(0);
+
 }

On Tue, Jun 4, 2019 at 10:25 AM Mao Han <han_mao@c-sky.com> wrote:
>
> csky_pmu_event_init is called several times during the perf record
> initialzation. After configure the event counter in either kernel
> space or user space, csky_pmu_event_init is called twice with no
> attr specified. Configuration will be overwritten with sampling in
> both kernel space and user space. --all-kernel/--all-user is
> useless without this patch applied.
>
> Signed-off-by: Mao Han <han_mao@c-sky.com>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org
> ---
>  arch/csky/kernel/perf_event.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
> index dc84dc7..e3308ab 100644
> --- a/arch/csky/kernel/perf_event.c
> +++ b/arch/csky/kernel/perf_event.c
> @@ -983,6 +983,12 @@ static int csky_pmu_event_init(struct perf_event *event)
>         struct hw_perf_event *hwc = &event->hw;
>         int ret;
>
> +       if (event->attr.type != PERF_TYPE_HARDWARE &&
> +           event->attr.type != PERF_TYPE_HW_CACHE &&
> +           event->attr.type != PERF_TYPE_RAW) {
> +               return -ENOENT;
> +       }
> +
>         if (event->attr.exclude_user)
>                 csky_pmu.hpcr = BIT(2);
>         else if (event->attr.exclude_kernel)
> --
> 2.7.4
>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-04  6:37 UTC | newest]

Thread overview: 12+ messages
2019-06-04  2:23 [PATCH V4 0/6] csky: Add pmu hardware sampling support Mao Han
2019-06-04  2:23 ` [PATCH V4 1/6] csky: Init pmu as a device Mao Han
2019-06-04  5:50   ` Guo Ren
2019-06-04  2:23 ` [PATCH V4 2/6] csky: Add count-width property for csky pmu Mao Han
2019-06-04  5:35   ` Guo Ren
2019-06-04  2:23 ` [PATCH V4 3/6] csky: Add pmu interrupt support Mao Han
2019-06-04  6:31   ` Guo Ren
2019-06-04  2:23 ` [PATCH V4 4/6] dt-bindings: csky: Add csky PMU bindings Mao Han
2019-06-04  5:38   ` Guo Ren
2019-06-04  2:23 ` [PATCH V4 5/6] csky: Fixup some error count in 810 & 860 Mao Han
2019-06-04  2:24 ` [PATCH V4 6/6] csky: Fix perf record in kernel/user space Mao Han
2019-06-04  6:36   ` Guo Ren
