linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] ARC perf updates
@ 2017-11-07 22:13 Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: linux-kernel, Peter Zijlstra, Vineet Gupta

Hi,

Found these when cleaning up some old branches. The only controversial one
could be the last one.

Thx,
-Vineet

Vineet Gupta (4):
  ARCv2: perf: tweak overflow interrupt
  ARCv2: perf: optimize given that num counters <= 32
  ARC: perf: avoid vmalloc backed mmap
  ARCv2: entry: Reduce perf intr return path

 arch/arc/Kconfig                   |  2 +-
 arch/arc/include/asm/entry-arcv2.h |  2 ++
 arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
 arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
 4 files changed, 46 insertions(+), 21 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] ARCv2: perf: tweak overflow interrupt
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: linux-kernel, Peter Zijlstra, Vineet Gupta

Current perf ISR loops thru all 32 counters, checking for each if it
caused the interrupt. Instead only loop thru counters which actually
interrupted (typically 1).

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/kernel/perf_event.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 2ce24e74f879..0eaa132a2c90 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -377,21 +377,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 	struct perf_sample_data data;
 	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
 	struct pt_regs *regs;
-	int active_ints;
+	unsigned int active_ints;
 	int idx;
 
 	arc_pmu_disable(&arc_pmu->pmu);
 
 	active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
+	if (!active_ints)
+		goto done;
 
 	regs = get_irq_regs();
 
-	for (idx = 0; idx < arc_pmu->n_counters; idx++) {
-		struct perf_event *event = pmu_cpu->act_counter[idx];
+	do {
+		struct perf_event *event;
 		struct hw_perf_event *hwc;
 
-		if (!(active_ints & (1 << idx)))
-			continue;
+		idx = __ffs(active_ints);
 
 		/* Reset interrupt flag by writing of 1 */
 		write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
@@ -404,19 +405,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 		write_aux_reg(ARC_REG_PCT_INT_CTRL,
 			read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
 
+		event = pmu_cpu->act_counter[idx];
 		hwc = &event->hw;
 
 		WARN_ON_ONCE(hwc->idx != idx);
 
 		arc_perf_event_update(event, &event->hw, event->hw.idx);
 		perf_sample_data_init(&data, 0, hwc->last_period);
-		if (!arc_pmu_event_set_period(event))
-			continue;
+		if (arc_pmu_event_set_period(event)) {
+			if (perf_event_overflow(event, &data, regs))
+				arc_pmu_stop(event, 0);
+		}
 
-		if (perf_event_overflow(event, &data, regs))
-			arc_pmu_stop(event, 0);
-	}
+		active_ints &= ~(1U << idx);
+	} while (active_ints);
 
+done:
 	arc_pmu_enable(&arc_pmu->pmu);
 
 	return IRQ_HANDLED;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: linux-kernel, Peter Zijlstra, Vineet Gupta

use ffz primitive which maps to ARCv2 instruction, vs. non atomic
__test_and_set_bit

It is unlikely if we will even have more than 32 counters, but still add
a BUILD_BUG to catch that

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/kernel/perf_event.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 0eaa132a2c90..8aec462d90fb 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -336,15 +336,12 @@ static int arc_pmu_add(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (__test_and_set_bit(idx, pmu_cpu->used_mask)) {
-		idx = find_first_zero_bit(pmu_cpu->used_mask,
-					  arc_pmu->n_counters);
-		if (idx == arc_pmu->n_counters)
-			return -EAGAIN;
-
-		__set_bit(idx, pmu_cpu->used_mask);
-		hwc->idx = idx;
-	}
+	idx = ffz(pmu_cpu->used_mask[0]);
+	if (idx == arc_pmu->n_counters)
+		return -EAGAIN;
+
+	__set_bit(idx, pmu_cpu->used_mask);
+	hwc->idx = idx;
 
 	write_aux_reg(ARC_REG_PCT_INDEX, idx);
 
@@ -465,6 +462,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 		pr_err("This core does not have performance counters!\n");
 		return -ENODEV;
 	}
+	BUILD_BUG_ON(ARC_PERF_MAX_COUNTERS > 32);
 	BUG_ON(pct_bcr.c > ARC_PERF_MAX_COUNTERS);
 
 	READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
  2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: linux-kernel, Peter Zijlstra, Vineet Gupta

For non-alising Dcache, vmalloc is not needed.

vmalloc triggers additonal D-TLB Misses in the perf interrupt code path
making it slightly inefficient as evident from hackbench runs below.

| [ARCLinux]# perf stat -e dTLB-load-misses --repeat 5 hackbench
| Running with 10*40 (== 400) tasks.
| Time: 35.060
| ...
| Performance counter stats for 'hackbench' (5 runs):

Before:      399235      dTLB-load-misses ( +-  2.08% )
After :      397676      dTLB-load-misses ( +-  2.27% )

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index c84e67fdea09..f3cad98eeb8f 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -39,7 +39,7 @@ config ARC
 	select OF
 	select OF_EARLY_FLATTREE
 	select OF_RESERVED_MEM
-	select PERF_USE_VMALLOC
+	select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_KERNEL_GZIP
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (2 preceding siblings ...)
  2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-14 10:28   ` Peter Zijlstra
  2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-21 22:09 ` Vineet Gupta
  5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: linux-kernel, Peter Zijlstra, Vineet Gupta

In the more likely case of returning to kernel from perf interrupt, do a
fast path returning w/o bothering about CONFIG_PREEMPT etc

However, if returning to user space, need do go thru the usual gyrations,
as check for pending signals is an absolute must.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/entry-arcv2.h |  2 ++
 arch/arc/kernel/entry-arcv2.S      | 23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arc/include/asm/entry-arcv2.h b/arch/arc/include/asm/entry-arcv2.h
index 257a68f3c2fe..8b49b327b1f9 100644
--- a/arch/arc/include/asm/entry-arcv2.h
+++ b/arch/arc/include/asm/entry-arcv2.h
@@ -58,6 +58,8 @@
 /*------------------------------------------------------------------------*/
 .macro INTERRUPT_EPILOGUE	called_from
 
+	; Assumes STATUS32.Z bit set if return to K
+
 .ifnc \called_from, exception
 	add	sp, sp, 12	; skip BTA/ECR/orig_r0 placeholderss
 .endif
diff --git a/arch/arc/kernel/entry-arcv2.S b/arch/arc/kernel/entry-arcv2.S
index cc558a25b8fa..9ca1d146426b 100644
--- a/arch/arc/kernel/entry-arcv2.S
+++ b/arch/arc/kernel/entry-arcv2.S
@@ -51,7 +51,7 @@ VECTOR	handle_interrupt	; (16) Timer0
 VECTOR	handle_interrupt	; unused (Timer1)
 VECTOR	handle_interrupt	; unused (WDT)
 VECTOR	handle_interrupt	; (19) Inter core Interrupt (IPI)
-VECTOR	handle_interrupt	; (20) perf Interrupt
+VECTOR	handle_interrupt_pct	; (20) perf Interrupt
 VECTOR	handle_interrupt	; (21) Software Triggered Intr (Self IPI)
 VECTOR	handle_interrupt	; unused
 VECTOR	handle_interrupt	; (23) unused
@@ -97,6 +97,26 @@ ENTRY(handle_interrupt)
 
 END(handle_interrupt)
 
+ENTRY(handle_interrupt_pct)
+
+	INTERRUPT_PROLOGUE  irq
+
+	IRQ_DISABLE
+
+	lr	r0, [ICAUSE]
+
+	bl.d	arch_do_IRQ
+	mov	r1, sp
+
+	ld	r0, [sp, PT_status32]   ; returning to User/Kernel Mode
+	btst	r0, STATUS_U_BIT
+	bnz	resume_user_mode_begin
+
+	clri
+	b	.Lisr_ret_fast_path_to_k
+
+END(handle_interrupt_pct)
+
 ;################### Non TLB Exception Handling #############################
 
 ENTRY(EV_SWI)
@@ -224,6 +244,7 @@ debug_marker_l1:
 	bset.nz	r11, r11, AUX_IRQ_ACT_BIT_U	; NZ means U
 	sr	r11, [AUX_IRQ_ACT]
 
+.Lisr_ret_fast_path_to_k:
 	INTERRUPT_EPILOGUE  irq
 	rtie
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] ARC perf updates
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (3 preceding siblings ...)
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-13 21:30 ` Vineet Gupta
  2017-11-21 22:09 ` Vineet Gupta
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-13 21:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-snps-arc, linux-kernel

On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
> 
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
> 
> Thx,
> -Vineet
> 
> Vineet Gupta (4):
>    ARCv2: perf: tweak overflow interrupt
>    ARCv2: perf: optimize given that num counters <= 32
>    ARC: perf: avoid vmalloc backed mmap
>    ARCv2: entry: Reduce perf intr return path
> 
>   arch/arc/Kconfig                   |  2 +-
>   arch/arc/include/asm/entry-arcv2.h |  2 ++
>   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
>   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
>   4 files changed, 46 insertions(+), 21 deletions(-)
> 

Peter, any specific likes/dislikes for this series !

-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-14 10:28   ` Peter Zijlstra
  2017-11-14 23:01     ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-14 10:28 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-snps-arc, linux-kernel

On Tue, Nov 07, 2017 at 02:13:04PM -0800, Vineet Gupta wrote:
> In the more likely case of returning to kernel from perf interrupt, do a
> fast path returning w/o bothering about CONFIG_PREEMPT etc

I think this needs more explaining and certainly also deserves a code
comment.

Is the argument something along these lines?

  Assumes the interrupt will never set TIF_NEED_RESCHED;
  therefore no preemption is ever required on return from
  the interrupt.

What do you (on ARC) do about irq_work ?

> +ENTRY(handle_interrupt_pct)
> +
> +	INTERRUPT_PROLOGUE  irq
> +
> +	IRQ_DISABLE
> +
> +	lr	r0, [ICAUSE]
> +
> +	bl.d	arch_do_IRQ
> +	mov	r1, sp
> +
> +	ld	r0, [sp, PT_status32]   ; returning to User/Kernel Mode
> +	btst	r0, STATUS_U_BIT
> +	bnz	resume_user_mode_begin
> +
> +	clri
> +	b	.Lisr_ret_fast_path_to_k
> +
> +END(handle_interrupt_pct)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-14 10:28   ` Peter Zijlstra
@ 2017-11-14 23:01     ` Vineet Gupta
  2017-11-15 10:18       ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-14 23:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-snps-arc, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4116 bytes --]

On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> On Tue, Nov 07, 2017 at 02:13:04PM -0800, Vineet Gupta wrote:
>> In the more likely case of returning to kernel from perf interrupt, do a
>> fast path returning w/o bothering about CONFIG_PREEMPT etc
> 
> I think this needs more explaining and certainly also deserves a code
> comment.

Sure ! It was a quick hack mainly to solicit feedback.


> Is the argument something along these lines?
> 
>    Assumes the interrupt will never set TIF_NEED_RESCHED;
>    therefore no preemption is ever required on return from
>    the interrupt.

No. I don't think we can assume that. But I was choosing to ignore it mainly to 
reduce the overhead of a perf intr in general. A subsequent real interrupt could 
go thru thru the gyrations of preemption etc.


> What do you (on ARC) do about irq_work ?

Nothing ATM. At the time of NPS platform upstreaming, Noam had proposed a patch to 
enable work as part of NO_HZ_FULL support on ARC for NPS. That part of patchseries 
didn't get merged and then life took over...

I'm attaching it here, can be added to ARC I suppose after a bit of deuglification 
for #ifdef'ery.

Although I'm sure it is, can you please explain how irq_work is relevant in the 
context of this patch. Even w/o the TIF_NEED_RESCHED, the generic intr return 
could still invoke the irq_work for perf.

I remember from my investigations at the time that it was originally needed for 
perf_event_do_pending() to do stuff in IRQ context. It was later factored out as 
generic irq_work facility for no_hz and printk !

------------>
 From c120294c80d61bcad223c96bd49303357f95afdc Mon Sep 17 00:00:00 2001
From: Noam Camus <noamc@ezchip.com>
Date: Sun, 21 Jun 2015 01:41:22 +0300
Subject: [PATCH] ARC: Support arch_irq_work_raise() via self IPIs

This will allow the scheduler tick to be restarted.
It is used if CONFIG_NO_HZ_FULL.

Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
  arch/arc/include/asm/Kbuild     |  1 -
  arch/arc/include/asm/irq_work.h |  8 ++++++++
  arch/arc/kernel/smp.c           | 16 ++++++++++++++++
  3 files changed, 24 insertions(+), 1 deletion(-)
  create mode 100644 arch/arc/include/asm/irq_work.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 0b10ef2a4372..f83ec8c04feb 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += ioctl.h
  generic-y += ioctls.h
  generic-y += ipcbuf.h
  generic-y += irq_regs.h
-generic-y += irq_work.h
  generic-y += kmap_types.h
  generic-y += kvm_para.h
  generic-y += local.h
diff --git a/arch/arc/include/asm/irq_work.h b/arch/arc/include/asm/irq_work.h
new file mode 100644
index 000000000000..40d015199ced
--- /dev/null
+++ b/arch/arc/include/asm/irq_work.h
@@ -0,0 +1,8 @@
+#ifndef _ASM_ARC_IRQ_WORK_H
+#define _ASM_ARC_IRQ_WORK_H
+static inline bool arch_irq_work_has_interrupt(void)
+{
+	return true;
+}
+
+#endif /* _ASM_ARC_IRQ_WORK_H */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index f2cf58b771b2..c2f49db57265 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -22,6 +22,7 @@
  #include <linux/atomic.h>
  #include <linux/cpumask.h>
  #include <linux/reboot.h>
+#include <linux/irq_work.h>
  #include <asm/processor.h>
  #include <asm/setup.h>
  #include <asm/mach_desc.h>
@@ -202,6 +203,7 @@ enum ipi_msg_type {
  	IPI_RESCHEDULE = 1,
  	IPI_CALL_FUNC,
  	IPI_CPU_STOP,
+	IPI_IRQ_WORK,
  };

  /*
@@ -276,6 +278,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
  	ipi_send_msg(mask, IPI_CALL_FUNC);
  }

+#ifdef CONFIG_IRQ_WORK
+void arch_irq_work_raise(void)
+{
+	if (arch_irq_work_has_interrupt())
+		ipi_send_msg_one(smp_processor_id(), IPI_IRQ_WORK);
+}
+#endif
+
  /*
   * ipi_cpu_stop - handle IPI from smp_send_stop()
   */
@@ -301,6 +311,12 @@ static inline int __do_IPI(unsigned long msg)
  		ipi_cpu_stop();
  		break;

+#ifdef CONFIG_IRQ_WORK
+	case IPI_IRQ_WORK:
+		irq_work_run();
+		break;
+#endif
+
  	default:
  		rc = 1;
  	}
-- 
2.7.4

[-- Attachment #2: 0001-ARC-Support-arch_irq_work_raise-via-self-IPIs.patch --]
[-- Type: text/x-patch, Size: 2504 bytes --]

>From c120294c80d61bcad223c96bd49303357f95afdc Mon Sep 17 00:00:00 2001
From: Noam Camus <noamc@ezchip.com>
Date: Sun, 21 Jun 2015 01:41:22 +0300
Subject: [PATCH] ARC: Support arch_irq_work_raise() via self IPIs

This will allow the scheduler tick to be restarted.
It is used if CONFIG_NO_HZ_FULL.

Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/Kbuild     |  1 -
 arch/arc/include/asm/irq_work.h |  8 ++++++++
 arch/arc/kernel/smp.c           | 16 ++++++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 arch/arc/include/asm/irq_work.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 0b10ef2a4372..f83ec8c04feb 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += ioctl.h
 generic-y += ioctls.h
 generic-y += ipcbuf.h
 generic-y += irq_regs.h
-generic-y += irq_work.h
 generic-y += kmap_types.h
 generic-y += kvm_para.h
 generic-y += local.h
diff --git a/arch/arc/include/asm/irq_work.h b/arch/arc/include/asm/irq_work.h
new file mode 100644
index 000000000000..40d015199ced
--- /dev/null
+++ b/arch/arc/include/asm/irq_work.h
@@ -0,0 +1,8 @@
+#ifndef _ASM_ARC_IRQ_WORK_H
+#define _ASM_ARC_IRQ_WORK_H
+static inline bool arch_irq_work_has_interrupt(void)
+{
+	return true;
+}
+
+#endif /* _ASM_ARC_IRQ_WORK_H */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index f2cf58b771b2..c2f49db57265 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -22,6 +22,7 @@
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/reboot.h>
+#include <linux/irq_work.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/mach_desc.h>
@@ -202,6 +203,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE = 1,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_IRQ_WORK,
 };
 
 /*
@@ -276,6 +278,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 	ipi_send_msg(mask, IPI_CALL_FUNC);
 }
 
+#ifdef CONFIG_IRQ_WORK
+void arch_irq_work_raise(void)
+{
+	if (arch_irq_work_has_interrupt())
+		ipi_send_msg_one(smp_processor_id(), IPI_IRQ_WORK);
+}
+#endif
+
 /*
  * ipi_cpu_stop - handle IPI from smp_send_stop()
  */
@@ -301,6 +311,12 @@ static inline int __do_IPI(unsigned long msg)
 		ipi_cpu_stop();
 		break;
 
+#ifdef CONFIG_IRQ_WORK
+	case IPI_IRQ_WORK:
+		irq_work_run();
+		break;
+#endif
+
 	default:
 		rc = 1;
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-14 23:01     ` Vineet Gupta
@ 2017-11-15 10:18       ` Peter Zijlstra
  2017-11-17 23:42         ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-15 10:18 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-snps-arc, linux-kernel

On Tue, Nov 14, 2017 at 03:01:26PM -0800, Vineet Gupta wrote:
> On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> > On Tue, Nov 07, 2017 at 02:13:04PM -0800, Vineet Gupta wrote:
> > > In the more likely case of returning to kernel from perf interrupt, do a
> > > fast path returning w/o bothering about CONFIG_PREEMPT etc
> > 
> > I think this needs more explaining and certainly also deserves a code
> > comment.
> 
> Sure ! It was a quick hack mainly to solicit feedback.
> 
> 
> > Is the argument something along these lines?
> > 
> >    Assumes the interrupt will never set TIF_NEED_RESCHED;
> >    therefore no preemption is ever required on return from
> >    the interrupt.
> 
> No. I don't think we can assume that.

Well, given we run that code from NMI context on a number of platforms
(x86 being one of them) it can not in fact do things like wakeups.

So the pure perf-interrupt part should never set TIF_NEED_RESCHED.

I think we can actually make that assumption.

> But I was choosing to ignore it mainly to reduce the overhead of a
> perf intr in general. A subsequent real interrupt could go thru thru
> the gyrations of preemption etc.

So that's dangerous thinking... People that run a PREEMPT kernel
generally tend to care about latency (esp. when combined with
PREEMPT_RT).

And ignoring a preemption point gets these people upset (and missed
preemptions are a royal friggin pain to debug).

> > What do you (on ARC) do about irq_work ?
> 
> Nothing ATM.

So the reason I'm asking is that some architectures that don't have NMIs
call irq_work_run() at the very end of their perf-interrupt handler (ARM
does this for instance).

And the thing is, _that_ can and does do things like wakeups and will
thus require doing the PREEMPT thing.

> Although I'm sure it is, can you please explain how irq_work is relevant in
> the context of this patch.

Since the perf interrupt (in general) cannot call a whole lot of things
for it needs to assume running from NMI context, it needs to defer
things to a more regular context. It does this with irq_work.

So for instance, when the output buffer reaches its watermark, we'll
raise the irq_work to issue the wakeup of tasks that poll() on that.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-15 10:18       ` Peter Zijlstra
@ 2017-11-17 23:42         ` Vineet Gupta
  2017-11-21 23:26           ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-17 23:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-snps-arc, linux-kernel


>> But I was choosing to ignore it mainly to reduce the overhead of a
>> perf intr in general. A subsequent real interrupt could go thru thru
>> the gyrations of preemption etc.
> 
> So that's dangerous thinking... People that run a PREEMPT kernel
> generally tend to care about latency (esp. when combined with
> PREEMPT_RT).
> 
> And ignoring a preemption point gets these people upset (and missed
> preemptions are a royal friggin pain to debug).

Which implies that this patch goes to trash ! Unless we think that running 
instrumentation (perf) on production systems will not yield the same behavior in 
general.

>>> What do you (on ARC) do about irq_work ?
>>
>> Nothing ATM.

What I meant was lack of support for arch_irq_work_raise(). But given that we 
don't have NMIs (yet), this need *not* be a must as things could actually be 
scheduled in the regular intr return path ? At any rate arch_irq_work_raise() is 
not relevant for this discussion since NMIs are not involved.

> So the reason I'm asking is that some architectures that don't have NMIs
> call irq_work_run() at the very end of their perf-interrupt handler (ARM
> does this for instance).

But on ARC, we don't call irq_work_run() in perf intr return path and that seem to 
imply it is broken - as in latency to service a perf induced preemption.

> And the thing is, _that_ can and does do things like wakeups and will
> thus require doing the PREEMPT thing.

Reassures that this patch has to go to trash anyways, but I'm more worried about 
perf intr return for ARC in general.

>> Although I'm sure it is, can you please explain how irq_work is relevant in
>> the context of this patch.
> 
> Since the perf interrupt (in general) cannot call a whole lot of things
> for it needs to assume running from NMI context, it needs to defer
> things to a more regular context. It does this with irq_work.

And so do places such as flush_smp_call_function_queue() where the cross-core IPI 
could be an NMI.

> So for instance, when the output buffer reaches its watermark, we'll
> raise the irq_work to issue the wakeup of tasks that poll() on that.

Cool, thx for the explanation.
Perhaps I should put it in a Documentation/irq_work.txt pr some such !

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] ARC perf updates
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (4 preceding siblings ...)
  2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-21 22:09 ` Vineet Gupta
  2017-11-21 23:13   ` Peter Zijlstra
  5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 22:09 UTC (permalink / raw)
  To: linux-snps-arc, Peter Zijlstra; +Cc: linux-kernel

On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
> 
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
> 
> Thx,
> -Vineet
> 
> Vineet Gupta (4):
>    ARCv2: perf: tweak overflow interrupt
>    ARCv2: perf: optimize given that num counters <= 32
>    ARC: perf: avoid vmalloc backed mmap
>    ARCv2: entry: Reduce perf intr return path
> 
>   arch/arc/Kconfig                   |  2 +-
>   arch/arc/include/asm/entry-arcv2.h |  2 ++
>   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
>   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
>   4 files changed, 46 insertions(+), 21 deletions(-)

Peter, do I need an ACK from you for the initial 3 patches to able to add them to 
ARC tree ?

-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] ARC perf updates
  2017-11-21 22:09 ` Vineet Gupta
@ 2017-11-21 23:13   ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-21 23:13 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-snps-arc, linux-kernel

On Tue, Nov 21, 2017 at 02:09:38PM -0800, Vineet Gupta wrote:
> On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> > Hi,
> > 
> > Found these when cleaning up some old branches. The only controversial one
> > could be the last one.
> > 
> > Thx,
> > -Vineet
> > 
> > Vineet Gupta (4):
> >    ARCv2: perf: tweak overflow interrupt
> >    ARCv2: perf: optimize given that num counters <= 32
> >    ARC: perf: avoid vmalloc backed mmap
> >    ARCv2: entry: Reduce perf intr return path
> > 
> >   arch/arc/Kconfig                   |  2 +-
> >   arch/arc/include/asm/entry-arcv2.h |  2 ++
> >   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
> >   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
> >   4 files changed, 46 insertions(+), 21 deletions(-)
> 
> Peter, do I need an ACK from you for the initial 3 patches to able to add
> them to ARC tree ?

Sure, for 1-3

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-17 23:42         ` Vineet Gupta
@ 2017-11-21 23:26           ` Vineet Gupta
  0 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 23:26 UTC (permalink / raw)
  To: Vineet Gupta, Peter Zijlstra; +Cc: linux-snps-arc, linux-kernel

On 11/17/2017 03:42 PM, Vineet Gupta wrote:

>>>> What do you (on ARC) do about irq_work ?
>>>
> 
>> So the reason I'm asking is that some architectures that don't have NMIs
>> call irq_work_run() at the very end of their perf-interrupt handler (ARM
>> does this for instance).
> 
> But on ARC, we don't call irq_work_run() in perf intr return path and that seem to 
> imply it is broken - as in latency to service a perf induced preemption.

[snip...]

>>> Although I'm sure it is, can you please explain how irq_work is relevant in
>>> the context of this patch.
>>
>> Since the perf interrupt (in general) cannot call a whole lot of things
>> for it needs to assume running from NMI context, it needs to defer
>> things to a more regular context. It does this with irq_work.

So given my understanding of this topic, ARC (or any non NMI based perf intr 
system) is potentially broken without irq_work_run() ?

I can follow up with a patch for ARC, or does this need to addressed for others 
too - say irq_exit_perf() or some such ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-11-21 23:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
2017-11-14 10:28   ` Peter Zijlstra
2017-11-14 23:01     ` Vineet Gupta
2017-11-15 10:18       ` Peter Zijlstra
2017-11-17 23:42         ` Vineet Gupta
2017-11-21 23:26           ` Vineet Gupta
2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-21 22:09 ` Vineet Gupta
2017-11-21 23:13   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).