* Re: local64_cmpxchg() in arc_perf_event_update()
       [not found] <1445286926.3913.13.camel@synopsys.com>
@ 2015-11-17  9:14   ` Vineet Gupta
  0 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2015-11-17  9:14 UTC (permalink / raw)
  To: Alexey Brodkin; +Cc: Peter Zijlstra, arcml, lkml

On Tuesday 20 October 2015 02:05 AM, Alexey Brodkin wrote:
> Hi Vineet,
>
> Looking at a patch that Peter Z mentioned:
> http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/include/linux/perf_event.h?id=b0e878759452314676fbdd71df4ac67e7d08de5d
>
>
> And it says here http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/include/linux/perf_event.h#n160
> --------->8----------
> 	/*
> 	 * The last observed hardware counter value, updated with a
> 	 * local64_cmpxchg() such that pmu::read() can be called nested.
> 	 */
> 	local64_t			prev_count;
> --------->8----------
>
> So now I think we may want to return to local64_cmpxchg() in arc_perf_event_update() as well.
> The reason is that, having no control over the generic perf code, we cannot really guarantee that pmu->read() won't
> happen at a random moment right before we enter the perf IRQ handler (where we execute arc_perf_event_update() directly).
>
> In other words, arc_perf_event_update() is used both in the IRQ handler and outside it, and chances are that the
> function will be re-entered at some point.
>
> Agree?

Let's check with Peter, as I'm not sure how exactly the read call will nest for the
same counter on the same core.

But if they do, then indeed commit 1fe8bfa5ff3b (ARCv2: perf: implement
"event_set_period") needs to be partially reverted.

-Vineet

* Re: local64_cmpxchg() in arc_perf_event_update()
  2015-11-17  9:14   ` Vineet Gupta
@ 2015-11-17 11:07     ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-11-17 11:07 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Alexey Brodkin, arcml, lkml

On Tue, Nov 17, 2015 at 09:14:59AM +0000, Vineet Gupta wrote:
> Let's check with Peter as I'm not sure how exactly the read call will
> nest for same counter on same core ?

Various possible ways, but the easiest is userspace doing a sys_read()
on the counter while the NMI happens.

* Re: local64_cmpxchg() in arc_perf_event_update()
  2015-11-17 11:07     ` Peter Zijlstra
@ 2015-11-17 11:23       ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2015-11-17 11:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Alexey Brodkin, arcml, lkml

On Tuesday 17 November 2015 04:37 PM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2015 at 09:14:59AM +0000, Vineet Gupta wrote:
>> > Let's check with Peter as I'm not sure how exactly the read call will
>> > nest for same counter on same core ?
> Various possible ways, but the easiest is userspace doing a sys_read()
> on the counter while the NMI happens.
> 
> 

That means Alexey needs to revert the hunk?

 static void arc_perf_event_update(struct perf_event *event,
 				  struct hw_perf_event *hwc, int idx)
 {
-	uint64_t prev_raw_count, new_raw_count;
-	int64_t delta;
-
-	do {
-		prev_raw_count = local64_read(&hwc->prev_count);
-		new_raw_count = arc_pmu_read_counter(idx);
-	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-				 new_raw_count) != prev_raw_count);
-
-	delta = (new_raw_count - prev_raw_count) &
-		((1ULL << arc_pmu->counter_size) - 1ULL);
+	uint64_t prev_raw_count = local64_read(&hwc->prev_count);
+	uint64_t new_raw_count = arc_pmu_read_counter(idx);
+	int64_t delta = new_raw_count - prev_raw_count;

+	/*
+	 * We aren't afraid of hwc->prev_count changing beneath our feet
+	 * because there's no way for us to re-enter this function.
+	 */
+	local64_set(&hwc->prev_count, new_raw_count);

* Re: local64_cmpxchg() in arc_perf_event_update()
  2015-11-17 11:23       ` Vineet Gupta
@ 2015-11-17 12:24         ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-11-17 12:24 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Alexey Brodkin, arcml, lkml

On Tue, Nov 17, 2015 at 04:53:04PM +0530, Vineet Gupta wrote:
> That means Alexey need to revert the hunk ?

Yes, I think so.

* Re: local64_cmpxchg() in arc_perf_event_update()
  2015-11-17 12:24         ` Peter Zijlstra
@ 2015-11-17 12:25           ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-11-17 12:25 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Alexey Brodkin, arcml, lkml

On Tue, Nov 17, 2015 at 01:24:01PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 17, 2015 at 04:53:04PM +0530, Vineet Gupta wrote:
> > That means Alexey need to revert the hunk ?
> 
> Yes, I think so.

This is assuming you now have these NMIs we talked about earlier. If all
you have are regular IRQs this is not possible, for we should be calling
->read() with IRQs disabled.

* NMI for ARC
  2015-11-17 12:25           ` Peter Zijlstra
@ 2015-11-17 12:53             ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2015-11-17 12:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: arcml, Alexey Brodkin, lkml

On Tuesday 17 November 2015 05:55 PM, Peter Zijlstra wrote:

> This is assuming you now have these NMIs we talked about earlier. If all
> you have are regular IRQs this is not possible, for we should be calling
> ->read() with IRQs disabled.
> 

No, we don't yet. The first stab at it fell flat on the floor.

The NMI support from hardware is that it provides different priorities, a higher
one obviously able to interrupt a lower one. However, instructions like CLRI (disable
interrupts) will still lock out all interrupts.

Thus local_irq_save()/restore() and local_irq_enable()/disable() now need to be
contextual.

  - When running in prio 0 mode, they only need to enable 0
  - In prio 1, they need to enable both 0 and 1

For irq_save()/restore() this is achievable by doing an additional STATUS32 read
at the time of save and passing that value to restore - so there's an additional
overhead - but ignoring that for now.

The bummer is the irq_disable()/enable() case: there's a need to pass the old prio state
from disable to enable, so we need some sort of global state tracking - which in the case
of SMP needs to be per-cpu... either keep something hot in a reg or pay the cost
of an additional mem/cache-line miss.

I've not investigated how other arches do that. PPC seems to be using some sort of
soft irq state anyway.

* Re: NMI for ARC
  2015-11-17 12:53             ` Vineet Gupta
@ 2015-11-17 13:15               ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-11-17 13:15 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: arcml, Alexey Brodkin, lkml

On Tue, Nov 17, 2015 at 06:23:21PM +0530, Vineet Gupta wrote:
> On Tuesday 17 November 2015 05:55 PM, Peter Zijlstra wrote:
> 
> > This is assuming you now have these NMIs we talked about earlier. If all
> > you have are regular IRQs this is not possible, for we should be calling
> > ->read() with IRQs disabled.
> > 
> 
> No we don't yet. The first stab at it fell flat on floor.
> 
> The NMI support from hardware is that is it provides different priorities, higher
> one obviously able to interrupt lower one. However instructions like CLRI (disable
> interrupts) will still lock out all interrupts.
> 
> Thus local_irq_save()/restore() and local_irq_enable()/disable() now need to be
> contextual.
> 
>   - When running in prio 0 mode, they only need to enable 0
>   - In prio 1, they need to enable both 0 and 1
> 
> For irq_save()/restore() this is achievable by doing an additional STATUS32 read
> at the time of save and passing that value to restore - so there's an additional
> overhead - but ignoring that for now.
> 
> Bummer is irq_disable()/enable() case: there's need to pass old prio state from
> enable to disabled, so we need some sort of global state tracking - which in case
> of SMP needs to be per cpu.... either keep something hot in a reg or pay the cost
> of additional mem/cache line miss.
> 
> I've not investigated how other arches do that. PPC seems to be using some sort of
> soft irq state anyways.

Yeah, Sparc64 might be a better example, it more closely matches your
hardware. See
arch/sparc/include/asm/irqflags_64.h:arch_local_irq_save().

* Re: local64_cmpxchg() in arc_perf_event_update()
  2015-11-17 12:25           ` Peter Zijlstra
@ 2015-11-17 13:24             ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2015-11-17 13:24 UTC (permalink / raw)
  To: Peter Zijlstra, Vineet Gupta; +Cc: arcml, Alexey Brodkin, lkml

On Tuesday 17 November 2015 05:55 PM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2015 at 01:24:01PM +0100, Peter Zijlstra wrote:
>> > On Tue, Nov 17, 2015 at 04:53:04PM +0530, Vineet Gupta wrote:
>>> > > That means Alexey need to revert the hunk ?
>> > 
>> > Yes, I think so.
> This is assuming you now have these NMIs we talked about earlier. If all
> you have are regular IRQs this is not possible, for we should be calling
> ->read() with IRQs disabled.

OK, then there is no need for this change ATM. I'll queue up the revert in the
branch which brings NMI support!

Thx,
-Vineet

* Re: NMI for ARC
  2015-11-17 13:15               ` Peter Zijlstra
@ 2016-09-28  0:22                 ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-28  0:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: arcml, Alexey Brodkin, lkml

Hi Peter,

On 11/17/2015 05:15 AM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2015 at 06:23:21PM +0530, Vineet Gupta wrote:
>> On Tuesday 17 November 2015 05:55 PM, Peter Zijlstra wrote:
>>
>>> This is assuming you now have these NMIs we talked about earlier. If all
>>> you have are regular IRQs this is not possible, for we should be calling
>>> ->read() with IRQs disabled.
>>>
>>
>> No we don't yet. The first stab at it fell flat on floor.
>>
>> The NMI support from hardware is that is it provides different priorities, higher
>> one obviously able to interrupt lower one. However instructions like CLRI (disable
>> interrupts) will still lock out all interrupts.
>>
>> Thus local_irq_save()/restore() and local_irq_enable()/disable() now need to be
>> contextual.
>>
>>   - When running in prio 0 mode, they only need to enable 0
>>   - In prio 1, they need to enable both 0 and 1
>>
>> For irq_save()/restore() this is achievable by doing an additional STATUS32 read
>> at the time of save and passing that value to restore - so there's an additional
>> overhead - but ignoring that for now.
>>
>> Bummer is irq_disable()/enable() case: there's need to pass old prio state from
>> enable to disabled, so we need some sort of global state tracking - which in case
>> of SMP needs to be per cpu.... either keep something hot in a reg or pay the cost
>> of additional mem/cache line miss.
>>
>> I've not investigated how other arches do that. PPC seems to be using some sort of
>> soft irq state anyways.
> 
> Yeah, Sparc64 might be a better example, it more closely matches your
> hardware. See
> arch/sparc/include/asm/irqflags_64.h:arch_local_irq_save().

So I finally got around to doing this and, as expected, it has turned out to be quite
some fun. I have a couple of questions and would really appreciate your inputs there.

1. Is it OK in general to short-circuit the preempt-off-irq checks for NMI-style
interrupts? The issue is we can get nested interrupts (timer followed by perf)
and one of them can cause resched_curr() for a user task - but we can't return to
user mode from the inner interrupt. So it becomes easier if we don't even bother
checking for TIF_NEED_RESCHED in the perf intr path. This also has the slight advantage
that the perf intr returns quickly. Implementation-wise this requires a hack to bump the
preemption count in the low-level NMI handler - and revert that in the NMI return path.

2. The low-level return code, resume_user_mode_begin and/or resume_kernel_mode,
requires interrupt safety; does that need to be NMI-safe as well? We of course want
the very late register-restore parts to be non-interruptible, but is this required
before we call preempt_schedule_irq() off of asm code?

Thx,
-Vineet

* Re: NMI for ARC
  2016-09-28  0:22                 ` Vineet Gupta
@ 2016-09-28  7:16                   ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2016-09-28  7:16 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: arcml, Alexey Brodkin, lkml, Andy Lutomirski

On Tue, Sep 27, 2016 at 05:22:13PM -0700, Vineet Gupta wrote:

> > Yeah, Sparc64 might be a better example, it more closely matches your
> > hardware. See
> > arch/sparc/include/asm/irqflags_64.h:arch_local_irq_save().
> 
> So I finally got around to doing this and as expected has turned out to be quite
> some fun. I have a couple of questions and would really appreciate your inputs there.
> 
> 1. Is it OK in general to short-circuit preemption off irq checks for NMI style
> interrupts. 

Yes. If the NMI returns to kernel space you must not attempt preemption
for reasons you found :-), if the NMI returns to userspace you should do
the normal return to user bits, I think.

> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
> require interrupt safety, does that need to be NMI safe as well. We ofcourse want
> the very late register restore parts to be non-interruptible, but is this required
> before we call prrempt_schedule_irq() off of asm code.

Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
might actually know this off the top of his head. I'll try and dig
through x86 to see what it does.

* Re: NMI for ARC
  2016-09-28  7:16                   ` Peter Zijlstra
@ 2016-09-28 17:58                     ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-28 17:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: arcml, Alexey Brodkin, lkml, Andy Lutomirski

On 09/28/2016 12:16 AM, Peter Zijlstra wrote:
>> 1. Is it OK in general to short-circuit preemption off irq checks for NMI style
>> > interrupts. 
> Yes. If the NMI returns to kernel space you must not attempt preemption
> for reasons you found :-), if the NMI returns to userspace you should do
> the normal return to user bits, I think.

Just to add, eliding the preemption check in the return path might not be sufficient, as we
could still have a timer intr leading to scheduler_tick() - which is what I found as
well ;-) So the scheduler nevertheless needs to be told not to reschedule in this code
path.

So eliding preempt-off-irq just becomes an optimization for NMI code paths IMHO.

Now I was a stupid fool, fudging preemption counts in low level code, to achieve
this, whereas we have nice generic nmi_{enter,exit}() and in_nmi() helpers which
can be used.

Thx,
-Vineet

* Re: NMI for ARC
  2016-09-28  7:16                   ` Peter Zijlstra
@ 2016-09-28 19:25                     ` Andy Lutomirski
  -1 siblings, 0 replies; 46+ messages in thread
From: Andy Lutomirski @ 2016-09-28 19:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vineet Gupta, arcml, Alexey Brodkin, lkml

On Wed, Sep 28, 2016 at 12:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 27, 2016 at 05:22:13PM -0700, Vineet Gupta wrote:
>
>> > Yeah, Sparc64 might be a better example, it more closely matches your
>> > hardware. See
>> > arch/sparc/include/asm/irqflags_64.h:arch_local_irq_save().
>>
>> So I finally got around to doing this and as expected has turned out to be quite
>> some fun. I have a couple of questions and would really appreciate your inputs there.
>>
>> 1. Is it OK in general to short-circuit preemption off irq checks for NMI style
>> interrupts.
>
> Yes. If the NMI returns to kernel space you must not attempt preemption
> for reasons you found :-),

Last time I looked at this, I decided that there was no reason that
NMIs would ever need to handle preemption.  Even if the NMI hit
interruptible kernel code, anything that would cause preemption to be
needed would send an IPI (and thus cause preemption) right
after the NMI finished.  NMI handlers themselves have no business
setting TIF_NEED_RESCHED or similar.

> if the NMI returns to userspace you should do
> the normal return to user bits, I think.

x86 does this for simplicity.  There was a really nasty corner case
that I could only figure out how to solve by special casing NMIs from
user space.  I'm not sure that it's actually necessary from a
non-arch-specific POV to handle all the usual return-to-userspace work
on NMI.  But maybe perf NMIs can send signals?

x86's MCEs *do* need the full return-to-userspace handling for memory
failure to work right.  MCE is kind of like NMI...

>
>> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
>> require interrupt safety, does that need to be NMI safe as well. We of course want
>> the very late register restore parts to be non-interruptible, but is this required
>> before we call preempt_schedule_irq() off of asm code.
>
> Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
> might actually know this off the top of his head. I'll try and dig
> through x86 to see what it does.

On x86, it's quite simple.  IRQs are *always* off during the final
register restore, and we don't re-check for preemption there.  x86
handles preemption after turning off IRQs, and IRQs are guaranteed to
stay off until we actually return to userspace.

The code is almost entirely in C in arch/x86/entry/common.c.  There
isn't anything particularly x86-specific in there.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-28 19:25                     ` Andy Lutomirski
@ 2016-09-28 20:37                       ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2016-09-28 20:37 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Vineet Gupta, arcml, Alexey Brodkin, lkml

On Wed, Sep 28, 2016 at 12:25:11PM -0700, Andy Lutomirski wrote:
> > Yes. If the NMI returns to kernel space you must not attempt preemption
> > for reasons you found :-),
> 
> Last time I looked at this, I decided that there was no reason that
> NMIs would ever need to handle preemption.  Even if the NMI hit
> interruptible kernel code, anything that would cause preemption to be
> needed would send an IPI (and thus cause preemption) right
> after the NMI finished.  NMI handlers themselves have no business
> setting TIF_NEED_RESCHED or similar.

Good point, they don't and therefore you need not bother.

> > if the NMI returns to userspace you should do
> > the normal return to user bits, I think.
> 
> x86 does this for simplicity.  There was a really nasty corner case
> that I could only figure out how to solve by special casing NMIs from
> user space.  I'm not sure that it's actually necessary from a
> non-arch-specific POV to handle all the usual return-to-userspace work
> on NMI.  But maybe perf NMIs can send signals?

No it cannot. It uses irq_work (which sends a self-IPI) when it wants to
do signals.

> >> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
> >> require interrupt safety, does that need to be NMI safe as well. We of course want
> >> the very late register restore parts to be non-interruptible, but is this required
> >> before we call preempt_schedule_irq() off of asm code.
> >
> > Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
> > might actually know this off the top of his head. I'll try and dig
> > through x86 to see what it does.
> 
> On x86, it's quite simple.  IRQs are *always* off during the final
> register restore, and we don't re-check for preemption there.  x86
> handles preemption after turning off IRQs, and IRQs are guaranteed to
> stay off until we actually return to userspace.
> 
> The code is almost entirely in C in arch/x86/entry/common.c.  There
> isn't anything particularly x86-specific in there.

Right, so what I think Vineet is asking is whether we need to disable NMIs as
well; we cannot disable NMIs on x86, so no.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-28 20:37                       ` Peter Zijlstra
@ 2016-09-28 22:26                         ` Andy Lutomirski
  -1 siblings, 0 replies; 46+ messages in thread
From: Andy Lutomirski @ 2016-09-28 22:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Alexey Brodkin, arcml, Vineet Gupta, lkml

On Sep 28, 2016 1:37 PM, "Peter Zijlstra" <peterz@infradead.org> wrote:
>
> On Wed, Sep 28, 2016 at 12:25:11PM -0700, Andy Lutomirski wrote:
> > > Yes. If the NMI returns to kernel space you must not attempt preemption
> > > for reasons you found :-),
> >
> > Last time I looked at this, I decided that there was no reason that
> > NMIs would ever need to handle preemption.  Even if the NMI hit
> > interruptible kernel code, anything that would cause preemption to be
> > needed would send an IPI (and thus cause preemption) right
> > after the NMI finished.  NMI handlers themselves have no business
> > setting TIF_NEED_RESCHED or similar.
>
> Good point, they don't and therefore you need not bother.
>
> > > if the NMI returns to userspace you should do
> > > the normal return to user bits, I think.
> >
> > x86 does this for simplicity.  There was a really nasty corner case
> > that I could only figure out how to solve by special casing NMIs from
> > user space.  I'm not sure that it's actually necessary from a
> > non-arch-specific POV to handle all the usual return-to-userspace work
> > on NMI.  But maybe perf NMIs can send signals?
>
> No it cannot. It uses irq_work (which sends a self-IPI) when it wants to
> do signals.
>
> > >> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
> > >> require interrupt safety, does that need to be NMI safe as well. We of course want
> > >> the very late register restore parts to be non-interruptible, but is this required
> > >> before we call preempt_schedule_irq() off of asm code.
> > >
> > > Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
> > > might actually know this off the top of his head. I'll try and dig
> > > through x86 to see what it does.
> >
> > On x86, it's quite simple.  IRQs are *always* off during the final
> > register restore, and we don't re-check for preemption there.  x86
> > handles preemption after turning off IRQs, and IRQs are guaranteed to
> > stay off until we actually return to userspace.
> >
> > The code is almost entirely in C in arch/x86/entry/common.c.  There
> > isn't anything particularly x86-specific in there.
>
> Right, so what I think Vineet is asking is if we need to disable NMIs as
> well, we cannot on x86 disable NMIs so no.
>

The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
without sending an IPI, so we can't miss a wakeup.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-28 22:26                         ` Andy Lutomirski
@ 2016-09-28 22:44                           ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-28 22:44 UTC (permalink / raw)
  To: Andy Lutomirski, Peter Zijlstra; +Cc: Alexey Brodkin, arcml, lkml

On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
>>>>> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
>>>>> > > >> require interrupt safety, does that need to be NMI safe as well. We of course want
>>>>> > > >> the very late register restore parts to be non-interruptible, but is this required
>>>>> > > >> before we call preempt_schedule_irq() off of asm code.
>>>> > > >
>>>> > > > Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
>>>> > > > might actually know this off the top of his head. I'll try and dig
>>>> > > > through x86 to see what it does.
>>> > >
>>> > > On x86, it's quite simple.  IRQs are *always* off during the final
>>> > > register restore, and we don't re-check for preemption there.  x86
>>> > > handles preemption after turning off IRQs, and IRQs are guaranteed to
>>> > > stay off until we actually return to userspace.
>>> > >
>>> > > The code is almost entirely in C in arch/x86/entry/common.c.  There
>>> > > isn't anything particularly x86-specific in there.
>> >
>> > Right, so what I think Vineet is asking is if we need to disable NMIs as
>> > well, we cannot on x86 disable NMIs so no.
>> >
> The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
> without sending an IPI, so we can't miss a wakeup.

The case I saw was different: timer intr (normal prio) comes in - scheduler_tick()
sets TIF_NEED_RESCHED - and before this intr returns, it gets interrupted by perf
intr (higher prio), and we decide not to follow through on preemption because a
nested intr can't return to userspace anyway.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-28 22:44                           ` Vineet Gupta
@ 2016-09-28 22:47                             ` Andy Lutomirski
  -1 siblings, 0 replies; 46+ messages in thread
From: Andy Lutomirski @ 2016-09-28 22:47 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra, Alexey Brodkin, arcml, lkml

On Wed, Sep 28, 2016 at 3:44 PM, Vineet Gupta
<Vineet.Gupta1@synopsys.com> wrote:
> On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
>>>>>> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
>>>>>> > > >> require interrupt safety, does that need to be NMI safe as well. We of course want
>>>>>> > > >> the very late register restore parts to be non-interruptible, but is this required
>>>>>> > > >> before we call preempt_schedule_irq() off of asm code.
>>>>> > > >
>>>>> > > > Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
>>>>> > > > might actually know this off the top of his head. I'll try and dig
>>>>> > > > through x86 to see what it does.
>>>> > >
>>>> > > On x86, it's quite simple.  IRQs are *always* off during the final
>>>> > > register restore, and we don't re-check for preemption there.  x86
>>>> > > handles preemption after turning off IRQs, and IRQs are guaranteed to
>>>> > > stay off until we actually return to userspace.
>>>> > >
>>>> > > The code is almost entirely in C in arch/x86/entry/common.c.  There
>>>> > > isn't anything particularly x86-specific in there.
>>> >
>>> > Right, so what I think Vineet is asking is if we need to disable NMIs as
>>> > well, we cannot on x86 disable NMIs so no.
>>> >
>> The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
>> without sending an IPI, so we can't miss a wakeup.
>
> The case I saw was different: timer intr (normal prio) comes in - scheduler_tick()
> sets TIF_NEED_RESCHED and before this intr return, it gets interrupted by perf
> intr (higher prio) and we decide not to follow through on preemption because a
> nested intr can't return to userspace anyways.
>
>

This shouldn't cause a problem.  When the timer interrupt returns, it
should be able to handle preemption.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-28 22:26                         ` Andy Lutomirski
@ 2016-09-29  1:20                           ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-29  1:20 UTC (permalink / raw)
  To: Andy Lutomirski, Peter Zijlstra; +Cc: Alexey Brodkin, arcml, Vineet Gupta, lkml

On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
>> Right, so what I think Vineet is asking is if we need to disable NMIs as
>> > well, we cannot on x86 disable NMIs so no.
>> >
> The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
> without sending an IPI, so we can't miss a wakeup.

But what exact wakeup miss are we talking about here? If intr were NOT disabled,
how could this happen? Just trying to understand the need for "irqs-disabled" in
resume_{user,kernel}_*

The intr disabled before reg file restore makes complete sense though.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29  1:20                           ` Vineet Gupta
@ 2016-09-29  6:43                             ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2016-09-29  6:43 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Andy Lutomirski, Alexey Brodkin, arcml, Vineet Gupta, lkml

On Wed, Sep 28, 2016 at 06:20:29PM -0700, Vineet Gupta wrote:
> On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
> >> Right, so what I think Vineet is asking is if we need to disable NMIs as
> >> > well, we cannot on x86 disable NMIs so no.
> >> >
> > The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
> > without sending an IPI, so we can't miss a wakeup.
> 
> But what exact wakeup miss are we talking about here? If intr were NOT disabled,
> how could this happen? Just trying to understand the need for "irqs-disabled" in
> resume_{user,kernel}_*
> 
> The intr disabled before reg file restore makes complete sense though.


	user	irq	nmi

	|
	|
	`-----> .
		|
		|
		|
		`----->	.
			|
			|
		. <-----'
	. <-----'
	|
	|

So what Andy is saying is that NMI context never sets TIF_NEED_RESCHED,
this means that return from NMI never needs to check for preemption
etc..

Now your return from IRQ obviously should, the normal way. If the IRQ
return gets interrupted by the NMI nothing special should occur. The
return from NMI should simply resume the return from IRQ.

So I'm a little confused by your timer interrupt example, it _should_ do
the preemption, the nested interrupt (NMI) will return to the regular
interrupt which should resume its normal return preemption or not.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29  6:43                             ` Peter Zijlstra
@ 2016-09-29 16:47                               ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-29 16:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: arcml, Alexey Brodkin, lkml, Andy Lutomirski

On 09/28/2016 11:43 PM, Peter Zijlstra wrote:
> On Wed, Sep 28, 2016 at 06:20:29PM -0700, Vineet Gupta wrote:
>> On 09/28/2016 03:26 PM, Andy Lutomirski wrote:

> 
> 
> 	user	irq	nmi
> 
> 	|
> 	|
> 	`-----> .
> 		|
> 		|
> 		|
> 		`----->	.
> 			|
> 			|
> 		. <-----'
> 	. <-----'
> 	|
> 	|
> 
> So what Andy is saying is that NMI context never sets TIF_NEED_RESCHED,

Can we be absolutely sure about that? A perf intr, vmalloc-based mmap can go
through various hoops. Is it not possible that it hits a reschedule, setting
TIF_NEED_RESCHED?

> this means that return from NMI never needs to check for preemption
> etc..

I don't think this follows from the previous point. In my example, the timer interrupt
triggers TIF_NEED_RESCHED, and in irq_exit() -> __do_softirq() it hits the perf intr.

> Now your return from IRQ obviously should, the normal way. If the IRQ
> return gets interrupted by the NMI nothing special should occur. The
> return from NMI should simply resume the return from IRQ.
> 
> So I'm a little confused by your timer interrupt example, it _should_ do
> the preemption, the nested interrupt (NMI) will return to the regular
> interrupt which should resume its normal return preemption or not.

So let's first see how a single-priority intr works on ARC (maybe on other arches
as well).

1. task t1 enters a kernel syscall (Trap Exception on ARC); the handler drops down to
pure kernel mode and proceeds into the syscall handler.
2. while in the handler, some intr is taken, which causes a reschedule to task t2.
3. t2's control flow returns (say it was in a syscall when originally
scheduled out). It needs to return to user mode, but the cpu needs to return from the
active interrupt. So we return to user mode "riding" the intr return path. This means the
intr in step #2 returns to a different PC and execution mode (user vs. kernel etc.).

Now the same scheme doesn't work out of the box when you have intr and nmi. We have
to actively ensure that the nmi itself doesn't lead to a __schedule(). And
this is done by bumping preempt_count by NMI_OFFSET on entry to the nmi handler.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29  1:20                           ` Vineet Gupta
@ 2016-09-29 17:30                             ` Andy Lutomirski
  -1 siblings, 0 replies; 46+ messages in thread
From: Andy Lutomirski @ 2016-09-29 17:30 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Alexey Brodkin, arcml, Vineet Gupta, lkml, Peter Zijlstra

On Sep 28, 2016 6:20 PM, "Vineet Gupta" <vgupta@synopsys.com> wrote:
>
> On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
> >> Right, so what I think Vineet is asking is if we need to disable NMIs as
> >> > well, we cannot on x86 disable NMIs so no.
> >> >
> > The same argument works here, too: an NMI won't set TIF_NEED_RESCHED
> > without sending an IPI, so we can't miss a wakeup.
>
> But what exact wakeup miss are we talking about here? If intr were NOT disabled,
> how could this happen? Just trying to understand the need for "irqs-disabled" in
> resume_{user,kernel}_*
>
> The intr disabled before reg file restore makes complete sense though.

An interrupt after the TIF_NEED_RESCHED check could set
TIF_NEED_RESCHED (because it woke a waiting task), and the resume code
wouldn't notice.  No IPI would be sent because TIF_NEED_RESCHED on the
current task is supposed to be acted on in a timely manner without
one.
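The window being described can be modeled in a few lines; resume_user_mode, irq_pending etc. are made-up names for illustration, not kernel interfaces:

```c
/* Toy model of the missed-wakeup race: if the resume path tests
 * TIF_NEED_RESCHED with interrupts still enabled, an interrupt can
 * slip in after the test, set the flag, and the return to user mode
 * never notices.  With irqs disabled across the test-and-return, the
 * interrupt simply stays pending and is taken (and acted on) after
 * the return instead. */

static int tif_need_resched;
static int irq_pending;          /* an interrupt that will wake a task */

static void maybe_take_irq(void)
{
    if (irq_pending) {
        tif_need_resched = 1;    /* intr woke a task and marked us */
        irq_pending = 0;
    }
}

/* Returns 1 if the resched request was noticed before user return. */
static int resume_user_mode(int irqs_disabled)
{
    int resched = tif_need_resched;     /* the check */
    if (!irqs_disabled)
        maybe_take_irq();               /* the race window */
    /* ... restore regs and return to user, acting on 'resched' ... */
    return resched;
}
```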

More severe problems could happen.  A signal could be queued or, on
x86, a return-to-usermode notifier could be set by KVM on a
preemptible kernel.  The latter, if missed, will bring down the whole
machine.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29 16:47                               ` Vineet Gupta
@ 2016-09-29 18:54                                 ` Andy Lutomirski
  -1 siblings, 0 replies; 46+ messages in thread
From: Andy Lutomirski @ 2016-09-29 18:54 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra, arcml, Alexey Brodkin, lkml

On Thu, Sep 29, 2016 at 9:47 AM, Vineet Gupta
<Vineet.Gupta1@synopsys.com> wrote:
> On 09/28/2016 11:43 PM, Peter Zijlstra wrote:
>> On Wed, Sep 28, 2016 at 06:20:29PM -0700, Vineet Gupta wrote:
>>> On 09/28/2016 03:26 PM, Andy Lutomirski wrote:
>
>>
>>
>>       user    irq     nmi
>>
>>       |
>>       |
>>       `-----> .
>>               |
>>               |
>>               |
>>               `-----> .
>>                       |
>>                       |
>>               . <-----'
>>       . <-----'
>>       |
>>       |
>>
>> So what Andy is saying is that NMI context never sets TIF_NEED_RESCHED,
>
> Can we we be absolutely sure about that. A perf intr, vmalloc based mmap can go
> thru various hoops and. Is it not possible that it hits a reschedule, setting
> TIF_NEED_RESCHED
>
>> this means that return from NMI never needs to check for preemption
>> etc..
>
> I don't think this implies from prev one. In my example, timer interrupt triggers
> a TIF_NEED_RESCHED and irq_exit -> __do_softirq() it hits the perf intr
>
>> Now your return from IRQ obviously should, the normal way. If the IRQ
>> return gets interrupted by the NMI nothing special should occur. The
>> return from NMI should simply resume the return from IRQ.
>>
>> So I'm a little confused by your timer interrupt example, it _should_ do
>> the preemption, the nested interrupt (NMI) will return to the regular
>> interrupt which should resume its normal return preemption or not.
>
> So lets first see how a single priority intr works on ARC (maybe on other arches
> as well).
>
> 1. task t1 enters kernel syscall (Trap Exception on ARC), handler drops down to
> pure kernel model and proceeds into syscall handler.
> 2. while in handler, some intr is taken, which causes a reschedule to task t2.
> 3. t2's control flow returns (say it was in syscall when originally
> scheduled-out). It needs to return to user mode but cpu needs to return from
> active interrupt. So we return to user mode, "riding" the intr return path. Means
> intr in step #2 returns to a different PC and execution mode (user vs. kernel etc).
>

For the benefit of people who don't know what an "active interrupt" is
(x86 has no such concept in hardware), can you elaborate a bit?  On
x86, for all practical purposes [1], an interrupt kicks the CPU into
kernel mode, and the kernel is free to return however it likes.  It
can do a standard interrupt return right back to the interrupted
context, but it can also switch stacks and do whatever some other
thread was doing.

> Now the same scheme doesn't work out of the box when u have intr and nmi. We have
> to actively ensure that nmi doesn't lead to a __schedule() sans user code. And
> this is done by bumping preempt_count(NMI_OFFSET) in entry of nmi handler.

The perf NMI code won't schedule.  In general, you just need to ensure
that in_nmi() is true.  Any kernel code that touches normal locks,
schedules, gets page faults without extreme caution, etc. needs to be
aware that nmis are special.


[1] There's an exception on 64-bit AMD CPUs because AMD blew it.
Also, x86 NMI return is itself severely overcomplicated because we don't
have good control over NMI nesting.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29 18:54                                 ` Andy Lutomirski
@ 2016-09-29 19:48                                   ` Vineet Gupta
  -1 siblings, 0 replies; 46+ messages in thread
From: Vineet Gupta @ 2016-09-29 19:48 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Peter Zijlstra, arcml, Alexey Brodkin, lkml

On 09/29/2016 11:54 AM, Andy Lutomirski wrote:
>> So lets first see how a single priority intr works on ARC (maybe on other arches
>> > as well).
>> >
>> > 1. task t1 enters kernel syscall (Trap Exception on ARC), handler drops down to
>> > pure kernel model and proceeds into syscall handler.
>> > 2. while in handler, some intr is taken, which causes a reschedule to task t2.
>> > 3. t2's control flow returns (say it was in syscall when originally
>> > scheduled-out). It needs to return to user mode but cpu needs to return from
>> > active interrupt. So we return to user mode, "riding" the intr return path. Means
>> > intr in step #2 returns to a different PC and execution mode (user vs. kernel etc).
>> >
> For the benefit of people who don't know what an "active interrupt" is
> (x86 has no such concept in hardware), can you elaborate a bit? 

A bit set in the AUX_IRQ_ACTIVE register which says the cpu is servicing an interrupt (of
prio X - see bottom for more details). ARC has the RTIE instruction to return from
intr/exception/pure-kernel-mode. In HS38 cores, h/w saves the regfile on taken
interrupts (and not exceptions). Thus RTIE needs to know whether it is returning from an
intr or an exception (pure kernel mode is the same as an exception). In that sense there
is a distinction between intr mode and pure kernel mode.

> On
> x86, for all practical purposes [1], an interrupt kicks the CPU into
> kernel mode, and the kernel is free to return however it likes.  It
> can do a standard interrupt return right back to the interrupted
> context, but it can also switch stacks and do whatever some other
> thread was doing.

That is true for ARC as well. A taken interrupt can return to the orig taken
intr context, or it can return in the context of a sched-in task which itself had
entered the kernel via a syscall. The auto-saved regfile for interrupts is compatible
with the hand-saved regs for traps/exceptions, so it doesn't really matter.

So when I started with the initial nmi support, I was lacking an equivalent of
nmi_enter() in the perf intr handler. That in turn led to this nested intr + sched
of user task situation. With that fixed (originally by just fudging the soft
preempt count in nmi entry/exit), I was able to solve this.
So even if TIF_NEED_RESCHED was already set (by an outer intr, say), the nmi return
path won't resched because in_nmi() is true.

My original question to Peter was, whether it is OK to elide TIF_NEED_RESCHED
checks for nmi handlers, as an optimization, so that perf intr returns faster.

>> > Now the same scheme doesn't work out of the box when u have intr and nmi. We have
>> > to actively ensure that nmi doesn't lead to a __schedule() sans user code. And
>> > this is done by bumping preempt_count(NMI_OFFSET) in entry of nmi handler.
> The perf NMI code won't schedule.  In general, you just need to ensure
> that in_nmi() is true.  Any kernel code that touches normal locks,
> schedules, gets page faults without extreme caution, etc. needs to be
> aware that nmis are special.

Exactly this bit is what I was missing in the first place.


> [1] There's an exception on 64-bit AMD CPUs because AMD blew it.
> Also, x86 NMI return is itself severely overcomplicated because we don't
> have good control over NMI nesting.

For ARC (HS38 cores), there are 16 interrupt priorities (0-highest, 15-lowest) and
each active interrupt has a bit in AUX_IRQ_ACTIVE. If prio X is active, another
prio X can't be taken (you can only take a higher prio). In that sense the nmi (aka prio
0) can't nest for us.
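That priority rule can be modeled in a few lines (a standalone sketch of the rule as described, not the actual HS38 hardware):

```c
/* Toy model of AUX_IRQ_ACTIVE: bit N set means prio N is active.
 * A new interrupt of prio X is taken only if no interrupt of prio X
 * or higher (numerically <= X) is already active, so prio 0 (the
 * "nmi") can never nest. */

static unsigned int aux_irq_active;

/* Attempt to take an interrupt of the given prio (0 = highest). */
static int try_take_irq(int prio)
{
    unsigned int blocking = (1u << (prio + 1)) - 1; /* bits 0..prio */
    if (aux_irq_active & blocking)
        return 0;                /* same or higher prio already active */
    aux_irq_active |= 1u << prio;
    return 1;
}

static void rtie(int prio)       /* return from interrupt at 'prio' */
{
    aux_irq_active &= ~(1u << prio);
}
```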

-Vineet

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NMI for ARC
  2016-09-29 19:48                                   ` Vineet Gupta
@ 2016-09-30 10:49                                     ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2016-09-30 10:49 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Andy Lutomirski, arcml, Alexey Brodkin, lkml

On Thu, Sep 29, 2016 at 12:48:03PM -0700, Vineet Gupta wrote:
> > [1] There's an exception on 64-bit AMD CPUs because AMD blew it.
> > Also, x86 NMI return is itself severely overcomplicated because we don't
> > have good control over NMI nesting.
> 
> For ARC (HS38 cores), there are 16 interrupt priorities (0-high, 15-lowest) and
> each active interrupt has a bit in AUX_IRQ_ACTIVE. If a prio X is active, another
> prio X can't be taken (you can only take higher prio). In that sense nmi (aka prio
> 0) can't nest for us.

Forget you ever read this; the reason for the nested NMIs on x86 is
horrid, but here goes:

The problem is that IRET, the only instruction capable of returning from
_any_ trap/fault/interrupt context automagically unmasks NMIs. And we
'want' to take debug-traps and page-faults from NMI context.

We use debug-traps to synchronize against self-modifying code without
halting the entire machine.

We 'need' page-faults for vmalloc backed memory, the vmalloc page tables
are not globally propagated and we need (minor) faults to populate the
page-tables on demand. And we need page-faults to do user-space access
from NMI context (for the fixup_exception stuff).

Returning from either trap or fault is done through IRET, which then
unmasks NMIs, allowing another NMI to come in while we're still
servicing one.

We have quite the horror cabinet for dealing with this ;-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-09-30 10:49 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1445286926.3913.13.camel@synopsys.com>
2015-11-17  9:14 ` local64_cmpxchg() in arc_perf_event_update() Vineet Gupta
2015-11-17 11:07   ` Peter Zijlstra
2015-11-17 11:23     ` Vineet Gupta
2015-11-17 12:24       ` Peter Zijlstra
2015-11-17 12:25         ` Peter Zijlstra
2015-11-17 12:53           ` NMI for ARC Vineet Gupta
2015-11-17 13:15             ` Peter Zijlstra
2016-09-28  0:22               ` Vineet Gupta
2016-09-28  7:16                 ` Peter Zijlstra
2016-09-28 17:58                   ` Vineet Gupta
2016-09-28 19:25                   ` Andy Lutomirski
2016-09-28 20:37                     ` Peter Zijlstra
2016-09-28 22:26                       ` Andy Lutomirski
2016-09-28 22:44                         ` Vineet Gupta
2016-09-28 22:47                           ` Andy Lutomirski
2016-09-29  1:20                         ` Vineet Gupta
2016-09-29  6:43                           ` Peter Zijlstra
2016-09-29 16:47                             ` Vineet Gupta
2016-09-29 18:54                               ` Andy Lutomirski
2016-09-29 19:48                                 ` Vineet Gupta
2016-09-30 10:49                                   ` Peter Zijlstra
2016-09-29 17:30                           ` Andy Lutomirski
2015-11-17 13:24           ` local64_cmpxchg() in arc_perf_event_update() Vineet Gupta
