eliminte NMI entry/ exit code
diff mbox series

Message ID 42FD42C1.6040009@mvista.com
State New, archived
Headers show
Series
  • eliminte NMI entry/ exit code
Related show

Commit Message

George Anzinger Aug. 13, 2005, 12:45 a.m. UTC
The NMI entry and exit code fiddles with bits in the preempt count.  If 
an NMI happens while some other code is doing the same, bits will be 
lost.  This patch removes this modify code from the NMI path till we can 
come up with something better.

Comments

Nick Piggin Aug. 13, 2005, 12:56 a.m. UTC | #1
George Anzinger wrote:
> The NMI entry and exit code fiddles with bits in the preempt count.  If 
> an NMI happens while some other code is doing the same, bits will be 
> lost.  This patch removes this modify code from the NMI path till we can 
> come up with something better.
> 

Humour me for a minute here...
NMI restores preempt_count back to its old value upon exit, right?
So what does a race case look like?

Nick
George Anzinger Aug. 13, 2005, 1:13 a.m. UTC | #2
Nick Piggin wrote:
> George Anzinger wrote:
> 
>> The NMI entry and exit code fiddles with bits in the preempt count.  
>> If an NMI happens while some other code is doing the same, bits will 
>> be lost.  This patch removes this modify code from the NMI path till 
>> we can come up with something better.
>>
> 
> Humour me for a minute here...
> NMI restores preempt_count back to its old value upon exit, right?
> So what does a race case look like?

Normal code                   NMI
fetch preempt_count
add                   <-----  interrupt here add and store then subtract 
and store, darn!
store preempt_count

Ok, no problem.

The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
reasonable counts fail because of the rather undefined state when NMI 
picks up the word.  The failure is on the NMI side...
>
Linus Torvalds Aug. 13, 2005, 1:18 a.m. UTC | #3
On Fri, 12 Aug 2005, George Anzinger wrote:
>
> The NMI entry and exit code fiddles with bits in the preempt count.  If 
> an NMI happens while some other code is doing the same, bits will be 
> lost.

Why?

Even if an NMI happens in the middle of a read-modify-write sequence that 
is critical, the NMI exit is supposed to undo whatever it was that the NMI 
entry did, so the preempt counters are "safe" wrt NMI: they may change, 
but they always change back by the time anybody cares.

This, btw, is something we depend on wrt _normal_ interrupts too. It's why 
people can read/modify/write preempt count without having to disable 
interrupts.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Zachary Amsden Aug. 13, 2005, 6:27 a.m. UTC | #4
George Anzinger wrote:

> Nick Piggin wrote:
>
>> George Anzinger wrote:
>>
>>> The NMI entry and exit code fiddles with bits in the preempt count.  
>>> If an NMI happens while some other code is doing the same, bits will 
>>> be lost.  This patch removes this modify code from the NMI path till 
>>> we can come up with something better.
>>>
>>
>> Humour me for a minute here...
>> NMI restores preempt_count back to its old value upon exit, right?
>> So what does a race case look like?
>
>
> Normal code                   NMI
> fetch preempt_count
> add                   <-----  interrupt here add and store then 
> subtract and store, darn!
> store preempt_count
>
> Ok, no problem.
>
> The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
> reasonable counts fail because of the rather undefined state when NMI 
> picks up the word.  The failure is on the NMI side... 


So NMI changing the preempt count and restoring in the middle of a RWM 
is not the problem.  Thus I don't understand what the issue is.  NMI 
must undo all side effects.  Does the PREEMPT_DEBUG code check the count 
somewhere within the NMI handler?  If so, shouldn't the proper fix be to 
make that code aware that it could be running inside of an NMI and/or 
ensure that code is not called from within the NMI handler?

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
George Anzinger Aug. 13, 2005, 4:54 p.m. UTC | #5
Zachary Amsden wrote:
> George Anzinger wrote:
> 
>> Nick Piggin wrote:
>>
>>> George Anzinger wrote:
>>>
>>>> The NMI entry and exit code fiddles with bits in the preempt count.  
>>>> If an NMI happens while some other code is doing the same, bits will 
>>>> be lost.  This patch removes this modify code from the NMI path till 
>>>> we can come up with something better.
>>>>
>>>
>>> Humour me for a minute here...
>>> NMI restores preempt_count back to its old value upon exit, right?
>>> So what does a race case look like?
>>
>>
>>
>> Normal code                   NMI
>> fetch preempt_count
>> add                   <-----  interrupt here add and store then 
>> subtract and store, darn!
>> store preempt_count
>>
>> Ok, no problem.
>>
>> The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
>> reasonable counts fail because of the rather undefined state when NMI 
>> picks up the word.  The failure is on the NMI side... 
> 
> 
> 
> So NMI changing the preempt count and restoring in the middle of a RWM 
> is not the problem.  Thus I don't understand what the issue is.  NMI 
> must undo all side effects.  Does the PREEMPT_DEBUG code check the count 
> somewhere within the NMI handler?  If so, shouldn't the proper fix be to 
> make that code aware that it could be running inside of an NMI and/or 
> ensure that code is not called from within the NMI handler?

Yes that is the problem.  The sanity check in PREEMPT_DEBUG fails when 
called from the NMI handler.
>

Patch
diff mbox series

Index: linux-2.6.13-rc/include/linux/hardirq.h
===================================================================
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@  extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)	barrier()
 #endif
-
-#define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()	/*	irq_enter()  */
+#define nmi_exit()	/*	sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)