linux-kernel.vger.kernel.org archive mirror
* Possible race condition in i386 global_irq_lock handling.
@ 2003-08-21  8:48 TeJun Huh
  2003-08-21 10:07 ` Zwane Mwaikambo
  0 siblings, 1 reply; 12+ messages in thread
From: TeJun Huh @ 2003-08-21  8:48 UTC (permalink / raw)
  To: linux-kernel

 I've been reading i386 interrupt handling code for a couple of days
and encountered something that looks like a race condition.  It's
between include/asm-i386/hardirq.h:irq_enter() and
arch/i386/kernel/irq.c:get_irqlock().  They seem to be using lockless
synchronization based on each CPU's local_irq_count and the
global_irq_lock variable.

 A. locking CPU

 1. Do test_and_set_bit() on global_irq_lock; if it fails, repeat.
 2. If all local_irq_count values are zero, we're the winner.  Check other
    stuff; otherwise, clear global_irq_lock and retry.

 B. other CPUs

 1. Increment local_irq_count
 2. test_bit() on global_irq_lock; if zero, continue handling the
    interrupt; otherwise, wait until it's cleared.

 For this to work, the locking CPU should fetch the value of
local_irq_count after the global_irq_lock value becomes visible to other
CPUs, and other CPUs should fetch the value of global_irq_lock after
making the incremented local_irq_count visible to other CPUs.

 The locking CPU is OK because test_and_set_bit() forces ordering on
x86, but there should be an mb() between steps 1 and 2 for other CPUs,
because neither ++ nor test_bit() imposes ordering.  The B part is
irq_enter() in hardirq.h, which looks like the following.

static inline void irq_enter(int cpu, int irq)
{
	++local_irq_count(cpu);

	while (test_bit(0,&global_irq_lock)) {
		cpu_relax();
	}
}

 Is it a race condition or am I getting it horribly wrong?  Thx in
advance.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21  8:48 Possible race condition in i386 global_irq_lock handling TeJun Huh
@ 2003-08-21 10:07 ` Zwane Mwaikambo
  2003-08-21 16:15   ` TeJun Huh
  0 siblings, 1 reply; 12+ messages in thread
From: Zwane Mwaikambo @ 2003-08-21 10:07 UTC (permalink / raw)
  To: TeJun Huh; +Cc: linux-kernel

On Thu, 21 Aug 2003, TeJun Huh wrote:

>  I've been reading i386 interrupt handling code for a couple of days
> and encountered something that looks like a race condition.  It's
> between include/asm-i386/hardirq.h:irq_enter() and
> arch/i386/kernel/irq.c:get_irqlock().  They seem to be using lockless
> synchronization based on each CPU's local_irq_count and the
> global_irq_lock variable.

OK, 2.4 (but in future, try to mention which kernel version). You'll have
to forgive me if I misunderstand you.

>  A. locking CPU
> 
>  1. Do test_and_set_bit() on global_irq_lock; if it fails, repeat.
>  2. If all local_irq_count values are zero, we're the winner.  Check other
>     stuff; otherwise, clear global_irq_lock and retry.

Are you referring to hardirq_trylock()?

>  B. other CPUs
> 
>  1. Increment local_irq_count
>  2. test_bit() on global_irq_lock; if zero, continue handling the
>     interrupt; otherwise, wait until it's cleared.
> 
>  For this to work, the locking CPU should fetch the value of
> local_irq_count after the global_irq_lock value becomes visible to other
> CPUs, and other CPUs should fetch the value of global_irq_lock after
> making the incremented local_irq_count visible to other CPUs.

Why after? It's currently in an interrupt anyway, and local_irq_count is
per-CPU, so it's not used on other CPUs. Why do you need to make it
visible on other processors? (Save irqs_running(), but even that's OK.)

>  The locking CPU is OK because test_and_set_bit() forces ordering on
> x86, but there should be an mb() between steps 1 and 2 for other CPUs,
> because neither ++ nor test_bit() imposes ordering.  The B part is
> irq_enter() in hardirq.h, which looks like the following.
> in hardirq.h which looks like the following.
> 
> static inline void irq_enter(int cpu, int irq)
> {
> 	++local_irq_count(cpu);
> 
> 	while (test_bit(0,&global_irq_lock)) {
> 		cpu_relax();
> 	}
> }
> 
>  Is it a race condition or am I getting it horribly wrong?  Thx in
> advance.

I don't see or understand the race condition you're describing;
local_irq_count is per-CPU.

	Zwane


* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21 10:07 ` Zwane Mwaikambo
@ 2003-08-21 16:15   ` TeJun Huh
  0 siblings, 0 replies; 12+ messages in thread
From: TeJun Huh @ 2003-08-21 16:15 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel

On Thu, Aug 21, 2003 at 06:07:34AM -0400, Zwane Mwaikambo wrote:
> 
> OK, 2.4 (but in future, try to mention which kernel version). You'll have
> to forgive me if I misunderstand you.

 The version I'm looking at is 2.4.21. Sorry for forgetting to
mention it.

> Are you referring to hardirq_trylock()?
>...cut...
> >  For this to work, the locking CPU should fetch the value of
> > local_irq_count after the global_irq_lock value becomes visible to other
> > CPUs, and other CPUs should fetch the value of global_irq_lock after
> > making the incremented local_irq_count visible to other CPUs.
> 
> Why after? It's currently in an interrupt anyway, and local_irq_count is
> per-CPU, so it's not used on other CPUs. Why do you need to make it
> visible on other processors? (Save irqs_running(), but even that's OK.)

 I'm talking about global_irq_lock synchronization. local_irq_count
_is_ local, but it is used to synchronize the global irq lock. Sparc
uses a big-reader lock for this purpose, but the x86 code seems to use
memory-ordered lockless synchronization.

 I'll describe it in more detail. On MP, cli() is __global_cli(),
which in turn calls get_irqlock(). get_irqlock() uses
test_and_set_bit() and wait_on_irq() to achieve global irq locking.
The counterpart of this locking is irq_enter() and irq_exit().
A simplified version of the mechanism is as follows.

A. get_irqlock() -> wait_on_irq()

1. Repeat test_and_set_bit(0, &global_irq_lock) until we're the winner.
2. Test if all local_irq_count values are zero. If any is non-zero,
   that CPU might have entered an interrupt handler already. Clear
   global_irq_lock and go back to step 1.

=> If the test succeeded, we can be sure that no other CPU is
   running an interrupt handler and none will enter one until
   global_irq_lock is cleared.

B. irq_enter()

1. Increment local_irq_count.
2. Do test_bit(0, &global_irq_lock). If it's set, someone is trying to
   grab or has grabbed global_irq_lock; loop until it gets cleared.
   If global_irq_lock is clear, the CPU enters the interrupt handler.

 The race condition occurs because there is no mb() between steps 1 and
2 of irq_enter(). Example scenarios:

 [AM]: atomic & memory barrier
 [L] : local to cpu (not yet visible to other cpus)
 [G] : became global

	A				B
  calls cli()			Interrupt occurs
  executing get_irqlock()	executing irq_enter()

** Scenario #1
				[L]++local_irq_count
				fetch global_irq_lock
  [AM]set global_irq_lock	test global_irq_lock
  fetch local_irq_count
  test local_irq_count		[G]++local_irq_count

** Scenario #2
				fetch global_irq_lock
  [AM]set global_irq_lock
  fetch local_irq_count
  test local_irq_count		[L]++local_irq_count
				[G]++local_irq_count
				test global_irq_lock

 In the above scenarios, B enters the interrupt handler and A returns
successfully from cli(), so B executes an interrupt handler while A is
inside its cli()/sti() critical section. This happens because nothing
forces the fetch of global_irq_lock to occur after the local_irq_count
increment has become visible to other CPUs.

 If I have misunderstood the synchronization mechanism or the
architectural characteristics, please point it out.

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-24  3:06         ` TeJun Huh
@ 2003-08-24 22:03           ` Andrea Arcangeli
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2003-08-24 22:03 UTC (permalink / raw)
  To: TeJun Huh; +Cc: Stephan von Krawczynski, manfred, linux-kernel, zwane

On Sun, Aug 24, 2003 at 12:06:51PM +0900, TeJun Huh wrote:
>  Now that I know test_and_set_bit() implies a memory barrier,
> smp_mb__after_clear_bit() can be removed.  I'll make and post a patch

;) right

> which fixes this race and the bh race of the other thread.

thanks,

Andrea

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-22 16:25       ` Andrea Arcangeli
@ 2003-08-24  3:06         ` TeJun Huh
  2003-08-24 22:03           ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: TeJun Huh @ 2003-08-24  3:06 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Stephan von Krawczynski, manfred, linux-kernel, zwane

 Hello Andrea,

On Fri, Aug 22, 2003 at 06:25:46PM +0200, Andrea Arcangeli wrote:
> thanks TeJun,
> 
> just one comment
> 
> On Fri, Aug 22, 2003 at 10:18:40AM +0900, TeJun Huh wrote:
> >  3. remove irqs_running() test from synchronize_irq()
> 
> I'm not convinced this one is needed. An irq can still run on another
> cpu but the cli();sti() may execute while it's here:
> 
> 	irq running		synchronize_irq()
> 	--------------		-----------------
> 	do_IRQ
> 	handle_IRQ_event
> 				cli()
> 				sti()
> 
> 	irq_enter -> way too late
> 
> in short, doing irqs_running() doesn't seem to weaken the semantics of
> synchronize_irq() to me.
> 
> I think it should be changed this way instead:
> 
> void synchronize_irq(void)
> {
> 	smp_mb();
> 	if (irqs_running()) {
> 		/* Stupid approach */
> 		cli();
> 		sti();
> 	}
> }
> 
> to be sure to read the local irq area after the previous code (the
> test_and_set_bit of the global_irq_lock of a cli() in your version would
> achieve the same implicit smp_mb too, so maybe your only point for doing
> cli()/sti() was to execute the smp_mb before the irqs_running?).  The
> above version is more fine-grained and it looks equivalent to yours.
> 
> Andrea

 Yes, you're right.  Adding just smp_mb() should guarantee that, once
synchronize_irq() returns, no CPU is still executing an interrupt
handler that might not see memory contents modified before
synchronize_irq() was called.  I think we need some decent comments
there. :-)

 Now that I know test_and_set_bit() implies a memory barrier,
smp_mb__after_clear_bit() can be removed.  I'll make and post a patch
which fixes this race and the bh race of the other thread.

 Thanks.

-- 
tejun

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-22  1:18     ` TeJun Huh
  2003-08-22 10:07       ` Stephan von Krawczynski
@ 2003-08-22 16:25       ` Andrea Arcangeli
  2003-08-24  3:06         ` TeJun Huh
  1 sibling, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2003-08-22 16:25 UTC (permalink / raw)
  To: Stephan von Krawczynski, manfred, linux-kernel, zwane; +Cc: TeJun Huh

thanks TeJun,

just one comment

On Fri, Aug 22, 2003 at 10:18:40AM +0900, TeJun Huh wrote:
>  3. remove irqs_running() test from synchronize_irq()

I'm not convinced this one is needed. An irq can still run on another
cpu but the cli();sti() may execute while it's here:

	irq running		synchronize_irq()
	--------------		-----------------
	do_IRQ
	handle_IRQ_event
				cli()
				sti()

	irq_enter -> way too late

in short, doing irqs_running() doesn't seem to weaken the semantics of
synchronize_irq() to me.

I think it should be changed this way instead:

void synchronize_irq(void)
{
	smp_mb();
	if (irqs_running()) {
		/* Stupid approach */
		cli();
		sti();
	}
}

to be sure to read the local irq area after the previous code (the
test_and_set_bit of the global_irq_lock of a cli() in your version would
achieve the same implicit smp_mb too, so maybe your only point for doing
cli()/sti() was to execute the smp_mb before the irqs_running?).  The
above version is more fine-grained and it looks equivalent to yours.

Andrea

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-22  1:18     ` TeJun Huh
@ 2003-08-22 10:07       ` Stephan von Krawczynski
  2003-08-22 16:25       ` Andrea Arcangeli
  1 sibling, 0 replies; 12+ messages in thread
From: Stephan von Krawczynski @ 2003-08-22 10:07 UTC (permalink / raw)
  To: TeJun Huh; +Cc: andrea, manfred, linux-kernel, zwane

On Fri, 22 Aug 2003 10:18:40 +0900
TeJun Huh <tejun@aratech.co.kr> wrote:

>  I'm attaching a patch for i386. It makes three changes.
> 
>  1. add smp_mb() between local_irq_count++ and global_irq_lock test
>     in irq_enter().
>  2. add smp_mb__after_clear_bit() before irqs_running() test in
>     wait_on_irq().
>  3. remove irqs_running() test from synchronize_irq()

Thank you TeJun,

I have started tests and will provide feedback if your patch has any influence
on my problem. This may take some days.

Regards,
Stephan

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21 21:48   ` Stephan von Krawczynski
  2003-08-21 22:44     ` Andrea Arcangeli
@ 2003-08-22  1:18     ` TeJun Huh
  2003-08-22 10:07       ` Stephan von Krawczynski
  2003-08-22 16:25       ` Andrea Arcangeli
  1 sibling, 2 replies; 12+ messages in thread
From: TeJun Huh @ 2003-08-22  1:18 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Andrea Arcangeli, manfred, linux-kernel, zwane

 I'm attaching a patch for i386. It makes three changes.

 1. add smp_mb() between local_irq_count++ and global_irq_lock test
    in irq_enter().
 2. add smp_mb__after_clear_bit() before irqs_running() test in
    wait_on_irq().
 3. remove irqs_running() test from synchronize_irq()

 Removing the irqs_running() test from synchronize_irq() is needed for
the same reason: otherwise, other interrupt handlers might still be
running when synchronize_irq() returns.

 Strictly speaking, smp_mb__after_clear_bit() should be
smp_mb__after_test_and_set_bit(), which doesn't exist. Should I add it?

 Once the smp_mb__after_clear_bit() question is settled, I'll make a
patch for every affected architecture. Please comment.

# ------------ patch follows --------------

diff -Nru a/arch/i386/kernel/irq.c b/arch/i386/kernel/irq.c
--- a/arch/i386/kernel/irq.c	Fri Aug 22 10:07:50 2003
+++ b/arch/i386/kernel/irq.c	Fri Aug 22 10:07:50 2003
@@ -271,6 +271,8 @@
 		 * for bottom half handlers unless we're
 		 * already executing in one..
 		 */
+		smp_mb__after_clear_bit(); /* Synchronize with irq_enter() */
+
 		if (!irqs_running())
 			if (local_bh_count(cpu) || !spin_is_locked(&global_bh_lock))
 				break;
@@ -307,11 +309,9 @@
  */
 void synchronize_irq(void)
 {
-	if (irqs_running()) {
-		/* Stupid approach */
-		cli();
-		sti();
-	}
+	/* Stupid approach */
+	cli();
+	sti();
 }
 
 static inline void get_irqlock(int cpu)
diff -Nru a/include/asm-i386/hardirq.h b/include/asm-i386/hardirq.h
--- a/include/asm-i386/hardirq.h	Fri Aug 22 10:07:50 2003
+++ b/include/asm-i386/hardirq.h	Fri Aug 22 10:07:50 2003
@@ -67,6 +67,8 @@
 {
 	++local_irq_count(cpu);
 
+	smp_mb(); /* Synchronize with wait_on_irq() */
+
 	while (test_bit(0,&global_irq_lock)) {
 		cpu_relax();
 	}

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21 21:48   ` Stephan von Krawczynski
@ 2003-08-21 22:44     ` Andrea Arcangeli
  2003-08-22  1:18     ` TeJun Huh
  1 sibling, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2003-08-21 22:44 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: manfred, tejun, linux-kernel, zwane

On Thu, Aug 21, 2003 at 11:48:24PM +0200, Stephan von Krawczynski wrote:
> 
> > smp_rmb is enough in practice for x86 (in asm-i386), but not the right
> > barrier in general because rmb only serializes reads against reads, so
> > it would also make little sense while reading the i386 code. Here you
> > have to serialize a write against a read, so it would be misleading
> > unless you know exactly the low-level implementations of those barriers.
> > 
> > smp_mb() before the while loop should be the correct barrier for all
> > archs and the asm generated on x86 will be the same.
> > 
> > alpha, ia64 and x86-64 (and probably others) need it too.
> 
> Can some kind soul please provide me with the needed mini-patch. I would like
> to try that on my constantly crashing SMP test box...

--- 2.4.22pre7aa1/include/asm-i386/hardirq.h.~1~	2003-07-20 18:39:04.000000000 +0200
+++ 2.4.22pre7aa1/include/asm-i386/hardirq.h	2003-08-22 00:24:08.000000000 +0200
@@ -71,6 +71,8 @@ static inline void irq_enter(int cpu, in
 {
 	++local_irq_count(cpu);
 
+	smp_mb();
+
 	while (test_bit(0,&global_irq_lock)) {
 		cpu_relax();
 	}

Andrea

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21 17:27 ` Andrea Arcangeli
@ 2003-08-21 21:48   ` Stephan von Krawczynski
  2003-08-21 22:44     ` Andrea Arcangeli
  2003-08-22  1:18     ` TeJun Huh
  0 siblings, 2 replies; 12+ messages in thread
From: Stephan von Krawczynski @ 2003-08-21 21:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: manfred, tejun, linux-kernel, zwane


> smp_rmb is enough in practice for x86 (in asm-i386), but not the right
> barrier in general because rmb only serializes reads against reads, so
> it would also make little sense while reading the i386 code. Here you
> have to serialize a write against a read, so it would be misleading
> unless you know exactly the low-level implementations of those barriers.
> 
> smp_mb() before the while loop should be the correct barrier for all
> archs and the asm generated on x86 will be the same.
> 
> alpha, ia64 and x86-64 (and probably others) need it too.

Can some kind soul please provide me with the needed mini-patch. I would like
to try that on my constantly crashing SMP test box...

Regards,
Stephan

* Re: Possible race condition in i386 global_irq_lock handling.
  2003-08-21 17:01 Manfred Spraul
@ 2003-08-21 17:27 ` Andrea Arcangeli
  2003-08-21 21:48   ` Stephan von Krawczynski
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2003-08-21 17:27 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: TeJun Huh, linux-kernel, Zwane Mwaikambo

On Thu, Aug 21, 2003 at 07:01:39PM +0200, Manfred Spraul wrote:
> TeJun wrote:
> >static inline void irq_enter(int cpu, int irq)
> >{
> >	++local_irq_count(cpu);
> >
> >	while (test_bit(0,&global_irq_lock)) {
> >		cpu_relax();
> >	}
> >}
> >
> > Is it a race condition or am I getting it horribly wrong?  Thx in
> >advance.
> 
> Yes, it's a race. Actually a variant of the race that led to the 
> introduction of set_current_state():
> 
> test_bit is a simple read instruction. i386 cpus are free to execute it 
> early, i.e. they can execute it before the write part of 
> "++local_irq_count(cpu)".
> 
> I think smp_rmb() is the right barrier - could you write a patch and send 
> it to Marcelo?

smp_rmb is enough in practice for x86 (in asm-i386), but not the right
barrier in general because rmb only serializes reads against reads, so
it would also make little sense while reading the i386 code. Here you
have to serialize a write against a read, so it would be misleading
unless you know exactly the low-level implementations of those barriers.

smp_mb() before the while loop should be the correct barrier for all
archs and the asm generated on x86 will be the same.

alpha, ia64 and x86-64 (and probably others) need it too.

Andrea

* Re: Possible race condition in i386 global_irq_lock handling.
@ 2003-08-21 17:01 Manfred Spraul
  2003-08-21 17:27 ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2003-08-21 17:01 UTC (permalink / raw)
  To: TeJun Huh; +Cc: linux-kernel, Zwane Mwaikambo

TeJun wrote:
> static inline void irq_enter(int cpu, int irq)
> {
> 	++local_irq_count(cpu);
> 
> 	while (test_bit(0,&global_irq_lock)) {
> 		cpu_relax();
> 	}
> }
> 
>  Is it a race condition or am I getting it horribly wrong?  Thx in
> advance.

Yes, it's a race. Actually a variant of the race that led to the introduction of set_current_state():

test_bit is a simple read instruction. i386 cpus are free to execute it early, i.e. they can execute it before the write part of "++local_irq_count(cpu)".

I think smp_rmb() is the right barrier - could you write a patch and send it to Marcelo?

--
	Manfred




Thread overview: 12+ messages
2003-08-21  8:48 Possible race condition in i386 global_irq_lock handling TeJun Huh
2003-08-21 10:07 ` Zwane Mwaikambo
2003-08-21 16:15   ` TeJun Huh
2003-08-21 17:01 Manfred Spraul
2003-08-21 17:27 ` Andrea Arcangeli
2003-08-21 21:48   ` Stephan von Krawczynski
2003-08-21 22:44     ` Andrea Arcangeli
2003-08-22  1:18     ` TeJun Huh
2003-08-22 10:07       ` Stephan von Krawczynski
2003-08-22 16:25       ` Andrea Arcangeli
2003-08-24  3:06         ` TeJun Huh
2003-08-24 22:03           ` Andrea Arcangeli
