Possible race condition in i386 global_irq_lock handling.

@ 2003-08-21  8:48 TeJun Huh
  2003-08-21 10:07 ` Zwane Mwaikambo
  0 siblings, 1 reply; 12+ messages in thread
From: TeJun Huh @ 2003-08-21  8:48 UTC
  To: linux-kernel

 I've been reading the i386 interrupt handling code for a couple of days
and came across something that looks like a race condition between
include/asm-i386/hardirq.h:irq_enter() and
arch/i386/kernel/irq.c:get_irqlock().  They appear to use lockless
synchronization based on each CPU's local_irq_count and the
global_irq_lock variable.

 A. locking CPU

 1. Do test_and_set_bit() on global_irq_lock; if the bit was already set,
    repeat.
 2. If all local_irq_count values are zero, we're the winner (pending
    some further checks); otherwise, clear global_irq_lock and retry.

 B. other CPUs

 1. Increment local_irq_count.
 2. test_bit() on global_irq_lock; if it is zero, continue handling the
    interrupt; otherwise, wait until it is cleared.
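The locking side (A) can be sketched in user space with C11 atomics. This is only a model under stated assumptions: irq_lock, irq_count, NCPUS and try_global_irq_lock() are hypothetical names standing in for global_irq_lock, local_irq_count and get_irqlock(), not the kernel's code, and the further checks in step 2 are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4                          /* assumed CPU count for the sketch */

static atomic_int irq_lock;              /* stand-in for global_irq_lock: 0 = free */
static atomic_int irq_count[NCPUS];      /* stand-in for local_irq_count(cpu) */

/* One attempt at steps A.1 and A.2.  Returns true if we won the lock;
 * on failure we do not hold the lock and the caller retries. */
bool try_global_irq_lock(void)
{
    /* A.1: a seq_cst exchange plays the role of test_and_set_bit().
     * On x86 it is an xchg, which is implicitly locked. */
    if (atomic_exchange(&irq_lock, 1) != 0)
        return false;                    /* someone else holds it */

    /* A.2: these seq_cst loads cannot move before the exchange. */
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (atomic_load(&irq_count[cpu]) != 0) {
            atomic_store(&irq_lock, 0);  /* a handler is running: back off */
            return false;
        }
    }
    return true;                         /* further checks omitted here */
}
```

A caller would loop on try_global_irq_lock() until it returns true, relaxing the CPU between attempts.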

 For this to work, the locking CPU must read local_irq_count only after
its write to global_irq_lock is visible to other CPUs, and the other
CPUs must read global_irq_lock only after their incremented
local_irq_count is visible to other CPUs.
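The requirement can be made concrete with a toy, single-threaded model of x86 store buffering. It assumes a one-entry store buffer on CPU B and treats the locking CPU's test_and_set_bit() as immediately visible; struct store_buffer, sb_store(), sb_drain() and replay_race() are hypothetical helpers. replay_race() replays the one interleaving in which B's increment is still buffered while both CPUs read the other's variable.

```c
#include <stdbool.h>

/* Toy model of CPU B's store buffer (hypothetical, one entry). */
struct store_buffer {
    bool pending;   /* a store was issued but is not yet visible */
    int *addr;
    int val;
};

static void sb_store(struct store_buffer *sb, int *addr, int val)
{
    sb->pending = true;
    sb->addr = addr;
    sb->val = val;
}

/* Make the buffered store visible to other CPUs; mb() forces this. */
static void sb_drain(struct store_buffer *sb)
{
    if (sb->pending) {
        *sb->addr = sb->val;
        sb->pending = false;
    }
}

/* Order replayed: B.1, B.2, A.1, A.2, then B's buffer drains.
 * Returns true if both CPUs proceed, i.e. mutual exclusion fails. */
bool replay_race(bool mb_on_b)
{
    int global_irq_lock = 0, local_irq_count = 0;   /* shared memory */
    struct store_buffer b = {0};

    sb_store(&b, &local_irq_count, local_irq_count + 1); /* B.1: ++ is buffered */
    if (mb_on_b)
        sb_drain(&b);                                     /* the proposed mb() */
    bool b_enters = (global_irq_lock == 0);               /* B.2: test_bit() */

    global_irq_lock = 1;                   /* A.1: locked op, visible at once */
    bool a_wins = (local_irq_count == 0);                 /* A.2 */

    sb_drain(&b);                          /* B's ++ finally becomes visible */
    return a_wins && b_enters;
}
```

Under these assumptions replay_race(false) returns true (both CPUs proceed), while replay_race(true) returns false, because draining the buffer before B.2 lets the locking CPU see the nonzero count and back off.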

 The locking CPU is fine because test_and_set_bit() is a locked
read-modify-write and therefore enforces ordering on x86, but the other
CPUs need an mb() between steps 1 and 2, because neither ++ nor
test_bit() imposes any ordering.  Part B is irq_enter() in hardirq.h,
which looks like this:

static inline void irq_enter(int cpu, int irq)
{
	++local_irq_count(cpu);

	while (test_bit(0,&global_irq_lock)) {
		cpu_relax();
	}
}
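If the barrier really is missing, step B would need a full fence between the increment and the test. A C11 sketch of what that could look like follows; this_cpu_irq_count, irq_lock_word and irq_enter_with_mb() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) stands in for the kernel's mb().

```c
#include <stdatomic.h>

/* Hypothetical C11 stand-ins for this CPU's local_irq_count and the
 * global_irq_lock bit; not the kernel's types. */
static atomic_int this_cpu_irq_count;
static atomic_int irq_lock_word;

void irq_enter_with_mb(void)
{
    /* B.1: only this CPU writes its own count, so a relaxed load and
     * store model the plain ++ (no lock prefix, no ordering). */
    int n = atomic_load_explicit(&this_cpu_irq_count, memory_order_relaxed);
    atomic_store_explicit(&this_cpu_irq_count, n + 1, memory_order_relaxed);

    /* The proposed mb(): a full fence so the increment is visible before
     * the lock is read (on x86, an mfence or a locked instruction). */
    atomic_thread_fence(memory_order_seq_cst);

    /* B.2: wait while another CPU holds the global lock. */
    while (atomic_load_explicit(&irq_lock_word, memory_order_relaxed) != 0)
        ;  /* the kernel would call cpu_relax() here */
}
```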

 Is this a race condition, or am I getting it horribly wrong?  Thanks
in advance.
