It's similar to the race condition spotted in the i386 interrupt code. The race exists between tasklet_[hi_]action() and tasklet_disable(). Again, memory-ordered synchronization is needed between tasklet_struct.count and tasklet_struct.state.

tasklet_disable() is fine because there's an smp_mb() at the end of tasklet_disable_nosync(); however, in tasklet_action(), there is no mb() between tasklet_trylock(t) and atomic_read(&t->count). This won't cause any trouble on architectures which order memory accesses around atomic operations (including x86), but on architectures which don't, a tasklet can still be executing on another cpu on return from tasklet_disable().

Adding smp_mb__after_test_and_set_bit() at the end of tasklet_trylock() should remedy the situation. As smp_mb__{before|after}_test_and_set_bit() don't exist yet, I'm attaching a patch which adds smp_mb__after_clear_bit(). The patch is against 2.4.21.

P.S. Please comment on the addition of smp_mb__{before|after}_test_and_set_bit().

P.P.S. One thing I don't really understand is the use of smp_mb() at the end of tasklet_disable() and smp_mb__before_atomic_dec() inside tasklet_enable(). Can anybody tell me what those are for?

--
tejun