On Fri, Jun 10, 2016 at 01:25:03AM +0800, Boqun Feng wrote:
> On Thu, Jun 09, 2016 at 10:23:28PM +1000, Michael Ellerman wrote:
> > On Wed, 2016-06-08 at 15:59 +0200, Peter Zijlstra wrote:
> > > On Wed, Jun 08, 2016 at 11:49:20PM +1000, Michael Ellerman wrote:
> > > 
> > > > > Ok; what tree does this go in? I have this dependent series which I'd
> > > > > like to get sorted and merged somewhere.
> > > > 
> > > > Ah sorry, I didn't realise. I was going to put it in my next (which doesn't
> > > > exist yet but hopefully will early next week).
> > > > 
> > > > I'll make a topic branch with just that commit based on rc2 or rc3?
> > > 
> > > Works for me; thanks!
> > 
> > Unfortunately the patch isn't 100%.
> > 
> > It's causing some of my machines to lock up hard, which isn't surprising when
> > you look at the generated code for the non-atomic spin loop:
> > 
> > 	c00000000009af48:	7c 21 0b 78	mr	r1,r1		# HMT_LOW
> > 	c00000000009af4c:	40 9e ff fc	bne	cr7,c00000000009af48 <.do_exit+0x6d8>
> > 
> 
> There is not even any code checking for SHARED_PROCESSOR here, so I
> assume your config is !PPC_SPLPAR.
> 
> > Which is a spin loop waiting for a result in cr7, but with no comparison.
> > 
> > The problem seems to be that we did:
> > 
> > @@ -184,7 +184,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> >  	if (arch_spin_value_unlocked(lock_val))
> >  		goto out;
> >  
> > -	while (lock->slock) {
> > +	while (!arch_spin_value_unlocked(*lock)) {
> >  		HMT_low();
> >  		if (SHARED_PROCESSOR)
> >  			__spin_yield(lock);
> 
> And as I also did a consolidation in this patch, we now share the same
> piece of arch_spin_unlock_wait(), so if !PPC_SPLPAR, the previous loop
> became:
> 
> 	while (!arch_spin_value_unlocked(*lock)) {
> 		HMT_low();
> 	}
> 
> and given that HMT_low() is not a compiler barrier, the compiler may
> optimize out the loop.
> 
> > Which seems to be hiding the fact that lock->slock is volatile from the
> > compiler, even though arch_spin_value_unlocked() is inline. Not sure if that's
> > our bug or gcc's.
> 
> I think arch_spin_value_unlocked() is not volatile because it takes the
> value of the lock rather than the address of the lock as its parameter,
> which makes it a pure function.
> 
> To fix this we can add a READ_ONCE() for the read of the lock value,
> like the following:
> 
> 	while (!arch_spin_value_unlocked(READ_ONCE(*lock))) {
> 		HMT_low();
> 		...
> 
> Or do you prefer simply using lock->slock, which is a volatile variable
> already?
> 
> Or maybe we can refactor the code a little like this:
> 
> static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> {
> 	arch_spinlock_t lock_val;
> 
> 	smp_mb();
> 
> 	/*
> 	 * Atomically load and store back the lock value (unchanged). This
> 	 * ensures that our observation of the lock value is ordered with
> 	 * respect to other lock operations.
> 	 */
> 	__asm__ __volatile__(
> "1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
> "	stwcx. %0, 0, %2\n"
> "	bne- 1b\n"
> 	: "=&r" (lock_val), "+m" (*lock)
> 	: "r" (lock)
> 	: "cr0", "xer");
> 
> 	while (!arch_spin_value_unlocked(lock_val)) {
> 		HMT_low();
> 		if (SHARED_PROCESSOR)
> 			__spin_yield(lock);
> 
> 		lock_val = READ_ONCE(*lock);
> 	}
> 	HMT_medium();
> 
> 	smp_mb();
> }

This version will generate the correct code for the loop if !PPC_SPLPAR:

	c00000000009fa70:	78 0b 21 7c	mr	r1,r1
	c00000000009fa74:	ec 06 37 81	lwz	r9,1772(r23)
	c00000000009fa78:	00 00 a9 2f	cmpdi	cr7,r9,0
	c00000000009fa7c:	f4 ff 9e 40	bne	cr7,c00000000009fa70
	c00000000009fa80:	78 13 42 7c	mr	r2,r2

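For reference, the two definitions involved (quoted from memory, so
please double-check them against the tree) are roughly:

	/* arch/powerpc/include/asm/spinlock.h */
	static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
	{
		return lock.slock == 0;
	}

	/* arch/powerpc/include/asm/processor.h */
	#define HMT_low()	asm volatile("or 1,1,1	# low priority")

arch_spin_value_unlocked() takes the lock by value, and HMT_low() has
neither a "memory" clobber nor any operands, so in the old loop nothing
told the compiler that the lock word could change: the priority nop
stayed in the loop (that is the mr r1,r1 above) while the load of the
lock word was hoisted out of it. In the refactored version the value is
reloaded via READ_ONCE() on every iteration, which is why the lwz is
now inside the loop.
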
The reason I used arch_spin_value_unlocked() was to be consistent with
arch_spin_is_locked(), but most of our other lock primitives use ->slock
directly, so I don't see a strong reason for us to use
arch_spin_value_unlocked() here.

That said, this version does save a few lines of code and makes the
logic a little clearer, I think.

Thoughts?

Regards,
Boqun