> >Regarding the overhead of the shared runqueue lock: > > > >Is the "lock" prefix actually required for locking between x86 > >siblings which share the same L1 cache? > > > > That lock is still taken by other CPUs as well for eg. wakeups, balancing, > and so forth. I guess it could be a very specific optimisation for > spinlocks in general if there was only one HT core. Don't know if it > would be worthwhile though. also keep in mind that current x86 processors all will internally optimize out the lock prefix in UP mode or when the cacheline is owned exclusive.... If HT matters here let the cpu optimize it out.....