On Mon, 2017-06-05 at 15:36 -0700, Andy Lutomirski wrote:

> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -122,8 +122,10 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
>  
>  static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
>  {
> -	if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
> -		this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
> +	int cpu = smp_processor_id();
> +
> +	if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
> +		cpumask_clear_cpu(cpu, mm_cpumask(mm));
>  }

This is an atomic write to a shared cacheline,
every time a CPU goes idle.

I am not sure you really want to do this, since
there are some workloads out there that have a
crazy number of threads, which go idle hundreds,
or even thousands of times a second, on dozens
of CPUs at a time. *cough*Java*cough*

Keeping track of the state in a CPU-local variable,
written with a non-atomic write, would be much more
CPU cache friendly here.

-- 
All rights reversed
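
To make the contrast above concrete, here is a rough user-space model of the
two approaches. It is only a sketch, not kernel code: the names TLBSTATE_OK,
TLBSTATE_LAZY and cpu_tlbstate are borrowed from the quoted diff, while
NR_CPUS = 8, the 64-byte cacheline size, the single-word mask
mm_cpumask_bits and both enter_lazy_*() helpers are assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS   8     /* assumption: small fixed CPU count for the model */
#define CACHELINE 64    /* assumption: typical x86 cacheline size in bytes */

enum { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

/*
 * Shared approach: one mask word that every CPU updates atomically.
 * Each update pulls the same cacheline into the writing CPU's cache.
 */
static _Atomic unsigned long mm_cpumask_bits;

/*
 * Per-CPU approach: each CPU's state sits in its own cacheline, so a
 * plain (non-atomic) store never touches a line another CPU writes.
 */
struct percpu_tlbstate {
	_Alignas(CACHELINE) int state;
};
static struct percpu_tlbstate cpu_tlbstate[NR_CPUS];

static_assert(sizeof(struct percpu_tlbstate) == CACHELINE,
	      "each per-CPU slot should fill exactly one cacheline");

/* Models the new code in the diff: atomic clear of a shared bit. */
static void enter_lazy_shared(int cpu)
{
	unsigned long bit = 1UL << cpu;

	if (atomic_load_explicit(&mm_cpumask_bits, memory_order_relaxed) & bit)
		atomic_fetch_and(&mm_cpumask_bits, ~bit);
}

/* Models the old code in the diff: plain store to CPU-local state. */
static void enter_lazy_percpu(int cpu)
{
	if (cpu_tlbstate[cpu].state == TLBSTATE_OK)
		cpu_tlbstate[cpu].state = TLBSTATE_LAZY;
}
```

In this model enter_lazy_shared() makes every idling CPU do an atomic
read-modify-write on the same word, whereas enter_lazy_percpu() only ever
writes the caller's own slot. In the kernel the per-CPU variant would go
through this_cpu_read()/this_cpu_write(), as the removed lines did.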