at 12:16 AM, Peter Zijlstra wrote:
> On Tue, Oct 09, 2018 at 10:02:50AM +0530, Ashish Mhetre wrote:
>> From: Shaohua Li
>>
>> We use the accessed bit to age a page at page reclaim time,
>> and currently we also flush the TLB when doing so.
>>
>> But in some workloads TLB flush overhead is very heavy. In my
>> simple multithreaded app with a lot of swap to several pcie
>> SSDs, removing the tlb flush gives about 20% ~ 30% swapout
>> speedup.
>>
>> Fortunately just removing the TLB flush is a valid optimization:
>> on x86 CPUs, clearing the accessed bit without a TLB flush
>> doesn't cause data corruption.
>>
>> It could cause incorrect page aging and the (mistaken) reclaim of
>> hot pages, but the chance of that should be relatively low.
>>
>> So as a performance optimization don't flush the TLB when
>> clearing the accessed bit, it will eventually be flushed by
>> a context switch or a VM operation anyway. [ In the rare
>> event of it not getting flushed for a long time the delay
>> shouldn't really matter because there's no real memory
>> pressure for swapout to react to. ]
>
> Note that context switches (and here I'm talking about switch_mm(), not
> the cheaper switch_to()) do not unconditionally imply a TLB invalidation
> these days (on PCID enabled hardware).
>
> So in that regards, the Changelog (and the comment) is a little
> misleading.
>
> I don't see anything fundamentally wrong with the patch though; just the
> wording.

What am I missing? This is a patch from 2014, no? b13b1d2d8692b ?