On Wed, Sep 01, 2021 at 07:23:24PM -0700, Andi Kleen wrote:
>
> On 9/1/2021 6:35 PM, Feng Tang wrote:
> >On Wed, Sep 01, 2021 at 08:12:24AM -0700, Andi Kleen wrote:
> >>Feng Tang writes:
> >>>Yes, the tests I did is no matter where the 128B padding is added, the
> >>>performance can be restored and even improved.
> >>I wonder if we can find some cold, rarely accessed, data to put into the
> >>padding to not waste it. Perhaps some name strings? Or the destroy
> >>support, which doesn't sound like its commonly used.
> >Yes, I tried to move 'destroy_work', 'destroy_rwork' and 'parent' over
> >before the 'refcnt' together with some padding, it restored the performance
> >to about 10~15% regression. (debug patch pasted below)
> >
> >But I'm not sure if we should use it, before we can fully explain the
> >regression.
>
> Narrowing it down to a single prefetcher seems good enough to me. The
> behavior of the prefetchers is fairly complicated and hard to predict, so I
> doubt you'll ever get a 100% step by step explanation.

Yes, I'm afraid so, given that the policy/algorithm used by the prefetcher
keeps changing from generation to generation.

I will test the patch more with other benchmarks.

Thanks,
Feng

>
> -Andi
>