Hi Rik,

On Thu, Oct 20, 2022 at 11:28:16AM -0400, Rik van Riel wrote:
> On Thu, 2022-10-20 at 13:07 +0800, Huang, Ying wrote:
> > Nathan Chancellor writes:
> > >
> > > For what it's worth, I just bisected a massive and visible
> > > performance regression on my Threadripper 3990X workstation to
> > > commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
> > > boundaries"), which seems directly related to this
> > > report/analysis. I initially noticed this because my full set of
> > > kernel builds against mainline went from 2 hours and 20 minutes
> > > or so to over 3 hours. Zeroing in on x86_64 allmodconfig, which
> > > I used for the bisect:
> > >
> > > @ 7b5a0b664ebe ("mm/page_ext: remove unused variable in offline_page_ext"):
> > >
> > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > >   Time (mean ± σ):     318.172 s ±  0.730 s    [User: 31750.902 s, System: 4564.246 s]
> > >   Range (min … max):   317.332 s … 318.662 s    3 runs
> > >
> > > @ f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"):
> > >
> > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > >   Time (mean ± σ):     406.688 s ±  0.676 s    [User: 31819.526 s, System: 16327.022 s]
> > >   Range (min … max):   405.954 s … 407.284 s    3 runs
> >
> > Have you tried to build with gcc?  Want to check whether this is a
> > clang-specific issue or not.
>
> This may indeed be something LLVM specific. In previous tests,
> GCC has generally seen a benefit from increased THP usage.
> Many other applications also benefit from getting more THPs.

Indeed, GCC builds actually appear to be slightly faster on my system
now; apologies for not trying that before reporting :/

7b5a0b664ebe:

Benchmark 1: make -skj128 allmodconfig all
  Time (mean ± σ):     355.294 s ±  0.931 s    [User: 33620.469 s, System: 6390.064 s]
  Range (min … max):   354.571 s … 356.344 s    3 runs

f35b5d7d676e:

Benchmark 1: make -skj128 allmodconfig all
  Time (mean ± σ):     347.400 s ±  2.029 s    [User: 34389.724 s, System: 4603.175 s]
  Range (min … max):   345.815 s … 349.686 s    3 runs

> LLVM showing 10% system time before this change, and a whopping
> 30% system time after that change, suggests that LLVM is behaving
> quite differently from GCC in some ways.

The above tests were done with GCC 12.2.0 from Arch Linux. The
previous LLVM tests were done with a self-compiled version of LLVM
from the main branch (16.0.0), optimized with BOLT [1]. To eliminate
that as a source of issues, I used my distribution's version of clang
(14.0.6) and saw similar results to before:

7b5a0b664ebe:

Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
  Time (mean ± σ):     462.517 s ±  1.214 s    [User: 48544.240 s, System: 5586.212 s]
  Range (min … max):   461.115 s … 463.245 s    3 runs

f35b5d7d676e:

Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
  Time (mean ± σ):     547.927 s ±  0.862 s    [User: 47913.709 s, System: 17682.514 s]
  Range (min … max):   547.429 s … 548.922 s    3 runs

> If we can figure out what these differences are, maybe we can
> just fine tune the code to avoid this issue.
>
> I'll try to play around with LLVM compilation a little bit next
> week, to see if I can figure out what might be going on. I wonder
> if LLVM is doing lots of mremap calls or something...

If there is any further information I can provide or patches I can
test, I am more than happy to do so. A couple of rough ideas for
narrowing this down are in the postscript below.

[1]: https://github.com/llvm/llvm-project/tree/96552e73900176d65ee6650facae8d669d6f9498/bolt

Cheers,
Nathan
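
P.S. For anyone following along who has not read the commit itself: as
I understand it, f35b5d7d676e rounds the start address of anonymous
mappings of at least PMD size up to the next PMD boundary (2 MiB on
x86_64) so that they can be backed by THPs. If that is indeed what is
driving the extra system time here, the difference should show up in
the THP counters while a build is running; sampling these standard
procfs entries before and after a build ought to make it visible:

  $ grep AnonHugePages /proc/meminfo   # anonymous memory currently backed by THPs
  $ grep thp_fault /proc/vmstat        # THP allocations and fallbacks at fault time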
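
P.P.S. In case it saves any time with the mremap theory: counting the
mm-related syscalls for a single compiler invocation should show
whether clang leans on mremap much harder than GCC does. A rough
sketch, untested here and with a stand-in input file:

  $ strace -f -c -e trace=mmap,mremap,munmap,brk \
        clang -O2 -c foo.c -o /dev/null

perf should work too, e.g. counting the syscalls:sys_enter_mremap
tracepoint system-wide over a slice of the build (needs appropriate
privileges):

  $ perf stat -e syscalls:sys_enter_mremap -a -- sleep 10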