> Are you sure it's due to page faults and not khugepaged + high value
> (such as the default 511) of max_ptes_none? As reported here?
>
> https://bugzilla.kernel.org/show_bug.cgi?id=93111
>
> Once you have faulted in a THP, and then purged part of it and split it,
> I don't think page faults in the purged part can lead to a new THP
> collapse, only khugepaged can do that AFAIK.
> And if you mmap smaller than 2M areas (i.e. your 256K chunks), that
> should prevent THP page faults on the first fault within the chunk as well.

Hm, that's probably it. The page faults would still be an issue when reserving ranges on 64-bit for parallel chunk allocation and to make sure the lowest-address chunks are the oldest from the start, which is likely down the road.

A nice property of 2M chunks is that mremap doesn't need to split huge pages and neither does purging at the chunk level. I'd expect that to be a *good thing* rather than something that needs to be avoided due to an aggressive heuristic.
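
To make the arithmetic concrete, here is a rough sketch of the two conditions being discussed. It assumes x86-64 defaults (4K base pages, 2M huge pages), and both helper names are made up for illustration. The collapse check is a simplified model of the max_ptes_none threshold only, not the kernel's actual code, which looks at more than the count of empty PTEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 defaults: 4 KiB base pages, 2 MiB transparent huge pages. */
#define BASE_PAGE_SIZE ((uintptr_t)4096)
#define HUGE_PAGE_SIZE ((uintptr_t)2 << 20)
#define PTES_PER_HUGE_PAGE (HUGE_PAGE_SIZE / BASE_PAGE_SIZE) /* 512 */

/*
 * Hypothetical helper: returns true if [addr, addr + len) starts or ends
 * in the middle of a 2 MiB-aligned region, meaning a huge page backing
 * that region would have to be split to purge or move the range.
 */
static bool range_splits_huge_page(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return (addr % HUGE_PAGE_SIZE) != 0 ||
           ((addr + len) % HUGE_PAGE_SIZE) != 0;
}

/*
 * Simplified model of the khugepaged threshold: a 2 MiB region is a
 * collapse candidate when its number of empty PTEs does not exceed
 * max_ptes_none. Other kernel-side conditions are ignored here.
 */
static bool khugepaged_may_collapse(unsigned resident_pages,
                                    unsigned max_ptes_none)
{
    assert(resident_pages <= PTES_PER_HUGE_PAGE);
    unsigned none_ptes = (unsigned)PTES_PER_HUGE_PAGE - resident_pages;
    return none_ptes <= max_ptes_none;
}
```

Under this model, with max_ptes_none at 511 a single resident 4K page out of 512 is enough to make the whole 2M region a collapse candidate, which matches the behaviour in the bug report. A 2M-aligned 2M chunk never fails the first check, while a 256K chunk always does.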