On Fri, 6 Aug 2021, Zi Yan wrote:
> On 6 Aug 2021, at 16:27, Hugh Dickins wrote:
> > On Fri, 6 Aug 2021, Zi Yan wrote:
> >>
> >> In addition, I would like to share more detail on my plan on supporting 1GB PUD THP.
> >> This patchset is the first step, enabling kernel to allocate 1GB pages, so that
> >> user can get 1GB THPs from ZONE_NORMAL and ZONE_MOVABLE without using
> >> alloc_contig_pages() or CMA allocator. The next step is to improve kernel memory
> >> fragmentation handling for pages up to MAX_ORDER, since currently pageblock size
> >> is still limited by memory section size. As a result, I will explore solutions
> >> like having additional larger pageblocks (up to MAX_ORDER) to counter memory
> >> fragmentation. I will discover what else needs to be solved as I gradually improve
> >> 1GB PUD THP support.
> >
> > Sorry to be blunt, but let me state my opinion: 2MB THPs have given and
> > continue to give us more than enough trouble. Complicating the kernel's
> > mm further, just to allow 1GB THPs, seems a very bad tradeoff to me. I
> > understand that it's an appealing personal project; but for the sake of
> > of all the rest of us, please leave 1GB huge pages to hugetlbfs (until
> > the day when we are all using 2MB base pages).
>
> I do not agree with you. 2MB THP provides good performance, while letting us
> keep using 4KB base pages. The 2MB THP implementation is the price we pay
> to get the performance. This patchset removes the tie between MAX_ORDER
> and section size to allow >2MB page allocation, which is useful in many
> places. 1GB THP is one of the users. Gigantic pages also improve
> device performance, like GPUs (e.g., AMD GPUs can use any power of two up to
> 1GB pages[1], which I just learnt). Also could you point out which part
> of my patchset complicates kernel’s mm? I could try to simplify it if
> possible.
>
> In addition, I am not sure hugetlbfs is the way to go. THP is managed by
> core mm, whereas hugetlbfs has its own code for memory management.
> As hugetlbfs gets popular, more core mm functionalities have been
> replicated and added to hugetlbfs codebase. It is not a good tradeoff
> either. One of the reasons I work on 1GB THP is that Roman from Facebook
> explicitly mentioned they want to use THP in place of hugetlbfs[2].
>
> I think it might be more constructive to point out the existing issues
> in THP so that we can improve the code together. BTW, I am also working
> on simplifying THP code like generalizing THP split[3] and planning to
> simplify page table manipulation code by reviving Kirill’s idea[4].

You may have good reasons for working on huge PUD entry support;
and perhaps we have different understandings of "THP".

Fragmentation: that's what horrifies me about 1GB THP.

The dark side of THP is compaction. People have put in a lot of effort
to get compaction working as well as it currently does, but getting 512
adjacent 4k pages is not easy. Getting 512*512 adjacent 4k pages is
very much harder. Please put in the work on compaction before you
attempt to support 1GB THP.

Related fears: unexpected latencies; unacceptable variance between runs;
frequent rebooting of machines to get back to an unfragmented state;
page table code that most of us will never be in a position to test.

Sorry, no, I'm not reading your patches: that's not personal, it's
just that I've more than enough to do already, and must make choices.

Hugh

>
> [1] https://lore.kernel.org/linux-mm/bdec12bd-9188-9f3e-c442-aa33e25303a6@amd.com/
> [2] https://lore.kernel.org/linux-mm/20200903162527.GF60440@carbon.dhcp.thefacebook.com/
> [3] https://lwn.net/Articles/837928/
> [4] https://lore.kernel.org/linux-mm/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/
>
> --
> Best Regards,
> Yan, Zi