On 2 Oct 2020, at 4:30, David Hildenbrand wrote:

> On 02.10.20 10:10, Michal Hocko wrote:
>> On Fri 02-10-20 09:50:02, David Hildenbrand wrote:
>>>>>> - huge page sizes controllable by the userspace?
>>>>>
>>>>> It might be good to allow advanced users to choose the page sizes, so they
>>>>> have better control of their applications.
>>>>
>>>> Could you elaborate more? Those advanced users can use hugetlb, right?
>>>> They get a very good control over page size and pool preallocation etc.
>>>> So they can get what they need - assuming there is enough memory.
>>>>
>>>
>>> I am still not convinced that 1G THP (TGP :) ) are really what we want
>>> to support. I can understand that there are some use cases that might
>>> benefit from it, especially:
>>
>> Well, I would say that internal support for larger huge pages (e.g. 1GB)
>> that can transparently split under memory pressure is a useful
>> functionality. I cannot really judge how complex that would be
>
> Right, but that's then something different than serving (scarce,
> unmovable) gigantic pages from CMA / reserved hugetlbfs pool. Nothing
> wrong about *real* THP support, meaning, e.g., grouping consecutive
> pages and converting them back and forth on demand. (E.g., 1GB ->
> multiple 2MB -> multiple single pages), for example, when having to
> migrate such a gigantic page. But that's very different from our
> existing gigantic page code as far as I can tell.

Serving 1GB PUD THPs from CMA is a compromise, since we do not want to
bump MAX_ORDER to 20 to enable 1GB page allocation in the buddy
allocator, which would require increasing the section size. In addition,
unmovable pages cannot be allocated in CMA, so a 1GB allocation is much
more likely to succeed from CMA than from ZONE_NORMAL.

>> considering that 2MB THP have turned out to be quite a pain but
>> situation has settled over time. Maybe our current code base is prepared
>> for that much better.
I am planning to refactor my code further to reduce the amount of added
code, since PUD THP is very similar to PMD THP. One thing I want to
achieve is to enable split_huge_page to split a page of any order into a
group of pages of any lower order. A lot of the code in this patchset
replicates PMD THP behavior at the PUD level, so it might be possible to
deduplicate most of it.

>>
>> Exposing that interface to the userspace is a different story of course.
>> I do agree that we likely do not want to be very explicit about that.
>> E.g. an interface for address space defragmentation without any more
>> specifics sounds like a useful feature to me. It will be up to the
>> kernel to decide which huge pages to use.
>
> Yes, I think one important feature would be that we don't end up placing
> a gigantic page where only a handful of pages are actually populated
> without green light from the application - because that's what some user
> space applications care about (not consuming more memory than intended.
> IIUC, this is also what this patch set does). I'm fine with placing
> gigantic pages if it really just "defragments" the address space layout,
> without filling unpopulated holes.
>
> Then, this would be mostly invisible to user space, and we really
> wouldn't have to care about any configuration.

I agree that the interface should be simple, requiring no configuration
for most users. But I also wonder why we have hugetlbfs, which lets users
specify different page sizes; that seems to run against the discussion
above. Are we assuming advanced users should always use hugetlbfs instead
of THPs?

--
Best Regards,
Yan Zi