On 10 Sep 2020, at 4:27, David Hildenbrand wrote:

> On 10.09.20 09:32, Michal Hocko wrote:
>> [Cc Vlastimil and Mel - the whole email thread starts
>> http://lkml.kernel.org/r/20200902180628.4052244-1-zi.yan@sent.com
>> but this particular subthread has diverged a bit and you might find it
>> interesting]
>>
>> On Wed 09-09-20 15:43:55, David Hildenbrand wrote:
>>> On 09.09.20 15:19, Rik van Riel wrote:
>>>> On Wed, 2020-09-09 at 09:04 +0200, Michal Hocko wrote:
>>>>> On Tue 08-09-20 10:41:10, Rik van Riel wrote:
>>>>>> On Tue, 2020-09-08 at 16:35 +0200, Michal Hocko wrote:
>>>>>>
>>>>>>> A global knob is insufficient. 1G pages will become a very
>>>>>>> precious resource as it requires a pre-allocation
>>>>>>> (reservation). So it really has to be an opt-in and the
>>>>>>> question is whether there is also some sort of access
>>>>>>> control needed.
>>>>>>
>>>>>> The 1GB pages do not require that much in the way of
>>>>>> pre-allocation. The memory can be obtained through CMA,
>>>>>> which means it can be used for movable 4kB and 2MB
>>>>>> allocations when not being used for 1GB pages.
>>>>>
>>>>> That CMA has to be pre-reserved, right? That requires a
>>>>> configuration.
>>>>
>>>> To some extent, yes.
>>>>
>>>> However, because that pool can be used for movable
>>>> 4kB and 2MB pages as well as for 1GB pages, it would be
>>>> easy to just set the size of that pool to eg. 1/3 or even
>>>> 1/2 of memory for every system.
>>>>
>>>> It isn't like the pool needs to be the exact right size. We
>>>> just need to avoid the "highmem problem" of having too little
>>>> memory for kernel allocations.
>>>>
>>>
>>> I am not sure I like the trend towards CMA that we are seeing, reserving
>>> huge buffers for specific users (and eventually even doing it
>>> automatically).
>>>
>>> What we actually want is ZONE_MOVABLE with relaxed guarantees, such that
>>> anybody who requires large, unmovable allocations can use it.
>>>
>>> I once played with the idea of having ZONE_PREFER_MOVABLE, which
>>> a) Is the primary choice for movable allocations
>>> b) Is allowed to contain unmovable allocations (esp., gigantic pages)
>>> c) Is the fallback for ZONE_NORMAL for unmovable allocations, instead of
>>>    running out of memory
>>
>> I might be missing something but how can this work longterm? Or put in
>> another words why would this work any better than existing fragmentation
>> avoidance techniques that page allocator implements already - movability
>> grouping etc. Please note that I am not deeply familiar with those but
>> my high level understanding is that we already try hard to not mix
>> movable and unmovable objects in same page blocks as much as we can.
>
> Note that we group in pageblock granularity, which avoids fragmentation
> on a pageblock level, not on anything bigger than that. Especially
> MAX_ORDER - 1 pages (e.g., on x86-64) and gigantic pages.
>
> So once you run for some time on a system (especially thinking about
> page shuffling *within* a zone), trying to allocate a gigantic page will
> simply always fail - even if you always had plenty of free memory in
> your single zone.
>
>>
>> My suspicion is that a separate zone would work in a similar fashion. As
>> long as there is a lot of free memory then zone will be effectively
>> MOVABLE. Similar applies to normal zone when unmovable allocations are
>
> Note the difference to MOVABLE: if you really want, you *can* put
> movable allocations into that zone. So you can happily allocate gigantic
> pages from it. Or anything else you like. As the name suggests "prefer
> movable allocations".
>
>> in minority. As long as the Normal zone gets full of unmovable objects
>> they start overflowing to ZONE_PREFER_MOVABLE and it will resemble page
>> block stealing when unmovable objects start spreading over movable page
>> blocks.
>
> Right, the long-term goal would be
> 1. To limit the chance of that happening.
>    (e.g., size it in a way that's
>    safe for 99.9% of all setups, resize dynamically on demand)
> 2. To limit the physical area where that is happening (e.g., find lowest
>    possible pageblock etc.). That's more tricky but I consider this a pure
>    optimization on top.
>
> As long as we stay in safe zone boundaries you get a benefit in most
> scenarios. As soon as we would have a (temporary) workload that would
> require more unmovable allocations we would fallback to polluting some
> pageblocks only.

The idea would work well until unmovable pages begin to overflow into
ZONE_PREFER_MOVABLE or we move the boundary of ZONE_PREFER_MOVABLE to
avoid unmovable page overflow. The issue comes from the lifetime of
the unmovable pages. Since some long-lived ones can sit around the
boundary, there is no guarantee that ZONE_PREFER_MOVABLE can grow
back even if other unmovable pages are deallocated. Ultimately,
ZONE_PREFER_MOVABLE would shrink to a small size and the situation
would be back to what we have now.

OK. I have a stupid question here. Why not just grow the pageblock to
a larger size, like 1GB? The fragmentation caused by unmovable pages
would then be at a larger granularity, but it would be less likely
that unmovable pages get allocated in a movable pageblock, since the
kernel has a whole 1GB pageblock for them after a pageblock stealing.
If other kinds of pageblocks run out, movable and reclaimable pages
can fall back to unmovable pageblocks. What am I missing here?

Thanks.

--
Best Regards,
Yan Zi