On 2/18/19 9:46 PM, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote:
>> essentially fragmented them. I guess khugepaged went through and
>> started trying to reassemble the huge pages and as a result there have
>> been apps that ended up consuming more memory than they would have
>> otherwise since they were using fragments of THP pages after doing an
>> MADV_DONTNEED on sections of the page.
> With relatively recent kernels MADV_DONTNEED doesn't necessarily free
> anything when it's applied to a THP subpage, it only splits the
> pagetables and queues the THP for deferred splitting. If there's
> memory pressure a shrinker will be invoked and the queue is scanned
> and the THPs are physically split, but to be reassembled/collapsed
> after a physical split it requires at least one young pte.
>
> If this is particularly problematic for page hinting, this behavior
> where the MADV_DONTNEED can be undone by khugepaged (if some subpage is
> being frequently accessed), can be turned off by setting
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to
> 0. Then the THP will only be collapsed if all 512 subpages are mapped
> (i.e. they've all been re-allocated by the guest).
>
> Regardless of the max_ptes_none default, keeping the smaller guest
> buddy orders as the last target for page hinting should be good for
> performance.
>
>> Yeah, no problem. The only thing I don't like about MADV_FREE is that
>> you have to have memory pressure before the pages really start getting
>> scrubbed which is both a benefit and a drawback. Basically it defers
>> the freeing until you are under actual memory pressure so when you hit
>> that case things start feeling much slower, that and it limits your
>> allocations since the kernel doesn't recognize the pages as free until
>> it would have to start trying to push memory to swap.
> The guest allocation behavior should not be influenced by MADV_FREE vs
> MADV_DONTNEED, the guest can't see the difference anyway, so why
> should it limit the allocations?
>
> The benefit of MADV_FREE should be that when the same guest frees and
> reallocates a huge amount of RAM (i.e. guest app allocating and
> freeing lots of RAM in a loop, not so uncommon), there will be no KVM
> page fault during guest re-allocations. So in absence of memory
> pressure in the host it should be a major win. Overall it sounds like
> a good tradeoff compared to MADV_DONTNEED that forcefully invokes MMU
> notifiers and forces host allocations and KVM page faults in order to
> reallocate the same RAM in the same guest.

This does make sense. Thanks for explaining this.

>
> When there's memory pressure it's up to the host Linux VM to notice
> there's plenty of MADV_FREE material to free at zero I/O cost before
> starting swapping.
>
> Thanks,
> Andrea

--
Regards
Nitesh
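
P.S. For anyone following the thread, the difference between the two
hints as seen from a plain userspace process can be sketched roughly as
below. This is only an illustration and not the page hinting code: it
assumes Linux and Python 3.8 or later, and the names PAGES, FILL and
dirty_then_advise are made up for the example. MADV_DONTNEED drops the
private anonymous pages right away, so the next touch faults in fresh
zero pages (the analogue of the KVM page fault on guest re-allocation).
MADV_FREE only marks the pages as reclaimable, so without memory
pressure the old contents are usually still there and no fault is
needed.

```python
import mmap

PAGES = 16                      # hypothetical size, chosen for the example
PAGE = mmap.PAGESIZE
LENGTH = PAGES * PAGE
FILL = 0xAA                     # byte pattern used to dirty every page


def dirty_then_advise(advice):
    """Dirty a private anonymous mapping, apply one madvise hint, and
    return the first byte of every page as read back afterwards.
    Returns None if the running kernel rejects the hint."""
    buf = mmap.mmap(-1, LENGTH, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = bytes([FILL]) * LENGTH      # touch (allocate) every page
        try:
            buf.madvise(advice, 0, LENGTH)
        except OSError:
            return None
        return [buf[off] for off in range(0, LENGTH, PAGE)]
    finally:
        buf.close()


# MADV_DONTNEED: pages are dropped immediately, re-reads see zero pages.
after_dontneed = dirty_then_advise(mmap.MADV_DONTNEED)

# MADV_FREE: freeing is deferred until reclaim, so each page reads back
# either as the old pattern or as zero. The constant may be missing on
# older systems, hence the getattr.
madv_free = getattr(mmap, "MADV_FREE", None)
after_free = dirty_then_advise(madv_free) if madv_free is not None else None

print("MADV_DONTNEED:", after_dontneed)
print("MADV_FREE:    ", after_free)
```

On an idle machine I would expect the MADV_FREE line to mostly show the
old pattern, but that depends on reclaim activity, so the sketch does
not rely on it.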