On 2/18/19 9:46 PM, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote:
>> essentially fragmented them. I guess hugepaged went through and
>> started trying to reassemble the huge pages and as a result there have
>> been apps that ended up consuming more memory than they would have
>> otherwise since they were using fragments of THP pages after doing an
>> MADV_DONTNEED on sections of the page.
> With relatively recent kernels MADV_DONTNEED doesn't necessarily free
> anything when it's applied to a THP subpage, it only splits the
> pagetables and queues the THP for deferred splitting. If there's
> memory pressure a shrinker will be invoked and the queue is scanned
> and the THPs are physically splitted, but to be reassembled/collapsed
> after a physical split it requires at least one young pte.
>
> If this is particularly problematic for page hinting, this behavior
> where the MADV_DONTNEED can be undoed by khugepaged (if some subpage is
> being frequently accessed), can be turned off by setting
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to
> 0. Then the THP will only be collapsed if all 512 subpages are mapped
> (i.e. they've all be re-allocated by the guest).
>
> Regardless of the max_ptes_none default, keeping the smaller guest
> buddy orders as the last target for page hinting should be good for
> performance.
>
>> Yeah, no problem. The only thing I don't like about MADV_FREE is that
>> you have to have memory pressure before the pages really start getting
>> scrubbed with is both a benefit and a drawback. Basically it defers
>> the freeing until you are under actual memory pressure so when you hit
>> that case things start feeling much slower, that and it limits your
>> allocations since the kernel doesn't recognize the pages as free until
>> it would have to start trying to push memory to swap.
> The guest allocation behavior should not be influenced by MADV_FREE vs
> MADV_DONTNEED, the guest can't see the difference anyway, so why
> should it limit the allocations?
>
> The benefit of MADV_FREE should be that when the same guest frees and
> reallocates an huge amount of RAM (i.e. guest app allocating and
> freeing lots of RAM in a loop, not so uncommon), there will be no KVM
> page fault during guest re-allocations. So in absence of memory
> pressure in the host it should be a major win. Overall it sounds like
> a good tradeoff compared to MADV_DONTNEED that forcefully invokes MMU
> notifiers and forces host allocations and KVM page faults in order to
> reallocate the same RAM in the same guest.
This does makes sense.
Thanks for explaining this.
>
> When there's memory pressure it's up to the host Linux VM to notice
> there's plenty of MADV_FREE material to free at zero I/O cost before
> starting swapping.
>
> Thanks,
> Andrea
-- 
Regards
Nitesh