linux-kernel.vger.kernel.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Shakeel Butt <shakeelb@google.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Zi Yan <ziy@nvidia.com>, Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"George G. Davis" <davis.george@siemens.com>
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
Date: Mon, 29 Nov 2021 14:00:40 -0800	[thread overview]
Message-ID: <CAHbLzkpT3cU0PfmD2LeM9LuYjUhZhArRHxaP7JAU0koeDSXHvw@mail.gmail.com> (raw)
In-Reply-To: <d068c48d-897d-238b-010c-056951c9e3f1@redhat.com>

On Fri, Nov 26, 2021 at 1:04 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 26.11.21 03:52, Peter Xu wrote:
> > On Thu, Nov 25, 2021 at 11:32:08AM +0100, David Hildenbrand wrote:
> >> On 25.11.21 11:24, Peter Xu wrote:
> >>> On Mon, Nov 22, 2021 at 10:40:54AM -0800, Shakeel Butt wrote:
> >>>>> Do we have a performance evaluation of how much overhead is added, e.g., for
> >>>>> a single 4k MADV_DONTNEED call on a THP, or for a MADV_DONTNEED call that
> >>>>> covers the whole THP?
> >>>>
> >>>> I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
> >>>> x86 for both settings you suggested. I don't see any statistically
> >>>> significant difference with and without the patch. Let me know if you
> >>>> want me to try something else.
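
(For reference, a measurement along those lines could look like the minimal
sketch below. The THP count, the one-4k-zap-per-THP loop, and the
alignment/MADV_HUGEPAGE setup are assumptions about the benchmark, not the
exact code that was run:)

    /* thp-dontneed.c: time MADV_DONTNEED on 4k out of each of N THPs.
     * Illustrative sketch only; needs ~20G of RAM and THP enabled
     * (at least "madvise") to be meaningful.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    #define THP_SIZE (2UL << 20)    /* assumes 2M THPs (x86-64) */
    #define NR_THPS  10000UL

    int main(void)
    {
            size_t len = NR_THPS * THP_SIZE;
            char *buf = aligned_alloc(THP_SIZE, len);
            struct timespec t0, t1;
            size_t i;

            if (!buf)
                    return 1;
            madvise(buf, len, MADV_HUGEPAGE);  /* hint the fault path/khugepaged */
            memset(buf, 1, len);               /* fault in, hopefully as THPs */

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 0; i < NR_THPS; i++)      /* zap one 4k page per THP */
                    madvise(buf + i * THP_SIZE, 4096, MADV_DONTNEED);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("%.3f ms\n", (t1.tv_sec - t0.tv_sec) * 1e3 +
                                (t1.tv_nsec - t0.tv_nsec) / 1e6);
            return 0;
    }

Running the same loop with a length of THP_SIZE instead of 4096 would cover
the whole-THP case asked about above.
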
> >>>
> >>> I'm a bit surprised that sync split thp didn't bring any extra overhead.
> >>>
> >>> "unmap whole thp" is understandable from that pov, because afaict that won't
> >>> even trigger any thp split anyway even delayed, if this is the simplest case
> >>> that only this process mapped this thp, and it mapped once.
> >>>
> >>> For "unmap 4k upon thp" IIUC that's the worst case and zapping 4k should be
> >>> fast; while what I don't understand since thp split requires all hand-made work
> >>> for copying thp flags into small pages and so on, so I thought there should at
> >>> least be some overhead measured.  Shakeel, could there be something overlooked
> >>> in the test, or maybe it's me that overlooked?
> >>>
> >>> I had the same concern as what Kirill/Matthew raised in the other thread - I'm
> >>> worried that proactively splitting simply because any 4k page is zapped might
> >>> quickly break up the 2m thps in the system, and I'm not sure whether it'll
> >>> exacerbate the fragmentation of system memory in general.  I'm also not sure
> >>> whether that's ideal for some very common workloads that frequently use
> >>> DONTNEED to proactively drop some pages.
> >>
> >> The pageblock corresponding to the THP is movable. So (unless we start
> >> spilling unmovable allocations into movable pageblocks) we'd only place
> >> movable allocations in there. Compaction will be able to migrate them to
> >> re-create a free THP.
> >>
> >> In contrast, I think compaction will happily skip over the THP and
> >> ignore it, because it has no clue that the THP could be repurposed by
> >> split+migrate (at least I am not aware of code that does that).
> >>
> >> Unless I am missing something, with the above in mind it could make
> >> sense to split as soon as possible, even before we're under memory
> >> pressure -- for example, for proactive compaction.
> >>
> >> [proactive compaction could try splitting first as well I think]
> >
> > But we can't rely on proactive compaction for rapid operations, because it's
> > still adding overhead to the overall system by split+merge, right?
>
> Yes, but there is also direct compaction that can be triggered without
> the shrinker getting involved. I think we can summarize it as "there might
> not be a right or wrong time to split". An application that
> MADV_DONTNEEDs/munmaps sub-THP memory has told us that it doesn't want to
> consume that memory, yet it looks like it's still consuming it.
>
> I do wonder how THPs on the deferred split queue behave with respect to
> page migration -- memory offlining, alloc_contig_range(). I saw reports

IIUC the old page that is on the deferred split queue would be freed and
taken off the list once the migration is done. But the new page is *NOT*
added to the deferred split list at all. This would not cause any severe
bugs, but the partially unmapped pages can no longer be split by the
deferred split shrinker under memory pressure.
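
Just to illustrate that last point, a rough sketch, not a tested patch: the
helper name and call site below are made up, and the real thing would need
the split_queue_lock and memcg-aware queue selection, but carrying the state
over during migration could look roughly like:

    #include <linux/mm.h>
    #include <linux/huge_mm.h>

    /*
     * Hypothetical helper, to be called from the THP migration path while
     * both pages are still held stable by the migration code: if the source
     * THP sits on a deferred split queue, queue the destination page too so
     * the shrinker can still split it under memory pressure.
     */
    static void transfer_deferred_split(struct page *newpage, struct page *page)
    {
            if (PageTransHuge(page) && !list_empty(page_deferred_list(page)))
                    deferred_split_huge_page(newpage);
    }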

> that there are some cases where THP can be problematic when
> stress-testing THP:
> https://lkml.kernel.org/r/20210903162102.GA10039@mam-gdavis-dt
>
> But not sure if that's related to deferred splitting. Most probably not.
>
> >
> > +compaction_proactiveness
> > +========================
> > + ...
> > +Note that compaction has a non-trivial system-wide impact as pages
> > +belonging to different processes are moved around, which could also lead
> > +to latency spikes in unsuspecting applications. The kernel employs
> > +various heuristics to avoid wasting CPU cycles if it detects that
> > +proactive compaction is not being effective.
> >
> > Delaying the split makes sense to me because, after all, the kernel is not aware
> > of the userspace's preference, so the best thing is to do nothing until necessary.
> >
> > Proactively splitting thps in dontneed/unmap adds an assumption that the userspace
> > wants to break up the pages by default.  It's 100% true for Shakeel's use case,
> > but I'm not sure whether it will always be true.  That's why I thought maybe a
> > new interface would be more proper, so we at least won't break anyone by accident.
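
(Purely hypothetical illustration of that idea -- no such advice value exists
today, and both the name and the number below are made up -- an explicit
opt-in could look like this from userspace:)

    #include <sys/mman.h>

    /* Hypothetical, not in any kernel: an madvise advice value that both
     * zaps the range and synchronously splits any THP backing it.
     * 0x100 is an entirely made-up value for illustration only. */
    #define MADV_DONTNEED_SPLIT 0x100

    static inline int discard_and_split(void *addr, size_t len)
    {
            return madvise(addr, len, MADV_DONTNEED_SPLIT);
    }
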
>
> Well, we already broke the PMD into PTEs. So the performance gain at
> least for that user is really gone until we "fix that" again via
> khugepaged -- which might just be configured to not "fix" if there are
> empty PTEs.
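
(The knob alluded to here is presumably
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none; with it set
to 0, khugepaged won't re-collapse a range that still contains empty PTEs,
so a deliberately zapped 4k hole stays zapped. A trivial sketch, assuming
the caller has permission to write that sysfs file:)

    #include <stdio.h>

    /* Sketch: ask khugepaged not to collapse ranges containing empty PTEs,
     * e.g. set_max_ptes_none(0).  Returns 0 on success, -1 on error. */
    static int set_max_ptes_none(unsigned long val)
    {
            FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/"
                            "khugepaged/max_ptes_none", "w");

            if (!f)
                    return -1;
            fprintf(f, "%lu\n", val);
            return fclose(f);
    }
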
>
> It for sure is interesting if you have a COW huge page and only one
> party zaps/unmaps some part.
>
>
> --
> Thanks,
>
> David / dhildenb
>

Thread overview: 32+ messages
2021-11-20 20:12 [PATCH] mm: split thp synchronously on MADV_DONTNEED Shakeel Butt
2021-11-21  4:35 ` Matthew Wilcox
2021-11-21  5:25   ` Shakeel Butt
2021-11-22  0:50 ` Kirill A. Shutemov
2021-11-22  3:42   ` Shakeel Butt
2021-11-22  4:56 ` Matthew Wilcox
2021-11-22  9:19   ` David Hildenbrand
2021-12-08 13:23     ` Pankaj Gupta
2021-11-22  8:32 ` David Hildenbrand
2021-11-22 18:40   ` Shakeel Butt
2021-11-22 18:59     ` David Hildenbrand
2021-11-23  1:20       ` Shakeel Butt
2021-11-23 16:56         ` David Hildenbrand
2021-11-23 17:17           ` Shakeel Butt
2021-11-23 17:20             ` David Hildenbrand
2021-11-23 17:24               ` Shakeel Butt
2021-11-23 17:26                 ` David Hildenbrand
2021-11-23 17:28                   ` Shakeel Butt
2021-11-25 10:09                     ` Peter Xu
2021-11-25 17:14                       ` Shakeel Butt
2021-11-26  0:00                         ` Peter Xu
2021-11-25 10:24     ` Peter Xu
2021-11-25 10:32       ` David Hildenbrand
2021-11-26  2:52         ` Peter Xu
2021-11-26  9:04           ` David Hildenbrand
2021-11-29 22:00             ` Yang Shi [this message]
2021-11-26  3:21       ` Shakeel Butt
2021-11-26  4:12         ` Peter Xu
2021-11-26  9:16           ` David Hildenbrand
2021-11-26  9:39             ` Peter Xu
2021-11-29 21:32             ` Yang Shi
2022-01-24 18:48           ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save this message as an mbox file, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkpT3cU0PfmD2LeM9LuYjUhZhArRHxaP7JAU0koeDSXHvw@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=davis.george@siemens.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=shakeelb@google.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header via
  mailto: links, reply that way. Be sure your reply has a Subject:
  header at the top and a blank line before the message body.