From: "Zi Yan" <zi.yan@cs.rutgers.edu>
To: "David Rientjes" <rientjes@google.com>
Cc: "Mel Gorman" <mgorman@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Andrea Argangeli" <andrea@kernel.org>,
	"Stefan Priebe - Profihost AG" <s.priebe@profihost.ag>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	"Stable tree" <stable@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: thp:  relax __GFP_THISNODE for MADV_HUGEPAGE mappings
Date: Mon, 22 Oct 2018 21:27:43 -0400
Message-ID: <0BA54BDA-D457-4BD8-AC49-1DD7CD032C7F@cs.rutgers.edu>
In-Reply-To: <alpine.DEB.2.21.1810221355050.120157@chino.kir.corp.google.com>

Hi David,

On 22 Oct 2018, at 17:04, David Rientjes wrote:

> On Tue, 16 Oct 2018, Mel Gorman wrote:
>
>> I consider this to be an unfortunate outcome. On the one hand, we have a
>> problem that three people can trivially reproduce with known test cases
>> and a patch shown to resolve the problem. Two of those three people work
>> on distributions that are exposed to a large number of users. On the
>> other, we have a problem that requires the system to be in a specific
>> state and an unknown workload that suffers badly from the remote access
>> penalties with a patch that has review concerns and has not been proven
>> to resolve the trivial cases.
>
> The specific state is that remote memory is fragmented as well, this is
> not atypical.  Removing __GFP_THISNODE to avoid thrashing a zone will only
> be beneficial when you can allocate remotely instead.  When you cannot
> allocate remotely instead, you've made the problem much worse for
> something that should be __GFP_NORETRY in the first place (and was for
> years) and should never thrash.
>
> I'm not interested in patches that require remote nodes to have an
> abundance of free or unfragmented memory to avoid regressing.

I just wonder what the page allocation priority list is in your environment,
assuming all memory nodes are so fragmented that no huge pages can be
obtained without compaction or reclaim.

Here is my version of that list; please let me know if it makes sense to you:

1. local huge pages: with compaction and/or page reclaim, you are willing
to pay the penalty of getting huge pages;

2. local base pages: since, in your system, remote data accesses have a much
higher penalty than the extra TLB misses incurred by the base page size;

3. remote huge pages: at least they are better than remote base pages;

4. remote base pages: they perform worst in terms of both locality and TLB
misses.
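
To make the ordering above concrete, here is a toy userspace model in Python. It is only a sketch, not kernel code: the names PRIORITY, allocate() and can_satisfy are made up for illustration, and can_satisfy stands in for "this node can provide a page of this size, after compaction/reclaim if needed".

```python
from typing import Callable, Optional, Tuple

Attempt = Tuple[str, str]  # (node, page size); hypothetical labels

# The four attempts, in the order of the list above.
PRIORITY: Tuple[Attempt, ...] = (
    ("local", "huge"),
    ("local", "base"),
    ("remote", "huge"),
    ("remote", "base"),
)


def allocate(can_satisfy: Callable[[str, str], bool]) -> Optional[Attempt]:
    """Return the first attempt in PRIORITY that can be satisfied."""
    for node, size in PRIORITY:
        if can_satisfy(node, size):
            return (node, size)
    return None  # nothing available anywhere


# Every node is too fragmented for huge pages, but base pages exist.
print(allocate(lambda node, size: size == "base"))  # ('local', 'base')
```

In this model a remote huge page is only considered after a local base page has been ruled out, which is exactly the step that is hard to express in a single allocation call.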

This might not be easy to implement in the current kernel, because
the zones from remote nodes will always be candidates when the
kernel calls get_page_from_freelist(). Only __GFP_THISNODE
and MPOL_BIND can eliminate these remote node zones, and __GFP_THISNODE
is a kernel version of MPOL_BIND that overrides any user space
memory policy other than MPOL_BIND, which is troublesome.

In addition, to prioritize local base pages over remote pages,
the original huge page allocation has to fail so that the kernel can
fall back to base page allocations. And you will never get remote
huge pages if the local base page allocation fails,
because there is no way back to huge page allocation after the fallback.
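
As a rough illustration of this point, the sketch below models a two-stage fault path in Python: one huge page stage, then a base page fallback with no way back. It is a toy model under stated assumptions, not the kernel's logic: attempt_order() and DESIRED are hypothetical names, huge_thisnode stands for passing __GFP_THISNODE to the huge page attempt, and the base page stage is assumed to try the local node first and then remote nodes.

```python
from typing import List, Tuple

Attempt = Tuple[str, str]  # (node, page size); hypothetical labels

# Preference order: local huge, local base, remote huge, remote base.
DESIRED: List[Attempt] = [
    ("local", "huge"),
    ("local", "base"),
    ("remote", "huge"),
    ("remote", "base"),
]


def attempt_order(huge_thisnode: bool) -> List[Attempt]:
    """Attempts made by a two-stage path: huge page stage, then base pages."""
    # Stage 1: the huge page attempt, restricted to the local node or not.
    huge_nodes = ["local"] if huge_thisnode else ["local", "remote"]
    order = [(node, "huge") for node in huge_nodes]
    # Stage 2: base page fallback. Once we are here, stage 1 is never retried.
    order += [(node, "base") for node in ["local", "remote"]]
    return order


for flag in (True, False):
    order = attempt_order(flag)
    print(flag, order, order == DESIRED)
```

With the restriction, remote huge pages are never attempted; without it, remote huge pages are tried before local base pages. Neither variant of the model reproduces the preference order in DESIRED.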

Do you expect both behaviors?


>> In the case of distributions, the first
>> patch addresses concerns with a common workload where on the other hand
>> we have an internal workload of a single company that is affected --
>> which indirectly affects many users admittedly but only one entity directly.
>>
>
> The alternative, which is my patch, hasn't been tested or shown why it
> cannot work.  We continue to talk about order >= pageblock_order vs
> __GFP_COMPACTONLY.
>
> I'd like to know, specifically:
>
>  - what measurable affect my patch has that is better solved with removing
>    __GFP_THISNODE on systems where remote memory is also fragmented?
>
>  - what platforms benefit from remote access to hugepages vs accessing
>    local small pages (I've asked this maybe 4 or 5 times now)?
>
>  - how is reclaiming (and possibly thrashing) memory helpful if compaction
>    fails to free an entire pageblock due to slab fragmentation due to low
>    on memory conditions and the page allocator preference to return
>    node-local memory?
>
>  - how is reclaiming (and possibly thrashing) memory helpful if compaction
>    cannot access the memory reclaimed because the freeing scanner has
>    already passed by it, or the migration scanner has passed by it, since
>    this reclaim is not targeted to pages it can find?
>
>  - what metrics can be introduced to the page allocator so that we can
>    determine that reclaiming (and possibly thrashing) memory will result
>    in a hugepage being allocated?

The slab fragmentation and whether reclaim/compaction can help form
huge pages seem to be orthogonal to this patch, which tries to decide
the priority between locality and huge pages.

For slab fragmentation, you might find this paper, “Making Huge Pages
Actually Useful” (https://dl.acm.org/citation.cfm?id=3173203), helpful.
The paper tries to minimize the number of page blocks that contain both
movable and non-movable pages.


--
Best Regards
Yan Zi
