Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrea Argangeli <andrea@kernel.org>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Stable tree <stable@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: thp:  relax __GFP_THISNODE for MADV_HUGEPAGE mappings
Date: Mon, 15 Oct 2018 18:57:43 -0400
Message-ID: <20181015225743.GB30832@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1810151525460.247641@chino.kir.corp.google.com>

On Mon, Oct 15, 2018 at 03:30:17PM -0700, David Rientjes wrote:
> At the risk of beating a dead horse that has already been beaten, what are 
> the plans for this patch when the merge window opens?  It would be rather 
> unfortunate for us to start incurring a 14% increase in access latency and 
> 40% increase in fault latency.  Would it be possible to test with my 
> patch[*] that does not try reclaim to address the thrashing issue?  If 
> that is satisfactory, I don't have a strong preference if it is done with 
> a hardcoded pageblock_order and __GFP_NORETRY check or a new 
> __GFP_COMPACT_ONLY flag.

I don't like the pageblock size hardcoding inside the page
allocator. __GFP_COMPACT_ONLY is fully runtime equivalent, but it at
least let the caller choose the behavior, so it looks more flexible.

To fix your 40% fault latency concern in the short term we could use
__GFP_COMPACT_ONLY, but I think we should get rid of
__GFP_COMPACT_ONLY later: we need more statistical data in the zone
structure to track remote compaction failures triggering because the
zone is fully fragmented.

Once the zone is fully fragmented we need to do a special exponential
backoff on that zone when the zone is from a remote node.

Furthermore at the first remote NUMA node zone failure caused by full
fragmentation we need to interrupt compaction and stop trying with all
remote nodes.

As long as compaction returns COMPACT_SKIPPED it's ok to keep doing
reclaim and keep doing compaction, as long as compaction succeeds.

What is causing the higher latency is the fact we try all zones from
all remote nodes even if there's a failure caused by full
fragmentation of all remote zone, and we're unable to skip (skip with
exponential backoff) only those zones where compaction cannot succeed
because of fragmentation.

Once we achieve the above deleting __GFP_COMPACT_ONLY will be a
trivial patch.

> I think the second issue of faulting remote thp by removing __GFP_THISNODE 
> needs supporting evidence that shows some platforms benefit from this (and 
> not with numa=fake on the command line :).
> 
>  [*] https://marc.info/?l=linux-kernel&m=153903127717471

That is needed to compare the current one liner fix with
__GFP_COMPACT_ONLY, but I don't think it's needed to compare v4.18
with the current fix. The badness of v4.18 was too bad keep, getting
local PAGE_SIZEd memory or remote THPs is a secondary issue.

In fact the main reason for __GFP_COMPACT_ONLY is not anymore such
tradeoff, but not to spend too much CPU in compaction when all nodes
are fragmented to avoid increasing the allocation latency too much.

Thanks,
Andrea

  parent reply index

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-25 12:03 [PATCH 0/2] thp nodereclaim fixes Michal Hocko
2018-09-25 12:03 ` [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Michal Hocko
2018-09-25 12:20   ` Mel Gorman
2018-09-25 12:30     ` Michal Hocko
2018-10-04 20:16   ` David Rientjes
2018-10-04 21:10     ` Andrea Arcangeli
2018-10-04 23:05       ` David Rientjes
2018-10-06  3:19         ` Andrea Arcangeli
2018-10-05  7:38     ` Mel Gorman
2018-10-05 20:35       ` David Rientjes
2018-10-05 23:21         ` Andrea Arcangeli
2018-10-08 20:41           ` David Rientjes
2018-10-09  9:48             ` Mel Gorman
2018-10-09 12:27               ` Michal Hocko
2018-10-09 13:00                 ` Mel Gorman
2018-10-09 14:25                   ` Michal Hocko
2018-10-09 15:16                     ` Mel Gorman
2018-10-09 23:03                     ` Andrea Arcangeli
2018-10-10 21:19                       ` David Rientjes
2018-10-15 22:30                         ` David Rientjes
2018-10-15 22:44                           ` Andrew Morton
2018-10-15 23:19                             ` Andrea Arcangeli
2018-10-22 20:54                               ` David Rientjes
2018-10-16  7:46                             ` Mel Gorman
2018-10-16 22:37                               ` Andrew Morton
2018-10-16 23:11                                 ` Andrea Arcangeli
2018-10-16 23:16                                   ` Andrew Morton
2018-10-17  7:08                                     ` Michal Hocko
2018-10-17  9:00                                 ` Mel Gorman
2018-10-22 21:04                               ` David Rientjes
2018-10-23  1:27                                 ` Zi Yan
2018-10-28 21:45                                   ` David Rientjes
2018-10-23  7:57                                 ` Mel Gorman
2018-10-23  8:38                                   ` Mel Gorman
2018-10-15 22:57                           ` Andrea Arcangeli [this message]
2018-10-22 20:45                             ` David Rientjes
2018-10-09 22:17               ` David Rientjes
2018-10-09 22:51                 ` Andrea Arcangeli
2018-10-10  7:54                   ` Vlastimil Babka
2018-10-10 21:00                   ` David Rientjes
2018-10-09 13:08             ` Vlastimil Babka
2018-10-09 22:21             ` Andrea Arcangeli
2018-10-29  5:17   ` Balbir Singh
2018-10-29  9:00     ` Michal Hocko
2018-10-29  9:42       ` Balbir Singh
2018-10-29 10:08         ` Michal Hocko
2018-10-29 10:56           ` Andrea Arcangeli
2018-09-25 12:03 ` [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask Michal Hocko
2018-09-26 13:30   ` Kirill A. Shutemov
2018-09-26 14:17     ` Michal Hocko
2018-09-26 14:22       ` Michal Hocko
2018-10-19  2:11         ` Andrew Morton
2018-10-19  8:06           ` Michal Hocko
2018-10-22 13:27             ` Vlastimil Babka
2018-10-24 23:17               ` Andrew Morton
2018-10-25  4:56                 ` Vlastimil Babka
2018-10-25 16:14                   ` Michal Hocko
2018-10-25 16:18                     ` Andrew Morton
2018-10-25 16:45                       ` Michal Hocko
2018-10-22 13:15         ` Vlastimil Babka
2018-10-22 13:30           ` Michal Hocko
2018-10-22 13:35             ` Vlastimil Babka
2018-10-22 13:46               ` Michal Hocko
2018-10-22 13:53                 ` Vlastimil Babka
2018-10-04 20:17     ` David Rientjes
2018-10-04 21:49       ` Zi Yan
2018-10-09 12:36       ` Michal Hocko
2018-09-26 13:08 ` linux-mm@ archive on lore.kernel.org (Was: [PATCH 0/2] thp nodereclaim fixes) Kirill A. Shutemov
2018-09-26 13:14   ` Michal Hocko
2018-09-26 22:22     ` Andrew Morton
2018-09-26 23:08       ` Mel Gorman
2018-09-27  0:47         ` Konstantin Ryabitsev
2018-09-26 15:25   ` Konstantin Ryabitsev
2018-09-27 11:30     ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181015225743.GB30832@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@kernel.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=s.priebe@profihost.ag \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git