LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	ying.huang@intel.com, Andrea Arcangeli <aarcange@redhat.com>,
	s.priebe@profihost.ag, mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Wed, 5 Dec 2018 11:18:36 +0100
Message-ID: <20181205101836.GF1286@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.21.1812041551170.213718@chino.kir.corp.google.com>

On Tue 04-12-18 16:07:27, David Rientjes wrote:
> On Tue, 4 Dec 2018, Michal Hocko wrote:
> 
> > The thing I am really up to here is that reintroduction of
> > __GFP_THISNODE, which you are pushing for, will conflate madvise mode
> > resp. defrag=always with a numa placement policy because the allocation
> > doesn't fallback to a remote node.
> > 
> 
> It isn't specific to MADV_HUGEPAGE, it is the policy for all transparent 
> hugepage allocations, including defrag=always.  We agree that 
> MADV_HUGEPAGE is not exactly defined: does it mean try harder to allocate 
> a hugepage locally, try compaction synchronous to the fault, allow remote 
> fallback?  It's undefined.

Yeah, it is certainly underdefined. One thing is clear though. Using
MADV_HUGEPAGE implies that the specific mapping benefits from THPs and
is willing to pay associated init cost. This doesn't imply anything
regarding NUMA locality and as we have NUMA API it shouldn't even
attempt to do so because it would be conflating two things.
[...]

> > And that is a fundamental problem and the antipattern I am talking
> > about. Look at it this way. All normal allocations are utilizing all the
> > available memory even though they might hit a remote latency penalty. If
> > you do care about NUMA placement you have an API to enforce a specific
> > placement.  What is so different about THP to behave differently. Do
> > we really want to later invent an API to actually allow to utilize all
> > the memory? There are certainly usecases (that triggered the discussion
> > previously) that do not mind the remote latency because all other
> > benefits simply outweight it?
> > 
> 
> What is different about THP is that on every platform I have measured, 
> NUMA matters more than hugepages.  Obviously if on Broadwell, Haswell, and 
> Rome, remote hugepages were a performance win over local pages, this 
> discussion would not be happening.  Faulting local pages rather than 
> local hugepages, if possible, is easy and doesn't require reclaim.  
> Faulting remote pages rather than reclaiming local pages is easy in your 
> scenario, it's non-disruptive.

You keep ignoring all other usecases mentioned before and that is not
really helpful. Access cost can be amortized by other savings. Not to
mention NUMA balancing moving around hot THPs with remote accesses.

> So to answer "what is so different about THP?", it's the performance data.  
> The NUMA locality matters more than whether the pages are huge or not.  We 
> also have the added benefit of khugepaged being able to collapse pages 
> locally if fragmentation improves rather than being stuck accessing a 
> remote hugepage forever.

Please back your claims by a variety of workloads. Including mentioned
KVMs one. You keep hand waving about access latency completely ignoring
all other aspects and that makes my suspicious that you do not really
appreciate all the complexity here even stronger.

If there was a general consensus we want to make THP very special wrt.
numa locality, I could live with that. It would be inconsistency in the
API and as such something that will kick us sooner or later. But it
seems that _you_ are the only one to push that direction and you keep
ignoring all other usecases consistently throughout all the discussions
we have had so far. Several people keeps pointing out that this is a
wrong direction but that seems to be completely ignored.

I believe that the only way forward is back your claims by numbers
covering a larger set of THP users and prove that remote THP is
a wrong default behavior. But you cannot really push that through based
on a single usecase of yours which you refuse to describe beyond a
simple access latency metric.
-- 
Michal Hocko
SUSE Labs

  reply index

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-27  6:25 kernel test robot
2018-11-27 17:08 ` Linus Torvalds
2018-11-27 18:17   ` Michal Hocko
2018-11-27 18:21     ` Michal Hocko
2018-11-27 19:05   ` Vlastimil Babka
2018-11-27 19:16     ` Vlastimil Babka
2018-11-27 20:57   ` Andrea Arcangeli
2018-11-27 22:50     ` Linus Torvalds
2018-11-28  6:30       ` Michal Hocko
2018-11-28  3:20     ` Huang\, Ying
2018-11-28 16:48       ` Linus Torvalds
2018-11-28 18:39         ` Andrea Arcangeli
2018-11-28 23:10         ` David Rientjes
2018-12-03 18:01         ` Linus Torvalds
2018-12-03 18:14           ` Michal Hocko
2018-12-03 18:19             ` Linus Torvalds
2018-12-03 18:30               ` Michal Hocko
2018-12-03 18:45                 ` Linus Torvalds
2018-12-03 18:59                   ` Michal Hocko
2018-12-03 19:23                     ` Andrea Arcangeli
2018-12-03 20:26                       ` David Rientjes
2018-12-03 19:28                     ` Linus Torvalds
2018-12-03 20:12                       ` Andrea Arcangeli
2018-12-03 20:36                         ` David Rientjes
2018-12-03 22:04                         ` Linus Torvalds
2018-12-03 22:27                           ` Linus Torvalds
2018-12-03 22:57                             ` David Rientjes
2018-12-04  9:22                             ` Vlastimil Babka
2018-12-04 10:45                               ` Mel Gorman
2018-12-05  0:47                                 ` David Rientjes
2018-12-05  9:08                                   ` Michal Hocko
2018-12-05 10:43                                     ` Mel Gorman
2018-12-05 11:43                                       ` Michal Hocko
2018-12-05 10:06                                 ` Mel Gorman
2018-12-05 20:40                                 ` Andrea Arcangeli
2018-12-05 21:59                                   ` David Rientjes
2018-12-06  0:00                                     ` Andrea Arcangeli
2018-12-05 22:03                                   ` Linus Torvalds
2018-12-05 22:12                                     ` David Rientjes
2018-12-05 23:36                                     ` Andrea Arcangeli
2018-12-05 23:51                                       ` Linus Torvalds
2018-12-06  0:58                                         ` Linus Torvalds
2018-12-06  9:14                                           ` MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression) Michal Hocko
2018-12-06 23:49                                             ` David Rientjes
2018-12-07  7:34                                               ` Michal Hocko
2018-12-07  4:31                                             ` Linus Torvalds
2018-12-07  7:49                                               ` Michal Hocko
2018-12-07  9:06                                                 ` Vlastimil Babka
2018-12-07 23:15                                                   ` David Rientjes
2018-12-06 23:43                                           ` [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression David Rientjes
2018-12-07  4:01                                             ` Linus Torvalds
2018-12-10  0:29                                               ` David Rientjes
2018-12-10  4:49                                                 ` Andrea Arcangeli
2018-12-12  0:37                                                   ` David Rientjes
2018-12-12  9:50                                                     ` Michal Hocko
2018-12-12 17:00                                                       ` Andrea Arcangeli
2018-12-14 11:32                                                         ` Michal Hocko
2018-12-12 10:14                                                     ` Vlastimil Babka
2018-12-14 21:04                                                       ` David Rientjes
2018-12-14 21:33                                                         ` Vlastimil Babka
2018-12-21 22:18                                                           ` David Rientjes
2018-12-22 12:08                                                             ` Mel Gorman
2018-12-14 23:11                                                         ` Mel Gorman
2018-12-21 22:15                                                           ` David Rientjes
2018-12-12 10:44                                                   ` Andrea Arcangeli
2019-04-15 11:48                                             ` Michal Hocko
2018-12-06  0:18                                       ` David Rientjes
2018-12-06  0:54                                         ` Andrea Arcangeli
2018-12-06  9:23                                           ` Vlastimil Babka
2018-12-03 20:39                     ` David Rientjes
2018-12-03 21:25                       ` Michal Hocko
2018-12-03 21:53                         ` David Rientjes
2018-12-04  8:48                           ` Michal Hocko
2018-12-05  0:07                             ` David Rientjes
2018-12-05 10:18                               ` Michal Hocko [this message]
2018-12-05 19:16                                 ` David Rientjes
2018-11-27  7:23 kernel test robot

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181205101836.GF1286@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=mgorman@techsingularity.net \
    --cc=rientjes@google.com \
    --cc=s.priebe@profihost.ag \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@intel.com \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox