From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	ying.huang@intel.com, Andrea Arcangeli <aarcange@redhat.com>,
	s.priebe@profihost.ag, mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Wed, 5 Dec 2018 11:16:05 -0800 (PST)
Message-ID: <alpine.DEB.2.21.1812051103560.240991@chino.kir.corp.google.com>
In-Reply-To: <20181205101836.GF1286@dhcp22.suse.cz>

On Wed, 5 Dec 2018, Michal Hocko wrote:

> > It isn't specific to MADV_HUGEPAGE, it is the policy for all transparent 
> > hugepage allocations, including defrag=always.  We agree that 
> > MADV_HUGEPAGE is not exactly defined: does it mean try harder to allocate 
> > a hugepage locally, try compaction synchronous to the fault, allow remote 
> > fallback?  It's undefined.
> 
> Yeah, it is certainly underdefined. One thing is clear though. Using
> MADV_HUGEPAGE implies that the specific mapping benefits from THPs and
> is willing to pay associated init cost. This doesn't imply anything
> regarding NUMA locality and as we have NUMA API it shouldn't even
> attempt to do so because it would be conflating two things.
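
The defrag=always setting mentioned above is the system-wide THP defrag policy. It is exposed in /sys/kernel/mm/transparent_hugepage/defrag, and the on/off switch lives in /sys/kernel/mm/transparent_hugepage/enabled. In both files the active choice is shown in brackets. What a MADV_HUGEPAGE fault actually does depends on that policy. Below is a minimal sketch for reading the active mode. It assumes that bracketed format, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) entry of a THP sysfs
 * mode string, e.g. "always defer defer+madvise [madvise] never" ->
 * "madvise". Returns 0 on success, -1 if there is no bracketed entry or
 * it does not fit in out. */
static int thp_active_mode(const char *contents, char *out, size_t outlen)
{
    assert(contents != NULL && out != NULL && outlen > 0);
    const char *open = strchr(contents, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n >= outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: read one THP sysfs file (for example
 * /sys/kernel/mm/transparent_hugepage/defrag) and extract its active mode.
 * Returns -1 if the file cannot be read or parsed. */
static int read_thp_mode(const char *path, char *out, size_t outlen)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_active_mode(buf, out, outlen);
}
```

For example, read_thp_mode("/sys/kernel/mm/transparent_hugepage/defrag", buf, sizeof buf) yields "madvise" on a system using the madvise defrag mode.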

This is exactly why we use MADV_HUGEPAGE when remapping our text segment
to be backed by transparent hugepages: we want to pay the cost at startup
to fault THP, and that involves synchronous memory compaction rather than
quickly falling back to remote memory.  This is making the case for me.
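
One way to pay the THP fault cost up front is to advise and touch the memory at startup. Below is a simplified sketch for an anonymous region. It assumes 2 MiB PMD-sized hugepages, as on x86-64. It is not the actual text-segment remapping code referred to here, and the helper name is hypothetical.

Two details matter:
- Only 2 MiB-aligned chunks of a range can be backed by hugepages, so the sketch aligns the start.
- madvise() must run before the first touch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB PMD-sized transparent hugepages (x86-64). */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: map len bytes (rounded up to 2 MiB) of anonymous
 * memory at a 2 MiB-aligned address, request THPs with MADV_HUGEPAGE, then
 * write to every base page so the fault cost (including any synchronous
 * compaction the defrag policy allows) is paid now instead of later.
 * *advised is set to 1 if madvise() succeeded and 0 otherwise (for example
 * when THP is not configured). Returns NULL if mmap() fails. */
static void *prefault_hugepage_region(size_t len, int *advised)
{
    assert(advised != NULL);
    len = (len + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    size_t span = len + HPAGE_SIZE; /* slack so an aligned start exists */
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Unmap the unaligned head and the unused tail of the slack. */
    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + HPAGE_SIZE - 1) & ~(uintptr_t)(HPAGE_SIZE - 1);
    size_t head = (size_t)(aligned - start);
    size_t tail = span - head - len;
    if (head > 0)
        munmap(raw, head);
    if (tail > 0)
        munmap((char *)aligned + len, tail);

    /* Advise before the first touch: with THP in "madvise" mode, pages
     * faulted earlier would be base pages (khugepaged may collapse them
     * later, but that is not paying the cost up front). */
    *advised = madvise((void *)aligned, len, MADV_HUGEPAGE) == 0;

    /* Touch every base page so each fault happens now. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += page)
        ((volatile char *)aligned)[off] = 0;
    return (void *)aligned;
}
```

Whether the range actually ended up backed by hugepages can be checked afterwards in the AnonHugePages field of /proc/self/smaps.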

> > So to answer "what is so different about THP?", it's the performance data.  
> > The NUMA locality matters more than whether the pages are huge or not.  We 
> > also have the added benefit of khugepaged being able to collapse pages 
> > locally if fragmentation improves rather than being stuck accessing a 
> > remote hugepage forever.
> 
> Please back your claims with a variety of workloads, including the
> mentioned KVM one. You keep hand waving about access latency, completely
> ignoring all other aspects, and that makes my suspicion that you do not
> really appreciate all the complexity here even stronger.
> 

I discussed the tradeoff of local hugepages vs local pages vs remote
hugepages in https://marc.info/?l=linux-kernel&m=154077010828940 on
Broadwell, Haswell, and Rome.  When a single application does not fit on a
single node, we obviously need to extend the API to allow it to fault
remotely.  We can do that without changing the long-standing behavior of
preferring to fault only locally, and without causing real-world users to
regress.  Your suggestions about how we can extend the API are all very
logical.
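
The existing NUMA API can already express two different placements for an address range, independently of the THP hint:
- "prefer this node, but allow remote fallback";
- "this node only".

Below is a minimal sketch, assuming Linux with CONFIG_NUMA. It calls the raw mbind(2) syscall to avoid a libnuma dependency, and the helper names are hypothetical. It does not model how the kernel's THP fault path combines these policies, which is the behavior under discussion in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy modes copied from the kernel UAPI header <linux/mempolicy.h>,
 * defined locally so this sketch does not need libnuma. */
#define MPOL_PREFERRED 1
#define MPOL_BIND 2

/* Hypothetical helper: set the NUMA placement policy for [addr, addr+len).
 * strict != 0 uses MPOL_BIND (allocate only from `node`); strict == 0 uses
 * MPOL_PREFERRED (try `node` first, fall back to other nodes).
 * Returns 0 on success or -errno. */
static int set_node_policy(void *addr, size_t len, int node, int strict)
{
    assert(node >= 0 && (size_t)node < 8 * sizeof(unsigned long));
    unsigned long nodemask = 1UL << node;
    unsigned long mode = strict ? MPOL_BIND : MPOL_PREFERRED;
    /* maxnode is passed as mask width + 1, following libnuma's convention. */
    long rc = syscall(SYS_mbind, addr, (unsigned long)len, mode,
                      &nodemask, (unsigned long)(8 * sizeof(nodemask) + 1), 0UL);
    return rc == 0 ? 0 : -errno;
}

/* Hypothetical helper: ask for THPs on the same range. This says nothing
 * about which node backs them; placement is left to set_node_policy(). */
static int set_thp_hint(void *addr, size_t len)
{
    return madvise(addr, len, MADV_HUGEPAGE) == 0 ? 0 : -errno;
}
```

Calling both helpers on the same range keeps the two questions separate. MADV_HUGEPAGE expresses the page-size preference. The mbind(2) policy expresses locality: MPOL_PREFERRED allows remote fallback, and MPOL_BIND forbids it.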

 [ Note that this is not the regression being addressed here, however.
   That regression is massive swap storms due to a fragmented local node,
   which is why Andrea also proposed the __GFP_COMPACT_ONLY patch.  The
   ability to prefer faulting remotely is a worthwhile extension, but it
   does no good whatsoever if we can still hit massive swap storms, both
   locally and now remotely, because we didn't set __GFP_NORETRY
   appropriately (which both of our patches do). ]
