From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	 Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,  Zi Yan <zi.yan@cs.rutgers.edu>,
	 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"
Date: Wed, 29 May 2019 14:24:33 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1905281907060.86034@chino.kir.corp.google.com> (raw)
In-Reply-To: <20190523175737.2fb5b997df85b5d117092b5b@linux-foundation.org>

On Thu, 23 May 2019, Andrew Morton wrote:

> > We are going in circles.  *Yes*, there is a potential for swap storms
> > today because of the poor interaction between memory compaction and
> > directed reclaim, but this is a result of a poor API that does not allow
> > userspace to specify that its workload really will span multiple sockets
> > and that faulting remotely is therefore the best course of action.  The
> > fix is not to cause regressions for others who have implemented a
> > userspace stack based on the past 3+ years of long-standing behavior,
> > nor for specialized workloads that are known to span multiple sockets
> > and so want some kind of different behavior.  We need to provide a clear
> > and stable API to define these terms for the page allocator that is
> > independent of any global setting of thp enabled, defrag,
> > zone_reclaim_mode, etc.  It's workload dependent.
> 
> um, who is going to do this work?
> 
> Implementing a new API doesn't help existing userspace which is hurting
> from the problem which this patch addresses.
> 

The problem which this patch addresses has apparently gone unreported for 
4+ years since

commit 077fcf116c8c2bd7ee9487b645aa3b50368db7e1
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Feb 11 15:27:12 2015 -0800

    mm/thp: allocate transparent hugepages on local node
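
To make the behavior in question concrete, here is a simplified sketch (not 
the code from the commit above) of what node-local THP allocation means at 
fault time: the gfp mask carries __GFP_THISNODE, so the allocator tries only 
the faulting node for the hugepage and a local failure falls back to small 
pages rather than to a remote hugepage.

#include <linux/gfp.h>
#include <linux/huge_mm.h>
#include <linux/topology.h>

static struct page *thp_alloc_node_local(void)
{
	/* Restrict the hugepage to the local node; no remote fallback. */
	gfp_t gfp = GFP_TRANSHUGE | __GFP_THISNODE;

	return alloc_pages_node(numa_node_id(), gfp, HPAGE_PMD_ORDER);
}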

My goal is to reach a solution that does not cause anybody to incur 
performance penalties as a result of it.  It's surprising that such a 
severe swap storm issue, one that went unnoticed for four years, cannot be 
addressed by an amicable solution that doesn't cause other users to 
regress.

> It does appear to me that this patch does more good than harm for the
> totality of kernel users, so I'm inclined to push it through and to try
> to talk Linus out of reverting it again.  
> 

(1) The totality of kernel users is not running workloads that span 
multiple sockets; only a minority are.  (2) It changes 4+ years of 
behavior based on the NUMA locality of hugepage allocations and provides 
no workaround for users who incur regressions as a result.  (3) It does 
not solve the underlying issue if remote memory is also fragmented or low 
on memory; it actually makes the problem worse.

The easiest solution would be to define the MADV_HUGEPAGE behavior 
explicitly in sysfs: local or remote.  Default to local, the behavior of 
the past four years, and allow users to specify remote if their workloads 
will span multiple sockets.  This is somewhat coarse, but no more so than 
the thp defrag setting in sysfs today, which defines defrag behavior for 
everybody on the system.
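
For illustration, a minimal userspace sketch of the existing opt-in 
follows.  Only the madvise() call below exists today; the sysfs path named 
in the comment is hypothetical and only stands in for whatever knob would 
select local versus remote behavior for MADV_HUGEPAGE mappings.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1UL << 30;		/* region expected to be backed by THP */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Opt this mapping into transparent hugepages.  Under the proposal
	 * above, whether the kernel insists on node-local hugepages or may
	 * fault them remotely would be chosen via sysfs, e.g. (hypothetical
	 * path):
	 *
	 *   echo remote > /sys/kernel/mm/transparent_hugepage/madv_hugepage_policy
	 */
	if (madvise(buf, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* ... fault in and use the memory ... */
	munmap(buf, len);
	return 0;
}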


Thread overview: 24+ messages
2019-05-03 22:31 [PATCH 0/2] reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Andrea Arcangeli
2019-05-03 22:31 ` [PATCH 1/2] Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Andrea Arcangeli
2019-05-04 12:03   ` Michal Hocko
2019-05-03 22:31 ` [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations" Andrea Arcangeli
2019-05-04 12:11   ` Michal Hocko
2019-05-09  8:38   ` Mel Gorman
2019-05-15 20:26   ` David Rientjes
2019-05-20 15:36     ` Mel Gorman
2019-05-20 17:54       ` David Rientjes
2019-05-24  0:57         ` Andrew Morton
2019-05-24 20:29           ` Andrea Arcangeli
2019-05-29  2:06             ` David Rientjes
2019-05-29 21:24           ` David Rientjes [this message]
2019-05-31  9:22             ` Michal Hocko
2019-05-31 21:53               ` David Rientjes
2019-06-05  9:32                 ` Michal Hocko
2019-06-06 22:12                   ` David Rientjes
2019-06-07  8:32                     ` Michal Hocko
2019-06-13 20:17                       ` David Rientjes
2019-06-21 21:16                     ` David Rientjes
2019-07-30 13:11           ` Michal Hocko
2019-07-30 18:05             ` Andrew Morton
2019-07-31  8:17               ` Michal Hocko
2019-05-24 10:07         ` Mel Gorman
