linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Michal Hocko <mhocko@suse.com>, Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Sun, 8 Sep 2019 14:47:08 +0200
Message-ID: <d76f8cc3-97aa-8da5-408d-397467ea768b@suse.cz>
In-Reply-To: <alpine.DEB.2.21.1909071829440.200558@chino.kir.corp.google.com>

On 9/8/19 3:50 AM, David Rientjes wrote:
> On Sat, 7 Sep 2019, Linus Torvalds wrote:
> 
>>> Andrea acknowledges the swap storm that he reported would be fixed with
>>> the last two patches in this series
>>
>> The problem is that even you aren't arguing that those patches should
>> go into 5.3.
>>
> 
> For three reasons: (a) we lack a test result from Andrea,

That's an argument against the RFC patches 3+4, no? But not an argument for
including the reverts of reverts of reverts (patches 1+2).

> (b) there's 
> on-going discussion, particularly based on Vlastimil's feedback, and 

I doubt this will be finished and tested with reasonable confidence even
for the 5.4 merge window.

> (c) the patches will be refreshed incorporating that feedback as well as 
> Mike's suggestion to exempt __GFP_RETRY_MAYFAIL for hugetlb.

There might be other unexpected consequences (even if hugetlb wasn't
such an issue as I suspected, in the end).

>> So those fixes aren't going in, so "the swap storms would be fixed"
>> argument isn't actually an argument at all as far as 5.3 is concerned.
>>
> 
> It indicates that progress has been made to address the actual bug without 
> introducing long-lived access latency regressions for others, particularly 
> those who use MADV_HUGEPAGE.  In the worst case, some systems running 
> 5.3-rc4 and 5.3-rc5 have the same amount of memory backed by hugepages but 
> on 5.3-rc5 the vast majority of it is allocated remotely.  This incurs a

It's been said before, but such sensitive code generally relies on
mempolicies or node reclaim mode, not THP __GFP_THISNODE implementation
details. Or if you know there's enough free memory and it just needs to be
compacted, you could do it once via sysfs before starting up your workload.
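
To make the one-off compaction concrete, here is a minimal sketch, assuming
the per-node trigger file /sys/devices/system/node/nodeN/compact is present
(it needs a kernel built with compaction and NUMA support, and root to write
to it). The helper names and the sysfs_root parameter are made up for
illustration and are not taken from any patch in this thread.

```python
from pathlib import Path


def node_compact_path(node: int, sysfs_root: str = "/sys") -> Path:
    """Return the per-node compaction trigger file for NUMA node `node`."""
    return Path(sysfs_root) / "devices" / "system" / "node" / f"node{node}" / "compact"


def compact_node(node: int, sysfs_root: str = "/sys") -> Path:
    """Request compaction of one node by writing to its trigger file.

    `sysfs_root` is a hypothetical knob that exists only so the helper can be
    pointed at a scratch directory instead of the real /sys.
    """
    path = node_compact_path(node, sysfs_root)
    # Any write triggers compaction of the node; "1" is used by convention.
    path.write_text("1\n")
    return path
```

Calling compact_node(0) as root before starting the workload would be the
one-off step. Binding the workload to a node is a separate matter, e.g. via
numactl --membind or set_mempolicy(2), and is not covered by this sketch.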

> significant performance regression regardless of platform; the only thing 
> needed to induce this is a fragmented local node that would otherwise be 
> compacted in 5.3-rc4 rather than quickly allocate remote on 5.3-rc5.
> 
>> End result: we'd have the qemu-kvm instance performance problem in 5.3
>> that apparently causes distros to apply those patches that you want to
>> revert anyway.
>>
>> So reverting would just make distros not use 5.3 in that form.
>>
> 
> I'm arguing to revert 5.3 back to the behavior that we have had for years 
> and actually fix the bug that everybody else seems to be ignoring and then 
> *backport* those fixes to 5.3 stable and every other stable tree that can 
> use them.  Introducing a new mempolicy for NUMA locality into 5.3.0 that

I think it's rather removing the problematic implicit mempolicy of
__GFP_THISNODE.

> will subsequently be changed in future 5.3 stable kernels and differs from 
> all kernels from the past few years is not in anybody's best interest if 
> the actual problem can be fixed.  It requires more feedback than a 
> one-line "the swap storms would be fixed with this."  That collaboration 
> takes time and isn't something that should be rushed into 5.3-rc5.
> 
> Yes, we can fix NUMA locality of hugepages when a workload like qemu is 
> larger than a single socket; the vast majority of workloads in the 
> datacenter are smaller than a socket and *cannot* incur the performance 
> penalty if local memory is fragmented that 5.3-rc5 introduces.
> 
> In other words, 5.3-rc5 is only fixing a highly specialized usecase where 
> remote allocation is acceptable because the workload is larger than a 
> socket *and* remote memory is not low on memory or fragmented.  If you

Clearly we disagree here on which is the highly specialized use case that
might get slower remote memory access, and which is the more common workload
that will suffer from swap storms. No point arguing it further, but
several distros have already made the choice by carrying Andrea's patches.

> consider the opposite of that, workloads smaller than a socket or local 
> compaction actually works, this has introduced a measurable regression for 
> everybody else.
> 
> I'm not sure why we are ignoring a painfully obvious bug in the page 
> allocator because of a poor feedback loop between itself and memory 
> compaction and rather papering over it by falling back to remote memory 
> when NUMA actually does matter.  If you release 5.3 without the first two 
> patches in this series, I wouldn't expect any additional feedback or test 
> results to fix this bug considering all we have gotten so far is "this 
> would fix the swap storms" and not collaborating to fix the issue for 
> everybody rather than only caring about their own workloads.  At least my 
> patches acknowledge and try to fix the issue the other is encountering.

I might have missed something, but you were asked for a reproducer of
your use case so others can develop patches with it in mind? Mel did
provide a simple example that shows the swap storms very easily.

