linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	xfs@oss.sgi.com, ppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Date: Sun, 8 Mar 2015 11:02:23 +0100	[thread overview]
Message-ID: <20150308100223.GC15487@gmail.com> (raw)
In-Reply-To: <CA+55aFwDuzpL-k8LsV3touhNLh+TFSLKP8+-nPwMXkWXDYPhrg@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > And the patch Dave bisected to is a relatively simple patch. Why 
> > not simply revert it to see whether that cures much of the 
> > problem?
> 
> So the problem with that is that "pmd_set_numa()" and friends simply 
> no longer exist. So we can't just revert that one patch, it's the 
> whole series, and the whole point of the series.

Yeah.

> What confuses me is that the only real change that I can see in that 
> patch is the change to "change_huge_pmd()". Everything else is 
> pretty much a 100% equivalent transformation, afaik. Of course, I 
> may be wrong about that, and missing something silly.

Well, there's a difference in what we write to the pte:

 #define _PAGE_BIT_NUMA          (_PAGE_BIT_GLOBAL+1)
 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

and our expectation was that the two should be equivalent methods from 
the POV of the NUMA balancing code, right?

> And the changes to "change_huge_pmd()" were basically re-done
> differently by subsequent patches anyway.
> 
> The *only* change I see remaining is that change_huge_pmd() now does
> 
>    entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>    entry = pmd_modify(entry, newprot);
>    set_pmd_at(mm, addr, pmd, entry);
> 
> for all changes. It used to do that "pmdp_set_numa()" for the
> prot_numa case, which did just
> 
>    pmd_t pmd = *pmdp;
>    pmd = pmd_mknuma(pmd);
>    set_pmd_at(mm, addr, pmdp, pmd);
> 
> instead.
> 
> I don't like the old pmdp_set_numa() because it can drop dirty bits,
> so I think the old code was actively buggy.

Could we, as a silly testing hack not to be applied, write a 
hack-patch that re-introduces the racy way of setting the NUMA bit, to 
confirm that it is indeed this difference that changes pte visibility 
across CPUs enough to create so many more faults?

Because if the answer is 'yes', then we can safely say: 'we regressed 
performance because correctness [not dropping dirty bits] comes before 
performance'.

If the answer is 'no', then we still have a mystery (and a regression) 
to track down.

As a second hack (not to be applied), could we change:

 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

to:

 #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)

to double check that the position of the bit does not matter?

I don't think we've exhaused all avenues of analysis here.

Thanks,

	Ingo

  reply	other threads:[~2015-03-08 10:02 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-07 15:20 [RFC PATCH 0/4] Automatic NUMA balancing and PROT_NONE handling followup v2r8 Mel Gorman
2015-03-07 15:20 ` [PATCH 1/4] mm: thp: Return the correct value for change_huge_pmd Mel Gorman
2015-03-07 20:13   ` Linus Torvalds
2015-03-07 20:31   ` Linus Torvalds
2015-03-07 20:56     ` Mel Gorman
2015-03-07 15:20 ` [PATCH 2/4] mm: numa: Remove migrate_ratelimited Mel Gorman
2015-03-07 15:20 ` [PATCH 3/4] mm: numa: Mark huge PTEs young when clearing NUMA hinting faults Mel Gorman
2015-03-07 18:33   ` Linus Torvalds
2015-03-07 18:42     ` Linus Torvalds
2015-03-07 15:20 ` [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur Mel Gorman
2015-03-07 16:36   ` Ingo Molnar
2015-03-07 17:37     ` Mel Gorman
2015-03-08  9:54       ` Ingo Molnar
2015-03-07 19:12     ` Linus Torvalds
2015-03-08 10:02       ` Ingo Molnar [this message]
2015-03-08 18:35         ` Linus Torvalds
2015-03-08 18:46           ` Linus Torvalds
2015-03-09 11:29           ` Dave Chinner
2015-03-09 16:52             ` Linus Torvalds
2015-03-09 19:19               ` Dave Chinner
2015-03-10 23:55                 ` Linus Torvalds
2015-03-12 13:10                   ` Mel Gorman
2015-03-12 16:20                     ` Linus Torvalds
2015-03-12 18:49                       ` Mel Gorman
2015-03-17  7:06                         ` Dave Chinner
2015-03-17 16:53                           ` Linus Torvalds
2015-03-17 20:51                             ` Dave Chinner
2015-03-17 21:30                               ` Linus Torvalds
2015-03-17 22:08                                 ` Dave Chinner
2015-03-18 16:08                                   ` Linus Torvalds
2015-03-18 17:31                                     ` Linus Torvalds
2015-03-18 22:23                                       ` Dave Chinner
2015-03-19 14:10                                       ` Mel Gorman
2015-03-19 18:09                                         ` Linus Torvalds
2015-03-19 21:41                                       ` Linus Torvalds
2015-03-19 22:41                                         ` Dave Chinner
2015-03-19 23:05                                           ` Linus Torvalds
2015-03-19 23:23                                             ` Dave Chinner
2015-03-20  0:23                                             ` Dave Chinner
2015-03-20  1:29                                               ` Linus Torvalds
2015-03-20  4:13                                                 ` Dave Chinner
2015-03-20 17:02                                                   ` Linus Torvalds
2015-03-23 12:01                                                     ` Mel Gorman
2015-03-20 10:12                                                 ` Mel Gorman
2015-03-20  9:56                                             ` Mel Gorman
2015-03-08 20:40         ` Mel Gorman
2015-03-09 21:02           ` Mel Gorman
2015-03-10 13:08             ` Mel Gorman
2015-03-08  9:41   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150308100223.GC15487@gmail.com \
    --to=mingo@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=david@fromorbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).