All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, peterz@infradead.org,
	mingo@kernel.org, aarcange@redhat.com
Subject: Re: [PATCH] sched,numa,mm: revert to checking pmd/pte_write instead of VMA flags
Date: Mon, 12 Sep 2016 11:09:43 -0400	[thread overview]
Message-ID: <1473692983.32433.235.camel@redhat.com> (raw)
In-Reply-To: <20160911162402.GA2775@suse.de>

[-- Attachment #1: Type: text/plain, Size: 2948 bytes --]

On Sun, 2016-09-11 at 17:24 +0100, Mel Gorman wrote:
> On Thu, Sep 08, 2016 at 09:30:53PM -0400, Rik van Riel wrote:
> > Commit 4d9424669946 ("mm: convert p[te|md]_mknonnuma and remaining
> > page table manipulations") changed NUMA balancing from _PAGE_NUMA
> > to using PROT_NONE, and was quickly found to introduce a regression
> > with NUMA grouping.
> > 
> > It was followed up by these changesets:
> > 
> > 53da3bc2ba9e ("mm: fix up numa read-only thread grouping logic")
> > bea66fbd11af ("mm: numa: group related processes based on VMA flags
> > instead of page table flags")
> > b191f9b106ea ("mm: numa: preserve PTE write permissions across a
> > NUMA hinting fault")
> > 
> > The first of those two changesets try alternate approaches to NUMA
> > grouping, which apparently do not work as well as looking at the
> > PTE
> > write permissions.
> > 
> > The latter patch preserves the PTE write permissions across a NUMA
> > protection fault. However, it forgets to revert the condition for
> > whether or not to group tasks together back to what it was before
> > 3.19, even though the information is now preserved in the page
> > tables
> > once again.
> > 
> > This patch brings the NUMA grouping heuristic back to what it was
> > before changeset 4d9424669946, which the changelogs of subsequent
> > changesets suggest worked best.
> > 
> > We have all the information again. We should probably use it.
> > 
> 
> Patch looks ok other than the comment above the second hunk being out
> of
> date. Out of curiousity, what workload benefitted from this? I saw a
> mix
> of marginal results when I ran this on a 2-socket and 4-socket box.

I did not performance test the change, because I believe
the VM_WRITE test has a small logical error.

Specifically, VM_WRITE is also true for VMAs that are
PROT_WRITE|MAP_PRIVATE, which we do NOT want to group
on. Every shared library mapped on my system seems to
have a (small) read-write VMA:

00007f5adacff000   1764K r-x-- libc-2.23.so
00007f5adaeb8000   2044K ----- libc-2.23.so
00007f5adb0b7000     16K r---- libc-2.23.so
00007f5adb0bb000      8K rw--- libc-2.23.so

In other words, the code that is currently upstream
could result in programs being grouped into a numa
group due to accesses to libc.so, if they happened
to get started up right at the same time.

This will not catch many programs, since most of them
will have private copies of the pages in the small
read-write segments by the time other programs start
up, but it could catch a few of them.

Testing on VM_WRITE|VM_SHARED would solve that issue,
but at that point it would be essentially identical
to reverting the code to the old pte_write() test
that we had in 3.19 and before.

I do not expect the performance impact to be visible,
except when somebody gets very unlucky with application
startup timing.

-- 

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

  reply	other threads:[~2016-09-12 15:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-09  1:30 [PATCH] sched,numa,mm: revert to checking pmd/pte_write instead of VMA flags Rik van Riel
2016-09-09  1:30 ` Rik van Riel
2016-09-11 16:24 ` Mel Gorman
2016-09-11 16:24   ` Mel Gorman
2016-09-12 15:09   ` Rik van Riel [this message]
2016-09-14  9:25     ` Mel Gorman
2016-09-14  9:25       ` Mel Gorman
2016-09-13 22:07 ` [tip:sched/core] sched/numa, mm: Revert " tip-bot for Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1473692983.32433.235.camel@redhat.com \
    --to=riel@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.