linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andi Kleen <ak@linux.intel.com>
Cc: Christopher Lameter <cl@linux.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Liang, Kan" <kan.liang@intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Tue, 22 Aug 2017 15:52:17 -0700	[thread overview]
Message-ID: <CA+55aFw_-RmdWF6mPHonnqoJcMEmjhvjzcwp5OU7Uwzk3KPNmw@mail.gmail.com> (raw)
In-Reply-To: <20170822212408.GC28715@tassilo.jf.intel.com>

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

On Tue, Aug 22, 2017 at 2:24 PM, Andi Kleen <ak@linux.intel.com> wrote:
>
> I believe in this case it's used by threads, so a reference count limit
> wouldn't help.

For the first migration try, yes. But if it's some kind of "try and
try again" pattern, the second time you try and there are people
waiting for the page, the page count (not the map count) would be
elevanted.

So it's possible that depending on exactly what the deeper problem is,
the "this page is very busy, don't migrate" case might be
discoverable, and the page count might be part of it.

However, after PeterZ made that comment that page migration should
have that should_numa_migrate_memory() filter, I am looking at that
mpol_misplaced() code.

And honestly, that MPOL_PREFERRED / MPOL_F_LOCAL case really looks
like complete garbage to me.

It looks like garbage exactly because it says "always migrate to the
current node", but that's crazy - if it's a group of threads all
running together on the same VM, that obviously will just bounce the
page around for absolute zero good ewason.

The *other* memory policies look fairly sane. They basically have a
fairly well-defined preferred node for the policy (although the
"MPOL_INTERLEAVE" looks wrong for a hugepage).  But
MPOL_PREFERRED/MPOL_F_LOCAL really looks completely broken.

Maybe people expected that anybody who uses MPOL_F_LOCAL will also
bind all threads to one single node?

Could we perhaps make that "MPOL_PREFERRED / MPOL_F_LOCAL" case just
do the MPOL_F_MORON policy, which *does* use that "should I migrate to
the local node" filter?

IOW, we've been looking at the waiters (because the problem shows up
due to the excessive wait queues), but maybe the source of the problem
comes from the numa balancing code just insanely bouncing pages
back-and-forth if you use that "always balance to local node" thing.

Untested (as always) patch attached.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 840 bytes --]

 mm/mempolicy.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 618ab125228b..f2d5aab84c49 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2190,9 +2190,9 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	case MPOL_PREFERRED:
 		if (pol->flags & MPOL_F_LOCAL)
-			polnid = numa_node_id();
-		else
-			polnid = pol->v.preferred_node;
+			goto local_node;
+
+		polnid = pol->v.preferred_node;
 		break;
 
 	case MPOL_BIND:
@@ -2218,6 +2218,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MORON) {
+local_node:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))

  reply	other threads:[~2017-08-22 22:52 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-15  0:52 [PATCH 1/2] sched/wait: Break up long wake list walk Tim Chen
2017-08-15  0:52 ` [PATCH 2/2] sched/wait: Introduce lock breaker in wake_up_page_bit Tim Chen
2017-08-15  1:48 ` [PATCH 1/2] sched/wait: Break up long wake list walk Linus Torvalds
2017-08-15  2:27   ` Andi Kleen
2017-08-15  2:52     ` Linus Torvalds
2017-08-15  3:15       ` Andi Kleen
2017-08-15  3:28         ` Linus Torvalds
2017-08-15 19:05           ` Tim Chen
2017-08-15 19:41             ` Linus Torvalds
2017-08-15 19:47               ` Linus Torvalds
2017-08-15 22:47           ` Davidlohr Bueso
2017-08-15 22:56             ` Linus Torvalds
2017-08-15 22:57               ` Linus Torvalds
2017-08-15 23:50                 ` Linus Torvalds
2017-08-16 23:22                   ` Eric W. Biederman
2017-08-17 16:17   ` Liang, Kan
2017-08-17 16:25     ` Linus Torvalds
2017-08-17 20:18       ` Liang, Kan
2017-08-17 20:44         ` Linus Torvalds
2017-08-18 12:23           ` Mel Gorman
2017-08-18 14:20             ` Liang, Kan
2017-08-18 14:46               ` Mel Gorman
2017-08-18 16:36                 ` Tim Chen
2017-08-18 16:45                   ` Andi Kleen
2017-08-18 16:53                 ` Liang, Kan
2017-08-18 17:48                   ` Linus Torvalds
2017-08-18 18:54                     ` Mel Gorman
2017-08-18 19:14                       ` Linus Torvalds
2017-08-18 19:58                         ` Andi Kleen
2017-08-18 20:10                           ` Linus Torvalds
2017-08-21 18:32                         ` Mel Gorman
2017-08-21 18:56                           ` Liang, Kan
2017-08-22 17:23                             ` Liang, Kan
2017-08-22 18:19                               ` Linus Torvalds
2017-08-22 18:25                                 ` Linus Torvalds
2017-08-22 18:56                                 ` Peter Zijlstra
2017-08-22 19:15                                   ` Linus Torvalds
2017-08-22 19:08                                 ` Peter Zijlstra
2017-08-22 19:30                                   ` Linus Torvalds
2017-08-22 19:37                                     ` Andi Kleen
2017-08-22 21:08                                       ` Christopher Lameter
2017-08-22 21:24                                         ` Andi Kleen
2017-08-22 22:52                                           ` Linus Torvalds [this message]
2017-08-22 23:19                                             ` Linus Torvalds
2017-08-23 14:51                                             ` Liang, Kan
2017-08-22 19:55                                 ` Liang, Kan
2017-08-22 20:42                                   ` Linus Torvalds
2017-08-22 20:53                                     ` Peter Zijlstra
2017-08-22 20:58                                       ` Linus Torvalds
2017-08-23 14:49                                     ` Liang, Kan
2017-08-23 15:58                                       ` Tim Chen
2017-08-23 18:17                                         ` Linus Torvalds
2017-08-23 20:55                                           ` Liang, Kan
2017-08-23 23:30                                           ` Linus Torvalds
2017-08-24 17:49                                             ` Tim Chen
2017-08-24 18:16                                               ` Linus Torvalds
2017-08-24 20:44                                                 ` Mel Gorman
2017-08-25 16:44                                                   ` Tim Chen
2017-08-23 16:04                                 ` Mel Gorman
2017-08-18 20:05                     ` Andi Kleen
2017-08-18 20:29                       ` Linus Torvalds
2017-08-18 20:29                     ` Liang, Kan
2017-08-18 20:34                       ` Linus Torvalds
2017-08-18 16:55             ` Linus Torvalds
2017-08-18 13:06           ` Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFw_-RmdWF6mPHonnqoJcMEmjhvjzcwp5OU7Uwzk3KPNmw@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=kan.liang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).