All of lore.kernel.org
 help / color / mirror / Atom feed
* + mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node.patch added to -mm tree
@ 2012-04-26 21:18 akpm
  0 siblings, 0 replies; only message in thread
From: akpm @ 2012-04-26 21:18 UTC (permalink / raw)
  To: mm-commits; +Cc: lwoodman, kamezawa.hiroyu, mel, mkosaki, riel


The patch titled
     Subject: mm: do_migrate_pages() calls migrate_to_node() even if task is already on a correct node
has been added to the -mm tree.  Its filename is
     mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Larry Woodman <lwoodman@redhat.com>
Subject: mm: do_migrate_pages() calls migrate_to_node() even if task is already on a correct node

While running an application that moves tasks from one cpuset to another I
noticed that it takes much longer and moves many more pages than expected.
 The reason for this is do_migrate_pages() does its best to preserve the
relative node differential from the first node of the cpuset because the
application may have been written with that in mind.  If memory was
interleaved on the nodes of the source cpuset by an application
do_migrate_pages() will try its best to maintain that interleaving on the
nodes of the destination cpuset.  This means copying the memory from all
source nodes to the destination nodes even if the source and destination
nodes overlap.

This is a problem for userspace NUMA placement tools.  The amount of time
spent doing extra memory moves cancels out some of the NUMA performance
improvements.  Furthermore, if the number of source and destination nodes
are to maintain the previous interleaving layout anyway.

This patch changes do_migrate_pages() to only preserve the relative layout
inside the program if the number of NUMA nodes in the source and
destination mask are the same.  If the number is different, we do a much
more efficient migration by not touching memory that is in an allowed
node.

This preserves the old behaviour for programs that want it, while allowing
a userspace NUMA placement tool to use the new, faster migration.  This
improves performance in our tests by up to a factor of 7.

Without this change migrating tasks from a cpuset containing nodes 0-7 to
a cpuset containing nodes 3-4, we migrate from ALL the nodes even if they
are in the both the source and destination nodesets:

   Migrating 7 to 4
   Migrating 6 to 3
   Migrating 5 to 4
   Migrating 4 to 3
   Migrating 1 to 4
   Migrating 3 to 4
   Migrating 0 to 3
   Migrating 2 to 3

With this change we only migrate from nodes that are not in the
destination nodesets:

   Migrating 7 to 4
   Migrating 6 to 3
   Migrating 5 to 4
   Migrating 2 to 3
   Migrating 1 to 4
   Migrating 0 to 3

Yet if we move from a cpuset containing nodes 2,3,4 to a cpuset
containing 3,4,5 we still
do move everything so that we preserve the desired NUMA offsets:

Migrating 4 to 5
Migrating 3 to 4
Migrating 2 to 3

As far as performance is concerned this simple patch improves the time it
takes to move 14, 20 and 26 large tasks from a cpuset containing nodes 0-7
to a cpuset containing nodes 1 & 3 by up to a factor of 7.  Here are the
timings with and without the patch:

BEFORE PATCH -- Move times: 59, 140, 651 seconds
============

Moving 14 tasks from nodes (0-7) to nodes (1,3)
numad(8780) do_migrate_pages (mm=0xffff88081d414400
from_nodes=0xffff880818c81d28 to_nodes=0xffff880818c81ce8 flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x7 dest=0x3
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x6 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x5 dest=0x3
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x4 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x2 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x1 dest=0x3
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d414400 source=0x0 dest=0x1
flags=0x4)
(Above moves repeated for each of the 14 tasks...)
PID 8890 moved to node(s) 1,3 in 59.2 seconds

Moving 20 tasks from nodes (0-7) to nodes (1,4-5)
numad(8780) do_migrate_pages (mm=0xffff88081d88c700
from_nodes=0xffff880818c81d28 to_nodes=0xffff880818c81ce8 flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x7 dest=0x4
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x6 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x3 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x2 dest=0x5
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x1 dest=0x4
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d88c700 source=0x0 dest=0x1
flags=0x4)
(Above moves repeated for each of the 20 tasks...)
PID 8962 moved to node(s) 1,4-5 in 139.88 seconds

Moving 26 tasks from nodes (0-7) to nodes (1-3,5)
numad(8780) do_migrate_pages (mm=0xffff88081d5bc740
from_nodes=0xffff880818c81d28 to_nodes=0xffff880818c81ce8 flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x7 dest=0x5
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x6 dest=0x3
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x5 dest=0x2
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x3 dest=0x5
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x2 dest=0x3
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x1 dest=0x2
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x0 dest=0x1
flags=0x4)
numad(8780) migrate_to_node (mm=0xffff88081d5bc740 source=0x4 dest=0x1
flags=0x4)
(Above moves repeated for each of the 26 tasks...)
PID 9058 moved to node(s) 1-3,5 in 651.45 seconds

AFTER PATCH -- Move times: 42, 56, 93 seconds
===========

Moving 14 tasks from nodes (0-7) to nodes (5,7)
numad(33209) do_migrate_pages (mm=0xffff88101d5ff140
from_nodes=0xffff88101e7b5d28 to_nodes=0xffff88101e7b5ce8 flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x6 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x4 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x3 dest=0x7
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x2 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x1 dest=0x7
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d5ff140 source=0x0 dest=0x5
flags=0x4)
(Above moves repeated for each of the 14 tasks...)
PID 33221 moved to node(s) 5,7 in 41.67 seconds

Moving 20 tasks from nodes (0-7) to nodes (1,3,5)
numad(33209) do_migrate_pages (mm=0xffff88101d6c37c0
from_nodes=0xffff88101e7b5d28 to_nodes=0xffff88101e7b5ce8 flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d6c37c0 source=0x7 dest=0x3
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d6c37c0 source=0x6 dest=0x1
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d6c37c0 source=0x4 dest=0x3
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d6c37c0 source=0x2 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d6c37c0 source=0x0 dest=0x1
flags=0x4)
(Above moves repeated for each of the 20 tasks...)
PID 33289 moved to node(s) 1,3,5 in 56.3 seconds

Moving 26 tasks from nodes (0-7) to nodes (1,3,5,7)
numad(33209) do_migrate_pages (mm=0xffff88101d924400
from_nodes=0xffff88101e7b5d28 to_nodes=0xffff88101e7b5ce8 flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d924400 source=0x6 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d924400 source=0x4 dest=0x1
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d924400 source=0x2 dest=0x5
flags=0x4)
numad(33209) migrate_to_node (mm=0xffff88101d924400 source=0x0 dest=0x1
flags=0x4)
(Above moves repeated for each of the 26 tasks...)
PID 33372 moved to node(s) 1,3,5,7 in 92.67 seconds

Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <mkosaki@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff -puN mm/mempolicy.c~mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node mm/mempolicy.c
--- a/mm/mempolicy.c~mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node
+++ a/mm/mempolicy.c
@@ -1012,6 +1012,26 @@ int do_migrate_pages(struct mm_struct *m
 		int dest = 0;
 
 		for_each_node_mask(s, tmp) {
+
+			/*
+			 * do_migrate_pages() tries to maintain the
+			 * relative node relationship of the pages
+			 * established between threads and memory areas.
+                         *
+			 * However if the number of source nodes is not
+			 * equal to the number of destination nodes we
+			 * can not preserve this node relative relationship.
+			 * In that case, skip copying memory from a node that
+			 * is in the destination mask.
+			 *
+			 * Example: [2,3,4] -> [3,4,5] moves everything.
+			 *                 [0-7] - > [3,4,5] moves only 0,1,2,6,7.
+			 */
+
+			if ((nodes_weight(*from_nodes) != nodes_weight(*to_nodes)) &&
+						(node_isset(s, *to_nodes)))
+				continue;
+
 			d = node_remap(s, *from_nodes, *to_nodes);
 			if (s == d)
 				continue;
_
Subject: Subject: mm: do_migrate_pages() calls migrate_to_node() even if task is already on a correct node

Patches currently in -mm which might be from lwoodman@redhat.com are

mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node.patch
mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node-fix.patch
mm-do_migrate_pages-rename-arguments.patch


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2012-04-26 21:18 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-26 21:18 + mm-do_migrate_pages-calls-migrate_to_node-even-if-task-is-already-on-a-correct-node.patch added to -mm tree akpm

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.