* [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
@ 2005-06-22 16:39 Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 1/10] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

Summary
-------

This is the -rc3 version of the manual page migration facility
that I proposed in February and that was discussed on the
linux-mm mailing list.  This overview is relatively short, since
the design is effectively unchanged from what I submitted on
April 6, 2005.  For details, see the overview I sent out then at:

http://marc.theaimsgroup.com/?l=linux-mm&m=111276123522952&w=2

For details of the -rc2 version of this patchset, see:

http://marc.theaimsgroup.com/?l=linux-mm&m=111578651020174&w=2

This patchset differs from the previous patchset in the following ways:

(1)  The previous patchset was based on 2.6.12-rc3-mhp1; this patchset
     is based on patch-2.6.12-rc5-mhp1-pm.gz from www.sr71.net/patches/
     2.6.12 (part of the Memory Hotplug project patchset maintained by
     Dave Hansen).

(2)  The previous patchset used an XFS extended attribute to
     help the kernel code recognize mapped files as being
     shared libraries and to identify files that were not to
     be migrated.  The current patchset uses the following
     algorithm to determine which VMAs should be migrated:

     (1)  Anonymous VMAs are migrated.
     (2)  VMAs for mapped files are migrated if they have
          VM_WRITE set in the vm_flags field.

     This correctly handles shared libraries and R/O data
     files that are mapped out of /lib and /usr/lib.  It does
     not cause the executable to be migrated, nor does it
     correctly handle r/o (user) data files that are mapped
     into the process address space.

     To deal with these cases (as well as to allow the
     user-level migration library to have some control
     over what things are migrated), this patchset also
     supports modifying the migration policy on a file-by-file
     basis through use of the mbind() system call.

     For details, see the patch:  add-mempolicy-control-rc3.patch.

(3)  Some code changes and bug fixes were made.  For details,
     see the patch:  add-sys_migrate_pages-rc3.patch

(4)  Changes suggested by Paul Jackson and Christoph Hellwig
     have been incorporated into this patchset.

If this patch is acceptable to the Memory Hotplug Team, I'd like
to see it added to the page migration sequence of patches in
the memory hotplug patch.

This patchset adds a parameter to try_to_migrate_pages().
The last patch of this series:

N1.2-add-nodemap-to-try_to_migrate_pages-call.patch

should be inserted in the memory hotplug patchset after the
patch N1.1-pass-page_list-to-steal_page.patch to fix up
the call to try_to_migrate_pages() from capture_page_range()
in mm/page_alloc.c.

As always, suggestions, flames, etc should be directed to me
at raybry@sgi.com or raybry@austin.rr.com.

Description of the patches in this patchset
-------------------------------------------

Recall that all of these patches apply to 2.6.12-rc5 with the
page-migration patches applied first.  The simplest way to do
this is to obtain the Memory Hotplug broken out patches from

http://sr71.net/patches/2.6.12/2.6.12-rc5-mhp1/broken-out-2.6.12-rc5-mhp1.tar.gz

and then to add patches 1-9 of this patchset to the series file
after the patch "AA-PM-99-x86_64-IMMOVABLE.patch".  (Patch 10
goes after N1.1-pass-page_list-to-steal_page.patch.) Then apply all
patches up through the 9th patch of this set and turn on the
CONFIG_MEMORY_MIGRATE option.  This works on Altix, at least;
that is the only NUMA machine I have access to at the moment.

(I've run into some minor problems with the page-migration patches
patches under http://sr71.net/patches/2.6.12/2.6.12-rc5-mhp1/page-migration.
Nothing significant, but applying the broken-out patches worked
better for me this time.)

The 10th patch is only needed if you want to try to build the
entire mhp1 patchset after applying the manual page migration
patches.

Patch 1: hirokazu-steal_page_from_lru.patch
	 This patch (due to Hirokazu Takahashi) simplifies the interface
	 to steal_page_from_lru() and is not yet present in the 2.6.12-rc5-mhp1
	 patchset.

Patch 2: xfs-migrate-page-rc3.patch
	 This patch, due to Nathan Scott at SGI, provides a migrate_page
	 method for XFS.  EXT2 and EXT3 already have such methods.

Patch 3: add-node_map-arg-to-try_to_migrate_pages-rc3.patch
         This patch adds an additional argument to try_to_migrate_pages().
	 The additional argument controls where pages found on specific
	 nodes in the page_list passed into try_to_migrate_pages() are
	 migrated to.

Patch 4: add-sys_migrate_pages-rc3.patch
	 This is the patch that adds the migrate_pages() system call.
	 This patch provides a simple version of the system call that
	 migrates all pages associated with a particular process, so it
	 is really only useful for programs that are statically linked
	 (i.e., that don't map in any shared libraries).

Patch 5: sys_migrate_pages-mempolicy-migration-rc3.patch
         This patch updates the memory policy data structures
	 as they are encountered in accordance with the migration
	 request.

Patch 6: add-mempolicy-control-rc3.patch
	 This patch extends the mbind() and get_mempolicy() system
	 calls to provide an interface for overriding the default
	 kernel migration policy.

Patch 7: sys_migrate_pages-migration-selection-rc3.patch
	 This patch uses the migration policy bits set by the code
	 from the last patch to control which mapped files are
	 migrated (or not).

Patch 8: sys_migrate_pages-cpuset-support.patch
         This patch makes migrate_pages() cooperate better with
	 cpusets.

Patch 9: sys_migrate_pages-permissions-check.patch
         This patch adds a permission check to make sure the
	 invoking process has the necessary permissions to migrate
	 the target task.

Patch 10: N1.2-add-nodemap-to-try_to_migrate_pages-call.patch
	 This patch fixes the call to try_to_migrate_pages()
	 from capture_page_range() in mm/page_alloc.c that
	 is introduced in the N1.0-memsection_migrate.patch
	 of the memory hotplug series.


Unresolved issues
-----------------

(1)  This version of migrate_pages() works reliably only when the
     process to be migrated has been stopped (e.g., using SIGSTOP)
     before the migrate_pages() system call is executed.
     (The system doesn't crash or oops, but sometimes the process
     being migrated will be "Killed by VM" when it starts up again.
     There may be a few messages put into the log as well at that time.)

     At the moment, I am proposing that processes need to be
     suspended before being migrated.  This really should not
     be a performance concern, since the delay imposed by page
     migration far exceeds any delay imposed by SIGSTOPing the
     processes before migration and SIGCONTinuing them afterward.
     (A minimal user-level sketch of this sequence appears after
     this list.)

(2)  I'm still using system call #1279.  On ia64 this is the
     last slot in the system call table.  A system call number
     needs to be assigned to migrate_pages().

(3)  As part of the discussion with Andi Kleen, we agreed to
     provide some memory migration support under MPOL_MF_STRICT.
     Currently, if one calls mbind() with the flag MPOL_MF_STRICT
     set, and pages are found that don't follow the memory policy,
     then the mbind() will return -EIO.  Andi would like to be
     able to cause those pages to be migrated to the correct nodes.
     This feature is not yet part of this patchset and will
     be added as a distinct set of patches.
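
For concreteness, here is a minimal user-level sketch of the
suspend/migrate/resume sequence described in (1) above.  It assumes the
ia64 system call number 1279 mentioned in (2); the my_migrate_pages()
wrapper and the node numbers are purely illustrative and not part of
this patchset.

#define _GNU_SOURCE		/* for syscall() */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define __NR_migrate_pages 1279		/* slot used by this patchset (ia64) */

/* thin wrapper around the new system call; not part of any library */
static long my_migrate_pages(pid_t pid, unsigned int count,
			     unsigned int *old_nodes, unsigned int *new_nodes)
{
	return syscall(__NR_migrate_pages, pid, count, old_nodes, new_nodes);
}

int main(int argc, char **argv)
{
	pid_t pid;
	unsigned int old_nodes[] = { 5, 6, 7, 8 };	/* migrate from these ... */
	unsigned int new_nodes[] = { 9, 10, 11, 12 };	/* ... to these nodes     */
	long migrated;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = atoi(argv[1]);

	kill(pid, SIGSTOP);				/* suspend the job        */
	migrated = my_migrate_pages(pid, 4, old_nodes, new_nodes);
	kill(pid, SIGCONT);				/* and let it run again   */

	if (migrated < 0)
		perror("migrate_pages");
	else
		printf("migrated %ld pages\n", migrated);
	return 0;
}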

Background
----------

The purpose of this set of patches is to introduce the necessary kernel
infrastructure to support "manual page migration".  That phrase is
intended to describe a facility whereby some user program (most likely
a batch scheduler) is given the responsibility of managing where jobs
run on a large NUMA system.  If it turns out that a job needs to be
run on a different set of nodes from where it is running now, then that
user program would invoke this facility to move the job to the new set
of nodes.

We use the word "manual" here to indicate that the facility is invoked
in a way that the kernel is told where to move things; we distinguish
this approach from "automatic page migration" facilities which have been
proposed in the past.  To us, "automatic page migration" implies using
hardware counters to determine where pages should reside and having the
O/S automatically move misplaced pages.  The utility of such facilities
(on IRIX, for example) has been mixed, and we are not currently proposing
such a facility for Linux.

The normal sequence of events would be as follows:  A job is running
on, say, nodes 5-8, and a higher-priority job arrives and the only place
it can be run, for whatever reason, is nodes 5-8.  Then the scheduler
would suspend the processes of the existing job (by, for example, sending
them a SIGSTOP) and start the new job on those nodes.  At some point in
the future, other nodes become available for use, and at this point the
batch scheduler would invoke the manual page migration facility to move
the processes of the suspended job from nodes 5-8 to the new set of nodes.

Note that not all of the pages of all of the processes will need to (or
should) be moved.  For example, pages of shared libraries are likely to be
shared by many processes in the system; these pages should not be moved
merely because a few processes using these libraries have been migrated.
As discussed above, the kernel code handles this by migrating all
anonymous VMAs and all VMAs with the VM_WRITE bit set.  VMAs that map
the code segments of a program don't have VM_WRITE set, so shared
library code segments will not be migrated (by default).  Read-only mapped
files (e.g., files in /usr/lib for National Language support) are also
not migrated by default.

The default migration decisions of the kernel migration code can be
overridden for mmap()'d files using the mbind() system call, as
described above.  This call can be used, for example, to cause the
program executable to be migrated.  Similarly, if the user has a
(non-system) data file mapped R/O, the mbind() system call can be
used to override the kernel default and cause the mapped file to be
migrated as well.

-- 
Best Regards,
Ray
-----------------------------------------------
Ray Bryant                       raybry@sgi.com
The box said: "Requires Windows 98 or better",
           so I installed Linux.
-----------------------------------------------

* [PATCH 2.6.12-rc5 1/10] mm: hirokazu-steal_page_from_lru.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch Ray Bryant
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

Hi Dave,

Would you apply the following patch right after
AA-PM-01-steal_page_from_lru.patch?

This patch makes steal_page_from_lru() and putback_page_to_lru()
check PageLRU() with zone->lru_lock held.  Currently, only the process
migration code, which Ray is working on, uses this code.

Thanks,
Hirokazu Takahashi.


Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
---

 linux-2.6.12-rc3-taka/include/linux/mm_inline.h |    8 +++++---
 1 files changed, 5 insertions, 3 deletions

diff -puN include/linux/mm_inline.h~taka-steal_page_from_lru-FIX include/linux/mm_inline.h
--- linux-2.6.12-rc3/include/linux/mm_inline.h~taka-steal_page_from_lru-FIX	Mon May 23 02:26:57 2005
+++ linux-2.6.12-rc3-taka/include/linux/mm_inline.h	Mon May 23 02:26:57 2005
@@ -80,9 +80,10 @@ static inline int
 steal_page_from_lru(struct zone *zone, struct page *page,
 			struct list_head *dst)
 {
-	int ret;
+	int ret = 0;
 	spin_lock_irq(&zone->lru_lock);
-	ret = __steal_page_from_lru(zone, page, dst);
+	if (PageLRU(page))
+		ret = __steal_page_from_lru(zone, page, dst);
 	spin_unlock_irq(&zone->lru_lock);
 	return ret;
 }
@@ -102,7 +103,8 @@ static inline void
 putback_page_to_lru(struct zone *zone, struct page *page)
 {
 	spin_lock_irq(&zone->lru_lock);
-	__putback_page_to_lru(zone, page);
+	if (!PageLRU(page))
+		__putback_page_to_lru(zone, page);
 	spin_unlock_irq(&zone->lru_lock);
 }
 
_





* [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 1/10] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 17:30   ` [Lhms-devel] " Joel Schopp
  2005-06-23  4:01   ` Nathan Scott
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 3/10] mm: manual page migration-rc3 -- add-node_map-arg-to-try_to_migrate_pages-rc3.patch Ray Bryant
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

Nathan Scott of SGI provided this patch for XFS that supports
the migrate_page method in the address_space operations vector.
It is basically the same as what is in ext2_migrate_page().
However, the routine "xfs_skip_migrate_page()" is added to
disallow migration of XFS metadata.

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 xfs_aops.c |   10 ++++++++++
 xfs_buf.c  |    7 +++++++
 2 files changed, 17 insertions(+)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_aops.c	2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c	2005-06-13 11:12:42.000000000 -0700
@@ -54,6 +54,7 @@
 #include "xfs_iomap.h"
 #include <linux/mpage.h>
 #include <linux/writeback.h>
+#include <linux/mmigrate.h>
 
 STATIC void xfs_count_page_state(struct page *, int *, int *, int *);
 STATIC void xfs_convert_page(struct inode *, struct page *, xfs_iomap_t *,
@@ -1273,6 +1274,14 @@ linvfs_prepare_write(
 	return block_prepare_write(page, from, to, linvfs_get_block);
 }
 
+STATIC int
+linvfs_migrate_page(
+	struct page		*from,
+	struct page		*to)
+{
+	return generic_migrate_page(from, to, migrate_page_buffer);
+}
+
 struct address_space_operations linvfs_aops = {
 	.readpage		= linvfs_readpage,
 	.readpages		= linvfs_readpages,
@@ -1283,4 +1292,5 @@ struct address_space_operations linvfs_a
 	.commit_write		= generic_commit_write,
 	.bmap			= linvfs_bmap,
 	.direct_IO		= linvfs_direct_IO,
+	.migrate_page		= linvfs_migrate_page,
 };
Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_buf.c	2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c	2005-06-13 11:12:42.000000000 -0700
@@ -1626,6 +1626,12 @@ xfs_setsize_buftarg(
 }
 
 STATIC int
+xfs_skip_migrate_page(struct page *from, struct page *to)
+{
+	return -EBUSY;
+}
+
+STATIC int
 xfs_mapping_buftarg(
 	xfs_buftarg_t		*btp,
 	struct block_device	*bdev)
@@ -1635,6 +1641,7 @@ xfs_mapping_buftarg(
 	struct address_space	*mapping;
 	static struct address_space_operations mapping_aops = {
 		.sync_page = block_sync_page,
+		.migrate_page = xfs_skip_migrate_page,
 	};
 
 	inode = new_inode(bdev->bd_inode->i_sb);


* [PATCH 2.6.12-rc5 3/10] mm: manual page migration-rc3 -- add-node_map-arg-to-try_to_migrate_pages-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 1/10] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch Ray Bryant
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This patch changes the interface to try_to_migrate_pages() so that the
caller can specify the nodes to which the pages are to be migrated.  This
is done by adding a "node_map" argument, of type "int *", to
try_to_migrate_pages().

If this argument is NULL, then try_to_migrate_pages() behaves exactly
as before and this is the interface the rest of the memory hotplug
patch should use.  (Note:  This patchset does not include the changes
for the rest of the memory hotplug patch that will be necessary to use
this new interface [if it is accepted].  Those changes will be provided
as a distinct patch.)

If the argument is non-NULL, node_map points at an array of int
of size MAX_NUMNODES.  node_map[N] is either the id of an online node
or -1.  If node_map[N] >= 0, then pages in the page list passed to
try_to_migrate_pages() that are found on node N are migrated to node
node_map[N].  If node_map[N] == -1, then pages found on node N are left
where they are.
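
As an illustration (this fragment is not part of the patch), a caller
that wanted to move any pages found on node 2 over to node 6, and leave
pages on all other nodes in place, would set up node_map roughly as
follows, with page_list gathered as in migrate_vma():

	int node_map[MAX_NUMNODES];
	int i, nr_busy;

	for (i = 0; i < MAX_NUMNODES; i++)
		node_map[i] = -1;	/* -1: leave pages on node i in place */
	node_map[2] = 6;		/* pages found on node 2 go to node 6 */

	nr_busy = try_to_migrate_pages(&page_list, node_map);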

This change depends on previous changes to migrate_onepage()
that support migrating a page to a specified node.  These changes
are already part of the memory migration sub-patch of the memory
hotplug patch.

Signed-off-by:  Ray Bryant <raybry@sgi.com>
--

 include/linux/mmigrate.h |   11 ++++++++++-
 mm/mmigrate.c            |   10 ++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mmigrate.h	2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h	2005-06-13 10:22:22.000000000 -0700
@@ -16,7 +16,16 @@ extern int migrate_page_buffer(struct pa
 extern int page_migratable(struct page *, struct page *, int,
 					struct list_head *);
 extern struct page * migrate_onepage(struct page *, int nodeid);
-extern int try_to_migrate_pages(struct list_head *);
+extern int try_to_migrate_pages(struct list_head *, int *);
+
+static inline struct page *node_migrate_onepage(struct page *page, int *node_map)
+{
+	if (node_map)
+		return migrate_onepage(page, node_map[page_to_nid(page)]);
+	else
+		return migrate_onepage(page, MIGRATE_NODE_ANY);
+
+}
 
 #else
 static inline int generic_migrate_page(struct page *page, struct page *newpage,
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 10:22:02.000000000 -0700
@@ -501,9 +501,11 @@ out_unlock:
 /*
  * This is the main entry point to migrate pages in a specific region.
  * If a page is inactive, the page may be just released instead of
- * migration.
+ * migration.  node_map is supplied in those cases (on NUMA systems)
+ * where the caller wishes to specify to which nodes the pages are
+ * migrated.  If node_map is null, the target node is MIGRATE_NODE_ANY.
  */
-int try_to_migrate_pages(struct list_head *page_list)
+int try_to_migrate_pages(struct list_head *page_list, int *node_map)
 {
 	struct page *page, *page2, *newpage;
 	LIST_HEAD(pass1_list);
@@ -541,7 +543,7 @@ int try_to_migrate_pages(struct list_hea
 	list_for_each_entry_safe(page, page2, &pass1_list, lru) {
 		list_del(&page->lru);
 		if (PageLocked(page) || PageWriteback(page) ||
-		    IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+		    IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
 			if (page_count(page) == 1) {
 				/* the page is already unused */
 				putback_page_to_lru(page_zone(page), page);
@@ -559,7 +561,7 @@ int try_to_migrate_pages(struct list_hea
 	 */
 	list_for_each_entry_safe(page, page2, &pass2_list, lru) {
 		list_del(&page->lru);
-		if (IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+		if (IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
 			if (page_count(page) == 1) {
 				/* the page is already unused */
 				putback_page_to_lru(page_zone(page), page);


* [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (2 preceding siblings ...)
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 3/10] mm: manual page migration-rc3 -- add-node_map-arg-to-try_to_migrate_pages-rc3.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 17:23   ` Dave Hansen
  2005-06-25 10:32   ` Hirokazu Takahashi
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch Ray Bryant
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This is the main patch that creates the migrate_pages() system
call.  Note that in this case, the system call number was more
or less arbitrarily assigned as 1279.  This number needs to
be allocated.

This patch sits on top of the page migration patches from
the Memory Hotplug project.  This particular patchset is built
on top of the page migration subset of:

http://www.sr71.net/patches/2.6.12/2.6.12-rc5-mhp1/broken-out-2.6.12-rc5-mhp1.tar.gz

but it may apply to subsequent page migration patches as well.
(The page migration subset of the mhp broken out patches begins
with the first patch of the patchset above and ends with the
patch:  AA-PM-99-x86_64-IMMOVABLE.patch.  Normally, I would use
the patch

http://www.sr71.net/patches/2.6.12/2.6.12-rc5-mhp1/page-migration/patch-2.6.12-rc5-mhp1-pm.gz

for this, but for this particular release of the memory hotplug
patches, the broken-out approach described above worked better.)

This patch migrates all pages in the specified process (including
shared libraries), so it is mostly useful for migrating statically
bound applications.  This is made more general by subsequent
patches of this patchset.

See the patches:
	sys_migrate_pages-migration-selection-rc4.patch
	add-mempolicy-control-rc4.patch

for details on the default kernel migration policy (this determines
which VMAs are actually migrated) and how this policy can be overridden
using the mbind() system call.

Updates since last release of this patchset:

(1)  old_nodes and new_nodes are now arrays of int instead of short.
(2)  The wait and retry code in migrate_vma() has been replaced by
     code that returns -EAGAIN and expects the user-level code to
     retry.  In general, we expect this will never happen except
     in the rare case of a truncation happening at the same
     time that an associated page is being migrated.
(3)  In the case that -EAGAIN is returned from migrate_vma(), the
     unmigrated pages are now put back onto the LRU lists.  Previously
     this was not done.
(4)  The mmap semaphore is now taken during the migration operation.
     This is to avoid an mmap() or munmap() occurring at the same
     time as a migration (the latter could cause the mm->mmap list
     to change underneath us, which is "Not a good thing[tm]").
(5)  Numerous other suggestions of Paul Jackson and Christoph
     Hellwig have been applied.

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 arch/ia64/kernel/entry.S |    2 
 kernel/sys_ni.c          |    1 
 mm/mmigrate.c            |  181 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 182 insertions(+), 2 deletions(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/arch/ia64/kernel/entry.S
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/arch/ia64/kernel/entry.S	2005-06-13 10:21:40.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/arch/ia64/kernel/entry.S	2005-06-13 10:22:41.000000000 -0700
@@ -1582,6 +1582,6 @@ sys_call_table:
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
-	data8 sys_ni_syscall
+	data8 sys_migrate_pages			// 1279
 
 	.org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-13 10:22:02.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 10:43:15.000000000 -0700
@@ -5,6 +5,9 @@
  *
  *  Authors:	IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
  *		Hirokazu Takahashi <taka@valinux.co.jp>
+ *
+ * sys_migrate_pages() added by Ray Bryant <raybry@sgi.com>
+ * Copyright (C) 2005, Silicon Graphics, Inc.
  */
 
 #include <linux/config.h>
@@ -21,6 +24,8 @@
 #include <linux/rmap.h>
 #include <linux/mmigrate.h>
 #include <linux/delay.h>
+#include <linux/nodemask.h>
+#include <asm/bitops.h>
 
 /*
  * The concept of memory migration is to replace a target page with
@@ -436,7 +441,7 @@ migrate_onepage(struct page *page, int n
 	if (nodeid == MIGRATE_NODE_ANY)
 		newpage = page_cache_alloc(mapping);
 	else
-		newpage = alloc_pages_node(nodeid, mapping->flags, 0);
+		newpage = alloc_pages_node(nodeid, (unsigned int)mapping->flags, 0);
 	if (newpage == NULL) {
 		unlock_page(page);
 		return ERR_PTR(-ENOMEM);
@@ -587,6 +592,180 @@ int try_to_migrate_pages(struct list_hea
 	return nr_busy;
 }
 
+static int
+migrate_vma(struct task_struct *task, struct mm_struct *mm,
+	struct vm_area_struct *vma, int *node_map)
+{
+	struct page *page, *page2;
+	unsigned long vaddr;
+	int count = 0, nr_busy;
+	LIST_HEAD(page_list);
+
+	/* can't migrate mlock()'d pages */
+	if (vma->vm_flags & VM_LOCKED)
+		return 0;
+
+	/*
+	 * gather all of the pages to be migrated from this vma into page_list
+	 */
+	spin_lock(&mm->page_table_lock);
+ 	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
+		page = follow_page(mm, vaddr, 0);
+		/*
+		 * follow_page has been known to return pages with zero mapcount
+		 * and NULL mapping.  Skip those pages as well
+		 */
+		if (page && page_mapcount(page)) {
+			if (node_map[page_to_nid(page)] >= 0) {
+				if (steal_page_from_lru(page_zone(page), page,
+					&page_list))
+						count++;
+				else
+					BUG();
+			}
+		}
+	}
+	spin_unlock(&mm->page_table_lock);
+
+	/* call the page migration code to move the pages */
+	if (count) {
+		nr_busy = try_to_migrate_pages(&page_list, node_map);
+
+		if (nr_busy < 0)
+			return nr_busy;
+
+		if (nr_busy == 0)
+			return count;
+
+		/* return the unmigrated pages to the LRU lists */
+		list_for_each_entry_safe(page, page2, &page_list, lru) {
+			list_del(&page->lru);
+			putback_page_to_lru(page_zone(page), page);
+		}
+		return -EAGAIN;
+	}
+
+	return 0;
+
+}
+
+void lru_add_drain_per_cpu(void *info)
+{
+	lru_add_drain();
+}
+
+asmlinkage long
+sys_migrate_pages(pid_t pid, __u32 count, __u32 *old_nodes, __u32 *new_nodes)
+{
+	int i, ret = 0, migrated = 0;
+	int *tmp_old_nodes = NULL;
+	int *tmp_new_nodes = NULL;
+	int *node_map;
+	struct task_struct *task;
+	struct mm_struct *mm = NULL;
+	size_t size = count * sizeof(tmp_old_nodes[0]);
+	struct vm_area_struct *vma;
+	nodemask_t old_node_mask, new_node_mask;
+
+	if ((count < 1) || (count > MAX_NUMNODES))
+		return -EINVAL;
+
+	tmp_old_nodes = kmalloc(size, GFP_KERNEL);
+	tmp_new_nodes = kmalloc(size, GFP_KERNEL);
+	node_map = kmalloc(MAX_NUMNODES*sizeof(node_map[0]), GFP_KERNEL);
+
+	if (!tmp_old_nodes || !tmp_new_nodes || !node_map) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(tmp_old_nodes, (void __user *)old_nodes, size) ||
+	    copy_from_user(tmp_new_nodes, (void __user *)new_nodes, size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	nodes_clear(old_node_mask);
+	nodes_clear(new_node_mask);
+	for (i = 0; i < count; i++) {
+		int n;
+
+		n = tmp_old_nodes[i];
+		if ((n < 0) || (n >= MAX_NUMNODES))
+			goto out_einval;
+		node_set(n, old_node_mask);
+
+		n = tmp_new_nodes[i];
+		if ((n < 0) || (n >= MAX_NUMNODES) || !node_online(n))
+			goto out_einval;
+		node_set(n, new_node_mask);
+
+	}
+
+	/* old_nodes and new_nodes must be disjoint */
+	if (nodes_intersects(old_node_mask, new_node_mask))
+		goto out_einval;
+
+	/* find the task and mm_structs for this process */
+	read_lock(&tasklist_lock);
+	task = find_task_by_pid(pid);
+	if (task) {
+		task_lock(task);
+		mm = task->mm;
+		if (mm)
+			atomic_inc(&mm->mm_users);
+		task_unlock(task);
+	} else {
+		ret = -ESRCH;
+		read_unlock(&tasklist_lock);
+		goto out;
+	}
+	read_unlock(&tasklist_lock);
+	if (!mm)
+		goto out_einval;
+
+	/* set up the node_map array */
+	for (i = 0; i < MAX_NUMNODES; i++)
+		node_map[i] = -1;
+	for (i = 0; i < count; i++)
+		node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
+
+	/* prepare for lru list manipulation */
+  	smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+	lru_add_drain();
+
+	/* actually do the migration */
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		/* migrate the pages of this vma */
+		ret = migrate_vma(task, mm, vma, node_map);
+		if (ret < 0)
+			goto out_up_mmap_sem;
+		migrated += ret;
+	}
+	up_read(&mm->mmap_sem);
+	ret = migrated;
+
+out:
+	if (mm)
+		mmput(mm);
+
+	kfree(tmp_old_nodes);
+	kfree(tmp_new_nodes);
+	kfree(node_map);
+
+	return ret;
+
+out_einval:
+	ret = -EINVAL;
+	goto out;
+
+out_up_mmap_sem:
+	up_read(&mm->mmap_sem);
+	goto out;
+
+}
+
 EXPORT_SYMBOL(generic_migrate_page);
 EXPORT_SYMBOL(migrate_page_common);
 EXPORT_SYMBOL(migrate_page_buffer);
Index: linux-2.6.12-rc5-mhp1-page-migration-export/kernel/sys_ni.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/kernel/sys_ni.c	2005-06-13 10:21:40.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/kernel/sys_ni.c	2005-06-13 10:22:41.000000000 -0700
@@ -77,6 +77,7 @@ cond_syscall(sys_request_key);
 cond_syscall(sys_keyctl);
 cond_syscall(compat_sys_keyctl);
 cond_syscall(compat_sys_socketcall);
+cond_syscall(sys_migrate_pages);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);


* [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (3 preceding siblings ...)
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-23  1:51   ` Andi Kleen
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 6/10] mm: manual page migration-rc3 -- add-mempolicy-control-rc3.patch Ray Bryant
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This patch adds code that translates the memory policy structures
as they are encountered so that they continue to represent where
memory should be allocated after the page migration has completed.

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 include/linux/mempolicy.h |    2 
 mm/mempolicy.c            |  116 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/mmigrate.c             |   12 ++++
 3 files changed, 129 insertions(+), 1 deletion(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h	2005-06-13 11:12:34.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h	2005-06-13 11:12:51.000000000 -0700
@@ -152,6 +152,8 @@ struct mempolicy *mpol_shared_policy_loo
 
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
+extern int migrate_process_policy(struct task_struct *, unsigned int *);
+extern int migrate_vma_policy(struct vm_area_struct *, unsigned int *);
 
 #else
 
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c	2005-06-13 11:12:34.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c	2005-06-13 11:12:51.000000000 -0700
@@ -1136,3 +1136,119 @@ void numa_default_policy(void)
 {
 	sys_set_mempolicy(MPOL_DEFAULT, NULL, 0);
 }
+
+/*
+ * update a node mask according to a migration request
+ */
+static void migrate_node_mask(unsigned long *new_node_mask,
+			      unsigned long *old_node_mask,
+			      unsigned int  *node_map)
+{
+	int i;
+
+	bitmap_zero(new_node_mask, MAX_NUMNODES);
+
+	i = find_first_bit(old_node_mask, MAX_NUMNODES);
+	while(i < MAX_NUMNODES) {
+		if (node_map[i] >= 0)
+			set_bit(node_map[i], new_node_mask);
+		else
+			set_bit(i, new_node_mask);
+		i = find_next_bit(old_node_mask, MAX_NUMNODES, i+1);
+	}
+}
+
+/*
+ * update a process or vma mempolicy according to a migration request
+ */
+static struct mempolicy *
+migrate_policy(struct mempolicy *old, unsigned int *node_map)
+{
+	struct mempolicy *new;
+	DECLARE_BITMAP(old_nodes, MAX_NUMNODES);
+	DECLARE_BITMAP(new_nodes, MAX_NUMNODES);
+	struct zone *z;
+	int i;
+
+	new = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+	atomic_set(&new->refcnt, 0);
+	switch(old->policy) {
+	case MPOL_DEFAULT:
+		BUG();
+	case MPOL_INTERLEAVE:
+		migrate_node_mask(new->v.nodes, old->v.nodes, node_map);
+		break;
+	case MPOL_PREFERRED:
+		if (old->v.preferred_node>=0 &&
+			(node_map[old->v.preferred_node] >= 0))
+			new->v.preferred_node = node_map[old->v.preferred_node];
+		else
+			new->v.preferred_node = old->v.preferred_node;
+		break;
+	case MPOL_BIND:
+		bitmap_zero(old_nodes, MAX_NUMNODES);
+		for (i = 0; (z = old->v.zonelist->zones[i]) != NULL; i++)
+			set_bit(z->zone_pgdat->node_id, old_nodes);
+		migrate_node_mask(new_nodes, old_nodes, node_map);
+		new->v.zonelist = bind_zonelist(new_nodes);
+		if (!new->v.zonelist) {
+			kmem_cache_free(policy_cache, new);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+	new->policy = old->policy;
+	return new;
+}
+
+/*
+ * update a process mempolicy based on a migration request
+ */
+int migrate_process_policy(struct task_struct *task, unsigned int  *node_map)
+{
+	struct mempolicy *new, *old = task->mempolicy;
+	int tmp;
+
+	if ((!old) || (old->policy == MPOL_DEFAULT))
+		return 0;
+
+	new = migrate_policy(task->mempolicy, node_map);
+	if (IS_ERR(new))
+		return (PTR_ERR(new));
+
+	mpol_get(new);
+	task->mempolicy = new;
+	mpol_free(old);
+
+	if (task->mempolicy->policy == MPOL_INTERLEAVE) {
+		/*
+		 * If the task is still running and allocating storage, this
+		 * is racy, but there is not much that can be done about it.
+		 */
+		tmp = task->il_next;
+		if (node_map[tmp] >= 0)
+			task->il_next = node_map[tmp];
+	}
+
+	return 0;
+
+}
+
+/*
+ * update a vma mempolicy based on a migration request
+ */
+int migrate_vma_policy(struct vm_area_struct *vma, unsigned int *node_map)
+{
+
+	struct mempolicy *new;
+
+	if (!vma->vm_policy || vma->vm_policy->policy == MPOL_DEFAULT)
+		return 0;
+
+	new = migrate_policy(vma->vm_policy, node_map);
+	if (IS_ERR(new))
+		return (PTR_ERR(new));
+
+	return(policy_vma(vma, new));
+}
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-13 11:12:50.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 11:12:51.000000000 -0700
@@ -25,6 +25,7 @@
 #include <linux/mmigrate.h>
 #include <linux/delay.h>
 #include <linux/nodemask.h>
+#include <linux/mempolicy.h>
 #include <asm/bitops.h>
 
 /*
@@ -731,12 +732,21 @@ sys_migrate_pages(pid_t pid, __u32 count
 		node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
 
 	/* prepare for lru list manipulation */
-  	smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+	smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
 	lru_add_drain();
 
+	/* update the process mempolicy, if needed */
+	ret = migrate_process_policy(task, node_map);
+	if (ret < 0)
+		goto out;
+
 	/* actually do the migration */
 	down_read(&mm->mmap_sem);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		/* update the vma mempolicy, if needed */
+		ret = migrate_vma_policy(vma, node_map);
+		if (ret < 0)
+			goto out_up_mmap_sem;
 		/* migrate the pages of this vma */
 		ret = migrate_vma(task, mm, vma, node_map);
 		if (ret < 0)


* [PATCH 2.6.12-rc5 6/10] mm: manual page migration-rc3 -- add-mempolicy-control-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (4 preceding siblings ...)
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 7/10] mm: manual page migration-rc3 -- sys_migrate_pages-migration-selection-rc3.patch Ray Bryant
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This patch allows a process to override the default kernel memory
migration policy (invoked via migrate_pages()) on a per-mapped-file
basis.

The default policy is to migrate all anonymous VMAs and all other
VMAs that have the VM_WRITE bit set.  (See the patch:
	sys_migrate_pages-migration-selection-rc4.patch
for details on how the default policy is implemented.)

This policy does not cause the program executable or any mapped
user data files that are mapped R/O to be migrated.  These problems
can be detected and fixed in the user-level migration application,
but that user code needs an interface to do the "fix".  This patch
supplies that interface via an extension to the mbind() system call.

The interface is as follows:

mbind(start, length, 0, 0, 0, MPOL_MF_DO_MMIGRATE)
mbind(start, length, 0, 0, 0, MPOL_MF_DO_NOT_MMIGRATE)

These calls override the default kernel policy in
favor of the policy specified.  These calls cause the bits
AS_DO_MMIGRATE (or AS_DO_NOT_MMIGRATE) to be set in the
memory object pointed to by the VMA at the specified addresses
in the current process's address space.  Setting such a "deep"
attribute is required so that the modification can be seen by
all address spaces that map the object.

The bits set by the above call are "sticky" in the sense that
they will remain set so long as the memory object exists.  To
return the migration policy for that memory object to its
default setting is done by the following system call:

mbind(start, length, 0, 0, 0, MPOL_MF_MMIGRATE_DEFAULT)

The system call:

get_mempolicy(&policy, NULL, 0, (int *)start, (long) MPOL_F_MMIGRATE)

returns the policy migration bits from the memory object in the bottom
two bits of "policy".

Typical use by the user-level manual page migration code would
be to:

(1)  Identify the file name whose migration policy needs to be modified.
(2)  Open and mmap() the file into the current address space.
(3)  Issue the appropriate mbind() call from the above list.
(4)  (Assuming a successful return) munmap() and close the file.
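
A minimal sketch of steps (1)-(4) above, with error handling omitted.
The MPOL_MF_DO_MMIGRATE value is the one defined by this patch in
include/linux/mempolicy.h; the mbind() wrapper is assumed to come from
libnuma's <numaif.h> (or a direct syscall wrapper), and the helper name
is purely illustrative:

#include <fcntl.h>
#include <numaif.h>		/* mbind() wrapper from libnuma */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MPOL_MF_DO_MMIGRATE
#define MPOL_MF_DO_MMIGRATE (1<<2)	/* value defined by this patch */
#endif

/* illustrative helper, not part of the patchset */
static int mark_file_migratable(const char *path)
{
	struct stat st;
	void *addr;
	int fd, ret;

	fd = open(path, O_RDONLY);			/* (2) open ...        */
	fstat(fd, &st);
	addr = mmap(NULL, st.st_size, PROT_READ,	/* ... and mmap()      */
		    MAP_SHARED, fd, 0);

	ret = mbind(addr, st.st_size, 0, 0, 0,		/* (3) override policy */
		    MPOL_MF_DO_MMIGRATE);

	munmap(addr, st.st_size);			/* (4) clean up        */
	close(fd);
	return ret;
}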

Note well that this interface allows the memory migration process
to modify the migration policy on a file-by-file basis for all processes
that mmap() the specified file.  This has two implications:

(1)  All VMAs that map to the specified memory object will have
     the same migration policy applied.   There is no way to
     specify a distinct migration policy for one of the VMAs that
     map the file.

(2)  The migration policy for anonymous memory cannot be changed,
     since there is no memory object (where the migration policy
     bits are stored) in that case.

To date, we have yet to identify any case where these restrictions
would need to be overcome in the manual page migration case.

Signed-off-by:  Ray Bryant <raybry@sgi.com>
--

 include/linux/mempolicy.h |   18 +++++++++
 include/linux/pagemap.h   |    4 ++
 mm/mempolicy.c            |   84 ++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 103 insertions(+), 3 deletions(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c	2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c	2005-06-13 12:20:12.000000000 -0700
@@ -76,6 +76,7 @@
 #include <linux/init.h>
 #include <linux/compat.h>
 #include <linux/mempolicy.h>
+#include <linux/pagemap.h>
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
 
@@ -354,6 +355,54 @@ static int mbind_range(struct vm_area_st
 	return err;
 }
 
+static int mbind_migration_policy(struct mm_struct *mm, unsigned long start,
+				  unsigned long end, unsigned flags)
+{
+	struct vm_area_struct *first, *vma;
+	struct address_space *as;
+	int err = 0;
+
+	/* only one of these bits may be set */
+	if (hweight_long(flags & (MPOL_MF_MMIGRATE_MASK)) > 1)
+		return -EINVAL;
+
+	down_read(&mm->mmap_sem);
+	first = find_vma(mm, start);
+	if (!first) {
+		err = -EFAULT;
+		goto out;
+	}
+	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
+		if (!vma->vm_file)
+			continue;
+		as = vma->vm_file->f_mapping;
+		BUG_ON(!as);
+		switch (flags & MPOL_MF_MMIGRATE_MASK) {
+		case MPOL_MF_DO_MMIGRATE:
+			/* only one of these bits may be set */
+			if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+				clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+			set_bit(AS_DO_MMIGRATE, &as->flags);
+			break;
+		case MPOL_MF_DO_NOT_MMIGRATE:
+			/* only one of these bits may be set */
+			if (test_bit(AS_DO_MMIGRATE, &as->flags))
+				clear_bit(AS_DO_MMIGRATE, &as->flags);
+			set_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+			break;
+		case MPOL_MF_MMIGRATE_DEFAULT:
+			clear_bit(AS_DO_MMIGRATE, &as->flags);
+			clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+			break;
+		default:
+			BUG();
+		}
+	}
+out:
+	up_read(&mm->mmap_sem);
+	return err;
+}
+
 /* Change policy for a memory range */
 asmlinkage long sys_mbind(unsigned long start, unsigned long len,
 			  unsigned long mode,
@@ -367,7 +416,7 @@ asmlinkage long sys_mbind(unsigned long 
 	DECLARE_BITMAP(nodes, MAX_NUMNODES);
 	int err;
 
-	if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
+	if ((flags & ~(unsigned long)(MPOL_MF_MASK)) || mode > MPOL_MAX)
 		return -EINVAL;
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
@@ -380,6 +429,12 @@ asmlinkage long sys_mbind(unsigned long 
 	if (end == start)
 		return 0;
 
+	if (flags & MPOL_MF_MMIGRATE_MASK)
+		return mbind_migration_policy(mm, start, end, flags);
+
+	if (mode == MPOL_DEFAULT)
+		flags &= ~MPOL_MF_STRICT;
+
 	err = get_nodes(nodes, nmask, maxnode, mode);
 	if (err)
 		return err;
@@ -492,17 +547,40 @@ asmlinkage long sys_get_mempolicy(int __
 	struct vm_area_struct *vma = NULL;
 	struct mempolicy *pol = current->mempolicy;
 
-	if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
+	if (flags & ~(unsigned long)(MPOL_F_MASK))
 		return -EINVAL;
+	if ((flags & (MPOL_F_NODE | MPOL_F_ADDR)) &&
+	    (flags & MPOL_F_MMIGRATE))
+	    	return -EINVAL;
 	if (nmask != NULL && maxnode < MAX_NUMNODES)
 		return -EINVAL;
-	if (flags & MPOL_F_ADDR) {
+	if ((flags & MPOL_F_ADDR) || (flags & MPOL_F_MMIGRATE)) {
 		down_read(&mm->mmap_sem);
 		vma = find_vma_intersection(mm, addr, addr+1);
 		if (!vma) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
+		if (flags & MPOL_F_MMIGRATE) {
+			struct address_space *as;
+			err = 0;
+			if (!vma->vm_file) {
+				err = -EINVAL;
+				goto out;
+			}
+			as = vma->vm_file->f_mapping;
+			BUG_ON(!as);
+			pval = 0;
+			if (test_bit(AS_DO_MMIGRATE, &as->flags))
+				pval |= MPOL_MF_DO_MMIGRATE;
+			if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+				pval |= MPOL_MF_DO_NOT_MMIGRATE;
+			if (policy && put_user(pval, policy)) {
+				err = -EFAULT;
+				goto out;
+			}
+			goto out;
+		}
 		if (vma->vm_ops && vma->vm_ops->get_policy)
 			pol = vma->vm_ops->get_policy(vma, addr);
 		else
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h	2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h	2005-06-13 11:48:53.000000000 -0700
@@ -19,9 +19,27 @@
 /* Flags for get_mem_policy */
 #define MPOL_F_NODE	(1<<0)	/* return next IL mode instead of node mask */
 #define MPOL_F_ADDR	(1<<1)	/* look up vma using address */
+#define MPOL_F_MMIGRATE (1<<2)  /* return migration policy flags */
+
+#define MPOL_F_MASK (MPOL_F_NODE | MPOL_F_ADDR | MPOL_F_MMIGRATE)
 
 /* Flags for mbind */
 #define MPOL_MF_STRICT	(1<<0)	/* Verify existing pages in the mapping */
+/* FUTURE USE           (1<<1)  RESERVE for MPOL_MF_MOVE */
+/* Flags to set the migration policy for a memory range
+ * By default the kernel will memory migrate all writable VMAs
+ * (this includes anonymous memory) and the program exectuable.
+ * For non-anonymous memory, the user can change the default
+ * actions using the following flags to mbind:
+ */
+#define MPOL_MF_DO_MMIGRATE      (1<<2) /* migrate pages of this mem object */
+#define MPOL_MF_DO_NOT_MMIGRATE  (1<<3) /* don't migrate any of these pages */
+#define MPOL_MF_MMIGRATE_DEFAULT (1<<4) /* reset back to kernel default */
+
+#define MPOL_MF_MASK (MPOL_MF_STRICT | MPOL_MF_DO_MMIGRATE | \
+		      MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
+#define MPOL_MF_MMIGRATE_MASK (MPOL_MF_DO_MMIGRATE |       \
+		      MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
 
 #ifdef __KERNEL__
 
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/pagemap.h	2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h	2005-06-13 11:48:53.000000000 -0700
@@ -19,6 +19,10 @@
 #define	AS_EIO		(__GFP_BITS_SHIFT + 0)	/* IO error on async write */
 #define AS_ENOSPC	(__GFP_BITS_SHIFT + 1)	/* ENOSPC on async write */
 
+/* (manual) memory migration control flags.  set via mbind() in mempolicy.c */
+#define AS_DO_MMIGRATE     (__GFP_BITS_SHIFT + 2)  /* migrate pages */
+#define AS_DO_NOT_MMIGRATE (__GFP_BITS_SHIFT + 3)  /* don't migrate any pages */
+
 static inline unsigned int __nocast mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->flags & __GFP_BITS_MASK;


* [PATCH 2.6.12-rc5 7/10] mm: manual page migration-rc3 -- sys_migrate_pages-migration-selection-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (5 preceding siblings ...)
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 6/10] mm: manual page migration-rc3 -- add-mempolicy-control-rc3.patch Ray Bryant
@ 2005-06-22 16:39 ` Ray Bryant
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 8/10] mm: manual page migration-rc3 -- sys_migrate_pages-cpuset-support-rc3.patch Ray Bryant
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:39 UTC (permalink / raw)
  To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This patch implements the default kernel "policy" for deciding
which VMAs are to be migrated.  The default policy is:

(1)  Migrate all anonymous VMAs
(2)  Migrate all VMAs that have VM_WRITE set in vm_flags.

This is the correct policy for almost all VMAs.  However, there are
a couple of cases where the above policy may need to be modified.
The mbind() interface added in the patch

	add-mempolicy-control-rc3.patch

allows user space code to modify the default policy for mapped
files on a file-by-file basis.

This patch also adds the migrate_pages() side of the support
for the mbind() policy override system call.

Signed-off-by:  Ray Bryant <raybry@sgi.com>
--

 mmigrate.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-13 11:12:51.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 11:12:58.000000000 -0700
@@ -601,21 +601,32 @@ migrate_vma(struct task_struct *task, st
 	unsigned long vaddr;
 	int count = 0, nr_busy;
 	LIST_HEAD(page_list);
+	struct address_space *as = NULL;
 
-	/* can't migrate mlock()'d pages */
-	if (vma->vm_flags & VM_LOCKED)
+	if ((vma->vm_flags & VM_LOCKED) || (vma->vm_flags & VM_IO))
 		return 0;
 
-	/*
-	 * gather all of the pages to be migrated from this vma into page_list
-	 */
+	/* we always migrate anonymous pages */
+	if (!vma->vm_file)
+		goto do_migrate;
+	as = vma->vm_file->f_mapping;
+	/* we have to have both AS_DO_MMIGRATE and AS_DO_MOT_MMIGRATE to
+	 * give user space full ability to override the kernel's default
+	 * migration decisions */
+	if (test_bit(AS_DO_MMIGRATE, &as->flags))
+		goto do_migrate;
+	if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+		return 0;
+	if (!(vma->vm_flags & VM_WRITE))
+		return 0;
+
+	/* gather the pages to be migrated from this vma into page_list */
+do_migrate:
 	spin_lock(&mm->page_table_lock);
  	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
 		page = follow_page(mm, vaddr, 0);
-		/*
-		 * follow_page has been known to return pages with zero mapcount
-		 * and NULL mapping.  Skip those pages as well
-		 */
+		/* follow_page has been known to return pages with zero mapcount
+		 * and NULL mapping.  Skip those pages as well */
 		if (page && page_mapcount(page)) {
 			if (node_map[page_to_nid(page)] >= 0) {
 				if (steal_page_from_lru(page_zone(page), page,


* [PATCH 2.6.12-rc5 8/10] mm: manual page migration-rc3 -- sys_migrate_pages-cpuset-support-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (6 preceding siblings ...)
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 7/10] mm: manual page migration-rc3 -- sys_migrate_pages-migration-selection-rc3.patch Ray Bryant
@ 2005-06-22 16:40 ` Ray Bryant
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 9/10] mm: manual page migration-rc3 -- sys_migrate_pages-permissions-check-rc3.patch Ray Bryant
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:40 UTC (permalink / raw)
  To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

This patch adds cpuset support to the migrate_pages() system call.

The idea of this patch is that in order to do a migration:

(1)  The target task needs to be able to allocate pages on the
     nodes that are being migrated to.

(2)  However, the actual allocation of pages is not done by
     the target task.  Allocation is done by the task that is
     running the migrate_pages() system call.  Since it is 
     expected that the migration will be done by a batch manager
     of some kind that is authorized to control the jobs running
     in an enclosing cpuset, we make the requirement that the
     current task ALSO must be able to allocate pages on the
     nodes that are being migrated to.

Note well that if cpusets are not configured, both of these tests
become no-ops.
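
As a purely illustrative user-space sketch (not part of this patchset):
a batch manager authorized over the enclosing cpuset might invoke the
call as below.  The syscall number is a placeholder; the argument list
follows sys_migrate_pages() in this series.

#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/types.h>

#define __NR_migrate_pages	1279	/* placeholder; the real number is arch-specific */

static long migrate_job(pid_t job_pid)
{
	__u32 old_nodes[] = { 0, 1 };	/* nodes the job occupies now */
	__u32 new_nodes[] = { 2, 3 };	/* nodes inside the destination cpuset */

	/* Both the target task and the caller must be allowed to allocate
	 * on nodes 2 and 3, or this fails (-EINVAL) per the check above. */
	return syscall(__NR_migrate_pages, job_pid, 2, old_nodes, new_nodes);
}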

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 include/linux/cpuset.h |    8 +++++++-
 kernel/cpuset.c        |   24 +++++++++++++++++++++++-
 mm/mmigrate.c          |   24 ++++++++++++++++++++----
 3 files changed, 50 insertions(+), 6 deletions(-)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/cpuset.h	2005-06-13 11:12:34.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h	2005-06-13 11:13:04.000000000 -0700
@@ -4,7 +4,7 @@
  *  cpuset interface
  *
  *  Copyright (C) 2003 BULL SA
- *  Copyright (C) 2004 Silicon Graphics, Inc.
+ *  Copyright (C) 2004-2005 Silicon Graphics, Inc.
  *
  */
 
@@ -24,6 +24,7 @@ void cpuset_update_current_mems_allowed(
 void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
 int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
 int cpuset_zone_allowed(struct zone *z);
+extern const nodemask_t cpuset_mems_allowed(const struct task_struct *tsk);
 extern struct file_operations proc_cpuset_operations;
 extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
 
@@ -53,6 +54,11 @@ static inline int cpuset_zone_allowed(st
 	return 1;
 }
 
+static inline nodemask_t cpuset_mems_allowed(const struct task_struct *tsk)
+{
+	return node_possible_map;
+}
+
 static inline char *cpuset_task_status_allowed(struct task_struct *task,
 							char *buffer)
 {
Index: linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/kernel/cpuset.c	2005-06-13 11:12:34.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c	2005-06-13 11:13:04.000000000 -0700
@@ -4,7 +4,7 @@
  *  Processor and Memory placement constraints for sets of tasks.
  *
  *  Copyright (C) 2003 BULL SA.
- *  Copyright (C) 2004 Silicon Graphics, Inc.
+ *  Copyright (C) 2004-2005 Silicon Graphics, Inc.
  *
  *  Portions derived from Patrick Mochel's sysfs code.
  *  sysfs is Copyright (c) 2001-3 Patrick Mochel
@@ -1500,6 +1500,28 @@ int cpuset_zone_allowed(struct zone *z)
 		node_isset(z->zone_pgdat->node_id, current->mems_allowed);
 }
 
+/**
+ * cpuset_mems_allowed - return mems_allowed mask from a task's cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->mems_allowed.
+ *
+ * Description: Returns the nodemask_t mems_allowed of the cpuset
+ * attached to the specified @tsk.
+ *
+ **/
+
+const nodemask_t cpuset_mems_allowed(const struct task_struct *tsk)
+{
+	nodemask_t mask;
+
+	down(&cpuset_sem);
+	task_lock((struct task_struct *)tsk);
+	guarantee_online_mems(tsk->cpuset, &mask);
+	task_unlock((struct task_struct *)tsk);
+	up(&cpuset_sem);
+
+	return mask;
+}
+
 /*
  * proc_cpuset_show()
  *  - Print tasks cpuset path into seq_file.
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-13 11:12:58.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 11:13:04.000000000 -0700
@@ -26,6 +26,7 @@
 #include <linux/delay.h>
 #include <linux/nodemask.h>
 #include <linux/mempolicy.h>
+#include <linux/cpuset.h>
 #include <asm/bitops.h>
 
 /*
@@ -673,11 +674,12 @@ sys_migrate_pages(pid_t pid, __u32 count
 	int *tmp_old_nodes = NULL;
 	int *tmp_new_nodes = NULL;
 	int *node_map;
-	struct task_struct *task;
+	struct task_struct *task = NULL;
 	struct mm_struct *mm = NULL;
 	size_t size = count * sizeof(tmp_old_nodes[0]);
 	struct vm_area_struct *vma;
-	nodemask_t old_node_mask, new_node_mask;
+	nodemask_t old_node_mask, new_node_mask, target_nodes_allowed;
+	nodemask_t current_nodes_allowed;
 
 	if ((count < 1) || (count > MAX_NUMNODES))
 		return -EINVAL;
@@ -724,8 +726,10 @@ sys_migrate_pages(pid_t pid, __u32 count
 	if (task) {
 		task_lock(task);
 		mm = task->mm;
-		if (mm)
+		if (mm) {
 			atomic_inc(&mm->mm_users);
+			get_task_struct(task);
+		}
 		task_unlock(task);
 	} else {
 		ret = -ESRCH;
@@ -736,6 +740,16 @@ sys_migrate_pages(pid_t pid, __u32 count
 	if (!mm)
 		goto out_einval;
 
+	/* Obviously, the target task needs to be able to allocate on
+	 * the new set of nodes.  However, the migrated pages will
+	 * actually be allocated by the current task, so the current
+	 * task has to be able to allocate on those nodes as well */
+	target_nodes_allowed = cpuset_mems_allowed(task);
+	current_nodes_allowed = cpuset_mems_allowed(current);
+	if (!nodes_subset(new_node_mask, target_nodes_allowed) ||
+	    !nodes_subset(new_node_mask, current_nodes_allowed))
+		goto out_einval;
+
 	/* set up the node_map array */
 	for (i = 0; i < MAX_NUMNODES; i++)
 		node_map[i] = -1;
@@ -768,8 +782,10 @@ sys_migrate_pages(pid_t pid, __u32 count
 	ret = migrated;
 
 out:
-	if (mm)
+	if (mm) {
 		mmput(mm);
+		put_task_struct(task);
+	}
 
 	kfree(tmp_old_nodes);
 	kfree(tmp_new_nodes);

-- 
Best Regards,
Ray
-----------------------------------------------
Ray Bryant                       raybry@sgi.com
The box said: "Requires Windows 98 or better",
           so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2.6.12-rc5 9/10] mm: manual page migration-rc3 -- sys_migrate_pages-permissions-check-rc3.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (7 preceding siblings ...)
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 8/10] mm: manual page migration-rc3 -- sys_migrate_pages-cpuset-support-rc3.patch Ray Bryant
@ 2005-06-22 16:40 ` Ray Bryant
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 10/10] mm: manual page migration-rc3 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
  2005-06-23 21:31 ` [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Christoph Lameter
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:40 UTC (permalink / raw)
  To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

Add permissions checking to the migrate_pages() system call.
The basic idea is that you are allowed to migrate a process
if you could send it an arbitrary signal, or if the calling
process has the CAP_SYS_ADMIN capability.  The permissions
check is based on the one in check_kill_permission() in
kernel/signal.c.
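
For readability: since (a ^ b) is non-zero exactly when a != b, the XOR
test in the hunk below is just a compact way of writing the comparisons
shown here (illustration only, error-path details omitted):

	if (current->euid != task->suid && current->euid != task->uid &&
	    current->uid  != task->suid && current->uid  != task->uid &&
	    !capable(CAP_SYS_ADMIN))
		return -EPERM;	/* no uid/euid match and no CAP_SYS_ADMIN */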

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 include/linux/capability.h |    2 ++
 mm/mmigrate.c              |   12 ++++++++++++
 2 files changed, 14 insertions(+)

Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/capability.h	2005-06-13 11:12:33.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h	2005-06-13 11:13:09.000000000 -0700
@@ -233,6 +233,8 @@ typedef __u32 kernel_cap_t;
 /* Allow enabling/disabling tagged queuing on SCSI controllers and sending
    arbitrary SCSI commands */
 /* Allow setting encryption key on loopback filesystem */
+/* Allow using the migrate_pages() system call to migrate a process's pages
+   from one set of NUMA nodes to another */
 
 #define CAP_SYS_ADMIN        21
 
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c	2005-06-13 11:13:04.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c	2005-06-13 11:13:09.000000000 -0700
@@ -15,6 +15,8 @@
 #include <linux/module.h>
 #include <linux/swap.h>
 #include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/capability.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/writeback.h>
@@ -725,6 +727,16 @@ sys_migrate_pages(pid_t pid, __u32 count
 	task = find_task_by_pid(pid);
 	if (task) {
 		task_lock(task);
+		/* does this task have permission to migrate that task?
+		 * (ala check_kill_permission() ) */
+	        if ((current->euid ^ task->suid) && (current->euid ^ task->uid)
+	           && (current->uid ^ task->suid) && (current->uid ^ task->uid)
+	           && !capable(CAP_SYS_ADMIN)) {
+		   	ret = -EPERM;
+			task_unlock(task);
+			read_unlock(&tasklist_lock);
+			goto out;
+		}
 		mm = task->mm;
 		if (mm) {
 			atomic_inc(&mm->mm_users);

-- 
Best Regards,
Ray
-----------------------------------------------
Ray Bryant                       raybry@sgi.com
The box said: "Requires Windows 98 or better",
           so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2.6.12-rc5 10/10] mm: manual page migration-rc3 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (8 preceding siblings ...)
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 9/10] mm: manual page migration-rc3 -- sys_migrate_pages-permissions-check-rc3.patch Ray Bryant
@ 2005-06-22 16:40 ` Ray Bryant
  2005-06-23 21:31 ` [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Christoph Lameter
  10 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-22 16:40 UTC (permalink / raw)
  To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
  Cc: Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel, Ray Bryant,
	Paul Jackson, Nathan Scott

Manual page migration adds a nodemap arg to try_to_migrate_pages().
The nodemap specifies where pages found on a particular node are to
be migrated.  If all you want to do is migrate pages off their
current node, pass NULL for the nodemap argument.
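
For reference, the way such a nodemap drives the per-page decision can
be sketched as follows (illustration only; it mirrors how
sys_migrate_pages() builds its node_map earlier in this series):

	int node_map[MAX_NUMNODES];
	int i;

	for (i = 0; i < MAX_NUMNODES; i++)
		node_map[i] = -1;			/* -1: leave pages on node i alone */
	for (i = 0; i < count; i++)
		node_map[old_nodes[i]] = new_nodes[i];	/* move old -> new */

	/* A page found on node n is then migrated only if node_map[n] >= 0;
	 * passing a NULL nodemap means "just move the pages off their node". */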

Add the NULL to the try_to_migrate_pages() invocation.

This patch should be added to the Memory Hotplug series after patch
N1.1-pass-page_list-to-steal_page.patch (for 2.6.12-rc5-mhp1).

Signed-off-by: Ray Bryant <raybry@sgi.com>
--

 page_alloc.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mhp1-memory-hotplug.orig/mm/page_alloc.c	2005-06-21 10:43:14.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c	2005-06-21 10:43:14.000000000 -0700
@@ -823,7 +823,7 @@ retry:
 	on_each_cpu(lru_drain_schedule, NULL, 1, 1);
 
 	rest = grab_capturing_pages(&page_list, start_pfn, nr_pages);
-	remains = try_to_migrate_pages(&page_list);
+	remains = try_to_migrate_pages(&page_list, NULL);
 	if (rest || !list_empty(&page_list)) {
 		if (remains == -ENOSPC) {
 			/* A swap device should be added. */

-- 
Best Regards,
Ray
-----------------------------------------------
Ray Bryant                       raybry@sgi.com
The box said: "Requires Windows 98 or better",
           so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch Ray Bryant
@ 2005-06-22 17:23   ` Dave Hansen
  2005-06-23  1:34     ` Ray Bryant
  2005-06-25 10:32   ` Hirokazu Takahashi
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2005-06-22 17:23 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms, Paul Jackson,
	Nathan Scott

On Wed, 2005-06-22 at 09:39 -0700, Ray Bryant wrote:
> +asmlinkage long
> +sys_migrate_pages(pid_t pid, __u32 count, __u32 *old_nodes, __u32 *new_nodes)
> +{

Should the buffers be marked __user?

> +       if ((count < 1) || (count > MAX_NUMNODES))
> +               return -EINVAL;

Since you have an out_einval:, it's probably best to use it
consistently.  There is another place or two like this.

> +       for (i = 0; i < count; i++) {
> +               int n;
> +
> +               n = tmp_old_nodes[i];
> +               if ((n < 0) || (n >= MAX_NUMNODES))
> +                       goto out_einval;
> +               node_set(n, old_node_mask);
> +
> +               n = tmp_new_nodes[i];
> +               if ((n < 0) || (n >= MAX_NUMNODES) || !node_online(n))
> +                       goto out_einval;
> +               node_set(n, new_node_mask);
> +
> +       }

I know it's a simple operation, but I think I'd probably break out the
array validation into its own function.

Then, replace the above loop with this:

if (!migrate_masks_valid(tmp_old_nodes, count) ||
     !migrate_masks_valid(tmp_new_nodes, count))
	goto out_einval;

for (i = 0; i < count; i++) {
	node_set(tmp_old_nodes[i], old_node_mask);
	node_set(tmp_new_nodes[i], new_node_mask);
}
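
A minimal sketch of such a helper (not from the patch; the node_online()
check that the original loop applies to the new-node array would still
need to be kept, e.g. in a second helper or via an extra argument):

static int migrate_masks_valid(int *nodes, __u32 count)
{
	__u32 i;

	for (i = 0; i < count; i++) {
		int n = nodes[i];

		if (n < 0 || n >= MAX_NUMNODES)
			return 0;
	}
	return 1;
}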

> +static int
> +migrate_vma(struct task_struct *task, struct mm_struct *mm,
> +       struct vm_area_struct *vma, int *node_map)
...
> +       spin_lock(&mm->page_table_lock);
> +       for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
> +               page = follow_page(mm, vaddr, 0);
> +               /*
> +                * follow_page has been known to return pages with zero mapcount
> +                * and NULL mapping.  Skip those pages as well
> +                */
> +               if (page && page_mapcount(page)) {
> +                       if (node_map[page_to_nid(page)] >= 0) {
> +                               if (steal_page_from_lru(page_zone(page), page,
> +                                       &page_list))
> +                                               count++;
> +                               else
> +                                       BUG();
> +                       }
> +               }
> +       }
> +       spin_unlock(&mm->page_table_lock);

Personally, I dislike having so many embedded ifs, especially in a for
loop like that.  I think it's a lot more logical to code it up as a
series of continues, mostly because it's easy to read a continue as,
"skip this page."  You can't always see that as easily with an if().  It
also makes it so that you don't have to wrap the steal_page_from_lru()
call across two lines, which is super-ugly. :)

for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
	page = follow_page(mm, vaddr, 0);
	if (!page || !page_mapcount(page))
		continue;

	if (node_map[page_to_nid(page)] < 0)
		continue;

	if (steal_page_from_lru(page_zone(page), page, &page_list))
		count++;
	else
		BUG();
}

The same kind of thing goes for this if: 

> +       /* call the page migration code to move the pages */
> +       if (count) {
> +               nr_busy = try_to_migrate_pages(&page_list, node_map);
> +
> +               if (nr_busy < 0)
> +                       return nr_busy;
> +
> +               if (nr_busy == 0)
> +                       return count;
> +
> +               /* return the unmigrated pages to the LRU lists */
> +               list_for_each_entry_safe(page, page2, &page_list, lru)
> {
> +                       list_del(&page->lru);
> +                       putback_page_to_lru(page_zone(page), page);
> +               }
> +               return -EAGAIN;
> +       }
> +
> +       return 0;

It looks a lot cleaner if you just do 

	if (!count)
		return count;

	... contents of the if(){} block go here

-- Dave


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch Ray Bryant
@ 2005-06-22 17:30   ` Joel Schopp
  2005-06-23  4:01   ` Nathan Scott
  1 sibling, 0 replies; 26+ messages in thread
From: Joel Schopp @ 2005-06-22 17:30 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson, Nathan Scott

> However, the routine "xfs_skip_migrate_page()" is added to
> disallow migration of xfs metadata.

On ppc64 we are aiming to eventually be able to migrate ALL data.  I 
understand we aren't nearly there yet.  I'd like to keep track of what 
we need to do to get there.  What do we need to do to be able to migrate 
xfs metadata?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch
  2005-06-22 17:23   ` Dave Hansen
@ 2005-06-23  1:34     ` Ray Bryant
  2005-06-23  1:42       ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Ray Bryant @ 2005-06-23  1:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ray Bryant, Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms, Paul Jackson,
	Nathan Scott

Dave Hansen wrote:
> On Wed, 2005-06-22 at 09:39 -0700, Ray Bryant wrote:
> 
>>+asmlinkage long
>>+sys_migrate_pages(pid_t pid, __u32 count, __u32 *old_nodes, __u32 *new_nodes)
>>+{
> 
> 
> Should the buffers be marked __user?
> 

I've tried it both ways, but with the __user in the system call declaration,
you still need to have it on the copy_from_user() calls to get sparse to
shut up, so it really doesn't appear to help much to put it in the 
declaration.  I'm easy though.  If you think it helps, I'll add it.

> 
>>+       if ((count < 1) || (count > MAX_NUMNODES))
>>+               return -EINVAL;
> 
> 
> Since you have an out_einval:, it's probably best to use it
> consistently.  There is another place or two like this.
>

Good point.  I looked for other places like this and didn't find any, though.

> 
>>+       for (i = 0; i < count; i++) {
>>+               int n;
>>+
>>+               n = tmp_old_nodes[i];
>>+               if ((n < 0) || (n >= MAX_NUMNODES))
>>+                       goto out_einval;
>>+               node_set(n, old_node_mask);
>>+
>>+               n = tmp_new_nodes[i];
>>+               if ((n < 0) || (n >= MAX_NUMNODES) || !node_online(n))
>>+                       goto out_einval;
>>+               node_set(n, new_node_mask);
>>+
>>+       }
> 
> 
> I know it's a simple operation, but I think I'd probably break out the
> array validation into its own function.
> 
> Then, replace the above loop with this:
> 
> if (!migrate_masks_valid(tmp_old_nodes, count) ||
>      !migrate_masks_valid(tmp_new_nodes, count))
> 	goto out_einval;
> 
> for (i = 0; i < count; i++) {
> 	node_set(tmp_old_nodes[i], old_node_mask);
> 	node_set(tmp_new_nodes[i], new_node_mask);
> }
> 
> 
>>+static int
>>+migrate_vma(struct task_struct *task, struct mm_struct *mm,
>>+       struct vm_area_struct *vma, int *node_map)
> 
> ...
> 

ok.

>>+       spin_lock(&mm->page_table_lock);
>>+       for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
>>+               page = follow_page(mm, vaddr, 0);
>>+               /*
>>+                * follow_page has been known to return pages with zero mapcount
>>+                * and NULL mapping.  Skip those pages as well
>>+                */
>>+               if (page && page_mapcount(page)) {
>>+                       if (node_map[page_to_nid(page)] >= 0) {
>>+                               if (steal_page_from_lru(page_zone(page), page,
>>+                                       &page_list))
>>+                                               count++;
>>+                               else
>>+                                       BUG();
>>+                       }
>>+               }
>>+       }
>>+       spin_unlock(&mm->page_table_lock);
> 
> 
> Personally, I dislike having so many embedded ifs, especially in a for
> loop like that.  I think it's a lot more logical to code it up as a
> series of continues, mostly because it's easy to read a continue as,
> "skip this page."  You can't always see that as easily with an if().  It
> also makes it so that you don't have to wrap the steal_page_from_lru()
> call across two lines, which is super-ugly. :)

ok, but I had to shorten page_list to pglist to get it to fit in 80 columns,
anyway.

> 
> for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
> 	page = follow_page(mm, vaddr, 0);
> 	if (!page || !page_mapcount(page))
> 		continue;
> 
> 	if (node_map[page_to_nid(page)] < 0)
> 		continue;
> 
> 	if (steal_page_from_lru(page_zone(page), page, &page_list))
> 		count++;
> 	else
> 		BUG();
> }
> 
> The same kind of thing goes for this if: 
> 
> 
>>+       /* call the page migration code to move the pages */
>>+       if (count) {
>>+               nr_busy = try_to_migrate_pages(&page_list, node_map);
>>+
>>+               if (nr_busy < 0)
>>+                       return nr_busy;
>>+
>>+               if (nr_busy == 0)
>>+                       return count;
>>+
>>+               /* return the unmigrated pages to the LRU lists */
>>+               list_for_each_entry_safe(page, page2, &page_list, lru)
>>{
>>+                       list_del(&page->lru);
>>+                       putback_page_to_lru(page_zone(page), page);
>>+               }
>>+               return -EAGAIN;
>>+       }
>>+
>>+       return 0;
> 
> 
> It looks a lot cleaner if you just do 
> 
> 	if (!count)
> 		return count;
> 
> 	... contents of the if(){} block go here
>

ok.

> -- Dave
> 
> 

Let me make the changes and I'll send out a new set of patches in a bit.
-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch
  2005-06-23  1:34     ` Ray Bryant
@ 2005-06-23  1:42       ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2005-06-23  1:42 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Ray Bryant, Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms, Paul Jackson,
	Nathan Scott

On Wed, 2005-06-22 at 20:34 -0500, Ray Bryant wrote:
> Dave Hansen wrote:
> > On Wed, 2005-06-22 at 09:39 -0700, Ray Bryant wrote:
> > 
> >>+asmlinkage long
> >>+sys_migrate_pages(pid_t pid, __u32 count, __u32 *old_nodes, __u32 *new_nodes)
> >>+{
> >  
> > Should the buffers be marked __user?
> > 
> 
> I've tried it both ways, but with the __user in the system call declaration,
> you still need to have it on the copy_from_user() calls to get sparse to
> shut up, so it really doesn't appear to help much to put it in the 
> declaration.  I'm easy though.  If you think it helps, I'll add it.

Looking at fs/read_write.c, the convention seems to be to put them in
the function declaration.  That's all that I was looking at.  No big
deal.
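
Concretely, the annotated prototype would then read something like
(sketch only):

	asmlinkage long sys_migrate_pages(pid_t pid, __u32 count,
			__u32 __user *old_nodes, __u32 __user *new_nodes);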

-- Dave


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch Ray Bryant
@ 2005-06-23  1:51   ` Andi Kleen
  2005-06-23 20:59     ` [Lhms-devel] " Ray Bryant
  0 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2005-06-23  1:51 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson, Nathan Scott

On Wed, Jun 22, 2005 at 09:39:41AM -0700, Ray Bryant wrote:
> This patch adds code that translates the memory policy structures
> as they are encountered so that they continue to represent where
> memory should be allocated after the page migration has completed.


That won't work for shared memory objects though (which store
their mempolicies separately). Is that intended?

> +
> +	if (task->mempolicy->policy == MPOL_INTERLEAVE) {
> +		/*
> +		 * If the task is still running and allocating storage, this
> +		 * is racy, but there is not much that can be done about it.
> +		 */
> +		tmp = task->il_next;
> +		if (node_map[tmp] >= 0)
> +			task->il_next = node_map[tmp];

RCU (synchronize_kernel) could do better, but that might be slow. However the 
code might BUG when il_next ends up in a node that is not part of 
the policy anymore. Have you checked that?  

-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch Ray Bryant
  2005-06-22 17:30   ` [Lhms-devel] " Joel Schopp
@ 2005-06-23  4:01   ` Nathan Scott
  1 sibling, 0 replies; 26+ messages in thread
From: Nathan Scott @ 2005-06-23  4:01 UTC (permalink / raw)
  To: Ray Bryant, Joel Schopp
  Cc: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson

On Wed, Jun 22, 2005 at 12:30:41 PM -0500, Joel Schopp wrote:
> >However, the routine "xfs_skip_migrate_page()" is added to
> >disallow migration of xfs metadata.
>
> On ppc64 we are aiming to eventually be able to migrate ALL data.  I
> understand we aren't nearly there yet.  I'd like to keep track of what
> we need to do to get there.  What do we need to do to be able to migrate
> xfs metadata?

I guess we'd effectively have to do a fs "freeze" (freeze_bdev)
to prevent new metadata buffers springing into existence, then 
flush out all metadata for the filesystem in question and toss
the associated page cache pages (this is part of the existing
umount behaviour already though).  Then a "thaw" to get the
filesystem to spring back into life.

It's just a Simple Matter Of Programming.  :)
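
A rough sketch of that sequence (assuming the freeze_bdev()/thaw_bdev()
interfaces; error handling and the real metadata flush are elided):

	struct super_block *sb;

	sb = freeze_bdev(bdev);		/* stop new metadata buffers appearing */
	if (sb) {
		/* flush dirty metadata and toss the associated pagecache
		 * pages, much as umount already does */
		sync_blockdev(bdev);
		invalidate_bdev(bdev, 0);
	}
	thaw_bdev(bdev, sb);		/* let the filesystem spring back to life */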

cheers.

-
Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch
  2005-06-23  1:51   ` Andi Kleen
@ 2005-06-23 20:59     ` Ray Bryant
  2005-06-23 21:05       ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: Ray Bryant @ 2005-06-23 20:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ray Bryant, Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson, Nathan Scott

Andi Kleen wrote:
> On Wed, Jun 22, 2005 at 09:39:41AM -0700, Ray Bryant wrote:
> 
>>This patch adds code that translates the memory policy structures
>>as they are encountered so that they continue to represent where
>>memory should be allocated after the page migration has completed.
> 
> 
> 
> That won't work for shared memory objects though (which store
> their mempolicies separately). Is that intended?
> 

No, it looks like I dropped the ball there.  I thought that the
vma->vm_policy field was used in that case as well, but it appears
that the policy is looked up in the tree every time it is used.
(Can that be right?)  If so, I need to do something else.

Anyway, I shouldn't be updating the vma policy if I am not also
migrating the VMA, so there is some work there that needs to be
done as well.  (The update to the per-VMA policy needs to be moved
into migrate_vma().)

> 
>>+
>>+	if (task->mempolicy->policy == MPOL_INTERLEAVE) {
>>+		/*
>>+		 * If the task is still running and allocating storage, this
>>+		 * is racy, but there is not much that can be done about it.
>>+		 */
>>+		tmp = task->il_next;
>>+		if (node_map[tmp] >= 0)
>>+			task->il_next = node_map[tmp];
> 
> 
> RCU (synchronize_kernel) could do better, but that might be slow. However the 
> code might BUG when il_next ends up in a node that is not part of 
> the policy anymore. Have you checked that?  
> 
> -Andi
> 

I don't think this particular case will BUG().  The worst thing that could
happen, as I read the code, is that if we change the policy at the same time
that a page is being allocated via the interleaved policy, that one page
could be allocated on a node according to the old policy even after the
policy has been updated.

(That is, we update the policy and before task->il_next can be updated
to match the new policy, a page gets allocated.)  Since we update the
policy, then migrate the pages, then that one page will get migrated
anyway, so as near as I can tell this is not a problem.

However, (looking at the code some more) there is a different case where a
BUG() could be called.  That is in offset_il_node().  If the node mask
(p->v.nodes) changes after the last find_next_bit() and before the
BUG_ON(!test_bit(nid, pol->v.nodes)), then the system could BUG() because
of the policy migration.

A simple solution to this would be to delete that BUG_ON().  :-)
(Is this required?  It looks almost like a debugging statement.)

In that case, we have the same kind of situation as with the il->next
case, that is, if a process is actively allocating storage at the same
time as we do a migration, then one page (per vma?) could be allocated
on the old set of nodes after the policy is updated.  However, since
we update the policy first, then migrate the pages, it still seems to
me that all such pages will get migrated to the new nodes.

Unfortunately, I've not tested this.  For the cases I am looking at
we suspend the task before migration and resume it after.  Indeed,
the system call in question will sometimes fail (the migrated process
will die) if we don't suspend/resume the migrated tasks.  I was hoping
that would be good enough, but if migrating non-suspended tasks is
thought to be important, then I will go fix that as well.  (The
unresolved issues paragraph in the note I sent out about this patch
points out this issue.)

I don't see any other BUG() calls that could be tripped by changing
the node mask underneath a process that is actively allocating
storage, at least not in mempolicy.c.  Am I overlooking something?

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch
  2005-06-23 20:59     ` [Lhms-devel] " Ray Bryant
@ 2005-06-23 21:05       ` Andi Kleen
  2005-06-25  5:11         ` Ray Bryant
  0 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2005-06-23 21:05 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Andi Kleen, Ray Bryant, Hirokazu Takahashi, Dave Hansen,
	Marcelo Tosatti, Christoph Hellwig, Ray Bryant, linux-mm,
	lhms-devel, Paul Jackson, Nathan Scott

On Thu, Jun 23, 2005 at 03:59:47PM -0500, Ray Bryant wrote:
> No, it looks like I dropped the ball there.  I thought that the
> vma->vm_policy field was used in that case as well, but it appears
> that the policy is looked up in the tree every time it is used.
> (Can that be right?)  If so, I need to do something else.

Yes, it's like this. I had it originally in vm_policy in this case,
but there were too many corner cases to handle when changing policies
(splitting VMAs of remote processes when a policy is changed, etc.), so I
eventually settled on this.

On the other hand, tmpfs memory does not really belong to a single
process, so it is not clear whether process migration should touch
such a shared resource.

> A simple solution to this would be to delete that BUG_ON().  :-)
> (Is this required?  It looks almost like a debugging statement.)

Yes, removing it would be fine.

> I don't see any other BUG() calls that could be tripped by changing
> the node mask underneath a process that is actively allocating
> storage, at least not in mempolicy.c.  Am I overlooking something?

Don't think so.

-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
  2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
                   ` (9 preceding siblings ...)
  2005-06-22 16:40 ` [PATCH 2.6.12-rc5 10/10] mm: manual page migration-rc3 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
@ 2005-06-23 21:31 ` Christoph Lameter
  2005-06-23 23:00   ` Ray Bryant
  2005-06-24 14:15   ` [Lhms-devel] " Ray Bryant
  10 siblings, 2 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-06-23 21:31 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson, Nathan Scott

On Wed, 22 Jun 2005, Ray Bryant wrote:

> (1)  This version of migrate_pages() works reliably only when the
>      process to be migrated has been stopped (e. g., using SIGSTOP)
>      before the migrate_pages() system call is executed. 
>      (The system doesn't crash or oops, but sometimes the process
>      being migrated will be "Killed by VM" when it starts up again.
>      There may be a few messages put into the log as well at that time.)
> 
>      At the moment, I am proposing that processes need to be
>      suspended before being migrated.  This really should not
>      be a performance concern, since the delay imposed by page
>      migration far exceeds any delay imposed by SIGSTOPing the
>      processes before migration and SIGCONTinuing them afterward.

There is a PF_FREEZE flag used by the suspend feature that could
be used here to send the process into the "freezer" first.  Using regular
signals to stop a process may cause races with user space code also doing
SIGSTOP/SIGCONT on the process while migrating it.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
  2005-06-23 21:31 ` [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Christoph Lameter
@ 2005-06-23 23:00   ` Ray Bryant
  2005-06-23 23:03     ` Christoph Lameter
  2005-06-24 14:15   ` [Lhms-devel] " Ray Bryant
  1 sibling, 1 reply; 26+ messages in thread
From: Ray Bryant @ 2005-06-23 23:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ray Bryant, Hirokazu Takahashi, Andi Kleen, Dave Hansen,
	Marcelo Tosatti, Christoph Hellwig, Ray Bryant, linux-mm,
	lhms-devel, Paul Jackson, Nathan Scott

Christoph Lameter wrote:
> On Wed, 22 Jun 2005, Ray Bryant wrote:
> 
> 
>>(1)  This version of migrate_pages() works reliably only when the
>>     process to be migrated has been stopped (e. g., using SIGSTOP)
>>     before the migrate_pages() system call is executed. 
>>     (The system doesn't crash or oops, but sometimes the process
>>     being migrated will be "Killed by VM" when it starts up again.
>>     There may be a few messages put into the log as well at that time.)
>>
>>     At the moment, I am proposing that processes need to be
>>     suspended before being migrated.  This really should not
>>     be a performance concern, since the delay imposed by page
>>     migration far exceeds any delay imposed by SIGSTOPing the
>>     processes before migration and SIGCONTinuing them afterward.
> 
> 
> There is PF_FREEZE flag used by the suspend feature that could 
> be used here to send the process into the "freezer" first. Using regular 
> signals to stop a process may cause races with user space code also doing
> SIGSTOP SIGCONT on a process while migrating it.
> 
> 

Christoph,

So are you suggesting that I set PF_FREEZE, wait until PF_FROZEN is set as
well, then migrate the pages, and then clear PF_FROZEN to resume the task?

I guess that might work, unless we're actually running on a laptop and it
goes into hibernation at the same time we are trying to do a migration....

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
  2005-06-23 23:00   ` Ray Bryant
@ 2005-06-23 23:03     ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-06-23 23:03 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Ray Bryant, Hirokazu Takahashi, Andi Kleen, Dave Hansen,
	Marcelo Tosatti, Christoph Hellwig, Ray Bryant, linux-mm,
	lhms-devel, Paul Jackson, Nathan Scott

On Thu, 23 Jun 2005, Ray Bryant wrote:

> > There is PF_FREEZE flag used by the suspend feature that could be used here
> > to send the process into the "freezer" first. Using regular signals to stop
> > a process may cause races with user space code also doing
> > SIGSTOP SIGCONT on a process while migrating it.
> 
> So are you suggesting that I set PF_FREEZE, wait until PF_FROZEN is set as
> well, then migrate the pages, and then clear PF_FROZEN to resume the task?

Yes.

> I guess that might work, unless we're actually running on a laptop and it
> goes into hibernation at the same time we are trying to do a migration....

You can atomically set the PF_FREEZE flag.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
  2005-06-23 21:31 ` [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Christoph Lameter
  2005-06-23 23:00   ` Ray Bryant
@ 2005-06-24 14:15   ` Ray Bryant
  2005-06-24 15:41     ` Christoph Lameter
  1 sibling, 1 reply; 26+ messages in thread
From: Ray Bryant @ 2005-06-24 14:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ray Bryant, Hirokazu Takahashi, Andi Kleen, Dave Hansen,
	Marcelo Tosatti, Christoph Hellwig, Ray Bryant, linux-mm,
	lhms-devel, Paul Jackson, Nathan Scott

Christoph Lameter wrote:

> 
> 
> There is PF_FREEZE flag used by the suspend feature that could 
> be used here to send the process into the "freezer" first. Using regular 
> signals to stop a process may cause races with user space code also doing
> SIGSTOP SIGCONT on a process while migrating it.
> 
> 

In general, process flags are only updatable by the current process.
There is no locking applied.  Having the migrating task set the PF_FREEZE
bit in the migrated process runs the risk of losing the update to some other
flags bit that is simultaneously set by the (running) migrated process.

I suppose this could be fixed as well by introducing a second flags word
in the task_struct.  But this starts to sound like a reimplementation of
signals.

The other concern (probably not a problem on Altix  :-) ), is what happens
if a process migration is underway at the time of a suspend.  When the
resume occurs, all processes will be unfrozen, including the task that
is under migration.

At the moment, I'm not convinced that this is a better path than depending
on SIGSTOP/SIGCONT.  It is a reasonable restriction that processes eligible for
migration are not allowed to use those signals themselves, in particular for
the batch environment this is targeted at.

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview
  2005-06-24 14:15   ` [Lhms-devel] " Ray Bryant
@ 2005-06-24 15:41     ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-06-24 15:41 UTC (permalink / raw)
  To: Ray Bryant
  Cc: Ray Bryant, Hirokazu Takahashi, Andi Kleen, Dave Hansen,
	Marcelo Tosatti, Christoph Hellwig, Ray Bryant, linux-mm,
	lhms-devel, Paul Jackson, Nathan Scott

On Fri, 24 Jun 2005, Ray Bryant wrote:

> In general, process flags are only updatable by the current process.
> There is no locking applied.  Having the migrating task set the PF_FREEZE
> bit in the migrated process runs the risk of losing the update to some other
> flags bit that is simultaneously set by the (running) migrated process.

Look at freeze_processes().  It takes a read lock on tasklist_lock.  So if
you take a write lock on tasklist_lock, you can be sure that no
other process sets PF_FREEZE while you are migrating.

Maybe we could downgrade that to a read lock if we modified
freeze_processes() to do a test first and then use a test-and-set during migration.
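
In other words, something along these lines (illustration only, not from
the patchset):

	write_lock_irq(&tasklist_lock);		/* excludes freeze_processes()'s read lock */
	task->flags |= PF_FREEZE;
	write_unlock_irq(&tasklist_lock);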


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch
  2005-06-23 21:05       ` Andi Kleen
@ 2005-06-25  5:11         ` Ray Bryant
  0 siblings, 0 replies; 26+ messages in thread
From: Ray Bryant @ 2005-06-25  5:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ray Bryant, Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti,
	Christoph Hellwig, Ray Bryant, linux-mm, lhms-devel,
	Paul Jackson, Nathan Scott

Andi Kleen wrote:

> 
> On the other hand tmpfs is not really memory belonging to a single
> process only so it is not clear if process migration should touch
> should a shared resource.
> 

I think the way this should work is as follows:  if a VMA maps a
shared object, and it meets the criterion for being a migratable
VMA (e.g. VM_WRITE is set), then we migrate the data and the
policy.

This isn't perfect, since pages in the shared object that are not
mapped won't be migrated.  Perhaps we need a utility to fix that
up after the fact.


-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch
  2005-06-22 16:39 ` [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch Ray Bryant
  2005-06-22 17:23   ` Dave Hansen
@ 2005-06-25 10:32   ` Hirokazu Takahashi
  1 sibling, 0 replies; 26+ messages in thread
From: Hirokazu Takahashi @ 2005-06-25 10:32 UTC (permalink / raw)
  To: raybry
  Cc: marcelo.tosatti, ak, haveblue, hch, raybry, linux-mm, lhms-devel,
	pj, nathans

Hi Ray,

> +static int
> +migrate_vma(struct task_struct *task, struct mm_struct *mm,
> +	struct vm_area_struct *vma, int *node_map)
> +{
> +	struct page *page, *page2;
> +	unsigned long vaddr;
> +	int count = 0, nr_busy;
> +	LIST_HEAD(page_list);
> +
> +	/* can't migrate mlock()'d pages */
> +	if (vma->vm_flags & VM_LOCKED)
> +		return 0;
> +
> +	/*
> +	 * gather all of the pages to be migrated from this vma into page_list
> +	 */
> +	spin_lock(&mm->page_table_lock);
> + 	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
> +		page = follow_page(mm, vaddr, 0);
> +		/*
> +		 * follow_page has been known to return pages with zero mapcount
> +		 * and NULL mapping.  Skip those pages as well
> +		 */
> +		if (page && page_mapcount(page)) {
> +			if (node_map[page_to_nid(page)] >= 0) {
> +				if (steal_page_from_lru(page_zone(page), page,
> +					&page_list))
> +						count++;
> +				else
> +					BUG();
> +			}
> +		}
> +	}
> +	spin_unlock(&mm->page_table_lock);

I think you shouldn't call BUG() here because the swap code can remove
any pages from the LRU lists at any moment even though mm->page_table_lock
is held.

The preferable code would be:

		if (page && page_mapcount(page)) {
			if (node_map[page_to_nid(page)] >= 0) {
				if (steal_page_from_lru(page_zone(page), page,
					&page_list))
						count++;
				else
					continue;
			}
		}

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2005-06-25 10:32 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-22 16:39 [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Ray Bryant
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 1/10] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 2/10] mm: manual page migration-rc3 -- xfs-migrate-page-rc3.patch Ray Bryant
2005-06-22 17:30   ` [Lhms-devel] " Joel Schopp
2005-06-23  4:01   ` Nathan Scott
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 3/10] mm: manual page migration-rc3 -- add-node_map-arg-to-try_to_migrate_pages-rc3.patch Ray Bryant
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 4/10] mm: manual page migration-rc3 -- add-sys_migrate_pages-rc3.patch Ray Bryant
2005-06-22 17:23   ` Dave Hansen
2005-06-23  1:34     ` Ray Bryant
2005-06-23  1:42       ` Dave Hansen
2005-06-25 10:32   ` Hirokazu Takahashi
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 5/10] mm: manual page migration-rc3 -- sys_migrate_pages-mempolicy-migration-rc3.patch Ray Bryant
2005-06-23  1:51   ` Andi Kleen
2005-06-23 20:59     ` [Lhms-devel] " Ray Bryant
2005-06-23 21:05       ` Andi Kleen
2005-06-25  5:11         ` Ray Bryant
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 6/10] mm: manual page migration-rc3 -- add-mempolicy-control-rc3.patch Ray Bryant
2005-06-22 16:39 ` [PATCH 2.6.12-rc5 7/10] mm: manual page migration-rc3 -- sys_migrate_pages-migration-selection-rc3.patch Ray Bryant
2005-06-22 16:40 ` [PATCH 2.6.12-rc5 8/10] mm: manual page migration-rc3 -- sys_migrate_pages-cpuset-support-rc3.patch Ray Bryant
2005-06-22 16:40 ` [PATCH 2.6.12-rc5 9/10] mm: manual page migration-rc3 -- sys_migrate_pages-permissions-check-rc3.patch Ray Bryant
2005-06-22 16:40 ` [PATCH 2.6.12-rc5 10/10] mm: manual page migration-rc3 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
2005-06-23 21:31 ` [PATCH 2.6.12-rc5 0/10] mm: manual page migration-rc3 -- overview Christoph Lameter
2005-06-23 23:00   ` Ray Bryant
2005-06-23 23:03     ` Christoph Lameter
2005-06-24 14:15   ` [Lhms-devel] " Ray Bryant
2005-06-24 15:41     ` Christoph Lameter
