* [PATCH 1/2] Avoid putting a bad page back on the LRU
@ 2009-04-08  0:11 Russ Anderson
  2009-04-08  3:43 ` Ingo Oeser
  0 siblings, 1 reply; 4+ messages in thread
From: Russ Anderson @ 2009-04-08  0:11 UTC (permalink / raw)
  To: linux-kernel, linux-mm, x86; +Cc: Russ Anderson

Prevent a page with a physical memory error from being placed back
on the LRU.  This patch applies on top of Andi Kleen's POISON
patchset.


Signed-off-by: Russ Anderson <rja@sgi.com>

---
 include/linux/page-flags.h |    8 +++++++-
 mm/migrate.c               |   39 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 2 deletions(-)

Index: linux-next/mm/migrate.c
===================================================================
--- linux-next.orig/mm/migrate.c	2009-04-07 18:32:12.781949840 -0500
+++ linux-next/mm/migrate.c	2009-04-07 18:34:19.169736260 -0500
@@ -72,6 +72,7 @@ int putback_lru_pages(struct list_head *
 	}
 	return count;
 }
+EXPORT_SYMBOL(isolate_lru_page);
 
 /*
  * Restore a potential migration pte to a working pte entry
@@ -139,6 +140,7 @@ static void remove_migration_pte(struct 
 out:
 	pte_unmap_unlock(ptep, ptl);
 }
+EXPORT_SYMBOL(migrate_prep);
 
 /*
  * Note that remove_file_migration_ptes will only work on regular mappings,
@@ -161,6 +163,7 @@ static void remove_file_migration_ptes(s
 
 	spin_unlock(&mapping->i_mmap_lock);
 }
+EXPORT_SYMBOL(putback_lru_pages);
 
 /*
  * Must hold mmap_sem lock on at least one of the vmas containing
@@ -693,6 +696,26 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
+#ifdef CONFIG_MEMORY_FAILURE
+		if (PagePoison(page)) {
+			if (rc == 0)
+				/*
+				 * A page with a memory error that has
+				 * been migrated will not be moved to
+				 * the LRU.
+				 */
+				goto move_newpage;
+			else
+				/*
+				 * The page failed to migrate and will not
+				 * be added to the bad page list.  Clearing
+				 * the error bit will allow another attempt
+				 * to migrate if it gets another correctable
+				 * error.
+				 */
+				ClearPagePoison(page);
+		}
+#endif
 		putback_lru_page(page);
 	}
 
@@ -736,7 +759,7 @@ int migrate_pages(struct list_head *from
 	struct page *page;
 	struct page *page2;
 	int swapwrite = current->flags & PF_SWAPWRITE;
-	int rc;
+	int rc = 0;
 
 	if (!swapwrite)
 		current->flags |= PF_SWAPWRITE;
@@ -765,6 +788,19 @@ int migrate_pages(struct list_head *from
 			}
 		}
 	}
+
+#ifdef CONFIG_MEMORY_FAILURE
+	if (rc != 0)
+		list_for_each_entry_safe(page, page2, from, lru)
+			if (PagePoison(page))
+				/*
+				 * The page failed to migrate.  Clearing
+				 * the error bit will allow another attempt
+				 * to migrate if it gets another correctable
+				 * error.
+				 */
+				ClearPagePoison(page);
+#endif
 	rc = 0;
 out:
 	if (!swapwrite)
@@ -777,6 +813,7 @@ out:
 
 	return nr_failed + retry;
 }
+EXPORT_SYMBOL(migrate_pages);
 
 #ifdef CONFIG_NUMA
 /*
Index: linux-next/include/linux/page-flags.h
===================================================================
--- linux-next.orig/include/linux/page-flags.h	2009-04-07 18:32:12.789950956 -0500
+++ linux-next/include/linux/page-flags.h	2009-04-07 18:34:19.197737925 -0500
@@ -169,15 +169,21 @@ static inline int TestSetPage##uname(str
 static inline int TestClearPage##uname(struct page *page)		\
 		{ return test_and_clear_bit(PG_##lname, &page->flags); }
 
+#define PAGEFLAGMASK(uname, lname)					\
+static inline int PAGEMASK_##uname(void)				\
+		{ return (1 << PG_##lname); }
 
 #define PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname)		\
-	SETPAGEFLAG(uname, lname) CLEARPAGEFLAG(uname, lname)
+	SETPAGEFLAG(uname, lname) CLEARPAGEFLAG(uname, lname)		\
+	PAGEFLAGMASK(uname, lname)
 
 #define __PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname)		\
 	__SETPAGEFLAG(uname, lname)  __CLEARPAGEFLAG(uname, lname)
 
 #define PAGEFLAG_FALSE(uname) 						\
 static inline int Page##uname(struct page *page) 			\
+			{ return 0; }					\
+static inline int PAGEMASK_##uname(void)				\
 			{ return 0; }
 
 #define TESTSCFLAG(uname, lname)					\
-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
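The page-flags.h hunk extends the PAGEFLAG() generator so that every flag also gets a PAGEMASK_<name>() helper, and gives PAGEFLAG_FALSE() a matching stub that returns 0. Below is a minimal user-space sketch of that token-pasting pattern. The struct page, the bit numbers and the non-atomic bit operations are simplifications assumed for illustration; the kernel macros use the atomic test_bit()/set_bit()/clear_bit() on page->flags.

```c
#include <assert.h>

/* Toy stand-in for struct page: only the flags word is modelled. */
struct page {
	unsigned long flags;
};

/* Illustrative bit numbers; the real enum in page-flags.h is much longer. */
enum pageflags {
	PG_locked,
	PG_lru,
	PG_poison,
};

/* Non-atomic simplifications of the test/set/clear generators. */
#define TESTPAGEFLAG(uname, lname)					\
static inline int Page##uname(const struct page *page)			\
	{ return (int)((page->flags >> PG_##lname) & 1UL); }

#define SETPAGEFLAG(uname, lname)					\
static inline void SetPage##uname(struct page *page)			\
	{ page->flags |= 1UL << PG_##lname; }

#define CLEARPAGEFLAG(uname, lname)					\
static inline void ClearPage##uname(struct page *page)			\
	{ page->flags &= ~(1UL << PG_##lname); }

/* Mask generator in the spirit of PAGEFLAGMASK; 1UL keeps high bits safe. */
#define PAGEFLAGMASK(uname, lname)					\
static inline unsigned long PAGEMASK_##uname(void)			\
	{ return 1UL << PG_##lname; }

#define PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname)		\
	SETPAGEFLAG(uname, lname) CLEARPAGEFLAG(uname, lname)		\
	PAGEFLAGMASK(uname, lname)

/* Compiled-out flag: both the test and the mask are constant 0. */
#define PAGEFLAG_FALSE(uname)						\
static inline int Page##uname(const struct page *page)			\
	{ (void)page; return 0; }					\
static inline unsigned long PAGEMASK_##uname(void)			\
	{ return 0; }

PAGEFLAG(LRU, lru)
PAGEFLAG(Poison, poison)
PAGEFLAG_FALSE(Uncached)	/* illustrative: a flag configured out */

/* Exercise the generated helpers on a single page. */
void pageflag_selftest(void)
{
	struct page p = { 0 };

	SetPagePoison(&p);
	assert(PagePoison(&p) && !PageLRU(&p));
	assert((p.flags & PAGEMASK_Poison()) == PAGEMASK_Poison());
	ClearPagePoison(&p);
	assert(p.flags == 0);
	assert(!PageUncached(&p) && PAGEMASK_Uncached() == 0);
}
```

The sketch shifts 1UL rather than a plain int 1, so the mask stays well defined even for bit numbers above 30 on a 64-bit flags word.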


* Re: [PATCH 1/2] Avoid putting a bad page back on the LRU
  2009-04-08  0:11 [PATCH 1/2] Avoid putting a bad page back on the LRU Russ Anderson
@ 2009-04-08  3:43 ` Ingo Oeser
  2009-04-08  6:46   ` Andi Kleen
  2009-04-08 13:31   ` Russ Anderson
  0 siblings, 2 replies; 4+ messages in thread
From: Ingo Oeser @ 2009-04-08  3:43 UTC (permalink / raw)
  To: Russ Anderson; +Cc: linux-kernel, linux-mm, x86, Andi Kleen

Hi Russ,

On Wednesday 08 April 2009, Russ Anderson wrote:
> --- linux-next.orig/mm/migrate.c	2009-04-07 18:32:12.781949840 -0500
> +++ linux-next/mm/migrate.c	2009-04-07 18:34:19.169736260 -0500
> @@ -693,6 +696,26 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (PagePoison(page)) {
> +			if (rc == 0)
> +				/*
> +				 * A page with a memory error that has
> +				 * been migrated will not be moved to
> +				 * the LRU.
> +				 */
> +				goto move_newpage;
> +			else
> +				/*
> +				 * The page failed to migrate and will not
> +				 * be added to the bad page list.  Clearing
> +				 * the error bit will allow another attempt
> +				 * to migrate if it gets another correctable
> +				 * error.
> +				 */
> +				ClearPagePoison(page);

Clearing the flag doesn't change the fact that this page represents
permanently bad RAM.

What about removing it from the LRU and adding it to a bad RAM list in every case?
After hot swapping the physical RAM banks it could be moved back, not before.


Best Regards

Ingo Oeser


* Re: [PATCH 1/2] Avoid putting a bad page back on the LRU
  2009-04-08  3:43 ` Ingo Oeser
@ 2009-04-08  6:46   ` Andi Kleen
  2009-04-08 13:31   ` Russ Anderson
  1 sibling, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2009-04-08  6:46 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Russ Anderson, linux-kernel, linux-mm, x86

Ingo Oeser <ioe-lkml@rameria.de> writes:
>
> Clearing the flag doesn't change the fact that this page represents
> permanently bad RAM.

Yes, you can never clear a Poison flag, at least not without a special
hardware mechanism that also clears the hardware poison (and that has
other issues in Linux as well). Otherwise you would die later.

> What about removing it from the LRU and adding it to a bad RAM list in every case?

That is what memory_failure() should already be doing, except that there
is currently no list.

> After hot swapping the physical RAM banks it could be moved back, not before.

Linux doesn't really support that, at least not when the swap is visible to the OS.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
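Ingo's suggestion, together with Andi's note that memory_failure() has no list to put such pages on yet, can be illustrated with a minimal user-space sketch of a bad-page list keyed by page frame number. Everything in it is hypothetical: struct bad_page_list, retire_page(), page_is_retired() and release_bad_pages() are illustrative names, not kernel API, and a kernel version would need locking and the kernel list primitives rather than malloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record for one retired page frame. */
struct bad_page {
	unsigned long pfn;
	struct bad_page *next;
};

/* Hypothetical list of retired frames; not an existing kernel structure. */
struct bad_page_list {
	struct bad_page *head;
	unsigned long count;
};

/* Returns true if pfn is already on the list. */
bool page_is_retired(const struct bad_page_list *list, unsigned long pfn)
{
	for (const struct bad_page *bp = list->head; bp; bp = bp->next)
		if (bp->pfn == pfn)
			return true;
	return false;
}

/*
 * Retire pfn by recording it once, so it is never handed back to the LRU.
 * Returns 0 on success (or if already retired), -1 if allocation fails.
 */
int retire_page(struct bad_page_list *list, unsigned long pfn)
{
	struct bad_page *bp;

	assert(list);
	if (page_is_retired(list, pfn))
		return 0;
	bp = malloc(sizeof(*bp));
	if (!bp)
		return -1;
	bp->pfn = pfn;
	bp->next = list->head;
	list->head = bp;
	list->count++;
	return 0;
}

/* Free every record and reset the list to empty. */
void release_bad_pages(struct bad_page_list *list)
{
	struct bad_page *bp = list->head;

	while (bp) {
		struct bad_page *next = bp->next;

		free(bp);
		bp = next;
	}
	list->head = NULL;
	list->count = 0;
}
```

Retiring the same frame twice is a no-op, which matters because a failing DIMM can report repeated errors against the same page.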


* Re: [PATCH 1/2] Avoid putting a bad page back on the LRU
  2009-04-08  3:43 ` Ingo Oeser
  2009-04-08  6:46   ` Andi Kleen
@ 2009-04-08 13:31   ` Russ Anderson
  1 sibling, 0 replies; 4+ messages in thread
From: Russ Anderson @ 2009-04-08 13:31 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel, linux-mm, x86, Andi Kleen, rja

On Wed, Apr 08, 2009 at 05:43:15AM +0200, Ingo Oeser wrote:
> Hi Russ,
> 
> On Wednesday 08 April 2009, Russ Anderson wrote:
> > --- linux-next.orig/mm/migrate.c	2009-04-07 18:32:12.781949840 -0500
> > +++ linux-next/mm/migrate.c	2009-04-07 18:34:19.169736260 -0500
> > @@ -693,6 +696,26 @@ unlock:
> >   		 * restored.
> >   		 */
> >   		list_del(&page->lru);
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +		if (PagePoison(page)) {
> > +			if (rc == 0)
> > +				/*
> > +				 * A page with a memory error that has
> > +				 * been migrated will not be moved to
> > +				 * the LRU.
> > +				 */
> > +				goto move_newpage;
> > +			else
> > +				/*
> > +				 * The page failed to migrate and will not
> > +				 * be added to the bad page list.  Clearing
> > +				 * the error bit will allow another attempt
> > +				 * to migrate if it gets another correctable
> > +				 * error.
> > +				 */
> > +				ClearPagePoison(page);
> 
> Clearing the flag doesn't change the fact that this page represents
> permanently bad RAM.

Yes, but this is intended for corrected memory errors (meaning there is
an underlying RAM error, but it has not yet reached the point of losing data).

After talking with Andi, it is clear the intent of the Poison flag
(uncorrectable memory error) is different from my intent (corrected
memory error).  I'll go back to using a different page flag to avoid
confusing the two issues.
 
> What about removing it from the LRU and adding it to a bad RAM list in every case?

That is what happens when the page migrates (the normal case).  The else branch
is taken when the page could not be migrated.  My intent was to wait for the next
corrected error on that page and try migrating again.

> After hot swapping the physical RAM banks it could be moved back, not before.

As soon as the code is written.  :-)

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
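Russ's clarification separates two cases: an uncorrectable error, where the Poison flag is sticky and the page must never return to the LRU, and a corrected error, where a separate soft flag may be cleared after a failed migration so that the next corrected error triggers another attempt. A minimal sketch of that decision, assuming hypothetical names (struct page_err, after_migration() and the FATE_* values) that do not exist in the kernel:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page error state; neither field is a real kernel flag. */
struct page_err {
	bool poison;      /* uncorrectable error: sticky, never cleared */
	bool corrected;   /* corrected error seen: soft, may be cleared */
};

enum page_fate {
	FATE_RETIRE,      /* keep the old frame off the LRU */
	FATE_LRU,         /* put back on the LRU as usual */
	FATE_LRU_RETRY,   /* put back; retry on the next corrected error */
};

/*
 * Decide what happens to the old page after a migration attempt,
 * following the policy described in this thread.
 */
enum page_fate after_migration(struct page_err *err, bool migrated)
{
	assert(err);
	if (err->poison)
		return FATE_RETIRE;    /* left to memory_failure(); never reused */
	if (!err->corrected)
		return FATE_LRU;       /* no error recorded on this page */
	if (migrated)
		return FATE_RETIRE;    /* contents moved; retire the old frame */
	err->corrected = false;        /* allow another attempt later */
	return FATE_LRU_RETRY;
}
```

Keeping the two states in separate fields is what lets the corrected-error bit be cleared on failure without ever touching the sticky poison bit.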


Thread overview: 4 messages
2009-04-08  0:11 [PATCH 1/2] Avoid putting a bad page back on the LRU Russ Anderson
2009-04-08  3:43 ` Ingo Oeser
2009-04-08  6:46   ` Andi Kleen
2009-04-08 13:31   ` Russ Anderson
