linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
@ 2015-08-13  7:09 Wanpeng Li
  2015-08-13  8:53 ` Naoya Horiguchi
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-13  7:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Naoya Horiguchi, linux-mm, linux-kernel, Wanpeng Li

[   61.572584] BUG: Bad page state in process bash  pfn:97000
[   61.578106] page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
[   61.586803] flags: 0x1fffff80080048(uptodate|active|swapbacked)
[   61.592809] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   61.599250] bad because of flags:
[   61.602567] flags: 0x40(active)
[   61.605746] Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
[   61.605827] CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
[   61.605829] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
[   61.605832]  ffffffff818b3be8 ffff8800da373ad8 ffffffff8165ceb4 0000000001313ce1
[   61.605837]  ffffea00025c0000 ffff8800da373b08 ffffffff8117bdd6 ffff88021edd4b00
[   61.605842]  0000000000000001 001fffff80080048 0000000000000000 ffff8800da373b88
[   61.605847] Call Trace:
[   61.605858]  [<ffffffff8165ceb4>] dump_stack+0x48/0x5c
[   61.605865]  [<ffffffff8117bdd6>] bad_page+0xe6/0x140
[   61.605870]  [<ffffffff8117dfc9>] free_pages_prepare+0x2f9/0x320
[   61.605876]  [<ffffffff811e817d>] ? uncharge_list+0xdd/0x100
[   61.605882]  [<ffffffff8117ff20>] free_hot_cold_page+0x40/0x170
[   61.605888]  [<ffffffff81185dd0>] __put_single_page+0x20/0x30
[   61.605892]  [<ffffffff81186675>] put_page+0x25/0x40
[   61.605897]  [<ffffffff811dc276>] unmap_and_move+0x1a6/0x1f0
[   61.605908]  [<ffffffff811dc3c0>] migrate_pages+0x100/0x1d0
[   61.605914]  [<ffffffff811eb710>] ? kill_procs+0x100/0x100
[   61.605918]  [<ffffffff811764af>] ? unlock_page+0x6f/0x90
[   61.605923]  [<ffffffff811ecf37>] __soft_offline_page+0x127/0x2a0
[   61.605928]  [<ffffffff811ed156>] soft_offline_page+0xa6/0x200

There is a race window between soft_offline_page() and unpoison_memory():

		CPU0 					CPU1

soft_offline_page
__soft_offline_page
TestSetPageHWPoison   
					unpoison_memory
					PageHWPoison check (true)
					TestClearPageHWPoison
					put_page    -> release refcount held by get_hwpoison_page in unpoison_memory
					put_page    -> release refcount held by isolate_lru_page in __soft_offline_page
migrate_pages

The second put_page() releases refcount held by isolate_lru_page() which 
will lead to unmap_and_move() releases the last refcount of page and w/ 
mapcount still 1 since try_to_unmap() is not called if there is only 
one user map the page. Anyway, the page refcount and mapcount will 
still mess if the page is mapped by multiple users. Commit (4491f712606: 
mm/memory-failure: set PageHWPoison before migrate_pages()) is introduced 
to avoid to reuse just successful migrated page, however, it also incurs 
this race window.

Fix it by continue to use migratetype to guarantee the source page which 
is successful migration does not reused before PG_hwpoison is set.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 include/linux/page-isolation.h |    5 +++++
 mm/memory-failure.c            |   16 ++++++++++++----
 mm/migrate.c                   |    3 +--
 mm/page_isolation.c            |    4 ++--
 4 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d647..ff5751e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 			bool skip_hwpoisoned_pages);
 
+/*
+ *  Internal functions. Changes pageblock's migrate type.
+ */
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
+void unset_migratetype_isolate(struct page *page, unsigned migratetype);
 struct page *alloc_migrate_target(struct page *page, unsigned long private,
 				int **resultp);
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eca613e..0ed3814 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
 		inc_zone_page_state(page, NR_ISOLATED_ANON +
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
-		if (!TestSetPageHWPoison(page))
-			atomic_long_inc(&num_poisoned_pages);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (ret) {
@@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
 				pfn, ret, page->flags);
 			if (ret > 0)
 				ret = -EIO;
-			if (TestClearPageHWPoison(page))
-				atomic_long_dec(&num_poisoned_pages);
+		} else {
+			if (!TestSetPageHWPoison(page))
+				atomic_long_inc(&num_poisoned_pages);
 		}
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
@@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
 
 	get_online_mems();
 
+	/*
+	 * Isolate the page, so that it doesn't get reallocated if it
+	 * was free. This flag should be kept set until the source page
+	 * is freed and PG_hwpoison on it is set.
+	 */
+	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+		set_migratetype_isolate(page, false);
+
 	ret = get_any_page(page, pfn, flags);
 	put_online_mems();
 	if (ret > 0) { /* for in-use pages */
@@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
 				atomic_long_inc(&num_poisoned_pages);
 		}
 	}
+	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 	return ret;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 1f8369d..472baf5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	/* Establish migration ptes or remove ptes */
 	if (page_mapped(page)) {
 		try_to_unmap(page,
-			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
-			TTU_IGNORE_HWPOISON);
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 		page_was_mapped = 1;
 	}
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4568fd5..654662a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -9,7 +9,7 @@
 #include <linux/hugetlb.h>
 #include "internal.h"
 
-static int set_migratetype_isolate(struct page *page,
+int set_migratetype_isolate(struct page *page,
 				bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -73,7 +73,7 @@ out:
 	return ret;
 }
 
-static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
+void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-13  7:09 [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
@ 2015-08-13  8:53 ` Naoya Horiguchi
  2015-08-13  9:18   ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-13  8:53 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Thu, Aug 13, 2015 at 03:09:07PM +0800, Wanpeng Li wrote:
> [   61.572584] BUG: Bad page state in process bash  pfn:97000
> [   61.578106] page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
> [   61.586803] flags: 0x1fffff80080048(uptodate|active|swapbacked)
> [   61.592809] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [   61.599250] bad because of flags:
> [   61.602567] flags: 0x40(active)
> [   61.605746] Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
> [   61.605827] CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
> [   61.605829] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
> [   61.605832]  ffffffff818b3be8 ffff8800da373ad8 ffffffff8165ceb4 0000000001313ce1
> [   61.605837]  ffffea00025c0000 ffff8800da373b08 ffffffff8117bdd6 ffff88021edd4b00
> [   61.605842]  0000000000000001 001fffff80080048 0000000000000000 ffff8800da373b88
> [   61.605847] Call Trace:
> [   61.605858]  [<ffffffff8165ceb4>] dump_stack+0x48/0x5c
> [   61.605865]  [<ffffffff8117bdd6>] bad_page+0xe6/0x140
> [   61.605870]  [<ffffffff8117dfc9>] free_pages_prepare+0x2f9/0x320
> [   61.605876]  [<ffffffff811e817d>] ? uncharge_list+0xdd/0x100
> [   61.605882]  [<ffffffff8117ff20>] free_hot_cold_page+0x40/0x170
> [   61.605888]  [<ffffffff81185dd0>] __put_single_page+0x20/0x30
> [   61.605892]  [<ffffffff81186675>] put_page+0x25/0x40
> [   61.605897]  [<ffffffff811dc276>] unmap_and_move+0x1a6/0x1f0
> [   61.605908]  [<ffffffff811dc3c0>] migrate_pages+0x100/0x1d0
> [   61.605914]  [<ffffffff811eb710>] ? kill_procs+0x100/0x100
> [   61.605918]  [<ffffffff811764af>] ? unlock_page+0x6f/0x90
> [   61.605923]  [<ffffffff811ecf37>] __soft_offline_page+0x127/0x2a0
> [   61.605928]  [<ffffffff811ed156>] soft_offline_page+0xa6/0x200
> 
> There is a race window between soft_offline_page() and unpoison_memory():
> 
> 		CPU0 					CPU1
> 
> soft_offline_page
> __soft_offline_page
> TestSetPageHWPoison   
> 					unpoison_memory
> 					PageHWPoison check (true)
> 					TestClearPageHWPoison
> 					put_page    -> release refcount held by get_hwpoison_page in unpoison_memory
> 					put_page    -> release refcount held by isolate_lru_page in __soft_offline_page
> migrate_pages
> 
> The second put_page() releases refcount held by isolate_lru_page() which 
> will lead to unmap_and_move() releases the last refcount of page and w/ 
> mapcount still 1 since try_to_unmap() is not called if there is only 
> one user map the page. Anyway, the page refcount and mapcount will 
> still mess if the page is mapped by multiple users. Commit (4491f712606: 
> mm/memory-failure: set PageHWPoison before migrate_pages()) is introduced 
> to avoid to reuse just successful migrated page, however, it also incurs 
> this race window.
> 
> Fix it by continue to use migratetype to guarantee the source page which 
> is successful migration does not reused before PG_hwpoison is set.
> 
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

Thank you for the report, Wanpeng.

I think that unpoison is used only in testing so this race never affects
our end-users/customers, so going back to this migratetype change stuff
looks unworthy to me.

If I read correctly, the old migratetype approach has a few problems:
  1) it doesn't fix the problem completely, because
     set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
     target page if the pageblock of the page contains one or more
     unmovable pages (i.e. has_unmovable_pages() returns true).
  2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
     and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
     of the original migratetype state, which could impact other subsystems
     like memory hotplug or compaction.

So in my opinion, the best option is to fix this within unpoison code,
but it might be hard because we don't know whether the target page is
under migration. The second option is to add a race check in the if (reason
== MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc
and need testing. The third option is to leave it with some "FIXME" comment.

Thanks,
Naoya Horiguchi

> ---
>  include/linux/page-isolation.h |    5 +++++
>  mm/memory-failure.c            |   16 ++++++++++++----
>  mm/migrate.c                   |    3 +--
>  mm/page_isolation.c            |    4 ++--
>  4 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 047d647..ff5751e 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  			bool skip_hwpoisoned_pages);
>  
> +/*
> + *  Internal functions. Changes pageblock's migrate type.
> + */
> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
>  				int **resultp);
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index eca613e..0ed3814 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
>  					page_is_file_cache(page));
>  		list_add(&page->lru, &pagelist);
> -		if (!TestSetPageHWPoison(page))
> -			atomic_long_inc(&num_poisoned_pages);
>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>  		if (ret) {
> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
>  				pfn, ret, page->flags);
>  			if (ret > 0)
>  				ret = -EIO;
> -			if (TestClearPageHWPoison(page))
> -				atomic_long_dec(&num_poisoned_pages);
> +		} else {
> +			if (!TestSetPageHWPoison(page))
> +				atomic_long_inc(&num_poisoned_pages);
>  		}
>  	} else {
>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
>  
>  	get_online_mems();
>  
> +	/*
> +	 * Isolate the page, so that it doesn't get reallocated if it
> +	 * was free. This flag should be kept set until the source page
> +	 * is freed and PG_hwpoison on it is set.
> +	 */
> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> +		set_migratetype_isolate(page, false);
> +
>  	ret = get_any_page(page, pfn, flags);
>  	put_online_mems();
>  	if (ret > 0) { /* for in-use pages */
> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
>  				atomic_long_inc(&num_poisoned_pages);
>  		}
>  	}
> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
>  	return ret;
>  }
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1f8369d..472baf5 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	/* Establish migration ptes or remove ptes */
>  	if (page_mapped(page)) {
>  		try_to_unmap(page,
> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
> -			TTU_IGNORE_HWPOISON);
> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>  		page_was_mapped = 1;
>  	}
>  
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 4568fd5..654662a 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -9,7 +9,7 @@
>  #include <linux/hugetlb.h>
>  #include "internal.h"
>  
> -static int set_migratetype_isolate(struct page *page,
> +int set_migratetype_isolate(struct page *page,
>  				bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -73,7 +73,7 @@ out:
>  	return ret;
>  }
>  
> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  {
>  	struct zone *zone;
>  	unsigned long flags, nr_pages;
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-13  8:53 ` Naoya Horiguchi
@ 2015-08-13  9:18   ` Wanpeng Li
  2015-08-13 10:04     ` Naoya Horiguchi
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-13  9:18 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
> On Thu, Aug 13, 2015 at 03:09:07PM +0800, Wanpeng Li wrote:
>> [   61.572584] BUG: Bad page state in process bash  pfn:97000
>> [   61.578106] page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
>> [   61.586803] flags: 0x1fffff80080048(uptodate|active|swapbacked)
>> [   61.592809] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [   61.599250] bad because of flags:
>> [   61.602567] flags: 0x40(active)
>> [   61.605746] Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
>> [   61.605827] CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
>> [   61.605829] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
>> [   61.605832]  ffffffff818b3be8 ffff8800da373ad8 ffffffff8165ceb4 0000000001313ce1
>> [   61.605837]  ffffea00025c0000 ffff8800da373b08 ffffffff8117bdd6 ffff88021edd4b00
>> [   61.605842]  0000000000000001 001fffff80080048 0000000000000000 ffff8800da373b88
>> [   61.605847] Call Trace:
>> [   61.605858]  [<ffffffff8165ceb4>] dump_stack+0x48/0x5c
>> [   61.605865]  [<ffffffff8117bdd6>] bad_page+0xe6/0x140
>> [   61.605870]  [<ffffffff8117dfc9>] free_pages_prepare+0x2f9/0x320
>> [   61.605876]  [<ffffffff811e817d>] ? uncharge_list+0xdd/0x100
>> [   61.605882]  [<ffffffff8117ff20>] free_hot_cold_page+0x40/0x170
>> [   61.605888]  [<ffffffff81185dd0>] __put_single_page+0x20/0x30
>> [   61.605892]  [<ffffffff81186675>] put_page+0x25/0x40
>> [   61.605897]  [<ffffffff811dc276>] unmap_and_move+0x1a6/0x1f0
>> [   61.605908]  [<ffffffff811dc3c0>] migrate_pages+0x100/0x1d0
>> [   61.605914]  [<ffffffff811eb710>] ? kill_procs+0x100/0x100
>> [   61.605918]  [<ffffffff811764af>] ? unlock_page+0x6f/0x90
>> [   61.605923]  [<ffffffff811ecf37>] __soft_offline_page+0x127/0x2a0
>> [   61.605928]  [<ffffffff811ed156>] soft_offline_page+0xa6/0x200
>>
>> There is a race window between soft_offline_page() and unpoison_memory():
>>
>> 		CPU0 					CPU1
>>
>> soft_offline_page
>> __soft_offline_page
>> TestSetPageHWPoison   
>> 					unpoison_memory
>> 					PageHWPoison check (true)
>> 					TestClearPageHWPoison
>> 					put_page    -> release refcount held by get_hwpoison_page in unpoison_memory
>> 					put_page    -> release refcount held by isolate_lru_page in __soft_offline_page
>> migrate_pages
>>
>> The second put_page() releases refcount held by isolate_lru_page() which 
>> will lead to unmap_and_move() releases the last refcount of page and w/ 
>> mapcount still 1 since try_to_unmap() is not called if there is only 
>> one user map the page. Anyway, the page refcount and mapcount will 
>> still mess if the page is mapped by multiple users. Commit (4491f712606: 
>> mm/memory-failure: set PageHWPoison before migrate_pages()) is introduced 
>> to avoid to reuse just successful migrated page, however, it also incurs 
>> this race window.
>>
>> Fix it by continue to use migratetype to guarantee the source page which 
>> is successful migration does not reused before PG_hwpoison is set.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> Thank you for the report, Wanpeng.
>
> I think that unpoison is used only in testing so this race never affects
> our end-users/customers, so going back to this migratetype change stuff
> looks unworthy to me.

Migratetype stuff is just removed by you two months ago, then this bug
occurs recently since the more and more patches which you fix some races.

>
> If I read correctly, the old migratetype approach has a few problems:
>   1) it doesn't fix the problem completely, because
>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
>      target page if the pageblock of the page contains one or more
>      unmovable pages (i.e. has_unmovable_pages() returns true).
>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
>      of the original migratetype state, which could impact other subsystems
>      like memory hotplug or compaction.

Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
current linus tree calltrace and it should be fixed immediately, and I
don't see obvious bugs appear on migratetype stuffs at least currently,
so "FIXME" is enough. :-)

>
> So in my opinion, the best option is to fix this within unpoison code,
> but it might be hard because we don't know whether the target page is

Exactly.

> under migration. The second option is to add a race check in the if (reason
> == MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc

You have already add MR_MEMORY_FAILURE, however, that is not enough to
fix unpoison race.

> and need testing. The third option is to leave it with some "FIXME" comment.

I *prefer* add "FIXME" to migratetype stuffs.

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi
>
>> ---
>>  include/linux/page-isolation.h |    5 +++++
>>  mm/memory-failure.c            |   16 ++++++++++++----
>>  mm/migrate.c                   |    3 +--
>>  mm/page_isolation.c            |    4 ++--
>>  4 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index 047d647..ff5751e 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>  			bool skip_hwpoisoned_pages);
>>  
>> +/*
>> + *  Internal functions. Changes pageblock's migrate type.
>> + */
>> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
>>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>  				int **resultp);
>>  
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index eca613e..0ed3814 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
>>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
>>  					page_is_file_cache(page));
>>  		list_add(&page->lru, &pagelist);
>> -		if (!TestSetPageHWPoison(page))
>> -			atomic_long_inc(&num_poisoned_pages);
>>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>>  		if (ret) {
>> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
>>  				pfn, ret, page->flags);
>>  			if (ret > 0)
>>  				ret = -EIO;
>> -			if (TestClearPageHWPoison(page))
>> -				atomic_long_dec(&num_poisoned_pages);
>> +		} else {
>> +			if (!TestSetPageHWPoison(page))
>> +				atomic_long_inc(&num_poisoned_pages);
>>  		}
>>  	} else {
>>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
>> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
>>  
>>  	get_online_mems();
>>  
>> +	/*
>> +	 * Isolate the page, so that it doesn't get reallocated if it
>> +	 * was free. This flag should be kept set until the source page
>> +	 * is freed and PG_hwpoison on it is set.
>> +	 */
>> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>> +		set_migratetype_isolate(page, false);
>> +
>>  	ret = get_any_page(page, pfn, flags);
>>  	put_online_mems();
>>  	if (ret > 0) { /* for in-use pages */
>> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
>>  				atomic_long_inc(&num_poisoned_pages);
>>  		}
>>  	}
>> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
>>  	return ret;
>>  }
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 1f8369d..472baf5 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>  	/* Establish migration ptes or remove ptes */
>>  	if (page_mapped(page)) {
>>  		try_to_unmap(page,
>> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
>> -			TTU_IGNORE_HWPOISON);
>> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>>  		page_was_mapped = 1;
>>  	}
>>  
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 4568fd5..654662a 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -9,7 +9,7 @@
>>  #include <linux/hugetlb.h>
>>  #include "internal.h"
>>  
>> -static int set_migratetype_isolate(struct page *page,
>> +int set_migratetype_isolate(struct page *page,
>>  				bool skip_hwpoisoned_pages)
>>  {
>>  	struct zone *zone;
>> @@ -73,7 +73,7 @@ out:
>>  	return ret;
>>  }
>>  
>> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>  {
>>  	struct zone *zone;
>>  	unsigned long flags, nr_pages;
>> -- 
>> 1.7.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-13  9:18   ` Wanpeng Li
@ 2015-08-13 10:04     ` Naoya Horiguchi
  2015-08-13 10:27       ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-13 10:04 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Thu, Aug 13, 2015 at 05:18:56PM +0800, Wanpeng Li wrote:
> On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
...
> >
> > I think that unpoison is used only in testing so this race never affects
> > our end-users/customers, so going back to this migratetype change stuff
> > looks unworthy to me.
>
> Migratetype stuff is just removed by you two months ago, then this bug
> occurs recently since the more and more patches which you fix some races.

Yes, this race (which existed before my recent changes) became more visible
with that changes. But I don't think that simply reverting them is a right solution.

> >
> > If I read correctly, the old migratetype approach has a few problems:
> >   1) it doesn't fix the problem completely, because
> >      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
> >      target page if the pageblock of the page contains one or more
> >      unmovable pages (i.e. has_unmovable_pages() returns true).
> >   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
> >      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
> >      of the original migratetype state, which could impact other subsystems
> >      like memory hotplug or compaction.
>
> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
> current linus tree calltrace and it should be fixed immediately, and I
> don't see obvious bugs appear on migratetype stuffs at least currently,
> so "FIXME" is enough. :-)

Sorry if confusing, but my intention in saying about "FIXME" comment was
that we can find another solution for this race rather than just reverting,
so adding comment about the reported bug in current code (keeping code from
4491f712606) is OK for very short term.
I understand that leaving a race window of BUG_ON is not the best thing, but
as I said, this race shouldn't affect end-users, so this is not an urgent bug.
# It's enough if testers know this.

> >
> > So in my opinion, the best option is to fix this within unpoison code,
> > but it might be hard because we don't know whether the target page is
>
> Exactly.
>
> > under migration. The second option is to add a race check in the if (reason
> > == MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc
>
> You have already add MR_MEMORY_FAILURE, however, that is not enough to
> fix unpoison race.

Right, some additional code is necessary. I'll try the second approach.

Thanks,
Naoya Horiguchi

> > and need testing. The third option is to leave it with some "FIXME" comment.
>
> I *prefer* add "FIXME" to migratetype stuffs.
>
> Regards,
> Wanpeng Li
>
> >
> > Thanks,
> > Naoya Horiguchi
> >
> >> ---
> >>  include/linux/page-isolation.h |    5 +++++
> >>  mm/memory-failure.c            |   16 ++++++++++++----
> >>  mm/migrate.c                   |    3 +--
> >>  mm/page_isolation.c            |    4 ++--
> >>  4 files changed, 20 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> >> index 047d647..ff5751e 100644
> >> --- a/include/linux/page-isolation.h
> >> +++ b/include/linux/page-isolation.h
> >> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> >>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
> >>  			bool skip_hwpoisoned_pages);
> >>
> >> +/*
> >> + *  Internal functions. Changes pageblock's migrate type.
> >> + */
> >> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
> >> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
> >>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
> >>  				int **resultp);
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index eca613e..0ed3814 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
> >>  					page_is_file_cache(page));
> >>  		list_add(&page->lru, &pagelist);
> >> -		if (!TestSetPageHWPoison(page))
> >> -			atomic_long_inc(&num_poisoned_pages);
> >>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> >>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
> >>  		if (ret) {
> >> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  				pfn, ret, page->flags);
> >>  			if (ret > 0)
> >>  				ret = -EIO;
> >> -			if (TestClearPageHWPoison(page))
> >> -				atomic_long_dec(&num_poisoned_pages);
> >> +		} else {
> >> +			if (!TestSetPageHWPoison(page))
> >> +				atomic_long_inc(&num_poisoned_pages);
> >>  		}
> >>  	} else {
> >>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
> >> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
> >>
> >>  	get_online_mems();
> >>
> >> +	/*
> >> +	 * Isolate the page, so that it doesn't get reallocated if it
> >> +	 * was free. This flag should be kept set until the source page
> >> +	 * is freed and PG_hwpoison on it is set.
> >> +	 */
> >> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> >> +		set_migratetype_isolate(page, false);
> >> +
> >>  	ret = get_any_page(page, pfn, flags);
> >>  	put_online_mems();
> >>  	if (ret > 0) { /* for in-use pages */
> >> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
> >>  				atomic_long_inc(&num_poisoned_pages);
> >>  		}
> >>  	}
> >> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
> >>  	return ret;
> >>  }
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 1f8369d..472baf5 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>  	/* Establish migration ptes or remove ptes */
> >>  	if (page_mapped(page)) {
> >>  		try_to_unmap(page,
> >> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
> >> -			TTU_IGNORE_HWPOISON);
> >> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> >>  		page_was_mapped = 1;
> >>  	}
> >>
> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> >> index 4568fd5..654662a 100644
> >> --- a/mm/page_isolation.c
> >> +++ b/mm/page_isolation.c
> >> @@ -9,7 +9,7 @@
> >>  #include <linux/hugetlb.h>
> >>  #include "internal.h"
> >>
> >> -static int set_migratetype_isolate(struct page *page,
> >> +int set_migratetype_isolate(struct page *page,
> >>  				bool skip_hwpoisoned_pages)
> >>  {
> >>  	struct zone *zone;
> >> @@ -73,7 +73,7 @@ out:
> >>  	return ret;
> >>  }
> >>
> >> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> >> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> >>  {
> >>  	struct zone *zone;
> >>  	unsigned long flags, nr_pages;
> >> --
> >> 1.7.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-13 10:04     ` Naoya Horiguchi
@ 2015-08-13 10:27       ` Wanpeng Li
  2015-08-14  4:19         ` Naoya Horiguchi
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-13 10:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/13/15 6:04 PM, Naoya Horiguchi wrote:
> On Thu, Aug 13, 2015 at 05:18:56PM +0800, Wanpeng Li wrote:
>> On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
> ...
>>> I think that unpoison is used only in testing so this race never affects
>>> our end-users/customers, so going back to this migratetype change stuff
>>> looks unworthy to me.
>> Migratetype stuff is just removed by you two months ago, then this bug
>> occurs recently since the more and more patches which you fix some races.
> Yes, this race (which existed before my recent changes) became more visible

IIUC, no. The page will be freed before PageHWPoison is set. So the race
doesn't exist.

> with that changes. But I don't think that simply reverting them is a right solution.
>
>>> If I read correctly, the old migratetype approach has a few problems:
>>>   1) it doesn't fix the problem completely, because
>>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
>>>      target page if the pageblock of the page contains one or more
>>>      unmovable pages (i.e. has_unmovable_pages() returns true).
>>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
>>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
>>>      of the original migratetype state, which could impact other subsystems
>>>      like memory hotplug or compaction.
>> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
>> current linus tree calltrace and it should be fixed immediately, and I
>> don't see obvious bugs appear on migratetype stuffs at least currently,
>> so "FIXME" is enough. :-)
> Sorry if confusing, but my intention in saying about "FIXME" comment was
> that we can find another solution for this race rather than just reverting,
> so adding comment about the reported bug in current code (keeping code from
> 4491f712606) is OK for very short term.
> I understand that leaving a race window of BUG_ON is not the best thing, but
> as I said, this race shouldn't affect end-users, so this is not an urgent bug.
> # It's enough if testers know this.

The 4.2 is coming, this patch can be applied as a temporal solution in
order to fix the broken linus tree, and the any final solution can be
figured out later.

Regards,
Wanpeng Li

>
>>> So in my opinion, the best option is to fix this within unpoison code,
>>> but it might be hard because we don't know whether the target page is
>> Exactly.
>>
>>> under migration. The second option is to add a race check in the if (reason
>>> == MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc
>> You have already add MR_MEMORY_FAILURE, however, that is not enough to
>> fix unpoison race.
> Right, some additional code is necessary. I'll try the second approach.
>
> Thanks,
> Naoya Horiguchi
>
>>> and need testing. The third option is to leave it with some "FIXME" comment.
>> I *prefer* add "FIXME" to migratetype stuffs.
>>
>> Regards,
>> Wanpeng Li
>>
>>> Thanks,
>>> Naoya Horiguchi
>>>
>>>> ---
>>>>  include/linux/page-isolation.h |    5 +++++
>>>>  mm/memory-failure.c            |   16 ++++++++++++----
>>>>  mm/migrate.c                   |    3 +--
>>>>  mm/page_isolation.c            |    4 ++--
>>>>  4 files changed, 20 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>>> index 047d647..ff5751e 100644
>>>> --- a/include/linux/page-isolation.h
>>>> +++ b/include/linux/page-isolation.h
>>>> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>>>  			bool skip_hwpoisoned_pages);
>>>>
>>>> +/*
>>>> + *  Internal functions. Changes pageblock's migrate type.
>>>> + */
>>>> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
>>>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
>>>>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>>>  				int **resultp);
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index eca613e..0ed3814 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
>>>>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
>>>>  					page_is_file_cache(page));
>>>>  		list_add(&page->lru, &pagelist);
>>>> -		if (!TestSetPageHWPoison(page))
>>>> -			atomic_long_inc(&num_poisoned_pages);
>>>>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>>>>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>>>>  		if (ret) {
>>>> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
>>>>  				pfn, ret, page->flags);
>>>>  			if (ret > 0)
>>>>  				ret = -EIO;
>>>> -			if (TestClearPageHWPoison(page))
>>>> -				atomic_long_dec(&num_poisoned_pages);
>>>> +		} else {
>>>> +			if (!TestSetPageHWPoison(page))
>>>> +				atomic_long_inc(&num_poisoned_pages);
>>>>  		}
>>>>  	} else {
>>>>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
>>>> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
>>>>
>>>>  	get_online_mems();
>>>>
>>>> +	/*
>>>> +	 * Isolate the page, so that it doesn't get reallocated if it
>>>> +	 * was free. This flag should be kept set until the source page
>>>> +	 * is freed and PG_hwpoison on it is set.
>>>> +	 */
>>>> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>>>> +		set_migratetype_isolate(page, false);
>>>> +
>>>>  	ret = get_any_page(page, pfn, flags);
>>>>  	put_online_mems();
>>>>  	if (ret > 0) { /* for in-use pages */
>>>> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
>>>>  				atomic_long_inc(&num_poisoned_pages);
>>>>  		}
>>>>  	}
>>>> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
>>>>  	return ret;
>>>>  }
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 1f8369d..472baf5 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>  	/* Establish migration ptes or remove ptes */
>>>>  	if (page_mapped(page)) {
>>>>  		try_to_unmap(page,
>>>> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
>>>> -			TTU_IGNORE_HWPOISON);
>>>> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>>>>  		page_was_mapped = 1;
>>>>  	}
>>>>
>>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>>> index 4568fd5..654662a 100644
>>>> --- a/mm/page_isolation.c
>>>> +++ b/mm/page_isolation.c
>>>> @@ -9,7 +9,7 @@
>>>>  #include <linux/hugetlb.h>
>>>>  #include "internal.h"
>>>>
>>>> -static int set_migratetype_isolate(struct page *page,
>>>> +int set_migratetype_isolate(struct page *page,
>>>>  				bool skip_hwpoisoned_pages)
>>>>  {
>>>>  	struct zone *zone;
>>>> @@ -73,7 +73,7 @@ out:
>>>>  	return ret;
>>>>  }
>>>>
>>>> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>>>  {
>>>>  	struct zone *zone;
>>>>  	unsigned long flags, nr_pages;
>>>> --
>>>> 1.7.1
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-13 10:27       ` Wanpeng Li
@ 2015-08-14  4:19         ` Naoya Horiguchi
  2015-08-14  5:03           ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-14  4:19 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Thu, Aug 13, 2015 at 06:27:40PM +0800, Wanpeng Li wrote:
> On 8/13/15 6:04 PM, Naoya Horiguchi wrote:
> > On Thu, Aug 13, 2015 at 05:18:56PM +0800, Wanpeng Li wrote:
> >> On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
> > ...
> >>> I think that unpoison is used only in testing so this race never affects
> >>> our end-users/customers, so going back to this migratetype change stuff
> >>> looks unworthy to me.
> >> Migratetype stuff is just removed by you two months ago, then this bug
> >> occurs recently since the more and more patches which you fix some races.
> > Yes, this race (which existed before my recent changes) became more visible
> 
> IIUC, no. The page will be freed before PageHWPoison is set. So the race
> doesn't exist.

OK ...

> > with that changes. But I don't think that simply reverting them is a right solution.
> >
> >>> If I read correctly, the old migratetype approach has a few problems:
> >>>   1) it doesn't fix the problem completely, because
> >>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
> >>>      target page if the pageblock of the page contains one or more
> >>>      unmovable pages (i.e. has_unmovable_pages() returns true).
> >>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
> >>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
> >>>      of the original migratetype state, which could impact other subsystems
> >>>      like memory hotplug or compaction.
> >> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
> >> current linus tree calltrace and it should be fixed immediately, and I
> >> don't see obvious bugs appear on migratetype stuffs at least currently,
> >> so "FIXME" is enough. :-)
> > Sorry if confusing, but my intention in saying about "FIXME" comment was
> > that we can find another solution for this race rather than just reverting,
> > so adding comment about the reported bug in current code (keeping code from
> > 4491f712606) is OK for very short term.
> > I understand that leaving a race window of BUG_ON is not the best thing, but
> > as I said, this race shouldn't affect end-users, so this is not an urgent bug.
> > # It's enough if testers know this.
> 
> The 4.2 is coming, this patch can be applied as a temporal solution in
> order to fix the broken linus tree, and the any final solution can be
> figured out later.

I didn't reproduce this problem yet in my environment, but from code reading
I guess that checking PageHWPoison flag in unmap_and_move() like below could
avoid the problem. Could you testing with this, please?

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Fri, 14 Aug 2015 08:04:03 +0900
Subject: [PATCH] mm: hwpoison: migrate: fix race b/w soft-offline and unpoison

[description to be written ...]

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/migrate.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index eb4267107d1f..24f5b9acc26a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -953,7 +953,13 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 				page_is_file_cache(page));
 		/* Soft-offlined page shouldn't go through lru cache list */
 		if (reason == MR_MEMORY_FAILURE)
-			put_page(page);
+			/*
+			 * Check race condition with unpoison, where the source
+			 * page is handled by unpoison handler which decrements
+			 * the refcount, so no need to call put_page() here.
+			 */
+			if (PageHWPoison(page))
+				put_page(page);
 		else
 			putback_lru_page(page);
 	}
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  4:19         ` Naoya Horiguchi
@ 2015-08-14  5:03           ` Wanpeng Li
  2015-08-14  7:26             ` Naoya Horiguchi
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-14  5:03 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/14/15 12:19 PM, Naoya Horiguchi wrote:
> On Thu, Aug 13, 2015 at 06:27:40PM +0800, Wanpeng Li wrote:
>> On 8/13/15 6:04 PM, Naoya Horiguchi wrote:
>>> On Thu, Aug 13, 2015 at 05:18:56PM +0800, Wanpeng Li wrote:
>>>> On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
>>> ...
>>>>> I think that unpoison is used only in testing so this race never affects
>>>>> our end-users/customers, so going back to this migratetype change stuff
>>>>> looks unworthy to me.
>>>> Migratetype stuff is just removed by you two months ago, then this bug
>>>> occurs recently since the more and more patches which you fix some races.
>>> Yes, this race (which existed before my recent changes) became more visible
>> IIUC, no. The page will be freed before PageHWPoison is set. So the race
>> doesn't exist.
> OK ...
>
>>> with that changes. But I don't think that simply reverting them is a right solution.
>>>
>>>>> If I read correctly, the old migratetype approach has a few problems:
>>>>>   1) it doesn't fix the problem completely, because
>>>>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
>>>>>      target page if the pageblock of the page contains one or more
>>>>>      unmovable pages (i.e. has_unmovable_pages() returns true).
>>>>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
>>>>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
>>>>>      of the original migratetype state, which could impact other subsystems
>>>>>      like memory hotplug or compaction.
>>>> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
>>>> current linus tree calltrace and it should be fixed immediately, and I
>>>> don't see obvious bugs appear on migratetype stuffs at least currently,
>>>> so "FIXME" is enough. :-)
>>> Sorry if confusing, but my intention in saying about "FIXME" comment was
>>> that we can find another solution for this race rather than just reverting,
>>> so adding comment about the reported bug in current code (keeping code from
>>> 4491f712606) is OK for very short term.
>>> I understand that leaving a race window of BUG_ON is not the best thing, but
>>> as I said, this race shouldn't affect end-users, so this is not an urgent bug.
>>> # It's enough if testers know this.
>> The 4.2 is coming, this patch can be applied as a temporal solution in
>> order to fix the broken linus tree, and the any final solution can be
>> figured out later.
> I didn't reproduce this problem yet in my environment, but from code reading
> I guess that checking PageHWPoison flag in unmap_and_move() like below could
> avoid the problem. Could you testing with this, please?

I have already try to modify unmap_and_move() the same as what you do
before I post migratetype stuff. It doesn't work and have other calltrace.

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Fri, 14 Aug 2015 08:04:03 +0900
> Subject: [PATCH] mm: hwpoison: migrate: fix race b/w soft-offline and unpoison
>
> [description to be written ...]
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/migrate.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index eb4267107d1f..24f5b9acc26a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -953,7 +953,13 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
>  				page_is_file_cache(page));
>  		/* Soft-offlined page shouldn't go through lru cache list */
>  		if (reason == MR_MEMORY_FAILURE)
> -			put_page(page);
> +			/*
> +			 * Check race condition with unpoison, where the source
> +			 * page is handled by unpoison handler which decrements
> +			 * the refcount, so no need to call put_page() here.
> +			 */
> +			if (PageHWPoison(page))
> +				put_page(page);
>  		else
>  			putback_lru_page(page);
>  	}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  5:03           ` Wanpeng Li
@ 2015-08-14  7:26             ` Naoya Horiguchi
  2015-08-14  7:54               ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-14  7:26 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Fri, Aug 14, 2015 at 01:03:53PM +0800, Wanpeng Li wrote:
> On 8/14/15 12:19 PM, Naoya Horiguchi wrote:
...
> >>>>> If I read correctly, the old migratetype approach has a few problems:
> >>>>>   1) it doesn't fix the problem completely, because
> >>>>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
> >>>>>      target page if the pageblock of the page contains one or more
> >>>>>      unmovable pages (i.e. has_unmovable_pages() returns true).
> >>>>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
> >>>>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
> >>>>>      of the original migratetype state, which could impact other subsystems
> >>>>>      like memory hotplug or compaction.
> >>>> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
> >>>> current linus tree calltrace and it should be fixed immediately, and I
> >>>> don't see obvious bugs appear on migratetype stuffs at least currently,
> >>>> so "FIXME" is enough. :-)
> >>> Sorry if confusing, but my intention in saying about "FIXME" comment was
> >>> that we can find another solution for this race rather than just reverting,
> >>> so adding comment about the reported bug in current code (keeping code from
> >>> 4491f712606) is OK for very short term.
> >>> I understand that leaving a race window of BUG_ON is not the best thing, but
> >>> as I said, this race shouldn't affect end-users, so this is not an urgent bug.
> >>> # It's enough if testers know this.
> >> The 4.2 is coming, this patch can be applied as a temporal solution in
> >> order to fix the broken linus tree, and the any final solution can be
> >> figured out later.
> > I didn't reproduce this problem yet in my environment, but from code reading
> > I guess that checking PageHWPoison flag in unmap_and_move() like below could
> > avoid the problem. Could you testing with this, please?
> 
> I have already try to modify unmap_and_move() the same as what you do
> before I post migratetype stuff. It doesn't work and have other calltrace.

OK, then I rethink of handling the race in unpoison_memory().

Currently properly contained/hwpoisoned pages should have page refcount 1
(when the memory error hits LRU pages or hugetlb pages) or refcount 0
(when the memory error hits the buddy page.) And current unpoison_memory()
implicitly assumes this because otherwise the unpoisoned page has no place
to go and it's just leaked.
So to avoid the kernel panic, adding prechecks of refcount and mapcount
to limit the page to unpoison for only unpoisonable pages looks OK to me.
The page under soft offlining always has refcount >=2 and/or mapcount > 0,
so such pages should be filtered out.

Here's a patch. In my testing (run soft offline stress testing then repeat
unpoisoning in background,) the reported (or similar) bug doesn't happen.
Can I have your comments?

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: [PATCH] mm/hwpoison: don't unpoison for pinned or mapped page

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d1f85f6278ee..c6f14d2cd919 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1442,6 +1442,16 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
+	if (page_count(page) > 1) {
+		pr_info("MCE: Someone grabs the hwpoison page %#lx\n", pfn);
+		return 0;
+	}
+
+	if (page_mapped(page)) {
+		pr_info("MCE: Someone maps the hwpoison page %#lx\n", pfn);
+		return 0;
+	}
+
 	/*
 	 * unpoison_memory() can encounter thp only when the thp is being
 	 * worked by memory_failure() and the page lock is not held yet.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  7:26             ` Naoya Horiguchi
@ 2015-08-14  7:54               ` Wanpeng Li
  2015-08-14  7:59                 ` Wanpeng Li
  2015-08-14  8:02                 ` Naoya Horiguchi
  0 siblings, 2 replies; 18+ messages in thread
From: Wanpeng Li @ 2015-08-14  7:54 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/14/15 3:26 PM, Naoya Horiguchi wrote:
> On Fri, Aug 14, 2015 at 01:03:53PM +0800, Wanpeng Li wrote:
>> On 8/14/15 12:19 PM, Naoya Horiguchi wrote:
> ...
>>>>>>> If I read correctly, the old migratetype approach has a few problems:
>>>>>>>   1) it doesn't fix the problem completely, because
>>>>>>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
>>>>>>>      target page if the pageblock of the page contains one or more
>>>>>>>      unmovable pages (i.e. has_unmovable_pages() returns true).
>>>>>>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
>>>>>>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
>>>>>>>      of the original migratetype state, which could impact other subsystems
>>>>>>>      like memory hotplug or compaction.
>>>>>> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
>>>>>> current linus tree calltrace and it should be fixed immediately, and I
>>>>>> don't see obvious bugs appear on migratetype stuffs at least currently,
>>>>>> so "FIXME" is enough. :-)
>>>>> Sorry if confusing, but my intention in saying about "FIXME" comment was
>>>>> that we can find another solution for this race rather than just reverting,
>>>>> so adding comment about the reported bug in current code (keeping code from
>>>>> 4491f712606) is OK for very short term.
>>>>> I understand that leaving a race window of BUG_ON is not the best thing, but
>>>>> as I said, this race shouldn't affect end-users, so this is not an urgent bug.
>>>>> # It's enough if testers know this.
>>>> The 4.2 is coming, this patch can be applied as a temporal solution in
>>>> order to fix the broken linus tree, and the any final solution can be
>>>> figured out later.
>>> I didn't reproduce this problem yet in my environment, but from code reading
>>> I guess that checking PageHWPoison flag in unmap_and_move() like below could
>>> avoid the problem. Could you testing with this, please?
>> I have already try to modify unmap_and_move() the same as what you do
>> before I post migratetype stuff. It doesn't work and have other calltrace.
> OK, then I rethink of handling the race in unpoison_memory().
>
> Currently properly contained/hwpoisoned pages should have page refcount 1
> (when the memory error hits LRU pages or hugetlb pages) or refcount 0
> (when the memory error hits the buddy page.) And current unpoison_memory()
> implicitly assumes this because otherwise the unpoisoned page has no place
> to go and it's just leaked.
> So to avoid the kernel panic, adding prechecks of refcount and mapcount
> to limit the page to unpoison for only unpoisonable pages looks OK to me.
> The page under soft offlining always has refcount >=2 and/or mapcount > 0,
> so such pages should be filtered out.
>
> Here's a patch. In my testing (run soft offline stress testing then repeat
> unpoisoning in background,) the reported (or similar) bug doesn't happen.
> Can I have your comments?

As page_action() prints out page maybe still referenced by some users,
however, PageHWPoison has already set. So you will leak many poison pages.

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Subject: [PATCH] mm/hwpoison: don't unpoison for pinned or mapped page
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/memory-failure.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d1f85f6278ee..c6f14d2cd919 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1442,6 +1442,16 @@ int unpoison_memory(unsigned long pfn)
>  		return 0;
>  	}
>  
> +	if (page_count(page) > 1) {
> +		pr_info("MCE: Someone grabs the hwpoison page %#lx\n", pfn);
> +		return 0;
> +	}
> +
> +	if (page_mapped(page)) {
> +		pr_info("MCE: Someone maps the hwpoison page %#lx\n", pfn);
> +		return 0;
> +	}
> +
>  	/*
>  	 * unpoison_memory() can encounter thp only when the thp is being
>  	 * worked by memory_failure() and the page lock is not held yet.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  7:54               ` Wanpeng Li
@ 2015-08-14  7:59                 ` Wanpeng Li
  2015-08-14  8:38                   ` Naoya Horiguchi
  2015-08-14  8:02                 ` Naoya Horiguchi
  1 sibling, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-14  7:59 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/14/15 3:54 PM, Wanpeng Li wrote:
> [...]
>> OK, then I rethink of handling the race in unpoison_memory().
>>
>> Currently properly contained/hwpoisoned pages should have page refcount 1
>> (when the memory error hits LRU pages or hugetlb pages) or refcount 0
>> (when the memory error hits the buddy page.) And current unpoison_memory()
>> implicitly assumes this because otherwise the unpoisoned page has no place
>> to go and it's just leaked.
>> So to avoid the kernel panic, adding prechecks of refcount and mapcount
>> to limit the page to unpoison for only unpoisonable pages looks OK to me.
>> The page under soft offlining always has refcount >=2 and/or mapcount > 0,
>> so such pages should be filtered out.
>>
>> Here's a patch. In my testing (run soft offline stress testing then repeat
>> unpoisoning in background,) the reported (or similar) bug doesn't happen.
>> Can I have your comments?
> As page_action() prints out page maybe still referenced by some users,
> however, PageHWPoison has already set. So you will leak many poison pages.
>

Anyway, the bug is still there.

[  944.387559] BUG: Bad page state in process expr  pfn:591e3
[  944.393053] page:ffffea00016478c0 count:-1 mapcount:0 
mapping:          (null) index:0x2
[  944.401147] flags: 0x1fffff80000000()
[  944.404819] page dumped because: nonzero _count

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  7:54               ` Wanpeng Li
  2015-08-14  7:59                 ` Wanpeng Li
@ 2015-08-14  8:02                 ` Naoya Horiguchi
  1 sibling, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-14  8:02 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Fri, Aug 14, 2015 at 03:54:36PM +0800, Wanpeng Li wrote:
> On 8/14/15 3:26 PM, Naoya Horiguchi wrote:
> > On Fri, Aug 14, 2015 at 01:03:53PM +0800, Wanpeng Li wrote:
> >> On 8/14/15 12:19 PM, Naoya Horiguchi wrote:
> > ...
> >>>>>>> If I read correctly, the old migratetype approach has a few problems:
> >>>>>>>   1) it doesn't fix the problem completely, because
> >>>>>>>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
> >>>>>>>      target page if the pageblock of the page contains one or more
> >>>>>>>      unmovable pages (i.e. has_unmovable_pages() returns true).
> >>>>>>>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
> >>>>>>>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
> >>>>>>>      of the original migratetype state, which could impact other subsystems
> >>>>>>>      like memory hotplug or compaction.
> >>>>>> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
> >>>>>> current linus tree calltrace and it should be fixed immediately, and I
> >>>>>> don't see obvious bugs appear on migratetype stuffs at least currently,
> >>>>>> so "FIXME" is enough. :-)
> >>>>> Sorry if confusing, but my intention in saying about "FIXME" comment was
> >>>>> that we can find another solution for this race rather than just reverting,
> >>>>> so adding comment about the reported bug in current code (keeping code from
> >>>>> 4491f712606) is OK for very short term.
> >>>>> I understand that leaving a race window of BUG_ON is not the best thing, but
> >>>>> as I said, this race shouldn't affect end-users, so this is not an urgent bug.
> >>>>> # It's enough if testers know this.
> >>>> The 4.2 is coming, this patch can be applied as a temporal solution in
> >>>> order to fix the broken linus tree, and the any final solution can be
> >>>> figured out later.
> >>> I didn't reproduce this problem yet in my environment, but from code reading
> >>> I guess that checking PageHWPoison flag in unmap_and_move() like below could
> >>> avoid the problem. Could you testing with this, please?
> >> I have already try to modify unmap_and_move() the same as what you do
> >> before I post migratetype stuff. It doesn't work and have other calltrace.
> > OK, then I rethink of handling the race in unpoison_memory().
> >
> > Currently properly contained/hwpoisoned pages should have page refcount 1
> > (when the memory error hits LRU pages or hugetlb pages) or refcount 0
> > (when the memory error hits the buddy page.) And current unpoison_memory()
> > implicitly assumes this because otherwise the unpoisoned page has no place
> > to go and it's just leaked.
> > So to avoid the kernel panic, adding prechecks of refcount and mapcount
> > to limit the page to unpoison for only unpoisonable pages looks OK to me.
> > The page under soft offlining always has refcount >=2 and/or mapcount > 0,
> > so such pages should be filtered out.
> >
> > Here's a patch. In my testing (run soft offline stress testing then repeat
> > unpoisoning in background,) the reported (or similar) bug doesn't happen.
> > Can I have your comments?
> 
> As page_action() prints out page maybe still referenced by some users,
> however, PageHWPoison has already set. So you will leak many poison pages.

Right, but it isn't a problem, because error handling doesn't always succeed.
Our basic policy for such case is to leak the page intentionally. IOW, the
memory leakage happen even in current kernel (unpoison doesn't work because
leaked page never return to buddy.) So my suggestion doesn't make things worse.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  7:59                 ` Wanpeng Li
@ 2015-08-14  8:38                   ` Naoya Horiguchi
  2015-08-14  9:01                     ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-14  8:38 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Andrew Morton, linux-mm, linux-kernel

On Fri, Aug 14, 2015 at 03:59:21PM +0800, Wanpeng Li wrote:
> On 8/14/15 3:54 PM, Wanpeng Li wrote:
> >[...]
> >>OK, then I rethink of handling the race in unpoison_memory().
> >>
> >>Currently properly contained/hwpoisoned pages should have page refcount 1
> >>(when the memory error hits LRU pages or hugetlb pages) or refcount 0
> >>(when the memory error hits the buddy page.) And current unpoison_memory()
> >>implicitly assumes this because otherwise the unpoisoned page has no place
> >>to go and it's just leaked.
> >>So to avoid the kernel panic, adding prechecks of refcount and mapcount
> >>to limit the page to unpoison for only unpoisonable pages looks OK to me.
> >>The page under soft offlining always has refcount >=2 and/or mapcount > 0,
> >>so such pages should be filtered out.
> >>
> >>Here's a patch. In my testing (run soft offline stress testing then repeat
> >>unpoisoning in background,) the reported (or similar) bug doesn't happen.
> >>Can I have your comments?
> >As page_action() prints out page maybe still referenced by some users,
> >however, PageHWPoison has already set. So you will leak many poison pages.
> >
>
> Anyway, the bug is still there.
>
> [  944.387559] BUG: Bad page state in process expr  pfn:591e3
> [  944.393053] page:ffffea00016478c0 count:-1 mapcount:0 mapping:
> (null) index:0x2
> [  944.401147] flags: 0x1fffff80000000()
> [  944.404819] page dumped because: nonzero _count

Hmm, no luck :(

To investigate more, I'd like to test the exactly same kernel as yours, so
could you share the kernel info (.config and base kernel and what patches
you applied)? or pushing your tree somewhere like github?
# if you like, sending to me privately is fine.

I think that I tested v4.2-rc6 + <your recent 7 hwpoison patches> +
"mm/hwpoison: fix race between soft_offline_page and unpoison_memory",
but I experienced some conflict in applying your patches for some reason,
so it might happen that we are testing on different kernels.

Mine is here:
  https://github.com/Naoya-Horiguchi/linux v4.2-rc6/fix_race_soft_offline_unpoison

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  8:38                   ` Naoya Horiguchi
@ 2015-08-14  9:01                     ` Wanpeng Li
  2015-08-17  4:32                       ` Naoya Horiguchi
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2015-08-14  9:01 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel

On 8/14/15 4:38 PM, Naoya Horiguchi wrote:
> On Fri, Aug 14, 2015 at 03:59:21PM +0800, Wanpeng Li wrote:
>> On 8/14/15 3:54 PM, Wanpeng Li wrote:
>>> [...]
>>>> OK, then I rethink of handling the race in unpoison_memory().
>>>>
>>>> Currently properly contained/hwpoisoned pages should have page refcount 1
>>>> (when the memory error hits LRU pages or hugetlb pages) or refcount 0
>>>> (when the memory error hits the buddy page.) And current unpoison_memory()
>>>> implicitly assumes this because otherwise the unpoisoned page has no place
>>>> to go and it's just leaked.
>>>> So to avoid the kernel panic, adding prechecks of refcount and mapcount
>>>> to limit the page to unpoison for only unpoisonable pages looks OK to me.
>>>> The page under soft offlining always has refcount >=2 and/or mapcount > 0,
>>>> so such pages should be filtered out.
>>>>
>>>> Here's a patch. In my testing (run soft offline stress testing then repeat
>>>> unpoisoning in background,) the reported (or similar) bug doesn't happen.
>>>> Can I have your comments?
>>> As page_action() prints out page maybe still referenced by some users,
>>> however, PageHWPoison has already set. So you will leak many poison pages.
>>>
>> Anyway, the bug is still there.
>>
>> [  944.387559] BUG: Bad page state in process expr  pfn:591e3
>> [  944.393053] page:ffffea00016478c0 count:-1 mapcount:0 mapping:
>> (null) index:0x2
>> [  944.401147] flags: 0x1fffff80000000()
>> [  944.404819] page dumped because: nonzero _count
> Hmm, no luck :(
>
> To investigate more, I'd like to test the exactly same kernel as yours, so
> could you share the kernel info (.config and base kernel and what patches
> you applied)? or pushing your tree somewhere like github?
> # if you like, sending to me privately is fine.
>
> I think that I tested v4.2-rc6 + <your recent 7 hwpoison patches> +
> "mm/hwpoison: fix race between soft_offline_page and unpoison_memory",
> but I experienced some conflict in applying your patches for some reason,
> so it might happen that we are testing on different kernels.

I don't have special config and tree, the latest mmotm has already
merged my recent 8 hwpoison patches, you can test based on it.

Regards,
Wanpeng Li

>
> Mine is here:
>   https://github.com/Naoya-Horiguchi/linux v4.2-rc6/fix_race_soft_offline_unpoison
>
> Thanks,
> Naoya Horiguchi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-14  9:01                     ` Wanpeng Li
@ 2015-08-17  4:32                       ` Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 2/3] " Naoya Horiguchi
                                           ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-17  4:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi, Naoya Horiguchi

On Fri, Aug 14, 2015 at 05:01:34PM +0800, Wanpeng Li wrote:
> On 8/14/15 4:38 PM, Naoya Horiguchi wrote:
> > On Fri, Aug 14, 2015 at 03:59:21PM +0800, Wanpeng Li wrote:
> >> On 8/14/15 3:54 PM, Wanpeng Li wrote:
> >>> [...]
> >>>> OK, then I rethink of handling the race in unpoison_memory().
> >>>>
> >>>> Currently properly contained/hwpoisoned pages should have page refcount 1
> >>>> (when the memory error hits LRU pages or hugetlb pages) or refcount 0
> >>>> (when the memory error hits the buddy page.) And current unpoison_memory()
> >>>> implicitly assumes this because otherwise the unpoisoned page has no place
> >>>> to go and it's just leaked.
> >>>> So to avoid the kernel panic, adding prechecks of refcount and mapcount
> >>>> to limit the page to unpoison for only unpoisonable pages looks OK to me.
> >>>> The page under soft offlining always has refcount >=2 and/or mapcount > 0,
> >>>> so such pages should be filtered out.
> >>>>
> >>>> Here's a patch. In my testing (run soft offline stress testing then repeat
> >>>> unpoisoning in background,) the reported (or similar) bug doesn't happen.
> >>>> Can I have your comments?
> >>> As page_action() prints out page maybe still referenced by some users,
> >>> however, PageHWPoison has already set. So you will leak many poison pages.
> >>>
> >> Anyway, the bug is still there.
> >>
> >> [  944.387559] BUG: Bad page state in process expr  pfn:591e3
> >> [  944.393053] page:ffffea00016478c0 count:-1 mapcount:0 mapping:
> >> (null) index:0x2
> >> [  944.401147] flags: 0x1fffff80000000()
> >> [  944.404819] page dumped because: nonzero _count
> > Hmm, no luck :(
> >
> > To investigate more, I'd like to test the exactly same kernel as yours, so
> > could you share the kernel info (.config and base kernel and what patches
> > you applied)? or pushing your tree somewhere like github?
> > # if you like, sending to me privately is fine.
> >
> > I think that I tested v4.2-rc6 + <your recent 7 hwpoison patches> +
> > "mm/hwpoison: fix race between soft_offline_page and unpoison_memory",
> > but I experienced some conflict in applying your patches for some reason,
> > so it might happen that we are testing on different kernels.
> 
> I don't have special config and tree, the latest mmotm has already
> merged my recent 8 hwpoison patches, you can test based on it.

OK, so I wrote the next version against mmotm-2015-08-13-15-29 (replied to
this email.) It moves PageSetHWPoison part into migration code, which should
close up the reported race window and minimize the another revived race window
of reusing offlined pages, so I feel that it's a good compromise between two
races.

My testing shows no kernel panic with these patches (same testing easily caused
panics for bare mmotm-2015-08-13-15-29,) so they should work. But I'm appreciated
if you help double checking.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/3] mm/hwpoison: introduce num_poisoned_pages wrappers
  2015-08-17  4:32                       ` Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 2/3] " Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 3/3] mm/hwpoison: don't try to unpoison containment-failed pages Naoya Horiguchi
@ 2015-08-17  4:32                         ` Naoya Horiguchi
  2015-08-17  5:29                         ` [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
  3 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-17  4:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi, Naoya Horiguchi

num_poisoned_pages counter will be changed outside mm/memory-failure.c by a
subsequent patch, so this patch prepares wrappers to manipulate it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/swapops.h | 23 +++++++++++++++++++++++
 mm/memory-failure.c     | 30 ++++++++++++++----------------
 2 files changed, 37 insertions(+), 16 deletions(-)

diff --git mmotm-2015-08-13-15-29.orig/include/linux/swapops.h mmotm-2015-08-13-15-29/include/linux/swapops.h
index cedf3d3c373f..ec04669f2a3b 100644
--- mmotm-2015-08-13-15-29.orig/include/linux/swapops.h
+++ mmotm-2015-08-13-15-29/include/linux/swapops.h
@@ -164,6 +164,9 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 #endif
 
 #ifdef CONFIG_MEMORY_FAILURE
+
+extern atomic_long_t num_poisoned_pages __read_mostly;
+
 /*
  * Support for hardware poisoned pages
  */
@@ -177,6 +180,26 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 {
 	return swp_type(entry) == SWP_HWPOISON;
 }
+
+static inline void num_poisoned_pages_inc(void)
+{
+	atomic_long_inc(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_dec(void)
+{
+	atomic_long_dec(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_add(long num)
+{
+	atomic_long_add(num, &num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_sub(long num)
+{
+	atomic_long_sub(num, &num_poisoned_pages);
+}
 #else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
diff --git mmotm-2015-08-13-15-29.orig/mm/memory-failure.c mmotm-2015-08-13-15-29/mm/memory-failure.c
index a3d9c3946b36..ac3056c6fe9f 100644
--- mmotm-2015-08-13-15-29.orig/mm/memory-failure.c
+++ mmotm-2015-08-13-15-29/mm/memory-failure.c
@@ -1109,7 +1109,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		nr_pages = 1 << compound_order(hpage);
 	else /* normal page or thp */
 		nr_pages = 1;
-	atomic_long_add(nr_pages, &num_poisoned_pages);
+	num_poisoned_pages_add(nr_pages);
 
 	/*
 	 * We need/can do nothing about count=0 pages.
@@ -1137,7 +1137,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 			if (PageHWPoison(hpage)) {
 				if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
 				    || (p != hpage && TestSetPageHWPoison(hpage))) {
-					atomic_long_sub(nr_pages, &num_poisoned_pages);
+					num_poisoned_pages_sub(nr_pages);
 					unlock_page(hpage);
 					return 0;
 				}
@@ -1161,7 +1161,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 			else
 				pr_err("MCE: %#lx: thp split failed\n", pfn);
 			if (TestClearPageHWPoison(p))
-				atomic_long_sub(nr_pages, &num_poisoned_pages);
+				num_poisoned_pages_sub(nr_pages);
 			put_hwpoison_page(p);
 			return -EBUSY;
 		}
@@ -1221,14 +1221,14 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	 */
 	if (!PageHWPoison(p)) {
 		printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn);
-		atomic_long_sub(nr_pages, &num_poisoned_pages);
+		num_poisoned_pages_sub(nr_pages);
 		unlock_page(hpage);
 		put_hwpoison_page(hpage);
 		return 0;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
-			atomic_long_sub(nr_pages, &num_poisoned_pages);
+			num_poisoned_pages_sub(nr_pages);
 		unlock_page(hpage);
 		put_hwpoison_page(hpage);
 		return 0;
@@ -1457,7 +1457,7 @@ int unpoison_memory(unsigned long pfn)
 			return 0;
 		}
 		if (TestClearPageHWPoison(p))
-			atomic_long_dec(&num_poisoned_pages);
+			num_poisoned_pages_dec();
 		pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
 		return 0;
 	}
@@ -1471,7 +1471,7 @@ int unpoison_memory(unsigned long pfn)
 	 */
 	if (TestClearPageHWPoison(page)) {
 		pr_info("MCE: Software-unpoisoned page %#lx\n", pfn);
-		atomic_long_sub(nr_pages, &num_poisoned_pages);
+		num_poisoned_pages_sub(nr_pages);
 		freeit = 1;
 		if (PageHuge(page))
 			clear_page_hwpoison_huge_page(page);
@@ -1607,11 +1607,10 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		if (PageHuge(page)) {
 			set_page_hwpoison_huge_page(hpage);
 			dequeue_hwpoisoned_huge_page(hpage);
-			atomic_long_add(1 << compound_order(hpage),
-					&num_poisoned_pages);
+			num_poisoned_pages_add(1 << compound_order(hpage));
 		} else {
 			SetPageHWPoison(page);
-			atomic_long_inc(&num_poisoned_pages);
+			num_poisoned_pages_inc();
 		}
 	}
 	return ret;
@@ -1650,7 +1649,7 @@ static int __soft_offline_page(struct page *page, int flags)
 		put_hwpoison_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
 		SetPageHWPoison(page);
-		atomic_long_inc(&num_poisoned_pages);
+		num_poisoned_pages_inc();
 		return 0;
 	}
 
@@ -1671,7 +1670,7 @@ static int __soft_offline_page(struct page *page, int flags)
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		if (!TestSetPageHWPoison(page))
-			atomic_long_inc(&num_poisoned_pages);
+			num_poisoned_pages_dec();
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (ret) {
@@ -1687,7 +1686,7 @@ static int __soft_offline_page(struct page *page, int flags)
 			if (ret > 0)
 				ret = -EIO;
 			if (TestClearPageHWPoison(page))
-				atomic_long_dec(&num_poisoned_pages);
+				num_poisoned_pages_dec();
 		}
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
@@ -1753,11 +1752,10 @@ int soft_offline_page(struct page *page, int flags)
 		if (PageHuge(page)) {
 			set_page_hwpoison_huge_page(hpage);
 			if (!dequeue_hwpoisoned_huge_page(hpage))
-				atomic_long_add(1 << compound_order(hpage),
-					&num_poisoned_pages);
+				num_poisoned_pages_add(1 << compound_order(hpage));
 		} else {
 			if (!TestSetPageHWPoison(page))
-				atomic_long_inc(&num_poisoned_pages);
+				num_poisoned_pages_inc();
 		}
 	}
 	return ret;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/3] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-17  4:32                       ` Naoya Horiguchi
@ 2015-08-17  4:32                         ` Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 3/3] mm/hwpoison: don't try to unpoison containment-failed pages Naoya Horiguchi
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-17  4:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi, Naoya Horiguchi

From: Wanpeng Li <wanpeng.li@hotmail.com>

Wanpeng Li reported a race between soft_offline_page() and unpoison_memory(),
which causes the following kernel panic:

  [   61.572584] BUG: Bad page state in process bash  pfn:97000
  [   61.578106] page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
  [   61.586803] flags: 0x1fffff80080048(uptodate|active|swapbacked)
  [   61.592809] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  [   61.599250] bad because of flags:
  [   61.602567] flags: 0x40(active)
  [   61.605746] Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
  [   61.605827] CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
  [   61.605829] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
  [   61.605832]  ffffffff818b3be8 ffff8800da373ad8 ffffffff8165ceb4 0000000001313ce1
  [   61.605837]  ffffea00025c0000 ffff8800da373b08 ffffffff8117bdd6 ffff88021edd4b00
  [   61.605842]  0000000000000001 001fffff80080048 0000000000000000 ffff8800da373b88
  [   61.605847] Call Trace:
  [   61.605858]  [<ffffffff8165ceb4>] dump_stack+0x48/0x5c
  [   61.605865]  [<ffffffff8117bdd6>] bad_page+0xe6/0x140
  [   61.605870]  [<ffffffff8117dfc9>] free_pages_prepare+0x2f9/0x320
  [   61.605876]  [<ffffffff811e817d>] ? uncharge_list+0xdd/0x100
  [   61.605882]  [<ffffffff8117ff20>] free_hot_cold_page+0x40/0x170
  [   61.605888]  [<ffffffff81185dd0>] __put_single_page+0x20/0x30
  [   61.605892]  [<ffffffff81186675>] put_page+0x25/0x40
  [   61.605897]  [<ffffffff811dc276>] unmap_and_move+0x1a6/0x1f0
  [   61.605908]  [<ffffffff811dc3c0>] migrate_pages+0x100/0x1d0
  [   61.605914]  [<ffffffff811eb710>] ? kill_procs+0x100/0x100
  [   61.605918]  [<ffffffff811764af>] ? unlock_page+0x6f/0x90
  [   61.605923]  [<ffffffff811ecf37>] __soft_offline_page+0x127/0x2a0
  [   61.605928]  [<ffffffff811ed156>] soft_offline_page+0xa6/0x200

This race is explained like below:

  CPU0                    CPU1

  soft_offline_page
  __soft_offline_page
  TestSetPageHWPoison
                        unpoison_memory
                        PageHWPoison check (true)
                        TestClearPageHWPoison
                        put_page    -> release refcount held by get_hwpoison_page in unpoison_memory
                        put_page    -> release refcount held by isolate_lru_page in __soft_offline_page
  migrate_pages

The second put_page() releases refcount held by isolate_lru_page() which
will lead to unmap_and_move() releases the last refcount of page and w/
mapcount still 1 since try_to_unmap() is not called if there is only
one user map the page. Anyway, the page refcount and mapcount will
still mess if the page is mapped by multiple users.

This race was introduced by commit 4491f71260 ("mm/memory-failure: set
PageHWPoison before migrate_pages()"), which focuses on preventing the reuse
of successfully migrated page. Before this commit we prevent the reuse by
changing the migratetype to MIGRATE_ISOLATE during soft offlining, which has
the following problems, so simply reverting the commit is not a best option:
  1) it doesn't eliminate the reuse completely, because set_migratetype_isolate()
     can fail to set MIGRATE_ISOLATE to the target page if the pageblock of
     the page contains one or more unmovable pages (i.e. has_unmovable_pages()
     returns true).
  2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
     and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
     of the original migratetype state, which could impact other subsystems
     like memory hotplug or compaction.

This patch moves PageSetHWPoison just after put_page() in unmap_and_move(),
which closes up the reported race window and minimizes another race window b/w
SetPageHWPoison and reallocation (which causes the reuse of soft-offlined page.)
The latter race window still exists but it's acceptable, because it's rare and
effectively the same as ordinary "containment failure" case even if it happens,
so keep the window open is acceptable.

Fixes: 4491f71260 ("mm/memory-failure: set PageHWPoison before migrate_pages()")
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/swapops.h | 14 ++++++++++++++
 mm/memory-failure.c     |  4 ----
 mm/migrate.c            |  9 +++++----
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git mmotm-2015-08-13-15-29.orig/include/linux/swapops.h mmotm-2015-08-13-15-29/include/linux/swapops.h
index ec04669f2a3b..5c3a5f3e7eec 100644
--- mmotm-2015-08-13-15-29.orig/include/linux/swapops.h
+++ mmotm-2015-08-13-15-29/include/linux/swapops.h
@@ -181,6 +181,11 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
+static inline bool test_set_page_hwpoison(struct page *page)
+{
+	return TestSetPageHWPoison(page);
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 	atomic_long_inc(&num_poisoned_pages);
@@ -211,6 +216,15 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
 {
 	return 0;
 }
+
+static inline bool test_set_page_hwpoison(struct page *page)
+{
+	return false;
+}
+
+static inline void num_poisoned_pages_inc(void)
+{
+}
 #endif
 
 #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
diff --git mmotm-2015-08-13-15-29.orig/mm/memory-failure.c mmotm-2015-08-13-15-29/mm/memory-failure.c
index ac3056c6fe9f..7986db56e240 100644
--- mmotm-2015-08-13-15-29.orig/mm/memory-failure.c
+++ mmotm-2015-08-13-15-29/mm/memory-failure.c
@@ -1669,8 +1669,6 @@ static int __soft_offline_page(struct page *page, int flags)
 		inc_zone_page_state(page, NR_ISOLATED_ANON +
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
-		if (!TestSetPageHWPoison(page))
-			num_poisoned_pages_dec();
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (ret) {
@@ -1685,8 +1683,6 @@ static int __soft_offline_page(struct page *page, int flags)
 				pfn, ret, page->flags);
 			if (ret > 0)
 				ret = -EIO;
-			if (TestClearPageHWPoison(page))
-				num_poisoned_pages_dec();
 		}
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
diff --git mmotm-2015-08-13-15-29.orig/mm/migrate.c mmotm-2015-08-13-15-29/mm/migrate.c
index d3bae49f89cc..fbf17988ab5f 100644
--- mmotm-2015-08-13-15-29.orig/mm/migrate.c
+++ mmotm-2015-08-13-15-29/mm/migrate.c
@@ -886,8 +886,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	/* Establish migration ptes or remove ptes */
 	if (page_mapped(page)) {
 		try_to_unmap(page,
-			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
-			TTU_IGNORE_HWPOISON);
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 		page_was_mapped = 1;
 	}
 
@@ -958,9 +957,11 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
 		/* Soft-offlined page shouldn't go through lru cache list */
-		if (reason == MR_MEMORY_FAILURE)
+		if (reason == MR_MEMORY_FAILURE) {
 			put_page(page);
-		else
+			if (!test_set_page_hwpoison(page))
+				num_poisoned_pages_inc();
+		} else
 			putback_lru_page(page);
 	}
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/3] mm/hwpoison: don't try to unpoison containment-failed pages
  2015-08-17  4:32                       ` Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 2/3] " Naoya Horiguchi
@ 2015-08-17  4:32                         ` Naoya Horiguchi
  2015-08-17  4:32                         ` [PATCH v2 1/3] mm/hwpoison: introduce num_poisoned_pages wrappers Naoya Horiguchi
  2015-08-17  5:29                         ` [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
  3 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2015-08-17  4:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi, Naoya Horiguchi

memory_failure() can be called at any page at any time, which means that we
can't eliminate the possibility of containment failure. In such case the best
option is to leak the page intentionally (and never touch it later.)

We have an unpoison function for testing, and it cannot handle such
containment-failed pages, which results in kernel panic (visible with various
calltraces.) So this patch suggests that we limit the unpoisonable pages to
properly contained pages and ignore any other ones.

Testers are recommended to keep in mind that there're un-unpoisonable pages
when writing test programs.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git mmotm-2015-08-13-15-29.orig/mm/memory-failure.c mmotm-2015-08-13-15-29/mm/memory-failure.c
index 7986db56e240..613389e9e5a8 100644
--- mmotm-2015-08-13-15-29.orig/mm/memory-failure.c
+++ mmotm-2015-08-13-15-29/mm/memory-failure.c
@@ -1433,6 +1433,22 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
+	if (page_count(page) > 1) {
+		pr_info("MCE: Someone grabs the hwpoison page %#lx\n", pfn);
+		return 0;
+	}
+
+	if (page_mapped(page)) {
+		pr_info("MCE: Someone maps the hwpoison page %#lx\n", pfn);
+		return 0;
+	}
+
+	if (page_mapping(page)) {
+		pr_info("MCE: the hwpoison page has non-NULL mapping %#lx\n",
+			pfn);
+		return 0;
+	}
+
 	/*
 	 * unpoison_memory() can encounter thp only when the thp is being
 	 * worked by memory_failure() and the page lock is not held yet.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
  2015-08-17  4:32                       ` Naoya Horiguchi
                                           ` (2 preceding siblings ...)
  2015-08-17  4:32                         ` [PATCH v2 1/3] mm/hwpoison: introduce num_poisoned_pages wrappers Naoya Horiguchi
@ 2015-08-17  5:29                         ` Wanpeng Li
  3 siblings, 0 replies; 18+ messages in thread
From: Wanpeng Li @ 2015-08-17  5:29 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi

On 8/17/15 12:32 PM, Naoya Horiguchi wrote:
> [...]
> OK, so I wrote the next version against mmotm-2015-08-13-15-29 (replied to
> this email.) It moves PageSetHWPoison part into migration code, which should
> close up the reported race window and minimize the another revived race window
> of reusing offlined pages, so I feel that it's a good compromise between two
> races.
>
> My testing shows no kernel panic with these patches (same testing easily caused
> panics for bare mmotm-2015-08-13-15-29,) so they should work. But I'm appreciated
> if you help double checking.

This patchset looks good to me after some stress testing.

Andrew,

Could we pick it in order to catch up upcoming merge window? :-)

Regards,
Wanpeng Li


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-08-17  5:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-13  7:09 [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
2015-08-13  8:53 ` Naoya Horiguchi
2015-08-13  9:18   ` Wanpeng Li
2015-08-13 10:04     ` Naoya Horiguchi
2015-08-13 10:27       ` Wanpeng Li
2015-08-14  4:19         ` Naoya Horiguchi
2015-08-14  5:03           ` Wanpeng Li
2015-08-14  7:26             ` Naoya Horiguchi
2015-08-14  7:54               ` Wanpeng Li
2015-08-14  7:59                 ` Wanpeng Li
2015-08-14  8:38                   ` Naoya Horiguchi
2015-08-14  9:01                     ` Wanpeng Li
2015-08-17  4:32                       ` Naoya Horiguchi
2015-08-17  4:32                         ` [PATCH v2 2/3] " Naoya Horiguchi
2015-08-17  4:32                         ` [PATCH v2 3/3] mm/hwpoison: don't try to unpoison containment-failed pages Naoya Horiguchi
2015-08-17  4:32                         ` [PATCH v2 1/3] mm/hwpoison: introduce num_poisoned_pages wrappers Naoya Horiguchi
2015-08-17  5:29                         ` [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
2015-08-14  8:02                 ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).