linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/2] mm: hwpoison: fix mce-test.ras.fail on 23a003bfd2
@ 2017-04-26  9:10 Naoya Horiguchi
  2017-04-26  9:10 ` [PATCH v1 1/2] mm: hwpoison: call shake_page() unconditionally Naoya Horiguchi
  2017-04-26  9:10 ` [PATCH v1 2/2] mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page Naoya Horiguchi
  0 siblings, 2 replies; 3+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  9:10 UTC
  To: linux-mm
  Cc: xiaolong.ye, Andrew Morton, Chen Gong, lkp, linux-kernel,
	Naoya Horiguchi, Naoya Horiguchi

Hello,

I wrote fixes for the problems reported in
http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop
With these two patches applied, tsimpleinj.c from the mce-test test suite
passes:

  [build8:~/mce-test-official/cases/function/hwpoison]$ ./tsimpleinj
  dirty page 0x7f657aecb000
  signal 7 code 4 addr 0x7f657aecb000
  recovered
  mlocked page 0x7f657aeca000
  signal 7 code 4 addr 0x7f657aeca000
  recovered
  clean file page 0x7f657aec9000
  23
  recovered
  file dirty page 0x7f657aec8000
  signal 7 code 4 addr 0x7f657aec8000
  recovered
  no error on msync expect error
  no error on fsync expect error
  hole file dirty page 0x7f657aec7000
  signal 7 code 4 addr 0x7f657aec7000
  recovered
  no error on hole msync expect error
  no error on hole fsync expect error
  SUCCESS
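
For readers decoding the output above: a minimal sketch, assuming the x86 Linux values SIGBUS == 7, BUS_MCEERR_AR == 4 and BUS_MCEERR_AO == 5. The helper names and the SKETCH_* constants are hypothetical and are not part of mce-test; the constants are defined locally so the sketch does not depend on what libc exposes.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed x86 Linux values, defined locally for illustration only. */
#define SKETCH_SIGBUS        7
#define SKETCH_BUS_MCEERR_AR 4	/* machine check error, action required */
#define SKETCH_BUS_MCEERR_AO 5	/* machine check error, action optional */

/* Hypothetical helper: name a (signal, si_code) pair from the log. */
static const char *describe_sigbus(int signo, int code)
{
	if (signo != SKETCH_SIGBUS)
		return "not SIGBUS";

	switch (code) {
	case SKETCH_BUS_MCEERR_AR:
		return "SIGBUS/BUS_MCEERR_AR";
	case SKETCH_BUS_MCEERR_AO:
		return "SIGBUS/BUS_MCEERR_AO";
	default:
		return "SIGBUS/other";
	}
}

/* Hypothetical helper: parse one "signal N code M addr X" output line. */
static const char *describe_log_line(const char *line)
{
	int signo, code;
	unsigned long addr;

	assert(line != NULL);

	/* %lx accepts the leading "0x" printed by the test. */
	if (sscanf(line, " signal %d code %d addr %lx",
		   &signo, &code, &addr) != 3)
		return "not a signal line";

	return describe_sigbus(signo, code);
}
```

Under these assumptions, each "signal 7 code 4" line is a SIGBUS reported as an action-required memory error at the poisoned address, which is what the test expects before it prints "recovered".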

I'm also looking into a similar issue with hugetlb pages. It needs more
research and code, so I'll send that fix separately later.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (2):
      mm: hwpoison: call shake_page() unconditionally
      mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page

 mm/hwpoison-inject.c |  3 +--
 mm/memory-failure.c  | 35 +++++++++++++++++++----------------
 2 files changed, 20 insertions(+), 18 deletions(-)


* [PATCH v1 1/2] mm: hwpoison: call shake_page() unconditionally
  2017-04-26  9:10 [PATCH v1 0/2] mm: hwpoison: fix mce-test.ras.fail on 23a003bfd2 Naoya Horiguchi
@ 2017-04-26  9:10 ` Naoya Horiguchi
  2017-04-26  9:10 ` [PATCH v1 2/2] mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page Naoya Horiguchi
  1 sibling, 0 replies; 3+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  9:10 UTC
  To: linux-mm
  Cc: xiaolong.ye, Andrew Morton, Chen Gong, lkp, linux-kernel,
	Naoya Horiguchi, Naoya Horiguchi

shake_page() is called before entering the core error handling code to
ensure that the error page is flushed from the lru_cache lists, where pages
stay while they are being transferred among LRU lists.
Currently this does not fully work: when the page is linked to lru_cache
by activate_page(), its PageLRU flag is set, so shake_page() is skipped.
As a result, error handling fails with a "still referenced by 1 users"
message.
When the page is linked to lru_cache by isolate_lru_page(), its PageLRU flag
is clear, so that case is fine.

This patch calls shake_page() unconditionally to avoid the failure.
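
The difference between the old and new conditions can be illustrated with a user-space toy model. This is only a sketch under stated assumptions: struct sketch_page, its fields and all sketch_* functions are hypothetical stand-ins, not kernel code, and the model reduces the lru_cache to a single flag plus one extra reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; all fields are hypothetical. */
struct sketch_page {
	bool lru_flag;		/* models PageLRU() */
	bool in_lru_cache;	/* page still sits in an lru_cache batch */
	int extra_refs;		/* references beyond the handler's own */
};

/* Models the drain in shake_page(): flush the batch, drop its reference. */
static void sketch_shake_page(struct sketch_page *p)
{
	assert(p != NULL);
	if (p->in_lru_cache) {
		p->in_lru_cache = false;
		p->lru_flag = true;
		p->extra_refs--;
	}
}

/* Old condition: shake only when the PageLRU flag is clear. */
static int sketch_prepare_old(struct sketch_page *p)
{
	if (!p->lru_flag)
		sketch_shake_page(p);
	return p->extra_refs;
}

/* Patched condition: shake unconditionally. */
static int sketch_prepare_new(struct sketch_page *p)
{
	sketch_shake_page(p);
	return p->extra_refs;
}

/* State described for activate_page(): PageLRU set, still cached. */
static struct sketch_page sketch_activated_page(void)
{
	struct sketch_page p = {
		.lru_flag = true, .in_lru_cache = true, .extra_refs = 1,
	};
	return p;
}

/* State described for isolate_lru_page(): PageLRU clear, still cached. */
static struct sketch_page sketch_isolated_page(void)
{
	struct sketch_page p = {
		.lru_flag = false, .in_lru_cache = true, .extra_refs = 1,
	};
	return p;
}
```

In this model, sketch_prepare_old() leaves one extra reference on the activated page (the "still referenced by 1 users" case), while sketch_prepare_new() drops it; the isolated page ends with no extra reference either way.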

Link: http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hwpoison-inject.c |  3 +--
 mm/memory-failure.c  | 27 +++++++++++----------------
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git v4.11-rc6-mmotm-2017-04-13-14-50/mm/hwpoison-inject.c v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/hwpoison-inject.c
index 9d26fd9..356df05 100644
--- v4.11-rc6-mmotm-2017-04-13-14-50/mm/hwpoison-inject.c
+++ v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/hwpoison-inject.c
@@ -34,8 +34,7 @@ static int hwpoison_inject(void *data, u64 val)
 	if (!hwpoison_filter_enable)
 		goto inject;
 
-	if (!PageLRU(hpage) && !PageHuge(p))
-		shake_page(hpage, 0);
+	shake_page(hpage, 0);
 	/*
 	 * This implies unable to support non-LRU pages.
 	 */
diff --git v4.11-rc6-mmotm-2017-04-13-14-50/mm/memory-failure.c v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/memory-failure.c
index 8c02811..77cf9c3 100644
--- v4.11-rc6-mmotm-2017-04-13-14-50/mm/memory-failure.c
+++ v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/memory-failure.c
@@ -220,6 +220,9 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
  */
 void shake_page(struct page *p, int access)
 {
+	if (PageHuge(p))
+		return;
+
 	if (!PageSlab(p)) {
 		lru_add_drain_all();
 		if (PageLRU(p))
@@ -1140,22 +1143,14 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
-	if (!PageHuge(p)) {
-		if (!PageLRU(p))
-			shake_page(p, 0);
-		if (!PageLRU(p)) {
-			/*
-			 * shake_page could have turned it free.
-			 */
-			if (is_free_buddy_page(p)) {
-				if (flags & MF_COUNT_INCREASED)
-					action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-				else
-					action_result(pfn, MF_MSG_BUDDY_2ND,
-						      MF_DELAYED);
-				return 0;
-			}
-		}
+	shake_page(p, 0);
+	/* shake_page could have turned it free. */
+	if (!PageLRU(p) && is_free_buddy_page(p)) {
+		if (flags & MF_COUNT_INCREASED)
+			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
+		else
+			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+		return 0;
 	}
 
 	lock_page(hpage);
-- 
2.7.0


* [PATCH v1 2/2] mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
  2017-04-26  9:10 [PATCH v1 0/2] mm: hwpoison: fix mce-test.ras.fail on 23a003bfd2 Naoya Horiguchi
  2017-04-26  9:10 ` [PATCH v1 1/2] mm: hwpoison: call shake_page() unconditionally Naoya Horiguchi
@ 2017-04-26  9:10 ` Naoya Horiguchi
  1 sibling, 0 replies; 3+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  9:10 UTC
  To: linux-mm
  Cc: xiaolong.ye, Andrew Morton, Chen Gong, lkp, linux-kernel,
	Naoya Horiguchi, Naoya Horiguchi

The memory error handler calls try_to_unmap() for error pages in various
states. If the error page is an mlocked page, error handling can fail
with a "still referenced by 1 users" message. This is because the page
is linked to the lru cache, and stays there, after the following call chain:

  try_to_unmap_one
    page_remove_rmap
      clear_page_mlock
        putback_lru_page
          lru_cache_add

memory_failure() calls shake_page() to handle a similar issue, but the
current code doesn't cover this case because shake_page() is called only
before try_to_unmap(). So this patch adds a shake_page() call after
try_to_unmap() for mlocked pages.
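
The ordering problem can be sketched with a user-space toy model. It rests on stated assumptions: struct mlock_page and the toy_* functions are hypothetical and only mimic the call chain above, where unmapping an mlocked page clears the mlock state and puts the page back through the lru cache with one extra reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an error page; all fields are hypothetical. */
struct mlock_page {
	bool mlocked;		/* models PageMlocked() */
	bool in_lru_cache;	/* page sits in an lru cache batch */
	int extra_refs;		/* references beyond the handler's own */
};

/* Models shake_page(): flush the lru cache, drop its reference. */
static void toy_shake_page(struct mlock_page *p)
{
	assert(p != NULL);
	if (p->in_lru_cache) {
		p->in_lru_cache = false;
		p->extra_refs--;
	}
}

/* Models try_to_unmap() -> ... -> clear_page_mlock() -> lru_cache_add(). */
static void toy_try_to_unmap(struct mlock_page *p)
{
	if (p->mlocked) {
		p->mlocked = false;
		p->in_lru_cache = true;
		p->extra_refs++;
	}
}

/* Returns the extra references left when the handler checks the page. */
static int toy_handle_error(struct mlock_page *p, bool shake_after_unmap)
{
	/* Sample the flag first: the unmap step clears it in this model. */
	bool mlocked = p->mlocked;

	toy_shake_page(p);		/* existing call, before unmap */
	toy_try_to_unmap(p);
	if (shake_after_unmap && mlocked)
		toy_shake_page(p);	/* call added after unmap */

	return p->extra_refs;
}
```

In this model a page that starts mlocked ends with one extra reference when only the first shake runs, and with none when the second shake is enabled. It also shows why the mlocked state is sampled before unmapping: afterwards the flag is already clear.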

Link: http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git v4.11-rc6-mmotm-2017-04-13-14-50/mm/memory-failure.c v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/memory-failure.c
index 77cf9c3..57f07ec 100644
--- v4.11-rc6-mmotm-2017-04-13-14-50/mm/memory-failure.c
+++ v4.11-rc6-mmotm-2017-04-13-14-50_patched/mm/memory-failure.c
@@ -919,6 +919,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	bool unmap_success;
 	int kill = 1, forcekill;
 	struct page *hpage = *hpagep;
+	bool mlocked = PageMlocked(hpage);
 
 	/*
 	 * Here we are interested only in user-mapped pages, so skip any
@@ -983,6 +984,13 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 		       pfn, page_mapcount(hpage));
 
 	/*
+	 * try_to_unmap() might put mlocked page in lru cache, so call
+	 * shake_page() again to ensure that it's flushed.
+	 */
+	if (mlocked)
+		shake_page(hpage, 0);
+
+	/*
 	 * Now that the dirty bit has been propagated to the
 	 * struct page and all unmaps done we can decide if
 	 * killing is needed or not.  Only kill when the page
-- 
2.7.0

