stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <naoya.horiguchi@linux.dev>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Yang Shi" <shy828301@gmail.com>,
	"James Houghton" <jthoughton@google.com>,
	"Muchun Song" <songmuchun@bytedance.com>,
	"Miaohe Lin" <linmiaohe@huawei.com>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: hwpoison, shmem: fix data lost issue for 5.15.y
Date: Tue, 15 Nov 2022 15:39:12 +0900	[thread overview]
Message-ID: <20221115063912.GA3928893@u2004> (raw)
In-Reply-To: <Y3LrtTmLdBU7atso@monkey>

On Mon, Nov 14, 2022 at 05:30:29PM -0800, Mike Kravetz wrote:
> On 11/15/22 01:16, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Nov 14, 2022 at 02:53:51PM -0800, Mike Kravetz wrote:
> > > On 11/15/22 07:39, Naoya Horiguchi wrote:
> > > > On Mon, Nov 14, 2022 at 05:11:35PM +0100, Greg KH wrote:
> > > > > On Mon, Nov 14, 2022 at 10:14:03PM +0900, Naoya Horiguchi wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I'd like to request the follow commits to be backported to 5.15.y.
> > > > > > 
> > > > > > - dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
> > > > > > - 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
> > > > > > - a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > > > > > 
> > > > > > These patches fixed a data lost issue by preventing shmem pagecache from
> > > > > > being removed by memory error.  These were not tagged for stable originally,
> > > > > > but that's revisited recently.
> > > > > 
> > > > > And have you tested that these all apply properly (and in which order?)
> > > > 
> > > > Yes, I've checked that these cleanly apply (without any change) on
> > > > 5.15.78 in the above order (i.e. dd0f23 is first, 496645 comes next,
> > > > then a76054).
> > > > 
> > > > > and work correctly?
> > > > 
> > > > Yes, I ran related testcases in my test suite, and their status changed
> > > > FAIL to PASS with these patches.
> > > 
> > > Hi Naoya,
> > > 
> > > Just curious if you have plans to do backports for earlier releases?
> > 
> > I didn't have a clear plan.  I just thought that we should backport to
> > earlier kernels if someone want and the patches are applicable easily
> > enough and well-tested.
> > 
> > > 
> > > If not, I can start that effort.  We have seen data loss/corruption because of
> > > this on a 4.14 based release.   So, I would go at least that far back.
> > 
> > Thank you for raising hand, that's really helpful.
> > 
> > Maybe dd0f230a0a80 ("[PATCH] hugetlbfs: don't delete error page from

# I meant 8625147cafaa, sorry if the wrong commit ID confused you.

I tested with 8625147cafaa too, and it made hugetlb-related testcases
passed.

> > pagecbache") should be considered to backport together, because it's
> > the similar issue and reported (a while ago) to fail to backport.
> 
> Since dd0f230a0a80 was marked for backports, Greg's automation flags it as
> FAILED due to conflicts in earlier releases.  I am not sure if James has
> a plan to do backports for dd0f230a0a80.  Again, this is also something I
> would help with due to real customer issues.

OK, so I do want 8625147cafaa to be merged to stable.
Another reason why I addressed 8625147cafaa here is that this patch depends
on the code handling "extra_pins" (given by dd0f230a0a80) and it's simpler
to handle together.

We need to slightly modify 8625147cafaa to apply to 5.15.y.  So in summary,
my updated suggestion for 5.15.y is like below:

- [1/4] cherry-pick dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
- [2/4] cherry-pick 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
- [3/4] cherry-pick a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
- [4/4] apply the following patch (as a modified version of 8625147cafaa)

Thanks,
Naoya Horiguchi
---
From bdd88699d8ba6907e0815608482f8c0e2170982d Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton@google.com>
Date: Tue, 15 Nov 2022 15:20:51 +0900
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache

commit 8625147cafaa9ba74713d682f5185eb62cb2aedb upstream.

This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.

Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache.  That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned.  As [1] states,
this is effectively memory corruption.

The fix is to leave the page in the page cache.  If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses.  For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.

[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")

Although there's a conflict in hugetlbfs_error_remove_page() when
backporting to stable tree, the resolution is almost trivial (just
removing conflicting lines in the function).

Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 fs/hugetlbfs/inode.c | 13 ++++++-------
 mm/hugetlb.c         |  4 ++++
 mm/memory-failure.c  |  5 ++++-
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d74a49b188c2..be8deec29ebe 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -361,6 +361,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			unlock_page(page);
 
+			if (PageHWPoison(page)) {
+				put_page(page);
+				retval = -EIO;
+				break;
+			}
+
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
@@ -984,13 +990,6 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 static int hugetlbfs_error_remove_page(struct address_space *mapping,
 				struct page *page)
 {
-	struct inode *inode = mapping->host;
-	pgoff_t index = page->index;
-
-	remove_huge_page(page);
-	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
-		hugetlb_fix_reserve_counts(inode);
-
 	return 0;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dbb63ec3b5fa..e7bd42f23667 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5350,6 +5350,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
 	spin_lock(ptl);
 
+	ret = -EIO;
+	if (PageHWPoison(page))
+		goto out_release_unlock;
+
 	/*
 	 * Recheck the i_size after holding PT lock to make sure not
 	 * to leave any page mapped (as page_mapped()) beyond the end
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 85b1a77e3a99..2ad0f4580091 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1040,6 +1040,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 	int res;
 	struct page *hpage = compound_head(p);
 	struct address_space *mapping;
+	bool extra_pins = false;
 
 	if (!PageHuge(hpage))
 		return MF_DELAYED;
@@ -1047,6 +1048,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 	mapping = page_mapping(hpage);
 	if (mapping) {
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+		/* The page is kept in page cache. */
+		extra_pins = true;
 		unlock_page(hpage);
 	} else {
 		res = MF_FAILED;
@@ -1064,7 +1067,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		}
 	}
 
-	if (has_extra_refcount(ps, p, false))
+	if (has_extra_refcount(ps, p, extra_pins))
 		res = MF_FAILED;
 
 	return res;
-- 
2.37.3.518.g79f2338b37


  reply	other threads:[~2022-11-15  6:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14 13:14 hwpoison, shmem: fix data lost issue for 5.15.y Naoya Horiguchi
2022-11-14 16:11 ` Greg KH
2022-11-14 22:39   ` Naoya Horiguchi
2022-11-14 22:53     ` Mike Kravetz
2022-11-15  1:16       ` HORIGUCHI NAOYA(堀口 直也)
2022-11-15  1:30         ` Mike Kravetz
2022-11-15  6:39           ` Naoya Horiguchi [this message]
2022-11-16  3:48             ` Mike Kravetz
2022-11-16 23:58               ` Naoya Horiguchi
2022-11-23  1:05         ` Mike Kravetz
2022-11-23 18:54           ` Yang Shi
2022-11-24  3:03             ` Naoya Horiguchi
2022-11-16  9:16     ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221115063912.GA3928893@u2004 \
    --to=naoya.horiguchi@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jthoughton@google.com \
    --cc=linmiaohe@huawei.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).