* Re: [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly
@ 2021-11-01 19:38 Jue Wang
2021-11-01 20:11 ` Yang Shi
0 siblings, 1 reply; 5+ messages in thread
From: Jue Wang @ 2021-11-01 19:38 UTC (permalink / raw)
To: Yang Shi, Hugh Dickins
Cc: Andrew Morton, kirill.shutemov, linux-fsdevel, LKML, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Oscar Salvador, Peter Xu, Matthew Wilcox
A related bug but whose fix may belong to a separate series:
split_huge_page fails when invoked concurrently on the same THP page.
It's possible that multiple memory errors on the same THP get consumed
by multiple threads and come down to split_huge_page path easily.
Thanks,
-Jue
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly
2021-11-01 19:38 [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Jue Wang
@ 2021-11-01 20:11 ` Yang Shi
2021-11-02 3:49 ` Matthew Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Yang Shi @ 2021-11-01 20:11 UTC (permalink / raw)
To: Jue Wang
Cc: Hugh Dickins, Andrew Morton, Kirill A. Shutemov,
Linux FS-devel Mailing List, LKML, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Oscar Salvador, Peter Xu, Matthew Wilcox
On Mon, Nov 1, 2021 at 12:38 PM Jue Wang <juew@google.com> wrote:
>
> A related bug but whose fix may belong to a separate series:
>
> split_huge_page fails when invoked concurrently on the same THP page.
>
> It's possible that multiple memory errors on the same THP get consumed
> by multiple threads and come down to split_huge_page path easily.
Yeah, I think it should be a known problem since the very beginning.
The THP split requires to pin the page and does check if the refcount
is expected or not and freezes the refcount if it is expected. So if
two concurrent paths try to split the same THP, one will fail due to
the pin from the other path, but the other one will succeed.
I don't think of a better way to remediate it other than retrying from
the very start off the top of my head. We can't simply check if it is
still a THP or not since THP split will just move the refcount pin to
the poisoned subpage so the retry path will lose the refcount for its
poisoned subpage.
Did you run into this problem on any real production environment? Or
it is just a artificial test case? I'm wondering if the extra
complexity is worth or not.
>
> Thanks,
> -Jue
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly
2021-11-01 20:11 ` Yang Shi
@ 2021-11-02 3:49 ` Matthew Wilcox
2021-11-02 16:41 ` Yang Shi
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2021-11-02 3:49 UTC (permalink / raw)
To: Yang Shi
Cc: Jue Wang, Hugh Dickins, Andrew Morton, Kirill A. Shutemov,
Linux FS-devel Mailing List, LKML, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Oscar Salvador, Peter Xu
On Mon, Nov 01, 2021 at 01:11:33PM -0700, Yang Shi wrote:
> On Mon, Nov 1, 2021 at 12:38 PM Jue Wang <juew@google.com> wrote:
> >
> > A related bug but whose fix may belong to a separate series:
> >
> > split_huge_page fails when invoked concurrently on the same THP page.
> >
> > It's possible that multiple memory errors on the same THP get consumed
> > by multiple threads and come down to split_huge_page path easily.
>
> Yeah, I think it should be a known problem since the very beginning.
> The THP split requires to pin the page and does check if the refcount
> is expected or not and freezes the refcount if it is expected. So if
> two concurrent paths try to split the same THP, one will fail due to
> the pin from the other path, but the other one will succeed.
No, they can both fail, if the timing is just right, because they each
have a refcount, so neither will succeed.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly
2021-11-02 3:49 ` Matthew Wilcox
@ 2021-11-02 16:41 ` Yang Shi
0 siblings, 0 replies; 5+ messages in thread
From: Yang Shi @ 2021-11-02 16:41 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jue Wang, Hugh Dickins, Andrew Morton, Kirill A. Shutemov,
Linux FS-devel Mailing List, LKML, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Oscar Salvador, Peter Xu
On Mon, Nov 1, 2021 at 8:50 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Nov 01, 2021 at 01:11:33PM -0700, Yang Shi wrote:
> > On Mon, Nov 1, 2021 at 12:38 PM Jue Wang <juew@google.com> wrote:
> > >
> > > A related bug but whose fix may belong to a separate series:
> > >
> > > split_huge_page fails when invoked concurrently on the same THP page.
> > >
> > > It's possible that multiple memory errors on the same THP get consumed
> > > by multiple threads and come down to split_huge_page path easily.
> >
> > Yeah, I think it should be a known problem since the very beginning.
> > The THP split requires to pin the page and does check if the refcount
> > is expected or not and freezes the refcount if it is expected. So if
> > two concurrent paths try to split the same THP, one will fail due to
> > the pin from the other path, but the other one will succeed.
>
> No, they can both fail, if the timing is just right, because they each
> have a refcount, so neither will succeed.
Oh, yes, if there is race between unlock_page and put_page.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [v5 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
@ 2021-10-20 21:07 Yang Shi
2021-10-20 21:07 ` [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
0 siblings, 1 reply; 5+ messages in thread
From: Yang Shi @ 2021-10-20 21:07 UTC (permalink / raw)
To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel
When discussing the patch that splits page cache THP in order to offline the
poisoned page, Noaya mentioned there is a bigger problem [1] that prevents this
from working since the page cache page will be truncated if uncorrectable
errors happen. By looking this deeper it turns out this approach (truncating
poisoned page) may incur silent data loss for all non-readonly filesystems if
the page is dirty. It may be worse for in-memory filesystem, e.g. shmem/tmpfs
since the data blocks are actually gone.
To solve this problem we could keep the poisoned dirty page in page cache then
notify the users on any later access, e.g. page fault, read/write, etc. The
clean page could be truncated as is since they can be reread from disk later on.
The consequence is the filesystems may find poisoned page and manipulate it as
healthy page since all the filesystems actually don't check if the page is
poisoned or not in all the relevant paths except page fault. In general, we
need make the filesystems be aware of poisoned page before we could keep the
poisoned page in page cache in order to solve the data loss problem.
To make filesystems be aware of poisoned page we should consider:
- The page should be not written back: clearing dirty flag could prevent from
writeback.
- The page should not be dropped (it shows as a clean page) by drop caches or
other callers: the refcount pin from hwpoison could prevent from invalidating
(called by cache drop, inode cache shrinking, etc), but it doesn't avoid
invalidation in DIO path.
- The page should be able to get truncated/hole punched/unlinked: it works as it
is.
- Notify users when the page is accessed, e.g. read/write, page fault and other
paths (compression, encryption, etc).
The scope of the last one is huge since almost all filesystems need do it once
a page is returned from page cache lookup. There are a couple of options to
do it:
1. Check hwpoison flag for every path, the most straightforward way.
2. Return NULL for poisoned page from page cache lookup, the most callsites
check if NULL is returned, this should have least work I think. But the
error handling in filesystems just return -ENOMEM, the error code will incur
confusion to the users obviously.
3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), but this
will involve significant amount of code change as well since all the paths
need check if the pointer is ERR or not just like option #1.
I did prototype for both #1 and #3, but it seems #3 may require more changes
than #1. For #3 ERR_PTR will be returned so all the callers need to check the
return value otherwise invalid pointer may be dereferenced, but not all callers
really care about the content of the page, for example, partial truncate which
just sets the truncated range in one page to 0. So for such paths it needs
additional modification if ERR_PTR is returned. And if the callers have their
own way to handle the problematic pages we need to add a new FGP flag to tell
FGP functions to return the pointer to the page.
It may happen very rarely, but once it happens the consequence (data corruption)
could be very bad and it is very hard to debug. It seems this problem had been
slightly discussed before, but seems no action was taken at that time. [2]
As the aforementioned investigation, it needs huge amount of work to solve
the potential data loss for all filesystems. But it is much easier for
in-memory filesystems and such filesystems actually suffer more than others
since even the data blocks are gone due to truncating. So this patchset starts
from shmem/tmpfs by taking option #1.
TODO:
* The unpoison has been broken since commit 0ed950d1f281 ("mm,hwpoison: make
get_hwpoison_page() call get_any_page()"), and this patch series make
refcount check for unpoisoning shmem page fail.
* Expand to other filesystems. But I haven't heard feedback from filesystem
developers yet.
Patch breakdown:
Patch #1: cleanup, depended by patch #2
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug
Patch #3: coding style cleanup
Patch #4: refactor and preparation.
Patch #5: keep the poisoned page in page cache and handle such case for all
the paths.
Patch #6: the previous patches unblock page cache THP split, so this patch
add page cache THP split support.
Changelog
v4 --> v5:
* Fixed the typos (patch 2/6) per Naoya.
* Coding style clean up (patch 5/6) per Naoya.
* Fixed missed put_page() in shmem_file_read_iter().
* Collected review tags from Naoya.
v3 --> v4:
* Separated coding style cleanup from patch 2/5 by adding a new patch
(patch 3/6) per Kirill.
* Moved setting PageHasHWPoisoned flag to proper place (patch 2/6) per
Peter Xu.
* Elaborated why soft offline doesn't need to set this flag in the commit
message (patch 2/6) per Peter Xu.
* Renamed "dec" parameter to "extra_pins" for has_extra_refcount() (patch 4/6)
per Peter Xu.
* Adopted the suggestions for comment and coding style (patch 5/6) per
Naoya.
* Checked if page is hwpoison or not for shmem_get_link() (patch 5/6) per
Peter Xu.
* Collected acks.
v2 --> v3:
* Incorporated the comments from Kirill.
* Reordered the series to reflect the right dependency (patch #3 from v2
is patch #1 in this revision, patch #1 from v2 is patch #2 in this
revision).
* After the reorder, patch #2 depends on patch #1 and both need to be
backported to -stable.
v1 --> v2:
* Incorporated the suggestion from Kirill to use a new page flag to
indicate there is hwpoisoned subpage(s) in a THP. (patch #1)
* Dropped patch #2 of v1.
* Refctored the page refcount check logic of hwpoison per Naoya. (patch #2)
* Removed unnecessary THP check per Naoya. (patch #3)
* Incorporated the other comments for shmem from Naoya. (patch #4)
Yang Shi (6):
mm: hwpoison: remove the unnecessary THP check
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
mm: filemap: coding style cleanup for filemap_map_pmd()
mm: hwpoison: refactor refcount check handling
mm: shmem: don't truncate page if memory failure happens
mm: hwpoison: handle non-anonymous THP correctly
include/linux/page-flags.h | 23 ++++++++++++
mm/filemap.c | 12 +++----
mm/huge_memory.c | 2 ++
mm/memory-failure.c | 136 ++++++++++++++++++++++++++++++++++++++++++++-------------------------
mm/memory.c | 9 +++++
mm/page_alloc.c | 4 ++-
mm/shmem.c | 37 +++++++++++++++++--
mm/userfaultfd.c | 5 +++
8 files changed, 170 insertions(+), 58 deletions(-)
[1] https://lore.kernel.org/linux-mm/CAHbLzkqNPBh_sK09qfr4yu4WTFOzRy+MKj+PA7iG-adzi9zGsg@mail.gmail.com/T/#m0e959283380156f1d064456af01ae51fdff91265
[2] https://lore.kernel.org/lkml/20210318183350.GT3420@casper.infradead.org/
^ permalink raw reply [flat|nested] 5+ messages in thread
* [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly
2021-10-20 21:07 [v5 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
@ 2021-10-20 21:07 ` Yang Shi
0 siblings, 0 replies; 5+ messages in thread
From: Yang Shi @ 2021-10-20 21:07 UTC (permalink / raw)
To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel
Currently hwpoison doesn't handle non-anonymous THP, but since v4.8 THP
support for tmpfs and read-only file cache has been added. They could
be offlined by split THP, just like anonymous THP.
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
mm/memory-failure.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3603a3acf7b3..bd697c64e973 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1443,14 +1443,11 @@ static int identify_page_state(unsigned long pfn, struct page *p,
static int try_to_split_thp_page(struct page *page, const char *msg)
{
lock_page(page);
- if (!PageAnon(page) || unlikely(split_huge_page(page))) {
+ if (unlikely(split_huge_page(page))) {
unsigned long pfn = page_to_pfn(page);
unlock_page(page);
- if (!PageAnon(page))
- pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
- else
- pr_info("%s: %#lx: thp split failed\n", msg, pfn);
+ pr_info("%s: %#lx: thp split failed\n", msg, pfn);
put_page(page);
return -EBUSY;
}
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-11-02 16:42 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-01 19:38 [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Jue Wang
2021-11-01 20:11 ` Yang Shi
2021-11-02 3:49 ` Matthew Wilcox
2021-11-02 16:41 ` Yang Shi
-- strict thread matches above, loose matches on Subject: below --
2021-10-20 21:07 [v5 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-10-20 21:07 ` [v5 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).