linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Yang Shi <shy828301@gmail.com>
Cc: "osalvador@suse.de" <osalvador@suse.de>,
	"hughd@google.com" <hughd@google.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: ##freemail## Re: [PATCH] mm: hwpoison: deal with page cache THP
Date: Wed, 8 Sep 2021 02:50:25 +0000	[thread overview]
Message-ID: <20210908025024.GA4117799@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <CAHbLzko6yTTnBopzASaYeuD5bBsc_kdhQxQoVd-wqEhkzX1agg@mail.gmail.com>

On Tue, Sep 07, 2021 at 02:34:24PM -0700, Yang Shi wrote:
> On Fri, Sep 3, 2021 at 5:03 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Fri, Sep 3, 2021 at 11:01 AM Yang Shi <shy828301@gmail.com> wrote:
...
> > > > >
> > > > > AFAIK the address_space error just works for fsync. Anyway I could be wrong.
> > > > >
> > > > > I think clearing the dirty flag might be the easiest way? It seems
> > > > > unnecessary to notify the users when writing back since the most write
> > > > > back happens asynchronously. They should be notified when the page is
> > > > > accessed, e.g. read/write and page fault.
> > > > >
> > > > > I did some further investigation and got a clearer picture for
> > > > > writeback filesystem:
> > > > > 1. The page should be not written back: clearing dirty flag could
> > > > > prevent from writeback
> > > > > 2. The page should be not dropped (it shows as a clean page): the
> > > > > refcount pin from hwpoison could prevent from invalidating (called by
> > > > > cache drop, inode cache shrinking, etc), but it doesn't avoid
> > > > > invalidation in DIO path (easy to deal with)
> > > > > 3. The page should be able to get truncated/hole punched/unlinked: it
> > > > > works as it is
> > > > > 4. Notify users when the page is accessed, e.g. read/write, page fault
> > > > > and other paths: this is hard
> > > > >
> > > > > The hardest part is #4. Since there are too many paths in filesystems
> > > > > that do *NOT* check if page is poisoned or not, e.g. read/write,
> > > > > compression (btrfs, f2fs), etc. A couple of ways to handle it off the
> > > > > top of my head:
> > > > > 1. Check hwpoison flag for every path, the most straightforward way,
> > > > > but a lot work
> > > > > 2. Return NULL for poisoned page from page cache lookup, the most
> > > > > callsites check if NULL is returned, this should have least work I
> > > > > think. But the error handling in filesystems just return -ENOMEM, the
> > > > > error code will incur confusion to the users obviously.
> > > > > 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO),
> > > > > but this will involve significant amount of code change as well since
> > > > > all the paths need check if the pointer is ERR or not.
> > > >
> > > > I think the approach #3 sounds good for now, it seems to me that these
> > > > statements are about general ways to handle error pages on all page cache
> > > > users, so then the amount of code changes is a big problem, but when
> > > > focusing on shmem/tmpfs, could the amount of changes be more handlable, or
> > > > still large?
> > >
> > > Yeah, I agree #3 makes more sense. Just return an error when finding
> > > out corrupted page. I think this is the right semantic.
> 
> I had a prototype patchset for #3, but now I have to consider stepping
> back to rethink which way we should go. I actually did a patchset for
> #1 too. By comparing the two patchsets and some deeper investigation
> during preparing the two patchsets, I realized #3 may not be the best
> way.
> 
> For #3 ERR_PTR will be returned so all the callers need to check the
> return value otherwise invalid pointer may be dereferenced, but not
> all callers really care about the content of the page, for example,
> partial truncate which just set the truncated range in one page to 0.
> So for such paths it needs additional modification if ERR_PTR is
> returned. And if the callers have their own way to handle the
> problematic pages we need to add a new FGP flag to tell FGP functions
> to return the pointer to the page.
> 
> But we don't need to do anything for such paths if we go with #1. For
> most paths (e.g. read/write) both ways need to check the validity of
> the page by checking ERR PTR or page flag. But a lot of extra code
> could be saved for the paths which don't care about the actual data
> for #1.

OK, so it's not clear which is better. Maybe we need discussion over
patches...

...

> > >
> > > >
> > > > > 5. We also could define a new FGP flag to return poisoned page, NULL
> > > > > or error pointer. This also should need significant code change since
> > > > > a lt callsites need to be contemplated.
> > > >
> > > > Could you explain a little more about which callers should use the flag?
> > >
> > > Just to solve the above invalidate/truncate problem and page fault
> > > doesn't expect an error pointer. But it seems the above
> > > invalidate/truncate paths don't matter. Page fault should be the only
> > > user since page fault may need unlock the page if poisoned page is
> > > returned.

Orignally PG_hwpoison is detected in page fault handler for file-backed pages
like below,

  static vm_fault_t __do_fault(struct vm_fault *vmf)
  ...
          ret = vma->vm_ops->fault(vmf);
          if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
                              VM_FAULT_DONE_COW)))
                  return ret;
  
          if (unlikely(PageHWPoison(vmf->page))) {
                  if (ret & VM_FAULT_LOCKED)
                          unlock_page(vmf->page);
                  put_page(vmf->page);
                  vmf->page = NULL;
                  return VM_FAULT_HWPOISON;
          }

so it seems to me that if a filesystem switches to the new scheme of keeping
error pages in page cache, ->fault() callback for the filesystem needs to be
aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON when it
detects an error page (maybe by detecting pagecache_get_page() to return
PTR_ERR(-EIO or -EHWPOISON) or some other ways).  In the other filesystems
with the old scheme, fault() callback does not return VM_FAULT_HWPOISON so
that page fault handler falls back to the generic PageHWPoison check above.

> >
> > It seems page fault check IS_ERR(page) then just return
> > VM_FAULT_HWPOISON. But I found a couple of places in shmem which want
> > to return head page then handle subpage or just return the page but
> > don't care the content of the page. They should ignore hwpoison. So I
> > guess we'd better to have a FGP flag for such cases.

At least currently thp are supposed to be split before error handling.
We could loosen this assumption to support hwpoison on a subpage of thp,
but I'm not sure whether that has some benefit.
And also this discussion makes me aware that we need to have a way to
prevent hwpoisoned pages from being used to thp collapse operation.

Thanks,
Naoya Horiguchi

  reply	other threads:[~2021-09-08  2:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 22:13 [PATCH] mm: hwpoison: deal with page cache THP Yang Shi
2021-08-26  6:17 ` HORIGUCHI NAOYA(堀口 直也)
2021-08-26 20:03   ` Yang Shi
2021-08-26 22:03     ` Yang Shi
2021-08-27  3:57       ` HORIGUCHI NAOYA(堀口 直也)
2021-08-27  5:02         ` Yang Shi
2021-08-30 23:44           ` Yang Shi
2021-09-02  3:07             ` HORIGUCHI NAOYA(堀口 直也)
2021-09-02 18:32               ` Yang Shi
2021-09-03 11:53                 ` HORIGUCHI NAOYA(堀口 直也)
2021-09-03 18:01                   ` Yang Shi
2021-09-04  0:03                     ` Yang Shi
2021-09-07 21:34                       ` Yang Shi
2021-09-08  2:50                         ` HORIGUCHI NAOYA(堀口 直也) [this message]
2021-09-08  3:14                           ` ##freemail## " Yang Shi
2021-09-08  4:25                             ` HORIGUCHI NAOYA(堀口 直也)
2021-09-09 23:07                               ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210908025024.GA4117799@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).