From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=DBvG=N6=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS
	autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D1247C433EF
	for <linux-mm@archiver.kernel.org>; Wed,  8 Sep 2021 03:14:40 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 74BE861041
	for <linux-mm@archiver.kernel.org>; Wed,  8 Sep 2021 03:14:40 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 74BE861041
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org
Received: by kanga.kvack.org (Postfix)
	id E12E7940068; Tue,  7 Sep 2021 23:14:39 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id DC389940042; Tue,  7 Sep 2021 23:14:39 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id CB0B6940068; Tue,  7 Sep 2021 23:14:39 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0114.hostedemail.com [216.40.44.114])
	by kanga.kvack.org (Postfix) with ESMTP id B9522940042
	for <linux-mm@kvack.org>; Tue,  7 Sep 2021 23:14:39 -0400 (EDT)
Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id 758668249980
	for <linux-mm@kvack.org>; Wed,  8 Sep 2021 03:14:39 +0000 (UTC)
X-FDA: 78562938678.29.D01A9A6
Received: from mail-ej1-f53.google.com (mail-ej1-f53.google.com [209.85.218.53])
	by imf19.hostedemail.com (Postfix) with ESMTP id 3A808B0000A3
	for <linux-mm@kvack.org>; Wed,  8 Sep 2021 03:14:39 +0000 (UTC)
Received: by mail-ej1-f53.google.com with SMTP id ia27so1144848ejc.10
        for <linux-mm@kvack.org>; Tue, 07 Sep 2021 20:14:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc:content-transfer-encoding;
        bh=w28qwB6uuk0TmzATBqadaAK/GMSgO2rEp01QI9DAiu4=;
        b=IOnxrwqp7N6Esw4/Sp7ke8HcDz3rOHTzmn5b/+Ug5iuYUP2fUxkbpGdLKhnbiuzRPX
         O1Xprm93z4CVnQ6UruW26o0nhp+puApY4JVWCg4RlR0dLRS17szLArt9RctNkoUTvWq0
         17hRauezF0AugWyQt7JvZ03Z/K1KDpwrlS1n3cEsDwXYBOlsnpwntAYzN/NonOdvWyL2
         poEmpYNGRgtVz6XlIF/k+HVpazgJE2nuGWYL1LQuSQJY8V7mwcCdZPqPMpcqb4vmuboN
         j8f/s5HZUHA29ZgN4GH8yJFs7cM3F6PwPTsRe+rF3iP3QzAyz964toGfFo/Rzmmf4U69
         5znA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc:content-transfer-encoding;
        bh=w28qwB6uuk0TmzATBqadaAK/GMSgO2rEp01QI9DAiu4=;
        b=psu+Q2TK91JZ48m98oKPRgJF9b2OnV8RSjSZ3vDKc9NhkfUKwAb9yeQPcz69zTXfTF
         PmQrc14gnDVUQEJAUwkZi3Q8FlHB1WoHnJj62tH5aVbLkG05sDXFst9veGEWv6WNeKnQ
         JWHcDv0IYPvrcKREZ9Cbr17OsQG30Z5tbz2g1I0WSSxvlm6r0N9tkpyNLPxwKRRksxFo
         Mgs7C3ud0VlpXg5r/LYOZP6W4HzG+Mz2kS4p3cULpQe9aQGvrSkRgWKyo0xYXudqKR/O
         8CMQIuA2vwO/nbNibdcPKAQW8YkzIyH5LU5D3npJaWrqgd5b5PYZcecaZ+2LjYpZvda3
         Em4g==
X-Gm-Message-State: AOAM533jr1md0WsNlNko2iulbl0TeRK2Z4aI7hgbEsbmNL/k1af7kjWT
	+L+Oe8CvF5ObbHlvvHoaCbkB4qmT6fwKKoGZ/ao=
X-Google-Smtp-Source: ABdhPJzhgOVf6JDYsVJsgBRW5Hv4Dgb9WcXaR3/VzlYUjt4rkZV4lNNjWJyCD+iIIHfQ7UbLzSyQ3oPMT19xYRx7Uhs=
X-Received: by 2002:a17:907:2492:: with SMTP id zg18mr1793585ejb.233.1631070877936;
 Tue, 07 Sep 2021 20:14:37 -0700 (PDT)
MIME-Version: 1.0
References: <CAHbLzkpvR77xfs+ip1T8G09=ixz4Ko3E-6iKTEZkFCfGTxi6Aw@mail.gmail.com>
 <20210827035739.GA3247360@hori.linux.bs1.fc.nec.co.jp> <CAHbLzkr35uVHCJB_cr_fZdz3_FXXGam7dsrAn15j5BPfmfX-_A@mail.gmail.com>
 <CAHbLzko+XqFLx9=e2=E3rGRsLzcm32dZnpDf20gnUb2dAR0d_Q@mail.gmail.com>
 <20210902030728.GA1860112@hori.linux.bs1.fc.nec.co.jp> <CAHbLzkoBz2DwPXC5Nj+Kd6-pRaHgtZ-MGNPzYW4azN6Xpz3oyw@mail.gmail.com>
 <20210903115311.GA2477773@hori.linux.bs1.fc.nec.co.jp> <CAHbLzkoXaB_Hz_ZUw4BvR0PKV_ZV4F+hHekRS9RgUnxBYkMV_g@mail.gmail.com>
 <CAHbLzkpDK5qb1=VM8qeDdfjdqRyuL+ehwOK=8uZPYuR7J+xMEQ@mail.gmail.com>
 <CAHbLzko6yTTnBopzASaYeuD5bBsc_kdhQxQoVd-wqEhkzX1agg@mail.gmail.com> <20210908025024.GA4117799@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210908025024.GA4117799@hori.linux.bs1.fc.nec.co.jp>
From: Yang Shi <shy828301@gmail.com>
Date: Tue, 7 Sep 2021 20:14:26 -0700
Message-ID: <CAHbLzkptCkXdVdG9mr1BU8TEyEpW57go_Pj1uVopkxORqYi8aw@mail.gmail.com>
Subject: Re: ##freemail## Re: [PATCH] mm: hwpoison: deal with page cache THP
To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= <naoya.horiguchi@nec.com>
Cc: "osalvador@suse.de" <osalvador@suse.de>, "hughd@google.com" <hughd@google.com>, 
	"kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>, 
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, 
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Authentication-Results: imf19.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=IOnxrwqp;
	spf=pass (imf19.hostedemail.com: domain of shy828301@gmail.com designates 209.85.218.53 as permitted sender) smtp.mailfrom=shy828301@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: 3A808B0000A3
X-Stat-Signature: anfoyh9k8dsqsn5wwcrnhznid17maezf
X-HE-Tag: 1631070879-423801
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Tue, Sep 7, 2021 at 7:50 PM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80=80=
=E7=9B=B4=E4=B9=9F)
<naoya.horiguchi@nec.com> wrote:
>
> On Tue, Sep 07, 2021 at 02:34:24PM -0700, Yang Shi wrote:
> > On Fri, Sep 3, 2021 at 5:03 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Fri, Sep 3, 2021 at 11:01 AM Yang Shi <shy828301@gmail.com> wrote:
> ...
> > > > > >
> > > > > > AFAIK the address_space error just works for fsync. Anyway I co=
uld be wrong.
> > > > > >
> > > > > > I think clearing the dirty flag might be the easiest way? It se=
ems
> > > > > > unnecessary to notify the users when writing back since the mos=
t write
> > > > > > back happens asynchronously. They should be notified when the p=
age is
> > > > > > accessed, e.g. read/write and page fault.
> > > > > >
> > > > > > I did some further investigation and got a clearer picture for
> > > > > > writeback filesystem:
> > > > > > 1. The page should be not written back: clearing dirty flag cou=
ld
> > > > > > prevent from writeback
> > > > > > 2. The page should be not dropped (it shows as a clean page): t=
he
> > > > > > refcount pin from hwpoison could prevent from invalidating (cal=
led by
> > > > > > cache drop, inode cache shrinking, etc), but it doesn't avoid
> > > > > > invalidation in DIO path (easy to deal with)
> > > > > > 3. The page should be able to get truncated/hole punched/unlink=
ed: it
> > > > > > works as it is
> > > > > > 4. Notify users when the page is accessed, e.g. read/write, pag=
e fault
> > > > > > and other paths: this is hard
> > > > > >
> > > > > > The hardest part is #4. Since there are too many paths in files=
ystems
> > > > > > that do *NOT* check if page is poisoned or not, e.g. read/write=
,
> > > > > > compression (btrfs, f2fs), etc. A couple of ways to handle it o=
ff the
> > > > > > top of my head:
> > > > > > 1. Check hwpoison flag for every path, the most straightforward=
 way,
> > > > > > but a lot work
> > > > > > 2. Return NULL for poisoned page from page cache lookup, the mo=
st
> > > > > > callsites check if NULL is returned, this should have least wor=
k I
> > > > > > think. But the error handling in filesystems just return -ENOME=
M, the
> > > > > > error code will incur confusion to the users obviously.
> > > > > > 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-=
EIO),
> > > > > > but this will involve significant amount of code change as well=
 since
> > > > > > all the paths need check if the pointer is ERR or not.
> > > > >
> > > > > I think the approach #3 sounds good for now, it seems to me that =
these
> > > > > statements are about general ways to handle error pages on all pa=
ge cache
> > > > > users, so then the amount of code changes is a big problem, but w=
hen
> > > > > focusing on shmem/tmpfs, could the amount of changes be more hand=
lable, or
> > > > > still large?
> > > >
> > > > Yeah, I agree #3 makes more sense. Just return an error when findin=
g
> > > > out corrupted page. I think this is the right semantic.
> >
> > I had a prototype patchset for #3, but now I have to consider stepping
> > back to rethink which way we should go. I actually did a patchset for
> > #1 too. By comparing the two patchsets and some deeper investigation
> > during preparing the two patchsets, I realized #3 may not be the best
> > way.
> >
> > For #3 ERR_PTR will be returned so all the callers need to check the
> > return value otherwise invalid pointer may be dereferenced, but not
> > all callers really care about the content of the page, for example,
> > partial truncate which just set the truncated range in one page to 0.
> > So for such paths it needs additional modification if ERR_PTR is
> > returned. And if the callers have their own way to handle the
> > problematic pages we need to add a new FGP flag to tell FGP functions
> > to return the pointer to the page.
> >
> > But we don't need to do anything for such paths if we go with #1. For
> > most paths (e.g. read/write) both ways need to check the validity of
> > the page by checking ERR PTR or page flag. But a lot of extra code
> > could be saved for the paths which don't care about the actual data
> > for #1.
>
> OK, so it's not clear which is better. Maybe we need discussion over
> patches...

Yeah, the conclusion is #1 would need fewer changes than #3.

>
> ...
>
> > > >
> > > > >
> > > > > > 5. We also could define a new FGP flag to return poisoned page,=
 NULL
> > > > > > or error pointer. This also should need significant code change=
 since
> > > > > > a lt callsites need to be contemplated.
> > > > >
> > > > > Could you explain a little more about which callers should use th=
e flag?
> > > >
> > > > Just to solve the above invalidate/truncate problem and page fault
> > > > doesn't expect an error pointer. But it seems the above
> > > > invalidate/truncate paths don't matter. Page fault should be the on=
ly
> > > > user since page fault may need unlock the page if poisoned page is
> > > > returned.
>
> Orignally PG_hwpoison is detected in page fault handler for file-backed p=
ages
> like below,
>
>   static vm_fault_t __do_fault(struct vm_fault *vmf)
>   ...
>           ret =3D vma->vm_ops->fault(vmf);
>           if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT=
_RETRY |
>                               VM_FAULT_DONE_COW)))
>                   return ret;
>
>           if (unlikely(PageHWPoison(vmf->page))) {
>                   if (ret & VM_FAULT_LOCKED)
>                           unlock_page(vmf->page);
>                   put_page(vmf->page);
>                   vmf->page =3D NULL;
>                   return VM_FAULT_HWPOISON;
>           }
>
> so it seems to me that if a filesystem switches to the new scheme of keep=
ing
> error pages in page cache, ->fault() callback for the filesystem needs to=
 be
> aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON when it
> detects an error page (maybe by detecting pagecache_get_page() to return
> PTR_ERR(-EIO or -EHWPOISON) or some other ways).  In the other filesystem=
s
> with the old scheme, fault() callback does not return VM_FAULT_HWPOISON s=
o
> that page fault handler falls back to the generic PageHWPoison check abov=
e.

Yes, exactly. If we return ERR_PTR we need modify the above code
accordingly. But returning the page, no change is required.

>
> > >
> > > It seems page fault check IS_ERR(page) then just return
> > > VM_FAULT_HWPOISON. But I found a couple of places in shmem which want
> > > to return head page then handle subpage or just return the page but
> > > don't care the content of the page. They should ignore hwpoison. So I
> > > guess we'd better to have a FGP flag for such cases.
>
> At least currently thp are supposed to be split before error handling.

Not for file THP.

> We could loosen this assumption to support hwpoison on a subpage of thp,
> but I'm not sure whether that has some benefit.

I don't quite get your point. Currently the hwpoison flag is set on
specific subpage instead of head page.

> And also this discussion makes me aware that we need to have a way to
> prevent hwpoisoned pages from being used to thp collapse operation.

Actually not. The refcount from hwpoison could prevent it from
collapsing. But if we return ERR_PTR (#3) we need extra code in
khugepaged to handle it.

So I realized #1 would require fewer changes. And leaving the page
state check to callers seem reasonable since the callers usually check
other states, e.g. uptodate.


>
> Thanks,
> Naoya Horiguchi