From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EACC7C433EF for ; Thu, 10 Mar 2022 17:50:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2AAF48D0002; Thu, 10 Mar 2022 12:50:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 233738D0001; Thu, 10 Mar 2022 12:50:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0AE038D0002; Thu, 10 Mar 2022 12:50:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id EA64A8D0001 for ; Thu, 10 Mar 2022 12:50:24 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id B182824487 for ; Thu, 10 Mar 2022 17:50:24 +0000 (UTC) X-FDA: 79229215968.10.0B127B4 Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) by imf24.hostedemail.com (Postfix) with ESMTP id 4C035180010 for ; Thu, 10 Mar 2022 17:50:24 +0000 (UTC) Received: by mail-pl1-f176.google.com with SMTP id n18so2855412plg.5 for ; Thu, 10 Mar 2022 09:50:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=3M3YBTPk1tjfk10JDkzUkRyFGOgVlE5988An5HSRp3g=; b=RMfrtH088l/fulXIlWNXa3Ymz+Ooam6s4LhkhpdzqNo3LlYJj1vwu+B8OaK+0ypuq1 MMmCQo9WKR41chURgj19pOj7jtXKuhoRu6P1WRxp3KJAMiqeoq9V84dGBs53gLn2kgNE ORcYgWCegAfDAuPsCwPqGEwQqt06xv5KfzO7smzBeWLCMHjOjcvW45qKLbAz18WTeRGS N3toz9OVlNQV10NaNGpXW8exb7TIN5vx12GwIzyvB96pVQACa99rqpHcZdB2EcfuVasD Grpl+Pz/rdjD1fc8r7zzXcldmvQQxiAIQ21U5iq0bGZxTUSm4Yc1ElUwMiWu2bIpomEo D6lQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=3M3YBTPk1tjfk10JDkzUkRyFGOgVlE5988An5HSRp3g=; b=RNyp86I+8vDx63ZHdK+oi0WFTtxxrZ1D8foZL7W+Z+OH9vOj2EZfmGUng5hJDh3kiH nAk5eDx8jcjqJPpVqIrMse3MyQkzjA7ynDbjNIJqc7ySfrBNGjrHKKFukkzeZhUkh2Ar KDzP797SdavxsZkI784paHKc+XGcsO8hRtce2XHikVhb1R48QW5OfYfPa3jJb4r7Ktno 17iY+yEmQHDE+rUVCxGN/q5+gfEPdol32MBSjnwbkzqmgKF/csEmcrokoWRcU9V5Ov/T w4AebQy3k8UOdBQ5yqkVcMgt7vzsC8gB6mRLyP8eWlKR0q7jEfS5QsHwJGWoxqiY3G4K qo+g== X-Gm-Message-State: AOAM530PGNJtavt15TCtvBMO0dzcq1G8NxcXdOkt/9grGwFY2a+EcDRv v433hN1HPfcyPt4S9MHYUyZMDfUBlXPspYgNcYM= X-Google-Smtp-Source: ABdhPJy2qBrVsUcdV+O+QA7KSmpVY761GhXicbgvxWExdju/uZESfiLW+gOoZm+yGkvcov3OIchzWcoN8n6+uko5w9M= X-Received: by 2002:a17:902:bc47:b0:151:ac43:eae0 with SMTP id t7-20020a170902bc4700b00151ac43eae0mr6387435plz.117.1646934623070; Thu, 10 Mar 2022 09:50:23 -0800 (PST) MIME-Version: 1.0 References: <20220309091449.2753904-1-naoya.horiguchi@linux.dev> <20220310000024.GA1577304@hori.linux.bs1.fc.nec.co.jp> <2a61c5c1-a99b-9da9-b319-90f95a420ab1@huawei.com> In-Reply-To: <2a61c5c1-a99b-9da9-b319-90f95a420ab1@huawei.com> From: Yang Shi Date: Thu, 10 Mar 2022 09:50:11 -0800 Message-ID: Subject: Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() To: Miaohe Lin Cc: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= , Naoya Horiguchi , Linux MM , Andrew Morton , Mike Kravetz , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 4C035180010 X-Stat-Signature: 14rhrgdaj8gucgrqupbytpxe8m16fsfa Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=RMfrtH08; spf=pass (imf24.hostedemail.com: domain of shy828301@gmail.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-HE-Tag: 1646934624-704024 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 9, 2022 at 10:24 PM Miaohe Lin wrote: > > On 2022/3/10 8:30, Yang Shi wrote: > > On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80= =80=E7=9B=B4=E4=B9=9F) > > wrote: > >> > >> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote: > >>> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi > >>> wrote: > >>>> > >>>> From: Naoya Horiguchi > >>>> > >>>> There is a race condition between memory_failure_hugetlb() and huget= lb > >>>> free/demotion, which causes setting PageHWPoison flag on the wrong p= age > >>>> (which was a hugetlb when memory_failrue() was called, but was remov= ed > >>>> or demoted when memory_failure_hugetlb() is called). This results i= n > >>>> killing wrong processes. So set PageHWPoison flag with holding page= lock, > >>>> > >>>> Signed-off-by: Naoya Horiguchi > >>>> --- > >>>> mm/memory-failure.c | 27 ++++++++++++--------------- > >>>> 1 file changed, 12 insertions(+), 15 deletions(-) > >>>> > >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>> index ac6492e36978..fe25eee8f9d6 100644 > >>>> --- a/mm/memory-failure.c > >>>> +++ b/mm/memory-failure.c > >>>> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned l= ong pfn, int flags) > >>>> int res; > >>>> unsigned long page_flags; > >>>> > >>>> - if (TestSetPageHWPoison(head)) { > >>>> - pr_err("Memory failure: %#lx: already hardware poiso= ned\n", > >>>> - pfn); > >>>> - res =3D -EHWPOISON; > >>>> - if (flags & MF_ACTION_REQUIRED) > >>>> - res =3D kill_accessing_process(current, page= _to_pfn(head), flags); > >>>> - return res; > >>>> - } > >>>> - > >>>> - num_poisoned_pages_inc(); > >>>> - > >>>> if (!(flags & MF_COUNT_INCREASED)) { > >>>> res =3D get_hwpoison_page(p, flags); > >>> > >>> I'm not an expert of hugetlb, I may be wrong. I'm wondering how this > >>> could solve the race? Is the below race still possible? > >>> > >>> __get_hwpoison_page() > >>> head =3D compound_head(page) > >>> > >>> hugetlb demotion (1G --> 2M) > >>> get_hwpoison_huge_page(head, &hugetlb); > >> > >> Thanks for the comment. > >> I assume Miaohe's patch below introduces additional check to detect th= e > >> race. The patch calls compound_head() for the raw error page again, s= o > >> the demotion case should be detected. I'll make the dependency clear = in > >> the commit log. > >> > >> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huaw= ei.com/ > >> > >>> > >>> > >>> Then the head may point to a 2M page, but the hwpoisoned subpage is > >>> not in that 2M range? > >>> > >>> > >>>> if (!res) { > >>>> lock_page(head); > >>>> if (hwpoison_filter(p)) { > >>>> - if (TestClearPageHWPoison(head)) > >>>> - num_poisoned_pages_dec(); > >>>> unlock_page(head); > >>>> return -EOPNOTSUPP; > >>>> } > >>>> @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned l= ong pfn, int flags) > >>>> page_flags =3D head->flags; > >>>> > >>>> if (hwpoison_filter(p)) { > >>>> - if (TestClearPageHWPoison(head)) > >>>> - num_poisoned_pages_dec(); > >>>> put_page(p); > >>>> res =3D -EOPNOTSUPP; > >>>> goto out; > >>>> } > >>>> > >>>> + if (TestSetPageHWPoison(head)) > >>> > >>> And I don't think "head" is still the head you expected if the race > >>> happened. I think we need to re-retrieve the head once the page > >>> refcount is bumped and locked. > >> > >> I think the above justification works for this. > >> When the kernel reaches this line, the hugepage is properly pinned wit= hout being > >> freed or demoted, so "head" is still pointing to the same head page as= expected. > > > > I think Mike's comment in the earlier email works for this too. The > > huge page may get demoted before the page is pinned and locked, so the > > actual hwpoisoned subpage may belong to another smaller huge page now. > > > > I thinks Naoya assumes that there is a check before we use "head": > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 5444a8ef4867..0d7c58340a98 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1534,6 +1534,17 @@ static int memory_failure_hugetlb(unsigned long pf= n, int flags) > } > > lock_page(head); > + > + /** > + * The page could have changed compound pages due to race window. > + * If this happens just bail out. > + */ > + if (!PageHuge(p) || compound_head(p) !=3D head) { > + action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED)= ; > + res =3D -EBUSY; > + goto out; > + } > + > page_flags =3D head->flags; > > if (hwpoison_filter(p)) { > -- > from: https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@h= uawei.com/ Aha, thanks, I missed that. Yeah, we definitely need to revalidate the page= . > > Thanks. > > > > >> > >> Thanks, > >> Naoya Horiguchi > >> > >>> > >>>> + goto already_hwpoisoned; > >>>> + > >>>> + num_poisoned_pages_inc(); > >>>> + > >>>> /* > >>>> * TODO: hwpoison for pud-sized hugetlb doesn't work right n= ow, so > >>>> * simply disable it. In order to make it work properly, we = need > >>>> @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned lo= ng pfn, int flags) > >>>> out: > >>>> unlock_page(head); > >>>> return res; > >>>> +already_hwpoisoned: > >>>> + unlock_page(head); > >>>> + pr_err("Memory failure: %#lx: already hardware poisoned\n", = pfn); > >>>> + res =3D -EHWPOISON; > >>>> + if (flags & MF_ACTION_REQUIRED) > >>>> + res =3D kill_accessing_process(current, page_to_pfn(= head), flags); > >>>> + return res; > >>>> } > >>>> > >>>> static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > >>>> -- > >>>> 2.25.1 > >>>> > > . > > >