From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6961C433E0 for ; Thu, 18 Mar 2021 18:15:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 592FD64E2E for ; Thu, 18 Mar 2021 18:15:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 592FD64E2E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9584A6B0078; Thu, 18 Mar 2021 14:15:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 92F3B6B007B; Thu, 18 Mar 2021 14:15:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7A9706B007D; Thu, 18 Mar 2021 14:15:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0228.hostedemail.com [216.40.44.228]) by kanga.kvack.org (Postfix) with ESMTP id 5EE506B0078 for ; Thu, 18 Mar 2021 14:15:42 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1FFA7824999B for ; Thu, 18 Mar 2021 18:15:42 +0000 (UTC) X-FDA: 77933798082.20.6BA7FD1 Received: from mail-ej1-f50.google.com (mail-ej1-f50.google.com [209.85.218.50]) by imf03.hostedemail.com (Postfix) with ESMTP id E5F7AC001C4F for ; Thu, 18 Mar 2021 18:13:55 +0000 (UTC) Received: by mail-ej1-f50.google.com with SMTP id t18so5418979ejc.13 for ; Thu, 18 Mar 2021 11:13:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=qUjeaAyWi70EHRLQCMdUUvnyD/huRALnIvr95A+nq/I=; b=QY6hUpTpVyj+BPjqECKGt61xKEaU01jBcR64v6kJyVOQIiJ8R26RQuamcB4Nt/82ou 6ALFTG3juplNjgj9j+vZAs7kUxqr96xvXzymoFOQ8Y7ei1sGr+gOPxgmHXPETdld6FVB 7DWIm0asL3696bIDd+okVvvASf8fFTzLlXbm6DQ5U5SlLdbTHdMTxUkyZyasVQgEdstO 3XBESx0M6sdYV7q8+5aGt45waSHxpHxGzRlefx+i5iSg1FWiosMzqWw2o3ZrNh4pkh1r vAqdcuGIDfYQ1LQeSTO5ki/iprOzwDeRMUx5fuuIVCl4XpSlkRGoHhzIwIUD9tnNDL9+ hhkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=qUjeaAyWi70EHRLQCMdUUvnyD/huRALnIvr95A+nq/I=; b=tHPqoM1a8TeE2NWHERR+1+jEXKTp8Q52/0OAVrFWF5m8IA/n2pyG7R+xnHjJV4mhPb b6LleZ4bgaZnqRK40evSty4Gp1abMrIyxAvjClU8If8spOvMIDieQug2mAJch4rnpH66 U7QEpP9J1/l5IGBflAsBIAzUaGfEsk4q/Vc4sA1zt9DmU0QHXAr6piryXFjcW6GfrnXQ GlIbjMujPdnJ8gyfzOsbpu+MPt8AiS8YnZ1kQ6ZiuFDrQbBSbPSeJM+8jqMugoNrgMXV 0hUrErLS9ZvmSePKweGsJPeWFF+X4Ac85JJG7hjcBh5ZTMLWmo1LWdTjDgWDrVzH+yr+ 6Ixg== X-Gm-Message-State: AOAM531/hqRokm3YW7nYma/rywQ2Y4W8Ylb3m55jJgOVB2LlEiTET+gd 0FaR3heVIwdksq+dDCkiobhrhiOz44B16l6oCE4c8398 X-Google-Smtp-Source: ABdhPJxaJ1swJ0B6IIdyCU53Ahh3b5u/GGShC+UB7onP91hPSEdpVP3UDKnyJgfrZQNhn5LLhR3Yl9AcxU8oEk1ZC0k= X-Received: by 2002:aa7:c1d8:: with SMTP id d24mr5175234edp.290.1616090723264; Thu, 18 Mar 2021 11:05:23 -0700 (PDT) MIME-Version: 1.0 References: <20210316140947.GA3420@casper.infradead.org> <20210318140843.7dv3wnxg4geplrjx@box> <20210318145716.GO3420@casper.infradead.org> <20210318172512.GA30960@hori.linux.bs1.fc.nec.co.jp> In-Reply-To: <20210318172512.GA30960@hori.linux.bs1.fc.nec.co.jp> From: Yang Shi Date: Thu, 18 Mar 2021 11:05:11 -0700 Message-ID: Subject: Re: File THP and HWPoison To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= Cc: Matthew Wilcox , "Kirill A. Shutemov" , "linux-mm@kvack.org" , "Kirill A. Shutemov" , Hugh Dickins , Andi Kleen Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: E5F7AC001C4F X-Stat-Signature: 1piorixyhmp7h58jxzt5psug9c4eyi45 Received-SPF: none (gmail.com>: No applicable sender policy available) receiver=imf03; identity=mailfrom; envelope-from=""; helo=mail-ej1-f50.google.com; client-ip=209.85.218.50 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616091235-589165 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Mar 18, 2021 at 10:25 AM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80= =80=E7=9B=B4=E4=B9=9F) wrote: > > On Thu, Mar 18, 2021 at 02:57:16PM +0000, Matthew Wilcox wrote: > > On Thu, Mar 18, 2021 at 05:08:43PM +0300, Kirill A. Shutemov wrote: > > > On Tue, Mar 16, 2021 at 02:09:47PM +0000, Matthew Wilcox wrote: > > > > If we get a memory failure in the middle of a file THP, I think we = handle > > > > it poorly. > > > > > > > > int memory_failure(unsigned long pfn, int flags) > > > > ... > > > > if (TestSetPageHWPoison(p)) { > > > > ... > > > > orig_head =3D hpage =3D compound_head(p); > > > > ... > > > > if (PageTransHuge(hpage)) { > > > > if (try_to_split_thp_page(p, "Memory Failure") < 0)= { > > > > action_result(pfn, MF_MSG_UNSPLIT_THP, MF_I= GNORED); > > > > return -EBUSY; > > > > } > > > > > > > > static int try_to_split_thp_page(struct page *page, const char *msg= ) > > > > { > > > > lock_page(page); > > > > if (!PageAnon(page) || unlikely(split_huge_page(page))) { > > > > unsigned long pfn =3D page_to_pfn(page); > > > > > > > > unlock_page(page); > > > > if (!PageAnon(page)) > > > > pr_info("%s: %#lx: non anonymous thp\n", ms= g, pfn); > > > > else > > > > pr_info("%s: %#lx: thp split failed\n", msg= , pfn); > > > > put_page(page); > > > > return -EBUSY; > > > > > > > > So (for some reason) we don't even try to split a file THP. But th= en, > > > > if we take a page fault on a file THP: > > > > > > > > static struct page *next_uptodate_page(struct page *page, > > > > ... > > > > if (PageHWPoison(page)) > > > > goto skip; > > > > (... but we're only testing the head page here, which isn't necessa= rily > > > > the one which got the error ...) > > > > > > > > if (pmd_none(*vmf->pmd) && PageTransHuge(page)) { > > > > vm_fault_t ret =3D do_set_pmd(vmf, page); > > > > > > > > So we now map the PMD-sized page into userspace, even though it has= a > > > > HWPoison in it. > > > > > > > > I think there are two things that we should be doing: > > > > > > > > 1. Attempt to split THPs which are file-backed. That makes most of= this > > > > problem disappear because there won't be THPs with HWPoison, mostly= . > > > > > > +Naoya. Could you give more context here? > > Recently, I tried to address the problem on > https://lore.kernel.org/linux-mm/20210209062128.453814-1-nao.horiguchi@gm= ail.com/ > but the patch was found incorrect because the related page table entries = disappeared > after split_huge_page() succeeded. I thought I'm going to study more, bu= t > didn't make it this week because I looked at other review requests. > > A pmd mapping for anonymous thp is replaced with 512 pte mappings by > split_huge_page(), so I'm wondering why we don't do the same for shmem th= p. We have to install 512 ptes for anonymous pages otherwise they are gone. But it is unnecessary for page cache pages, even though the ptes are gone they are still in xarray and can be found by page fault later. IMHO we should be able to pass in one more parameter to teach split_huge_page to reinstall ptes for page cache for some cases. > > > > > I did some git archaeology and found this check was introduced in > > 7f6bf39bbdd1 ("mm/hwpoison: fix panic due to split huge zero page") whe= re > > it wasn't intended to catch _file_ pages at all, but the zero page. > > I suspect that nobody thought to look at this when introducing THP > > for shmem. > > Yes, 7f6bf39bbdd1 was worked before thp page cache, so we did not conside= r > it at that time. > > Thanks, > Naoya Horiguchi > > > > > > > 2. When the THP fails to split, use a spare page flag to indicate t= hat > > > > the THP contains a HWPoison bit in one of its subpages. There are = a > > > > lot of PF_SECOND flags available for this purpose. > > > > > > > > but I know almost nothing about the memory-failure subsystem and I'= m > > > > still learning all the complexities of THPs, so it's entirely possi= ble > > > > I've overlooked something important. > > > > > > I wounder if it would be cleaner to switch PG_hwpoison to PF_HEAD: if > > > split failed we posion whole compound page. Yes, we will waste more > > > memory, but it makes it much cleaner for user: just check if the page= is > > > poisoned. > > > > I think that's a poor quality implementation ... it'd cause processes > > to die that weren't even touching the page that had hwpoison. Using > > a PF_SECOND bit lets us do the check as cheaply as if we made hwpoison > > PF_HEAD.