From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-23.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95E93C433E6 for ; Thu, 11 Mar 2021 19:33:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 54B6564F1D for ; Thu, 11 Mar 2021 19:33:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229826AbhCKTc5 (ORCPT ); Thu, 11 Mar 2021 14:32:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229520AbhCKTcY (ORCPT ); Thu, 11 Mar 2021 14:32:24 -0500 Received: from mail-vk1-xa29.google.com (mail-vk1-xa29.google.com [IPv6:2607:f8b0:4864:20::a29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D18DC061574 for ; Thu, 11 Mar 2021 11:32:24 -0800 (PST) Received: by mail-vk1-xa29.google.com with SMTP id b10so731481vkl.0 for ; Thu, 11 Mar 2021 11:32:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=C0wdrEAiMMmWHLihbdoEejdJOqcekwArwh5nWI9AF+E=; b=pfKBKvLcSEh2rf6Tz2UqQC334LNNdGbmXRRdPFfPxFwCOfENfCMYzVniP+pNXUO51n CQhSKFtZDS/FSvZfdtmYMbKeFWFVrEdbLxZZQrke+CrGr0GH9Xf1YHoq8ZsPdCRLakzh k3lNNPGv8cPmQk/lTuzRh8m/jsle+oKcczY1DDUlbYzuYv2Tu95yeNFdntqXvGldr8b1 r08bbJQtXfomjwkbSeIrUWxwk/IAr8HHJxATFRXTqOR1wsZvMRjCGsEeiCsGq6U30VkY pGcCz/r2qXA/wQo2HWq/3h0Cnn2Kk/GAhUK1ixUUVlehsYCtS+8Ym4e7GyOC+kqLUeVZ WZIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=C0wdrEAiMMmWHLihbdoEejdJOqcekwArwh5nWI9AF+E=; b=prfzuXI1Ds9OM0rewKMdZy6JshI+96tMsSMAPN5vAJcFFsGD3AT9jie1chXN0iHRAb iDrazunHcQvA7Zcw8KlJDdmRrSZDCzaD4FAvzzh/KHUkMP0XBaFRkNvnhlZ2HEt+9aZR w39g9hqkGEnbBIL0LM8Z9n1QCv9osuvecK35n/DgZSkuGHsccDP/OTqQ9arrFw5DDKf3 qcubKiZU5HSg/38SyWr8enw1gPGy4cVaySby3lkz8ooecTTSWsbp0ZCIYKfJkjxN8PTU rbONPE2EgzdkXa3rauovTgdPFL1LSFQweBALgedbWILCld5lvI1HB+z0+FkePFKKlqlw LGeg== X-Gm-Message-State: AOAM530MS6r9MIVEf/Ezi84xidn9FOY+byhtaO7thRfmfM2npUfdiFaX SGHAmqEyUmT2WW5hANz3sHW0wOecrsWlERvmej4nGA== X-Google-Smtp-Source: ABdhPJxN9I8OGtcD4FwpVzjbwJWAGogBPlnFfl0ADXmkUxUu3+PaAv72E+nohGXh2dZ1mWPFPaQ/qII823BVrfoHJxg= X-Received: by 2002:a1f:b686:: with SMTP id g128mr5810836vkf.25.1615491143184; Thu, 11 Mar 2021 11:32:23 -0800 (PST) MIME-Version: 1.0 References: <20210209062128.453814-1-nao.horiguchi@gmail.com> <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp> In-Reply-To: <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp> From: Jue Wang Date: Thu, 11 Mar 2021 11:32:12 -0800 Message-ID: Subject: Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= Cc: Hugh Dickins , Naoya Horiguchi , Andrew Morton , Michal Hocko , Oscar Salvador , Tony Luck , Matthew Wilcox , "Aneesh Kumar K.V" , Greg Thelen , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Huge thanks to Hugh for his expertise in shmem thps and forwarding this mes= sage! Hi Naoya, This is the first time I reply to an email in a Linux upstream thread, apologies for any technical issues due to my email client. And my apologies for the state of the whole patch not quite shareable with upstream due to some kernel differences. Some fyi comment inline. Thanks, -Jue On Thu, Mar 11, 2021 at 7:14 AM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80=80= =E7=9B=B4=E4=B9=9F) wrote: > > On Wed, Mar 10, 2021 at 11:22:18PM -0800, Hugh Dickins wrote: > > On Tue, 9 Feb 2021, Naoya Horiguchi wrote: > > > > > From: Naoya Horiguchi > > > > > > Currently hwpoison code checks PageAnon() for thp and refuses to hand= le > > > errors on non-anonymous thps (just for historical reason). We now > > > support non-anonymou thp like shmem one, so this patch suggests to en= able > > > to handle shmem thps. Fortunately, we already have can_split_huge_pag= e() > > > to check if a give thp is splittable, so this patch relies on it. > > > > Fortunately? I don't understand. Why call can_split_huge_page() > > at all, instead of simply trying split_huge_page() directly? > > The background of this change was that I've experienced triggering > VM_BUG_ON_PAGE() at the beginning of split_huge_page_to_list() in the old= er > kernel (I forgot specific condition of the BUG_ON). I thought that that = was > caused by race between thp allocation and memory_failure. So I wanted to > have some rigid way to confirm that a given thp is splittable. Then I fo= und > can_split_huge_page(), which sounds suitable to me because I expected the= API > to be maintained by thp subsystem. > > But I rethink that split_huge_page_to_list() seems to have different set = of > VM_BUG_ON_PAGE()s now, and anyway split_huge_page_to_list() calls > can_split_huge_page() internally, so I might have wrongly read the code. > > > And could it do better than -EBUSY when split_huge_page() fails? > > Yes it could. > > > > > > > > > Signed-off-by: Naoya Horiguchi > > > > Thanks for trying to add shmem+file THP support, but I think this > > does not work as intended - Andrew, if Naoya agrees, please drop from > > mmotm for now, the fixup needed will be more than a line or two. > > I agree to drop it. I need research more to address the following comment= s. > > > > > I'm not much into memory-failure myself, but Jue discovered that the > > SIGBUS never arrives: because split_huge_page() on a shmem or file > > THP unmaps all its pmds and ptes, and (unlike with anon) leaves them > > unmapped - in normal circumstances, to be faulted back on demand. > > So the page_mapped() check in hwpoison_user_mappings() fails, > > and the intended SIGBUS is not delivered. > > Thanks for the information. The split behaves quite differently between > for anon thp and for shmem thp. I saw some unexpected behavior in my > testing, maybe that's due to the difference. > > > > > (Or, is it acceptable that the SIGBUS is not delivered to those who > > have the huge page mapped: should it get delivered later, to anyone > > who faults back in the bad 4k?) > > Later access should report error in page fault, so the worst scenario > of consuming corrupted data does not happen, but precautionary signal > does not work so it's not acceptable. In our experiment with SHMEM THPs, later accesses resulted in a zero page allocated instead of a SIGBUS with BUS_MCEERR_AR reported by the page fault handler. That part might be an opportunity to prevent some silent data corruption just in case. > > > > > We believe the tokill list has to be set up earlier, before > > split_huge_page() is called, then passed in to hwpoison_user_mappings()= . > > > > Sorry, we don't have a proper patch for that right now, but I expect > > you can see what needs to be done. But something we found on the way, > > we do have a patch for: add_to_kill() uses page_address_in_vma(), but > > that has not been used on file THP tails before - fix appended at the > > end below, so as not to waste your time on that bit. > > > > Thank you very much, I'll work on top of it. > > Thanks, > Naoya Horiguchi > > ... > > > > [PATCH] mm: fix page_address_in_vma() on file THP tails > > From: Jue Wang > > > > Anon THP tails were already supported, but memory-failure now needs to = use > > page_address_in_vma() on file THP tails, which its page->mapping check = did > > not permit: fix it. > > > > Signed-off-by: Jue Wang > > Signed-off-by: Hugh Dickins > > --- > > > > mm/rmap.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > --- 5.12-rc2/mm/rmap.c 2021-02-28 16:58:57.950450151 -0800 > > +++ linux/mm/rmap.c 2021-03-10 20:29:21.591475177 -0800 > > @@ -717,11 +717,11 @@ unsigned long page_address_in_vma(struct > > if (!vma->anon_vma || !page__anon_vma || > > vma->anon_vma->root !=3D page__anon_vma->root) > > return -EFAULT; > > - } else if (page->mapping) { > > - if (!vma->vm_file || vma->vm_file->f_mapping !=3D page->m= apping) > > - return -EFAULT; > > - } else > > + } else if (!vma->vm_file) { > > + return -EFAULT; > > + } else if (vma->vm_file->f_mapping !=3D compound_head(page)->mapp= ing) { > > return -EFAULT; > > + } > > address =3D __vma_address(page, vma); > > if (unlikely(address < vma->vm_start || address >=3D vma->vm_end)= ) > > return -EFAULT; > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-23.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE7EAC433E0 for ; Thu, 11 Mar 2021 19:32:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0D3D764F14 for ; Thu, 11 Mar 2021 19:32:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D3D764F14 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 64F0E8D02F3; Thu, 11 Mar 2021 14:32:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 625308D02B2; Thu, 11 Mar 2021 14:32:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4A1FD8D02F3; Thu, 11 Mar 2021 14:32:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0203.hostedemail.com [216.40.44.203]) by kanga.kvack.org (Postfix) with ESMTP id 27B3B8D02B2 for ; Thu, 11 Mar 2021 14:32:25 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id DC2A9180ACEEB for ; Thu, 11 Mar 2021 19:32:24 +0000 (UTC) X-FDA: 77908589808.15.31FB075 Received: from mail-vk1-f171.google.com (mail-vk1-f171.google.com [209.85.221.171]) by imf20.hostedemail.com (Postfix) with ESMTP id 5503113A for ; Thu, 11 Mar 2021 19:32:20 +0000 (UTC) Received: by mail-vk1-f171.google.com with SMTP id u144so529389vkb.13 for ; Thu, 11 Mar 2021 11:32:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=C0wdrEAiMMmWHLihbdoEejdJOqcekwArwh5nWI9AF+E=; b=pfKBKvLcSEh2rf6Tz2UqQC334LNNdGbmXRRdPFfPxFwCOfENfCMYzVniP+pNXUO51n CQhSKFtZDS/FSvZfdtmYMbKeFWFVrEdbLxZZQrke+CrGr0GH9Xf1YHoq8ZsPdCRLakzh k3lNNPGv8cPmQk/lTuzRh8m/jsle+oKcczY1DDUlbYzuYv2Tu95yeNFdntqXvGldr8b1 r08bbJQtXfomjwkbSeIrUWxwk/IAr8HHJxATFRXTqOR1wsZvMRjCGsEeiCsGq6U30VkY pGcCz/r2qXA/wQo2HWq/3h0Cnn2Kk/GAhUK1ixUUVlehsYCtS+8Ym4e7GyOC+kqLUeVZ WZIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=C0wdrEAiMMmWHLihbdoEejdJOqcekwArwh5nWI9AF+E=; b=QlEb2cDant9s7PtFw6GOKAzrSU6ut70ULbjGQgqV9PAtf8jldscYl/tmU4dnME2lk9 LFxaepG9bc0qt3Dc7O4WYH3BNVeXRO1oJAM6F8M2Ko+fVUh/dQFxl1fhbLVGUyoWDQtt PFeHFeUaGp/3tFQpHLK109oYKilkFbB1gniRCKhSMuAhzr/5HheJHSozF9oLESO4wFWa BSNJ7OAfHALbySyukaMkAgsK1Sh6TwtGeyKAKzZP2F6BIIp8+nfcKiQgYdKpeZNQV1So 2liKcvpQFOTEyhouvWEBf0g8mlnGn7hDWkbSfKYN4g6F/ER9hWb2gONav6v2jFH2SU8e ssWw== X-Gm-Message-State: AOAM532SBkLeu0GFUWVtwlJH5zAV7Er7bMN5C4IhyxcB/3NU8vYMPdC2 p4TnpmI2fCb96BPiFGjQyAkMeg/MfGMO+2u+dXnjgA== X-Google-Smtp-Source: ABdhPJxN9I8OGtcD4FwpVzjbwJWAGogBPlnFfl0ADXmkUxUu3+PaAv72E+nohGXh2dZ1mWPFPaQ/qII823BVrfoHJxg= X-Received: by 2002:a1f:b686:: with SMTP id g128mr5810836vkf.25.1615491143184; Thu, 11 Mar 2021 11:32:23 -0800 (PST) MIME-Version: 1.0 References: <20210209062128.453814-1-nao.horiguchi@gmail.com> <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp> In-Reply-To: <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp> From: Jue Wang Date: Thu, 11 Mar 2021 11:32:12 -0800 Message-ID: Subject: Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= Cc: Hugh Dickins , Naoya Horiguchi , Andrew Morton , Michal Hocko , Oscar Salvador , Tony Luck , Matthew Wilcox , "Aneesh Kumar K.V" , Greg Thelen , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: pftbwgi4rzgaiprnyob57k1a45dhnezq X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 5503113A Received-SPF: none (google.com>: No applicable sender policy available) receiver=imf20; identity=mailfrom; envelope-from=""; helo=mail-vk1-f171.google.com; client-ip=209.85.221.171 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615491140-628933 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Huge thanks to Hugh for his expertise in shmem thps and forwarding this mes= sage! Hi Naoya, This is the first time I reply to an email in a Linux upstream thread, apologies for any technical issues due to my email client. And my apologies for the state of the whole patch not quite shareable with upstream due to some kernel differences. Some fyi comment inline. Thanks, -Jue On Thu, Mar 11, 2021 at 7:14 AM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80=80= =E7=9B=B4=E4=B9=9F) wrote: > > On Wed, Mar 10, 2021 at 11:22:18PM -0800, Hugh Dickins wrote: > > On Tue, 9 Feb 2021, Naoya Horiguchi wrote: > > > > > From: Naoya Horiguchi > > > > > > Currently hwpoison code checks PageAnon() for thp and refuses to hand= le > > > errors on non-anonymous thps (just for historical reason). We now > > > support non-anonymou thp like shmem one, so this patch suggests to en= able > > > to handle shmem thps. Fortunately, we already have can_split_huge_pag= e() > > > to check if a give thp is splittable, so this patch relies on it. > > > > Fortunately? I don't understand. Why call can_split_huge_page() > > at all, instead of simply trying split_huge_page() directly? > > The background of this change was that I've experienced triggering > VM_BUG_ON_PAGE() at the beginning of split_huge_page_to_list() in the old= er > kernel (I forgot specific condition of the BUG_ON). I thought that that = was > caused by race between thp allocation and memory_failure. So I wanted to > have some rigid way to confirm that a given thp is splittable. Then I fo= und > can_split_huge_page(), which sounds suitable to me because I expected the= API > to be maintained by thp subsystem. > > But I rethink that split_huge_page_to_list() seems to have different set = of > VM_BUG_ON_PAGE()s now, and anyway split_huge_page_to_list() calls > can_split_huge_page() internally, so I might have wrongly read the code. > > > And could it do better than -EBUSY when split_huge_page() fails? > > Yes it could. > > > > > > > > > Signed-off-by: Naoya Horiguchi > > > > Thanks for trying to add shmem+file THP support, but I think this > > does not work as intended - Andrew, if Naoya agrees, please drop from > > mmotm for now, the fixup needed will be more than a line or two. > > I agree to drop it. I need research more to address the following comment= s. > > > > > I'm not much into memory-failure myself, but Jue discovered that the > > SIGBUS never arrives: because split_huge_page() on a shmem or file > > THP unmaps all its pmds and ptes, and (unlike with anon) leaves them > > unmapped - in normal circumstances, to be faulted back on demand. > > So the page_mapped() check in hwpoison_user_mappings() fails, > > and the intended SIGBUS is not delivered. > > Thanks for the information. The split behaves quite differently between > for anon thp and for shmem thp. I saw some unexpected behavior in my > testing, maybe that's due to the difference. > > > > > (Or, is it acceptable that the SIGBUS is not delivered to those who > > have the huge page mapped: should it get delivered later, to anyone > > who faults back in the bad 4k?) > > Later access should report error in page fault, so the worst scenario > of consuming corrupted data does not happen, but precautionary signal > does not work so it's not acceptable. In our experiment with SHMEM THPs, later accesses resulted in a zero page allocated instead of a SIGBUS with BUS_MCEERR_AR reported by the page fault handler. That part might be an opportunity to prevent some silent data corruption just in case. > > > > > We believe the tokill list has to be set up earlier, before > > split_huge_page() is called, then passed in to hwpoison_user_mappings()= . > > > > Sorry, we don't have a proper patch for that right now, but I expect > > you can see what needs to be done. But something we found on the way, > > we do have a patch for: add_to_kill() uses page_address_in_vma(), but > > that has not been used on file THP tails before - fix appended at the > > end below, so as not to waste your time on that bit. > > > > Thank you very much, I'll work on top of it. > > Thanks, > Naoya Horiguchi > > ... > > > > [PATCH] mm: fix page_address_in_vma() on file THP tails > > From: Jue Wang > > > > Anon THP tails were already supported, but memory-failure now needs to = use > > page_address_in_vma() on file THP tails, which its page->mapping check = did > > not permit: fix it. > > > > Signed-off-by: Jue Wang > > Signed-off-by: Hugh Dickins > > --- > > > > mm/rmap.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > --- 5.12-rc2/mm/rmap.c 2021-02-28 16:58:57.950450151 -0800 > > +++ linux/mm/rmap.c 2021-03-10 20:29:21.591475177 -0800 > > @@ -717,11 +717,11 @@ unsigned long page_address_in_vma(struct > > if (!vma->anon_vma || !page__anon_vma || > > vma->anon_vma->root !=3D page__anon_vma->root) > > return -EFAULT; > > - } else if (page->mapping) { > > - if (!vma->vm_file || vma->vm_file->f_mapping !=3D page->m= apping) > > - return -EFAULT; > > - } else > > + } else if (!vma->vm_file) { > > + return -EFAULT; > > + } else if (vma->vm_file->f_mapping !=3D compound_head(page)->mapp= ing) { > > return -EFAULT; > > + } > > address =3D __vma_address(page, vma); > > if (unlikely(address < vma->vm_start || address >=3D vma->vm_end)= ) > > return -EFAULT; > >