From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefeng Wang
CC: Kefeng Wang
Subject: [PATCH v2] x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy
Date: Fri, 26 May 2023 14:32:42 +0800
Message-ID: <20230526063242.133656-1-wangkefeng.wang@huawei.com>

Both the EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception
fixup types identify fixups that allow in-kernel #MC recovery, that is,
the Machine Check Safe Copy.

If an MCE happens in kernel space and the kernel can recover from it,
error_context() sets MCE_IN_KERNEL_RECOV in mce.kflags, and
do_machine_check() then tries to fix up the exception.
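
As an illustration of the classification described above, here is a
minimal user-space sketch in C. It is not kernel code: the helper name
classify(), the `patched` switch, the trimmed-down enums and the flag
bit positions are assumptions made for the example; only the case
labels and flag names are taken from the diff in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel flags; bit positions are assumed for illustration. */
#define MCE_IN_KERNEL_RECOV	(1ULL << 6)
#define MCE_IN_KERNEL_COPYIN	(1ULL << 7)

/* Trimmed-down stand-ins for the kernel enums; only the cases discussed here. */
enum ex_type {
	EX_TYPE_NONE,
	EX_TYPE_COPY,
	EX_TYPE_FAULT_MCE_SAFE,
	EX_TYPE_DEFAULT_MCE_SAFE,
};

enum context { IN_KERNEL, IN_KERNEL_RECOV };

/*
 * Hypothetical model of the switch in error_context().
 * patched == false: only EX_TYPE_COPY (from a user copy) gets COPYIN.
 * patched == true:  every MC-safe fixup type gets RECOV and COPYIN.
 */
static enum context classify(enum ex_type type, bool copy_user,
			     bool patched, uint64_t *kflags)
{
	switch (type) {
	case EX_TYPE_COPY:
		if (!copy_user)
			return IN_KERNEL;
		*kflags |= MCE_IN_KERNEL_COPYIN;
		/* fall through */
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		*kflags |= MCE_IN_KERNEL_RECOV;
		if (patched)
			*kflags |= MCE_IN_KERNEL_COPYIN;
		return IN_KERNEL_RECOV;
	default:
		return IN_KERNEL;
	}
}
```

In this model, classify(EX_TYPE_FAULT_MCE_SAFE, false, false, &kflags)
leaves only MCE_IN_KERNEL_RECOV set, while the same call with
patched == true also sets MCE_IN_KERNEL_COPYIN.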

But because MCE_IN_KERNEL_COPYIN is not set, the kernel does not panic,
yet the corrupted page is not isolated either, so it may be consumed
again, which is not what we expect.

To avoid this, some hwpoison recovery paths[1][2][3] call
memory_failure_queue() to deal with such unhandled corrupted pages, and
the recent coredump hwpoison recovery support[4] was asked to do the
same. There are also other existing MC-safe copy scenarios, e.g. nvdimm,
dm-writecache and dax, which don't isolate corrupted pages.

The best fix is to set MCE_IN_KERNEL_COPYIN for every MC-Safe Copy and
let the core do_machine_check() isolate the corrupted page, instead of
handling it at each call site one by one.

[1] commit d302c2398ba2 ("mm, hwpoison: when copy-on-write hits poison, take page offline")
[2] commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
[3] commit 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
[4] https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com

Reviewed-by: Naoya Horiguchi
Reviewed-by: Tony Luck
Signed-off-by: Kefeng Wang
---
v2:
- describe the problem statement more clearly, per Dave Hansen
- collect RB

 arch/x86/kernel/cpu/mce/severity.c |  3 +--
 mm/ksm.c                           |  1 -
 mm/memory.c                        | 12 +++---------
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..63e94484c5d6 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,12 +293,11 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	case EX_TYPE_COPY:
 		if (!copy_user)
 			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
 		fallthrough;
 
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
-		m->kflags |= MCE_IN_KERNEL_RECOV;
+		m->kflags |= MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
 		return IN_KERNEL_RECOV;
 
 	default:
diff --git a/mm/ksm.c b/mm/ksm.c
index 0156bded3a66..7abdf4892387 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2794,7 +2794,6 @@ struct page *ksm_might_need_to_copy(struct page *page,
 	if (new_page) {
 		if (copy_mc_user_highpage(new_page, page, address, vma)) {
 			put_page(new_page);
-			memory_failure_queue(page_to_pfn(page), 0);
 			return ERR_PTR(-EHWPOISON);
 		}
 		SetPageDirty(new_page);
diff --git a/mm/memory.c b/mm/memory.c
index 8358f3b853f2..74873e7126aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2813,10 +2813,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma)) {
-			memory_failure_queue(page_to_pfn(src), 0);
+		if (copy_mc_user_highpage(dst, src, addr, vma))
 			return -EHWPOISON;
-		}
 		return 0;
 	}
 
@@ -5851,10 +5849,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
 
 		cond_resched();
 		if (copy_mc_user_highpage(dst_page, src_page,
-					  addr + i*PAGE_SIZE, vma)) {
-			memory_failure_queue(page_to_pfn(src_page), 0);
+					  addr + i*PAGE_SIZE, vma))
 			return -EHWPOISON;
-		}
 	}
 	return 0;
 }
@@ -5870,10 +5866,8 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
 	struct copy_subpage_arg *copy_arg = arg;
 
 	if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
-				  addr, copy_arg->vma)) {
-		memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+				  addr, copy_arg->vma))
 		return -EHWPOISON;
-	}
 	return 0;
 }
 
-- 
2.35.3