From: mawupeng
Date: Tue, 21 Mar 2023 15:17:03 +0800
Subject: Re: [PATCH v3 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code
Message-ID: <3b7e99a5-de65-67a5-4f74-d0d8d40fa9f2@huawei.com>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20230317072443.3189-1-xueshuai@linux.alibaba.com>
In-Reply-To: <20230317072443.3189-1-xueshuai@linux.alibaba.com>

Tested-by: Ma Wupeng

I have tested this on arm64 with the following steps:

1. make memory_failure() return -EBUSY
2. force a UCE (uncorrectable error) with EINJ

Without this patchset, the user task is not killed, because memory_failure() cannot handle this UCE properly, and the task is left in D (uninterruptible sleep) state. The stack is included at the end.
With this patchset, the user task is killed even when memory_failure() returns -EBUSY without doing anything.

Here is the stack of the user task in D state:

# cat /proc/7001/stack
[<0>] __flush_work.isra.0+0x80/0xa8
[<0>] __cancel_work_timer+0x144/0x1c8
[<0>] cancel_work_sync+0x1c/0x30
[<0>] memory_failure_queue_kick+0x3c/0x88
[<0>] ghes_kick_task_work+0x28/0x78
[<0>] task_work_run+0xb8/0x188
[<0>] do_notify_resume+0x1e0/0x280
[<0>] el0_da+0x130/0x138
[<0>] el0t_64_sync_handler+0x68/0xc0
[<0>] el0t_64_sync+0x188/0x190

On 2023/3/17 15:24, Shuai Xue wrote:
> changes since v2, addressing comments from Naoya:
> - rename mce_task_work to sync_task_work
> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> - add steps to reproduce this problem in cover letter
> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
>
> changes since v1:
> - synchronous events by notify type
> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>
> Currently, both synchronous and asynchronous errors are queued and handled
> by a dedicated kthread in a workqueue. Memory failure for a synchronous
> error is synchronized by a cancel_work_sync() trick, which ensures that the
> corrupted page is unmapped and poisoned. After returning to user space,
> the task restarts at the current instruction, which triggers a page fault
> in which the kernel sends SIGBUS to the current process due to
> VM_FAULT_HWPOISON.
>
> However, memory failure recovery for hwpoison-aware mechanisms does not
> work as expected. For example, hwpoison-aware user-space processes like
> QEMU register their own SIGBUS handler and enable early kill mode by
> setting PF_MCE_EARLY at initialization. The kernel then notifies the
> process directly from memory failure handling, sending a SIGBUS with the
> wrong si_code: BUS_MCEERR_AO instead of BUS_MCEERR_AR, even though this
> process is the one that consumed the poisoned data.
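For readers unfamiliar with the early-kill setup described above, here is a minimal user-space sketch of that pattern. It is not QEMU's code: mce_sigbus_kind(), say(), sigbus_handler() and setup_hwpoison_handling() are hypothetical names. It assumes Linux with glibc, where prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) is the per-process way to request early kill (the kernel records it as PF_MCE_EARLY on the task), and where <signal.h> exposes BUS_MCEERR_AR and BUS_MCEERR_AO. The branch on si_code shows why the wrong code matters: AO reads as "poison found, deal with it later", while AR means the task has already touched the poison.

```c
#define _GNU_SOURCE            /* exposes BUS_MCEERR_AR/AO in <signal.h> */
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: label the SIGBUS si_code values used for memory errors. */
static const char *mce_sigbus_kind(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "action required";  /* this task consumed the poison */
	case BUS_MCEERR_AO:
		return "action optional";  /* poison found, not consumed yet */
	default:
		return "other SIGBUS";
	}
}

/* Async-signal-safe output: write(2) in a loop, no stdio. */
static void say(const char *s)
{
	size_t n = strlen(s);

	while (n > 0) {
		ssize_t w = write(STDERR_FILENO, s, n);

		if (w <= 0)
			return;
		s += w;
		n -= (size_t)w;
	}
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	say(mce_sigbus_kind(info->si_code));
	say("\n");
	/* Returning after AR would retry the poisoned access, so bail out. */
	if (info->si_code == BUS_MCEERR_AR)
		_exit(1);
}

/* Install the handler, then opt this process in to early kill
 * (the kernel records this as PF_MCE_EARLY on the task). */
int setup_hwpoison_handling(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A program would call setup_hwpoison_handling() once at startup and treat a non-zero return as failure. The AR branch exits only to keep the sketch short. A hypervisor might instead forward the error to its guest.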
>
> To address this problem:
>
> - PATCH 1 sets mf_flags to MF_ACTION_REQUIRED on synchronous events, which
>   indicates that the error happened in the current execution context.
> - PATCH 2 moves synchronous error handling into task work, so that the
>   current context in memory failure is exactly the task consuming the
>   poisoned data.
>
> Then the kernel sends SIGBUS with the proper si_code in kill_proc().
>
> Lv Ying and XiuQi also proposed fixes for a similar problem, and we discussed
> adding a new flag (acpi_hest_generic_data::flags bit 8) to distinguish
> synchronous events [2][3]. The UEFI community has not responded yet.
> After a deep dive into the SDEI TRM, we concluded that the SDEI notification
> should be used for asynchronous errors. As the SDEI TRM [1] describes, "the
> dispatcher can simulate an exception-like entry into the client, **with the
> client providing an additional asynchronous entry point similar to an
> interrupt entry point**". The client (the kernel) therefore lacks the complete
> synchronous context, e.g. system registers (ELR, ESR, etc.). So the notify
> type is enough to distinguish synchronous events.
>
> To reproduce this problem:
>
> # STEP1: enable early kill mode
> #sysctl -w vm.memory_failure_early_kill=1
> vm.memory_failure_early_kill = 1
>
> # STEP2: inject a UCE and consume it to trigger a synchronous error
> #einj_mem_uc single
> 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
> injecting ...
> triggering ...
> signal 7 code 5 addr 0xffffb0d75000
> page not present
> Test passed
>
> The si_code (code 5) reported by einj_mem_uc indicates a BUS_MCEERR_AO
> error, which is wrong for a synchronous error.
>
> After this patch set:
>
> # STEP1: enable early kill mode
> #sysctl -w vm.memory_failure_early_kill=1
> vm.memory_failure_early_kill = 1
>
> # STEP2: inject a UCE and consume it to trigger a synchronous error
> #einj_mem_uc single
> 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
> injecting ...
> triggering ...
> signal 7 code 4 addr 0xffffb0d75000
> page not present
> Test passed
>
> The si_code (code 4) reported by einj_mem_uc indicates a BUS_MCEERR_AR
> error, as expected.
>
> [1] https://developer.arm.com/documentation/den0054/latest/
> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
>
> Shuai Xue (2):
>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>     synchronous events
>   ACPI: APEI: handle synchronous exceptions in task work
>
>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
>  include/acpi/ghes.h      |   3 -
>  mm/memory-failure.c      |  13 ----
>  3 files changed, 83 insertions(+), 68 deletions(-)
>
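The cover letter's argument combines two conditions: the notify type, which decides the flag (PATCH 1), and the execution context, which decides who is "current" (PATCH 2). A small user-space model makes the interaction easier to see. This is a hedged sketch, not the patch's code. The enum, SKETCH_MF_ACTION_REQUIRED and the sketch_* functions are hypothetical. The sketch assumes that kill_proc() chooses BUS_MCEERR_AR only when MF_ACTION_REQUIRED is set and the victim is the current task, and that SEA is the only notify type treated as synchronous once the MCE case is dropped. The numeric notify values follow the ACPI HEST notification types.

```c
#define _GNU_SOURCE            /* exposes BUS_MCEERR_AR/AO in <signal.h> */
#include <signal.h>
#include <stdbool.h>

/* Subset of the ACPI HEST notification types, with their spec values
 * (Linux mirrors them in enum acpi_hest_notify_types). */
enum hest_notify {
	HEST_NOTIFY_MCE  = 6,
	HEST_NOTIFY_SEA  = 8,   /* Synchronous External Abort (arm64) */
	HEST_NOTIFY_SDEI = 11,  /* Software Delegated Exception */
};

/* Hypothetical stand-in for the kernel's MF_ACTION_REQUIRED bit. */
#define SKETCH_MF_ACTION_REQUIRED (1u << 1)

/* PATCH 1 idea: classify by notify type alone. Assumes SEA is the only
 * synchronous type; SDEI counts as asynchronous per the SDEI TRM argument. */
static bool sketch_is_sync_notify(enum hest_notify type)
{
	return type == HEST_NOTIFY_SEA;
}

static unsigned int sketch_mf_flags(enum hest_notify type)
{
	return sketch_is_sync_notify(type) ? SKETCH_MF_ACTION_REQUIRED : 0u;
}

/* Assumed kill_proc() rule: BUS_MCEERR_AR only when the flag is set AND
 * the victim is the task running memory failure handling; otherwise AO.
 * A workqueue kthread is never the victim, hence PATCH 2's task work. */
static int sketch_si_code(unsigned int mf_flags, bool victim_is_current)
{
	if ((mf_flags & SKETCH_MF_ACTION_REQUIRED) && victim_is_current)
		return BUS_MCEERR_AR;
	return BUS_MCEERR_AO;
}
```

Under these assumptions, there are three cases. The pre-patch path (flag unset, handled in a kworker) yields BUS_MCEERR_AO (5), matching the "code 5" output in the reproduction steps. A path with only PATCH 1 applied (flag set, still in a kworker) also yields BUS_MCEERR_AO (5). Only with both patches does an SEA handled in the consuming task yield BUS_MCEERR_AR (4).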