Date: Wed, 29 Nov 2023 19:54:06 +0100
From: Borislav Petkov
To: Shuai Xue, james.morse@arm.com
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com,
 mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com,
 naoya.horiguchi@nec.com, gregkh@linuxfoundation.org, will@kernel.org,
 jarkko@kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
 linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org,
 stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com,
 ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com,
 baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com,
 dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com,
 robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com,
 zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
Message-ID: <20231129185406.GBZWeIzqwgRQe7XDo/@fat_crate.local>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
 <20231007072818.58951-1-xueshuai@linux.alibaba.com>
 <20231123150710.GEZV9qnkWMBWrggGc1@fat_crate.local>
 <9e92e600-86a4-4456-9de4-b597854b107c@linux.alibaba.com>
 <20231125121059.GAZWHkU27odMLns7TZ@fat_crate.local>
 <1048123e-b608-4db1-8d5f-456dd113d06f@linux.alibaba.com>
In-Reply-To: <1048123e-b608-4db1-8d5f-456dd113d06f@linux.alibaba.com>

Moving James to To:

On Sun, Nov 26, 2023 at 08:25:38PM +0800, Shuai Xue wrote:
> > On Sat, Nov 25, 2023 at 02:44:52PM +0800, Shuai Xue wrote:
> >> - an AR error consumed by current process is deferred to handle in a
> >> dedicated kernel thread, but memory_failure() assumes that it runs in the
> >> current context
> >
> > On x86? ARM?
> >
> > Please point to the exact code flow.
>
> An AR error consumed by the current process is deferred to be handled in a
> dedicated kernel thread on the ARM platform. The AR error is handled in
> the following flow:
>
> -----------------------------------------------------------------------------
> [user-space task einj_mem_uc consumes poisoned data, CPU 3]         STEP 0
>
> -----------------------------------------------------------------------------
> [ghes_sdei_critical_callback: current einj_mem_uc, CPU 3]           STEP 1
> ghes_sdei_critical_callback
>     => __ghes_sdei_callback
>         => ghes_in_nmi_queue_one_entry // peek and read estatus
>         => irq_work_queue(&ghes_proc_irq_work) <=> ghes_proc_in_irq // irq_work
> [ghes_sdei_critical_callback: return]
> -----------------------------------------------------------------------------
> [ghes_proc_in_irq: current einj_mem_uc, CPU 3]                      STEP 2
>             => ghes_do_proc
>                 => ghes_handle_memory_failure
>                     => ghes_do_memory_failure
>                         => memory_failure_queue // put work on current CPU
>                             => if (kfifo_put(&mf_cpu->fifo, entry))
>                                    schedule_work_on(smp_processor_id(), &mf_cpu->work);
>             => task_work_add(current, &estatus_node->task_work, TWA_RESUME);
> [ghes_proc_in_irq: return]
> -----------------------------------------------------------------------------
> // kworker preempts einj_mem_uc on CPU 3 due to RESCHED flag        STEP 3
> [memory_failure_work_func: current kworker, CPU 3]
>     => memory_failure_work_func(&mf_cpu->work)
>         => while kfifo_get(&mf_cpu->fifo, &entry); // until no work is left
>             => memory_failure(entry.pfn, entry.flags);

From the comment above that function:

 * The function is primarily of use for corruptions that
 * happen outside the current execution context (e.g. when
 * detected by a background scrubber)
 *
 * Must run in process context (e.g. a work queue) with interrupts
 * enabled and no spinlocks held.

> -----------------------------------------------------------------------------
> [ghes_kick_task_work: current einj_mem_uc, other CPU]               STEP 4
>                 => memory_failure_queue_kick
>                     => cancel_work_sync - wait for memory_failure_work_func to finish
>                     => memory_failure_work_func(&mf_cpu->work)
>                         => kfifo_get(&mf_cpu->fifo, &entry); // no work
> -----------------------------------------------------------------------------
> [einj_mem_uc resumes at the same PC, triggers a page fault]         STEP 5
>
> STEP 0: A user-space task named einj_mem_uc consumes poisoned data. The
> firmware notifies the kernel of the hardware error through SDEI
> (ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED).
>
> STEP 1: The swapper running on CPU 3 is interrupted. irq_work_queue()
> raises an irq_work to handle the hardware error in IRQ context.
>
> STEP 2: In IRQ context, ghes_proc_in_irq() queues memory failure work on
> the current CPU's workqueue and adds task work to synchronize with that
> workqueue.
>
> STEP 3: The kworker preempts the currently running thread and gets CPU 3.
> memory_failure() is then processed in the kworker.

See above.

> STEP 4: ghes_kick_task_work() is called as task_work to ensure any queued
> work has completed before returning to user space.
>
> STEP 5: Upon returning to user space, the task einj_mem_uc resumes at the
> current instruction. Because the poisoned page was unmapped by
> memory_failure() in STEP 3, a page fault is triggered.
>
> memory_failure() assumes that it runs in the current context on both x86
> and ARM platforms.
>
> For example, memory_failure() in mm/memory-failure.c:
>
>         if (flags & MF_ACTION_REQUIRED) {
>                 folio = page_folio(p);
>                 res = kill_accessing_process(current, folio_pfn(folio), flags);
>         }

And? Do you see the check above it?

        if (TestSetPageHWPoison(p)) {

test_and_set_bit() returns true only when the page was poisoned already.

 * This function is intended to handle "Action Required" MCEs on already
 * hardware poisoned pages.
 * They could happen, for example, when
 * memory_failure() failed to unmap the error page at the first call, or
 * when multiple local machine checks happened on different CPUs.

And that's kill_accessing_process().

So AFAIU, the kworker running memory_failure() would only mark the page
as poisoned.

The killing happens when memory_failure() runs again and the process
touches the page again.

But I'd let James confirm here.

I still don't know what you're fixing here.

Is this something you're encountering on some machine, or did you simply
stare at the code?

What does that

"Both Alibaba and Huawei met the same issue in products, and we hope it
could be fixed ASAP."

mean?

What did you meet?

What was the problem?

I still note that you're avoiding answering the question of what the
issue is, and if you keep avoiding it, I'll ignore this whole thread.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette