From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 29 Nov 2023 19:54:06 +0100
From: Borislav Petkov <bp@alien8.de>
To: Shuai Xue <xueshuai@linux.alibaba.com>, james.morse@arm.com
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com,
	mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com,
	naoya.horiguchi@nec.com, gregkh@linuxfoundation.org,
	will@kernel.org, jarkko@kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-edac@vger.kernel.org,
	acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org,
	x86@kernel.org, justin.he@arm.com, ardb@kernel.org,
	ying.huang@intel.com, ashish.kalra@amd.com,
	baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com,
	robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com,
	zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task
 work with proper si_code
Message-ID: <20231129185406.GBZWeIzqwgRQe7XDo/@fat_crate.local>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
 <20231007072818.58951-1-xueshuai@linux.alibaba.com>
 <20231123150710.GEZV9qnkWMBWrggGc1@fat_crate.local>
 <9e92e600-86a4-4456-9de4-b597854b107c@linux.alibaba.com>
 <20231125121059.GAZWHkU27odMLns7TZ@fat_crate.local>
 <1048123e-b608-4db1-8d5f-456dd113d06f@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1048123e-b608-4db1-8d5f-456dd113d06f@linux.alibaba.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

Moving James to To:

On Sun, Nov 26, 2023 at 08:25:38PM +0800, Shuai Xue wrote:
> > On Sat, Nov 25, 2023 at 02:44:52PM +0800, Shuai Xue wrote:
> >> - an AR error consumed by current process is deferred to handle in a
> >>   dedicated kernel thread, but memory_failure() assumes that it runs in the
> >>   current context
> > 
> > On x86? ARM?
> > 
> > Please point to the exact code flow.
> 
> An AR error consumed by the current process is deferred for handling in a
> dedicated kernel thread on ARM platforms. The AR error is handled in the
> flow below:
> 
> -----------------------------------------------------------------------------
> [user space task einj_mem_uc consumed data poison, CPU 3]       STEP 0
> 
> -----------------------------------------------------------------------------
> [ghes_sdei_critical_callback: current einj_mem_uc, CPU 3]		STEP 1
> ghes_sdei_critical_callback
>     => __ghes_sdei_callback
>         => ghes_in_nmi_queue_one_entry 		// peek and read estatus
>         => irq_work_queue(&ghes_proc_irq_work) <=> ghes_proc_in_irq // irq_work
> [ghes_sdei_critical_callback: return]
> -----------------------------------------------------------------------------
> [ghes_proc_in_irq: current einj_mem_uc, CPU 3]			        STEP 2
>             => ghes_do_proc
>                 => ghes_handle_memory_failure
>                     => ghes_do_memory_failure
>                         => memory_failure_queue	 // put work task on current CPU
>                             => if (kfifo_put(&mf_cpu->fifo, entry))
>                                   schedule_work_on(smp_processor_id(), &mf_cpu->work);
>             => task_work_add(current, &estatus_node->task_work, TWA_RESUME);
> [ghes_proc_in_irq: return]
> -----------------------------------------------------------------------------
> // kworker preempts einj_mem_uc on CPU 3 due to RESCHED flag	STEP 3
> [memory_failure_work_func: current kworker, CPU 3]	
>      => memory_failure_work_func(&mf_cpu->work)
>         => while kfifo_get(&mf_cpu->fifo, &entry);	// until get no work
>             => memory_failure(entry.pfn, entry.flags);

From the comment above that function:

 * The function is primarily of use for corruptions that
 * happen outside the current execution context (e.g. when
 * detected by a background scrubber)
 *
 * Must run in process context (e.g. a work queue) with interrupts
 * enabled and no spinlocks held.

> -----------------------------------------------------------------------------
> [ghes_kick_task_work: current einj_mem_uc, other cpu]           STEP 4
>                 => memory_failure_queue_kick
>                     => cancel_work_sync - waits for memory_failure_work_func to finish
>                     => memory_failure_work_func(&mf_cpu->work)
>                         => kfifo_get(&mf_cpu->fifo, &entry); // no work
> -----------------------------------------------------------------------------
> [einj_mem_uc resumes at the same PC, triggers a page fault]     STEP 5
> 
> STEP 0: A user space task named einj_mem_uc consumes a poison. The firmware
> notifies the kernel of the hardware error through SDEI
> (ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED).
> 
> STEP 1: The swapper running on CPU 3 is interrupted. irq_work_queue() raises
> an irq_work to handle hardware errors in IRQ context.
> 
> STEP 2: In IRQ context, ghes_proc_in_irq() queues memory failure work on the
> current CPU's workqueue and adds task work to sync with the workqueue.
> 
> STEP 3: The kworker preempts the currently running thread and gets CPU 3. Then
> memory_failure() is processed in the kworker.

See above.

> STEP 4: ghes_kick_task_work() is called as task_work to ensure any queued
> work has completed before returning to user-space.
> 
> STEP 5: Upon returning to user-space, the task einj_mem_uc resumes at the
> current instruction. Because the poisoned page was unmapped by
> memory_failure() in step 3, a page fault will be triggered.
> 
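
For readers following along, steps 2 to 4 as described above boil down to
"queue an entry, let a worker drain it, then drain again before returning
to user space". Below is a minimal user-space sketch of that pattern only.
It is not kernel code: the ring buffer stands in for the kfifo, and every
name (mf_cpu_model, model_queue, model_work_func, model_kick, MF_FIFO_SIZE)
is made up for illustration. No scheduling, preemption or locking is
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MF_FIFO_SIZE 16		/* hypothetical capacity, power of two */
static_assert((MF_FIFO_SIZE & (MF_FIFO_SIZE - 1)) == 0,
	      "index wrap-around below relies on a power-of-two size");

struct mf_entry {
	unsigned long pfn;
	int flags;
};

/* Stand-in for the per-CPU fifo plus its work item. */
struct mf_cpu_model {
	struct mf_entry fifo[MF_FIFO_SIZE];
	size_t head;		/* next slot to write */
	size_t tail;		/* next slot to read */
	bool work_pending;	/* models "work has been scheduled" */
	unsigned int handled;	/* entries processed so far */
};

/* Step 2: put an entry in the fifo and mark the work as scheduled. */
static bool model_queue(struct mf_cpu_model *c, unsigned long pfn, int flags)
{
	if (c->head - c->tail == MF_FIFO_SIZE)
		return false;	/* fifo full, entry dropped */
	c->fifo[c->head++ % MF_FIFO_SIZE] = (struct mf_entry){ pfn, flags };
	c->work_pending = true;
	return true;
}

/* Step 3: the worker drains the fifo until no entry is left. */
static unsigned int model_work_func(struct mf_cpu_model *c)
{
	unsigned int n = 0;

	while (c->tail != c->head) {
		struct mf_entry e = c->fifo[c->tail++ % MF_FIFO_SIZE];

		(void)e;	/* the real handler would act on e.pfn here */
		c->handled++;
		n++;
	}
	c->work_pending = false;
	return n;
}

/* Step 4: the kick drains whatever is still queued; often nothing. */
static unsigned int model_kick(struct mf_cpu_model *c)
{
	return model_work_func(c);
}
```

With one queued entry, model_work_func() handles it and the later
model_kick() finds the fifo empty, which matches the "no work" note in
step 4 of the flow.
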
> memory_failure() assumes that it runs in the current context on both x86
> and ARM platforms.
> 
> 
> for example:
> 	memory_failure() in mm/memory-failure.c:
> 
> 		if (flags & MF_ACTION_REQUIRED) {
> 			folio = page_folio(p);
> 			res = kill_accessing_process(current, folio_pfn(folio), flags);
> 		}

And?

Do you see the check above it?

	if (TestSetPageHWPoison(p)) {

test_and_set_bit() returns true only when the page was poisoned already.

 * This function is intended to handle "Action Required" MCEs on already
 * hardware poisoned pages. They could happen, for example, when
 * memory_failure() failed to unmap the error page at the first call, or
 * when multiple local machine checks happened on different CPUs.

And that's kill_accessing_process().

So AFAIU, the kworker running memory_failure() would only mark the page
as poison.

The killing happens when memory_failure() runs again and the process
touches the page again.
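
To spell out the test-and-set semantics I mean, here is a small user-space
sketch. It models only the check quoted above and nothing else: the real
memory_failure() does more work after marking the page on the first call,
and the real bit operation is atomic. All names here (page_model,
model_test_set_hwpoison, model_memory_failure, the MODEL_* constants) are
invented for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MF_ACTION_REQUIRED 0x1	/* hypothetical stand-in flag */

enum model_outcome {
	MODEL_MARKED_POISON,	/* first call: page was clean, now marked */
	MODEL_KILL_PATH,	/* page already poisoned and action required */
	MODEL_ALREADY_POISONED,	/* page already poisoned, no action required */
};

struct page_model {
	bool hwpoison;
};

/* Set the flag and return its previous value (non-atomic in this model). */
static bool model_test_set_hwpoison(struct page_model *p)
{
	bool was_set = p->hwpoison;

	p->hwpoison = true;
	return was_set;
}

static enum model_outcome model_memory_failure(struct page_model *p, int flags)
{
	assert(p);

	if (model_test_set_hwpoison(p)) {
		/* Only reached when an earlier call already marked the page. */
		if (flags & MODEL_MF_ACTION_REQUIRED)
			return MODEL_KILL_PATH;
		return MODEL_ALREADY_POISONED;
	}
	return MODEL_MARKED_POISON;
}
```

Calling model_memory_failure() twice on the same page_model gives
MODEL_MARKED_POISON the first time and, with the action-required flag set,
MODEL_KILL_PATH the second time.
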

But I'd let James confirm here.

I still don't know what you're fixing here.

Is this something you're encountering on some machine or you simply
stared at code?

What does that

"Both Alibaba and Huawei met the same issue in products, and we hope it
could be fixed ASAP."

mean?

What did you meet?

What was the problem?

I still note that you're avoiding answering the question what the issue
is and if you keep avoiding it, I'll ignore this whole thread.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette