From: Shuai Xue <xueshuai@linux.alibaba.com>
To: rafael@kernel.org, lenb@kernel.org, james.morse@arm.com,
	tony.luck@intel.com, bp@alien8.de, dave.hansen@linux.intel.com,
	jarkko@kernel.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
	akpm@linux-foundation.org
Cc: linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	cuibixuan@linux.alibaba.com, baolin.wang@linux.alibaba.com,
	zhuo.song@linux.alibaba.com
Subject: Re: [PATCH] ACPI: APEI: do not add task_work for outside context error
Date: Mon, 19 Sep 2022 10:37:19 +0800
Message-ID: <74029c74-8645-a1d3-10c7-5f309c1c611e@linux.alibaba.com>
In-Reply-To: <20220916050535.26625-1-xueshuai@linux.alibaba.com>



On 2022/9/16 1:05 PM, Shuai Xue wrote:
> If an error is detected as a result of a user-space process accessing a
> corrupt memory location, the CPU may take an abort. The platform
> firmware then reports the error to the kernel via NMI-like notifications,
> e.g. NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.
> 
> For NMI-like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
> memory_failure() queue for synchronous errors") keeps track of whether
> memory_failure() work was queued, and makes task_work pending to flush out
> the queue so that the work is processed before returning to user space.
> 
> The code uses init_mm to check whether the error occurs in user space:
> 
>     if (current->mm != &init_mm)
> 
> The condition is always true, because _nobody_ ever has "init_mm" as a real
> VM any more

(Sorry, I forgot to describe the side effect.)

If an error is detected outside of the current execution context (e.g. when
detected by a background scrubber), current could be any thread. When a
kernel thread is interrupted, the ghes_kick_task_work work deferred to
task_work will never be processed, because entry_handler returns via
ret_to_kernel() instead of ret_to_user(). Consequently, the estatus_node
allocated from ghes_estatus_pool in ghes_in_nmi_queue_one_entry() is never
released. After around 200 allocations on our platform, ghes_estatus_pool
runs out of memory and ghes_in_nmi_queue_one_entry() returns -ENOMEM. As a
result, the event fails to be processed:

    sdei: event 805 on CPU 113 failed with error: -2

Eventually, a large number of unhandled events may cause the platform
firmware to exceed some threshold and reboot.
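
To make the leak concrete, here is a toy user-space model (not kernel code). It assumes a fixed pool of POOL_NODES nodes standing in for ghes_estatus_pool, with the hypothetical size of 200 taken from the observation above; toy_queue_one_entry() plays the role of the estatus_node allocation and toy_handle_event() the role of the deferred ghes_kick_task_work(), which only frees the node when the interrupted task returns to user space. All names here are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pool size; the report observed exhaustion after ~200 allocations. */
#define POOL_NODES 200

/* Toy stand-in for ghes_estatus_pool: only tracks how many nodes are in use. */
struct toy_pool {
	int in_use;
};

/* Analogue of allocating an estatus_node in ghes_in_nmi_queue_one_entry(). */
static int toy_queue_one_entry(struct toy_pool *pool)
{
	if (pool->in_use >= POOL_NODES)
		return -ENOMEM;
	pool->in_use++;
	return 0;
}

/*
 * Analogue of the deferred task_work: it runs, and frees the node, only if
 * the interrupted task goes back to user space. A kernel thread returns to
 * kernel context, so the callback never fires and the node is leaked.
 */
static void toy_handle_event(struct toy_pool *pool, bool returns_to_user)
{
	if (returns_to_user)
		pool->in_use--;
}

/* Deliver n errors that all interrupt the same kind of task; return how many were dropped. */
static int toy_deliver(struct toy_pool *pool, int n, bool returns_to_user)
{
	int failed = 0;

	for (int i = 0; i < n; i++) {
		if (toy_queue_one_entry(pool) == -ENOMEM) {
			failed++; /* event dropped, like the "failed with error" log */
			continue;
		}
		toy_handle_event(pool, returns_to_user);
	}
	return failed;
}
```

With returns_to_user set, the pool never fills up; with it cleared (a kernel thread was interrupted), the first POOL_NODES events consume the whole pool and every later event is dropped.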

Best Regards,
Shuai

> and should generally just do
> 
>     if (current->mm)
> 
> as described in active_mm.rst documentation.
> 
> Then if an error is detected outside of the current execution context (e.g.
> when detected by a background scrubber), do not add task_work as the
> original patch intends to do.
> 
> Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  drivers/acpi/apei/ghes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d91ad378c00d..80ad530583c9 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -985,7 +985,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
>  				ghes_estatus_cache_add(generic, estatus);
>  		}
>  
> -		if (task_work_pending && current->mm != &init_mm) {
> +		if (task_work_pending && current->mm) {
>  			estatus_node->task_work.func = ghes_kick_task_work;
>  			estatus_node->task_work_cpu = smp_processor_id();
>  			ret = task_work_add(current, &estatus_node->task_work,
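
The difference between the two conditions can be sketched in user space as follows. The struct definitions and the init_mm object are toy stand-ins, not the kernel's; the function names old_should_add_task_work() and new_should_add_task_work() are hypothetical. The sketch relies on the convention described in active_mm.rst: a user task owns a non-NULL mm, while a kernel thread has mm == NULL (it only borrows an active_mm), and no task's mm points at init_mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel's definitions. */
struct mm_struct { int unused; };
struct task_struct { struct mm_struct *mm; };

/* Never assigned to any task's ->mm, mirroring the point made above. */
static struct mm_struct init_mm;

/* Condition from 7f17b4a121d0: true for every task, kernel threads included. */
static bool old_should_add_task_work(const struct task_struct *t)
{
	return t->mm != &init_mm;
}

/* Patched condition: true only for tasks that own a user address space. */
static bool new_should_add_task_work(const struct task_struct *t)
{
	return t->mm != NULL;
}
```

For a kernel thread (mm == NULL), the old check still evaluates to true, so task_work would be added to a task that never returns to user space; the new check evaluates to false and skips it.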


Thread overview: 17+ messages
2022-09-16  5:05 [PATCH] ACPI: APEI: do not add task_work for outside context error Shuai Xue
2022-09-19  2:37 ` Shuai Xue [this message]
2022-09-24  7:49 ` [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak Shuai Xue
2022-09-24  7:50   ` kernel test robot
2022-09-24 17:17   ` Rafael J. Wysocki
2022-09-26 11:35     ` Shuai Xue
2022-09-26 15:20       ` Luck, Tony
2022-09-27  3:50         ` Shuai Xue
2022-09-27 17:47           ` Luck, Tony
2022-09-29  2:33             ` Shuai Xue
2022-09-29 20:52               ` Luck, Tony
2022-09-30  2:52                 ` Shuai Xue
2022-09-30 15:52                   ` Luck, Tony
2022-10-04 14:07                     ` Rafael J. Wysocki
2022-10-13  7:05           ` Shuai Xue
2022-10-13 17:18             ` Luck, Tony
2022-10-14 13:23               ` Shuai Xue
