stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: "Luck, Tony" <tony.luck@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	James Morse <james.morse@arm.com>
Cc: "Len Brown" <lenb@kernel.org>, "Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"linmiaohe@huawei.com" <linmiaohe@huawei.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Stable <stable@vger.kernel.org>,
	"ACPI Devel Maling List" <linux-acpi@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"cuibixuan@linux.alibaba.com" <cuibixuan@linux.alibaba.com>,
	"baolin.wang@linux.alibaba.com" <baolin.wang@linux.alibaba.com>,
	"zhuo.song@linux.alibaba.com" <zhuo.song@linux.alibaba.com>
Subject: Re: [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
Date: Tue, 27 Sep 2022 11:50:24 +0800	[thread overview]
Message-ID: <79cb9aee-9ad5-00f4-3f7a-9c409f502685@linux.alibaba.com> (raw)
In-Reply-To: <SJ1PR11MB6083FC6B8D64933C573CAB64FC529@SJ1PR11MB6083.namprd11.prod.outlook.com>



在 2022/9/26 PM11:20, Luck, Tony 写道:
>>>
>>> -               if (task_work_pending && current->mm != &init_mm) {
>>> +               if (task_work_pending && current->mm) {
>>>                         estatus_node->task_work.func = ghes_kick_task_work;
>>>                         estatus_node->task_work_cpu = smp_processor_id();
>>>                         ret = task_work_add(current, &estatus_node->task_work,
> 
> It seems that you are getting errors reported while running kernel threads. This fix avoids
> pointlessly adding "task_work" that will never be processed because kernel threads never
> return to user mode.

Yes, you are right.

> 
> But maybe something else needs to be done? The code was, and with this fix still is,
> taking no action for the error. That doesn't seem right.

Sorry, I don't think so. As far as I know, on Arm platform, hardware error can signal
exceptions including:

- Synchronous External Abort (SEA), e,g. data abort or instruction abort
- Asynchronous External Abort (signalled as SErrors), e,g. L2 can generate SError for
  error responses from the interconnect for a Device or Non-cacheable store
- Fault Handling and Error Recovery interrupts: DDR mainline/demand/scrubber error interrupt

When the error signals asynchronous exceptions (SError or interrupt), any kind of thread can
be interrupted, including kernel thread. Because asynchronous exceptions are signaled in
background, the errors are detected outside of the current execution context.

The GHES driver always queues work to handle memory failure of a page in memory_failure_queue().
If a kernel thread is interrupted:

- Without this fix, the added task_work will never be processed so that the work will not
  be canceled.
- With this fix, the task_work will not be added.

In a conclusion, the error will be handled in a kworker with or without this fix.

The point of fix is that:

- The condition is useless because it is always tree. And I think it is not the original patch
  intends to do.
- The current code leads to memory leaks. The estatus_node will not be freed when task_work is
  added to a kernel thread.


> 
> Are you injecting errors to create this scenario? 

Yes, I am injecting error to create such scenario. After 200 injections, the ghes_estatus_pool
will run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. Finally, a lot of unhandled
events may cause platform firmware to exceed some threshold and reboot.

> What does your test do?

My injection includes two steps:

- mmap a device memory for userspace in a driver
- inject uc and trigger write like ras-tools does

I have opened source the code and you can find here[1]. It's forked from your repo and mainly based
on your code :)

By the way, do you have any plans to extend ras-tools to Arm platform. And is there a mail-list
for ras-tools? I send a mail to add test cases to ras-tools several weeks ago, but no response.
May your mailbox regards it as Spam.


[1] https://gitee.com/anolis/ras-tools/tree/arm-devel




  reply	other threads:[~2022-09-27  3:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220916050535.26625-1-xueshuai@linux.alibaba.com>
2022-09-24  7:49 ` [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak Shuai Xue
2022-09-24  7:50   ` kernel test robot
2022-09-24 17:17   ` Rafael J. Wysocki
2022-09-26 11:35     ` Shuai Xue
2022-09-26 15:20       ` Luck, Tony
2022-09-27  3:50         ` Shuai Xue [this message]
2022-09-27 17:47           ` Luck, Tony
2022-09-29  2:33             ` Shuai Xue
2022-09-29 20:52               ` Luck, Tony
2022-09-30  2:52                 ` Shuai Xue
2022-09-30 15:52                   ` Luck, Tony
2022-10-04 14:07                     ` Rafael J. Wysocki
2022-10-13  7:05           ` Shuai Xue
2022-10-13 17:18             ` Luck, Tony
2022-10-14 13:23               ` Shuai Xue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=79cb9aee-9ad5-00f4-3f7a-9c409f502685@linux.alibaba.com \
    --to=xueshuai@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bp@alien8.de \
    --cc=cuibixuan@linux.alibaba.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=lenb@kernel.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=rafael@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=zhuo.song@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).