From: James Morse <james.morse@arm.com>
To: Xiaofei Tan <tanxiaofei@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Linuxarm <linuxarm@huawei.com>, Will Deacon <will@kernel.org>,
	Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Shiju Jose <shiju.jose@huawei.com>
Subject: Re: Question about SEA handling process happened in user space
Date: Thu, 9 Apr 2020 15:28:08 +0100
Message-ID: <66db5a6a-e68b-00b7-6a78-2c8cd9e63aab@arm.com>
In-Reply-To: <5E8EE845.8090406@huawei.com>

Hi Xiaofei,

On 09/04/2020 10:17, Xiaofei Tan wrote:
> On 2020/4/8 0:37, James Morse wrote:
>> On 02/04/2020 07:35, Xiaofei Tan wrote:
>>> On 2020/3/31 0:49, James Morse wrote:
>>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>>> alternative is to upgrade to SIGKILL in that case.
>>>>
>>>>
>>>> If you see this instead of the address provided via firmware-first, there is a
>>>> series to improve that here:
>>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>>
>>>> (We skip this signal code if APEI promises it did all the work. This lets you
>>>> take the signal from memory_failure() instead, which may have better information.)
>>
>>> There may be a race here.
>>> APEI runs memory_failure() in a bottom half for memory errors, so it may not have finished
>>> before SEA handling ends here, and the application process may go back to running.

>> With that series, it runs in process-context as task-work. memory_failure() needs to
>> sleep, so it has to run in process-context. 
> 
> 
>> Doing it as task-work means it runs before the thread returns to user-space.
> 
> Sorry, I don't understand this. I thought the task-work needs to be rescheduled, and the current
> thread would have returned to user-space before it runs.

ret_to_user has a loop around do_notify_resume(). If the _TIF_NOTIFY_RESUME flag is set,
we call tracehook_notify_resume(), which ends up in task_work_run()...

That TIF flag effectively prevents this thread returning to user-space until that task
work has run.
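
The effect of that flag can be sketched with a small user-space model. This is
not kernel code: every model_* name and the flag value are made up for
illustration, and the real task-work list and flag handling are more involved.
It only shows why queued work is guaranteed to run before the "return".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real _TIF_NOTIFY_RESUME value differs. */
#define MODEL_TIF_NOTIFY_RESUME (1u << 0)

struct model_work {
	struct model_work *next;
	void (*func)(struct model_work *work);
	bool done;			/* set by the callback, for illustration */
};

struct model_task {
	unsigned int flags;		/* stand-in for the thread flags */
	struct model_work *works;	/* pending task work, newest first */
	int works_run;			/* how many callbacks have run */
};

/* Example callback: stands in for the recovery work that may need to sleep. */
static void model_recover(struct model_work *work)
{
	work->done = true;
}

/* Queue work and set the flag that holds the thread in the return loop. */
static void model_task_work_add(struct model_task *t, struct model_work *w)
{
	w->next = t->works;
	t->works = w;
	t->flags |= MODEL_TIF_NOTIFY_RESUME;
}

/* Run every queued callback; stand-in for task_work_run(). */
static void model_task_work_run(struct model_task *t)
{
	while (t->works) {
		struct model_work *w = t->works;

		t->works = w->next;
		w->func(w);
		t->works_run++;
	}
}

/* Stand-in for the ret_to_user loop: no return while the flag is set. */
static void model_ret_to_user(struct model_task *t)
{
	while (t->flags & MODEL_TIF_NOTIFY_RESUME) {
		t->flags &= ~MODEL_TIF_NOTIFY_RESUME;
		model_task_work_run(t);
	}
	/* Only here would the thread go back to user-space. */
	assert(t->works == NULL);
}
```

In this model, calling model_task_work_add() from the (modelled) abort handler
and then model_ret_to_user() always runs model_recover() before the loop exits,
with no reschedule in between.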


> BTW, what context does a synchronous external abort run in? I thought it was process-context.

It depends what you interrupted.
32bit had different CPU modes for different contexts, we don't have that in 64bit. Instead
we mask asynchronous interrupts, and tinker with the preempt count to track the context.
Synchronous exceptions can't be masked, so they happen in whatever context you were
already in.
This means the exception handlers have to be prepared for each eventuality.
(which is why that code is starting to look complex)


> Because in_interrupt() returns false when called in do_sea().

If you took the exception from EL0, or EL1 process context, yes. If you took the exception
from an IRQ handler, in_interrupt() would return true.
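
A rough user-space model of that bookkeeping, assuming a simplified count
layout: the model_* names and field widths are illustrative and are not the
kernel's definitions. The point is only that the synchronous handler does not
enter a context of its own; it sees whatever count the interrupted code left.

```c
#include <stdbool.h>

/*
 * Illustrative layout: the low bits count preemption-disable nesting,
 * the fields above them count softirq and hardirq nesting.
 */
#define MODEL_PREEMPT_BITS	8
#define MODEL_SOFTIRQ_BITS	8
#define MODEL_HARDIRQ_BITS	4

#define MODEL_SOFTIRQ_SHIFT	MODEL_PREEMPT_BITS
#define MODEL_HARDIRQ_SHIFT	(MODEL_SOFTIRQ_SHIFT + MODEL_SOFTIRQ_BITS)

#define MODEL_SOFTIRQ_MASK \
	(((1u << MODEL_SOFTIRQ_BITS) - 1) << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK \
	(((1u << MODEL_HARDIRQ_BITS) - 1) << MODEL_HARDIRQ_SHIFT)
#define MODEL_HARDIRQ_OFFSET	(1u << MODEL_HARDIRQ_SHIFT)

/* Stand-in for the per-task preempt count. */
static unsigned int model_preempt_count;

/* Entering and leaving an IRQ handler adjusts the hardirq field. */
static void model_irq_enter(void)
{
	model_preempt_count += MODEL_HARDIRQ_OFFSET;
}

static void model_irq_exit(void)
{
	model_preempt_count -= MODEL_HARDIRQ_OFFSET;
}

/* True if any interrupt-nesting field of the count is non-zero. */
static bool model_in_interrupt(void)
{
	return (model_preempt_count &
		(MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK)) != 0;
}

/*
 * A synchronous exception cannot be masked, so its handler does not touch
 * the count on entry: it inherits the context of whatever it interrupted.
 */
static bool model_do_sea_in_interrupt(void)
{
	return model_in_interrupt();
}
```

Called with the count untouched (process context), model_do_sea_in_interrupt()
returns false; called between model_irq_enter() and model_irq_exit(), it
returns true.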


>> If another thread in the same process accesses the affected memory, I'd expect to take a
>> second external abort. If another process had the page mapped, it could access the
>> affected memory, again taking an external abort.

> Yes, it is hard to prevent another thread from accessing the affected memory.
> I am just worried about the same thread accessing it again.

This is the race that that series fixes.
It can't happen with mainline as the arch code unconditionally signals the affected
process, which was the pre-RAS behaviour.

>> These two could happen while the first CPU was in firmware generating the CPER records, so
>> it's not a race we can fix. It should be harmless, the recovery action is the same, it's
>> just the error counters that count more events than errors. If you actually see it happen,
>> we can try and make it smaller...

> Hmm, maybe this double SEA handling is a solution.

It assumes you get a second external-abort. We know this thread is affected, and will try
and consume the error again if we restart it. We shouldn't restart it until we've given
the recovery our best shot.
Letting it loose is a poor choice if you have any kind of threshold for error-counts. They
may jump NR_CPUs at a time until every CPU is waiting in memory_failure()...


Thanks,

James

