Re: Question about SEA handling process happened in user space

From: James Morse <james.morse@arm.com>
To: Xiaofei Tan <tanxiaofei@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Linuxarm <linuxarm@huawei.com>, Will Deacon <will@kernel.org>,
	Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Shiju Jose <shiju.jose@huawei.com>
Subject: Re: Question about SEA handling process happened in user space
Date: Tue, 31 Mar 2020 18:00:53 +0100
Message-ID: <f9732852-046c-347c-21e1-7690e6b84a50@arm.com>
In-Reply-To: <5E83104A.7020803@huawei.com>

Hi Xiaofei,

On 3/31/20 10:41 AM, Xiaofei Tan wrote:
> On 2020/3/31 0:49, James Morse wrote:
>> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>>> I'm a little confused about how an SEA that happens in user space is handled.
>>
>>> According to the description of the FnV bit of ESR_ELx in the ARMv8.4 spec, FAR is
>>> valid only for a synchronous External abort on a translation table walk.
>>
>>> But in this FAR-valid scenario (attached code, lines 684 to 687),
>>> we send SIGKILL to kill the process. In other scenarios, such as line 680,
>>> FAR is not valid, yet we send SIGBUS and pass the error address to the process so it can try to recover.
>>
>> 'FAR is not valid': it's optional. The ESR_EL1.FnV bit can be set for the 'catch
>> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
>> know what the faulting virtual address is.

>> I'm not quite sure what your question is.
>>
>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>> alternative is to upgrade to SIGKILL in that case.

> Got it. Maybe the description of the FnV bit of ESR_ELx is not quite exact, because
> according to the code, the CPU may still be able to report the address for an SEA that is not on a translation table walk.

It's up to the CPU. If it has a VA for this fault, it can store it in FAR_EL1. If
it doesn't, it can set ESR_EL1.FnV to say the value in FAR_EL1 is UNKNOWN.
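As a minimal user-space sketch of that rule (not the kernel's code; the helper name sea_report_addr() is made up, and it assumes FnV is ISS bit 10 of ESR_ELx), the handler only passes FAR_EL1 on as the signal address when FnV is clear:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: FnV ("FAR not Valid") is bit 10 of the ESR_ELx ISS field. */
#define ESR_ELX_FNV ((uint64_t)1 << 10)

/*
 * Hypothetical helper modelling the choice the SEA handler makes: if FnV
 * is set, the value in FAR_EL1 is UNKNOWN and must not be reported, so
 * the signal carries no address (0). Otherwise FAR_EL1 holds the faulting
 * virtual address and can be passed on as si_addr.
 */
static bool sea_report_addr(uint64_t esr, uint64_t far, uint64_t *si_addr)
{
	if (esr & ESR_ELX_FNV) {
		*si_addr = 0;
		return false;	/* the CPU did not tell us the VA */
	}
	*si_addr = far;
	return true;		/* FAR_EL1 is the faulting VA */
}
```

For example, an ESR of 0x92000410 (FnV set) reports no address, while 0x92000010 reports FAR_EL1 unchanged.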

(These are made-up examples; I don't know how any particular CPU does this.)
For example, the address translation may be the last thing the CPU does. When it
gets an error, it still has the VA on hand, and can report it in FAR_EL1.
Another CPU may do all the address translation early; when it gets an error, all
it has is the physical address, which it can't put in FAR_EL1.

For translation table walks, the CPU obviously has to have the VA on hand to
do the walk, so it's expected to report it.

>> If you see this instead of the address provided via firmware-first, there is a
>> series to improve that here:
>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>
>> (We skip this signal code if APEI promises it did all the work. This lets you
>> take the signal from memory_failure() instead, which may have better information.)
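Roughly, the ordering is: ask APEI first, and only send the architectural signal if it declines. In this sketch claim_sea is a hypothetical stand-in for apei_claim_sea() (assumed to return 0 when firmware-first has described the error), and enum sea_action is made up for illustration; it is not the code in that series:

```c
#include <assert.h>

/* Hypothetical outcomes of handling a synchronous external abort. */
enum sea_action {
	SEA_DEFERRED_TO_APEI,	/* firmware-first claimed it; memory_failure() signals later */
	SEA_SIGNAL_NOW,		/* fall back to the architectural signal path */
};

/*
 * Hypothetical model: claim_sea stands in for apei_claim_sea(). If APEI
 * claims the error, the signal code is skipped entirely.
 */
static enum sea_action handle_sea(int (*claim_sea)(void))
{
	if (claim_sea() == 0)
		return SEA_DEFERRED_TO_APEI;
	return SEA_SIGNAL_NOW;
}

/* Stub claimers for illustration only. */
static int apei_claims(void) { return 0; }
static int apei_declines(void) { return -1; }	/* e.g. no GHES had a record */
```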

> This seems like a great direction.
> I have two concerns.
> 1. memory_failure() is only called for a "memory error section" record. So
> should we use the memory error record for GHES SEA reports? Our platform
> uses the "ARM processor error section".

For what classes of error?

If memory has become corrupted, you should tell the OS about the memory error.

From (my) memory: Linux will just print out 'processor errors', and panic() if
they are marked as fatal. I don't think you can use these to convey a memory
error...
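To illustrate the distinction, here is a simplified model of how GHES records get routed. The enum stands in for the CPER section-type GUIDs, the struct fields and function name are hypothetical, and 4K pages are assumed; it only encodes the behaviour described above, not the real ghes.c logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CPER section-type GUIDs. */
enum cper_section { SEC_MEMORY_ERROR, SEC_ARM_PROCESSOR, SEC_OTHER };

/* Hypothetical, heavily simplified GHES record. */
struct ghes_record {
	enum cper_section type;
	bool fatal;
	bool pa_valid;		/* memory error section carries a physical address */
	uint64_t phys_addr;
};

enum ghes_outcome { OUT_PANIC, OUT_MEMORY_FAILURE, OUT_LOG_ONLY };

/*
 * Fatal records panic; only a memory error section with a valid physical
 * address leads to memory_failure() on that page; processor errors are
 * just printed.
 */
static enum ghes_outcome route_record(const struct ghes_record *r, uint64_t *pfn)
{
	if (r->fatal)
		return OUT_PANIC;
	if (r->type == SEC_MEMORY_ERROR && r->pa_valid) {
		*pfn = r->phys_addr >> 12;	/* assumes 4K pages */
		return OUT_MEMORY_FAILURE;
	}
	return OUT_LOG_ONLY;
}
```

An ARM processor error section never reaches the OUT_MEMORY_FAILURE branch here, which is the point: without a memory error record, there is no page for memory_failure() to act on.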

> 2. Should we define an error source structure for each CPU core in the HEST table?
> If not, there may be a conflict if more than one CPU core takes an SEA.

This is a question for the people who wrote your firmware.
For firmware first, you must have set SCR_EL3.EA. What does your firmware do if
two CPUs take an external abort at the same time?

Each CPU having its own area to read/write CPER would mean you need one
NOTIFY_SEA entry in the HEST for each area ... but how does the OS know which
CPU is which?

I think it's better for there to be one area for CPER. If a second CPU takes an
external abort while the first is processing it, it has to be held in firmware
until the GHESv2 ACK says the area is no longer in use.
This way firmware guarantees the CPU taking the emulated external abort will
always find its records in the CPER area.
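A sketch of that single-area scheme from the firmware side, assuming C11 atomics; struct cper_area and the function names are hypothetical, and cper_area_ack() stands in for the OS writing the GHESv2 read-ack register:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the one shared CPER area (records omitted). */
struct cper_area {
	atomic_bool busy;	/* true from claim until the OS acks */
};

/* Try to take ownership of the area before writing CPER records. */
static bool cper_area_try_claim(struct cper_area *a)
{
	bool expected = false;
	return atomic_compare_exchange_strong(&a->busy, &expected, true);
}

/* A second CPU is held in firmware here until the first one is acked. */
static void cper_area_claim(struct cper_area *a)
{
	while (!cper_area_try_claim(a))
		;	/* spin; real firmware would use its own wait mechanism */
}

/* Models the OS writing the GHESv2 read-ack register. */
static void cper_area_ack(struct cper_area *a)
{
	atomic_store(&a->busy, false);
}
```

Because only one CPU at a time can own the area, whichever CPU the OS sees taking the emulated external abort is guaranteed to find its own records there.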

>> If it's the SIGKILL entries: these are for the translation table walk.
>> There is no point telling user-space about corruption in its page tables as it
>> can't do anything about it. The kernel's handling of this is to kill the
>> process. (Page tables make up a very small amount of memory, so this should be
>> rarer than the regular 'external abort' case.)
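The split between the two cases comes from the fault-status code in ESR_ELx. A sketch of that mapping, assuming DFSC is ISS bits [5:0] and using the encodings as we read them from the Arm ARM (this is not the kernel's fault_info table, and sea_signal_for() is a made-up name):

```c
#define _POSIX_C_SOURCE 200809L	/* for SIGBUS/SIGKILL in <signal.h> */
#include <assert.h>
#include <signal.h>
#include <stdint.h>

#define ESR_DFSC_MASK 0x3fu	/* assumption: DFSC is ISS bits [5:0] */

/*
 *   0x10        synchronous external abort          -> SIGBUS
 *   0x14..0x17  external abort on TTW, level 0-3    -> SIGKILL
 *   0x18        synchronous parity/ECC error        -> SIGBUS
 *   0x1c..0x1f  parity/ECC on TTW, level 0-3        -> SIGKILL
 * Returns 0 for status codes this sketch does not treat as an SEA.
 */
static int sea_signal_for(uint64_t esr)
{
	unsigned int fsc = (unsigned int)(esr & ESR_DFSC_MASK);

	if (fsc == 0x10 || fsc == 0x18)
		return SIGBUS;		/* user space may be able to recover */
	if ((fsc >= 0x14 && fsc <= 0x17) || (fsc >= 0x1c && fsc <= 0x1f))
		return SIGKILL;		/* page tables corrupt: nothing user space can do */
	return 0;
}
```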

> Hmm, then it is useless for the CPU to record the address for these entries.

An OS that is better than Linux may use FAR_EL1 to handle these errors!

Linux doesn't, because user-space memory can be re-mapped by another CPU. We need
to know the affected physical address in order to handle the error, but can't
know what that was if a remote CPU remapped the page between us taking the
external abort and do_sea() starting to walk the page tables with FAR_EL1.
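A toy model of that race (not kernel code; the one-level page_table array and the function names are hypothetical) shows why walking the tables with FAR_EL1 after the fact can find the wrong physical page:

```c
#include <assert.h>
#include <stdint.h>

#define NR_PAGES 4

/* Toy one-level "page table": virtual page number -> physical page number. */
static uint64_t page_table[NR_PAGES];

static uint64_t walk(uint64_t vpn)
{
	assert(vpn < NR_PAGES);
	return page_table[vpn];
}

/*
 * Replays the race: the abort is taken on a VA whose page is corrupt,
 * then another CPU remaps that VA before the handler walks the tables.
 * Returns the pfn the handler would find.
 */
static uint64_t pfn_found_after_remap(uint64_t vpn, uint64_t bad_pfn, uint64_t new_pfn)
{
	assert(vpn < NR_PAGES);
	page_table[vpn] = bad_pfn;	/* mapping when the external abort is taken */
	page_table[vpn] = new_pfn;	/* remote CPU remaps the page */
	return walk(vpn);		/* handler walks with FAR_EL1, too late */
}
```

Here pfn_found_after_remap(1, 100, 200) returns 200, not the corrupted page 100, so handling the error via the VA would act on the wrong page.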

Firmware-first's memory-error description gives us the physical address if
firmware can learn it by imp-def means. The v8.2 RAS extensions give us an ERRxADDR
register that holds the physical address.
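For the RAS-extensions route, a sketch of pulling the physical address out of an error record, assuming the field positions below (ERR<n>STATUS.AV at bit 31; ERR<n>ADDR.AI at bit 61, VA at bit 60 and PADDR in bits [55:0]); check these against the RAS specification, and note that ras_record_paddr() is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions; verify against the RAS spec revision in use. */
#define ERR_STATUS_AV		((uint64_t)1 << 31)	/* ERR<n>STATUS: address valid */
#define ERR_ADDR_AI		((uint64_t)1 << 61)	/* ERR<n>ADDR: address incorrect */
#define ERR_ADDR_VA		((uint64_t)1 << 60)	/* ERR<n>ADDR: value is a VA */
#define ERR_ADDR_PADDR_MASK	(((uint64_t)1 << 56) - 1)	/* PADDR, bits [55:0] */

/*
 * Hypothetical helper: extract a physical address usable for
 * memory_failure()-style handling, or return false if the record
 * does not carry a trustworthy physical address.
 */
static bool ras_record_paddr(uint64_t status, uint64_t addr, uint64_t *pa)
{
	if (!(status & ERR_STATUS_AV))
		return false;		/* no address recorded */
	if (addr & (ERR_ADDR_AI | ERR_ADDR_VA))
		return false;		/* not a reliable physical address */
	*pa = addr & ERR_ADDR_PADDR_MASK;	/* drop NS/SI/AI/VA flag bits */
	return true;
}
```

Unlike FAR_EL1, this address does not depend on the current page-table contents, so a remote remap cannot change the answer.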

Thanks,

James
