All of lore.kernel.org
 help / color / mirror / Atom feed
* Question about SEA handling process happened in user space
@ 2020-03-30 13:10 Xiaofei Tan
  2020-03-30 16:49 ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-03-30 13:10 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, James Morse, Dave Martin
  Cc: Linuxarm, linux-arm-kernel, Shiju Jose

Hi ALL

I'm a little confused about the handling process of SEA happened in user space.
Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
valid only for synchronous External abort on a translation table walk.

But for this FAR valid scenario(attached code from line 684 to 687),
we send signal SIGKILL to kill process. For some other scenario, such as line 680,
FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.

should it be an problem ?

FnV, bit [10]
	FAR not Valid, for a synchronous External abort other than a synchronous External abort on a
	translation table walk.
	0b0	FAR is valid.
	0b1	FAR is not valid, and holds an UNKNOWN value.
	This field is only valid if the IFSC code is 0b010000. It is RES0 for all other aborts.
	This field resets to an architecturally UNKNOWN value.

arch/arm64/mm/fault.c
663 static const struct fault_info fault_info[] = {
664         { do_bad,               SIGKILL, SI_KERNEL,     "ttbr address size fault"       },
665         { do_bad,               SIGKILL, SI_KERNEL,     "level 1 address size fault"    },
666         { do_bad,               SIGKILL, SI_KERNEL,     "level 2 address size fault"    },
667         { do_bad,               SIGKILL, SI_KERNEL,     "level 3 address size fault"    },
668         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 0 translation fault"     },
669         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 1 translation fault"     },
670         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 2 translation fault"     },
671         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 3 translation fault"     },
672         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 8"                     },
673         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault"     },
674         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault"     },
675         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 3 access flag fault"     },
676         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 12"                    },
677         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 1 permission fault"      },
678         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 permission fault"      },
679         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 3 permission fault"      },
680         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
681         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 17"                    },
682         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 18"                    },
683         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 19"                    },
684         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
685         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
686         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
687         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
688         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
689         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 25"                    },
690         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 26"                    },
691         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 27"                    },
692         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
693         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
694         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
695         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
696         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-03-30 13:10 Question about SEA handling process happened in user space Xiaofei Tan
@ 2020-03-30 16:49 ` James Morse
  2020-03-31  9:41   ` Xiaofei Tan
  2020-04-02  6:35   ` Xiaofei Tan
  0 siblings, 2 replies; 18+ messages in thread
From: James Morse @ 2020-03-30 16:49 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 3/30/20 2:10 PM, Xiaofei Tan wrote:
> I'm a little confused about the handling process of SEA happened in user space.

> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
> valid only for synchronous External abort on a translation table walk.

> But for this FAR valid scenario(attached code from line 684 to 687),
> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.

'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
all' external abort fault-status-code. This lets the CPU tell us that it doesn't
know what the faulting virtual address is.

do_sea() checks for this:
|	if (esr & ESR_ELx_FnV)
|		siaddr = NULL;
|	else
|		siaddr  = (void __user *)addr;

If we can't know the address, there isn't much we can do.
(This check is really making sure we don't pass junk to user-space when FnV is set)


> should it be an problem ?

I'm not quite sure what your question is.

If the CPU doesn't tell us the address, we can't tell user-space what it is. The
alternative is to upgrade to SIGKILL in that case.


If you see this instead of the address provided via firmware-first, there is a
series to improve that here:
https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/

(We skip this signal code of APEI promises it did all the work. This lets you
take the signal from memory_failure() instead, which may have better information.)


If its the SIGKILL entries: these are for the translation table walk.
There is no point telling user-space about corruption in its page tables as it
can't do anything about it. The kernel's handling of this is to kill the
process. (page tables make up a very small amount of memory, so this should be
rarer than the regular 'external abort' case)


Thanks,

James



> 680         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
> 684         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
> 685         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
> 686         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
> 687         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
> 688         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
> 692         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
> 693         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
> 694         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
> 695         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
> 696         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-03-30 16:49 ` James Morse
@ 2020-03-31  9:41   ` Xiaofei Tan
  2020-03-31 17:00     ` James Morse
  2020-04-02  6:35   ` Xiaofei Tan
  1 sibling, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-03-31  9:41 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

Thanks for kindly reply.

On 2020/3/31 0:49, James Morse wrote:
> Hi Xiaofei,
> 
> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>> I'm a little confused about the handling process of SEA happened in user space.
> 
>> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
>> valid only for synchronous External abort on a translation table walk.
> 
>> But for this FAR valid scenario(attached code from line 684 to 687),
>> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
>> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.
> 
> 'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
> know what the faulting virtual address is.
> 
> do_sea() checks for this:
> |	if (esr & ESR_ELx_FnV)
> |		siaddr = NULL;
> |	else
> |		siaddr  = (void __user *)addr;
> 
> If we can't know the address, there isn't much we can do.
> (This check is really making sure we don't pass junk to user-space when FnV is set)
> 
> 

OK. So even if FAR is not valid, we still send SIGBUS for SEA, not on translation table walk, but
set the addr to NULL here.

>> should it be an problem ?
> 
> I'm not quite sure what your question is.
> 
> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
> alternative is to upgrade to SIGKILL in that case.
> 

Got it. May be the description of FnV bit of register ESR_ELx is not quite exactly. Because
following the code, CPU may still have an chance to tell the address for SEA, not on translation table walk.

> 
> If you see this instead of the address provided via firmware-first, there is a
> series to improve that here:
> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
> 
> (We skip this signal code of APEI promises it did all the work. This lets you
> take the signal from memory_failure() instead, which may have better information.)
> 

This should be an great direction.
I have two concerns.
1.memory_failure() is only called for "memory error section" record. Then
should we use this memory record for ghes sea report? Our platform is
using "ARM processor error section".
2.Should we define an error source structure for each cpu core in HEST table?
If not, there may be conflict if more than one cpu core fall into SEA.

> 
> If its the SIGKILL entries: these are for the translation table walk.
> There is no point telling user-space about corruption in its page tables as it
> can't do anything about it. The kernel's handling of this is to kill the
> process. (page tables make up a very small amount of memory, so this should be
> rarer than the regular 'external abort' case)
> 

Hmm, then it is useless that CPU record address for this entries.

> 
> Thanks,
> 
> James
> 
> 
> 
>> 680         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
>> 684         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
>> 685         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
>> 686         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
>> 687         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
>> 688         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
>> 692         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 693         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 694         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 695         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 696         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-03-31  9:41   ` Xiaofei Tan
@ 2020-03-31 17:00     ` James Morse
  2020-04-01  3:49       ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-03-31 17:00 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 3/31/20 10:41 AM, Xiaofei Tan wrote:
> On 2020/3/31 0:49, James Morse wrote:
>> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>>> I'm a little confused about the handling process of SEA happened in user space.
>>
>>> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
>>> valid only for synchronous External abort on a translation table walk.
>>
>>> But for this FAR valid scenario(attached code from line 684 to 687),
>>> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
>>> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.
>>
>> 'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
>> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
>> know what the faulting virtual address is.

>> I'm not quite sure what your question is.
>>
>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>> alternative is to upgrade to SIGKILL in that case.

> Got it. May be the description of FnV bit of register ESR_ELx is not quite exactly. Because
> following the code, CPU may still have an chance to tell the address for SEA, not on translation table walk.

Its up to the CPU. If it has a VA for this fault, it can store it in FAR_EL1. If
it doesn't, it can set ESR_EL1.FnV to say the value in FAR_EL1 is UNKNOWN.

(these are some made up examples, I don't know how any particular CPU does this...)
For example, the address translation may be the last thing the CPU does. When it
gets an error, it still has the VA address on hand, and can report it in FAR_EL1.
Another CPU may do all the address translation early, when it gets an error, all
it has is the physical address, which it can't put in FAR_EL1.

For the translation table walks, the CPU obviously has to have the VA on hand to
do the walk, so its expected to report it.


>> If you see this instead of the address provided via firmware-first, there is a
>> series to improve that here:
>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>
>> (We skip this signal code of APEI promises it did all the work. This lets you
>> take the signal from memory_failure() instead, which may have better information.)

> This should be an great direction.
> I have two concerns.
> 1.memory_failure() is only called for "memory error section" record. Then
> should we use this memory record for ghes sea report? Our platform is
> using "ARM processor error section".

For what classes of error?

If memory has become corrupted, you should tell the OS about the memory error.

From (my) memory: linux will just print out 'processor errors', and panic() if
they are marked as fatal. I don't think you can use these to convey a memory
error...


> 2.Should we define an error source structure for each cpu core in HEST table?
> If not, there may be conflict if more than one cpu core fall into SEA.

This is a question for the people who wrote your firmware.
For firmware first, you must have set SCR_EL3.EA. What does your firmware do if
two CPUs take an external abort at the same time?

Each CPU having its own area to read/write CPER would mean you need one
NOTIFY_SEA entry in the HEST for each area ... but how does the OS know which
CPU is which?

I think its better for there to be one area for CPER. If a second CPU takes an
external abort while the first is processing it, it has to be held in firmware
until the GHESv2 ACK says the area is no longer in use.
This way firmware guarantees the CPU taking the emulated external abort, will
always find its records in the CPER area.


>> If its the SIGKILL entries: these are for the translation table walk.
>> There is no point telling user-space about corruption in its page tables as it
>> can't do anything about it. The kernel's handling of this is to kill the
>> process. (page tables make up a very small amount of memory, so this should be
>> rarer than the regular 'external abort' case)

> Hmm, then it is useless that CPU record address for this entries.

An OS that is better than linux may use FAR_EL1 to handle these errors!

Linux doesn't because user-space memory can be re-mapped by another CPU. We need
to know the affected physical-address in order to handle the error, but can't
know what that was if a remote CPU remapped the page between us taking the
external abort, and do_sea() starting to walk the page-tables with FAR_EL1.

Firmware-first's memory-error description gives us the physical address if
firmware can learn it by imp-def means. v8.2 RAS extensions gives us an ERRxADDR
register that holds the physical-address.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-03-31 17:00     ` James Morse
@ 2020-04-01  3:49       ` Xiaofei Tan
  2020-04-07 16:37         ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-01  3:49 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James

On 2020/4/1 1:00, James Morse wrote:
> On 3/31/20 10:41 AM, Xiaofei Tan wrote:
>> On 2020/3/31 0:49, James Morse wrote:
>>> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>>>> I'm a little confused about the handling process of SEA happened in user space.
>>>
>>>> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
>>>> valid only for synchronous External abort on a translation table walk.
>>>
>>>> But for this FAR valid scenario(attached code from line 684 to 687),
>>>> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
>>>> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.
>>>
>>> 'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
>>> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
>>> know what the faulting virtual address is.
> 
>>> I'm not quite sure what your question is.
>>>
>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>> alternative is to upgrade to SIGKILL in that case.
> 
>> Got it. May be the description of FnV bit of register ESR_ELx is not quite exactly. Because
>> following the code, CPU may still have an chance to tell the address for SEA, not on translation table walk.
> 
> Its up to the CPU. If it has a VA for this fault, it can store it in FAR_EL1. If
> it doesn't, it can set ESR_EL1.FnV to say the value in FAR_EL1 is UNKNOWN.
> 
> (these are some made up examples, I don't know how any particular CPU does this...)
> For example, the address translation may be the last thing the CPU does. When it
> gets an error, it still has the VA address on hand, and can report it in FAR_EL1.
> Another CPU may do all the address translation early, when it gets an error, all
> it has is the physical address, which it can't put in FAR_EL1.
> 

OK.

> For the translation table walks, the CPU obviously has to have the VA on hand to
> do the walk, so its expected to report it.
> 

OK.

> 
>>> If you see this instead of the address provided via firmware-first, there is a
>>> series to improve that here:
>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>
>>> (We skip this signal code of APEI promises it did all the work. This lets you
>>> take the signal from memory_failure() instead, which may have better information.)
> 
>> This should be an great direction.
>> I have two concerns.
>> 1.memory_failure() is only called for "memory error section" record. Then
>> should we use this memory record for ghes sea report? Our platform is
>> using "ARM processor error section".
> 
> For what classes of error?
> 

Both processor cache ecc error and memory error (marked by poison) can lead to SEA.

> If memory has become corrupted, you should tell the OS about the memory error.
> 
>>From (my) memory: linux will just print out 'processor errors', and panic() if
> they are marked as fatal. I don't think you can use these to convey a memory
> error...
> 

OK. Then firmware should detect error source. If it is processor cache error,
we use "ARM processor error section", else if it is memory error, we use "memory error section".

Normally, we report memory error only from RAS node of DDRC or HHA module. For SEA,
It is a little strange to report as memory error when collect errors from processor RAS node.

> 
>> 2.Should we define an error source structure for each cpu core in HEST table?
>> If not, there may be conflict if more than one cpu core fall into SEA.
> 
> This is a question for the people who wrote your firmware.
> For firmware first, you must have set SCR_EL3.EA. What does your firmware do if
> two CPUs take an external abort at the same time?

Will block the second one until first SEA finished and error source of HEST table free.

> 
> Each CPU having its own area to read/write CPER would mean you need one
> NOTIFY_SEA entry in the HEST for each area ... but how does the OS know which
> CPU is which?

Yes, OS don't know this.
So, it is ok to share the only one area for all CPUs.

> 
> I think its better for there to be one area for CPER. If a second CPU takes an
> external abort while the first is processing it, it has to be held in firmware
> until the GHESv2 ACK says the area is no longer in use.
> This way firmware guarantees the CPU taking the emulated external abort, will
> always find its records in the CPER area.
> 

Agree.

> 
>>> If its the SIGKILL entries: these are for the translation table walk.
>>> There is no point telling user-space about corruption in its page tables as it
>>> can't do anything about it. The kernel's handling of this is to kill the
>>> process. (page tables make up a very small amount of memory, so this should be
>>> rarer than the regular 'external abort' case)
> 
>> Hmm, then it is useless that CPU record address for this entries.
> 
> An OS that is better than linux may use FAR_EL1 to handle these errors!
> 
> Linux doesn't because user-space memory can be re-mapped by another CPU. We need
> to know the affected physical-address in order to handle the error, but can't
> know what that was if a remote CPU remapped the page between us taking the
> external abort, and do_sea() starting to walk the page-tables with FAR_EL1.
> 
> Firmware-first's memory-error description gives us the physical address if
> firmware can learn it by imp-def means. v8.2 RAS extensions gives us an ERRxADDR
> register that holds the physical-address.
> 

Got it, thanks.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-03-30 16:49 ` James Morse
  2020-03-31  9:41   ` Xiaofei Tan
@ 2020-04-02  6:35   ` Xiaofei Tan
  2020-04-07 16:37     ` James Morse
  1 sibling, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-02  6:35 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/3/31 0:49, James Morse wrote:
> Hi Xiaofei,
> 
> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>> I'm a little confused about the handling process of SEA happened in user space.
> 
>> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
>> valid only for synchronous External abort on a translation table walk.
> 
>> But for this FAR valid scenario(attached code from line 684 to 687),
>> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
>> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.
> 
> 'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
> know what the faulting virtual address is.
> 
> do_sea() checks for this:
> |	if (esr & ESR_ELx_FnV)
> |		siaddr = NULL;
> |	else
> |		siaddr  = (void __user *)addr;
> 
> If we can't know the address, there isn't much we can do.
> (This check is really making sure we don't pass junk to user-space when FnV is set)
> 
> 
>> should it be an problem ?
> 
> I'm not quite sure what your question is.
> 
> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
> alternative is to upgrade to SIGKILL in that case.
> 
> 
> If you see this instead of the address provided via firmware-first, there is a
> series to improve that here:
> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
> 
> (We skip this signal code of APEI promises it did all the work. This lets you
> take the signal from memory_failure() instead, which may have better information.)
> 

There may be an competition issue.
APEI run memory_failure() in an bottom half for memory errors. Then it may be not finished
before here SEA handling end, and application process may back to run.

> 
> If its the SIGKILL entries: these are for the translation table walk.
> There is no point telling user-space about corruption in its page tables as it
> can't do anything about it. The kernel's handling of this is to kill the
> process. (page tables make up a very small amount of memory, so this should be
> rarer than the regular 'external abort' case)
> 
> 
> Thanks,
> 
> James
> 
> 
> 
>> 680         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
>> 684         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
>> 685         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
>> 686         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
>> 687         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
>> 688         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
>> 692         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 693         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 694         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 695         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 696         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-01  3:49       ` Xiaofei Tan
@ 2020-04-07 16:37         ` James Morse
  2020-04-09  8:42           ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-07 16:37 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 01/04/2020 04:49, Xiaofei Tan wrote:
> On 2020/4/1 1:00, James Morse wrote:
>> On 3/31/20 10:41 AM, Xiaofei Tan wrote:
>>> 1.memory_failure() is only called for "memory error section" record. Then
>>> should we use this memory record for ghes sea report? Our platform is
>>> using "ARM processor error section".

>> For what classes of error?

> Both processor cache ecc error and memory error (marked by poison) can lead to SEA.

These are the errors that can cause the hardware to notify software via external abort.
For which classes of error does your firmware then use a 'processor error'?

It sounds like you assume everything reported in the CPU records must be a processor
error, and everything reported by the memory error must be a memory error.


(digression: this isn't really true!
The CPU could report that it read poison from memory. Is this a memory error, or a CPU
error? Equally the memory controller could report that a PCIe device wrote to a
not-present DIMM. Is this a memory error?)


>> If memory has become corrupted, you should tell the OS about the memory error.
>>
>> >From (my) memory: linux will just print out 'processor errors', and panic() if
>> they are marked as fatal. I don't think you can use these to convey a memory
>> error...
>>
> 
> OK. Then firmware should detect error source. If it is processor cache error,
> we use "ARM processor error section", else if it is memory error, we use "memory error section".

> Normally, we report memory error only from RAS node of DDRC or HHA module. For SEA,

Do you have patches to get linux to do something useful with the processor error nodes?

We'd need it to handle uncorrected cache errors with a physical address, as if they were
memory errors...

A virtual address is no-use as the memory may have been re-mapped in the meantime.


> It is a little strange to report as memory error when collect errors from processor
> RAS node.

Its pragmatic: today linux ignores the processor errors.
If you suffer a cache error, the memory that backed that cached location is now also
corrupt, as you've lost the writes that made the cache-line dirty.

If you can describe this memory corruption, without treating it as 'the error' then an OS
that doesn't know about the process error sections will still do the right thing. (i.e.
leave out the device/row/rank stuff to avoid it being attributed to a DIMM)

The downside is you have fake memory errors when nothing bad happened to the DIMM. These
should be uniform, and smaller than the errors actually occurring at the DIMM.

I've no idea if patches adding support for the processor error nodes would be considered
for stable.


>>> 2.Should we define an error source structure for each cpu core in HEST table?
>>> If not, there may be conflict if more than one cpu core fall into SEA.
>>
>> This is a question for the people who wrote your firmware.
>> For firmware first, you must have set SCR_EL3.EA. What does your firmware do if
>> two CPUs take an external abort at the same time?
> 
> Will block the second one until first SEA finished and error source of HEST table free.

Okay, so one 'SEA' entry in the HEST describes the single region that CPER will be written to.


>> Each CPU having its own area to read/write CPER would mean you need one
>> NOTIFY_SEA entry in the HEST for each area ... but how does the OS know which
>> CPU is which?
> 
> Yes, OS don't know this.
> So, it is ok to share the only one area for all CPUs.

Yes, as there is no way to pair the memory with the CPUs.

If there is more than one region, then each CPU taking an external abort will walk the
list, checking each one.

Its up to firmware to ensure this is serialised. Sounds like you've got this sorted.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-02  6:35   ` Xiaofei Tan
@ 2020-04-07 16:37     ` James Morse
  2020-04-09  9:17       ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-07 16:37 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 02/04/2020 07:35, Xiaofei Tan wrote:
> On 2020/3/31 0:49, James Morse wrote:
>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>> alternative is to upgrade to SIGKILL in that case.
>>
>>
>> If you see this instead of the address provided via firmware-first, there is a
>> series to improve that here:
>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>
>> (We skip this signal code of APEI promises it did all the work. This lets you
>> take the signal from memory_failure() instead, which may have better information.)

> There may be an competition issue.
> APEI run memory_failure() in an bottom half for memory errors. Then it may be not finished
> before here SEA handling end, and application process may back to run.

I'm not sure what you mean by 'bottom half', isn't this a softirq term?

With that series, it runs in process-context as task-work. memory_failure() needs to
sleep, so it has to run in process-context. Doing it as task-work means it runs before the
thread returns to user-space.


If another thread in the same process accesses the affected memory, I'd expect to take a
second external abort. If another process had the page mapped, it could access the
affected memory, again taking an external abort.

These two could happen while the first CPU was in firmware generating the CPER records, so
its not a race we can fix. It should be harmless, the recovery action is the same, its
just the error counters that count more events than errors. If you actually see it happen,
we can try and make it smaller...


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-07 16:37         ` James Morse
@ 2020-04-09  8:42           ` Xiaofei Tan
  2020-04-09 14:28             ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-09  8:42 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/4/8 0:37, James Morse wrote:
> On 01/04/2020 04:49, Xiaofei Tan wrote:
>> On 2020/4/1 1:00, James Morse wrote:
>>> On 3/31/20 10:41 AM, Xiaofei Tan wrote:
>>>> 1.memory_failure() is only called for "memory error section" record. Then
>>>> should we use this memory record for ghes sea report? Our platform is
>>>> using "ARM processor error section".
> 
>>> For what classes of error?
> 
>> Both processor cache ecc error and memory error (marked by poison) can lead to SEA.
> 
> These are the errors that can cause the hardware to notify software via external abort.
> For which classes of error does your firmware then use a 'processor error'?
> 

For all processor generation and consumption errors.

> It sounds like you assume everything reported in the CPU records must be a processor
> error, and everything reported by the memory error must be a memory error.
> 

Not exactly, we use processor error section for everything reported in the CPU records, and
the memory is similar. Though it just consumed errors from other node. So, the problem is what
kinds of error section should we use for error consumption case.

> 
> (digression: this isn't really true!
> The CPU could report that it read poison from memory. Is this a memory error, or a CPU
> error? Equally the memory controller could report that a PCIe device wrote to a
> not-present DIMM. Is this a memory error?)
> 

Yes, this is error consumption case.

> 
>>> If memory has become corrupted, you should tell the OS about the memory error.
>>>
>>> >From (my) memory: linux will just print out 'processor errors', and panic() if
>>> they are marked as fatal. I don't think you can use these to convey a memory
>>> error...
>>>
>>
>> OK. Then firmware should detect error source. If it is processor cache error,
>> we use "ARM processor error section", else if it is memory error, we use "memory error section".
> 
>> Normally, we report memory error only from RAS node of DDRC or HHA module. For SEA,
> 
> Do you have patches to get linux to do something useful with the processor error nodes?
> 
> We'd need it to handle uncorrected cache errors with a physical address, as if they were
> memory errors...
> 

Yes, we have some patches to do this thing inside. Then memory_failure() will be called for
arm processor error section when physical address is available.

> A virtual address is no-use as the memory may have been re-mapped in the meantime.
> 

Right.

> 
>> It is a little strange to report as memory error when collect errors from processor
>> RAS node.
> 
> Its pragmatic: today linux ignores the processor errors.
> If you suffer a cache error, the memory that backed that cached location is now also
> corrupt, as you've lost the writes that made the cache-line dirty.
> 
> If you can describe this memory corruption, without treating it as 'the error' then an OS
> that doesn't know about the process error sections will still do the right thing. (i.e.
> leave out the device/row/rank stuff to avoid it being attributed to a DIMM)
> 
> The downside is you have fake memory errors when nothing bad happened to the DIMM. These
> should be uniform, and smaller than the errors actually occurring at the DIMM.
> 

agree.

> I've no idea if patches adding support for the processor error nodes would be considered
> for stable.
> 

I think this part is worth improving.
BTW, should ARM processor record physical address when consumed an memory poison error for SEA?
It is helpful to do error recovery. Is this mandatory for arm spec?

> 
>>>> 2.Should we define an error source structure for each cpu core in HEST table?
>>>> If not, there may be conflict if more than one cpu core fall into SEA.
>>>
>>> This is a question for the people who wrote your firmware.
>>> For firmware first, you must have set SCR_EL3.EA. What does your firmware do if
>>> two CPUs take an external abort at the same time?
>>
>> Will block the second one until first SEA finished and error source of HEST table free.
> 
> Okay, so one 'SEA' entry in the HEST describes the single region that CPER will be written to.
> 
> 
>>> Each CPU having its own area to read/write CPER would mean you need one
>>> NOTIFY_SEA entry in the HEST for each area ... but how does the OS know which
>>> CPU is which?
>>
>> Yes, OS don't know this.
>> So, it is ok to share the only one area for all CPUs.
> 
> Yes, as there is no way to pair the memory with the CPUs.
> 
> If there is more than one region, then each CPU taking an external abort will walk the
> list, checking each one.
> 
> Its up to firmware to ensure this is serialised. Sounds like you've got this sorted.
> 

Yes

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-07 16:37     ` James Morse
@ 2020-04-09  9:17       ` Xiaofei Tan
  2020-04-09 14:28         ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-09  9:17 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/4/8 0:37, James Morse wrote:
> On 02/04/2020 07:35, Xiaofei Tan wrote:
>> On 2020/3/31 0:49, James Morse wrote:
>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>> alternative is to upgrade to SIGKILL in that case.
>>>
>>>
>>> If you see this instead of the address provided via firmware-first, there is a
>>> series to improve that here:
>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>
>>> (We skip this signal code of APEI promises it did all the work. This lets you
>>> take the signal from memory_failure() instead, which may have better information.)
> 
>> There may be an competition issue.
>> APEI run memory_failure() in an bottom half for memory errors. Then it may be not finished
>> before here SEA handling end, and application process may back to run.
> 
> I'm not sure what you mean by 'bottom half', isn't this a softirq term?
> 

I mean the bottom half of interrupt. Of course, this is SEA, but similar.

> With that series, it runs in process-context as task-work. memory_failure() needs to
> sleep, so it has to run in process-context. 


> Doing it as task-work means it runs before the thread returns to user-space.

Sorry, i don't understand this. i thought the task-work need to reschedule, and current thread should
have returned to user-space before it.


BTW, What context synchronous exception abort is? I thought it was process-context.
Because in_interrupt() return false called in do_sea().

> 
> If another thread in the same process accesses the affected memory, I'd expect to take a
> second external abort. If another process had the page mapped, it could access the
> affected memory, again taking an external abort.
> 

Yes, it is hard to avoid another thread to access the affected memory.
I just worry the same thread access it again.

> These two could happen while the first CPU was in firmware generating the CPER records, so
> its not a race we can fix. It should be harmless, the recovery action is the same, its
> just the error counters that count more events than errors. If you actually see it happen,
> we can try and make it smaller...
> 

Hmm, maybe this double SEA handling is an solution.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-09  8:42           ` Xiaofei Tan
@ 2020-04-09 14:28             ` James Morse
  2020-04-10  2:55               ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-09 14:28 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 09/04/2020 09:42, Xiaofei Tan wrote:
> James Morse wrote:
>> Do you have patches to get linux to do something useful with the processor error nodes?
>>
>> We'd need it to handle uncorrected cache errors with a physical address, as if they were
>> memory errors...

> Yes, we have some patches to do this thing inside. Then memory_failure() will be called for
> arm processor error section when physical address is available.

I look forward to reading them!

[...]

> I think this part is worth improving.

> BTW, should ARM processor record physical address when consumed an memory poison error for SEA?
> It is helpful to do error recovery. Is this mandatory for arm spec?

ERR<n>ADDR? Its not mandatory to be filled for any error. It can be some imp-def bus
address or a virtual address. It can also be left out if ERR<b>STATUS.AV says its not valid.

This is really a question for your hardware people. Does your implementation always give a
physical-address for a synchronous external abort?


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-09  9:17       ` Xiaofei Tan
@ 2020-04-09 14:28         ` James Morse
  2020-04-10  9:43           ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-09 14:28 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 09/04/2020 10:17, Xiaofei Tan wrote:
> On 2020/4/8 0:37, James Morse wrote:
>> On 02/04/2020 07:35, Xiaofei Tan wrote:
>>> On 2020/3/31 0:49, James Morse wrote:
>>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>>> alternative is to upgrade to SIGKILL in that case.
>>>>
>>>>
>>>> If you see this instead of the address provided via firmware-first, there is a
>>>> series to improve that here:
>>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>>
>>>> (We skip this signal code of APEI promises it did all the work. This lets you
>>>> take the signal from memory_failure() instead, which may have better information.)
>>
>>> There may be an competition issue.
>>> APEI run memory_failure() in an bottom half for memory errors. Then it may be not finished
>>> before here SEA handling end, and application process may back to run.

>> With that series, it runs in process-context as task-work. memory_failure() needs to
>> sleep, so it has to run in process-context. 
> 
> 
>> Doing it as task-work means it runs before the thread returns to user-space.
> 
> Sorry, i don't understand this. i thought the task-work need to reschedule, and current thread should
> have returned to user-space before it.

ret_to_user has a loop around do_notify_resume(), if the _TIF_NOTIFY_RESUME flag is set
and we call tracehook_notify_resume() which ends up in task_work_run()...

That TIF flag effectively prevents this thread returning to user-space until that task
work has run.


> BTW, What context synchronous exception abort is? I thought it was process-context.

It depends what you interrupted.
32bit had different CPU modes for different contexts, we don't have that in 64bit. Instead
we mask asynchronous interrupts, and tinker with the preempt count to track the context.
Synchronous exceptions can't be masked, so they happen in whatever context you were
already in.
This means the exception handlers have to be be prepared for each eventuality.
(which is why that code is starting to look complex)


> Because in_interrupt() return false called in do_sea().

If you took the exception from EL0, or EL1 process context, yes. If you took the exception
from an IRQ handler, in_interrupt() would return true.


>> If another thread in the same process accesses the affected memory, I'd expect to take a
>> second external abort. If another process had the page mapped, it could access the
>> affected memory, again taking an external abort.

> Yes, it is hard to avoid another thread to access the affected memory.
> I just worry the same thread access it again.

This is the race that that series fixes.
It can't happen with mainline as the arch code unconditionally signals the affected
process, which was the pre-RAS behaviour.

>> These two could happen while the first CPU was in firmware generating the CPER records, so
>> its not a race we can fix. It should be harmless, the recovery action is the same, its
>> just the error counters that count more events than errors. If you actually see it happen,
>> we can try and make it smaller...

> Hmm, maybe this double SEA handling is an solution.

It assumes you get a second external-abort. We know this thread is affected, and will try
and consume the error again if we restart it. We shouldn't restart it until we've given
the recovery our best shot.
Letting it loose is a poor choice if you have any kind of threshold for error-counts. They
may jump NR_CPUs at a time until every CPU is waiting in memory_failure()...


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-09 14:28             ` James Morse
@ 2020-04-10  2:55               ` Xiaofei Tan
  2020-04-16 13:27                 ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-10  2:55 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/4/9 22:28, James Morse wrote:
> On 09/04/2020 09:42, Xiaofei Tan wrote:
>> James Morse wrote:
>>> Do you have patches to get linux to do something useful with the processor error nodes?
>>>
>>> We'd need it to handle uncorrected cache errors with a physical address, as if they were
>>> memory errors...
> 
>> Yes, we have some patches to do this thing inside. Then memory_failure() will be called for
>> arm processor error section when physical address is available.
> 
> I look forward to reading them!
> 

https://lkml.org/lkml/2018/1/26/197

Our guy tried to upstream it, but not accepted. :(

> [...]
> 
>> I think this part is worth improving.
> 
>> BTW, should ARM processor record physical address when consumed an memory poison error for SEA?
>> It is helpful to do error recovery. Is this mandatory for arm spec?
> 
> ERR<n>ADDR? Its not mandatory to be filled for any error. It can be some imp-def bus
> address or a virtual address. 

virtual address ? but arm spec called it physical address.


> It can also be left out if ERR<b>STATUS.AV says its not valid.
> 

Yes

> This is really a question for your hardware people. 

Yes

> Does your implementation always give a physical-address for a synchronous external abort?
> 

We hope so. But hardware guys say it is hard to record physical address for every situation.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-09 14:28         ` James Morse
@ 2020-04-10  9:43           ` Xiaofei Tan
  2020-04-16 13:50             ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-10  9:43 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/4/9 22:28, James Morse wrote:
> On 09/04/2020 10:17, Xiaofei Tan wrote:
>> On 2020/4/8 0:37, James Morse wrote:
>>> On 02/04/2020 07:35, Xiaofei Tan wrote:
>>>> On 2020/3/31 0:49, James Morse wrote:
>>>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>>>> alternative is to upgrade to SIGKILL in that case.
>>>>>
>>>>>
>>>>> If you see this instead of the address provided via firmware-first, there is a
>>>>> series to improve that here:
>>>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>>>
>>>>> (We skip this signal code of APEI promises it did all the work. This lets you
>>>>> take the signal from memory_failure() instead, which may have better information.)
>>>
>>>> There may be an competition issue.
>>>> APEI run memory_failure() in an bottom half for memory errors. Then it may be not finished
>>>> before here SEA handling end, and application process may back to run.
> 
>>> With that series, it runs in process-context as task-work. memory_failure() needs to
>>> sleep, so it has to run in process-context. 
>>
>>
>>> Doing it as task-work means it runs before the thread returns to user-space.
>>
>> Sorry, i don't understand this. i thought the task-work need to reschedule, and current thread should
>> have returned to user-space before it.
> 
> ret_to_user has a loop around do_notify_resume(), if the _TIF_NOTIFY_RESUME flag is set
> and we call tracehook_notify_resume() which ends up in task_work_run()...
> 
> That TIF flag effectively prevents this thread returning to user-space until that task
> work has run.
> 

Got it. This function is great.
BTW, i have not found the place of setting the flag _TIF_NOTIFY_RESUME. Is it set by default for each thread?

> 
>> BTW, What context synchronous exception abort is? I thought it was process-context.
> 
> It depends what you interrupted.
> 32bit had different CPU modes for different contexts, we don't have that in 64bit. Instead
> we mask asynchronous interrupts, and tinker with the preempt count to track the context.
> Synchronous exceptions can't be masked, so they happen in whatever context you were
> already in.
> This means the exception handlers have to be be prepared for each eventuality.
> (which is why that code is starting to look complex)
> 

OK.

> 
>> Because in_interrupt() return false called in do_sea().
> 
> If you took the exception from EL0, or EL1 process context, yes. If you took the exception
> from an IRQ handler, in_interrupt() would return true.
> 

Got it.

> 
>>> If another thread in the same process accesses the affected memory, I'd expect to take a
>>> second external abort. If another process had the page mapped, it could access the
>>> affected memory, again taking an external abort.
> 
>> Yes, it is hard to avoid another thread to access the affected memory.
>> I just worry the same thread access it again.
> 
> This is the race that that series fixes.
> It can't happen with mainline as the arch code unconditionally signals the affected
> process, which was the pre-RAS behaviour.
> 

OK

>>> These two could happen while the first CPU was in firmware generating the CPER records, so
>>> its not a race we can fix. It should be harmless, the recovery action is the same, its
>>> just the error counters that count more events than errors. If you actually see it happen,
>>> we can try and make it smaller...
> 
>> Hmm, maybe this double SEA handling is an solution.
> 
> It assumes you get a second external-abort. We know this thread is affected, and will try
> and consume the error again if we restart it. We shouldn't restart it until we've given
> the recovery our best shot.
> Letting it loose is a poor choice if you have any kind of threshold for error-counts. They
> may jump NR_CPUs at a time until every CPU is waiting in memory_failure()...
> 

Got it. Thanks.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-10  2:55               ` Xiaofei Tan
@ 2020-04-16 13:27                 ` James Morse
  2020-04-18 10:49                   ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-16 13:27 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi Xiaofei,

On 10/04/2020 03:55, Xiaofei Tan wrote:
> On 2020/4/9 22:28, James Morse wrote:
>> On 09/04/2020 09:42, Xiaofei Tan wrote:
>>> James Morse wrote:
>>>> Do you have patches to get linux to do something useful with the processor error nodes?
>>>>
>>>> We'd need it to handle uncorrected cache errors with a physical address, as if they were
>>>> memory errors...
>>
>>> Yes, we have some patches to do this thing inside. Then memory_failure() will be called for
>>> arm processor error section when physical address is available.
>>
>> I look forward to reading them!

> https://lkml.org/lkml/2018/1/26/197
> 
> Our guy tried to upstream it, but not accepted. :(

Wrong series?

https://lkml.org/lkml/2018/1/26/194 is not creating any handing for processor error nodes.

That series tried to to suck all the pending errors out of the core code, into an arch
specific queue:
| arch/arm64/kernel/ras.c              | 173 +++++++++++++++++++++++++++++++++++

As far as I understand it, that was to ensure the memory_failure() work was done before we
return to user-space.

My attempt to fix that got rolled up in the SDEI series. It was posted again here:
https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/


If you need processor errors handling, there should be code added to the
CPER_SEC_PROC_ARM else-if in ghes_do_proc() to do the handling.

You may end up duplicating bits of ghes_handle_memory_failure(), to report the memory
errors that happened in the cache.
If you want to count corrected errors, a device in ghes_edac is probably the way to do that.


>> [...]
>>
>>> I think this part is worth improving.
>>
>>> BTW, should ARM processor record physical address when consumed an memory poison error for SEA?
>>> It is helpful to do error recovery. Is this mandatory for arm spec?
>>
>> ERR<n>ADDR? Its not mandatory to be filled for any error. It can be some imp-def bus
>> address or a virtual address. 
> 
> virtual address ? but arm spec called it physical address.

That was my recollection too! But I checked again before writing this:

"4.4.5 ERR<n>ADDR, Error Record Address Register" in
https://static.docs.arm.com/ddi0587/cb/2019_07_05_DD_0587_C_b.pdf

has a VA bit for a virtual-address, and 'AI' for this imp-def bus address, more properly
described as on that "might not match the programmers' view of the physical address for
the recorded location."


[...]

>> Does your implementation always give a physical-address for a synchronous external abort?

> We hope so. But hardware guys say it is hard to record physical address for every situation.

Yeah ...

Hopefully the situations where its too-hard are also the rarest, we can class these as
fatal (because we can't handle them).


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-10  9:43           ` Xiaofei Tan
@ 2020-04-16 13:50             ` James Morse
  2020-04-18 11:25               ` Xiaofei Tan
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2020-04-16 13:50 UTC (permalink / raw)
  To: Xiaofei Tan
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

On 10/04/2020 10:43, Xiaofei Tan wrote:
> On 2020/4/9 22:28, James Morse wrote:
>> On 09/04/2020 10:17, Xiaofei Tan wrote:
>>> On 2020/4/8 0:37, James Morse wrote:
>>>> With that series, it runs in process-context as task-work. memory_failure() needs to
>>>> sleep, so it has to run in process-context. 

>>>> Doing it as task-work means it runs before the thread returns to user-space.
>>>
>>> Sorry, i don't understand this. i thought the task-work need to reschedule, and current thread should
>>> have returned to user-space before it.
>>
>> ret_to_user has a loop around do_notify_resume(), if the _TIF_NOTIFY_RESUME flag is set
>> and we call tracehook_notify_resume() which ends up in task_work_run()...
>>
>> That TIF flag effectively prevents this thread returning to user-space until that task
>> work has run.

> Got it. This function is great.

I think PeterZ pointed me at it,


> BTW, i have not found the place of setting the flag _TIF_NOTIFY_RESUME. Is it set by default for each thread?

With that series, APEI calls task_work_add(), which calls set_notify_resume() from
include/linux/tracehook.h
|	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME))


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-16 13:27                 ` James Morse
@ 2020-04-18 10:49                   ` Xiaofei Tan
  0 siblings, 0 replies; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-18 10:49 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James,

On 2020/4/16 21:27, James Morse wrote:
> On 10/04/2020 03:55, Xiaofei Tan wrote:
>> On 2020/4/9 22:28, James Morse wrote:
>>> On 09/04/2020 09:42, Xiaofei Tan wrote:
>>>> James Morse wrote:
>>>>> Do you have patches to get linux to do something useful with the processor error nodes?
>>>>>
>>>>> We'd need it to handle uncorrected cache errors with a physical address, as if they were
>>>>> memory errors...
>>>
>>>> Yes, we have some patches to do this thing inside. Then memory_failure() will be called for
>>>> arm processor error section when physical address is available.
>>>
>>> I look forward to reading them!
> 
>> https://lkml.org/lkml/2018/1/26/197
>>
>> Our guy tried to upstream it, but not accepted. :(
> 
> Wrong series?

No

> 
> https://lkml.org/lkml/2018/1/26/194 is not creating any handing for processor error nodes.
> 

The main patch is this. It just re-write the function ghes_arm_process_error().
https://lkml.org/lkml/2018/1/26/198.


> That series tried to to suck all the pending errors out of the core code, into an arch
> specific queue:
> | arch/arm64/kernel/ras.c              | 173 +++++++++++++++++++++++++++++++++++
> 
> As far as I understand it, that was to ensure the memory_failure() work was done before we
> return to user-space.
> 
> My attempt to fix that got rolled up in the SDEI series. It was posted again here:
> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
> 
> 
> If you need processor errors handling, there should be code added to the
> CPER_SEC_PROC_ARM else-if in ghes_do_proc() to do the handling.
> 
> You may end up duplicating bits of ghes_handle_memory_failure(), to report the memory
> errors that happened in the cache.
> If you want to count corrected errors, a device in ghes_edac is probably the way to do that.
> 

OK. I will do some research for this. thanks.

> 
>>> [...]
>>>
>>>> I think this part is worth improving.
>>>
>>>> BTW, should ARM processor record physical address when consumed an memory poison error for SEA?
>>>> It is helpful to do error recovery. Is this mandatory for arm spec?
>>>
>>> ERR<n>ADDR? Its not mandatory to be filled for any error. It can be some imp-def bus
>>> address or a virtual address. 
>>
>> virtual address ? but arm spec called it physical address.
> 
> That was my recollection too! But I checked again before writing this:
> 
> "4.4.5 ERR<n>ADDR, Error Record Address Register" in
> https://static.docs.arm.com/ddi0587/cb/2019_07_05_DD_0587_C_b.pdf
> 
> has a VA bit for a virtual-address, and 'AI' for this imp-def bus address, more properly
> described as on that "might not match the programmers' view of the physical address for
> the recorded location."
> 

OK.The spec also support hw to record virtual address.

> 
> [...]
> 
>>> Does your implementation always give a physical-address for a synchronous external abort?
> 
>> We hope so. But hardware guys say it is hard to record physical address for every situation.
> 
> Yeah ...
> 
> Hopefully the situations where its too-hard are also the rarest, we can class these as
> fatal (because we can't handle them).
> 

Agree.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Question about SEA handling process happened in user space
  2020-04-16 13:50             ` James Morse
@ 2020-04-18 11:25               ` Xiaofei Tan
  0 siblings, 0 replies; 18+ messages in thread
From: Xiaofei Tan @ 2020-04-18 11:25 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Linuxarm, Will Deacon, Dave Martin,
	linux-arm-kernel, Shiju Jose

Hi James

On 2020/4/16 21:50, James Morse wrote:
> On 10/04/2020 10:43, Xiaofei Tan wrote:
>> On 2020/4/9 22:28, James Morse wrote:
>>> On 09/04/2020 10:17, Xiaofei Tan wrote:
>>>> On 2020/4/8 0:37, James Morse wrote:
>>>>> With that series, it runs in process-context as task-work. memory_failure() needs to
>>>>> sleep, so it has to run in process-context. 
> 
>>>>> Doing it as task-work means it runs before the thread returns to user-space.
>>>>
>>>> Sorry, i don't understand this. i thought the task-work need to reschedule, and current thread should
>>>> have returned to user-space before it.
>>>
>>> ret_to_user has a loop around do_notify_resume(), if the _TIF_NOTIFY_RESUME flag is set
>>> and we call tracehook_notify_resume() which ends up in task_work_run()...
>>>
>>> That TIF flag effectively prevents this thread returning to user-space until that task
>>> work has run.
> 
>> Got it. This function is great.
> 
> I think PeterZ pointed me at it,
> 

Great.

> 
>> BTW, i have not found the place of setting the flag _TIF_NOTIFY_RESUME. Is it set by default for each thread?
> 
> With that series, APEI calls task_work_add(), which calls set_notify_resume() from
> include/linux/tracehook.h
> |	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME))
> 

Got it. thanks.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-04-18 11:25 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30 13:10 Question about SEA handling process happened in user space Xiaofei Tan
2020-03-30 16:49 ` James Morse
2020-03-31  9:41   ` Xiaofei Tan
2020-03-31 17:00     ` James Morse
2020-04-01  3:49       ` Xiaofei Tan
2020-04-07 16:37         ` James Morse
2020-04-09  8:42           ` Xiaofei Tan
2020-04-09 14:28             ` James Morse
2020-04-10  2:55               ` Xiaofei Tan
2020-04-16 13:27                 ` James Morse
2020-04-18 10:49                   ` Xiaofei Tan
2020-04-02  6:35   ` Xiaofei Tan
2020-04-07 16:37     ` James Morse
2020-04-09  9:17       ` Xiaofei Tan
2020-04-09 14:28         ` James Morse
2020-04-10  9:43           ` Xiaofei Tan
2020-04-16 13:50             ` James Morse
2020-04-18 11:25               ` Xiaofei Tan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.