From: James Morse <james.morse@arm.com>
To: "Baicar, Tyler" <tbaicar@codeaurora.org>
Cc: linux-efi@vger.kernel.org, kvm@vger.kernel.org,
	matt@codeblueprint.co.uk, catalin.marinas@arm.com,
	will.deacon@arm.com, robert.moore@intel.com,
	paul.gortmaker@windriver.com, lv.zheng@intel.com,
	kvmarm@lists.cs.columbia.edu, fu.wei@linaro.org,
	zjzhang@codeaurora.org, linux@armlinux.org.uk,
	linux-acpi@vger.kernel.org, eun.taik.lee@samsung.com,
	shijie.huang@arm.com, labbott@redhat.com, lenb@kernel.org,
	harba@codeaurora.org, john.garry@huawei.com,
	marc.zyngier@arm.com, punit.agrawal@arm.com, rostedt@goodmis.org,
	nkaje@codeaurora.org, sandeepa.s.prabhu@gmail.com,
	linux-arm-kernel@lists.infradead.org, devel@acpica.org,
	rjw@rjwysocki.net, rruigrok@codeaurora.org,
	linux-kernel@vger.kernel.org, astone@redhat.com,
	hanjun.guo@linaro.org, pbonzini@redhat.com,
	akpm@linux-foundation.org, bristot@redhat.com,
	shiju.jose@huawei.com
Subject: Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort
Date: Thu, 19 Jan 2017 17:55:22 +0000
Message-ID: <5880FD8A.2000405@arm.com>
In-Reply-To: <e8c0151e-2087-d49c-42fc-906755aca052@codeaurora.org>

Hi Tyler,

On 18/01/17 23:26, Baicar, Tyler wrote:
> On 1/17/2017 3:31 AM, James Morse wrote:
>> On 12/01/17 18:15, Tyler Baicar wrote:
>>> SEA exceptions are often caused by an uncorrected hardware
>>> error, and are handled when data abort and instruction abort
>>> exception classes have specific values for their Fault Status
>>> Code.
>>> When SEA occurs, before killing the process, go through
>>> the handlers registered in the notification list.
>>> Update fault_info[] with specific SEA faults so that the
>>> new SEA handler is used.
>>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr,
>>> struct pt_regs *regs)
>>>       return 1;
>>>   }
>>>   +/*
>>> + * This abort handler deals with Synchronous External Abort.
>>> + * It calls notifiers, and then returns "fault".
>>> + */
>>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>> +{
>>> +    struct siginfo info;
>>> +
>>> +    atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>>> +
>>> +    pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> +         fault_name(esr), esr, addr);
>>> +
>>> +    info.si_signo = SIGBUS;
>>> +    info.si_errno = 0;
>>> +    info.si_code  = 0;
>> Half of the other do_*() functions in this file read the signo and code from the
>> fault_info table.
>>
>>
>>> +    info.si_addr  = (void __user *)addr;
>> addr here was read from FAR_EL1, but for some of the classes of exception you
>> have listed below this register isn't updated with the faulting address.
>>
>> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an
>> Exception level that is using AArch64" has:
>>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External
>>> Aborts other than Synchronous External Aborts on Translation Table Walks. In
>>> this case, the ISS.FnV bit returned in ESR_ELx  indicates whether FAR_ELx is
>>> valid.
>> This is a problem if we get 'synchronous external abort' or 'synchronous parity
>> error' while a user space process was running.

> It looks like this would just cause an incorrect address to be printed in the
> above pr_err.
> Unless I'm missing something, I don't see arm64_notify_die or anything that gets
> called from there using the info.si_addr variable.

I may be misreading something here...

This patch has:
>	info.si_addr  = (void __user *)addr;
>	arm64_notify_die("", regs, &info, esr);

From arch/arm64/kernel/traps.c:arm64_notify_die():
>	if (user_mode(regs)) {
>		current->thread.fault_address = 0;
>		current->thread.fault_code = err;
>		force_sig_info(info->si_signo, info, current);
>	}

So if the SEA interrupted userspace, we put maybe-unknown addr into
force_sig_info() to deliver a signal to user space. User-space then gets a copy
of the info struct containing the maybe-unknown addr.

I think this is an existing bug, but if we are separating the synchronous
external aborts from the generic do_bad handler, we should probably check the
FnV bit. (I think we should still print the address in the pr_err.)


> What do you suggest I do here? The firmware should be reporting the physical and
> virtual address information if it is available in the HEST entry that the kernel
> will parse.

It's not just firmware that may trigger this, other SoCs may use it for parity or
ECC errors, and they may not always have a valid address in FAR_EL1.

I think we should check the FnV bit in the esr variable and set info.si_addr to
0 if the addr we have isn't valid:
'For some implementations, the value of si_addr may be inaccurate.' [0]
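
A minimal sketch of that check, written as stand-alone C rather than as a
kernel patch. It assumes FnV is bit 10 of the ISS field in ESR_ELx, as the
architecture defines it for data and instruction aborts; the field is only
meaningful for synchronous external aborts that are not on a translation
table walk. The SKETCH_/sketch_ names are made up for illustration and are
not existing kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: for data and instruction aborts, FnV ("FAR not Valid") is
 * bit 10 of the ISS field in ESR_ELx. The macro name is illustrative.
 */
#define SKETCH_ESR_FNV	(UINT32_C(1) << 10)

/* True if the ESR value says FAR_ELx does not hold a valid address. */
static bool sketch_far_not_valid(uint32_t esr)
{
	return (esr & SKETCH_ESR_FNV) != 0;
}

/*
 * Address to report in si_addr: the FAR value when it is valid, 0 otherwise.
 * The caller can still print the raw FAR value for debugging.
 */
static uint64_t sketch_sea_si_addr(uint64_t far, uint32_t esr)
{
	return sketch_far_not_valid(esr) ? 0 : far;
}
```

In do_sea() this would amount to choosing the value assigned to info.si_addr
from the FnV bit in esr, while the pr_err() keeps printing addr unchanged.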


Thanks,

James


[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
