From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754263AbdASSKK (ORCPT ); Thu, 19 Jan 2017 13:10:10 -0500 Received: from foss.arm.com ([217.140.101.70]:32800 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751627AbdASSIH (ORCPT ); Thu, 19 Jan 2017 13:08:07 -0500 Message-ID: <5880FD8A.2000405@arm.com> Date: Thu, 19 Jan 2017 17:55:22 +0000 From: James Morse User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0 MIME-Version: 1.0 To: "Baicar, Tyler" CC: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, zjzhang@codeaurora.org, mark.rutland@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org, devel@acpica.org, Suzuki.Poulose@arm.com, punit.agrawal@arm.com, astone@redhat.com, harba@codeaurora.org, hanjun.guo@linaro.org, john.garry@huawei.com, shiju.jose@huawei.com Subject: Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort References: <1484244924-24786-1-git-send-email-tbaicar@codeaurora.org> <1484244924-24786-5-git-send-email-tbaicar@codeaurora.org> <587DF279.1010402@arm.com> In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Tyler, On 18/01/17 23:26, Baicar, Tyler wrote: > On 1/17/2017 3:31 AM, James Morse wrote: >> On 12/01/17 18:15, Tyler Baicar wrote: >>> SEA exceptions are often caused by an uncorrected hardware >>> error, and are handled when data abort and instruction abort >>> exception classes have specific values for their Fault Status >>> Code. >>> When SEA occurs, before killing the process, go through >>> the handlers registered in the notification list. >>> Update fault_info[] with specific SEA faults so that the >>> new SEA handler is used. >>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr, >>> struct pt_regs *regs) >>> return 1; >>> } >>> +/* >>> + * This abort handler deals with Synchronous External Abort. >>> + * It calls notifiers, and then returns "fault". >>> + */ >>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >>> +{ >>> + struct siginfo info; >>> + >>> + atomic_notifier_call_chain(&sea_handler_chain, 0, NULL); >>> + >>> + pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n", >>> + fault_name(esr), esr, addr); >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + info.si_code = 0; >> Half of the other do_*() functions in this file read the signo and code from the >> fault_info table. >> >> >>> + info.si_addr = (void __user *)addr; >> addr here was read from FAR_EL1, but for some of the classes of exception you >> have listed below this register isn't updated with the faulting address. >> >> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an >> Exception level that is using Aarch64" has: >>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External >>> Aborts other than Synchronous External Aborts on Translation Table Walks. In >>> this case, the ISS.FnV bit returned in ESR_ELx indicates whether FAR_ELx is >>> valid. >> This is a problem if we get 'synchronous external abort' or 'synchronous parity >> error' while a user space process was running. > It looks like this would just cause an incorrect address to be printed in the > above pr_err. > Unless I'm missing something, I don't see arm64_notify_die or anything that gets > called from > there using the info.si_addr variable. I may be misreading something here... This patch has: > info.si_addr = (void __user *)addr; > arm64_notify_die("", regs, &info, esr); >>From arch/arm64/kernel/traps.c:arm64_notify_die(): > if (user_mode(regs)) { > current->thread.fault_address = 0; > current->thread.fault_code = err; > force_sig_info(info->si_signo, info, current); > } So if the SEA interrupted userspace, we put maybe-unknown addr into force_sig_info() to deliver a signal to user space. User-space then gets a copy of the info struct containing the maybe-unknown addr. I think this is an existing bug, but if we are separating the synchronous external aborts from the generic do_bad handler, we should probably check the FnV bit. (I think we should still print it out) > What do you suggest I do here? The firmware should be reporting the physical and > virtual > address information if it is available in the HEST entry that the kernel will > parse. Its not just firmware that may trigger this, other SoCs may use it for parity or ECC errors, and they may not always have a valid address in FAR_EL1. I think we should check the FnV bit in the esr variable and set info.si_addr to 0 if the addr we have isn't valid: 'For some implementations, the value of si_addr may be inaccurate.' [0] Thanks, James [0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html