From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754212AbbBKVIV (ORCPT ); Wed, 11 Feb 2015 16:08:21 -0500
Received: from pmta1.delivery1.ore.mailhop.org ([54.191.214.3]:56105 "EHLO pmta1.delivery1.ore.mailhop.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753987AbbBKVIQ (ORCPT ); Wed, 11 Feb 2015 16:08:16 -0500
X-Mail-Handler: DuoCircle Outbound SMTP
X-Originating-IP: 104.193.169.186
X-Report-Abuse-To: abuse@duocircle.com (see https://support.duocircle.com/support/solutions/articles/5000540958-duocircle-standard-smtp-abuse-information for abuse reporting information)
X-MHO-User: U2FsdGVkX1/ynGdVIq5ZZ+8iM96rel2l
Date: Wed, 11 Feb 2015 12:33:58 -0800
From: Tony Lindgren
To: Pali =?utf-8?B?Um9ow6Fy?=
Cc: Nishanth Menon , Matthijs van Duin , Sebastian Reichel , linux-omap@vger.kernel.org, Aaro Koskinen , Pavel Machek , linux-kernel@vger.kernel.org
Subject: Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
Message-ID: <20150211203358.GG2531@atomide.com>
References: <20131206213613.GA19648@earth.universe> <201502111339.54480@pali> <201502112128.44852@pali>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <201502112128.44852@pali>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Pali Rohár [150211 12:32]:
> On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> > On 11 February 2015 at 13:39, Pali Rohár wrote:
> > >> Anyhow, since checking the firewalls/APs to see if you have
> > >> permission will probably only get you yet another fault if
> > >> things are walled off, the robust way of dealing with this
> > >> sort of situation is by probing the device with a read
> > >> while trapping bus faults.
> > >> This also handles modules that
> > >> are unreachable for other reasons, e.g. being disabled by
> > >> eFuse.
> > >
> > > Is it possible to patch kernel code to mask or ignore that
> > > fault? Can you help me with something like that?
> >
> > As I mentioned, I'm still learning my way around the kernel,
> > so I don't feel very comfortable suggesting a concrete patch
> > just yet. I've been browsing arch/arm/mm/ however and my
> > impression is that all that would be required is editing
> > fault.c by making a copy of do_bad but containing
> >     return user_mode(regs) || !fixup_exception(regs);
> > and hook it onto the appropriate fault codes. However, this
> > really needs the opinion of someone more familiar with this
> > code.
> >
> > I do have an observation to make on the issue of fault
> > decoding: the list in fsr-2level.c may be "standard ARMv3 and
> > ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> >
> > [ 0] -
> > [ 1] alignment fault
> > [ 2] debug event
> > [ 3] section access flag fault
> > [ 4] instruction cache maintenance fault (reported via data abort)
> > [ 5] section translation fault
> > [ 6] page access flag fault
> > [ 7] page translation fault
> > [ 8] bus error on access
> > [ 9] section domain fault
> > [10] -
> > [11] page domain fault
> > [12] bus error on section table walk
> > [13] section permission fault
> > [14] bus error on page table walk
> > [15] page permission fault
> > [16] (TLB conflict abort)
> > [17] -
> > [18] -
> > [19] -
> > [20] (lockdown abort)
> > [21] -
> > [22] async bus error (reported via data abort)
> > [23] -
> > [24] async parity/ECC error (reported via data abort)
> > [25] parity/ECC error on access
> > [26] (coprocessor abort)
> > [27] -
> > [28] parity/ECC error on section table walk
> > [29] -
> > [30] parity/ECC error on page table walk
> > [31] -
> >
> > Some entries are patched up near the bottom of fault.c but
> > many bogus messages remain, for example the "on linefetch" vs "on
> > non-linefetch" is misleading since no such thing can be
> > inferred from the fault status on v7. Also, the i-cache
> > maintenance fault handling looks wrong to me: it should fetch
> > the actual fault status from IFSR (even though the address
> > still comes from DFAR) and dispatch based on that.
> >
> > Async external aborts (async bus error and async parity/ECC
> > error) give you basically no info. DFAR will contain garbage
> > hence displaying it will confuse rather than enlighten, a
> > traceback is pointless since the instruction that caused the
> > access is long retired, likewise user_mode() doesn't matter
> > since a transition to kernel space may have happened after
> > the access that caused the abort. Basically they should be
> > treated more as an IRQ than as a fault (note they can also be
> > masked just like irqs). In case of a bus error, it may be
> > appropriate to just warn about it, or perhaps send a signal
> > to the current process, although in the latter case it should
> > have some means to distinguish it from a synchronous bus
> > error.
> >
> > At least on the cortex-a8, a parity/ECC error (whether async
> > or not) is to be regarded as absolutely fatal. Quoth the
> > TRM: "No recovery is possible. The abort handler must disable
> > the caches, communicate the fail directly with the external
> > system, request a reboot."
> >
> > Bit 10 no longer indicates an asynchronous (let alone
> > imprecise) fault. Apart from the debug events and async
> > aborts (and possibly some implementation-defined aborts), all
> > aborts listed are synchronous, and DFAR/IFAR is valid.
> > There's no technical obstruction to make these trappable via
> > the kernel exception handling mechanism. (Though at least in
> > case of parity/ECC errors one shouldn't.)
>
> Tony, Nishanth, or somebody else... can you help with memory
> management? Or do you know some expert for arch/arm/mm/ code?

Changing the abort handling should be discussed on the
linux-arm-kernel list.
Probably best to play with that first for a
proof of concept patch :)

Regards,

Tony
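
For anyone experimenting with the proof of concept, the fault status decoding from the quoted table could be sketched roughly as below. This is only an illustration, not a patch against arch/arm/mm/: it assumes the ARMv7 short-descriptor layout, where FSR bit 10 supplies the top bit of a 5-bit status index and bits 3:0 supply the rest. All armv7_* names are hypothetical and do not exist in the kernel; the strings are taken from the table in the quoted mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the 5-bit fault status index from a raw FSR value.
 * Assumption: short-descriptor format, FS[4] = FSR bit 10, FS[3:0] = FSR[3:0].
 * Other FSR bits (domain, WnR, ExT, ...) are ignored here.
 */
static inline unsigned int armv7_fsr_fs(uint32_t fsr)
{
	return (fsr & 0xfu) | ((fsr >> 6) & 0x10u);
}

/* Names copied from the quoted table; NULL stands for the "-" entries. */
static const char *const armv7_fs_name[32] = {
	[1]  = "alignment fault",
	[2]  = "debug event",
	[3]  = "section access flag fault",
	[4]  = "instruction cache maintenance fault",
	[5]  = "section translation fault",
	[6]  = "page access flag fault",
	[7]  = "page translation fault",
	[8]  = "bus error on access",
	[9]  = "section domain fault",
	[11] = "page domain fault",
	[12] = "bus error on section table walk",
	[13] = "section permission fault",
	[14] = "bus error on page table walk",
	[15] = "page permission fault",
	[16] = "TLB conflict abort",
	[20] = "lockdown abort",
	[22] = "async bus error",
	[24] = "async parity/ECC error",
	[25] = "parity/ECC error on access",
	[26] = "coprocessor abort",
	[28] = "parity/ECC error on section table walk",
	[30] = "parity/ECC error on page table walk",
};

/* Look up a description; returns NULL for unlisted or out-of-range codes. */
static inline const char *armv7_fs_describe(unsigned int fs)
{
	return fs < 32 ? armv7_fs_name[fs] : NULL;
}

/* Async external aborts: the fault address is not meaningful for these. */
static inline bool armv7_fs_is_async(unsigned int fs)
{
	return fs == 22 || fs == 24;
}

/* Parity/ECC errors, which the quoted mail argues should never be trapped. */
static inline bool armv7_fs_is_parity_ecc(unsigned int fs)
{
	return fs == 24 || fs == 25 || fs == 28 || fs == 30;
}
```

With this, a handler could refuse to run the exception fixup when armv7_fs_is_async() or armv7_fs_is_parity_ecc() is true, and only consider trapping the synchronous bus error cases such as index 8.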