From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754212AbbBKVIV (ORCPT <rfc822;w@1wt.eu>);
	Wed, 11 Feb 2015 16:08:21 -0500
Received: from pmta1.delivery1.ore.mailhop.org ([54.191.214.3]:56105 "EHLO
	pmta1.delivery1.ore.mailhop.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753987AbbBKVIQ (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 11 Feb 2015 16:08:16 -0500
X-Mail-Handler: DuoCircle Outbound SMTP
X-Originating-IP: 104.193.169.186
X-Report-Abuse-To: abuse@duocircle.com (see https://support.duocircle.com/support/solutions/articles/5000540958-duocircle-standard-smtp-abuse-information for abuse reporting information)
X-MHO-User: U2FsdGVkX1/ynGdVIq5ZZ+8iM96rel2l
Date: Wed, 11 Feb 2015 12:33:58 -0800
From: Tony Lindgren <tony@atomide.com>
To: Pali =?utf-8?B?Um9ow6Fy?= <pali.rohar@gmail.com>
Cc: Nishanth Menon <nm@ti.com>, Matthijs van Duin <matthijsvanduin@gmail.com>,
        Sebastian Reichel <sre@ring0.de>, linux-omap@vger.kernel.org,
        Aaro Koskinen <aaro.koskinen@iki.fi>, Pavel Machek <pavel@ucw.cz>,
        linux-kernel@vger.kernel.org
Subject: Re: runtime check for omap-aes bus access permission (was: Re:
 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
Message-ID: <20150211203358.GG2531@atomide.com>
References: <20131206213613.GA19648@earth.universe>
 <201502111339.54480@pali>
 <CAALWOA_ngoSKjB=ZQ264Va37bBK7v41Ei45SyoYLiMdanTKnxQ@mail.gmail.com>
 <201502112128.44852@pali>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <201502112128.44852@pali>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

* Pali Rohár <pali.rohar@gmail.com> [150211 12:32]:
> On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> > On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> > >> Anyhow, since checking the firewalls/APs to see if you have
> > >> permission will probably only get you yet another fault if
> > >> things are walled off, the robust way of dealing with this
> > >> sort of situation is by probing the device with a read
> > >> while trapping bus faults. This also handles modules that
> > >> are unreachable for other reasons, e.g. being disabled by
> > >> eFuse.
> > > 
> > > Is it possible to patch the kernel code to mask or ignore that
> > > fault? Can you help me with something like that?
> > 
> > As I mentioned, I'm still learning my way around the kernel,
> > so I don't feel very comfortable suggesting a concrete patch
> > just yet. I've been browsing arch/arm/mm/ however and my
> > impression is that all that would be required is editing
> > fault.c by making a copy of do_bad but containing
> >     return user_mode(regs) || !fixup_exception(regs);
> > and hook it onto the appropriate fault codes.  However, this
> > really needs the opinion of someone more familiar with this
> > code.
> > 
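For illustration only, here is a userspace analogue of the "probe with
a read while trapping faults" idea. It is not kernel code: it swaps the
kernel's fixup_exception() path for a SIGSEGV/SIGBUS handler plus
sigsetjmp()/siglongjmp(), and probe_read32() is a made-up name. It only
shows the control flow (attempt the read, recover if it faults).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Jump target used to unwind out of the faulting read. */
static sigjmp_buf probe_env;

/* Signal handler: abandon the faulting access and resume in the probe. */
static void probe_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/*
 * Hypothetical helper: try to read a 32-bit word from addr.
 * Returns 1 and stores the value in *out on success, or 0 if the
 * access raised SIGSEGV or SIGBUS (*out is then left untouched).
 * Not thread-safe: it uses one global jump buffer.
 */
int probe_read32(const volatile uint32_t *addr, uint32_t *out)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int ok = 0;

    assert(out != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* Saving the signal mask (second argument 1) lets siglongjmp()
     * unblock the signal again after the handler has run. */
    if (sigsetjmp(probe_env, 1) == 0) {
        *out = *addr;   /* may fault */
        ok = 1;
    }

    /* Restore whatever handlers were installed before the probe. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

A caller would check the return value before trusting *out. In the
kernel the equivalent recovery would go through the exception table
rather than a signal handler.
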
> > I do have an observation to make on the issue of fault
> > decoding: the list in fsr-2level.c may be "standard ARMv3 and
> > ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> > 
> > [ 0] -
> > [ 1] alignment fault
> > [ 2] debug event
> > [ 3] section access flag fault
> > [ 4] instruction cache maintenance fault (reported via data abort)
> > [ 5] section translation fault
> > [ 6] page access flag fault
> > [ 7] page translation fault
> > [ 8] bus error on access
> > [ 9] section domain fault
> > [10] -
> > [11] page domain fault
> > [12] bus error on section table walk
> > [13] section permission fault
> > [14] bus error on page table walk
> > [15] page permission fault
> > [16] (TLB conflict abort)
> > [17] -
> > [18] -
> > [19] -
> > [20] (lockdown abort)
> > [21] -
> > [22] async bus error (reported via data abort)
> > [23] -
> > [24] async parity/ECC error (reported via data abort)
> > [25] parity/ECC error on access
> > [26] (coprocessor abort)
> > [27] -
> > [28] parity/ECC error on section table walk
> > [29] -
> > [30] parity/ECC error on page table walk
> > [31] -
> > 
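A small sketch of how the list above could be indexed, assuming the
short-descriptor DFSR layout where the status is FS[3:0] in bits [3:0]
and FS[4] in bit 10. The v7_* names are hypothetical, and the strings
are simply the entries from the list.

```c
#include <assert.h>
#include <stdint.h>

#define FSR_FS3_0 0x0fu        /* DFSR bits [3:0] -> FS[3:0] */
#define FSR_FS4   (1u << 10)   /* DFSR bit 10     -> FS[4]   */

/* Names taken from the list above; unlisted slots stay NULL. */
static const char *const v7_fault_names[32] = {
    [1]  = "alignment fault",
    [2]  = "debug event",
    [3]  = "section access flag fault",
    [4]  = "instruction cache maintenance fault",
    [5]  = "section translation fault",
    [6]  = "page access flag fault",
    [7]  = "page translation fault",
    [8]  = "bus error on access",
    [9]  = "section domain fault",
    [11] = "page domain fault",
    [12] = "bus error on section table walk",
    [13] = "section permission fault",
    [14] = "bus error on page table walk",
    [15] = "page permission fault",
    [16] = "TLB conflict abort",
    [20] = "lockdown abort",
    [22] = "async bus error",
    [24] = "async parity/ECC error",
    [25] = "parity/ECC error on access",
    [26] = "coprocessor abort",
    [28] = "parity/ECC error on section table walk",
    [30] = "parity/ECC error on page table walk",
};

/* Combine DFSR bit 10 and bits [3:0] into the 5-bit fault status. */
unsigned v7_fsr_fs(uint32_t fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Look up the name for a raw DFSR value; unlisted codes are "reserved". */
const char *v7_fault_name(uint32_t fsr)
{
    unsigned fs = v7_fsr_fs(fsr);

    assert(fs < 32);
    return v7_fault_names[fs] ? v7_fault_names[fs] : "reserved";
}
```

For example, a DFSR value of 0x406 has bit 10 set and low bits 0110,
which gives index 22, the async bus error entry.
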
> > Some entries are patched up near the bottom of fault.c but
> > many bogus messages remain, for example the "on linefetch" vs
> > "on non-linefetch" is misleading since no such thing can be
> > inferred from the fault status on v7.  Also, the i-cache
> > maintenance fault handling looks wrong to me: it should fetch
> > the actual fault status from IFSR (even though the address
> > still comes from DFAR) and dispatch based on that.
> > 
> > Async external aborts (async bus error and async parity/ECC
> > error) give you basically no info. DFAR will contain garbage
> > hence displaying it will confuse rather than enlighten, a
> > traceback is pointless since the instruction that caused the
> > access is long retired, likewise user_mode() doesn't matter
> > since a transition to kernel space may have happened after
> > the access that caused the abort. Basically they should be
> > treated more as an IRQ than as a fault (note they can also be
> > masked just like irqs). In case of a bus error, it may be
> > appropriate to just warn about it, or perhaps send a signal
> > to the current process, although in the latter case it should
> > have some means to distinguish it from a synchronous bus
> > error.
> > 
> > At least on the cortex-a8, a parity/ECC error (whether async
> > or not) is to be regarded as absolutely fatal.  Quoth the
> > TRM: "No recovery is possible. The abort handler must disable
> > the caches, communicate the fail directly with the external
> > system, request a reboot."
> > 
> > Bit 10 no longer indicates an asynchronous (let alone
> > imprecise) fault.  Apart from the debug events and async
> > aborts (and possibly some implementation-defined aborts), all
> > aborts listed are synchronous, and DFAR/IFAR is valid.
> > There's no technical obstruction to making these trappable via
> > the kernel exception handling mechanism. (Though at least in
> > case of parity/ECC errors one shouldn't.)
> 
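To make the grouping described above concrete, here is a sketch that
sorts the 5-bit fault status values from the list into those classes.
The enum and function names are made up, and treating the parenthesized
entries (16, 20, 26) as implementation-defined is only one reading of
the list. The "never use fixup for parity/ECC" rule follows the
Cortex-A8 remark quoted above.

```c
#include <assert.h>

enum abort_class {
    ABORT_SYNC,      /* synchronous, DFAR/IFAR valid            */
    ABORT_ASYNC,     /* async bus error, no usable address      */
    ABORT_PARITY,    /* parity/ECC error, treated as fatal      */
    ABORT_DEBUG,     /* debug event                             */
    ABORT_IMPDEF,    /* parenthesized entries in the list       */
    ABORT_RESERVED   /* "-" entries in the list                 */
};

/* Classify a 5-bit ARMv7 fault status value (0..31). */
enum abort_class classify_v7_abort(unsigned fs)
{
    assert(fs < 32);

    switch (fs) {
    case 2:
        return ABORT_DEBUG;
    case 22:
        return ABORT_ASYNC;
    case 24: case 25: case 28: case 30:
        return ABORT_PARITY;
    case 16: case 20: case 26:
        return ABORT_IMPDEF;
    case 0: case 10: case 17: case 18: case 19:
    case 21: case 23: case 27: case 29: case 31:
        return ABORT_RESERVED;
    default:
        return ABORT_SYNC;
    }
}

/* Only plain synchronous aborts are candidates for exception fixup. */
int abort_may_use_fixup(unsigned fs)
{
    return classify_v7_abort(fs) == ABORT_SYNC;
}
```

With this split, a synchronous bus error on access (8) could go through
a fixup path, while an async bus error (22) would be handled more like
an interrupt and a parity/ECC error would not be recovered at all.
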
> Tony, Nishanth, or somebody else... can you help with memory
> management? Or do you know an expert on the arch/arm/mm/ code?

Changing the abort handling should be discussed on the
linux-arm-kernel list. Probably best to play with that first
for a proof of concept patch :)

Regards,

Tony