All of lore.kernel.org
 help / color / mirror / Atom feed
* imprecise abort behaviour
@ 2014-02-07 18:20 Ben Dooks
  2014-02-07 18:20 ` [PATCH] ARM: mm: add imprecise abort non-deadly handler Ben Dooks
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Dooks @ 2014-02-07 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

With the current discussion about enabling the external abort
handling, I would like to get the following patch reviewed to
change the behaviour of imprecise aborts to simply causing the
host to get a warning instead of a fatal signal.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] ARM: mm: add imprecise abort non-deadly handler
  2014-02-07 18:20 imprecise abort behaviour Ben Dooks
@ 2014-02-07 18:20 ` Ben Dooks
  2014-02-10 14:37   ` Dave Martin
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Dooks @ 2014-02-07 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

Given that imprecise aborts may be delivered after the action that
caused them (or even for non-cpu related activities such as bridge
faults from a bus-master) it is possible that the wrong process is
terminated as a result.

It is not know at this time in an SMP system which cores get notified
of an imprecise external abort, we have yet to find the right details
in the architecture reference manuals. This also means that killing
the process is probably the wrong thing to do on reception of these aborts.

Add a handler to take and print imprecise aborts and allow the process
to continue. This should ensure that the abort is shown but not kill
the process that was running on the cpu core at the time.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 arch/arm/mm/fault.c      | 15 +++++++++++++++
 arch/arm/mm/fsr-2level.c | 32 ++++++++++++++++----------------
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index eb8830a..c1fd5ba 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -512,6 +512,21 @@ do_bad(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * On an imprecise external abort it is possible that the currently running
+ * process did not cause it (it could be from an external bus bridge or
+ * another device causing a fault on the bus).
+ *
+ * Always return handled as we do not know how to do it and killing the
+ * current process may not prevent future faults.
+ */
+static int
+do_bad_imprecise(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
+{
+	printk(KERN_INFO "Caught imprecise abort (0x%03x) %08lx", fsr, addr);
+	return 0;
+}
+
 struct fsr_info {
 	int	(*fn)(unsigned long addr, unsigned int fsr, struct pt_regs *regs);
 	int	sig;
diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index 18ca74c..8f5ef60 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -24,22 +24,22 @@ static struct fsr_info fsr_info[] = {
 	 * 10 of the FSR, and may not be recoverable.  These are only
 	 * supported if the CPU abort handler supports bit 10.
 	 */
-	{ do_bad,		SIGBUS,  0,		"unknown 16"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 17"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 18"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 19"			   },
-	{ do_bad,		SIGBUS,  0,		"lock abort"			   }, /* xscale */
-	{ do_bad,		SIGBUS,  0,		"unknown 21"			   },
-	{ do_bad,		SIGBUS,  BUS_OBJERR,	"imprecise external abort"	   }, /* xscale */
-	{ do_bad,		SIGBUS,  0,		"unknown 23"			   },
-	{ do_bad,		SIGBUS,  0,		"dcache parity error"		   }, /* xscale */
-	{ do_bad,		SIGBUS,  0,		"unknown 25"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 26"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 27"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 28"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 29"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 30"			   },
-	{ do_bad,		SIGBUS,  0,		"unknown 31"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 16"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 17"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 18"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 19"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"lock abort"			   }, /* xscale */
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 21"			   },
+	{ do_bad_imprecise,	SIGBUS,  BUS_OBJERR,	"imprecise external abort"	   }, /* xscale */
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 23"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"dcache parity error"		   }, /* xscale */
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 25"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 26"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 27"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 28"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 29"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 30"			   },
+	{ do_bad_imprecise,	SIGBUS,  0,		"unknown 31"			   },
 };
 
 static struct fsr_info ifsr_info[] = {
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH] ARM: mm: add imprecise abort non-deadly handler
  2014-02-07 18:20 ` [PATCH] ARM: mm: add imprecise abort non-deadly handler Ben Dooks
@ 2014-02-10 14:37   ` Dave Martin
  2014-02-10 17:25     ` Ben Dooks
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Martin @ 2014-02-10 14:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 07, 2014 at 06:20:14PM +0000, Ben Dooks wrote:
> Given that imprecise aborts may be delivered after the action that
> caused them (or even for non-cpu related activities such as bridge
> faults from a bus-master) it is possible that the wrong process is
> terminated as a result.
> 
> It is not know at this time in an SMP system which cores get notified
> of an imprecise external abort, we have yet to find the right details
> in the architecture reference manuals. This also means that killing
> the process is probably the wrong thing to do on reception of these aborts.
> 
> Add a handler to take and print imprecise aborts and allow the process
> to continue. This should ensure that the abort is shown but not kill
> the process that was running on the cpu core at the time.

Not treating these as thread-specific faults seems correct, since we
never have a way to map these aborts back to the culprit ... except that
there is a likelihood the culprit is still running when the abort fires.


"Spurious" imprecise aborts pretty much always indicate a hardware error
or a nasty bug somewhere.

Another cause is badly implemented, buggy or malicious userspace software
being given more exotic mmaps that it is qualified to deal with
responsibly.  That's a nasty bug in the distro maintainer / system
administrator / vendor.

So, I think this should be at least KERN_ERROR; maybe KERN_CRIT or above.
We must not encourage people to think that these aborts are somehow
benign.

If we really want people to fix their bugs, it may be worth considering
panic(), or doing this when some threshold is reached.  This may be a
bit harsh though, at least without some threshold.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] ARM: mm: add imprecise abort non-deadly handler
  2014-02-10 14:37   ` Dave Martin
@ 2014-02-10 17:25     ` Ben Dooks
  2014-02-11 15:43       ` Dave Martin
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Dooks @ 2014-02-10 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/02/14 14:37, Dave Martin wrote:
> On Fri, Feb 07, 2014 at 06:20:14PM +0000, Ben Dooks wrote:
>> Given that imprecise aborts may be delivered after the action that
>> caused them (or even for non-cpu related activities such as bridge
>> faults from a bus-master) it is possible that the wrong process is
>> terminated as a result.
>>
>> It is not know at this time in an SMP system which cores get notified
>> of an imprecise external abort, we have yet to find the right details
>> in the architecture reference manuals. This also means that killing
>> the process is probably the wrong thing to do on reception of these aborts.
>>
>> Add a handler to take and print imprecise aborts and allow the process
>> to continue. This should ensure that the abort is shown but not kill
>> the process that was running on the cpu core at the time.
>
> Not treating these as thread-specific faults seems correct, since we
> never have a way to map these aborts back to the culprit ... except that
> there is a likelihood the culprit is still running when the abort fires.
>
>
> "Spurious" imprecise aborts pretty much always indicate a hardware error
> or a nasty bug somewhere.

I need to find out where the one we are catching is coming from in our
system.

> Another cause is badly implemented, buggy or malicious userspace software
> being given more exotic mmaps that it is qualified to deal with
> responsibly.  That's a nasty bug in the distro maintainer / system
> administrator / vendor.
>
> So, I think this should be at least KERN_ERROR; maybe KERN_CRIT or above.
> We must not encourage people to think that these aborts are somehow
> benign.

Ok, KERN_ERROR or KERN_CRIT sound reasonable.

> If we really want people to fix their bugs, it may be worth considering
> panic(), or doing this when some threshold is reached.  This may be a
> bit harsh though, at least without some threshold.

I was considering also firing a WARN_ON(abort_count++ > 10) or something
similar.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] ARM: mm: add imprecise abort non-deadly handler
  2014-02-10 17:25     ` Ben Dooks
@ 2014-02-11 15:43       ` Dave Martin
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Martin @ 2014-02-11 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 10, 2014 at 05:25:15PM +0000, Ben Dooks wrote:
> On 10/02/14 14:37, Dave Martin wrote:

[...]

> >"Spurious" imprecise aborts pretty much always indicate a hardware error
> >or a nasty bug somewhere.
> 
> I need to find out where the one we are catching is coming from in our
> system.

Would certainly be good to know...

> >Another cause is badly implemented, buggy or malicious userspace software
> >being given more exotic mmaps that it is qualified to deal with
> >responsibly.  That's a nasty bug in the distro maintainer / system
> >administrator / vendor.
> >
> >So, I think this should be at least KERN_ERROR; maybe KERN_CRIT or above.
> >We must not encourage people to think that these aborts are somehow
> >benign.
> 
> Ok, KERN_ERROR or KERN_CRIT sound reasonable.
> 
> >If we really want people to fix their bugs, it may be worth considering
> >panic(), or doing this when some threshold is reached.  This may be a
> >bit harsh though, at least without some threshold.
> 
> I was considering also firing a WARN_ON(abort_count++ > 10) or something
> similar.

Maybe WARN_ON the first such abort, and ratelimit somehow after?

Being noisy is good -- it's just killing a random, possibly-innocent task
that's not so good.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-02-11 15:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-07 18:20 imprecise abort behaviour Ben Dooks
2014-02-07 18:20 ` [PATCH] ARM: mm: add imprecise abort non-deadly handler Ben Dooks
2014-02-10 14:37   ` Dave Martin
2014-02-10 17:25     ` Ben Dooks
2014-02-11 15:43       ` Dave Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.