* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
@ 2009-03-13 22:20 Andrew Klossner
  2009-03-16  4:28 ` Liu Dave-R63238
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Klossner @ 2009-03-13 22:20 UTC
  To: u-boot

> In fact, U-Boot has some support for these error events; see
> cpu/mpc85xx/interrupts, the 8548cds and 8544ds board files, and
> drivers/pci/fsl-pci...c.  However, they lack a good framework.

Yes, that's a start.

The interrupts should be configured as critical so that the processor
doesn't hang when core_fault_in* is asserted while MSR[EE] == 0.

iivpr0, iidr0, and l2errinten need to be set up to take interrupts on
L2 cache errors.  We've seen perhaps a dozen such errors since we
started using these chips.

err_int_en is only set up for MPC8536DS, MPC8572DS, and MPC8610HPCD.
You can get memory-select errors on boards that don't have ECC.
Granted, it's the result of a software flaw, but you'd rather take an
exception than hang when that flaw occurs.

A dozen or so boards with E500 cores don't seem to enable the
necessary interrupts at all.

If all these points were addressed, then it would be safe not to set
HID1[RFXE].
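
The argument above can be summarised as a small decision model.  This
is only a sketch of the reasoning in this message, not a description
taken from the reference manual.  The type and function names are
hypothetical, and the model assumes that with HID1[RFXE] = 0 the core
stays stalled unless the error is also reported through an interrupt
that the core can currently take.  MSR[CE] (the Book E critical
interrupt enable) is added for completeness; the text above mentions
only MSR[EE].

```c
#include <stdbool.h>

/* Hypothetical outcome of an event that asserts core_fault_in*. */
enum fault_outcome {
    FAULT_MACHINE_CHECK,    /* HID1[RFXE] = 1: machine check exception */
    FAULT_ERROR_INTERRUPT,  /* error reported through an enabled interrupt */
    FAULT_HANG              /* nothing reports the error: processor stalls */
};

/* Hypothetical snapshot of the relevant configuration bits. */
struct fault_config {
    bool rfxe;              /* HID1[RFXE] */
    bool error_int_enabled; /* error interrupt set up for this source
                               (e.g. l2errinten, err_int_en) */
    bool routed_critical;   /* interrupt configured as critical */
    bool msr_ee;            /* MSR[EE]: external interrupts enabled */
    bool msr_ce;            /* MSR[CE]: critical interrupts enabled */
};

/* Assumed model: RFXE wins; otherwise an interrupt must be deliverable. */
enum fault_outcome classify_fault(const struct fault_config *c)
{
    if (c->rfxe)
        return FAULT_MACHINE_CHECK;
    if (!c->error_int_enabled)
        return FAULT_HANG;
    /* A critical interrupt is gated by MSR[CE], a normal one by MSR[EE]. */
    if (c->routed_critical ? c->msr_ce : c->msr_ee)
        return FAULT_ERROR_INTERRUPT;
    return FAULT_HANG;
}
```

Under these assumptions, an error interrupt routed as a normal external
interrupt classifies as FAULT_HANG whenever MSR[EE] == 0, while the
same interrupt routed as critical is still taken as long as MSR[CE] is
set.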

> The Linux kernel has an EDAC driver for the 85xx platform in the main
> tree, which covers most error cases.

CONFIG_EDAC_MPC85XX is turned off in the defconfig file for all real
boards, so they will hang if core_fault_in* is asserted while running
Linux.  mpc85xx_edac.c doesn't seem to use critical interrupts, so an
error that occurs while interrupts are disabled by local_irq_disable()
will hang the processor.  It handles PCI, L2, and memory errors, but
not localbus or ECM.  It has the potential to be developed into a
complete solution, but it's not there today.

  -=- Andrew Klossner


* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
  2009-03-13 22:20 [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception Andrew Klossner
@ 2009-03-16  4:28 ` Liu Dave-R63238
  0 siblings, 0 replies; 6+ messages in thread
From: Liu Dave-R63238 @ 2009-03-16  4:28 UTC
  To: u-boot

Furthermore, your patch doesn't set the RFXE bit correctly.
It should be:
	lis	r0,HID1_RFXE@h
	ori	r0,r0,(HID1_ASTME|HID1_ABE)@l	/* Addr streaming & broadcast */
	mtspr	HID1,r0
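
The difference between the two sequences can be checked by modelling
the three instructions in C.  The sketch below is illustrative only:
the helper names are hypothetical, and the bit values for HID1_RFXE,
HID1_ASTME, and HID1_ABE are assumptions (0x00020000, 0x00002000, and
0x00001000) that should be verified against the headers in your tree.

```c
#include <stdint.h>

/* Assumed HID1 bit values; verify against the headers in your tree. */
#define HID1_RFXE  0x00020000u  /* read fault exception enable */
#define HID1_ASTME 0x00002000u  /* address streaming enable */
#define HID1_ABE   0x00001000u  /* address broadcast enable */

/* Assembler operators: @h = upper 16 bits, @l = lower 16 bits. */
static uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }
static uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xffffu); }

/* li rD,imm: load a sign-extended 16-bit immediate. */
static uint32_t li(uint16_t imm) { return (uint32_t)(int32_t)(int16_t)imm; }

/* lis rD,imm: load the immediate into the upper 16 bits, low half zero. */
static uint32_t lis(uint16_t imm) { return (uint32_t)imm << 16; }

/* ori rD,rS,imm: OR a zero-extended 16-bit immediate into the low half. */
static uint32_t ori(uint32_t rs, uint16_t imm) { return rs | imm; }

/* Sequence from the patch as posted: li, then ori with HID1_RFXE@h. */
uint32_t hid1_patch_as_posted(void)
{
    uint32_t r0 = li(lo16(HID1_ASTME | HID1_ABE));
    return ori(r0, hi16(HID1_RFXE));
}

/* Sequence suggested in this reply: lis with HID1_RFXE@h, then ori. */
uint32_t hid1_patch_corrected(void)
{
    uint32_t r0 = lis(hi16(HID1_RFXE));
    return ori(r0, lo16(HID1_ASTME | HID1_ABE));
}
```

With the assumed values, hid1_patch_as_posted() returns 0x00003002:
ori can only reach the low 16 bits, so the upper half of HID1_RFXE is
ORed into the wrong place and RFXE stays clear.
hid1_patch_corrected() returns 0x00023000, with RFXE, ASTME, and ABE
all set.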

Thanks, Dave


* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
  2009-03-12 15:15 Andrew Klossner
@ 2009-03-13  2:22 ` Liu Dave-R63238
  0 siblings, 0 replies; 6+ messages in thread
From: Liu Dave-R63238 @ 2009-03-13  2:22 UTC
  To: u-boot

> I agree that this is the better approach, and I would be delighted if
> somebody would write and submit that code for both u-boot and Linux.
> It's a few hundred lines and takes careful testing, as we found with
> our proprietary OS.

In fact, U-Boot has some support for these error events; see
cpu/mpc85xx/interrupts, the 8548cds and 8544ds board files, and
drivers/pci/fsl-pci...c.  However, they lack a good framework.

The Linux kernel has an EDAC driver for the 85xx platform in the main
tree, which covers most error cases.

> In the meantime, both u-boot and Linux are susceptible to processor
> stalls.  This one-line patch replaces those stalls with exceptions.
> The machine check exception handler won't diagnose the problem (e.g.,
> it won't display "localbus parity error,") but at least we know that
> something happened and we have a register dump with which to begin
> analyzing the problem.

Which errors do you usually encounter: L2 cache, ECM mapping,
PCI[x,ex], or local bus errors?


* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
@ 2009-03-12 15:15 Andrew Klossner
  2009-03-13  2:22 ` Liu Dave-R63238
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Klossner @ 2009-03-12 15:15 UTC
  To: u-boot

> Why not use interrupt mode?
> In that mode you get more valuable information.  Currently, U-Boot
> has a little error detection if you configure the interrupts and
> enable the error interrupts.

I agree that this is the better approach, and I would be delighted if
somebody would write and submit that code for both u-boot and Linux.
It's a few hundred lines and takes careful testing, as we found with
our proprietary OS.

In the meantime, both u-boot and Linux are susceptible to processor
stalls.  This one-line patch replaces those stalls with exceptions.
The machine check exception handler won't diagnose the problem (e.g.,
it won't display "localbus parity error,") but at least we know that
something happened and we have a register dump with which to begin
analyzing the problem.

  -=- Andrew Klossner


* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
  2009-03-11 18:36 Andrew Klossner
@ 2009-03-12  2:29 ` Liu Dave-R63238
  0 siblings, 0 replies; 6+ messages in thread
From: Liu Dave-R63238 @ 2009-03-12  2:29 UTC
  To: u-boot

> Set HID1[RFXE] = 1 in cpu/mpc85xx/start.S.  When this bit is 0, any
> condition that asserts the internal core_fault_in* signal will result
> in a processor hang, recoverable only with reset.  When this bit is 1,
> such a condition will cause a machine check exception and software
> will have a chance to print an error message.
> 
> Conditions that can assert core_fault_in* include ECM local access
> error (read an unmapped target address), multi-bit ECC error in L2
> cache or DDR RAM, localbus parity error, and a variety of PCI errors.
> 
> A long discussion of why this bit must be set can be found in, among
> other places, the "MPC8548E PowerQUICC III Integrated Processor Family
> Reference Manual" section 6.10.2, table 6-19 "HID1 Field
> Descriptions."  It says that leaving the bit 0 "is not a recommended
> configuration.  The processor may stall indefinitely due to an
> unreported error."
> 
> We have tested the use of this bit for two years, both in u-boot/Linux
> and in a proprietary operating system, in systems using MPC8541,
> MPC8545/8, and MPC8536.

Why not use interrupt mode?
In that mode you get more valuable information.  Currently, U-Boot
has a little error detection if you configure the interrupts and
enable the error interrupts.

Thanks, Dave


* [U-Boot] [PATCH] PowerPC MPC85xx: don't hang on read exception
@ 2009-03-11 18:36 Andrew Klossner
  2009-03-12  2:29 ` Liu Dave-R63238
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Klossner @ 2009-03-11 18:36 UTC
  To: u-boot

Set HID1[RFXE] = 1 in cpu/mpc85xx/start.S.  When this bit is 0, any
condition that asserts the internal core_fault_in* signal will result
in a processor hang, recoverable only with reset.  When this bit is 1,
such a condition will cause a machine check exception and software
will have a chance to print an error message.

Conditions that can assert core_fault_in* include ECM local access
error (read an unmapped target address), multi-bit ECC error in L2
cache or DDR RAM, localbus parity error, and a variety of PCI errors.

A long discussion of why this bit must be set can be found in, among
other places, the "MPC8548E PowerQUICC III Integrated Processor Family
Reference Manual" section 6.10.2, table 6-19 "HID1 Field
Descriptions."  It says that leaving the bit 0 "is not a recommended
configuration.  The processor may stall indefinitely due to an
unreported error."

We have tested the use of this bit for two years, both in u-boot/Linux
and in a proprietary operating system, in systems using MPC8541,
MPC8545/8, and MPC8536.

Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
---
 cpu/mpc85xx/start.S |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/cpu/mpc85xx/start.S b/cpu/mpc85xx/start.S
index 80f9677..8dfbc81 100644
--- a/cpu/mpc85xx/start.S
+++ b/cpu/mpc85xx/start.S
@@ -166,6 +166,7 @@ _start_e500:
 
 #ifndef CONFIG_E500MC
 	li	r0,(HID1_ASTME|HID1_ABE)@l	/* Addr streaming & broadcast */
+	ori	r0,r0,HID1_RFXE@h	/* Enable read fault exceptions */
 	mtspr	HID1,r0
 #endif
 
-- 
1.6.1.3

