All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs
@ 2009-08-17 23:00 Paul Mackerras
  2009-08-17 23:00 ` [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time Paul Mackerras
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Paul Mackerras @ 2009-08-17 23:00 UTC (permalink / raw)
  To: linuxppc-dev

On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
32-bit halves.  On SMP we write the higher-order half and then the
lower-order half, with a write barrier between the two halves, but on
UP there was no particular ordering of the writes to the two halves.

This extends the ordering that we already do on SMP to the UP case as
well.  The reason is that with the perf_counter subsystem potentially
accessing user memory at interrupt time to get stack traces, we have
to be careful not to create an incorrect but apparently valid PTE even
on UP.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/pgtable.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index eb17da7..2a5da06 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -104,8 +104,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	else
 		pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
 
-#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
-	/* Second case is 32-bit with 64-bit PTE in SMP mode. In this case, we
+#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
+	/* Second case is 32-bit with 64-bit PTE.  In this case, we
 	 * can just store as long as we do the two halves in the right order
 	 * with a barrier in between. This is possible because we take care,
 	 * in the hash code, to pre-invalidate if the PTE was already hashed,
@@ -140,7 +140,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 #else
 	/* Anything else just stores the PTE normally. That covers all 64-bit
-	 * cases, and 32-bit non-hash with 64-bit PTEs in UP mode
+	 * cases, and 32-bit non-hash with 32-bit PTEs.
 	 */
 	*ptep = pte;
 #endif
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time
  2009-08-17 23:00 [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Paul Mackerras
@ 2009-08-17 23:00 ` Paul Mackerras
  2009-08-18  4:24   ` Benjamin Herrenschmidt
  2009-08-17 23:01 ` [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support Paul Mackerras
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Paul Mackerras @ 2009-08-17 23:00 UTC (permalink / raw)
  To: linuxppc-dev

This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine.  Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor.  This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table.  32-bit processors are already able to access user memory
at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.

On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca.  This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields.  To prevent this, we hard-disable
interrupts in switch_slb.  Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.

This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.

Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.

If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism.  An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/asm-offsets.c    |    2 +
 arch/powerpc/kernel/exceptions-64s.S |   19 +++++++++++++++++
 arch/powerpc/mm/slb.c                |   37 +++++++++++++++++++++++----------
 arch/powerpc/mm/stab.c               |   11 +++++++++-
 4 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 561b646..197b156 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -67,6 +67,8 @@ int main(void)
 	DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id));
 #ifdef CONFIG_PPC64
 	DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context));
+	DEFINE(SIGSEGV, SIGSEGV);
+	DEFINE(NMI_MASK, NMI_MASK);
 #else
 	DEFINE(THREAD_INFO, offsetof(struct task_struct, stack));
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index eb89811..8ac85e0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -729,6 +729,11 @@ BEGIN_FTR_SECTION
 	bne-	do_ste_alloc		/* If so handle it */
 END_FTR_SECTION_IFCLR(CPU_FTR_SLB)
 
+	clrrdi	r11,r1,THREAD_SHIFT
+	lwz	r0,TI_PREEMPT(r11)	/* If we're in an "NMI" */
+	andis.	r0,r0,NMI_MASK@h	/* (i.e. an irq when soft-disabled) */
+	bne	77f			/* then don't call hash_page now */
+
 	/*
 	 * On iSeries, we soft-disable interrupts here, then
 	 * hard-enable interrupts so that the hash_page code can spin on
@@ -833,6 +838,20 @@ handle_page_fault:
 	bl	.low_hash_fault
 	b	.ret_from_except
 
+/*
+ * We come here as a result of a DSI at a point where we don't want
+ * to call hash_page, such as when we are accessing memory (possibly
+ * user memory) inside a PMU interrupt that occurred while interrupts
+ * were soft-disabled.  We want to invoke the exception handler for
+ * the access, or panic if there isn't a handler.
+ */
+77:	bl	.save_nvgprs
+	mr	r4,r3
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	li	r5,SIGSEGV
+	bl	.bad_page_fault
+	b	.ret_from_except
+
 	/* here we have a segment miss */
 do_ste_alloc:
 	bl	.ste_allocate		/* try to insert stab entry */
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 5b7038f..a685652 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -92,15 +92,13 @@ static inline void create_shadowed_slbe(unsigned long ea, int ssize,
 		     : "memory" );
 }
 
-void slb_flush_and_rebolt(void)
+static void __slb_flush_and_rebolt(void)
 {
 	/* If you change this make sure you change SLB_NUM_BOLTED
 	 * appropriately too. */
 	unsigned long linear_llp, vmalloc_llp, lflags, vflags;
 	unsigned long ksp_esid_data, ksp_vsid_data;
 
-	WARN_ON(!irqs_disabled());
-
 	linear_llp = mmu_psize_defs[mmu_linear_psize].sllp;
 	vmalloc_llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
 	lflags = SLB_VSID_KERNEL | linear_llp;
@@ -117,12 +115,6 @@ void slb_flush_and_rebolt(void)
 		ksp_vsid_data = get_slb_shadow()->save_area[2].vsid;
 	}
 
-	/*
-	 * We can't take a PMU exception in the following code, so hard
-	 * disable interrupts.
-	 */
-	hard_irq_disable();
-
 	/* We need to do this all in asm, so we're sure we don't touch
 	 * the stack between the slbia and rebolting it. */
 	asm volatile("isync\n"
@@ -139,6 +131,21 @@ void slb_flush_and_rebolt(void)
 		     : "memory");
 }
 
+void slb_flush_and_rebolt(void)
+{
+
+	WARN_ON(!irqs_disabled());
+
+	/*
+	 * We can't take a PMU exception in the following code, so hard
+	 * disable interrupts.
+	 */
+	hard_irq_disable();
+
+	__slb_flush_and_rebolt();
+	get_paca()->slb_cache_ptr = 0;
+}
+
 void slb_vmalloc_update(void)
 {
 	unsigned long vflags;
@@ -180,12 +187,20 @@ static inline int esids_match(unsigned long addr1, unsigned long addr2)
 /* Flush all user entries from the segment table of the current processor. */
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
-	unsigned long offset = get_paca()->slb_cache_ptr;
+	unsigned long offset;
 	unsigned long slbie_data = 0;
 	unsigned long pc = KSTK_EIP(tsk);
 	unsigned long stack = KSTK_ESP(tsk);
 	unsigned long unmapped_base;
 
+	/*
+	 * We need interrupts hard-disabled here, not just soft-disabled,
+	 * so that a PMU interrupt can't occur, which might try to access
+	 * user memory (to get a stack trace) and possible cause an SLB miss
+	 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
+	 */
+	hard_irq_disable();
+	offset = get_paca()->slb_cache_ptr;
 	if (!cpu_has_feature(CPU_FTR_NO_SLBIE_B) &&
 	    offset <= SLB_CACHE_ENTRIES) {
 		int i;
@@ -200,7 +215,7 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		}
 		asm volatile("isync" : : : "memory");
 	} else {
-		slb_flush_and_rebolt();
+		__slb_flush_and_rebolt();
 	}
 
 	/* Workaround POWER5 < DD2.1 issue */
diff --git a/arch/powerpc/mm/stab.c b/arch/powerpc/mm/stab.c
index 98cd1dc..ab5fb48 100644
--- a/arch/powerpc/mm/stab.c
+++ b/arch/powerpc/mm/stab.c
@@ -164,7 +164,7 @@ void switch_stab(struct task_struct *tsk, struct mm_struct *mm)
 {
 	struct stab_entry *stab = (struct stab_entry *) get_paca()->stab_addr;
 	struct stab_entry *ste;
-	unsigned long offset = __get_cpu_var(stab_cache_ptr);
+	unsigned long offset;
 	unsigned long pc = KSTK_EIP(tsk);
 	unsigned long stack = KSTK_ESP(tsk);
 	unsigned long unmapped_base;
@@ -172,6 +172,15 @@ void switch_stab(struct task_struct *tsk, struct mm_struct *mm)
 	/* Force previous translations to complete. DRENG */
 	asm volatile("isync" : : : "memory");
 
+	/*
+	 * We need interrupts hard-disabled here, not just soft-disabled,
+	 * so that a PMU interrupt can't occur, which might try to access
+	 * user memory (to get a stack trace) and possible cause an STAB miss
+	 * which would update the stab_cache/stab_cache_ptr per-cpu variables.
+	 */
+	hard_irq_disable();
+
+	offset = __get_cpu_var(stab_cache_ptr);
 	if (offset <= NR_STAB_CACHE_ENTRIES) {
 		int i;
 
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support
  2009-08-17 23:00 [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Paul Mackerras
  2009-08-17 23:00 ` [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time Paul Mackerras
@ 2009-08-17 23:01 ` Paul Mackerras
  2009-08-18  4:23   ` Benjamin Herrenschmidt
  2009-08-18  0:00 ` [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Kumar Gala
  2009-08-18  4:24 ` Benjamin Herrenschmidt
  3 siblings, 1 reply; 33+ messages in thread
From: Paul Mackerras @ 2009-08-17 23:01 UTC (permalink / raw)
  To: linuxppc-dev

This adds support for tracing callchains for powerpc, both 32-bit
and 64-bit, and both in the kernel and userspace, from PMU interrupt
context.

The first three entries stored for each callchain are the NIP (next
instruction pointer), LR (link register), and the contents of the LR
save area in the second stack frame (the first is ignored because the
ABI convention on powerpc is that functions save their return address
in their caller's stack frame).  Because leaf functions don't have to
save their return address (LR value) and don't have to establish a
stack frame, it's possible for either or both of LR and the second
stack frame's LR save area to have valid return addresses in them.
This is basically impossible to disambiguate without either reading
the code or looking at auxiliary information such as CFI tables.
Since we don't want to do either of those things at interrupt time,
we store both LR and the second stack frame's LR save area.

Once we get past the second stack frame, there is no ambiguity; all
return addresses we get are reliable.

For kernel traces, we check whether they are valid kernel instruction
addresses and store zero instead if they are not (rather than
omitting them, which would make it impossible for userspace to know
which was which).  We also store zero instead of the second stack
frame's LR save area value if it is the same as LR.

For kernel traces, we check for interrupt frames, and for user traces,
we check for signal frames.  In each case, since we're starting a new
trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
knows that the next three entries are NIP, LR and the second stack fram=
e
for the interrupted context.

We read user memory with __get_user_inatomic.  On 64-bit, if this
PMU interrupt occurred while interrupts are soft-disabled, and
there is no MMU hash table entry for the page, we will get an
-EFAULT return from __get_user_inatomic even if there is a valid
Linux PTE for the page, since hash_page isn't reentrant.  Thus we
have code here to read the Linux PTE and access the page via the
kernel linear mapping.  Since 64-bit doesn't use (or need) highmem
there is no need to do kmap_atomic.  On 32-bit, we don't do soft
interrupt disabling, so this complication doesn't occur and there
is no need to fall back to reading the Linux PTE, since hash_page
(or the TLB miss handler) will get called automatically if necessary.

Note that we cannot get PMU interrupts in the interval during
context switch between switch_mm (which switches the user address
space) and switch_to (which actually changes current to the new
process).  On 64-bit this is because interrupts are hard-disabled
in switch_mm and stay hard-disabled until they are soft-enabled
later, after switch_to has returned.  So there is no possibility
of trying to do a user stack trace when the user address space is
not current's address space.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/Makefile         |    2 +-
 arch/powerpc/kernel/perf_callchain.c |  527 ++++++++++++++++++++++++++=
++++++++
 2 files changed, 528 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/kernel/perf_callchain.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefil=
e
index b73396b..9619285 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -97,7 +97,7 @@ obj64-$(CONFIG_AUDIT)=09=09+=3D compat_audit.o
=20
 obj-$(CONFIG_DYNAMIC_FTRACE)=09+=3D ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)=09+=3D ftrace.o
-obj-$(CONFIG_PPC_PERF_CTRS)=09+=3D perf_counter.o
+obj-$(CONFIG_PPC_PERF_CTRS)=09+=3D perf_counter.o perf_callchain.o
 obj64-$(CONFIG_PPC_PERF_CTRS)=09+=3D power4-pmu.o ppc970-pmu.o power5-=
pmu.o \
 =09=09=09=09   power5+-pmu.o power6-pmu.o power7-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS)=09+=3D mpc7450-pmu.o
diff --git a/arch/powerpc/kernel/perf_callchain.c b/arch/powerpc/kernel=
/perf_callchain.c
new file mode 100644
index 0000000..f74b62c
--- /dev/null
+++ b/arch/powerpc/kernel/perf_callchain.c
@@ -0,0 +1,527 @@
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright =A9 2009 Paul Mackerras, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_counter.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#ifdef CONFIG_PPC64
+#include "ppc32.h"
+#endif
+
+/*
+ * Store another value in a callchain_entry.
+ */
+static inline void callchain_store(struct perf_callchain_entry *entry,=
 u64 ip)
+{
+=09unsigned int nr =3D entry->nr;
+
+=09if (nr < PERF_MAX_STACK_DEPTH) {
+=09=09entry->ip[nr] =3D ip;
+=09=09entry->nr =3D nr + 1;
+=09}
+}
+
+/*
+ * Is sp valid as the address of the next kernel stack frame after pre=
v_sp?
+ * The next frame may be in a different stack area but should not go
+ * back down in the same stack area.
+ */
+static int valid_next_sp(unsigned long sp, unsigned long prev_sp)
+{
+=09if (sp & 0xf)
+=09=09return 0;=09=09/* must be 16-byte aligned */
+=09if (!validate_sp(sp, current, STACK_FRAME_OVERHEAD))
+=09=09return 0;
+=09if (sp >=3D prev_sp + STACK_FRAME_OVERHEAD)
+=09=09return 1;
+=09/*
+=09 * sp could decrease when we jump off an interrupt stack
+=09 * back to the regular process stack.
+=09 */
+=09if ((sp & ~(THREAD_SIZE - 1)) !=3D (prev_sp & ~(THREAD_SIZE - 1)))
+=09=09return 1;
+=09return 0;
+}
+
+static void perf_callchain_kernel(struct pt_regs *regs,
+=09=09=09=09  struct perf_callchain_entry *entry)
+{
+=09unsigned long sp, next_sp;
+=09unsigned long next_ip;
+=09unsigned long lr;
+=09long level =3D 0;
+=09unsigned long *fp;
+
+=09lr =3D regs->link;
+=09sp =3D regs->gpr[1];
+=09callchain_store(entry, PERF_CONTEXT_KERNEL);
+=09callchain_store(entry, regs->nip);
+
+=09if (!validate_sp(sp, current, STACK_FRAME_OVERHEAD))
+=09=09return;
+
+=09for (;;) {
+=09=09fp =3D (unsigned long *) sp;
+=09=09next_sp =3D fp[0];
+
+=09=09if (next_sp =3D=3D sp + STACK_INT_FRAME_SIZE &&
+=09=09    fp[STACK_FRAME_MARKER] =3D=3D STACK_FRAME_REGS_MARKER) {
+=09=09=09/*
+=09=09=09 * This looks like an interrupt frame for an
+=09=09=09 * interrupt that occurred in the kernel
+=09=09=09 */
+=09=09=09regs =3D (struct pt_regs *)(sp + STACK_FRAME_OVERHEAD);
+=09=09=09next_ip =3D regs->nip;
+=09=09=09lr =3D regs->link;
+=09=09=09level =3D 0;
+=09=09=09callchain_store(entry, PERF_CONTEXT_KERNEL);
+
+=09=09} else {
+=09=09=09if (level =3D=3D 0)
+=09=09=09=09next_ip =3D lr;
+=09=09=09else
+=09=09=09=09next_ip =3D fp[STACK_FRAME_LR_SAVE];
+
+=09=09=09/*
+=09=09=09 * We can't tell which of the first two addresses
+=09=09=09 * we get are valid, but we can filter out the
+=09=09=09 * obviously bogus ones here.  We replace them
+=09=09=09 * with 0 rather than removing them entirely so
+=09=09=09 * that userspace can tell which is which.
+=09=09=09 */
+=09=09=09if ((level =3D=3D 1 && next_ip =3D=3D lr) ||
+=09=09=09    (level <=3D 1 && !kernel_text_address(next_ip)))
+=09=09=09=09next_ip =3D 0;
+
+=09=09=09++level;
+=09=09}
+
+=09=09callchain_store(entry, next_ip);
+=09=09if (!valid_next_sp(next_sp, sp))
+=09=09=09return;
+=09=09sp =3D next_sp;
+=09}
+}
+
+#ifdef CONFIG_PPC64
+
+#ifdef CONFIG_HUGETLB_PAGE
+#define is_huge_psize(pagesize)=09(HPAGE_SHIFT && mmu_huge_psizes[page=
size])
+#else
+#define is_huge_psize(pagesize)=090
+#endif
+
+/*
+ * On 64-bit we don't want to invoke hash_page on user addresses from
+ * interrupt context, so if the access faults, we read the page tables=

+ * to find which page (if any) is mapped and access it directly.
+ */
+static int read_user_stack_slow(void __user *ptr, void *ret, int nb)
+{
+=09pgd_t *pgdir;
+=09pte_t *ptep, pte;
+=09int pagesize;
+=09unsigned long addr =3D (unsigned long) ptr;
+=09unsigned long offset;
+=09unsigned long pfn;
+=09void *kaddr;
+
+=09pgdir =3D current->mm->pgd;
+=09if (!pgdir)
+=09=09return -EFAULT;
+
+=09pagesize =3D get_slice_psize(current->mm, addr);
+
+=09/* align address to page boundary */
+=09offset =3D addr & ((1ul << mmu_psize_defs[pagesize].shift) - 1);
+=09addr -=3D offset;
+
+=09if (is_huge_psize(pagesize))
+=09=09ptep =3D huge_pte_offset(current->mm, addr);
+=09else
+=09=09ptep =3D find_linux_pte(pgdir, addr);
+
+=09if (ptep =3D=3D NULL)
+=09=09return -EFAULT;
+=09pte =3D *ptep;
+=09if (!pte_present(pte) || !(pte_val(pte) & _PAGE_USER))
+=09=09return -EFAULT;
+=09pfn =3D pte_pfn(pte);
+=09if (!page_is_ram(pfn))
+=09=09return -EFAULT;
+
+=09/* no highmem to worry about here */
+=09kaddr =3D pfn_to_kaddr(pfn);
+=09memcpy(ret, kaddr + offset, nb);
+=09return 0;
+}
+
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long=
 *ret)
+{
+=09if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
+=09    ((unsigned long)ptr & 7))
+=09=09return -EFAULT;
+
+=09if (!__get_user_inatomic(*ret, ptr))
+=09=09return 0;
+
+=09return read_user_stack_slow(ptr, ret, 8);
+}
+
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *=
ret)
+{
+=09if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+=09    ((unsigned long)ptr & 3))
+=09=09return -EFAULT;
+
+=09if (!__get_user_inatomic(*ret, ptr))
+=09=09return 0;
+
+=09return read_user_stack_slow(ptr, ret, 4);
+}
+
+static inline int valid_user_sp(unsigned long sp, int is_64)
+{
+=09if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 3=
2)
+=09=09return 0;
+=09return 1;
+}
+
+/*
+ * 64-bit user processes use the same stack frame for RT and non-RT si=
gnals.
+ */
+struct signal_frame_64 {
+=09char=09=09dummy[__SIGNAL_FRAMESIZE];
+=09struct ucontext=09uc;
+=09unsigned long=09unused[2];
+=09unsigned int=09tramp[6];
+=09struct siginfo=09*pinfo;
+=09void=09=09*puc;
+=09struct siginfo=09info;
+=09char=09=09abigap[288];
+};
+
+static int is_sigreturn_64_address(unsigned long nip, unsigned long fp=
)
+{
+=09if (nip =3D=3D fp + offsetof(struct signal_frame_64, tramp))
+=09=09return 1;
+=09if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
+=09    nip =3D=3D current->mm->context.vdso_base + vdso64_rt_sigtramp)=

+=09=09return 1;
+=09return 0;
+}
+
+/*
+ * Do some sanity checking on the signal frame pointed to by sp.
+ * We check the pinfo and puc pointers in the frame.
+ */
+static int sane_signal_64_frame(unsigned long sp)
+{
+=09struct signal_frame_64 __user *sf;
+=09unsigned long pinfo, puc;
+
+=09sf =3D (struct signal_frame_64 __user *) sp;
+=09if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo)=
 ||
+=09    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
+=09=09return 0;
+=09return pinfo =3D=3D (unsigned long) &sf->info &&
+=09=09puc =3D=3D (unsigned long) &sf->uc;
+}
+
+static void perf_callchain_user_64(struct pt_regs *regs,
+=09=09=09=09   struct perf_callchain_entry *entry)
+{
+=09unsigned long sp, next_sp;
+=09unsigned long next_ip;
+=09unsigned long lr;
+=09long level =3D 0;
+=09struct signal_frame_64 __user *sigframe;
+=09unsigned long __user *fp, *uregs;
+
+=09next_ip =3D regs->nip;
+=09lr =3D regs->link;
+=09sp =3D regs->gpr[1];
+=09callchain_store(entry, PERF_CONTEXT_USER);
+=09callchain_store(entry, next_ip);
+
+=09for (;;) {
+=09=09fp =3D (unsigned long __user *) sp;
+=09=09if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
+=09=09=09return;
+=09=09if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
+=09=09=09return;
+
+=09=09/*
+=09=09 * Note: the next_sp - sp >=3D signal frame size check
+=09=09 * is true when next_sp < sp, which can happen when
+=09=09 * transitioning from an alternate signal stack to the
+=09=09 * normal stack.
+=09=09 */
+=09=09if (next_sp - sp >=3D sizeof(struct signal_frame_64) &&
+=09=09    (is_sigreturn_64_address(next_ip, sp) ||
+=09=09     (level <=3D 1 && is_sigreturn_64_address(lr, sp))) &&
+=09=09    sane_signal_64_frame(sp)) {
+=09=09=09/*
+=09=09=09 * This looks like an signal frame
+=09=09=09 */
+=09=09=09sigframe =3D (struct signal_frame_64 __user *) sp;
+=09=09=09uregs =3D sigframe->uc.uc_mcontext.gp_regs;
+=09=09=09if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
+=09=09=09    read_user_stack_64(&uregs[PT_LNK], &lr) ||
+=09=09=09    read_user_stack_64(&uregs[PT_R1], &sp))
+=09=09=09=09return;
+=09=09=09level =3D 0;
+=09=09=09callchain_store(entry, PERF_CONTEXT_USER);
+=09=09=09callchain_store(entry, next_ip);
+=09=09=09continue;
+=09=09}
+
+=09=09if (level =3D=3D 0)
+=09=09=09next_ip =3D lr;
+=09=09callchain_store(entry, next_ip);
+=09=09++level;
+=09=09sp =3D next_sp;
+=09}
+}
+
+static inline int current_is_64bit(void)
+{
+=09/*
+=09 * We can't use test_thread_flag() here because we may be on an
+=09 * interrupt stack, and the thread flags don't get copied over
+=09 * from the thread_info on the main stack to the interrupt stack.
+=09 */
+=09return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
+}
+
+#else  /* CONFIG_PPC64 */
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *=
ret)
+{
+=09if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+=09    ((unsigned long)ptr & 3))
+=09=09return -EFAULT;
+
+=09return __get_user_inatomic(*ret, ptr);
+}
+
+static inline void perf_callchain_user_64(struct pt_regs *regs,
+=09=09=09=09=09  struct perf_callchain_entry *entry)
+{
+}
+
+static inline int current_is_64bit(void)
+{
+=09return 0;
+}
+
+static inline int valid_user_sp(unsigned long sp, int is_64)
+{
+=09if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
+=09=09return 0;
+=09return 1;
+}
+
+#define __SIGNAL_FRAMESIZE32=09__SIGNAL_FRAMESIZE
+#define sigcontext32=09=09sigcontext
+#define mcontext32=09=09mcontext
+#define ucontext32=09=09ucontext
+#define compat_siginfo_t=09struct siginfo
+
+#endif /* CONFIG_PPC64 */
+
+/*
+ * Layout for non-RT signal frames
+ */
+struct signal_frame_32 {
+=09char=09=09=09dummy[__SIGNAL_FRAMESIZE32];
+=09struct sigcontext32=09sctx;
+=09struct mcontext32=09mctx;
+=09int=09=09=09abigap[56];
+};
+
+/*
+ * Layout for RT signal frames
+ */
+struct rt_signal_frame_32 {
+=09char=09=09=09dummy[__SIGNAL_FRAMESIZE32 + 16];
+=09compat_siginfo_t=09info;
+=09struct ucontext32=09uc;
+=09int=09=09=09abigap[56];
+};
+
+static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+=09if (nip =3D=3D fp + offsetof(struct signal_frame_32, mctx.mc_pad))
+=09=09return 1;
+=09if (vdso32_sigtramp && current->mm->context.vdso_base &&
+=09    nip =3D=3D current->mm->context.vdso_base + vdso32_sigtramp)
+=09=09return 1;
+=09return 0;
+}
+
+static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int f=
p)
+{
+=09if (nip =3D=3D fp + offsetof(struct rt_signal_frame_32,
+=09=09=09=09 uc.uc_mcontext.mc_pad))
+=09=09return 1;
+=09if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
+=09    nip =3D=3D current->mm->context.vdso_base + vdso32_rt_sigtramp)=

+=09=09return 1;
+=09return 0;
+}
+
+static int sane_signal_32_frame(unsigned int sp)
+{
+=09struct signal_frame_32 __user *sf;
+=09unsigned int regs;
+
+=09sf =3D (struct signal_frame_32 __user *) (unsigned long) sp;
+=09if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &reg=
s))
+=09=09return 0;
+=09return regs =3D=3D (unsigned long) &sf->mctx;
+}
+
+static int sane_rt_signal_32_frame(unsigned int sp)
+{
+=09struct rt_signal_frame_32 __user *sf;
+=09unsigned int regs;
+
+=09sf =3D (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+=09if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &re=
gs))
+=09=09return 0;
+=09return regs =3D=3D (unsigned long) &sf->uc.uc_mcontext;
+}
+
+static unsigned int __user *signal_frame_32_regs(unsigned int sp,
+=09=09=09=09unsigned int next_sp, unsigned int next_ip)
+{
+=09struct mcontext32 __user *mctx =3D NULL;
+=09struct signal_frame_32 __user *sf;
+=09struct rt_signal_frame_32 __user *rt_sf;
+
+=09/*
+=09 * Note: the next_sp - sp >=3D signal frame size check
+=09 * is true when next_sp < sp, for example, when
+=09 * transitioning from an alternate signal stack to the
+=09 * normal stack.
+=09 */
+=09if (next_sp - sp >=3D sizeof(struct signal_frame_32) &&
+=09    is_sigreturn_32_address(next_ip, sp) &&
+=09    sane_signal_32_frame(sp)) {
+=09=09sf =3D (struct signal_frame_32 __user *) (unsigned long) sp;
+=09=09mctx =3D &sf->mctx;
+=09}
+
+=09if (!mctx && next_sp - sp >=3D sizeof(struct rt_signal_frame_32) &&=

+=09    is_rt_sigreturn_32_address(next_ip, sp) &&
+=09    sane_rt_signal_32_frame(sp)) {
+=09=09rt_sf =3D (struct rt_signal_frame_32 __user *) (unsigned long) s=
p;
+=09=09mctx =3D &rt_sf->uc.uc_mcontext;
+=09}
+
+=09if (!mctx)
+=09=09return NULL;
+=09return mctx->mc_gregs;
+}
+
+static void perf_callchain_user_32(struct pt_regs *regs,
+=09=09=09=09   struct perf_callchain_entry *entry)
+{
+=09unsigned int sp, next_sp;
+=09unsigned int next_ip;
+=09unsigned int lr;
+=09long level =3D 0;
+=09unsigned int __user *fp, *uregs;
+
+=09next_ip =3D regs->nip;
+=09lr =3D regs->link;
+=09sp =3D regs->gpr[1];
+=09callchain_store(entry, PERF_CONTEXT_USER);
+=09callchain_store(entry, next_ip);
+
+=09while (entry->nr < PERF_MAX_STACK_DEPTH) {
+=09=09fp =3D (unsigned int __user *) (unsigned long) sp;
+=09=09if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
+=09=09=09return;
+=09=09if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
+=09=09=09return;
+
+=09=09uregs =3D signal_frame_32_regs(sp, next_sp, next_ip);
+=09=09if (!uregs && level <=3D 1)
+=09=09=09uregs =3D signal_frame_32_regs(sp, next_sp, lr);
+=09=09if (uregs) {
+=09=09=09/*
+=09=09=09 * This looks like an signal frame, so restart
+=09=09=09 * the stack trace with the values in it.
+=09=09=09 */
+=09=09=09if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
+=09=09=09    read_user_stack_32(&uregs[PT_LNK], &lr) ||
+=09=09=09    read_user_stack_32(&uregs[PT_R1], &sp))
+=09=09=09=09return;
+=09=09=09level =3D 0;
+=09=09=09callchain_store(entry, PERF_CONTEXT_USER);
+=09=09=09callchain_store(entry, next_ip);
+=09=09=09continue;
+=09=09}
+
+=09=09if (level =3D=3D 0)
+=09=09=09next_ip =3D lr;
+=09=09callchain_store(entry, next_ip);
+=09=09++level;
+=09=09sp =3D next_sp;
+=09}
+}
+
+/*
+ * Since we can't get PMU interrupts inside a PMU interrupt handler,
+ * we don't need separate irq and nmi entries here.
+ */
+static DEFINE_PER_CPU(struct perf_callchain_entry, callchain);
+
+struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+{
+=09struct perf_callchain_entry *entry =3D &__get_cpu_var(callchain);
+
+=09entry->nr =3D 0;
+
+=09if (current->pid =3D=3D 0)=09=09/* idle task? */
+=09=09return entry;
+
+=09if (!user_mode(regs)) {
+=09=09perf_callchain_kernel(regs, entry);
+=09=09if (current->mm)
+=09=09=09regs =3D task_pt_regs(current);
+=09=09else
+=09=09=09regs =3D NULL;
+=09}
+
+=09if (regs) {
+=09=09if (current_is_64bit())
+=09=09=09perf_callchain_user_64(regs, entry);
+=09=09else
+=09=09=09perf_callchain_user_32(regs, entry);
+=09}
+
+=09return entry;
+}
--=20
1.6.0.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs
  2009-08-17 23:00 [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Paul Mackerras
  2009-08-17 23:00 ` [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time Paul Mackerras
  2009-08-17 23:01 ` [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support Paul Mackerras
@ 2009-08-18  0:00 ` Kumar Gala
  2009-08-18  0:14   ` Paul Mackerras
  2009-08-18  4:24 ` Benjamin Herrenschmidt
  3 siblings, 1 reply; 33+ messages in thread
From: Kumar Gala @ 2009-08-18  0:00 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


On Aug 17, 2009, at 6:00 PM, Paul Mackerras wrote:

> On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
> 32-bit halves.  On SMP we write the higher-order half and then the
> lower-order half, with a write barrier between the two halves, but on
> UP there was no particular ordering of the writes to the two halves.
>
> This extends the ordering that we already do on SMP to the UP case as
> well.  The reason is that with the perf_counter subsystem potentially
> accessing user memory at interrupt time to get stack traces, we have
> to be careful not to create an incorrect but apparently valid PTE even
> on UP.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/pgtable.h |    6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)

Just out of interest did you end up hitting this in testing?

- k

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs
  2009-08-18  0:00 ` [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Kumar Gala
@ 2009-08-18  0:14   ` Paul Mackerras
  0 siblings, 0 replies; 33+ messages in thread
From: Paul Mackerras @ 2009-08-18  0:14 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

Kumar Gala writes:

> On Aug 17, 2009, at 6:00 PM, Paul Mackerras wrote:
> 
> > On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
> > 32-bit halves.  On SMP we write the higher-order half and then the
> > lower-order half, with a write barrier between the two halves, but on
> > UP there was no particular ordering of the writes to the two halves.
> >
> > This extends the ordering that we already do on SMP to the UP case as
> > well.  The reason is that with the perf_counter subsystem potentially
> > accessing user memory at interrupt time to get stack traces, we have
> > to be careful not to create an incorrect but apparently valid PTE even
> > on UP.
> >
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> > ---
> > arch/powerpc/include/asm/pgtable.h |    6 +++---
> > 1 files changed, 3 insertions(+), 3 deletions(-)
> 
> Just out of interest did you end up hitting this in testing?

No.  Ben told me he wanted this change, so I did what I was told. :)

Paul.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support
  2009-08-17 23:01 ` [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support Paul Mackerras
@ 2009-08-18  4:23   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2009-08-18  4:23 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Tue, 2009-08-18 at 09:01 +1000, Paul Mackerras wrote:
> This adds support for tracing callchains for powerpc, both 32-bit
> and 64-bit, and both in the kernel and userspace, from PMU interrupt
> context.

> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time
  2009-08-17 23:00 ` [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time Paul Mackerras
@ 2009-08-18  4:24   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2009-08-18  4:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Tue, 2009-08-18 at 09:00 +1000, Paul Mackerras wrote:
> This provides a mechanism to allow the perf_counters code to access
> user memory in a PMU interrupt routine.  Such an access can cause
> various kinds of interrupt: SLB miss, MMU hash table miss, segment
> table miss, or TLB miss, depending on the processor.  This commit
> only deals with 64-bit classic/server processors, which use an MMU
> hash table.  32-bit processors are already able to access user memory
> at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
> the possibility of reentering hash_page or the TLB miss handlers,
> since they run with interrupts disabled.

  .../...

> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs
  2009-08-17 23:00 [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Paul Mackerras
                   ` (2 preceding siblings ...)
  2009-08-18  0:00 ` [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Kumar Gala
@ 2009-08-18  4:24 ` Benjamin Herrenschmidt
  2009-08-19 23:16     ` [U-Boot] " Feng Kan
  3 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2009-08-18  4:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Tue, 2009-08-18 at 09:00 +1000, Paul Mackerras wrote:
> On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
> 32-bit halves.  On SMP we write the higher-order half and then the
> lower-order half, with a write barrier between the two halves, but on
> UP there was no particular ordering of the writes to the two halves.
> 
> This extends the ordering that we already do on SMP to the UP case as
> well.  The reason is that with the perf_counter subsystem potentially
> accessing user memory at interrupt time to get stack traces, we have
> to be careful not to create an incorrect but apparently valid PTE even
> on UP.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
> ---
>  arch/powerpc/include/asm/pgtable.h |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index eb17da7..2a5da06 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -104,8 +104,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	else
>  		pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
>  
> -#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
> -	/* Second case is 32-bit with 64-bit PTE in SMP mode. In this case, we
> +#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
> +	/* Second case is 32-bit with 64-bit PTE.  In this case, we
>  	 * can just store as long as we do the two halves in the right order
>  	 * with a barrier in between. This is possible because we take care,
>  	 * in the hash code, to pre-invalidate if the PTE was already hashed,
> @@ -140,7 +140,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  
>  #else
>  	/* Anything else just stores the PTE normally. That covers all 64-bit
> -	 * cases, and 32-bit non-hash with 64-bit PTEs in UP mode
> +	 * cases, and 32-bit non-hash with 32-bit PTEs.
>  	 */
>  	*ptep = pte;
>  #endif

^ permalink raw reply	[flat|nested] 33+ messages in thread

* NAND  ECC Error with wrong SMC ording bug
  2009-08-18  4:24 ` Benjamin Herrenschmidt
@ 2009-08-19 23:16     ` Feng Kan
  0 siblings, 0 replies; 33+ messages in thread
From: Feng Kan @ 2009-08-19 23:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: u-boot, linux-mtd

Hi All:

It seems that the ECC correction is broken on the Linux with the 4xx 
NDFC driver.
It uses the SMC order when reading the ECC code. 2-1-3

static int ndfc_calculate_ecc(struct mtd_info *mtd,
                               const u_char *dat, u_char *ecc_code)
{
         struct ndfc_controller *ndfc = &ndfc_ctrl;
         uint32_t ecc;
         uint8_t *p = (uint8_t *)&ecc;

         wmb();
         ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
         /* The NDFC uses Smart Media (SMC) bytes order */
         ecc_code[0] = p[2];
         ecc_code[1] = p[1];
         ecc_code[2] = p[3];

         return 0;
}

However, when in the correction function, the byte address order is 
again reverses
causing incorrect byte location.

                  * performace it does not make any difference
                  */
                 if (eccsize_mult == 1)
                         byte_addr = (addressbits[b0] << 4) + 
addressbits[b1];
 >>>> The above really should be byte_addr = (addressbits[b1] << 4) + 
addressbits[b0];

                 else
                         byte_addr = (addressbits[b2 & 0x3] << 8) +
                                     (addressbits[b1] << 4) + 
addressbits[b0];
                 bit_addr = addressbits[b2 >> 2];
                 /* flip the bit */
                 buf[byte_addr] ^= (1 << bit_addr);
                 printk(KERN_INFO "Corrected b[0] 0x%x b[1]0x%x\n", b0, b1);
                 printk(KERN_INFO "cal ecc b[0] 0x%x b[1]0x%x\n", 
calc_ecc[0] , calc_ecc[1]);
                 printk(KERN_INFO "read ecc b[0] 0x%x b[1]0x%x\n", 
read_ecc[0] , read_ecc[1]);
                 return 1;

I see other boards using SMC as well, can someone comment on the change 
I am proposing.
Should I change the correction algorithm or the calculate function? If 
the later is preferred
it would mean the change must be pushed in both U-Boot and Linux.

Feng Kan
AMCC Software

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-19 23:16     ` Feng Kan
  0 siblings, 0 replies; 33+ messages in thread
From: Feng Kan @ 2009-08-19 23:16 UTC (permalink / raw)
  To: u-boot

Hi All:

It seems that the ECC correction is broken on the Linux with the 4xx 
NDFC driver.
It uses the SMC order when reading the ECC code. 2-1-3

static int ndfc_calculate_ecc(struct mtd_info *mtd,
                               const u_char *dat, u_char *ecc_code)
{
         struct ndfc_controller *ndfc = &ndfc_ctrl;
         uint32_t ecc;
         uint8_t *p = (uint8_t *)&ecc;

         wmb();
         ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
         /* The NDFC uses Smart Media (SMC) bytes order */
         ecc_code[0] = p[2];
         ecc_code[1] = p[1];
         ecc_code[2] = p[3];

         return 0;
}

However, when in the correction function, the byte address order is 
again reverses
causing incorrect byte location.

                  * performace it does not make any difference
                  */
                 if (eccsize_mult == 1)
                         byte_addr = (addressbits[b0] << 4) + 
addressbits[b1];
 >>>> The above really should be byte_addr = (addressbits[b1] << 4) + 
addressbits[b0];

                 else
                         byte_addr = (addressbits[b2 & 0x3] << 8) +
                                     (addressbits[b1] << 4) + 
addressbits[b0];
                 bit_addr = addressbits[b2 >> 2];
                 /* flip the bit */
                 buf[byte_addr] ^= (1 << bit_addr);
                 printk(KERN_INFO "Corrected b[0] 0x%x b[1]0x%x\n", b0, b1);
                 printk(KERN_INFO "cal ecc b[0] 0x%x b[1]0x%x\n", 
calc_ecc[0] , calc_ecc[1]);
                 printk(KERN_INFO "read ecc b[0] 0x%x b[1]0x%x\n", 
read_ecc[0] , read_ecc[1]);
                 return 1;

I see other boards using SMC as well, can someone comment on the change 
I am proposing.
Should I change the correction algorithm or the calculate function? If 
the later is preferred
it would mean the change must be pushed in both U-Boot and Linux.

Feng Kan
AMCC Software

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: NAND  ECC Error with wrong SMC ording bug
  2009-08-19 23:16     ` [U-Boot] " Feng Kan
@ 2009-08-20  4:38       ` Sean MacLennan
  -1 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-20  4:38 UTC (permalink / raw)
  To: Feng Kan; +Cc: linuxppc-dev, linux-mtd, u-boot

On Wed, 19 Aug 2009 16:16:54 -0700
Feng Kan <fkan@amcc.com> wrote:

> I see other boards using SMC as well, can someone comment on the
> change I am proposing.
> Should I change the correction algorithm or the calculate function?
> If the later is preferred
> it would mean the change must be pushed in both U-Boot and Linux.

Odds are the calculate function is wrong. The correction algo is used
by many nand drivers, I *assume* it is correct. The calculate function
was set to agree with u-boot (1.3.0).

Cheers,
   Sean

P.S. Yes, I know the u-boot is an ancient version :(

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20  4:38       ` Sean MacLennan
  0 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-20  4:38 UTC (permalink / raw)
  To: u-boot

On Wed, 19 Aug 2009 16:16:54 -0700
Feng Kan <fkan@amcc.com> wrote:

> I see other boards using SMC as well, can someone comment on the
> change I am proposing.
> Should I change the correction algorithm or the calculate function?
> If the later is preferred
> it would mean the change must be pushed in both U-Boot and Linux.

Odds are the calculate function is wrong. The correction algo is used
by many nand drivers, I *assume* it is correct. The calculate function
was set to agree with u-boot (1.3.0).

Cheers,
   Sean

P.S. Yes, I know the u-boot is an ancient version :(

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND  ECC Error with wrong SMC ording bug
  2009-08-20  4:38       ` [U-Boot] " Sean MacLennan
@ 2009-08-20  5:01         ` Stefan Roese
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-20  5:01 UTC (permalink / raw)
  To: u-boot; +Cc: linuxppc-dev, Feng Kan, linux-mtd, Sean MacLennan

On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > I see other boards using SMC as well, can someone comment on the
> > change I am proposing.
> > Should I change the correction algorithm or the calculate function?
> > If the later is preferred
> > it would mean the change must be pushed in both U-Boot and Linux.
>
> Odds are the calculate function is wrong. The correction algo is used
> by many nand drivers, I *assume* it is correct. The calculate function
> was set to agree with u-boot (1.3.0).

Yes, it seems that you changed the order in the calculation function while 
reworking the NDFC driver for arch/powerpc. So we should probably change this 
order back to the original version. And change it in U-Boot as well.

BTW: I didn't see any problems with ECC so far with the current code. Feng, 
how did you spot this problem?

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20  5:01         ` Stefan Roese
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-20  5:01 UTC (permalink / raw)
  To: u-boot

On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > I see other boards using SMC as well, can someone comment on the
> > change I am proposing.
> > Should I change the correction algorithm or the calculate function?
> > If the later is preferred
> > it would mean the change must be pushed in both U-Boot and Linux.
>
> Odds are the calculate function is wrong. The correction algo is used
> by many nand drivers, I *assume* it is correct. The calculate function
> was set to agree with u-boot (1.3.0).

Yes, it seems that you changed the order in the calculation function while 
reworking the NDFC driver for arch/powerpc. So we should probably change this 
order back to the original version. And change it in U-Boot as well.

BTW: I didn't see any problems with ECC so far with the current code. Feng, 
how did you spot this problem?

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office at denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND  ECC Error with wrong SMC ording bug
  2009-08-20  5:01         ` Stefan Roese
@ 2009-08-20 19:36           ` Sean MacLennan
  -1 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-20 19:36 UTC (permalink / raw)
  To: Stefan Roese; +Cc: u-boot, Feng Kan, linux-mtd, linuxppc-dev

On Thu, 20 Aug 2009 07:01:21 +0200
Stefan Roese <sr@denx.de> wrote:

> On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > > I see other boards using SMC as well, can someone comment on the
> > > change I am proposing.
> > > Should I change the correction algorithm or the calculate
> > > function? If the later is preferred
> > > it would mean the change must be pushed in both U-Boot and Linux.
> >
> > Odds are the calculate function is wrong. The correction algo is
> > used by many nand drivers, I *assume* it is correct. The calculate
> > function was set to agree with u-boot (1.3.0).
> 
> Yes, it seems that you changed the order in the calculation function
> while reworking the NDFC driver for arch/powerpc. So we should
> probably change this order back to the original version. And change
> it in U-Boot as well.
> 
> BTW: I didn't see any problems with ECC so far with the current code.
> Feng, how did you spot this problem?

Ok, I think I have reproduced the problem programmatically. Basically,
I force a one bit error with the following patch:

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 8c21b89..91dd5b4 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_calc = chip->buffers->ecccalc;
 	const uint8_t *p = buf;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
+	static int count;
 
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		chip->ecc.hwctl(mtd, NAND_ECC_WRITE);
-		chip->write_buf(mtd, p, eccsize);
-		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
+		if (count == 0) {
+			count = 1;
+			printk("Corrupt one bit: %08x => %08x\n",
+			       *p, *p ^ 8);
+			*(uint8_t *)p ^= 8;
+			chip->write_buf(mtd, p, eccsize);
+			*(uint8_t *)p ^= 8;
+			nand_calculate_ecc(mtd, p, &ecc_calc[i]);
+		} else {
+			chip->write_buf(mtd, p, eccsize);
+			chip->ecc.calculate(mtd, p, &ecc_calc[i]);
+		}
 	}
 
 	for (i = 0; i < chip->ecc.total; i++)

Basically I write a one bit error to the NAND, but calculate with the
correct bit. This assumes nand_calculate_ecc is correct.

I then added debugs to the correction to make sure it corrected
properly:

diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
index c0cb87d..57dcaa1 100644
--- a/drivers/mtd/nand/nand_ecc.c
+++ b/drivers/mtd/nand/nand_ecc.c
@@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf,
 			byte_addr = (addressbits[b2 & 0x3] << 8) +
 				    (addressbits[b1] << 4) + addressbits[b0];
 		bit_addr = addressbits[b2 >> 2];
+
+		printk("Single bit error: correct %08x => %08x\n",
+		       buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr));
+
 		/* flip the bit */
 		buf[byte_addr] ^= (1 << bit_addr);
 		return 1;
 
 	}
 	/* count nr of bits; use table lookup, faster than calculating it */
-	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
+	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
+		printk("ECC DATA BAD\n"); // SAM DBG
 		return 1;	/* error in ecc data; no action needed */
+	}
 
 	printk(KERN_ERR "uncorrectable error : ");
 	return -1;

With the current ndfc code, the error correction gets the bits wrong.
Switching it back to the original way and the correction is correct.

diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
index 89bf85a..497e175 100644
--- a/drivers/mtd/nand/ndfc.c
+++ b/drivers/mtd/nand/ndfc.c
@@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
 	wmb();
 	ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
-	/* The NDFC uses Smart Media (SMC) bytes order */
-	ecc_code[0] = p[2];
-	ecc_code[1] = p[1];
+	ecc_code[0] = p[1];
+	ecc_code[1] = p[2];
 	ecc_code[2] = p[3];
 
 	return 0;

Does anybody see a problem with my method of reproducing the bug? This
bug is deadly for our customers. I don't want to make the change unless
it is absolutely necessary.

Cheers,
   Sean

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20 19:36           ` Sean MacLennan
  0 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-20 19:36 UTC (permalink / raw)
  To: u-boot

On Thu, 20 Aug 2009 07:01:21 +0200
Stefan Roese <sr@denx.de> wrote:

> On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > > I see other boards using SMC as well, can someone comment on the
> > > change I am proposing.
> > > Should I change the correction algorithm or the calculate
> > > function? If the later is preferred
> > > it would mean the change must be pushed in both U-Boot and Linux.
> >
> > Odds are the calculate function is wrong. The correction algo is
> > used by many nand drivers, I *assume* it is correct. The calculate
> > function was set to agree with u-boot (1.3.0).
> 
> Yes, it seems that you changed the order in the calculation function
> while reworking the NDFC driver for arch/powerpc. So we should
> probably change this order back to the original version. And change
> it in U-Boot as well.
> 
> BTW: I didn't see any problems with ECC so far with the current code.
> Feng, how did you spot this problem?

Ok, I think I have reproduced the problem programmatically. Basically,
I force a one bit error with the following patch:

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 8c21b89..91dd5b4 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_calc = chip->buffers->ecccalc;
 	const uint8_t *p = buf;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
+	static int count;
 
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		chip->ecc.hwctl(mtd, NAND_ECC_WRITE);
-		chip->write_buf(mtd, p, eccsize);
-		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
+		if (count == 0) {
+			count = 1;
+			printk("Corrupt one bit: %08x => %08x\n",
+			       *p, *p ^ 8);
+			*(uint8_t *)p ^= 8;
+			chip->write_buf(mtd, p, eccsize);
+			*(uint8_t *)p ^= 8;
+			nand_calculate_ecc(mtd, p, &ecc_calc[i]);
+		} else {
+			chip->write_buf(mtd, p, eccsize);
+			chip->ecc.calculate(mtd, p, &ecc_calc[i]);
+		}
 	}
 
 	for (i = 0; i < chip->ecc.total; i++)

Basically I write a one bit error to the NAND, but calculate with the
correct bit. This assumes nand_calculate_ecc is correct.

I then added debugs to the correction to make sure it corrected
properly:

diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
index c0cb87d..57dcaa1 100644
--- a/drivers/mtd/nand/nand_ecc.c
+++ b/drivers/mtd/nand/nand_ecc.c
@@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf,
 			byte_addr = (addressbits[b2 & 0x3] << 8) +
 				    (addressbits[b1] << 4) + addressbits[b0];
 		bit_addr = addressbits[b2 >> 2];
+
+		printk("Single bit error: correct %08x => %08x\n",
+		       buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr));
+
 		/* flip the bit */
 		buf[byte_addr] ^= (1 << bit_addr);
 		return 1;
 
 	}
 	/* count nr of bits; use table lookup, faster than calculating it */
-	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
+	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
+		printk("ECC DATA BAD\n"); // SAM DBG
 		return 1;	/* error in ecc data; no action needed */
+	}
 
 	printk(KERN_ERR "uncorrectable error : ");
 	return -1;

With the current ndfc code, the error correction gets the bits wrong.
Switching it back to the original way and the correction is correct.

diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
index 89bf85a..497e175 100644
--- a/drivers/mtd/nand/ndfc.c
+++ b/drivers/mtd/nand/ndfc.c
@@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
 	wmb();
 	ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
-	/* The NDFC uses Smart Media (SMC) bytes order */
-	ecc_code[0] = p[2];
-	ecc_code[1] = p[1];
+	ecc_code[0] = p[1];
+	ecc_code[1] = p[2];
 	ecc_code[2] = p[3];
 
 	return 0;

Does anybody see a problem with my method of reproducing the bug? This
bug is deadly for our customers. I don't want to make the change unless
it is absolutely necessary.

Cheers,
   Sean

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* RE: [U-Boot] NAND  ECC Error with wrong SMC ording bug
  2009-08-20 19:36           ` Sean MacLennan
  (?)
@ 2009-08-20 22:56             ` Victor Gallardo
  -1 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-20 22:56 UTC (permalink / raw)
  To: Sean MacLennan, Stefan Roese
  Cc: Prodyut Hazarika, u-boot, Feng Kan, linux-mtd, linuxppc-dev

Hi Sean,

The change is necessary in both Linux and u-boot. Without this change =
customer are seeing the problem.

Best Regards,

Victor Gallardo

> -----Original Message-----
> From: linuxppc-dev-bounces+vgallardo=3Damcc.com@lists.ozlabs.org =
[mailto:linuxppc-dev-
> bounces+vgallardo=3Damcc.com@lists.ozlabs.org] On Behalf Of Sean =
MacLennan
> Sent: Thursday, August 20, 2009 12:37 PM
> To: Stefan Roese
> Cc: u-boot@lists.denx.de; Feng Kan; linux-mtd@lists.infradead.org; =
linuxppc-dev@ozlabs.org
> Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
>=20
> On Thu, 20 Aug 2009 07:01:21 +0200
> Stefan Roese <sr@denx.de> wrote:
>=20
> > On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > > > I see other boards using SMC as well, can someone comment on the
> > > > change I am proposing.
> > > > Should I change the correction algorithm or the calculate
> > > > function? If the later is preferred
> > > > it would mean the change must be pushed in both U-Boot and =
Linux.
> > >
> > > Odds are the calculate function is wrong. The correction algo is
> > > used by many nand drivers, I *assume* it is correct. The calculate
> > > function was set to agree with u-boot (1.3.0).
> >
> > Yes, it seems that you changed the order in the calculation function
> > while reworking the NDFC driver for arch/powerpc. So we should
> > probably change this order back to the original version. And change
> > it in U-Boot as well.
> >
> > BTW: I didn't see any problems with ECC so far with the current =
code.
> > Feng, how did you spot this problem?
>=20
> Ok, I think I have reproduced the problem programmatically. Basically,
> I force a one bit error with the following patch:
>=20
> diff --git a/drivers/mtd/nand/nand_base.c =
b/drivers/mtd/nand/nand_base.c
> index 8c21b89..91dd5b4 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct =
mtd_info *mtd, struct nand_chip
> *chip,
>  	uint8_t *ecc_calc =3D chip->buffers->ecccalc;
>  	const uint8_t *p =3D buf;
>  	uint32_t *eccpos =3D chip->ecc.layout->eccpos;
> +	static int count;
>=20
>  	for (i =3D 0; eccsteps; eccsteps--, i +=3D eccbytes, p +=3D eccsize) =
{
>  		chip->ecc.hwctl(mtd, NAND_ECC_WRITE);
> -		chip->write_buf(mtd, p, eccsize);
> -		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		if (count =3D=3D 0) {
> +			count =3D 1;
> +			printk("Corrupt one bit: %08x =3D> %08x\n",
> +			       *p, *p ^ 8);
> +			*(uint8_t *)p ^=3D 8;
> +			chip->write_buf(mtd, p, eccsize);
> +			*(uint8_t *)p ^=3D 8;
> +			nand_calculate_ecc(mtd, p, &ecc_calc[i]);
> +		} else {
> +			chip->write_buf(mtd, p, eccsize);
> +			chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		}
>  	}
>=20
>  	for (i =3D 0; i < chip->ecc.total; i++)
>=20
> Basically I write a one bit error to the NAND, but calculate with the
> correct bit. This assumes nand_calculate_ecc is correct.
>=20
> I then added debugs to the correction to make sure it corrected
> properly:
>=20
> diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
> index c0cb87d..57dcaa1 100644
> --- a/drivers/mtd/nand/nand_ecc.c
> +++ b/drivers/mtd/nand/nand_ecc.c
> @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, =
unsigned char *buf,
>  			byte_addr =3D (addressbits[b2 & 0x3] << 8) +
>  				    (addressbits[b1] << 4) + addressbits[b0];
>  		bit_addr =3D addressbits[b2 >> 2];
> +
> +		printk("Single bit error: correct %08x =3D> %08x\n",
> +		       buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr));
> +
>  		/* flip the bit */
>  		buf[byte_addr] ^=3D (1 << bit_addr);
>  		return 1;
>=20
>  	}
>  	/* count nr of bits; use table lookup, faster than calculating it */
> -	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) =3D=3D 1)
> +	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) =3D=3D 1) =
{
> +		printk("ECC DATA BAD\n"); // SAM DBG
>  		return 1;	/* error in ecc data; no action needed */
> +	}
>=20
>  	printk(KERN_ERR "uncorrectable error : ");
>  	return -1;
>=20
> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
>=20
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info =
*mtd,
>=20
>  	wmb();
>  	ecc =3D in_be32(ndfc->ndfcbase + NDFC_ECC);
> -	/* The NDFC uses Smart Media (SMC) bytes order */
> -	ecc_code[0] =3D p[2];
> -	ecc_code[1] =3D p[1];
> +	ecc_code[0] =3D p[1];
> +	ecc_code[1] =3D p[2];
>  	ecc_code[2] =3D p[3];
>=20
>  	return 0;
>=20
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change =
unless
> it is absolutely necessary.
>=20
> Cheers,
>    Sean
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20 22:56             ` Victor Gallardo
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-20 22:56 UTC (permalink / raw)
  To: Sean MacLennan, Stefan Roese
  Cc: Prodyut Hazarika, u-boot, Feng Kan, linux-mtd, linuxppc-dev

Hi Sean,

The change is necessary in both Linux and u-boot. Without this change customer are seeing the problem.

Best Regards,

Victor Gallardo

> -----Original Message-----
> From: linuxppc-dev-bounces+vgallardo=amcc.com@lists.ozlabs.org [mailto:linuxppc-dev-
> bounces+vgallardo=amcc.com@lists.ozlabs.org] On Behalf Of Sean MacLennan
> Sent: Thursday, August 20, 2009 12:37 PM
> To: Stefan Roese
> Cc: u-boot@lists.denx.de; Feng Kan; linux-mtd@lists.infradead.org; linuxppc-dev@ozlabs.org
> Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
> 
> On Thu, 20 Aug 2009 07:01:21 +0200
> Stefan Roese <sr@denx.de> wrote:
> 
> > On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > > > I see other boards using SMC as well, can someone comment on the
> > > > change I am proposing.
> > > > Should I change the correction algorithm or the calculate
> > > > function? If the later is preferred
> > > > it would mean the change must be pushed in both U-Boot and Linux.
> > >
> > > Odds are the calculate function is wrong. The correction algo is
> > > used by many nand drivers, I *assume* it is correct. The calculate
> > > function was set to agree with u-boot (1.3.0).
> >
> > Yes, it seems that you changed the order in the calculation function
> > while reworking the NDFC driver for arch/powerpc. So we should
> > probably change this order back to the original version. And change
> > it in U-Boot as well.
> >
> > BTW: I didn't see any problems with ECC so far with the current code.
> > Feng, how did you spot this problem?
> 
> Ok, I think I have reproduced the problem programmatically. Basically,
> I force a one bit error with the following patch:
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 8c21b89..91dd5b4 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip
> *chip,
>  	uint8_t *ecc_calc = chip->buffers->ecccalc;
>  	const uint8_t *p = buf;
>  	uint32_t *eccpos = chip->ecc.layout->eccpos;
> +	static int count;
> 
>  	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
>  		chip->ecc.hwctl(mtd, NAND_ECC_WRITE);
> -		chip->write_buf(mtd, p, eccsize);
> -		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		if (count == 0) {
> +			count = 1;
> +			printk("Corrupt one bit: %08x => %08x\n",
> +			       *p, *p ^ 8);
> +			*(uint8_t *)p ^= 8;
> +			chip->write_buf(mtd, p, eccsize);
> +			*(uint8_t *)p ^= 8;
> +			nand_calculate_ecc(mtd, p, &ecc_calc[i]);
> +		} else {
> +			chip->write_buf(mtd, p, eccsize);
> +			chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		}
>  	}
> 
>  	for (i = 0; i < chip->ecc.total; i++)
> 
> Basically I write a one bit error to the NAND, but calculate with the
> correct bit. This assumes nand_calculate_ecc is correct.
> 
> I then added debugs to the correction to make sure it corrected
> properly:
> 
> diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
> index c0cb87d..57dcaa1 100644
> --- a/drivers/mtd/nand/nand_ecc.c
> +++ b/drivers/mtd/nand/nand_ecc.c
> @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf,
>  			byte_addr = (addressbits[b2 & 0x3] << 8) +
>  				    (addressbits[b1] << 4) + addressbits[b0];
>  		bit_addr = addressbits[b2 >> 2];
> +
> +		printk("Single bit error: correct %08x => %08x\n",
> +		       buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr));
> +
>  		/* flip the bit */
>  		buf[byte_addr] ^= (1 << bit_addr);
>  		return 1;
> 
>  	}
>  	/* count nr of bits; use table lookup, faster than calculating it */
> -	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
> +	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
> +		printk("ECC DATA BAD\n"); // SAM DBG
>  		return 1;	/* error in ecc data; no action needed */
> +	}
> 
>  	printk(KERN_ERR "uncorrectable error : ");
>  	return -1;
> 
> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
> 
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> 
>  	wmb();
>  	ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> -	/* The NDFC uses Smart Media (SMC) bytes order */
> -	ecc_code[0] = p[2];
> -	ecc_code[1] = p[1];
> +	ecc_code[0] = p[1];
> +	ecc_code[1] = p[2];
>  	ecc_code[2] = p[3];
> 
>  	return 0;
> 
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change unless
> it is absolutely necessary.
> 
> Cheers,
>    Sean
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20 22:56             ` Victor Gallardo
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-20 22:56 UTC (permalink / raw)
  To: u-boot

Hi Sean,

The change is necessary in both Linux and u-boot. Without this change customer are seeing the problem.

Best Regards,

Victor Gallardo

> -----Original Message-----
> From: linuxppc-dev-bounces+vgallardo=amcc.com at lists.ozlabs.org [mailto:linuxppc-dev-
> bounces+vgallardo=amcc.com at lists.ozlabs.org] On Behalf Of Sean MacLennan
> Sent: Thursday, August 20, 2009 12:37 PM
> To: Stefan Roese
> Cc: u-boot at lists.denx.de; Feng Kan; linux-mtd at lists.infradead.org; linuxppc-dev at ozlabs.org
> Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
> 
> On Thu, 20 Aug 2009 07:01:21 +0200
> Stefan Roese <sr@denx.de> wrote:
> 
> > On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
> > > > I see other boards using SMC as well, can someone comment on the
> > > > change I am proposing.
> > > > Should I change the correction algorithm or the calculate
> > > > function? If the later is preferred
> > > > it would mean the change must be pushed in both U-Boot and Linux.
> > >
> > > Odds are the calculate function is wrong. The correction algo is
> > > used by many nand drivers, I *assume* it is correct. The calculate
> > > function was set to agree with u-boot (1.3.0).
> >
> > Yes, it seems that you changed the order in the calculation function
> > while reworking the NDFC driver for arch/powerpc. So we should
> > probably change this order back to the original version. And change
> > it in U-Boot as well.
> >
> > BTW: I didn't see any problems with ECC so far with the current code.
> > Feng, how did you spot this problem?
> 
> Ok, I think I have reproduced the problem programmatically. Basically,
> I force a one bit error with the following patch:
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 8c21b89..91dd5b4 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip
> *chip,
>  	uint8_t *ecc_calc = chip->buffers->ecccalc;
>  	const uint8_t *p = buf;
>  	uint32_t *eccpos = chip->ecc.layout->eccpos;
> +	static int count;
> 
>  	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
>  		chip->ecc.hwctl(mtd, NAND_ECC_WRITE);
> -		chip->write_buf(mtd, p, eccsize);
> -		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		if (count == 0) {
> +			count = 1;
> +			printk("Corrupt one bit: %08x => %08x\n",
> +			       *p, *p ^ 8);
> +			*(uint8_t *)p ^= 8;
> +			chip->write_buf(mtd, p, eccsize);
> +			*(uint8_t *)p ^= 8;
> +			nand_calculate_ecc(mtd, p, &ecc_calc[i]);
> +		} else {
> +			chip->write_buf(mtd, p, eccsize);
> +			chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> +		}
>  	}
> 
>  	for (i = 0; i < chip->ecc.total; i++)
> 
> Basically I write a one bit error to the NAND, but calculate with the
> correct bit. This assumes nand_calculate_ecc is correct.
> 
> I then added debugs to the correction to make sure it corrected
> properly:
> 
> diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
> index c0cb87d..57dcaa1 100644
> --- a/drivers/mtd/nand/nand_ecc.c
> +++ b/drivers/mtd/nand/nand_ecc.c
> @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf,
>  			byte_addr = (addressbits[b2 & 0x3] << 8) +
>  				    (addressbits[b1] << 4) + addressbits[b0];
>  		bit_addr = addressbits[b2 >> 2];
> +
> +		printk("Single bit error: correct %08x => %08x\n",
> +		       buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr));
> +
>  		/* flip the bit */
>  		buf[byte_addr] ^= (1 << bit_addr);
>  		return 1;
> 
>  	}
>  	/* count nr of bits; use table lookup, faster than calculating it */
> -	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
> +	if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
> +		printk("ECC DATA BAD\n"); // SAM DBG
>  		return 1;	/* error in ecc data; no action needed */
> +	}
> 
>  	printk(KERN_ERR "uncorrectable error : ");
>  	return -1;
> 
> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
> 
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> 
>  	wmb();
>  	ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> -	/* The NDFC uses Smart Media (SMC) bytes order */
> -	ecc_code[0] = p[2];
> -	ecc_code[1] = p[1];
> +	ecc_code[0] = p[1];
> +	ecc_code[1] = p[2];
>  	ecc_code[2] = p[3];
> 
>  	return 0;
> 
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change unless
> it is absolutely necessary.
> 
> Cheers,
>    Sean
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND  ECC Error with wrong SMC ording bug
  2009-08-20  5:01         ` Stefan Roese
@ 2009-08-20 23:42           ` Feng Kan
  -1 siblings, 0 replies; 33+ messages in thread
From: Feng Kan @ 2009-08-20 23:42 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linuxppc-dev, u-boot, linux-mtd, Sean MacLennan

Hi Stefan:

We had a board with high number of correctable ECC errors. Which crashed 
the jffs when it
was miss correcting the wrong byte location.

Do you want me to submit a patch for this, or do you prefer to do it. I 
am submitting a patch
for linux right now.

Feng Kan
AMCC Software

On 08/19/2009 10:01 PM, Stefan Roese wrote:
> On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
>    
>>> I see other boards using SMC as well, can someone comment on the
>>> change I am proposing.
>>> Should I change the correction algorithm or the calculate function?
>>> If the later is preferred
>>> it would mean the change must be pushed in both U-Boot and Linux.
>>>        
>> Odds are the calculate function is wrong. The correction algo is used
>> by many nand drivers, I *assume* it is correct. The calculate function
>> was set to agree with u-boot (1.3.0).
>>      
> Yes, it seems that you changed the order in the calculation function while
> reworking the NDFC driver for arch/powerpc. So we should probably change this
> order back to the original version. And change it in U-Boot as well.
>
> BTW: I didn't see any problems with ECC so far with the current code. Feng,
> how did you spot this problem?
>
> Cheers,
> Stefan
>
> --
> DENX Software Engineering GmbH,      MD: Wolfgang Denk&  Detlev Zundel
> HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de
>    

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-20 23:42           ` Feng Kan
  0 siblings, 0 replies; 33+ messages in thread
From: Feng Kan @ 2009-08-20 23:42 UTC (permalink / raw)
  To: u-boot

Hi Stefan:

We had a board with high number of correctable ECC errors. Which crashed 
the jffs when it
was miss correcting the wrong byte location.

Do you want me to submit a patch for this, or do you prefer to do it. I 
am submitting a patch
for linux right now.

Feng Kan
AMCC Software

On 08/19/2009 10:01 PM, Stefan Roese wrote:
> On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
>    
>>> I see other boards using SMC as well, can someone comment on the
>>> change I am proposing.
>>> Should I change the correction algorithm or the calculate function?
>>> If the later is preferred
>>> it would mean the change must be pushed in both U-Boot and Linux.
>>>        
>> Odds are the calculate function is wrong. The correction algo is used
>> by many nand drivers, I *assume* it is correct. The calculate function
>> was set to agree with u-boot (1.3.0).
>>      
> Yes, it seems that you changed the order in the calculation function while
> reworking the NDFC driver for arch/powerpc. So we should probably change this
> order back to the original version. And change it in U-Boot as well.
>
> BTW: I didn't see any problems with ECC so far with the current code. Feng,
> how did you spot this problem?
>
> Cheers,
> Stefan
>
> --
> DENX Software Engineering GmbH,      MD: Wolfgang Denk&  Detlev Zundel
> HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office at denx.de
>    

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
  2009-08-20 19:36           ` Sean MacLennan
  (?)
@ 2009-08-21  5:17             ` vimal singh
  -1 siblings, 0 replies; 33+ messages in thread
From: vimal singh @ 2009-08-21  5:17 UTC (permalink / raw)
  To: Sean MacLennan; +Cc: u-boot, Stefan Roese, Feng Kan, linux-mtd, linuxppc-dev

<snip>

> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
>
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
>
> =A0 =A0 =A0 =A0wmb();
> =A0 =A0 =A0 =A0ecc =3D in_be32(ndfc->ndfcbase + NDFC_ECC);
> - =A0 =A0 =A0 /* The NDFC uses Smart Media (SMC) bytes order */
> - =A0 =A0 =A0 ecc_code[0] =3D p[2];
> - =A0 =A0 =A0 ecc_code[1] =3D p[1];
> + =A0 =A0 =A0 ecc_code[0] =3D p[1];
> + =A0 =A0 =A0 ecc_code[1] =3D p[2];
> =A0 =A0 =A0 =A0ecc_code[2] =3D p[3];
>
> =A0 =A0 =A0 =A0return 0;
>
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change unless
> it is absolutely necessary..

Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

-vimal

>
> Cheers,
> =A0 Sean
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  5:17             ` vimal singh
  0 siblings, 0 replies; 33+ messages in thread
From: vimal singh @ 2009-08-21  5:17 UTC (permalink / raw)
  To: Sean MacLennan; +Cc: u-boot, Stefan Roese, Feng Kan, linux-mtd, linuxppc-dev

<snip>

> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
>
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
>
>        wmb();
>        ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> -       /* The NDFC uses Smart Media (SMC) bytes order */
> -       ecc_code[0] = p[2];
> -       ecc_code[1] = p[1];
> +       ecc_code[0] = p[1];
> +       ecc_code[1] = p[2];
>        ecc_code[2] = p[3];
>
>        return 0;
>
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change unless
> it is absolutely necessary..

Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

-vimal

>
> Cheers,
>   Sean
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  5:17             ` vimal singh
  0 siblings, 0 replies; 33+ messages in thread
From: vimal singh @ 2009-08-21  5:17 UTC (permalink / raw)
  To: u-boot

<snip>

> With the current ndfc code, the error correction gets the bits wrong.
> Switching it back to the original way and the correction is correct.
>
> diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> index 89bf85a..497e175 100644
> --- a/drivers/mtd/nand/ndfc.c
> +++ b/drivers/mtd/nand/ndfc.c
> @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
>
> ? ? ? ?wmb();
> ? ? ? ?ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> - ? ? ? /* The NDFC uses Smart Media (SMC) bytes order */
> - ? ? ? ecc_code[0] = p[2];
> - ? ? ? ecc_code[1] = p[1];
> + ? ? ? ecc_code[0] = p[1];
> + ? ? ? ecc_code[1] = p[2];
> ? ? ? ?ecc_code[2] = p[3];
>
> ? ? ? ?return 0;
>
> Does anybody see a problem with my method of reproducing the bug? This
> bug is deadly for our customers. I don't want to make the change unless
> it is absolutely necessary..

Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

-vimal

>
> Cheers,
> ? Sean
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
  2009-08-21  5:17             ` vimal singh
@ 2009-08-21  6:26               ` Sean MacLennan
  -1 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-21  6:26 UTC (permalink / raw)
  To: vimal singh; +Cc: u-boot, Stefan Roese, Feng Kan, linux-mtd, linuxppc-dev

On Fri, 21 Aug 2009 10:47:09 +0530
vimal singh <vimal.newwork@gmail.com> wrote:

> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

It is automagically selected when you select the NDFC driver.

Cheers,
   Sean

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  6:26               ` Sean MacLennan
  0 siblings, 0 replies; 33+ messages in thread
From: Sean MacLennan @ 2009-08-21  6:26 UTC (permalink / raw)
  To: u-boot

On Fri, 21 Aug 2009 10:47:09 +0530
vimal singh <vimal.newwork@gmail.com> wrote:

> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

It is automagically selected when you select the NDFC driver.

Cheers,
   Sean

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
  2009-08-21  5:17             ` vimal singh
@ 2009-08-21  6:27               ` Stefan Roese
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-21  6:27 UTC (permalink / raw)
  To: u-boot; +Cc: linuxppc-dev, linux-mtd, vimal singh, Sean MacLennan

On Friday 21 August 2009 07:17:09 vimal singh wrote:
> > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> > index 89bf85a..497e175 100644
> > --- a/drivers/mtd/nand/ndfc.c
> > +++ b/drivers/mtd/nand/ndfc.c
> > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> >
> >        wmb();
> >        ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> > -       /* The NDFC uses Smart Media (SMC) bytes order */
> > -       ecc_code[0] = p[2];
> > -       ecc_code[1] = p[1];
> > +       ecc_code[0] = p[1];
> > +       ecc_code[1] = p[2];
> >        ecc_code[2] = p[3];
> >
> >        return 0;
> >
> > Does anybody see a problem with my method of reproducing the bug? This
> > bug is deadly for our customers. I don't want to make the change unless
> > it is absolutely necessary..
>
> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver.

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  6:27               ` Stefan Roese
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-21  6:27 UTC (permalink / raw)
  To: u-boot

On Friday 21 August 2009 07:17:09 vimal singh wrote:
> > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> > index 89bf85a..497e175 100644
> > --- a/drivers/mtd/nand/ndfc.c
> > +++ b/drivers/mtd/nand/ndfc.c
> > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> >
> >        wmb();
> >        ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> > -       /* The NDFC uses Smart Media (SMC) bytes order */
> > -       ecc_code[0] = p[2];
> > -       ecc_code[1] = p[1];
> > +       ecc_code[0] = p[1];
> > +       ecc_code[1] = p[2];
> >        ecc_code[2] = p[3];
> >
> >        return 0;
> >
> > Does anybody see a problem with my method of reproducing the bug? This
> > bug is deadly for our customers. I don't want to make the change unless
> > it is absolutely necessary..
>
> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver.

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office at denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
  2009-08-21  5:17             ` vimal singh
  (?)
@ 2009-08-21  6:30               ` Victor Gallardo
  -1 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-21  6:30 UTC (permalink / raw)
  To: vimal singh, Sean MacLennan
  Cc: u-boot, Stefan Roese, Feng Kan, linux-mtd, linuxppc-dev

Hi Vimal,=0A=0A> > With the current ndfc code, the error correction gets th=
e bits wrong.=0A> > Switching it back to the original way and the correctio=
n is correct.=0A> >=0A> > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mt=
d/nand/ndfc.c=0A> > index 89bf85a..497e175 100644=0A> > --- a/drivers/mtd/n=
and/ndfc.c=0A> > +++ b/drivers/mtd/nand/ndfc.c=0A> > @@ -101,9 +101,8 @@ st=
atic int ndfc_calculate_ecc(struct mtd_info *mtd,=0A> >=0A> > =A0 =A0 =A0 =
=A0wmb();=0A> > =A0 =A0 =A0 =A0ecc =3D in_be32(ndfc->ndfcbase + NDFC_ECC);=
=0A> > - =A0 =A0 =A0 /* The NDFC uses Smart Media (SMC) bytes order */=0A> =
> - =A0 =A0 =A0 ecc_code[0] =3D p[2];=0A> > - =A0 =A0 =A0 ecc_code[1] =3D p=
[1];=0A> > + =A0 =A0 =A0 ecc_code[0] =3D p[1];=0A> > + =A0 =A0 =A0 ecc_code=
[1] =3D p[2];=0A> > =A0 =A0 =A0 =A0ecc_code[2] =3D p[3];=0A> >=0A> > =A0 =
=A0 =A0 =A0return 0;=0A> >=0A> > Does anybody see a problem with my method =
of reproducing the bug? This=0A> > bug is deadly for our customers. I don't=
 want to make the change unless=0A> > it is absolutely necessary..=0A> =0A>=
 Just one question: did you enabled MTD_NAND_ECC_SMC in configs?=0A=0AYes, =
it was set.=0A=0ABest Regards,=0A=0AVictor Gallardo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  6:30               ` Victor Gallardo
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-21  6:30 UTC (permalink / raw)
  To: vimal singh, Sean MacLennan
  Cc: u-boot, Stefan Roese, Feng Kan, linux-mtd, linuxppc-dev

Hi Vimal,

> > With the current ndfc code, the error correction gets the bits wrong.
> > Switching it back to the original way and the correction is correct.
> >
> > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> > index 89bf85a..497e175 100644
> > --- a/drivers/mtd/nand/ndfc.c
> > +++ b/drivers/mtd/nand/ndfc.c
> > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> >
> >        wmb();
> >        ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> > -       /* The NDFC uses Smart Media (SMC) bytes order */
> > -       ecc_code[0] = p[2];
> > -       ecc_code[1] = p[1];
> > +       ecc_code[0] = p[1];
> > +       ecc_code[1] = p[2];
> >        ecc_code[2] = p[3];
> >
> >        return 0;
> >
> > Does anybody see a problem with my method of reproducing the bug? This
> > bug is deadly for our customers. I don't want to make the change unless
> > it is absolutely necessary..
> 
> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, it was set.

Best Regards,

Victor Gallardo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND ECC Error with wrong SMC ording bug
@ 2009-08-21  6:30               ` Victor Gallardo
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Gallardo @ 2009-08-21  6:30 UTC (permalink / raw)
  To: u-boot

Hi Vimal,

> > With the current ndfc code, the error correction gets the bits wrong.
> > Switching it back to the original way and the correction is correct.
> >
> > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
> > index 89bf85a..497e175 100644
> > --- a/drivers/mtd/nand/ndfc.c
> > +++ b/drivers/mtd/nand/ndfc.c
> > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
> >
> > ? ? ? ?wmb();
> > ? ? ? ?ecc = in_be32(ndfc->ndfcbase + NDFC_ECC);
> > - ? ? ? /* The NDFC uses Smart Media (SMC) bytes order */
> > - ? ? ? ecc_code[0] = p[2];
> > - ? ? ? ecc_code[1] = p[1];
> > + ? ? ? ecc_code[0] = p[1];
> > + ? ? ? ecc_code[1] = p[2];
> > ? ? ? ?ecc_code[2] = p[3];
> >
> > ? ? ? ?return 0;
> >
> > Does anybody see a problem with my method of reproducing the bug? This
> > bug is deadly for our customers. I don't want to make the change unless
> > it is absolutely necessary..
> 
> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, it was set.

Best Regards,

Victor Gallardo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [U-Boot] NAND  ECC Error with wrong SMC ording bug
  2009-08-20 23:42           ` Feng Kan
@ 2009-08-21  7:59             ` Stefan Roese
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-21  7:59 UTC (permalink / raw)
  To: u-boot; +Cc: linuxppc-dev, Feng Kan, linux-mtd, Sean MacLennan

Hi Feng,

On Friday 21 August 2009 01:42:42 Feng Kan wrote:
> We had a board with high number of correctable ECC errors. Which crashed
> the jffs when it
> was miss correcting the wrong byte location.

OK, thanks.

> Do you want me to submit a patch for this, or do you prefer to do it.

Sure, please go ahead and send a patch to fix this in U-Boot as well.

Thanks.

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [U-Boot] NAND  ECC Error with wrong SMC ording bug
@ 2009-08-21  7:59             ` Stefan Roese
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roese @ 2009-08-21  7:59 UTC (permalink / raw)
  To: u-boot

Hi Feng,

On Friday 21 August 2009 01:42:42 Feng Kan wrote:
> We had a board with high number of correctable ECC errors. Which crashed
> the jffs when it
> was miss correcting the wrong byte location.

OK, thanks.

> Do you want me to submit a patch for this, or do you prefer to do it.

Sure, please go ahead and send a patch to fix this in U-Boot as well.

Thanks.

Cheers,
Stefan

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office at denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2009-08-21  7:59 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-17 23:00 [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Paul Mackerras
2009-08-17 23:00 ` [PATCH 2/3 v3] powerpc: Allow perf_counters to access user memory at interrupt time Paul Mackerras
2009-08-18  4:24   ` Benjamin Herrenschmidt
2009-08-17 23:01 ` [PATCH 3/3 v3] perf_counter: powerpc: Add callchain support Paul Mackerras
2009-08-18  4:23   ` Benjamin Herrenschmidt
2009-08-18  0:00 ` [PATCH 1/3 v3] powerpc/32: Always order writes to halves of 64-bit PTEs Kumar Gala
2009-08-18  0:14   ` Paul Mackerras
2009-08-18  4:24 ` Benjamin Herrenschmidt
2009-08-19 23:16   ` NAND ECC Error with wrong SMC ording bug Feng Kan
2009-08-19 23:16     ` [U-Boot] " Feng Kan
2009-08-20  4:38     ` Sean MacLennan
2009-08-20  4:38       ` [U-Boot] " Sean MacLennan
2009-08-20  5:01       ` Stefan Roese
2009-08-20  5:01         ` Stefan Roese
2009-08-20 19:36         ` Sean MacLennan
2009-08-20 19:36           ` Sean MacLennan
2009-08-20 22:56           ` Victor Gallardo
2009-08-20 22:56             ` Victor Gallardo
2009-08-20 22:56             ` Victor Gallardo
2009-08-21  5:17           ` vimal singh
2009-08-21  5:17             ` vimal singh
2009-08-21  5:17             ` vimal singh
2009-08-21  6:26             ` Sean MacLennan
2009-08-21  6:26               ` Sean MacLennan
2009-08-21  6:27             ` Stefan Roese
2009-08-21  6:27               ` Stefan Roese
2009-08-21  6:30             ` Victor Gallardo
2009-08-21  6:30               ` Victor Gallardo
2009-08-21  6:30               ` Victor Gallardo
2009-08-20 23:42         ` Feng Kan
2009-08-20 23:42           ` Feng Kan
2009-08-21  7:59           ` Stefan Roese
2009-08-21  7:59             ` Stefan Roese

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.