* pxa3xx_nand issues
@ 2010-09-22 17:12 pieterg
  2010-09-23  6:05 ` Eric Miao
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-09-22 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

In my search for the cause of the huge number of single/double bit errors 
I'm experiencing on colibri pxa320/310 devices, I've come across this 
commit

http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892

According to the commitlog, it attempts to work around an issue regarding 
non-page-aligned reads.
The workaround seems to force page-aligned access, by dropping the offset 
within the page (column address bytes).
However, in my setup (with a jffs2 filesystem on nand), non-page-aligned 
reads never occur, but non-page-aligned writes occur very frequently. 
(during the jffs2 gc).
These are also affected by this commit, while the commitlog does not state 
whether or not the same issue would occur for the program command, and in 
that case, whether or not the same workaround would apply.

I've tried to revert the commit, but unfortunately this doesn't reduce the 
huge number of single/double bit errors (and jffs2 crc errors as a result) 
I'm getting.

But having these non-aligned writes during GC, would that indicate a problem 
with my jffs2 image parameters perhaps?
(though I cannot imagine this could actually cause double bit errors)

Rgds, Pieter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* pxa3xx_nand issues
  2010-09-22 17:12 pxa3xx_nand issues pieterg
@ 2010-09-23  6:05 ` Eric Miao
  2010-09-23 11:32   ` pieterg
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Miao @ 2010-09-23  6:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> In my search for the cause of the huge number of single/double bit errors
> I'm experiencing on colibri pxa320/310 devices, I've come across this
> commit
>
> http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
>
> According to the commitlog, it attempts to work around an issue regarding
> non-page-aligned reads.
> The workaround seems to force page-aligned access, by dropping the offset
> within the page (column address bytes).
> However, in my setup (with a jffs2 filesystem on nand), non-page-aligned
> reads never occur, but non-page-aligned writes occur very frequently.
> (during the jffs2 gc).
> These are also affected by this commit, while the commitlog does not state
> whether or not the same issue would occur for the program command, and in
> that case, whether or not the same workaround would apply.
>
> I've tried to revert the commit, but unfortunately this doesn't reduce the
> huge number of single/double bit errors (and jffs2 crc errors as a result)
> I'm getting.
>
> But having these non-aligned writes during GC, would that indicate a problem
> with my jffs2 image parameters perhaps?
> (though I cannot imagine this could actually cause double bit errors)
>

It might not be related to the commit above.  The NAND controller always
reads the whole page and ignores the column address; that patch just tries
to reduce the confusion. The offset is actually handled completely in
software (the driver remembers it).
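
For illustration, a minimal sketch of the read path described above: the
(simulated) controller always transfers a whole page from column 0, and the
requested column is only remembered in software and applied when the data
is handed out. All names and sizes here (nand_sim, cmd_read, read_buf,
2048+64 byte pages) are assumptions made for the example, not the actual
pxa3xx_nand code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 2048u  /* assumed large-page NAND */
#define SIM_OOB_SIZE  64u    /* assumed spare area size */

/* Hypothetical driver state: one bounce buffer holding page + OOB. */
struct nand_sim {
    uint8_t data_buff[SIM_PAGE_SIZE + SIM_OOB_SIZE];
    size_t  buf_start;  /* column offset remembered by software */
    size_t  buf_count;  /* number of valid bytes in data_buff */
};

/* Stand-in for the controller: always transfers the full page from column 0. */
static void hw_read_full_page(struct nand_sim *s, const uint8_t *flash_page)
{
    memcpy(s->data_buff, flash_page, sizeof(s->data_buff));
    s->buf_count = sizeof(s->data_buff);
}

/* Read command with a column: the hardware sees column 0, software keeps the offset. */
static void cmd_read(struct nand_sim *s, const uint8_t *flash_page, size_t column)
{
    assert(column <= SIM_PAGE_SIZE + SIM_OOB_SIZE);
    hw_read_full_page(s, flash_page);
    s->buf_start = column;
}

/* Hand out bytes starting at the remembered column; returns bytes copied. */
static size_t read_buf(struct nand_sim *s, uint8_t *dst, size_t len)
{
    size_t avail = s->buf_count - s->buf_start;
    size_t n = len < avail ? len : avail;

    memcpy(dst, s->data_buff + s->buf_start, n);
    s->buf_start += n;
    return n;
}
```

In this model a read at column 2048 costs a full page transfer, and
read_buf() then simply starts copying at the first OOB byte.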

Cc'ed Haojian and Lei.

> Rgds, Pieter

* pxa3xx_nand issues
  2010-09-23  6:05 ` Eric Miao
@ 2010-09-23 11:32   ` pieterg
  2010-09-23 15:29     ` pieterg
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-09-23 11:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> > In my search for the cause of the huge number of single/double bit
> > errors I'm experiencing on colibri pxa320/310 devices, I've come across
> > this commit
> >
> > 
http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> >
> > According to the commitlog, it attempts to work around an issue
> > regarding non-page-aligned reads.
> > The workaround seems to force page-aligned access, by dropping the
> > offset within the page (column address bytes).
> > However, in my setup (with a jffs2 filesystem on nand),
> > non-page-aligned reads never occur, but non-page-aligned writes occur
> > very frequently. (during the jffs2 gc).
> > These are also affected by this commit, while the commitlog does not
> > state whether or not the same issue would occur for the program
> > command, and in that case, whether or not the same workaround would
> > apply.
> >
> > I've tried to revert the commit, but unfortunately this doesn't reduce
> > the huge number of single/double bit errors (and jffs2 crc errors as a
> > result) I'm getting.
> >
> > But having these non-aligned writes during GC, would that indicate a
> > problem with my jffs2 image parameters perhaps?
> > (though I cannot imagine this could actually cause double bit errors)
>
> It might not be related to the commit above.  The NAND controller will
> always read the whole page and ignoring the column address, that patch
> tries to make less confusion. The offset is actually handled completely
> by software (memorized).

I can see how the read offset works, but I do not quite see how this would 
work for writes (which call the same prepare_read_prog_cmd, and have their 
column address stripped as well).
Found out that this happens when writing oob data by the way; these are 
writes with offset 2048 within the page. Jffs2 does this when writing 
cleanmarkers.
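
A minimal sketch of why an OOB-only write can still work when the column
address is stripped and a full page is programmed. It assumes the driver
pre-fills its page buffer with 0xFF and places the caller's bytes at the
remembered column; since NAND programming can only clear bits (1 -> 0), the
0xFF padding leaves the main area untouched. The names (prog_sim, cmd_seqin,
write_buf, flash_program_page) and the marker bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 2048u  /* assumed main area size */
#define SIM_OOB_SIZE  64u    /* assumed spare area size */

/* Hypothetical driver state for a page program operation. */
struct prog_sim {
    uint8_t data_buff[SIM_PAGE_SIZE + SIM_OOB_SIZE];
    size_t  buf_start;  /* column remembered by software */
};

/* Start of a program sequence: pad everything with 0xFF, remember the column. */
static void cmd_seqin(struct prog_sim *s, size_t column)
{
    assert(column <= sizeof(s->data_buff));
    memset(s->data_buff, 0xff, sizeof(s->data_buff));
    s->buf_start = column;
}

/* Copy caller data into the buffer at the remembered column. */
static void write_buf(struct prog_sim *s, const uint8_t *src, size_t len)
{
    size_t room = sizeof(s->data_buff) - s->buf_start;
    size_t n = len < room ? len : room;

    memcpy(s->data_buff + s->buf_start, src, n);
    s->buf_start += n;
}

/* Model of the flash cell behaviour: programming can only clear bits. */
static void flash_program_page(uint8_t *flash_page, const struct prog_sim *s)
{
    for (size_t i = 0; i < sizeof(s->data_buff); i++)
        flash_page[i] &= s->data_buff[i];
}
```

With cmd_seqin(&s, 2048) followed by a short write_buf(), only the first OOB
bytes of the simulated page change; everything in the main area is ANDed
with 0xFF and keeps its contents.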

However, I'm also convinced that this is probably unrelated to my problems.
In fact, the problem always occurs on the same pages.
I could identify about 10 eraseblocks with pages which produce single/double 
bit errors.
After I marked them bad (manually), I've seen no more bit errors, and the 
jffs2 rootfs has remained perfectly healthy.

Apparently a double bit error is not a reason to consider a block bad; jffs2 
does not mark a block bad until it has failed to erase more than 2 times.
But it seems the nand controller (or at least the pxa3xx_nand driver) 
doesn't report any problems when erasing these blocks. (I will further 
investigate this)

I would happily blame this on the NAND which might be bad, if this were just 
a single board instead of all colibri pxa320/310 boards I've tried so far, 
more than 5 in total.

Rgds, Pieter

* pxa3xx_nand issues
  2010-09-23 11:32   ` pieterg
@ 2010-09-23 15:29     ` pieterg
  2010-09-23 18:03       ` Matt Reimer
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: pieterg @ 2010-09-23 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 23 September 2010 13:32:26 pieterg wrote:
> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> > > In my search for the cause of the huge number of single/double bit
> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
> > > across this commit
> > > 
http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> > > According to the commitlog, it attempts to work around an issue
> > > regarding non-page-aligned reads.
> > > The workaround seems to force page-aligned access, by dropping the
> > > offset within the page (column address bytes).
> > > However, in my setup (with a jffs2 filesystem on nand),
> > > non-page-aligned reads never occur, but non-page-aligned writes occur
> > > very frequently. (during the jffs2 gc).
> > > These are also affected by this commit, while the commitlog does not
> > > state whether or not the same issue would occur for the program
> > > command, and in that case, whether or not the same workaround would
> > > apply.
> > >
> > > I've tried to revert the commit, but unfortunately this doesn't
> > > reduce the huge number of single/double bit errors (and jffs2 crc
> > > errors as a result) I'm getting.
> > >
> > > But having these non-aligned writes during GC, would that indicate a
> > > problem with my jffs2 image parameters perhaps?
> > > (though I cannot imagine this could actually cause double bit errors)
> >
> > It might not be related to the commit above.  The NAND controller will
> > always read the whole page and ignoring the column address, that patch
> > tries to make less confusion. The offset is actually handled completely
> > by software (memorized).
>
> I can see how the read offset works, but I do not quite see how this
> would work for writes (which call the same prepare_read_prog_cmd, and
> have their column address stripped as well).
> Found out that this happens when writing oob data by the way; these are
> writes with offset 2048 within the page. Jffs2 does this when writing
> cleanmarkers.

Tested this, and found out that this commit is actually quite essential for 
writes as well.
Without it, the OOB data doesn't get written.
So we can close this part of the topic, commit 7f9938d0 is perfectly fine.

> I could identify about 10 eraseblocks with pages which produce
> single/double bit errors.
> After I marked them bad (manually), I've seen no more bit errors, and the
> jffs2 rootfs has remained perfectly healthy.

Turned out to be a short-term solution.
After a while I got more double-bit errors, and ended up bad-marking a dozen 
or so other eraseblocks, and it does not seem to stop.

Strangest thing is that when I write a new jffs2 image with uboot (nand 
erase, nand write) or with the kernel (flash_eraseall, nandwrite), it never 
contains any biterrors when I mount it.
The biterrors only show up once the filesystem has been mounted and 
modified, and then after the first reboot.

One other issue, which I noticed because besides double bit errors I get 
many single bit errors as well: ERR_SBERR is never cleared.
ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is not 
(probably in order to allow pxa3xx_nand_ecc_correct to pick it up).
However, I've seen that the retcode could still be ERR_SBERR in 
pxa3xx_nand_waitfunc, causing an erase error to be assumed. As a result, all 
eraseblocks in the partition ended up being marked bad in a loop, until 
there were no eraseblocks left.
I guess ERR_SBERR should probably be ignored in pxa3xx_nand_waitfunc?
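
A minimal sketch of the idea, assuming a wait function that turns any
leftover error code into a "failed" status for program and erase
operations. The enum values, SIM_STATUS_FAIL and sim_waitfunc are
hypothetical stand-ins, not the driver's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes, named after the ERR_* values discussed above. */
enum sim_retcode { SIM_ERR_NONE = 0, SIM_ERR_DBERR, SIM_ERR_BBERR, SIM_ERR_SBERR };

/* Hypothetical chip states. */
enum sim_state { SIM_READING, SIM_WRITING, SIM_ERASING };

#define SIM_STATUS_FAIL 0x01  /* status value meaning "operation failed" */

/*
 * Status check after a command has completed. A single-bit ECC error left
 * over from an earlier read says nothing about the current program or
 * erase, so it is cleared here instead of being reported as a failure.
 */
static int sim_waitfunc(enum sim_state state, enum sim_retcode *retcode)
{
    assert(retcode != NULL);

    if (state != SIM_WRITING && state != SIM_ERASING)
        return 0;

    if (*retcode == SIM_ERR_SBERR)
        *retcode = SIM_ERR_NONE;

    return *retcode == SIM_ERR_NONE ? 0 : SIM_STATUS_FAIL;
}
```

Without the SBERR special case, a stale single-bit error would make every
following erase look failed, which matches the "all eraseblocks marked bad
in a loop" behaviour described above.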

That's what I did in the remainder of my tests (after having unmarked the 
blocks that were wrongly marked bad) so I think this issue did not 
contribute to my biterror problems.

Rgds, Pieter

* pxa3xx_nand issues
  2010-09-23 15:29     ` pieterg
@ 2010-09-23 18:03       ` Matt Reimer
  2010-09-25  2:50       ` Haojian Zhuang
  2010-09-26 14:32       ` Lei Wen
  2 siblings, 0 replies; 15+ messages in thread
From: Matt Reimer @ 2010-09-23 18:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 23, 2010 at 8:29 AM, pieterg <pieterg@gmx.com> wrote:
> On Thursday 23 September 2010 13:32:26 pieterg wrote:
>> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
>> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
>> > > In my search for the cause of the huge number of single/double bit
>> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
>> > > across this commit
>> > >
> http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
>> > > According to the commitlog, it attempts to work around an issue
>> > > regarding non-page-aligned reads.
>> > > The workaround seems to force page-aligned access, by dropping the
>> > > offset within the page (column address bytes).
>> > > However, in my setup (with a jffs2 filesystem on nand),
>> > > non-page-aligned reads never occur, but non-page-aligned writes occur
>> > > very frequently. (during the jffs2 gc).
>> > > These are also affected by this commit, while the commitlog does not
>> > > state whether or not the same issue would occur for the program
>> > > command, and in that case, whether or not the same workaround would
>> > > apply.
>> > >
>> > > I've tried to revert the commit, but unfortunately this doesn't
>> > > reduce the huge number of single/double bit errors (and jffs2 crc
>> > > errors as a result) I'm getting.
>> > >
>> > > But having these non-aligned writes during GC, would that indicate a
>> > > problem with my jffs2 image parameters perhaps?
>> > > (though I cannot imagine this could actually cause double bit errors)
>> >
>> > It might not be related to the commit above.  The NAND controller will
>> > always read the whole page and ignoring the column address, that patch
>> > tries to make less confusion. The offset is actually handled completely
>> > by software (memorized).
>>
>> I can see how the read offset works, but I do not quite see how this
>> would work for writes (which call the same prepare_read_prog_cmd, and
>> have their column address stripped as well).
>> Found out that this happens when writing oob data by the way; these are
>> writes with offset 2048 within the page. Jffs2 does this when writing
>> cleanmarkers.
>
> Tested this, and found out that this commit is actually quite essential for
> writes as well.
> Without it, the OOB data doesn't get written.
> So we can close this part of the topic, commit 7f9938d0 is perfectly fine.

IIRC I came up with this patch while trying to use JFFS2 on a pxa320.

Matt

* pxa3xx_nand issues
  2010-09-23 15:29     ` pieterg
  2010-09-23 18:03       ` Matt Reimer
@ 2010-09-25  2:50       ` Haojian Zhuang
  2010-09-27 11:38         ` pieterg
  2010-09-26 14:32       ` Lei Wen
  2 siblings, 1 reply; 15+ messages in thread
From: Haojian Zhuang @ 2010-09-25  2:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
> On Thursday 23 September 2010 13:32:26 pieterg wrote:
>> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
>> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
>> > > In my search for the cause of the huge number of single/double bit
>> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
>> > > across this commit
>> > >
> http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
>> > > According to the commitlog, it attempts to work around an issue
>> > > regarding non-page-aligned reads.
>> > > The workaround seems to force page-aligned access, by dropping the
>> > > offset within the page (column address bytes).
>> > > However, in my setup (with a jffs2 filesystem on nand),
>> > > non-page-aligned reads never occur, but non-page-aligned writes occur
>> > > very frequently. (during the jffs2 gc).
>> > > These are also affected by this commit, while the commitlog does not
>> > > state whether or not the same issue would occur for the program
>> > > command, and in that case, whether or not the same workaround would
>> > > apply.
>> > >
>> > > I've tried to revert the commit, but unfortunately this doesn't
>> > > reduce the huge number of single/double bit errors (and jffs2 crc
>> > > errors as a result) I'm getting.
>> > >
>> > > But having these non-aligned writes during GC, would that indicate a
>> > > problem with my jffs2 image parameters perhaps?
>> > > (though I cannot imagine this could actually cause double bit errors)
>> >
>> > It might not be related to the commit above.  The NAND controller will
>> > always read the whole page and ignoring the column address, that patch
>> > tries to make less confusion. The offset is actually handled completely
>> > by software (memorized).
>>
>> I can see how the read offset works, but I do not quite see how this
>> would work for writes (which call the same prepare_read_prog_cmd, and
>> have their column address stripped as well).
>> Found out that this happens when writing oob data by the way; these are
>> writes with offset 2048 within the page. Jffs2 does this when writing
>> cleanmarkers.
>
> Tested this, and found out that this commit is actually quite essential for
> writes as well.
> Without it, the OOB data doesn't get written.
> So we can close this part of the topic, commit 7f9938d0 is perfectly fine.
>
>> I could identify about 10 eraseblocks with pages which produce
>> single/double bit errors.
>> After I marked them bad (manually), I've seen no more bit errors, and the
>> jffs2 rootfs has remained perfectly healthy.
>
> Turned out to be a short-term solution.
> After a while I got more double-bit errors, and ended up bad-marking a dozen
> or so other eraseblocks, and it does not seem to stop.
>
> Strangest thing is that when I write a new jffs2 image with uboot (nand
> erase, nand write) or with the kernel (flash_eraseall, nandwrite), it never
> contains any biterrors when I mount it.
> Only after the filesystem has been mounted, gets modified, and then after
> the first reboot, the biterrors are there.
Could you check whether these "wrong" blocks are truly bad blocks?
Maybe you can erase/write them repeatedly, several times in a row, in XDB.

>
> One other issue which I noticed because besides double bit errors I get many
> single bit errors as well; the ERR_SBERR is never cleared.
> ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is not.
> (probably in order to allow pxa3xx_nand_ecc_correct to pick it up)
> However, I've seen that the retcode could still be ERR_SBERR in
> pxa3xx_nand_waitfunc, causing an erase error to be assumed, as a result all
> eraseblocks in the partition ended up being marked bad in a loop, till
> there were no more remaining eraseblocks.
> I guess ERR_SBERR should probably be ignored in pxa3xx_nand_waitfunc?
>
Yes, ERR_SBERR should be ignored, since the NAND controller can correct it.

> That's what I did in the remainder of my tests (after having unmarked the
> blocks that were wrongly marked bad) so I think this issue did not
> contribute to my biterror problems.
>
> Rgds, Pieter
>

* pxa3xx_nand issues
  2010-09-23 15:29     ` pieterg
  2010-09-23 18:03       ` Matt Reimer
  2010-09-25  2:50       ` Haojian Zhuang
@ 2010-09-26 14:32       ` Lei Wen
  2010-09-27 11:54         ` pieterg
  2 siblings, 1 reply; 15+ messages in thread
From: Lei Wen @ 2010-09-26 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
> On Thursday 23 September 2010 13:32:26 pieterg wrote:
>> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
>> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
>> > > In my search for the cause of the huge number of single/double bit
>> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
>> > > across this commit
>> > >
> http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
>> > > According to the commitlog, it attempts to work around an issue
>> > > regarding non-page-aligned reads.
>> > > The workaround seems to force page-aligned access, by dropping the
>> > > offset within the page (column address bytes).
>> > > However, in my setup (with a jffs2 filesystem on nand),
>> > > non-page-aligned reads never occur, but non-page-aligned writes occur
>> > > very frequently. (during the jffs2 gc).
>> > > These are also affected by this commit, while the commitlog does not
>> > > state whether or not the same issue would occur for the program
>> > > command, and in that case, whether or not the same workaround would
>> > > apply.
>> > >
>> > > I've tried to revert the commit, but unfortunately this doesn't
>> > > reduce the huge number of single/double bit errors (and jffs2 crc
>> > > errors as a result) I'm getting.
>> > >
>> > > But having these non-aligned writes during GC, would that indicate a
>> > > problem with my jffs2 image parameters perhaps?
>> > > (though I cannot imagine this could actually cause double bit errors)
>> >
>> > It might not be related to the commit above.  The NAND controller will
>> > always read the whole page and ignoring the column address, that patch
>> > tries to make less confusion. The offset is actually handled completely
>> > by software (memorized).
>>
>> I can see how the read offset works, but I do not quite see how this
>> would work for writes (which call the same prepare_read_prog_cmd, and
>> have their column address stripped as well).
>> Found out that this happens when writing oob data by the way; these are
>> writes with offset 2048 within the page. Jffs2 does this when writing
>> cleanmarkers.
>
> Tested this, and found out that this commit is actually quite essential for
> writes as well.
> Without it, the OOB data doesn't get written.
> So we can close this part of the topic, commit 7f9938d0 is perfectly fine.

The PXA3xx NAND controller's write semantics are to send a whole page of
data to the NAND flash, together with the page's address. If you set ndcr1
to a value that is not page-aligned, pxa3xx_nand will still send the data
to the flash without complaint, but the NAND flash will not accept this
kind of access, as defined in its spec.

Of course, if you really need to do this, there are still ways. :)
Sending the RANDOM DATA INPUT command (0x80 + 5-cycle address + 0x85 +
2-cycle column address) + 0x10 would do it.
But it seems the pxa310 cannot do this; it is supported by newer
silicon in pxa168 or mmp2.
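
As a rough sketch, the bus cycles of the sequence above can be written out
as follows. It assumes a large-page device with 2 column and 3 row address
cycles, low byte first; the types and the function name
build_random_data_input are made up for the example, and it only builds a
list of cycles rather than driving any real controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cycle_type { CYC_CMD, CYC_ADDR, CYC_DATA };

struct cycle {
    enum cycle_type type;
    uint8_t value;
};

/* Append one cycle and return the new count. */
static size_t put(struct cycle *out, size_t n, enum cycle_type t, uint8_t v)
{
    out[n].type = t;
    out[n].value = v;
    return n + 1;
}

/*
 * Build: 0x80, 5 address cycles (column 0 + row), 0x85, 2 column cycles,
 * the data bytes, 0x10. Returns the number of cycles written, or 0 if the
 * output array is too small.
 */
static size_t build_random_data_input(struct cycle *out, size_t max,
                                      uint32_t row, uint16_t column,
                                      const uint8_t *data, size_t len)
{
    size_t n = 0;

    assert(out != NULL && data != NULL);
    if (max < len + 10)
        return 0;

    n = put(out, n, CYC_CMD, 0x80);                      /* program setup */
    n = put(out, n, CYC_ADDR, 0x00);                     /* column low = 0 */
    n = put(out, n, CYC_ADDR, 0x00);                     /* column high = 0 */
    n = put(out, n, CYC_ADDR, (uint8_t)(row & 0xff));
    n = put(out, n, CYC_ADDR, (uint8_t)((row >> 8) & 0xff));
    n = put(out, n, CYC_ADDR, (uint8_t)((row >> 16) & 0xff));
    n = put(out, n, CYC_CMD, 0x85);                      /* random data input */
    n = put(out, n, CYC_ADDR, (uint8_t)(column & 0xff));
    n = put(out, n, CYC_ADDR, (uint8_t)(column >> 8));
    for (size_t i = 0; i < len; i++)
        n = put(out, n, CYC_DATA, data[i]);
    n = put(out, n, CYC_CMD, 0x10);                      /* program confirm */
    return n;
}
```

For an OOB write at column 2048 the two column cycles after 0x85 would be
0x00 and 0x08, followed by only the OOB bytes instead of a full page.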

>
>> I could identify about 10 eraseblocks with pages which produce
>> single/double bit errors.
>> After I marked them bad (manually), I've seen no more bit errors, and the
>> jffs2 rootfs has remained perfectly healthy.
>
> Turned out to be a short-term solution.
> After a while I got more double-bit errors, and ended up bad-marking a dozen
> or so other eraseblocks, and it does not seem to stop.
>
> Strangest thing is that when I write a new jffs2 image with uboot (nand
> erase, nand write) or with the kernel (flash_eraseall, nandwrite), it never
> contains any biterrors when I mount it.
> Only after the filesystem has been mounted, gets modified, and then after
> the first reboot, the biterrors are there.

You may notice that when a new file system is mounted, the flash is
only read by the controller.
This means your u-boot is all right for writing, and your kernel is also
OK for reading, while your driver's write function may be broken.
Timing? Not so sure...

>
> One other issue which I noticed because besides double bit errors I get many
> single bit errors as well; the ERR_SBERR is never cleared.
> ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is not.
> (probably in order to allow pxa3xx_nand_ecc_correct to pick it up)
> However, I've seen that the retcode could still be ERR_SBERR in
> pxa3xx_nand_waitfunc, causing an erase error to be assumed, as a result all
> eraseblocks in the partition ended up being marked bad in a loop, till
> there were no more remaining eraseblocks.
> I guess ERR_SBERR should probably be ignored in pxa3xx_nand_waitfunc?

Em, although ERR_SBERR indicates that this error can be corrected by the
NAND controller, it still makes sense to report it to the upper level. A
filesystem like UBIFS could use this information to do flash data integrity
maintenance. This is already fixed in my patch set, which was sent a month
ago.
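
A minimal sketch of what "reporting to the upper level" could look like:
corrected and uncorrectable events are both counted, so a filesystem can
notice a page that needed correction and move the data before it gets
worse. The struct, the result codes and the scrub policy are assumptions
for illustration, not the code from the patch set mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ECC outcomes, loosely modelled on ERR_SBERR / ERR_DBERR. */
enum sim_ecc_result { SIM_ECC_OK, SIM_ECC_SBERR, SIM_ECC_DBERR };

/* Counters an upper layer could poll. */
struct sim_ecc_stats {
    unsigned int corrected;  /* bit flips fixed by the controller */
    unsigned int failed;     /* pages with uncorrectable errors */
};

/*
 * Returns the number of corrected bit flips (0 or 1 here), or -1 when the
 * page could not be corrected. Either way the event is counted instead of
 * being silently dropped.
 */
static int sim_ecc_correct(enum sim_ecc_result res, struct sim_ecc_stats *stats)
{
    assert(stats != NULL);

    switch (res) {
    case SIM_ECC_SBERR:
        stats->corrected++;
        return 1;
    case SIM_ECC_DBERR:
        stats->failed++;
        return -1;
    default:
        return 0;
    }
}

/* Example policy (an assumption): scrub as soon as any flip was corrected. */
static int sim_needs_scrub(const struct sim_ecc_stats *before,
                           const struct sim_ecc_stats *after)
{
    return after->corrected > before->corrected;
}
```

The read itself still succeeds on a single-bit error; only the counters
tell the upper layer that the page is starting to degrade.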
>
> That's what I did in the remainder of my tests (after having unmarked the
> blocks that were wrongly marked bad) so I think this issue did not
> contribute to my biterror problems.

Bit errors may be caused by timing, bad blocks... I think you'd better use
the MTD tests built into the Linux kernel to make sure the timing is all
right.

Best regards,
Lei

* pxa3xx_nand issues
  2010-09-25  2:50       ` Haojian Zhuang
@ 2010-09-27 11:38         ` pieterg
  0 siblings, 0 replies; 15+ messages in thread
From: pieterg @ 2010-09-27 11:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Saturday 25 September 2010 04:50:04 Haojian Zhuang wrote:
> On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
> > On Thursday 23 September 2010 13:32:26 pieterg wrote:
> >> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> >> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> >> > > In my search for the cause of the huge number of single/double bit
> >> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
> >> > > across this commit
> >
> > > http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> >
> >> > > According to the commitlog, it attempts to work around an issue
> >> > > regarding non-page-aligned reads.
> >> > > The workaround seems to force page-aligned access, by dropping the
> >> > > offset within the page (column address bytes).
> >> > > However, in my setup (with a jffs2 filesystem on nand),
> >> > > non-page-aligned reads never occur, but non-page-aligned writes
> >> > > occur very frequently. (during the jffs2 gc).
> >> > > These are also affected by this commit, while the commitlog does
> >> > > not state whether or not the same issue would occur for the
> >> > > program command, and in that case, whether or not the same
> >> > > workaround would apply.
> >> > >
> >> > > I've tried to revert the commit, but unfortunately this doesn't
> >> > > reduce the huge number of single/double bit errors (and jffs2 crc
> >> > > errors as a result) I'm getting.
> >> > >
> >> > > But having these non-aligned writes during GC, would that indicate
> >> > > a problem with my jffs2 image parameters perhaps?
> >> > > (though I cannot imagine this could actually cause double bit
> >> > > errors)
> >> >
> >> > It might not be related to the commit above.  The NAND controller
> >> > will always read the whole page and ignoring the column address,
> >> > that patch tries to make less confusion. The offset is actually
> >> > handled completely by software (memorized).
> >>
> >> I can see how the read offset works, but I do not quite see how this
> >> would work for writes (which call the same prepare_read_prog_cmd, and
> >> have their column address stripped as well).
> >> Found out that this happens when writing oob data by the way; these
> >> are writes with offset 2048 within the page. Jffs2 does this when
> >> writing cleanmarkers.
> >
> > Tested this, and found out that this commit is actually quite essential
> > for writes as well.
> > Without it, the OOB data doesn't get written.
> > So we can close this part of the topic, commit 7f9938d0 is perfectly
> > fine.
> >
> >> I could identify about 10 eraseblocks with pages which produce
> >> single/double bit errors.
> >> After I marked them bad (manually), I've seen no more bit errors, and
> >> the jffs2 rootfs has remained perfectly healthy.
> >
> > Turned out to be a short-term solution.
> > After a while I got more double-bit errors, and ended up bad-marking a
> > dozen or so other eraseblocks, and it does not seem to stop.
> >
> > Strangest thing is that when I write a new jffs2 image with uboot (nand
> > erase, nand write) or with the kernel (flash_eraseall, nandwrite), it
> > never contains any biterrors when I mount it.
> > Only after the filesystem has been mounted, gets modified, and then
> > after the first reboot, the biterrors are there.
>
> Could you make sure whether these "wrong" block are truely bad block?
> Maybe you can erase/write them continuously multi-times in XDB.

Unfortunately I don't have XDB.
However, I can erase/write/read them with u-boot and with the kernel 
(flash_eraseall / nandwrite), several times, without ever getting a 
NDSR_CS0_BBD status.
However, I get many NDSR_DBERR and NDSR_SBERR interrupts.

But because these occur during a read, the kernel never takes any action, 
the blocks will not be marked bad.
(And I find it hard to believe that such a huge number of blocks on a brand 
new chip would actually be bad)
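
To make the distinction concrete, here is a small sketch of how the status
bits mentioned above might be classified. The bit positions and all SIM_*
names are assumptions for illustration and should be checked against the
driver source or the datasheet. The point is only that ECC errors are
reported on reads, while a bad-block indication comes from a failed program
or erase, so read errors alone never lead to a block being marked bad.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SIM_NDSR_CS0_BBD (1u << 6)  /* bad block detected (program/erase) */
#define SIM_NDSR_DBERR   (1u << 4)  /* double-bit ECC error (read) */
#define SIM_NDSR_SBERR   (1u << 3)  /* single-bit ECC error (read) */

enum sim_op { SIM_OP_READ, SIM_OP_PROGRAM, SIM_OP_ERASE };
enum sim_outcome { SIM_OK, SIM_CORRECTED, SIM_UNCORRECTABLE, SIM_BAD_BLOCK };

/* Map an operation and its status word to an outcome. */
static enum sim_outcome sim_classify(enum sim_op op, uint32_t ndsr)
{
    /* Catch invalid operation values in debug builds. */
    assert(op == SIM_OP_READ || op == SIM_OP_PROGRAM || op == SIM_OP_ERASE);

    if (op == SIM_OP_READ) {
        if (ndsr & SIM_NDSR_DBERR)
            return SIM_UNCORRECTABLE;
        if (ndsr & SIM_NDSR_SBERR)
            return SIM_CORRECTED;
        return SIM_OK;
    }
    return (ndsr & SIM_NDSR_CS0_BBD) ? SIM_BAD_BLOCK : SIM_OK;
}

/* Only a failed program or erase leads to marking the block bad. */
static int sim_should_mark_bad(enum sim_outcome outcome)
{
    return outcome == SIM_BAD_BLOCK;
}
```

In this model, a block that erases and programs without a bad-block status
but reads back with double-bit errors is classified as uncorrectable, yet
is never a candidate for bad-block marking.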

Rgds, Pieter

* pxa3xx_nand issues
  2010-09-26 14:32       ` Lei Wen
@ 2010-09-27 11:54         ` pieterg
  2010-09-27 12:22           ` Lei Wen
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-09-27 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 26 September 2010 16:32:47 Lei Wen wrote:
> On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
> > On Thursday 23 September 2010 13:32:26 pieterg wrote:
> >> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> >> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> >> > > In my search for the cause of the huge number of single/double bit
> >> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
> >> > > across this commit
> >
> > > http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> >
> >> > > According to the commitlog, it attempts to work around an issue
> >> > > regarding non-page-aligned reads.
> >> > > The workaround seems to force page-aligned access, by dropping the
> >> > > offset within the page (column address bytes).
> >> > > However, in my setup (with a jffs2 filesystem on nand),
> >> > > non-page-aligned reads never occur, but non-page-aligned writes
> >> > > occur very frequently. (during the jffs2 gc).
> >> > > These are also affected by this commit, while the commitlog does
> >> > > not state whether or not the same issue would occur for the
> >> > > program command, and in that case, whether or not the same
> >> > > workaround would apply.
> >> > >
> >> > > I've tried to revert the commit, but unfortunately this doesn't
> >> > > reduce the huge number of single/double bit errors (and jffs2 crc
> >> > > errors as a result) I'm getting.
> >> > >
> >> > > But having these non-aligned writes during GC, would that indicate
> >> > > a problem with my jffs2 image parameters perhaps?
> >> > > (though I cannot imagine this could actually cause double bit
> >> > > errors)
> >> >
> >> > It might not be related to the commit above.  The NAND controller
> >> > will always read the whole page and ignoring the column address,
> >> > that patch tries to make less confusion. The offset is actually
> >> > handled completely by software (memorized).
> >>
> >> I can see how the read offset works, but I do not quite see how this
> >> would work for writes (which call the same prepare_read_prog_cmd, and
> >> have their column address stripped as well).
> >> Found out that this happens when writing oob data by the way; these
> >> are writes with offset 2048 within the page. Jffs2 does this when
> >> writing cleanmarkers.
> >
> > Tested this, and found out that this commit is actually quite essential
> > for writes as well.
> > Without it, the OOB data doesn't get written.
> > So we can close this part of the topic, commit 7f9938d0 is perfectly
> > fine.
>
> The PXA3xx NAND controller's write semantics are to send a whole page
> of data to the NAND flash, together with the page's address. If you set
> a non-page-aligned value in ndcr1, pxa3xx_nand will still send the data
> to the flash, but the NAND flash would not accept this kind of access,
> according to its spec.
>
> Certainly, if you really need to do this, there are still ways. :)
> Sending the RANDOM DATA INPUT command (0x80 + 5-cycle address + 0x85 +
> 2-cycle column address) + 0x10 would do this.
> But it seems the pxa310 cannot do that; it is supported by the newer
> silicon in pxa168 or mmp2.
>
> >> I could identify about 10 eraseblocks with pages which produce
> >> single/double bit errors.
> >> After I marked them bad (manually), I've seen no more bit errors, and
> >> the jffs2 rootfs has remained perfectly healthy.
> >
> > Turned out to be a short-term solution.
> > After a while I got more double-bit errors, and ended up bad-marking a
> > dozen or so other eraseblocks, and it does not seem to stop.
> >
> > Strangest thing is that when I write a new jffs2 image with uboot (nand
> > erase, nand write) or with the kernel (flash_eraseall, nandwrite), it
> > never contains any biterrors when I mount it.
> > Only after the filesystem has been mounted, gets modified, and then
> > after the first reboot, the biterrors are there.
>
> You may notice that when a new file system is mounted, the flash is
> only read by the controller.
> This means your uboot is all right for writing, and your kernel is also
> OK for reading.
> Your driver's write function, though, may be broken. Timing? Not so sure...
>
> > One other issue which I noticed because besides double bit errors I get
> > many single bit errors as well; the ERR_SBERR is never cleared.
> > ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is
> > not. (probably in order to allow pxa3xx_nand_ecc_correct to pick it up)
> > However, I've seen that the retcode could still be ERR_SBERR in
> > pxa3xx_nand_waitfunc, causing an erase error to be assumed, as a result
> > all eraseblocks in the partition ended up being marked bad in a loop,
> > till there were no more remaining eraseblocks.
> > I guess ERR_SBERR should probably be ignored in pxa3xx_nand_waitfunc?
>
> Well, although ERR_SBERR indicates that the error can be corrected by
> the NAND controller, it still makes sense to report it to the upper
> level. A FS like UBIFS could use this information to do flash data
> integrity maintenance. This is already fixed in my patch set, which
> was sent a month ago.
>
> > That's what I did in the remainder of my tests (after having unmarked
> > the blocks that were wrongly marked bad) so I think this issue did not
> > contribute to my biterror problems.
>
> Bit errors may be caused by timing, bad blocks... I think you'd better
> use the mtd tests built into the Linux kernel to make sure the timing is
> all right.

Which mtd test in particular?
I've run most tests, without any errors:

-oobtest (complains only about read-past-oob-size not returning a proper 
error)
-pagetest
-subpagetest
-readtest
-speedtest

Not even a single bit error during any of those tests.

Rgds, Pieter


* pxa3xx_nand issues
  2010-09-27 11:54         ` pieterg
@ 2010-09-27 12:22           ` Lei Wen
  2010-09-27 13:50             ` pieterg
  0 siblings, 1 reply; 15+ messages in thread
From: Lei Wen @ 2010-09-27 12:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 27, 2010 at 7:54 PM, pieterg <pieterg@gmx.com> wrote:
> On Sunday 26 September 2010 16:32:47 Lei Wen wrote:
>> On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
>> > On Thursday 23 September 2010 13:32:26 pieterg wrote:
>> >> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
>> >> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
>> >> > > In my search for the cause of the huge number of single/double bit
>> >> > > errors I'm experiencing on colibri pxa320/310 devices, I've come
>> >> > > across this commit
>> >
>> > http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
>> >
>> >> > > According to the commitlog, it attempts to work around an issue
>> >> > > regarding non-page-aligned reads.
>> >> > > The workaround seems to force page-aligned access, by dropping the
>> >> > > offset within the page (column address bytes).
>> >> > > However, in my setup (with a jffs2 filesystem on nand),
>> >> > > non-page-aligned reads never occur, but non-page-aligned writes
>> >> > > occur very frequently. (during the jffs2 gc).
>> >> > > These are also affected by this commit, while the commitlog does
>> >> > > not state whether or not the same issue would occur for the
>> >> > > program command, and in that case, whether or not the same
>> >> > > workaround would apply.
>> >> > >
>> >> > > I've tried to revert the commit, but unfortunately this doesn't
>> >> > > reduce the huge number of single/double bit errors (and jffs2 crc
>> >> > > errors as a result) I'm getting.
>> >> > >
>> >> > > But having these non-aligned writes during GC, would that indicate
>> >> > > a problem with my jffs2 image parameters perhaps?
>> >> > > (though I cannot imagine this could actually cause double bit
>> >> > > errors)
>> >> >
>> >> > It might not be related to the commit above. The NAND controller
>> >> > will always read the whole page and ignoring the column address,
>> >> > that patch tries to make less confusion. The offset is actually
>> >> > handled completely by software (memorized).
>> >>
>> >> I can see how the read offset works, but I do not quite see how this
>> >> would work for writes (which call the same prepare_read_prog_cmd, and
>> >> have their column address stripped as well).
>> >> Found out that this happens when writing oob data by the way; these
>> >> are writes with offset 2048 within the page. Jffs2 does this when
>> >> writing cleanmarkers.
>> >
>> > Tested this, and found out that this commit is actually quite essential
>> > for writes as well.
>> > Without it, the OOB data doesn't get written.
>> > So we can close this part of the topic, commit 7f9938d0 is perfectly
>> > fine.
>>
>> The PXA3xx NAND controller's write semantics are to send a whole page
>> of data to the NAND flash, together with the page's address. If you set
>> a non-page-aligned value in ndcr1, pxa3xx_nand will still send the data
>> to the flash, but the NAND flash would not accept this kind of access,
>> according to its spec.
>>
>> Certainly, if you really need to do this, there are still ways. :)
>> Sending the RANDOM DATA INPUT command (0x80 + 5-cycle address + 0x85 +
>> 2-cycle column address) + 0x10 would do this.
>> But it seems the pxa310 cannot do that; it is supported by the newer
>> silicon in pxa168 or mmp2.
>>
>> >> I could identify about 10 eraseblocks with pages which produce
>> >> single/double bit errors.
>> >> After I marked them bad (manually), I've seen no more bit errors, and
>> >> the jffs2 rootfs has remained perfectly healthy.
>> >
>> > Turned out to be a short-term solution.
>> > After a while I got more double-bit errors, and ended up bad-marking a
>> > dozen or so other eraseblocks, and it does not seem to stop.
>> >
>> > Strangest thing is that when I write a new jffs2 image with uboot (nand
>> > erase, nand write) or with the kernel (flash_eraseall, nandwrite), it
>> > never contains any biterrors when I mount it.
>> > Only after the filesystem has been mounted, gets modified, and then
>> > after the first reboot, the biterrors are there.
>>
>> You may notice that when a new file system is mounted, the flash is
>> only read by the controller.
>> This means your uboot is all right for writing, and your kernel is also
>> OK for reading.
>> Your driver's write function, though, may be broken. Timing? Not so sure...
>>
>> > One other issue which I noticed because besides double bit errors I get
>> > many single bit errors as well; the ERR_SBERR is never cleared.
>> > ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is
>> > not. (probably in order to allow pxa3xx_nand_ecc_correct to pick it up)
>> > However, I've seen that the retcode could still be ERR_SBERR in
>> > pxa3xx_nand_waitfunc, causing an erase error to be assumed, as a result
>> > all eraseblocks in the partition ended up being marked bad in a loop,
>> > till there were no more remaining eraseblocks.
>> > I guess ERR_SBERR should probably be ignored in pxa3xx_nand_waitfunc?
>>
>> Well, although ERR_SBERR indicates that the error can be corrected by
>> the NAND controller, it still makes sense to report it to the upper
>> level. A FS like UBIFS could use this information to do flash data
>> integrity maintenance. This is already fixed in my patch set, which
>> was sent a month ago.
>>
>> > That's what I did in the remainder of my tests (after having unmarked
>> > the blocks that were wrongly marked bad) so I think this issue did not
>> > contribute to my biterror problems.
>>
>> Bit errors may be caused by timing, bad blocks... I think you'd better
>> use the mtd tests built into the Linux kernel to make sure the timing is
>> all right.
>
> Which mtd test in particular?
> I've run most tests, without any errors:
>
> -oobtest (complains only about read-past-oob-size not returning a proper
> error)
> -pagetest
> -subpagetest
> -readtest
> -speedtest
>
> Not even a single bit error during any of those tests.

That is so weird...
Is your jffs2 image built correctly? Are the page size and block size set right?
You should know that if you write twice to one page, you can also see
double-bit or single-bit errors, and that has nothing to do with bad
blocks or timing. This could explain why all of the mtd tests passed.
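
To illustrate the point about writing twice to one page, here is a
minimal Python sketch. It is a toy model only, not driver code: it assumes
the usual NAND rule that a program operation can only clear bits (1 -> 0)
and that only an erase sets them back to 1. The helper names (erase,
program, flipped_bits) and the data bytes are made up for the example, and
real MLC parts behave in more complicated ways than this.

```python
# Toy model (assumption): programming can only clear bits, never set them.
ERASED = 0xFF


def erase(size):
    """Return an erased page: every bit is 1."""
    return bytearray([ERASED] * size)


def program(page, data):
    """Program data onto a page WITHOUT erasing it first.

    Each stored byte becomes old AND new, because bits only go 1 -> 0.
    """
    return bytearray(old & new for old, new in zip(page, data))


def flipped_bits(expected, actual):
    """Count the bits that differ between what was written and what is read."""
    return sum(bin(e ^ a).count("1") for e, a in zip(expected, actual))


# Hypothetical data for two writes to the same page.
first = bytes([0xF0, 0xFF, 0xAA, 0xFF])
second = bytes([0x0F, 0x55, 0xFF, 0xFF])

page = erase(4)
page = program(page, first)   # first write to an erased page: fine
page = program(page, second)  # second write without an erase in between

print(page.hex())
print(flipped_bits(second, page))
```

The page ends up holding the bitwise AND of both writes, so a read no
longer matches the data (or the ECC) of the second write, even though the
block itself and the timing are fine.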

Best regards,
Lei


* pxa3xx_nand issues
  2010-09-27 12:22           ` Lei Wen
@ 2010-09-27 13:50             ` pieterg
  2010-09-27 17:39               ` pieterg
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-09-27 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 27 September 2010 14:22:37 Lei Wen wrote:
> On Mon, Sep 27, 2010 at 7:54 PM, pieterg <pieterg@gmx.com> wrote:
> > On Sunday 26 September 2010 16:32:47 Lei Wen wrote:
> >> On Thu, Sep 23, 2010 at 11:29 PM, pieterg <pieterg@gmx.com> wrote:
> >> > On Thursday 23 September 2010 13:32:26 pieterg wrote:
> >> >> On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> >> >> > On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg@gmx.com> wrote:
> >> >> > > In my search for the cause of the huge number of single/double
> >> >> > > bit errors I'm experiencing on colibri pxa320/310 devices, I've
> >> >> > > come across this commit
> >> >
> >> > http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> >> >
> >> >> > > According to the commitlog, it attempts to work around an issue
> >> >> > > regarding non-page-aligned reads.
> >> >> > > The workaround seems to force page-aligned access, by dropping
> >> >> > > the offset within the page (column address bytes).
> >> >> > > However, in my setup (with a jffs2 filesystem on nand),
> >> >> > > non-page-aligned reads never occur, but non-page-aligned writes
> >> >> > > occur very frequently. (during the jffs2 gc).
> >> >> > > These are also affected by this commit, while the commitlog
> >> >> > > does not state whether or not the same issue would occur for
> >> >> > > the program command, and in that case, whether or not the same
> >> >> > > workaround would apply.
> >> >> > >
> >> >> > > I've tried to revert the commit, but unfortunately this doesn't
> >> >> > > reduce the huge number of single/double bit errors (and jffs2
> >> >> > > crc errors as a result) I'm getting.
> >> >> > >
> >> >> > > But having these non-aligned writes during GC, would that
> >> >> > > indicate a problem with my jffs2 image parameters perhaps?
> >> >> > > (though I cannot imagine this could actually cause double bit
> >> >> > > errors)
> >> >> >
> >> >> > It might not be related to the commit above. The NAND controller
> >> >> > will always read the whole page and ignoring the column address,
> >> >> > that patch tries to make less confusion. The offset is actually
> >> >> > handled completely by software (memorized).
> >> >>
> >> >> I can see how the read offset works, but I do not quite see how
> >> >> this would work for writes (which call the same
> >> >> prepare_read_prog_cmd, and have their column address stripped as
> >> >> well).
> >> >> Found out that this happens when writing oob data by the way; these
> >> >> are writes with offset 2048 within the page. Jffs2 does this when
> >> >> writing cleanmarkers.
> >> >
> >> > Tested this, and found out that this commit is actually quite
> >> > essential for writes as well.
> >> > Without it, the OOB data doesn't get written.
> >> > So we can close this part of the topic, commit 7f9938d0 is perfectly
> >> > fine.
> >>
> >> The PXA3xx NAND controller's write semantics are to send a whole page
> >> of data to the NAND flash, together with the page's address. If you set
> >> a non-page-aligned value in ndcr1, pxa3xx_nand will still send the data
> >> to the flash, but the NAND flash would not accept this kind of access,
> >> according to its spec.
> >>
> >> Certainly, if you really need to do this, there are still ways. :)
> >> Sending the RANDOM DATA INPUT command (0x80 + 5-cycle address + 0x85 +
> >> 2-cycle column address) + 0x10 would do this.
> >> But it seems the pxa310 cannot do that; it is supported by the newer
> >> silicon in pxa168 or mmp2.
> >>
> >> >> I could identify about 10 eraseblocks with pages which produce
> >> >> single/double bit errors.
> >> >> After I marked them bad (manually), I've seen no more bit errors,
> >> >> and the jffs2 rootfs has remained perfectly healthy.
> >> >
> >> > Turned out to be a short-term solution.
> >> > After a while I got more double-bit errors, and ended up bad-marking
> >> > a dozen or so other eraseblocks, and it does not seem to stop.
> >> >
> >> > Strangest thing is that when I write a new jffs2 image with uboot
> >> > (nand erase, nand write) or with the kernel (flash_eraseall,
> >> > nandwrite), it never contains any biterrors when I mount it.
> >> > Only after the filesystem has been mounted, gets modified, and then
> >> > after the first reboot, the biterrors are there.
> >>
> >> You may notice that when a new file system is mounted, the flash is
> >> only read by the controller.
> >> This means your uboot is all right for writing, and your kernel is also
> >> OK for reading.
> >> Your driver's write function, though, may be broken. Timing? Not so
> >> sure...
> >>
> >> > One other issue which I noticed because besides double bit errors I
> >> > get many single bit errors as well; the ERR_SBERR is never cleared.
> >> > ERR_DBERR is cleared to ERR_NONE in two locations, but ERR_SBERR is
> >> > not. (probably in order to allow pxa3xx_nand_ecc_correct to pick it
> >> > up) However, I've seen that the retcode could still be ERR_SBERR in
> >> > pxa3xx_nand_waitfunc, causing an erase error to be assumed, as a
> >> > result all eraseblocks in the partition ended up being marked bad in
> >> > a loop, till there were no more remaining eraseblocks.
> >> > I guess ERR_SBERR should probably be ignored in
> >> > pxa3xx_nand_waitfunc?
> >>
> >> Well, although ERR_SBERR indicates that the error can be corrected by
> >> the NAND controller, it still makes sense to report it to the upper
> >> level. A FS like UBIFS could use this information to do flash data
> >> integrity maintenance. This is already fixed in my patch set, which
> >> was sent a month ago.
> >>
> >> > That's what I did in the remainder of my tests (after having
> >> > unmarked the blocks that were wrongly marked bad) so I think this
> >> > issue did not contribute to my biterror problems.
> >>
> >> Bit errors may be caused by timing, bad blocks... I think you'd better
> >> use the mtd tests built into the Linux kernel to make sure the timing is
> >> all right.
> >
> > Which mtd test in particular?
> > I've run most tests, without any errors:
> >
> > -oobtest (complains only about read-past-oob-size not returning a
> > proper error)
> > -pagetest
> > -subpagetest
> > -readtest
> > -speedtest
> >
> > Not even a single bit error during any of those tests.
>
> That is so weird...
> Is your jffs2 image built correctly? Are the page size and block size set right?

I think so.
--eraseblock=0x20000 --pad --no-cleanmarkers
I'm not specifying a --pagesize because as far as I understand, this is not 
related to the flash page. So the default (4K) should be fine.

But just to rule out any mkfs.jffs2 related problems, I've also tried to 
mount an empty partition, and copy my rootfs contents into it.
That results in the exact same kind of problems.

> You should know that if you write twice to one page, you can also see
> double-bit or single-bit errors, and that has nothing to do with bad
> blocks or timing. This could explain why all of the mtd tests passed.

Yes, it very much looks like that.
After the initial writing of the image and mounting the filesystem, 
everything is still fine. But after making changes to the mounted 
filesystem, the problems start.
But I cannot see how/why jffs2 would decide to write without erasing first.
The problems start in the first eraseblock after the last one written with 
nandwrite.
That block was definitely erased with flash_eraseall (I obviously erased the 
entire partition), no cleanmarkers were written, so even an immediate page 
write by jffs2 would be fine.

(this is not only a jffs2 issue by the way, tried ubifs as well, though I 
did not test it as thoroughly as jffs2; as soon as I saw the first double 
bit error with ubifs I assumed the situation was the same and I could just 
as well stick to jffs2 to debug this problem)

Rgds, Pieter


* pxa3xx_nand issues
  2010-09-27 13:50             ` pieterg
@ 2010-09-27 17:39               ` pieterg
  2010-10-01  0:15                 ` Marek Vasut
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-09-27 17:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 27 September 2010 15:50:50 pieterg wrote:
> On Monday 27 September 2010 14:22:37 Lei Wen wrote:
> > Is your jffs2 image built correctly? Are the page size and block size set right?
>
> I think so.
> --eraseblock=0x20000 --pad --no-cleanmarkers

Well... that caused most of my problems.
--pad means pad to eraseblock size, not pagesize.
So the first writes in the new filesystem would cause pages in that last 
eraseblock to be overwritten.
With SLC I always got away with that, but with MLC, clearly not.
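
The effect of --pad described above can be put into numbers with a small
Python sketch. This is only an illustration under stated assumptions: the
eraseblock size is the 0x20000 from the mkfs.jffs2 options, the 2048-byte
page size is assumed from the OOB writes at offset 2048 mentioned earlier
in the thread, the image length is invented, and the flashing tool is
assumed to program every page of the image, padding included.

```python
ERASEBLOCK = 0x20000  # from --eraseblock=0x20000 in this thread
PAGE = 2048           # assumption: 2 KiB flash pages


def round_up(n, unit):
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


def padded_pages(image_len):
    """Pages in the last eraseblock that contain nothing but padding.

    --pad extends the image to an eraseblock boundary, so these pages get
    programmed while flashing even though they carry no file system data.
    """
    data_end = round_up(image_len, PAGE)
    pad_end = round_up(image_len, ERASEBLOCK)
    return (pad_end - data_end) // PAGE


# Hypothetical image: 10 full eraseblocks plus 5000 bytes of data.
image_len = 10 * ERASEBLOCK + 5000

print(ERASEBLOCK // PAGE)       # pages per eraseblock
print(padded_pages(image_len))  # padding-only pages in the last eraseblock
```

When jffs2 later continues writing in that last eraseblock, each of those
padding-only pages is programmed a second time without an erase in
between.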

(now I'm still left with lots of singlebit errors, and occasional jffs2 CRC 
errors, pxa3xx_nand was obviously not meant to deal with such low quality 
MLC, no action is taken on bit errors, but at least everything can be 
explained now)

Thanks everybody for helping me find the cause of this problem.

Rgds, Pieter


* pxa3xx_nand issues
  2010-09-27 17:39               ` pieterg
@ 2010-10-01  0:15                 ` Marek Vasut
  2010-10-01  6:55                   ` pieterg
  0 siblings, 1 reply; 15+ messages in thread
From: Marek Vasut @ 2010-10-01  0:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 27 September 2010 19:39:04 pieterg wrote:
> On Monday 27 September 2010 15:50:50 pieterg wrote:
> > On Monday 27 September 2010 14:22:37 Lei Wen wrote:
> > > Is your jffs2 image built correctly? Are the page size and block size set right?
> > 
> > I think so.
> > --eraseblock=0x20000 --pad --no-cleanmarkers
> 
> Well... that caused most of my problems.
> --pad means pad to eraseblock size, not pagesize.
> So the first writes in the new filesystem would cause pages in that last
> eraseblock to be overwritten.
> With SLC I always got away with that, but with MLC, clearly not.
> 
> (now I'm still left with lots of singlebit errors, and occasional jffs2 CRC
> errors, pxa3xx_nand was obviously not meant to deal with such low quality
> MLC, no action is taken on bit errors, but at least everything can be
> explained now)
> 
> Thanks everybody for helping me find the cause of this problem.

So this is closed?

btw, won't ubifs be a better choice for such a big flash?

Cheers
> 
> Rgds, Pieter


* pxa3xx_nand issues
  2010-10-01  0:15                 ` Marek Vasut
@ 2010-10-01  6:55                   ` pieterg
  2010-10-01  7:25                     ` Marek Vasut
  0 siblings, 1 reply; 15+ messages in thread
From: pieterg @ 2010-10-01  6:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 01 October 2010 02:15:09 Marek Vasut wrote:
> On Mon 27 September 2010 19:39:04 pieterg wrote:
> > On Monday 27 September 2010 15:50:50 pieterg wrote:
> > > On Monday 27 September 2010 14:22:37 Lei Wen wrote:
> > > > Is your jffs2 image built correctly? Are the page size and block
> > > > size set right?
> > >
> > > I think so.
> > > --eraseblock=0x20000 --pad --no-cleanmarkers
> >
> > Well... that caused most of my problems.
> > --pad means pad to eraseblock size, not pagesize.
> > So the first writes in the new filesystem would cause pages in that
> > last eraseblock to be overwritten.
> > With SLC I always got away with that, but with MLC, clearly not.
> >
> > (now I'm still left with lots of singlebit errors, and occasional jffs2
> > CRC errors, pxa3xx_nand was obviously not meant to deal with such low
> > quality MLC, no action is taken on bit errors, but at least everything
> > can be explained now)
> >
> > Thanks everybody for helping me find the cause of this problem.
>
> So this is closed?

Yes, I guess what remains (many single bit errors, and occasional double bit 
errors) is 'normal' for this type of MLC NAND.
Still not happy with it, especially since there is no way to recover when 
double bit errors occur, but for now this will have to do.

> btw, won't ubifs be a better choice for such a big flash?

Certainly, it's on my list (after I've updated to the latest u-boot to get 
ubi support there as well)

Rgds, Pieter


* pxa3xx_nand issues
  2010-10-01  6:55                   ` pieterg
@ 2010-10-01  7:25                     ` Marek Vasut
  0 siblings, 0 replies; 15+ messages in thread
From: Marek Vasut @ 2010-10-01  7:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 1 October 2010 08:55:52 pieterg wrote:
> On Friday 01 October 2010 02:15:09 Marek Vasut wrote:
> > On Mon 27 September 2010 19:39:04 pieterg wrote:
> > > On Monday 27 September 2010 15:50:50 pieterg wrote:
> > > > On Monday 27 September 2010 14:22:37 Lei Wen wrote:
> > > > > Is your jffs2 image built correctly? Are the page size and block
> > > > > size set right?
> > > > 
> > > > I think so.
> > > > --eraseblock=0x20000 --pad --no-cleanmarkers
> > > 
> > > Well... that caused most of my problems.
> > > --pad means pad to eraseblock size, not pagesize.
> > > So the first writes in the new filesystem would cause pages in that
> > > last eraseblock to be overwritten.
> > > With SLC I always got away with that, but with MLC, clearly not.
> > > 
> > > (now I'm still left with lots of singlebit errors, and occasional jffs2
> > > CRC errors, pxa3xx_nand was obviously not meant to deal with such low
> > > quality MLC, no action is taken on bit errors, but at least everything
> > > can be explained now)
> > > 
> > > Thanks everybody for helping me find the cause of this problem.
> > 
> > So this is closed?
> 
> Yes, I guess what remains (many single bit errors, and occasional double
> bit errors) is 'normal' for this type of MLC NAND.
> Still not happy with it, especially since there is no way to recover when
> double bit errors occur, but for now this will have to do.
> 
> > btw, won't ubifs be a better choice for such a big flash?
> 
> Certainly, it's on my list (after I've updated to the latest u-boot to get
> ubi support there as well)

I'm currently fixing 2010.09, there were some changes that broke support for 
every pxa (or rather every ARM) machine. I have two machines fixed, but those are 
still pxa2xx. PXA3xx is on the list and thanks to this, pxa3xx should soon also 
hit mainline (well ... when, that depends on Wolfgang).

Cheers
> 
> Rgds, Pieter


end of thread, other threads:[~2010-10-01  7:25 UTC | newest]

Thread overview: 15+ messages
2010-09-22 17:12 pxa3xx_nand issues pieterg
2010-09-23  6:05 ` Eric Miao
2010-09-23 11:32   ` pieterg
2010-09-23 15:29     ` pieterg
2010-09-23 18:03       ` Matt Reimer
2010-09-25  2:50       ` Haojian Zhuang
2010-09-27 11:38         ` pieterg
2010-09-26 14:32       ` Lei Wen
2010-09-27 11:54         ` pieterg
2010-09-27 12:22           ` Lei Wen
2010-09-27 13:50             ` pieterg
2010-09-27 17:39               ` pieterg
2010-10-01  0:15                 ` Marek Vasut
2010-10-01  6:55                   ` pieterg
2010-10-01  7:25                     ` Marek Vasut
