All of lore.kernel.org
 help / color / mirror / Atom feed
* Regarding latest EUCLEAN/bitflip_threshold patchset
@ 2012-05-09 10:26 Shmulik Ladkani
  2012-05-11 11:51 ` Artem Bityutskiy
  0 siblings, 1 reply; 8+ messages in thread
From: Shmulik Ladkani @ 2012-05-09 10:26 UTC (permalink / raw)
  To: Mike Dunn; +Cc: linux-mtd, dedekind1

Hi Mike,

I noticed 'nand_do_read_oob' explicitly tests 'mtd->ecc_stats.corrected'
and returns EUCLEAN - as it was not ported to use the new
'bitflip_threshold'.

From nand_base.c:

	if (mtd->ecc_stats.failed - stats.failed)
		return -EBADMSG;

	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;

- May drivers increment mtd->ecc_stats.{corrected,failed} during their
  ecc.read_oob() call?

- If so, can we (should we?) report EUCLEAN according to the
  bitflip_threshold in this case?

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-09 10:26 Regarding latest EUCLEAN/bitflip_threshold patchset Shmulik Ladkani
@ 2012-05-11 11:51 ` Artem Bityutskiy
  2012-05-12 18:37   ` Mike Dunn
  0 siblings, 1 reply; 8+ messages in thread
From: Artem Bityutskiy @ 2012-05-11 11:51 UTC (permalink / raw)
  To: Shmulik Ladkani, Mike Dunn; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 324 bytes --]

On Wed, 2012-05-09 at 13:26 +0300, Shmulik Ladkani wrote:
> Hi Mike,
> 
> I noticed 'nand_do_read_oob' explicitly tests 'mtd->ecc_stats.corrected'
> and returns EUCLEAN - as it was not ported to use the new
> 'bitflip_threshold'.

Good point. Mike did you miss this function?

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-11 11:51 ` Artem Bityutskiy
@ 2012-05-12 18:37   ` Mike Dunn
  2012-05-12 20:13     ` Shmulik Ladkani
  2012-05-14 19:28     ` Mike Dunn
  0 siblings, 2 replies; 8+ messages in thread
From: Mike Dunn @ 2012-05-12 18:37 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Shmulik Ladkani


On 05/11/2012 04:51 AM, Artem Bityutskiy wrote:
> On Wed, 2012-05-09 at 13:26 +0300, Shmulik Ladkani wrote:
>> Hi Mike,
>>
>> I noticed 'nand_do_read_oob' explicitly tests 'mtd->ecc_stats.corrected'
>> and returns EUCLEAN - as it was not ported to use the new
>> 'bitflip_threshold'.
> 
> Good point. Mike did you miss this function?


No, I deliberately left it as-is.  Maybe a litle lazy or sloppy on my part, but
ignoring it is of no consequence (see below).

Shmulik, your posts still don't get to my inbox.  Still wondering why.  Anyway,
pasting from the archive...


> From nand_base.c:
>
> 	if (mtd->ecc_stats.failed - stats.failed)
> 		return -EBADMSG;
>
> 	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
>
> - May drivers increment mtd->ecc_stats.{corrected,failed} during their
>   ecc.read_oob() call?


Currently no nand drivers increment stats.corrected for oob-only reads.  Since
nand_do_read_oob() does not read page data, stats never increment and -EUCLEAN
is never returned.  To avoid complicating the issue, I ignored the case of
reading oob-only.


> - If so, can we (should we?) report EUCLEAN according to the
>   bitflip_threshold in this case?


I guess it depends on how widespread is the desire or capability of performing
ecc on oob-only reads.  The new diskonchip devices (docg3, docg4) are capable of
performing ecc on oob-only data.  These can do one bit corrections over 15 (of
the 16 total) oob bytes using the hamming algorithm (though neither driver
supports it currently).  But since in this case only one bitflip can be
corrected, it will always be below bitflip_threshold.  Then there's the question
of how do you interpret uncorrectible bitflips vis-a-vis eraseblock health when
using a weaker ecc algorithm for oob-only.

These questions are currently all theoretical.  I think the threshold test
should be removed, and replaced with 'return 0', at least for now.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-12 18:37   ` Mike Dunn
@ 2012-05-12 20:13     ` Shmulik Ladkani
  2012-05-16  6:49       ` Brian Norris
  2012-05-14 19:28     ` Mike Dunn
  1 sibling, 1 reply; 8+ messages in thread
From: Shmulik Ladkani @ 2012-05-12 20:13 UTC (permalink / raw)
  To: Mike Dunn, Brian Norris; +Cc: linux-mtd, dedekind1

Hi,

On Sat, 12 May 2012 11:37:37 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> On 05/11/2012 04:51 AM, Artem Bityutskiy wrote:
> > On Wed, 2012-05-09 at 13:26 +0300, Shmulik Ladkani wrote:
> >> Hi Mike,
> >>
> >> I noticed 'nand_do_read_oob' explicitly tests 'mtd->ecc_stats.corrected'
> >> and returns EUCLEAN - as it was not ported to use the new
> >> 'bitflip_threshold'.
> > 
> > Good point. Mike did you miss this function?
> 
> No, I deliberately left it as-is.  Maybe a litle lazy or sloppy on my part, but
> ignoring it is of no consequence (see below).
> 
> Shmulik, your posts still don't get to my inbox.  Still wondering why.  Anyway,
> pasting from the archive...

No idea why this happens, others seem to get my posts...

> > From nand_base.c:
> >
> > 	if (mtd->ecc_stats.failed - stats.failed)
> > 		return -EBADMSG;
> >
> > 	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
> >
> > - May drivers increment mtd->ecc_stats.{corrected,failed} during their
> >   ecc.read_oob() call?
> 
> Currently no nand drivers increment stats.corrected for oob-only reads.  Since
> nand_do_read_oob() does not read page data, stats never increment and -EUCLEAN
> is never returned.  To avoid complicating the issue, I ignored the case of
> reading oob-only.
> 
> > - If so, can we (should we?) report EUCLEAN according to the
> >   bitflip_threshold in this case?
> 
> I guess it depends on how widespread is the desire or capability of performing
> ecc on oob-only reads.  The new diskonchip devices (docg3, docg4) are capable of
> performing ecc on oob-only data.  These can do one bit corrections over 15 (of
> the 16 total) oob bytes using the hamming algorithm (though neither driver
> supports it currently).  But since in this case only one bitflip can be
> corrected, it will always be below bitflip_threshold.  Then there's the question
> of how do you interpret uncorrectible bitflips vis-a-vis eraseblock health when
> using a weaker ecc algorithm for oob-only.

I see.
So the current bitflip_threshold scheme is probably not applicable to
'nand_do_read_oob' - because the strength over the OOB would probably
differ from the page's ECC strength.

> These questions are currently all theoretical.  I think the threshold test
> should be removed, and replaced with 'return 0', at least for now.

Well, I was also surprised to see that 'nand_do_read_oob' may return
EUCLEAN or EBADMSG at all.

Digging further, I found out it was a relatively recent addition:
[041e4575 mtd: nand: handle ECC errors in OOB] by Brian Norris.

Brian, care to elaborate regarding 041e4575, and comment how do you
think it should be ported to the new bitflip_threshold mechanism, if at
all?

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-12 18:37   ` Mike Dunn
  2012-05-12 20:13     ` Shmulik Ladkani
@ 2012-05-14 19:28     ` Mike Dunn
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Dunn @ 2012-05-14 19:28 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Shmulik Ladkani

Hi Shmulik, still no joy.  Pasting from the archives again...


> No idea why this happens, others seem to get my posts...


Clearly.  Otherwise you'd be talking to yourself a lot :)


> I see.
> So the current bitflip_threshold scheme is probably not applicable to
> 'nand_do_read_oob' - because the strength over the OOB would probably
> differ from the page's ECC strength.


Correct.  So if for example the diskonchip docg4 implements hamming in oob, and
there are two bitflips, that's uncorrectible.  But two bitflips would not be
considered indicative of a block going bad for a full page read.


> Well, I was also surprised to see that 'nand_do_read_oob' may return
> EUCLEAN or EBADMSG at all.


Yeah, I chose to just gloss over that.


> Digging further, I found out it was a relatively recent addition:
> [041e4575 mtd: nand: handle ECC errors in OOB] by Brian Norris.


Hmmm...

Thanks,
Mike

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-12 20:13     ` Shmulik Ladkani
@ 2012-05-16  6:49       ` Brian Norris
  2012-05-16 18:09         ` Mike Dunn
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Norris @ 2012-05-16  6:49 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: linux-mtd, Mike Dunn, dedekind1

Hi,

On Sat, May 12, 2012 at 1:13 PM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> On Sat, 12 May 2012 11:37:37 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
>> On 05/11/2012 04:51 AM, Artem Bityutskiy wrote:
>> > From nand_base.c:
>> >
>> >     if (mtd->ecc_stats.failed - stats.failed)
>> >             return -EBADMSG;
>> >
>> >     return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
>> >
>> > - May drivers increment mtd->ecc_stats.{corrected,failed} during their
>> >   ecc.read_oob() call?
>>
>> Currently no nand drivers increment stats.corrected for oob-only reads.  Since
>> nand_do_read_oob() does not read page data, stats never increment and -EUCLEAN
>> is never returned.  To avoid complicating the issue, I ignored the case of
>> reading oob-only.

My out-of-tree driver increments ecc_stats.corrected.

>> > - If so, can we (should we?) report EUCLEAN according to the
>> >   bitflip_threshold in this case?
>>
>> I guess it depends on how widespread is the desire or capability of performing
>> ecc on oob-only reads.  The new diskonchip devices (docg3, docg4) are capable of
>> performing ecc on oob-only data.  These can do one bit corrections over 15 (of
>> the 16 total) oob bytes using the hamming algorithm (though neither driver
>> supports it currently).  But since in this case only one bitflip can be
>> corrected, it will always be below bitflip_threshold.  Then there's the question
>> of how do you interpret uncorrectible bitflips vis-a-vis eraseblock health when
>> using a weaker ecc algorithm for oob-only.
>
> I see.
> So the current bitflip_threshold scheme is probably not applicable to
> 'nand_do_read_oob' - because the strength over the OOB would probably
> differ from the page's ECC strength.
>
>> These questions are currently all theoretical.  I think the threshold test
>> should be removed, and replaced with 'return 0', at least for now.
>
> Well, I was also surprised to see that 'nand_do_read_oob' may return
> EUCLEAN or EBADMSG at all.
>
> Digging further, I found out it was a relatively recent addition:
> [041e4575 mtd: nand: handle ECC errors in OOB] by Brian Norris.
>
> Brian, care to elaborate regarding 041e4575, and comment how do you
> think it should be ported to the new bitflip_threshold mechanism, if at
> all?

Hmm, well 041e4575 was designed without much of a window into how
others really needed it, as I didn't know of others who had the same
features. My hardware has its own threshold features that can be used
to mask bitflips; it has ECC that covers OOB at the same time as the
page data; when reading OOB only, it actually reads the page data as
well, in order to perform ECC properly. So when I report bitflips from
read_oob, I'm reporting the bitflips for the entire page+OOB sector.
But due to my hardware-based threshold, this only is reported for a
high number of bitflips.

So, I'm not sure how to properly reconcile the new threshold code, the
nand_do_read_oob() EUCLEAN and EBADMSG, and various schemes for
OOB-only ECC (or the common case of no ECC for OOB-only). I'll try to
give this some more thought and get back to you. But please comment if
my feedback so far stirs any ideas with you guys. Perhaps 041e4575 was
not as clean as I thought in the first place.

Brian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-16  6:49       ` Brian Norris
@ 2012-05-16 18:09         ` Mike Dunn
  2012-05-17  9:22           ` Shmulik Ladkani
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Dunn @ 2012-05-16 18:09 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-mtd, Shmulik Ladkani, dedekind1

On 05/15/2012 11:49 PM, Brian Norris wrote:
> 
> My out-of-tree driver increments ecc_stats.corrected.
> 


I should have said "currently no in-tree driver incremenst stats" :)


> Hmm, well 041e4575 was designed without much of a window into how
> others really needed it, as I didn't know of others who had the same
> features. My hardware has its own threshold features that can be used
> to mask bitflips; it has ECC that covers OOB at the same time as the
> page data; when reading OOB only, it actually reads the page data as
> well, in order to perform ECC properly. So when I report bitflips from
> read_oob, I'm reporting the bitflips for the entire page+OOB sector.
> But due to my hardware-based threshold, this only is reported for a
> high number of bitflips.


Ha!  This complicates the issue even more, since reading the whole page *will*
give you useful information on block wear.

Based on my (admittedly liimited) knowledge, my impression is that we should
just not bother with EUCLEAN for oob-only reads.  Reading the whole page for
oob-only ops is a special case, and using a separate ECC scheme intended for
oob-only does not provide useful bitflip data.


> 
> So, I'm not sure how to properly reconcile the new threshold code, the
> nand_do_read_oob() EUCLEAN and EBADMSG, and various schemes for
> OOB-only ECC (or the common case of no ECC for OOB-only). I'll try to
> give this some more thought and get back to you. But please comment if
> my feedback so far stirs any ideas with you guys. Perhaps 041e4575 was
> not as clean as I thought in the first place.


It's hard to be clean in a messy environment :)

Thanks Brian,
Mike

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Regarding latest EUCLEAN/bitflip_threshold patchset
  2012-05-16 18:09         ` Mike Dunn
@ 2012-05-17  9:22           ` Shmulik Ladkani
  0 siblings, 0 replies; 8+ messages in thread
From: Shmulik Ladkani @ 2012-05-17  9:22 UTC (permalink / raw)
  To: Mike Dunn, Brian Norris; +Cc: linux-mtd, dedekind1

Brian, Mike, thanks,

On Wed, 16 May 2012 11:09:21 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> On 05/15/2012 11:49 PM, Brian Norris wrote:
> > Hmm, well 041e4575 was designed without much of a window into how
> > others really needed it, as I didn't know of others who had the same
> > features. My hardware has its own threshold features that can be used
> > to mask bitflips; it has ECC that covers OOB at the same time as the
> > page data; when reading OOB only, it actually reads the page data as
> > well, in order to perform ECC properly. So when I report bitflips from
> > read_oob, I'm reporting the bitflips for the entire page+OOB sector.
> > But due to my hardware-based threshold, this only is reported for a
> > high number of bitflips.
> 
> 
> Ha!  This complicates the issue even more, since reading the whole page *will*
> give you useful information on block wear.
> 
> Based on my (admittedly liimited) knowledge, my impression is that we should
> just not bother with EUCLEAN for oob-only reads.  Reading the whole page for
> oob-only ops is a special case, and using a separate ECC scheme intended for
> oob-only does not provide useful bitflip data.

So leave-as-is seems most reasonable, given that:
- oob-only reads are not yet accounted vis-a-vis eraseblock health
- in-tree drivers' ECC doesn't cover the OOB (they don't increment
  ecc_stats on a chip->ecc.read_oob call). If they do, a supporting
  framework might be implemented
- Brian's out-of-tree driver isn't affected ;)

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-17  9:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-09 10:26 Regarding latest EUCLEAN/bitflip_threshold patchset Shmulik Ladkani
2012-05-11 11:51 ` Artem Bityutskiy
2012-05-12 18:37   ` Mike Dunn
2012-05-12 20:13     ` Shmulik Ladkani
2012-05-16  6:49       ` Brian Norris
2012-05-16 18:09         ` Mike Dunn
2012-05-17  9:22           ` Shmulik Ladkani
2012-05-14 19:28     ` Mike Dunn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.