* ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-03 21:30 UTC
  To: Artem Bityutskiy; +Cc: linux-mtd

Hi,

There seems to be some issue with NAND on my OMAP3 board that causes
CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
in UBI that makes it loop forever (or for a very long time), printing this:

uncorrectable error :
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
from PEB 0:512, read 512 bytes
uncorrectable error :
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
from PEB 68:512, read 512 bytes
UBI: run torture test for PEB 68
UBI: PEB 68 passed torture test, do not mark it a bad


Here is the full log of a one-minute session, after which I killed the power:
http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup


Gražvydas


* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-04  7:29 UTC
  To: Grazvydas Ignotas; +Cc: linux-mtd

Hmm, could you please enable UBI debugging and provide me the logs? See
here for some hints:
http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-04 13:07 UTC
  To: dedekind1; +Cc: linux-mtd

On Thu, Nov 4, 2010 at 9:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> Hmm, could you please enable UBI debugging and provide me the logs? See
> here for some hints:
> http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport

done:
http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2


* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-13 12:37 UTC
  To: Grazvydas Ignotas; +Cc: linux-mtd

On Thu, 2010-11-04 at 15:07 +0200, Grazvydas Ignotas wrote:
> done:
> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2

But would it be possible to enable all UBI debugging messages?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-13 13:15 UTC
  To: Grazvydas Ignotas; +Cc: linux-mtd

On Sat, 2010-11-13 at 14:37 +0200, Artem Bityutskiy wrote:
> But would it be possible to enable all UBI debugging messages?

While trying to figure out what is happening in your system, I realized
there is one possible scenario which may confuse UBI. I've added a patch
below. This probably won't fix your issue (but you could try it); I need
more time to think about what was happening. But a log with all messages
(not only I/O) would help. Thanks.

From 703ba5f120644fefef3cfed46c0d8ccf6a15b4ee Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Sat, 13 Nov 2010 15:08:29 +0200
Subject: [PATCH] UBI: improve UBI robustness

Before reading data from the flash, deliberately corrupt the buffer
the data will be read into. The idea is to fix the following possible
situation:

1. The buffer contains data from previous operation, e.g., read from
   another PEB previously. The data looks like expected, e.g., if we
   just do not read anything and return - the caller would not
   notice this. E.g., if we are reading a VID header, the buffer may
   contain a valid VID header from another PEB.
2. The driver is buggy and returns us success or -EBADMSG or
   -EUCLEAN, but it does not actually put any data to the buffer.

This may confuse UBI or upper layers - they may think the buffer
contains valid data while in fact it is just old data. This is
especially possible because UBI (and UBIFS) relies on CRC, and
treats data as correct even in case of ECC errors if the CRC is
correct.

Try to prevent this situation by changing the first byte of the
buffer.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 drivers/mtd/ubi/io.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index c2960ac..9ab1a33 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -146,6 +146,28 @@ int ubi_io_read(const struct ubi_device *ubi, void *buf, int pnum, int offset,
 	if (err)
 		return err;
 
+	/*
+	 * Deliberately corrupt the buffer to improve robustness. Indeed, if we
+	 * do not do this, the following may happen:
+	 * 1. The buffer contains data from previous operation, e.g., read from
+	 *    another PEB previously. The data looks like expected, e.g., if we
+	 *    just do not read anything and return - the caller would not
+	 *    notice this. E.g., if we are reading a VID header, the buffer may
+	 *    contain a valid VID header from another PEB.
+	 * 2. The driver is buggy and returns us success or -EBADMSG or
+	 *    -EUCLEAN, but it does not actually put any data to the buffer.
+	 *
+	 * This may confuse UBI or upper layers - they may think the buffer
+	 * contains valid data while in fact it is just old data. This is
+	 * especially possible because UBI (and UBIFS) relies on CRC, and
+	 * treats data as correct even in case of ECC errors if the CRC is
+	 * correct.
+	 *
+	 * Try to prevent this situation by changing the first byte of the
+	 * buffer.
+	 */
+	*((uint8_t *)buf) ^= 0xFF;
+
 	addr = (loff_t)pnum * ubi->peb_size + offset;
 retry:
 	err = ubi->mtd->read(ubi->mtd, addr, len, &read, buf);
-- 
1.7.2.3

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
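
The effect of the one-byte poisoning in the patch above can be
illustrated with a small user-space sketch. This is only a toy model and
not UBI code: struct fake_hdr, buggy_read(), read_hdr() and
crc32_simple() are hypothetical names invented for the illustration, and
the bitwise CRC-32 merely stands in for the kernel's CRC helper. The
sketch shows that a stale, CRC-valid header survives a read that
silently writes nothing, unless the first byte is flipped beforehand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320); illustration only. */
static uint32_t crc32_simple(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical on-flash header: a payload protected by its own CRC. */
struct fake_hdr {
	uint8_t payload[12];
	uint32_t crc;
};

static void fake_hdr_fill(struct fake_hdr *h, uint8_t seed)
{
	memset(h->payload, seed, sizeof(h->payload));
	h->crc = crc32_simple(h->payload, sizeof(h->payload));
}

static int fake_hdr_valid(const struct fake_hdr *h)
{
	return h->crc == crc32_simple(h->payload, sizeof(h->payload));
}

/* Hypothetical buggy driver: reports success but never touches the buffer. */
static int buggy_read(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

/* Read wrapper; with poison != 0 it flips the first byte before the read. */
static int read_hdr(struct fake_hdr *h, int poison)
{
	if (poison)
		*((uint8_t *)h) ^= 0xFF;
	return buggy_read(h, sizeof(*h));
}

/* Returns 1 if a stale header is still accepted after a no-op read. */
static int stale_header_accepted(int poison)
{
	struct fake_hdr h;

	fake_hdr_fill(&h, 0x5A);	/* leftover from a "previous PEB" */
	if (read_hdr(&h, poison) != 0)
		return 0;
	return fake_hdr_valid(&h);
}
```

Calling stale_header_accepted(0) models the unpatched path, where the
leftover header still passes its CRC check; stale_header_accepted(1)
models the patched path, where the flipped byte makes the CRC check fail
and the stale data is rejected.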


* Re: ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-13 14:23 UTC
  To: dedekind1; +Cc: linux-mtd

On Sat, Nov 13, 2010 at 3:15 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> While trying to figure out what is happening in your system, I realized
> one possible scenario which may confuse UBI. I've added a patch below.
> This probably won't fix your issue (but you could try), I need more time
> to think about what was happening. But a log with all messages (not only
> I/O) would help. Thanks.

Well, I think I already know what's wrong with my driver: subpage
reads are broken in it. So UBI tries to read a subpage, the driver fails
there, then UBI runs a torture test on the full PEB that passes (because
page reads work right), marks that PEB as good and retries the subpage
read, which fails again, and the story repeats. Does that sound like
a reasonable scenario, or do you still want more debugging logs?
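
The cycle described above can be sketched as a toy model. This is not
UBI's actual code: model_read(), model_torture_passes() and
read_with_torture() are hypothetical names, the 2048-byte page size is
an assumption (only the 512-byte subpage read appears in the log), and
the attempt limit exists only so that the model terminates.

```c
#include <stdbool.h>

#define ECC_ERROR (-74)	/* the "error -74 (ECC error)" from the log */

/* Hypothetical driver model: full-page reads work, subpage reads always fail. */
static int model_read(int len, int page_size)
{
	return (len < page_size) ? ECC_ERROR : 0;
}

/* Hypothetical torture test: exercises the block with full-page reads only. */
static bool model_torture_passes(int page_size)
{
	return model_read(page_size, page_size) == 0;
}

/*
 * Read len bytes, torturing the block and retrying after each failure.
 * Returns the attempt number that succeeded, or -1 if the block failed
 * the torture test or max_attempts ran out.
 */
static int read_with_torture(int len, int page_size, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (model_read(len, page_size) == 0)
			return attempt;
		if (!model_torture_passes(page_size))
			return -1;	/* block would be marked bad; stop */
		/* torture passed: block is considered good, so retry */
	}
	return -1;
}
```

In this model read_with_torture(512, 2048, 1000) never succeeds and only
stops because of the attempt limit, while read_with_torture(2048, 2048,
1000) succeeds on the first attempt. Without such a limit, the subpage
case would repeat the read and torture steps indefinitely, which matches
the repeating messages in the log.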


* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-14  7:50 UTC
  To: Grazvydas Ignotas; +Cc: linux-mtd

On Sat, 2010-11-13 at 16:23 +0200, Grazvydas Ignotas wrote:
> Well, I think I already know what's wrong with my driver: subpage
> reads are broken in it. So UBI tries to read a subpage, the driver fails
> there, then UBI runs a torture test on the full PEB that passes (because
> page reads work right), marks that PEB as good and retries the subpage
> read, which fails again, and the story repeats. Does that sound like
> a reasonable scenario, or do you still want more debugging logs?

Yeah, obviously you have driver problems; I'm just interested in
improving UBI's resilience.
 
-- 
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
