* Re: UBI-FS Master Node failure
       [not found] <5587c30b.a8c1420a.4df03.ffffae2aSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2015-06-22  9:01 ` Richard Weinberger
  2015-06-22  9:20   ` David J Myers
       [not found]   ` <5587d358.69d0b40a.4fc7.ffff87eeSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Richard Weinberger @ 2015-06-22  9:01 UTC (permalink / raw)
  To: David J Myers; +Cc: linux-mtd

On Mon, Jun 22, 2015 at 10:01 AM, David J Myers
<david.myers@amg-panogenics.com> wrote:
> Guys,
> I have an embedded product running a system based on linux-2.6.29,
> originally from the IC supplier, but patched and modified to our spec.

That's a very old kernel. Did you backport *all* stable patches?

> Recently we have had two units go down with the same UBI-FS Master Node
> failure, both in LEB-2 at slightly different offsets. The console log looks
> like this:-
>
> [    6.645845] UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 2:86016
> [    6.653268] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 2:86016
> [    6.668163] UBIFS error (pid 1): ubifs_scan: LEB 2 scanning failed
> [    6.889661] UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node
> [    6.898497] List of all partitions:
> [    6.902188] 1f00             128 mtdblock0 (driver?)
> [    6.907218] 1f01             768 mtdblock1 (driver?)
> [    6.912314] 1f02             128 mtdblock2 (driver?)
> [    6.917318] 1f03            4096 mtdblock3 (driver?)
> [    6.922395] 1f04            4096 mtdblock4 (driver?)
> [    6.927397] 1f05           65536 mtdblock5 (driver?)
> [    6.932464] 1f06          184320 mtdblock6 (driver?)
> [    6.937455] No filesystem could mount root, tried:  ubifs
> [    6.942988] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>
> I found two patches to fs/ubifs/recovery.c since 2.6.29 which I applied, but
> they did not fix the corrupted flash. These two patches were this one:-

I fear it is not that easy.  Maybe you're facing a different issue.
And if the data is already corrupted, there is no guarantee that a
recent UBIFS can fix it.

-- 
Thanks,
//richard


* RE: UBI-FS Master Node failure
  2015-06-22  9:01 ` UBI-FS Master Node failure Richard Weinberger
@ 2015-06-22  9:20   ` David J Myers
       [not found]   ` <5587d358.69d0b40a.4fc7.ffff87eeSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 4+ messages in thread
From: David J Myers @ 2015-06-22  9:20 UTC (permalink / raw)
  To: 'Richard Weinberger'; +Cc: linux-mtd

>> Guys,
>> I have an embedded product running a system based on linux-2.6.29, 
>> originally from the IC supplier, but patched and modified to our spec.

>That's a very old kernel. Did you backport *all* stable patches?

I only back-ported the two patches as shown previously. These seemed to be the only two relevant patches I could find. Do you know of any other relevant patches?

>> Recently we have had two units go down with the same UBI-FS Master 
>> Node failure, both in LEB-2 at slightly different offsets. The console 
>> log looks like this:-
>>
>> [    6.645845] UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 2:86016
>> [    6.653268] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 2:86016
>> [    6.668163] UBIFS error (pid 1): ubifs_scan: LEB 2 scanning failed
>> [    6.889661] UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node
>> [    6.898497] List of all partitions:
>> [    6.902188] 1f00             128 mtdblock0 (driver?)
>> [    6.907218] 1f01             768 mtdblock1 (driver?)
>> [    6.912314] 1f02             128 mtdblock2 (driver?)
>> [    6.917318] 1f03            4096 mtdblock3 (driver?)
>> [    6.922395] 1f04            4096 mtdblock4 (driver?)
>> [    6.927397] 1f05           65536 mtdblock5 (driver?)
>> [    6.932464] 1f06          184320 mtdblock6 (driver?)
>> [    6.937455] No filesystem could mount root, tried:  ubifs
>> [    6.942988] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>>
>> I found two patches to fs/ubifs/recovery.c since 2.6.29 which I 
>> applied, but they did not fix the corrupted flash. These two patches 
>> were this one:-

>I fear it is not that easy.  Maybe you're facing a different issue.
>And if the data is already corrupted, there is no guarantee that a recent UBIFS can fix it.

I was hoping these patches would recover the corrupt UBIFS, however I'll settle for preventing the same fault occurring in other units. Do you think this problem is fixed in the recent UBIFS implementations? How can I test this?

Many thanks.

> --
> Thanks,
> //richard


* Re: UBI-FS Master Node failure
       [not found]   ` <5587d358.69d0b40a.4fc7.ffff87eeSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2015-06-22  9:26     ` Richard Weinberger
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Weinberger @ 2015-06-22  9:26 UTC (permalink / raw)
  To: David J Myers; +Cc: linux-mtd

On 22.06.2015 at 11:20, David J Myers wrote:
>>> Guys,
>>> I have an embedded product running a system based on linux-2.6.29, 
>>> originally from the IC supplier, but patched and modified to our spec.
> 
>> That's a very old kernel. Did you backport *all* stable patches?
> 
> I only back-ported the two patches as shown previously. These seemed to be the only two relevant patches I could find. Do you know of any other relevant patches?

UBI and UBIFS got a lot of fixes after 2.6.29. All are relevant.

>>> Recently we have had two units go down with the same UBI-FS Master 
>>> Node failure, both in LEB-2 at slightly different offsets. The console 
>>> log looks like this:-
>>>
>>> [    6.645845] UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 2:86016
>>> [    6.653268] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 2:86016
>>> [    6.668163] UBIFS error (pid 1): ubifs_scan: LEB 2 scanning failed
>>> [    6.889661] UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node
>>> [    6.898497] List of all partitions:
>>> [    6.902188] 1f00             128 mtdblock0 (driver?)
>>> [    6.907218] 1f01             768 mtdblock1 (driver?)
>>> [    6.912314] 1f02             128 mtdblock2 (driver?)
>>> [    6.917318] 1f03            4096 mtdblock3 (driver?)
>>> [    6.922395] 1f04            4096 mtdblock4 (driver?)
>>> [    6.927397] 1f05           65536 mtdblock5 (driver?)
>>> [    6.932464] 1f06          184320 mtdblock6 (driver?)
>>> [    6.937455] No filesystem could mount root, tried:  ubifs
>>> [    6.942988] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>>>
>>> I found two patches to fs/ubifs/recovery.c since 2.6.29 which I 
>>> applied, but they did not fix the corrupted flash. These two patches 
>>> were this one:-
> 
>> I fear it is not that easy.  Maybe you're facing a different issue.
>> And if the data is already corrupted, there is no guarantee that a recent UBIFS can fix it.
> 
> I was hoping these patches would recover the corrupt UBIFS, however I'll settle for preventing the same fault occurring in other units. Do you think this problem is fixed in the recent UBIFS implementations? How can I test this?

As written above, UBI and UBIFS had a lot of issues which have since been fixed.
Without a detailed analysis of the corrupted UBIFS I can't say much.

Thanks,
//richard


* UBI-FS Master Node failure
@ 2015-06-22  8:01 David J Myers
  0 siblings, 0 replies; 4+ messages in thread
From: David J Myers @ 2015-06-22  8:01 UTC (permalink / raw)
  To: linux-mtd

Guys,
I have an embedded product running a system based on linux-2.6.29,
originally from the IC supplier, but patched and modified to our spec.
Recently we have had two units go down with the same UBI-FS Master Node
failure, both in LEB-2 at slightly different offsets. The console log looks
like this:-

[    6.645845] UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 2:86016
[    6.653268] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 2:86016
[    6.668163] UBIFS error (pid 1): ubifs_scan: LEB 2 scanning failed
[    6.889661] UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node
[    6.898497] List of all partitions:
[    6.902188] 1f00             128 mtdblock0 (driver?)
[    6.907218] 1f01             768 mtdblock1 (driver?)
[    6.912314] 1f02             128 mtdblock2 (driver?)
[    6.917318] 1f03            4096 mtdblock3 (driver?)
[    6.922395] 1f04            4096 mtdblock4 (driver?)
[    6.927397] 1f05           65536 mtdblock5 (driver?)
[    6.932464] 1f06          184320 mtdblock6 (driver?)
[    6.937455] No filesystem could mount root, tried:  ubifs
[    6.942988] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I found two patches to fs/ubifs/recovery.c since 2.6.29 which I applied, but
they did not fix the corrupted flash. These two patches were this one:-

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index f94ddf7..31d09d1 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -299,6 +299,32 @@ int ubifs_recover_master_node(struct ubifs_info *c)
                      goto out_free;
              }
              memcpy(c->rcvrd_mst_node, c->mst_node, UBIFS_MST_NODE_SZ);
+
+              /*
+              * We had to recover the master node, which means there was an
+              * unclean reboot. However, it is possible that the master node
+              * is clean at this point, i.e., %UBIFS_MST_DIRTY is not set.
+              * E.g., consider the following chain of events:
+              *
+              * 1. UBIFS was cleanly unmounted, so the master node is clean
+              * 2. UBIFS is being mounted R/W and starts changing the master
+              *    node in the first (%UBIFS_MST_LNUM). A power cut happens,
+              *    so this LEB ends up with some amount of garbage at the
+              *    end.
+              * 3. UBIFS is being mounted R/O. We reach this place and
+              *    recover the master node from the second LEB
+              *    (%UBIFS_MST_LNUM + 1). But we cannot update the media
+              *    because we are being mounted R/O. We have to defer the
+              *    operation.
+              * 4. However, this master node (@c->mst_node) is marked as
+              *    clean (since the step 1). And if we just return, the
+              *    mount code will be confused and won't recover the master
+              *    node when it is re-mounter R/W later.
+              *
+              *    Thus, to force the recovery by marking the master node as
+              *    dirty.
+              */
+              c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
       } else {
              /* Write the recovered master node */
              c->max_sqnum = le64_to_cpu(mst->ch.sqnum) - 1;
-- 
1.7.10.2

And this one:-
> diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
> index 5256f42..2c98d77 100644
> --- a/fs/ubifs/recovery.c
> +++ b/fs/ubifs/recovery.c
> @@ -273,7 +273,8 @@ int ubifs_recover_master_node(struct ubifs_info *c)
>                              if (cor1)
>                                     goto out_err;
>                              mst = mst1;
> -                    } else if (offs1 == 0 && offs2 + sz >= c->leb_size) {
> +                    } else if (offs1 == 0 &&
> +                               c->leb_size - offs2 - sz < sz) {
>                              /* 1st LEB was unmapped and written, 2nd not */
>                              if (cor1)
>                                     goto out_err;
>

Please advise.
-	J

