All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 1/2] UBIFS: fix empty log leb error
@ 2014-12-25 12:58 hujianyang
  2014-12-25 13:10 ` [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay hujianyang
  0 siblings, 1 reply; 4+ messages in thread
From: hujianyang @ 2014-12-25 12:58 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd, shengyong1

If first log leb which is indicated by @log_lnum in mst_node
is empty, current UBIFS will stop finding cs_node and directly
mount the partition.

This patch fix this problem by checking c->cs_sqnum to
guarantee the empty leb is not first log leb and return an
error if the first log leb is incorrectly empty.

Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 fs/ubifs/replay.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 3187925..f13f4b2 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -846,7 +846,14 @@ static int replay_log_leb(struct ubifs_info *c, int lnum, int offs, void *sbuf)
 	}

 	if (sleb->nodes_cnt == 0) {
-		err = 1;
+		if (unlikely(c->cs_sqnum == 0)) {
+			/* This is the first log LEB, should not be empty */
+			ubifs_err("first log leb LEB %d:%d is empty, no CS node exist",
+				  lnum, offs);
+			err = -EINVAL;
+		} else {
+			err = 1;
+		}
 		goto out;
 	}

-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay
  2014-12-25 12:58 [PATCH RFC 1/2] UBIFS: fix empty log leb error hujianyang
@ 2014-12-25 13:10 ` hujianyang
  2015-01-31 12:46   ` Artem Bityutskiy
  0 siblings, 1 reply; 4+ messages in thread
From: hujianyang @ 2014-12-25 13:10 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd, shengyong1

UBIFS suppose no more log lebs need to be scanned when
replay_log_leb() return %1. But it is not true because
some log lebs maybe incorrectly empty and the next leb
of this empty one may contain valid filesystem data.

This patch scan all the log lebs during log replaying
to guarantee these log lebs are in normal state.

I'm not sure if it is a valuable problem and don't know
if this patch gives a good way to solve this problem
either.

Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 fs/ubifs/replay.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index f13f4b2..5537b90 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -1014,7 +1014,7 @@ out:
  */
 int ubifs_replay_journal(struct ubifs_info *c)
 {
-	int err, lnum, free;
+	int err, lnum, free, end;

 	BUILD_BUG_ON(UBIFS_TRUN_KEY > 5);

@@ -1032,13 +1032,28 @@ int ubifs_replay_journal(struct ubifs_info *c)
 	dbg_mnt("start replaying the journal");
 	c->replaying = 1;
 	lnum = c->ltail_lnum = c->lhead_lnum;
+	end = 0;

 	do {
 		err = replay_log_leb(c, lnum, 0, c->sbuf);
-		if (err == 1)
+
+		/*
+		 * If @end is set, residual log lebs should be empty or
+		 * contain older ref_nodes. replay_log_leb() must return %1.
+		 */
+		if (unlikely(!err && end)) {
+			ubifs_err("LEB %d is valid but considered to be empty",
+				  lnum);
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (err == 1 && !end) {
 			/* We hit the end of the log */
-			break;
-		if (err)
+			end = 1;
+		}
+
+		if (err < 0)
 			goto out;
 		lnum = ubifs_next_log_lnum(c, lnum);
 	} while (lnum != c->ltail_lnum);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay
  2014-12-25 13:10 ` [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay hujianyang
@ 2015-01-31 12:46   ` Artem Bityutskiy
  2015-02-02  2:49     ` hujianyang
  0 siblings, 1 reply; 4+ messages in thread
From: Artem Bityutskiy @ 2015-01-31 12:46 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-mtd, shengyong1

On Thu, 2014-12-25 at 21:10 +0800, hujianyang wrote:
> UBIFS suppose no more log lebs need to be scanned when
> replay_log_leb() return %1. But it is not true because
> some log lebs maybe incorrectly empty and the next leb
> of this empty one may contain valid filesystem data.

>From your other e-mail, I understood that you are doing fuzzing. That
is, you take a valid file-system, then you corrupt it (e.g., erase one
LEB), and then you try to mount it.

This is a great test, I never saw anyone doing it for UBIFS, so I expect
there may be issues.

Let's agree on expectations.

What do we expect UBIFS to do when we give it a corrupted image? I think
we expect it to either reject it, or accept, but some files may be lost
or corrupted.

And we for sure do not expect UBIFS to crash the kernel, or corrupt the
file-system even more (still good data should not become corrupted).

Now, what this patch does, it essentially it fixes the following UBIFS
behavior.

UBIFS replies the log. It meets empty LEB (this is our injected
corruption), and it believes this is the end of the log. It proceeds and
mounts the FS.

Nothing wrong so far, IMO.

However, the LEBs we missed contain valid data. And what will happen to
them once we start writing to the FS? Right - they'll be erased, and
good data disappears. However, this will be the data that was not
committed, which means this will be something written to the file-system
just before the power cut. And I think it is expected that this data may
disappear. So this is not that bad.

The other _potential_ problem is that the "skipping one LEB" scenario
has never been really tested, and there may be bugs which may lead to
kernel crashes or data loss.

Therefore, while I acknowledge the problem, I do not think this patch is
the right answer to the problem.

The easiest solution is to do nothing. Just let the rest of the log go
away. Does not look big deal to me.

The more difficult thing to do is to start analyzing the log further,
see what's in there, and keep replying if we are sure this is our stuff.
But I cannot come up with a reliable algorithm for this off the top of
my head.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay
  2015-01-31 12:46   ` Artem Bityutskiy
@ 2015-02-02  2:49     ` hujianyang
  0 siblings, 0 replies; 4+ messages in thread
From: hujianyang @ 2015-02-02  2:49 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, shengyong1

On 2015/1/31 20:46, Artem Bityutskiy wrote:
> On Thu, 2014-12-25 at 21:10 +0800, hujianyang wrote:
>> UBIFS suppose no more log lebs need to be scanned when
>> replay_log_leb() return %1. But it is not true because
>> some log lebs maybe incorrectly empty and the next leb
>> of this empty one may contain valid filesystem data.
> 
>>From your other e-mail, I understood that you are doing fuzzing. That
> is, you take a valid file-system, then you corrupt it (e.g., erase one
> LEB), and then you try to mount it.
> 
> This is a great test, I never saw anyone doing it for UBIFS, so I expect
> there may be issues.

Yes, it is.

One of production teams using UBI/UBIFS met an ECC error during log
replay and current UBIFS refuse to mount. This implement is unacceptable
for them. I was asked to solve this problem.

My partner Sheng Yong and I focus on log replay first, although other
parts of UBIFS may suffer ECC errors which could lead to other kinds of
problems.

We separate *ECC error during log replay* into two parts:

1) ECC error on log lebs
2) ECC error on buds

Current implement directly return an error if there is any unrecoverable
corrupt. Log area contains uncommitted data and the data losing of this
area is acceptable as you said. But here is a problem: log replay not
only update TNC tree but also update LPT area. That means, directly discard
corrupt data may put this partition into a unstable state, and lead to
a problem I described in my last patch:

   [PATCH resend] UBIFS: return -EINVAL if first log leb is empty


Since the log of UBIFS is multi-heads, It not easy to judge which log
should be kept, which log should be discard when an ECC error occur.
So we decide to discard all journals with LPT updating -- mark the used
buds as dirty.

It's easy for the second problem *2) ECC error on buds* to achieve
this implement. But hard for *1) ECC error on log lebs*, because
we don't know which leb contains buds in this case, seems a whole scan
is needed. So we just solve the second problem but the first one is
still pending. I'd like to show you the code this week.

> 
> Let's agree on expectations.
> 
> What do we expect UBIFS to do when we give it a corrupted image? I think
> we expect it to either reject it, or accept, but some files may be lost
> or corrupted.
> 
> And we for sure do not expect UBIFS to crash the kernel, or corrupt the
> file-system even more (still good data should not become corrupted).
> 
> Now, what this patch does, it essentially it fixes the following UBIFS
> behavior.
> 
> UBIFS replies the log. It meets empty LEB (this is our injected
> corruption), and it believes this is the end of the log. It proceeds and
> mounts the FS.
> 
> Nothing wrong so far, IMO.
> 
> However, the LEBs we missed contain valid data. And what will happen to
> them once we start writing to the FS? Right - they'll be erased, and
> good data disappears. However, this will be the data that was not
> committed, which means this will be something written to the file-system
> just before the power cut. And I think it is expected that this data may
> disappear. So this is not that bad.
>
> The other _potential_ problem is that the "skipping one LEB" scenario
> has never been really tested, and there may be bugs which may lead to
> kernel crashes or data loss.
>

Sure. But current embedded system usually store their rootfs on a
NOR flash partition. And we just lose new adding things, it's no
harm for a local system, I see.


> Therefore, while I acknowledge the problem, I do not think this patch is
> the right answer to the problem.
> 
> The easiest solution is to do nothing. Just let the rest of the log go
> away. Does not look big deal to me.
> 
> The more difficult thing to do is to start analyzing the log further,
> see what's in there, and keep replying if we are sure this is our stuff.
> But I cannot come up with a reliable algorithm for this off the top of
> my head.
> 

Agree with you.

So I didn't resend this patch. I need some time to reconsidering this
problem. I want to discuss with you and others when I sent this patch,
and need to make sure it is a right way I'm walking forward.

Your replaying give me confidence to keep on working for reliable
problems.

Could you please discuss more with me when I send out the implement
which recover a *buds ECC error* image?


Thanks~!
Hu

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-02-02  2:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-25 12:58 [PATCH RFC 1/2] UBIFS: fix empty log leb error hujianyang
2014-12-25 13:10 ` [PATCH RFC 2/2] UBIFS: scan all log lebs during log replay hujianyang
2015-01-31 12:46   ` Artem Bityutskiy
2015-02-02  2:49     ` hujianyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.